Here are the dimensions of the original data:
import pandas as pd
pfile = '/home/gaow/Documents/GTEx/peer_analysis/Heart_Atrial_Appendage_PEER_covariates.txt'
peer = pd.read_csv(pfile, header = 0, sep = '\t', index_col = 0).transpose()
peer.shape
So there are 264 samples in this tissue, and 35 PEER factors have been computed from the expression data (see why 35 PEER factors are generated).
After orth:
import scipy.linalg
bmat = scipy.linalg.orth(peer)
bmat.shape
So there should in fact be only 31 independent factors.
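To illustrate why the number of columns returned by orth can be read as the number of independent factors, here is a minimal sketch on synthetic data. The matrix `toy` is a hypothetical stand-in, not the GTEx PEER covariates: it is built with 31 independent columns plus 4 columns that are linear combinations of them, so its rank is known in advance.

```python
import numpy as np
import scipy.linalg

# Synthetic stand-in for the PEER matrix (an assumption, NOT the GTEx data):
# 31 independent columns plus 4 columns that are linear combinations of them.
rng = np.random.default_rng(0)
independent = rng.standard_normal((264, 31))
dependent = independent @ rng.standard_normal((31, 4))
toy = np.hstack([independent, dependent])  # shape (264, 35)

# orth() returns an orthonormal basis for the column space, so the number of
# columns in the result is the numerical rank of the input.
basis = scipy.linalg.orth(toy)

# matrix_rank() is also SVD-based and should report the same number.
rank = np.linalg.matrix_rank(toy)

print(toy.shape, basis.shape, rank)
```

If the same pattern holds for the real data, `np.linalg.matrix_rank(peer)` should agree with the column count of `bmat`.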
In the QR decomposition below, the first output is the shape of the Q matrix (original dimension ...) and the second is the rounded row sums of the R matrix. Note that the row sums of R contain a few zeros, particularly at larger indices. Maybe setting PEER = 35 was a bad idea to start with, but I was only following the guideline!
import numpy as np
np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})
# Compute the reduced QR decomposition once and unpack both factors
qmat, rmat = np.linalg.qr(peer, mode='reduced')
print(qmat.shape)
print(np.around(rmat.sum(axis=1)))
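A rounded row sum can also come out as zero when positive and negative entries in a row cancel, so it may be worth looking at the diagonal of R as well. The sketch below does this on a synthetic matrix (again a hypothetical stand-in with 31 independent columns and 4 dependent ones, not the GTEx data). The relative cutoff of `1e-10` is an arbitrary assumption, and for QR without column pivoting the diagonal is not a reliable rank estimate in general, so an SVD-based check is the safer reference.

```python
import numpy as np

# Synthetic stand-in (an assumption, NOT the GTEx data): the last 4 columns
# are linear combinations of the first 31.
rng = np.random.default_rng(1)
independent = rng.standard_normal((264, 31))
toy = np.hstack([independent, independent @ rng.standard_normal((31, 4))])

# Compute Q and R in a single call.
q, r = np.linalg.qr(toy, mode='reduced')

# A diagonal entry of R that is tiny relative to the largest one marks a
# column that adds no new direction beyond the columns before it.
# The 1e-10 relative cutoff is an arbitrary choice for this illustration.
diag = np.abs(np.diag(r))
n_independent = int(np.sum(diag > 1e-10 * diag.max()))

print(q.shape, r.shape, n_independent)
```

In this toy case the dependent columns come last, which is why the near-zero entries sit at the larger indices; with real data the dependent columns could be anywhere.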
%sessioninfo