Multivariate Bayesian variable selection regression

PEER analysis result not orthogonal

Here is the original data dimension:

In [ ]:
import pandas as pd
pfile = '/home/gaow/Documents/GTEx/peer_analysis/Heart_Atrial_Appendage_PEER_covariates.txt'
In [1]:
peer = pd.read_csv(pfile, header = 0, sep = '\t', index_col = 0).transpose()
peer.shape
Out[1]:
(264, 35)

So there are 264 samples in this tissue, and 35 PEER factors have been computed from the expression data (see why 35 PEER factors are generated).
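The heading's claim that the PEER factors are not orthogonal can be checked directly from the Gram matrix of the factor columns (the transposed matrix times itself). For mutually orthogonal columns, every off-diagonal entry of that matrix is zero. The sketch below is a minimal illustration that assumes a random 264 x 35 matrix as a stand-in for the PEER covariates; it does not read the GTEx file. The helper is_orthogonal and its tol parameter are hypothetical names introduced only for this example.

```python
import numpy as np

# Hypothetical stand-in for the 264 x 35 PEER covariate matrix:
# random Gaussian data, NOT the GTEx file loaded above.
rng = np.random.RandomState(0)
factors = rng.standard_normal((264, 35))


def is_orthogonal(mat, tol=1e-8):
    """Return True if the columns of `mat` are mutually orthogonal (within tol)."""
    mat = np.asarray(mat, dtype=float)
    gram = mat.T @ mat                        # inner products between all pairs of columns
    off_diag = gram - np.diag(np.diag(gram))  # keep only the cross-column inner products
    return bool(np.max(np.abs(off_diag)) < tol)


# Random columns are correlated by chance, so they are not orthogonal.
print(is_orthogonal(factors))  # False
# The Q factor of a QR decomposition has orthonormal columns, so it passes.
q, _ = np.linalg.qr(factors)
print(is_orthogonal(q))        # True
```

On the real data, the same check would be applied to peer.values.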

Orthonormal basis via SVD

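Computing an orthonormal basis for the column space of the PEER matrix with scipy.linalg.orth, which is based on the singular value decomposition (SVD), gives: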

In [2]:
import scipy.linalg
bmat = scipy.linalg.orth(peer)
bmat.shape
Out[2]:
(264, 31)

So, numerically, only 31 of the 35 PEER factors are linearly independent.
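scipy.linalg.orth works from the SVD. As I understand its default behaviour, it keeps the left singular vectors whose singular values exceed max(M, N) * eps * (largest singular value). The number of vectors kept is the numerical rank. The sketch below reproduces that rule with numpy on a hypothetical 264 x 35 matrix. Its last 4 columns are built as linear combinations of the first 31, to mimic the rank of 31 found above. The helper svd_rank and the synthetic data are illustrative assumptions and are not part of the original analysis.

```python
import numpy as np
import scipy.linalg

# Hypothetical rank-deficient stand-in: 31 random columns plus
# 4 columns that are linear combinations of them (35 columns, rank 31).
rng = np.random.RandomState(1)
base = rng.standard_normal((264, 31))
factors = np.hstack([base, base @ rng.standard_normal((31, 4))])


def svd_rank(mat):
    """Numerical rank and an orthonormal column-space basis via the SVD."""
    u, s, _ = np.linalg.svd(mat, full_matrices=False)
    # Default cutoff that scipy.linalg.orth applies to the singular values
    tol = max(mat.shape) * np.finfo(float).eps * s.max()
    rank = int(np.sum(s > tol))
    return rank, u[:, :rank]  # singular values come sorted in decreasing order


rank, basis = svd_rank(factors)
print(rank, basis.shape)                           # 31 (264, 31)
print(scipy.linalg.orth(factors).shape)            # (264, 31)
print(np.allclose(basis.T @ basis, np.eye(rank)))  # True: columns are orthonormal
```

The extra 4 columns add no new directions, so both the hand-rolled rule and scipy.linalg.orth drop them from the basis.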

Orthonormal basis via QR

The first output is the shape of the Q matrix, (264, 35), which is the same dimension as the original data. The second is the row sums of the R matrix, rounded. Note that several of these row sums are zero, particularly at the larger indices. Perhaps setting the number of PEER factors to 35 was a bad idea to begin with, but I was only following the guideline!

In [3]:
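import numpy as np
# Print floats with two decimal places
np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})
# Reduced QR decomposition of the 264 x 35 PEER matrix (peer = Q R), computed once
qmat, rmat = np.linalg.qr(peer, mode='reduced')
print(qmat.shape)
# Row sums of R, rounded to whole numbers
print(np.around(rmat.sum(axis=1)))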
(264, 35)
[1.00 -1.00 2.00 3.00 -4.00 -5.00 -10.00 -12.00 13.00 -15.00 -16.00 -15.00
 15.00 -16.00 16.00 -16.00 -17.00 14.00 15.00 15.00 15.00 14.00 -15.00
 15.00 15.00 13.00 14.00 -10.00 0.00 10.00 -0.00 -0.00 -0.00 0.00 0.00]
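Row sums of R mix contributions from several columns, so they are only a rough signal of rank deficiency. A more direct diagnostic is QR with column pivoting, a swapped-in technique that this notebook does not itself use. scipy.linalg.qr(..., pivoting=True) reorders the columns so that the absolute diagonal entries of R are non-increasing. The trailing near-zero diagonal entries then mark columns that are numerically linear combinations of the others. The sketch below applies this to a hypothetical rank-31 stand-in for the PEER matrix. The relative cutoff of 1e-10 is an arbitrary illustrative choice.

```python
import numpy as np
import scipy.linalg

# Hypothetical stand-in for the PEER matrix: 35 columns, only 31 independent.
rng = np.random.RandomState(2)
base = rng.standard_normal((264, 31))
factors = np.hstack([base, base @ rng.standard_normal((31, 4))])

# Economic QR with column pivoting: factors[:, piv] = q @ r,
# with |r[0, 0]| >= |r[1, 1]| >= ... along the diagonal.
q, r, piv = scipy.linalg.qr(factors, mode='economic', pivoting=True)
diag = np.abs(np.diag(r))

# Illustrative cutoff: diagonal entries below 1e-10 of the largest one count as zero.
rank = int(np.sum(diag > 1e-10 * diag[0]))
keep = np.sort(piv[:rank])  # a set of linearly independent columns
drop = np.sort(piv[rank:])  # columns explained by the kept ones

print(rank, len(drop))                          # 31 4
print(np.linalg.matrix_rank(factors[:, keep]))  # 31
```

Which columns end up in drop depends on the pivoting order, so they need not be the last four. On the real data, passing peer.values and inspecting peer.columns[drop] would give one candidate set of redundant factors.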
In [4]:
%sessioninfo

SoS
SoS Version: 0.9.8.10

Python3
Kernel: python3
Language: Python3
Version: 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
pandas: 0.20.2
scipy: 0.19.1
numpy: 1.13.1

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago