Multivariate Bayesian variable selection regression

Analysis with R2HESS

For comparison here I prototype multivariate fine-mapping with one of the few other software out there, R2HESS.

Here is the paper describing the methods. The package is an R wrapper for the MT-HESS program. In the README the author says: "If you have used previous versions of the package, these methods are more or less the same as before. Proper documentation, and hopefully some better functionality, will follow soon in an updated package."

In [1]:
%cd /home/gaow/tmp/26-Feb-2019
/home/gaow/Documents/TempDir/26-Feb-2019

Software and installation

I obtained the software package R2HESS version 1.99 (Oct 2017) from the authors. It comes with a README file that contains installation instructions. It is straightforward to follow:

R CMD INSTALL R2HESS_1.99-1.tar.gz

Data

Here for simplicity I use simulated data just to test out the package and see the output format. I set 2 causal variables per condition.

In [2]:
set.seed(1)
dat = mvsusieR::mvsusie_sim1(600,2000,5,2)
In [3]:
names(dat)
  1. 'X'
  2. 'y'
  3. 'd'
  4. 'n'
  5. 'p'
  6. 'r'
  7. 'V'
  8. 'b'

Analysis of 5 conditions for GTEx scale data

In [5]:
Ydat = data.frame(dat$y)
colnames(Ydat) = c('A1', 'B1', 'C1', 'D1', 'E1')
Xdat = data.frame(dat$X)
In [6]:
library(R2HESS)
output.dir = './5conditions'
config <- r2hess.makeConfig(Xdat, Ydat, output.dir, r=ncol(Ydat))
In [7]:
start_time <- Sys.time()
r2hess.run(config)
end_time <- Sys.time()
In [8]:
test1_time = end_time - start_time
test1_time
Time difference of 5.403033 mins

It takes 5 min to complete the computation for 5 conditions. Not too bad.

In [9]:
load("config.Rda")
In [10]:
r2hess.checkConvergence(config)
[1] "dim omega =  22001" "dim omega =  5"    
[1] "dim rho =  22001" "dim rho =  2000" 
In [11]:
r2hess.plotParams(config)
[1] "dim tailprob rho =  2000" "dim tailprob rho =  1"   
[1] "dim av omega_kj =  5"    "dim av omega_kj =  2000"
In [12]:
r2hess.plotAssoc(config, threshold=0.5)
[1] "dim MPPI =  5"    "dim MPPI =  2000"
In [13]:
out <- r2hess.toptableAssoc(config, minthresh=0.5)
[1] "dim MPPI =  5"    "dim MPPI =  2000"
[1] 10  4
     thresholds bFDR
[1,]          1    0
In [14]:
out <- r2hess.toptablePredictors(config, threshold=0.5)
[1] "dim MPPI =  5"    "dim MPPI =  2000"
[1] "MPPI threshold =  0.5"

Analysis of 10 conditions for GTEx scale data

In [15]:
dat = mvsusieR::mvsusie_sim1(600,2000,10,2)
Ydat = data.frame(dat$y)
colnames(Ydat) = c('A1', 'B1', 'C1', 'D1', 'E1', "F1", "G1", "H1", "I1", "J1")
Xdat = data.frame(dat$X)
In [16]:
output.dir = './10conditions'
config <- r2hess.makeConfig(Xdat, Ydat, output.dir, r=ncol(Ydat))
start_time <- Sys.time()
r2hess.run(config)
end_time <- Sys.time()
In [17]:
test2_time = end_time - start_time
test2_time
Time difference of 10.43567 mins

Analysis of 50 conditions for GTEx scale data

In [18]:
dat = mvsusieR::mvsusie_sim1(600,2000,50,2)
Ydat = data.frame(dat$y)
Xdat = data.frame(dat$X)
In [19]:
output.dir = './50conditions'
config <- r2hess.makeConfig(Xdat, Ydat, output.dir, r=ncol(Ydat))
start_time <- Sys.time()
r2hess.run(config)
end_time <- Sys.time()
In [20]:
test3_time = end_time - start_time
test3_time
Time difference of 50.05239 mins

It roughly scales linearly with number of conditions.


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago