For comparison here I prototype multivariate fine-mapping with one of the few other software out there, R2HESS
.
Here is the paper describing the methods. The package is an R wrapper for the MT-HESS
program. In the README
the author says: "If you have used previous versions of the package, these methods are more or less the same as before. Proper documentation, and hopefully some better functionality, will follow soon in an updated package."
%cd /home/gaow/tmp/26-Feb-2019
I obtained the software package R2HESS version 1.99 (Oct 2017) from the authors. It comes with a README file that contains installation instructions. It is straightforward to follow:
R CMD INSTALL R2HESS_1.99-1.tar.gz
Here for simplicity I use simulated data just to test out the package and see the output format. I set 2 causal variables per condition.
set.seed(1)
dat = mvsusieR::mvsusie_sim1(600,2000,5,2)
names(dat)
Ydat = data.frame(dat$y)
colnames(Ydat) = c('A1', 'B1', 'C1', 'D1', 'E1')
Xdat = data.frame(dat$X)
library(R2HESS)
output.dir = './5conditions'
config <- r2hess.makeConfig(Xdat, Ydat, output.dir, r=ncol(Ydat))
start_time <- Sys.time()
r2hess.run(config)
end_time <- Sys.time()
test1_time = end_time - start_time
test1_time
It takes 5 min to complete the computation for 5 conditions. Not too bad.
load("config.Rda")
r2hess.checkConvergence(config)
r2hess.plotParams(config)
r2hess.plotAssoc(config, threshold=0.5)
out <- r2hess.toptableAssoc(config, minthresh=0.5)
out <- r2hess.toptablePredictors(config, threshold=0.5)
dat = mvsusieR::mvsusie_sim1(600,2000,10,2)
Ydat = data.frame(dat$y)
colnames(Ydat) = c('A1', 'B1', 'C1', 'D1', 'E1', "F1", "G1", "H1", "I1", "J1")
Xdat = data.frame(dat$X)
output.dir = './10conditions'
config <- r2hess.makeConfig(Xdat, Ydat, output.dir, r=ncol(Ydat))
start_time <- Sys.time()
r2hess.run(config)
end_time <- Sys.time()
test2_time = end_time - start_time
test2_time
dat = mvsusieR::mvsusie_sim1(600,2000,50,2)
Ydat = data.frame(dat$y)
Xdat = data.frame(dat$X)
output.dir = './50conditions'
config <- r2hess.makeConfig(Xdat, Ydat, output.dir, r=ncol(Ydat))
start_time <- Sys.time()
r2hess.run(config)
end_time <- Sys.time()
test3_time = end_time - start_time
test3_time
It roughly scales linearly with number of conditions.