Multivariate Bayesian variable selection regression

M&M on GTEx data pipeline

Pipelines to run M&M analysis. Each gene is already saved on RDS format, see analysis/20180515_Extract_Benchmark_Data.ipynb.

In [14]:
[global]
from glob import glob
parameter: wd = path("/project/compbio/GTEx_eQTL/mnm_results_V7")
parameter: model_file = path('/project/compbio/GTEx_eQTL/mashr_flashr_workflow_output/FastQTLSumStats.mash.FL_PC3.mash_model_est_v.rds')
parameter: v_file = path('/project/compbio/GTEx_eQTL/mashr_flashr_workflow_output/FastQTLSumStats.mash.Vhat.rds')
parameter: data_dir = path('/home/gaow/Documents/GTExV8/Toys')

[default_1]
input: glob(f'{data_dir:a}/Multi_Tissues.*.RDS'), group_by = 1, concurrent = True
output: f"{wd:a}/{_input:bn}.mnm.rds"
task: trunk_workers = 1, trunk_size = 1, walltime = '60m', mem = '20G', cores = 1, tags = f'{step_name}_{_output:bn}'
R: expand = '${ }', stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout'
    dat = readRDS(${_input:r})
    mash = readRDS(${model_file:r})$fitted_g
    V = readRDS(${v_file:r})
    mash_init = mvsusieR:::MashInitializer$new(mash$Ulist, mash$grid, prior_weights = mash$pi[-1], null_weight = mash$pi[1], V = V, alpha = 1)
    start_time <- Sys.time()
    res = mvsusieR::susie(dat$X,dat$y_res,
                  L=10,V=mash_init,
                  precompute_covariances=TRUE,
                  compute_objective=FALSE)
    end_time <- Sys.time()
    print(end_time - start_time)
    saveRDS(res, ${_output:r})

To run the pipeline,

sos run analysis/20190605_MNM_Workflow.ipynb \
        -c midway2.yml -q midway2 -J 40 -j 4

A preview of results,

In [1]:
%cd /project/compbio/GTEx_eQTL/mnm_results_V7
/project/compbio/GTEx_eQTL/mnm_results_V7
In [152]:
# res = readRDS('Multi_Tissues.ENSG00000108384.mnm.rds')
# res = readRDS('Multi_Tissues.ENSG00000272030.mnm.rds')
# res = readRDS('Multi_Tissues.ENSG00000025772.mnm.rds')
# res = readRDS('Multi_Tissues.ENSG00000225439.mnm.rds')
res = readRDS('Multi_Tissues.ENSG00000267508.mnm.rds')
In [153]:
pdf('susie_plot.pdf', width=8, height=4)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
dev.off()
png: 2
In [154]:
%preview susie_plot.pdf -s png --dpi 150
> susie_plot.pdf (32.7 KiB):
In [155]:
p = mvsusieR::mvsusie_plot(res)
Suggested PDF canvas width: 11.5 height: 44.1 
In [156]:
pdf('mvsusie_plot.pdf', width = 20, height = 12)
print(p$plot)
dev.off()
png: 2
In [157]:
%preview mvsusie_plot.pdf -s png --dpi 80
> mvsusie_plot.pdf (14.1 KiB):
In [ ]:


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago