Multivariate Bayesian variable selection regression

M&M benchmark XIII

This is benchmark using a non-trivial simulation scheme, and analyzing it using our current implementation of EM updates to prior scalar. See this notebook for details how it is executed.

Conclusion and next steps

  1. Results look largely good -- in terms of power, purity, FDR and convergence
  2. There are some overlapping CS observed.
  3. Next step: finalize on the ED procedure (important for real data analysis too), add ED based prior

Should we add in susieR as a comparison and if so how to compare PIP fairly?

In [1]:
%cd ~/GIT/mvarbvs/dsc/mnm_prototype
/project2/mstephens/gaow/mvarbvs/dsc/mnm_prototype
In [3]:
dat = dscrutils::dscquery('mnm_20200430', targets = c('simulate', "mnm", "mnm.resid_method", 
                                                      'susie_scores', 'susie_scores.total', 'susie_scores.valid', 'susie_scores.size', 
                                                      'susie_scores.purity', 'susie_scores.top', 'susie_scores.n_causal', 'susie_scores.included_causal', 
                                                      'susie_scores.overlap_var', 'susie_scores.overlap_cs','susie_scores.false_pos_cond_discoveries', 
                                                      'susie_scores.false_neg_cond_discoveries', 'susie_scores.true_cond_discoveries', 'susie_scores.converged'),
                          module.output.files = "susie_scores", verbose = F)
In [17]:
dat$method = NA
dat$method[which(!is.na(dat$mnm.resid_method))] = paste(dat$mnm[which(!is.na(dat$mnm.resid_method))], paste(dat$mnm.resid_method[which(!is.na(dat$mnm.resid_method))], "residual", sep = '_'), sep = '+')
dat$method[which(is.na(dat$mnm.resid_method))] = dat$mnm[which(is.na(dat$mnm.resid_method))]
In [18]:
dim(dat)
  1. 4800
  2. 19
In [19]:
saveRDS(dat, '../data/finemap_output.20200430.rds')
In [20]:
head(dat)
DSCsimulatemnmmnm.resid_methodsusie_scores.totalsusie_scores.validsusie_scores.sizesusie_scores.puritysusie_scores.topsusie_scores.n_causalsusie_scores.included_causalsusie_scores.overlap_varsusie_scores.overlap_cssusie_scores.false_pos_cond_discoveriessusie_scores.false_neg_cond_discoveriessusie_scores.true_cond_discoveriessusie_scores.convergedsusie_scores.output.filemethod
1 artificial_mixture mnm_oracle oracle 3 2 1 1 2 3 2 0 0 45 54 51 TRUE susie_scores/small_data_1_artificial_mixture_1_mnm_oracle_1_susie_scores_1mnm_oracle+oracle_residual
1 artificial_mixture mnm_oracle oracle 3 3 2 1 1 3 3 0 0 0 52 98 TRUE susie_scores/small_data_2_artificial_mixture_1_mnm_oracle_1_susie_scores_1mnm_oracle+oracle_residual
1 artificial_mixture mnm_oracle oracle 3 3 1 1 3 3 3 0 0 0 53 97 TRUE susie_scores/small_data_3_artificial_mixture_1_mnm_oracle_1_susie_scores_1mnm_oracle+oracle_residual
1 artificial_mixture mnm_oracle oracle 3 3 2 1 1 3 3 0 0 0 53 97 TRUE susie_scores/small_data_4_artificial_mixture_1_mnm_oracle_1_susie_scores_1mnm_oracle+oracle_residual
1 artificial_mixture mnm_oracle oracle 3 3 8 1 2 3 3 3 1 0 52 98 TRUE susie_scores/small_data_5_artificial_mixture_1_mnm_oracle_1_susie_scores_1mnm_oracle+oracle_residual
1 artificial_mixture mnm_oracle oracle 3 3 3 1 1 3 3 0 0 0 52 98 TRUE susie_scores/small_data_6_artificial_mixture_1_mnm_oracle_1_susie_scores_1mnm_oracle+oracle_residual
In [21]:
res = dat[,-c(1,3,4)]
colnames(res) = c('scenario', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap_var', 'overlap_cs', 'false_positive_cross_cond', 'false_negative_cross_cond', 'true_positive_cross_cond', 'converged', 'filename', 'method')

Purity of CS

In [22]:
purity = aggregate(purity~scenario + method, res, mean)
purity = purity[order(purity$scenario),]
purity
scenariomethodpurity
1artificial_mixture mnm_identity+flash_residual 0.9949241
3artificial_mixture mnm_identity+oracle_residual0.9998368
5artificial_mixture mnm_naive+flash_residual 0.9981513
7artificial_mixture mnm_naive+oracle_residual 0.9998860
9artificial_mixture mnm_oracle+flash_residual 0.9978131
11artificial_mixture mnm_oracle+oracle_residual 0.9999236
13artificial_mixture mnm_shared+flash_residual 0.9864727
15artificial_mixture mnm_shared+oracle_residual 0.9998992
2gtex_mixture mnm_identity+flash_residual 0.9538403
4gtex_mixture mnm_identity+oracle_residual0.9559678
6gtex_mixture mnm_naive+flash_residual 0.9599634
8gtex_mixture mnm_naive+oracle_residual 0.9614058
10gtex_mixture mnm_oracle+flash_residual 0.9724481
12gtex_mixture mnm_oracle+oracle_residual 0.9754157
14gtex_mixture mnm_shared+flash_residual 0.9989778
16gtex_mixture mnm_shared+oracle_residual 0.9973283

Power of CS

In [23]:
total_true_included = aggregate(total_true_included ~ scenario + method, res, sum)
total_true = aggregate(total_true ~  scenario + method, res, sum)
cs_overlap = aggregate(overlap_cs ~  scenario + method, res, sum)
snp_overlap = aggregate(overlap_var ~  scenario + method, res, sum)
power = merge(total_true_included, total_true, by = c( 'scenario' , 'method'))
power = merge(power, cs_overlap,  by = c( 'scenario' , 'method'))
power = merge(power, snp_overlap,  by = c( 'scenario' , 'method'))
power$power = round(power$total_true_included/power$total_true,3)
power$overlap_cs = round(power$overlap_cs, 3)
power$overlap_var = round(power$overlap_var, 3)
power = power[order(power$scenario),]
power
scenariomethodtotal_true_includedtotal_trueoverlap_csoverlap_varpower
artificial_mixture mnm_identity+flash_residual 788 900 0 0 0.876
artificial_mixture mnm_identity+oracle_residual786 900 0 0 0.873
artificial_mixture mnm_naive+flash_residual 845 900 12 745 0.939
artificial_mixture mnm_naive+oracle_residual 826 900 0 0 0.918
artificial_mixture mnm_oracle+flash_residual 897 900 26 362 0.997
artificial_mixture mnm_oracle+oracle_residual 890 900 22 328 0.989
artificial_mixture mnm_shared+flash_residual 662 900 0 0 0.736
artificial_mixture mnm_shared+oracle_residual 614 900 2 33 0.682
gtex_mixture mnm_identity+flash_residual 620 900 0 0 0.689
gtex_mixture mnm_identity+oracle_residual658 900 0 0 0.731
gtex_mixture mnm_naive+flash_residual 707 900 52 817 0.786
gtex_mixture mnm_naive+oracle_residual 710 900 66 1293 0.789
gtex_mixture mnm_oracle+flash_residual 788 900 0 0 0.876
gtex_mixture mnm_oracle+oracle_residual 785 900 0 0 0.872
gtex_mixture mnm_shared+flash_residual 314 900 0 0 0.349
gtex_mixture mnm_shared+oracle_residual 315 900 0 0 0.350

FDR of CS no missing data

In [24]:
valid = aggregate(valid ~ scenario + method, res, sum)
total = aggregate(total ~ scenario + method, res, sum)
fdr = merge(valid, total, by = c( 'scenario' , 'method'))
fdr$fdr = round((fdr$total - fdr$valid)/fdr$total,3)
fdr = fdr[order(fdr$scenario),]
fdr
scenariomethodvalidtotalfdr
artificial_mixture mnm_identity+flash_residual 771 817 0.056
artificial_mixture mnm_identity+oracle_residual776 830 0.065
artificial_mixture mnm_naive+flash_residual 838 853 0.018
artificial_mixture mnm_naive+oracle_residual 816 841 0.030
artificial_mixture mnm_oracle+flash_residual 895 899 0.004
artificial_mixture mnm_oracle+oracle_residual 888 900 0.013
artificial_mixture mnm_shared+flash_residual 646 650 0.006
artificial_mixture mnm_shared+oracle_residual 605 623 0.029
gtex_mixture mnm_identity+flash_residual 605 643 0.059
gtex_mixture mnm_identity+oracle_residual649 688 0.057
gtex_mixture mnm_naive+flash_residual 735 748 0.017
gtex_mixture mnm_naive+oracle_residual 752 768 0.021
gtex_mixture mnm_oracle+flash_residual 773 813 0.049
gtex_mixture mnm_oracle+oracle_residual 776 818 0.051
gtex_mixture mnm_shared+flash_residual 300 300 0.000
gtex_mixture mnm_shared+oracle_residual 306 308 0.006

Convergence

Based on ELBO. In principle all runs should converge by ELBO. If it is not converged, then it means either ELBO is not non-increasing or it exceeds max iteration.

In [25]:
elbo_converged = aggregate(converged~scenario + method, res, mean)
elbo_converged = elbo_converged[order(elbo_converged$scenario),]
elbo_converged
scenariomethodconverged
1artificial_mixture mnm_identity+flash_residual 1.0000000
3artificial_mixture mnm_identity+oracle_residual1.0000000
5artificial_mixture mnm_naive+flash_residual 1.0000000
7artificial_mixture mnm_naive+oracle_residual 1.0000000
9artificial_mixture mnm_oracle+flash_residual 1.0000000
11artificial_mixture mnm_oracle+oracle_residual 1.0000000
13artificial_mixture mnm_shared+flash_residual 1.0000000
15artificial_mixture mnm_shared+oracle_residual 0.9933333
2gtex_mixture mnm_identity+flash_residual 1.0000000
4gtex_mixture mnm_identity+oracle_residual1.0000000
6gtex_mixture mnm_naive+flash_residual 1.0000000
8gtex_mixture mnm_naive+oracle_residual 1.0000000
10gtex_mixture mnm_oracle+flash_residual 1.0000000
12gtex_mixture mnm_oracle+oracle_residual 1.0000000
14gtex_mixture mnm_shared+flash_residual 1.0000000
16gtex_mixture mnm_shared+oracle_residual 1.0000000

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago