Multivariate Bayesian variable selection regression

M&M ASH benchmark VI

This is a continuation of Part V where I set total PVE is set to 0.1 and assume 1 or 2 causal variables per region. I added in evaluation of lfsr per condition.

The most important difference from previous simulations is that here I mix-match simulated data under different prior assumptions to analyzing them with different priors. I expect to observe that:

  1. The "oracle" prior is mostly better than using other priors, for all scenarios.
  2. Mixture prior generally performs well in all scenarios -- it is robust to simulation assumptions.

Conclusion

  1. The expected observations above are both true, with some interesting exceptions
    • "oracle" mixture prior is not better than using mixture prior on some other scenarios -- overfitting of mixture prior?
    • Singleton oracle is bad
  2. Power table: model mis-specification will result in overlaps, but there is no overlapping issue in mixture model
  3. Overlaps of singleton results are prevalent as expected
  4. mixture prior has great FDR control on CS
  5. mixture prior has good lfsr control on effect estimates

The benchmark was executd on UChicago midway

./finemap.dsc --host mnm_R5.yml --R 5 -c 12

This executes the default pipeline in finemap.dsc file, as of today (2019.02.04).

In [1]:
%cd ~/GIT/github/mnm-twas/dsc
/home/gaow/Documents/GIT/github/mnm-twas/dsc
In [2]:
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.included_causal susie_scores.overlap susie_scores.false_pos_cond_discoveries susie_scores.false_neg_cond_discoveries susie_scores.true_cond_discoveries", omit.file.columns = T, verbose = F)
end_time <- Sys.time()
In [3]:
end_time - start_time
Time difference of 13.34753 mins
In [4]:
head(out)
DSCsharing_patternmnmmnm.eff_modesusie_scores.totalsusie_scores.validsusie_scores.sizesusie_scores.puritysusie_scores.topsusie_scores.n_causalsusie_scores.included_causalsusie_scores.overlapsusie_scores.false_pos_cond_discoveriessusie_scores.false_neg_cond_discoveriessusie_scores.true_cond_discoveries
1 identity mnm_identityidentity 2 1 16 0.9314858 0 1 1 0 3 2 5
1 identity mnm_identityidentity 1 1 1 1.0000000 1 1 1 0 0 0 5
1 identity mnm_identityidentity 2 2 5 0.9823716 0 2 2 0 0 0 10
1 identity mnm_identityidentity 2 2 12 0.9753366 2 2 2 0 0 0 10
1 identity mnm_identityidentity 3 3 9 0.9706318 1 3 3 0 0 0 15
1 identity mnm_identityidentity 1 1 4 0.9939019 1 1 1 0 0 0 5
In [5]:
dim(out)
  1. 24500
  2. 15
In [6]:
saveRDS(out, '../data/finemap_output.query_result.rds')
In [7]:
res = out[,c(2,4,5,6,7,8,9,10,11,12,13,14,15)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap', 'false_positive_cross_cond', 'false_negative_cross_cond', 'true_positive_cross_cond')

Purity of CS

In [8]:
purity = aggregate(purity~pattern + method, res, mean)
purity
patternmethodpurity
high_het high_het 0.9827047
identity high_het 0.9842457
low_het high_het 0.9841991
mid_het high_het 0.9833802
mixture01high_het 0.9365145
shared high_het 0.9824573
singletonhigh_het 0.8627706
high_het identity 0.9823101
identity identity 0.9841695
low_het identity 0.9835865
mid_het identity 0.9829505
mixture01identity 0.9355703
shared identity 0.9825129
singletonidentity 0.8587500
high_het low_het 0.9833144
identity low_het 0.9852966
low_het low_het 0.9854448
mid_het low_het 0.9833520
mixture01low_het 0.9304827
shared low_het 0.9838405
singletonlow_het 0.8329095
high_het mid_het 0.9838315
identity mid_het 0.9838237
low_het mid_het 0.9848168
mid_het mid_het 0.9833310
mixture01mid_het 0.9321916
shared mid_het 0.9833230
singletonmid_het 0.8554835
high_het mixture_10.9847903
identity mixture_10.9858474
low_het mixture_10.9851506
mid_het mixture_10.9852112
mixture01mixture_10.9367697
shared mixture_10.9846586
singletonmixture_10.8568686
high_het shared 0.8838506
identity shared 0.8531477
low_het shared 0.9554263
mid_het shared 0.8993438
mixture01shared 0.7760270
shared shared 0.9849261
singletonshared 0.3716790
high_het singleton0.8938148
identity singleton0.8905021
low_het singleton0.8991093
mid_het singleton0.8925364
mixture01singleton0.8730581
shared singleton0.8994284
singletonsingleton0.8699070
In [9]:
aggregate(purity~method, purity, mean)
methodpurity
high_het 0.9594674
identity 0.9585500
low_het 0.9549486
mid_het 0.9581144
mixture_10.9598995
shared 0.8177715
singleton0.8883366

Size of CS

In [10]:
size = aggregate(size~pattern+method, res, median)
size
patternmethodsize
high_het high_het 3.00
identity high_het 3.50
low_het high_het 3.50
mid_het high_het 4.00
mixture01high_het 5.00
shared high_het 4.00
singletonhigh_het 6.00
high_het identity 3.00
identity identity 3.50
low_het identity 3.50
mid_het identity 4.00
mixture01identity 5.00
shared identity 4.00
singletonidentity 6.00
high_het low_het 3.00
identity low_het 3.50
low_het low_het 3.00
mid_het low_het 4.00
mixture01low_het 5.00
shared low_het 4.00
singletonlow_het 6.00
high_het mid_het 3.00
identity mid_het 3.50
low_het mid_het 3.00
mid_het mid_het 4.00
mixture01mid_het 5.00
shared mid_het 4.00
singletonmid_het 6.00
high_het mixture_1 3.00
identity mixture_1 3.25
low_het mixture_1 3.00
mid_het mixture_1 3.50
mixture01mixture_1 5.00
shared mixture_1 4.00
singletonmixture_1 6.00
high_het shared 4.00
identity shared 5.00
low_het shared 4.00
mid_het shared 5.00
mixture01shared 4.00
shared shared 4.00
singletonshared 0.00
high_het singleton10.00
identity singleton10.00
low_het singleton10.00
mid_het singleton10.50
mixture01singleton10.00
shared singleton11.00
singletonsingleton 6.50
In [11]:
aggregate(size~method, size, mean)
methodsize
high_het 4.142857
identity 4.142857
low_het 4.071429
mid_het 4.071429
mixture_13.964286
shared 3.714286
singleton9.714286

Power of CS

In [12]:
total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
power = power[order(power$method),]
power
patternmethodtotal_true_includedtotal_trueoverlappower
1high_het high_het 792 874 0.084 0.9061785
8identity high_het 793 856 0.024 0.9264019
15low_het high_het 786 857 0.078 0.9171529
22mid_het high_het 809 881 0.000 0.9182747
29mixture01high_het 706 849 0.212 0.8315665
36shared high_het 789 854 0.256 0.9238876
43singletonhigh_het 548 816 0.000 0.6715686
2high_het identity 791 874 0.138 0.9050343
9identity identity 794 856 0.024 0.9275701
16low_het identity 789 857 0.116 0.9206534
23mid_het identity 809 881 0.000 0.9182747
30mixture01identity 704 849 0.206 0.8292108
37shared identity 791 854 0.292 0.9262295
44singletonidentity 546 816 0.000 0.6691176
3high_het low_het 796 874 0.054 0.9107551
10identity low_het 794 856 0.390 0.9275701
17low_het low_het 784 857 0.038 0.9148191
24mid_het low_het 818 881 0.384 0.9284904
31mixture01low_het 696 849 0.262 0.8197880
38shared low_het 791 854 0.256 0.9262295
45singletonlow_het 517 816 0.000 0.6335784
4high_het mid_het 793 874 0.084 0.9073227
11identity mid_het 794 856 0.024 0.9275701
18low_het mid_het 784 857 0.018 0.9148191
25mid_het mid_het 809 881 0.000 0.9182747
32mixture01mid_het 705 849 0.250 0.8303887
39shared mid_het 789 854 0.256 0.9238876
46singletonmid_het 536 816 0.000 0.6568627
5high_het mixture_1783 874 0.000 0.8958810
12identity mixture_1783 856 0.000 0.9147196
19low_het mixture_1787 857 0.068 0.9183197
26mid_het mixture_1806 881 0.000 0.9148695
33mixture01mixture_1696 849 0.036 0.8197880
40shared mixture_1794 854 0.034 0.9297424
47singletonmixture_1525 816 0.000 0.6433824
6high_het shared 685 874 3.718 0.7837529
13identity shared 665 856 5.768 0.7768692
20low_het shared 770 857 2.226 0.8984831
27mid_het shared 725 881 3.076 0.8229285
34mixture01shared 589 849 5.840 0.6937574
41shared shared 796 854 0.270 0.9320843
48singletonshared 216 816 0.000 0.2647059
7high_het singleton747 874 126.100 0.8546911
14identity singleton727 856 125.698 0.8492991
21low_het singleton724 857 116.864 0.8448075
28mid_het singleton759 881 144.052 0.8615210
35mixture01singleton663 849 104.002 0.7809187
42shared singleton758 854 128.340 0.8875878
49singletonsingleton541 816 0.000 0.6629902
In [13]:
aggregate(power~method, power, mean)
methodpower
high_het 0.8707187
identity 0.8708701
low_het 0.8658901
mid_het 0.8684465
mixture_10.8623861
shared 0.7389402
singleton0.8202593

FDR of CS

In [14]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr = fdr[order(fdr$method),]
fdr
patternmethodvalidtotalfdr
1high_het high_het 787 868 0.09331797
8identity high_het 792 859 0.07799767
15low_het high_het 782 843 0.07236062
22mid_het high_het 800 875 0.08571429
29mixture01 high_het 698 759 0.08036891
36shared high_het 788 856 0.07943925
43singleton high_het 540 603 0.10447761
2high_het identity 787 868 0.09331797
9identity identity 793 859 0.07683353
16low_het identity 786 844 0.06872038
23mid_het identity 800 875 0.08571429
30mixture01 identity 696 758 0.08179420
37shared identity 791 858 0.07808858
44singleton identity 539 602 0.10465116
3high_het low_het 791 856 0.07593458
10identity low_het 794 849 0.06478210
17low_het low_het 778 837 0.07048984
24mid_het low_het 809 861 0.06039489
31mixture01 low_het 690 742 0.07008086
38shared low_het 790 852 0.07276995
45singleton low_het 509 555 0.08288288
4high_het mid_het 787 863 0.08806489
11identity mid_het 793 859 0.07683353
18low_het mid_het 779 839 0.07151371
25mid_het mid_het 800 872 0.08256881
32mixture01 mid_het 698 759 0.08036891
39shared mid_het 788 855 0.07836257
46singleton mid_het 528 588 0.10204082
5high_het mixture_1 777 832 0.06610577
12identity mixture_1 781 831 0.06016847
19low_het mixture_1 779 827 0.05804111
26mid_het mixture_1 796 845 0.05798817
33mixture01 mixture_1 688 728 0.05494505
40shared mixture_1 793 834 0.04916067
47singleton mixture_1 519 549 0.05464481
6high_het shared 727 757 0.03963012
13identity shared 719 746 0.03619303
20low_het shared 801 849 0.05653710
27mid_het shared 758 793 0.04413619
34mixture01 shared 613 642 0.04517134
41shared shared 796 841 0.05350773
48singleton shared 214 224 0.04464286
7high_het singleton 1890 2020 0.06435644
14identity singleton 1946 2087 0.06756109
21low_het singleton 1931 2065 0.06489104
28mid_het singleton 1998 2147 0.06939916
35mixture01 singleton 1587 1696 0.06426887
42shared singleton 2084 2237 0.06839517
49singleton singleton 533 565 0.05663717
In [15]:
aggregate(fdr~method, fdr, mean)
methodfdr
high_het 0.08481090
identity 0.08416001
low_het 0.07104787
mid_het 0.08282189
mixture_1 0.05729344
shared 0.04568834
singleton 0.06507271

Power for per signal per condition estimates

We compute lfsr on per signal per condition basis. We call it a signal in the condition if lfsr is smaller than 0.05.

In [16]:
tp = aggregate(true_positive_cross_cond ~ pattern + method, res, sum)
fn = aggregate(false_negative_cross_cond ~ pattern + method, res, sum)
power = merge(tp, fn, by = c("pattern", "method"))
In [17]:
power$power = power$true_positive_cross_cond/(power$true_positive_cross_cond + power$false_negative_cross_cond)
power = power[order(power$method),]
power
patternmethodtrue_positive_cross_condfalse_negative_cross_condpower
1high_het high_het 3901 105 0.9737893
8identity high_het 3938 74 0.9815553
15low_het high_het 3895 64 0.9838343
22mid_het high_het 3967 99 0.9756517
29mixture01high_het 2890 51 0.9826590
36shared high_het 3927 84 0.9790576
43singletonhigh_het 540 18 0.9677419
2high_het identity 3908 95 0.9762678
9identity identity 3941 79 0.9803483
16low_het identity 3906 71 0.9821473
23mid_het identity 3968 104 0.9744597
30mixture01identity 2886 56 0.9809653
37shared identity 3935 99 0.9754586
44singletonidentity 539 20 0.9642218
3high_het low_het 3925 72 0.9819865
10identity low_het 3938 50 0.9874624
17low_het low_het 3879 43 0.9890362
24mid_het low_het 4012 64 0.9842983
31mixture01low_het 2888 29 0.9900583
38shared low_het 3939 61 0.9847500
45singletonlow_het 509 10 0.9807322
4high_het mid_het 3904 93 0.9767325
11identity mid_het 3945 74 0.9815875
18low_het mid_het 3877 61 0.9845099
25mid_het mid_het 3962 95 0.9765837
32mixture01mid_het 2905 48 0.9837453
39shared mid_het 3926 77 0.9807644
46singletonmid_het 528 16 0.9705882
5high_het mixture_13860 36 0.9907598
12identity mixture_13888 20 0.9948823
19low_het mixture_13890 16 0.9959037
26mid_het mixture_13959 36 0.9909887
33mixture01mixture_12850 7 0.9975499
40shared mixture_13955 14 0.9964727
47singletonmixture_1 519 0 1.0000000
6high_het shared 3198 457 0.8749658
13identity shared 3015 603 0.8333333
20low_het shared 3781 228 0.9431280
27mid_het shared 3440 369 0.9031242
34mixture01shared 2578 248 0.9122435
41shared shared 3975 11 0.9972403
48singletonshared 214 0 1.0000000
7high_het singleton1878 8094 0.1883273
14identity singleton1939 8355 0.1883622
21low_het singleton1918 8273 0.1882053
28mid_het singleton1990 8596 0.1879841
35mixture01singleton1582 6169 0.2041027
42shared singleton2078 8954 0.1883611
49singletonsingleton 533 0 1.0000000
In [18]:
aggregate(power~method, power, mean)
methodpower
high_het 0.9777556
identity 0.9762670
low_het 0.9854748
mid_het 0.9792159
mixture_10.9952224
shared 0.9234336
singleton0.3064775

FDR for per signal per condition estimates

In [19]:
tp = aggregate(true_positive_cross_cond ~ pattern + method, res, sum)
fp = aggregate(false_positive_cross_cond ~ pattern + method, res, sum)
fdr = merge(tp, fp, by = c("pattern", "method"))
fdr$fdr = fdr$false_positive_cross_cond/(fdr$true_positive_cross_cond + fdr$false_positive_cross_cond)
fdr = fdr[order(fdr$method),]
fdr
patternmethodtrue_positive_cross_condfalse_positive_cross_condfdr
1high_het high_het 3901 334 0.07886659
8identity high_het 3938 283 0.06704572
15low_het high_het 3895 256 0.06167189
22mid_het high_het 3967 309 0.07226380
29mixture01 high_het 2890 275 0.08688784
36shared high_het 3927 269 0.06410867
43singleton high_het 540 329 0.37859609
2high_het identity 3908 337 0.07938751
9identity identity 3941 275 0.06522770
16low_het identity 3906 243 0.05856833
23mid_het identity 3968 303 0.07094357
30mixture01 identity 2886 279 0.08815166
37shared identity 3935 256 0.06108327
44singleton identity 539 334 0.38258877
3high_het low_het 3925 283 0.06725285
10identity low_het 3938 257 0.06126341
17low_het low_het 3879 263 0.06349590
24mid_het low_het 4012 229 0.05399670
31mixture01 low_het 2888 258 0.08200890
38shared low_het 3939 260 0.06191950
45singleton low_het 509 295 0.36691542
4high_het mid_het 3904 318 0.07531975
11identity mid_het 3945 276 0.06538735
18low_het mid_het 3877 257 0.06216739
25mid_het mid_het 3962 303 0.07104338
32mixture01 mid_het 2905 280 0.08791209
39shared mid_het 3926 272 0.06479276
46singleton mid_het 528 319 0.37662338
5high_het mixture_1 3860 264 0.06401552
12identity mixture_1 3888 247 0.05973398
19low_het mixture_1 3890 229 0.05559602
26mid_het mixture_1 3959 230 0.05490571
33mixture01 mixture_1 2850 170 0.05629139
40shared mixture_1 3955 201 0.04836381
47singleton mixture_1 519 33 0.05978261
6high_het shared 3198 130 0.03906250
13identity shared 3015 112 0.03581708
20low_het shared 3781 236 0.05875031
27mid_het shared 3440 156 0.04338154
34mixture01 shared 2578 205 0.07366152
41shared shared 3975 219 0.05221745
48singleton shared 214 283 0.56941650
7high_het singleton 1878 128 0.06380857
14identity singleton 1939 141 0.06778846
21low_het singleton 1918 134 0.06530214
28mid_het singleton 1990 149 0.06965872
35mixture01 singleton 1582 109 0.06445890
42shared singleton 2078 153 0.06857911
49singleton singleton 533 32 0.05663717

Performance of effect size estimates

Total number of true discoveries over total number of signals to detect??


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago