Multivariate Bayesian variable selection regression

M&M ASH benchmark V

This is a continuation of Part V where I set total PVE is set to 0.15 and assume 2 causal variables per region. But here, the two SNPs have the same effects sampled from the multivariate distribution. Also I use $R = 5$ conditions and run it on $J=1000$ and 150 genes.

The most important difference from previous simulations is that here I mix-match simulated data under different prior assumptions to analyzing them with different priors. I expect to observe that:

  1. The "oracle" prior is always better than using other priors, for all scenarios.
  2. Mixture prior generally performs well in all scenarios -- it is robust to simulation assumptions.

Conclusion

...

The benchmark was executd on UChicago midway

./finemap.dsc --host mnm_R5.yml --R 5

This executes the default pipeline in finemap.dsc file, as of today (2019.02.04).

In [1]:
%cd ~/GIT/github/mnm-twas/dsc
/scratch/midway2/gaow/GIT/github/mnm-twas/dsc
In [2]:
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.overlap", omit.file.columns = T)
end_time <- Sys.time()
Loading dsc-query output from CSV file.
Reading DSC outputs:
 - susie_scores.total: extracted atomic values
 - susie_scores.valid: extracted atomic values
 - susie_scores.size: extracted atomic values
 - susie_scores.purity: extracted atomic values
 - susie_scores.top: extracted atomic values
 - susie_scores.n_causal: extracted atomic values
 - susie_scores.overlap: extracted atomic values
In [3]:
end_time - start_time
Time difference of 2.014167 mins
In [4]:
head(out)
DSCsharing_patternmnmmnm.eff_modesusie_scores.totalsusie_scores.validsusie_scores.sizesusie_scores.puritysusie_scores.topsusie_scores.n_causalsusie_scores.overlap
1 identity mnm_identityidentity 2 2 3.5 1.0000000 1 2 0
1 identity mnm_identityidentity 1 1 1.0 1.0000000 1 1 0
1 identity mnm_identityidentity 1 1 9.0 0.9886828 0 1 0
1 identity mnm_identityidentity 1 1 10.0 1.0000000 0 1 0
1 identity mnm_identityidentity 1 1 13.0 0.9875783 0 1 0
1 identity mnm_identityidentity 1 1 4.0 0.9960325 0 1 0
In [6]:
res = out[,c(2,4,5,6,7,8,9,10,11)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'overlap')

Purity of CS

In [7]:
aggregate(purity~pattern + method, res, mean)
patternmethodpurity
high_het high_het 0.9916862
identity high_het 0.9917146
low_het high_het 0.9884244
mid_het high_het 0.9923806
mixture01high_het 0.9757964
shared high_het 0.9904957
singletonhigh_het 0.9241387
high_het identity 0.9913949
identity identity 0.9917000
low_het identity 0.9886923
mid_het identity 0.9919891
mixture01identity 0.9769534
shared identity 0.9904997
singletonidentity 0.9229322
high_het low_het 0.9917899
identity low_het 0.9921787
low_het low_het 0.9918518
mid_het low_het 0.9924335
mixture01low_het 0.9739553
shared low_het 0.9902886
singletonlow_het 0.9205336
high_het mid_het 0.9916245
identity mid_het 0.9924849
low_het mid_het 0.9897733
mid_het mid_het 0.9923608
mixture01mid_het 0.9741854
shared mid_het 0.9903303
singletonmid_het 0.9208219
high_het mixture_10.9861684
identity mixture_10.9922034
low_het mixture_10.9912703
mid_het mixture_10.9920922
mixture01mixture_10.9803773
shared mixture_10.9911939
singletonmixture_10.9369633
high_het shared 0.9152451
identity shared 0.9340737
low_het shared 0.9732253
mid_het shared 0.9395782
mixture01shared 0.8718224
shared shared 0.9912402
singletonshared 0.6337326
high_het singleton0.9268676
identity singleton0.9294301
low_het singleton0.9348478
mid_het singleton0.9363354
mixture01singleton0.9361214
shared singleton0.9310871
singletonsingleton0.9360988

Size of CS

In [8]:
aggregate(size~pattern+method, res, median)
patternmethodsize
high_het high_het 2.25
identity high_het 3.00
low_het high_het 2.00
mid_het high_het 3.00
mixture01high_het 4.00
shared high_het 2.50
singletonhigh_het 6.25
high_het identity 2.25
identity identity 3.00
low_het identity 2.00
mid_het identity 3.00
mixture01identity 4.00
shared identity 2.50
singletonidentity 6.75
high_het low_het 2.50
identity low_het 3.00
low_het low_het 2.00
mid_het low_het 3.00
mixture01low_het 4.00
shared low_het 2.50
singletonlow_het 6.00
high_het mid_het 2.25
identity mid_het 3.00
low_het mid_het 2.00
mid_het mid_het 3.00
mixture01mid_het 4.00
shared mid_het 2.50
singletonmid_het 6.25
high_het mixture_12.25
identity mixture_13.00
low_het mixture_12.00
mid_het mixture_13.00
mixture01mixture_14.00
shared mixture_12.00
singletonmixture_18.00
high_het shared 4.00
identity shared 5.00
low_het shared 3.00
mid_het shared 4.00
mixture01shared 3.75
shared shared 2.50
singletonshared 2.25
high_het singleton8.00
identity singleton9.00
low_het singleton6.50
mid_het singleton9.00
mixture01singleton8.00
shared singleton8.25
singletonsingleton8.00

Power

In [11]:
valid = aggregate(valid ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(valid, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$valid/power$total_true
power
patternmethodvalidtotal_trueoverlappower
high_het high_het 238 250 7.040000000.9520000
high_het identity 240 250 6.880000000.9600000
high_het low_het 241 250 3.126666670.9640000
high_het mid_het 241 250 2.653333330.9640000
high_het mixture_1 235 250 0.000000000.9400000
high_het shared 264 250 60.693333331.0560000
high_het singleton 683 250 209.060000002.7320000
identity high_het 254 266 0.193333330.9548872
identity identity 255 266 0.193333330.9586466
identity low_het 255 266 0.366666670.9586466
identity mid_het 254 266 0.200000000.9548872
identity mixture_1 249 266 0.000000000.9360902
identity shared 276 266 67.720000001.0375940
identity singleton 750 266 224.220000002.8195489
low_het high_het 230 255 0.000000000.9019608
low_het identity 230 255 0.000000000.9019608
low_het low_het 230 255 0.000000000.9019608
low_het mid_het 230 255 0.000000000.9019608
low_het mixture_1 228 255 0.000000000.8941176
low_het shared 243 255 3.946666670.9529412
low_het singleton 660 255 129.873333332.5882353
mid_het high_het 239 251 0.000000000.9521912
mid_het identity 239 251 0.000000000.9521912
mid_het low_het 242 251 0.060000000.9641434
mid_het mid_het 241 251 0.000000000.9601594
mid_het mixture_1 234 251 0.000000000.9322709
mid_het shared 264 251 4.240000001.0517928
mid_het singleton 718 251 186.186666672.8605578
mixture01 high_het 231 254 0.026666670.9094488
mixture01 identity 231 254 0.026666670.9094488
mixture01 low_het 224 254 0.366666670.8818898
mixture01 mid_het 229 254 0.000000000.9015748
mixture01 mixture_1 226 254 0.000000000.8897638
mixture01 shared 205 254 4.133333330.8070866
mixture01 singleton 565 254 127.506666672.2244094
shared high_het 245 263 0.000000000.9315589
shared identity 244 263 0.000000000.9277567
shared low_het 245 263 0.000000000.9315589
shared mid_het 245 263 0.000000000.9315589
shared mixture_1 245 263 0.000000000.9315589
shared shared 243 263 0.000000000.9239544
shared singleton 700 263 181.520000002.6615970
singleton high_het 185 241 0.000000000.7676349
singleton identity 190 241 0.000000000.7883817
singleton low_het 185 241 0.000000000.7676349
singleton mid_het 184 241 0.000000000.7634855
singleton mixture_1 190 241 0.000000000.7883817
singleton shared 108 241 0.000000000.4481328
singleton singleton 196 241 0.000000000.8132780

FDR

In [12]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
patternmethodvalidtotalfdr
high_het high_het 238 259 0.08108108
high_het identity 240 260 0.07692308
high_het low_het 241 257 0.06225681
high_het mid_het 241 259 0.06949807
high_het mixture_1 235 245 0.04081633
high_het shared 264 267 0.01123596
high_het singleton 683 726 0.05922865
identity high_het 254 271 0.06273063
identity identity 255 272 0.06250000
identity low_het 255 269 0.05204461
identity mid_het 254 269 0.05576208
identity mixture_1 249 267 0.06741573
identity shared 276 292 0.05479452
identity singleton 750 806 0.06947891
low_het high_het 230 254 0.09448819
low_het identity 230 254 0.09448819
low_het low_het 230 251 0.08366534
low_het mid_het 230 253 0.09090909
low_het mixture_1 228 246 0.07317073
low_het shared 243 260 0.06538462
low_het singleton 660 711 0.07172996
mid_het high_het 239 257 0.07003891
mid_het identity 239 256 0.06640625
mid_het low_het 242 257 0.05836576
mid_het mid_het 241 256 0.05859375
mid_het mixture_1 234 251 0.06772908
mid_het shared 264 277 0.04693141
mid_het singleton 718 775 0.07354839
mixture01 high_het 231 252 0.08333333
mixture01 identity 231 252 0.08333333
mixture01 low_het 224 245 0.08571429
mixture01 mid_het 229 250 0.08400000
mixture01 mixture_1 226 241 0.06224066
mixture01 shared 205 217 0.05529954
mixture01 singleton 565 604 0.06456954
shared high_het 245 269 0.08921933
shared identity 244 269 0.09293680
shared low_het 245 267 0.08239700
shared mid_het 245 268 0.08582090
shared mixture_1 245 261 0.06130268
shared shared 243 267 0.08988764
shared singleton 700 763 0.08256881
singleton high_het 185 211 0.12322275
singleton identity 190 214 0.11214953
singleton low_het 185 206 0.10194175
singleton mid_het 184 210 0.12380952
singleton mixture_1 190 201 0.05472637
singleton shared 108 116 0.06896552
singleton singleton 196 205 0.04390244

Top-hit rate (how often the strongest SNP is causal)

In [20]:
top_hit = aggregate(top_hit ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
top_rate = merge(top_hit, total_true, by = c("pattern", "method"))
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
patternmethodtop_hittotal_truetop_rate
high_het high_het 174 300 0.58000000
high_het identity 174 300 0.58000000
high_het low_het 171 300 0.57000000
high_het mid_het 172 300 0.57333333
high_het mixture_1 164 300 0.54666667
high_het shared 99 300 0.33000000
high_het singleton 404 300 1.34666667
identity high_het 174 300 0.58000000
identity identity 174 300 0.58000000
identity low_het 174 300 0.58000000
identity mid_het 174 300 0.58000000
identity mixture_1 167 300 0.55666667
identity shared 71 300 0.23666667
identity singleton 359 300 1.19666667
low_het high_het 167 300 0.55666667
low_het identity 166 300 0.55333333
low_het low_het 167 300 0.55666667
low_het mid_het 167 300 0.55666667
low_het mixture_1 155 300 0.51666667
low_het shared 109 300 0.36333333
low_het singleton 380 300 1.26666667
mid_het high_het 167 300 0.55666667
mid_het identity 167 300 0.55666667
mid_het low_het 169 300 0.56333333
mid_het mid_het 167 300 0.55666667
mid_het mixture_1 158 300 0.52666667
mid_het shared 97 300 0.32333333
mid_het singleton 456 300 1.52000000
mixture01 high_het 144 300 0.48000000
mixture01 identity 144 300 0.48000000
mixture01 low_het 145 300 0.48333333
mixture01 mid_het 143 300 0.47666667
mixture01 mixture_1 144 300 0.48000000
mixture01 shared 107 300 0.35666667
mixture01 singleton 342 300 1.14000000
shared high_het 158 300 0.52666667
shared identity 158 300 0.52666667
shared low_het 159 300 0.53000000
shared mid_het 158 300 0.52666667
shared mixture_1 159 300 0.53000000
shared shared 158 300 0.52666667
shared singleton 382 300 1.27333333
singleton high_het 94 300 0.31333333
singleton identity 93 300 0.31000000
singleton low_het 93 300 0.31000000
singleton mid_het 93 300 0.31000000
singleton mixture_1 85 300 0.28333333
singleton shared 8 300 0.02666667
singleton singleton 87 300 0.29000000

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago