Multivariate Bayesian variable selection regression

M&M ASH benchmark V

This is a continuation of Part V where I set total PVE is set to 0.15 and assume 2 causal variables per region. But here, the two SNPs have the same effects sampled from the multivariate distribution. Also I use $R = 5$ conditions and run it on $J=1000$ and 150 genes.

The most important difference from previous simulations is that here I mix-match simulated data under different prior assumptions to analyzing them with different priors. I expect to observe that:

  1. The "oracle" prior is always better than using other priors, for all scenarios.
  2. Mixture prior generally performs well in all scenarios -- it is robust to simulation assumptions.

Conclusion

...

The benchmark was executd on UChicago midway

./finemap.dsc --host mnm_R5.yml --R 5

This executes the default pipeline in finemap.dsc file, as of today (2019.02.04).

In [1]:
%cd ~/GIT/github/mnm-twas/dsc
/scratch/midway2/gaow/GIT/github/mnm-twas/dsc
In [ ]:
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.included_causal susie_scores.overlap", omit.file.columns = T, verbose = F)
end_time <- Sys.time()
In [ ]:
saveRDS(out, 'finemap_output/benchmark_v.rds')
In [4]:
# out = readRDS('finemap_output/benchmark_v.rds')
In [3]:
end_time - start_time
Time difference of 2.014167 mins
In [5]:
head(out)
DSCsharing_patternmnmmnm.eff_modesusie_scores.totalsusie_scores.validsusie_scores.sizesusie_scores.puritysusie_scores.topsusie_scores.n_causalsusie_scores.included_causalsusie_scores.overlap
1 identity mnm_identityidentity 1 1 1.0 1.0000000 1 1 1 0
1 identity mnm_identityidentity 2 2 3.5 0.9813743 1 2 2 0
1 identity mnm_identityidentity 1 1 2.0 1.0000000 1 1 1 0
1 identity mnm_identityidentity 1 1 6.0 1.0000000 0 1 1 0
1 identity mnm_identityidentity 1 1 17.0 0.9615369 0 1 1 0
1 identity mnm_identityidentity 3 2 3.0 0.9806151 1 2 2 0
In [6]:
res = out[,c(2,4,5,6,7,8,9,10,11,12)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap')

Purity of CS

In [7]:
aggregate(purity~pattern + method, res, mean)
patternmethodpurity
high_het high_het 0.9861015
identity high_het 0.9872195
low_het high_het 0.9836635
mid_het high_het 0.9868067
mixture01high_het 0.9534270
shared high_het 0.9825523
singletonhigh_het 0.8427970
high_het identity 0.9847402
identity identity 0.9870663
low_het identity 0.9827262
mid_het identity 0.9870603
mixture01identity 0.9498540
shared identity 0.9835191
singletonidentity 0.8427763
high_het low_het 0.9854991
identity low_het 0.9866075
low_het low_het 0.9833000
mid_het low_het 0.9859065
mixture01low_het 0.9474465
shared low_het 0.9844297
singletonlow_het 0.8326556
high_het mid_het 0.9864478
identity mid_het 0.9868500
low_het mid_het 0.9840402
mid_het mid_het 0.9864577
mixture01mid_het 0.9526083
shared mid_het 0.9823964
singletonmid_het 0.8327097
high_het mixture_10.9833868
identity mixture_10.9875281
low_het mixture_10.9854068
mid_het mixture_10.9862468
mixture01mixture_10.9550036
shared mixture_10.9848500
singletonmixture_10.8470917
high_het shared 0.8397760
identity shared 0.8470565
low_het shared 0.9676177
mid_het shared 0.9261310
mixture01shared 0.8227725
shared shared 0.9839748
singletonshared 0.4069221
high_het singleton0.8932618
identity singleton0.8987961
low_het singleton0.8890945
mid_het singleton0.8896349
mixture01singleton0.8861702
shared singleton0.8907897
singletonsingleton0.8560818

Size of CS

In [8]:
aggregate(size~pattern+method, res, median)
patternmethodsize
high_het high_het 3.50
identity high_het 3.00
low_het high_het 4.00
mid_het high_het 3.00
mixture01high_het 4.50
shared high_het 4.00
singletonhigh_het 9.00
high_het identity 3.50
identity identity 3.00
low_het identity 4.00
mid_het identity 3.00
mixture01identity 4.25
shared identity 4.00
singletonidentity 8.50
high_het low_het 3.50
identity low_het 3.00
low_het low_het 4.00
mid_het low_het 3.00
mixture01low_het 4.25
shared low_het 4.00
singletonlow_het 9.00
high_het mid_het 3.25
identity mid_het 3.25
low_het mid_het 4.00
mid_het mid_het 3.00
mixture01mid_het 4.75
shared mid_het 4.00
singletonmid_het 9.00
high_het mixture_1 3.50
identity mixture_1 3.00
low_het mixture_1 4.00
mid_het mixture_1 3.00
mixture01mixture_1 4.25
shared mixture_1 4.00
singletonmixture_1 9.00
high_het shared 4.75
identity shared 4.00
low_het shared 5.00
mid_het shared 4.00
mixture01shared 3.25
shared shared 4.00
singletonshared 0.00
high_het singleton11.00
identity singleton 9.25
low_het singleton12.50
mid_het singleton 9.50
mixture01singleton12.25
shared singleton11.00
singletonsingleton 9.00

Power

In [15]:
total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
power
patternmethodtotal_true_includedtotal_trueoverlappower
high_het high_het 249 272 1.46000000.9154412
high_het identity 250 272 1.49333330.9191176
high_het low_het 248 272 0.00000000.9117647
high_het mid_het 249 272 1.47333330.9154412
high_het mixture_1 241 272 0.00000000.8860294
high_het shared 206 272 9.40000000.7573529
high_het singleton 235 272 209.53333330.8639706
identity high_het 228 244 0.00000000.9344262
identity identity 228 244 0.00000000.9344262
identity low_het 226 244 0.00000000.9262295
identity mid_het 227 244 0.00000000.9303279
identity mixture_1 224 244 0.00000000.9180328
identity shared 182 244 3.78666670.7459016
identity singleton 217 244 133.88000000.8893443
low_het high_het 238 267 0.00000000.8913858
low_het identity 240 267 0.00000000.8988764
low_het low_het 242 267 0.00000000.9063670
low_het mid_het 238 267 0.00000000.8913858
low_het mixture_1 234 267 0.00000000.8764045
low_het shared 238 267 1.58000000.8913858
low_het singleton 231 267 131.75333330.8651685
mid_het high_het 225 253 0.00000000.8893281
mid_het identity 227 253 0.00000000.8972332
mid_het low_het 225 253 0.00000000.8893281
mid_het mid_het 225 253 0.00000000.8893281
mid_het mixture_1 225 253 0.00000000.8893281
mid_het shared 207 253 1.64666670.8181818
mid_het singleton 205 253 152.22666670.8102767
mixture01 high_het 216 257 0.00000000.8404669
mixture01 identity 216 257 0.00000000.8404669
mixture01 low_het 212 257 0.00000000.8249027
mixture01 mid_het 216 257 0.00000000.8404669
mixture01 mixture_1 219 257 0.00000000.8521401
mixture01 shared 189 257 8.75333330.7354086
mixture01 singleton 209 257 166.04666670.8132296
shared high_het 248 270 0.34000000.9185185
shared identity 245 270 0.00000000.9074074
shared low_het 251 270 4.77333330.9296296
shared mid_het 251 270 4.80666670.9296296
shared mixture_1 248 270 0.34666670.9185185
shared shared 254 270 4.74000000.9407407
shared singleton 227 270 212.40666670.8407407
singleton high_het 166 244 0.00000000.6803279
singleton identity 167 244 0.00000000.6844262
singleton low_het 158 244 0.00000000.6475410
singleton mid_het 162 244 0.00000000.6639344
singleton mixture_1 159 244 0.00000000.6516393
singleton shared 71 244 0.00000000.2909836
singleton singleton 162 244 0.00000000.6639344

FDR

In [16]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
patternmethodvalidtotalfdr
high_het high_het 250 270 0.07407407
high_het identity 251 271 0.07380074
high_het low_het 248 265 0.06415094
high_het mid_het 250 268 0.06716418
high_het mixture_1 240 258 0.06976744
high_het shared 220 227 0.03083700
high_het singleton 616 668 0.07784431
identity high_het 227 242 0.06198347
identity identity 227 242 0.06198347
identity low_het 225 239 0.05857741
identity mid_het 226 240 0.05833333
identity mixture_1 223 231 0.03463203
identity shared 202 208 0.02884615
identity singleton 574 604 0.04966887
low_het high_het 238 268 0.11194030
low_het identity 239 269 0.11152416
low_het low_het 242 266 0.09022556
low_het mid_het 238 267 0.10861423
low_het mixture_1 234 254 0.07874016
low_het shared 247 264 0.06439394
low_het singleton 607 666 0.08858859
mid_het high_het 223 246 0.09349593
mid_het identity 225 245 0.08163265
mid_het low_het 223 245 0.08979592
mid_het mid_het 223 246 0.09349593
mid_het mixture_1 223 238 0.06302521
mid_het shared 226 234 0.03418803
mid_het singleton 572 611 0.06382979
mixture01 high_het 212 235 0.09787234
mixture01 identity 212 234 0.09401709
mixture01 low_het 209 233 0.10300429
mixture01 mid_het 212 235 0.09787234
mixture01 mixture_1 215 230 0.06521739
mixture01 shared 196 211 0.07109005
mixture01 singleton 511 564 0.09397163
shared high_het 247 270 0.08518519
shared identity 243 269 0.09665428
shared low_het 251 268 0.06343284
shared mid_het 251 272 0.07720588
shared mixture_1 247 264 0.06439394
shared shared 254 269 0.05576208
shared singleton 622 669 0.07025411
singleton high_het 164 183 0.10382514
singleton identity 165 184 0.10326087
singleton low_het 156 170 0.08235294
singleton mid_het 160 176 0.09090909
singleton mixture_1 157 167 0.05988024
singleton shared 70 72 0.02777778
singleton singleton 160 170 0.05882353

Top-hit rate (how often the strongest SNP is causal)

In [17]:
top_hit = aggregate(top_hit ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
top_rate = merge(top_hit, total_true, by = c("pattern", "method"))
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
patternmethodtop_hittotal_truetop_rate
high_het high_het 139 272 0.51102941
high_het identity 138 272 0.50735294
high_het low_het 138 272 0.50735294
high_het mid_het 140 272 0.51470588
high_het mixture_1 129 272 0.47426471
high_het shared 112 272 0.41176471
high_het singleton 262 272 0.96323529
identity high_het 135 244 0.55327869
identity identity 135 244 0.55327869
identity low_het 135 244 0.55327869
identity mid_het 135 244 0.55327869
identity mixture_1 133 244 0.54508197
identity shared 105 244 0.43032787
identity singleton 260 244 1.06557377
low_het high_het 135 267 0.50561798
low_het identity 134 267 0.50187266
low_het low_het 133 267 0.49812734
low_het mid_het 135 267 0.50561798
low_het mixture_1 128 267 0.47940075
low_het shared 129 267 0.48314607
low_het singleton 248 267 0.92883895
mid_het high_het 138 253 0.54545455
mid_het identity 139 253 0.54940711
mid_het low_het 137 253 0.54150198
mid_het mid_het 137 253 0.54150198
mid_het mixture_1 139 253 0.54940711
mid_het shared 131 253 0.51778656
mid_het singleton 284 253 1.12252964
mixture01 high_het 115 257 0.44747082
mixture01 identity 115 257 0.44747082
mixture01 low_het 114 257 0.44357977
mixture01 mid_het 115 257 0.44747082
mixture01 mixture_1 117 257 0.45525292
mixture01 shared 107 257 0.41634241
mixture01 singleton 221 257 0.85992218
shared high_het 137 270 0.50740741
shared identity 134 270 0.49629630
shared low_het 138 270 0.51111111
shared mid_het 138 270 0.51111111
shared mixture_1 134 270 0.49629630
shared shared 136 270 0.50370370
shared singleton 257 270 0.95185185
singleton high_het 61 244 0.25000000
singleton identity 63 244 0.25819672
singleton low_het 55 244 0.22540984
singleton mid_het 58 244 0.23770492
singleton mixture_1 61 244 0.25000000
singleton shared 20 244 0.08196721
singleton singleton 64 244 0.26229508

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago