Multivariate Bayesian variable selection regression

M&M ASH benchmark Part II

This is a continuation of Part I. Here I use only $R=2$ conditions and one causal SNP with PVE = 0.05, under three simple sharing patterns: singleton, identity and fully shared. The goal is to ensure all computations are correct.

Conclusion

The results below seem to make sense, but two observations still stand out:

  1. The power gain of shared over identity is minimal (0.98 vs. 0.97)
  2. top_hit_rate is lower for shared than for identity (0.55 vs. 0.63), which is a bit puzzling
./finemap.dsc --target sanity_check -o sanity_check
In [43]:
%cd ~/GIT/github/mnm-twas/dsc
/home/gaow/GIT/github/mnm-twas/dsc
In [34]:
library('dscrutils')
out = dscquery('sanity_check', "hundred_data.dataset sharing_pattern.n_signal susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top", groups="sharing_pattern: singleton, identity, shared")
Loading dsc-query output from CSV file.
Reading DSC outputs:
 - sharing_pattern.n_signal: extracted atomic values
 - susie_scores.total: extracted atomic values
 - susie_scores.valid: extracted atomic values
 - susie_scores.size: extracted atomic values
 - susie_scores.purity: extracted atomic values
 - susie_scores.top: extracted atomic values
In [35]:
head(out)
DSC  hundred_data.dataset                                 sharing_pattern  sharing_pattern.n_signal  susie_scores.total  susie_scores.valid  susie_scores.size  susie_scores.purity  susie_scores.top
1    ~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds      singleton        1                         1                   1                   15                 0.931293607944096    0
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000031823.RDS  singleton        1                         1                   1                   10                 0.916386609486197    1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000062194.RDS  singleton        1                         1                   1                   8                  0.89922481268286     1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000073150.RDS  singleton        1                         1                   1                   17                 0.965612490834539    1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000078319.RDS  singleton        1                         1                   1                   110                0.797004256892404    0
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000081277.RDS  singleton        1                         1                   1                   13                 0.810629508314749    0
In [36]:
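# Columns 4-9 hold the scores (n_signal, total, valid, size, purity, top); coerce them to numeric
out[,c(4,5,6,7,8,9)] = as.numeric(as.matrix(out[,c(4,5,6,7,8,9)]))
# Keep the sharing pattern (column 3) plus the six score columns, under shorter names
res = out[,c(3,4,5,6,7,8,9)]
colnames(res) = c('pattern', 'total_true', 'total', 'valid', 'size', 'purity', 'top_hit')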
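
The conversion above selects columns by position, so it depends on the column order of the query result. A minimal sketch of a name-based variant is given here on a toy data frame. `out_toy`, `score_cols` and `res_toy` are hypothetical names, the two rows are copied from the `head(out)` listing, and storing the scores as strings is an assumption about what the query returns, not something the notebook confirms.

```r
# Toy stand-in for the dscquery() result: two rows copied from head(out),
# with score columns stored as strings (an assumption for illustration).
out_toy = data.frame(
  DSC = c("1", "1"),
  hundred_data.dataset = c("~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds",
                           "~/Documents/GTExV8/Toys/Thyroid.ENSG00000031823.RDS"),
  sharing_pattern = c("singleton", "singleton"),
  sharing_pattern.n_signal = c("1", "1"),
  susie_scores.total = c("1", "1"),
  susie_scores.valid = c("1", "1"),
  susie_scores.size = c("15", "10"),
  susie_scores.purity = c("0.931293607944096", "0.916386609486197"),
  susie_scores.top = c("0", "1"),
  stringsAsFactors = FALSE
)

# Select the score columns by name instead of by position
score_cols = c("sharing_pattern.n_signal", "susie_scores.total", "susie_scores.valid",
               "susie_scores.size", "susie_scores.purity", "susie_scores.top")

# Convert each score column to numeric, one column at a time
out_toy[score_cols] = lapply(out_toy[score_cols], as.numeric)

# Keep the pattern plus the scores, using the same short names as in the notebook
res_toy = out_toy[c("sharing_pattern", score_cols)]
colnames(res_toy) = c('pattern', 'total_true', 'total', 'valid', 'size', 'purity', 'top_hit')
str(res_toy)
```

Selecting by name fails loudly if a queried column is missing, whereas positional indices would silently pick up the wrong column.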

Purity of CS

In [38]:
aggregate(purity~pattern, res, mean)
pattern    purity
identity   0.9401603
shared     0.9442816
singleton  0.7290132

Size of CS

In [39]:
aggregate(size~pattern, res, median)
pattern    size
identity   4
shared     5
singleton  7

Power

In [40]:
valid = aggregate(valid ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
power = merge(valid, total_true, by = "pattern")
power$power = power$valid/power$total_true
power
pattern    valid  total_true  power
identity   97     100         0.97
shared     98     100         0.98
singleton  77     100         0.77

FDR

In [41]:
valid = aggregate(valid ~ pattern, res, sum)
total = aggregate(total ~ pattern, res, sum)
fdr = merge(valid, total, by = "pattern")
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
pattern    valid  total  fdr
identity   97     98     0.01020408
shared     98     101    0.02970297
singleton  77     81     0.04938272
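
Power and FDR are both simple ratios of the same per-pattern counts, so the arithmetic can be checked in one place. The sketch below recomputes both from the counts reported in the power and FDR tables of this notebook. `counts` and `summary_tab` are hypothetical names, and the comments on what each count means are my reading of the column names rather than definitions given here.

```r
# Per-pattern counts copied from the power and FDR tables in this notebook
counts = data.frame(
  pattern    = c("identity", "shared", "singleton"),
  valid      = c(97, 98, 77),     # presumably: reported CS that capture a true signal
  total      = c(98, 101, 81),    # presumably: all reported CS
  total_true = c(100, 100, 100)   # simulated signals (one per dataset)
)

# power = valid / total_true;  fdr = (total - valid) / total
summary_tab = transform(counts,
  power = valid / total_true,
  fdr   = (total - valid) / total
)
print(summary_tab)
```

Keeping the counts and both ratios in a single table makes it easier to see that the two metrics share the numerator `valid` but use different denominators.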

Top-hit rate (how often the strongest SNP is causal)

In [42]:
top_hit = aggregate(top_hit ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
top_rate = merge(top_hit, total_true, by = "pattern")
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
pattern    top_hit  total_true  top_rate
identity   63       100         0.63
shared     55       100         0.55
singleton  30       100         0.30
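
With only 100 signals per pattern, it may be worth checking whether the gap in top-hit rate between identity and shared exceeds sampling noise. The sketch below runs a two-sample proportion test with base R's `prop.test` on the counts from the table above. `top_hits`, `n_signals` and `test` are hypothetical names. Treating the two patterns as independent samples is an assumption: if the same datasets underlie both patterns, a paired comparison (for example `mcnemar.test` on the per-dataset `top_hit` values) would be more appropriate.

```r
# Top-hit counts for identity and shared, copied from the table above
top_hits  = c(identity = 63, shared = 55)
n_signals = c(identity = 100, shared = 100)

# Two-sample test of equal proportions (continuity correction is on by default).
# Assumes the two groups are independent samples, which may not hold here.
test = prop.test(x = top_hits, n = n_signals)

test$estimate   # the two observed top-hit rates
test$p.value    # p-value for the null hypothesis of equal rates
test$conf.int   # confidence interval for the difference in rates
```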

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago