Multivariate Bayesian variable selection regression

M&M ASH benchmark Part III

This is a continuation of Part II: instead of looking at 1 causal SNP with PVE = 0.05, I look at 2 causal SNPs with total PVE set to 0.15.

Conclusion

Power was expected to increase from singleton to identity to shared, but the observed trend is completely reversed: singleton has the highest power and shared the lowest!

./finemap.dsc --target sanity_check -o sanity_check3 -c 39
In [10]:
%cd ~/GIT/github/mnm-twas/dsc
/home/gaow/GIT/github/mnm-twas/dsc
In [11]:
library('dscrutils')
out = dscquery('sanity_check3', "hundred_data.dataset sharing_pattern.n_signal susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top", groups="sharing_pattern: singleton, identity, shared")
Loading dsc-query output from CSV file.
Reading DSC outputs:
 - sharing_pattern.n_signal: extracted atomic values
 - susie_scores.total: extracted atomic values
 - susie_scores.valid: extracted atomic values
 - susie_scores.size: extracted atomic values
 - susie_scores.purity: extracted atomic values
 - susie_scores.top: extracted atomic values
In [12]:
head(out)
DSC  hundred_data.dataset                                 sharing_pattern  sharing_pattern.n_signal  susie_scores.total  susie_scores.valid  susie_scores.size  susie_scores.purity  susie_scores.top
1    ~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds      singleton        2                         2                   2                   7.5                0.997902685565869    1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000031823.RDS  singleton        2                         1                   1                   1                  1                    1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000062194.RDS  singleton        2                         2                   2                   8.5                0.980400377080065    1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000073150.RDS  singleton        2                         1                   1                   7                  0.984587614689054    0
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000078319.RDS  singleton        2                         1                   1                   2                  0.999964909747404    0
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000081277.RDS  singleton        2                         1                   1                   25                 0.966136167252649    0
In [13]:
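# Columns 4-9 hold the scores; coerce them to numeric
out[,c(4,5,6,7,8,9)] = as.numeric(as.matrix(out[,c(4,5,6,7,8,9)]))
# Keep the sharing pattern (column 3) plus the numeric scores, under shorter names
res = out[,c(3,4,5,6,7,8,9)]
colnames(res) = c('pattern', 'total_true', 'total', 'valid', 'size', 'purity', 'top_hit')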
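
A column-by-column conversion is a more explicit way to do the same coercion, and it makes it easy to check that nothing turned into NA. The following is only a sketch on a made-up data frame (`toy` and its values are hypothetical, not part of the DSC output):

```r
# Toy stand-in for `out`; column names and values are illustrative only.
toy = data.frame(
  pattern = c('singleton', 'shared'),
  total   = c('2', '1'),
  purity  = c('0.99', '0.97'),
  stringsAsFactors = FALSE
)

num_cols = c('total', 'purity')
# Convert each column separately so values cannot shift between columns.
toy[num_cols] = lapply(toy[num_cols], as.numeric)

# Stop early if any entry failed to parse as a number.
stopifnot(!any(is.na(toy[num_cols])))
str(toy)
```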

Purity of CS

In [14]:
aggregate(purity~pattern, res, mean)
pattern    purity
identity   0.9652615
shared     0.9714742
singleton  0.9612232

Size of CS

In [15]:
aggregate(size~pattern, res, median)
pattern    size
identity   7.25
shared     4.75
singleton  7.00

Power

In [16]:
valid = aggregate(valid ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
power = merge(valid, total_true, by = "pattern")
power$power = power$valid/power$total_true
power
pattern    valid  total_true  power
identity   148    200         0.740
shared     126    200         0.630
singleton  157    200         0.785
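
The power calculation above sums `valid` and `total_true` per pattern and takes their ratio. As a minimal sketch, the two sums can also be obtained in one `aggregate` call by putting `cbind(...)` on the left-hand side of the formula. The data frame `toy` below is made up for illustration and is not the benchmark output:

```r
# Toy stand-in for `res`; values are made up for illustration only.
toy = data.frame(
  pattern    = c('singleton', 'singleton', 'shared', 'shared'),
  total_true = c(2, 2, 2, 2),
  valid      = c(2, 1, 1, 1)
)

# Sum both columns per pattern in a single call, then take the ratio.
sums = aggregate(cbind(valid, total_true) ~ pattern, toy, sum)
sums$power = sums$valid / sums$total_true
sums
```

This avoids the intermediate `merge`, at the cost of a slightly denser formula.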

FDR

In [17]:
valid = aggregate(valid ~ pattern, res, sum)
total = aggregate(total ~ pattern, res, sum)
fdr = merge(valid, total, by = "pattern")
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
pattern    valid  total  fdr
identity   148    157    0.05732484
shared     126    135    0.06666667
singleton  157    159    0.01257862
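
Written out, the `fdr` column is the fraction of reported CS that are not valid. This reading assumes that `total` counts the CS reported and `valid` counts those containing a causal SNP, which is inferred from the column names rather than stated in the notebook:

$$
\mathrm{FDR} = \frac{\text{total} - \text{valid}}{\text{total}},
\qquad \text{e.g. identity: } \frac{157 - 148}{157} = \frac{9}{157} \approx 0.0573
$$

The same arithmetic gives 9/135 ≈ 0.0667 for shared and 2/159 ≈ 0.0126 for singleton, matching the table.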

Top-hit rate (how often the strongest SNP is causal)

In [18]:
top_hit = aggregate(top_hit ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
top_rate = merge(top_hit, total_true, by = "pattern")
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
pattern    top_hit  total_true  top_rate
identity   79       200         0.395
shared     75       200         0.375
singleton  77       200         0.385

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago