Multivariate Bayesian variable selection regression

M&M ASH benchmark Part IV

This is a continuation of Part III where instead of looking at 1 causal SNP of PVE = 0.05 I look at a range of causal SNPs per gene with 50% having 1 causal, 30% two causal and 20% three causal. The total PVE is set to 0.15.

Conclusion

identity is still better than singleton but result for shared does not make sense.

./finemap.dsc --target sanity_check -o sanity_check2 -c 39
In [1]:
%cd ~/GIT/github/mnm-twas/dsc
/home/gaow/GIT/github/mnm-twas/dsc
In [2]:
library('dscrutils')
out = dscquery('sanity_check2', "hundred_data.dataset sharing_pattern.n_signal susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top", groups="sharing_pattern: singleton, identity, shared")
Loading dsc-query output from CSV file.
Reading DSC outputs:
 - sharing_pattern.n_signal: extracted atomic values
 - susie_scores.total: extracted atomic values
 - susie_scores.valid: extracted atomic values
 - susie_scores.size: extracted atomic values
 - susie_scores.purity: extracted atomic values
 - susie_scores.top: extracted atomic values
In [3]:
head(out)
DSChundred_data.datasetsharing_patternsharing_pattern.n_signalsusie_scores.totalsusie_scores.validsusie_scores.sizesusie_scores.puritysusie_scores.top
1 ~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds singleton 2 2 2 15.5 0.977735077359261 1
1 ~/Documents/GTExV8/Toys/Thyroid.ENSG00000031823.RDSsingleton 3 1 1 1 1 1
1 ~/Documents/GTExV8/Toys/Thyroid.ENSG00000062194.RDSsingleton 1 1 1 26 0.95151813430003 0
1 ~/Documents/GTExV8/Toys/Thyroid.ENSG00000073150.RDSsingleton 1 1 1 12 0.957423948447025 0
1 ~/Documents/GTExV8/Toys/Thyroid.ENSG00000078319.RDSsingleton 2 2 2 2 1 2
1 ~/Documents/GTExV8/Toys/Thyroid.ENSG00000081277.RDSsingleton 2 2 2 3.5 0.933894486138865 1
In [4]:
out[,c(4,5,6,7,8,9)] = as.numeric(as.matrix(out[,c(4,5,6,7,8,9)]))
res = out[,c(3,4,5,6,7,8,9)]
colnames(res) = c('pattern', 'total_true', 'total', 'valid', 'size', 'purity', 'top_hit')

Purity of CS

In [5]:
aggregate(purity~pattern, res, mean)
patternpurity
identity 0.9767568
shared 0.9812358
singleton0.9686577

Size of CS

In [6]:
aggregate(size~pattern, res, median)
patternsize
identity 5.00
shared 5.25
singleton7.00

Power

In [7]:
valid = aggregate(valid ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
power = merge(valid, total_true, by = "pattern")
power$power = power$valid/power$total_true
power
patternvalidtotal_truepower
identity 146 173 0.8439306
shared 115 163 0.7055215
singleton132 161 0.8198758

FDR

In [8]:
valid = aggregate(valid ~ pattern, res, sum)
total = aggregate(total ~ pattern, res, sum)
fdr = merge(valid, total, by = "pattern")
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
patternvalidtotalfdr
identity 146 152 0.03947368
shared 115 123 0.06504065
singleton 132 135 0.02222222

Top-hit rate (how often the strongest SNP is causal)

In [9]:
top_hit = aggregate(top_hit ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
top_rate = merge(top_hit, total_true, by = "pattern")
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
patterntop_hittotal_truetop_rate
identity 84 173 0.4855491
shared 55 163 0.3374233
singleton68 161 0.4223602
In [ ]:


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago