Multivariate Bayesian variable selection regression

M&M ASH benchmark Part I

Moving on to multivariate analysis we start with some performance benchmarks.

  • A previously established benchmark here roughly implemented the framework of multivariate simulations, but it was incomplete and never used. The benchmark has now been moved here to facilitate potential collaboration using M&M for prediction; development of the DSC benchmark in this repo is therefore discontinued.
  • This website will continue to post benchmark results.
  • Since the applications are in the context of GTEx, we will use this repository for the data application, whether eQTL or sQTL.

Aim, data, methods and experimental design

The relevant materials are in a document on Overleaf shared with project collaborators and are being actively developed. I will not recap them in this note; however, the narrative below follows the structure of that document.

Patterns of sharing evaluated

It is nonetheless useful to recap the patterns of sharing we evaluate in the benchmark. They include:

  1. Condition-specific effects
  2. Low, moderate and high correlations across conditions
    • including a case of 100% correlated effects
  3. Mixture settings:
    • dict(identity=0.1, equal_effects=0.2, singleton=0.2, simple_het_1=0.1, simple_het_2=0.1, simple_het_3=0.1, null=0.2)
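The patterns above correspond to prior covariance matrices $U$ on effects across the $R$ conditions. Below is a minimal sketch of what these matrices look like for $R=5$; the function name `het` and the variable names are illustrative, not the benchmark's actual DSC module code.

```python
import numpy as np

R = 5  # number of conditions

# Identity: effects independent across conditions
identity = np.eye(R)

# Equal effects: the same effect in every condition (100% correlated)
equal_effects = np.ones((R, R))

# Singleton: effect present in one condition only (condition-specific)
singleton = np.zeros((R, R))
singleton[0, 0] = 1.0

def het(rho, R=5):
    """Heterogeneous sharing: unit variances with constant
    pairwise correlation rho (low / moderate / high)."""
    U = np.full((R, R), rho)
    np.fill_diagonal(U, 1.0)
    return U

# Mixture weights matching the dict(...) above; 'null' puts mass on no effect
weights = {"identity": 0.1, "equal_effects": 0.2, "singleton": 0.2,
           "simple_het_1": 0.1, "simple_het_2": 0.1, "simple_het_3": 0.1,
           "null": 0.2}
assert abs(sum(weights.values()) - 1.0) < 1e-9
```

Under the mixture setting, each simulated effect draws its covariance from this set with the given weights.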

Benchmarks evaluated

An initial benchmark as a first pass

As a first pass we use $R=5$ conditions on 100 toy data-sets. We evaluate M&M credible sets (CS) for experiments under the several patterns of sharing documented in the section above. To fit M&M this time we simply input the underlying priors $U$ (and their weights, in the case of mixture simulations) and the residual covariance $V$.

./finemap.dsc --target first_pass

This benchmark takes roughly 10 minutes to complete on my 40-core desktop server.

In [1]:
%cd ~/GIT/github/mnm-twas/dsc
/home/gaow/GIT/github/mnm-twas/dsc
In [10]:
library('dscrutils')
out = dscquery('finemap_output', "hundred_data.dataset sharing_pattern.n_signal susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top")
Loading dsc-query output from CSV file.
Reading DSC outputs:
 - sharing_pattern.n_signal: extracted atomic values
 - susie_scores.total: extracted atomic values
 - susie_scores.valid: extracted atomic values
 - susie_scores.size: extracted atomic values
 - susie_scores.purity: extracted atomic values
 - susie_scores.top: extracted atomic values
In [11]:
head(out)
DSC  hundred_data.dataset                                 sharing_pattern  n_signal  total  valid  size  purity     top
1    ~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds      identity         2         2      2      8     0.9921241  0
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000031823.RDS  identity         1         1      1      1     1.0000000  1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000062194.RDS  identity         1         1      1      1     1.0000000  1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000073150.RDS  identity         1         1      1      1     1.0000000  1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000078319.RDS  identity         1         1      1      1     1.0000000  1
1    ~/Documents/GTExV8/Toys/Thyroid.ENSG00000081277.RDS  identity         1         1      1      2     0.9953505  1
In [12]:
res = out[,c(3,4,5,6,7,8,9)]
colnames(res) = c('pattern', 'total_true', 'total', 'valid', 'size', 'purity', 'top_hit')

Purity of CS

In [13]:
aggregate(purity~pattern, res, mean)
pattern    purity
high_het   0.9950535
identity   0.9952877
low_het    0.9939219
mid_het    0.9873239
mixture01  0.8644837
shared     0.9954203
singleton  0.9744209

Size of CS

In [7]:
aggregate(size~pattern, res, median)
pattern    size
high_het   2.00
identity   1.25
low_het    2.00
mid_het    2.00
mixture01  2.00
shared     2.00
singleton  4.25

Power

In [17]:
valid = aggregate(valid ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
power = merge(valid, total_true, by = "pattern")
power$power = power$valid/power$total_true
power
pattern    valid  total_true  power
high_het   120    123         0.9756098
identity   117    122         0.9590164
low_het    115    123         0.9349593
mid_het    125    130         0.9615385
mixture01  95     135         0.7037037
shared     118    136         0.8676471
singleton  126    130         0.9692308

FDR

In [18]:
valid = aggregate(valid ~ pattern, res, sum)
total = aggregate(total ~ pattern, res, sum)
fdr = merge(valid, total, by = "pattern")
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
pattern    valid  total  fdr
high_het   120    123    0.024390244
identity   117    126    0.071428571
low_het    115    121    0.049586777
mid_het    125    128    0.023437500
mixture01  95     98     0.030612245
shared     118    119    0.008403361
singleton  126    127    0.007874016

Top-hit rate (how often the strongest SNP is causal)

In [19]:
top_hit = aggregate(top_hit ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
top_rate = merge(top_hit, total_true, by = "pattern")
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
pattern    top_hit  total_true  top_rate
high_het   73       123         0.5934959
identity   77       122         0.6311475
low_het    77       123         0.6260163
mid_het    92       130         0.7076923
mixture01  63       135         0.4666667
shared     79       136         0.5808824
singleton  69       130         0.5307692

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago