Moving on to multivariate analysis, we start with some performance benchmarks.
The relevant material is in an Overleaf document shared with project collaborators and is being actively developed; I will not recap it in this note. The narrative below, however, follows the structure of that document.
It is, however, useful to recap the patterns of sharing we evaluate in the benchmark, along with their mixture weights:
dict(identity=0.1, equal_effects=0.2, singleton=0.2, simple_het_1=0.1, simple_het_2=0.1, simple_het_3=0.1, null=0.2)
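These weights define a mixture distribution over sharing patterns. A minimal Python sketch of assigning patterns to the toy datasets (the `draw_patterns` helper and the seed are illustrative, not part of the benchmark code):

```python
import random

# Mixture weights over sharing patterns (copied from the configuration above)
weights = dict(identity=0.1, equal_effects=0.2, singleton=0.2,
               simple_het_1=0.1, simple_het_2=0.1, simple_het_3=0.1,
               null=0.2)

# A proper mixture: weights sum to 1
assert abs(sum(weights.values()) - 1.0) < 1e-9

def draw_patterns(n, weights, seed=0):
    """Assign each of n simulated datasets a sharing pattern,
    drawn with probability proportional to its mixture weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)

patterns = draw_patterns(100, weights)
```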
As a first pass we use $R=5$ conditions on 100 toy datasets. We put together evaluations of M&M CSs for experiments under the several patterns of sharing documented in the section above. To fit M&M, this time we simply input the true underlying priors $U$ (and their weights, in the context of mixture simulation) and the residual covariance $V$.
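As a reminder of what the prior components $U$ look like for $R=5$, here is a sketch following the usual mashr-style canonical covariance conventions. Treat the details as assumptions: the `simple_het` components are sketched as unit-variance matrices with constant cross-condition correlation (0.25, 0.5, 0.75), and `singleton` is shown for the first condition only; the exact definitions used in the benchmark are in the Overleaf document.

```python
import numpy as np

R = 5  # number of conditions

def canonical_priors(R):
    """Canonical prior covariance components (mashr-style conventions;
    exact benchmark definitions live in the Overleaf document)."""
    U = {
        "identity": np.eye(R),                       # independent effects across conditions
        "equal_effects": np.ones((R, R)),            # identical effect in all conditions
        "singleton": np.diag([1.] + [0.] * (R - 1)), # effect in the first condition only
        "null": np.zeros((R, R)),                    # no effect
    }
    # Heterogeneous sharing: unit variance, constant correlation rho
    for i, rho in enumerate([0.25, 0.5, 0.75], start=1):
        U[f"simple_het_{i}"] = np.full((R, R), rho) + (1 - rho) * np.eye(R)
    return U

U = canonical_priors(R)
```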
./finemap.dsc --target first_pass
This benchmark takes roughly 10 minutes to complete on my 40-core desktop server.
%cd ~/GIT/github/mnm-twas/dsc
library('dscrutils')
# Query benchmark results: one row per (dataset, sharing pattern) combination
out = dscquery('finemap_output', "hundred_data.dataset sharing_pattern.n_signal susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top")
head(out)
# Keep the pattern and score columns, and give them readable names
res = out[, 3:9]
colnames(res) = c('pattern', 'total_true', 'total', 'valid', 'size', 'purity', 'top_hit')
# Mean CS purity and median CS size, per pattern
aggregate(purity~pattern, res, mean)
aggregate(size~pattern, res, median)
# Power: valid CSs recovered, out of all true signals
valid = aggregate(valid ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
power = merge(valid, total_true, by = "pattern")
power$power = power$valid/power$total_true
power
# FDR: fraction of reported CSs that do not contain a true signal
valid = aggregate(valid ~ pattern, res, sum)
total = aggregate(total ~ pattern, res, sum)
fdr = merge(valid, total, by = "pattern")
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
# Top-hit rate: CSs whose top variable is the true signal, out of all true signals
top_hit = aggregate(top_hit ~ pattern, res, sum)
total_true = aggregate(total_true ~ pattern, res, sum)
top_rate = merge(top_hit, total_true, by = "pattern")
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
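For readers more comfortable outside R, the three per-pattern summaries above can be computed in a single pass. A minimal pure-Python sketch; the rows here are made-up toy values standing in for `res`, not benchmark output:

```python
from collections import defaultdict
from statistics import mean, median

# Toy stand-in for `res`; real values come from the dscquery call above
rows = [
    {"pattern": "identity", "total_true": 2, "total": 2, "valid": 2, "size": 5,  "purity": 0.9, "top_hit": 1},
    {"pattern": "identity", "total_true": 2, "total": 1, "valid": 1, "size": 8,  "purity": 0.8, "top_hit": 1},
    {"pattern": "null",     "total_true": 0, "total": 1, "valid": 0, "size": 20, "purity": 0.4, "top_hit": 0},
]

by_pattern = defaultdict(list)
for r in rows:
    by_pattern[r["pattern"]].append(r)

summary = {}
for pattern, grp in by_pattern.items():
    valid = sum(r["valid"] for r in grp)
    total = sum(r["total"] for r in grp)
    total_true = sum(r["total_true"] for r in grp)
    top_hit = sum(r["top_hit"] for r in grp)
    summary[pattern] = {
        "purity": mean(r["purity"] for r in grp),
        "size": median(r["size"] for r in grp),
        # Same definitions as the R code: power = valid/true signals,
        # FDR = invalid/reported CSs, top_rate = top hits/true signals
        "power": valid / total_true if total_true else float("nan"),
        "fdr": (total - valid) / total if total else float("nan"),
        "top_rate": top_hit / total_true if total_true else float("nan"),
    }
```

Guarding the divisions matters here: the `null` pattern contributes no true signals, so its power and top-hit rate are undefined rather than zero.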