This is a continuation of Part V, where total PVE is set to 0.15 and I assume 2 causal variables per region. Here, however, the two SNPs have the same effects, sampled from the multivariate distribution. I also use $R = 5$ conditions and run the benchmark with $J=1000$ and 150 genes.
The most important difference from previous simulations is that here I mix and match: data simulated under different prior assumptions are analyzed with different priors. I expect to observe that:
...
The benchmark was executed on UChicago Midway:
./finemap.dsc --host mnm_R5.yml --R 5
This executes the default pipeline in the finemap.dsc file, as of today (2019.02.04).
%cd ~/GIT/github/mnm-twas/dsc
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.included_causal susie_scores.overlap", omit.file.columns = TRUE, verbose = FALSE)
end_time <- Sys.time()
saveRDS(out, 'finemap_output/benchmark_v.rds')
# out = readRDS('finemap_output/benchmark_v.rds')
end_time - start_time
head(out)
res = out[,c(2,4,5,6,7,8,9,10,11,12)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap')
aggregate(purity~pattern + method, res, mean)
aggregate(size~pattern+method, res, median)
total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap, by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
power
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr
top_hit = aggregate(top_hit ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
top_rate = merge(top_hit, total_true, by = c("pattern", "method"))
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate
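As a sanity check on the power, FDR and top-hit summaries above, the same group-and-divide pattern can be run on a small made-up table. This is only a sketch: the `toy` data frame and its values are invented for illustration and are not benchmark results, and the column meanings given in the comments are my assumptions based on how the columns are used above.

```r
# Toy input with the same column names as `res`.
# All values are hypothetical, NOT benchmark output.
toy = data.frame(
  pattern = c('a', 'a', 'b', 'b'),
  method  = c('m1', 'm1', 'm1', 'm1'),
  total   = c(2, 3, 1, 3),               # assumed: sets reported per run
  valid   = c(2, 2, 1, 2),               # assumed: reported sets that are correct
  top_hit = c(1, 2, 1, 1),
  total_true = c(2, 2, 2, 2),            # assumed: causal variables simulated
  total_true_included = c(2, 1, 1, 1)    # assumed: causal variables captured
)

# Sum all count columns per (pattern, method) in one call;
# cbind() on the left-hand side aggregates several columns at once.
sums = aggregate(cbind(total, valid, top_hit, total_true, total_true_included)
                 ~ pattern + method, toy, sum)

# Same ratios as in the cells above.
sums$power    = sums$total_true_included / sums$total_true
sums$fdr      = (sums$total - sums$valid) / sums$total
sums$top_rate = sums$top_hit / sums$total_true
sums
```

One caveat on this design: the formula interface of `aggregate` drops any row with an `NA` in one of the listed columns, so a single `cbind` call can discard more rows than the separate per-column calls used above when the real output contains missing values.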