I look at various types of underlying mixture distribution of effect size, including those explored in Figure 2 of ash paper, with different proportions of $\pi_0 \in \{0.5, 0.9, 0.995, 0.999\}$, and different genotype structure

In addition to using the original genotype data we also created a genotype data without any LD -- we permute for each variant the sample genotypes to break any potential inter-variant correlations. Here is what we get for the LD structure, compared with the original:

One replicate multiple scenarios
sos run 20170624_Simulation_Procedures.ipynb --pi0 0.5 0.9 0.995 0.999 \
--shape spiky near_normal flat_top skew big_normal 'bimodal' \
--n_rep 1
sos run 20170624_Simulation_Procedures.ipynb --pi0 0.5 0.9 0.995 0.999 \
--shape spiky near_normal flat_top skew big_normal 'bimodal' \
--n_rep 1 --permute_genotype True
One scenario multiple replicates
sos run 20170624_Simulation_Procedures.ipynb --pi0 0.999 \
--shape near_normal \
--n_rep 20
sos run 20170624_Simulation_Procedures.ipynb --pi0 0.999 \
--shape near_normal \
--n_rep 20 --permute_genotype True
For simulated data we perform 3 analysis:
lm())mr-ash analysis ash analysis using results from 1Results are stored at: /project/compbio/GTEx_eQTL/MRASH_results/Simulation.
These are results with permuted in the file name.
mr-ash will generate a quite sparse solution with much over estimated $\hat{\sigma}$. In the case of 50% signal mr-ash estimates $\hat{\sigma} > 4000$ whereas the truth is 1. (see file 0p5_big_normal_1.expr.analyzed.pdf). There is a big difference between mr-ash and ash results, too. Effect size estimate of mr-ash is very problematic.0p5_near_normal_1.expr.analyzed.pdf and 0p5_spiky_1.expr.analyzed.pdf)ash and mr-ash can do quite good job. There does not seem to have a particular harm using mr-ash. (see file 0p999_big_normal_1.expr.analyzed.pdf)mr-ash and ash in the more realistic scenario above is even less obvious when signal is more spiky. (see file 0p999_near_normal_1.expr.analyzed.pdf and 0p999_spiky_1.expr.analyzed.pdf). In fact both mr-ash and ash recovers the true signal less well than the more slap situation (but maybe slap situation is more realistic?).These are results without permuted in the file name.
mr-ash suffers the same problem as before though the estimate of effect size is much better. But in this scenario ash result over estimates effect size even more than plain univariate analysis. (see file 0p5_big_normal_1.expr.analyzed.pdf)0p999_big_normal_1.expr.analyzed.pdf) the advantage of mr-ash is obvious. However there is still a slight over-estimate of $\hat{\sigma}$.mr-ash does better than ash. (see file 0p999_near_normal_1.expr.analyzed.pdf and 0p999_spiky_1.expr.analyzed.pdf)Here we compare CDF of estimates with the truth for multiple replicates. Here I only looked what we think the most "realistic" situation: $\pi_0 = 99.9%$ and near-normal mixture. The good result below is consistent with the observation that the signals mr-ash identifies agree with the truth, in this setup.
parameter: fns = glob.glob('/home/gaow/Documents/GTEx/Simulation/TY.genotype.permuted.0p999_near_normal_*.expr.analyzed.rds')
output: '/home/gaow/Documents/GTEx/Simulation/cdf_compare.rds'
R:
x = seq(-6,6,length = 500)
cdf_dat_all = list()
idx = 0
for (d in c(${fns!r,})) {
idx = idx + 1
dat = readRDS(d)
cdf_dat = NULL
for (table in names(dat)) {
res = dat[[table]]
g0 = ashr::normalmix(c(res$meta$pi0, (1 - res$meta$pi0) * res$meta$pi), rep(0, length(res$meta$sigma)+1), c(0, res$meta$sigma))
g1 = ashr::normalmix(res$mr.ash$w, rep(0, length(res$mr.ash$sa)), sqrt(res$mr.ash$sa))
d0 = data.frame(x = x,y = as.numeric(ashr::mixcdf(g0,x)), method="truth", scenario = table)
d1 = data.frame(x = x,y = as.numeric(ashr::mixcdf(g1,x)), method="mr-ash", scenario = table)
if (is.null(cdf_dat)) cdf_dat = rbind(d0, d1)
else cdf_dat = rbind(cdf_dat, d0, d1)
}
cdf_dat_all[[idx]] = cdf_dat
}
saveRDS(cdf_dat_all, file = ${output!r})
For even just one replicate:
dat = readRDS('/home/gaow/Documents/GTEx/Simulation/cdf_compare.rds')
library(ggplot2)
ggplot(dat[[1]], aes(x = x,y = y,color = method)) +
geom_line(lwd = 1.5,alpha = 0.7) + facet_grid(.~scenario) +
theme(legend.position = "bottom")