The goal of this benchmark is to understand the behavior of susie fits under a number of simple setups and flavors of the susie method.
%revisions --source
We report credible sets (CS) and posterior inclusion probabilities (PIP).
We do not report lfsr
for CS because in the univariate case we are not interested in the sign of the effects, nor in their effect estimates (?). In the multivariate case, however, it is relevant.
We use a very simple linear regression model where causal effects fall randomly on SNPs in the region of interest. Parameters include:
We try out different flavors of susie that:
We randomly select 50 genes from GTEx, using a range of 1000 SNPs up/downstream of the TSS.
All simulation code (including additional simulations used elsewhere) is implemented in DSC here.
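As a minimal sketch of the kind of model involved (not the actual DSC module; all names and defaults here are hypothetical), simulating one phenotype in R might look like:

```r
# Minimal sketch of the simulation model, assuming X is an n x p genotype
# matrix for one gene region. Names and defaults here are hypothetical.
simulate_phenotype <- function(X, n_causal = 2, pve = 0.2) {
  p <- ncol(X)
  b <- rep(0, p)
  causal <- sample(p, n_causal)        # causal effects fall randomly in the region
  b[causal] <- rnorm(n_causal)
  g <- as.vector(X %*% b)              # genetic component
  sigma2 <- var(g) * (1 - pve) / pve   # scale noise to a target PVE
  y <- g + rnorm(nrow(X), sd = sqrt(sigma2))
  list(y = y, beta = b, causal = causal)
}
```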
Here are some notebooks that specifically focus on outcomes from the phenotype simulation code, including several fancier (perhaps over-engineered) simulation models:
Here are some very preliminary results exploring SSE model behavior. Some might be outdated and contain errors.
For each susie fit (per scenario per method flavor) I plot:
Results:
FIXME: comment on some interesting observations, e.g. #552.
For each susie fit, we evaluate, for each CS identified, the minimum LD within the set versus the size of the set. We plot this for each $L$, pooling results from all genes into one plot. Here is a 160-row table of the said plots.
FIXME: comment on some interesting observations.
FIXME: more justification?
From the purity vs size plots above, we observe that LD purity seems a good criterion for defining mappable CS. Any small threshold already seems reasonable, although we will use a threshold of 0.2.
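In code, the purity filter amounts to something like the sketch below, assuming `X` is the genotype matrix and each CS is a vector of SNP indices (hypothetical helper, not susieR internals):

```r
# Purity of a CS: the smallest absolute pairwise correlation (LD)
# among the SNPs in the set. For illustration only.
cs_purity <- function(X, cs) {
  if (length(cs) == 1) return(1)       # a singleton set is trivially pure
  r <- cor(X[, cs, drop = FALSE])
  min(abs(r[upper.tri(r)]))
}

# Keep only "mappable" CS: purity at or above the 0.2 threshold used here.
mappable <- Filter(function(cs) cs_purity(X, cs) >= 0.2, cs_list)
```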
To evaluate how reliable our CS are (false positives), we plot the percentage of CS that contain causal SNPs under various LD filters. We expect roughly 95% of CS to have in fact captured a signal. See plot below:
%preview ../dsc/benchmark/ld_20180516.png
Here on average we do have 95% of CS containing signals. However, the variance seems large. For the outlier cases, this means that in some scenarios, when we keep only very "pure" CS, we miss all signals -- in other words, the pure CS are false positives; or, for some scenarios (the ones at zero percent), there is no power, or simply nothing, at all.
In the plot above we found that there seem to be "outlier" cases where our CS have a very low capture rate, or in other words, a high false positive rate.
So, for e.g. an LD cutoff of 0.25, we show above what the 160 scenario x method combinations look like, split into "acceptable" (over 90% of CS capture something) vs outlier (otherwise) cases.
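For concreteness, the capture rate under a given LD filter could be computed as in this sketch (reusing the hypothetical `cs_purity` helper above; `causal` holds the true causal indices):

```r
# Fraction of CS, among those passing the purity cutoff, that contain
# at least one true causal SNP. NA means nothing survives the filter.
capture_rate <- function(cs_list, causal, X, ld_cutoff = 0.25) {
  kept <- Filter(function(cs) cs_purity(X, cs) >= ld_cutoff, cs_list)
  if (length(kept) == 0) return(NA)
  mean(vapply(kept, function(cs) any(cs %in% causal), logical(1)))
}
```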
What we learned:
We compare susie with DAP and CAVIAR in terms of per-SNP PIP and ROC curves from pooled simulations.
For the PIP comparison we want to check whether susie provides "reasonable" PIP, computed from the variational parameters $\mathbf{\alpha}$: $$\text{PIP}_j := 1 - \prod_l (1 - \alpha_{lj})$$
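In code this is a one-liner, assuming `fit$alpha` is the $L \times p$ matrix of single-effect inclusion probabilities from a susie fit:

```r
# PIP per SNP: probability that at least one of the L single effects
# lands on that SNP, i.e. 1 minus the product over effects of (1 - alpha).
pip <- 1 - apply(1 - fit$alpha, 2, prod)
```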
We select which SNPs to compare:
The trends largely agree, but it is difficult to make a quantitative evaluation because here we do not distinguish between true and false signals.
From the simulation settings we know the true underlying "signal". For all sets identified by DAP (defined by cluster probability > 0.95) and susie (defined by mappable 95% CS), we count for each analysis replicate (simulated gene):
We see that susie performs better in terms of both sensitivity and specificity. The choice of susie prior does not seem to matter much:
FIXME: will have to double-check my code, or look into the DAP results more deeply, in case a mistake I made discredits DAP.
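For reference, a minimal sketch of the per-replicate counting described above (hypothetical names; `sets` is the list of DAP clusters or mappable susie CS for one gene, `causal` the true signals):

```r
# Per-replicate counts behind sensitivity/specificity.
# A set is a "hit" if it contains at least one true causal SNP.
evaluate_sets <- function(sets, causal) {
  hits <- vapply(sets, function(s) any(s %in% causal), logical(1))
  c(n_sets     = length(sets),
    n_hits     = sum(hits),                      # sets that capture a signal
    n_false    = sum(!hits),                     # sets that capture nothing
    n_signals  = length(causal),
    n_captured = sum(causal %in% unlist(sets)))  # signals captured by some set
}
```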