The goal of this benchmark is to understand the behavior of susie fits under a number of simple setups and flavors of the susie method.
%revisions --source
We report credible sets (CS) and posterior inclusion probabilities (PIP).
We do not report lfsr
for CS because in the univariate case we are not interested in the sign of the effects, nor in their effect estimates (?). In the multivariate case, however, it is relevant.
We use a very simple linear regression model where causal effects fall randomly on SNPs in the region of interest. Parameters include:
We try out different flavors of susie that:
We randomly select 50 genes from GTEx, using a range of 1000 SNPs up/downstream of the TSS.
All simulation code (including additional simulations used elsewhere) is implemented in DSC here.
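As a minimal sketch of the kind of model involved (not the actual DSC module; all names and defaults here are hypothetical), simulating one phenotype in R might look like:

```r
# Minimal sketch of the simulation model, assuming X is an n x p genotype
# matrix for one gene region. Names and defaults here are hypothetical.
simulate_phenotype <- function(X, n_causal = 2, pve = 0.2) {
  p <- ncol(X)
  b <- rep(0, p)
  causal <- sample(p, n_causal)        # causal effects fall randomly in the region
  b[causal] <- rnorm(n_causal)
  g <- as.vector(X %*% b)              # genetic component
  sigma2 <- var(g) * (1 - pve) / pve   # scale noise to a target PVE
  y <- g + rnorm(nrow(X), sd = sqrt(sigma2))
  list(y = y, beta = b, causal = causal)
}
```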
Here are some notebooks that specifically focus on outcomes from the phenotype simulation code, including several fancier (perhaps over-engineered) simulation models:
Here are some very preliminary results exploring SSE model behavior. Some might be outdated and contain errors.
For each susie fit (per scenario per method flavor) I plot:
Results:
FIXME: comment on some interesting observations, e.g. #552.
For each susie fit, we evaluate, for each CS identified, the minimum LD within the set versus the size of the set. We plot this for each $L$, pooling results from all genes into one plot. Here is a 160-row table of the said plots.
FIXME: comment on some interesting observations.
FIXME: more justification?
From the purity vs size plots above, we observe that LD purity seems a good criterion for defining mappable CS. Any small threshold already seems reasonable, although we will use a threshold of 0.2.
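In code, the purity filter amounts to something like the sketch below, assuming `X` is the genotype matrix and each CS is a vector of SNP indices (hypothetical helper, not susieR internals):

```r
# Purity of a CS: the smallest absolute pairwise correlation (LD)
# among the SNPs in the set. For illustration only.
cs_purity <- function(X, cs) {
  if (length(cs) == 1) return(1)       # a singleton set is trivially pure
  r <- cor(X[, cs, drop = FALSE])
  min(abs(r[upper.tri(r)]))
}

# Keep only "mappable" CS: purity at or above the 0.2 threshold used here.
mappable <- Filter(function(cs) cs_purity(X, cs) >= 0.2, cs_list)
```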
To evaluate how reliable our CS are (false positives), we plot the percentage of CS that contain causal SNPs under various LD filters. We expect roughly 95% of CS to have in fact captured a signal. See plot below:
%preview ../dsc/benchmark/ld_20180516.png
Here on average we do have 95% of CS containing signals. However, the variance seems large. For the outlier cases, this means that in some scenarios, when we keep only very "pure" CS, we miss all signals -- in other words, the pure CS are false positives; or, for some scenarios (the ones at zero percent), there is no power, or simply nothing, at all.
In the plot above we found that there seem to be "outlier" cases where our CS have a very low capture rate, or in other words, a high false positive rate.
So, for e.g. an LD cutoff of 0.25, we show above what the 160 scenario x method combinations look like, split into "acceptable" (over 90% of CS capture something) vs outlier (otherwise) cases.
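For concreteness, the capture rate under a given LD filter could be computed as in this sketch (reusing the hypothetical `cs_purity` helper above; `causal` holds the true causal indices):

```r
# Fraction of CS, among those passing the purity cutoff, that contain
# at least one true causal SNP. NA means nothing survives the filter.
capture_rate <- function(cs_list, causal, X, ld_cutoff = 0.25) {
  kept <- Filter(function(cs) cs_purity(X, cs) >= ld_cutoff, cs_list)
  if (length(kept) == 0) return(NA)
  mean(vapply(kept, function(cs) any(cs %in% causal), logical(1)))
}
```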
What we learned:
We compare susie with DAP and CAVIAR in terms of per-SNP PIP and ROC curves from pooled simulations.
For the PIP comparison we want to check whether susie provides "reasonable" PIP, computed from the variational parameters $\mathbf{\alpha}$: $$\text{PIP}_j := 1 - \prod_l (1 - \alpha_{lj})$$
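In code this is a one-liner, assuming `fit$alpha` is the $L \times p$ matrix of single-effect inclusion probabilities from a susie fit:

```r
# PIP per SNP: probability that at least one of the L single effects
# lands on that SNP, i.e. 1 minus the product over effects of (1 - alpha).
pip <- 1 - apply(1 - fit$alpha, 2, prod)
```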
We select which SNPs to compare:
The trends largely agree, but it is difficult to make a quantitative evaluation because here we do not distinguish between true and false signals.
From the simulation settings we know the true underlying "signal". For all sets identified by DAP (defined by cluster probability > 0.95) and susie (defined by mappable 95% CS), we count for each analysis replicate (simulated gene):
We see that susie performs better in terms of both sensitivity and specificity. The choice of susie prior does not seem to matter much:
FIXME: will have to double-check my code, or look into the DAP results more deeply, in case a mistake I made discredits DAP.
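For reference, a minimal sketch of the per-replicate counting described above (hypothetical names; `sets` is the list of DAP clusters or mappable susie CS for one gene, `causal` the true signals):

```r
# Per-replicate counts behind sensitivity/specificity.
# A set is a "hit" if it contains at least one true causal SNP.
evaluate_sets <- function(sets, causal) {
  hits <- vapply(sets, function(s) any(s %in% causal), logical(1))
  c(n_sets     = length(sets),
    n_hits     = sum(hits),                      # sets that capture a signal
    n_false    = sum(!hits),                     # sets that capture nothing
    n_signals  = length(causal),
    n_captured = sum(causal %in% unlist(sets)))  # signals captured by some set
}
```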