Multivariate Bayesian variable selection regression

Setup of SuSiE benchmark

The goal of this benchmark is to understand the behavior of susie fits under a number of simple setups and flavors of the susie method.

In [6]:
%revisions --source
Revision Author Date Message
d4e8a8c Gao Wang 2018-05-24 Update documentation
ff41e21 Gao Wang 2018-05-24 Add benchmark results comment

What do we report from susie?

We report

  1. 95% (Bayesian) confidence sets (CS):
     a. Size of a set: number of variables it includes
     b. Purity of a set: smallest pair-wise LD
  2. PIP per SNP, defined by $1 - \prod_{l\in M}(1-\alpha_l)$, where $l \in M$ means (perhaps) we should only consider "mappable" CS.

We do not report lfsr for CS because in the univariate case we are not interested in the sign of the effects, nor in their effect estimates (?). In the multivariate case, however, it is relevant.

Data used

We use data from GTEx for eQTL mapping.

Simulation setup

We use a very simple linear regression model where causal effects fall randomly on SNPs in the region of interest. Parameters include (see the sketch after this list):

  • Total PVE of causal variants (to determine residual variance)
  • Number of causal variants
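
As a rough sketch of this setup (the function and variable names here, e.g. `simulate_phenotype`, `pve`, `n_causal`, are illustrative and not the names used in the DSC code):

```r
# Sketch: simulate a phenotype from a genotype matrix X with a given number
# of causal SNPs and a given total PVE of the causal variants.
simulate_phenotype <- function(X, n_causal = 2, pve = 0.2) {
  p <- ncol(X)
  beta <- rep(0, p)
  causal <- sample(p, n_causal)       # causal effects fall randomly in the region
  beta[causal] <- rnorm(n_causal)     # draw causal effect sizes
  g <- as.vector(X %*% beta)          # genetic component
  # set the residual variance so that var(g) / var(y) is approximately `pve`
  sigma2 <- var(g) * (1 - pve) / pve
  y <- g + rnorm(nrow(X), sd = sqrt(sigma2))
  list(y = y, beta = beta, causal = causal)
}
```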

We try out different flavors of susie that differ in (see the sketch after this list):

  • Whether the residual variance is fixed or estimated
  • The value of the prior (effect size) variance
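
For illustration, these flavors map to arguments of the `susie()` call in susieR; the argument names below (`estimate_residual_variance`, `scaled_prior_variance`) follow the current susieR interface and may differ from the version used in this benchmark:

```r
library(susieR)

# Estimate the residual variance during fitting vs. keep it fixed
fit_est <- susie(X, y, L = 10, estimate_residual_variance = TRUE)
fit_fix <- susie(X, y, L = 10, estimate_residual_variance = FALSE)

# Different prior values: vary the scaled prior effect-size variance
fit_p1 <- susie(X, y, L = 10, scaled_prior_variance = 0.05)
fit_p2 <- susie(X, y, L = 10, scaled_prior_variance = 0.4)
```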

We randomly select 50 genes from GTEx, using a range of 1000 SNPs up/downstream of the TSS.

All simulation code (including additional simulations used elsewhere) is implemented in DSC here.

Here are some notebooks that specifically focus on the output of the phenotype simulation code, including several fancier (perhaps over-engineered) simulation models:

How does the SSE model work?

Here are some very preliminary results exploring SSE model behavior. Some might be outdated and contain errors.

How do we summarize and present this information

For individual susie fits

For each susie fit (per scenario and method flavor) I plot the following (see the sketch after this list):

  • Scatter plot of the true effect size
  • Scatter plot of the fitted effect size
  • Line plot of each CS discovered
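
A minimal base-R sketch of these per-fit plots, assuming `beta_true` holds the simulated effects, `beta_hat` the fitted (posterior mean) effects, and `cs_list` a list of CS, each a vector of SNP indices (all names are illustrative):

```r
par(mfrow = c(1, 3))

# Scatter plot of the true effect sizes along the region
plot(beta_true, pch = 20, xlab = "SNP index", ylab = "effect", main = "True effects")

# Scatter plot of the fitted effect sizes
plot(beta_hat, pch = 20, xlab = "SNP index", ylab = "effect", main = "Fitted effects")

# Line plot of each CS discovered: one horizontal segment per CS spanning its SNPs
plot(NULL, xlim = c(1, length(beta_true)), ylim = c(0, length(cs_list) + 1),
     xlab = "SNP index", ylab = "CS index", main = "Credible sets")
for (i in seq_along(cs_list))
  segments(min(cs_list[[i]]), i, max(cs_list[[i]]), i, lwd = 3)
```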

Results:

  • Here is a link to a table of 8000 rows of gene + scenario + method combinations, with the plots above displayed per row.
  • Here is the workflow that generates the table.
  • Here are some interactive analyses of the workflow above.

FIXME: comment on some interesting observations. eg #552.

For purity of fits

For each susie fit, we can evaluate, for each CS identified, the minimum LD within the set versus the size of the set. We can plot this for each $L$, pooling results from all genes into one plot. Here is a 160-row table of these plots.
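
A sketch of how purity (smallest pairwise LD, measured here as absolute correlation) and size can be computed for one CS, given the genotype matrix `X` and the CS as a vector of SNP indices (the function name is illustrative):

```r
# Purity of a CS: smallest pairwise absolute correlation among its SNPs;
# size of a CS: number of SNPs it contains.
cs_purity_size <- function(X, cs) {
  size <- length(cs)
  if (size == 1) return(c(purity = 1, size = 1))  # a singleton set is maximally pure
  r <- abs(cor(X[, cs, drop = FALSE]))            # pairwise LD as absolute correlation
  c(purity = min(r[upper.tri(r)]), size = size)
}
```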

FIXME: comment on some interesting observations.

How do we define mappable CS?

FIXME: more justification?

From the purity vs size plots above, we observe that LD purity seems a good criterion for defining mappable CS. Any small threshold already seems reasonable, although we will use a threshold of 0.2.

To evaluate how reliable our CS are (i.e., how many are false positives), we plot the percentage of CS that contain a causal SNP under various LD filters. We expect roughly 95% of CS to have in fact captured a signal. See the plot below (a sketch of this computation follows the plot):

In [5]:
%preview ../dsc/benchmark/ld_20180516.png
> ../dsc/benchmark/ld_20180516.png (57.8 KiB):

Here, on average, we do have 95% of CS containing signals. However, the variance seems large. The outlier cases mean that in some scenarios, when we keep only very "pure" CS, we miss all signals -- in other words the pure CS are false positives; or, for some scenarios (the ones at zero percent), there is no power, or simply nothing to detect, at all.
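
A sketch of how the capture rate in this plot could be computed: keep only CS whose purity passes a given threshold, then take the fraction of kept CS that contain at least one causal SNP. This reuses `cs_purity_size()` from above; `cs_list` and `causal` are illustrative names for the CS of one fit and the true causal indices:

```r
# Fraction of "mappable" CS (purity >= threshold) that capture a causal SNP.
capture_rate <- function(X, cs_list, causal, threshold = 0.2) {
  keep <- Filter(function(cs) cs_purity_size(X, cs)["purity"] >= threshold, cs_list)
  if (length(keep) == 0) return(NA)   # nothing passes the filter for this fit
  mean(vapply(keep, function(cs) any(cs %in% causal), logical(1)))
}

# Repeat over a grid of LD thresholds, as in the plot above
sapply(c(0, 0.1, 0.2, 0.25, 0.5), function(t) capture_rate(X, cs_list, causal, t))
```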

Focus on LD cutoff 0.25

In the plot above we found that there seem to be "outlier" cases where our CS have a very low capture rate, or in other words, a high false positive rate.

So, for example at the LD cutoff of 0.25, we show above what the 160 scenario x method combinations look like, split into "acceptable" cases (over 90% of CS capture something) vs outliers (otherwise).

What we learned:

  1. Indeed, we see that the false positives occur in near-null cases.
  2. Being "conservative" about the residual variance (est_residual=False) is better.

Comparison with DAP and CAVIAR

We compare susie with DAP and CAVIAR in terms of PIP of SNPs and ROC from pooled simulations.

PIP

For the PIP comparison we want to check whether susie provides "reasonable" PIP, computed from the variational parameters $\mathbf{\alpha}$: $$\text{PIP} := 1 - \prod_l(1-\alpha_l)$$
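
As a sketch, this can be computed directly from the $L \times p$ matrix of $\alpha$ values, one row per single effect (in susieR this matrix is stored in the fit object, e.g. as `fit$alpha`; treating that accessor as an assumption here):

```r
# PIP per SNP: 1 - prod_l (1 - alpha[l, j]), i.e. the probability that at least
# one single-effect component includes SNP j.
compute_pip <- function(alpha, keep = seq_len(nrow(alpha))) {
  # `keep` optionally restricts the product to "mappable" components (l in M)
  1 - apply(1 - alpha[keep, , drop = FALSE], 2, prod)
}
```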

We select which SNPs to compare:

  1. SNPs must be "mappable": that is, we use a min(LD) threshold of 0.2 to filter CS
  2. For SNPs inside the CS thus retained, we look at their PIP, extract the PIP for the same SNPs from DAP and CAVIAR, and make scatter plots with the correlation annotated (see the sketch after this list).
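
A minimal sketch of such a scatter plot with the correlation annotated, assuming `pip_susie` and `pip_dap` are PIP vectors for the selected (mappable) SNPs (illustrative names; the same applies to CAVIAR):

```r
# susie PIP vs DAP PIP for SNPs inside mappable CS, Pearson correlation in the title
plot(pip_susie, pip_dap, pch = 20, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "susie PIP", ylab = "DAP PIP",
     main = sprintf("r = %.2f", cor(pip_susie, pip_dap)))
abline(0, 1, lty = 2)
```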

The trends largely agree, but it is difficult to make a quantitative evaluation because here we do not distinguish between true and false signals.

ROC

From the simulation settings we know the true underlying "signal". For all sets identified by DAP (defined by cluster probability > 0.95) and susie (defined by mappable 95% CS) we count, for each analysis replicate (simulated gene), the following (see the sketch after this list):

  1. Number of times a true signal is captured by a set (true positives)
  2. Number of times that a set does not capture any true signals (false positives)
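
A sketch of this per-replicate count, where `sets` is the list of reported sets for one replicate (DAP clusters or susie mappable CS, each a vector of SNP indices) and `causal` the true causal SNPs (illustrative names):

```r
# For one analysis replicate: sets that capture a true signal (tp)
# and sets that capture none (fp).
count_tp_fp <- function(sets, causal) {
  hit <- vapply(sets, function(s) any(s %in% causal), logical(1))
  c(tp = sum(hit), fp = sum(!hit))
}
```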

We see that susie performs better in terms of both sensitivity and specificity. The choice of susie prior does not seem to matter much:

FIXME: will have to double-check my code, or look into the DAP results more deeply, in case a mistake on my part discredits DAP


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago