Multivariate Bayesian variable selection regression

A modular approach to M&M ASH model

Update: for a more formal version of this documentation, please see this Overleaf write-up.

Motivation

In genetic association studies there is great interest in identifying multiple causal variants (eQTL or GWAS hits). Several methods for fine mapping in univariate association problems (eg, eQTL discovery in single tissues) have been developed, but the problem is more difficult to address in multi-condition analyses. For example, DAP (Wen 2016) performs multi-SNP fine mapping, but due to computational limitations it lacks a principled approach to combining genome-wide information to estimate the hyperparameters the algorithm requires.

On the other side of the coin, linkage disequilibrium (LD) can distort effect estimates in association analysis. In univariate regression (SNP-by-SNP analysis), LD causes the estimated effect of each SNP to reflect not its true effect but an "LD-convolved" effect, essentially the combined effect (the sum) of all SNPs in LD with the SNP of interest. So not only are these estimates correlated, but more fundamentally they are not consistent estimates of multiple-regression (multiple SNPs per phenotype) coefficients when there is more than one causal variant for the condition of interest. The same argument generalizes to conventional multivariate regression.

mash (Urbut 2016), a particular type of multivariate regression that takes summary statistics from simple (single-SNP) regression as input, is affected in the same manner. Because it takes LD-convolved effect size estimates as input, the output of mash is not a consistent estimate of "the truth" (ie, if all SNPs are typed, then in the absence of population structure the multiple-SNP regression estimates can be interpreted as causal effects). Indeed, our application of mash to GTEx data discovered eQTLs with opposite directions of effect in brain versus non-brain tissues. However, we suspect this is more likely due to multiple eQTLs in negative LD that the current mash procedure fails to account for, rather than to the true effects being opposite in sign.

So we attempt to fix both problems with the M&M ASH model, or m&m for short hereafter. We have derived m&m assuming identity covariance and m&m assuming diagonal plus low-rank covariance, along with a very first draft implementation, based on a variational inference framework similar to varbvs (Carbonetto 2012) and mr-ash to handle issues with LD, yet in the multivariate notation that fits into the mash mixture model.

At the same time there is other potentially connected work in the lab, including rss (Zhu 2016), which can be extended to a summary-statistics version of mr-ash, and BMASS (Turchin), which can be considered a special case of mash (if we are willing to use a known residual covariance matrix). Building m&m from scratch already involves implementing varbvs / mr-ash and mash as special cases; adding further special cases to m&m seems overly ambitious, and may result in a monster (rather than master!) algorithm / software that claims to do everything yet excels at nothing (in terms of performance) compared to the existing individual pieces that have already been carefully designed, well engineered, extensively tested and properly documented*. Therefore we want to adopt a modular design for m&m that harnesses, rather than reinvents, all related Stephens Lab work.

* One aspect in which the monster algorithm would surely suffer is that its mixture components have to be initialized up-front, which means they have to be computed from LD-convolved effect size estimates. The modular approach does not have this problem.

The modular m&m should work as follows (a sketch of the pipeline is given after the list):

  1. Estimate deconvolved effect sizes for each condition separately.
  2. Use the output of step 1 to run mash and obtain hyperparameters (mixture proportions) at the genome-wide level.
  3. Treat analysis units (genes, LD blocks) as independent, and use the prior from step 2 to initialize a multivariate fine-mapping method such as DAP.
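
Below is a minimal, hypothetical sketch of this pipeline in R; `fit_mr_ash()`, `run_mash()` and `run_dap()` are placeholder wrappers for the three steps (not real package functions), and `X`, `Y` and `units` are assumed to hold genotypes, phenotypes across conditions, and the definition of analysis units.

```r
# Hypothetical wrappers for the three modular steps: fit_mr_ash(), run_mash()
# and run_dap() are placeholders, not real package functions; the real work
# is done by mr-ash / varbvs, mash and DAP (see the step-by-step notes below).

# Step 1: deconvolved effect size estimates, one univariate fit per condition.
# X is the N x P genotype matrix; Y is the N x R matrix of phenotypes.
step1 <- lapply(seq_len(ncol(Y)), function(r) fit_mr_ash(X, Y[, r]))

# Step 2: pool the step-1 summaries genome-wide and estimate the mash
# mixture proportions (the hyperparameters).
prior <- run_mash(step1)

# Step 3: treat each analysis unit (gene or LD block) as independent and
# fine-map it, conditioning on the genome-wide prior from step 2.
# `units` is assumed to be a list of SNP index vectors, one per unit.
fits <- lapply(units, function(u) run_dap(X[, u, drop = FALSE], Y, prior))
```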

Step one: deconvolved effect size estimation

The goal here is to obtain LD-deconvolved effect size estimates for each condition separately (univariate analysis). This is a variable selection problem that can be addressed with a number of existing methods, but we will use mr-ash because 1) the idea of ASH is powerful and 2) the variational EM algorithm (VEM) is efficient.

There is an identifiability problem with mr-ash: the VEM converges to local optima, so the SNP to which it assigns an effect may not be the causal SNP. But this is irrelevant here because, as mentioned above, we take a hybrid approach: use VEM to estimate hyperparameters, then condition MCMC-based fine-mapping on those hyperparameter estimates. It is in fact advantageous to use VEM at this stage -- as a precursor to mash we prefer a sparse solution (via VEM) over an averaged solution (via MCMC), because a sparse mode is more representative than a mean over modes.

However, we do have to ensure that the mr-ash analyses of different conditions converge to the same sparse mode. One way to ensure this is to assume the effects are the same across all conditions, fit a fixed-effect model, and use its estimates to initialize mr-ash, hoping that the VEM then converges to the same mode in every condition. But Wei has pointed out that the VEM computation always favors $X_1$ within a block $X_1, \ldots, X_p$, so as long as the order of the columns of $X$ remains the same across the univariate analyses we should be fine.
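
A minimal sketch of this step, using the varbvs R package as a stand-in for mr-ash (the consolidation plan below is to build on the varbvs framework); `X` is the N x P genotype matrix, `Y` the N x R matrix of phenotypes across conditions, and the return fields (`logw`, `alpha`, `mu`) are assumed from the varbvs documentation:

```r
library(varbvs)

# One variational regression per condition (column of Y); the column order of
# X is kept identical across conditions so the VEM lands on the same sparse mode.
fits <- lapply(seq_len(ncol(Y)), function(r)
  varbvs(X, Z = NULL, y = Y[, r], family = "gaussian", verbose = FALSE))

# For each condition, average over the hyperparameter grid to get posterior
# inclusion probabilities (alpha) and posterior mean effects (alpha * mu).
summarize_fit <- function(fit) {
  w <- normalizelogweights(fit$logw)   # normalized weights over the grid
  list(pip  = drop(fit$alpha %*% w),
       bhat = drop((fit$alpha * fit$mu) %*% w))
}
step1 <- lapply(fits, summarize_fit)
```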

Another issue to verify is whether the hyperparameters $\boldsymbol{\pi}$ obtained by running ash on the effect size estimates from mr-ash agree with those estimated by mr-ash itself. If this is verified, we can claim that the hyperparameters estimated by mash from the mr-ash deconvolved data are equivalent to those from fitting the monster algorithm.
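
A sketch of this check using the ashr package, assuming `bhat` and `shat` hold the deconvolved effect estimates and some measure of their uncertainty (eg, posterior standard deviations) pooled across the genome, and `pi_mr_ash` holds the mixture proportions reported by mr-ash (both are assumptions about how the upstream output is stored):

```r
library(ashr)

# Fit adaptive shrinkage to the mr-ash output and extract its estimated
# mixture proportions.
a <- ash(bhat, shat, mixcompdist = "normal")
pi_ash <- get_fitted_g(a)$pi

# Compare with the proportions estimated inside mr-ash (the two grids of
# prior variances must match for a component-by-component comparison).
rbind(ash = pi_ash, mr_ash = pi_mr_ash)
```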

Also, instead of using the posterior mean of $\beta$ from mr-ash, we should use the likelihood:

$$P(Y|\beta_1) \propto \frac{P(\beta_1|Y)}{P(\beta_1)}$$

where the prior $P(\beta_1)$ (as well as the approximate posterior $P(\beta_1|Y)$) is available from the VEM updates.
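
To make this concrete, here is the standard manipulation for a single normal prior component (a simplification of the mr-ash mixture prior), assuming the VEM gives an approximate posterior $q(\beta_1) = N(\mu_1, s_1^2)$ and the prior component is $N(0, \sigma_0^2)$:

$$P(Y|\beta_1) \propto \frac{N(\beta_1; \mu_1, s_1^2)}{N(\beta_1; 0, \sigma_0^2)} \propto N(\beta_1; \hat{b}_1, \hat{s}_1^2), \quad \text{where} \quad \frac{1}{\hat{s}_1^2} = \frac{1}{s_1^2} - \frac{1}{\sigma_0^2}, \quad \hat{b}_1 = \hat{s}_1^2 \, \frac{\mu_1}{s_1^2}.$$

So, under these assumptions, the likelihood can itself be summarized by an effect estimate $\hat{b}_1$ and standard error $\hat{s}_1$ (this requires $s_1^2 < \sigma_0^2$, ie, the posterior is more concentrated than the prior).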

Step two: multivariate analysis on deconvolved effect size estimates

We use mash to perform another round of shrinkage in order to learn typical patterns of sparsity, sharing and correlation among effects, and pass this knowledge on to the next step. First, as in mash, we initialize the $U_k$ using a number of decompositions of the covariance matrix of effect sizes. But I am not sure whether we should still take the "top SNP" per unit (gene) across conditions.

Then we estimate the $U_k$ via a deconvolution method (Bovy's extreme deconvolution, or Gerard's approach), use the result in the mixture prior, and write down the likelihood of $\hat{\beta}$ to feed into the algorithm for mixture proportion estimation (a convex problem).
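
A minimal sketch of this step using the mashr package interface, assuming `Bhat` and `Shat` are J x R matrices holding the step-one effect summaries and their standard errors (rows are, eg, top SNP-gene pairs; columns are conditions):

```r
library(mashr)

data <- mash_set_data(Bhat, Shat)

# Initialize U_k from canonical patterns plus data-driven decompositions,
# refining the latter by extreme deconvolution (Bovy et al).
U_c   <- cov_canonical(data)
U_pca <- cov_pca(data, npc = 5)
U_ed  <- cov_ed(data, U_pca)

# Fit the mash mixture model; estimating the mixture proportions is the
# convex problem mentioned above.
m <- mash(data, Ulist = c(U_c, U_ed))

# Genome-wide mixture proportions, to be carried into the fine-mapping step.
pi_hat <- get_estimated_pi(m)
```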

Step three: fine-mapping for eQTL discovery

The goal here is to generate the posterior means of effect sizes -- the fine-mapping proper. DAP can be applied in parallel to all units, using the same prior from the previous step. I still have to read more about it.

What to do with mr-ash?

This is a separate topic from m&m, yet it is important because mr-ash is the first step of m&m. Currently we have these versions of mr-ash:

  • Full data versions:
    • MATLAB version (varbvs by Peter)
    • R version (mvash by Wei)
    • R/C++ version (a special case of mnmashr by Gao)
  • Summary data version:
    • MATLAB version (rss by Xiang)
    • R version (by Nick)

For the full-data version we will consolidate our efforts into the varbvs package. I had a conversation with Peter: according to him, it is not hard to come up with an R/C version that fits into the varbvs engineering framework. As a first pass, m&m will start from full data, because we have the luxury of access to individual-level GTEx data, and for the eQTL problem the scale of computation per gene is not large enough to require approximating $X^TX$ with a banded structure and using the summary-data version of mr-ash.

As for where the mr-ash paper goes, we have previously agreed to write it up as an extension of large-scale BVS using an adaptive shrinkage prior, providing better estimates of the effect size distribution thanks to "adaptive" (meaning better) shrinkage. It is more interesting to include a non-trivial application in the paper than to focus solely on selling the method. Xiang Zhu is planning to perform effect size distribution estimation using summary data. We would also like to apply mr-ash to traits in the UK Biobank data to provide the community with a proper GWAS analysis, which by itself would be a great application; using mr-ash for it makes it even better.

Next action items

We will discuss these on GitHub.


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago