mvSuSiE

Multivariate Bayesian variable selection via Adaptive Shrinkage, with applications to eQTL fine-mapping


Background

This project mainly targets eQTL fine-mapping across multiple tissues. From a methodological point of view, it introduces an empirical Bayes approach, i.e., adaptive shrinkage, for solving the multivariate multiple regression problem; hence the project code name m&m ash.

We started building the m&m model in summer 2016. It is essentially a multivariate version of the variational algorithm for univariate regression with adaptive shrinkage (credit to Wei and Peter). We developed and implemented one version assuming identity covariance and another assuming diagonal + low-rank covariance (credit to David). We then ran into several limitations, in both modeling and computational aspects. After some attempts to alleviate them (restricting to a simpler problem, and combining genome-wide information efficiently), we reconsidered the design.

In early 2017 we decided to take a modular approach to the problem. Essentially, it involves combining in a smart way the univariate mr-ash (multiple regression with adaptive shrinkage, under development) for model selection and mash for multivariate analysis. With the release of the GTEx V7 data (after imputation, annotation, formatting and a number of other bioinformatics steps), our new method could be applied and demonstrated.

During our exploration of the modular approach we realized that the variational algorithm behind mr-ash has various issues, the most obvious being sensitivity to initialization and false positives. It is a large-scale variable selection model that is good for prediction but not for fine-mapping. Again we got stuck.

Finally, in late 2017 Matthew came up with a new variational algorithm, inspired by FLASH, that naturally produces fine-mapping results by design, with intuition similar to "conditional regression" (a simple ad hoc approach to preliminary fine-mapping in the early days). A connection with single-effect analysis was made later: it turns out that updating each effect, conditional on the others, is a normal-means problem we are familiar with; in the M&M setting, these are the MASH updates. We thus call the fine-mapping model SuSiE, for Sum of Single Effects.
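The idea of updating each effect conditional on the others can be sketched in a few lines of Python. This is a minimal univariate sketch, assuming a known residual variance `sigma2`, a fixed prior effect variance `prior_var`, a uniform prior on which variable carries each effect, and a fixed number of sweeps. `single_effect_regression` and `fit_sum_of_single_effects` are hypothetical names, and this is not the susieR implementation (which, among other things, estimates these variances). Each single-effect fit reduces to the normal-means problem `bhat_j ~ N(b_j, s2_j)`, solved with one Bayes factor per variable and a conjugate normal posterior.

```python
import numpy as np


def single_effect_regression(X, r, sigma2, prior_var):
    """Bayesian single-effect regression of residual r on the columns of X.

    Assumes exactly one column has a nonzero effect, a uniform prior over
    which column it is, and effect ~ N(0, prior_var) (illustrative choices).
    """
    xtx = np.sum(X**2, axis=0)           # x_j' x_j for every column
    bhat = X.T @ r / xtx                 # univariate least-squares estimates
    s2 = sigma2 / xtx                    # sampling variance of each bhat_j
    # bhat_j ~ N(b_j, s2_j) is a normal-means problem; log Bayes factor
    # for b_j ~ N(0, prior_var) against b_j = 0:
    log_bf = 0.5 * np.log(s2 / (s2 + prior_var)) \
        + 0.5 * (bhat**2 / s2) * prior_var / (prior_var + s2)
    w = np.exp(log_bf - log_bf.max())    # subtract max for numerical stability
    alpha = w / w.sum()                  # P(column j carries the effect | data)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / s2)
    post_mean = post_var * bhat / s2     # posterior mean of b_j if j is the effect
    return alpha, post_mean


def fit_sum_of_single_effects(X, y, L=5, sigma2=1.0, prior_var=1.0, n_iter=50):
    """Cycle through L single effects, refitting each on the residual of the others."""
    p = X.shape[1]
    alpha = np.zeros((L, p))
    mu = np.zeros((L, p))
    for _ in range(n_iter):
        for l in range(L):
            # posterior-mean coefficients of all effects except the l-th
            b_others = (alpha * mu).sum(axis=0) - alpha[l] * mu[l]
            r = y - X @ b_others         # residual with effect l removed
            alpha[l], mu[l] = single_effect_regression(X, r, sigma2, prior_var)
    # a variable is included unless every single effect misses it
    pip = 1.0 - np.prod(1.0 - alpha, axis=0)
    return alpha, mu, pip
```

Each row `alpha[l]` is a probability vector over variables for the l-th effect, which is what makes the output directly usable for fine-mapping; effects with no signal simply spread their probability thinly across variables.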

Univariate analysis (variable selection and fine-mapping)

The first module performs multiple regression with adaptive shrinkage for variable selection and fine-mapping.

mr-ash model

This is being developed and implemented in the varbvs package. We have created simulated data based on GTEx. Here are some results:

  • Simulation study results: we have observed that mr-ash tends to over-shrink signals.
  • Real-data analysis code and results: mr-ash is sensitive to initialization and produces false positives.

SSE (SuSiE) model

The SSE model has been implemented in susieR. To understand the properties of SuSiE, we designed a simulation study with various settings, focusing on different flavors of SuSiE. We then benchmarked it against other fine-mapping methods as a sanity check. Our claim is that SuSiE and the other methods identify the same sets of potentially causal variants because they all fit the same model, but SuSiE does so at low computational cost and naturally produces interpretable confidence sets.
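A confidence set for one single effect can be read off that effect's inclusion probabilities: sort the variables from most to least probable and keep adding them until the cumulative probability reaches the desired coverage. The sketch below assumes a probability vector `alpha_l` for one effect (for example, one row of a fitted single-effect model); `confidence_set` is a hypothetical name, and this is not the susieR code, which may apply further filtering to the sets it reports.

```python
import numpy as np


def confidence_set(alpha_l, coverage=0.95):
    """Smallest set of variables whose inclusion probabilities sum to >= coverage."""
    alpha_l = np.asarray(alpha_l, dtype=float)
    order = np.argsort(alpha_l)[::-1]            # most probable variables first
    cum = np.cumsum(alpha_l[order])
    k = int(np.searchsorted(cum, coverage)) + 1  # first position reaching coverage
    return sorted(order[:k].tolist())
```

For instance, with inclusion probabilities `[0.04, 0.6, 0.02, 0.31, 0.03]` and `coverage=0.9`, the set is `[1, 3]`: the two variables together carry 0.91 of the probability, so the effect is localized to one of them.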

Multivariate analysis

For the multivariate part, we have continued to make improvements to mashr (credit to Matthew) that make it potentially suitable for the scale of the GTEx project, which now has more than 50 tissues and possibly multiple eQTLs per gene for ~20K genes.
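In the M&M setting, each single-effect update is a multivariate normal-means problem: an effect estimate `bhat` across R tissues with sampling covariance `V`, shrunk toward zero by a mash-style prior. The sketch below assumes that prior is a mixture of zero-mean multivariate normals with covariances `Us` and weights `pis`, both taken as already fixed (in practice they would be learned from the data), and that `U + V` is invertible for every component. The function names are hypothetical and this is not the mashr code.

```python
import numpy as np


def mvn_logpdf(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    r = len(x)
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (r * np.log(2 * np.pi) + logdet + quad)


def mixture_normal_means_posterior(bhat, V, Us, pis):
    """Posterior for b given bhat ~ N(b, V) and b ~ sum_k pis[k] N(0, Us[k])."""
    # marginally, bhat ~ N(0, U_k + V) under component k
    log_w = np.array([np.log(pi) + mvn_logpdf(bhat, U + V) for pi, U in zip(pis, Us)])
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                  # posterior mixture weights
    # component posterior mean: U_k (U_k + V)^{-1} bhat
    means = np.array([U @ np.linalg.solve(U + V, bhat) for U in Us])
    post_mean = (w[:, None] * means).sum(axis=0)  # average over components
    return w, post_mean
```

With a single component `U = I` and `V = I`, the posterior mean is half of `bhat`, the familiar equal-variance shrinkage; adding components with different covariance patterns lets the data choose how effects are shared across tissues.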

Project meetings

Here are the minutes of the project meetings.


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago