Multivariate Bayesian variable selection via Adaptive Shrinkage, with applications to eQTL fine-mapping
This project is mainly aimed at fine-mapping for eQTL discovery in multiple tissues. From a methodological point of view, it introduces an Empirical Bayes approach, i.e., adaptive shrinkage, for solving the multivariate multiple regression problem; hence the project code name m&m ash.
We started building the m&m model in summer 2016. It is essentially a multivariate version of the variational algorithm that performs univariate regression with the adaptive shrinkage method (credit to Wei and Peter). We developed a version assuming identity covariance and a version assuming diagonal + low-rank covariance (credit to David), along with an implementation. We then realized that it had several limitations, in both modeling and computational aspects, which made us reconsider the design after some attempts to alleviate these issues (limiting ourselves to a simpler problem and combining genome-wide information efficiently).
In early 2017 we decided to take a modular approach to the problem. Essentially it involves combining, in a smart way, the univariate mr-ash (multiple regression with adaptive shrinkage, under development) for model selection and mash for multivariate analysis. With the release of GTEx V7 data (imputation, annotation + formatting, and a number of other bioinformatics steps) our new method can be applied and demonstrated.
During our exploration of the modular approach we realized that the variational algorithm behind mr-ash has various issues, among which the most obvious are sensitivity to initialization and a tendency to produce false positives. It is a large-scale variable selection model that is good for prediction, but not for fine-mapping. Again we got stuck.
Finally, in late 2017 Matthew came up with a new variational algorithm, inspired by FLASH, that naturally yields fine-mapping results by design, with an intuition similar to "conditional regression" (a simple ad hoc approach used for preliminary fine-mapping in the early days). A connection with single effect analysis was later made: it turns out that updating each effect, conditional on the others, is a normal-means problem we are familiar with -- in the M&M setting, the MASH updates. We thus call the fine-mapping model SuSiE, for Sum of Single Effects.
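The idea of updating each effect conditional on the others can be illustrated with a small univariate sketch. This is not the susieR implementation: the function names are hypothetical, the residual variance `sigma2` and prior effect variance `prior_var` are held fixed as a simplifying assumption, and the multivariate (MASH) update is replaced here by a scalar normal-means update with a normal prior.

```python
import math
import random


def single_effect_regression(columns, resid, sigma2, prior_var):
    """One 'single effect' update: assume exactly one column has a nonzero effect."""
    log_bf, post_mean = [], []
    for x in columns:
        xtx = sum(v * v for v in x)
        bhat = sum(v * r for v, r in zip(x, resid)) / xtx  # least-squares estimate
        s2 = sigma2 / xtx                                   # its sampling variance
        # Normal-means step: bhat ~ N(b, s2) with prior b ~ N(0, prior_var).
        log_bf.append(0.5 * math.log(s2 / (s2 + prior_var))
                      + 0.5 * (bhat * bhat / s2) * prior_var / (prior_var + s2))
        post_mean.append(bhat * prior_var / (prior_var + s2))
    # Posterior probability that each column is the single effect (uniform prior).
    top = max(log_bf)
    weights = [math.exp(v - top) for v in log_bf]
    total = sum(weights)
    alpha = [w / total for w in weights]
    return alpha, post_mean


def fit_sum_of_single_effects(columns, y, n_effects=2, sigma2=1.0,
                              prior_var=1.0, n_iter=20):
    """Fit each single effect in turn to the residual left by all the others."""
    n, p = len(y), len(columns)
    alpha = [[1.0 / p] * p for _ in range(n_effects)]
    mu = [[0.0] * p for _ in range(n_effects)]

    def fitted(k):
        # Expected contribution of effect k to the fitted values.
        coef = [a * m for a, m in zip(alpha[k], mu[k])]
        return [sum(columns[j][i] * coef[j] for j in range(p)) for i in range(n)]

    for _ in range(n_iter):
        for k in range(n_effects):
            resid = list(y)
            for other in range(n_effects):
                if other != k:
                    resid = [r - f for r, f in zip(resid, fitted(other))]
            alpha[k], mu[k] = single_effect_regression(columns, resid,
                                                       sigma2, prior_var)

    # Probability that a column is picked by at least one of the effects.
    pip = [1.0 - math.prod(1.0 - alpha[k][j] for k in range(n_effects))
           for j in range(p)]
    return alpha, mu, pip


# Toy data (an assumption for illustration): two true effects, columns 2 and 7.
rng = random.Random(0)
n, p = 200, 10
columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [columns[2][i] + columns[7][i] + rng.gauss(0.0, 1.0) for i in range(n)]
alpha, mu, pip = fit_sum_of_single_effects(columns, y)
```

Each row of `alpha` is a probability vector over variables for one effect, which is what makes the output directly usable for fine-mapping.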
The first module is to perform multiple regression with adaptive shrinkage for variable selection and fine-mapping.
mr-ash model
This is being developed and implemented in the varbvs package. We have created simulation data based on GTEx. Here are some results:
mr-ash tends to over-shrink signals.
SSE (SuSiE) model
The SSE model has been implemented in susieR. To understand the properties of susie we designed a simulation study with various settings, focusing on different flavors of susie. We then benchmarked it against other fine-mapping methods as a reassurance -- our claim is that susie and all other methods identify the same sets of potentially causal variates because they all fit the same model; but susie does so at low computational cost and naturally yields interpretable confidence sets.
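As a rough illustration of how such a set can be read off a single effect, the sketch below takes a vector of inclusion probabilities for one effect (assumed to sum to one) and keeps the most probable variables until a target coverage is reached. The function name `credible_set` and the example probabilities are hypothetical, and this is not the susieR code; in the Bayesian setting such a set is usually called a credible set.

```python
def credible_set(alpha, coverage=0.95):
    """Smallest set of variable indices whose total probability reaches `coverage`."""
    # Visit variables from most to least probable.
    order = sorted(range(len(alpha)), key=lambda j: alpha[j], reverse=True)
    chosen, total = [], 0.0
    for j in order:
        chosen.append(j)
        total += alpha[j]
        if total >= coverage:
            break
    return sorted(chosen), total


# Made-up inclusion probabilities for one effect over five variables.
example_alpha = [0.01, 0.60, 0.36, 0.02, 0.01]
cs, cs_coverage = credible_set(example_alpha)
```

A small set with high coverage indicates a well-localized signal; a large set indicates that several correlated variables cannot be told apart.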
For the multivariate part, we've continued to make improvements to mashr (credit to Matthew) that make it potentially suitable for the scale of the GTEx project, which now has > 50 tissues and possibly multiple eQTLs per gene for ~20K genes.