Here we summarize a whiteboard discussion led by @pcarbo along with @NKweiwang and @gaow.
The discussion focused mostly on eQTL mapping across tissues, though m&m ash is potentially a more generic method.
Our goal (hypothesis) in this context is to find new patterns of effect sharing and to increase eQTL detection power by analyzing multiple SNPs jointly.
In particular, we report "counts" relative to single-SNP methods, i.e., how many more or fewer eQTLs we report.
Additionally, we check whether this approach gives us a more accurate view of sharing.
@pcarbo suggests we start simple by considering a problem with $J = 2$ (two tissues) and $P = 2$ (two SNPs).
This aims to create a toy example where we can evaluate, via simulation or in real data, the difference between the single-SNP and multi-SNP approaches. We will contrast analyzing the 2 SNPs separately with analyzing them jointly. This can be done on GTEx data with straightforward linear regression analysis. @gaow is going to investigate it soon.
The aim is to simulate / solve a situation simple enough that we can use it to fully investigate properties of the multi-SNP approach in multiple tissues.
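The separate-versus-joint contrast can be sketched in simulation as follows. This is a minimal illustration, not the planned GTEx analysis: the function names, the Gaussian stand-ins for genotypes, the LD value, the effect matrix `B` and the residual covariance are all assumptions chosen for the example.

```python
import numpy as np

def simulate_toy(n=2000, ld=0.8, B=None, resid_cov=None, seed=0):
    """Simulate n samples with P = 2 correlated SNPs and J = 2 tissues."""
    rng = np.random.default_rng(seed)
    if B is None:
        # Assumed truth: only SNP 1 is causal, with an effect shared by both tissues.
        B = np.array([[0.5, 0.5],
                      [0.0, 0.0]])
    if resid_cov is None:
        # Assumed residual covariance between the two tissues.
        resid_cov = np.array([[1.0, 0.3],
                              [0.3, 1.0]])
    # Correlated, standardized Gaussian stand-ins for genotypes (an assumption).
    X = rng.multivariate_normal(np.zeros(2), [[1.0, ld], [ld, 1.0]], size=n)
    E = rng.multivariate_normal(np.zeros(2), resid_cov, size=n)
    return X, X @ B + E

def single_snp_effects(X, Y):
    """Marginal OLS: entry (p, j) is the slope of tissue j on SNP p alone."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return (Xc.T @ Yc) / (Xc ** 2).sum(axis=0)[:, None]

def joint_effects(X, Y):
    """Joint OLS: both SNPs in the same regression, one column per tissue."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return coef

X, Y = simulate_toy()
marginal = single_snp_effects(X, Y)
joint = joint_effects(X, Y)
print("single-SNP estimates:\n", marginal)
print("joint estimates:\n", joint)
```

Under these assumptions the non-causal SNP picks up a marginal effect through LD with the causal SNP, while the joint fit shrinks that estimate toward zero; this is the kind of difference the toy example is meant to expose.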
Currently we are having computational issues with $J > 2$: the residual variance of the response is a $J \times J$ matrix, and there can be too many parameters to estimate.
@pcarbo points out that if we start with $J = 2$, then instead of using ash we can simply enumerate the model underlying the "ground truth" (giving us a "2D spike-slab" mixture), and we can possibly infer all the parameters involved via variational EM.
In this setting the residual covariance matrix has only 3 parameters to estimate at each iteration.
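To make the enumeration and the parameter count concrete, here is a small sketch. It assumes one reading of the "2D spike-slab" mixture, namely that each SNP's effect is either zero (spike) or nonzero (slab) in each tissue, and it uses the fact that a symmetric $J \times J$ covariance matrix has $J(J+1)/2$ free parameters. The helper names are hypothetical.

```python
from itertools import product

def inclusion_configs(J):
    """All spike/slab patterns across J tissues (0 = zero effect, 1 = nonzero)."""
    return list(product((0, 1), repeat=J))

def n_cov_params(J):
    """Free parameters in a symmetric J x J residual covariance matrix."""
    return J * (J + 1) // 2

configs = inclusion_configs(2)
# For J = 2: no effect, tissue 2 only, tissue 1 only, both tissues.
print(configs)
print(n_cov_params(2))   # two variances and one covariance
print(n_cov_params(10))  # the count grows quadratically with J
```

The number of configurations doubles with each added tissue and the covariance parameter count grows quadratically, which is consistent with the computational difficulty noted for $J > 2$.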
This simple model (with $J = 2$ and $P > 2$) and the parameters to infer are outlined as follows:
Solving this model will not only give us estimates of effects (as the ash model does), but also estimates of the weights on mixture components that, unlike ash weights, have a clear interpretation.
This model can possibly be solved via:
We may need to be careful about the parameterization of this model. For example, we may want to re-parameterize the mixture components as follows:
[to be edited]
ash model (as currently implemented in m&m ash) to perform inference.
For this simple case we can solve the updates for $\Sigma$, the residual variance, at each iteration.
We can also use this to evaluate the diagonal $\Sigma$ approximation.
It will also provide ground-truth effect sizes to compare with m&m ash estimates.
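One possible way to evaluate the diagonal $\Sigma$ approximation is to compare Gaussian log-likelihoods of the residuals under the full and the diagonal estimates. The sketch below assumes zero-mean residuals are already available as an $n \times J$ matrix `R`; the function names and the simulated residuals are illustrative assumptions, not part of the m&m ash implementation.

```python
import numpy as np

def gaussian_loglik(R, Sigma):
    """Log-likelihood of zero-mean residual rows R[i] ~ N(0, Sigma)."""
    n, J = R.shape
    _, logdet = np.linalg.slogdet(Sigma)
    # Sum over samples of r_i^T Sigma^{-1} r_i.
    quad = np.einsum("ij,jk,ik->", R, np.linalg.inv(Sigma), R)
    return -0.5 * (n * J * np.log(2 * np.pi) + n * logdet + quad)

def compare_full_vs_diagonal(R):
    """Return log-likelihoods under the full MLE and its diagonal approximation."""
    n = R.shape[0]
    Sigma_full = R.T @ R / n                    # MLE of the full covariance
    Sigma_diag = np.diag(np.diag(Sigma_full))   # keep the variances only
    return gaussian_loglik(R, Sigma_full), gaussian_loglik(R, Sigma_diag)

# Simulated residuals with an assumed between-tissue correlation of 0.6.
rng = np.random.default_rng(1)
R = rng.multivariate_normal(np.zeros(2), [[1.0, 0.6], [0.6, 1.0]], size=1000)
ll_full, ll_diag = compare_full_vs_diagonal(R)
print(ll_full - ll_diag)
```

The gap between the two log-likelihoods gives a rough measure of how much is lost by ignoring residual correlation between tissues; it shrinks to zero when the tissues' residuals are uncorrelated.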
My concern with formulating and solving the model as described is that it might still be difficult and computationally intensive,
and even if we work out the $J = 2$ case, it is hard to justify that for $J > 2$ we can safely switch to the ash approach instead and that all our findings at $J = 2$ will still hold.
m&m ash may well be as good as solving this model, although m&m ash does not provide mixture proportion estimates.
m&m ash model that makes additional assumptions to deal with computational limitations of the $J = 2$ approach.