In addition to MTHESS when we first conceived the project in 2016, two other papers from the same group of authors have been published with software implementation locus (paper) and building on top of it, an efficient approach called atlasqtl (paper). atlasqtl
approach is designed specifically for detecting pleiotropic patterns (which the authors refer to as "hotspots"). Here we challenge ourselves with a simulated example from altlasqtl
documentation.
atlasqtl
has no vignettes but has great documentation for function call atlasqtl()
, the main function, with an example using simulated data.atlasqtl
is very fast for the scale of the problem with 50 responses; mvsusieR
with naive MASH mixture in this case is a lot slower but still acceptable.mvsusieR
in this experiment uses naive MASH mixture that has uniform weight on >700 components. It can be made a lot faster (reducing number of components) and better (improving weights) when actual MASH approach is used.Apart from computing time, on this particular example mvsusieR
should still win: the PIP it reports is cleaner; and additioally it has the potential to give CS which is a feature from SuSiE model. Also we now support summary statistics which is useful for application in large GWAS studies.
I copy the codes in "example" section atlasqtl
documentation to simulate N=500 samples, R=50 traits, P=2000 variables and L=10 causal variables (in their setting the setup was N=200, R=100, P=50 and L=10, but only 50 out of 100 traits have a genetic association).
seed = 1; set.seed(seed)
n <- 500; q = 50; p = 2000; p_act <- 10; q_act <- 50
X_act <- matrix(rbinom(n * p_act, size = 2, p = 0.25), nrow = n)
X_inact <- matrix(rbinom(n * (p - p_act), size = 2, p = 0.25), nrow = n)
shuff_x_ind <- sample(p)
shuff_y_ind <- sample(q)
X <- cbind(X_act, X_inact)[, shuff_x_ind]
dim(X)
pat <- matrix(FALSE, ncol = q, nrow = p)
bool_x <- shuff_x_ind <= p_act
bool_y <- shuff_y_ind <= q_act
pat_act <- beta_act <- matrix(0, nrow = p_act, ncol = q_act)
pat_act[sample(p_act * q_act, floor(p_act * q_act / 5))] <- 1
beta_act[as.logical(pat_act)] <- rnorm(sum(pat_act))
pat[bool_x, bool_y] <- pat_act
true_signal = which(rowSums(pat)>0)
true_signal
dim(pat)
Y_act <- matrix(rnorm(n * q_act, mean = X_act %*% beta_act), nrow = n)
Y_inact <- matrix(rnorm(n * (q - q_act)), nrow = n)
Y <- cbind(Y_act, Y_inact)[, shuff_y_ind]
dim(Y)
atlasqtl
¶Here I use all default atlasqtl
setting including the assumption on number of effects to capture -- mean is 2, variance is 10.
# Number of effects to look for: mean is 2, variance is 10
p0 <- c(mean(colSums(pat)), 10)
p0
start_time <- Sys.time()
res_atlas <- atlasqtl::atlasqtl(Y = Y, X = X, p0 = p0, user_seed = seed)
end_time <- Sys.time()
end_time - start_time
It takes 13 seconds for the job! Memory usage for this computation is minimal (about 0.3GB).
mvsusieR
¶Prior effect size in mvsusieR
is a required input. Here I use a naive MASH mixture prior -- using canonical covariances with scaled learned from data, and uniform weights,
X = X * 1.0
mash_prior = mvsusieR::create_mash_prior(sample_data = list(X=X,Y=Y, residual_variance=var(Y)), max_mixture_len=-1)
Now I fit M&M model. To be fair I allow for looking for up to 15 effects,
res_mvsusieR = mvsusieR::mvsusie(X,Y,L=15,prior_variance=mash_prior, precompute_covariances=TRUE)
res_mvsusieR$walltime
It takes 1061 seconds to complete it. Memory usage for this case with precompute_covariances
is about 1.5GB.
Again, the true signals are:
true_signal
Cross-condition PIP for atlasqtl
are:
atlpip = as.vector(1 - apply(1 - res_atlas$gam_vb, 1, prod))
Let's plot it with M&M's PIP,
plot(atlpip, res_mvsusieR$pip)
Let's see how well both methods recover the true signals at PIP cutoff 0.9,
which(atlpip>0.9)
which(res_mvsusieR$pip>0.9)
It seems they both do good job in recovering all the simulated signals. But M&M's PIP are "cleaner". I have tried several different seeds and see this pattern consistently. I would expect in this setting we should do better in the precision recall curve.
Caveat: is my computation of atlasqtl's cross-condition PIP correct? Here no filter is applied and different conditions are assumed independent ...
Previously the naive MASH mixture contains many priors -- 771 of them,
length(mash_prior$prior_variance$xUlist)
Now I'll focus on only using the non-singleton priors, reducing it to 71:
mash_prior_shared = mvsusieR::create_mash_prior(sample_data = list(X=X,Y=Y, residual_variance=var(Y)), max_mixture_len=-1, singletons=F)
length(mash_prior_shared$prior_variance$xUlist)
Let's time the run with this prior,
res_mvsusie_shared = mvsusieR::mvsusie(X,Y,L=15,prior_variance=mash_prior_shared, precompute_covariances=TRUE)
res_mvsusie_shared$walltime
It is now 175 seconds.