Multivariate Bayesian variable selection regression

Comparing MASH analysis with simple multivariate analysis

Previously we showed that although univariate analysis with a degenerate MASH model gives results identical to SuSiE, as expected (with a non-decreasing ELBO), in multivariate calculations the ELBO is not always non-decreasing. To investigate, we will 1) further simplify the problem and 2) isolate whether the posterior calculation or the ELBO calculation is the problematic part. The best way to achieve both is to implement a simple Bayesian multivariate regression model with prior $b \sim MVN(0, U)$ where $U$ is known, instead of using a MASH prior for $b$.
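
This simplification is convenient because the posterior under a known prior covariance is available in closed form (the standard conjugate result; the notation below is ours and does not refer to mvsusieR internals). For a single variable $x$ with least-squares estimate $\hat{b} = Y^\top x / (x^\top x)$ (an $r$-vector), sampling covariance $S = V / (x^\top x)$ and known residual covariance $V$,

$$
b \mid \hat{b} \sim MVN(\mu_1, U_1), \qquad U_1 = \left(U^{-1} + S^{-1}\right)^{-1}, \qquad \mu_1 = U_1 S^{-1} \hat{b},
$$

so both the posterior quantities and the ELBO have a simple analytic benchmark.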

This feature is now implemented in the BayesianMultivariateRegression class, with an interface added to the main function so that this code path is triggered when the prior variance is given as a matrix.
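
In other words, the form of the prior_variance argument selects the model. A minimal sketch of the intended usage (X, Y and U are placeholders for the data and a known prior covariance, not objects defined in this notebook):

# Prior given as a matrix: simple Bayesian multivariate regression with b ~ MVN(0, U)
U = 0.2 * diag(2)
fit_simple = mvsusieR::mvsusie(X, Y, L = 5, prior_variance = U, compute_objective = T,
                               estimate_residual_variance = F, estimate_prior_variance = F)
# Prior given as a MashInitializer: MASH-based regression with a mixture-of-normals prior
m_init = mvsusieR:::MashInitializer$new(list(U), 1, prior_weight = 1, null_weight = 0)
fit_mash = mvsusieR::mvsusie(X, Y, L = 5, prior_variance = m_init, compute_objective = T,
                             estimate_residual_variance = F, estimate_prior_variance = F)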

With this feature (and with Yuxin's sharp eyes!!) we were able to identify an issue caused by an inconsistent interface between mvsusieR::susie and mvsusieR::MashInitializer in handling residual variances. After patching the issue (the interface fix still needs to be finalized), we get consistent results between the simple Bayesian multivariate regression and MASH, and the MASH ELBO now increases.

Test that the calculation agrees with the univariate code

In [1]:
library(mvsusieR)
set.seed(2)
L = 5
Loading required package: mashr
Loading required package: ashr
In [2]:
dat = mvsusie_sim1(r=1)

First, run SuSiE on the simulated univariate data,

In [3]:
res = susieR::susie(dat$X,dat$y,L=L,scaled_prior_variance=0.2,estimate_residual_variance=F,estimate_prior_variance=F)
res$elbo
  1. -429.40495176513
  2. -409.549300389438
  3. -407.707739710942
  4. -407.7058409462
  5. -407.70583907902

Now run the same data through the simple multivariate prior implementation in mvsusieR,

In [6]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=0.2*var(dat$y),compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
dim(res$b2)
NULL

So it is confirmed that this implementation produces results identical to the SuSiE run.
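
The equivalence can also be checked programmatically. A minimal sketch, assuming both fits expose the ELBO trace as $elbo as in the cells above:

# Fit the same data with susieR and with the simple multivariate prior in mvsusieR,
# then compare the two ELBO traces; they should agree to numerical precision.
res_uni = susieR::susie(dat$X, dat$y, L = L, scaled_prior_variance = 0.2,
                        estimate_residual_variance = F, estimate_prior_variance = F)
res_mv = mvsusieR::mvsusie(dat$X, dat$y, L = L, prior_variance = 0.2 * var(dat$y),
                           compute_objective = T, estimate_residual_variance = F,
                           estimate_prior_variance = F)
all.equal(res_uni$elbo, res_mv$elbo)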

Test multivariate analysis

In [35]:
set.seed(2)
dat = mvsusie_sim1(r=1)
In [42]:
devtools::load_all('~/GIT/software/mvsusieR')
Loading mvsusieR
In [43]:
res = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
[1]   1 500
Error in private$.pip[j] * matrix(private$.posterior_b2[j, ]) - tcrossprod(private$.pip[j] * : non-conformable arrays
Traceback:

1. mvsusie(dat$X, dat$y, L = L, prior_variance = dat$V, compute_objective = T, 
 .     estimate_residual_variance = F, estimate_prior_variance = F)
2. mvsusie_core(data, s_init, L, residual_variance, prior_variance, 
 .     prior_weights, estimate_residual_variance, estimate_prior_variance, 
 .     estimate_prior_method, precompute_covariances, compute_objective, 
 .     max_iter, tol, track_fit, verbose)   # at line 88-90 of file /home/gaow/GIT/software/mvsusieR/R/mvsusie.R
3. SuSiE_model$fit(data, prior_weights, estimate_prior_method, verbose)   # at line 244 of file /home/gaow/GIT/software/mvsusieR/R/mvsusie.R
4. private$SER[[l]]$compute_kl(d)   # at line 53 of file /home/gaow/GIT/software/mvsusieR/R/ibss_algorithm.R
5. private$compute_expected_loglik_partial(d)   # at line 25 of file /home/gaow/GIT/software/mvsusieR/R/single_effect_model.R
6. private$compute_expected_loglik_partial_multivariate(d)   # at line 38 of file /home/gaow/GIT/software/mvsusieR/R/single_effect_model.R
7. lapply(1:nrow(private$.posterior_b1), function(j) private$.pip[j] * 
 .     matrix(private$.posterior_b2[j, ]) - tcrossprod(private$.pip[j] * 
 .     private$.posterior_b1[j, ]))   # at line 54 of file /home/gaow/GIT/software/mvsusieR/R/single_effect_model.R
8. FUN(X[[i]], ...)

Here the ELBO is non-decreasing, as expected.

Compare with the MASH-based model
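
Recall that the MASH prior is a mixture of multivariate normals (with an optional point mass at zero controlled by null_weight),

$$
b \sim \pi_0\,\delta_0 + \sum_{k=1}^{K} \pi_k\, MVN(0, U_k).
$$

With a single component equal to dat$V, prior_weight = 1 and null_weight = 0, this mixture reduces to the fixed prior $b \sim MVN(0, U)$ used above, so the MASH-based computation should reproduce the simple Bayesian multivariate regression results.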

In [33]:
m_init = mvsusieR:::MashInitializer$new(list(dat$V), 1, prior_weight=1, null_weight=0)
In [34]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
dim(res$b2)
  1. 5
  2. 500
  3. 2

The results agree with those above. Now we use a different prior choice -- a diagonal prior covariance matrix. First we analyze with the simple Bayesian multivariate regression,

In [9]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=0.2*diag(ncol(dat$y)),compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
res$elbo
  1. -2174.05329261268
  2. -2166.18821093334
  3. -2161.71560740896
  4. -2158.70586771695
  5. -2156.96373258729
  6. -2156.8863212194
  7. -2156.88422194688
  8. -2156.8841094768

and then with the MASH-based regression,

In [10]:
m_init = mvsusieR:::MashInitializer$new(list(0.2*diag(ncol(dat$y))), 1, prior_weight=1, null_weight=0,alpha=0)
In [11]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
res$elbo
  1. -2174.05329261268
  2. -2166.18821093334
  3. -2161.71560740896
  4. -2158.70586771696
  5. -2156.96373258729
  6. -2156.8863212194
  7. -2156.88422194688
  8. -2156.8841094768

So at this point we are confident that the ELBO for the multivariate analysis is computed correctly.
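
As an additional sanity check, the monotonicity of an ELBO trace can be verified programmatically. A minimal sketch (the helper below is illustrative only, not part of mvsusieR):

# TRUE if the ELBO never decreases by more than the numerical tolerance
elbo_is_monotone = function(elbo, tol = 1e-8) all(diff(elbo) > -tol)
elbo_is_monotone(res$elbo)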


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago