Decreasing ELBO issue¶

Possibly caused by comparing and setting prior variances to zero when estimating them with EM.

Context¶

In mvSuSiE model with mixture prior, to estimate multivariate prior $b \sim \sum_k \pi_k MVN(0, \sigma_0^2U_k)$ we fix $U_k$ and $\pi_k$ and only estimate the scalar $\sigma_0^2$.

Recent updates to mvsusieR package implemented EM updates for $\sigma_0^2$ in both MASH regression and Bayesian multivariate regression (hereafter BMR, essentially MASH with just one component but we have separate, simpler code for it written in R for prototyping purpose). I verified with some tests that for IBSS using MASH mixture with only one matrix the result agree with IBSS using BMR. And BMR, when reduced to just one response, agrees with susieR::susie result. The EM estimate for prior variance scalar are identical and ELBO are non-decreasing.

Problem¶

However, as I moved on to more simulations I get warnings for decreasing ELBO with the EM estimates. It seems to happen only when there are many effect variables relative to conditions. For example my toy example below has 2 conditions each having 2 effect variables out of 50 variables. The problem is that EM method, as it is currently implemented, estimates $\sigma_0^2$ then at the end of an iteration, compares the update with zero and accept whichever has better likelihood for that single effect model. This resulted in decreasing ELBO.

So far we haven't observed the behavior in SuSiE -- no matter how many effect variables are there, the ELBO is non-decreasing using EM updates. I think both Yuxin and Kaiqian has done some benchmark with SuSiE's EM for prior but don't see an issue.

Possible cause¶

If I switch off the comparison between current estimate and zero at the end of each EM update (comment out this line), then in this example the algorithm takes longer to converge but ELBO is non-decreasing, and is eventually larger than any other approaches (eg, simple and optim methods). The estimates make more sense compared to the "truth" we know for this simulated data.

I think since EM is not directly maximizing that single effect loglik, we should not compare it with zero at each iteration? Otherwise comparing it and setting $\sigma_0^2 = 0$ will provide a very bad initialization for upcoming iterations.

To see the practical impact: a major difference between EM update and maximizing the single effect loglik directly is that the former is a function of posterior quantities whereas the latter only involves the data, with the prior to be estimated (here for instance). As a result, in EM updates if in a previous iteration sets the estimate to zero then it will impact the posterior of the next iteration and the estimate will never come back from zero. This is not the case for when prior is directly estimated as in optim.

Illustration¶

Here I use BMR not MASH regression to illustrate. To reproduce you need to run on R 3.6.1+.

devtools::load_all('~/GIT/software/mvsusieR')
set.seed(1)
dat = mvsusie_sim1(n=50,p=50,r=2,s=2)
L = 10

Loading mvsusieR

Loading required package: mashr

Loading required package: ashr

There are 2 conditions, each having 2 effect variables.

sum(rowSums(dat$b)>0)

Fit without estimating prior¶

We use fixed prior here provided by simulated data object; everything looks fine:

dat$V

fit0 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=F)

fit0$elbo

fit0$V

susieR::susie_plot(fit0, 'PIP', b = rowSums(dat$b))

`simple` method¶

Using simple method, only 1 out of 4 effects was captured.

fit1 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=T, estimate_prior_method='simple',
              track_fit=TRUE)

fit1$elbo

fit1$V

Looking at prior estimates in each iteration,

fit1$prior_history

`EM` method¶

Now we can see decreasing ELBO: from -183 to -185, compared to previously the ELBO was -182 at convergence,

fit2 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=T, estimate_prior_method = 'EM',
              track_fit=TRUE)

Warning message in m$get_objective(dump = TRUE):
“Objective is not non-decreasing”

fit2$elbo

fit2$V

And no effect was captured! It seems in the first iteration both prior are estimated non-zero. But the 2nd iteration they both became zero because zero maximizes the loglik in that iteration. However apparently in the next iteration setting prior to zero decreased the ELBO:

fit2$prior_history

However, when I set L = 5 it can capture 1 effect,

fit21 = mvsusie(dat$X,dat$y,L=4,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=T, estimate_prior_method = 'EM',
              track_fit=T)

Warning message in m$get_objective(dump = TRUE):
“Objective is not non-decreasing”

fit21$elbo

fit21$V

fit21$prior_history

susieR::susie_plot(fit21, 'PIP', b = rowSums(dat$b))

sum(fit21$pip)

`optim` method¶

In optim method we still have that check at the end of each update comparing current estimate with zero in terms of likelihood. Thus many V are set to zero. However, two effects were captured this time,

fit3 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=T, estimate_prior_method='optim',
              track_fit=T)

fit3$elbo

fit3$V

It seems unlike with the EM update, the estimates never hit zero:

fit3$prior_history

susieR::susie_plot(fit3, 'PIP', b = rowSums(dat$b))

`EM` update without checking with zero¶

I commented out this line) in my code, and run EM with L=10:

devtools::load_all('~/GIT/software/mvsusieR')

Loading mvsusieR

fit4 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=T, estimate_prior_method='EM', max_iter = 200,
              track_fit=T)

fit4$elbo

fit4$niter

It took 105 iterations to converge. The ELBO -181 is comparable to that obtained by optim.

The prior estimates are:

fit4$V

fit4$prior_history

Here it captures 2 effects, as also reflected below that two effects have large PIP,

susieR::susie_plot(fit4, 'PIP', b = rowSums(dat$b))

Final solution¶

Following from Matthew's suggestion, we check with zero and set all posteriors to zero, only after 10 iterations. This is the current default implementation of EM method,

fit5 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V, 
              compute_objective=T, estimate_residual_variance=F, 
              estimate_prior_variance=T, estimate_prior_method='EM',
              track_fit=T)

The convergence is now faster,

fit5$niter

fit5$elbo

This now achieves the same ELBO as optim method.

fit5$V

Here, two effects are captured, with very similar result compared to optim; but they appear in effects 1 and 9!

susieR::susie_plot(fit5, 'PIP', b = rowSums(dat$b))

0.46673960	0.05659291
0.05659291	0.50840182