Possibly caused by comparing and setting prior variances to zero when estimating them with EM.
In mvSuSiE model with mixture prior, to estimate multivariate prior $b \sim \sum_k \pi_k MVN(0, \sigma_0^2U_k)$ we fix $U_k$ and $\pi_k$ and only estimate the scalar $\sigma_0^2$.
Recent updates to mvsusieR
package implemented EM updates for $\sigma_0^2$ in both MASH regression and Bayesian multivariate regression (hereafter BMR, essentially MASH with just one component but we have separate, simpler code for it written in R for prototyping purpose). I verified with some tests that for IBSS using MASH mixture with only one matrix the result agree with IBSS using BMR. And BMR, when reduced to just one response, agrees with susieR::susie
result. The EM estimate for prior variance scalar are identical and ELBO are non-decreasing.
However, as I moved on to more simulations I get warnings for decreasing ELBO with the EM estimates. It seems to happen only when there are many effect variables relative to conditions. For example my toy example below has 2 conditions each having 2 effect variables out of 50 variables. The problem is that EM method, as it is currently implemented, estimates $\sigma_0^2$ then at the end of an iteration, compares the update with zero and accept whichever has better likelihood for that single effect model. This resulted in decreasing ELBO.
So far we haven't observed the behavior in SuSiE -- no matter how many effect variables are there, the ELBO is non-decreasing using EM updates. I think both Yuxin and Kaiqian has done some benchmark with SuSiE's EM for prior but don't see an issue.
If I switch off the comparison between current estimate and zero at the end of each EM update (comment out this line), then in this example the algorithm takes longer to converge but ELBO is non-decreasing, and is eventually larger than any other approaches (eg, simple
and optim
methods). The estimates make more sense compared to the "truth" we know for this simulated data.
I think since EM is not directly maximizing that single effect loglik, we should not compare it with zero at each iteration? Otherwise comparing it and setting $\sigma_0^2 = 0$ will provide a very bad initialization for upcoming iterations.
To see the practical impact: a major difference between EM update and maximizing the single effect loglik directly is that the former is a function of posterior quantities whereas the latter only involves the data, with the prior to be estimated (here for instance). As a result, in EM updates if in a previous iteration sets the estimate to zero then it will impact the posterior of the next iteration and the estimate will never come back from zero. This is not the case for when prior is directly estimated as in optim
.
Here I use BMR not MASH regression to illustrate. To reproduce you need to run on R 3.6.1+.
devtools::load_all('~/GIT/software/mvsusieR')
set.seed(1)
dat = mvsusie_sim1(n=50,p=50,r=2,s=2)
L = 10
There are 2 conditions, each having 2 effect variables.
sum(rowSums(dat$b)>0)
We use fixed prior here provided by simulated data object; everything looks fine:
dat$V
fit0 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=F)
fit0$elbo
fit0$V
susieR::susie_plot(fit0, 'PIP', b = rowSums(dat$b))
simple
method¶Using simple
method, only 1 out of 4 effects was captured.
fit1 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=T, estimate_prior_method='simple',
track_fit=TRUE)
fit1$elbo
fit1$V
Looking at prior estimates in each iteration,
fit1$prior_history
EM
method¶Now we can see decreasing ELBO: from -183 to -185, compared to previously the ELBO was -182 at convergence,
fit2 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=T, estimate_prior_method = 'EM',
track_fit=TRUE)
fit2$elbo
fit2$V
And no effect was captured! It seems in the first iteration both prior are estimated non-zero. But the 2nd iteration they both became zero because zero maximizes the loglik in that iteration. However apparently in the next iteration setting prior to zero decreased the ELBO:
fit2$prior_history
However, when I set L = 5
it can capture 1 effect,
fit21 = mvsusie(dat$X,dat$y,L=4,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=T, estimate_prior_method = 'EM',
track_fit=T)
fit21$elbo
fit21$V
fit21$prior_history
susieR::susie_plot(fit21, 'PIP', b = rowSums(dat$b))
sum(fit21$pip)
optim
method¶In optim
method we still have that check at the end of each update comparing current estimate with zero in terms of likelihood. Thus many V
are set to zero. However, two effects were captured this time,
fit3 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=T, estimate_prior_method='optim',
track_fit=T)
fit3$elbo
fit3$V
It seems unlike with the EM update, the estimates never hit zero:
fit3$prior_history
susieR::susie_plot(fit3, 'PIP', b = rowSums(dat$b))
devtools::load_all('~/GIT/software/mvsusieR')
fit4 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=T, estimate_prior_method='EM', max_iter = 200,
track_fit=T)
fit4$elbo
fit4$niter
It took 105 iterations to converge. The ELBO -181 is comparable to that obtained by optim
.
The prior estimates are:
fit4$V
fit4$prior_history
Here it captures 2 effects, as also reflected below that two effects have large PIP,
susieR::susie_plot(fit4, 'PIP', b = rowSums(dat$b))
Following from Matthew's suggestion, we check with zero and set all posteriors to zero, only after 10 iterations. This is the current default implementation of EM
method,
fit5 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,
compute_objective=T, estimate_residual_variance=F,
estimate_prior_variance=T, estimate_prior_method='EM',
track_fit=T)
The convergence is now faster,
fit5$niter
fit5$elbo
This now achieves the same ELBO as optim
method.
fit5$V
Here, two effects are captured, with very similar result compared to optim
; but they appear in effects 1 and 9!
susieR::susie_plot(fit5, 'PIP', b = rowSums(dat$b))