Here I compare EM with optim on a larger simulated data-set involving 50 conditions. The example below is one where the EM and optim results differ, and the optim result seems better.
dat = readRDS('lite_data_1_artificial_mixture_1_mnm_shared_1.rds')
This data-set can be found in mvsusieR/inst/prototypes. The script to reproduce the mvsusie analysis is available via cat(dat$DSC_DEBUG$script). Simply copy that code and run the data generation, then run the analysis using EM and optim. I saved the X, Y, meta and resid_Y data into the file em_optim_difference.rds for ease of loading in further investigation.
result = mvsusie(X, Y, L=L, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=F)
result$V
It seems the prior estimates are consistently large for all effects. This data-set has 3 simulated signals:
which(rowSums(meta$true_coef) != 0)
meta$true_coef[which(rowSums(meta$true_coef) != 0),]
Clearly this will not fit a model of "all shared" effects.
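To see why matrix(1,50,50) encodes "all shared" effects: the all-ones matrix is a rank-1 covariance under which all conditions are perfectly correlated, so any effect drawn from it is identical across the 50 conditions. A minimal Python sketch of this point (numpy standing in for the R code):

```python
import numpy as np

R = 50
V = np.ones((R, R))  # the "all shared" prior, matrix(1, 50, 50) in the R code

# V = u u^T with u a vector of ones, so it has rank 1: every condition is
# perfectly correlated, and any draw b ~ N(0, V) is constant across conditions.
rank = np.linalg.matrix_rank(V)

rng = np.random.default_rng(0)
b = np.ones(R) * rng.standard_normal()  # a draw via the rank-1 factorization
```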
Using the "all shared" prior we captured 4 effects with non-zero PIP, 2 of which are real.
susieR::susie_plot(result, y='PIP',b=rowSums(meta$true_coef))
result$sets
From the single-effect CS results, L4 contains L2 and L5 contains L1. As a result, both L4 and L5 appear to contain a true signal. Because of the overlap between L1 and L5, the 3 variables in L1 (orange) appear to have marginal PIP 0.6 rather than 0.3. Both L4 and L5 have high purity.
result$alpha[,c(153,156,162)]
result$pip[c(153,156,162)]
Looking into the effect size,
result$coef[130,]
meta$true_coef[129,]
and the 2nd moment of the estimated effect size,
tcrossprod(result$coef[130,])[1,1]
sum(diag(tcrossprod(result$coef[130,])))/50
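For a vector b, tcrossprod(b)[1,1] is just b[1]^2, and sum(diag(tcrossprod(b)))/50 is mean(b^2), the average squared effect across conditions, which is the scale a scalar prior estimate should roughly match. A quick numpy check of these identities (the effect-size values here are made up, not taken from this fit):

```python
import numpy as np

# Illustrative effect-size vector standing in for result$coef[130,] (made-up values).
b = np.array([0.4, -0.1, 0.3, 0.25])
R = len(b)

outer = np.outer(b, b)       # tcrossprod(b) in R: the matrix b b^T
first = outer[0, 0]          # tcrossprod(b)[1,1] equals b[1]^2
scale = np.trace(outer) / R  # sum(diag(tcrossprod(b)))/R equals mean(b^2)
```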
So the estimated V should be about this scale for the first effect.
result$coef[result$sets$cs$L4+1,]
optim result

It takes a long time to compute, but the PIP is cleaner, and the estimated prior scalar is also smaller.
result2 = mvsusie(X, Y, L=L, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='optim', track_fit=T, precompute_covariances=F)
result2$V
result2$elbo[length(result2$elbo)]
susieR::susie_plot(result2, y='PIP', b=rowSums(meta$true_coef))
result2$sets
result2$alpha[,c(153,156,162)]
It seems to have the same issue that the 1st and 3rd effects overlap. But because they are identical CSs, we only kept one of them.
susieR::susie_get_cs(result2, dedup=F)
However, such duplicates are not adjusted for when computing PIPs. This is why the PIPs for those 3 variables still sum to more than 1.
result2$pip[c(153,156,162)]
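The inflation comes from how PIPs are defined: from the per-effect inclusion probabilities alpha, the PIP of variable j is 1 - prod_l (1 - alpha[l,j]), so two duplicate single effects count the same variables twice. A hedged numpy sketch (the alpha values below are made up for illustration, not taken from this fit):

```python
import numpy as np

# Two single effects (rows) that landed on the same 3 variables (columns),
# each spreading its inclusion probability roughly 1/3 each -- made-up values.
alpha = np.array([
    [1/3, 1/3, 1/3],  # effect 1
    [1/3, 1/3, 1/3],  # effect 3: an exact duplicate of effect 1
])

# Standard SuSiE PIP: probability the variable appears in at least one effect.
pip = 1 - np.prod(1 - alpha, axis=0)
# Duplicates inflate each PIP above 1/3, and the three PIPs sum to more than 1.
```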
result2$coef[130,]
The estimated V is expected to roughly have the scale of
sum(diag(tcrossprod(result2$coef[130,])))/50
But the actual estimated V is a lot smaller. Update: the conclusion here is wrong, because coef is reported after rescaling the data back to the original scale; for a comparison before rescaling, see this notebook.
The example is a bit pathological because we knowingly use a wrong model for it. What if we use a more flexible MASH prior instead? Here I use the canonical mixture with uniform weights:
prior = list(xUlist = mvsusieR:::create_cov_canonical(ncol(Y)))
m_init = create_mash_prior(mixture_prior = list(matrices=prior$xUlist, weights=prior$pi), null_weight=prior$null_weight, max_mixture_len=-1)
result3 = mvsusie(X, Y, L=L, prior_variance=m_init, residual_variance=resid_Y,
compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T)
result3$V
susieR::susie_plot(result3, y='PIP',b=rowSums(meta$true_coef))
result3$sets
The result is a bit better. We have now captured all 3 signals -- the new variable in L4 is a singleton effect.
max(result3$coef[44,])
min(result3$coef[44,])
But there is still overlap between L6 and L1, even though L6 does indeed contain a signal. And the estimated prior scalar for the 6th effect is again the magic number 0.13. Notice that the purity for L6 is quite high, as we would also conclude from the plot.
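For context on what the canonical mixture contains: it is a list of covariance structures such as condition-specific (singleton) matrices, an identity, and an all-shared matrix. A simplified Python sketch of such a list (an assumption-laden stand-in, not the exact output of mvsusieR:::create_cov_canonical, which may include additional partially-shared structures):

```python
import numpy as np

def canonical_covariances(R):
    """A simplified canonical mixture over R conditions:
    singleton, identity, and all-shared covariance structures."""
    Ulist = []
    for i in range(R):            # condition-specific (singleton) effects
        U = np.zeros((R, R))
        U[i, i] = 1.0
        Ulist.append(U)
    Ulist.append(np.eye(R))       # independent effects across conditions
    Ulist.append(np.ones((R, R))) # fully shared effects
    return Ulist

Ulist = canonical_covariances(5)  # 5 singletons + identity + shared = 7 matrices
```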
The data is simulated using artificial_mixture. Here I analyze it with the same mixture prior structure and weights, but use EM to estimate the scale.
names(meta$prior$oracle)
m_init = create_mash_prior(mixture_prior = list(matrices=meta$prior$oracle$xUlist, weights=meta$prior$oracle$pi), null_weight=meta$prior$oracle$null_weight, max_mixture_len=-1)
result4 = mvsusie(X, Y, L=L, prior_variance=m_init, residual_variance=resid_Y,
compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T)
result4$V
susieR::susie_plot(result4, y='PIP',b=rowSums(meta$true_coef))
result4$sets
Unfortunately, the oracle prior does not help here.
Looking into this example further, I noticed that the previous EM runs had failed to converge within the maximum number of iterations. I therefore increase the iteration limit to 1000:
result5 = mvsusie(X, Y, L=L, prior_variance=mvsusieR::MashInitializer$new(list(matrix(1,50,50)),1), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T, max_iter = 1000)
result5$V
The estimated prior is still different from the optim estimate, but the ELBO here is -8847, higher than the -8854 from optim.
result5$elbo[length(result5$elbo)]
susieR::susie_plot(result5, y='PIP',b=rowSums(meta$true_coef))
result5$niter
susieR::susie_get_cs(result5, dedup=F)
There are no longer overlapping CSs in this analysis.
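The slow EM convergence seen here is easy to reproduce in a toy version of the problem. For a single observation z = b + e with e ~ N(0, 1) and prior b ~ N(0, V), the marginal likelihood is N(0, 1 + V), so direct optimization gives V = max(z^2 - 1, 0) in one step, while the EM update V <- E[b^2 | z, V] has the same fixed point but approaches it only linearly. A self-contained Python illustration (a deliberately simplified univariate stand-in for the mvSuSiE prior-scalar update, not the package's actual code):

```python
z2 = 5.0                  # observed z^2 in the toy normal-means problem
V_mle = max(z2 - 1, 0)    # direct maximizer of the marginal likelihood N(0, 1+V)

# EM: the posterior b | z, V is N(Vz/(1+V), V/(1+V)), so the update sets
# V to the posterior second moment E[b^2 | z, V] = (V/(1+V))^2 z^2 + V/(1+V).
V = 1.0
iters = 0
for _ in range(500):
    V_new = (V / (1 + V)) ** 2 * z2 + V / (1 + V)
    iters += 1
    if abs(V_new - V) < 1e-10:
        V = V_new
        break
    V = V_new
# EM needs tens of iterations to reach the answer optim finds in one step.
```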
Now I try the naive mixture prior with more iterations. Convergence is faster, the estimated V is closer to the scale of the variable's effect, and the ELBO is also better. All three signals have been captured.
prior = list(xUlist = mvsusieR:::create_cov_canonical(ncol(Y)))
m_init = create_mash_prior(mixture_prior = list(matrices=prior$xUlist, weights=prior$pi), null_weight=prior$null_weight, max_mixture_len=-1)
result6 = mvsusie(X, Y, L=L, prior_variance=m_init, residual_variance=resid_Y,
compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T, max_iter=1500)
result6$niter
result6$V
susieR::susie_plot(result6, y='PIP',b=rowSums(meta$true_coef))
result6$elbo[length(result6$elbo)]
Here we run optim starting from the EM solution and vice versa, to see if that leads to agreement between the two approaches in estimating the prior scalar.
First, initialize EM with the optim result,
result7 = mvsusie(X, Y, L=L, s_init = result2, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=F)
result7$niter
Here is the EM result using the optim initialization,
result7$V
and here is the optim result,
result2$V
Now initialize optim with the EM result,
result8 = mvsusie(X, Y, L=L, s_init = result5, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='optim', track_fit=T, precompute_covariances=F)
result8$niter
Here is the optim result using the EM initialization,
result8$V
and here is the EM result,
result5$V