Here I compare EM with optim in a larger simulated data-set involving 50 conditions. The example below is one where the EM and optim results differ and the optim result seems better.
dat = readRDS('lite_data_1_artificial_mixture_1_mnm_shared_1.rds')
This data-set can be found in mvsusieR/inst/prototypes. The script to reproduce the mvsusie analysis is available via cat(dat$DSC_DEBUG$script). Simply copy that code and run the data generation, then run the analysis with EM and with optim. I save the X, Y, meta and resid_Y data into an em_optim_difference.rds file for ease of loading in further investigation.
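To pick up the investigation later, the saved file can be reloaded along these lines (a sketch; the element names are assumed to match the description above):

```r
# Sketch: reload the objects saved to em_optim_difference.rds.
# The element names X, Y, meta, resid_Y are assumed from the text above.
saved <- readRDS('em_optim_difference.rds')
X <- saved$X; Y <- saved$Y
meta <- saved$meta; resid_Y <- saved$resid_Y
```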
result = mvsusie(X, Y, L=L, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=F)
result$V
It seems the prior estimates are consistently large for all effects. This data-set has 3 simulated signals,
which(rowSums(meta$true_coef) != 0)
meta$true_coef[which(rowSums(meta$true_coef) != 0),]
Clearly this will not fit a model of "all shared" effects.
Using "all shared" prior we have captured 4 effects with non-zero PIP, 2 of them are real.
susieR::susie_plot(result, y='PIP',b=rowSums(meta$true_coef))
result$sets
From the single-effect CS results, L4 contains L2 and L5 contains L1, so both L4 and L5 appear to contain a true signal. Because of the overlap between L1 and L5, the 3 variables in L1 (orange) appear to have marginal PIP 0.6 rather than 0.3. Both L4 and L5 have high purity.
result$alpha[,c(153,156,162)]
result$pip[c(153,156,162)]
Looking into the effect size,
result$coef[130,]
meta$true_coef[129,]
and 2nd moment of the estimated effect size,
tcrossprod(result$coef[130,])[1,1]
sum(diag(tcrossprod(result$coef[130,])))/50
So the estimated V should be about this scale for the first effect.
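The rationale for this back-of-the-envelope scale is that under the "all shared" prior b ~ N(0, sigma2 * S) with S the all-ones R x R matrix, E[tr(b b')] = sigma2 * tr(S) = sigma2 * R, so tr(b b')/R recovers sigma2. A minimal self-contained sketch with a hypothetical sigma2:

```r
# Sketch: for a perfectly "shared" effect b with variance sigma2 per condition,
# sum(diag(tcrossprod(b))) / R = sum(b^2) / R recovers sigma2.
R <- 50
sigma2 <- 0.2                  # hypothetical prior scalar
b <- rep(sqrt(sigma2), R)      # identical effect across all 50 conditions
sum(diag(tcrossprod(b))) / R   # equals sigma2 = 0.2
```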
result$coef[result$sets$cs$L4+1,]
optim result

It takes a long time to compute, but the PIP is cleaner and the estimated prior scalar is also smaller.
result2 = mvsusie(X, Y, L=L, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='optim', track_fit=T, precompute_covariances=F)
result2$V
result2$elbo[length(result2$elbo)]
susieR::susie_plot(result2, y='PIP', b=rowSums(meta$true_coef))
result2$sets
result2$alpha[,c(153,156,162)]
It seems to have the same issue that the first and the 3rd effects overlap. But because they are identical CSs, only one of them is kept.
susieR::susie_get_cs(result2, dedup=F)
However, such duplicates are not adjusted for when computing the PIP. This is why the PIPs for those 3 variables still sum to more than 1.
result2$pip[c(153,156,162)]
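This inflation can be reproduced in isolation: the SuSiE marginal PIP for variable j is 1 - prod over effects l of (1 - alpha[l, j]), so a duplicated single effect contributes twice. A sketch with hypothetical alpha values:

```r
# Sketch: two duplicated single effects, each spreading weight 0.3 over the
# same 3 variables (hypothetical values mimicking the situation above)
alpha <- rbind(c(0.3, 0.3, 0.3, 0.1),
               c(0.3, 0.3, 0.3, 0.1))
# Marginal PIP: 1 - prod_l (1 - alpha[l, j])
pip <- 1 - apply(1 - alpha, 2, prod)
round(pip, 2)  # 0.51 0.51 0.51 0.19 -- each ~0.3 signal inflated to ~0.5
```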
result2$coef[130,]
The estimated V is expected to roughly have the scale of
sum(diag(tcrossprod(result2$coef[130,])))/50
But the actual estimated V is a lot smaller.
Update: the conclusion here is wrong, because coef is reported after rescaling the data back to the original scale ... for a comparison before rescaling, see this notebook.
The example is a bit pathological because we knowingly use a wrong model for it. What if we use a more flexible MASH prior instead? Here I use the canonical mixture with uniform weights,
prior = list(xUlist = mvsusieR:::create_cov_canonical(ncol(Y)))
m_init = create_mash_prior(mixture_prior = list(matrices=prior$xUlist, weights=prior$pi), null_weight=prior$null_weight, max_mixture_len=-1)
result3 = mvsusie(X, Y, L=L, prior_variance=m_init, residual_variance=resid_Y,
compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T)
result3$V
susieR::susie_plot(result3, y='PIP',b=rowSums(meta$true_coef))
result3$sets
The result is a bit better. We have now captured all 3 signals -- the new variable in L4 is a singleton effect.
max(result3$coef[44,])
min(result3$coef[44,])
But there is still overlap between L6 and L1, even though L6 does indeed contain a signal. And the estimated prior scalar for the 6th effect is again the magic number 0.13. Notice the purity for L6 is quite high, as we would also conclude from the plot.
The data is simulated using artificial_mixture. Here I analyze it with the same mixture prior structure and weights, but use EM to estimate the scale.
names(meta$prior$oracle)
m_init = create_mash_prior(mixture_prior = list(matrices=meta$prior$oracle$xUlist, weights=meta$prior$oracle$pi), null_weight=meta$prior$oracle$null_weight, max_mixture_len=-1)
result4 = mvsusie(X, Y, L=L, prior_variance=m_init, residual_variance=resid_Y,
compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T)
result4$V
susieR::susie_plot(result4, y='PIP',b=rowSums(meta$true_coef))
result4$sets
Unfortunately, the oracle prior does not help here.
As I looked into this example further, I noticed that the previous EM runs failed to converge within the maximum number of iterations. I therefore increase the maximum number of iterations to 1000:
result5 = mvsusie(X, Y, L=L, prior_variance=mvsusieR::MashInitializer$new(list(matrix(1,50,50)),1), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T, max_iter = 1000)
result5$V
The EM prior estimate is still different from optim's, but the ELBO here is -8847, higher than the -8854 from optim.
result5$elbo[length(result5$elbo)]
susieR::susie_plot(result5, y='PIP',b=rowSums(meta$true_coef))
result5$niter
susieR::susie_get_cs(result5, dedup=F)
There are no longer overlapping CSs in this analysis.
Now I try the naive mixture prior with more iterations. Convergence is faster, the estimated V is closer to the scale of the variable's effect, and the ELBO is also better. All three signals have been captured.
prior = list(xUlist = mvsusieR:::create_cov_canonical(ncol(Y)))
m_init = create_mash_prior(mixture_prior = list(matrices=prior$xUlist, weights=prior$pi), null_weight=prior$null_weight, max_mixture_len=-1)
result6 = mvsusie(X, Y, L=L, prior_variance=m_init, residual_variance=resid_Y,
compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=T, max_iter=1500)
result6$niter
result6$V
susieR::susie_plot(result6, y='PIP',b=rowSums(meta$true_coef))
result6$elbo[length(result6$elbo)]
Here we run optim starting from the EM solution and vice versa, to see whether that leads to agreement between the two approaches in estimating the prior scalar.
First, initialize EM with the optim result,
result7 = mvsusie(X, Y, L=L, s_init = result2, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='EM', track_fit=T, precompute_covariances=F)
result7$niter
Here is EM result using optim initialization,
result7$V
Here is optim result,
result2$V
Now initialize optim with EM result,
result8 = mvsusie(X, Y, L=L, s_init = result5, prior_variance=matrix(1,50,50), residual_variance=resid_Y, compute_objective=T, estimate_residual_variance=F, estimate_prior_variance=T,
estimate_prior_method='optim', track_fit=T, precompute_covariances=F)
result8$niter
Here is optim result using EM initialization,
result8$V
Here is EM result,
result5$V
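A side-by-side view makes the remaining disagreement easier to scan. This sketch assumes all four fits above are still in the workspace and that each V is a vector of per-effect prior scalars of the same length:

```r
# Sketch: compare prior scalar estimates across the four fits above
rbind(EM            = result5$V,
      optim         = result2$V,
      EM_from_optim = result7$V,
      optim_from_EM = result8$V)
```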