Here with susieR
version 0.4 I examine an example identified from simulations where SuSiE seems to have made mistakes. The data-set can be downloaded here.
%cd ../dsc/susie_comparison
null_weight
did help elimating the false discovery in this case example, although on average it did not seem to help much for CS coverage.
dat = readRDS('lm_less/liter_data_4_summarize_ld_1_lm_less_3.rds')$data
r=1
fitted = susieR::susie(dat$X, dat$Y[,r], L=5,
estimate_residual_variance=TRUE,
scaled_prior_variance=0.1, compute_univariate_zscore = TRUE,
tol=1e-3, track_fit=TRUE)
b = dat$true_coef[,r]
b[which(b!=0)] = 1
susieR::susie_plot(fitted, y='z', b=b, main = paste('z-scores'))
Obviously one CS in cyan it identifies is problematic.
susieR::susie_plot(fitted, y='PIP', b=b, main = paste('z-scores'))
Let's look at the fit summary:
summary(fitted)$cs
Notice immediately that the 2nd single effect was dropped by the algorithm. It warrants to track the fit:
susieR::susie_plot_iteration(fitted, 5, 'test_track_fit')
%preview test_track_fit.gif
The BF is quite large. The z-score for variable 217 is quite small, though:
fitted$z[217]
The true effect sizes are:
cbind(dat$true_coef[,r][which(b==1)], which(b==1))
Where the first one is perhaps too small to pick up anyways. The observed z-scores for true causal variables are:
cbind(fitted$z[which(b==1)], which(b==1))
The 2nd signal, which has smallest z-score, is in fact quite close to the variable 217 SuSiE has picked up. Their correlation is:
cor(dat$X[,217], dat$X[,229])
They are highly correlated as expected. It is just a bit puzzling why SuSiE would rather pick 217 which has observed z-score of -0.05.
f2 = susieR::susie(dat$X, dat$Y[,r], L=5,
estimate_residual_variance=TRUE,
estimate_prior_variance=TRUE,
compute_univariate_zscore = TRUE,
tol=1e-3, track_fit=TRUE)
susieR::susie_plot(f2, y='PIP', b=b, main = paste('z-scores'))
Now SuSiE identifies two causal signals, but still not the first one. Let's see their PIP:
cbind(fitted$pip[which(b==1)], which(b==1))
PIP for the first signal is very small. But estimated prior variance are large for the 2 effects,
f2$V / var(dat$Y[,r])
summary(f2)$cs
f22 = susieR::susie(dat$X, dat$Y[,r], L=5,
estimate_residual_variance=TRUE,
scaled_prior_variance=0.5,
compute_univariate_zscore = TRUE,
tol=1e-3, track_fit=TRUE)
summary(f22)$cs
It still makes a decision on including variable 217.
As pointed out by Matthew, in case L=5 susie is actually mis-using the “non-significant effects’ (2,4,5) to induce a correlation at variable 217. Here I try with L=3 for example,
summary(susieR::susie(dat$X, dat$Y[,r], L=3,
estimate_residual_variance=TRUE,
scaled_prior_variance=0.1,
compute_univariate_zscore = TRUE,
tol=1e-3))$cs
Here, 229 is included, though in a low purity set. For L=4,
summary(susieR::susie(dat$X, dat$Y[,r], L=4,
estimate_residual_variance=TRUE,
scaled_prior_variance=0.1,
compute_univariate_zscore = TRUE,
tol=1e-3))$cs
Again we are missing 229, but got 227 which is very close. This is a convergence issue to be revisited.
First, I use a small null weight (just to test the water),
f3 = susieR::susie(dat$X, dat$Y[,r], L=5,
estimate_residual_variance=TRUE,
scaled_prior_variance=0.1,
null_weight = 0.5,
compute_univariate_zscore = TRUE,
tol=1e-3, track_fit=TRUE)
summary(f3)$cs
Interestingly it now picks another irrelevant CS. Although it has a small BF, compared to the BF with null variable it is still a lot larger. That is, a small penalty does not help but made it worse. This in fact highlights the originally missed 2nd effect. The convergence happened rather quickly. It seems there is a potential to make the 2nd CS a lot larger (to contain variables around position 250) thus removing it for good.
susieR::susie_plot_iteration(f3, 5, 'test_track_fit_3')
%preview test_track_fit_3.gif
The BF for set involving the null variable is small compared to others.
f3$lbf
Now I increase the null penalty,
f4 = susieR::susie(dat$X, dat$Y[,r], L=5,
estimate_residual_variance=TRUE,
scaled_prior_variance=0.1,
null_weight = 0.9,
compute_univariate_zscore = TRUE,
tol=1e-3, track_fit=TRUE)
summary(f4)$cs
susieR::susie_plot_iteration(f4, 5, 'test_track_fit_4')
%preview test_track_fit_4.gif
It seems the problematic variable is removed. I now try to trick summary
to display more information just to checkout how it helped,
f4$null_index=0
f4$sets = susieR::susie_get_CS(f4, cbind(dat$X,0), min_abs_corr=0)
summary(f4)$cs
f4$lbf
Notice that with the use of null variable, the previously "wrong" variable 217 now ends up in a CS with purity of 0.08, thus not selected. The null variable itself forms a separate CS, though, with a small BF.