Previously I've found an example that works well with EZ model, roughly works with EE model for a slighly modified simulation data-set but does not work with EE model on this data-set. Here I am trying to explore the problem a bit more, without implementing and checking for ELBO yet.
Note: my previous code has an error in the line e = MASS::mvrnorm(1, rep(0,R), diag(R) * 1)
that only generated 6 numbers, and were added cyclicly to g
to make y
. That breaks things. After I fix the line to using nrow(g)
I no longer can reproduce the problem as drastic as it was shown in the notebooks above.
setwd("/home/gaow/tmp/13-May-2019")
Copying codes from previous notebooks.
genotype = readRDS('Multi_Tissues.ENSG00000145214.RDS')$X
P = ncol(genotype)
R = 6
eff_factor = 1.5
snp1 = 184
snp2 = 354
set.seed(1)
b = matrix(0, P, R)
coef_value = b[which(apply(b,1,sum)!=0),]
sharing = matrix(0.75, (R/2), (R/2))
diag(sharing) = 1
b[snp1, 1:(R/2)] = abs(MASS::mvrnorm(1, rep(0,(R/2)), sharing)) / eff_factor
b[snp2, (R/2+1):R] = -abs(MASS::mvrnorm(1, rep(0,(R/2)), sharing)) / eff_factor
g = genotype %*% b
e = MASS::mvrnorm(nrow(g), rep(0,R), diag(R) * 1)
Y = g + e
snps = sub("_[A-Z]*_[A-Z]*_b38", "", colnames(genotype))
U1 = matrix(0,R,R)
U1[1:(R/2),1:(R/2)] = sharing
U2 = matrix(0,R,R)
U2[(R/2+1):R,(R/2+1):R] = sharing
Ulist = list(U1=U1, U2=U2)
scaling = c(0.5,1) / eff_factor
mash_init = mvsusieR:::MashInitializer$new(Ulist, scaling, alpha = 0)
start_time <- Sys.time()
res = mvsusieR::susie(genotype, e,
L=10,V=mash_init,
precompute_covariances=FALSE,
compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
devtools::reload(pkgload::inst("mvsusieR"))
start_time <- Sys.time()
res = mvsusieR::susie(genotype, e,
L=1, max_iter=1, V=mash_init,
precompute_covariances=FALSE,
compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
So there indeed is something with the residual. The EE model with full IBSS picks it up. Now it is a question of whether the model with the two causal SNP has better ELBO than the model with this in the residual. I will not know it unless I have the ELBO -- so I'm working on that next.
start_time <- Sys.time()
res2 = mvsusieR::susie(genotype, Y,
L=10,V=mash_init,
precompute_covariances=FALSE,
compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
susieR::susie_plot(res2,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
res2$pip[snp1]
res2$pip[snp2]
The signals are captured, but there seems to be a false signal at position 6000+.
mash_init = mvsusieR:::MashInitializer$new(Ulist, scaling, alpha = 1)
start_time <- Sys.time()
res = mvsusieR::susie(genotype, e,
L=10,V=mash_init,
precompute_covariances=FALSE,
compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
start_time <- Sys.time()
res2 = mvsusieR::susie(genotype, Y,
L=10,V=mash_init,
precompute_covariances=FALSE,
compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
susieR::susie_plot(res2,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
res2$pip[snp1]
res2$pip[snp2]
Although the 2nd SNP did not make into the CS, there is no false positives. Here EZ seems conservative but this is a problem to worry later. Of immediate importance: