Multivariate Bayesian variable selection regression

Problem with EE model

I discovered while working on the example of "two SNP dispute" that for Exchangable Effect (EE) model, that is, setting MASH alpha=0, there seems convergence issues with the algorithm, for the specific simulation below. I cannot reproduce the problem with alpha=1 or with the 2nd block having positive sign of effect size ...

In [1]:
%cd /home/gaow/tmp/13-May-2019
/home/gaow/Documents/TempDir/13-May-2019

Find negatively correlation SNPs

In [2]:
genotype = readRDS('Multi_Tissues.ENSG00000145214.RDS')$X

Simulate multivariate phenotypes

In [3]:
P = ncol(genotype)
R = 6
eff_factor = 1.5
snp1 = 184
snp2 = 354
cor(genotype[,snp1], genotype[,snp2])
-0.8545869815011

The two SNPs are negatively correlated.

In [4]:
set.seed(1)
b = matrix(0, P, R)
sharing = matrix(0.75, (R/2), (R/2))
diag(sharing) = 1
print(sharing)
     [,1] [,2] [,3]
[1,] 1.00 0.75 0.75
[2,] 0.75 1.00 0.75
[3,] 0.75 0.75 1.00

Here above I have a matrix block. I'm going to create a blocky structure next such that the two halves of traits are drawn from this sharing pattern, but there is no sharing in between them.

Notice here I set the effect size of the 2nd block to be negative.

In [5]:
b[snp1, 1:(R/2)] = abs(MASS::mvrnorm(1, rep(0,(R/2)), sharing)) / eff_factor
b[snp2, (R/2+1):R] = -abs(MASS::mvrnorm(1, rep(0,(R/2)), sharing)) / eff_factor
print(b[snp1, 1:(R/2)])
print(b[snp2, (R/2+1):R])
[1] 0.1951948 0.3530102 0.5955379
[1] -1.136650 -1.039006 -0.736915
In [6]:
g = genotype %*% b
In [7]:
e = MASS::mvrnorm(1, rep(0,R), diag(R) * 1)
In [8]:
Y = g + e

Prepare MASH mixture prior

In [9]:
U1 = matrix(0,R,R)
U1[1:(R/2),1:(R/2)] = sharing
U2 = matrix(0,R,R)
U2[(R/2+1):R,(R/2+1):R] = sharing
Ulist = list(U1=U1, U2=U2)
scaling = c(0.5,1) / eff_factor
mash_init = mvsusieR:::MashInitializer$new(Ulist, scaling, alpha = 0)

Notice here I'm using the EE model by setting alpha = 0.

Fit MV-SuSiE

Using EE model it takes quite a while to converge,

In [10]:
start_time <- Sys.time()
res = mvsusieR::susie(genotype,Y,
                  L=10,V=mash_init,
                  precompute_covariances=FALSE,
                  compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
Time difference of 8.343501 mins

Number of iterations,

In [11]:
res$niter
18
In [12]:
snps = sub("_[A-Z]*_[A-Z]*_b38", "", colnames(genotype))
In [13]:
rownames(res$coef) = c('intercept', snps)

Visualize results

In [14]:
pdf('susie_plot_toy.pdf', width=8, height=4)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
dev.off()
png: 2
In [15]:
p = mvsusieR::mvsusie_plot(res)
Suggested PDF canvas width: 3 height: 5.4 
In [16]:
pdf('mvsusie_plot_toy.pdf', width = 8, height = 4)
print(p$plot)
dev.off()
png: 2
In [17]:
%preview susie_plot_toy.pdf -s png --dpi 150
> susie_plot_toy.pdf (31.5 KiB):
In [18]:
%preview mvsusie_plot_toy.pdf -s png --dpi 150
> mvsusie_plot_toy.pdf (5.5 KiB):

While it seems capable (see below) of capturing the two simulated causal SNPs, there are a few more others also detected and show different signs of effect in different conditions. This is not a problem with EZ model, though.

In [19]:
colnames(genotype)[snp1]
'chr4_112523_C_T_b38'
In [20]:
colnames(genotype)[snp2]
'chr4_173807_A_G_b38'

Try a different initialization

Since I have simulated the data I can try initializat it from the truth,

In [10]:
coef_value = b[which(apply(b,1,sum)!=0),]
coef_value
A matrix: 2 × 6 of type dbl
0.19519480.35301020.5955379 0.00000 0.000000 0.000000
0.00000000.00000000.0000000-1.13665-1.039006-0.736915

That is, two effects are non-zero and all other effects are zeros.

In [11]:
start_time <- Sys.time()
res = mvsusieR::susie(genotype,Y,
                  L=10,V=mash_init,
                  s_init = list(coef_value = coef_value, coef_index = c(snp1, snp2)),
                  precompute_covariances=FALSE,
                  compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
Time difference of 7.916978 mins
In [12]:
res$niter
17
In [13]:
snps = sub("_[A-Z]*_[A-Z]*_b38", "", colnames(genotype))
rownames(res$coef) = c('intercept', snps)
pdf('susie_plot_toy.pdf', width=8, height=4)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
dev.off()
png: 2
In [14]:
%preview susie_plot_toy.pdf -s png --dpi 150
> susie_plot_toy.pdf (31.5 KiB):
In [ ]:


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago