Multivariate Bayesian variable selection regression

Linkage vs pleiotropy: the Two-SNP example with missing data

This is continuation of notebook two_snps_dispute.ipynb.

Here I:

  1. Add in a couple of missing values in Y
  2. Use precomputed inverse matrices

I'd expect to see almost identical result indicating my implementation are correct.

In [1]:
%cd /home/gaow/tmp/13-May-2019
/home/gaow/Documents/TempDir/13-May-2019

Find negatively correlation SNPs

In [2]:
genotype = readRDS('Multi_Tissues.ENSG00000145214.RDS')$X
In [3]:
ld = cor(genotype)
In [4]:
idx = which(ld > -0.855 & ld < -0.85, arr.ind = T)
In [5]:
head(idx)
A matrix: 6 × 2 of type int
rowcol
chr4_173807_A_G_b38354184
chr4_205767_C_CA_b38454184
chr4_205769_C_T_b38455184
chr4_196264_A_G_b38417233
chr4_196753_C_T_b38420266
chr4_199265_C_G_b38427293

So let's check out 184 and 354:

In [6]:
cor(genotype[,184], genotype[,354])
-0.8545869815011
In [7]:
colnames(genotype)[184]
'chr4_112523_C_T_b38'
In [8]:
colnames(genotype)[354]
'chr4_173807_A_G_b38'

Simulate multivariate phenotypes

In [9]:
P = ncol(genotype)
R = 6
eff_factor = 1.5
In [10]:
set.seed(1)
b = matrix(0, P, R)
sharing = matrix(0.75, 3, 3)
diag(sharing) = 1
In [11]:
b[184, 1:3] = abs(MASS::mvrnorm(1, rep(0,3), sharing)) / eff_factor
b[354, 4:6] = -abs(MASS::mvrnorm(1, rep(0,3), sharing)) / eff_factor
print(b[184, 1:3])
print(b[354, 4:6])
[1] 0.1951948 0.3530102 0.5955379
[1] -1.136650 -1.039006 -0.736915
In [12]:
g = genotype %*% b
In [13]:
apply(g,2,sd) * eff_factor
  1. 0.202253269354167
  2. 0.365775366878507
  3. 0.617073192428342
  4. 1.17053036467497
  5. 1.06997601872598
  6. 0.758880489205929
In [14]:
e = MASS::mvrnorm(nrow(g), rep(0,R), diag(R) * 1)
In [15]:
Y = g + e

Prepare MASH mixture prior

In [16]:
U1 = matrix(0,R,R)
U1[1:3,1:3] = sharing
U2 = matrix(0,R,R)
U2[4:6,4:6] = sharing
Ulist = list(U1=U1, U2=U2)
scaling = c(0.5,1) / eff_factor
mash_init = mvsusieR:::MashInitializer$new(Ulist, scaling, alpha = 1)

Notice here I'm using the EZ model by setting alpha = 1.

Fit MV-SuSiE

With / without precomputed inverse matrices we compare the running time on this $R=6$ example,

In [17]:
start_time <- Sys.time()
res = mvsusieR::mvsusie(genotype,Y,
                  L=10,prior_variance=mash_init,
                  precompute_covariances=FALSE,
                  compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
                                                                              
Time difference of 23.3415 secs
In [18]:
start_time <- Sys.time()
res = mvsusieR::mvsusie(genotype,Y,
                  L=10,prior_variance=mash_init,
                  precompute_covariances=TRUE,
                  compute_objective=FALSE)
end_time <- Sys.time()
print(end_time - start_time)
                                                                              
Time difference of 19.46232 secs
In [19]:
snps = sub("_[A-Z]*_[A-Z]*_b38", "", colnames(genotype))
In [20]:
rownames(res$coef) = c('intercept', snps)

Visualize results

In [21]:
pdf('susie_plot_toy.pdf', width=8, height=4)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
dev.off()
png: 2
In [22]:
p = mvsusieR::mvsusie_plot(res)
Suggested PDF canvas width: 13.5 height: 5.4 
In [23]:
pdf('mvsusie_plot_toy.pdf', width = 8, height = 4)
print(p$plot)
dev.off()
png: 2
In [24]:
%preview susie_plot_toy.pdf -s png --dpi 150
> susie_plot_toy.pdf (31.6 KiB):
In [25]:
%preview mvsusie_plot_toy.pdf -s png --dpi 150
> mvsusie_plot_toy.pdf (6.1 KiB):

Above looks all good. Notice here I'm using the precompute covariances routine.

Add in some missing data

For each Y I want 5% data missing,

In [26]:
Y2 = Y
for (i in 1:ncol(Y)) {
    Y2[sample(1:nrow(Y), ceiling(nrow(Y) * 0.05)), i] = NA
}

And run the procedure again,

In [27]:
res = mvsusieR::mvsusie(genotype,Y2,
                  L=10,prior_variance=mash_init,
                  compute_objective=FALSE)
                                                                              
In [28]:
snps = sub("_[A-Z]*_[A-Z]*_b38", "", colnames(genotype))
rownames(res$coef) = c('intercept', snps)
In [29]:
pdf('susie_plot_toy_missing.pdf', width=8, height=4)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
dev.off()
png: 2
In [30]:
p = mvsusieR::mvsusie_plot(res)
Suggested PDF canvas width: 13.5 height: 5.4 
In [31]:
pdf('mvsusie_plot_toy_missing.pdf', width = 8, height = 4)
print(p$plot)
dev.off()
png: 2
In [32]:
%preview susie_plot_toy_missing.pdf -s png --dpi 150
> susie_plot_toy_missing.pdf (31.9 KiB):
In [33]:
%preview mvsusie_plot_toy_missing.pdf -s png --dpi 150
> mvsusie_plot_toy_missing.pdf (6.1 KiB):

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago