Multivariate Bayesian variable selection regression

Linkage vs pleiotropy: the Two-SNP example

Here I pick up a particular data-set and make a specific simulation case of linkage vs pleiotropy.

  1. In a data-set I identify SNPs with -0.8 correlations
  2. I then simulate for one SNP to be associated with 3 phenotypes, and the other to be associated with the rest, both with fixed positive effect size

See here for motivation of the analysis.

In [1]:
%cd /home/gaow/tmp/13-May-2019
/home/gaow/Documents/TempDir/13-May-2019

Find negatively correlation SNPs

In [2]:
genotype = readRDS('Multi_Tissues.ENSG00000089486.RDS')$X
In [3]:
ld = cor(genotype)

From MASH paper the variants in question are:

"4","ENSG00000089486.12","16_4525265_C_T_b37","16_4594671_C_T_b37"

Let's check them out. First liftOver here

chr16:4525264-4525265
chr16:4594670-4594671

to

chr16:4475263-4475264
chr16:4544669-4544670

which is chr16_4475264_C_T_b38 and chr16_4544670_C_T_b38

In [4]:
ld["chr16_4475264_C_T_b38", "chr16_4544670_C_T_b38"]
-0.734166843825794
In [5]:
snp1 = which(rownames(ld) == "chr16_4475264_C_T_b38")
In [6]:
snp2 = which(rownames(ld) == "chr16_4544670_C_T_b38")
In [7]:
apply(genotype,2,sum)[snp1] / nrow(genotype) / 2
chr16_4475264_C_T_b38: 0.719570405727924
In [8]:
apply(genotype,2,sum)[snp2] / nrow(genotype) / 2
chr16_4544670_C_T_b38: 0.356205250596659

Simulate multivariate phenotypes

In [9]:
P = ncol(genotype)
R = 6
eff_factor = 1.2
In [10]:
set.seed(1)
b = matrix(0, P, R)
sharing = matrix(0.75, 3, 3)
diag(sharing) = 1
In [11]:
b[snp1, 1:3] = abs(MASS::mvrnorm(1, rep(0,3), sharing)) / eff_factor
b[snp2, 4:6] = abs(MASS::mvrnorm(1, rep(0,3), sharing)) / eff_factor
print(b[snp1, 1:3])
print(b[snp2, 4:6])
[1] 0.2439935 0.4412627 0.7444224
[1] 1.4208123 1.2987575 0.9211438
In [12]:
g = genotype %*% b
In [13]:
apply(g,2,sd) * eff_factor
  1. 0.186719286922989
  2. 0.337682134363692
  3. 0.569679129723986
  4. 1.19284335202285
  5. 1.09037221013533
  6. 0.773346487923444
In [14]:
e = MASS::mvrnorm(nrow(g), rep(0,R), diag(R) * 1)
In [15]:
Y = g + e

Prepare MASH mixture prior

In [16]:
U1 = matrix(0,R,R)
U1[1:3,1:3] = sharing
U2 = matrix(0,R,R)
U2[4:6,4:6] = sharing
Ulist = list(U1=U1, U2=U2)
scaling = c(0.5,1) / eff_factor
mash_init = mvsusieR:::MashInitializer$new(Ulist, scaling, alpha = 1)

Fit MV-SuSiE

In [17]:
res = mvsusieR::mvsusie(genotype,Y,
                  L=10,prior_variance=mash_init,
                  compute_objective=FALSE)
                                                                              
In [18]:
snps = sub("_[A-Z]*_[A-Z]*_b38", "", colnames(genotype))
# conditions = c('Artery_Aorta', 'Artery_Coronary', 'Artery_Tibial', 'Caudate_basal_ganglia', 'Nucleus_accumbens_basal_ganglia', 'Putamen_basal_ganglia')
In [19]:
rownames(res$coef) = c('intercept', snps)
# colnames(res$coef) = conditions

Visualize results

Notice that with causal SNP being correlated, the sum of PIP in a CS is > 1.0.

In [20]:
res$alpha[1,res$sets$cs$L1]
  1. 0.0976191836031533
  2. 0.690593121567506
  3. 0.111866735497281
  4. 0.0293369894870043
  5. 0.0265909220598267
In [21]:
res$pip[res$sets$cs$L2]
  1. 0.0371401639660762
  2. 0.0226575460000187
  3. 0.0779721841147771
  4. 0.704292189639364
  5. 0.0139827928125186
  6. 0.221772858297005
  7. 0.0139573645659075
  8. 0.035328044959454
  9. 0.024139922959226
  10. 0.0279248559389905
  11. 0.0306415542896877
  12. 0.0344757272605833
  13. 0.0313096370122856
  14. 0.0426962869308725
  15. 0.0628903019535234
  16. 0.0558612984215553
  17. 0.0757747498530829
  18. 0.0653561299225651
  19. 0.0516263610879557
  20. 0.0362471530867727
  21. 0.0226926510106037
  22. 0.0310014748282297
  23. 0.0176355465502797
  24. 0.063733230993706
  25. 0.0194838026955932
In [22]:
pdf('susie_plot_toy.pdf', width=8, height=4)
susieR::susie_plot(res,y='PIP', main = 'Cross-condition Posterior Inclusion Probability', xlab = 'SNP positions', add_legend = F)
dev.off()
png: 2
In [23]:
p = mvsusieR::mvsusie_plot(res)
Suggested PDF canvas width: 15 height: 5.4 
In [24]:
pdf('mvsusie_plot_toy.pdf', width = 14, height = 4)
print(p$plot)
dev.off()
png: 2
In [25]:
%preview susie_plot_toy.pdf -s png --dpi 150
> susie_plot_toy.pdf (33.9 KiB):
In [26]:
%preview mvsusie_plot_toy.pdf -s png --dpi 150
> mvsusie_plot_toy.pdf (6.4 KiB):

For original effect size,

In [27]:
univariate_res = lapply(1:ncol(Y), function(i) susieR:::univariate_regression(genotype,Y[,i]))
res$bhat = do.call(cbind, lapply(1:ncol(Y), function(i) univariate_res[[i]]$betahat))
res$shat = do.call(cbind, lapply(1:ncol(Y), function(i) univariate_res[[i]]$sebetahat))
In [28]:
rownames(res$bhat) = snps
# colnames(res$bhat) = conditions
p = mvsusieR::mvsusie_plot(res, original_sumstat = TRUE)
Suggested PDF canvas width: 15 height: 5.4 
In [29]:
pdf('mvsusie_plot_toy_original.pdf', width = 14, height = 4.5)
print(p$plot)
dev.off()
png: 2
In [30]:
%preview mvsusie_plot_toy_original.pdf -s png --dpi 150
> mvsusie_plot_toy_original.pdf (8.8 KiB):

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago