Multivariate Bayesian variable selection regression

Multivariate regression simple prior with estimated scalar

Here I use multivariate prior (not mash mixture) for mvsusie call, but allow for the scalar of prior to be estimated. That is, I have implemeted estimate_prior_variance for this class.

In [1]:
library(mvsusieR)
set.seed(2)
Loading required package: mashr

Loading required package: ashr

In [2]:
dat = mvsusie_sim1(r=5,s=1)
In [3]:
dim(dat$X)
  1. 200
  2. 500
In [4]:
dim(dat$y)
  1. 200
  2. 5
In [5]:
true_pos = as.integer(apply(dat$b, 1, sum) != 0)
sum(true_pos)
5

As you can see in this simulated data-set there are 5 true effects. Now I set $L=10$. Here the prior used is not oracle prior but just 0.2 times covariance of $Y$.

In [6]:
L=10

Not estimating prior variance scalar

In [7]:
start = Sys.time()
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
Sys.time() - start
Time difference of 5.760026 secs
In [8]:
res$elbo
  1. -1655.50494461652
  2. -1620.25027050787
  3. -1617.45218786625
  4. -1613.61787403333
  5. -1613.47028661574
  6. -1613.46939008711
In [9]:
susieR::susie_plot(res,y='PIP', main = 'Default SuSiE plot for cross-condition PIP', xlab = 'SNP positions', add_legend = T, b=true_pos)

Estimating prior variance scalar directly

In [10]:
start = Sys.time()
res2 = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=T, estimate_prior_method='optim')
Sys.time() - start
Time difference of 4.452133 mins
In [11]:
res2$elbo
  1. -1624.32478431952
  2. -1572.59239254532
  3. -1571.96261856284
  4. -1571.95453315782
  5. -1571.95448105465

The ELBO is indeed better, but it takes much longer time -- per iteration time is about 50 times compared to when the scalar is not estimated.

The estimated scalars for prior variance are:

In [12]:
res2$V
  1. 0.568657549196573
  2. 0.503194309629131
  3. 0.445494954535737
  4. 0.484025124533229
  5. 0.533932020321646
  6. 0
  7. 0
  8. 0
  9. 0
  10. 0

As expected only the first 5 effects are non-zero.

In [13]:
susieR::susie_plot(res2,y='PIP', main = 'Default SuSiE plot for cross-condition PIP', xlab = 'SNP positions', add_legend = T, b=true_pos)

The PIP plot also appears a lot cleaner.

"Simple" method to estimate prior

Here instead of estimating prior scalar I only compare it with setting it to zero:

In [14]:
start = Sys.time()
res2 = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=T, estimate_prior_method='simple')
Sys.time() - start
Time difference of 19.3359 secs
In [15]:
res2$elbo
  1. -1618.74386621887
  2. -1574.43053803949
  3. -1574.17412385583
  4. -1574.17119869015
  5. -1574.17117800401
In [16]:
res2$V
  1. 1
  2. 1
  3. 1
  4. 1
  5. 1
  6. 0
  7. 0
  8. 0
  9. 0
  10. 0

It's a lot faster but the ELBO is slighly smaller.

EM update for prior scalar

In [17]:
start = Sys.time()
res2 = mvsusie(dat$X,dat$y,L=L,prior_variance=dat$V,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=T, estimate_prior_method='EM')
Sys.time() - start
Time difference of 23.36678 secs
In [18]:
res2$elbo
  1. -1655.50494461652
  2. -1580.75987829218
  3. -1573.05895758214
  4. -1571.97484354139
  5. -1571.95480372327
  6. -1571.95448648458
In [19]:
res2$V
  1. 0.568655116212092
  2. 0.503050767292613
  3. 0.445490629716591
  4. 0.484004532291694
  5. 0.533921336407041
  6. 0
  7. 0
  8. 0
  9. 0
  10. 0
In [20]:
susieR::susie_plot(res2,y='PIP', main = 'Default SuSiE plot for cross-condition PIP', xlab = 'SNP positions', add_legend = T, b=true_pos)

Almost same result as using optim method.

Update for prior scalar for MASH mixture

First, try out the simple method in MASH to verify that it agrees with the multivariate prior (although it does, at least as far as our unit test can tell us),

In [21]:
start = Sys.time()
m_init = mvsusieR:::MashInitializer$new(list(dat$V), 1, 1, 0)
res2 = mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,estimate_prior_variance=T,precompute_covariances=T,compute_objective=T,estimate_residual_variance=F, estimate_prior_method='simple')
Sys.time() - start
Time difference of 2.153718 secs
In [22]:
res2$elbo
  1. -1618.74386621887
  2. -1574.43053803949
  3. -1574.17412385583
  4. -1574.17119869015
  5. -1574.17117800401
In [23]:
res2$V
  1. 1
  2. 1
  3. 1
  4. 1
  5. 1
  6. 0
  7. 0
  8. 0
  9. 0
  10. 0

Now try EM method and see that agrees with the multivariate prior (although it does, at least as far as our unit test can tell us),

In [24]:
start = Sys.time()
m_init = mvsusieR:::MashInitializer$new(list(dat$V), 1, 1, 0)
res2 = mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,estimate_prior_variance=T,precompute_covariances=F,compute_objective=T,estimate_residual_variance=F, estimate_prior_method='EM')
Sys.time() - start
Time difference of 2.964305 secs
In [26]:
res2$elbo
  1. -1655.50494461652
  2. -1580.75987829218
  3. -1573.05895758214
  4. -1571.97484354139
  5. -1571.95480372327
  6. -1571.95448648458
In [27]:
res2$V
  1. 0.568655116212092
  2. 0.503050767292612
  3. 0.445490629716591
  4. 0.484004532291694
  5. 0.533921336407041
  6. 0
  7. 0
  8. 0
  9. 0
  10. 0

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago