Multivariate Bayesian variable selection regression

ELBO implementation and comparisons

I have implemented the ELBO for the M&M model based on the write-up in this document. See Section 8 for derivation details; see also Section B for an independent re-derivation by Yuxin Zou that checks my work.

(The analytic forms are not identical due to different simplifications, but I coded both up and checked that they agree.)

susieR implementation in univariate case

Here I simulate one trait and run susieR::susie as well as mvsusieR::mvsusie to check whether the ELBO values agree.

In [1]:
library(mvsusieR)
set.seed(2)
Loading required package: mashr
Loading required package: ashr

I pick L=5 and simulate a univariate trait,

In [2]:
L = 5
dat = mvsusie_sim1(r=1)

Then run SuSiE and get the ELBO,

In [3]:
res = susieR::susie(dat$X,dat$y,L=L,scaled_prior_variance=0.2,estimate_residual_variance=F,estimate_prior_variance=F)
In [4]:
res$elbo
  1. -429.40495176513
  2. -409.549300389438
  3. -407.707739710942
  4. -407.7058409462
  5. -407.70583907902
In [5]:
res = susieR::susie(dat$X,dat$y,L=L,scaled_prior_variance=0.2,estimate_residual_variance=T,estimate_prior_variance=F)
res$elbo
  1. -405.95129176068
  2. -305.537422264398
  3. -301.451620424532
  4. -301.435227591177
  5. -301.435194084957
In [6]:
res = susieR::susie(dat$X,dat$y,L=L,scaled_prior_variance=0.2,estimate_residual_variance=T,estimate_prior_variance=T)
res$elbo
  1. -404.317496820361
  2. -355.288104606139
  3. -299.920961061546
  4. -299.335726585569
  5. -299.335132481419
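In this example the final ELBO improves as more variance parameters are estimated. A quick base-R sanity check on the final values reported above:

```r
# Final ELBOs from the three susieR runs above: fixed variances,
# estimated residual variance, and estimated residual + prior variance.
final_elbo <- c(fixed           = -407.70583907902,
                est_resid       = -301.435194084957,
                est_resid_prior = -299.335132481419)
# Each successive setting attains a higher (less negative) bound here:
all(diff(final_elbo) > 0)  # TRUE
```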

Compare mvsusieR's Bayesian multivariate regression module with SuSiE

I have implemented the multiple regression in this class in the mvsusieR package. On the mvsusieR interface it is triggered by setting the prior V to a scalar and making the input Y a one-column matrix,

In [7]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=0.2,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
res$elbo
  1. -429.40495176513
  2. -409.549300389436
  3. -407.707739710942
  4. -407.7058409462
  5. -407.70583907902
In [8]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=0.2,compute_objective=T,estimate_residual_variance=T,estimate_prior_variance=F)
res$elbo
  1. -405.95129176068
  2. -305.537422264397
  3. -301.451620424532
  4. -301.435227591177
  5. -301.435194084957
In [9]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=0.2,compute_objective=T,estimate_residual_variance=T,estimate_prior_variance=T)
res$elbo
  1. -404.317496769782
  2. -355.288104764095
  3. -299.920961070197
  4. -299.33572658556
  5. -299.335132481417

The output is identical to that of susieR.
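Agreement between the two implementations can be checked numerically rather than by eye. A minimal sketch using the first three ELBO values reported above (the vectors `elbo_susie` and `elbo_mvsusie` are named here for illustration):

```r
# First three ELBO iterations from susieR::susie and mvsusieR::mvsusie above;
# they differ only at the level of floating-point round-off.
elbo_susie   <- c(-429.40495176513, -409.549300389438, -407.707739710942)
elbo_mvsusie <- c(-429.40495176513, -409.549300389436, -407.707739710942)
# all.equal() compares with a numeric tolerance, here 1e-8:
isTRUE(all.equal(elbo_susie, elbo_mvsusie, tolerance = 1e-8))  # TRUE
```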

Compare mvsusieR's MASH regression module with SuSiE

Here I create a degenerate MASH regression module implemented in mvsusieR. "Degenerate" means it has only one phenotype, and the prior is a trivial one-component 1-by-1 matrix of 0.2 * var(Y). Here we use the EE model in MASH (alpha = 0).

The difference in ELBO computation between this and the previous section can be found in the function compute_expected_loglik_partial(), which computes the l-th KL term, and the function compute_objective(), which finalizes the ELBO. The univariate and multivariate cases are distinguished by an if ... else statement.
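The assembly of the objective follows the usual variational decomposition, ELBO = E_q[log p(Y | b, ...)] - sum_l KL(q(b_l) || p(b_l)). A minimal sketch of that final step (not mvsusieR's actual code; the function name and inputs are illustrative):

```r
# Hypothetical sketch of finalizing the ELBO from its two pieces:
# the expected log-likelihood under q, and one KL term per effect l = 1..L.
compute_elbo_sketch <- function(expected_loglik, kl_per_effect) {
  expected_loglik - sum(kl_per_effect)
}

# Toy values: L = 5 single effects, two of which carry no KL cost.
compute_elbo_sketch(-400, c(1.2, 0.8, 0.5, 0, 0))  # -402.5
```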

In [10]:
m_init = mvsusieR:::MashInitializer$new(list(matrix(0.2*var(dat$y))), 1, alpha=0)
m_init$mash_prior
$pi
function (object, ...) 
UseMethod("weights")
$Ulist
  1. A matrix: 1 × 1 of type dbl[,1]
    1.138784
$grid
1
$usepointmass
TRUE
In [12]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,compute_objective=T, estimate_residual_variance=F,estimate_prior_variance=F)
In [13]:
res$elbo
  1. -429.40495176513
  2. -409.549300389436
  3. -407.707739710942
  4. -407.7058409462
  5. -407.70583907902

The output is again the same as in the previous calculations.

ELBO for multivariate calculation

Now I use the same code but with more phenotypes, R = 5, still using a very simple prior for the MASH part,

In [14]:
set.seed(2)
dat = mvsusie_sim1(r=5)
m_init = mvsusieR:::MashInitializer$new(list(dat$V), 1, alpha=0)
m_init$mash_prior
$pi
function (object, ...) 
UseMethod("weights")
$Ulist
  1. A matrix: 5 × 5 of type dbl[,5]
     1.14068826 -0.02561830 -0.19275481 -0.040421229 -0.029376381
    -0.02561830  1.01088663  0.03491952  0.018064842  0.016833489
    -0.19275481  0.03491952  1.15215509 -0.112673234 -0.063194855
    -0.04042123  0.01806484 -0.11267323  0.836044978  0.008248383
    -0.02937638  0.01683349 -0.06319485  0.008248383  0.865937437
$grid
1
$usepointmass
TRUE
In [16]:
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
In [17]:
res$elbo
  1. -2184.07604753707
  2. -2173.19925417124
  3. -2169.55143832283
  4. -2167.74410199547
  5. -2167.26694357295
  6. -2167.08740604301
  7. -2166.51381866454
  8. -2166.01270008591
  9. -2164.89405592026
  10. -2163.63506437888
  11. -2163.57378303911
  12. -2163.57265177467
  13. -2163.57261683874
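A useful sanity check on any ELBO trace is that it must be non-decreasing across iterations; a violation would indicate a bug in the objective or the updates. Checking the 13 values above with base R:

```r
# ELBO trace from the R = 5 multivariate fit above.
elbo <- c(-2184.07604753707, -2173.19925417124, -2169.55143832283,
          -2167.74410199547, -2167.26694357295, -2167.08740604301,
          -2166.51381866454, -2166.01270008591, -2164.89405592026,
          -2163.63506437888, -2163.57378303911, -2163.57265177467,
          -2163.57261683874)
# Every iteration-to-iteration change should be positive:
all(diff(elbo) > 0)  # TRUE
```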

and with a different seed:

In [19]:
set.seed(1)
dat = mvsusie_sim1(r=5)
m_init = mvsusieR:::MashInitializer$new(list(dat$V), 1, alpha=0)
res = mvsusieR::mvsusie(dat$X,dat$y,L=L,prior_variance=m_init,compute_objective=T,estimate_residual_variance=F,estimate_prior_variance=F)
res$elbo
  1. -2169.12300499713
  2. -2160.89504325175
  3. -2156.1789136104
  4. -2155.07970729671
  5. -2155.04514487993
  6. -2155.04301270126
  7. -2155.0428902174

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago