Multivariate Bayesian variable selection regression

Running "degenerated" MASH computation

This is prototype to a unit test to verify implementation of mash computation is correct, by comparing it to univariate case when $Y$ has one column and prior covariance matrices is fixed.

Simulate data

In [1]:
set.seed(1)
n = 1000
p = 1000
beta = rep(0,p)
beta[1:4] = 1
X = matrix(rnorm(n*p),nrow=n,ncol=p)
y = X %*% beta + rnorm(n)
#' res =susie(X,y,L=10)
library(mvsusieR)
Loading required package: mashr
Loading required package: ashr

Run univariate computation

In [2]:
prior_var = 0.2 * as.numeric(var(y))
residual_var = as.numeric(var(y))
data = mvsusieR:::DenseData$new(X,y)
In [3]:
residual_var
5.74816803778574
In [4]:
prior_var
1.14963360755715
In [5]:
m1 = mvsusieR:::BayesianSimpleRegression$new(ncol(X), residual_var, prior_var)
m1$fit(data, save_summary_stats = T)
In [6]:
head(m1$posterior_b1)
A matrix: 6 × 1 of type dbl[,1]
1.17582479
1.10571117
1.11985691
1.07629210
-0.12724319
-0.07497652
In [7]:
m1
<BayesianSimpleRegression>
  Public:
    bhat: active binding
    clone: function (deep = FALSE) 
    compute_loglik_null: function (d) 
    fit: function (d, prior_weights = NULL, use_residual = FALSE, save_summary_stats = FALSE) 
    initialize: function (J, residual_variance, prior_variance, estimate_prior_variance = FALSE) 
    lbf: active binding
    loglik_null: active binding
    posterior_b1: active binding
    posterior_b2: active binding
    prior_variance: active binding
    residual_variance: active binding
    sbhat: active binding
  Private:
    .bhat: 1.18170980260002 1.11124526041777 1.12546180389929 1.081 ...
    .lbf: 118.091157635422 104.120913660333 106.870332723677 98.51 ...
    .loglik_null: NULL
    .posterior_b1: 1.17582479362293 1.10571117047545 1.11985691443764 1.076 ...
    .posterior_b2: 1.38828921226831 1.2283224594841 1.25980477578369 1.1641 ...
    .prior_variance: 1.14963360755715
    .residual_variance: 5.74816803778574
    .sbhat: 0.0758546106689995 0.0758546106689995 0.0758546106689995 ...
    estimate_prior_variance: function (betahat, shat2, prior_weights, method = c("optim", 
    J: 1000
    lbf_grad: function (V, sbhat2, T2) 
    loglik: function (V, betahat, shat2, prior_weights) 
    loglik_grad: function (V, bhat, sbhat2, prior_weights) 
    neg_loglik_logscale: function (lV, betahat, shat2, prior_weights) 
    negloglik_grad_logscale: function (lV, betahat, shat2, prior_weights) 
    to_estimate_prior_variance: FALSE

Run multivariate computation

In [8]:
# Assuming 1 out of $J$ are causal, we place a null weight $1-1/J$ a priori.
# This will lead to some shrinkage
# null_weight = 1 - 1 / ncol(X)
null_weight = 0
prior_covar = mvsusieR:::MashInitializer$new(list(0.2 * cov(y)), 1, prior_weights=1 - null_weight, null_weight=null_weight, alpha = 0)
residual_covar = cov(y)
In [9]:
prior_covar$prior_variance
$pi
  1. 0
  2. 1
$xUlist
$null
A matrix: 1 × 1 of type dbl[,1]
0
$`.1`
A matrix: 1 × 1 of type dbl[,1]
1.149634
In [10]:
residual_covar
A matrix: 1 × 1 of type dbl[,1]
5.748168
In [11]:
m2 = mvsusieR:::MashRegression$new(ncol(X), residual_covar, prior_covar)
In [12]:
m2$fit(data, save_summary_stats = T)
In [13]:
head(m2$posterior_b1)
A matrix: 6 × 1 of type dbl[,1]
1.17582479
1.10571117
1.11985691
1.07629210
-0.12724319
-0.07497652
In [14]:
m2
<MashRegression>
  Inherits from: <BayesianSimpleRegression>
  Public:
    bhat: active binding
    clone: function (deep = FALSE) 
    compute_loglik_null: function (d) 
    fit: function (d, prior_weights = NULL, use_residual = FALSE, save_summary_stats = FALSE) 
    initialize: function (J, residual_variance, mash_initializer, estimate_prior_variance = FALSE) 
    lbf: active binding
    lfsr: active binding
    loglik_null: active binding
    mixture_posterior_weights: active binding
    posterior_b1: active binding
    posterior_b2: active binding
    prior_variance: active binding
    residual_variance: active binding
    residual_variance_inv: active binding
    sbhat: active binding
  Private:
    .bhat: 1.18170980260002 1.11124526041777 1.12546180389929 1.081 ...
    .lbf: 118.091157635422 104.120913660333 106.870332723677 98.51 ...
    .lfsr: 9.33179437154505e-55 1.15780564779897e-48 7.313195958829 ...
    .loglik_null: -119.686629951695 -105.64646483559 -108.409644755107 -10 ...
    .mixture_posterior_weights: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  ...
    .posterior_b1: 1.17582479362293 1.10571117047545 1.11985691443764 1.076 ...
    .posterior_b2: 1.38828921226832 1.2283224594841 1.25980477578369 1.1641 ...
    .prior_variance: list
    .residual_variance: 5.74816803778574
    .residual_variance_inv: 0.173968470202414
    .sbhat: 0.0758546106689995 0.0758546106689995 0.0758546106689995 ...
    alpha: 0
    estimate_prior_variance: function (betahat, shat2, prior_weights, method = c("optim", 
    J: NULL
    lbf_grad: function (V, sbhat2, T2) 
    loglik: function (V, betahat, shat2, prior_weights) 
    loglik_grad: function (V, bhat, sbhat2, prior_weights) 
    neg_loglik_logscale: function (lV, betahat, shat2, prior_weights) 
    negloglik_grad_logscale: function (lV, betahat, shat2, prior_weights) 
    precomputed_cov_matrices: NULL
    residual_correlation: 1
    to_estimate_prior_variance: FALSE

All quantities seem to agree now.

Run ASH to confirm

ASH works well.

In [15]:
library(ashr)
In [16]:
a.out = ash(as.vector(m1$bhat), as.vector(m1$sbhat), mixcompdist = 'normal')
head(get_pm(a.out))
  1. 1.17684065287554
  2. 1.10666645474003
  3. 1.12082441998294
  4. 1.07722197094866
  5. -0.000161974838377648
  6. -3.79185875354948e-05

Run with fixed prior directly from MASH

In [17]:
prior_covar$mash_prior
$pi
  1. 0
  2. 1
$Ulist
  1. A matrix: 1 × 1 of type dbl[,1]
    1.149634
$grid
1
$usepointmass
TRUE
In [18]:
library(mashr)
In [19]:
data = mash_set_data(m2$bhat, m2$sbhat)
m.c = mash(data, g = prior_covar$mash_prior, fixg = TRUE, algorithm ='Rcpp')
 - Computing 1000 x 2 likelihood matrix.
 - Likelihood calculations took 0.00 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.00 seconds.
In [21]:
head(get_pm(m.c))
A matrix: 6 × 1 of type dbl[,1]
1.17582479
1.10571117
1.11985691
1.07629210
-0.12724319
-0.07497652

Run from MASH with canonical priors but weights learned from data

Very similiar results to what I got with fixed g earlier.

In [22]:
U.c = cov_canonical(data)
print(names(U.c))
[1] "identity"      "singletons_1"  "equal_effects" "simple_het_1" 
[5] "simple_het_2"  "simple_het_3" 
In [23]:
m.c = mash(data, U.c, , algorithm ='Rcpp')
 - Computing 1000 x 109 likelihood matrix.
 - Likelihood calculations took 0.01 seconds.
 - Fitting model with 109 mixture components.
 - Model fitting took 0.17 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.00 seconds.
In [24]:
head(get_pm(m.c))
A matrix: 6 × 1 of type dbl[,1]
1.176841e+00
1.106666e+00
1.120824e+00
1.077222e+00
-1.619748e-04
-3.791859e-05
In [25]:
m.c$fitted_g
$pi
null
0.995204679826406
identity.1
7.95271687980721e-11
singletons_1.1
1.92337562326512e-18
equal_effects.1
1.92337562326512e-18
simple_het_1.1
1.92337562326512e-18
simple_het_2.1
1.92337562326512e-18
simple_het_3.1
1.92337562326512e-18
identity.2
1.91640242800725e-18
singletons_1.2
1.91640242800727e-18
equal_effects.2
1.91640242800725e-18
simple_het_1.2
1.91640242800725e-18
simple_het_2.2
1.91640242800725e-18
simple_het_3.2
1.91640242800725e-18
identity.3
1.90255267138468e-18
singletons_1.3
1.90255267138468e-18
equal_effects.3
1.90255267138468e-18
simple_het_1.3
1.90255267138468e-18
simple_het_2.3
1.90255267138468e-18
simple_het_3.3
1.90255267138468e-18
identity.4
7.05488713855198e-12
singletons_1.4
8.57096558015708e-12
equal_effects.4
8.53584198125e-12
simple_het_1.4
8.53866029373803e-12
simple_het_2.4
8.53902464905407e-12
simple_het_3.4
8.50320944549281e-12
identity.5
1.82210469493815e-18
singletons_1.5
1.82210469493815e-18
equal_effects.5
1.82210469493815e-18
simple_het_1.5
1.82210469493815e-18
simple_het_2.5
1.82210469493815e-18
simple_het_3.5
1.82210469493815e-18
identity.6
1.72160299319065e-18
singletons_1.6
1.72160299319057e-18
equal_effects.6
1.72160299319057e-18
simple_het_1.6
1.72160299319057e-18
simple_het_2.6
1.72160299319057e-18
simple_het_3.6
1.72160299319057e-18
identity.7
1.54166429607594e-18
singletons_1.7
1.54166429607594e-18
equal_effects.7
1.54166429607594e-18
simple_het_1.7
1.54166429607594e-18
simple_het_2.7
1.54166429607594e-18
simple_het_3.7
1.54166429607594e-18
identity.8
1.25157437790948e-18
singletons_1.8
1.25157437790948e-18
equal_effects.8
1.25157437790948e-18
simple_het_1.8
1.25157437790948e-18
simple_het_2.8
1.25157437790948e-18
simple_het_3.8
1.25157437790948e-18
identity.9
8.63417008698988e-19
singletons_1.9
8.63417008698988e-19
equal_effects.9
8.63417008698988e-19
simple_het_1.9
8.63417008698988e-19
simple_het_2.9
8.63417008698988e-19
simple_het_3.9
8.63417008698988e-19
identity.10
4.74457879337841e-19
singletons_1.10
4.74457879337841e-19
equal_effects.10
4.74457879337841e-19
simple_het_1.10
4.74457879337841e-19
simple_het_2.10
4.74457879337841e-19
simple_het_3.10
4.74457879337841e-19
identity.11
2.03257357510353e-19
singletons_1.11
2.03257357510353e-19
equal_effects.11
2.03257357510353e-19
simple_het_1.11
2.03257357510353e-19
simple_het_2.11
2.03257357510353e-19
simple_het_3.11
2.03257357510353e-19
identity.12
7.17188935208766e-20
singletons_1.12
7.17188935208766e-20
equal_effects.12
7.17188935208766e-20
simple_het_1.12
7.17188935208766e-20
simple_het_2.12
7.17188935208766e-20
simple_het_3.12
7.17188935208766e-20
identity.13
3.11041956192117e-20
singletons_1.13
3.11041956192117e-20
equal_effects.13
3.11041956192117e-20
simple_het_1.13
3.11041956192117e-20
simple_het_2.13
3.11041956192117e-20
simple_het_3.13
3.11041956192117e-20
identity.14
2.85540249574286e-20
singletons_1.14
2.85540249574286e-20
equal_effects.14
2.85540249574286e-20
simple_het_1.14
2.85540249574286e-20
simple_het_2.14
2.85540249574286e-20
simple_het_3.14
2.85540249574286e-20
identity.15
2.66109962906175e-20
singletons_1.15
5.15481860006988e-14
equal_effects.15
5.63610430010078e-14
simple_het_1.15
4.27368970777414e-14
simple_het_2.15
3.67144166887171e-13
simple_het_3.15
2.66109962906175e-20
identity.16
0.0047953200438066
singletons_1.16
1.57699152796768e-20
equal_effects.16
1.57699152796768e-20
simple_het_1.16
1.57699152796768e-20
simple_het_2.16
1.57699152796768e-20
simple_het_3.16
1.57699152796768e-20
identity.17
6.3974500564993e-21
singletons_1.17
6.3974500564993e-21
equal_effects.17
6.3974500564993e-21
simple_het_1.17
6.3974500564993e-21
simple_het_2.17
6.3974500564993e-21
simple_het_3.17
6.3974500564993e-21
identity.18
2.0642370841958e-21
singletons_1.18
2.0642370841958e-21
equal_effects.18
2.0642370841958e-21
simple_het_1.18
2.0642370841958e-21
simple_het_2.18
2.0642370841958e-21
simple_het_3.18
2.0642370841958e-21
$Ulist
$identity
A matrix: 1 × 1 of type dbl[,1]
1
$singletons_1
A matrix: 1 × 1 of type dbl[,1]
1
$equal_effects
A matrix: 1 × 1 of type dbl[,1]
1
$simple_het_1
A matrix: 1 × 1 of type dbl[,1]
1
$simple_het_2
A matrix: 1 × 1 of type dbl[,1]
1
$simple_het_3
A matrix: 1 × 1 of type dbl[,1]
1
$grid
  1. 0.00651462291736128
  2. 0.00921306808347891
  3. 0.0130292458347226
  4. 0.0184261361669578
  5. 0.0260584916694451
  6. 0.0368522723339156
  7. 0.0521169833388903
  8. 0.0737045446678313
  9. 0.104233966677781
  10. 0.147409089335663
  11. 0.208467933355561
  12. 0.294818178671325
  13. 0.416935866711122
  14. 0.589636357342651
  15. 0.833871733422245
  16. 1.1792727146853
  17. 1.66774346684449
  18. 2.3585454293706
$usepointmass
TRUE
In [ ]:


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago