mr-ash example analysis¶

Here we analyze GTEx based simulation data using mr-ash implemented in varbvs. The simulated data is stored in the same format as the processed GTEx data; therefore the same procedure can be used directly for data analysis. The only difference is that simulated data do not have covariates.

All files are saved in HDF5 format on midway: /project/compbio/internal_public_supp/GTEx7Toy.

In the toy data-set I selected 3 genes:

/chr4/ENSG00000145214
/chr18/ENSG00000264247
/chr19/ENSG00000267508

These should be used as geno_table variable in the code below.

For the toy real data there are 53 groups in the HDF file. To show what they are:

h5ls TY.expr.h5

For the simulated data there is only one group called simulated, which is the data I'll use in the example below.

To load data:

# source("http://bioconductor.org/biocLite.R")
# biocLite("rhdf5")
library(rhdf5)
source("src/Utils.R")
fpath = '/home/gaow/Documents/GTEx/ToyExample'
genotype_file = paste0(fpath, '/TY.genotype.h5')
expr_file = paste0(fpath, '/TY.expr_simulated.h5')
gene = 'ENSG00000145214'
geno_table = '/chr4/ENSG00000145214'
expr_table = '/simulated'
dat = load_data(genotype_file, expr_file, geno_table, expr_table)

To analyze:

# library(devtools)
# install_github("pcarbo/varbvs",subdir = "varbvs-R")
X = as.matrix(dat$X)
storage.mode(X) <- "double"
y = as.vector(dat$y)
# Univariate analysis
res0 = univariate_lm(X,y)

mixsd = ashr:::autoselect.mixsd(res0, sqrt(2), 0, c(-Inf,Inf), "normal")
res = varbvs::varbvsmix(X, NULL, y, sa = c(0, mixsd^2)) #, sigma = 1, update.sigma = F)

Fitting variational approximation for linear regression model with
mixture-of-normals priors.
samples:      635    mixture component sd's:    0.0072..5.2
variables:    7258   fit mixture variances:     no
covariates:   0      fit mixture weights:       yes
mixture size: 21     fit residual var. (sigma): yes
intercept:    yes    convergence tolerance      1.0e-04
       variational    max. --------- hyperparameters ---------
iter   lower bound  change   sigma  mixture sd's  mix. weights

To extract results from analysis:

res$pip = res$alpha %*% c(res$w)
res$beta = rowSums(res$alpha * res$mu)
res$sigma
res$w

Compare with simulation parameters¶

meta = paste0(fpath, '/TY.meta_simulation.json')
# install.packages('rjson', repos = 'http://cloud.r-project.org')
meta <- rjson::fromJSON(file = meta)
str(meta)

List of 4
 $ pi   : num [1:4] 0.4 0.2 0.2 0.2
 $ pi0  : num 0.998
 $ sigma: num [1:4] 0.25 0.5 1 2
 $ beta :List of 3
  ..$ ENSG00000264247: num [1:9871] 3.23 0 0 0 0 ...
  ..$ ENSG00000267508: num [1:8911] -2.34 0 0 0 0 ...
  ..$ ENSG00000145214: num [1:7258] -2 0 0 0 0 ...

Compare mixture proportion estimates:

truth = c(meta$pi0, meta$pi * (1 - meta$pi0))
est = res$w
cbind(truth, est)

Compare effect size estimates:

beta = cbind(res$beta, meta$beta[[gene]])
beta = beta[order(beta[,2]),]
beta

plot(res$beta)
plot(meta$beta[[gene]])
plot(initial$betahat)

beta_ash = ashr::ash(initial$betahat,initial$sebetahat)
plot(beta_ash$result$PosteriorMean)

plot(-log10(beta_ash$result$lfsr))

yhat = X %*% res$beta + res$mu.cov[1]
plot(y,yhat)
abline(0,1)
cor(y,yhat)