Here we analyze GTEx based simulation data using mr-ash implemented in varbvs
. The simulated data is stored in the same format as the processed GTEx data; therefore the same procedure can be used directly for data analysis. The only difference is that simulated data do not have covariates.
All files are saved in HDF5 format on midway: /project/compbio/internal_public_supp/GTEx7Toy
.
In the toy data-set I selected 3 genes:
/chr4/ENSG00000145214
/chr18/ENSG00000264247
/chr19/ENSG00000267508
These should be used as geno_table
variable in the code below.
For the toy real data there are 53 groups in the HDF file. To show what they are:
h5ls TY.expr.h5
For the simulated data there is only one group called simulated
, which is the data I'll use in the example below.
To load data:
# source("http://bioconductor.org/biocLite.R")
# biocLite("rhdf5")
library(rhdf5)
source("src/Utils.R")
fpath = '/home/gaow/Documents/GTEx/ToyExample'
genotype_file = paste0(fpath, '/TY.genotype.h5')
expr_file = paste0(fpath, '/TY.expr_simulated.h5')
gene = 'ENSG00000145214'
geno_table = '/chr4/ENSG00000145214'
expr_table = '/simulated'
dat = load_data(genotype_file, expr_file, geno_table, expr_table)
To analyze:
# library(devtools)
# install_github("pcarbo/varbvs",subdir = "varbvs-R")
X = as.matrix(dat$X)
storage.mode(X) <- "double"
y = as.vector(dat$y)
# Univariate analysis
res0 = univariate_lm(X,y)
mixsd = ashr:::autoselect.mixsd(res0, sqrt(2), 0, c(-Inf,Inf), "normal")
res = varbvs::varbvsmix(X, NULL, y, sa = c(0, mixsd^2)) #, sigma = 1, update.sigma = F)
To extract results from analysis:
res$pip = res$alpha %*% c(res$w)
res$beta = rowSums(res$alpha * res$mu)
res$sigma
res$w
meta = paste0(fpath, '/TY.meta_simulation.json')
# install.packages('rjson', repos = 'http://cloud.r-project.org')
meta <- rjson::fromJSON(file = meta)
str(meta)
Compare mixture proportion estimates:
truth = c(meta$pi0, meta$pi * (1 - meta$pi0))
est = res$w
cbind(truth, est)
Compare effect size estimates:
beta = cbind(res$beta, meta$beta[[gene]])
beta = beta[order(beta[,2]),]
beta
plot(res$beta)
plot(meta$beta[[gene]])
plot(initial$betahat)
beta_ash = ashr::ash(initial$betahat,initial$sebetahat)
plot(beta_ash$result$PosteriorMean)
plot(-log10(beta_ash$result$lfsr))
yhat = X %*% res$beta + res$mu.cov[1]
plot(y,yhat)
abline(0,1)
cor(y,yhat)