Basically same parameter setting as MASH paper.
Here I explore results from a later run on V8 data using the refactored mashr
, and compare results with analysis in mash paper on V6 data. The data was analyzed by procedures documented here, with parameters used in MASH paper.
Note: this workflow includes $X^TX$ in the prior, thus is slower than without.
# To reproduce:
!sos run analysis/20171002_MASH_V8.ipynb mash
library(lattice)
library(ggplot2)
library(colorRamps)
library(mashr)
library(repr)
thresh_inconsistent=function(effectsize,thresh,sigs){
z= sapply(seq(1:nrow(effectsize)),function(x){
l=sigs[x,];p=effectsize[x,];plow=p[which(l<thresh)];##grab only those posterior means that are 'significant'
if(length(plow)==0){return("FALSE")}##for ones who show no significants, they can't be heterogenous
else{pos=sum(plow>0);neg=sum(plow<0);pos*neg!=0}
})
return(sum(z==TRUE))}
het.norm = function(effectsize) {
t(apply(effectsize,1,function(x){
x/x[which.max(abs(x))]
}))
}
sign.norm = function(effectsize) {
t(apply(effectsize,1,function(x){
x/sign(x[which.max(abs(x))])
}))}
sign.tissue.func = function(normdat){
apply(normdat,1,function(x){
sum(x<0)})}
het.func = function (normdat, threshold) {
apply((normdat),1,function(x){sum(x > threshold)})
}
hlindex=function(normdat,sigdat,thresh1,thresh2){
hl=NULL
for(j in 1:nrow(normdat)){
hl[j]=sum(normdat[j,]>thresh1&sigdat[j,]<thresh2)}
return(hl)}
het.index.two=function(data,thresh){
t(apply(data,1,function(x){
maxx=x[which.max(abs(x))]
if(maxx==0){h=0}
else{
normx=x/x[which.max(abs(x))]
h=sum(normx>thresh)}
return(h)
}))}
# Compute RMSE.
rmse <- function(truth, estimate)
sqrt(mean((truth - estimate)^2))
res = readRDS('~/Documents/GTExV8/MASH/GTExV8.ciseQTL.4MASH.xtx.K5.P3.V1.mash_model.rds')
res$result = readRDS('~/Documents/GTExV8/MASH/GTExV8.ciseQTL.4MASH.xtx.K5.P3.V1.mash_posterior.rds')
The log-likelihood of fit is:
get_loglik(res)
This loglik is smaller than that from V6 data by 10 times but it is smaller than without using X'X in priors.
Here is a plot of weights learned.
options(repr.plot.width=12, repr.plot.height=4)
barplot(get_estimated_pi(res), las = 2, cex.names = 0.7)
Here, matrix X'X accounts for over 30% of all weights in the GTEx data.