MASH results on V8 data¶

Basically same parameter setting as MASH paper.

Here I explore results from a later run on V8 data using the refactored mashr, and compare results with analysis in mash paper on V6 data. The data was analyzed by procedures documented here, with parameters used in MASH paper.

Note: this workflow includes $X^TX$ in the prior, thus is slower than without.

# To reproduce:
!sos run analysis/20171002_MASH_V8.ipynb mash

library(lattice)
library(ggplot2)
library(colorRamps)
library(mashr)
library(repr)

thresh_inconsistent=function(effectsize,thresh,sigs){
  z= sapply(seq(1:nrow(effectsize)),function(x){
    l=sigs[x,];p=effectsize[x,];plow=p[which(l<thresh)];##grab only those posterior means that are 'significant'
    if(length(plow)==0){return("FALSE")}##for ones who show no significants, they can't be heterogenous
    else{pos=sum(plow>0);neg=sum(plow<0);pos*neg!=0}
  })
  return(sum(z==TRUE))}

het.norm = function(effectsize) {
  t(apply(effectsize,1,function(x){
    x/x[which.max(abs(x))]
  }))
}

sign.norm = function(effectsize) {
  t(apply(effectsize,1,function(x){
    x/sign(x[which.max(abs(x))])
  }))}

sign.tissue.func = function(normdat){
  apply(normdat,1,function(x){
    sum(x<0)})}

het.func = function (normdat, threshold) {
    apply((normdat),1,function(x){sum(x > threshold)})
}

hlindex=function(normdat,sigdat,thresh1,thresh2){
    hl=NULL
    for(j in 1:nrow(normdat)){
        hl[j]=sum(normdat[j,]>thresh1&sigdat[j,]<thresh2)}
    return(hl)}

het.index.two=function(data,thresh){
    t(apply(data,1,function(x){
        maxx=x[which.max(abs(x))]
        if(maxx==0){h=0}
        else{
            normx=x/x[which.max(abs(x))]
            
            h=sum(normx>thresh)}
        
        return(h)
        
    }))}

# Compute RMSE.
rmse <- function(truth, estimate)
  sqrt(mean((truth - estimate)^2))

Loading required package: ashr

res = readRDS('~/Documents/GTExV8/MASH/GTExV8.ciseQTL.4MASH.xtx.K5.P3.V1.mash_model.rds')
res$result = readRDS('~/Documents/GTExV8/MASH/GTExV8.ciseQTL.4MASH.xtx.K5.P3.V1.mash_posterior.rds')

MASH model fit¶

The log-likelihood of fit is:

get_loglik(res)

This loglik is smaller than that from V6 data by 10 times but it is smaller than without using X'X in priors.

Here is a plot of weights learned.

options(repr.plot.width=12, repr.plot.height=4)
barplot(get_estimated_pi(res), las = 2, cex.names = 0.7)

Here, matrix X'X accounts for over 30% of all weights in the GTEx data.