Multivariate Bayesian variable selection regression

Summary of single tissue analysis results

From > 25K RDS files

Currently all results are stored in RDS format per gene per tissue, containing all relevant data in the mr-ash model fit. We need to extract relevant information from them.

Number of eQTLs per gene

In [ ]:
[global]
parameter: cwd = '/home/gaow/Documents/GTExV8/MR_ASH'

[default_1]
tissues = ['Lung', 'Thyroid', 'Whole_Blood']
input: for_each = ['tissues']
output: "${cwd}/${_tissues}.eqtl"
task: workdir = cwd, walltime = '6h', cores = 1, mem = "2G"
R:
    setwd("${cwd}/${_tissues}")
    sample.files = Sys.glob("*.rds")
    eqtls = list()
    for (i in 1:length(sample.files)) {
        lfsr = readRDS(sample.files[i])$mr_ash$fit$lfsr
        eqtls[[sample.files[i]]] = paste0(names(lfsr[lfsr < 0.05]), collapse = ',')
    }
    write.table(do.call(rbind, eqtls), ${_output!r}, col.names=F, quote=F)

[default_2]
input: group_by = 1
output: "${_input}.pdf"
task:
python:
    def barplot(summary, fn):
        import pandas as pd, seaborn as sns, matplotlib.pyplot as plt
        import os
        summary = pd.Series(summary).value_counts()
        total = sum(summary)
        df = pd.DataFrame({'groups': [int(x) for x in summary.index], 'val': [x/total for x in summary]})
        df = df.sort_values(by='groups')
        sns.mpl.rc("figure", figsize=(5,5))
        colors = ['#348ABD', '#7A68A6', '#A60628', '#467821', '#FF0000', '#188487', '#E2A233',
                  '#A9A9A9', '#000000', '#FF00FF', '#FFD700', '#ADFF2F', '#00FFFF', '#10B7B3', '#FF5757']
        ax = sns.barplot(x = 'groups', y = 'val', data = df)
        sns.despine()
        ax.set_ylabel('Percentage')
        ax.set_xlabel('Number of eQTLs for ${_input!bn}')
        vals = ax.get_yticks()
        ax.set_yticklabels(['{:.1f}%'.format(x*100) for x in vals])
        #ax.set_yticklabels([])
        ax.get_figure().savefig(fn, bbox_inches='tight')
        plt.clf()

    data = [x.strip('\n').split() for x in open(${_input!r}).readlines()]
    summary = [x[1].count(',') + 1 for x in data if len(x) > 1] + [0 for x in data if len(x) == 1]
    barplot(summary, ${_output!r})

Summary for Thyroid data


> stem(num.eqtls, width = 0)

  The decimal point is at the |

   0 | +17248
   1 | +5003
   2 | +1840
   3 | +625
   4 | +203
   5 | +90
   6 | +43
   7 | +23
   8 | +10
   9 | +4
  10 | +4
  11 | +3
  12 | +2
  13 | 
  14 | +3

> length(num.eqtls)
[1] 25101

In sum, mr-ash has detected at least one eQTL in 31.28% genes. Of these genes, 63.7% has one eQTL, 23.4% has 2 eQTL and 12.9% has 3 or more.

Distribution of eQTLs per gene

In [3]:
%preview img/Thyroid.eqtl.pdf -n
%preview img/Lung.eqtl.pdf -n
%preview img/Whole_Blood.eqtl.pdf -n
> img/Whole_Blood.eqtl.pdf (15.1 KiB):
> img/Lung.eqtl.pdf (13.7 KiB):
> img/Thyroid.eqtl.pdf (14.8 KiB):
sos run analysis/20171030_Single_Tissue_MR_ASH_Summary.ipynb -r midway2

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago