From > 25K RDS files
Currently all results are stored in RDS format per gene per tissue, containing all relevant data in the mr-ash
model fit. We need to extract relevant information from them.
[global]
parameter: cwd = '/home/gaow/Documents/GTExV8/MR_ASH'
[default_1]
tissues = ['Lung', 'Thyroid', 'Whole_Blood']
input: for_each = ['tissues']
output: "${cwd}/${_tissues}.eqtl"
task: workdir = cwd, walltime = '6h', cores = 1, mem = "2G"
R:
setwd("${cwd}/${_tissues}")
sample.files = Sys.glob("*.rds")
eqtls = list()
for (i in 1:length(sample.files)) {
lfsr = readRDS(sample.files[i])$mr_ash$fit$lfsr
eqtls[[sample.files[i]]] = paste0(names(lfsr[lfsr < 0.05]), collapse = ',')
}
write.table(do.call(rbind, eqtls), ${_output!r}, col.names=F, quote=F)
[default_2]
input: group_by = 1
output: "${_input}.pdf"
task:
python:
def barplot(summary, fn):
import pandas as pd, seaborn as sns, matplotlib.pyplot as plt
import os
summary = pd.Series(summary).value_counts()
total = sum(summary)
df = pd.DataFrame({'groups': [int(x) for x in summary.index], 'val': [x/total for x in summary]})
df = df.sort_values(by='groups')
sns.mpl.rc("figure", figsize=(5,5))
colors = ['#348ABD', '#7A68A6', '#A60628', '#467821', '#FF0000', '#188487', '#E2A233',
'#A9A9A9', '#000000', '#FF00FF', '#FFD700', '#ADFF2F', '#00FFFF', '#10B7B3', '#FF5757']
ax = sns.barplot(x = 'groups', y = 'val', data = df)
sns.despine()
ax.set_ylabel('Percentage')
ax.set_xlabel('Number of eQTLs for ${_input!bn}')
vals = ax.get_yticks()
ax.set_yticklabels(['{:.1f}%'.format(x*100) for x in vals])
#ax.set_yticklabels([])
ax.get_figure().savefig(fn, bbox_inches='tight')
plt.clf()
data = [x.strip('\n').split() for x in open(${_input!r}).readlines()]
summary = [x[1].count(',') + 1 for x in data if len(x) > 1] + [0 for x in data if len(x) == 1]
barplot(summary, ${_output!r})
> stem(num.eqtls, width = 0)
The decimal point is at the |
0 | +17248
1 | +5003
2 | +1840
3 | +625
4 | +203
5 | +90
6 | +43
7 | +23
8 | +10
9 | +4
10 | +4
11 | +3
12 | +2
13 |
14 | +3
> length(num.eqtls)
[1] 25101
In sum, mr-ash
has detected at least one eQTL in 31.28% genes. Of these genes, 63.7% has one eQTL, 23.4% has 2 eQTL and 12.9% has 3 or more.
%preview img/Thyroid.eqtl.pdf -n
%preview img/Lung.eqtl.pdf -n
%preview img/Whole_Blood.eqtl.pdf -n
sos run analysis/20171030_Single_Tissue_MR_ASH_Summary.ipynb -r midway2