We pick a few hundred SNPs that harbors eQTLs (3 or 4 eQTLs) and analyze with BIMBAM. Here we focus on regions between 171.12Mb to 171.20Mb on chr1, based on what we've previously learned.
dat = readRDS('../data/Thyroid.FMO2.2Mb.RDS')
X = dat$X[,3501:4500]
snps = colnames(X)
cat(paste0(rownames(dat$Z),collapse='\n'), file = '/tmp/Thyroid.samples')
%put snps
%get snps
print(snps[0], snps[-1])
geno = '~/Documents/GTExV8/genotypes/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze_MAF001_GTonly.vcf.gz'
Here I have to exclude indels because BIMBAM cannot handle it. Also I include about 400 SNPs for computational convenience.
bash:
tabix ${geno} chr1:171120000-171200000 --print-header | awk '(length($4)==1 && length($5)==1) || $1 ~ /^#/' > /tmp/Thyroid.FMO2.selected.vcf
bash:
plink --vcf /tmp/Thyroid.FMO2.selected.vcf --recode bimbam-1chr \
--out /tmp/Thyroid.FMO2.bimbam --keep <(awk '{print $1,$1}' /tmp/Thyroid.samples)
y = .lm.fit(dat$Z, dat$y)$residuals
cat(paste0(y,collapse='\n'), file = '/tmp/Thyroid.FMO2.bimbam.recode.pheno.txt')
bash:
bimbam-lin -g /tmp/Thyroid.FMO2.bimbam.recode.geno.txt -p /tmp/Thyroid.FMO2.bimbam.recode.pheno.txt -pos /tmp/Thyroid.FMO2.bimbam.recode.pos.txt \
-o pref4 -l 3
BIMBAM of 365 SNPs and -l 4
(up to 4 SNPs in multi-snp analysis) will use more than 64GB memory and gets killed. Try it with -l 3
.
First, read all BIMBAM multi-snp results, sort by BF factors and match with SNP positions for convenience of interpretation.
bfs = [x.strip().split() for x in open('output/pref4.multi.txt').readlines()]
snps = [x.strip().split() for x in open('/tmp/Thyroid.FMO2.bimbam.recode.pos.txt').readlines()]
bfs = [x for x in bfs if x[0].startswith('+') or x[0].startswith('-')]
bfs_sorted = sorted(bfs, key = lambda x: float(x[0]), reverse = True)
top_models = [[x[0],
snps[int(x[2])-1][0] if x[2] != 'NA' else 'NA',
snps[int(x[3])-1][0] if x[3] != 'NA' else 'NA',
snps[int(x[4])-1][0] if x[4] != 'NA' else 'NA'] \
for x in bfs_sorted]
with open('/home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out', 'w') as f:
f.write('\n'.join(['\t'.join(x) for x in top_models]))
bash:
gzip --best /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out
%preview /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz
Then for each of the top 4 SNPs identified by varbvsmix
and varbvs
, I grep
the relevant multi-SNP set in BIMBAM result and see the BF factors.
%preview /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvsmix.gz
bash:
zcat /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz | grep -n -P \
'chr1_171172098_C_T_b38|chr1_171199984_T_C_b38|chr1_171122735_A_G_b38|chr1_171133158_A_G_b38' \
| gzip --best > /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvsmix.gz
%preview /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvs.gz
bash:
zcat /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz | grep -n -P \
'chr1_171168633_C_A_b38|chr1_171147265_C_A_b38|chr1_171164750_C_A_b38|chr1_171178589_C_T_b38' \
| gzip --best > /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvs.gz
So at the face value, varbvsmix
picked up chr1_171172098_C_T_b38
which is part of the strongest association signal identified by BIMBAM.