Multivariate Bayesian variable selection regression

BIMBAM analysis with selected set of FMO2 SNPs for Thyroid

We pick a few hundred SNPs that harbors eQTLs (3 or 4 eQTLs) and analyze with BIMBAM. Here we focus on regions between 171.12Mb to 171.20Mb on chr1, based on what we've previously learned.

Get subset of SNPs

In [4]:
dat = readRDS('../data/Thyroid.FMO2.2Mb.RDS')
X = dat$X[,3501:4500]
In [30]:
snps = colnames(X)
cat(paste0(rownames(dat$Z),collapse='\n'), file = '/tmp/Thyroid.samples')
In [7]:
%put snps
In [8]:
%get snps
In [11]:
print(snps[0], snps[-1])
chr1_171097715_C_T_b38 chr1_171323497_T_C_b38
In [12]:
geno = '~/Documents/GTExV8/genotypes/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze_MAF001_GTonly.vcf.gz'

Here I have to exclude indels because BIMBAM cannot handle it. Also I include about 400 SNPs for computational convenience.

Convert to BIMBAM genotype

In [33]:
bash:
    tabix ${geno} chr1:171120000-171200000 --print-header | awk '(length($4)==1 && length($5)==1) || $1 ~ /^#/' > /tmp/Thyroid.FMO2.selected.vcf
In [34]:
bash:
    plink --vcf /tmp/Thyroid.FMO2.selected.vcf --recode bimbam-1chr \
        --out /tmp/Thyroid.FMO2.bimbam --keep <(awk '{print $1,$1}' /tmp/Thyroid.samples)
PLINK v1.90b4.1 64-bit (30 Mar 2017)           www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /tmp/Thyroid.FMO2.bimbam.log.
Options in effect:
  --keep /dev/fd/63
  --out /tmp/Thyroid.FMO2.bimbam
  --recode bimbam-1chr
  --vcf /tmp/Thyroid.FMO2.selected.vcf

64307 MB RAM detected; reserving 32153 MB for main workspace.
--vcf: /tmp/Thyroid.FMO2.bimbam-temporary.bed +
/tmp/Thyroid.FMO2.bimbam-temporary.bim + /tmp/Thyroid.FMO2.bimbam-temporary.fam
written.
365 variants loaded from .bim file.
838 people (0 males, 0 females, 838 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /tmp/Thyroid.FMO2.bimbam.nosex .
--keep: 574 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 574 founders and 0 nonfounders present.
Calculating allele frequencies... 10111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989 done.
Total genotyping rate in remaining samples is 0.997432.
365 variants and 574 people pass filters and QC.
Note: No phenotypes present.
--recode bimbam-1chr to /tmp/Thyroid.FMO2.bimbam.recode.geno.txt +
/tmp/Thyroid.FMO2.bimbam.recode.pheno.txt +
/tmp/Thyroid.FMO2.bimbam.recode.pos.txt ... 101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899done.

Prepare BIMBAM phenotype

In [35]:
y = .lm.fit(dat$Z, dat$y)$residuals
cat(paste0(y,collapse='\n'), file = '/tmp/Thyroid.FMO2.bimbam.recode.pheno.txt')

Run BIMBAM

In [37]:
bash:
    bimbam-lin -g /tmp/Thyroid.FMO2.bimbam.recode.geno.txt -p /tmp/Thyroid.FMO2.bimbam.recode.pheno.txt -pos /tmp/Thyroid.FMO2.bimbam.recode.pos.txt \
                -o pref4 -l 3
-bimbam: position files contain 365 records.
-bimbam: file 0 has 574 individual and 365 snps
-bimbam: read file 0 again 
-bimbam: number of phenotypes = 1
-bimbam: single snp analysis of typed snps:  [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100%
-bimbam: multi-snp analysis of exact genotypes:  [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100%
-bimbam: finished, for details see log: pref4.log.txt
RuntimeError: Failed to execute commmand ``/bin/bash /home/gaow/GIT/github/gtex-eqtls/analysis/.sos/scratch_0_0_7c56f7b2.sh`` (ret=1, workdir=/home/gaow/GIT/github/gtex-eqtls/analysis)

BIMBAM of 365 SNPs and -l 4 (up to 4 SNPs in multi-snp analysis) will use more than 64GB memory and gets killed. Try it with -l 3.

Summarize BIMBAM results

First, read all BIMBAM multi-snp results, sort by BF factors and match with SNP positions for convenience of interpretation.

In [38]:
bfs = [x.strip().split() for x in open('output/pref4.multi.txt').readlines()]
In [41]:
snps = [x.strip().split() for x in open('/tmp/Thyroid.FMO2.bimbam.recode.pos.txt').readlines()]
In [42]:
bfs = [x for x in bfs if x[0].startswith('+') or x[0].startswith('-')]
In [48]:
bfs_sorted = sorted(bfs, key = lambda x: float(x[0]), reverse = True)
In [58]:
top_models = [[x[0], 
               snps[int(x[2])-1][0] if x[2] != 'NA' else 'NA', 
               snps[int(x[3])-1][0] if x[3] != 'NA' else 'NA', 
               snps[int(x[4])-1][0] if x[4] != 'NA' else 'NA'] \
              for x in bfs_sorted]
In [60]:
with open('/home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out', 'w') as f:
    f.write('\n'.join(['\t'.join(x) for x in top_models]))
In [61]:
bash:
     gzip --best /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out
In [69]:
%preview /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz
> /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz (44.0 MiB):
+25.825	chr1_171140916_T_C_b38	chr1_171172098_C_T_b38	chr1_171190872_G_A_b38
+25.691	chr1_171140916_T_C_b38	chr1_171164750_C_A_b38	chr1_171172098_C_T_b38
+25.691	chr1_171140916_T_C_b38	chr1_171172098_C_T_b38	chr1_171178705_A_G_b38
+25.609	chr1_171140916_T_C_b38	chr1_171172098_C_T_b38	chr1_171174538_G_C_b38
+25.538	chr1_171122735_A_G_b38	chr1_171172098_C_T_b38	chr1_171190872_G_A_b38

Then for each of the top 4 SNPs identified by varbvsmix and varbvs, I grep the relevant multi-SNP set in BIMBAM result and see the BF factors.

For varbvsmix

In [70]:
%preview /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvsmix.gz
bash:
    zcat /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz | grep -n -P \
    'chr1_171172098_C_T_b38|chr1_171199984_T_C_b38|chr1_171122735_A_G_b38|chr1_171133158_A_G_b38' \
    | gzip --best > /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvsmix.gz
> /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvsmix.gz (2.2 MiB):
1:+25.825	chr1_171140916_T_C_b38	chr1_171172098_C_T_b38	chr1_171190872_G_A_b38
2:+25.691	chr1_171140916_T_C_b38	chr1_171164750_C_A_b38	chr1_171172098_C_T_b38
3:+25.691	chr1_171140916_T_C_b38	chr1_171172098_C_T_b38	chr1_171178705_A_G_b38
4:+25.609	chr1_171140916_T_C_b38	chr1_171172098_C_T_b38	chr1_171174538_G_C_b38
5:+25.538	chr1_171122735_A_G_b38	chr1_171172098_C_T_b38	chr1_171190872_G_A_b38

For varbvs

In [71]:
%preview /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvs.gz
bash:
    zcat /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.out.gz | grep -n -P \
    'chr1_171168633_C_A_b38|chr1_171147265_C_A_b38|chr1_171164750_C_A_b38|chr1_171178589_C_T_b38' \
    | gzip --best > /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvs.gz
> /home/gaow/Documents/GTExV8/Toys/FMO2/Thyroid.FMO2.bimbam.bvs.gz (2.2 MiB):
2:+25.691	chr1_171140916_T_C_b38	chr1_171164750_C_A_b38	chr1_171172098_C_T_b38
13:+25.497	chr1_171140916_T_C_b38	chr1_171168633_C_A_b38	chr1_171190872_G_A_b38
19:+25.430	chr1_171122735_A_G_b38	chr1_171164750_C_A_b38	chr1_171172098_C_T_b38
23:+25.363	chr1_171140916_T_C_b38	chr1_171164750_C_A_b38	chr1_171168633_C_A_b38
24:+25.363	chr1_171140916_T_C_b38	chr1_171164750_C_A_b38	chr1_171169344_G_A_b38

So at the face value, varbvsmix picked up chr1_171172098_C_T_b38 which is part of the strongest association signal identified by BIMBAM.


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago