Multivariate Bayesian variable selection regression

Format MASH weights for M&M

I previously analyzed GTEx V8 data with MASH. Here I format the resulting MASH weights for use with the V7 data for which we have extracted genotypes.

In [1]:
%cd ~/Documents/GTExV8/Toys
/project/mstephens/SuSiE/gtex_data/Toys

Here is our dataset:

In [27]:
dat = readRDS('Multi_Tissues.ENSG00000025772.RDS')
In [28]:
head(rownames(dat$X))
  1. 'GTEX-1117F'
  2. 'GTEX-111CU'
  3. 'GTEX-111FC'
  4. 'GTEX-111VG'
  5. 'GTEX-111YS'
  6. 'GTEX-1122O'
In [29]:
tail(rownames(dat$X))
  1. 'GTEX-ZYVF'
  2. 'GTEX-ZYW4'
  3. 'GTEX-ZYY3'
  4. 'GTEX-ZZ64'
  5. 'GTEX-ZZPT'
  6. 'GTEX-ZZPU'
In [30]:
head(rownames(dat$y_res))
  1. 'GTEX-1117F'
  2. 'GTEX-111CU'
  3. 'GTEX-111FC'
  4. 'GTEX-111VG'
  5. 'GTEX-111YS'
  6. 'GTEX-1122O'
In [31]:
tail(rownames(dat$y_res))
  1. 'GTEX-ZYVF'
  2. 'GTEX-ZYW4'
  3. 'GTEX-ZYY3'
  4. 'GTEX-ZZ64'
  5. 'GTEX-ZZPT'
  6. 'GTEX-ZZPU'
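
Comparing the first and last six row names by eye suggests that the samples in `X` and `y_res` line up, but it cannot rule out a mismatch somewhere in the middle. A minimal sketch of a full check follows, using a small made-up list `toy` in place of `dat`. The sample IDs and matrix sizes are placeholders, not the real data.

```r
# Toy stand-in for `dat`: X plays the role of the genotype matrix,
# y_res the expression residuals. IDs and dimensions are invented.
ids = c("sample_1", "sample_2", "sample_3", "sample_4")
toy = list(
  X     = matrix(0, nrow = 4, ncol = 3, dimnames = list(ids, NULL)),
  y_res = matrix(0, nrow = 4, ncol = 2, dimnames = list(ids, NULL))
)

# identical() compares length, order and content in a single call,
# so it covers every sample rather than only the head and tail.
same_samples = identical(rownames(toy$X), rownames(toy$y_res))
print(same_samples)
```

On the real object the same call would be `identical(rownames(dat$X), rownames(dat$y_res))`.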
In [32]:
colnames(dat$y_res)
  1. 'Adipose_Subcutaneous'
  2. 'Adipose_Visceral_Omentum'
  3. 'Adrenal_Gland'
  4. 'Artery_Aorta'
  5. 'Artery_Coronary'
  6. 'Artery_Tibial'
  7. 'Brain_Amygdala'
  8. 'Brain_Anterior_cingulate_cortex_BA24'
  9. 'Brain_Caudate_basal_ganglia'
  10. 'Brain_Cerebellar_Hemisphere'
  11. 'Brain_Cerebellum'
  12. 'Brain_Cortex'
  13. 'Brain_Frontal_Cortex_BA9'
  14. 'Brain_Hippocampus'
  15. 'Brain_Hypothalamus'
  16. 'Brain_Nucleus_accumbens_basal_ganglia'
  17. 'Brain_Putamen_basal_ganglia'
  18. 'Brain_Spinal_cord_cervical_c-1'
  19. 'Brain_Substantia_nigra'
  20. 'Breast_Mammary_Tissue'
  21. 'Cells_Cultured_fibroblasts'
  22. 'Cells_EBV-transformed_lymphocytes'
  23. 'Colon_Sigmoid'
  24. 'Colon_Transverse'
  25. 'Esophagus_Gastroesophageal_Junction'
  26. 'Esophagus_Mucosa'
  27. 'Esophagus_Muscularis'
  28. 'Heart_Atrial_Appendage'
  29. 'Heart_Left_Ventricle'
  30. 'Kidney_Cortex'
  31. 'Liver'
  32. 'Lung'
  33. 'Minor_Salivary_Gland'
  34. 'Muscle_Skeletal'
  35. 'Nerve_Tibial'
  36. 'Ovary'
  37. 'Pancreas'
  38. 'Pituitary'
  39. 'Prostate'
  40. 'Skin_Not_Sun_Exposed_Suprapubic'
  41. 'Skin_Sun_Exposed_Lower_leg'
  42. 'Small_Intestine_Terminal_Ileum'
  43. 'Spleen'
  44. 'Stomach'
  45. 'Testis'
  46. 'Thyroid'
  47. 'Uterus'
  48. 'Vagina'
  49. 'Whole_Blood'

Here is the model we previously learned via MASH on V8 data:

In [34]:
mash = readRDS('/project/compbio/GTEx_eQTL/mashr_flashr_workflow_output/FastQTLSumStats.mash.FL_PC3.mash_model_est_v.rds')
In [42]:
dim(mash$fitted_g$Ulist[[1]])
  1. 49
  2. 49
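
The check above looks only at the first matrix in `Ulist`. A hedged sketch that applies the same dimension check to every entry is shown here on a toy list standing in for `mash$fitted_g$Ulist`. The component names and the dimension `R = 3` are placeholders chosen for illustration.

```r
# R is the number of conditions (49 tissues in the real model; 3 in this toy).
R = 3

# Toy stand-in for mash$fitted_g$Ulist: a named list of R x R matrices.
toy_Ulist = list(
  identity      = diag(R),
  equal_effects = matrix(1, R, R)
)

# vapply() returns one TRUE/FALSE per matrix, named after the list entries.
dims_ok = vapply(toy_Ulist, function(U) all(dim(U) == c(R, R)), logical(1))
print(all(dims_ok))
```

Keeping the per-matrix result in `dims_ok` makes it easy to see which entry, if any, has an unexpected shape.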

The model appears to cover 49 tissues. I need to check whether they are the same 49 tissues, in the same order, as in this dataset:

In [44]:
sumstats = readRDS("/home/gaow/GIT/github/mvarbvs/fastqtl_to_mash_output/FastQTLSumStats.mash.rds")
In [49]:
all(colnames(sumstats$strong.z) == colnames(dat$y_res))
TRUE

Great, this is a perfect match! The tissue names and their order agree, so I can safely use the precomputed weights for this analysis. No formatting is needed.
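
One caveat with the `==` comparison used above is that R recycles the shorter vector when the two lengths differ, so `identical()` is the stricter test. Had the tissues matched only up to ordering, a covariance matrix could be reordered with `match()`. The sketch below uses toy data: the tissue lists, the matrix `U`, and the helper `reorder_cov` are hypothetical and not part of mashr.

```r
# Toy tissue orderings (invented for illustration).
model_tissues = c("Liver", "Lung", "Spleen")   # order used when the model was fit
data_tissues  = c("Spleen", "Liver", "Lung")   # column order in the new data

# Toy covariance matrix indexed in the model's tissue order.
U = diag(c(1, 2, 3))
dimnames(U) = list(model_tissues, model_tissues)

# Hypothetical helper: permute rows and columns of U from one ordering to another.
reorder_cov = function(U, from, to) {
  stopifnot(length(from) == length(to),
            setequal(from, to),
            !anyDuplicated(from))
  idx = match(to, from)          # position of each target tissue in the source order
  U[idx, idx, drop = FALSE]
}

same_order = identical(model_tissues, data_tissues)
U_new = reorder_cov(U, model_tissues, data_tissues)
print(same_order)
print(rownames(U_new))
```

In this notebook the orderings already agree, so no such reordering is required.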


Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago