Previously I analyzed GTEx V8 data with MASH. Here I'll format the fitted model for use with the V7 data for which we have extracted genotypes.
%cd ~/Documents/GTExV8/Toys
Here is our dataset:
dat = readRDS('Multi_Tissues.ENSG00000025772.RDS')
head(rownames(dat$X))
tail(rownames(dat$X))
head(rownames(dat$y_res))
tail(rownames(dat$y_res))
colnames(dat$y_res)
Here is the model we previously learned via MASH on V8 data:
mash = readRDS('/project/compbio/GTEx_eQTL/mashr_flashr_workflow_output/FastQTLSumStats.mash.FL_PC3.mash_model_est_v.rds')
dim(mash$fitted_g$Ulist[[1]])
It seems to have 49 tissues. I need to check whether they are the same 49 tissues as in our dataset:
sumstats = readRDS("/home/gaow/GIT/github/mvarbvs/fastqtl_to_mash_output/FastQTLSumStats.mash.rds")
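# identical() also catches length mismatches and NULL column names, which all(a == b) can silently pass
identical(colnames(sumstats$strong.z), colnames(dat$y_res))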
Great, this is a perfect match: the tissue names agree and are in the same order. I can safely use the precomputed weights for this analysis; no formatting is needed.
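For a dataset where this check fails, it helps to see whether the tissues differ or are merely in a different order. Below is a minimal sketch of such a diagnostic. The helper `compare_tissues` and the toy tissue names are hypothetical, made up for illustration, and not part of mashr or of this analysis.

```r
# Hypothetical helper (not part of mashr): compare two vectors of tissue names.
compare_tissues = function(model_tissues, data_tissues) {
  list(
    same_order = identical(model_tissues, data_tissues),   # exact match, order included
    same_set = setequal(model_tissues, data_tissues),      # same names, any order
    only_in_model = setdiff(model_tissues, data_tissues),  # names missing from the data
    only_in_data = setdiff(data_tissues, model_tissues),   # names missing from the model
    # position in data_tissues of each model tissue (NA if absent)
    reorder_idx = match(model_tissues, data_tissues)
  )
}

# Toy tissue names, made up for illustration only
model_tissues = c("Adipose", "Liver", "Lung")
data_tissues = c("Lung", "Adipose", "Liver")

res = compare_tissues(model_tissues, data_tissues)
res$same_order                  # FALSE: the order differs
res$same_set                    # TRUE: the same tissues are present
data_tissues[res$reorder_idx]   # data tissues rearranged into model order
```

In this notebook the two arguments would be `colnames(sumstats$strong.z)` and `colnames(dat$y_res)`. If only the order differed, the columns of `dat$y_res` could be rearranged with `reorder_idx` before applying the precomputed weights.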