Multivariate Bayesian variable selection regression

Numerical comparison plots

Figure to summarize numerical comparison results. See this notebook for its input data.

I plan to make three types of comparisons: oracle, mismatch and default. In particular, I'll put together results from all scenarios (averaged), the singleton scenario, and the shared scenario (with heterogeneous effect sizes), in three panels, for six quantities (a sketch of the scenario-labeling rule follows this list):

  • size
  • purity
  • coverage
  • power
  • per condition FDR
  • per condition power
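As a quick reference, here is a minimal sketch of the scenario-labeling rule that the cells in this notebook apply inline to each summary table: a run counts as oracle when the fitted prior matches the simulated pattern, mismatch when it does not, and default when the method is the mixture_1 prior. The helper label_scenario is hypothetical and only illustrates the rule; it is not used below.

label_scenario <- function(method, pattern) {
  # Hypothetical helper, for illustration only: mixture_1 is always 'default';
  # otherwise the label depends on whether the fitted prior matches the pattern.
  ifelse(method == "mixture_1", "default",
         ifelse(method == pattern, "oracle", "mismatch"))
}
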
In [1]:
%cd ~/GIT/github/mnm-twas/dsc
/scratch/midway2/gaow/GIT/github/mnm-twas/dsc

Load and organize data

In [2]:
res = readRDS('../data/finemap_output.query_result.rds')
res = res[,c(2,4,5,6,7,8,9,10,11,12,13,14,15)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap', 'false_positive_cross_cond', 'false_negative_cross_cond', 'true_positive_cross_cond')
In [3]:
oracle = "oracle"
mismatch = "mismatch"
default = "default"

Purity

In [4]:
purity = aggregate(purity~pattern + method, res, mean)
purity$scenario = rep(NA, nrow(purity))
purity$scenario[which(purity$method == purity$pattern & purity$method != 'mixture_1')] = oracle
purity$scenario[which(purity$method != purity$pattern & purity$method != 'mixture_1')] = mismatch
purity$scenario[which(purity$method == "mixture_1")] = default
purity = purity[which(!is.na(purity$scenario)),]
purity_median = aggregate(purity~scenario, purity, median)
purity_median
scenario  purity
default   0.9847903
mismatch  0.9360424
oracle    0.9837503
In [5]:
purity_singleton = purity[which(purity$pattern == 'singleton'),]
purity_median_singleton = aggregate(purity~scenario, purity_singleton, median)
purity_median_singleton
scenario  purity
default   0.8568686
mismatch  0.8554835
oracle    0.8699070
In [6]:
purity_het = purity[which(purity$pattern == 'low_het'),]
purity_median_het = aggregate(purity~scenario, purity_het, median)
purity_median_het
scenario  purity
default   0.9851506
mismatch  0.9835865
oracle    0.9854448

Size

In [7]:
size = aggregate(size~pattern + method, res, mean)
size$scenario = rep(NA, nrow(size))
size$scenario[which(size$method == size$pattern & size$method != 'mixture_1')] = oracle
size$scenario[which(size$method != size$pattern & size$method != 'mixture_1')] = mismatch
size$scenario[which(size$method == "mixture_1")] = default
size = size[which(!is.na(size$scenario)),]
size_median = aggregate(size~scenario, size, median)
size_singleton = size[which(size$pattern == 'singleton'),]
size_median_singleton = aggregate(size~scenario, size_singleton, median)
size_het = size[which(size$pattern == 'low_het'),]
size_median_het = aggregate(size~scenario, size_het, median)
In [8]:
size_median
scenario  size
default   9.771
mismatch  11.613
oracle    9.782
In [9]:
size_median_singleton
scenario  size
default   17.714
mismatch  17.291
oracle    18.366
In [10]:
size_median_het
scenario  size
default   8.679
mismatch  8.797
oracle    8.624

Coverage

In [11]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
In [12]:
fdr$scenario = rep(NA, nrow(fdr))
fdr$scenario[which(fdr$method == fdr$pattern & fdr$method != 'mixture_1')] = oracle
fdr$scenario[which(fdr$method != fdr$pattern & fdr$method != 'mixture_1')] = mismatch
fdr$scenario[which(fdr$method == "mixture_1")] = default
fdr = fdr[which(!is.na(fdr$scenario)),]
fdr_mean = aggregate(fdr~scenario, fdr, mean)
fdr_singleton = fdr[which(fdr$pattern == 'singleton'),]
fdr_mean_singleton = aggregate(fdr~scenario, fdr_singleton, mean)
fdr_het = fdr[which(fdr$pattern == 'low_het'),]
fdr_mean_het = aggregate(fdr~scenario, fdr_het, mean)
In [13]:
fdr_mean
scenario  fdr
default   0.05729344
mismatch  0.07227381
oracle    0.07222584
In [14]:
fdr_mean_singleton
scenario  fdr
default   0.05464481
mismatch  0.08773907
oracle    0.05663717
In [15]:
fdr_mean_het
scenario  fdr
default   0.05804111
mismatch  0.06680457
oracle    0.07048984

Power

In [16]:
total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
In [17]:
power$scenario = rep(NA, nrow(power))
power$scenario[which(power$method == power$pattern & power$method != 'mixture_1')] = oracle
power$scenario[which(power$method != power$pattern & power$method != 'mixture_1')] = mismatch
power$scenario[which(power$method == "mixture_1")] = default
power = power[which(!is.na(power$scenario)),]
power_mean = aggregate(power~scenario, power, mean)
power_singleton = power[which(power$pattern == 'singleton'),]
power_mean_singleton = aggregate(power~scenario, power_singleton, mean)
power_het = power[which(power$pattern == 'low_het'),]
power_mean_het = aggregate(power~scenario, power_het, mean)
In [18]:
power_mean
scenario  power
default   0.8623861
mismatch  0.8328877
oracle    0.8769862
In [19]:
power_mean_singleton
scenario  power
default   0.6433824
mismatch  0.5791667
oracle    0.6629902
In [20]:
power_mean_het
scenario  power
default   0.9183197
mismatch  0.8991832
oracle    0.9148191

FDR per condition

In [21]:
tp = aggregate(true_positive_cross_cond ~ pattern + method, res, sum)
fp = aggregate(false_positive_cross_cond ~ pattern + method, res, sum)
fdr_cond = merge(tp, fp, by = c("pattern", "method"))
fdr_cond$fdr_cond = fdr_cond$false_positive_cross_cond/(fdr_cond$true_positive_cross_cond + fdr_cond$false_positive_cross_cond)
fdr_cond = fdr_cond[order(fdr_cond$method),]
In [22]:
fdr_cond$scenario = rep(NA, nrow(fdr_cond))
fdr_cond$scenario[which(fdr_cond$method == fdr_cond$pattern & fdr_cond$method != 'mixture_1')] = oracle
fdr_cond$scenario[which(fdr_cond$method != fdr_cond$pattern & fdr_cond$method != 'mixture_1')] = mismatch
fdr_cond$scenario[which(fdr_cond$method == "mixture_1")] = default
fdr_cond = fdr_cond[which(!is.na(fdr_cond$scenario)),]
fdr_cond_mean = aggregate(fdr_cond~scenario, fdr_cond, mean)
fdr_cond_singleton = fdr_cond[which(fdr_cond$pattern == 'singleton'),]
fdr_cond_mean_singleton = aggregate(fdr_cond~scenario, fdr_cond_singleton, mean)
fdr_cond_het = fdr_cond[which(fdr_cond$pattern == 'low_het'),]
fdr_cond_mean_het = aggregate(fdr_cond~scenario, fdr_cond_het, mean)
In [23]:
fdr_cond_mean
scenario  fdr_cond
default   0.05695558
mismatch  0.11434839
oracle    0.06458136
In [24]:
fdr_cond_mean_singleton
scenario  fdr_cond
default   0.05978261
mismatch  0.41482803
oracle    0.05663717
In [25]:
fdr_cond_mean_het
scenario  fdr_cond
default   0.05559602
mismatch  0.06129201
oracle    0.06349590

Power per condition

In [26]:
tp = aggregate(true_positive_cross_cond ~ pattern + method, res, sum)
fn = aggregate(false_negative_cross_cond ~ pattern + method, res, sum)
power_cond = merge(tp, fn, by = c("pattern", "method"))
power_cond$power_cond = power_cond$true_positive_cross_cond/(power_cond$true_positive_cross_cond + power_cond$false_negative_cross_cond)
In [27]:
power_cond$scenario = rep(NA, nrow(power_cond))
power_cond$scenario[which(power_cond$method == power_cond$pattern & power_cond$method != 'mixture_1')] = oracle
power_cond$scenario[which(power_cond$method != power_cond$pattern & power_cond$method != 'mixture_1')] = mismatch
power_cond$scenario[which(power_cond$method == "mixture_1")] = default
power_cond = power_cond[which(!is.na(power_cond$scenario)),]
power_cond_mean = aggregate(power_cond~scenario, power_cond, mean)
power_cond_singleton = power_cond[which(power_cond$pattern == 'singleton'),]
power_cond_mean_singleton = aggregate(power_cond~scenario, power_cond_singleton, mean)
power_cond_het = power_cond[which(power_cond$pattern == 'low_het'),]
power_cond_mean_het = aggregate(power_cond~scenario, power_cond_het, mean)
In [28]:
power_cond_mean
scenario  power_cond
default   0.9952224
mismatch  0.8367604
oracle    0.9861663
In [29]:
power_cond_mean_singleton
scenario  power_cond
default   1.0000000
mismatch  0.9766568
oracle    1.0000000
In [30]:
power_cond_mean_het
scenario  power_cond
default   0.9959037
mismatch  0.8163650
oracle    0.9890362

Notice that the per-condition power looks a lot higher than in the overall analysis. This is because the most powerful tests are for shared effects, and each shared signal is counted $R$ times (once per condition) in the per-condition analysis but only once in the overall assessment. Singleton signals are therefore relatively more abundant in the overall assessment, which pulls its average power down.
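To make the counting concrete, here is a small sketch with made-up numbers (not simulation output), assuming R = 5 conditions: a detected shared signal adds 5 true positives to the per-condition tally but only 1 to the overall tally, so shared signals dominate the per-condition average.

# Illustrative only: hypothetical counts, not taken from the simulation results.
R <- 5
detected <- c(shared = 8, singleton = 2)   # signals captured by a CS
missed   <- c(shared = 1, singleton = 3)   # signals not captured
overall_power <- sum(detected) / sum(detected + missed)
percond_power <- (R * detected["shared"] + detected["singleton"]) /
                 (R * (detected["shared"] + missed["shared"]) +
                      detected["singleton"] + missed["singleton"])
c(overall = overall_power, per_condition = unname(percond_power))
# overall ~ 0.71, per-condition ~ 0.84 with these hypothetical counts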

In [31]:
ls()
  1. 'default'
  2. 'fdr'
  3. 'fdr_cond'
  4. 'fdr_cond_het'
  5. 'fdr_cond_mean'
  6. 'fdr_cond_mean_het'
  7. 'fdr_cond_mean_singleton'
  8. 'fdr_cond_singleton'
  9. 'fdr_het'
  10. 'fdr_mean'
  11. 'fdr_mean_het'
  12. 'fdr_mean_singleton'
  13. 'fdr_singleton'
  14. 'fn'
  15. 'fp'
  16. 'mismatch'
  17. 'oracle'
  18. 'overlap'
  19. 'power'
  20. 'power_cond'
  21. 'power_cond_het'
  22. 'power_cond_mean'
  23. 'power_cond_mean_het'
  24. 'power_cond_mean_singleton'
  25. 'power_cond_singleton'
  26. 'power_het'
  27. 'power_mean'
  28. 'power_mean_het'
  29. 'power_mean_singleton'
  30. 'power_singleton'
  31. 'purity'
  32. 'purity_het'
  33. 'purity_median'
  34. 'purity_median_het'
  35. 'purity_median_singleton'
  36. 'purity_singleton'
  37. 'res'
  38. 'size'
  39. 'size_het'
  40. 'size_median'
  41. 'size_median_het'
  42. 'size_median_singleton'
  43. 'size_singleton'
  44. 'total'
  45. 'total_true'
  46. 'total_true_included'
  47. 'tp'
  48. 'valid'
In [32]:
output = list(fdr_cond=fdr_cond_mean, fdr_cond_het=fdr_cond_mean_het, fdr_cond_singleton=fdr_cond_mean_singleton,
             fdr=fdr_mean, fdr_het=fdr_mean_het, fdr_singleton=fdr_mean_singleton, 
             power_cond=power_cond_mean, power_cond_het=power_cond_mean_het, power_cond_singleton=power_cond_mean_singleton,
             power=power_mean, power_het=power_mean_het, power_singleton=power_mean_singleton,
             purity=purity_median, purity_het=purity_median_het, purity_singleton=purity_median_singleton,
             size=size_median, size_het=size_median_het, size_singleton=size_median_singleton)
saveRDS(output, '../data/finemap_output.summarized_result.rds')

Make plot

In [33]:
output = readRDS('../data/finemap_output.summarized_result.rds')
In [34]:
get_table = function(output, key) {
    # Merge the shared ("het"), singleton and overall summaries on scenario,
    # then melt to long format: 3 shared rows, 3 singleton rows, 3 overall rows.
    dat = Reduce(function(x, y) merge(x, y, by = 'scenario'),
                 list(output[[paste0(key, '_het')]], output[[paste0(key, '_singleton')]], output[[key]]))
    dat = reshape2::melt(dat, id.vars = c("scenario"))
    dat$variable = NULL
    # Attach the panel label plus plotting orders: o1 orders the panels
    # (shared, singleton, all) and o2 orders the methods (oracle, default, mismatch).
    dat = cbind(dat, c(rep('shared',3), rep('singleton',3), rep('all',3)))
    dat = cbind(dat, c(1,1,1,2,2,2,3,3,3))
    dat = cbind(dat, c(2,3,1,2,3,1,2,3,1))
    # The original scenario labels (default/mismatch/oracle) become 'method';
    # the new panel label takes over the name 'scenario'.
    names(dat) = c('method', key, 'scenario', 'o1', 'o2')
    return(dat)
}
In [35]:
colors = RColorBrewer::brewer.pal(n = 8, name = "Set1")
colors
  1. '#E41A1C'
  2. '#377EB8'
  3. '#4DAF4A'
  4. '#984EA3'
  5. '#FF7F00'
  6. '#FFFF33'
  7. '#A65628'
  8. '#F781BF'
In [36]:
library(lattice)
# THE_COLOR is a global set right before each plot below.
default_panel = function(x, y, ...) {
    panel.points(x, y, col=THE_COLOR, pch=16, cex=1.1)
}
W = 3.2
H2 = 4
H1 = 2.9

Size

In [37]:
size = get_table(output, 'size')
In [38]:
pdf('1.pdf', width=W, height=H1)
THE_COLOR=colors[3]
dotplot(size ~ reorder(method, o2) |  reorder(scenario, o1), data=size,  
        auto.key=list(columns=2), scales=list(alternating=1, x = list(draw=F)), 
        layout = c(3,1), angle = 45, panel=default_panel,
        xlab="", ylab = "Median number of variables")
dev.off()
png: 2
In [39]:
size
method    size    scenario   o1  o2
default   8.679   shared     1   2
mismatch  8.797   shared     1   3
oracle    8.624   shared     1   1
default   17.714  singleton  2   2
mismatch  17.291  singleton  2   3
oracle    18.366  singleton  2   1
default   9.771   all        3   2
mismatch  11.613  all        3   3
oracle    9.782   all        3   1
In [40]:
output$size
scenario  size
default   9.771
mismatch  11.613
oracle    9.782
In [41]:
output$size_singleton
scenario  size
default   17.714
mismatch  17.291
oracle    18.366
In [42]:
output$size_het
scenario  size
default   8.679
mismatch  8.797
oracle    8.624
In [43]:
%preview 1.pdf -s png --dpi 100
> 1.pdf (4.6 KiB):

Purity

In [44]:
purity = get_table(output, 'purity')
In [45]:
pdf('2.pdf', width=W, height=H1)
THE_COLOR=colors[2]
dotplot(purity ~ reorder(method, o2) | reorder(scenario, o1), data=purity,  
        auto.key=list(columns=2), scales=list(alternating=1, x = list(draw = F)), 
        layout = c(3,1), angle = 45, panel=default_panel, ylim = c(0.8,1),
        xlab="", ylab = "Minimum absolute correlation")
dev.off()
png: 2
In [46]:
%preview 2.pdf -s png --dpi 100
> 2.pdf (4.6 KiB):

Overall FDR of CS

In [47]:
fdr = get_table(output, 'fdr')
In [48]:
pdf('3.pdf', width=W, height=H1)
THE_COLOR = colors[1]
my_panel = function(x, y, subscripts, ...) {
    panel.points(x, y, col=THE_COLOR, pch=16, cex=1.1)
    panel.abline(h=0.95, col = "#800000", lty = 2)
}
fdr$coverage = 1 - fdr$fdr
dotplot(coverage ~ reorder(method, o2) |  reorder(scenario, o1), data=fdr,  
        auto.key=list(columns=2), scales=list(alternating=1, x = list(draw = F)), 
        layout = c(3,1), angle = 45, panel = my_panel,
        xlab="", ylab = "Cross-condition coverage", ylim = c(0.4,1))
dev.off()
png: 2
In [49]:
%preview 3.pdf -s png --dpi 100
> 3.pdf (4.6 KiB):

Overall power of CS

In [50]:
power = get_table(output, 'power')
pdf('4.pdf', width=W, height=H2)
THE_COLOR = colors[4]
dotplot(power ~ reorder(method, o2) |  reorder(scenario, o1), data=power,  
        auto.key=list(columns=2), scales=list(alternating=1, x = list(rot = 45)),
        layout = c(3,1), angle = 45, panel = default_panel,
        xlab="Method", ylab = "Cross-condition power", ylim = c(0.4,1.05))
dev.off()
png: 2
In [51]:
%preview 4.pdf -s png --dpi 100
> 4.pdf (4.7 KiB):

Condition specific FDR of CS using lfsr filters

In [52]:
fdr_cond = get_table(output, 'fdr_cond')
pdf('5.pdf', width=W, height=H2)
THE_COLOR = colors[5]
my_panel = function(x, y, subscripts, ...) {
    panel.points(x, y, col=THE_COLOR, pch=16, cex=1.1)
    panel.abline(h=0.05, col = "#800000", lty = 2)
}

dotplot(fdr_cond ~ reorder(method, o2) |  reorder(scenario, o1), data=fdr_cond,  
        auto.key=list(columns=2), scales=list(alternating=1, x = list(rot = 45)),
        layout = c(3,1), angle = 45, panel = my_panel, ylim = c(0,0.6),
        xlab="Method", ylab = "Condition specific FDR (for lfsr<0.05)")
dev.off()
png: 2
In [53]:
%preview 5.pdf -s png --dpi 100
> 5.pdf (4.8 KiB):

Condition specific power for CS using lfsr filters

In [54]:
power_cond = get_table(output, 'power_cond')
pdf('6.pdf', width=W, height=H2)
THE_COLOR = colors[8]
dotplot(power_cond ~ reorder(method, o2) |  reorder(scenario, o1), data=power_cond,  
        auto.key=list(columns=2), scales=list(alternating=1, x = list(rot = 45)),
        layout = c(3,1), angle = 45, panel = default_panel, ylim = c(0.75,1.05),
        xlab="Method", ylab = "Condition specific power (for lfsr<0.05)")
dev.off()
png: 2
In [55]:
%preview 6.pdf -s png --dpi 100
> 6.pdf (4.8 KiB):
In [56]:
mkdir -p finemap_output
convert -density 500 -quality 100 \
        \( 1.pdf 2.pdf 3.pdf -density 500 -quality 100 +append \) \
        \( 4.pdf 5.pdf 6.pdf +append \) \
        -append finemap_output/finemap_output_comparisons.png # add parameter -flatten to make it non-transparent
convert: /scratch/midway2/gaow/miniconda3/lib/libuuid.so.1: no version information available (required by /lib64/libSM.so.6)
In [57]:
%preview finemap_output/finemap_output_comparisons.png --width 50%
> finemap_output/finemap_output_comparisons.png (188.4 KiB):

Copyright © 2016-2020 Gao Wang et al at Stephens Lab, University of Chicago