M&M ASH benchmark VI¶

This is a continuation of Part V, where total PVE is set to 0.1 and each region is assumed to have 1 or 2 causal variables. I added an evaluation of lfsr per condition.

The most important difference from previous simulations is that here I mix and match: data simulated under each prior assumption are analyzed with each of the different priors. I expect to observe that:

- The "oracle" prior is mostly better than the other priors, in all scenarios.
- The mixture prior generally performs well in all scenarios -- it is robust to simulation assumptions.
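
The mix-and-match design can be sketched as a grid that crosses each simulation scenario with each analysis prior. The labels below are taken from the result tables in this notebook. Treating `mixture01` and `mixture_1` as the matching (oracle) pair is my assumption, and `design` and `oracle_for` are hypothetical names used only for illustration.

```r
# Scenario labels (simulation prior) and method labels (analysis prior),
# copied from the result tables in this notebook.
patterns <- c("high_het", "identity", "low_het", "mid_het",
              "mixture01", "shared", "singleton")
methods  <- c("high_het", "identity", "low_het", "mid_het",
              "mixture_1", "shared", "singleton")

# Assumption: the analysis prior at the same position is the "oracle"
# for each scenario (in particular mixture01 <-> mixture_1).
oracle_for <- setNames(methods, patterns)

# Cross every scenario with every analysis prior.
design <- expand.grid(pattern = patterns, method = methods,
                      stringsAsFactors = FALSE)
design$oracle <- unname(design$method == oracle_for[design$pattern])

nrow(design)        # number of scenario-by-prior combinations
sum(design$oracle)  # number of oracle combinations
```

Each row of `design` corresponds to one (pattern, method) row in the summary tables, so the oracle rows can be picked out with `design[design$oracle, ]`.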

Conclusion¶

The expected observations above are both true, with some interesting exceptions:
- The "oracle" mixture prior is not better than the mixture prior applied to some other scenarios. Is the mixture prior overfitting?
- The singleton oracle performs poorly.

Other observations:
- Power table: model mis-specification results in overlaps, but the mixture prior shows no overlap issue.
- Overlaps among the singleton results are prevalent, as expected.
- The mixture prior has great FDR control on CS.
- The mixture prior has good lfsr control on effect estimates.

The benchmark was executed on UChicago Midway:

./finemap.dsc --host mnm_R5.yml --R 5 -c 12

This executes the default pipeline in the finemap.dsc file, as of today (2019.02.04).

%cd ~/GIT/github/mnm-twas/dsc

/home/gaow/Documents/GIT/github/mnm-twas/dsc

# Time the query over all finemap_output results.
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.included_causal susie_scores.overlap susie_scores.false_pos_cond_discoveries susie_scores.false_neg_cond_discoveries susie_scores.true_cond_discoveries", omit.file.columns = TRUE, verbose = FALSE)
end_time <- Sys.time()

end_time - start_time

Time difference of 13.34753 mins

head(out)

dim(out)

saveRDS(out, '../data/finemap_output.query_result.rds')

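# Drop columns 1 (DSC) and 3 (mnm); keep sharing_pattern, mnm.eff_mode and the susie_scores columns.
res = out[,c(2,4,5,6,7,8,9,10,11,12,13,14,15)]
# Short column names, in the same order as the query targets.
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap', 'false_positive_cross_cond', 'false_negative_cross_cond', 'true_positive_cross_cond')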

Purity of CS¶

purity = aggregate(purity~pattern + method, res, mean)
purity

aggregate(purity~method, purity, mean)

Size of CS¶

size = aggregate(size~pattern+method, res, median)
size

aggregate(size~method, size, mean)

Power of CS¶

total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
power = power[order(power$method),]
power

aggregate(power~method, power, mean)

FDR of CS¶

valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr = fdr[order(fdr$method),]
fdr

aggregate(fdr~method, fdr, mean)

Power for per signal per condition estimates¶

We compute lfsr (local false sign rate) on a per-signal, per-condition basis. An effect is called a signal in a given condition if its lfsr is smaller than 0.05.
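
As a minimal sketch of this rule, the toy example below calls a signal wherever lfsr < 0.05 and tallies the calls against a known truth. The matrices `lfsr` and `is_nonzero` are made-up illustrative inputs, and the tally ignores effect signs. The actual scoring in the `susie_scores` module may differ.

```r
# Toy input: 2 signals (rows) by 3 conditions (columns). Values are made up.
lfsr <- matrix(c(0.01, 0.20, 0.03,
                 0.04, 0.60, 0.001),
               nrow = 2, byrow = TRUE)

# Assumed ground truth: TRUE where the simulated effect is nonzero.
is_nonzero <- matrix(c(TRUE, TRUE,  FALSE,
                       TRUE, FALSE, TRUE),
                     nrow = 2, byrow = TRUE)

# Call a signal in a condition when lfsr is below the threshold.
threshold <- 0.05
called <- lfsr < threshold

# Tally calls against the truth.
true_pos  <- sum(called & is_nonzero)
false_pos <- sum(called & !is_nonzero)
false_neg <- sum(!called & is_nonzero)

# Same ratios as used for the per-condition power and FDR tables.
power <- true_pos / (true_pos + false_neg)
fdr   <- false_pos / (true_pos + false_pos)
```
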

tp = aggregate(true_positive_cross_cond ~ pattern + method, res, sum)
fn = aggregate(false_negative_cross_cond ~ pattern + method, res, sum)
power = merge(tp, fn, by = c("pattern", "method"))

power$power = power$true_positive_cross_cond/(power$true_positive_cross_cond + power$false_negative_cross_cond)
power = power[order(power$method),]
power

aggregate(power~method, power, mean)

FDR for per signal per condition estimates¶

tp = aggregate(true_positive_cross_cond ~ pattern + method, res, sum)
fp = aggregate(false_positive_cross_cond ~ pattern + method, res, sum)
fdr = merge(tp, fp, by = c("pattern", "method"))
fdr$fdr = fdr$false_positive_cross_cond/(fdr$true_positive_cross_cond + fdr$false_positive_cross_cond)
fdr = fdr[order(fdr$method),]
fdr

Performance of effect size estimates¶

Total number of true discoveries over total number of signals to detect??
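
One possible reading of the ratio above is a pooled power: sum the true discoveries over all scenarios and divide by the total number of signals to detect, rather than averaging per-scenario ratios as the summaries in this notebook do. The sketch below contrasts the two, using the per-condition counts reported for the mixture_1 prior in the output tables. `counts`, `pooled_power` and `mean_power` are hypothetical names, and this is only one interpretation of the metric.

```r
# Per-condition counts for the mixture_1 prior, copied from the
# per-signal, per-condition power table in this notebook.
counts <- data.frame(
  pattern = c("high_het", "identity", "low_het", "mid_het",
              "mixture01", "shared", "singleton"),
  tp = c(3860, 3888, 3890, 3959, 2850, 3955, 519),
  fn = c(36, 20, 16, 36, 7, 14, 0)
)

# Pooled: total true discoveries over total signals to detect.
pooled_power <- sum(counts$tp) / sum(counts$tp + counts$fn)

# Mean of per-scenario ratios, as in aggregate(power ~ method, power, mean).
mean_power <- mean(counts$tp / (counts$tp + counts$fn))
```

The pooled version weights each scenario by its number of signals, so scenarios with few signals (such as singleton) contribute less than under the unweighted mean.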

DSC	sharing_pattern	mnm	mnm.eff_mode	susie_scores.total	susie_scores.valid	susie_scores.size	susie_scores.purity	susie_scores.top	susie_scores.n_causal	susie_scores.included_causal	susie_scores.false_pos_cond_discoveries	susie_scores.false_neg_cond_discoveries	susie_scores.true_cond_discoveries
1	identity	mnm_identity	identity	2	1	16	0.9314858	0	1	1	3	2	5
1	identity	mnm_identity	identity	1	1	1	1.0000000	1	1	1	0	0	5
1	identity	mnm_identity	identity	2	2	5	0.9823716	0	2	2	0	0	10
1	identity	mnm_identity	identity	2	2	12	0.9753366	2	2	2	0	0	10
1	identity	mnm_identity	identity	3	3	9	0.9706318	1	3	3	0	0	15
1	identity	mnm_identity	identity	1	1	4	0.9939019	1	1	1	0	0	5

pattern	method	purity
high_het	high_het	0.9827047
identity	high_het	0.9842457
low_het	high_het	0.9841991
mid_het	high_het	0.9833802
mixture01	high_het	0.9365145
shared	high_het	0.9824573
singleton	high_het	0.8627706
high_het	identity	0.9823101
identity	identity	0.9841695
low_het	identity	0.9835865
mid_het	identity	0.9829505
mixture01	identity	0.9355703
shared	identity	0.9825129
singleton	identity	0.8587500
high_het	low_het	0.9833144
identity	low_het	0.9852966
low_het	low_het	0.9854448
mid_het	low_het	0.9833520
mixture01	low_het	0.9304827
shared	low_het	0.9838405
singleton	low_het	0.8329095
high_het	mid_het	0.9838315
identity	mid_het	0.9838237
low_het	mid_het	0.9848168
mid_het	mid_het	0.9833310
mixture01	mid_het	0.9321916
shared	mid_het	0.9833230
singleton	mid_het	0.8554835
high_het	mixture_1	0.9847903
identity	mixture_1	0.9858474
low_het	mixture_1	0.9851506
mid_het	mixture_1	0.9852112
mixture01	mixture_1	0.9367697
shared	mixture_1	0.9846586
singleton	mixture_1	0.8568686
high_het	shared	0.8838506
identity	shared	0.8531477
low_het	shared	0.9554263
mid_het	shared	0.8993438
mixture01	shared	0.7760270
shared	shared	0.9849261
singleton	shared	0.3716790
high_het	singleton	0.8938148
identity	singleton	0.8905021
low_het	singleton	0.8991093
mid_het	singleton	0.8925364
mixture01	singleton	0.8730581
shared	singleton	0.8994284
singleton	singleton	0.8699070

method	purity
high_het	0.9594674
identity	0.9585500
low_het	0.9549486
mid_het	0.9581144
mixture_1	0.9598995
shared	0.8177715
singleton	0.8883366

pattern	method	size
high_het	high_het	3.00
identity	high_het	3.50
low_het	high_het	3.50
mid_het	high_het	4.00
mixture01	high_het	5.00
shared	high_het	4.00
singleton	high_het	6.00
high_het	identity	3.00
identity	identity	3.50
low_het	identity	3.50
mid_het	identity	4.00
mixture01	identity	5.00
shared	identity	4.00
singleton	identity	6.00
high_het	low_het	3.00
identity	low_het	3.50
low_het	low_het	3.00
mid_het	low_het	4.00
mixture01	low_het	5.00
shared	low_het	4.00
singleton	low_het	6.00
high_het	mid_het	3.00
identity	mid_het	3.50
low_het	mid_het	3.00
mid_het	mid_het	4.00
mixture01	mid_het	5.00
shared	mid_het	4.00
singleton	mid_het	6.00
high_het	mixture_1	3.00
identity	mixture_1	3.25
low_het	mixture_1	3.00
mid_het	mixture_1	3.50
mixture01	mixture_1	5.00
shared	mixture_1	4.00
singleton	mixture_1	6.00
high_het	shared	4.00
identity	shared	5.00
low_het	shared	4.00
mid_het	shared	5.00
mixture01	shared	4.00
shared	shared	4.00
singleton	shared	0.00
high_het	singleton	10.00
identity	singleton	10.00
low_het	singleton	10.00
mid_het	singleton	10.50
mixture01	singleton	10.00
shared	singleton	11.00
singleton	singleton	6.50

method	size
high_het	4.142857
identity	4.142857
low_het	4.071429
mid_het	4.071429
mixture_1	3.964286
shared	3.714286
singleton	9.714286

	pattern	method	total_true_included	total_true	overlap	power
1	high_het	high_het	792	874	0.084	0.9061785
8	identity	high_het	793	856	0.024	0.9264019
15	low_het	high_het	786	857	0.078	0.9171529
22	mid_het	high_het	809	881	0.000	0.9182747
29	mixture01	high_het	706	849	0.212	0.8315665
36	shared	high_het	789	854	0.256	0.9238876
43	singleton	high_het	548	816	0.000	0.6715686
2	high_het	identity	791	874	0.138	0.9050343
9	identity	identity	794	856	0.024	0.9275701
16	low_het	identity	789	857	0.116	0.9206534
23	mid_het	identity	809	881	0.000	0.9182747
30	mixture01	identity	704	849	0.206	0.8292108
37	shared	identity	791	854	0.292	0.9262295
44	singleton	identity	546	816	0.000	0.6691176
3	high_het	low_het	796	874	0.054	0.9107551
10	identity	low_het	794	856	0.390	0.9275701
17	low_het	low_het	784	857	0.038	0.9148191
24	mid_het	low_het	818	881	0.384	0.9284904
31	mixture01	low_het	696	849	0.262	0.8197880
38	shared	low_het	791	854	0.256	0.9262295
45	singleton	low_het	517	816	0.000	0.6335784
4	high_het	mid_het	793	874	0.084	0.9073227
11	identity	mid_het	794	856	0.024	0.9275701
18	low_het	mid_het	784	857	0.018	0.9148191
25	mid_het	mid_het	809	881	0.000	0.9182747
32	mixture01	mid_het	705	849	0.250	0.8303887
39	shared	mid_het	789	854	0.256	0.9238876
46	singleton	mid_het	536	816	0.000	0.6568627
5	high_het	mixture_1	783	874	0.000	0.8958810
12	identity	mixture_1	783	856	0.000	0.9147196
19	low_het	mixture_1	787	857	0.068	0.9183197
26	mid_het	mixture_1	806	881	0.000	0.9148695
33	mixture01	mixture_1	696	849	0.036	0.8197880
40	shared	mixture_1	794	854	0.034	0.9297424
47	singleton	mixture_1	525	816	0.000	0.6433824
6	high_het	shared	685	874	3.718	0.7837529
13	identity	shared	665	856	5.768	0.7768692
20	low_het	shared	770	857	2.226	0.8984831
27	mid_het	shared	725	881	3.076	0.8229285
34	mixture01	shared	589	849	5.840	0.6937574
41	shared	shared	796	854	0.270	0.9320843
48	singleton	shared	216	816	0.000	0.2647059
7	high_het	singleton	747	874	126.100	0.8546911
14	identity	singleton	727	856	125.698	0.8492991
21	low_het	singleton	724	857	116.864	0.8448075
28	mid_het	singleton	759	881	144.052	0.8615210
35	mixture01	singleton	663	849	104.002	0.7809187
42	shared	singleton	758	854	128.340	0.8875878
49	singleton	singleton	541	816	0.000	0.6629902

method	power
high_het	0.8707187
identity	0.8708701
low_het	0.8658901
mid_het	0.8684465
mixture_1	0.8623861
shared	0.7389402
singleton	0.8202593

	pattern	method	valid	total	fdr
1	high_het	high_het	787	868	0.09331797
8	identity	high_het	792	859	0.07799767
15	low_het	high_het	782	843	0.07236062
22	mid_het	high_het	800	875	0.08571429
29	mixture01	high_het	698	759	0.08036891
36	shared	high_het	788	856	0.07943925
43	singleton	high_het	540	603	0.10447761
2	high_het	identity	787	868	0.09331797
9	identity	identity	793	859	0.07683353
16	low_het	identity	786	844	0.06872038
23	mid_het	identity	800	875	0.08571429
30	mixture01	identity	696	758	0.08179420
37	shared	identity	791	858	0.07808858
44	singleton	identity	539	602	0.10465116
3	high_het	low_het	791	856	0.07593458
10	identity	low_het	794	849	0.06478210
17	low_het	low_het	778	837	0.07048984
24	mid_het	low_het	809	861	0.06039489
31	mixture01	low_het	690	742	0.07008086
38	shared	low_het	790	852	0.07276995
45	singleton	low_het	509	555	0.08288288
4	high_het	mid_het	787	863	0.08806489
11	identity	mid_het	793	859	0.07683353
18	low_het	mid_het	779	839	0.07151371
25	mid_het	mid_het	800	872	0.08256881
32	mixture01	mid_het	698	759	0.08036891
39	shared	mid_het	788	855	0.07836257
46	singleton	mid_het	528	588	0.10204082
5	high_het	mixture_1	777	832	0.06610577
12	identity	mixture_1	781	831	0.06016847
19	low_het	mixture_1	779	827	0.05804111
26	mid_het	mixture_1	796	845	0.05798817
33	mixture01	mixture_1	688	728	0.05494505
40	shared	mixture_1	793	834	0.04916067
47	singleton	mixture_1	519	549	0.05464481
6	high_het	shared	727	757	0.03963012
13	identity	shared	719	746	0.03619303
20	low_het	shared	801	849	0.05653710
27	mid_het	shared	758	793	0.04413619
34	mixture01	shared	613	642	0.04517134
41	shared	shared	796	841	0.05350773
48	singleton	shared	214	224	0.04464286
7	high_het	singleton	1890	2020	0.06435644
14	identity	singleton	1946	2087	0.06756109
21	low_het	singleton	1931	2065	0.06489104
28	mid_het	singleton	1998	2147	0.06939916
35	mixture01	singleton	1587	1696	0.06426887
42	shared	singleton	2084	2237	0.06839517
49	singleton	singleton	533	565	0.05663717

method	fdr
high_het	0.08481090
identity	0.08416001
low_het	0.07104787
mid_het	0.08282189
mixture_1	0.05729344
shared	0.04568834
singleton	0.06507271

	pattern	method	true_positive_cross_cond	false_negative_cross_cond	power
1	high_het	high_het	3901	105	0.9737893
8	identity	high_het	3938	74	0.9815553
15	low_het	high_het	3895	64	0.9838343
22	mid_het	high_het	3967	99	0.9756517
29	mixture01	high_het	2890	51	0.9826590
36	shared	high_het	3927	84	0.9790576
43	singleton	high_het	540	18	0.9677419
2	high_het	identity	3908	95	0.9762678
9	identity	identity	3941	79	0.9803483
16	low_het	identity	3906	71	0.9821473
23	mid_het	identity	3968	104	0.9744597
30	mixture01	identity	2886	56	0.9809653
37	shared	identity	3935	99	0.9754586
44	singleton	identity	539	20	0.9642218
3	high_het	low_het	3925	72	0.9819865
10	identity	low_het	3938	50	0.9874624
17	low_het	low_het	3879	43	0.9890362
24	mid_het	low_het	4012	64	0.9842983
31	mixture01	low_het	2888	29	0.9900583
38	shared	low_het	3939	61	0.9847500
45	singleton	low_het	509	10	0.9807322
4	high_het	mid_het	3904	93	0.9767325
11	identity	mid_het	3945	74	0.9815875
18	low_het	mid_het	3877	61	0.9845099
25	mid_het	mid_het	3962	95	0.9765837
32	mixture01	mid_het	2905	48	0.9837453
39	shared	mid_het	3926	77	0.9807644
46	singleton	mid_het	528	16	0.9705882
5	high_het	mixture_1	3860	36	0.9907598
12	identity	mixture_1	3888	20	0.9948823
19	low_het	mixture_1	3890	16	0.9959037
26	mid_het	mixture_1	3959	36	0.9909887
33	mixture01	mixture_1	2850	7	0.9975499
40	shared	mixture_1	3955	14	0.9964727
47	singleton	mixture_1	519	0	1.0000000
6	high_het	shared	3198	457	0.8749658
13	identity	shared	3015	603	0.8333333
20	low_het	shared	3781	228	0.9431280
27	mid_het	shared	3440	369	0.9031242
34	mixture01	shared	2578	248	0.9122435
41	shared	shared	3975	11	0.9972403
48	singleton	shared	214	0	1.0000000
7	high_het	singleton	1878	8094	0.1883273
14	identity	singleton	1939	8355	0.1883622
21	low_het	singleton	1918	8273	0.1882053
28	mid_het	singleton	1990	8596	0.1879841
35	mixture01	singleton	1582	6169	0.2041027
42	shared	singleton	2078	8954	0.1883611
49	singleton	singleton	533	0	1.0000000

method	power
high_het	0.9777556
identity	0.9762670
low_het	0.9854748
mid_het	0.9792159
mixture_1	0.9952224
shared	0.9234336
singleton	0.3064775