M&M benchmark XI¶

This benchmark is an improvments over the previous one, in the following espects.

Use both small $R=5$ and large $R=45$ simulations to compare if merely increasing number of conditions messes it up.
Simulate even simpler: 1 effect and using 1 grid for effect covariance such that the prior is no longer a mixture.
Analyzing it with L = 1, L = 2 and L = 10.
Assess CS overlap at both variable and CS level
Evaluated both EE and EZ model when computing BF for multivariate analysis
Add oracle residual covariance method
Use correlation from FLASH method instead and use actual variance of $Y$ to scale it, for residual variance.
Turn on ELBO computation and add a score to check for convergence; although for now we know that missing data ELBO computation can be problematic. Notice this ELBO evaluation is only relevant when there is no missing data. For cases of non-missing data I still use convergence in PIP.
Increased number of replicates to 500.

Conclusion¶

Under simple situation, that is, small number of effects, small enough L, and no missing data, increasing $R=5$ to $R=45$ does not seem to result in false positives for EZ model.
- And as expected from the simulated senario, increased number of conditions help with the power due to sharing of effects by magnitude.
EE model has high FDR when model is mis-specified (L > 1)!
CS overlapping situation exists for EZ model, and gets worse as $R$ increases, even for L=2 and lower power setting compared to previous simulation.
- Using an oracle covariance does not seem to have helped.
FLASH based covariance hurts the power for EZ model.
EE model sometimes has convergence issue by ELBO although ELBO keeps increasing.

Next steps for this investigation¶

Missing data situation: when ELBO computation is not involved and when residual covariance is simply diag, missing data handling should be really easy -- not sure how to check for that "bug".
What is wrong with EE model FDR problem?

The corresponding DSC code are from c5d75a5 and to be reproduced as follows:

./finemap.dsc --host dsc_mnm.yml -o mnm_20191116

%cd ~/GIT/mvarbvs/dsc_mnm

/project2/mstephens/gaow/mvarbvs/dsc_mnm

out = dscrutils::dscquery('mnm_20191116', targets = c('simulate.n_traits', 'mnm.resid_method', 'mnm.missing_Y', 'mnm.alpha', 'mnm.L', 'susie_scores', 'susie_scores.total', 'susie_scores.valid', 'susie_scores.size', 'susie_scores.purity', 'susie_scores.top', 'susie_scores.n_causal', 'susie_scores.included_causal', 'susie_scores.overlap_var', 'susie_scores.overlap_cs','susie_scores.false_pos_cond_discoveries', 'susie_scores.false_neg_cond_discoveries', 'susie_scores.true_cond_discoveries', 'susie_scores.converged'),
                                          module.output.files = "susie_scores", verbose = F)

head(out)

dim(out)

saveRDS(out, '../data/finemap_output.20191116.rds')

res = out[,-1]
colnames(res) = c('n_traits', 'resid_method', 'missing', 'EZ_model', 'L', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap_var', 'overlap_cs', 'false_positive_cross_cond', 'false_negative_cross_cond', 'true_positive_cross_cond', 'elbo_converged', 'filename')

Purity of CS¶

Purity is higher for $R=45$ simply due to higher power; because in this simulation there is no FDR issue.

purity = aggregate(purity~n_traits + resid_method + missing + EZ_model + L, res, mean)
purity = purity[which(purity$missing==FALSE),-3]
purity = purity[order(purity$n_traits),]
purity

Power of CS¶

Focusing on $L = 2$ to evaluate overlapping CS status. In this case there still exists overlaps between CS, but not as many as with $L=10$. Overlapping status got worse when increased $R$.

total_true_included = aggregate(total_true_included ~ n_traits + resid_method + missing + EZ_model + L, res, sum)
total_true = aggregate(total_true ~  n_traits + resid_method + missing + EZ_model + L, res, sum)
cs_overlap = aggregate(overlap_cs ~  n_traits + resid_method + missing + EZ_model + L, res, sum)
snp_overlap = aggregate(overlap_var ~  n_traits + resid_method + missing + EZ_model + L, res, sum)
power = merge(total_true_included, total_true, by = c( 'n_traits' , 'resid_method' , 'missing' , 'EZ_model', 'L'))
power = merge(power, cs_overlap,  by = c( 'n_traits' , 'resid_method' , 'missing' , 'EZ_model', 'L'))
power = merge(power, snp_overlap,  by = c( 'n_traits' , 'resid_method' , 'missing' , 'EZ_model', 'L'))
power$power = round(power$total_true_included/power$total_true,3)
power$overlap_cs = round(power$overlap_cs, 3)
power$overlap_var = round(power$overlap_var, 3)
power = power[which(power$missing==FALSE),-3]
power = power[order(power$n_traits),]
power = power[order(power$L),]
power = power[order(power$EZ_model),]
#power = power[order(power$missing),]
power

FDR of CS no missing data¶

valid = aggregate(valid ~ n_traits + resid_method + missing + EZ_model + L, res, sum)
total = aggregate(total ~ n_traits + resid_method + missing + EZ_model + L, res, sum)
fdr = merge(valid, total, by = c( 'n_traits' , 'resid_method' , 'missing' , 'EZ_model', 'L'))
fdr$fdr = round((fdr$total - fdr$valid)/fdr$total,3)
fdr = fdr[which(fdr$missing==FALSE),-3]
fdr = fdr[order(fdr$n_traits),]
fdr

FDR of CS with missing data¶

valid = aggregate(valid ~ n_traits + resid_method + missing + EZ_model + L, res, sum)
total = aggregate(total ~ n_traits + resid_method + missing + EZ_model + L, res, sum)
fdr = merge(valid, total, by = c( 'n_traits' , 'resid_method' , 'missing' , 'EZ_model', 'L'))
fdr$fdr = round((fdr$total - fdr$valid)/fdr$total,3)
fdr = fdr[which(fdr$missing==TRUE),-3]
fdr = fdr[order(fdr$n_traits),]
fdr

Convergence¶

Based on ELBO. In principle all runs should converge by ELBO. If it is not converged, then it means ELBO is not non-increasing.

It is only relevant to focus on $L>1$. For without missing data the runs do converge wrt ELBO.

elbo_converged = aggregate(elbo_converged~n_traits + resid_method + missing +  EZ_model + L, res, mean)
#elbo_converged = elbo_converged[which(elbo_converged$missing==FALSE),-3]
elbo_converged = elbo_converged[which(elbo_converged$L!=1),]
elbo_converged = elbo_converged[order(elbo_converged$n_traits),]
elbo_converged

The convergence issue for EE model: they still have increasing ELBO; but the model did not converge after 100 iterations (ELBO still increase!)

	n_traits	resid_method	EZ_model	L	purity
1	5	diag	0	1	0.995564192
3	5	flash	0	1	0.993308992
5	5	oracle	0	1	0.996260685
13	5	diag	1	1	0.898551466
15	5	flash	1	1	0.493463519
17	5	oracle	1	1	0.918378140
25	5	diag	0	2	0.995498060
27	5	flash	0	2	0.993060009
29	5	oracle	0	2	0.995250958
37	5	diag	1	2	0.859395104
39	5	flash	1	2	0.369242831
41	5	oracle	1	2	0.891067109
49	5	diag	0	10	0.991605224
51	5	flash	0	10	0.988157033
53	5	oracle	0	10	0.986834604
61	5	diag	1	10	0.012468447
63	5	flash	1	10	0.001081979
65	5	oracle	1	10	0.024371456
2	45	diag	0	1	0.999944125
4	45	flash	0	1	0.998646255
6	45	oracle	0	1	0.999954269
14	45	diag	1	1	0.996597778
16	45	flash	1	1	0.353285630
18	45	oracle	1	1	0.997202390
26	45	diag	0	2	0.992376542
28	45	flash	0	2	0.991455608
30	45	oracle	0	2	0.991887007
38	45	diag	1	2	0.994617874
40	45	flash	1	2	0.198183655
42	45	oracle	1	2	0.995283875
50	45	diag	0	10	0.956458416
52	45	flash	0	10	0.965783140
54	45	oracle	0	10	0.966548821
62	45	diag	1	10	0.975531115
64	45	flash	1	10	0.000000000
66	45	oracle	1	10	0.978309174

	n_traits	resid_method	EZ_model	L	total_true_included	total_true	overlap_cs	overlap_var	power
37	5	diag	0	1	495	500	0	0	0.990
49	5	flash	0	1	499	500	0	0	0.998
61	5	oracle	0	1	492	500	0	0	0.984
1	45	diag	0	1	498	500	0	0	0.996
13	45	flash	0	1	499	500	0	0	0.998
25	45	oracle	0	1	498	500	0	0	0.996
39	5	diag	0	2	495	500	0	0	0.990
51	5	flash	0	2	499	500	0	0	0.998
63	5	oracle	0	2	492	500	0	0	0.984
3	45	diag	0	2	498	500	0	0	0.996
15	45	flash	0	2	499	500	0	0	0.998
27	45	oracle	0	2	498	500	0	0	0.996
38	5	diag	0	10	495	500	1	1	0.990
50	5	flash	0	10	499	500	1	1	0.998
62	5	oracle	0	10	492	500	1	1	0.984
2	45	diag	0	10	498	500	0	0	0.996
14	45	flash	0	10	499	500	0	0	0.998
26	45	oracle	0	10	497	500	0	0	0.994
40	5	diag	1	1	495	500	0	0	0.990
52	5	flash	1	1	291	500	0	0	0.582
64	5	oracle	1	1	498	500	0	0	0.996
4	45	diag	1	1	500	500	0	0	1.000
16	45	flash	1	1	199	500	0	0	0.398
28	45	oracle	1	1	500	500	0	0	1.000
42	5	diag	1	2	488	500	4	621	0.976
54	5	flash	1	2	221	500	2	92	0.442
66	5	oracle	1	2	496	500	2	349	0.992
6	45	diag	1	2	500	500	3	122	1.000
18	45	flash	1	2	115	500	9	609	0.230
30	45	oracle	1	2	500	500	2	9	1.000
41	5	diag	1	10	9	500	26	8723	0.018
53	5	flash	1	10	1	500	3	2452	0.002
65	5	oracle	1	10	18	500	33	11985	0.036
5	45	diag	1	10	499	500	34	1608	0.998
17	45	flash	1	10	0	500	0	0	0.000
29	45	oracle	1	10	499	500	23	1241	0.998

	n_traits	resid_method	EZ_model	L	valid	total	fdr
37	5	diag	0	1	495	500	0.010
38	5	diag	0	10	495	568	0.129
39	5	diag	0	2	495	545	0.092
40	5	diag	1	1	495	495	0.000
41	5	diag	1	10	23	23	0.000
42	5	diag	1	2	492	492	0.000
49	5	flash	0	1	499	500	0.002
50	5	flash	0	10	499	577	0.135
51	5	flash	0	2	499	543	0.081
52	5	flash	1	1	291	291	0.000
53	5	flash	1	10	3	3	0.000
54	5	flash	1	2	223	223	0.000
61	5	oracle	0	1	492	500	0.016
62	5	oracle	0	10	492	600	0.180
63	5	oracle	0	2	492	548	0.102
64	5	oracle	1	1	498	498	0.000
65	5	oracle	1	10	37	37	0.000
66	5	oracle	1	2	498	498	0.000
1	45	diag	0	1	498	500	0.004
2	45	diag	0	10	498	3075	0.838
3	45	diag	0	2	498	628	0.207
4	45	diag	1	1	500	500	0.000
5	45	diag	1	10	529	529	0.000
6	45	diag	1	2	503	503	0.000
13	45	flash	0	1	499	500	0.002
14	45	flash	0	10	499	3325	0.850
15	45	flash	0	2	499	632	0.210
16	45	flash	1	1	199	200	0.005
17	45	flash	1	10	0	0	NaN
18	45	flash	1	2	124	124	0.000
25	45	oracle	0	1	498	500	0.004
26	45	oracle	0	10	497	3383	0.853
27	45	oracle	0	2	498	642	0.224
28	45	oracle	1	1	500	500	0.000
29	45	oracle	1	10	521	521	0.000
30	45	oracle	1	2	502	502	0.000

	n_traits	resid_method	EZ_model	L	valid	total	fdr
43	5	diag	0	1	467	485	0.037
44	5	diag	0	10	484	748	0.353
45	5	diag	0	2	480	561	0.144
46	5	diag	1	1	179	184	0.027
47	5	diag	1	10	0	16	1.000
48	5	diag	1	2	111	118	0.059
55	5	flash	0	1	467	485	0.037
56	5	flash	0	10	484	748	0.353
57	5	flash	0	2	480	561	0.144
58	5	flash	1	1	170	175	0.029
59	5	flash	1	10	0	16	1.000
60	5	flash	1	2	103	110	0.064
67	5	oracle	0	1	465	485	0.041
68	5	oracle	0	10	480	777	0.382
69	5	oracle	0	2	478	561	0.148
70	5	oracle	1	1	224	229	0.022
71	5	oracle	1	10	1	17	0.941
72	5	oracle	1	2	151	159	0.050
7	45	diag	0	1	403	499	0.192
8	45	diag	0	10	428	2579	0.834
9	45	diag	0	2	416	944	0.559
10	45	diag	1	1	423	489	0.135
11	45	diag	1	10	681	1083	0.371
12	45	diag	1	2	443	561	0.210
19	45	flash	0	1	377	497	0.241
20	45	flash	0	10	411	2592	0.841
21	45	flash	0	2	396	935	0.576
22	45	flash	1	1	274	370	0.259
23	45	flash	1	10	344	812	0.576
24	45	flash	1	2	263	413	0.363
31	45	oracle	0	1	403	499	0.192
32	45	oracle	0	10	428	2659	0.839
33	45	oracle	0	2	416	944	0.559
34	45	oracle	1	1	422	489	0.137
35	45	oracle	1	10	660	1085	0.392
36	45	oracle	1	2	442	562	0.214

DSC	simulate.n_traits	mnm.resid_method	mnm.missing_Y	mnm.L	susie_scores.total	susie_scores.valid	susie_scores.size	susie_scores.purity	susie_scores.top	susie_scores.n_causal	susie_scores.included_causal	susie_scores.true_cond_discoveries	susie_scores.converged	susie_scores.output.file
1	5	oracle	TRUE	1	1	1	2	0.9852849	1	1	1	5	TRUE	susie_scores/full_data_1_high_het_1_oracle_generator_1_mnm_high_het_1_susie_scores_1
1	5	oracle	TRUE	1	1	1	1	1.0000000	1	1	1	5	TRUE	susie_scores/full_data_2_high_het_1_oracle_generator_1_mnm_high_het_1_susie_scores_1
1	5	oracle	TRUE	1	1	1	1	1.0000000	1	1	1	5	TRUE	susie_scores/full_data_3_high_het_1_oracle_generator_1_mnm_high_het_1_susie_scores_1
1	5	oracle	TRUE	1	1	1	43	0.9899983	0	1	1	5	TRUE	susie_scores/full_data_4_high_het_1_oracle_generator_1_mnm_high_het_1_susie_scores_1
1	5	oracle	TRUE	1	1	1	23	0.9863327	0	1	1	5	TRUE	susie_scores/full_data_5_high_het_1_oracle_generator_1_mnm_high_het_1_susie_scores_1
1	5	oracle	TRUE	1	1	1	9	0.9216036	1	1	1	5	TRUE	susie_scores/full_data_6_high_het_1_oracle_generator_1_mnm_high_het_1_susie_scores_1

	n_traits	resid_method	missing	EZ_model	L	elbo_converged
25	5	diag	FALSE	0	2	0.994
27	5	flash	FALSE	0	2	1.000
29	5	oracle	FALSE	0	2	0.998
31	5	diag	TRUE	0	2	1.000
33	5	flash	TRUE	0	2	1.000
35	5	oracle	TRUE	0	2	1.000
37	5	diag	FALSE	1	2	1.000
39	5	flash	FALSE	1	2	1.000
41	5	oracle	FALSE	1	2	1.000
43	5	diag	TRUE	1	2	1.000
45	5	flash	TRUE	1	2	1.000
47	5	oracle	TRUE	1	2	1.000
49	5	diag	FALSE	0	10	1.000
51	5	flash	FALSE	0	10	1.000
53	5	oracle	FALSE	0	10	1.000
55	5	diag	TRUE	0	10	1.000
57	5	flash	TRUE	0	10	1.000
59	5	oracle	TRUE	0	10	1.000
61	5	diag	FALSE	1	10	1.000
63	5	flash	FALSE	1	10	1.000
65	5	oracle	FALSE	1	10	1.000
67	5	diag	TRUE	1	10	1.000
69	5	flash	TRUE	1	10	1.000
71	5	oracle	TRUE	1	10	1.000
26	45	diag	FALSE	0	2	0.984
28	45	flash	FALSE	0	2	0.994
30	45	oracle	FALSE	0	2	0.992
32	45	diag	TRUE	0	2	1.000
34	45	flash	TRUE	0	2	1.000
36	45	oracle	TRUE	0	2	1.000
38	45	diag	FALSE	1	2	1.000
40	45	flash	FALSE	1	2	1.000
42	45	oracle	FALSE	1	2	1.000
44	45	diag	TRUE	1	2	1.000
46	45	flash	TRUE	1	2	1.000
48	45	oracle	TRUE	1	2	1.000
50	45	diag	FALSE	0	10	0.988
52	45	flash	FALSE	0	10	0.992
54	45	oracle	FALSE	0	10	0.986
56	45	diag	TRUE	0	10	0.998
58	45	flash	TRUE	0	10	1.000
60	45	oracle	TRUE	0	10	0.998
62	45	diag	FALSE	1	10	1.000
64	45	flash	FALSE	1	10	1.000
66	45	oracle	FALSE	1	10	1.000
68	45	diag	TRUE	1	10	1.000
70	45	flash	TRUE	1	10	0.994
72	45	oracle	TRUE	1	10	0.998