CS outlier scenarios

This notebook explores scenarios in which credible sets (CS) tend not to contain causal signals (false positives).

Ideally, 95% of the CS we identify should contain a causal signal. This is indeed the case, as shown in this workflow (see the ld_5 step). However, the plot shows some outlier cases in which our Bayesian CS contain too many false positives. We suspect these outliers are "near null" cases, that is, cases with low PVE (proportion of variance explained) and a large number of causal variants.

This is formally explored here.
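To make the 95% target concrete: coverage is the fraction of reported CS that contain at least one true causal variant, and a CS containing none is a false positive. The sketch below computes both from toy inputs. It assumes each CS is represented as a set of variant indices; the function names and the example sets are hypothetical illustrations, not code from the benchmark workflow.

```python
from typing import Iterable, List, Set


def cs_coverage(credible_sets: List[Set[int]], causal: Iterable[int]) -> float:
    """Fraction of credible sets containing at least one true causal variant."""
    causal_set = set(causal)
    if not credible_sets:
        return float("nan")  # coverage is undefined when no CS is reported
    hits = sum(1 for cs in credible_sets if cs & causal_set)
    return hits / len(credible_sets)


def false_positive_sets(credible_sets: List[Set[int]],
                        causal: Iterable[int]) -> List[Set[int]]:
    """Credible sets that contain no true causal variant."""
    causal_set = set(causal)
    return [cs for cs in credible_sets if not cs & causal_set]


# Hypothetical example: three CS over variant indices; variants 3 and 10 are causal.
sets = [{1, 2, 3}, {10, 11}, {20, 21, 22}]
print(cs_coverage(sets, [3, 10]))          # two of the three sets capture a causal variant
print(false_positive_sets(sets, [3, 10]))  # the set with no causal variant
```

Under this definition, an outlier simulation is one whose coverage falls well below the nominal 95%.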

What we learned

Indeed, the outlier cases are the difficult ones.
It seems we should be conservative, that is, not estimate the residual variance.

%cd ~/GIT/github/mvarbvs/dsc

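import os
import pickle

import pandas as pd

ld_cutoff = 0.25       # LD cutoff; used as the key into the benchmark pickle
capture_cutoff = 0.90  # capture-rate threshold separating typical runs from outliers

# Load (output path, capture rate) pairs for the chosen LD cutoff.
with open('benchmark/ld_20180516.pkl', 'rb') as f:
    data = pickle.load(f)[ld_cutoff]
# Swap each file's 4-character extension for '.png' so names match the `output` column of index.csv.
data = [(os.path.basename(x)[:-4] + '.png', y) for x, y in data]
data = pd.DataFrame(data, columns=['output', 'capture_rate'])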

# Purity results; keep only the file name so it matches the keys built above.
result = pd.read_csv('benchmark/purity_20180516/index.csv')
result['output'] = result['output'].apply(os.path.basename)

merged = pd.merge(result, data, on='output')

# Per-causal PVE: low values correspond to "near null" simulations.
merged['avg_pve'] = merged['PVE'] / merged['N_Causal']
pd.options.display.max_rows = 999
# merged.sort_values(by='purity')

# Bin capture rates into two labels, e.g. 'over 90.0%' / 'under 90.0%', for plotting.
merged['capture_rate'] = merged['capture_rate'].apply(
    lambda x: f'over {capture_cutoff*100:.1f}%' if x > capture_cutoff else f'under {capture_cutoff*100:.1f}%')
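As a tabular complement to the swarm plot, the sketch below computes the fraction of runs labeled below the capture cutoff, grouped by est_residual and by a near-null flag. It assumes merged has the PVE, N_Causal, est_residual and binned capture_rate columns built above, with est_residual stored as booleans. The per-causal PVE threshold (0.005) and the toy rows are made up for illustration only; they are not benchmark values.

```python
import pandas as pd


def summarize_outliers(df: pd.DataFrame, outlier_label: str,
                       near_null_pve: float) -> pd.Series:
    """Fraction of runs carrying `outlier_label`, by est_residual and near-null flag."""
    flagged = df.assign(
        is_outlier=df['capture_rate'] == outlier_label,
        # Hypothetical near-null rule: per-causal PVE below a chosen threshold.
        near_null=(df['PVE'] / df['N_Causal']) < near_null_pve,
    )
    # Mean of a boolean column = fraction of outlier runs in each group.
    return flagged.groupby(['est_residual', 'near_null'])['is_outlier'].mean()


# Made-up rows shaped like `merged`; not benchmark results.
toy = pd.DataFrame({
    'PVE': [0.01, 0.01, 0.2, 0.2],
    'N_Causal': [5, 5, 1, 1],
    'est_residual': [True, False, True, False],
    'capture_rate': ['under 90.0%', 'over 90.0%', 'over 90.0%', 'over 90.0%'],
})
print(summarize_outliers(toy, 'under 90.0%', near_null_pve=0.005))
```

On the real table, the call would be summarize_outliers(merged, f'under {capture_cutoff*100:.1f}%', near_null_pve=...) with a threshold chosen by inspecting avg_pve.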


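import seaborn as sns

sns.set(rc={'figure.figsize': (15, 6)}, style="whitegrid")
# catplot supersedes the deprecated factorplot; `height` replaces the old `size` argument.
# It returns a FacetGrid (one panel per capture_rate label), not a single Axes.
g = sns.catplot(x="PVE", y="N_Causal",
                hue="est_residual", col="capture_rate",
                data=merged, kind="swarm",
                height=4, aspect=.7)
g.fig.suptitle(f'LD = {ld_cutoff}', y=1.02)
g.savefig("benchmark/ld_20180516_outlier.png", dpi=500)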