This notebook explores scenarios in which CS tend not to contain causal signals (false positives).
Among the CS we identify, ideally 95% should contain one causal signal. This is indeed the case, as shown in this workflow (see the ld_5 step). However, the plot there shows some outlier cases in which our Bayesian CS contain too many false positives. We suspect that these outliers are "near null" cases, that is, cases with low PVE and a large number of causal signals.
This is formally explored here.
%cd ~/GIT/github/mvarbvs/dsc
import pickle, os
ld_cutoff = 0.25
capture_cutoff = 0.90
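# Load the per-scenario capture rates for the chosen LD cutoff; the context manager closes the file
with open('benchmark/ld_20180516.pkl', 'rb') as f:
    data = pickle.load(f)[ld_cutoff]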
data = [(os.path.basename(x)[:-4] + '.png', y) for x, y in data]
import pandas as pd
data = pd.DataFrame(data, columns = ['output', 'capture_rate'])
result = pd.read_csv('benchmark/purity_20180516/index.csv')
result['output'] = result['output'].apply(lambda x: os.path.basename(x))
merged = pd.merge(result,data, on='output')
merged['avg_pve'] = merged['PVE'] / merged['N_Causal']
pd.options.display.max_rows = 999
# merged.sort_values(by='purity')
merged['capture_rate'] = merged['capture_rate'].apply(lambda x: f'over {capture_cutoff*100:.1f}%' if x > capture_cutoff else f'under {capture_cutoff*100:.1f}%')
import seaborn as sns
sns.set(rc={'figure.figsize':(15,6)}, style = "whitegrid")
# seaborn 0.9 renamed factorplot to catplot and its `size` argument to `height`
ax = sns.catplot(x="PVE", y="N_Causal",
                 hue="est_residual", col="capture_rate",
                 data=merged, kind="swarm",
                 height=4, aspect=.7)
ax.fig.suptitle(f'LD = {ld_cutoff}',y=1.02)
ax.savefig("benchmark/ld_20180516_outlier.png", dpi=500)
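The swarm plot shows where the outliers fall, but the "near null" suspicion can also be checked numerically. Below is a minimal sketch, assuming a table with the same `PVE`, `N_Causal` and numeric `capture_rate` columns as `merged` has before the relabeling step. The `toy` data frame and its values are hypothetical and made up for illustration; they are not benchmark results.

```python
import pandas as pd

# Hypothetical toy rows standing in for `merged` (numeric capture_rate);
# the values are invented for illustration only.
toy = pd.DataFrame({
    "PVE": [0.05, 0.05, 0.05, 0.05, 0.4, 0.4, 0.4, 0.4],
    "N_Causal": [1, 1, 5, 5, 1, 1, 5, 5],
    "capture_rate": [0.96, 0.93, 0.80, 0.85, 0.97, 0.95, 0.94, 0.88],
})
capture_cutoff = 0.90

# 1.0 when a scenario's capture rate is at or below the cutoff, else 0.0
toy["under"] = (toy["capture_rate"] <= capture_cutoff).astype(float)

# Fraction of under-capturing scenarios in each (N_Causal, PVE) cell
under_frac = toy.pivot_table(index="N_Causal", columns="PVE",
                             values="under", aggfunc="mean")
print(under_frac)
```

If the suspicion holds, the largest fractions should sit in the low-PVE, high-`N_Causal` corner of the table. Applying the same pivot to `merged` would require keeping a numeric copy of `capture_rate` before it is converted to the over/under labels.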