from IPython.display import Image

# Display the forking paths animation
Image(open("forking_paths.gif", 'rb').read())
Let's first load our necessary dependencies: numpy for creating some random variables, pandas for data wrangling, statsmodels for building some basic OLS models, and the all-important specification_curve library:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from specification_curve import specification_curve as specy
To make sure everything is reproducible, let's fix the number of samples and set a random seed:
n_samples = 300
np.random.seed(1342)
Let's now define some random variables. The first three are random floats drawn uniformly from the half-open interval [0.0, 1.0):
x_1 = np.random.random(size=n_samples)
x_2 = np.random.random(size=n_samples)
x_3 = np.random.random(size=n_samples)
and the next two are binary variables, each taking the value 0 or 1:
x_4 = np.random.randint(2, size=n_samples)
x_5 = np.random.randint(2, size=n_samples)
Let's define our y variable as a simple linear function of these, plus some Gaussian noise:
# note that x_4 appears twice, so its effective coefficient is 0.6 + 0.9 = 1.5
y = (0.5*x_1 + 0.1*x_2 + 0.5*x_3 + x_4*0.6 + x_4*0.9 + x_5*0.4
     + 3*np.random.randn(n_samples))
Let's turn these into a DataFrame, where the first list holds our variables and the second the corresponding column names:
df = pd.DataFrame([x_1, x_2, x_3, x_4, x_5, y],
                  ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'y']).T
And then set the latter two variables (x_4 and x_5) as categorical pandas columns:
df['x_4'] = df['x_4'].astype('category')
df['x_5'] = df['x_5'].astype('category')
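As a quick sanity check (my addition, not part of the original walkthrough), we can confirm the dtypes came through as expected:

# x_1 to x_3 and y should be float64; x_4 and x_5 should now be category
print(df.dtypes)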
If we ran a simple OLS model with just x_1, x_2, and x_3, we might conclude that our x_1 variable is statistically significant!
X = df[['x_1', 'x_2', 'x_3']]
# no constant is added here, which is why statsmodels reports uncentered R-squared
ols_reg1 = sm.OLS(df[['y']], X.astype(float)).fit()
ols_reg1.summary()
| Dep. Variable: | y | R-squared (uncentered): | 0.210 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.202 |
| Method: | Least Squares | F-statistic: | 26.30 |
| Date: | Wed, 16 Jun 2021 | Prob (F-statistic): | 4.09e-15 |
| Time: | 16:04:38 | Log-Likelihood: | -749.34 |
| No. Observations: | 300 | AIC: | 1505. |
| Df Residuals: | 297 | BIC: | 1516. |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x_1 | 1.3314 | 0.494 | 2.694 | 0.007 | 0.359 | 2.304 |
| x_2 | 1.6684 | 0.467 | 3.573 | 0.000 | 0.750 | 2.587 |
| x_3 | -0.2838 | 0.491 | -0.578 | 0.564 | -1.250 | 0.683 |

| Omnibus: | 1.342 | Durbin-Watson: | 2.040 |
|---|---|---|---|
| Prob(Omnibus): | 0.511 | Jarque-Bera (JB): | 1.079 |
| Skew: | -0.124 | Prob(JB): | 0.583 |
| Kurtosis: | 3.157 | Cond. No. | 3.19 |
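Rather than eyeballing the summary table, we can pull the p-values straight out of the fitted results (a small aside of mine, using standard statsmodels attributes):

# p-values for each regressor, indexed by column name
print(ols_reg1.pvalues)
# x_1 falls below the conventional 0.05 threshold in this run
print(ols_reg1.pvalues['x_1'] < 0.05)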
But what about when we include x_4 and x_5?
X = df[['x_1', 'x_2', 'x_3', 'x_4', 'x_5']]
# astype(float) converts the categorical columns back to numeric for OLS
ols_reg2 = sm.OLS(df[['y']], X.astype(float)).fit()
ols_reg2.summary()
| Dep. Variable: | y | R-squared (uncentered): | 0.262 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.249 |
| Method: | Least Squares | F-statistic: | 20.92 |
| Date: | Wed, 16 Jun 2021 | Prob (F-statistic): | 6.83e-18 |
| Time: | 16:04:38 | Log-Likelihood: | -739.16 |
| No. Observations: | 300 | AIC: | 1488. |
| Df Residuals: | 295 | BIC: | 1507. |
| Df Model: | 5 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x_1 | 0.5475 | 0.511 | 1.071 | 0.285 | -0.459 | 1.554 |
| x_2 | 1.1525 | 0.473 | 2.438 | 0.015 | 0.222 | 2.083 |
| x_3 | -0.7013 | 0.489 | -1.434 | 0.153 | -1.664 | 0.261 |
| x_4 | 1.3581 | 0.318 | 4.269 | 0.000 | 0.732 | 1.984 |
| x_5 | 0.4174 | 0.321 | 1.299 | 0.195 | -0.215 | 1.050 |

| Omnibus: | 1.054 | Durbin-Watson: | 1.996 |
|---|---|---|---|
| Prob(Omnibus): | 0.590 | Jarque-Bera (JB): | 0.796 |
| Skew: | -0.019 | Prob(JB): | 0.672 |
| Kurtosis: | 3.249 | Cond. No. | 4.27 |
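To make the contrast explicit, here is a quick side-by-side of x_1's estimate under the two specifications (an illustrative aside, built only from the two fitted models above):

# x_1's coefficient and p-value under each specification
comparison = pd.DataFrame({
    'coef on x_1': [ols_reg1.params['x_1'], ols_reg2.params['x_1']],
    'p-value': [ols_reg1.pvalues['x_1'], ols_reg2.pvalues['x_1']]},
    index=['x_1, x_2, x_3 only', 'all five regressors'])
print(comparison)

The "significant" coefficient on x_1 shrinks from about 1.33 to 0.55, and loses significance, once the binary variables enter the model.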
Introducing Specification Curve Analysis! It's a great way to visualise the results from every combination of variables, and it can be expanded to cover other choices too, such as deterministic variables and transformation choices.
# 'y' is the outcome, 'x_1' the variable of interest, and the list gives the
# possible controls; cat_expand tells the package to treat x_4 and x_5 as categorical
sc = specy.SpecificationCurve(df, 'y', 'x_1', ['x_2', 'x_3', 'x_4', 'x_5'],
                              cat_expand=['x_4', 'x_5'])
sc.fit()
sc.plot()
Fit complete
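To build some intuition for what the library automates, here's a minimal hand-rolled version of the same idea (my own sketch, not the library's internals): loop over every subset of the controls, refit the no-constant OLS from above, and collect the coefficient and p-value on x_1.

from itertools import combinations

controls = ['x_2', 'x_3', 'x_4', 'x_5']
specs = []
# every subset of the controls, from none (k=0) to all four (k=4)
for k in range(len(controls) + 1):
    for subset in combinations(controls, k):
        X = df[['x_1'] + list(subset)].astype(float)
        res = sm.OLS(df['y'], X).fit()
        specs.append({'controls': subset,
                      'coef_x_1': res.params['x_1'],
                      'pvalue_x_1': res.pvalues['x_1']})
spec_df = pd.DataFrame(specs).sort_values('coef_x_1')
print(spec_df)

That gives sixteen specifications; the library also handles the categorical expansion for us and plots the sorted estimates alongside the controls included in each, making it obvious how sensitive x_1's estimate is to the choice of specification.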