Statistical Hypothesis Testing Guide with Python
This tutorial covers statistical hypothesis testing in Python. You will use SciPy and statsmodels to run t-tests, ANOVA, and chi-square tests, interpret p-values, and avoid common statistical pitfalls.
What You'll Learn
Formulate null and alternative hypotheses, choose the correct statistical test, verify assumptions, compute test statistics and p-values, and draw data-driven conclusions.
Why It Matters
Hypothesis testing provides a rigorous framework for making decisions with data. Instead of guessing whether a difference is real, you use statistical evidence to judge whether an observed effect is larger than random chance alone would plausibly produce.
Real-World Use
A product team at a SaaS company runs an A/B test on a new signup flow. They use a two-sample t-test to determine whether the conversion rate difference between control and variant groups is statistically significant before rolling out the change.
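Conversion is a binary outcome (a visitor either signs up or does not), so a two-proportion z-test is a common alternative to a t-test in this setting. The sketch below is one minimal way to run it with only the standard library. The function name two_proportion_ztest and the visitor counts are hypothetical, invented purely for illustration.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled proportion under H0: both groups share one conversion rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 120 of 1000 control visitors and 150 of 1000
# variant visitors converted (made-up numbers, not real data)
z, p_value = two_proportion_ztest(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation used here is reasonable when each group has a fair number of conversions and non-conversions; with very small counts an exact test is usually a safer choice.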
Hypothesis Testing Workflow
flowchart TD
A[Define H0 and H1] --> B[Choose Significance Level]
B --> C[Select Test]
C --> D{Assumptions Met?}
D -->|Yes| E[Compute Test Statistic]
D -->|No| F[Use Non-Parametric Alternative]
E --> G[Calculate p-value]
G --> H{p < alpha?}
H -->|Yes| I[Reject H0]
H -->|No| J[Fail to Reject H0]
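The workflow above can be sketched in code roughly as follows. This is a simplified illustration, not a definitive procedure: the helper name compare_two_groups is hypothetical, the choice of Shapiro-Wilk and Levene as assumption checks is an assumption of this sketch, and the sample data are simulated.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick a two-sample test based on simple assumption checks."""
    # Shapiro-Wilk: H0 is that the sample comes from a normal distribution
    normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))
    if normal:
        # Levene: H0 is that the groups have equal variances
        equal_var = stats.levene(a, b).pvalue > alpha
        name = "Student t-test" if equal_var else "Welch t-test"
        result = stats.ttest_ind(a, b, equal_var=equal_var)
    else:
        # Non-parametric alternative when normality looks doubtful
        name = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return name, result.statistic, result.pvalue

# Simulated data for illustration only
rng = np.random.default_rng(0)
a = rng.normal(50, 10, 100)
b = rng.normal(54, 10, 100)

alpha = 0.05
name, stat, p = compare_two_groups(a, b, alpha=alpha)
decision = "Reject H0" if p < alpha else "Fail to reject H0"
print(f"{name}: statistic={stat:.4f}, p-value={p:.4f} -> {decision}")
```

Formal assumption tests are only one way to fill in the "Assumptions Met?" step. Visual checks such as histograms or Q-Q plots are often used alongside them, since with large samples these tests can flag deviations that are too small to matter.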
Two-Sample T-Test
import numpy as np
from scipy import stats

# Simulate two groups of 100 observations with different true means
np.random.seed(42)
control = np.random.normal(loc=50, scale=10, size=100)
variant = np.random.normal(loc=54, scale=10, size=100)

# Independent two-sample t-test (two-sided, assumes equal variances by default)
t_stat, p_value = stats.ttest_ind(control, variant)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Compare the p-value with the chosen significance level
alpha = 0.05
if p_value < alpha:
    print("Reject H0: significant difference between groups")
else:
    print("Fail to reject H0: no significant difference")
Example output (illustrative; the figures you get may differ):
t-statistic: -2.8431
p-value: 0.0049
Reject H0: significant difference between groups
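Two variations on the two-sample t-test are worth knowing: Welch's version, which drops the equal-variance assumption, and a one-sided version for directional hypotheses. The sketch below shows both. It assumes SciPy 1.6 or later, where ttest_ind accepts the alternative argument, and it draws its own simulated samples, so the numbers it prints will not match any other run in this guide.

```python
import numpy as np
from scipy import stats

# Simulated samples for illustration only
rng = np.random.default_rng(42)
control = rng.normal(loc=50, scale=10, size=100)
variant = rng.normal(loc=54, scale=10, size=100)

# Default: two-sided Student t-test (assumes equal variances)
two_sided = stats.ttest_ind(control, variant)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(control, variant, equal_var=False)

# One-sided test of H1: mean(control) < mean(variant)
one_sided = stats.ttest_ind(control, variant, alternative="less")

print(f"Two-sided: t={two_sided.statistic:.4f}, p={two_sided.pvalue:.4f}")
print(f"Welch:     t={welch.statistic:.4f}, p={welch.pvalue:.4f}")
print(f"One-sided: t={one_sided.statistic:.4f}, p={one_sided.pvalue:.4f}")
```

When the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one. That is why the direction should be fixed before looking at the data.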
One-Way ANOVA
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Three groups of 30 observations with different true means
group_a = np.random.normal(60, 8, 30)
group_b = np.random.normal(65, 8, 30)
group_c = np.random.normal(55, 8, 30)

# One-way ANOVA: H0 is that all group means are equal
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Tukey HSD post-hoc test: identifies which pairs of groups differ
data = np.concatenate([group_a, group_b, group_c])
groups = ["A"] * 30 + ["B"] * 30 + ["C"] * 30
tukey = pairwise_tukeyhsd(data, groups, alpha=0.05)
print(tukey)
Output:
F-statistic: 8.2341
p-value: 0.0005
Multiple Comparison of Means - Tukey HSD
===========================================
group1  group2  meandiff  p-adj   lower   upper
A       B           5.12  0.034    0.34    9.90
A       C          -4.87  0.042   -9.65   -0.09
B       C          -9.99  0.001  -14.77   -5.21
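To see what the F-statistic measures, it can help to compute it by hand as the ratio of between-group to within-group variance and compare the result with stats.f_oneway. The following is a minimal sketch on freshly simulated groups, so its numbers are not the ones from any other run in this guide.

```python
import numpy as np
from scipy import stats

# Simulated groups for illustration only
rng = np.random.default_rng(1)
groups = [rng.normal(60, 8, 30), rng.normal(65, 8, 30), rng.normal(55, 8, 30)]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far group means sit from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
# Upper-tail probability of the F distribution with (k - 1, n_total - k) df
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual: F={f_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.6f}")
```

A large F means the group means are spread out more than the within-group noise would explain, which is evidence against the null hypothesis that all means are equal.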
Chi-Square Test for Independence
from scipy.stats import chi2_contingency
observed = np.array([
    [45, 35],
    [30, 50],
])
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"Chi-square: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected)
Output (chi2_contingency applies Yates' continuity correction to 2x2 tables by default):
Chi-square: 4.9192
p-value: 0.0266
Degrees of freedom: 1
Expected frequencies:
[[37.5 42.5]
 [37.5 42.5]]
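The expected frequencies and the chi-square statistic for this 2x2 table can be reproduced by hand, which makes it clearer where the numbers come from. The sketch below uses only the standard library. The helper name chi_square is hypothetical, and the p-value shortcut via math.erfc is valid only for one degree of freedom.

```python
import math

observed = [[45, 35], [30, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed, expected, yates=False):
    """Pearson chi-square statistic, optionally with Yates' correction.

    The correction here subtracts a flat 0.5, which assumes every
    |observed - expected| is at least 0.5 (true for this table).
    """
    shift = 0.5 if yates else 0.0
    return sum(
        (abs(o - e) - shift) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

chi2_plain = chi_square(observed, expected)
chi2_yates = chi_square(observed, expected, yates=True)

# For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2))
p_plain = math.erfc(math.sqrt(chi2_plain / 2))
p_yates = math.erfc(math.sqrt(chi2_yates / 2))

print(expected)  # [[37.5, 42.5], [37.5, 42.5]]
print(f"Uncorrected: chi2={chi2_plain:.4f}, p={p_plain:.4f}")
print(f"With Yates:  chi2={chi2_yates:.4f}, p={p_yates:.4f}")
```

In SciPy, passing correction=False to chi2_contingency turns the continuity correction off, which corresponds to the uncorrected statistic computed here.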
Practice Questions
- What is the difference between a one-tailed and two-tailed test, and when would you use each?
- Why must you check normality and equal variance assumptions before running a t-test?
- What does a p-value of 0.03 mean, and how should it be interpreted?
Answers:
- A one-tailed test checks for an effect in one direction (greater or less). A two-tailed test checks for any difference regardless of direction. Use a one-tailed test only when the direction of the hypothesis is fixed before you look at the data; otherwise use a two-tailed test as the default.
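- Student's t-test assumes that each group is approximately normally distributed and that the groups have equal variances. If these assumptions are violated, the test statistic and p-value can be unreliable. When only the equal-variance assumption fails, use Welch's t-test (equal_var=False in stats.ttest_ind); when normality fails, especially with small samples, use a non-parametric alternative such as the Mann-Whitney U test.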
- A p-value of 0.03 means that, if the null hypothesis is true, there is a 3 percent probability of observing data at least as extreme as what was actually observed. It does not mean there is a 3 percent chance that the null hypothesis is true.
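The definition of a p-value can be made concrete with a permutation test: shuffle the group labels many times, which simulates a world where the null hypothesis holds, and count how often the shuffled difference in means is at least as extreme as the observed one. The sketch below uses only the standard library. The function name permutation_p_value and the two small samples are hypothetical, made up for illustration.

```python
import random

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data mimics H0: group labels do not matter
        rng.shuffle(pooled)
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Share of shuffles at least as extreme as the observed difference
    return extreme / n_permutations

# Hypothetical measurements, invented for illustration only
group_1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.7]
group_2 = [6.1, 6.4, 5.9, 6.8, 6.3, 6.6, 5.8, 6.5]

p_value = permutation_p_value(group_1, group_2)
print(f"Permutation p-value: {p_value:.4f}")
```

The returned fraction is an estimate of the probability described in the answer: how often data this extreme would arise if the null hypothesis were true. It says nothing directly about the probability that the null hypothesis itself is true.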
Challenge
Load the Iris dataset. Test whether sepal length differs significantly between setosa and versicolor species (t-test). Then test whether all three species differ in petal width (ANOVA with Tukey post-hoc). Report your conclusions with test statistics and p-values.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.