Bootstrap Sampling in Python

Practical guide to bootstrap sampling in Python: definition, NumPy examples, reproducible resampling with a seed, bootstrap standard error, and 95% confidence interval computation.

Drake Nguyen

Founder · System Architect


Bootstrap sampling in Python: an overview

This practical guide explains bootstrap sampling in Python and shows how to implement resampling methods to estimate population parameters such as the mean. The examples use NumPy and show how to make resampling reproducible with a seed, compute a bootstrap standard error, and derive a bootstrap confidence interval.

Bootstrap sampling is a nonparametric resampling technique that repeatedly draws samples with replacement from an observed dataset to approximate the sampling distribution of a statistic (for example, the mean).
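To make the definition concrete, here is a minimal sketch of a single bootstrap resample drawn with NumPy's `Generator.choice`. The six-value `data` array and the seed are made-up illustration values, not part of the article's example.

```python
import numpy as np

# A tiny illustrative dataset (hypothetical values, not from the article).
data = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])

rng = np.random.default_rng(0)

# One bootstrap resample: same size as the data, drawn with replacement,
# so some observations can appear several times and others not at all.
resample = rng.choice(data, size=data.size, replace=True)

print("Original mean:", data.mean())
print("Resample:     ", resample)
print("Resample mean:", resample.mean())
```

Repeating the last step many times and collecting each resample mean is what builds the bootstrap distribution.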

How bootstrap resampling works

  • Start from a single observed sample (the empirical distribution).
  • Draw many resamples with replacement (each resample usually has the same size as the observed sample, although a smaller size can be chosen).
  • Compute the statistic of interest (mean, median, etc.) on each resample to build the bootstrap distribution.
  • Use that distribution to estimate standard error, bias, and confidence intervals for the statistic.
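The four steps above can be sketched as one helper that works for any statistic, not just the mean. `bootstrap_summary` is a hypothetical name introduced here, and the median and the exponential test data are illustrative assumptions. Bias is estimated in the usual way: the mean of the bootstrap statistics minus the statistic computed on the observed sample.

```python
import numpy as np


def bootstrap_summary(data, statistic=np.median, n_bootstrap=2000, ci=95, seed=None):
    """Return (estimate, standard error, bias, percentile CI) for `statistic`.

    Hypothetical helper for illustration; not a library API.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)

    # Step 1: the statistic on the observed sample (the empirical distribution).
    estimate = statistic(data)

    # Steps 2-3: resample with replacement and recompute the statistic each time.
    boot_stats = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_bootstrap)
    ])

    # Step 4: summarize the bootstrap distribution.
    se = boot_stats.std(ddof=1)
    bias = boot_stats.mean() - estimate
    alpha = (100 - ci) / 2
    lower, upper = np.percentile(boot_stats, [alpha, 100 - alpha])
    return estimate, se, bias, (lower, upper)


# Illustrative skewed data (an assumption, not from the article).
data_rng = np.random.default_rng(1)
sample = data_rng.exponential(scale=2.0, size=200)

est, se, bias, (lo, hi) = bootstrap_summary(sample, seed=1)
print(f"median={est:.3f}  SE={se:.3f}  bias={bias:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Passing the statistic as a function keeps the resampling logic separate from what is being estimated, so swapping `np.median` for `np.mean` or a trimmed mean needs no other changes.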

Implementing bootstrap sampling in Python

1. Imports and example data

import numpy as np

# Use a seed for reproducibility (important when demonstrating bootstrap sampling)
seed = 42
rng = np.random.default_rng(seed)

# Generate a population-like sample (normal distribution with mean ~300)
x = rng.normal(loc=300.0, scale=10.0, size=1000)
print("Observed mean:", np.mean(x))

2. Simple demonstration: repeated small samples

This short example draws 50 small samples of size 4 (sampling with replacement) and averages their means. Note: this is a didactic example rather than typical bootstrap practice, which resamples at the full observed sample size.

rng = np.random.default_rng(seed)
sample_means = []
for _ in range(50):
    # sampling with replacement using numpy random choice
    small_sample = rng.choice(x, size=4, replace=True)
    sample_means.append(np.mean(small_sample))

print("Mean of sample means:", np.mean(sample_means))
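One way to see why the size-4 demonstration differs from a real bootstrap is to compare the spread of the resampled means. The sketch below regenerates the same `x` as step 1 so it runs on its own and uses a hypothetical helper, `resampled_means`. The spread of the means shrinks roughly with the square root of the resample size, so only full-size resamples estimate the standard error of `np.mean(x)`.

```python
import numpy as np

# Same observed sample as in step 1.
rng = np.random.default_rng(42)
x = rng.normal(loc=300.0, scale=10.0, size=1000)


def resampled_means(data, size, n_resamples, rng):
    """Hypothetical helper: means of `n_resamples` resamples of length `size`."""
    # Each row of `idx` holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, data.size, size=(n_resamples, size))
    return data[idx].mean(axis=1)


small = resampled_means(x, size=4, n_resamples=2000, rng=rng)
full = resampled_means(x, size=x.size, n_resamples=2000, rng=rng)

print("Std of size-4 means:   ", small.std(ddof=1))  # roughly 10 / sqrt(4) = 5
print("Std of size-1000 means:", full.std(ddof=1))   # roughly 10 / sqrt(1000), about 0.32
```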

3. Proper bootstrap resampling to estimate mean and confidence interval

The function below applies the bootstrap to the sample mean using NumPy's random Generator. It returns the array of bootstrap means, the estimated standard error, and a percentile confidence interval (95% by default).

def bootstrap_mean(data, n_bootstrap=1000, sample_size=None, seed=None, ci=95):
    """Return bootstrap means, standard error, and percentile CI for the mean.

    Args:
        data (array-like): observed data to resample from (empirical distribution).
        n_bootstrap (int): number of bootstrap replicates.
        sample_size (int or None): size of each resample; if None uses len(data).
        seed (int or None): integer seed for reproducibility.
        ci (float): confidence level in percent, e.g. 95 for a 95% CI.
    """
    data = np.asarray(data)
    if sample_size is None:
        sample_size = len(data)

    rng = np.random.default_rng(seed)
    boot_means = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        resample = rng.choice(data, size=sample_size, replace=True)
        boot_means[i] = np.mean(resample)

    se = np.std(boot_means, ddof=1)
    lower = np.percentile(boot_means, (100 - ci) / 2)
    upper = np.percentile(boot_means, 100 - (100 - ci) / 2)
    return boot_means, se, (lower, upper)

# Usage example
boot_means, boot_se, boot_ci = bootstrap_mean(x, n_bootstrap=2000, seed=seed)
print("Bootstrap standard error:", boot_se)
print("95% bootstrap CI for mean:", boot_ci)

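The Python loop is easy to read, but NumPy can draw every resample in one call by passing a 2-D `size` to `Generator.choice`. The following is a sketch of a vectorized variant; `bootstrap_mean_vectorized` is a hypothetical name. It needs memory for an `n_bootstrap × len(data)` array, and its results should agree closely with the loop version, although they are not guaranteed to match it draw for draw.

```python
import numpy as np


def bootstrap_mean_vectorized(data, n_bootstrap=1000, seed=None, ci=95):
    """Hypothetical vectorized variant of bootstrap_mean (full-size resamples only)."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap resample of the same size as the data.
    resamples = rng.choice(data, size=(n_bootstrap, data.size), replace=True)
    boot_means = resamples.mean(axis=1)
    se = boot_means.std(ddof=1)
    alpha = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [alpha, 100 - alpha])
    return boot_means, se, (lower, upper)


# Same observed sample as in step 1.
rng = np.random.default_rng(42)
x = rng.normal(loc=300.0, scale=10.0, size=1000)

boot_means, boot_se, boot_ci = bootstrap_mean_vectorized(x, n_bootstrap=2000, seed=42)
print("Bootstrap standard error:", boot_se)
print("95% bootstrap CI for mean:", boot_ci)
```

For very large datasets or replicate counts, processing the replicates in chunks keeps memory bounded while retaining most of the speedup.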
Notes and best practices

  • Set a seed when demonstrating bootstrap sampling in Python to make results reproducible for readers and tests.
  • Choose a sufficiently large number of bootstrap replicates (commonly 1,000–10,000) to stabilize estimates of standard error and confidence intervals.
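  • The nonparametric bootstrap makes few assumptions about the shape of the population, but it does assume independent observations and a sample that represents the population reasonably well. It is widely used in statistics and machine learning to assess estimator variability; bagging, for example, fits models on bootstrap resamples to reduce variance.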
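For the mean specifically, there is a reference value to check the replicate count against: as the number of replicates grows, the bootstrap standard error approaches the plug-in standard error `np.std(x) / sqrt(n)` (with the population-style `ddof=0`). The sketch below regenerates the article's `x` and uses illustrative replicate counts to show the estimate settling toward that value.

```python
import numpy as np

# Same observed sample as in step 1.
rng = np.random.default_rng(42)
x = rng.normal(loc=300.0, scale=10.0, size=1000)

# Plug-in standard error of the mean: the limit of the bootstrap SE
# as the number of replicates grows.
analytic_se = x.std(ddof=0) / np.sqrt(x.size)

results = {}
for n_bootstrap in (200, 1000, 5000):
    boot_rng = np.random.default_rng(0)
    resamples = boot_rng.choice(x, size=(n_bootstrap, x.size), replace=True)
    results[n_bootstrap] = resamples.mean(axis=1).std(ddof=1)
    print(f"B={n_bootstrap:>5}: bootstrap SE={results[n_bootstrap]:.4f} "
          f"(plug-in SE={analytic_se:.4f})")
```

For statistics such as the median there is no equally simple formula, which is where the bootstrap is most useful; there, rerunning with a larger replicate count and checking that the estimates stop moving is a practical substitute.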

Conclusion

Bootstrap sampling (often just called "bootstrapping") is an accessible resampling approach for estimating the sampling distribution of a statistic, computing its standard error, and deriving confidence intervals. With NumPy's random Generator and an explicit seed, bootstrap results in Python are straightforward to reproduce.