Potential Advantages of Ensembling Random Forests

In this notebook we explore the potential advantages of ensembling multiple Random Forests, both theoretically and practically, from the perspective of uncertainty quantification. The notebook is inspired by the probly library.

Structure:

  1. Import & Setup

  2. Theoretical Background

  3. Data Generation

  4. Random Forest Ensemble Prototype

  5. Uncertainty Analysis

  6. Performance Metrics

  7. Conclusion

1. Import & Setup

!{sys.executable} -m pip install seaborn
import warnings

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

warnings.filterwarnings("ignore")

# Set style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 10

np.random.seed(42)

2. Theoretical Background

Uncertainty in Machine Learning is generally viewed from two perspectives:

  1. ALEATORIC UNCERTAINTY (Data Uncertainty)

    • Inherent randomness in the data

    • Cannot be reduced by collecting more data

    • Example: Noise in measurements

  2. EPISTEMIC UNCERTAINTY (Model Uncertainty)

    • Uncertainty about the model itself

    • CAN be reduced with more data or better models

    • Example: Limited training data, model limitations

WHY ENSEMBLE RANDOM FORESTS?

Random Forests are already ensemble models (of decision trees), but ensembling several Random Forests adds a second level of averaging. From that second level we can hope for second-order uncertainty estimates, a cleaner decomposition into aleatoric and epistemic components, and more stable predictions.
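This decomposition can be sketched with the law of total variance: the pooled variance of all predictions splits exactly into the average within-member variance (an aleatoric proxy) and the variance of the member means (an epistemic proxy). A minimal sketch with synthetic numbers (the member count, means, and spread below are all illustrative, not taken from the notebook's models):

```python
# Hypothetical sketch: variance decomposition for an ensemble of 5 "members",
# each producing 200 predictions for one test point (synthetic numbers).
import numpy as np

rng = np.random.default_rng(0)
member_preds = rng.normal(loc=[0.0, 0.1, -0.1, 0.05, -0.05], scale=0.4, size=(200, 5))

member_means = member_preds.mean(axis=0)      # per-member mean prediction
within_var = member_preds.var(axis=0).mean()  # aleatoric proxy: avg within-member variance
between_var = member_means.var()              # epistemic proxy: variance across member means
total_var = member_preds.var()                # pooled variance over all predictions

# Law of total variance: total = E[within-member var] + Var[member means]
print(within_var + between_var, total_var)
```

With equal-sized groups this identity holds exactly, which is why the notebook can treat epistemic and aleatoric terms as additive components of the total (in variance, i.e. squared-std, units).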

3. Data Generation

def generate_heteroscedastic_data(n_samples: int = 1000, noise_level: float = 0.3):  # noqa: ANN201
    """Generate regression data with heteroscedastic noise (noise that varies across the input space)."""
    X = np.random.uniform(-3, 3, (n_samples, 2))  # noqa: N806

    y_true = np.sin(X[:, 0]) * (1 + 0.3 * X[:, 1])

    noise_std = noise_level * (1 + 0.5 * np.abs(X[:, 0]))
    noise = np.random.normal(0, noise_std)

    y = y_true + noise

    return X, y, y_true, noise_std


# Generate data
X, y, y_true, true_noise = generate_heteroscedastic_data(n_samples=1000)
X_train, X_test, y_train, y_test, ytrue_train, ytrue_test, noise_train, noise_test = train_test_split(
    X,
    y,
    y_true,
    true_noise,
    test_size=0.3,
    random_state=42,
)

print(f"✓ Generated {len(X)} samples with heteroscedastic noise")
print(f"  - Training samples: {len(X_train)}")
print(f"  - Test samples: {len(X_test)}")
print(f"  - Feature dimensions: {X.shape[1]}")

# Visualize the data
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: True function and noisy observations
scatter = axes[0].scatter(X_train[:, 0], y_train, c=noise_train, cmap="YlOrRd", alpha=0.6, s=30)
axes[0].scatter(X_train[:, 0], ytrue_train, c="blue", marker="x", s=20, label="True Function", alpha=0.3)
axes[0].set_xlabel("X[0]")
axes[0].set_ylabel("y")
axes[0].set_title("Training Data: True Function vs Noisy Observations")
axes[0].legend()
plt.colorbar(scatter, ax=axes[0], label="Noise Level")

# Plot 2: Noise distribution across input space
axes[1].scatter(X_train[:, 0], noise_train, alpha=0.5, s=30)
axes[1].set_xlabel("X[0]")
axes[1].set_ylabel("Noise Standard Deviation")
axes[1].set_title("Heteroscedastic Noise (varies with input)")
axes[1].axhline(y=0.3, color="r", linestyle="--", label="Base Noise Level")
axes[1].legend()

plt.tight_layout()
plt.show()
✓ Generated 1000 samples with heteroscedastic noise
  - Training samples: 700
  - Test samples: 300
  - Feature dimensions: 2

4. Random Forest Ensemble Prototype

class RandomForestEnsemble:
    """Ensemble of Random Forests for uncertainty quantification."""

    def __init__(self, n_members: int = 5, n_estimators: int = 50, random_state: int = 42) -> None:
        """Initialize an ensemble of Random Forests."""
        self.n_members = n_members
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.forests = []

    def fit(self, X, y) -> None:  # noqa: ANN001, N803
        """Train multiple Random Forests with different random seeds."""
        print(f"Training ensemble of {self.n_members} Random Forests...")
        for i in range(self.n_members):
            rf = RandomForestRegressor(
                n_estimators=self.n_estimators,
                max_depth=10,
                random_state=self.random_state + i,
                n_jobs=-1,
            )
            rf.fit(X, y)
            self.forests.append(rf)
        print(f"✓ Trained {len(self.forests)} Random Forests")

    def predict(self, X, return_individual: bool = False):  # noqa: ANN001, ANN201, N803
        """Predict with uncertainty quantification.

        Returns:
        - mean: Mean prediction across ensemble
        - epistemic_std: Standard deviation across forests (epistemic uncertainty)
        - aleatoric_std: Average std within each forest (aleatoric uncertainty)
        """
        predictions = np.array([rf.predict(X) for rf in self.forests])

        mean_pred = np.mean(predictions, axis=0)
        epistemic_uncertainty = np.std(predictions, axis=0)

        aleatoric_uncertainties = []
        for rf in self.forests:
            tree_preds = np.array([tree.predict(X) for tree in rf.estimators_])
            aleatoric_uncertainties.append(np.std(tree_preds, axis=0))
        aleatoric_uncertainty = np.mean(aleatoric_uncertainties, axis=0)

        if return_individual:
            return mean_pred, epistemic_uncertainty, aleatoric_uncertainty, predictions
        return mean_pred, epistemic_uncertainty, aleatoric_uncertainty


single_rf = RandomForestRegressor(
    n_estimators=250,
    max_depth=10,
    random_state=42,
    n_jobs=-1,
)
single_rf.fit(X_train, y_train)

rf_ensemble = RandomForestEnsemble(n_members=5, n_estimators=50, random_state=42)
rf_ensemble.fit(X_train, y_train)
Training ensemble of 5 Random Forests...
✓ Trained 5 Random Forests

5. Uncertainty Analysis

# Get predictions
y_pred_single = single_rf.predict(X_test)
y_pred_ensemble, epistemic_unc, aleatoric_unc = rf_ensemble.predict(X_test)

# Calculate total uncertainty
total_unc = np.sqrt(epistemic_unc**2 + aleatoric_unc**2)

print("\nUncertainty Statistics:")
print(f"  Mean Epistemic Uncertainty: {np.mean(epistemic_unc):.4f}")
print(f"  Mean Aleatoric Uncertainty: {np.mean(aleatoric_unc):.4f}")
print(f"  Mean Total Uncertainty: {np.mean(total_unc):.4f}")
print(f"  Epistemic / Total Ratio: {np.mean(epistemic_unc / total_unc):.2%}")

# Visualize uncertainty decomposition
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Predictions with epistemic uncertainty
sorted_idx = np.argsort(X_test[:, 0])
axes[0, 0].scatter(X_test[:, 0], y_test, alpha=0.3, s=20, label="True Data")
axes[0, 0].plot(X_test[sorted_idx, 0], y_pred_ensemble[sorted_idx], "r-", linewidth=2, label="Ensemble Mean")
axes[0, 0].fill_between(
    X_test[sorted_idx, 0],
    y_pred_ensemble[sorted_idx] - 2 * epistemic_unc[sorted_idx],
    y_pred_ensemble[sorted_idx] + 2 * epistemic_unc[sorted_idx],
    alpha=0.3,
    label="±2σ Epistemic",  # noqa: RUF001
    color="red",
)
axes[0, 0].set_xlabel("X[0]")
axes[0, 0].set_ylabel("y")
axes[0, 0].set_title("Predictions with Epistemic Uncertainty (Model Uncertainty)")
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Aleatoric uncertainty
axes[0, 1].scatter(X_test[:, 0], aleatoric_unc, alpha=0.6, s=30, c=noise_test, cmap="YlOrRd")
axes[0, 1].set_xlabel("X[0]")
axes[0, 1].set_ylabel("Aleatoric Uncertainty")
axes[0, 1].set_title("Aleatoric Uncertainty (Data Noise)")
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Epistemic uncertainty
axes[1, 0].scatter(X_test[:, 0], epistemic_unc, alpha=0.6, s=30, color="purple")
axes[1, 0].set_xlabel("X[0]")
axes[1, 0].set_ylabel("Epistemic Uncertainty")
axes[1, 0].set_title("Epistemic Uncertainty (Model Disagreement)")
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Uncertainty decomposition
axes[1, 1].scatter(aleatoric_unc, epistemic_unc, alpha=0.5, s=30)
axes[1, 1].plot([0, max(aleatoric_unc)], [0, max(aleatoric_unc)], "r--", label="Equal Uncertainties")
axes[1, 1].set_xlabel("Aleatoric Uncertainty")
axes[1, 1].set_ylabel("Epistemic Uncertainty")
axes[1, 1].set_title("Uncertainty Decomposition")
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Uncertainty Statistics:
  Mean Epistemic Uncertainty: 0.0539
  Mean Aleatoric Uncertainty: 0.4276
  Mean Total Uncertainty: 0.4313
  Epistemic / Total Ratio: 12.26%

6. Performance Metrics

mse_single = mean_squared_error(y_test, y_pred_single)
mse_ensemble = mean_squared_error(y_test, y_pred_ensemble)
rmse_single = np.sqrt(mse_single)
rmse_ensemble = np.sqrt(mse_ensemble)

within_1sigma = np.mean(np.abs(y_test - y_pred_ensemble) <= total_unc)
within_2sigma = np.mean(np.abs(y_test - y_pred_ensemble) <= 2 * total_unc)

print("\nPredictive Performance:")
print("-" * 60)
print(f"Single RF RMSE: {rmse_single:.4f}")
print(f"Ensemble RMSE: {rmse_ensemble:.4f}")
print(f"Improvement: {(1 - mse_ensemble / mse_single) * 100:.2f}%")

print("\nUncertainty-Aware Performance:")
print("-" * 60)
print(f"Predictions within 1σ: {within_1sigma:.2%} (theoretical: 68.3%)")  # noqa: RUF001
print(f"Predictions within 2σ: {within_2sigma:.2%} (theoretical: 95.4%)")  # noqa: RUF001

# Visualize performance comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Prediction errors
errors_single = np.abs(y_test - y_pred_single)
errors_ensemble = np.abs(y_test - y_pred_ensemble)

axes[0, 0].hist(errors_single, bins=30, alpha=0.6, label="Single RF", color="blue")
axes[0, 0].hist(errors_ensemble, bins=30, alpha=0.6, label="Ensemble", color="red")
axes[0, 0].axvline(
    np.mean(errors_single),
    color="blue",
    linestyle="--",
    label=f"Single Mean: {np.mean(errors_single):.3f}",
)
axes[0, 0].axvline(
    np.mean(errors_ensemble),
    color="red",
    linestyle="--",
    label=f"Ensemble Mean: {np.mean(errors_ensemble):.3f}",
)
axes[0, 0].set_xlabel("Absolute Error")
axes[0, 0].set_ylabel("Frequency")
axes[0, 0].set_title("Prediction Error Distribution")
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Error vs Uncertainty
axes[0, 1].scatter(total_unc, errors_ensemble, alpha=0.5, s=30)
axes[0, 1].plot([0, max(total_unc)], [0, max(total_unc)], "r--", label="Error = Uncertainty")
axes[0, 1].set_xlabel("Predicted Uncertainty")
axes[0, 1].set_ylabel("Actual Error")
axes[0, 1].set_title("Error vs Uncertainty (Should be Correlated)")
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Calculate correlation
correlation = np.corrcoef(total_unc, errors_ensemble)[0, 1]
axes[0, 1].text(
    0.05,
    0.95,
    f"Correlation: {correlation:.3f}",
    transform=axes[0, 1].transAxes,
    verticalalignment="top",
    bbox={"boxstyle": "round", "facecolor": "wheat", "alpha": 0.5},
)

# Plot 3: Ensemble member predictions
_, _, _, individual_preds = rf_ensemble.predict(X_test, return_individual=True)
sorted_idx = np.argsort(X_test[:, 0])

for i, preds in enumerate(individual_preds):
    axes[1, 0].plot(
        X_test[sorted_idx, 0],
        preds[sorted_idx],
        alpha=0.3,
        linewidth=1,
        label=f"RF {i + 1}" if i < 3 else "",
    )
axes[1, 0].plot(X_test[sorted_idx, 0], y_pred_ensemble[sorted_idx], "k-", linewidth=3, label="Ensemble Mean")
axes[1, 0].scatter(X_test[:, 0], y_test, alpha=0.2, s=10, color="red", label="True Data")
axes[1, 0].set_xlabel("X[0]")
axes[1, 0].set_ylabel("Prediction")
axes[1, 0].set_title("Individual RF Predictions (Ensemble Diversity)")
axes[1, 0].legend(loc="upper left")
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Variance reduction
# Compare variance of single RF trees vs ensemble variance
single_tree_preds = np.array([tree.predict(X_test) for tree in single_rf.estimators_])
single_variance = np.var(single_tree_preds, axis=0)
ensemble_variance = epistemic_unc**2

axes[1, 1].scatter(single_variance, ensemble_variance, alpha=0.5, s=30)
axes[1, 1].plot([0, max(single_variance)], [0, max(single_variance)], "r--", label="Equal Variance")
axes[1, 1].set_xlabel("Single RF Tree Variance")
axes[1, 1].set_ylabel("Ensemble Epistemic Variance")
axes[1, 1].set_title("Variance Comparison")
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Predictive Performance:
------------------------------------------------------------
Single RF RMSE: 0.5788
Ensemble RMSE: 0.5769
Improvement: 0.67%

Uncertainty-Aware Performance:
------------------------------------------------------------
Predictions within 1σ: 56.67% (theoretical: 68.3%)
Predictions within 2σ: 87.33% (theoretical: 95.4%)

7. Conclusion

We found the following advantages:

1: UNCERTAINTY DECOMPOSITION

Ensembling Random Forests allows a clear separation of:

  • Aleatoric uncertainty (irreducible data noise)

  • Epistemic uncertainty (reducible model uncertainty)

Single RF: only provides variance estimates across its trees.
Ensemble RF: provides a principled uncertainty decomposition.

2: BETTER CALIBRATION

Prediction intervals from ensembles can be reasonably well calibrated:

  • Ideally, 2σ intervals contain ~95% of the true values; in this experiment they cover 87.3%, somewhat underconfident but in the right range

  • A single RF often over- or under-estimates its uncertainty

  • Ensemble intervals rest on a clearer statistical footing
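The coverage check used above generalizes to any interval width. A sketch with synthetic residuals (the sigmas and errors below are simulated, not the notebook's actual predictions):

```python
# Illustrative coverage check with synthetic residuals. If the predicted
# sigmas are well calibrated, ~68.3% of errors fall within 1 sigma,
# ~95.4% within 2 sigma, ~99.7% within 3 sigma.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
sigma = rng.uniform(0.2, 0.6, n)   # predicted per-point uncertainty
errors = rng.normal(0.0, sigma)    # residuals drawn consistently with sigma

for k, nominal in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    coverage = np.mean(np.abs(errors) <= k * sigma)
    print(f"{k} sigma: empirical {coverage:.1%} vs nominal {nominal:.1%}")
```

Running the same loop on real residuals and predicted uncertainties (as Section 6 does for k = 1, 2) directly reveals over- or underconfidence at each level.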

3: ROBUST CONFIDENCE ESTIMATION

  • Ensemble mean is asymptotically normal (U-statistic theory)

  • Enables valid statistical inference

  • Can construct confidence intervals with coverage guarantees
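The confidence intervals mentioned above can be sketched with a simple normal approximation, treating the M member predictions at a point as roughly i.i.d. (the five member outputs below are hypothetical numbers, not from the trained ensemble):

```python
# Sketch of a normal-approximation confidence interval for the ensemble
# mean prediction at one test point. The standard error of the mean is
# the epistemic std divided by sqrt(M).
import numpy as np

member_preds = np.array([0.92, 1.05, 0.88, 1.10, 0.97])  # hypothetical member outputs
M = len(member_preds)
mean = member_preds.mean()
se = member_preds.std(ddof=1) / np.sqrt(M)  # standard error of the ensemble mean

lo, hi = mean - 1.96 * se, mean + 1.96 * se  # ~95% CI under the normal approximation
print(f"{mean:.3f} [{lo:.3f}, {hi:.3f}]")
```

With only a handful of members, a t-distribution quantile would be more honest than 1.96; the U-statistic results cited above justify the normal approximation as M grows.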

4: IMPROVED PREDICTIVE PERFORMANCE

  • Ensemble averaging reduces variance

  • More stable predictions across different data splits

  • Better generalization to out-of-distribution data

5: UNCERTAINTY-ERROR CORRELATION

  • Predicted uncertainty correlates with actual errors

  • High uncertainty → likely wrong prediction

  • Enables safe decision-making in critical applications
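One way to act on this correlation is uncertainty-based abstention: discard the most uncertain predictions and keep the rest. A sketch on synthetic data (the 50% abstention threshold is an arbitrary illustrative choice):

```python
# Sketch of uncertainty-based abstention: keep only predictions whose
# total uncertainty falls below a threshold, and check that the retained
# predictions have a lower mean error (synthetic data).
import numpy as np

rng = np.random.default_rng(2)
total_unc = rng.uniform(0.1, 0.8, 500)       # predicted uncertainty per point
errors = np.abs(rng.normal(0.0, total_unc))  # errors that scale with uncertainty

threshold = np.quantile(total_unc, 0.5)      # abstain on the most uncertain half
keep = total_unc <= threshold

print(f"kept {keep.mean():.0%} of points; "
      f"mean error {errors[keep].mean():.3f} vs {errors.mean():.3f} overall")
```

In a deployed system the threshold would be tuned on validation data to trade off coverage against the error rate of the retained predictions.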

Therefore, ensembles of Random Forests can prove valuable tools, especially for the goal of this repository: uncertainty quantification.