Bayesian Transformation

This notebook is a practical introduction to the Bayesian transformation in probly. Bayesian Neural Networks are a more advanced topic than Dropout or DropConnect, so this tutorial aims to provide an intuitive, hands-on understanding.

We will start by explaining the core idea behind Bayesian Neural Networks (BNNs) and then see how the probly transformation enables you to create them. After that, we will look at a PyTorch example to inspect the transformed model and use it to estimate uncertainty. Bayesian approaches to neural networks enable principled uncertainty estimation in deep learning [BCKW15, HW21, KG17].

Part A: Introduction to BNNs and the Bayesian Transformation

1. Concept: What is a Bayesian Neural Network?

To understand the Bayesian transformation, we first need to understand the difference between a standard neural network and a Bayesian one.

1.1 Standard Neural Networks

In a standard neural network, each weight is a single, deterministic number. After training, these weights are fixed. When you pass an input through the model, it follows one exact path, producing one exact output. The model has no inherent way to express how “sure” it is about the values of its weights. This corresponds to the classical deterministic formulation of neural networks [Bis06a, GBC16].

1.2 Bayesian Neural Networks (BNNs)

In a Bayesian Neural Network, we replace the deterministic weights with probability distributions. Instead of a weight being a single number, it might be represented by a Gaussian (normal) distribution with a mean and a standard deviation. This probabilistic formulation of network weights is central to Bayesian neural networks [BCKW15, GG16b].

  • The mean represents the most likely value for that weight.

  • The standard deviation represents the model’s uncertainty about that weight [DKD09, HW21]. A small standard deviation means the model is very confident in the weight’s value, while a large one means it is very unsure.

During a forward pass, we don’t use the mean value directly. Instead, we sample a value for each weight from its distribution. Because we get a slightly different set of weights every time, each forward pass on the same input will produce a slightly different output. This natural variation is a direct reflection of the model’s parameter uncertainty [AB21, GG16b].
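The sampling step can be sketched in plain Python for a single scalar weight, independent of probly (the names `mu` and `sigma` are illustrative, not real probly attributes):

```python
import random

random.seed(0)

# A "Bayesian" weight: a Gaussian with a mean and a standard deviation.
mu, sigma = 0.8, 0.1

def forward(x: float) -> float:
    # Sample a fresh weight value on every forward pass.
    w = random.gauss(mu, sigma)
    return w * x

# The same input produces a slightly different output on each pass.
outputs = [forward(2.0) for _ in range(5)]
print(outputs)
```

Repeating the forward pass therefore yields a distribution of outputs rather than a single point prediction, which is exactly what we exploit later to estimate uncertainty.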

1.3 The Bayesian Transformation (probly)

The Bayesian transformation in probly automates the process of converting a standard network into a BNN.

The transformation does the following:

It walks through your PyTorch model and finds all compatible layers (e.g., nn.Linear and nn.Conv2d). It programmatically replaces each standard layer with a corresponding custom Bayesian layer (e.g., BayesLinear, BayesConv2d). These new layers contain weight distributions instead of single values and are inherently stochastic, even during inference.

This allows us to get a distribution of predictions by running multiple forward passes, which we can then use to quantify the model’s uncertainty.
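The traversal-and-replace idea can be illustrated with minimal stand-in classes (these `Linear`, `BayesLinear`, and `Sequential` dummies are not the real probly or PyTorch types; they only mimic the shape of the operation):

```python
class Linear:
    def __init__(self, n_in: int, n_out: int):
        self.n_in, self.n_out = n_in, n_out

class BayesLinear:
    def __init__(self, n_in: int, n_out: int):
        self.n_in, self.n_out = n_in, n_out

class Sequential:
    def __init__(self, *layers):
        self.layers = list(layers)

def to_bayesian(model: Sequential) -> Sequential:
    # Walk the model and swap each compatible layer for its Bayesian twin,
    # reusing the original layer's dimensions.
    for i, layer in enumerate(model.layers):
        if isinstance(layer, Linear):
            model.layers[i] = BayesLinear(layer.n_in, layer.n_out)
    return model

net = to_bayesian(Sequential(Linear(10, 32), Linear(32, 1)))
print([type(l).__name__ for l in net.layers])  # ['BayesLinear', 'BayesLinear']
```

The real transformation works on `nn.Module` trees and also handles `nn.Conv2d`, but the pattern is the same: find compatible layers, replace them in place with stochastic counterparts.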

1.4 What that entails

  • Main Idea: “Weights are distributions.”

  • Stochastic Element: Weights are sampled from probability distributions.

  • Architectural Change: Replaces nn.Linear and nn.Conv2d with BayesLinear/BayesConv2d layers.

  • Uncertainty Interpretation: A principled, direct measure of the model’s parameter uncertainty.

  • Supported Layers: Linear and Conv2d.

  • Key Parameters: prior_mean, prior_std, posterior_std.

2. Quickstart (PyTorch)

Below: build a small MLP, apply bayesian(model), and inspect the modified architecture to see the layer replacement.

import torch
from torch import nn

from probly.transformation import bayesian


def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 1) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


model = build_mlp()
print("Original model:\n", model)

# Apply the Bayesian transformation with default parameters
model_bnn = bayesian(model)
print("\nWith Bayesian transformation:\n", model_bnn)
Original model:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=32, bias=True)
  (3): ReLU()
  (4): Linear(in_features=32, out_features=1, bias=True)
)

With Bayesian transformation:
 Sequential(
  (0): BayesLinear()
  (1): ReLU()
  (2): BayesLinear()
  (3): ReLU()
  (4): BayesLinear()
)

Notes on the structure

  • Notice that each Linear layer has been replaced by a BayesLinear layer.

  • The new layers manage the distributions for the weights and biases internally.

3. Uncertainty via Stochastic Forward Passes

To obtain predictive uncertainty, we run multiple forward passes. In each pass, a new set of weights is sampled from the learned distributions. We then compute the mean and variance of the resulting predictions.

# Toy regression data
torch.manual_seed(0)
n = 128
X = torch.randn(n, 10)
true_w = torch.randn(10, 1)
y = X @ true_w + 0.1 * torch.randn(n, 1)

# Build and transform the model
model = build_mlp(in_dim=10, hidden=64, out_dim=1)
model_bnn = bayesian(model)

# Simple training loop (for illustration)
opt = torch.optim.Adam(model_bnn.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _step in range(200):
    opt.zero_grad()
    pred = model_bnn(X)
    loss = loss_fn(pred, y)
    loss.backward()
    opt.step()


# Prediction function (stochastic passes)
@torch.no_grad()
def stochastic_predict(
    bayesian_model: nn.Module,
    inputs: torch.Tensor,
    n_samples: int = 50,
) -> tuple[torch.Tensor, torch.Tensor]:
    preds = []
    for _ in range(n_samples):
        preds.append(bayesian_model(inputs).detach())
    stacked = torch.stack(preds, dim=0)  # [n_samples, N, out_dim]
    mean = stacked.mean(dim=0)
    var = stacked.var(dim=0, unbiased=False)
    return mean, var


mean_pred, var_pred = stochastic_predict(model_bnn, X[:5], n_samples=100)
print("Predictive mean (first 5):\n", mean_pred.squeeze())
print("\nPredictive variance (first 5):\n", var_pred.squeeze())
Predictive mean (first 5):
 tensor([-1.1153,  3.2737,  1.8071, -1.3878, -0.6908])

Predictive variance (first 5):
 tensor([0.0594, 0.0913, 0.0668, 0.0288, 0.0506])
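The predictive variance is often easier to read as a standard deviation. A small sketch of turning the numbers above into rough uncertainty bands (the ±2σ interval is a common heuristic applied after the fact, not something probly computes for you):

```python
import math

# First two predictive means and variances from the output above.
mean = [-1.1153, 3.2737]
var = [0.0594, 0.0913]

for m, v in zip(mean, var):
    std = math.sqrt(v)          # standard deviation = sqrt(variance)
    lo, hi = m - 2 * std, m + 2 * std
    print(f"mean={m:+.3f}  std={std:.3f}  ~95% band=[{lo:+.3f}, {hi:+.3f}]")
```

Inputs with wider bands are the ones the model is less sure about, which is useful for flagging predictions that deserve a closer look.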

4. Part A Summary

In Part A, we introduced the core concept of Bayesian Neural Networks, where weights are represented as probability distributions rather than single numbers. This inherently captures the model’s uncertainty about its own parameters. We saw how the probly.transformation.bayesian transformation makes creating BNNs simple: it traverses a standard PyTorch model and replaces nn.Linear and nn.Conv2d layers with their Bayesian counterparts. The transformed model naturally produces a distribution of outputs for any given input, allowing us to directly quantify predictive uncertainty.

Part B — Applied BNN Transformation

In Part A, we learned what the Bayesian transformation in probly does. In Part B, we apply it to a model containing both linear and convolutional layers, run several stochastic predictions, and visualize the resulting uncertainty.

An in-depth tutorial showing:

  • How to define a standard neural network (LeNet) and make it Bayesian using the bayesian transformation.

  • How to set up the specialized training loop required for a BNN using the ELBO loss function.

  • How to train the BNN on a real-world dataset (FashionMNIST).

  • How to evaluate the final classification accuracy of the trained Bayesian model.

It can be found here: Training a BNN for Classification.
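As a preview of the ELBO loss mentioned above: it combines the usual data-fit term with a KL penalty that pulls each weight posterior toward its prior. A minimal numeric sketch for a single Gaussian weight, using the closed-form KL divergence between two univariate Gaussians (the specific values and the `kl_weight` scaling are illustrative, not probly defaults):

```python
import math

def kl_gaussians(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
    # Closed-form KL(q || p) for two univariate Gaussians.
    return (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)

# One weight's posterior N(0.8, 0.1^2) vs. a standard-normal prior N(0, 1).
kl = kl_gaussians(mu_q=0.8, sigma_q=0.1, mu_p=0.0, sigma_p=1.0)

nll = 0.42           # negative log-likelihood on a batch (placeholder value)
kl_weight = 1 / 128  # the KL term is typically scaled, e.g. by dataset size

elbo_loss = nll + kl_weight * kl
print(f"KL={kl:.4f}  loss={elbo_loss:.4f}")
```

Minimizing this loss trades off fitting the data against keeping the weight distributions close to the prior; the linked tutorial shows how to do this for a full network.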

Final Summary — Bayesian Transformation Tutorial

This tutorial introduced the core concepts of Bayesian Neural Networks (BNNs), where weights are treated as probability distributions to capture model uncertainty. We demonstrated how probly’s bayesian transformation automates this by replacing standard nn.Linear and nn.Conv2d layers with their stochastic Bayesian counterparts. We also walked through a simplified example of how to run multiple forward passes to get a predictive mean and variance.

While this notebook covered the fundamentals, a proper BNN requires a specialized training procedure. For a complete, end-to-end guide that shows you how to train a Bayesian LeNet on the FashionMNIST dataset using the correct ELBO loss, please see the next tutorial: Training a BNN for Classification.