Evidential Regression Transformation¶
This notebook is a practical introduction to the Evidential Regression transformation in probly. This technique allows a model to predict not just a single value, but a full probability distribution, enabling it to quantify both the noise in the data (aleatoric uncertainty) and the gaps in its own knowledge (epistemic uncertainty) [ASSR20, SKK18].
We will start by explaining the core idea behind evidential regression and then see how probly’s transformation automates the process of building such a model by replacing the final layer. We will then train this model on a simple 1D dataset and visualize its predictive uncertainty.
Part A — Introduction to Evidential Regression¶
1. Concept: What is Evidential Regression?¶
1.1 The Problem: Standard Regression Predicts a Point¶
A standard regression network is trained to predict a single value. For a given input, it might predict y = 3.14. This gives us no information about the model’s confidence.
Is the prediction 3.14 ± 0.01 or 3.14 ± 10.0? We have no way of knowing.
1.2 The Evidential Approach: Predicting a Distribution¶
Evidential Regression reframes the problem. Instead of predicting a single point, the model predicts the four parameters of a Normal-Inverse-Gamma (NIG) distribution.
These four parameters are: gamma (γ), nu (ν), alpha (α), and beta (β) [ASSR20].
Together, these parameters define a distribution over our prediction. From them, we can directly calculate:
- **The Prediction:** the mean of the distribution (given by `gamma`).
- **Aleatoric Uncertainty (Data Noise):** the inherent noise or ambiguity in the data itself. A high value means the data points are widely scattered.
- **Epistemic Uncertainty (Model Ignorance):** the model's own uncertainty about its predictions. A high value means the model is "out of its depth" [SKK18], perhaps because it is seeing data far from what it was trained on.
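As a concrete illustration, the standard NIG moment formulas from Amini et al. [ASSR20] can be written as a small helper. The function name `nig_summaries` below is our own for illustration, not part of probly's API:

```python
def nig_summaries(gamma: float, nu: float, alpha: float, beta: float):
    """Summarize a Normal-Inverse-Gamma distribution (requires alpha > 1)."""
    prediction = gamma                       # E[mu]: the point prediction
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: expected data noise
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: uncertainty about the mean
    return prediction, aleatoric, epistemic

# Example: nu acts like a count of "virtual observations" of the mean,
# so a larger nu shrinks the epistemic term while aleatoric is unaffected.
print(nig_summaries(3.14, nu=2.0, alpha=3.0, beta=4.0))  # -> (3.14, 2.0, 1.0)
```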
1.3 The Evidential Regression Transformation (probly)¶
The probly transformation makes it easy to create an evidential regression model:

- You design your network as usual.
- The `evidential_regression` transformation traverses your model backwards and replaces the final `nn.Linear` layer with a special `NormalInverseGammaLinear` layer.
- This new final layer is responsible for outputting the four `(γ, ν, α, β)` parameters instead of a single value.
- The uncertainty can then be calculated from these parameters in a single forward pass.
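To make the idea of such a replacement layer concrete, here is a hypothetical sketch of what an evidential output head could look like. This is our own illustration of the general pattern (a linear layer with four outputs and positivity constraints), not probly's actual `NormalInverseGammaLinear` implementation:

```python
import torch
from torch import nn
import torch.nn.functional as F


class NIGHead(nn.Module):
    """Illustrative evidential head: one linear unit per NIG parameter."""

    def __init__(self, in_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, 4)

    def forward(self, x):
        gamma, raw_nu, raw_alpha, raw_beta = self.linear(x).chunk(4, dim=-1)
        nu = F.softplus(raw_nu)              # nu > 0
        alpha = F.softplus(raw_alpha) + 1.0  # alpha > 1 keeps the variance finite
        beta = F.softplus(raw_beta)          # beta > 0
        return {"gamma": gamma, "nu": nu, "alpha": alpha, "beta": beta}
```

The softplus transforms are one common way to enforce the parameter constraints; `gamma` is left unconstrained because it is an ordinary mean prediction.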
1.4 Short side‑by‑side comparison¶
| Aspect | Evidential Regression | Standard (Point) Regression |
|---|---|---|
| Model Output | Four parameters: `(γ, ν, α, β)`. | A single predicted value. |
| Final Layer | `NormalInverseGammaLinear` | `nn.Linear` |
| Uncertainty Source | Calculated directly from the four output parameters. | None. |
| Inference Cost | One single forward pass. | One single forward pass. |
2. Quickstart (PyTorch)¶
Below: build a small MLP and apply evidential_regression(model) to see how the last linear layer is replaced.
from torch import nn
from probly.transformation import evidential_regression
def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 1) -> nn.Sequential:
return nn.Sequential(
nn.Linear(in_dim, hidden),
nn.ReLU(),
nn.Linear(hidden, hidden),
nn.ReLU(),
nn.Linear(hidden, out_dim),
)
model = build_mlp()
print("Original model:\n", model)
# Apply the Evidential Regression transformation
model_evidential = evidential_regression(model)
print("\nWith Evidential transformation:\n", model_evidential)
Original model:
Sequential(
(0): Linear(in_features=10, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=32, bias=True)
(3): ReLU()
(4): Linear(in_features=32, out_features=1, bias=True)
)
With Evidential transformation:
Sequential(
(0): Linear(in_features=10, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=32, bias=True)
(3): ReLU()
(4): NormalInverseGammaLinear()
)
Notes on the structure¶
Notice that the transformation has replaced only the final `nn.Linear` layer with a `NormalInverseGammaLinear` layer. The output of this new model will be a dictionary containing the four parameters.
3. Part A Summary¶
In Part A, we introduced Evidential Regression as a method for a model to predict its own uncertainty. Instead of a single point, the model learns to output the four parameters of a Normal-Inverse-Gamma distribution (γ, ν, α, β). We learned that the probly transformation automates this by replacing the final linear layer of a network. The key advantage is that both data uncertainty (aleatoric) and model uncertainty (epistemic) can be calculated from these parameters in a single, deterministic forward pass.
Part B — Applied Evidential Regression¶
An in-depth walkthrough of:

- How to generate synthetic regression data using a cubic function.
- How to define a standard neural network with `ReLU` activations.
- How to transform the network into an Evidential Regression model.
- How to train the network using the evidential loss and regularization.
- How to plot the predictions and targets to visualize the model's performance.

can be found here: Training an Evidential Regression Model
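For reference, the evidential loss mentioned above typically combines the NIG negative log-likelihood with an evidence regularizer that penalizes confident errors, as proposed by Amini et al. [ASSR20]. The sketch below uses our own helper name and is not probly's loss API:

```python
import torch


def evidential_loss(y, gamma, nu, alpha, beta, lam=0.01):
    """NIG negative log-likelihood plus evidence regularizer (Amini et al., 2020)."""
    omega = 2.0 * beta * (1.0 + nu)
    nll = (
        0.5 * torch.log(torch.pi / nu)
        - alpha * torch.log(omega)
        + (alpha + 0.5) * torch.log(nu * (y - gamma) ** 2 + omega)
        + torch.lgamma(alpha)
        - torch.lgamma(alpha + 0.5)
    )
    # Regularizer: scale the prediction error by the total evidence (2*nu + alpha),
    # so the model is punished for being wrong while claiming high confidence.
    reg = torch.abs(y - gamma) * (2.0 * nu + alpha)
    return (nll + lam * reg).mean()
```

The weight `lam` trades off fit against evidence inflation; its value is a tuning choice, and 0.01 here is only a placeholder.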
Final Summary — Evidential Regression Tutorial¶
This tutorial demonstrated how to use the Evidential Regression Transformation in probly to create models that can predict their own uncertainty.
We learned that instead of a single point, an evidential model outputs the four parameters of a distribution (γ, ν, α, β). The probly transformation automates this by replacing the final linear layer of a network. The key advantage is that both aleatoric (data) uncertainty and epistemic (model) uncertainty can be calculated from these parameters in a single forward pass.
We saw this in practice by training a model on a dataset with a gap. The final visualization clearly showed the model’s uncertainty increasing in the regions where it had no training data, making it a powerful and interpretable tool for building more reliable regression models.