Evidential Classification Transformation

This notebook is a practical introduction to the Evidential Classification transformation in probly. Evidential Deep Learning is a powerful and computationally efficient method for uncertainty quantification that differs significantly from sampling-based approaches such as MC-Dropout [SKK18].

We will start by explaining the core idea behind evidential learning and see how probly’s transformation helps you build such models. We will then walk through a PyTorch example to see how to get an uncertainty estimate from a single forward pass [SKK18].

Part A — Introduction to Evidential Learning

1. Concept: What is Evidential Classification?

1.1 The Problem: Overconfident Softmax

A standard classification network outputs logits, which are converted to probabilities using a softmax function. While useful, a high softmax probability (e.g., 0.99) is often misinterpreted as high model confidence. A model can be “confidently wrong,” especially on out-of-distribution data [GPSW17b, O+19].

1.2 The Evidential Approach: Learning “Evidence”

Evidential Deep Learning reframes the problem. Instead of learning a direct mapping from input to class probabilities, the model learns to collect evidence for each class [SKK18]. Think of the model as a detective gathering clues for different suspects (the classes):

  • If the model finds many clues pointing to one suspect and very few for others (e.g., evidence of [100, 2, 5]), it is very confident.

  • If the model finds very few clues for any suspect (e.g., evidence of [0.1, 0.2, 0.15]), it is very uncertain. This might happen if the input is ambiguous or something the model has never seen before.

The model’s final output is a vector of these evidence scores, and the total amount of evidence collected is a direct measure of confidence [HW21].
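To make this concrete, here is a minimal sketch that turns these two example evidence vectors into an uncertainty score. It assumes the standard formulation from [SKK18] (u = K / S, spelled out in Section 3); uncertainty_from_evidence is a hypothetical helper, not part of probly.

import torch


def uncertainty_from_evidence(evidence: torch.Tensor) -> torch.Tensor:
    # u = K / S, where alpha = evidence + 1 and S = alpha.sum()
    num_classes = evidence.shape[-1]
    alpha = evidence + 1.0          # Dirichlet parameters
    strength = alpha.sum(dim=-1)    # Dirichlet strength S
    return num_classes / strength


confident = torch.tensor([[100.0, 2.0, 5.0]])  # many clues for one suspect
uncertain = torch.tensor([[0.1, 0.2, 0.15]])   # barely any clues at all

print(uncertainty_from_evidence(confident))  # tensor([0.0273]) -> low uncertainty
print(uncertainty_from_evidence(uncertain))  # tensor([0.8696]) -> high uncertainty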

1.3 The Evidential Transformation (probly)

The probly transformation helps you build an evidential model by ensuring the output can be interpreted as evidence.

  • You design your network as usual, but your final layer should output raw logits that represent the “evidence.”

  • The evidential_classification transformation simply appends a torch.nn.Softplus() activation function.

  • This ensures the evidence scores are always positive, a requirement for the underlying mathematical theory (the Dirichlet distribution) [ASSR20, SKK18].

The uncertainty can then be calculated directly from these evidence scores in a single forward pass [HW21, SKK18].
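Conceptually, the transformation amounts to something like the following sketch. This is not probly’s actual implementation (and add_softplus_head is a hypothetical name), but it matches the structure of the transformed model printed in the Quickstart below.

from torch import nn


def add_softplus_head(model: nn.Module) -> nn.Module:
    # Wrap the original network and append Softplus so its outputs
    # can be read as non-negative evidence scores.
    return nn.Sequential(model, nn.Softplus())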

1.4 Short side‑by‑side comparison

| Aspect | Evidential Classification | Standard (Softmax) Classification |
|---|---|---|
| Model Output | A vector of evidence for each class. | A vector of probabilities for each class. |
| Final Activation | Softplus (to ensure positive evidence). | Softmax (to ensure probabilities sum to 1). |
| Uncertainty Source | The magnitude of the total evidence. | No direct measure; high probability is a poor proxy. |
| Inference Cost | One single forward pass. | One single forward pass. |

2. Quickstart (PyTorch)

Below, we build a small MLP and apply evidential_classification(model) to see how the final activation is appended.

import torch
from torch import nn

from probly.transformation import evidential_classification


def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 3) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


model = build_mlp()
print("Original model:\n", model)

# Apply the Evidential Classification transformation
model_evidential = evidential_classification(model)
print("\nWith Evidential transformation:\n", model_evidential)
Original model:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=3, bias=True)
)

With Evidential transformation:
 Sequential(
  (0): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=3, bias=True)
  )
  (1): Softplus(beta=1.0, threshold=20.0)
)

Notes on the structure

  • Notice that the transformation has wrapped the original model in a Sequential module and appended a Softplus layer at the end.

  • The output of this new model will now always be positive.

3. Uncertainty from a Single Forward Pass

The key advantage of evidential learning is that uncertainty can be calculated directly from the output of a single prediction. The model’s output gives us the evidence e for each class, from which the Dirichlet parameters are formed as alpha = e + 1. The total evidence, or Dirichlet strength S, is the sum of all alpha. The uncertainty u is then simply the number of classes K divided by this strength: u = K / S.

  • High S (lots of evidence) -> Low u (low uncertainty).

  • Low S (little evidence) -> High u (high uncertainty).

from probly.quantification.classification import evidential_uncertainty

torch.manual_seed(0)

# Create a dummy evidential model
model_evidential = evidential_classification(build_mlp())

# A dummy input
x = torch.randn(1, 10)

# Get the evidence from a single forward pass
with torch.no_grad():
    evidence = model_evidential(x)

# `probly` provides a function to calculate uncertainty directly
uncertainty = evidential_uncertainty(evidence.numpy())

print("Input data:\n", x)
print("\nOutput Evidence (alpha):\n", evidence)
print(f"\nCalculated Uncertainty: {uncertainty.item():.4f}")

# Example with higher evidence (more confidence)
high_evidence = torch.tensor([[100.0, 2.0, 5.0]])
low_uncertainty = evidential_uncertainty(high_evidence.numpy())
print(f"\nUncertainty for high evidence: {low_uncertainty.item():.4f}")
Input data:
 tensor([[ 0.1167,  0.1689, -1.1233,  1.8116,  0.6322, -0.8759,  0.3580, -0.4363,
         -0.7609,  1.5249]])

Output Evidence:
 tensor([[0.8682, 0.9021, 0.5639]])

Calculated Uncertainty: 0.5624

Uncertainty for high evidence: 0.0273
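As a quick sanity check, the printed values are consistent with computing u = K / S by hand with alpha = evidence + 1, which suggests evidential_uncertainty implements exactly this formula:

evidence = torch.tensor([0.8682, 0.9021, 0.5639])
alpha = evidence + 1.0   # Dirichlet parameters
strength = alpha.sum()   # Dirichlet strength S = 5.3342
print(3 / strength)      # tensor(0.5624), matching the value above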

4. Part A Summary

In Part A, we introduced Evidential Deep Learning as a powerful alternative to standard softmax classification. Instead of outputting probabilities, an evidential model outputs “evidence” for each class. We learned that the probly transformation makes this easy by appending a Softplus activation to a standard network. The key advantage is that model uncertainty can be directly calculated from the magnitude of this evidence in a single, deterministic forward pass, making it much faster than sampling-based methods.

Part B — Applied Evidential Classification

In Part A, we covered the concept behind the Evidential Classification transformation. In Part B, we will apply it to a classification model, get a prediction, and calculate the uncertainty from a single forward pass.

An in-depth tutorial showing:

  • How to define a standard neural network (LeNet) and make it an Evidential model using the evidential_classification transformation.

  • How to set up the specialized training loop required for an Evidential model, using the Evidential Log Loss and a KL Divergence regularizer.

  • How to train the Evidential model on a real-world dataset (FashionMNIST).

  • How to evaluate the final classification accuracy of the trained model.

  • How to compute and visualize Evidential Uncertainty by rotating an image.

can be found here: Training an Evidential Model for Classification.
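As a preview of what that training loop involves, below is a hedged sketch of the standard evidential log loss and KL-divergence regularizer from [SKK18]. This illustrates the underlying math only; it is not probly’s API, and the linked tutorial covers the loss functions the library actually provides.

import torch


def evidential_log_loss(evidence: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Type-II maximum likelihood ("log") loss from [SKK18].
    # evidence: (batch, K) non-negative evidence; targets: (batch,) class indices.
    alpha = evidence + 1.0
    strength = alpha.sum(dim=-1, keepdim=True)
    y = torch.nn.functional.one_hot(targets, num_classes=alpha.shape[-1]).float()
    return (y * (torch.log(strength) - torch.log(alpha))).sum(dim=-1).mean()


def kl_to_uniform_dirichlet(alpha: torch.Tensor) -> torch.Tensor:
    # KL( Dir(alpha) || Dir(1) ): penalizes evidence that does not support the label.
    k = alpha.shape[-1]
    strength = alpha.sum(dim=-1, keepdim=True)
    kl = (
        torch.lgamma(strength.squeeze(-1))
        - torch.lgamma(alpha).sum(dim=-1)
        - torch.lgamma(torch.tensor(float(k)))
        + ((alpha - 1.0) * (torch.digamma(alpha) - torch.digamma(strength))).sum(dim=-1)
    )
    return kl.mean()

Note that in [SKK18] the KL term is applied to a Dirichlet with the correct class’s evidence removed, scaled by an annealing coefficient that grows over training; the linked tutorial shows the full recipe.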

Final Summary — Evidential Transformation Tutorial

This tutorial introduced the core concepts of Evidential Deep Learning, a powerful and efficient method for uncertainty quantification. We learned that instead of outputting probabilities like a standard classifier, an evidential model outputs “evidence” for each class.

We saw that probly’s evidential_classification transformation automates this by simply appending a Softplus layer to a standard network, ensuring the evidence is always positive. The key advantage of this approach is its speed: a meaningful uncertainty score can be calculated directly from the magnitude of the evidence in a single forward pass.

For a complete, end-to-end example that shows how to train an evidential model on the FashionMNIST dataset using the specialized evidential loss functions, please see the next tutorial: Training an Evidential Model for Classification.