probly.train.evidential.torch

Unified Evidential Train Function.

Functions

der_loss(y, mu, kappa, alpha, beta[, lam])

Deep Evidential Regression loss for uncertainty-aware regression.

dirichlet_entropy(alpha)

Dirichlet entropy for predictive uncertainty estimation.

evidential_ce_loss(alphas, targets)

Evidential Cross Entropy Loss for classification uncertainty estimation.

evidential_kl_divergence(alphas, targets)

Evidential KL divergence loss for classification uncertainty estimation.

evidential_log_loss(alphas, targets)

Evidential Log Loss for classification uncertainty estimation.

evidential_mse_loss(alphas, targets)

Evidential Mean Squared Error loss for classification uncertainty estimation.

evidential_nignll_loss(inputs, targets)

Evidence-based Normal-Inverse-Gamma (NIG) regression loss.

evidential_regression_regularization(inputs, ...)

Regularization term for evidential regression.

ird_loss(alpha, y[, adversarial_alpha, p, ...])

Information Robust Dirichlet (IRD) loss for predictive uncertainty estimation.

kl_dirichlet(prior_alpha, posterior_alpha)

Compute KL(Dir(prior_alpha) || Dir(posterior_alpha)) for each batch item.

lp_fn(alpha, y[, p])

Lp calibration loss for predictive uncertainty estimation.

make_in_domain_target_alpha(y)

Construct target Dirichlet distribution for in-distribution samples.

make_ood_target_alpha(batch_size[, ...])

Construct flat Dirichlet target distribution for out-of-distribution samples.

natpn_loss(alpha, y[, entropy_weight])

Natural Posterior Network (NatPN) classification loss.

normal_wishart_log_prob(m, l_precision, ...)

Compute simplified univariate Normal-Wishart log-likelihood.

pn_loss(model, x_in, y_in, x_ood)

Paired ID/OOD training loss for Dirichlet Prior Networks.

postnet_loss(alpha, y[, entropy_weight])

Posterior Networks (PostNet) classification loss.

predictive_probs(alpha)

Expected categorical probabilities under Dirichlet.

regularization_fn(alpha, y)

Regularization term for Information Robust Dirichlet Networks.

rpn_distillation_loss(rpn_params, mus, variances)

Compute the distillation loss for Regression Prior Networks (RPN).

rpn_loss(model, x_id, y_id, x_ood[, ...])

Paired in-distribution and out-of-distribution loss for Regression Prior Networks.

rpn_ng_kl(mu, kappa, alpha, beta, mu0, ...)

KL divergence between two Normal-Gamma distributions.

rpn_prior(shape, device)

Normal-Gamma prior with zero evidence for Regression Prior Networks.

unified_evidential_train(mode, model, dataloader)

Trains a given neural network using the evidential learning approach of a selected paper.

probly.train.evidential.torch.der_loss(y, mu, kappa, alpha, beta, lam=0.01)[source]

Deep Evidential Regression loss for uncertainty-aware regression.

Combines a Student-t negative log-likelihood with an evidence regularization term as proposed by Amini et al. (2020).

Reference:

Amini et al., “Deep Evidential Regression”, NeurIPS 2020. https://arxiv.org/abs/1910.02600

Parameters:
  • y (Tensor) – Ground-truth regression targets, shape (B,) or (B, 1).

  • mu (Tensor) – Predicted mean of the Normal-Inverse-Gamma distribution, shape (B,).

  • kappa (Tensor) – Predicted scaling parameter, shape (B,).

  • alpha (Tensor) – Predicted shape parameter, shape (B,).

  • beta (Tensor) – Predicted scale parameter, shape (B,).

  • lam (float) – Weight of the evidence regularization term.

Returns:

Scalar Deep Evidential Regression loss averaged over the batch.

Return type:

Tensor
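
For reference, the Student-t negative log-likelihood plus evidence penalty from Amini et al. (2020) can be sketched as below. `der_loss_sketch` is a hypothetical re-implementation for illustration; the library's `der_loss` may differ in numerical details such as clamping.

```python
import torch

def der_loss_sketch(y, mu, kappa, alpha, beta, lam=0.01):
    """Illustrative DER loss (Amini et al., 2020): Student-t NLL + |error| * evidence."""
    omega = 2.0 * beta * (1.0 + kappa)
    nll = (
        0.5 * torch.log(torch.pi / kappa)
        - alpha * torch.log(omega)
        + (alpha + 0.5) * torch.log((y - mu) ** 2 * kappa + omega)
        + torch.lgamma(alpha)
        - torch.lgamma(alpha + 0.5)
    )
    # Regularizer: penalize evidence (2*kappa + alpha) in proportion to the error.
    reg = torch.abs(y - mu) * (2.0 * kappa + alpha)
    return (nll + lam * reg).mean()
```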

probly.train.evidential.torch.dirichlet_entropy(alpha)[source]

Dirichlet entropy for predictive uncertainty estimation.

Used in Information Robust Dirichlet Networks to encourage uncertainty on adversarial or out-of-distribution inputs by maximizing the entropy of the Dirichlet distribution.

Reference:

Tsiligkaridis, “Information Robust Dirichlet Networks for Predictive Uncertainty Estimation”, 2019. https://arxiv.org/abs/1910.04819

The entropy is given by:

H(alpha) = log B(alpha) + (alpha_0 - K) * ψ(alpha_0) - Σ_k (alpha_k - 1) * ψ(alpha_k)

Parameters:

alpha (Tensor) – Dirichlet concentration parameters, shape (B_a, K), must be > 0.

Returns:

Scalar Dirichlet entropy summed over the batch.

Raises:

ValueError – If alpha contains non-positive values.

Return type:

Tensor
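
The formula above translates directly into torch. `dirichlet_entropy_sketch` is an illustrative stand-in, not the library function itself:

```python
import torch

def dirichlet_entropy_sketch(alpha):
    """H(alpha) = log B(alpha) + (alpha_0 - K) psi(alpha_0) - sum_k (alpha_k - 1) psi(alpha_k)."""
    a0 = alpha.sum(dim=-1)
    # log of the multivariate Beta function B(alpha)
    log_b = torch.lgamma(alpha).sum(dim=-1) - torch.lgamma(a0)
    k = alpha.shape[-1]
    h = (
        log_b
        + (a0 - k) * torch.digamma(a0)
        - ((alpha - 1.0) * torch.digamma(alpha)).sum(dim=-1)
    )
    return h.sum()  # summed over the batch, as documented
```

A useful sanity check: Dir(1, 1) is the uniform distribution on [0, 1], whose differential entropy is 0.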

probly.train.evidential.torch.evidential_ce_loss(alphas, targets)[source]

Evidential Cross Entropy Loss for classification uncertainty estimation.

Implements the evidential cross-entropy loss proposed by Sensoy et al. (2018) for Evidential Deep Learning.

Reference:

Sensoy et al., “Evidential Deep Learning to Quantify Classification Uncertainty”, NeurIPS 2018. https://arxiv.org/abs/1806.01768

Parameters:
  • alphas (Tensor) – Dirichlet concentration parameters, shape (B, C).

  • targets (Tensor) – Ground-truth class labels, shape (B,).

Returns:

Scalar evidential cross-entropy loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.evidential_kl_divergence(alphas, targets)[source]

Evidential KL divergence loss for classification uncertainty estimation.

Implements the KL divergence regularization term proposed by Sensoy et al. (2018) for Evidential Deep Learning.

Reference:

Sensoy et al., “Evidential Deep Learning to Quantify Classification Uncertainty”, NeurIPS 2018. https://arxiv.org/abs/1806.01768

Parameters:
  • alphas (Tensor) – Dirichlet concentration parameters, shape (B, C).

  • targets (Tensor) – Ground-truth class labels, shape (B,).

Returns:

Scalar evidential KL divergence loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.evidential_log_loss(alphas, targets)[source]

Evidential Log Loss for classification uncertainty estimation.

Implements the evidential log loss proposed by Sensoy et al. (2018) for Evidential Deep Learning.

Reference:

Sensoy et al., “Evidential Deep Learning to Quantify Classification Uncertainty”, NeurIPS 2018. https://arxiv.org/abs/1806.01768

Parameters:
  • alphas (Tensor) – Dirichlet concentration parameters, shape (B, C).

  • targets (Tensor) – Ground-truth class labels, shape (B,).

Returns:

Scalar evidential log loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.evidential_mse_loss(alphas, targets)[source]

Evidential Mean Squared Error loss for classification uncertainty estimation.

Implements the evidential MSE loss proposed by Sensoy et al. (2018), combining prediction error and predictive variance under a Dirichlet distribution.

Reference:

Sensoy et al., “Evidential Deep Learning to Quantify Classification Uncertainty”, NeurIPS 2018. https://arxiv.org/abs/1806.01768

Parameters:
  • alphas (Tensor) – Dirichlet concentration parameters, shape (B, C).

  • targets (Tensor) – Ground-truth class labels, shape (B,).

Returns:

Scalar evidential mean squared error loss averaged over the batch.

Return type:

Tensor
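
The evidential MSE decomposes into a squared-error term and a Dirichlet-variance term. The following is a hypothetical sketch of that decomposition, not the library's exact implementation:

```python
import torch
import torch.nn.functional as F

def evidential_mse_sketch(alphas, targets):
    """Sensoy et al. (2018) MSE: sum_k (y_k - p_k)^2 + p_k (1 - p_k) / (S + 1)."""
    s = alphas.sum(dim=-1, keepdim=True)       # Dirichlet strength S
    p = alphas / s                             # expected probabilities
    y = F.one_hot(targets, num_classes=alphas.shape[-1]).float()
    err = ((y - p) ** 2).sum(-1)               # prediction error
    var = (p * (1.0 - p) / (s + 1.0)).sum(-1)  # predictive variance
    return (err + var).mean()
```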

probly.train.evidential.torch.evidential_nignll_loss(inputs, targets)[source]

Evidence-based Normal-Inverse-Gamma (NIG) regression loss.

Implements the negative log-likelihood term used in Deep Evidential Regression as proposed by Amini et al. (2020).

Reference:

Amini et al., “Deep Evidential Regression”, NeurIPS 2020. https://arxiv.org/abs/1910.02600

Parameters:
  • inputs (dict[str, Tensor]) – Dictionary containing NIG distribution parameters with keys "gamma", "nu", "alpha", and "beta", each of shape (B,).

  • targets (Tensor) – Ground-truth regression targets, shape (B,).

Returns:

Scalar NIG negative log-likelihood loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.evidential_regression_regularization(inputs, targets)[source]

Regularization term for evidential regression.

Implements the evidence regularization component proposed by Amini et al. (2020) to penalize confident but inaccurate predictions in Deep Evidential Regression.

Reference:

Amini et al., “Deep Evidential Regression”, NeurIPS 2020. https://arxiv.org/abs/1910.02600

Parameters:
  • inputs (dict[str, Tensor]) – Dictionary containing evidential regression parameters with keys "gamma", "nu", and "alpha", each of shape (B,).

  • targets (Tensor) – Ground-truth regression targets, shape (B,).

Returns:

Scalar evidential regression regularization loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.ird_loss(alpha, y, adversarial_alpha=None, p=2.0, lam=1.0, gamma=1.0, normalize=True)[source]

Information Robust Dirichlet (IRD) loss for predictive uncertainty estimation.

Implements the loss proposed by Tsiligkaridis (2019), combining an Lp calibration term, a trigamma-based regularization term, and an optional entropy-based adversarial regularizer.

Reference:

Tsiligkaridis, “Information Robust Dirichlet Networks for Predictive Uncertainty Estimation”, 2019. https://arxiv.org/abs/1910.04819

Parameters:
  • alpha (Tensor) – Dirichlet concentration parameters, shape (B, K).

  • y (Tensor) – One-hot encoded class labels, shape (B, K).

  • adversarial_alpha (Tensor | None) – Dirichlet concentration parameters for adversarial inputs, shape (B_a, K).

  • p (float) – Lp norm exponent controlling calibration strength.

  • lam (float) – Weight of the regularization term.

  • gamma (float) – Weight of the entropy regularization term.

  • normalize (bool) – Whether to normalize loss terms by batch size.

Returns:

Scalar IRD loss summed over all input examples.

Return type:

Tensor

probly.train.evidential.torch.kl_dirichlet(prior_alpha, posterior_alpha)[source]

Compute KL(Dir(prior_alpha) || Dir(posterior_alpha)) for each batch item.

Used by Posterior Networks, Dirichlet Prior Networks, and PN-style in-distribution / out-of-distribution losses to compare Dirichlet distributions.

Parameters:
  • prior_alpha (Tensor) – Prior Dirichlet concentration parameters, shape (B, C).

  • posterior_alpha (Tensor) – Posterior Dirichlet concentration parameters, shape (B, C).

Returns:

KL divergence for each batch element, shape (B,)

Return type:

Tensor
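
The KL divergence between two Dirichlet distributions has a closed form in terms of log-Gamma and digamma functions. `kl_dirichlet_sketch` below is an illustrative re-derivation; argument names are generic rather than the library's:

```python
import torch

def kl_dirichlet_sketch(p_alpha, q_alpha):
    """Closed-form KL(Dir(p_alpha) || Dir(q_alpha)) per batch row."""
    p0 = p_alpha.sum(dim=-1)
    q0 = q_alpha.sum(dim=-1)
    return (
        torch.lgamma(p0) - torch.lgamma(p_alpha).sum(-1)
        - torch.lgamma(q0) + torch.lgamma(q_alpha).sum(-1)
        + ((p_alpha - q_alpha)
           * (torch.digamma(p_alpha) - torch.digamma(p0).unsqueeze(-1))).sum(-1)
    )
```

The divergence is zero when both parameter sets coincide and strictly positive otherwise.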

probly.train.evidential.torch.lp_fn(alpha, y, p=2.0)[source]

Lp calibration loss for predictive uncertainty estimation.

Implements the Lp calibration loss proposed by Tsiligkaridis (2019) for Information Robust Dirichlet Networks.

Reference:

Tsiligkaridis, “Information Robust Dirichlet Networks for Predictive Uncertainty Estimation”, 2019. https://arxiv.org/abs/1910.04819

The loss is computed using the expectation-based formulation:

F_i = ( E[(1 - p_c)^p] + Σ_{j≠c} E[p_j^p] )^(1/p)

Parameters:
  • alpha (Tensor) – Dirichlet concentration parameters, shape (B, K), must be > 0.

  • y (Tensor) – One-hot encoded class labels, shape (B, K).

  • p (float) – Lp norm exponent controlling calibration strength (default: 2.0).

Returns:

Scalar Lp calibration loss summed over the batch.

Raises:

ValueError – If alpha contains non-positive values or if shapes do not match.

Return type:

Tensor
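
The expectations in F_i have closed forms, since each marginal p_j under Dir(alpha) is Beta-distributed: E[p_j^p] = Γ(α_0) Γ(α_j + p) / (Γ(α_0 + p) Γ(α_j)), and analogously for 1 − p_c. The following sketch uses those moment identities; `lp_fn_sketch` is hypothetical and may differ from the library's implementation:

```python
import torch

def lp_fn_sketch(alpha, y, p=2.0):
    """Lp calibration loss via Beta moments of the Dirichlet marginals."""
    a0 = alpha.sum(-1, keepdim=True)
    a_c = (alpha * y).sum(-1, keepdim=True)          # concentration of the true class
    log_ratio = torch.lgamma(a0) - torch.lgamma(a0 + p)
    # E[(1 - p_c)^p], with 1 - p_c ~ Beta(a0 - a_c, a_c)
    e_miss = torch.exp(log_ratio + torch.lgamma(a0 - a_c + p) - torch.lgamma(a0 - a_c))
    # E[p_j^p] for the wrong classes, with p_j ~ Beta(a_j, a0 - a_j)
    e_other = torch.exp(log_ratio + torch.lgamma(alpha + p) - torch.lgamma(alpha))
    e_other = (e_other * (1.0 - y)).sum(-1, keepdim=True)
    return ((e_miss + e_other) ** (1.0 / p)).sum()   # summed over the batch
```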

probly.train.evidential.torch.make_in_domain_target_alpha(y)[source]

Construct target Dirichlet distribution for in-distribution samples.

Used by Dirichlet Prior Networks, Posterior Networks, and PN-style paired losses to create a sharp (peaked) Dirichlet target for supervised in-distribution training.

Parameters:

y (Tensor) – Ground-truth class labels, shape (B,).

Returns:

Target Dirichlet concentration parameters, shape (B, C).

Return type:

Tensor

probly.train.evidential.torch.make_ood_target_alpha(batch_size, num_classes=10, alpha0=10)[source]

Construct flat Dirichlet target distribution for out-of-distribution samples.

Used by Dirichlet Prior Networks, Posterior Networks, and PN-style paired losses to encourage high uncertainty on out-of-distribution inputs by assigning uniform Dirichlet concentration parameters.

Parameters:
  • batch_size (int) – Number of out-of-distribution samples in the batch.

  • num_classes (int) – Number of classes. Defaults to 10.

  • alpha0 (float) – Total Dirichlet concentration (strength) parameter.

Returns:

Target Dirichlet concentration parameters, shape (B, C).

Return type:

Tensor
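
A minimal sketch, assuming the total strength alpha0 is spread uniformly across classes (as the "total concentration" description suggests); `make_ood_target_alpha_sketch` is hypothetical:

```python
import torch

def make_ood_target_alpha_sketch(batch_size, num_classes=10, alpha0=10.0):
    # Flat Dirichlet target: every class receives alpha0 / num_classes,
    # so each row sums to the total strength alpha0.
    return torch.full((batch_size, num_classes), alpha0 / num_classes)
```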

probly.train.evidential.torch.natpn_loss(alpha, y, entropy_weight=0.0001)[source]

Natural Posterior Network (NatPN) classification loss.

Implements the Dirichlet-Categorical Bayesian loss with an entropy regularizer as proposed by Charpentier et al. (2022).

Reference:

Charpentier et al., “Natural Posterior Network”, ICLR 2022. https://arxiv.org/abs/2105.04471

Parameters:
  • alpha (Tensor) – Posterior Dirichlet concentration parameters, shape (B, C).

  • y (Tensor) – Ground-truth class labels, shape (B,) with values in [0, C-1].

  • entropy_weight (float) – Weight controlling the strength of the entropy regularization term.

Returns:

Scalar NatPN loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.normal_wishart_log_prob(m, l_precision, kappa, nu, mu_k, sigma2_k)[source]

Compute simplified univariate Normal-Wishart log-likelihood.

Parameters:
  • m (Tensor) – Prior mean parameter.

  • l_precision (Tensor) – Precision (> 0), formerly L.

  • kappa (Tensor) – Strength parameter (> 0).

  • nu (Tensor) – Degrees of freedom (> 2).

  • mu_k (Tensor) – Sample mean from ensemble.

  • sigma2_k (Tensor) – Sample variance from ensemble.

Returns:

Log-likelihood under the Normal-Wishart model.

Return type:

Tensor

probly.train.evidential.torch.pn_loss(model, x_in, y_in, x_ood)[source]

Paired ID/OOD training loss for Dirichlet Prior Networks.

Combines KL divergence to sharp in-distribution targets and flat out-of-distribution targets, with an additional cross-entropy term for classification stability.

Reference:

Malinin and Gales, “Predictive Uncertainty Estimation via Prior Networks”, NeurIPS 2018. https://arxiv.org/abs/1802.10501

Parameters:
  • model (Module) – Network mapping inputs to Dirichlet concentration parameters.

  • x_in (Tensor) – In-distribution inputs, shape (B, …).

  • y_in (Tensor) – In-distribution class labels, shape (B,).

  • x_ood (Tensor) – Out-of-distribution inputs, shape (B_ood, …).

Returns:

Scalar paired ID+OOD Prior Networks loss.

Return type:

Tensor

probly.train.evidential.torch.postnet_loss(alpha, y, entropy_weight=1e-05)[source]

Posterior Networks (PostNet) classification loss.

Implements the expected cross-entropy loss with an entropy regularizer as proposed by Charpentier et al. (2020) for Posterior Networks.

Reference:

Charpentier et al., “Posterior Networks: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts”, NeurIPS 2020. https://arxiv.org/abs/2006.09239

Parameters:
  • alpha (Tensor) – Dirichlet concentration parameters, shape (B, C).

  • y (Tensor) – Ground-truth class labels, shape (B,).

  • entropy_weight (float) – Weight of the entropy regularization term.

Returns:

Scalar Posterior Networks loss averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.predictive_probs(alpha)[source]

Expected categorical probabilities under Dirichlet.

Used by Posterior Networks, Dirichlet Prior Networks, and other Dirichlet-based classification models to obtain predictive class probabilities.

Parameters:

alpha (Tensor) – Dirichlet concentration parameters, shape (B, C).

Returns:

Expected categorical probabilities, shape (B, C).

Return type:

Tensor
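
Under Dir(alpha), the expected categorical probabilities are simply the normalized concentrations. A one-line illustrative sketch:

```python
import torch

def predictive_probs_sketch(alpha):
    # E[p_k] under Dir(alpha) is alpha_k / alpha_0
    return alpha / alpha.sum(dim=-1, keepdim=True)
```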

probly.train.evidential.torch.regularization_fn(alpha, y)[source]

Regularization term for Information Robust Dirichlet Networks.

Penalizes high Dirichlet concentration values for incorrect classes, discouraging misplaced evidence and encouraging well-calibrated predictions.

Reference:

Tsiligkaridis, “Information Robust Dirichlet Networks for Predictive Uncertainty Estimation”, 2019. https://arxiv.org/abs/1910.04819

Parameters:
  • alpha (Tensor) – Dirichlet concentration parameters, shape (B, K), must be > 0.

  • y (Tensor) – One-hot encoded class labels, shape (B, K).

Returns:

Scalar regularization loss summed over classes and batch.

Raises:

ValueError – If alpha and y shapes do not match.

Return type:

Tensor

probly.train.evidential.torch.rpn_distillation_loss(rpn_params, mus, variances)[source]

Compute the distillation loss for Regression Prior Networks (RPN).

This loss measures how well the RPN’s Normal-Wishart distribution matches the empirical ensemble distributions (mu_k, var_k).

Parameters:
  • rpn_params (tuple[Tensor, Tensor, Tensor, Tensor]) – The RPN output parameters (m, l_precision, kappa, nu).

  • mus (list[Tensor]) – Ensemble predicted means.

  • variances (list[Tensor]) – Ensemble predicted variances.

Returns:

Scalar loss value.

Return type:

Tensor

probly.train.evidential.torch.rpn_loss(model, x_id, y_id, x_ood, lam_der=0.01, lam_rpn=50.0)[source]

Paired in-distribution and out-of-distribution loss for Regression Prior Networks.

Computes the Regression Prior Network (RPN) training objective using paired in-distribution (ID) and out-of-distribution (OOD) mini-batches. The loss combines a supervised Deep Evidential Regression (DER) term on ID data with a KL regularization term that pushes OOD predictions back toward the Normal-Gamma prior.

Reference:

Malinin et al., “Regression Prior Networks”, NeurIPS 2020. https://arxiv.org/abs/2006.11590

Parameters:
  • model (Module) – Regression model returning (mu, kappa, alpha, beta) for each input.

  • x_id (Tensor) – In-distribution inputs, shape (B_id, …).

  • y_id (Tensor) – In-distribution regression targets, shape (B_id,) or compatible.

  • x_ood (Tensor) – Out-of-distribution inputs, shape (B_ood, …).

  • lam_der (float) – Weight of the DER evidence regularization term.

  • lam_rpn (float) – Weight of the RPN prior-matching KL term.

Returns:

Scalar paired ID+OOD Regression Prior Network loss.

Return type:

Tensor

probly.train.evidential.torch.rpn_ng_kl(mu, kappa, alpha, beta, mu0, kappa0, alpha0, beta0)[source]

KL divergence between two Normal-Gamma distributions.

Computes the KL divergence between a predicted Normal-Gamma distribution and a prior Normal-Gamma distribution, as used in Regression Prior Networks to regularize out-of-distribution predictions.

Reference:

Malinin et al., “Regression Prior Networks”, NeurIPS 2020. https://arxiv.org/abs/2006.11590

Parameters:
  • mu (Tensor) – Predicted mean parameter, shape (B,).

  • kappa (Tensor) – Predicted scaling parameter, shape (B,).

  • alpha (Tensor) – Predicted shape parameter, shape (B,).

  • beta (Tensor) – Predicted scale parameter, shape (B,).

  • mu0 (Tensor) – Prior mean parameter, shape (B,).

  • kappa0 (Tensor) – Prior scaling parameter, shape (B,).

  • alpha0 (Tensor) – Prior shape parameter, shape (B,).

  • beta0 (Tensor) – Prior scale parameter, shape (B,).

Returns:

Scalar KL divergence between predicted and prior Normal-Gamma distributions, averaged over the batch.

Return type:

Tensor

probly.train.evidential.torch.rpn_prior(shape, device)[source]

Normal-Gamma prior with zero evidence for Regression Prior Networks.

Constructs an uninformative Normal-Gamma prior used in Regression Prior Networks to regularize out-of-distribution predictions via KL divergence, as proposed by Malinin et al. (2020).

Reference:

Malinin et al., “Regression Prior Networks”, NeurIPS 2020. https://arxiv.org/abs/2006.11590

Parameters:
  • shape (Size | tuple[int, ...]) – Shape of the prior parameter tensors (e.g., batch shape).

  • device (device) – Torch device on which to allocate the tensors.

Returns:

Tuple (mu0, kappa0, alpha0, beta0) of Normal-Gamma prior parameters, each with the specified shape.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor]
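
A "zero evidence" prior concentrates no pseudo-observations on either the mean or the precision. The exact constants below are assumptions for illustration (small epsilons keep the downstream KL finite); the library's values may differ:

```python
import torch

def rpn_prior_sketch(shape, device="cpu"):
    """Illustrative near-zero-evidence Normal-Gamma prior (constants are assumptions)."""
    mu0 = torch.zeros(shape, device=device)              # uninformative mean
    kappa0 = torch.full(shape, 1e-2, device=device)      # vanishing pseudo-count for the mean
    alpha0 = torch.full(shape, 1.0 + 1e-2, device=device)  # minimal evidence for the precision
    beta0 = torch.full(shape, 1e-2, device=device)
    return mu0, kappa0, alpha0, beta0
```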

probly.train.evidential.torch.unified_evidential_train(mode, model, dataloader, loss_fn=None, oodloader=None, class_count=None, epochs=5, lr=0.001, device='cpu')[source]

Trains a given neural network using the evidential learning approach of a selected paper.

Parameters:
  • mode (Literal['PostNet', 'NatPostNet', 'EDL', 'PrNet', 'IRD', 'DER', 'RPN']) – Identifier of the paper-based training approach to be used. Must be one of: “PostNet”, “NatPostNet”, “EDL”, “PrNet”, “IRD”, “DER” or “RPN”.

  • model (nn.Module) – The neural network to be trained.

  • dataloader (DataLoader) – PyTorch DataLoader providing the in-distribution training samples and corresponding labels.

  • loss_fn (Callable[..., torch.Tensor] | None) – Loss function used for training. Its expected inputs depend on the selected mode.

  • oodloader (DataLoader | None) – PyTorch DataLoader providing the out-of-distribution training samples and corresponding labels. Required only for certain modes such as “PrNet”.

  • class_count (torch.Tensor | None) – Tensor containing the number of samples per class.

  • epochs (int) – Number of training epochs.

  • lr (float) – Learning rate used by the optimizer.

  • device (str) – Device on which the model is trained (e.g. “cpu” or “cuda”).

Returns:

None. The function trains the provided model in place and prints the total loss per epoch.

Return type:

None
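
For orientation, a toy stand-in for what mode="EDL" roughly does per batch: map logits to Dirichlet concentrations, then minimize the evidential log loss E[-log p_y] = ψ(S) − ψ(α_y). This is a self-contained sketch under stated assumptions (e.g. that logits are exponentiated into concentrations), not the function's actual internals:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(4, 3))
loader = DataLoader(
    TensorDataset(torch.randn(8, 4), torch.randint(0, 3, (8,))), batch_size=4
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x, y in loader:
    alphas = model(x).exp() + 1.0  # Dirichlet concentrations > 1 (assumption)
    s = alphas.sum(-1)
    # evidential log loss (Sensoy et al., 2018): digamma(S) - digamma(alpha_y)
    alpha_y = alphas.gather(1, y[:, None]).squeeze(-1)
    loss = (torch.digamma(s) - torch.digamma(alpha_y)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```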