probly.evaluation.metrics

Collection of performance metrics to evaluate predictions.

Functions

brier_score(probs, targets)

Compute the Brier score of the predicted probabilities.

coverage(preds, targets)

Compute the coverage of set-valued predictions described in [AB21].

coverage_convex_hull(probs, targets, **kwargs)

Compute credal set coverage via convex hull [NZD25].

covered_efficiency(preds, targets)

Compute the efficiency of the set-valued predictions for which the ground truth is covered.

efficiency(preds)

Compute the efficiency of set-valued predictions described in [AB21].

expected_calibration_error(probs, labels[, ...])

Compute the expected calibration error (ECE) of the predicted probabilities [GPSW17b].

expected_calibration_error_binary(probs, labels)

Expected Calibration Error (ECE) for binary classifiers.

log_loss(probs, targets)

Compute the log loss of the predicted probabilities.

spherical_score(probs, targets)

Compute the spherical score of the predicted probabilities.

zero_one_loss(probs, targets)

Compute the zero-one loss of the predicted probabilities.

probly.evaluation.metrics.brier_score(probs, targets)[source]

Compute the Brier score of the predicted probabilities.

We assume the score to be negatively oriented, i.e. lower is better.

Parameters:
  • probs (ndarray) – The predicted probabilities as an array of shape (n_instances, n_classes).

  • targets (ndarray) – True labels of shape (n_instances,). If None, score is computed for all classes.

Returns:

The Brier score.

Return type:

loss
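The Brier score is the mean squared distance between the predicted probability vector and the one-hot encoding of the true class. A minimal sketch of that definition (the function name is illustrative, not the library's implementation):

```python
import numpy as np

def brier_score_sketch(probs, targets):
    """Negatively-oriented Brier score: mean squared distance between the
    predicted probability vector and the one-hot target encoding."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.eye(probs.shape[1])[np.asarray(targets)]
    return float(np.mean(np.sum((probs - one_hot) ** 2, axis=1)))

probs = np.array([[0.8, 0.2], [0.3, 0.7]])
targets = np.array([0, 1])
score = brier_score_sketch(probs, targets)  # (0.08 + 0.18) / 2 = 0.13
```

A perfect one-hot prediction scores 0; the worst possible prediction on K classes scores 2.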

probly.evaluation.metrics.coverage(preds, targets)[source]

Compute the coverage of set-valued predictions described in [AB21].

Parameters:
  • preds (ndarray) – Predictions of shape (n_instances, n_classes) or (n_instances, n_samples, n_classes).

  • targets (ndarray) – Targets of shape (n_instances,) or (n_instances, n_classes).

Returns:

The coverage of the set-valued predictions.

Return type:

cov
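For the class-set case, coverage is simply the fraction of instances whose true class is a member of the predicted set. A sketch under that assumption (boolean indicator sets and integer labels; name and shapes are illustrative):

```python
import numpy as np

def coverage_sketch(preds, targets):
    """Fraction of instances whose true class is contained in the
    predicted set (preds is a boolean indicator array over classes)."""
    preds = np.asarray(preds, dtype=bool)
    targets = np.asarray(targets)
    return float(np.mean(preds[np.arange(len(targets)), targets]))

preds = np.array([[True, True, False],
                  [False, True, False],
                  [True, False, False]])
targets = np.array([0, 2, 0])
cov = coverage_sketch(preds, targets)  # 2 of 3 sets contain the target
```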

probly.evaluation.metrics.coverage_convex_hull(probs, targets, **kwargs)[source]

Compute credal set coverage via convex hull [NZD25].

The coverage is defined as the proportion of instances whose true distribution is contained in the convex hull of the sampled predictions. Containment is checked via linear programming, i.e. by testing whether the target distribution can be expressed as a convex combination of the predicted distributions.

Parameters:
  • probs (ndarray) – The predicted probabilities as an array of shape (n_instances, n_samples, n_classes).

  • targets (ndarray) – The true labels as an array of shape (n_instances, n_classes).

  • **kwargs (Any) – Additional keyword arguments for the linear programming solver, e.g. tolerance.

Returns:

The coverage.

Return type:

cov
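The general membership check requires a linear program, but for two classes the convex hull of the sampled distributions reduces to an interval over the class-0 probability, which makes the idea concrete. A sketch of that special case (the name and the interval reduction are illustrative, not the library's LP-based implementation):

```python
import numpy as np

def hull_coverage_binary(probs, targets):
    """Two-class special case: the convex hull of the sampled distributions
    is the interval between the smallest and largest class-0 probability,
    so membership is a simple range check per instance."""
    p0 = np.asarray(probs)[:, :, 0]   # class-0 probs, (n_instances, n_samples)
    t0 = np.asarray(targets)[:, 0]    # class-0 prob of the true distribution
    inside = (p0.min(axis=1) <= t0) & (t0 <= p0.max(axis=1))
    return float(np.mean(inside))

probs = np.array([[[0.2, 0.8], [0.6, 0.4]],    # hull over class 0: [0.2, 0.6]
                  [[0.1, 0.9], [0.3, 0.7]]])   # hull over class 0: [0.1, 0.3]
targets = np.array([[0.5, 0.5],                # 0.5 in [0.2, 0.6] -> covered
                    [0.4, 0.6]])               # 0.4 not in [0.1, 0.3]
cov = hull_coverage_binary(probs, targets)
```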

probly.evaluation.metrics.covered_efficiency(preds, targets)[source]

Compute the efficiency of the set-valued predictions for which the ground truth is covered.

In the case of a set over classes, this is the mean number of classes in the set. In the case of a credal set, it is computed as the mean difference between the upper and lower probabilities.

Parameters:
  • preds (ndarray) – Predictions of shape (n_instances, n_classes) or (n_instances, n_samples, n_classes).

  • targets (ndarray) – The true labels as an array of shape (n_instances, n_classes).

Returns:

The efficiency of the set-valued predictions for which the ground truth is covered.

Return type:

ceff
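In the class-set case, this restricts the mean set size to the instances whose set actually contains the true class. A sketch under that assumption (boolean indicator sets and integer labels; the name is illustrative):

```python
import numpy as np

def covered_efficiency_sketch(preds, targets):
    """Mean set size computed only over instances whose true class is
    inside the predicted set (boolean class-set case)."""
    preds = np.asarray(preds, dtype=bool)
    targets = np.asarray(targets)
    covered = preds[np.arange(len(targets)), targets]
    return float(preds[covered].sum(axis=1).mean())

preds = np.array([[True, True, False],
                  [False, True, False],
                  [True, False, True]])
targets = np.array([0, 2, 2])        # instances 0 and 2 are covered
ceff = covered_efficiency_sketch(preds, targets)  # mean of set sizes 2 and 2
```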

probly.evaluation.metrics.efficiency(preds)[source]

Compute the efficiency of set-valued predictions described in [AB21].

In the case of a set over classes, this is the mean number of classes in the set. In the case of a credal set, it is computed as the mean difference between the upper and lower probabilities.

Parameters:

preds (ndarray) – Predictions of shape (n_instances, n_classes) or (n_instances, n_samples, n_classes).

Returns:

The efficiency of the set-valued predictions.

Return type:

eff
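For the class-set case, efficiency is the mean number of classes per predicted set; smaller sets are more informative. A sketch of that case (boolean indicator sets; the name is illustrative):

```python
import numpy as np

def efficiency_sketch(preds):
    """Mean number of classes per predicted set (boolean class-set case)."""
    return float(np.asarray(preds, dtype=bool).sum(axis=1).mean())

preds = np.array([[True, True, False],
                  [False, True, False]])
eff = efficiency_sketch(preds)  # (2 + 1) / 2 = 1.5
```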

probly.evaluation.metrics.expected_calibration_error(probs, labels, num_bins=10)[source]

Compute the expected calibration error (ECE) of the predicted probabilities [GPSW17b].

Parameters:
  • probs (ndarray) – The predicted probabilities as an array of shape (n_instances, n_classes).

  • labels (ndarray) – The true labels as an array of shape (n_instances,).

  • num_bins (int) – The number of bins to use for the calibration error calculation.

Returns:

The expected calibration error.

Return type:

ece
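The standard binned ECE partitions instances by confidence (the maximum predicted probability) and averages the gap between per-bin accuracy and per-bin confidence, weighted by bin size. A sketch of that textbook definition (illustrative; the library's binning details may differ):

```python
import numpy as np

def ece_sketch(probs, labels, num_bins=10):
    """Bin instances by confidence (max predicted probability) and average
    the |accuracy - confidence| gap, weighted by bin size."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)
    correct = probs.argmax(axis=1) == labels
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

# Confidence 0.85 on both instances, but only one is correct:
ece = ece_sketch(np.array([[0.85, 0.15], [0.85, 0.15]]), np.array([0, 1]))
```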

probly.evaluation.metrics.expected_calibration_error_binary(probs, labels, num_bins=10)[source]

Expected Calibration Error (ECE) for binary classifiers.

This function works with sigmoid outputs.

Parameters:
  • probs (ndarray) – Sigmoid probabilities of shape (n_instances,) or (n_instances, 1).

  • labels (ndarray) – Binary labels in {0, 1} of shape (n_instances,).

  • num_bins (int) – The number of bins to use for the calibration error calculation.

Return type:

float
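In the binary setting, a common formulation bins the sigmoid outputs directly and compares each bin's positive-label rate to its mean predicted probability. A sketch under that assumption (illustrative; the library's binning details may differ):

```python
import numpy as np

def ece_binary_sketch(probs, labels, num_bins=10):
    """Bin the sigmoid outputs and average |positive rate - mean probability|
    per bin, weighted by bin size."""
    probs = np.asarray(probs, dtype=float).reshape(-1)
    labels = np.asarray(labels).reshape(-1)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs > lo) & (probs <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(labels[in_bin].mean() - probs[in_bin].mean())
    return float(ece)

# Both predictions fall in the (0.7, 0.8] bin; positive rate is 0.5:
ece = ece_binary_sketch(np.array([0.75, 0.75]), np.array([1, 0]))
```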

probly.evaluation.metrics.log_loss(probs, targets)[source]

Compute the log loss of the predicted probabilities.

Parameters:
  • probs (ndarray) – The predicted probabilities as an array of shape (n_instances, n_classes).

  • targets (ndarray) – True labels of shape (n_instances,). If None, loss is computed for all classes.

Returns:

The log loss.

Return type:

loss
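The log loss is the mean negative log-probability assigned to the true class. A sketch of that definition (the name and the clipping constant are illustrative):

```python
import numpy as np

def log_loss_sketch(probs, targets, eps=1e-12):
    """Mean negative log-probability of the true class; probabilities are
    clipped away from zero to avoid -inf."""
    probs = np.asarray(probs, dtype=float)
    p_true = probs[np.arange(len(targets)), np.asarray(targets)]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

probs = np.array([[0.5, 0.5], [0.25, 0.75]])
targets = np.array([0, 1])
loss = log_loss_sketch(probs, targets)  # -(ln 0.5 + ln 0.75) / 2
```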

probly.evaluation.metrics.spherical_score(probs, targets)[source]

Compute the spherical score of the predicted probabilities.

We assume the score to be negatively oriented, i.e. lower is better.

Parameters:
  • probs (ndarray) – The predicted probabilities as an array of shape (n_instances, n_classes).

  • targets (ndarray) – True labels of shape (n_instances,). If None, score is computed for all classes.

Returns:

The spherical score.

Return type:

loss
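The (positively oriented) spherical score of an instance is p_true divided by the Euclidean norm of the probability vector; one common negatively-oriented form simply negates its mean, which is what this sketch assumes (illustrative, not the library's exact implementation):

```python
import numpy as np

def spherical_score_sketch(probs, targets):
    """Negatively-oriented spherical score: the negated mean of
    p_true / ||p||_2, so lower values indicate better predictions."""
    probs = np.asarray(probs, dtype=float)
    p_true = probs[np.arange(len(targets)), np.asarray(targets)]
    return float(-np.mean(p_true / np.linalg.norm(probs, axis=1)))

probs = np.array([[1.0, 0.0], [0.6, 0.8]])
targets = np.array([0, 1])
score = spherical_score_sketch(probs, targets)  # -(1.0 + 0.8) / 2 = -0.9
```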

probly.evaluation.metrics.zero_one_loss(probs, targets)[source]

Compute the zero-one loss of the predicted probabilities.

Parameters:
  • probs (ndarray) – The predicted probabilities as an array of shape (n_instances, n_classes).

  • targets (ndarray) – True labels of shape (n_instances,). If None, loss is computed for all classes.

Returns:

The zero-one loss.

Return type:

loss
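The zero-one loss is the fraction of instances where the argmax of the predicted probabilities differs from the target, i.e. one minus the accuracy. A sketch of that definition (the name is illustrative):

```python
import numpy as np

def zero_one_loss_sketch(probs, targets):
    """Fraction of instances where the argmax class differs from the target."""
    preds = np.asarray(probs).argmax(axis=1)
    return float(np.mean(preds != np.asarray(targets)))

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]])
targets = np.array([0, 0, 0])
loss = zero_one_loss_sketch(probs, targets)  # one mistake in three
```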