probly.evaluation.ood

Unified OOD evaluation API for probly.

Functions

evaluate_ood(in_distribution, out_distribution)

Unified OOD evaluation API.

out_of_distribution_detection_aupr(...)

Perform out-of-distribution detection using AUPR (Area Under the Precision-Recall Curve).

out_of_distribution_detection_auroc(...)

Perform out-of-distribution detection using prediction functionals from ID and OOD data.

out_of_distribution_detection_fnr_at_x_tpr(...)

Perform out-of-distribution detection using the false negative rate at a user-given true positive rate.

out_of_distribution_detection_fpr_at_x_tpr(...)

Perform out-of-distribution detection using false positive rate (FPR) at a given true positive rate.

parse_dynamic_metric(spec)

Parse dynamic metric specification.

visualize_ood(in_distribution, out_distribution)

Generate visualization plots from OOD scores.

probly.evaluation.ood.evaluate_ood(in_distribution, out_distribution, metrics=None)[source]

Unified OOD evaluation API.

Provides backward compatibility while supporting multiple metrics.

Parameters:
  • in_distribution – scores for in-distribution samples

  • out_distribution – scores for out-of-distribution samples

  • metrics (str, list of str, or None) –
      • None or “auroc”: returns a dict with only the “auroc” entry (backward compatible)

      • “all”: returns a dict with all available metrics

      • list of str: returns a dict with the specified metrics

Returns: dict[str, float]

Dictionary mapping metric names to values. If metrics is None or “auroc”, the dict contains only the “auroc” entry.

Return type:

dict[str, float]
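The dispatch described above can be sketched as follows. This is a hypothetical stand-in, not probly's implementation, and only AUROC is wired in for illustration:

```python
import numpy as np

# Hypothetical sketch of the dispatch described above; the real
# probly.evaluation.ood.evaluate_ood may differ in detail.
def evaluate_ood_sketch(in_distribution, out_distribution, metrics=None):
    def auroc(id_scores, ood_scores):
        # Probability that a random OOD score exceeds a random ID score.
        diff = ood_scores[:, None] - id_scores[None, :]
        return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    available = {"auroc": auroc}
    if metrics is None or metrics == "auroc":
        names = ["auroc"]
    elif metrics == "all":
        names = list(available)
    else:
        names = list(metrics)
    return {m: available[m](in_distribution, out_distribution) for m in names}

result = evaluate_ood_sketch(np.array([0.1, 0.2]), np.array([0.8, 0.9]))
```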

probly.evaluation.ood.out_of_distribution_detection_aupr(in_distribution, out_distribution)[source]

Perform out-of-distribution detection using AUPR (Area Under the Precision-Recall Curve).

This metric evaluates how well the model distinguishes between in- and out-of-distribution samples, focusing more on positive class (OOD) precision and recall.

Parameters:
  • in_distribution (ndarray) – in-distribution prediction functionals

  • out_distribution (ndarray) – out-of-distribution prediction functionals

Returns:

aupr – area under the precision-recall curve

Return type:

float
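A hedged sketch of what this metric computes: average precision with OOD as the positive class, assuming no tied scores (this mirrors what sklearn's `average_precision_score` would compute, but the real probly implementation may differ):

```python
import numpy as np

# Hedged sketch of AUPR via average precision, with OOD as the
# positive class. Assumes no tied scores.
def aupr_sketch(in_distribution, out_distribution):
    scores = np.concatenate([in_distribution, out_distribution])
    labels = np.concatenate(
        [np.zeros(len(in_distribution)), np.ones(len(out_distribution))]
    )
    order = np.argsort(-scores)          # descending score order
    labels = labels[order]
    tp = np.cumsum(labels)               # true positives at each cutoff
    precision = tp / np.arange(1, len(labels) + 1)
    # Average precision: mean of the precision at each positive's rank.
    return float(np.sum(precision * labels) / labels.sum())

aupr = aupr_sketch(np.array([0.1, 0.2]), np.array([0.8, 0.9]))
```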

probly.evaluation.ood.out_of_distribution_detection_auroc(in_distribution, out_distribution)[source]

Perform out-of-distribution detection using prediction functionals from ID and OOD data.

The prediction functionals can be epistemic uncertainties, as is common, but also e.g. softmax confidences.

Parameters:
  • in_distribution (ndarray) – in-distribution prediction functionals

  • out_distribution (ndarray) – out-of-distribution prediction functionals

Returns:

auroc – area under the ROC curve

Return type:

float
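A hedged sketch of what this metric computes: AUROC equals the probability that a randomly chosen OOD score exceeds a randomly chosen ID score (the Mann-Whitney U statistic), with larger scores assumed to indicate OOD and ties counted half. This is an illustration, not probly's implementation:

```python
import numpy as np

# Hedged sketch: AUROC via the Mann-Whitney U statistic, with OOD as
# the positive class and larger scores indicating OOD.
def auroc_sketch(in_distribution, out_distribution):
    diff = out_distribution[:, None] - in_distribution[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Perfectly separated scores yield an AUROC of 1.0.
auroc = auroc_sketch(np.array([0.1, 0.2, 0.3]), np.array([0.8, 0.9, 1.0]))
```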

probly.evaluation.ood.out_of_distribution_detection_fnr_at_x_tpr(in_distribution, out_distribution, tpr_target=0.95)[source]

Perform out-of-distribution detection using the false negative rate at a user-given true positive rate.

If no tpr_target is specified, it defaults to 0.95.

Parameters:
  • in_distribution (ndarray) – in-distribution prediction functionals

  • out_distribution (ndarray) – out-of-distribution prediction functionals

  • tpr_target (float) – target TPR value in [0, 1], e.g. 0.95

Returns:

fnr@X – FNR at the first threshold where TPR >= tpr_target

Return type:

float
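A hedged sketch of the computation: sweep thresholds at the OOD scores in descending order and report FNR = 1 - TPR at the first threshold reaching tpr_target. Larger scores are assumed to indicate OOD; this is an illustration, not probly's implementation:

```python
import numpy as np

# Hedged sketch; note FNR depends only on the OOD scores, so
# in_distribution is kept only to mirror the documented signature.
def fnr_at_tpr_sketch(in_distribution, out_distribution, tpr_target=0.95):
    for thr in np.sort(out_distribution)[::-1]:
        tpr = float(np.mean(out_distribution >= thr))
        if tpr >= tpr_target:
            return 1.0 - tpr
    raise ValueError("tpr_target cannot be reached")

fnr = fnr_at_tpr_sketch(np.array([0.1, 0.2]), np.array([0.6, 0.7, 0.8, 0.9]))
```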

probly.evaluation.ood.out_of_distribution_detection_fpr_at_x_tpr(in_distribution, out_distribution, tpr_target=0.95)[source]

Perform out-of-distribution detection using the false positive rate (FPR) at a given true positive rate.

If no tpr_target is specified, it defaults to 0.95.

The scores can be epistemic uncertainties, as is common, but also e.g. softmax confidences.

Parameters:
  • in_distribution (ndarray) – numpy.ndarray, scores for in-distribution samples

  • out_distribution (ndarray) – numpy.ndarray, scores for out-of-distribution samples

  • tpr_target (float) – target TPR value in [0, 1], e.g. 0.95

Returns:

fpr_at_target – FPR at the first threshold where TPR >= tpr_target

Return type:

float

Notes

  • Assumes that larger scores correspond to the positive class (out-of-distribution).

  • If tpr_target cannot be reached, a ValueError is raised.
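The Notes above can be sketched as follows: larger scores mean OOD (the positive class), thresholds are swept at the OOD scores in descending order, and the FPR is taken at the first threshold where TPR >= tpr_target. This is an illustration under those assumptions, not probly's implementation:

```python
import numpy as np

# Hedged sketch matching the Notes: larger scores = OOD (positive class).
def fpr_at_tpr_sketch(in_distribution, out_distribution, tpr_target=0.95):
    for thr in np.sort(out_distribution)[::-1]:
        if np.mean(out_distribution >= thr) >= tpr_target:
            # Fraction of ID samples wrongly flagged as OOD at this threshold.
            return float(np.mean(in_distribution >= thr))
    raise ValueError("tpr_target cannot be reached")

fpr = fpr_at_tpr_sketch(
    np.array([0.1, 0.2, 0.3, 0.7]), np.array([0.5, 0.6, 0.8, 0.9])
)
```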

probly.evaluation.ood.parse_dynamic_metric(spec)[source]

Parse dynamic metric specification.

Examples

fpr@0.8

fnr@95%

fpr -> default threshold is 0.95

fnr -> default threshold is 0.95

Parameters:

spec (str)

Return type:

tuple[str, float]
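The grammar shown in the examples can be re-implemented as a short sketch; `parse_metric_spec` is a hypothetical stand-in, and the real parse_dynamic_metric may handle more cases:

```python
# Hypothetical re-implementation of the spec grammar shown in the
# examples above.
def parse_metric_spec(spec: str) -> tuple[str, float]:
    if "@" in spec:
        name, raw = spec.split("@", 1)
        # "95%" means 0.95; "0.8" is taken as-is.
        value = float(raw.rstrip("%")) / 100 if raw.endswith("%") else float(raw)
        return name, value
    return spec, 0.95  # bare metric name -> default threshold

pairs = [parse_metric_spec(s) for s in ("fpr@0.8", "fnr@95%", "fpr")]
```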

probly.evaluation.ood.visualize_ood(in_distribution, out_distribution, plot_types=None, invert_scores=True)[source]

Generate visualization plots from OOD scores.

Parameters:
  • in_distribution – scores for in-distribution samples

  • out_distribution – scores for out-of-distribution samples

  • plot_types (list[str], optional) – list of specific plots to return (e.g. [‘roc’, ‘hist’, ‘pr’]). If None, all plots are generated.

  • invert_scores (bool) – if True (default), scores are assumed to be confidences (high = ID) and are inverted (1.0 - score) for metrics where OOD is the positive class. If False, scores are assumed to be anomaly scores (high = OOD).

Returns:

Dict containing matplotlib Figures for the requested plots.

Return type:

dict[str, Figure]
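The score inversion and figure-dict return described above can be sketched for the ‘hist’ plot alone; `visualize_hist_sketch` is a hypothetical stand-in, not probly's implementation:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

# Hedged sketch of the 'hist' plot described above.
def visualize_hist_sketch(in_distribution, out_distribution, invert_scores=True):
    if invert_scores:  # confidences (high = ID) -> anomaly scores (high = OOD)
        in_distribution = 1.0 - in_distribution
        out_distribution = 1.0 - out_distribution
    fig, ax = plt.subplots()
    ax.hist(in_distribution, alpha=0.5, label="ID")
    ax.hist(out_distribution, alpha=0.5, label="OOD")
    ax.set_xlabel("OOD score")
    ax.legend()
    return {"hist": fig}

figs = visualize_hist_sketch(np.array([0.9, 0.8, 0.85]), np.array([0.2, 0.1]))
```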