probly.data_generation.first_order_datagenerator

Backend/General First-Order data generator.

General implementation using pure Python constructs (no torch dependency).

Functions

output_dataloader(base_dataset, distributions, *)

Create a loader that yields inputs (and labels if present) with distributions.

Classes

DatasetLike(*args, **kwargs)

Minimal dataset protocol for typing (len and index access).

FirstOrderDataGenerator(model[, device, ...])

General backend first-order data generator.

FirstOrderDataset(base_dataset, distributions)

Dataset wrapper pairing inputs (and labels if present) with distributions.

SimpleDataLoader(dataset[, batch_size, shuffle])

A minimal Python data loader that batches items by index.

class probly.data_generation.first_order_datagenerator.DatasetLike(*args, **kwargs)[source]

Bases: Protocol

Minimal dataset protocol for typing (len and index access).
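As a sketch of what satisfies this protocol: any object exposing `__len__` and `__getitem__` qualifies structurally. The `ListDataset` class below is illustrative only, not part of the library.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DatasetLike(Protocol):
    """Minimal dataset protocol: len() and index access."""

    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> Any: ...


class ListDataset:
    """A plain list-backed dataset; satisfies DatasetLike structurally."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int):
        return self._items[index]


ds = ListDataset([([0.1, 0.2], 0), ([0.3, 0.4], 1)])
print(len(ds), isinstance(ds, DatasetLike))  # runtime structural check
```

Because the protocol is structural, existing datasets (lists, torch datasets, custom wrappers) need no inheritance to be accepted.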

class probly.data_generation.first_order_datagenerator.FirstOrderDataGenerator(model, device='cpu', batch_size=64, output_mode='auto', output_transform=None, input_getter=None, model_name=None, return_numpy=True)[source]

Bases: object

General backend first-order data generator.

Parameters:
  • model (Callable[..., Any])

  • device (str)

  • batch_size (int)

  • output_mode (str)

  • output_transform (Callable[[Any], Any] | None)

  • input_getter (Callable[[Any], Any] | None)

  • model_name (str | None)

  • return_numpy (bool)

generate_distributions(dataset_or_loader, *, progress=True)[source]

Generate per-sample distributions for a dataset or loader.

Parameters:
  • dataset_or_loader (object)

  • progress (bool)

Return type:

object

load_distributions(path, *, return_numpy=True)[source]

Load distributions and metadata from JSON.

When return_numpy is True (default), returns numpy arrays for each distribution row. If False, returns plain Python lists of floats.

Parameters:
  • path (str | Path)

  • return_numpy (bool)

Return type:

tuple[dict[int, Any], dict[str, Any]]

prepares_batch_inp(sample)[source]

Extract the model input from a dataset sample.

Behavior:
  • If input_getter is provided, use it to obtain the input.

  • If the sample is a tuple like (input, label, …), return the first element.

  • Otherwise, return the sample as-is.

Notes:
  • Lists are treated as input-only feature vectors and are NOT unpacked.

Parameters:

sample (object)

Return type:

object

save_distributions(path, distributions, *, meta=None)[source]

Save distributions and optional metadata as JSON.

Parameters:
  • path (str | Path)

  • distributions (Mapping[int, Iterable[float]])

  • meta (dict[str, Any] | None)

Return type:

None
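The exact JSON layout written by save_distributions is not specified here; the round-trip below assumes a simple {"distributions": {index: row}, "meta": {...}} layout purely for illustration. `save_dists` and `load_dists` are hypothetical helper names, not the library's API.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def save_dists(path, distributions, meta=None):
    # JSON object keys must be strings, so integer sample indices are stringified.
    payload = {
        "distributions": {str(i): list(row) for i, row in distributions.items()},
        "meta": meta or {},
    }
    Path(path).write_text(json.dumps(payload))


def load_dists(path):
    payload = json.loads(Path(path).read_text())
    # Restore integer keys on load.
    dists = {int(i): row for i, row in payload["distributions"].items()}
    return dists, payload["meta"]


with TemporaryDirectory() as tmp:
    p = Path(tmp) / "dists.json"
    save_dists(p, {0: [0.9, 0.1], 1: [0.2, 0.8]}, meta={"model": "demo"})
    dists, meta = load_dists(p)

print(dists[0], meta["model"])
```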

to_probs(outputs)[source]

Convert raw model outputs to probability rows.

Parameters:

outputs (object)

Return type:

object
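The docs do not specify how 'auto' mode maps raw outputs to probabilities. A common convention when outputs are raw logits is a row-wise softmax, sketched below in pure Python as one plausible interpretation, not the library's confirmed behavior.

```python
import math


def softmax_row(logits):
    """Numerically stable softmax over one row of raw scores."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


row = softmax_row([2.0, 1.0, 0.1])
print(row, sum(row))  # rows sum to 1.0
```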

batch_size: int = 64
device: str = 'cpu'
input_getter: Callable[[Any], Any] | None = None
model: Callable[..., Any]
model_name: str | None = None
output_mode: str = 'auto'
output_transform: Callable[[Any], Any] | None = None
return_numpy: bool = True
class probly.data_generation.first_order_datagenerator.FirstOrderDataset(base_dataset, distributions, input_getter=None, return_numpy=True)[source]

Bases: object

Dataset wrapper pairing inputs (and labels if present) with distributions.

Parameters:
  • base_dataset (DatasetLike)

  • distributions (Mapping[int, Iterable[float]])

  • input_getter (Callable[[Any], Any] | None)

  • return_numpy (bool)

class probly.data_generation.first_order_datagenerator.SimpleDataLoader(dataset, batch_size=64, shuffle=False)[source]

Bases: object

A minimal Python data loader that batches items by index.

Parameters:
  • dataset (DatasetLike)

  • batch_size (int)

  • shuffle (bool)

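"Batches items by index" can be sketched as follows. `SimpleLoader` is an illustrative stand-in for SimpleDataLoader, assuming the common convention that the final batch may be partial.

```python
import random


class SimpleLoader:
    """Illustrative batching loader: groups dataset items by index."""

    def __init__(self, dataset, batch_size=64, shuffle=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            yield [self.dataset[i] for i in indices[start:start + self.batch_size]]

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return -(-len(self.dataset) // self.batch_size)


loader = SimpleLoader(list(range(10)), batch_size=4)
batches = list(loader)
print(len(loader), batches)
```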
probly.data_generation.first_order_datagenerator.output_dataloader(base_dataset, distributions, *, batch_size=64, shuffle=False, num_workers=0, pin_memory=False, input_getter=None, return_numpy=True)[source]

Create a loader that yields inputs (and labels if present) with distributions.

Note: num_workers and pin_memory are kept for API parity with Torch but ignored here.

Parameters:
  • base_dataset (DatasetLike)

  • distributions (Mapping[int, Iterable[float]])

  • batch_size (int)

  • shuffle (bool)

  • num_workers (int)

  • pin_memory (bool)

  • input_getter (Callable[[Any], Any] | None)

  • return_numpy (bool)

Return type:

SimpleDataLoader
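The pairing performed by the returned loader can be illustrated at the item level. `paired_items` is a hypothetical helper showing the documented behavior (inputs, labels if the sample is a tuple, and the matching distribution), not the library's implementation.

```python
def paired_items(base_dataset, distributions):
    """Pair each sample (and its label, if present) with its distribution row."""
    for i in range(len(base_dataset)):
        sample = base_dataset[i]
        dist = distributions[i]
        if isinstance(sample, tuple):
            # Labeled sample: yield (input, label, distribution).
            yield sample[0], sample[1], dist
        else:
            # Unlabeled sample: yield (input, distribution).
            yield sample, dist


data = [([1.0, 2.0], 0), ([3.0, 4.0], 1)]
dists = {0: [0.7, 0.3], 1: [0.4, 0.6]}
print(list(paired_items(data, dists)))
```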