probly.datasets.torch.ImageNetReaL

class probly.datasets.torch.ImageNetReaL(root: str | Path, transform: Callable[..., Any] | None = None)

Bases: ImageNet

A Dataset class for the ImageNet ReaL dataset introduced in [BHenaffK+20].

This dataset is a re-labeled version of the ImageNet validation set in which each image can belong to multiple classes, so each label is a distribution over classes. The ImageNet dataset must be downloaded from https://www.image-net.org, and the first-order labels can be downloaded from https://github.com/google-research/reassessed-imagenet.

Initialize an instance of the ImageNetReaL class.

Parameters:
  • root – Root directory of the dataset.

  • transform – Optional transform to apply to the data.

dists: list

List of distributions over target classes.
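
To illustrate what one entry of dists could look like, here is a minimal sketch that turns a set of class indices into a distribution over classes. It assumes the labels arrive as one list of class indices per image and that probability mass is spread uniformly over those indices; both are assumptions, and the helper name labels_to_distribution is hypothetical rather than part of probly.

```python
def labels_to_distribution(labels: list[int], num_classes: int) -> list[float]:
    """Spread probability mass uniformly over the given class indices.

    Assumption: uniform weighting. The actual class may weight labels
    differently. An image without labels yields an all-zero vector here.
    """
    dist = [0.0] * num_classes
    if not labels:
        return dist
    weight = 1.0 / len(labels)
    for idx in labels:
        dist[idx] += weight
    return dist


# Toy data with 4 classes: one label, two labels, and no label at all.
toy_labels = [[0], [1, 3], []]
toy_dists = [labels_to_distribution(labels, num_classes=4) for labels in toy_labels]
```

With real data, num_classes would be 1000 and each list would hold the labels of one validation image.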

extra_repr() → str
find_classes(directory: str | Path) → tuple[list[str], dict[str, int]]

Find the class folders in a dataset structured as follows:

directory/
├── class_x
│   ├── xxx.ext
│   ├── xxy.ext
│   └── ...
│       └── xxz.ext
└── class_y
    ├── 123.ext
    ├── nsdf3.ext
    └── ...
    └── asd932_.ext

This method can be overridden to only consider a subset of classes, or to adapt to a different dataset directory structure.

Parameters:

directory (str) – Root directory path, corresponding to self.root

Raises:

FileNotFoundError – If directory has no class folders.

Returns:

List of all classes and dictionary mapping each class to an index.

Return type:

(Tuple[List[str], Dict[str, int]])
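
The behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: the function name find_classes_sketch is hypothetical, and sorting class names alphabetically before assigning indices is an assumption.

```python
import os
import tempfile
from pathlib import Path


def find_classes_sketch(directory):
    """Return (classes, class_to_idx) for the sub-folders of `directory`."""
    # Only directories count as classes; plain files are ignored.
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"No class folders found in {directory}.")
    # Assumption: indices follow the alphabetical order of the class names.
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


# Build a throwaway directory tree to try the function on.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("class_y", "class_x"):
        Path(tmp, name).mkdir()
    Path(tmp, "notes.txt").write_text("not a class folder")
    classes, class_to_idx = find_classes_sketch(tmp)
```

An override that considers only a subset of classes could filter the classes list before building class_to_idx.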

static make_dataset(directory: str | Path, class_to_idx: dict[str, int], extensions: tuple[str, ...] | None = None, is_valid_file: Callable[[str], bool] | None = None, allow_empty: bool = False) → list[tuple[str, int]]

Generates a list of samples of the form (path_to_sample, class).

This can be overridden to e.g. read files from a compressed zip file instead of from the disk.

Parameters:
  • directory (str) – root dataset directory, corresponding to self.root.

  • class_to_idx (Dict[str, int]) – Dictionary mapping class name to class index.

  • extensions (optional) – Allowed file extensions. Exactly one of extensions or is_valid_file must be passed. Defaults to None.

  • is_valid_file (optional) – A function that takes the path of a file and checks whether it is a valid file (used to check for corrupt files). extensions and is_valid_file must not both be passed. Defaults to None.

  • allow_empty (bool, optional) – If True, empty folders are considered to be valid classes. An error is raised on empty folders if False (default).

Raises:
  • ValueError – In case class_to_idx is empty.

  • ValueError – In case extensions and is_valid_file are both None or both not None.

  • FileNotFoundError – In case no valid file was found for any class.

Returns:

Samples of the form (path_to_sample, class).

Return type:

List[Tuple[str, int]]
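
The parameter and error rules listed above can be sketched as follows. This is a simplified illustration, not the library's code: the name make_dataset_sketch is hypothetical, and the traversal order and case-insensitive extension matching are assumptions.

```python
import os
import tempfile
from pathlib import Path


def make_dataset_sketch(directory, class_to_idx, extensions=None,
                        is_valid_file=None, allow_empty=False):
    """Collect (path_to_sample, class) pairs from per-class sub-folders."""
    if not class_to_idx:
        raise ValueError("class_to_idx must not be empty.")
    # Exactly one of the two filters has to be supplied.
    if (extensions is None) == (is_valid_file is None):
        raise ValueError("Pass exactly one of extensions or is_valid_file.")
    if extensions is not None:
        allowed = tuple(ext.lower() for ext in extensions)
        # Assumption: extensions are matched case-insensitively.
        check = lambda path: path.lower().endswith(allowed)
    else:
        check = is_valid_file

    samples, empty_classes = [], []
    for class_name in sorted(class_to_idx):
        class_dir = os.path.join(directory, class_name)
        found = False
        if os.path.isdir(class_dir):
            for root, _, file_names in sorted(os.walk(class_dir)):
                for file_name in sorted(file_names):
                    path = os.path.join(root, file_name)
                    if check(path):
                        samples.append((path, class_to_idx[class_name]))
                        found = True
        if not found:
            empty_classes.append(class_name)
    if empty_classes and not allow_empty:
        raise FileNotFoundError(f"No valid files for classes: {empty_classes}")
    return samples


# Try it on a throwaway tree with two classes and one non-image file.
with tempfile.TemporaryDirectory() as tmp:
    for cls, fname in (("class_x", "a.jpg"), ("class_y", "b.JPG"), ("class_x", "readme.txt")):
        Path(tmp, cls).mkdir(exist_ok=True)
        Path(tmp, cls, fname).touch()
    samples = make_dataset_sketch(tmp, {"class_x": 0, "class_y": 1}, extensions=(".jpg",))
    found_pairs = [(os.path.basename(path), idx) for path, idx in samples]
```

Passing is_valid_file instead of extensions lets a caller plug in any per-file check, for example one that tries to open the image.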

parse_archives() → None
property split_folder: str