probly.datasets.torch
Collection of dataset classes for loading data from different datasets.
Classes
Benthic | Implementation of the Benthic dataset.
CIFAR10H | A Dataset class for the CIFAR10H dataset introduced in [PBGR19].
DCICDataset | A Dataset base class for the DCICDatasets introduced in [SGZ+22].
ImageNetReaL | A Dataset class for the ImageNet ReaL dataset introduced in [BHenaffK+20].
Plankton | Implementation of the Plankton dataset.
QualityMRI | Implementation of the QualityMRI dataset.
Treeversity1 | Implementation of the Treeversity#1 dataset.
Treeversity6 | Implementation of the Treeversity#6 dataset.
- class probly.datasets.torch.Benthic(root, transform=None, *, first_order=True)
Bases: DCICDataset
Implementation of the Benthic dataset.
The dataset can be found at https://zenodo.org/records/7180818.
- class probly.datasets.torch.CIFAR10H(root, transform=None, *, download=False)
Bases: CIFAR10
A Dataset class for the CIFAR10H dataset introduced in [PBGR19].
The dataset can be found at https://github.com/jcpeterson/cifar-10h.
- counts
Tensor containing counts.
- Type:
- targets
Tensor of size (n_instances, n_classes), first-order distribution.
- Type:
- download()
- Return type:
None
- base_folder = 'cifar-10-batches-py'
- filename = 'cifar-10-python.tar.gz'
- meta = {'filename': 'batches.meta', 'key': 'label_names', 'md5': '5ff9c542aee3614f3951f8cda6e48888'}
- test_list = [['test_batch', '40351d587109b95175f43aff81a1287e']]
- tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
- train_list = [['data_batch_1', 'c99cafc152244af753f735de768cd75f'], ['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'], ['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'], ['data_batch_4', '634d18415352ddfa80567beed471001a'], ['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb']]
- url = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
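The counts and targets attributes of CIFAR10H suggest that per-class annotation counts are turned into a first-order distribution for each instance. The following is a minimal sketch of one way to do that, assuming targets is simply counts normalized row by row; the source does not confirm this. The function name counts_to_distribution and the example values are hypothetical, and plain Python lists stand in for tensors so that the sketch runs without torch.

```python
from typing import List


def counts_to_distribution(counts: List[List[int]]) -> List[List[float]]:
    """Normalize each row of annotation counts so that it sums to 1."""
    distributions = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # A row without annotations cannot be normalized.
            raise ValueError("each instance needs at least one annotation")
        distributions.append([count / total for count in row])
    return distributions


# Hypothetical counts for two instances and three classes.
example_counts = [[8, 2, 0], [1, 1, 2]]
example_targets = counts_to_distribution(example_counts)
```

Each row of example_targets has shape (n_classes,) and sums to 1, which matches the (n_instances, n_classes) layout described for targets.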
- class probly.datasets.torch.DCICDataset(root, transform=None, *, first_order=True)
Bases: Dataset
A Dataset base class for the DCICDatasets introduced in [SGZ+22].
These datasets can be found at https://zenodo.org/records/7180818.
- transform
Transform to apply to the data.
- Type:
Callable
- class probly.datasets.torch.ImageNetReaL(root, transform=None)
Bases: ImageNet
A Dataset class for the ImageNet ReaL dataset introduced in [BHenaffK+20].
This dataset is a re-labeled version of the ImageNet validation set in which each image can belong to multiple classes, resulting in a distribution over classes. The ImageNet dataset must be downloaded from https://www.image-net.org, and the first-order labels can be downloaded from https://github.com/google-research/reassessed-imagenet.
- Parameters:
root (str | Path)
transform (Callable[..., Any] | None)
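To illustrate how a set of labels for one image can become a distribution over classes, here is a minimal sketch. It assumes that probability mass is spread uniformly over the distinct labels of an image; the source does not say how ImageNetReaL weights them. The function name labels_to_distribution is hypothetical, and returning all zeros for an empty label list is a choice made only for this sketch.

```python
from typing import List, Sequence


def labels_to_distribution(labels: Sequence[int], n_classes: int) -> List[float]:
    """Spread probability mass uniformly over the distinct labels of one image."""
    distinct = set(labels)
    if not distinct:
        # Assumption of this sketch: an image without labels gets all zeros.
        return [0.0] * n_classes
    weight = 1.0 / len(distinct)
    return [weight if k in distinct else 0.0 for k in range(n_classes)]


# Hypothetical image labeled with classes 0 and 3 out of 5 classes.
dist = labels_to_distribution([0, 3], n_classes=5)
```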
- static make_dataset(directory, class_to_idx, extensions=None, is_valid_file=None, allow_empty=False)
Generates a list of samples of the form (path_to_sample, class).
This can be overridden to e.g. read files from a compressed zip file instead of from the disk.
- Parameters:
directory (str) – Root dataset directory, corresponding to self.root.
class_to_idx (Dict[str, int]) – Dictionary mapping class name to class index.
extensions (optional) – A list of allowed extensions. Either extensions or is_valid_file should be passed. Defaults to None.
is_valid_file (optional) – A function that takes the path of a file and checks whether the file is valid (used to check for corrupt files). extensions and is_valid_file should not both be passed. Defaults to None.
allow_empty (bool, optional) – If True, empty folders are considered to be valid classes. If False (default), an error is raised on empty folders.
- Raises:
ValueError – In case class_to_idx is empty.
ValueError – In case extensions and is_valid_file are both None or both not None.
FileNotFoundError – In case no valid file was found for any class.
- Returns:
Samples of the form (path_to_sample, class).
- Return type:
List[Tuple[str, int]]
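The sample list described for make_dataset can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the helper name make_samples is hypothetical, it supports only the extensions filter (no is_valid_file or allow_empty), and it skips missing class folders silently.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def make_samples(
    directory: str,
    class_to_idx: Dict[str, int],
    extensions: Tuple[str, ...],
) -> List[Tuple[str, int]]:
    """Collect (path_to_sample, class) pairs from one sub-folder per class."""
    if not class_to_idx:
        raise ValueError("class_to_idx must not be empty")
    samples = []
    for class_name in sorted(class_to_idx):
        class_dir = os.path.join(directory, class_name)
        if not os.path.isdir(class_dir):
            continue  # simplification: ignore classes without a folder
        for root, _, file_names in sorted(os.walk(class_dir)):
            for file_name in sorted(file_names):
                # Keep only files whose name ends with an allowed extension.
                if file_name.lower().endswith(extensions):
                    path = os.path.join(root, file_name)
                    samples.append((path, class_to_idx[class_name]))
    return samples


# Build a tiny throwaway directory tree and collect its samples.
with tempfile.TemporaryDirectory() as tmp:
    files = [("cat", "a.png"), ("dog", "b.png"), ("dog", "notes.txt")]
    for class_name, file_name in files:
        os.makedirs(os.path.join(tmp, class_name), exist_ok=True)
        open(os.path.join(tmp, class_name, file_name), "w").close()
    samples = make_samples(tmp, {"cat": 0, "dog": 1}, extensions=(".png",))
    labels = [label for _, label in samples]
    names = [os.path.basename(path) for path, _ in samples]
```

Here notes.txt is filtered out because it does not match the allowed extensions, so only the two image files end up in samples.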
- find_classes(directory)
Find the class folders in a dataset structured as follows:
    directory/
    ├── class_x
    │   ├── xxx.ext
    │   ├── xxy.ext
    │   └── ...
    │       └── xxz.ext
    └── class_y
        ├── 123.ext
        ├── nsdf3.ext
        └── ...
        └── asd932_.ext
This method can be overridden to only consider a subset of classes, or to adapt to a different dataset directory structure.
- Parameters:
directory (str) – Root directory path, corresponding to self.root.
- Raises:
FileNotFoundError – If directory has no class folders.
- Returns:
List of all classes and dictionary mapping each class to an index.
- Return type:
Tuple[List[str], Dict[str, int]]
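The behavior described for find_classes can be sketched as follows. This is a minimal illustration under the assumption that class indices follow the alphabetical order of the folder names; the helper name find_class_folders is hypothetical and is not part of the library.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def find_class_folders(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Return the sorted class folder names and a name-to-index mapping."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"no class folders found in {directory}")
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


# Create two empty class folders in a throwaway directory and index them.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("class_y", "class_x"):
        os.mkdir(os.path.join(tmp, name))
    classes, class_to_idx = find_class_folders(tmp)
```

An override that considers only a subset of classes could filter the classes list before building the mapping.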
- parse_archives()
- Return type:
None
- class probly.datasets.torch.Plankton(root, transform=None, *, first_order=True)
Bases: DCICDataset
Implementation of the Plankton dataset.
The dataset can be found at https://zenodo.org/records/7180818.
- class probly.datasets.torch.QualityMRI(root, transform=None, *, first_order=True)
Bases: DCICDataset
Implementation of the QualityMRI dataset.
The dataset can be found at https://zenodo.org/records/7180818.
- class probly.datasets.torch.Treeversity1(root, transform=None, *, first_order=True)
Bases: DCICDataset
Implementation of the Treeversity#1 dataset.
The dataset can be found at https://zenodo.org/records/7180818.
- class probly.datasets.torch.Treeversity6(root, transform=None, *, first_order=True)
Bases: DCICDataset
Implementation of the Treeversity#6 dataset.
The dataset can be found at https://zenodo.org/records/7180818.