torchvision.datasets¶

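All datasets are subclasses of torch.utils.data.Dataset, i.e., they implement the __getitem__ and __len__ methods. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example: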

import torch
import torchvision

imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=4)  # number of worker processes

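The following datasets are available, each documented in its own section below.

All the datasets have an almost identical API. They all have two common arguments: transform and target_transform, which transform the input and the target respectively.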
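As a rough illustration of how these two arguments are typically applied, the sketch below defines a hypothetical `ToyDataset` (not part of torchvision) that keeps (input, target) pairs in a plain Python list. It applies `transform` to the input and `target_transform` to the target inside `__getitem__`. The real datasets load images from disk; the class name, data, and lambdas here are assumptions made for illustration only.

```python
from typing import Any, Callable, List, Optional, Tuple


class ToyDataset:
    """Hypothetical map-style dataset; it only mimics the common API shape."""

    def __init__(
        self,
        samples: List[Tuple[Any, Any]],
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
    ) -> None:
        self.samples = samples
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        sample, target = self.samples[index]
        # transform acts on the input only, target_transform on the target only.
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target

    def __len__(self) -> int:
        return len(self.samples)


class_names = ["cat", "dog"]
dataset = ToyDataset(
    samples=[([0, 255], 0), ([128, 64], 1)],
    transform=lambda pixels: [p / 255 for p in pixels],  # scale inputs to [0, 1]
    target_transform=lambda idx: class_names[idx],       # class index -> class name
)

first_sample, first_target = dataset[0]
print(len(dataset), first_sample, first_target)
```

Because the two callables are independent, either one can be left as None without affecting the other.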

CelebA ¶

class torchvision.datasets.CelebA(root: str, split: str = 'train', target_type: Union[List[str], str] = 'attr', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

Large-scale CelebFaces Attributes (CelebA) Dataset.

Parameters:

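root (string) – Root directory where images are downloaded to.
split (string) – One of {'train', 'valid', 'test', 'all'}. The dataset split is selected accordingly.
target_type (string or list, optional) –
Type of target to use: attr, identity, bbox, or landmarks. Can also be a list to output a tuple with all specified target types. The targets represent:

attr (np.array shape=(40,) dtype=int): binary (0, 1) labels for attributes
identity (int): label for each person (data points with the same identity are the same person)
bbox (np.array shape=(4,) dtype=int): bounding box (x, y, width, height)
landmarks (np.array shape=(10,) dtype=int): landmark points (lefteye_x, lefteye_y, righteye_x, righteye_y, nose_x, nose_y, leftmouth_x, leftmouth_y, rightmouth_x, rightmouth_y)

Defaults to attr. If empty, None will be returned as target.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.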

CIFAR ¶

class torchvision.datasets.CIFAR10(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

CIFAR10 Dataset.

Parameters:

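root (string) – Root directory of the dataset where the directory cifar-10-batches-py exists or will be saved to if download is set to True.
train (bool, optional) – If True, creates the dataset from the training set, otherwise from the test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.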

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is the index of the target class.
Return type:	tuple

class torchvision.datasets.CIFAR100(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

CIFAR100 Dataset.

This is a subclass of the CIFAR10 Dataset.

Cityscapes ¶

Note

Requires the Cityscapes dataset to be downloaded.

class torchvision.datasets.Cityscapes(root: str, split: str = 'train', mode: str = 'fine', target_type: Union[List[str], str] = 'instance', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None) → None[source]¶

Cityscapes Dataset.

Parameters:

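root (string) – Root directory of the dataset where the directories leftImg8bit and gtFine or gtCoarse are located.
split (string, optional) – The image split to use: train, test or val if mode="fine", otherwise train, train_extra or val
mode (string, optional) – The quality mode to use, fine or coarse
target_type (string or list, optional) – Type of target to use: instance, semantic, polygon or color. Can also be a list to output a tuple with all specified target types.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entry and returns a transformed version.

Examples

Get semantic segmentation target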

dataset = Cityscapes('./data/cityscapes', split='train', mode='fine',
                     target_type='semantic')

img, smnt = dataset[0]

Get multiple targets

dataset = Cityscapes('./data/cityscapes', split='train', mode='fine',
                     target_type=['instance', 'color', 'polygon'])

img, (inst, col, poly) = dataset[0]

Validate on the "coarse" set

dataset = Cityscapes('./data/cityscapes', split='val', mode='coarse',
                     target_type='semantic')

img, smnt = dataset[0]

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is a tuple of all target types if target_type is a list with more than one item. Otherwise target is a json object if target_type="polygon", else the image segmentation.
Return type:	tuple

COCO ¶

Note

These require the COCO API to be installed

Captions ¶

class torchvision.datasets.CocoCaptions(root: str, annFile: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None) → None[source]¶

MS Coco Captions Dataset.

Parameters:

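root (string) – Root directory where images are downloaded to.
annFile (string) – Path to the json annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entry and returns a transformed version.

Example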

import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
                        annFile = 'json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample

print("Image Size: ", img.size())
print(target)

Output:

Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is a list of captions for the image.
Return type:	tuple

Detection ¶

class torchvision.datasets.CocoDetection(root: str, annFile: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None) → None[source]¶

MS Coco Detection Dataset.

Parameters:

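root (string) – Root directory where images are downloaded to.
annFile (string) – Path to the json annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entry and returns a transformed version.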

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is the object returned by coco.loadAnns.
Return type:	tuple

DatasetFolder ¶

class torchvision.datasets.DatasetFolder(root: str, loader: Callable[[str], Any], extensions: Union[Tuple[str, ...], NoneType] = None, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, is_valid_file: Union[Callable[[str], bool], NoneType] = None) → None[source]¶

A generic data loader where the samples are arranged in this way:

root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext

root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext

Parameters:

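root (string) – Root directory path.
loader (callable) – A function to load a sample given its path.
extensions (tuple[string]) – A list of allowed extensions. extensions and is_valid_file should not both be passed.
transform (callable, optional) – A function/transform that takes in a sample and returns a transformed version. E.g., transforms.RandomCrop for images.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
is_valid_file (callable, optional) – A function that takes the path of a file and checks whether it is a valid file (used to check for corrupt files). extensions and is_valid_file should not both be passed.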

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(sample, target) where target is the class_index of the target class.
Return type:	tuple
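The directory layout above implies a mapping from sub-directory names to class indices. The following is a minimal sketch of that idea using only the standard library; `find_classes` and `list_samples` are hypothetical helper names chosen for this example, and the assumption that classes are indexed in sorted order should be checked against the installed torchvision version.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def find_classes(root: str) -> Dict[str, int]:
    """Map each sub-directory name of root to an index, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


def list_samples(root: str, extensions: Tuple[str, ...]) -> List[Tuple[str, int]]:
    """Collect (path, class_index) pairs for files with an allowed extension."""
    class_to_idx = find_classes(root)
    samples = []
    for class_name, class_index in class_to_idx.items():
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, file_name), class_index))
    return samples


with tempfile.TemporaryDirectory() as root:
    # Recreate a small version of the layout shown above.
    for class_name, file_names in {
        "class_x": ["xxx.ext", "xxy.ext"],
        "class_y": ["123.ext", "notes.txt"],
    }.items():
        os.makedirs(os.path.join(root, class_name))
        for file_name in file_names:
            open(os.path.join(root, class_name, file_name), "w").close()

    class_to_idx = find_classes(root)
    samples = list_samples(root, extensions=(".ext",))
    targets = [class_index for _, class_index in samples]

print(class_to_idx)
print(targets)
```

In this sketch notes.txt is skipped because its extension is not in the allowed tuple, which mirrors the role of the extensions argument.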

EMNIST ¶

class torchvision.datasets.EMNIST(root: str, split: str, **kwargs) → None[source]¶

EMNIST Dataset.

Parameters:

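root (string) – Root directory of the dataset where EMNIST/processed/training.pt and EMNIST/processed/test.pt exist.
split (string) – The dataset has 6 different splits: byclass, bymerge, balanced, letters, digits and mnist. This argument specifies which one to use.
train (bool, optional) – If True, creates the dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.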

FakeData ¶

class torchvision.datasets.FakeData(size: int = 1000, image_size: Tuple[int, int, int] = (3, 224, 224), num_classes: int = 10, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, random_offset: int = 0) → None[source]¶

A fake dataset that generates random images and returns them as PIL images.

Parameters:

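size (int, optional) – Size of the dataset. Default: 1000 images
image_size (tuple, optional) – Size of the returned images. Default: (3, 224, 224)
num_classes (int, optional) – Number of classes in the dataset. Default: 10
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
random_offset (int) – Offsets the index-based random seed used to generate each image. Default: 0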
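To make the effect of random_offset concrete, here is a small sketch of index-based seeding. It is only an analogy: it uses Python's `random` module instead of torch, and `fake_sample` is a hypothetical function, so the values it produces have nothing to do with what FakeData actually returns.

```python
import random
from typing import List, Tuple


def fake_sample(index: int, num_classes: int = 10, num_values: int = 4,
                random_offset: int = 0) -> Tuple[List[float], int]:
    """Hypothetical stand-in: derive a sample deterministically from its index."""
    rng = random.Random(index + random_offset)  # index-based seed, shifted by the offset
    values = [rng.random() for _ in range(num_values)]
    target = rng.randrange(num_classes)
    return values, target


# The same index and offset always reproduce the same sample.
same_sample = fake_sample(3) == fake_sample(3)
# Shifting the offset by k makes index i behave like index i + k without an offset.
shifted_match = fake_sample(3, random_offset=2) == fake_sample(5)
print(same_sample, shifted_match)
```

Under this scheme, two datasets built with different offsets yield different samples at the same index, which is one way to obtain non-overlapping fake train and validation sets.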

Fashion-MNIST ¶

class torchvision.datasets.FashionMNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

Fashion-MNIST Dataset.

Parameters:

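root (string) – Root directory of the dataset where FashionMNIST/processed/training.pt and FashionMNIST/processed/test.pt exist.
train (bool, optional) – If True, creates the dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Flickr ¶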

class torchvision.datasets.Flickr8k(root: str, ann_file: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None) → None[source]¶

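Flickr8k Entities Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
ann_file (string) – Path to the annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.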

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is a list of captions for the image.
Return type:	tuple

class torchvision.datasets.Flickr30k(root: str, ann_file: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None) → None[source]¶

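Flickr30k Entities Dataset.

Parameters:

root (string) – Root directory where images are downloaded to.
ann_file (string) – Path to the annotation file.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.ToTensor
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.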

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	Tuple (image, target). target is a list of captions for the image.
Return type:	tuple

HMDB51 ¶

class torchvision.datasets.HMDB51(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)[source]¶

HMDB51 dataset.

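HMDB51 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements come from video 1, and the next three elements from video 2. Note that clips which do not have exactly frames_per_clip elements are dropped, so not all frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.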
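The clip-count arithmetic in the example above can be checked with a short sketch. It assumes a plain sliding window over frame indices with no frame-rate resampling; `clips_per_video` and `dataset_size` are hypothetical helpers, not functions of the VideoClips object.

```python
from typing import List


def clips_per_video(num_frames: int, frames_per_clip: int, step_between_clips: int) -> int:
    """Count the full clips a sliding window yields; partial clips are dropped."""
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


def dataset_size(video_lengths: List[int], frames_per_clip: int, step_between_clips: int) -> int:
    """Sum the per-video clip counts to get the number of dataset elements."""
    return sum(clips_per_video(n, frames_per_clip, step_between_clips) for n in video_lengths)


per_video = [clips_per_video(n, 5, 5) for n in (10, 15)]
total = dataset_size([10, 15], frames_per_clip=5, step_between_clips=5)
print(per_video, total)  # [2, 3] 5

# A 12-frame video yields 2 clips (frames 0-4 and 5-9); frames 10-11 are never used.
print(clips_per_video(12, 5, 5))  # 2
```

The same counting applies to the other video datasets in this module that take frames_per_clip and step_between_clips.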

Parameters:

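root (string) – Root directory of the HMDB51 Dataset.
annotation_path (str) – Path to the folder containing the split files.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
fold (int, optional) – Which fold to use. Should be between 1 and 3.
train (bool, optional) – If True, creates the dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Returns:

video (Tensor[T, H, W, C]) – the T video frames
audio (Tensor[K, L]) – the audio frames, where K is the number of channels and L is the number of points
label (int) – class of the video clip

Return type:

tuple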

ImageFolder ¶

class torchvision.datasets.ImageFolder(root: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, loader: Callable[[str], Any] = <function default_loader>, is_valid_file: Union[Callable[[str], bool], NoneType] = None)[source]¶

A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

Parameters:

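root (string) – Root directory path.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader (callable, optional) – A function to load an image given its path.
is_valid_file (callable, optional) – A function that takes the path of an image file and checks whether it is a valid file (used to check for corrupt files).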

__getitem__(index: int) → Tuple[Any, Any]¶

Parameters:	index (int) – Index
Returns:	(sample, target) where target is the class_index of the target class.
Return type:	tuple

ImageNet ¶

class torchvision.datasets.ImageNet(root: str, split: str = 'train', download: Union[str, NoneType] = None, **kwargs) → None[source]¶

ImageNet 2012 Classification Dataset.

Parameters:

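root (string) – Root directory of the ImageNet Dataset.
split (string, optional) – The dataset split, supports train or val.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.

Note

This requires scipy to be installed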

Kinetics-400 ¶

class torchvision.datasets.Kinetics400(root, frames_per_clip, step_between_clips=1, frame_rate=None, extensions=('avi', ), transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0, _audio_channels=0)[source]¶

Kinetics-400 dataset.

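Kinetics-400 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements come from video 1, and the next three elements from video 2. Note that clips which do not have exactly frames_per_clip elements are dropped, so not all frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.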

Parameters:

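root (string) – Root directory of the Kinetics-400 Dataset.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Returns:

video (Tensor[T, H, W, C]) – the T video frames
audio (Tensor[K, L]) – the audio frames, where K is the number of channels and L is the number of points
label (int) – class of the video clip

Return type:

tuple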

KMNIST ¶

class torchvision.datasets.KMNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

Kuzushiji-MNIST Dataset.

Parameters:

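root (string) – Root directory of the dataset where KMNIST/processed/training.pt and KMNIST/processed/test.pt exist.
train (bool, optional) – If True, creates the dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.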

LSUN ¶

class torchvision.datasets.LSUN(root: str, classes: Union[str, List[str]] = 'train', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None) → None[source]¶

LSUN dataset.

Parameters:

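root (string) – Root directory for the database files.
classes (string or list) – One of {'train', 'val', 'test'} or a list of categories to load, e.g. ['bedroom_train', 'church_outdoor_train'].
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.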

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	Tuple (image, target) where target is the index of the target category.
Return type:	tuple

MNIST ¶

class torchvision.datasets.MNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

MNIST Dataset.

Parameters:

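root (string) – Root directory of the dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
train (bool, optional) – If True, creates the dataset from training.pt, otherwise from test.pt.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Omniglot ¶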

class torchvision.datasets.Omniglot(root: str, background: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

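Omniglot Dataset.

Parameters:

root (string) – Root directory of the dataset where the directory omniglot-py exists.
background (bool, optional) – If True, creates the dataset from the "background" set, otherwise from the "evaluation" set. This terminology is defined by the authors.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset zip files from the internet and puts them in the root directory. If the zip files are already downloaded, they are not downloaded again.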

PhotoTour ¶

class torchvision.datasets.PhotoTour(root: str, name: str, train: bool = True, transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

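Learning Local Image Descriptors Data dataset.

Parameters:

root (string) – Root directory where images are located.
name (string) – Name of the dataset to load.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.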

__getitem__(index: int) → Union[torch.Tensor, Tuple[Any, Any, torch.Tensor]][source]¶

Parameters:	index (int) – Index
Returns:	(data1, data2, matches)
Return type:	tuple

Places365 ¶

class torchvision.datasets.Places365(root: str, split: str = 'train-standard', small: bool = False, download: bool = False, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, loader: Callable[[str], Any] = <function default_loader>) → None[source]¶

Places365 classification dataset.

Parameters:

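root (string) – Root directory of the Places365 dataset.
split (string, optional) – The dataset split. Can be one of train-standard (default), train-challenge, val.
small (bool, optional) – If True, uses the small images, i.e. resized to 256 x 256 pixels, instead of the high resolution ones.
download (bool, optional) – If True, downloads the dataset components and places them in root. Already downloaded archives are not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
loader – A function to load an image given its path.

Raises:

RuntimeError – If download is False and the meta files, i.e. the devkit, are not present or corrupted.
RuntimeError – If download is True and the image archive is already extracted.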

QMNIST ¶

class torchvision.datasets.QMNIST(root: str, what: Union[str, NoneType] = None, compat: bool = True, train: bool = True, **kwargs) → None[source]¶

QMNIST Dataset.

Parameters:

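root (string) – Root directory of the dataset whose processed subdirectory contains torch binary files with the datasets.
what (string, optional) – Can be 'train', 'test', 'test10k', 'test50k', or 'nist' for respectively the MNIST-compatible training set, the 60k qmnist testing set, the 10k qmnist examples that match the MNIST testing set, the 50k remaining qmnist testing examples, or all the NIST digits. The default is to select 'train' or 'test' according to the compatibility argument 'train'.
compat (bool, optional) – A boolean that says whether the target for each example is the class number (for compatibility with the MNIST dataloader) or a torch vector containing the full qmnist information. Default: True.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
train (bool, optional, compatibility) – When the argument 'what' is not specified, this boolean decides whether to load the training set or the testing set. Default: True.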

SBD ¶

class torchvision.datasets.SBDataset(root: str, image_set: str = 'train', mode: str = 'boundaries', download: bool = False, transforms: Union[Callable, NoneType] = None) → None[source]¶

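Semantic Boundaries Dataset

The SBD currently contains annotations from 11355 images taken from the PASCAL VOC 2011 dataset.

Note

Please note that the train and val sets included with this dataset are different from the splits in the PASCAL VOC dataset. In particular, some "train" images might be part of VOC2012 val. If you are interested in testing on VOC 2012 val, use image_set='train_noval', which excludes all val images.

Warning

This class needs scipy to load target files from .mat format.

Parameters:

root (string) – Root directory of the Semantic Boundaries Dataset
image_set (string, optional) – Select the image_set to use: train, val or train_noval. Image set train_noval excludes VOC 2012 val images.
mode (string, optional) – Select the target type. Possible values are 'boundaries' or 'segmentation'. In case of 'boundaries', the target is an array of shape [num_classes, H, W], where num_classes=20.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entry and returns a transformed version. The input sample is a PIL image and the target is a numpy array if mode='boundaries' or a PIL image if mode='segmentation'.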

SBU ¶

class torchvision.datasets.SBU(root: str, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = True) → None[source]¶

SBU Captioned Photo Dataset.

Parameters:

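root (string) – Root directory of the dataset where the tarball SBUCaptionedPhotoDataset.tar.gz exists.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.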

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is a caption for the photo.
Return type:	tuple

STL10 ¶

class torchvision.datasets.STL10(root: str, split: str = 'train', folds: Union[int, NoneType] = None, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

STL10 Dataset.

Parameters:

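root (string) – Root directory of the dataset where the directory stl10_binary exists.
split (string) – One of {'train', 'test', 'unlabeled', 'train+unlabeled'}. The dataset split is selected accordingly.
folds (int, optional) – One of {0-9} or None. For training, loads one of the 10 pre-defined folds of 1k samples for the standard evaluation procedure. If no value is passed, loads the 5k samples.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.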

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is the index of the target class.
Return type:	tuple

SVHN ¶

class torchvision.datasets.SVHN(root: str, split: str = 'train', transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

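SVHN Dataset. Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, the label 0 is assigned to the digit 0 to be compatible with PyTorch loss functions, which expect the class labels to be in the range [0, C-1].

Warning

This class needs scipy to load data from .mat format.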

Parameters:

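root (string) – Root directory of the dataset where the directory SVHN exists.
split (string) – One of {'train', 'test', 'extra'}. The dataset split is selected accordingly. 'extra' is the extra training set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.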

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is the index of the target class.
Return type:	tuple
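The label convention described for SVHN (raw label 10 for the digit 0, remapped to 0) can be pictured with a tiny sketch. `remap_svhn_labels` is a hypothetical helper written for this illustration and operates on a plain list rather than on the arrays loaded from the .mat files.

```python
from typing import List


def remap_svhn_labels(raw_labels: List[int]) -> List[int]:
    """Map the raw label 10 (digit 0) to 0; labels 1-9 are unchanged."""
    return [0 if label == 10 else label for label in raw_labels]


raw = [10, 1, 9, 10, 5]
labels = remap_svhn_labels(raw)
print(labels)  # [0, 1, 9, 0, 5]

# After remapping, every label lies in [0, C-1] with C = 10 classes.
print(all(0 <= label <= 9 for label in labels))  # True
```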

UCF101 ¶

class torchvision.datasets.UCF101(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)[source]¶

UCF101 dataset.

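UCF101 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements come from video 1, and the next three elements from video 2. Note that clips which do not have exactly frames_per_clip elements are dropped, so not all frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.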

Parameters:

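root (string) – Root directory of the UCF101 Dataset.
annotation_path (str) – Path to the folder containing the split files.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int, optional) – Number of frames between each clip.
fold (int, optional) – Which fold to use. Should be between 1 and 3.
train (bool, optional) – If True, creates the dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Returns:

video (Tensor[T, H, W, C]) – the T video frames
audio (Tensor[K, L]) – the audio frames, where K is the number of channels and L is the number of points
label (int) – class of the video clip

Return type:

tuple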

USPS ¶

class torchvision.datasets.USPS(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None[source]¶

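USPS Dataset. The data format is: [label [index:value ]*256 \n] * num_lines, where label lies in [1, 10]. The value for each pixel lies in [-1, 1]. Here the label is transformed into [0, 9] and the pixel values into [0, 255].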

Parameters:

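root (string) – Root directory of the dataset in which the USPS data files are stored.
train (bool, optional) – If True, creates the dataset from usps.bz2, otherwise from usps.t.bz2.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.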

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is the index of the target class.
Return type:	tuple
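The USPS line format and the two conversions described in the class summary (label from [1, 10] to [0, 9], pixel values from [-1, 1] to [0, 255]) can be sketched as follows. `parse_usps_line` is a hypothetical helper, the input line is shortened to 3 pixels instead of 256, and truncating to an integer after scaling is an assumption of this sketch.

```python
from typing import List, Tuple


def parse_usps_line(line: str) -> Tuple[int, List[int]]:
    """Parse one 'label index:value ...' line into (label in [0, 9], pixels in [0, 255])."""
    fields = line.split()
    label = int(fields[0]) - 1  # [1, 10] -> [0, 9]
    values = [float(field.split(":")[1]) for field in fields[1:]]
    pixels = [int((value + 1) / 2 * 255) for value in values]  # [-1, 1] -> [0, 255]
    return label, pixels


# Shortened hypothetical line with 3 pixels instead of 256.
label, pixels = parse_usps_line("7 1:-1.0 2:0.0 3:1.0")
print(label, pixels)  # 6 [0, 127, 255]
```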

VOC ¶

class torchvision.datasets.VOCSegmentation(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None)[source]¶

Pascal VOC Segmentation Dataset.

Parameters:

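root (string) – Root directory of the VOC Dataset.
year (string, optional) – The dataset year, supports years 2007 to 2012.
image_set (string, optional) – Select the image_set to use: train, trainval or val
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entry and returns a transformed version.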

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is the image segmentation.
Return type:	tuple

class torchvision.datasets.VOCDetection(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, transforms: Union[Callable, NoneType] = None)[source]¶

Pascal VOC Detection Dataset.

Parameters:

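root (string) – Root directory of the VOC Dataset.
year (string, optional) – The dataset year, supports years 2007 to 2012.
image_set (string, optional) – Select the image_set to use: train, trainval or val
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
transforms (callable, optional) – A function/transform that takes the input sample and its target as entry and returns a transformed version.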

__getitem__(index: int) → Tuple[Any, Any][source]¶

Parameters:	index (int) – Index
Returns:	(image, target) where target is a dictionary of the XML tree.
Return type:	tuple
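To give a feel for what "a dictionary of the XML tree" can look like, the sketch below converts a small annotation into nested dictionaries with the standard library. `xml_to_dict` and the sample XML are hypothetical and simplified; the dictionary produced by VOCDetection itself may differ in details such as how repeated tags are grouped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Any, Dict


def xml_to_dict(node: ET.Element) -> Dict[str, Any]:
    """Recursively convert an XML element into nested dictionaries."""
    children = list(node)
    if not children:
        return {node.tag: (node.text or "").strip()}
    grouped = defaultdict(list)
    for child in children:
        for tag, value in xml_to_dict(child).items():
            grouped[tag].append(value)
    # Repeated tags (for example several objects) become a list; single tags stay scalar.
    return {node.tag: {tag: vals[0] if len(vals) == 1 else vals for tag, vals in grouped.items()}}


annotation_xml = """
<annotation>
  <filename>example.jpg</filename>
  <object><name>dog</name><bndbox><xmin>48</xmin><ymin>240</ymin></bndbox></object>
  <object><name>person</name><bndbox><xmin>8</xmin><ymin>12</ymin></bndbox></object>
</annotation>
"""

target = xml_to_dict(ET.fromstring(annotation_xml.strip()))
names = [obj["name"] for obj in target["annotation"]["object"]]
print(target["annotation"]["filename"], names)
```

Note that all leaf values stay strings in this sketch, so coordinates such as xmin would need an explicit int conversion before use.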
