torchvision.ops

torchvision.ops implements operators that are specific to computer vision.

Note

All operators have native support for TorchScript.

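torchvision.ops.batched_nms(boxes: torch.Tensor, scores: torch.Tensor, idxs: torch.Tensor, iou_threshold: float) → torch.Tensor

Performs non-maximum suppression in a batched fashion.

Each index value corresponds to a category, and NMS is not applied between elements of different categories.

Parameters

boxes (Tensor[N, 4]) – boxes on which NMS will be performed. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
scores (Tensor[N]) – scores for each one of the boxes
idxs (Tensor[N]) – indices of the categories for each one of the boxes.
iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor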
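
The per-category behaviour described above can be illustrated with a small pure-Python sketch. This is not torchvision's implementation: the helper names `iou` and `batched_nms_sketch` are hypothetical, boxes are plain tuples rather than tensors, and the sketch simply runs greedy NMS inside each category and then merges the survivors by score.

```python
from collections import defaultdict

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def batched_nms_sketch(boxes, scores, idxs, iou_threshold):
    # Group box indices by category so that suppression never crosses categories.
    by_category = defaultdict(list)
    for i, category in enumerate(idxs):
        by_category[category].append(i)

    keep = []
    for members in by_category.values():
        kept = []
        # Greedy NMS inside one category, highest score first.
        for i in sorted(members, key=lambda i: scores[i], reverse=True):
            if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
                kept.append(i)
        keep.extend(kept)
    # Kept indices are returned in decreasing order of score.
    return sorted(keep, key=lambda i: scores[i], reverse=True)

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7]
idxs = [0, 0, 1]
print(batched_nms_sketch(boxes, scores, idxs, iou_threshold=0.5))  # [0, 2]
```

Box 2 is identical to box 0 but survives because it belongs to a different category; box 1 is suppressed by box 0 within category 0.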

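torchvision.ops.box_area(boxes: torch.Tensor) → torch.Tensor

Computes the area of a set of bounding boxes, which are specified by their (x1, y1, x2, y2) coordinates.

Parameters: boxes (Tensor[N, 4]) – boxes for which the area will be computed. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
Returns: the area for each box
Return type: Tensor[N]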
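
As a rough illustration of the computation, the area of each box is its width times its height. The function name `box_area_sketch` is hypothetical and the sketch works on plain tuples, not tensors.

```python
def box_area_sketch(boxes):
    """Area of each (x1, y1, x2, y2) box: width times height."""
    return [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]

areas = box_area_sketch([(0, 0, 10, 5), (2, 3, 4, 7)])
print(areas)  # [50, 8]
```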

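torchvision.ops.box_convert(boxes: torch.Tensor, in_fmt: str, out_fmt: str) → torch.Tensor

Converts boxes from the given in_fmt to out_fmt. Supported values for in_fmt and out_fmt are:

'xyxy': boxes are represented via corners, x1, y1 being the top left and x2, y2 being the bottom right. This is the format that torchvision utilities expect.

'xywh': boxes are represented via a corner, width and height, x1, y1 being the top left and w, h being the width and height.

'cxcywh': boxes are represented via the centre, width and height, cx, cy being the centre of the box and w, h being the width and height.

Parameters

boxes (Tensor[N, 4]) – boxes which will be converted.
in_fmt (str) – input format of the given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
out_fmt (str) – output format of the given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].

Returns

Boxes converted into the requested format.

Return type

Tensor[N, 4]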
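
The three formats can be related with a short pure-Python sketch that converts through 'xyxy' as an intermediate form. The helper names are hypothetical, the sketch handles one box at a time, and it is not torchvision's implementation.

```python
def to_xyxy(box, fmt):
    """Convert one box from `fmt` to (x1, y1, x2, y2)."""
    a, b, c, d = box
    if fmt == "xyxy":
        return (a, b, c, d)
    if fmt == "xywh":      # top-left corner plus width and height
        return (a, b, a + c, b + d)
    if fmt == "cxcywh":    # centre plus width and height
        return (a - c / 2, b - d / 2, a + c / 2, b + d / 2)
    raise ValueError(f"unsupported format: {fmt}")

def from_xyxy(box, fmt):
    """Convert one (x1, y1, x2, y2) box to `fmt`."""
    x1, y1, x2, y2 = box
    if fmt == "xyxy":
        return (x1, y1, x2, y2)
    if fmt == "xywh":
        return (x1, y1, x2 - x1, y2 - y1)
    if fmt == "cxcywh":
        return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
    raise ValueError(f"unsupported format: {fmt}")

def box_convert_sketch(box, in_fmt, out_fmt):
    return from_xyxy(to_xyxy(box, in_fmt), out_fmt)

print(box_convert_sketch((10, 20, 30, 60), "xyxy", "xywh"))    # (10, 20, 20, 40)
print(box_convert_sketch((10, 20, 30, 60), "xyxy", "cxcywh"))  # (20.0, 40.0, 20, 40)
```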

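torchvision.ops.box_iou(boxes1: torch.Tensor, boxes2: torch.Tensor) → torch.Tensor

Returns the intersection-over-union (Jaccard index) between two sets of boxes.

Both sets of boxes are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.

Parameters

boxes1 (Tensor[N, 4]) – first set of boxes
boxes2 (Tensor[M, 4]) – second set of boxes

Returns

the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2

Return type

Tensor[N, M]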
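
To make the NxM layout concrete, here is a minimal pure-Python sketch of the pairwise computation. The name `box_iou_sketch` is hypothetical and nested lists stand in for tensors.

```python
def box_iou_sketch(boxes1, boxes2):
    """Pairwise IoU: result[i][j] is the IoU of boxes1[i] and boxes2[j]."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            # Width and height of the overlap, clamped at zero for disjoint boxes.
            inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
            inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = inter_w * inter_h
            row.append(inter / (area(a) + area(b) - inter))
        result.append(row)
    return result

iou_matrix = box_iou_sketch([(0, 0, 2, 2)], [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)])
print(iou_matrix)  # one row, three columns: 1.0, 1/7, 0.0
```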

torchvision.ops.clip_boxes_to_image(boxes: torch.Tensor, size: Tuple[int, int]) → torch.Tensor

Clips boxes so that they lie inside an image of size size.

Parameters

boxes (Tensor[N, 4]) – boxes in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
size (Tuple[height, width]) – size of the image

Returns

clipped boxes

Return type

Tensor[N, 4]
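
Note that size is given as (height, width), while box coordinates are ordered x first. A minimal sketch of the clamping, assuming x coordinates are limited to [0, width] and y coordinates to [0, height] (the name `clip_boxes_sketch` is hypothetical):

```python
def clip_boxes_sketch(boxes, size):
    """Clamp (x1, y1, x2, y2) boxes to an image of size (height, width)."""
    height, width = size

    def clamp(value, low, high):
        return max(low, min(value, high))

    return [
        (clamp(x1, 0, width), clamp(y1, 0, height),
         clamp(x2, 0, width), clamp(y2, 0, height))
        for x1, y1, x2, y2 in boxes
    ]

clipped = clip_boxes_sketch([(-5, 10, 120, 90)], size=(80, 100))
print(clipped)  # [(0, 10, 100, 80)]
```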

torchvision.ops.deform_conv2d(input: torch.Tensor, offset: torch.Tensor, weight: torch.Tensor, bias: Optional[torch.Tensor] = None, stride: Tuple[int, int] = (1, 1), padding: Tuple[int, int] = (0, 0), dilation: Tuple[int, int] = (1, 1), mask: Optional[torch.Tensor] = None) → torch.Tensor

Performs Deformable Convolution v2, described in Deformable ConvNets v2: More Deformable, Better Results, if mask is not None, and performs Deformable Convolution, described in Deformable Convolutional Networks, if mask is None.

Parameters

input (Tensor[batch_size, in_channels, in_height, in_width]) – input tensor
offset (Tensor[batch_size, 2 * offset_groups * kernel_height * kernel_width, out_height, out_width]) – offsets to be applied for each position in the convolution kernel.
weight (Tensor[out_channels, in_channels // groups, kernel_height, kernel_width]) – convolution weights, split into groups of size (in_channels // groups)
bias (Tensor[out_channels]) – optional bias of shape (out_channels,). Default: None
stride (int or Tuple[int, int]) – distance between convolution centers. Default: 1
padding (int or Tuple[int, int]) – height/width of the zero padding around each image. Default: 0
dilation (int or Tuple[int, int]) – the spacing between kernel elements. Default: 1
mask (Tensor[batch_size, offset_groups * kernel_height * kernel_width, out_height, out_width]) – masks to be applied for each position in the convolution kernel. Default: None

Returns

result of the convolution

Return type

Tensor[batch_sz, out_channels, out_h, out_w]

Examples::

>>> import torch
>>> from torchvision.ops import deform_conv2d
>>> input = torch.rand(4, 3, 10, 10)
>>> kh, kw = 3, 3
>>> weight = torch.rand(5, 3, kh, kw)
>>> # offset and mask should have the same spatial size as the output
>>> # of the convolution. In this case, for an input of 10, stride of 1
>>> # and kernel size of 3, without padding, the output size is 8
>>> offset = torch.rand(4, 2 * kh * kw, 8, 8)
>>> mask = torch.rand(4, kh * kw, 8, 8)
>>> out = deform_conv2d(input, offset, weight, mask=mask)
>>> print(out.shape)
torch.Size([4, 5, 8, 8])
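
The spatial size of 8 used for offset and mask in the example above follows from the standard convolution output-size formula. The sketch below computes the expected offset and mask shapes; the helper names `conv_out_size` and `deform_shapes` are hypothetical, and the same stride, padding and dilation are assumed for both spatial dimensions.

```python
def conv_out_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def deform_shapes(batch, in_h, in_w, kh, kw, offset_groups=1, **conv_kwargs):
    """Expected shapes of the offset and mask tensors for deform_conv2d."""
    out_h = conv_out_size(in_h, kh, **conv_kwargs)
    out_w = conv_out_size(in_w, kw, **conv_kwargs)
    # Two offset values (one per spatial axis) for each kernel position.
    offset_shape = (batch, 2 * offset_groups * kh * kw, out_h, out_w)
    # One mask value for each kernel position.
    mask_shape = (batch, offset_groups * kh * kw, out_h, out_w)
    return offset_shape, mask_shape

print(deform_shapes(4, 10, 10, 3, 3))  # ((4, 18, 8, 8), (4, 9, 8, 8))
```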

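torchvision.ops.generalized_box_iou(boxes1: torch.Tensor, boxes2: torch.Tensor) → torch.Tensor

Returns the generalized intersection-over-union (Jaccard index) between two sets of boxes.

Both sets of boxes are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.

Parameters

boxes1 (Tensor[N, 4]) – first set of boxes
boxes2 (Tensor[M, 4]) – second set of boxes

Returns

the NxM matrix containing the pairwise generalized IoU values for every element in boxes1 and boxes2

Return type

Tensor[N, M]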
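
Generalized IoU subtracts from the plain IoU the fraction of the smallest enclosing box that is not covered by the union, so disjoint boxes receive negative values. A minimal pure-Python sketch of that definition (the name `generalized_box_iou_sketch` is hypothetical; nested lists stand in for tensors):

```python
def generalized_box_iou_sketch(boxes1, boxes2):
    """Pairwise GIoU = IoU - (enclosing_area - union) / enclosing_area."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
            inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = inter_w * inter_h
            union = area(a) + area(b) - inter
            # Smallest axis-aligned box that encloses both a and b.
            enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
            row.append(inter / union - (enclose - union) / enclose)
        result.append(row)
    return result

giou = generalized_box_iou_sketch([(0, 0, 1, 1)], [(0, 0, 1, 1), (2, 0, 3, 1)])
print(giou)  # identical boxes give 1.0; the disjoint pair gives -1/3
```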

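torchvision.ops.masks_to_boxes(masks: torch.Tensor) → torch.Tensor

Computes the bounding boxes around the provided masks.

Returns an [N, 4] tensor containing bounding boxes. The boxes are in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.

Parameters: masks (Tensor[N, H, W]) – masks to transform, where N is the number of masks and (H, W) are the spatial dimensions.
Returns: bounding boxes
Return type: Tensor[N, 4]

Examples using masks_to_boxes:

Repurposing masks into bounding boxes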
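
A minimal sketch of the idea for a single mask, assuming the box spans the minimum and maximum coordinates of the nonzero pixels. The name `mask_to_box_sketch` is hypothetical and a nested list stands in for one H x W mask.

```python
def mask_to_box_sketch(mask):
    """Bounding box (x1, y1, x2, y2) around the nonzero pixels of one H x W mask."""
    ys = [y for y, row in enumerate(mask) for value in row if value]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return (min(xs), min(ys), max(xs), max(ys))

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box_sketch(mask))  # (1, 1, 3, 2)
```

The sketch raises ValueError for an all-zero mask, because min() and max() have nothing to operate on.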

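torchvision.ops.nms(boxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float) → torch.Tensor

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower-scoring boxes which have an IoU greater than iou_threshold with another (higher-scoring) box.

If multiple boxes have the exact same score and satisfy the IoU criterion with respect to a reference box, the selected box is not guaranteed to be the same between CPU and GPU. This is similar to the behavior of argsort in PyTorch when repeated values are present.

Parameters

boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
scores (Tensor[N]) – scores for each one of the boxes
iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor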
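
The iterative removal described above corresponds to the usual greedy procedure: visit boxes from highest to lowest score and keep a box only if its IoU with every box kept so far does not exceed the threshold. A pure-Python sketch (hypothetical names, plain tuples instead of tensors, not torchvision's implementation):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def nms_sketch(boxes, scores, iou_threshold):
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

nms_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
nms_scores = [0.8, 0.9, 0.7]
print(nms_sketch(nms_boxes, nms_scores, iou_threshold=0.5))  # [1, 2]
```

Boxes 0 and 1 overlap with an IoU of 81/119 (about 0.68), so the lower-scoring box 0 is removed at a threshold of 0.5 but would be kept at 0.7.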

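torchvision.ops.ps_roi_align(input: torch.Tensor, boxes: torch.Tensor, output_size: int, spatial_scale: float = 1.0, sampling_ratio: int = -1) → torch.Tensor

Performs the Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN.

Parameters

input (Tensor[N, C, H, W]) – the input tensor, i.e. a batch with N elements. Each element contains C feature maps of dimensions H x W.
boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2. If a single Tensor is passed, then the first column should contain the index of the corresponding element in the batch, i.e. a number in [0, N - 1]. If a list of Tensors is passed, then each Tensor will correspond to the boxes for element i in the batch.
output_size (int or Tuple[int, int]) – the size of the output (in bins or pixels) after the pooling is performed, as (height, width).
spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0
sampling_ratio (int) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default: -1

Returns

The pooled RoIs

Return type

Tensor[K, C / (output_size[0] * output_size[1]), output_size[0], output_size[1]]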
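
The return type above implies that the input channels are divided among the output bins. The sketch below only computes the expected output shape; the name `ps_roi_output_shape` is hypothetical, and the divisibility check is an assumption made for the sketch rather than something stated in this reference.

```python
def ps_roi_output_shape(num_boxes, channels, output_size):
    """Expected output shape [K, C / (oh * ow), oh, ow] of position-sensitive pooling."""
    if isinstance(output_size, int):
        out_h, out_w = output_size, output_size
    else:
        out_h, out_w = output_size
    bins = out_h * out_w
    # Assumption: each bin needs its own equally sized slice of the channels.
    if channels % bins != 0:
        raise ValueError("channels must be divisible by output_size[0] * output_size[1]")
    return (num_boxes, channels // bins, out_h, out_w)

print(ps_roi_output_shape(6, 98, 7))        # (6, 2, 7, 7)
print(ps_roi_output_shape(3, 12, (2, 3)))   # (3, 2, 2, 3)
```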

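torchvision.ops.ps_roi_pool(input: torch.Tensor, boxes: torch.Tensor, output_size: int, spatial_scale: float = 1.0) → torch.Tensor

Performs the Position-Sensitive Region of Interest (RoI) Pool operator described in R-FCN.

Parameters

input (Tensor[N, C, H, W]) – the input tensor, i.e. a batch with N elements. Each element contains C feature maps of dimensions H x W.
boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2. If a single Tensor is passed, then the first column should contain the index of the corresponding element in the batch, i.e. a number in [0, N - 1]. If a list of Tensors is passed, then each Tensor will correspond to the boxes for element i in the batch.
output_size (int or Tuple[int, int]) – the size of the output (in bins or pixels) after the pooling is performed, as (height, width).
spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0

Returns

The pooled RoIs.

Return type

Tensor[K, C / (output_size[0] * output_size[1]), output_size[0], output_size[1]]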

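torchvision.ops.remove_small_boxes(boxes: torch.Tensor, min_size: float) → torch.Tensor

Removes boxes which have at least one side smaller than min_size.

Parameters

boxes (Tensor[N, 4]) – boxes in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
min_size (float) – minimum size

Returns

indices of the boxes that have both sides larger than min_size

Return type

Tensor[K]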
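
Note that the function returns indices, not the filtered boxes themselves. A minimal sketch of the filter, assuming a side exactly equal to min_size is kept (that boundary choice is an assumption; the name `remove_small_boxes_sketch` is hypothetical):

```python
def remove_small_boxes_sketch(boxes, min_size):
    """Indices of (x1, y1, x2, y2) boxes whose width and height are both at least min_size."""
    return [
        i for i, (x1, y1, x2, y2) in enumerate(boxes)
        if (x2 - x1) >= min_size and (y2 - y1) >= min_size
    ]

small_boxes = [(0, 0, 10, 10), (0, 0, 2, 10), (5, 5, 9, 6)]
kept = remove_small_boxes_sketch(small_boxes, min_size=3)
print(kept)                              # [0]
print([small_boxes[i] for i in kept])    # [(0, 0, 10, 10)]
```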

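torchvision.ops.roi_align(input: torch.Tensor, boxes: Union[torch.Tensor, List[torch.Tensor]], output_size: None, spatial_scale: float = 1.0, sampling_ratio: int = -1, aligned: bool = False) → torch.Tensor

Performs the Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

Parameters

input (Tensor[N, C, H, W]) – the input tensor, i.e. a batch with N elements. Each element contains C feature maps of dimensions H x W. If the tensor is quantized, we expect a batch size of N == 1.
boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2. If a single Tensor is passed, then the first column should contain the index of the corresponding element in the batch, i.e. a number in [0, N - 1]. If a list of Tensors is passed, then each Tensor will correspond to the boxes for element i in the batch.
output_size (int or Tuple[int, int]) – the size of the output (in bins or pixels) after the pooling is performed, as (height, width).
spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0
sampling_ratio (int) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default: -1
aligned (bool) – If False, use the legacy implementation. If True, shift the box coordinates by -0.5 pixels for a better alignment with the two neighboring pixel indices. This version is used in Detectron2.

Returns

The pooled RoIs.

Return type

Tensor[K, C, output_size[0], output_size[1]]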
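
The sampling_ratio rule can be written out as a small helper. This sketch only reproduces the counting rule stated in the parameter description; the name `sampling_grid` and its return order (points along height, points along width) are hypothetical choices.

```python
import math

def sampling_grid(roi_width, roi_height, output_size, sampling_ratio=-1):
    """Number of sampling points per output bin along (height, width)."""
    out_h, out_w = output_size
    if sampling_ratio > 0:
        # Fixed grid: sampling_ratio x sampling_ratio points per bin.
        return (sampling_ratio, sampling_ratio)
    # Adaptive grid: ceil(roi_size / output_size) along each axis.
    return (math.ceil(roi_height / out_h), math.ceil(roi_width / out_w))

print(sampling_grid(14, 21, (7, 7)))                    # (3, 2)
print(sampling_grid(14, 21, (7, 7), sampling_ratio=2))  # (2, 2)
```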

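torchvision.ops.roi_pool(input: torch.Tensor, boxes: Union[torch.Tensor, List[torch.Tensor]], output_size: None, spatial_scale: float = 1.0) → torch.Tensor

Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

Parameters

input (Tensor[N, C, H, W]) – the input tensor, i.e. a batch with N elements. Each element contains C feature maps of dimensions H x W.
boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2. If a single Tensor is passed, then the first column should contain the index of the corresponding element in the batch, i.e. a number in [0, N - 1]. If a list of Tensors is passed, then each Tensor will correspond to the boxes for element i in the batch.
output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)
spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0

Returns

The pooled RoIs.

Return type

Tensor[K, C, output_size[0], output_size[1]]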
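
The two accepted layouts for boxes carry the same information. The sketch below shows how a per-image list of boxes corresponds to the single K x 5 layout, where the first column is the batch index. The name `boxes_list_to_k5` is hypothetical and nested lists of tuples stand in for tensors.

```python
def boxes_list_to_k5(boxes_per_image):
    """Flatten a per-image list of (x1, y1, x2, y2) boxes into (batch_index, x1, y1, x2, y2) rows."""
    rows = []
    for batch_index, image_boxes in enumerate(boxes_per_image):
        for x1, y1, x2, y2 in image_boxes:
            rows.append((batch_index, x1, y1, x2, y2))
    return rows

per_image = [
    [(0, 0, 4, 4)],                  # boxes for batch element 0
    [(1, 1, 3, 3), (2, 2, 6, 6)],    # boxes for batch element 1
]
print(boxes_list_to_k5(per_image))
# [(0, 0, 0, 4, 4), (1, 1, 1, 3, 3), (1, 2, 2, 6, 6)]
```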

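torchvision.ops.sigmoid_focal_loss(inputs: torch.Tensor, targets: torch.Tensor, alpha: float = 0.25, gamma: float = 2, reduction: str = 'none')

Original implementation from https://github.com/facebookresearch/fvcore/blob/master/fvcore/nn/focal_loss.py. Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.

Parameters

inputs – A float tensor of arbitrary shape. The predictions for each example.
targets – A float tensor with the same shape as inputs. Stores the binary classification label for each element in inputs (0 for the negative class and 1 for the positive class).
alpha – (optional) Weighting factor in range (0, 1) to balance positive vs negative examples, or -1 to ignore. Default = 0.25
gamma – Exponent of the modulating factor (1 - p_t) to balance easy vs hard examples.
reduction – 'none' | 'mean' | 'sum'. 'none': no reduction will be applied to the output. 'mean': the output will be averaged. 'sum': the output will be summed.

Returns

Loss tensor.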
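
The roles of alpha and gamma are easier to see in a scalar sketch of the focal loss from the RetinaNet paper. This is a plain-Python illustration over lists of floats, not the fvcore or torchvision code; the name `sigmoid_focal_loss_sketch` is hypothetical, and the naive log calls are not numerically stable for inputs of very large magnitude.

```python
import math

def sigmoid_focal_loss_sketch(inputs, targets, alpha=0.25, gamma=2.0, reduction="none"):
    losses = []
    for x, t in zip(inputs, targets):
        p = 1.0 / (1.0 + math.exp(-x))                        # sigmoid of the raw prediction
        ce = -(t * math.log(p) + (1 - t) * math.log(1 - p))   # binary cross-entropy
        p_t = p * t + (1 - p) * (1 - t)                       # probability assigned to the true class
        loss = ce * (1 - p_t) ** gamma                        # down-weight easy examples
        if alpha >= 0:
            loss *= alpha * t + (1 - alpha) * (1 - t)         # balance positives and negatives
        losses.append(loss)
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses

per_element = sigmoid_focal_loss_sketch([0.0, 0.0], [1.0, 0.0])
print(per_element)  # [0.0625 * ln 2, 0.1875 * ln 2]
```

With a raw prediction of 0 the predicted probability is 0.5, so both elements share the same cross-entropy (ln 2) and modulating factor (0.25), and differ only in the alpha weighting (0.25 for the positive, 0.75 for the negative).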

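torchvision.ops.stochastic_depth(input: torch.Tensor, p: float, mode: str, training: bool = True) → torch.Tensor

Implements Stochastic Depth from "Deep Networks with Stochastic Depth", used for randomly dropping residual branches of residual architectures.

Parameters

input (Tensor[N, ...]) – the input tensor of arbitrary dimensions, with the first one being its batch, i.e. a batch with N rows.
p (float) – probability of the input being zeroed.
mode (str) – "batch" or "row". "batch" randomly zeroes the entire input, "row" zeroes randomly selected rows from the batch.
training – apply stochastic depth if True. Default: True

Returns

The randomly zeroed tensor.

Return type

Tensor[N, ...]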
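
The difference between the two modes can be sketched in plain Python. The name `stochastic_depth_sketch` and the `rng` parameter are hypothetical, nested lists stand in for tensors, and the rescaling of surviving values by 1 / (1 - p) is an assumption of this sketch (it keeps the expected value unchanged) that the reference text above does not state.

```python
import random

def stochastic_depth_sketch(rows, p, mode, training=True, rng=random):
    """Randomly zero a batch ("batch" mode) or individual rows ("row" mode)."""
    if not training or p == 0.0:
        return [list(row) for row in rows]
    survival = 1.0 - p

    def draw_scale():
        keep = 1.0 if rng.random() < survival else 0.0
        # Assumption: survivors are rescaled so the expected value is preserved.
        return keep / survival if survival > 0 else keep

    if mode == "batch":
        scale = draw_scale()              # one decision for the whole batch
        return [[v * scale for v in row] for row in rows]
    if mode == "row":
        out = []
        for row in rows:
            scale = draw_scale()          # an independent decision per row
            out.append([v * scale for v in row])
        return out
    raise ValueError(f"unsupported mode: {mode}")

batch = [[1.0, 2.0] for _ in range(8)]
result = stochastic_depth_sketch(batch, p=0.5, mode="row", rng=random.Random(0))
# Every row is now either [0.0, 0.0] or [2.0, 4.0].
```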

class torchvision.ops.RoIAlign(output_size: None, spatial_scale: float, sampling_ratio: int, aligned: bool = False): See roi_align().

class torchvision.ops.PSRoIAlign(output_size: int, spatial_scale: float, sampling_ratio: int): See ps_roi_align().

class torchvision.ops.RoIPool(output_size: None, spatial_scale: float): See roi_pool().

class torchvision.ops.PSRoIPool(output_size: int, spatial_scale: float): See ps_roi_pool().

class torchvision.ops.DeformConv2d(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, bias: bool = True): See deform_conv2d().

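class torchvision.ops.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

It infers the scale of the pooling via the heuristic specified in eq. 1 of the Feature Pyramid Network paper. The keyword-only parameters canonical_scale and canonical_level correspond respectively to 224 and k0=4 in eq. 1, and have the following meaning: canonical_level is the target level of the pyramid from which to pool a region of interest with w x h = canonical_scale x canonical_scale.

Parameters

featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.
output_size (List[Tuple[int, int]] or List[int]) – output size for the pooled region
sampling_ratio (int) – sampling ratio for RoIAlign
canonical_scale (int, optional) – canonical_scale for LevelMapper
canonical_level (int, optional) – canonical_level for LevelMapper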

Examples::

>>> from collections import OrderedDict
>>> import torch, torchvision
>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
torch.Size([6, 5, 3, 3])
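
Equation 1 of the Feature Pyramid Network paper assigns a box of width w and height h to level k = floor(k0 + log2(sqrt(w * h) / 224)). The sketch below evaluates that formula with canonical_scale and canonical_level in place of 224 and k0. The name `map_level` and the clamping bounds `k_min` and `k_max` are hypothetical additions for illustration, not part of the documented class interface.

```python
import math

def map_level(w, h, canonical_scale=224, canonical_level=4, k_min=2, k_max=5):
    """Pyramid level for a w x h box, following eq. 1 of the FPN paper."""
    k = math.floor(canonical_level + math.log2(math.sqrt(w * h) / canonical_scale))
    # Assumption: clamp to the range of levels that actually exist in the pyramid.
    return max(k_min, min(k, k_max))

print(map_level(224, 224))  # 4: a canonical-size box maps to the canonical level
print(map_level(112, 112))  # 3: half the side length moves one level finer
print(map_level(448, 448))  # 5: double the side length moves one level coarser
```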

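class torchvision.ops.FeaturePyramidNetwork(in_channels_list: List[int], out_channels: int, extra_blocks: Optional[torchvision.ops.feature_pyramid_network.ExtraFPNBlock] = None)

Module that adds an FPN on top of a set of feature maps. This is based on "Feature Pyramid Networks for Object Detection".

The feature maps are currently expected to be in increasing depth order.

The input to the model is expected to be an OrderedDict[Tensor], containing the feature maps on top of which the FPN will be added.

Parameters

in_channels_list (list[int]) – number of channels for each feature map that is passed to the module
out_channels (int) – number of channels of the FPN representation
extra_blocks (ExtraFPNBlock or None) – if provided, extra operations will be performed. It is expected to take the FPN features, the original features and the names of the original features as input, and to return a new list of feature maps and their corresponding names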

Examples::

>>> from collections import OrderedDict
>>> import torch, torchvision
>>> m = torchvision.ops.FeaturePyramidNetwork([10, 20, 30], 5)
>>> # get some dummy data
>>> x = OrderedDict()
>>> x['feat0'] = torch.rand(1, 10, 64, 64)
>>> x['feat2'] = torch.rand(1, 20, 16, 16)
>>> x['feat3'] = torch.rand(1, 30, 8, 8)
>>> # compute the FPN on top of x
>>> output = m(x)
>>> print([(k, v.shape) for k, v in output.items()])
>>> # prints (wrapped here for readability):
>>> #   [('feat0', torch.Size([1, 5, 64, 64])),
>>> #    ('feat2', torch.Size([1, 5, 16, 16])),
>>> #    ('feat3', torch.Size([1, 5, 8, 8]))]

class torchvision.ops.StochasticDepth(p: float, mode: str): See stochastic_depth().
