Operators

torchvision.ops implements operators, losses and layers that are specific for Computer Vision.

Note

All operators have native support for TorchScript.

Detection and Segmentation Operators

The operators below perform the pre-processing and post-processing required in object detection and segmentation models.

batched_nms(boxes, scores, idxs, iou_threshold)

Performs non-maximum suppression in a batched fashion.

masks_to_boxes(masks)

Compute the bounding boxes around the provided masks.
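
The idea behind masks_to_boxes can be sketched in pure Python for a single boolean mask (represented here as a list of rows). This is an illustrative sketch, not the torchvision implementation, which operates on a tensor of masks:

```python
# Hedged sketch: derive an (x1, y1, x2, y2) box from one boolean mask.
def mask_to_box(mask):
    # Collect the column and row indices of all "on" pixels.
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    return (min(xs), min(ys), max(xs), max(ys))

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(mask_to_box(mask))  # (1, 1, 2, 2)
```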

nms(boxes, scores, iou_threshold)

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).
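
The greedy NMS algorithm can be sketched in pure Python; the function names here are illustrative, not the torchvision API, which operates on tensors and runs on GPU:

```python
# Minimal pure-Python sketch of greedy NMS over (x1, y1, x2, y2) boxes.
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms_sketch(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # keep the highest-scoring remaining box
        keep.append(i)
        # Suppress every remaining box that overlaps it too much.
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores, 0.5))  # [0, 2]: the second box is suppressed by the first
```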

roi_align(input, boxes, output_size[, ...])

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

roi_pool(input, boxes, output_size[, ...])

Performs Region of Interest (RoI) Pool operator described in Fast R-CNN.

ps_roi_align(input, boxes, output_size[, ...])

Performs Position-Sensitive Region of Interest (RoI) Align operator mentioned in Light-Head R-CNN.

ps_roi_pool(input, boxes, output_size[, ...])

Performs Position-Sensitive Region of Interest (RoI) Pool operator described in R-FCN.

FeaturePyramidNetwork(in_channels_list, ...)

Module that adds an FPN on top of a set of feature maps.

MultiScaleRoIAlign(featmap_names, ...[, ...])

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

RoIAlign(output_size, spatial_scale, ...[, ...])

See roi_align().

RoIPool(output_size, spatial_scale)

See roi_pool().

PSRoIAlign(output_size, spatial_scale, ...)

See ps_roi_align().

PSRoIPool(output_size, spatial_scale)

See ps_roi_pool().

Box Operators

These utility functions perform various operations on bounding boxes.

box_area(boxes)

Computes the area of a set of bounding boxes, which are specified by their (x1, y1, x2, y2) coordinates.

box_convert(boxes, in_fmt, out_fmt)

Converts boxes from given in_fmt to out_fmt.
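
The conversion logic can be sketched per-box in pure Python; torchvision.ops.box_convert works on whole tensors and also supports the 'cxcywh' format, which is omitted here:

```python
# Illustrative sketch of converting one box between 'xywh' and 'xyxy' formats.
def xywh_to_xyxy(box):
    """(x, y, width, height) -> (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) -> (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)

print(xywh_to_xyxy((10, 20, 30, 40)))  # (10, 20, 40, 60)
print(xyxy_to_xywh((10, 20, 40, 60)))  # (10, 20, 30, 40)
```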

box_iou(boxes1, boxes2)

Return intersection-over-union (Jaccard index) between two sets of boxes.

clip_boxes_to_image(boxes, size)

Clip boxes so that they lie inside an image of size size.
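
Clipping amounts to clamping each coordinate to the image bounds. A per-box sketch, assuming the (height, width) size convention; not the torchvision code:

```python
# Hedged sketch: clamp an (x1, y1, x2, y2) box to an image of size (h, w).
def clip_box(box, size):
    h, w = size
    x1, y1, x2, y2 = box

    def clamp(v, hi):
        return max(0, min(v, hi))

    # x coordinates are clamped to [0, w], y coordinates to [0, h].
    return (clamp(x1, w), clamp(y1, h), clamp(x2, w), clamp(y2, h))

print(clip_box((-5, 10, 120, 90), (80, 100)))  # (0, 10, 100, 80)
```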

complete_box_iou(boxes1, boxes2[, eps])

Return complete intersection-over-union (Jaccard index) between two sets of boxes.

distance_box_iou(boxes1, boxes2[, eps])

Return distance intersection-over-union (Jaccard index) between two sets of boxes.

generalized_box_iou(boxes1, boxes2)

Return generalized intersection-over-union (Jaccard index) between two sets of boxes.
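
Generalized IoU extends IoU with a penalty based on the smallest box enclosing both inputs, which makes it informative even for disjoint boxes. A single-pair pure-Python sketch (torchvision computes the full pairwise matrix):

```python
# Sketch of GIoU for one pair of (x1, y1, x2, y2) boxes:
# GIoU = IoU - (enclosing area - union) / enclosing area.
def generalized_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both a and b.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return inter / union - (enclose - union) / enclose

# Disjoint boxes give a negative GIoU, unlike plain IoU (which is 0).
print(generalized_iou((0, 0, 1, 1), (2, 0, 3, 1)))  # -1/3
```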

remove_small_boxes(boxes, min_size)

Remove boxes that have at least one side smaller than min_size.
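
The filtering logic is a simple size check per box. A hedged sketch returning the kept indices, mirroring (but not reproducing) the torchvision behavior:

```python
# Keep indices of boxes whose width and height are both >= min_size.
def remove_small(boxes, min_size):
    return [i for i, (x1, y1, x2, y2) in enumerate(boxes)
            if (x2 - x1) >= min_size and (y2 - y1) >= min_size]

boxes = [(0, 0, 10, 10), (0, 0, 2, 10), (0, 0, 10, 1)]
print(remove_small(boxes, 3))  # [0]: the other two are too narrow or too short
```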

Losses

The following vision-specific loss functions are implemented:

complete_box_iou_loss(boxes1, boxes2[, ...])

Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap.

distance_box_iou_loss(boxes1, boxes2[, ...])

Gradient-friendly IoU loss with an additional penalty that is non-zero when the distance between boxes' centers isn't zero.

generalized_box_iou_loss(boxes1, boxes2[, ...])

Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap and scales with the size of their smallest enclosing box.

sigmoid_focal_loss(inputs, targets[, alpha, ...])

Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.
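
The focal loss down-weights well-classified examples so training focuses on hard ones. A scalar pure-Python sketch of the standard formulation FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t); the vectorized torchvision version applies this elementwise to logits:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Scalar sketch of sigmoid focal loss for a binary target (0 or 1)."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction is strongly down-weighted by the
# (1 - p_t)**gamma factor compared with a hard, misclassified one.
easy = focal_loss(4.0, 1)   # p ≈ 0.98, near-zero loss
hard = focal_loss(-1.0, 1)  # p ≈ 0.27, much larger loss
print(easy < hard)  # True
```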

Layers

TorchVision provides commonly used building blocks as layers:

Conv2dNormActivation(in_channels, ...)

Configurable block used for Convolution2d-Normalization-Activation blocks.

Conv3dNormActivation(in_channels, ...)

Configurable block used for Convolution3d-Normalization-Activation blocks.

DeformConv2d(in_channels, out_channels, ...)

See deform_conv2d().

DropBlock2d(p, block_size[, inplace, eps])

See drop_block2d().

DropBlock3d(p, block_size[, inplace, eps])

See drop_block3d().

FrozenBatchNorm2d(num_features[, eps])

BatchNorm2d where the batch statistics and the affine parameters are fixed.
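
What "frozen" means can be shown with a per-value sketch: normalization uses fixed running statistics and affine parameters rather than statistics computed from the current batch. This is illustrative, not the module's tensor implementation:

```python
import math

def frozen_bn(x, running_mean, running_var, weight, bias, eps=1e-5):
    """Per-value frozen batch norm: y = (x - mean) / sqrt(var + eps) * w + b."""
    return (x - running_mean) / math.sqrt(running_var + eps) * weight + bias

# With mean 0, var 1, weight 1, bias 0 the input passes through almost unchanged.
print(round(frozen_bn(2.0, 0.0, 1.0, 1.0, 0.0), 4))
```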

MLP(in_channels, hidden_channels, ...)

This block implements the multi-layer perceptron (MLP) module.

Permute(dims)

This module returns a view of the tensor input with its dimensions permuted.

SqueezeExcitation(input_channels, ...)

This block implements the Squeeze-and-Excitation block from https://arxiv.org/abs/1709.01507 (see Fig. 1).

StochasticDepth(p, mode)

See stochastic_depth().

deform_conv2d(input, offset, weight[, bias, ...])

Performs Deformable Convolution v2, described in Deformable ConvNets v2: More Deformable, Better Results, if mask is not None; otherwise performs Deformable Convolution, described in Deformable Convolutional Networks.

drop_block2d(input, p, block_size[, ...])

Implements DropBlock2d from "DropBlock: A regularization method for convolutional networks" (https://arxiv.org/abs/1810.12890).

drop_block3d(input, p, block_size[, ...])

Implements DropBlock3d from "DropBlock: A regularization method for convolutional networks" (https://arxiv.org/abs/1810.12890).

stochastic_depth(input, p, mode[, training])

Implements Stochastic Depth from "Deep Networks with Stochastic Depth", used for randomly dropping residual branches in residual architectures.
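
The technique can be sketched in pure Python for a single residual branch: during training the branch is dropped with probability p, and survivors are rescaled by 1 / (1 - p) so the expected output is preserved. A hedged scalar sketch, not the torchvision tensor implementation:

```python
import random

def stochastic_depth_sketch(branch_output, p, training=True):
    """Drop a residual branch with probability p during training."""
    if not training or p == 0.0:
        return branch_output              # identity at eval time
    if random.random() < p:
        return 0.0                        # branch dropped for this sample
    return branch_output / (1.0 - p)      # rescale to preserve the expectation

random.seed(0)
outs = [stochastic_depth_sketch(1.0, 0.5) for _ in range(10000)]
# The sample mean stays near the undropped output of 1.0.
print(round(sum(outs) / len(outs), 1))
```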
