torch.nn.functional
Convolution functions
| conv1d | Applies a 1D convolution over an input signal composed of several input planes. |
| conv2d | Applies a 2D convolution over an input image composed of several input planes. |
| conv3d | Applies a 3D convolution over an input image composed of several input planes. |
| conv_transpose1d | Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution". |
| conv_transpose2d | Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". |
| conv_transpose3d | Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". |
| unfold | Extracts sliding local blocks from a batched input tensor. |
| fold | Combines an array of sliding local blocks into a large containing tensor. |
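A minimal usage sketch for the convolution and fold/unfold functions; the tensor shapes below are illustrative assumptions, not values from this page:

```python
import torch
import torch.nn.functional as F

# A batch of 4 signals, 8 input channels, length 32 (arbitrary example shapes).
x = torch.randn(4, 8, 32)
w = torch.randn(16, 8, 3)          # weight layout: (out_channels, in_channels, kernel_size)
y = F.conv1d(x, w, padding=1)      # -> (4, 16, 32); padding=1 preserves the length

# unfold/fold: extract every 3x3 patch of an image, then recombine the patches.
img = torch.randn(1, 1, 6, 6)
patches = F.unfold(img, kernel_size=3)                       # -> (1, 9, 16)
summed = F.fold(patches, output_size=(6, 6), kernel_size=3)  # -> (1, 1, 6, 6)
# Note: fold sums values at overlapping locations, so summed != img in general.
```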
Pooling functions
| avg_pool1d | Applies a 1D average pooling over an input signal composed of several input planes. |
| avg_pool2d | Applies 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps. |
| avg_pool3d | Applies 3D average-pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps. |
| max_pool1d | Applies a 1D max pooling over an input signal composed of several input planes. |
| max_pool2d | Applies a 2D max pooling over an input signal composed of several input planes. |
| max_pool3d | Applies a 3D max pooling over an input signal composed of several input planes. |
| max_unpool1d | Computes a partial inverse of MaxPool1d. |
| max_unpool2d | Computes a partial inverse of MaxPool2d. |
| max_unpool3d | Computes a partial inverse of MaxPool3d. |
| lp_pool1d | Applies a 1D power-average pooling over an input signal composed of several input planes. |
| lp_pool2d | Applies a 2D power-average pooling over an input signal composed of several input planes. |
| adaptive_max_pool1d | Applies a 1D adaptive max pooling over an input signal composed of several input planes. |
| adaptive_max_pool2d | Applies a 2D adaptive max pooling over an input signal composed of several input planes. |
| adaptive_max_pool3d | Applies a 3D adaptive max pooling over an input signal composed of several input planes. |
| adaptive_avg_pool1d | Applies a 1D adaptive average pooling over an input signal composed of several input planes. |
| adaptive_avg_pool2d | Applies a 2D adaptive average pooling over an input signal composed of several input planes. |
| adaptive_avg_pool3d | Applies a 3D adaptive average pooling over an input signal composed of several input planes. |
| fractional_max_pool2d | Applies 2D fractional max pooling over an input signal composed of several input planes. |
| fractional_max_pool3d | Applies 3D fractional max pooling over an input signal composed of several input planes. |
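A brief sketch of fixed-window pooling, its partial inverse, and adaptive pooling; shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 32, 32)

# Fixed-window max pooling with indices, then the partial inverse:
# max_unpool2d places each max back at its recorded position and zeros the rest.
pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)  # -> (2, 3, 16, 16)
restored = F.max_unpool2d(pooled, idx, kernel_size=2)              # -> (2, 3, 32, 32)

# Adaptive pooling targets an output size instead of a window size.
y = F.adaptive_avg_pool2d(x, output_size=(1, 1))                   # -> (2, 3, 1, 1)
```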
Non-linear activation functions
| threshold | Thresholds each element of the input Tensor. |
| threshold_ | In-place version of threshold(). |
| relu | Applies the rectified linear unit function element-wise. |
| relu_ | In-place version of relu(). |
| hardtanh | Applies the HardTanh function element-wise. |
| hardtanh_ | In-place version of hardtanh(). |
| hardswish | Applies the hardswish function, element-wise, as described in the paper Searching for MobileNetV3. |
| relu6 | Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0, x), 6)$. |
| elu | Applies element-wise, $\text{ELU}(x) = \max(0, x) + \min(0, \alpha (\exp(x) - 1))$. |
| elu_ | In-place version of elu(). |
| selu | Applies element-wise, $\text{SELU}(x) = scale \cdot (\max(0, x) + \min(0, \alpha (\exp(x) - 1)))$, with $\alpha \approx 1.6733$ and $scale \approx 1.0507$. |
| celu | Applies element-wise, $\text{CELU}(x) = \max(0, x) + \min(0, \alpha (\exp(x / \alpha) - 1))$. |
| leaky_relu | Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} \cdot \min(0, x)$. |
| leaky_relu_ | In-place version of leaky_relu(). |
| prelu | Applies element-wise the function $\text{PReLU}(x) = \max(0, x) + \text{weight} \cdot \min(0, x)$, where weight is a learnable parameter. |
| rrelu | Randomized leaky ReLU. |
| rrelu_ | In-place version of rrelu(). |
| glu | The gated linear unit. |
| gelu | When the approximate argument is 'none', it applies element-wise the function $\text{GELU}(x) = x \cdot \Phi(x)$. |
| logsigmoid | Applies element-wise $\text{LogSigmoid}(x) = \log \frac{1}{1 + \exp(-x)}$. |
| hardshrink | Applies the hard shrinkage function element-wise. |
| tanhshrink | Applies element-wise, $\text{Tanhshrink}(x) = x - \tanh(x)$. |
| softsign | Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$. |
| softplus | Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} \log(1 + \exp(\beta x))$. |
| softmin | Applies a softmin function. |
| softmax | Applies a softmax function. |
| softshrink | Applies the soft shrinkage function element-wise. |
| gumbel_softmax | Samples from the Gumbel-Softmax distribution and optionally discretizes. |
| log_softmax | Applies a softmax followed by a logarithm. |
| tanh | Applies element-wise, $\text{Tanh}(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$. |
| sigmoid | Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$. |
| hardsigmoid | Applies the element-wise function $\text{Hardsigmoid}(x) = \min(\max(0, x/6 + 1/2), 1)$. |
| silu | Applies the Sigmoid Linear Unit (SiLU) function, element-wise. |
| mish | Applies the Mish function, element-wise. |
| batch_norm | Applies Batch Normalization for each channel across a batch of data. |
| group_norm | Applies Group Normalization over the channel dimension, with the channels divided into groups. |
| instance_norm | Applies Instance Normalization for each channel in each data sample in a batch. |
| layer_norm | Applies Layer Normalization over the last dimensions given by normalized_shape. |
| local_response_norm | Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. |
| normalize | Performs $L_p$ normalization of inputs over the specified dimension. |
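A short sketch of a few of the activations and normalizations above; the input shape is an arbitrary example:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5)

a = F.relu(x)                    # element-wise max(0, x)
g = F.gelu(x)                    # exact Gaussian-CDF form (approximate='none' by default)
p = F.softmax(x, dim=-1)         # each row sums to 1
lp = F.log_softmax(x, dim=-1)    # numerically safer than torch.log(F.softmax(x, dim=-1))

# Layer normalization over the last dimension; normalized_shape must match it.
y = F.layer_norm(x, normalized_shape=(5,))
```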
Linear functions
| linear | Applies a linear transformation to the incoming data: $y = xA^T + b$. |
| bilinear | Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$. |
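A sketch of both linear functions (shapes are illustrative); note that linear takes the weight in (out_features, in_features) layout:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 10)
W = torch.randn(3, 10)       # (out_features, in_features)
b = torch.randn(3)
y = F.linear(x, W, b)        # computes x @ W.T + b -> (4, 3)

x1 = torch.randn(4, 5)
x2 = torch.randn(4, 6)
A = torch.randn(3, 5, 6)     # (out_features, in1_features, in2_features)
z = F.bilinear(x1, x2, A)    # -> (4, 3)
```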
Dropout functions
| dropout | During training, randomly zeroes some of the elements of the input tensor with probability p. |
| alpha_dropout | Applies alpha dropout to the input. |
| feature_alpha_dropout | Randomly masks out entire channels (a channel is a feature map, e.g. the $j$-th channel of the $i$-th sample in the batched input is a tensor $\text{input}[i, j]$) of the input tensor. |
| dropout1d | Randomly zero out entire channels (a channel is a 1D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 1D tensor $\text{input}[i, j]$) of the input tensor. |
| dropout2d | Randomly zero out entire channels (a channel is a 2D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 2D tensor $\text{input}[i, j]$) of the input tensor. |
| dropout3d | Randomly zero out entire channels (a channel is a 3D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 3D tensor $\text{input}[i, j]$) of the input tensor. |
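A sketch contrasting element-wise and channel-wise dropout (shapes illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)

# Element-wise dropout: surviving values are scaled by 1/(1-p) so the
# expected value of the output matches the input.
y = F.dropout(x, p=0.5, training=True)

# Channel-wise dropout: whole (8, 8) feature maps are zeroed together.
y2 = F.dropout2d(x, p=0.5, training=True)
```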
Sparse functions
| embedding | A simple lookup table that looks up embeddings in a fixed dictionary and size. |
| embedding_bag | Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings. |
| one_hot | Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1. |
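A sketch of the three sparse functions; the vocabulary size and embedding dimension are arbitrary assumptions:

```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 4)              # 10-entry vocabulary, 4-dim embeddings
idx = torch.tensor([[1, 2], [7, 9]])
vecs = F.embedding(idx, weight)          # -> (2, 2, 4)

# embedding_bag: offsets split the flat index list into bags [1, 2] and [7, 9],
# and each bag is reduced (here: mean) without materializing the lookups.
bags = F.embedding_bag(torch.tensor([1, 2, 7, 9]), weight,
                       offsets=torch.tensor([0, 2]), mode='mean')  # -> (2, 4)

oh = F.one_hot(torch.tensor([0, 2, 1]), num_classes=3)             # -> (3, 3)
```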
Distance functions
| pairwise_distance | See torch.nn.PairwiseDistance for details. |
| cosine_similarity | Returns cosine similarity between x1 and x2, computed along dim. |
| pdist | Computes the p-norm distance between every pair of row vectors in the input. |
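A sketch of the distance functions on a small batch of row vectors (shapes illustrative):

```python
import torch
import torch.nn.functional as F

a = torch.randn(5, 8)
b = torch.randn(5, 8)

sim = F.cosine_similarity(a, b, dim=1)   # -> (5,), values in [-1, 1]
dist = F.pairwise_distance(a, b, p=2)    # -> (5,), row-wise Euclidean distance
cond = F.pdist(a)                        # all 5*4/2 = 10 pairwise distances within a
```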
Loss functions
| binary_cross_entropy | Function that measures the Binary Cross Entropy between the target and input probabilities. |
| binary_cross_entropy_with_logits | Function that measures Binary Cross Entropy between target and input logits. |
| poisson_nll_loss | Poisson negative log likelihood loss. |
| cosine_embedding_loss | See CosineEmbeddingLoss for details. |
| cross_entropy | This criterion computes the cross entropy loss between input logits and target. |
| ctc_loss | The Connectionist Temporal Classification loss. |
| gaussian_nll_loss | Gaussian negative log likelihood loss. |
| hinge_embedding_loss | See HingeEmbeddingLoss for details. |
| l1_loss | Function that takes the mean element-wise absolute value difference. |
| mse_loss | Measures the element-wise mean squared error. |
| margin_ranking_loss | See MarginRankingLoss for details. |
| multilabel_margin_loss | See MultiLabelMarginLoss for details. |
| multilabel_soft_margin_loss | See MultiLabelSoftMarginLoss for details. |
| multi_margin_loss | See MultiMarginLoss for details. |
| nll_loss | The negative log likelihood loss. |
| huber_loss | Function that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise. |
| smooth_l1_loss | Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. |
| soft_margin_loss | See SoftMarginLoss for details. |
| triplet_margin_loss | See TripletMarginLoss for details. |
| triplet_margin_with_distance_loss | See TripletMarginWithDistanceLoss for details. |
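A sketch of common loss calls with illustrative shapes; note that cross_entropy and the *_with_logits variants take raw logits, not probabilities:

```python
import torch
import torch.nn.functional as F

# Multi-class classification: 4 samples, 3 classes.
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])
ce = F.cross_entropy(logits, target)

# Binary case: the *_with_logits variant fuses the sigmoid for numerical stability.
pred = torch.randn(4)
lbl = torch.randint(0, 2, (4,)).float()
bce = F.binary_cross_entropy_with_logits(pred, lbl)

# Regression: element-wise mean squared error.
mse = F.mse_loss(torch.randn(4, 3), torch.randn(4, 3))
```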
Vision functions
| pixel_shuffle | Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is the upscale_factor. |
| pixel_unshuffle | Reverses the PixelShuffle operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where r is the downscale_factor. |
| pad | Pads tensor. |
| interpolate | Down/up samples the input to either the given size or the given scale_factor. |
| upsample | Upsamples the input to either the given size or the given scale_factor. |
| upsample_nearest | Upsamples the input, using nearest neighbours' pixel values. |
| upsample_bilinear | Upsamples the input, using bilinear upsampling. |
| grid_sample | Given an input and a flow-field grid, computes the output using input values and pixel locations from grid. |
| affine_grid | Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta. |
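A sketch of resampling and warping with the vision functions; the shapes and the identity affine matrix are illustrative choices:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)

up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
sh = F.pixel_shuffle(x, upscale_factor=2)   # (1, 4, 8, 8) -> (1, 1, 16, 16)

# Identity 2D affine transform: grid_sample should reproduce x
# (up to interpolation error).
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, size=(1, 4, 8, 8), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)
```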