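Extending PyTorch

In this note we'll cover ways of extending torch.nn, torch.autograd, torch, and writing custom C++ extensions.

Adding new operators

PyTorch offers a large library of operators that work on Tensors (e.g. torch.add(), torch.sum(), etc.). However, you may wish to bring a new custom operation to PyTorch and have it behave like PyTorch's built-in operators. In order to do so, you must register the custom operation with PyTorch via the Python torch.library or C++ TORCH_LIBRARY APIs.

Please see the PyTorch Custom Operators Landing Page for more details.

Extending `torch.autograd`

Adding operations to autograd requires implementing a new Function subclass for each operation. Recall that Functions are what autograd uses to encode the operation history and compute gradients.

The first part of this doc focuses on backward mode AD, as it is the most widely used feature. A section at the end discusses the extensions for forward mode AD.

When to use

In general, implement a custom function if you want to perform computations in your model that are not differentiable or rely on non-PyTorch libraries (e.g. NumPy), but still wish for your operation to chain with other ops and work with the autograd engine.

In some situations, custom functions can also be used to improve performance and memory usage: if you implemented your forward and backward passes using a C++ extension, you can wrap them in Function to interface with the autograd engine. If you'd like to reduce the number of buffers saved for the backward pass, custom functions can be used to combine ops together.

When not to use

If you can already write your function in terms of PyTorch's built-in ops, its backward graph is (most likely) already able to be recorded by autograd. In this case, you do not need to implement the backward function yourself. Consider using a plain old Python function.

If you need to maintain state, i.e., trainable parameters, you should (also) use a custom module. See the section below for more information on extending torch.nn.

If you'd like to alter the gradients during the backward pass or perform a side effect, consider registering a Tensor or Module hook.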
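As a minimal sketch of the hook alternative (the tensor values and the doubling rule are arbitrary choices for illustration), a Tensor hook can rewrite a gradient without any custom Function:

```python
import torch

x = torch.ones(3, requires_grad=True)

# The hook receives the gradient w.r.t. x and may return a replacement for it.
handle = x.register_hook(lambda grad: grad * 2)

y = (x * 3).sum()
y.backward()
print(x.grad)  # the plain gradient (3 per element), doubled by the hook

# Remove the hook once it is no longer needed.
handle.remove()
```

Because the hook is attached to a single tensor, the rest of the graph is still recorded by autograd as usual.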

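How to use

Take the following steps:
1. Subclass Function and implement the forward(), (optional) setup_context() and backward() methods.
2. Call the proper methods on the ctx argument.
3. Declare whether your function supports double backward.
4. Validate whether your gradients are correct using gradcheck.

Step 1: After subclassing Function, you'll need to define 3 methods:

forward() is the code that performs the operation. It can take as many arguments as you want, with some of them being optional if you specify default values. All kinds of Python objects are accepted here. Tensor arguments that track history (i.e., with requires_grad=True) will be converted to ones that don't track history before the call, and their use will be registered in the graph. Note that this logic won't traverse lists/dicts/any other data structures and will only consider tensors that are direct arguments to the call. You can return either a single Tensor output, or a tuple of tensors if there are multiple outputs. Also, please refer to the docs of Function to find descriptions of useful methods that can be called only from forward().
setup_context() (optional). One can either write a "combined" forward() that accepts a ctx object or (as of PyTorch 2.0) a separate forward() that does not accept ctx and a setup_context() method where the ctx modification happens. The forward() should have the compute and setup_context() should only be responsible for the ctx modification (and not have any compute). In general the separate forward() and setup_context() is closer to how PyTorch native operations work and is therefore more composable with various PyTorch subsystems. See Combined or separate forward() and setup_context() for more details.
backward() (or vjp()) defines the gradient formula. It will be given as many Tensor arguments as there were outputs, with each of them representing the gradient w.r.t. that output. It is important NEVER to modify these in-place. It should return as many tensors as there were inputs, with each of them containing the gradient w.r.t. its corresponding input. If your inputs didn't require gradient (needs_input_grad is a tuple of booleans indicating whether each input needs gradient computation), or were non-Tensor objects, you can return None. Also, if you have optional arguments to forward() you can return more gradients than there were inputs, as long as they're all None.

Step 2: It is your responsibility to use the functions in ctx properly in order to ensure that the new Function works properly with the autograd engine.

save_for_backward() must be used to save any tensors to be used in the backward pass. Non-tensors should be stored directly on ctx. If tensors that are neither input nor output are saved for backward, your Function may not support double backward (see step 3).
mark_dirty() must be used to mark any input that is modified in-place by the forward function.
mark_non_differentiable() must be used to tell the engine if an output is not differentiable. By default all output tensors that are of differentiable type will be set to require gradient. Tensors of non-differentiable type (i.e., integral types) are never marked as requiring gradients.
set_materialize_grads() can be used to tell the autograd engine to optimize gradient computations in the cases where the output does not depend on the input, by not materializing grad tensors given to the backward function. That is, if set to False, a None object in Python or an "undefined tensor" (tensor x for which x.defined() is False) in C++ will not be converted to a tensor filled with zeros prior to calling backward, and so your code will need to handle such objects as if they were tensors filled with zeros. The default value of this setting is True.

Step 3: If your Function does not support double backward you should explicitly declare this by decorating backward with once_differentiable(). With this decorator, attempts to perform double backward through your function will produce an error. See our double backward tutorial for more information on double backward.

Step 4: It is recommended that you use torch.autograd.gradcheck() to check whether your backward function correctly computes gradients of the forward by computing the Jacobian matrix using your backward function and comparing the value element-wise with the Jacobian computed numerically using finite-differencing.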
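Steps 2 and 3 can be sketched with a small sorting function. This is an illustrative sketch rather than a definitive implementation: the class name SortWithIndices is hypothetical, it assumes a 1-D input, and it uses the combined forward() style in which ctx is the first argument.

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable


class SortWithIndices(Function):  # hypothetical name, 1-D inputs assumed
    @staticmethod
    def forward(ctx, x):
        sorted_values, indices = x.sort()
        # The integer indices are an output, but not a differentiable one.
        ctx.mark_non_differentiable(indices)
        # Tensors needed in backward go through save_for_backward.
        ctx.save_for_backward(indices)
        return sorted_values, indices

    @staticmethod
    @once_differentiable  # declares that double backward is not supported
    def backward(ctx, grad_sorted, grad_indices):
        # One gradient argument per output; grad_indices is unused.
        (indices,) = ctx.saved_tensors
        grad_input = torch.zeros_like(grad_sorted)
        # sorted_values[i] == x[indices[i]], so route each gradient back.
        grad_input.index_add_(0, indices, grad_sorted)
        return grad_input


x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
values, indices = SortWithIndices.apply(x)
(values * torch.tensor([1.0, 10.0, 100.0])).sum().backward()
print(indices.requires_grad, x.grad)
```

The weights in the last lines are arbitrary; they only make it visible which input position each gradient entry is routed to.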

Example

Below you can find code for a Linear function, with additional comments:

from torch.autograd import Function

# Inherit from Function
class LinearFunction(Function):

    # Note that forward, setup_context, and backward are @staticmethods
    @staticmethod
    def forward(input, weight, bias):
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    # inputs is a Tuple of all of the inputs passed to forward.
    # output is the output of the forward().
    def setup_context(ctx, inputs, output):
        input, weight, bias = inputs
        ctx.save_for_backward(input, weight, bias)

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        return grad_input, grad_weight, grad_bias

Now, to make it easier to use these custom ops, we recommend either aliasing them or wrapping them in a function. Wrapping in a function lets us support default arguments and keyword arguments:

# Option 1: alias
linear = LinearFunction.apply

# Option 2: wrap in a function, to support default args and keyword args.
def linear(input, weight, bias=None):
    return LinearFunction.apply(input, weight, bias)

Here, we give an additional example of a function that is parametrized by non-Tensor arguments:

class MulConstant(Function):
    @staticmethod
    def forward(tensor, constant):
        return tensor * constant

    @staticmethod
    def setup_context(ctx, inputs, output):
        # ctx is a context object that can be used to stash information
        # for backward computation
        tensor, constant = inputs
        ctx.constant = constant

    @staticmethod
    def backward(ctx, grad_output):
        # We return as many input gradients as there were arguments.
        # Gradients of non-Tensor arguments to forward must be None.
        return grad_output * ctx.constant, None

And here, we optimize the above example by calling set_materialize_grads(False):

class MulConstant(Function):
    @staticmethod
    def forward(tensor, constant):
        return tensor * constant

    @staticmethod
    def setup_context(ctx, inputs, output):
        tensor, constant = inputs
        ctx.set_materialize_grads(False)
        ctx.constant = constant

    @staticmethod
    def backward(ctx, grad_output):
        # Here we must handle None grad_output tensor. In this case we
        # can skip unnecessary computations and just return None.
        if grad_output is None:
            return None, None

        # We return as many input gradients as there were arguments.
        # Gradients of non-Tensor arguments to forward must be None.
        return grad_output * ctx.constant, None
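To see when backward() actually receives None, it helps to have an output that never takes part in the loss. The following is a minimal sketch under that assumption; the TwoOutputs class and the seen list are hypothetical names used only for illustration.

```python
import torch
from torch.autograd import Function

seen = []  # records which incoming gradients were None


class TwoOutputs(Function):  # hypothetical example class
    @staticmethod
    def forward(ctx, x):
        ctx.set_materialize_grads(False)
        return x * 2, x * 3

    @staticmethod
    def backward(ctx, grad_double, grad_triple):
        seen.append((grad_double is None, grad_triple is None))
        grad_x = None
        if grad_double is not None:
            grad_x = grad_double * 2
        if grad_triple is not None:
            contribution = grad_triple * 3
            grad_x = contribution if grad_x is None else grad_x + contribution
        return grad_x


x = torch.ones(2, requires_grad=True)
double, triple = TwoOutputs.apply(x)
double.sum().backward()  # `triple` is never used in the loss
print(seen, x.grad)
```

With the default setting (True), the unused output would instead receive a tensor of zeros, and the multiplication by 3 would be computed for nothing.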

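If you need any "intermediate" Tensors computed in forward() to be saved, either they must be returned as outputs, or you must combine forward() and setup_context() (see Combined or separate forward() and setup_context()). Note that this means if you want gradients to flow through those intermediate values, you need to define the gradient formula for them (see also the double backward tutorial):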

class MyCube(torch.autograd.Function):
    @staticmethod
    def forward(x):
        # We wish to save dx for backward. In order to do so, it must
        # be returned as an output.
        dx = 3 * x ** 2
        result = x ** 3
        return result, dx

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        result, dx = output
        ctx.save_for_backward(x, dx)

    @staticmethod
    def backward(ctx, grad_output, grad_dx):
        x, dx = ctx.saved_tensors
        # In order for the autograd.Function to work with higher-order
        # gradients, we must add the gradient contribution of `dx`,
        # which is grad_dx * 6 * x.
        result = grad_output * dx + grad_dx * 6 * x
        return result

# Wrap MyCube in a function so that it is clearer what the output is
def my_cube(x):
    result, dx = MyCube.apply(x)
    return result

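Note

Inputs to backward, i.e., grad_output, can also be tensors that track history. So if backward is implemented with differentiable operations (e.g., invocation of another custom Function), higher order derivatives will work. In this case, the tensors saved with save_for_backward can also be used in the backward and have gradients flowing back, but tensors saved in the ctx won't have gradients flowing back for them. If you need gradients to flow back for a Tensor saved in the ctx, you should make it an output of the custom Function and save it with save_for_backward.

You probably want to check if the backward method you implemented actually computes the derivatives of your function. This is possible by comparing with numerical approximations using small finite differences: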

from torch.autograd import gradcheck

# gradcheck takes a tuple of tensors as input, check if your gradient
# evaluated with these tensors are close enough to numerical
# approximations and returns True if they all verify this condition.
input = (torch.randn(20,20,dtype=torch.double,requires_grad=True), torch.randn(30,20,dtype=torch.double,requires_grad=True))
test = gradcheck(linear, input, eps=1e-6, atol=1e-4)
print(test)

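See Numerical gradient checking for more details on finite-difference gradient comparisons. If your function is used in higher order derivatives (differentiating the backward pass) you can use the gradgradcheck function from the same package to check higher order derivatives.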
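As a rough sketch of checking both orders together, the example below uses a small exponential function whose backward is written with differentiable operations. The class name Exp and the tolerance values are illustrative assumptions; double-precision inputs are used because finite differences are too noisy in single precision.

```python
import torch
from torch.autograd import Function, gradcheck, gradgradcheck


class Exp(Function):  # illustrative example class
    @staticmethod
    def forward(ctx, x):
        result = x.exp()
        # `result` is an output, so saving it keeps double backward working.
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        (result,) = ctx.saved_tensors
        # Built from differentiable ops, so it can itself be differentiated.
        return grad_output * result


x = torch.randn(4, dtype=torch.double, requires_grad=True)
first_order_ok = gradcheck(Exp.apply, (x,), eps=1e-6, atol=1e-4)
second_order_ok = gradgradcheck(Exp.apply, (x,))
print(first_order_ok, second_order_ok)
```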

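Combined or separate `forward()` and `setup_context()`

There are two main ways to define Function. Either:

define a forward() that combines the forward compute logic with setup_context()
(as of PyTorch 2.0) define a separate forward() and setup_context()

We recommend the second option (separate forward() and setup_context()) because that is closer to how PyTorch native operations are implemented and it composes with torch.func transforms. However, we plan to support both approaches going forward; combining forward() with setup_context() leads to more flexibility, since you are able to save intermediates without returning them as outputs.

Please see the previous section for how to define Function with separate forward() and setup_context().

Here is an example of how to define a Function with combined forward() and setup_context():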

class LinearFunction(Function):
    @staticmethod
    # ctx is the first argument to forward
    def forward(ctx, input, weight, bias=None):
        # The forward pass can use ctx.
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        return grad_input, grad_weight, grad_bias

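Forward mode AD

Overriding the forward mode AD formula has a very similar API with some different subtleties. You can implement the jvp() function.

It will be given as many Tensor arguments as there were inputs, with each of them representing the gradient w.r.t. that input. It should return as many tensors as there were outputs, with each of them containing the gradient w.r.t. its corresponding output. The jvp() will be called just after the forward() method, before apply() returns.

jvp() has a few subtle differences from the backward() function:

You can use the ctx to pass any data from the forward() to the jvp() function. If that state will not be needed for the backward(), you can explicitly free it by doing del ctx.foo at the end of the jvp() function.
The implementation of jvp() must be backward differentiable or explicitly check that none of the given forward mode gradients has requires_grad set.
The jvp() function must match the view/in-place behavior of forward(). For example, if the i-th input is modified in-place, then the i-th gradient must be updated in-place. Similarly, if the j-th output is a view of the k-th input, then the returned j-th output gradient must be a view of the given k-th input gradient.
Because the user cannot specify which gradient needs to be computed, the jvp() function should always compute gradients for all the outputs.
The forward mode gradients do respect the flag set by set_materialize_grads(), and you can get None input gradients when this is disabled.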
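The points above can be sketched with an exponential function that only defines a forward mode formula. This is a minimal illustration using the combined forward() style; the class name ExpForward is hypothetical, and a real function would usually define backward() as well.

```python
import torch
import torch.autograd.forward_ad as fwAD
from torch.autograd import Function


class ExpForward(Function):  # hypothetical name; forward mode only
    @staticmethod
    def forward(ctx, x):
        result = torch.exp(x)
        # State stored on ctx is visible to jvp(), which runs right after.
        ctx.result = result
        return result

    @staticmethod
    def jvp(ctx, x_tangent):
        # One tangent per input in, one tangent per output out.
        out_tangent = x_tangent * ctx.result
        # Not needed by any backward here, so free it explicitly.
        del ctx.result
        return out_tangent


primal = torch.randn(3, dtype=torch.double)
tangent = torch.randn(3, dtype=torch.double)

with fwAD.dual_level():
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = ExpForward.apply(dual_input)
    out_tangent = fwAD.unpack_dual(dual_output).tangent
    # d/dx exp(x) = exp(x), so the output tangent is tangent * exp(primal).
    matches = torch.allclose(out_tangent, tangent * primal.exp())

print(matches)
```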

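`torch.func` transforms and/or `torch.vmap()`

Please see Extending torch.func with autograd.Function for details.

Extending `torch.nn`

nn exports two kinds of interfaces - modules and their functional versions. You can extend it in both ways, but we recommend using modules for all kinds of layers that hold any parameters or buffers, and recommend using a functional form for parameter-less operations like activation functions, pooling, etc.

Adding a functional version of an operation is already fully covered in the section above.

Adding a `Module`

Since nn heavily utilizes autograd, adding a new Module requires implementing a Function that performs the operation and can compute the gradient. From now on let's assume that we want to implement a Linear module and we have the function implemented as in the listing above. There's very little code required to add this. Now, there are two functions that need to be implemented:

__init__ (optional) - takes in arguments such as kernel sizes, numbers of features, etc. and initializes parameters and buffers.
forward() - instantiates a Function and uses it to perform the operation. It's very similar to a functional wrapper shown above.

This is how a Linear module can be implemented: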

class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super().__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Tensor, that will get
        # automatically registered as Module's parameter once it's assigned
        # as an attribute. Parameters and buffers need to be registered, or
        # they won't appear in .parameters() (doesn't apply to buffers), and
        # won't be converted when e.g. .cuda() is called. You can use
        # .register_buffer() to register buffers.
        # nn.Parameters require gradients by default.
        self.weight = nn.Parameter(torch.empty(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Not a very smart way to initialize weights
        nn.init.uniform_(self.weight, -0.1, 0.1)
        if self.bias is not None:
            nn.init.uniform_(self.bias, -0.1, 0.1)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction.apply(input, self.weight, self.bias)

    def extra_repr(self):
        # (Optional)Set the extra information about this module. You can test
        # it by printing an object of this class.
        return 'input_features={}, output_features={}, bias={}'.format(
            self.input_features, self.output_features, self.bias is not None
        )
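The comments in the listing mention that buffers are registered with register_buffer(). As a small, self-contained sketch of how parameters and buffers differ (the CountingScale module and its attribute names are hypothetical):

```python
import torch
from torch import nn


class CountingScale(nn.Module):  # hypothetical module for illustration
    def __init__(self, num_features):
        super().__init__()
        # Registered automatically as a parameter when assigned.
        self.scale = nn.Parameter(torch.ones(num_features))
        # Non-trainable state: saved in state_dict and moved by .to()/.cuda(),
        # but not returned by .parameters().
        self.register_buffer("num_calls", torch.zeros((), dtype=torch.long))

    def forward(self, input):
        self.num_calls += 1
        return input * self.scale


module = CountingScale(3)
output = module(torch.tensor([1.0, 2.0, 3.0]))
param_names = [name for name, _ in module.named_parameters()]
buffer_names = [name for name, _ in module.named_buffers()]
state_keys = sorted(module.state_dict().keys())
print(param_names, buffer_names, state_keys)
```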

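Extending `torch` Python API

You can create custom types that emulate Tensor by defining a custom class with methods that match Tensor. But what if you want to be able to pass these types to functions like torch.add() in the top-level torch namespace that accept Tensor operands?

If your custom Python type defines a method named __torch_function__, PyTorch will invoke your __torch_function__ implementation when an instance of your custom class is passed to a function in the torch namespace. This makes it possible to define custom implementations for any of the functions in the torch namespace which your __torch_function__ implementation can call, allowing your users to make use of your custom type with existing PyTorch workflows that they have already written for Tensor. This works with "duck" types that are unrelated to Tensor as well as user-defined subclasses of Tensor.

Extending `torch` with a `Tensor`-like type

Note

This functionality is inspired by the NumPy __array_function__ protocol. See the NumPy documentation and NEP-0018 for more details.

To make this concrete, let's begin with a simple example that illustrates the API dispatch mechanism. We'll create a custom type that represents a 2D scalar tensor, parametrized by the order N and the value along the diagonal entries, value: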

class ScalarTensor(object):
   def __init__(self, N, value):
       self._N = N
       self._value = value

   def __repr__(self):
       return "ScalarTensor(N={}, value={})".format(self._N, self._value)

   def tensor(self):
       return self._value * torch.eye(self._N)

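This first iteration of the design isn't very useful. The main functionality of ScalarTensor is to provide a more compact string representation of a scalar tensor than in the base tensor class: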

>>> d = ScalarTensor(5, 2)
>>> d
ScalarTensor(N=5, value=2)
>>> d.tensor()
tensor([[2., 0., 0., 0., 0.],
        [0., 2., 0., 0., 0.],
        [0., 0., 2., 0., 0.],
        [0., 0., 0., 2., 0.],
        [0., 0., 0., 0., 2.]])

If we try to use this object with the torch API, we will run into issues:

>>> import torch
>>> torch.mean(d)
TypeError: mean(): argument 'input' (position 1) must be Tensor, not ScalarTensor

Adding a __torch_function__ implementation to ScalarTensor makes it possible for the above operation to succeed. Let's re-do our implementation, this time adding a __torch_function__ implementation:

HANDLED_FUNCTIONS = {}
class ScalarTensor(object):
    def __init__(self, N, value):
        self._N = N
        self._value = value

    def __repr__(self):
        return "ScalarTensor(N={}, value={})".format(self._N, self._value)

    def tensor(self):
        return self._value * torch.eye(self._N)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        if func not in HANDLED_FUNCTIONS or not all(
            issubclass(t, (torch.Tensor, ScalarTensor))
            for t in types
        ):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

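The __torch_function__ method takes four arguments: func, a reference to the torch API function that is being overridden; types, the list of types of Tensor-likes that implement __torch_function__; args, the tuple of arguments passed to the function; and kwargs, the dict of keyword arguments passed to the function. It uses a global dispatch table named HANDLED_FUNCTIONS to store custom implementations. The keys of this dictionary are functions in the torch namespace and the values are implementations for ScalarTensor.

Note

Using a global dispatch table is not a mandated part of the __torch_function__ API, it is just a useful design pattern for structuring your override implementations.

This class definition isn't quite enough to make torch.mean do the right thing when we pass it a ScalarTensor - we also need to define an implementation of torch.mean for ScalarTensor operands and add the implementation to the HANDLED_FUNCTIONS dispatch table dictionary. One way of doing this is to define a decorator: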

import functools
def implements(torch_function):
    """Register a torch function override for ScalarTensor"""
    def decorator(func):
        functools.update_wrapper(func, torch_function)
        HANDLED_FUNCTIONS[torch_function] = func
        return func
    return decorator

which can be applied to the implementation of our override:

@implements(torch.mean)
def mean(input):
    return float(input._value) / input._N

With this change we can now use torch.mean with ScalarTensor:

>>> d = ScalarTensor(5, 2)
>>> torch.mean(d)
0.4

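Of course torch.mean is an example of the simplest kind of function to override, since it only takes one operand. We can use the same machinery to override a function that takes more than one operand, any one of which might be a tensor or tensor-like that defines __torch_function__, for example torch.add():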

def ensure_tensor(data):
    if isinstance(data, ScalarTensor):
        return data.tensor()
    return torch.as_tensor(data)

@implements(torch.add)
def add(input, other):
   try:
       if input._N == other._N:
           return ScalarTensor(input._N, input._value + other._value)
       else:
           raise ValueError("Shape mismatch!")
   except AttributeError:
       return torch.add(ensure_tensor(input), ensure_tensor(other))

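This version has a fast path for when both operands are ScalarTensor instances and also a slower path which degrades to converting the data to tensors when either operand is not a ScalarTensor. That makes the override function correctly when either operand is a ScalarTensor or a regular Tensor: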

>>> s = ScalarTensor(2, 2)
>>> torch.add(s, s)
ScalarTensor(N=2, value=4)
>>> t = torch.tensor([[1, 1,], [1, 1]])
>>> torch.add(s, t)
tensor([[3., 1.],
        [1., 3.]])

Note that our implementation of add does not take alpha or out as keyword arguments like torch.add() does:

>>> torch.add(s, s, alpha=2)
TypeError: add() got an unexpected keyword argument 'alpha'

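For speed and flexibility the __torch_function__ dispatch mechanism does not check that the signature of an override function matches the signature of the function being overridden in the torch API. For some applications ignoring optional arguments would be fine, but to ensure full compatibility with Tensor, user implementations of torch API functions should take care to exactly emulate the API of the function that is being overridden.

Functions in the torch API that do not have explicit overrides will return NotImplemented from __torch_function__. If all operands with __torch_function__ defined on them return NotImplemented, PyTorch will raise a TypeError. This means that most of the time operations that do not have explicit overrides for a type will raise a TypeError when an instance of such a type is passed: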

>>> torch.mul(s, 3)
TypeError: no implementation found for 'torch.mul' on types that
implement __torch_function__: [ScalarTensor]

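In practice this means that if you would like to implement your overrides using a __torch_function__ implementation along these lines, you will need to explicitly implement the full torch API or the entire subset of the API that you care about for your use case. This may be a tall order as the full torch API is quite extensive.

Another option is to not return NotImplemented for operations that are not handled but to instead pass a Tensor to the original torch function when no override is available. For example, if we change our implementation of __torch_function__ for ScalarTensor to the one below: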

@classmethod
def __torch_function__(cls, func, types, args=(), kwargs=None):
    if kwargs is None:
        kwargs = {}
    if func not in HANDLED_FUNCTIONS or not all(
            issubclass(t, (torch.Tensor, ScalarTensor))
            for t in types
        ):
        args = [a.tensor() if hasattr(a, 'tensor') else a for a in args]
        return func(*args, **kwargs)
    return HANDLED_FUNCTIONS[func](*args, **kwargs)

Then torch.mul() will work correctly, although the return type will always be a Tensor rather than a ScalarTensor, even if both operands are ScalarTensor instances:

>>> s = ScalarTensor(2, 2)
>>> torch.mul(s, s)
tensor([[4., 0.],
        [0., 4.]])

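Also see the MetadataTensor example below for another variation on this pattern, one that instead always returns a MetadataTensor to propagate metadata through operations in the torch API.

The __torch_function__ protocol is designed for full coverage of the API; partial coverage may lead to undesirable results, in particular, certain functions raising a TypeError. This is especially true for subclasses, where all three of torch.add, torch.Tensor.__add__ and torch.Tensor.add must be covered, even if they return exactly the same result. Failing to do this may also lead to infinite recursion. If one requires the implementation of a function from torch.Tensor subclasses, they must use super().__torch_function__ inside their implementation.

Subclassing `torch.Tensor`

As of version 1.7.0, methods on torch.Tensor and functions in public torch.* namespaces applied on torch.Tensor subclasses will return subclass instances instead of torch.Tensor instances: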

>>> class SubTensor(torch.Tensor):
...     pass
>>> type(torch.add(SubTensor([0]), SubTensor([1]))).__name__
'SubTensor'
>>> type(torch.add(SubTensor([0]), torch.tensor([1]))).__name__
'SubTensor'

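If multiple subclasses exist, the lowest one in the hierarchy will be chosen by default. If there is no unique way to determine such a case, then a TypeError is raised: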

>>> type(torch.add(SubTensor2([0]), SubTensor([1]))).__name__
'SubTensor2'
>>> type(torch.add(SubTensor2([0]), torch.tensor([1]))).__name__
'SubTensor2'
>>> torch.add(SubTensor([0]), OtherSubTensor([1]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no implementation found for 'torch.add' on types that implement __torch_function__: [SubTensor, OtherSubTensor]

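If one wishes to have a global override for all tensor methods, one can use __torch_function__. Here is an example that logs all function/method calls:

import logging

class LoggingTensor(torch.Tensor):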
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # NOTE: Logging calls Tensor.__repr__, so we can't log __repr__ without infinite recursion
        if func is not torch.Tensor.__repr__:
            logging.info(f"func: {func.__name__}, args: {args!r}, kwargs: {kwargs!r}")
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

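However, if one instead wishes to override a method on the Tensor subclass, one can do so either by directly overriding the method (by defining it for a subclass), or by using __torch_function__ and matching with func.

One should be careful within __torch_function__ for subclasses to always call super().__torch_function__(func, ...) instead of func directly, as was the case before version 1.7.0. Failing to do this may cause func to recurse back into __torch_function__ and therefore cause infinite recursion.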
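A minimal sketch of matching on func while deferring to super().__torch_function__ is shown below. The CountingTensor class and its add_calls counter are hypothetical; the example only counts calls and does not change the result of the addition.

```python
import torch


class CountingTensor(torch.Tensor):  # hypothetical subclass for illustration
    add_calls = 0

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Cover the function form, the method form and the operator form.
        if func in (torch.add, torch.Tensor.add, torch.Tensor.__add__):
            CountingTensor.add_calls += 1
        # Defer to the default implementation instead of calling func directly.
        return super().__torch_function__(func, types, args, kwargs)


x = torch.tensor([1.0, 2.0]).as_subclass(CountingTensor)
results = [torch.add(x, x), x.add(x), x + x]
print(CountingTensor.add_calls, [type(r).__name__ for r in results])
```

Matching all three spellings of the addition is what keeps the behavior consistent regardless of how user code writes the operation.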

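Extending `torch` with a `Tensor` wrapper type

Another useful case is a type that wraps a Tensor, either as an attribute or via subclassing. Below we implement a special case of this sort of type, a MetadataTensor that attaches a dictionary of metadata to a Tensor that is propagated through torch operations. Since this is a generic sort of wrapping for the full torch API, we do not need to individually implement each override, so we can make the __torch_function__ implementation more permissive about what operations are allowed: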

class MetadataTensor(object):
    def __init__(self, data, metadata=None, **kwargs):
        self._t = torch.as_tensor(data, **kwargs)
        self._metadata = metadata

    def __repr__(self):
        return "Metadata:\n{}\n\ndata:\n{}".format(self._metadata, self._t)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        metadatas = tuple(a._metadata for a in args if hasattr(a, '_metadata'))
        args = [getattr(a, '_t', a) for a in args]
        assert len(metadatas) > 0
        ret = func(*args, **kwargs)
        return MetadataTensor(ret, metadata=metadatas[0])

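This simple implementation won't necessarily work with every function in the torch API but it is good enough to capture most common operations: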

>>> metadata = {'owner': 'Ministry of Silly Walks'}
>>> m = MetadataTensor([[1, 2], [3, 4]], metadata=metadata)
>>> t = torch.tensor([[1, 2], [1, 2]])
>>> torch.add(t, m)
Metadata:
{'owner': 'Ministry of Silly Walks'}

data:
tensor([[2, 4],
        [4, 6]])
>>> torch.mul(t, m)
Metadata:
{'owner': 'Ministry of Silly Walks'}

data:
tensor([[1, 4],
        [3, 8]])

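Operations on multiple types that define `__torch_function__`

It is possible to use the torch API with multiple distinct types that each have a __torch_function__ implementation, but special care must be taken. In such a case the rules are:

The dispatch operation gathers all distinct implementations of __torch_function__ for each operand and calls them in order: subclasses before superclasses, and otherwise left to right in the operator expression.
If any value other than NotImplemented is returned, that value is returned as the result. Implementations can register that they do not implement an operation by returning NotImplemented.
If all of the __torch_function__ implementations return NotImplemented, PyTorch raises a TypeError.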
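The rules above can be observed with two unrelated duck types. This is an illustrative sketch only: the classes Declines and Handles and the calls list are hypothetical, and the override returns a plain string simply to make the result easy to recognize.

```python
import torch

calls = []  # order in which the implementations are consulted


class Declines:
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        calls.append("Declines")
        return NotImplemented


class Handles:
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        calls.append("Handles")
        return "handled"


# Left operand is asked first; its NotImplemented passes control on.
result = torch.add(Declines(), Handles())

# When every implementation declines, PyTorch raises TypeError.
try:
    torch.add(Declines(), Declines())
    raised = False
except TypeError:
    raised = True

print(result, calls[:2], raised)
```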

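Testing Coverage of Overrides for the PyTorch API

One troublesome aspect of implementing __torch_function__ is that if some operations do and others do not have overrides, users will at best see an inconsistent experience, or at worst will see errors raised at runtime when they use a function that does not have an override. To ease this process, PyTorch provides a developer-facing API for ensuring full support for __torch_function__ overrides. This API is private and may be subject to changes without warning in the future.

First, to get a listing of all overridable functions, use torch.overrides.get_overridable_functions. This returns a dictionary whose keys are namespaces in the PyTorch Python API and whose values are a list of functions in that namespace that can be overridden. For example, let's print the names of the first 5 functions in torch.nn.functional that can be overridden: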

>>> from torch.overrides import get_overridable_functions
>>> func_dict = get_overridable_functions()
>>> nn_funcs = func_dict[torch.nn.functional]
>>> print([f.__name__ for f in nn_funcs[:5]])
['adaptive_avg_pool1d', 'adaptive_avg_pool2d', 'adaptive_avg_pool3d',
 'adaptive_max_pool1d', 'adaptive_max_pool1d_with_indices']

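This listing of functions makes it possible to iterate over all overridable functions. In practice, however, this is not enough to write tests for all of these functions without laboriously and manually copying the signature of each function for each test. To ease this process, the torch.overrides.get_testing_overrides function returns a dictionary mapping overridable functions in the PyTorch API to dummy lambda functions that have the same signature as the original function but unconditionally return -1. These functions are most useful with inspect, to analyze the function signature of the original PyTorch function: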

>>> import inspect
>>> from torch.overrides import get_testing_overrides
>>> override_dict = get_testing_overrides()
>>> dummy_add = override_dict[torch.add]
>>> inspect.signature(dummy_add)
<Signature (input, other, out=None)>

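Finally, torch.overrides.get_ignored_functions returns a collection of functions that explicitly cannot be overridden by __torch_function__. This list can be useful to confirm that a function that isn't present in the dictionary returned by get_overridable_functions cannot be overridden.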
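Put together, these utilities can classify a given function. The helper below is a hypothetical sketch (describe is not part of PyTorch); it relies only on get_ignored_functions and get_testing_overrides as described above.

```python
import inspect

import torch
from torch.overrides import get_ignored_functions, get_testing_overrides


def describe(func):
    """Hypothetical helper: report how __torch_function__ treats `func`."""
    if func in get_ignored_functions():
        return "ignored"
    testing_overrides = get_testing_overrides()
    if func in testing_overrides:
        signature = inspect.signature(testing_overrides[func])
        return f"overridable, signature {signature}"
    return "unknown"


add_status = describe(torch.add)
print(add_status)
```

A test suite could loop over get_testing_overrides() in the same way and assert that a custom type has an override registered for every function it claims to support.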

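Extending `torch` native API

While __torch_function__ allows one to effectively extend the behavior of PyTorch's pure Python components, it does not allow one to extend the parts of PyTorch implemented in C++. To that end, a Tensor subclass can also define __torch_dispatch__, which will be able to override the behavior at the C++ level.

To effectively use this feature, it is important to know how the native part of PyTorch is implemented. The most important component there is what we call the "dispatcher" (the best description can be found in this blog post, even though it is slightly outdated). As hinted by its name, it is responsible for calling the right backend function for a specific call of a function. For example, when calling torch.add(a, b), the dispatcher will inspect both arguments, figure out which "feature" (autograd, autocast, functionalization, etc.) and which "backend" (CPU, CUDA, MPS, etc.) should be used for this specific call, and finally call all the right kernels. A very common thing done by a kernel is to "redispatch". For example, when running your neural network on GPU with autocast, the first call will be the autocast kernel, which will handle any potential autocast logic and redispatch down. The next feature in line will be autograd, which will properly create the autograd graph and then redispatch down. Finally, we reach the backend kernel for CUDA, which will launch the right CUDA kernel and return the final result. On the way out, autograd will attach the graph to the output and, finally, autocast will have a chance to do any update it needs on exit.

One configuration of the dispatcher is the order in which all these feature and backend keys are called. The latest list and their order can be found in DispatchKey.h inside the DispatchKey enum. For the purpose of extending torch, the important subset of the ordering for this discussion is:

vmap -> Autocast -> Autograd -> ZeroTensor -> Neg/Conj -> Functionalize -> Python -> Backends

The most important key for the purpose of this discussion is Python, as every Tensor subclass with the __torch_dispatch__ method defined will call into this feature. It is from there that the user-defined method is called and where the behavior can be overwritten arbitrarily. From there, calling the provided func again will perform a "redispatch".

Some important implications of this implementation are:

This code runs "below all features". It is thus only responsible, like a regular backend, for generating the output value of each Tensor (and can, and should, ignore all advanced features like autograd, autocast, etc.).
If any high level feature implements a given function without redispatching, it will never reach the Python key and so the __torch_dispatch__ callback will never be triggered. This happens in particular for CompositeImplicitAutograd functions, which are evaluated at the Autograd level without redispatching. This is because a CompositeImplicitAutograd function specifies its autograd formula by implicitly calling other native ops, so at the Autograd level the function is decomposed into its native ops and those are evaluated instead.
When calling back to Python and when wrapping the results, the same conversions are used as in the regular PyTorch Python/C++ binding. In particular, some objects cannot be represented in Python and need special handling (undefined Tensors become None, for example).
Our native functions are lazily populated as torch.ops.{namespace}.{func_name}.{overload_name} as callable Python objects to enable easily interacting with them from Python. The func object given to __torch_dispatch__ is always an entry from this namespace. This namespace can be used to directly call native ops and bypass the usual Python API and binding code.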
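As a small sketch of the last point, an overload from the torch.ops namespace can be called like any other Python callable. The tensors and the alpha value here are arbitrary illustration values.

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([10.0, 20.0])

# namespace "aten", function "add", overload "Tensor"
add_overload = torch.ops.aten.add.Tensor

direct = add_overload(a, b, alpha=2)  # computes a + 2 * b
via_api = torch.add(a, b, alpha=2)    # the usual Python entry point

print(add_overload, direct, via_api)
```

Objects of this kind are what a __torch_dispatch__ implementation receives as func.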

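In a similar way to how __torch_function__ is able to interpose on all of torch's Python API and Tensor methods, __torch_dispatch__ is able to intercept all calls into the aten native API. Note that all methods on Tensors are converted into function calls before entering the dispatcher and thus will appear as function calls here: torch.add(a, 2) and a + 2 will lead to exactly the same aten call. Most of these functions are defined in native_functions.yaml, which specifies the properties of these functions as well as their backend implementation. Their implementation alongside specified features is then automatically registered via codegen. Some more exotic functions or features are also registered in other places in the C++ codebase or in user-defined C++ extensions.

It is also possible to add new native functions using torch.library. This Python feature allows defining and/or adding new implementations to native functions. This can be used to add missing kernels, replace existing ones or define brand new native functions.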
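A rough sketch of defining a brand new native function from Python follows. It assumes the torch.library.Library class is available in your PyTorch version; the namespace extending_demo, the operator scaled_add and its schema are all hypothetical names chosen for this example, and only a CPU kernel is registered.

```python
import torch

# "extending_demo" is a hypothetical namespace; "DEF" means we define new ops.
# Keep a reference to the Library object for as long as the op is needed.
lib = torch.library.Library("extending_demo", "DEF")
lib.define("scaled_add(Tensor x, Tensor y, float scale) -> Tensor")


def scaled_add_cpu(x, y, scale):
    # Kernel body written with existing PyTorch ops.
    return x + scale * y


# Register the kernel for the CPU dispatch key only.
lib.impl("scaled_add", scaled_add_cpu, "CPU")

x = torch.tensor([1.0, 2.0])
y = torch.tensor([10.0, 20.0])
out = torch.ops.extending_demo.scaled_add(x, y, 0.5)
print(out)
```

This sketch registers no autograd formula, so it is meant for inputs that do not require gradients.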

You can find many examples of __torch_dispatch__-based subclasses in the subclass zoo repo.

`__torch_dispatch__` calling convention

@classmethod
def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
    pass

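When a user calls an operator with inputs that have __torch_dispatch__, that call may be forwarded to the __torch_dispatch__. args and kwargs get normalized before the call to __torch_dispatch__, that is:

the kwargs consist of keyword-only arguments in the operator's schema. If a kwarg is equal to its default value (in the schema), it will not be passed.
the args consist of all other arguments, no matter how they were passed to the operator (positional vs keyword). If an arg is equal to its default value, and it is the right-most positional arg or all the args to the right of it are not passed, it will not be passed.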
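One way to observe this normalization is to record what reaches __torch_dispatch__. The sketch below uses a TorchDispatchMode (covered in the section on modes) rather than a Tensor subclass, purely to keep the example short; the RecordCalls class is a hypothetical name.

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode


class RecordCalls(TorchDispatchMode):  # hypothetical recorder
    def __init__(self):
        super().__init__()
        self.calls = []

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Store the op name, the number of args, and a copy of the kwargs.
        self.calls.append((str(func), len(args), dict(kwargs)))
        return func(*args, **kwargs)


a = torch.ones(3)
b = torch.ones(3)
recorder = RecordCalls()
with recorder:
    torch.add(a, b)               # alpha keeps its default
    torch.add(a, b, alpha=2)      # alpha is keyword-only in the schema
    torch.add(input=a, other=b)   # positional-capable args passed by keyword

for call in recorder.calls:
    print(call)
```

In all three calls both tensors arrive in args, and alpha shows up in kwargs only when it differs from its schema default.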

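Extending all `torch` API with Modes

Unfortunately, there are functions that do not take Tensor inputs. This means that the subclass approach described above cannot be used to override the behavior of all of PyTorch's functions. Also, if the use case requires intercepting every function call, changing every Tensor to be a subclass can be overly intrusive.

To address this use case, we introduced the concept of "Mode". These exist for __torch_function__ and __torch_dispatch__ overrides, are created by subclassing torch.overrides.TorchFunctionMode and torch.utils._python_dispatch.TorchDispatchMode respectively, and are used as context managers.

To simplify the description of how a mode interacts with subclasses and other modes: whenever the context manager for a mode is entered, every function behaves as if there were an extra Tensor argument at the beginning of the argument list with the mode as a subclass. This means in particular that all mode handlers will be called before any subclass handler and that modes corresponding to the inner context manager will always run first.

It is also important to note that within a given mode handler, this specific mode is disabled and can be re-enabled manually by doing with self:.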
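The ordering of nested modes can be sketched as follows. NamedMode and the shared log list are hypothetical names; each handler only records its own name and then forwards the call.

```python
import torch
from torch.overrides import TorchFunctionMode


class NamedMode(TorchFunctionMode):  # hypothetical mode for illustration
    def __init__(self, name, log):
        super().__init__()
        self.name = name
        self.log = log

    def __torch_function__(self, func, types, args=(), kwargs=None):
        self.log.append(self.name)
        # This mode is disabled inside its own handler, so the call below
        # is only seen by the modes that enclose it.
        return func(*args, **(kwargs or {}))


log = []
a = torch.ones(2)
with NamedMode("outer", log):
    with NamedMode("inner", log):
        torch.add(a, a)

print(log)
```

The innermost mode is recorded first, and because it forwards the call while disabled, the enclosing mode then sees the same call exactly once.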

Here is an example that shows logging modes of each type:

import torch
from torch.overrides import TorchFunctionMode, resolve_name
from torch.utils._python_dispatch import TorchDispatchMode

class FunctionLog(TorchFunctionMode):
    def __torch_function__(self, func, types, args, kwargs=None):
        print(f"Function Log: {resolve_name(func)}(*{args}, **{kwargs})")
        return func(*args, **(kwargs or {}))

class DispatchLog(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args, kwargs=None):
        print(f"Dispatch Log: {func}(*{args}, **{kwargs})")
        return func(*args, **(kwargs or {}))

def f():
    a = torch.rand(10, requires_grad=True)
    b = a * 2
    b.sum().backward()

print("TorchFunctionMode logging:")
with FunctionLog():
    f()

print("TorchDispatchMode logging:")
with DispatchLog():
    f()

This will print the following, with extra comments:

TorchFunctionMode logging:
Function Log: torch.rand(*(10,), **{'requires_grad': True})
Function Log: torch.Tensor.mul(*(tensor([0.7164, 0.9897, 0.1745, 0.9336, 0.4287, 0.7989, 0.2169, 0.7474, 0.5624,
        0.5970], requires_grad=True), 2), **None)
Function Log: torch.Tensor.sum(*(tensor([1.4328, 1.9794, 0.3490, 1.8671, 0.8573, 1.5977, 0.4338, 1.4948, 1.1249,
        1.1939], grad_fn=<MulBackward0>),), **None)
# Note that at the python level, we only see the call to backward but not what happens in the autograd engine.
Function Log: torch.Tensor.backward(*(tensor(12.3307, grad_fn=<SumBackward0>),), **{'gradient': None, 'retain_graph': None, 'create_graph': False, 'inputs': None})

TorchDispatchMode logging:
# Here the requires_grad flag from autograd is removed while default arguments were populated.
Dispatch Log: aten.rand.default(*([10],), **{'device': device(type='cpu'), 'pin_memory': False})
Dispatch Log: aten.mul.Tensor(*(tensor([0.2151, 0.6018, 0.8415, 0.9060, 0.2974, 0.7708, 0.6668, 0.0352, 0.7948,
        0.6023], requires_grad=True), 2), **{})
Dispatch Log: aten.sum.default(*(tensor([0.4303, 1.2036, 1.6831, 1.8120, 0.5949, 1.5416, 1.3335, 0.0705, 1.5897,
        1.2046], grad_fn=<MulBackward0>),), **{})
# Here we don't see the call to backward itself, but its constituents. Starting here with the factory function that creates the initial gradient.
Dispatch Log: aten.ones_like.default(*(tensor(11.4637, grad_fn=<SumBackward0>),), **{'pin_memory': False, 'memory_format': torch.preserve_format})
# This is the backward of the sum
Dispatch Log: aten.expand.default(*(tensor(1.), [10]), **{})
Dispatch Log: aten.mul.Tensor(*(tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]), 2), **{})
Dispatch Log: aten.detach.default(*(tensor([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]),), **{})
Dispatch Log: aten.detach.default(*(tensor([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]),), **{})
