Custom Compiler Passes and Partitioners

Passes

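Passes can be roughly categorized along a few axes:

Axis A:

1. Creating one-to-X mappings (e.g., decomposition)
2. Creating many-to-one mappings (e.g., fusion)

Axis B:

1. Performing forward iteration (e.g., shape propagation)
2. Performing backward iteration (e.g., dead code elimination)

Axis C:

1. Dependent on local node information (e.g., out-variant conversion)
2. Dependent on global graph information (e.g., memory planning)

Our projection of the frequency of these use cases is:

1. A.1, B.1, C.1
2. A.2
3. B.2, C.2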
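
To make Axis B concrete, here is a minimal pure-Python sketch of a forward pass (shape propagation) and a backward pass (dead code elimination). `ToyNode` and both functions are hypothetical stand-ins invented for illustration; they are not part of torch.fx or ExecuTorch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyNode:
    name: str
    op: str                       # "input", "add", "mul", or "output"
    args: Tuple[str, ...] = ()    # names of the nodes this node reads
    shape: Optional[Tuple[int, ...]] = None


def propagate_shapes(nodes: List[ToyNode]) -> None:
    """Forward iteration: each node's shape depends on its producers."""
    by_name: Dict[str, ToyNode] = {n.name: n for n in nodes}
    for node in nodes:                      # definition (topological) order
        if node.op != "input":
            # Elementwise ops and the output keep the shape of their first argument.
            node.shape = by_name[node.args[0]].shape


def eliminate_dead_code(nodes: List[ToyNode]) -> List[ToyNode]:
    """Backward iteration: a node is live only if a live consumer reads it."""
    live = set()
    kept: List[ToyNode] = []
    for node in reversed(nodes):            # consumers are visited before producers
        if node.op == "output" or node.name in live:
            live.update(node.args)
            kept.append(node)
    kept.reverse()
    return kept


graph = [
    ToyNode("x", "input", shape=(2, 3)),
    ToyNode("y", "input", shape=(2, 3)),
    ToyNode("s", "add", ("x", "y")),
    ToyNode("unused", "mul", ("x", "y")),   # never read by the output
    ToyNode("out", "output", ("s",)),
]

propagate_shapes(graph)
pruned = eliminate_dead_code(graph)
print([n.name for n in pruned])   # ['x', 'y', 's', 'out']
print(graph[2].shape)             # (2, 3)
```

Shape propagation needs producers to be processed first, so it walks forward. Liveness is decided by consumers, so dead code elimination walks backward.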

Level 1

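For level 1 use cases (creating one-to-X mappings, performing forward iteration, and looking at local node information), we can utilize a helper class called `ExportPass`. This is an interpreter-based approach: we execute each node and recreate the graph, applying the specified transformations along the way. This lets us preserve the IR Spec by ensuring that all nodes created in the pass meet the spec, including that metadata such as stack traces, FakeTensor values, and the torch.nn.Module hierarchy is preserved and updated according to the transformations made.

To implement such a pass, we can create a subclass of `ExportPass` and implement the exposed functions. When called with a graph module, it runs the graph module and creates a new graph containing the changes specified by the pass. This means that the graph module passed in must be runnable on CPU, and this invariant is maintained after the pass is run.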
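
The interpreter-based idea can be pictured with a small pure-Python sketch. `ToyInterpreterPass` and its tuple-based node format are hypothetical and only mimic the shape of the approach (visit each node, call an overridable hook, emit into a fresh graph); the real `ExportPass` operates on FX graphs and also maintains node metadata.

```python
from typing import Dict, List, Tuple

# Each toy node is (name, op, argument names); this is NOT the torch.fx format.
ToyNode = Tuple[str, str, Tuple[str, ...]]


class ToyInterpreterPass:
    """Walks the nodes in order and re-emits each one into a fresh graph."""

    def __call__(self, nodes: List[ToyNode]) -> List[ToyNode]:
        self.new_nodes: List[ToyNode] = []
        env: Dict[str, str] = {}            # old node name -> new node name
        for name, op, args in nodes:
            new_args = tuple(env[a] for a in args)
            env[name] = self.call_operator(op, new_args, name)
        return self.new_nodes

    def call_operator(self, op: str, args: Tuple[str, ...], name: str) -> str:
        """Default behaviour: copy the node unchanged and return its name."""
        self.new_nodes.append((name, op, args))
        return name


class ReplaceReluInplace(ToyInterpreterPass):
    def call_operator(self, op, args, name):
        if op == "relu_":
            op = "relu"                     # swap the op, keep the arguments
        return super().call_operator(op, args, name)


old_graph = [("x", "input", ()), ("r", "relu_", ("x",)), ("out", "output", ("r",))]
new_graph = ReplaceReluInplace()(old_graph)
print(new_graph)
```

Because the pass builds a new node list instead of mutating the old one, the input graph is left untouched and every emitted node goes through the same hook.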

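One-to-One Pass

As an example of a one-to-one mapping, if we want to replace an op A with another op B, we can run the given `fx.GraphModule` and, every time we see op A, return op B.

Consider the following example:

import torch
from executorch.exir.pass_base import ExportPass

class ReplaceInPlaceReluWithOutOfPlaceReluPass(ExportPass):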
    """
    relu_ is the in-place version. Replace it with relu, which is the
    out-of-place version
    """

    def call_operator(self, op, args, kwargs, meta):
        if op != torch.ops.aten.relu_.default:
            return super().call_operator(op, args, kwargs, meta)
        return super().call_operator(torch.ops.aten.relu.default, args, kwargs, meta)

# To create a pass
replace_pass = ReplaceInPlaceReluWithOutOfPlaceReluPass()
# To run a pass
new_graph_module = replace_pass(graph_module).graph_module

The `super().call_operator(op, args, kwargs, meta)` call creates a `call_function` FX node and returns the result of running the operator with the given arguments.

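One-to-X Pass

If we want a one-to-X mapping, such as replacing op A with two other ops B and C, we make two calls to `super().call_operator` to create two FX nodes, one with op B and another with op C, and return the result of running op C.

For example: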

class ReplaceAddWithMulSub(ExportPass):
    """
    Original:
        def f(x, y):
            return x + y

    After pass:
        def f(x, y):
            z = x * y
            return z - y
    """
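    def call_operator(self, op, args, kwargs, meta):
        # Tensor addition appears in the graph as the `Tensor` overload
        if op != torch.ops.aten.add.Tensor:
            return super().call_operator(op, args, kwargs, meta)

        x, y = args

        # First new node: z = x * y
        mul_res = super().call_operator(
            torch.ops.aten.mul.Tensor,
            (x, y),
            {},
            meta
        )

        # Second new node: z - y, whose result replaces the original add
        return super().call_operator(
            torch.ops.aten.sub.Tensor,
            (mul_res, y),
            {},
            meta
        )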

One-to-None Pass

If we want to remove an op, we can simply return the value that was passed into the function:

class RemoveDetachPass(ExportPass):
    def call_operator(self, op, args, kwargs, meta):
        if op not in (
            torch.ops.aten.detach.default,
            torch.ops.aten.detach_copy.default,
        ):
            return super().call_operator(op, args, kwargs, meta)

        assert len(args) == 1
        return args[0]

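Utilizing Local Information

As an example of utilizing local node information, if we want to convert all scalars in the graph to tensors, we can run the given `fx.GraphModule` and, for every argument that contains a scalar, convert it to a tensor. It might look something like this: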

def args_map(op, fn, args, kwargs):
    assert isinstance(args, tuple)
    assert isinstance(kwargs, dict)
    args = list(args)
    kwargs = kwargs.copy()

    # Update the argument based on the function passed
    def update(key, args, schema):
        args[key] = fn(args[key], schema)

    # Update each argument in the schema
    for i, schema in enumerate(op._schema.arguments):
        if schema.name in kwargs:
            update(schema.name, kwargs, schema)
        elif not schema.kwarg_only and i < len(args):
            update(i, args, schema)

    # Hand the updated arguments back to the caller
    return tuple(args), kwargs

class ScalarToTensorPass(ExportPass):
    def call_operator(self, op, args, kwargs, meta):
        def try_coerce(value, arg):
            # Wrap a Python scalar in a tensor only when the schema expects a Tensor
            return (
                torch.tensor(value)
                if isinstance(value, (float, int, bool))
                and isinstance(arg.type, torch.TensorType)
                else value
            )

        args, kwargs = args_map(op, try_coerce, args, kwargs)
        return super().call_operator(op, args, kwargs, meta)

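Level 2

For creating many-to-one mappings, we can utilize FX's subgraph rewriter. Given a `pattern`, it finds the subgraphs of operators that match the pattern and replaces each matched subgraph with the `replacement`.

Note

This is an inplace operation.

The `pattern` and `replacement` inputs must be callable functions written with the same ops that are used in the EXIR graph you are matching against (ATen ops), so that the subgraph rewriter can find the correct pattern in the graph. Inputs to the pattern/replacement callables are treated as wildcards.

Consider the following example: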

import torch
from torch.fx import subgraph_rewriter

def replace_patterns(graph_module):
    def pattern(x, y):
        x = torch.ops.aten.add.Tensor(x, y)
        x = torch.ops.aten.mul.Tensor(x, y)
        return x

    def replacement(x, y):
        return torch.ops.aten.sub.Tensor(x, y)

    # Rewrites graph_module in place and returns the list of matches
    replaced_patterns = subgraph_rewriter.replace_pattern_with_filters(
        graph_module, pattern, replacement
    )
    return replaced_patterns

The subgraph rewriter returns a list of `ReplacedPatterns`:

@dataclass
class ReplacedPatterns:
    # Node from which the match was found
    anchor: Node
    # Maps nodes in the pattern subgraph to nodes in the larger graph
    nodes_map: Dict[Node, Node]
    # List of nodes that were added into the graph
    replacements: List[Node]

Note

The nodes created by the subgraph rewriter will not have the metadata that
is normally in EXIR nodes (`stack_trace`, `val`, `nn_module_stack`).
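
To make the many-to-one idea concrete, here is a minimal pure-Python sketch that fuses `mul(add(x, y), y)` into `sub(x, y)` on a toy node list. The tuple node format and `fuse_add_mul_into_sub` are hypothetical; the real subgraph rewriter matches arbitrary pattern graphs, whereas this sketch hard-codes one pattern.

```python
from typing import Dict, List, Tuple

ToyNode = Tuple[str, str, Tuple[str, ...]]   # (name, op, argument names)


def fuse_add_mul_into_sub(nodes: List[ToyNode]) -> List[ToyNode]:
    """Rewrite mul(add(x, y), y) into sub(x, y); x and y act as wildcards."""
    producers: Dict[str, ToyNode] = {n[0]: n for n in nodes}
    users: Dict[str, int] = {}
    for _, _, args in nodes:
        for a in args:
            users[a] = users.get(a, 0) + 1

    fused_away = set()
    result: List[ToyNode] = []
    for name, op, args in nodes:
        if op == "mul" and len(args) == 2:
            inner = producers.get(args[0])
            if (
                inner is not None
                and inner[1] == "add"
                and len(inner[2]) == 2
                and inner[2][1] == args[1]     # the same y feeds both ops
                and users[inner[0]] == 1       # the add has no other consumer
            ):
                fused_away.add(inner[0])
                result.append((name, "sub", inner[2]))
                continue
        result.append((name, op, args))
    # Drop the add nodes that were absorbed into a fused sub
    return [n for n in result if n[0] not in fused_away]


toy_graph = [
    ("a", "input", ()),
    ("b", "input", ()),
    ("t", "add", ("a", "b")),
    ("m", "mul", ("t", "b")),
    ("out", "output", ("m",)),
]
fused = fuse_add_mul_into_sub(toy_graph)
print(fused)
```

Two nodes (`t` and `m`) collapse into a single `sub` node that keeps the name `m`, so downstream consumers such as `out` need no changes.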

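Level 3

For the third way of creating a pass, we can use the most basic `PassBase`. To create a pass, we subclass it and implement the function `call` with the pass contents. Additionally, we can implement the functions `requires` and `ensures`, which are called before and after `call`. Note that these functions can also be overridden in `ExportPass`. To run a pass on a graph module, we can pass the graph module directly to an instance of the class.

Consider the following example: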

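from torch.fx.passes.infra.pass_base import PassBase, PassResult

class ReplaceAddPass(PassBase):

    def __init__(self, replace_op):
        super().__init__()
        self.replace_op = replace_op

    def call(self, graph_module):
        modified = False
        for node in graph_module.graph.nodes:
            if node.op == "call_function" and node.target == torch.add:
                node.target = self.replace_op
                modified = True
        # Report the (possibly updated) module and whether anything changed
        return PassResult(graph_module, modified)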

    # Optional to implement, will be called before call()
    def requires(self, graph_module) -> None:
        for node in graph_module.graph.nodes:
            if node.op == "call_function" and node.target == torch.add:
                return
        raise ValueError("No torch.add ops!")

    # Optional to implement, will be called after call()
    def ensures(self, graph_module: torch.fx.GraphModule) -> None:
        pass

# To create a pass
replace_add_with_div = ReplaceAddPass(torch.div)
# To run a pass
replace_add_with_div(graph_module)

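Pass Manager

The `PassManager` is a class used to run multiple passes on a given graph module. When initializing a `PassManager` instance, we pass in the list of passes that we want to run and set a couple of flags. To run the collection of passes on a graph module, we can pass the graph module directly to the `PassManager` instance.

An example: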

from executorch.exir.pass_manager import PassManager

pm = PassManager(
    passes=[replace_add_with_div, replace_div_with_mul],
    run_checks_after_each_pass=True,
    suppress_check_failures=False,
)
graph_module_out = pm(graph_module)

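To add a common set of checks that run after each pass, we can call the function `add_checks(check: Callable)`, which takes a callable as input. If the `run_checks_after_each_pass` flag is set, `check` is called after each pass is run on the graph module.

An example: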

pm = PassManager(
    passes=[replace_add_with_div, replace_div_with_mul],
    run_checks_after_each_pass=True,  # needed for the check to run after each pass
)

def check_div_target(graph_module):
    for node in graph_module.graph.nodes:
        if node.op == "call_function" and node.target != torch.div:
            raise ValueError("Target should be div!")

pm.add_checks(check_div_target)

pm(graph_module)    # raises ValueError after replace_div_with_mul pass
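
The control flow described above (run each pass in order, then run the registered checks if the flag is set) can be sketched in pure Python. `ToyPassManager` and the list-of-op-names graph are hypothetical stand-ins for illustration, not the `PassManager` implementation.

```python
from typing import Callable, List

Graph = List[str]                      # a toy graph: just a list of op names
Pass = Callable[[Graph], Graph]


class ToyPassManager:
    def __init__(self, passes: List[Pass], run_checks_after_each_pass: bool = False):
        self.passes = passes
        self.run_checks_after_each_pass = run_checks_after_each_pass
        self.checks: List[Callable[[Graph], None]] = []

    def add_checks(self, check: Callable[[Graph], None]) -> None:
        self.checks.append(check)

    def __call__(self, graph: Graph) -> Graph:
        for run_pass in self.passes:
            graph = run_pass(graph)
            if self.run_checks_after_each_pass:
                for check in self.checks:
                    check(graph)       # a failing check raises immediately
        return graph


def add_to_div(graph: Graph) -> Graph:
    return ["div" if op == "add" else op for op in graph]


def div_to_mul(graph: Graph) -> Graph:
    return ["mul" if op == "div" else op for op in graph]


def check_only_div(graph: Graph) -> None:
    if any(op != "div" for op in graph):
        raise ValueError("Target should be div!")


toy_pm = ToyPassManager([add_to_div, div_to_mul], run_checks_after_each_pass=True)
toy_pm.add_checks(check_only_div)

try:
    toy_pm(["add", "add"])
    failure = None
except ValueError as err:
    failure = str(err)
print(failure)   # Target should be div!
```

The check passes after the first toy pass (everything is `div`) and fails after the second one, which is the point at which the error surfaces.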

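Partitioner

There are a couple of common FX-graph-based partitioners we can use to partition the graph. However, they do not necessarily produce a graph that is compliant with the IR Spec, so be careful when using them.

Subgraph Matcher

To find subgraphs within a graph that match a specific pattern, we can utilize FX's `SubgraphMatcher`.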

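Class attributes:

- `pattern (Graph)`: The targeted matching pattern. Placeholder nodes in the graph are treated as wildcards when matching.
- `match_output (bool)`: If True, the output node in the pattern graph is treated as part of the targeted pattern. If False, the output node is ignored during matching.
- `match_placeholder (bool)`: If True, placeholder nodes in the pattern graph are treated as part of the targeted pattern. If False, placeholder nodes are used as wildcards.
- `remove_overlapping_matches (bool)`: If True, only the first match is returned when matches overlap.
- `ignore_literals (bool)`: If True, literals are not checked for equality and are instead treated as wildcards.

Consider the following example: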

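import torch
from torch.export import export
from torch.fx.passes.utils.matcher_utils import SubgraphMatcher
from executorch.exir import to_edge

# Example inputs; the shapes are assumptions chosen to fit each model's weights
large_inputs = (torch.randn(3, 3),)
pattern_inputs = (torch.randn(5, 5),)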

class LargeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self._weight = torch.nn.Parameter(torch.ones(3, 3))
        self._bias = torch.nn.Parameter(torch.ones(3, 3))

    def forward(self, x):
        return torch.ops.aten.addmm.default(self._bias, x, self._weight)

large_model_graph = to_edge(export(LargeModel(), large_inputs)).exported_program().graph_module.graph

class PatternModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self._weight_1 = torch.nn.Parameter(torch.ones(5, 5))
        self._bias_1 = torch.nn.Parameter(torch.ones(5, 5))

    def forward(self, x):
        return torch.ops.aten.addmm.default(self._bias_1, x, self._weight_1)

pattern_graph = to_edge(export(PatternModel(), pattern_inputs)).exported_program().graph_module.graph

subgraph_matcher = SubgraphMatcher(pattern_graph)
match_result = subgraph_matcher.match(large_model_graph)

The `match` function returns a list of `InternalMatch`:

@dataclass
class InternalMatch():
    # Nodes from which the match was found
    anchors: List[Node]
    # Maps nodes in the pattern subgraph to nodes in the larger graph
    nodes_map: Dict[Node, Node] = field(default_factory=dict)
    # Nodes in target graph that are matched placeholder in pattern
    placeholder_nodes: List[Node] = field(default_factory=list)
    # Nodes in matched subgraph returned by output
    returning_nodes: List[Node] = field(default_factory=list)

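Capability Based Partitioner

To find the largest subgraphs of nodes that support a specific invariant, we can utilize FX's `CapabilityBasedPartitioner`.

Class attributes:

- `graph_module (torch.fx.GraphModule)`: The graph module we are partitioning.
- `operator_support (OperatorSupportBase)`: The object used to determine whether a node in the graph is supported in the partition.
- `allows_single_node_partition (bool)`: If True, allows single-node partitions to be formed.
- `non_compute_ops (Optional[Sequence[str]])`: A set of ops that are considered "non-compute" (e.g., `torch.ops.aten.view` and `_operator.getitem`), so that the partitioner will not create graphs that contain only these non-compute ops.
- `allowed_single_node_partition_ops (Optional[Sequence[str]])`: A set of ops that are allowed to be in a single-node partition.

The `OperatorSupportBase` class is used by the partitioner to determine whether a specific node in the graph belongs in the partition. This is done by overriding the `is_node_supported` function. You can chain multiple `OperatorSupportBase` objects by using `chain` (which returns False if any of the OperatorSupportBase objects returns False) and `any_chain` (which returns True if any of the OperatorSupportBase objects returns True).

Consider the following example: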

from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner
from torch.fx.passes.operator_support import any_chain, OperatorSupportBase

class AddMulOperatorSupport(OperatorSupportBase):
    def is_node_supported(self, submodules, node: torch.fx.Node) -> bool:
        return node.op == "call_function" and node.target in [
            torch.ops.aten.add.Tensor, torch.ops.aten.mul.Tensor,
        ]

# Instantiate the support checker defined above
op_support = AddMulOperatorSupport()

capability_partitioner = CapabilityBasedPartitioner(
    graph_module,
    op_support,
)

# Returns a list of partitions (list of nodes that belong in each partition)
partition_list = capability_partitioner.propose_partitions()
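
The `chain` / `any_chain` semantics described above, and the idea of grouping supported nodes into partitions, can be pictured with plain predicates. Everything in this sketch (`toy_chain`, `toy_any_chain`, `propose_runs`) is a hypothetical simplification: it treats the graph as a linear list of op names, while the real partitioner works on a graph with data dependencies.

```python
from typing import Callable, List

Predicate = Callable[[str], bool]      # toy stand-in for is_node_supported


def toy_chain(*preds: Predicate) -> Predicate:
    """Supported only if every predicate agrees (mirrors `chain`)."""
    return lambda op: all(p(op) for p in preds)


def toy_any_chain(*preds: Predicate) -> Predicate:
    """Supported if at least one predicate agrees (mirrors `any_chain`)."""
    return lambda op: any(p(op) for p in preds)


def propose_runs(ops: List[str], supported: Predicate) -> List[List[str]]:
    """Group consecutive supported ops of a linear graph into partitions."""
    partitions: List[List[str]] = []
    current: List[str] = []
    for op in ops:
        if supported(op):
            current.append(op)
        elif current:
            partitions.append(current)
            current = []
    if current:
        partitions.append(current)
    return partitions


def is_add(op: str) -> bool:
    return op == "add"


def is_mul(op: str) -> bool:
    return op == "mul"


toy_ops = ["add", "mul", "relu", "mul", "add", "view"]
either = propose_runs(toy_ops, toy_any_chain(is_add, is_mul))
both = propose_runs(toy_ops, toy_chain(is_add, is_mul))
print(either)   # [['add', 'mul'], ['mul', 'add']]
print(both)     # []
```

With `toy_any_chain`, any add or mul is accepted, so the unsupported `relu` and `view` split the list into two partitions. With `toy_chain`, no op can be both an add and a mul, so nothing is partitioned.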

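If you look at the capability based partitioner, you may also find a `fuse_partitions` function, which returns a modified graph with the partitions as submodules and calls these submodules in the top-level graph through `call_module` nodes. However, this is not compliant with the IR Spec, because we do not allow `call_module` nodes.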

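Combined

We also provide a combined helper function that accepts both pattern graphs and an operator-support object.

Args:

- `graph_module (fx.GraphModule)`: The module that we want to partition.
- `patterns (List[torch.fx.Graph])`: A list of patterns in the form of `torch.fx.Graph`. These graphs can be obtained through the `graph` field of a GraphModule produced by exir.capture (recommended) or by symbolic tracing (which might not result in an accurate Edge dialect graph), or by manually crafting a graph module.
- `op_support (OperatorSupportBase)`: An OperatorSupportBase that can be created in the following ways:
    - Subclassing it directly and implementing `is_node_supported()`
    - Getting the result of `create_op_support()`
    - Getting the result of `create_pattern_support()`
    - Multiple OperatorSupportBase classes chained together with `chain()` or `any_chain()`

Returns:

A list of partitions (largest possible subgraphs) containing nodes that are supported by the union of the given OperatorSupportBase object and the given pattern graphs.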

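Source Partitioner

For use cases in which users want to partition based on higher-level modules (`torch.nn.Linear` or `torch.nn.functional.linear`) that are now decomposed into their operators (`aten.permute`, `aten.addmm`), we have the following helper function:

get_source_partitions(graph: torch.fx.Graph, wanted_sources: List[Any]) -> Dict[Any, List[SourcePartition]]

Args:

- `graph`: The graph we want to partition.
- `wanted_sources`: List of sources whose decomposed nodes we want to find. Each source can be a function (e.g., `torch.nn.functional.linear`) or a leaf module type (e.g., `torch.nn.Linear`).

Returns:

A dictionary mapping each source (e.g., `torch.nn.modules.linear.Linear`) to a list of `SourcePartition`s that correspond to the lists of nodes flattened from a module of that type.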

@dataclass
class SourcePartition():
    # Nodes in a particular partition
    nodes: List[Node]
    # Module type
    module_type: Type
    # Nodes in the graph that are needed as inputs to the partition
    input_nodes: List[Node] = field(default_factory=list)
    # Nodes in the partition that are being used by nodes outside of the partition
    output_nodes: List[Node] = field(default_factory=list)
    # Parameters that are being used
    params: List[str] = field(default_factory=list)

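An example:

import torch
from torch.export import export
from torch.fx.passes.utils.source_matcher_utils import get_source_partitions
from executorch.exir import to_edge

class M(torch.nn.Module):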
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(3, 3)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(3, 5)

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        return x

inputs = (torch.randn(3, 3),)
edge_graph = to_edge(export(M(), inputs)).exported_program().graph_module.graph
print(edge_graph)
"""
graph():
    %arg0 : [#users=1] = placeholder[target=arg0]
    %_param_constant0 : [#users=1] = get_attr[target=_param_constant0]
    %permute_default : [#users=1] = call_function[target=torch.ops.aten.permute_copy.default](args = (%_param_constant0,), kwargs = {})
    %_param_constant1 : [#users=1] = get_attr[target=_param_constant1]
    %addmm_default : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant1, %arg0, %t_default), kwargs = {})
    %_param_constant0_1 : [#users=1] = get_attr[target=_param_constant0]
    %permute_default_1 : [#users=1] = call_function[target=torch.ops.aten.permute_copy.default](args = (%_param_constant0_1,), kwargs = {})
    %_param_constant1_1 : [#users=1] = get_attr[target=_param_constant1]
    %addmm_default_1 : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant1_1, %addmm_default, %t_default_1), kwargs = {})
    %relu_default : [#users=1] = call_function[target=torch.ops.aten.relu.default](args = (%addmm_default_1,), kwargs = {})
    %_param_constant2 : [#users=1] = get_attr[target=_param_constant2]
    %permute_default_2 : [#users=1] = call_function[target=torch.ops.aten.permute_copy.default](args = (%_param_constant2,), kwargs = {})
    %_param_constant3 : [#users=1] = get_attr[target=_param_constant3]
    %addmm_default_2 : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant3, %relu_default, %t_default_2), kwargs = {})
    return [addmm_default_2]
"""

module_partitions = get_source_partitions(edge_graph, [torch.nn.Linear, torch.nn.ReLU])
print(module_partitions)
"""
{<class 'torch.nn.modules.linear.Linear'>: [
    ModulePartition(nodes=[_param_constant0, t_default, _param_constant1, addmm_default], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[arg0], output_nodes=[addmm_default], params=["_param_constant0", "_param_constant1"]),
    ModulePartition(nodes=[_param_constant0_1, t_default_1, _param_constant1_1, addmm_default_1], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[addmm_default], output_nodes=[addmm_default_1], params=["_param_constant0_1", "_param_constant1_1"]),
    ModulePartition(nodes=[_param_constant2, t_default_2, _param_constant3, addmm_default_2], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[relu_default], output_nodes=[addmm_default_2], params=["_param_constant2", "_param_constant3"])],

 <class 'torch.nn.modules.activation.ReLU'>: [
    ModulePartition(nodes=[relu_default], module_type=<class 'torch.nn.modules.activation.ReLU'>, input_nodes=[addmm_default_1], output_nodes=[relu_default], params=[])]}
"""
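
The grouping shown in the output above can be pictured with a small pure-Python sketch. It assumes each node carries a provenance tag naming the call site and the source type it was decomposed from; that tag format, `TaggedNode`, and `toy_source_partitions` are hypothetical and are not how `get_source_partitions` is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (node name, call site, source type): a hypothetical per-node provenance tag
TaggedNode = Tuple[str, str, str]


def toy_source_partitions(
    nodes: List[TaggedNode], wanted_sources: List[str]
) -> Dict[str, List[List[str]]]:
    """Group node names by source type, one partition per call site."""
    by_call: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name, call_site, source in nodes:
        if source in wanted_sources:
            by_call[(source, call_site)].append(name)

    partitions: Dict[str, List[List[str]]] = defaultdict(list)
    for (source, _call_site), names in by_call.items():
        partitions[source].append(names)
    return dict(partitions)


tagged = [
    ("permute", "linear1#0", "Linear"),
    ("addmm", "linear1#0", "Linear"),
    ("permute_1", "linear1#1", "Linear"),   # second call of the same module
    ("addmm_1", "linear1#1", "Linear"),
    ("relu", "relu#0", "ReLU"),
    ("permute_2", "linear2#0", "Linear"),
    ("addmm_2", "linear2#0", "Linear"),
]

toy_result = toy_source_partitions(tagged, ["Linear", "ReLU"])
print(len(toy_result["Linear"]), len(toy_result["ReLU"]))   # 3 1
```

As in the example above, calling the same `Linear` module twice yields two separate partitions, because partitions are keyed by call site and not by module instance.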
