
Using the ExecuTorch Developer Tools to Profile a Model

Author: Jack Khuu

The ExecuTorch Developer Tools are a set of tools designed to give users the ability to profile, debug, and visualize ExecuTorch models.

This tutorial shows a full end-to-end flow for using the Developer Tools to profile a model. Specifically, it will:

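Generate the artifacts consumed by the Developer Tools (ETRecord, ETDump).
Create an Inspector class that consumes these artifacts.
Use the Inspector class to analyze the model's profiling results.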

Prerequisites

To run this tutorial, you first need to set up your ExecuTorch environment.

Generate ETRecord (Optional)

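The first step is to generate an ETRecord. An ETRecord contains model graphs and metadata for linking runtime results (such as profiling) to the eager model. It is generated via executorch.devtools.generate_etrecord.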

executorch.devtools.generate_etrecord takes in an output file path (str), the edge dialect model (EdgeProgramManager), the ExecuTorch dialect model (ExecutorchProgramManager), and an optional dictionary of additional models.

In this tutorial, an example model (shown below) is used for demonstration.

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
from executorch.devtools import generate_etrecord

from executorch.exir import (
    EdgeCompileConfig,
    EdgeProgramManager,
    ExecutorchProgramManager,
    to_edge,
)
from torch.export import export, ExportedProgram


# Generate Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = Net()

aten_model: ExportedProgram = export(model, (torch.randn(1, 1, 32, 32),), strict=True)

edge_program_manager: EdgeProgramManager = to_edge(
    aten_model, compile_config=EdgeCompileConfig(_check_ir_validity=True)
)
edge_program_manager_copy = copy.deepcopy(edge_program_manager)
et_program_manager: ExecutorchProgramManager = edge_program_manager.to_executorch()


# Generate ETRecord
etrecord_path = "etrecord.bin"
generate_etrecord(etrecord_path, edge_program_manager_copy, et_program_manager)

Warning

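Users should make a deep copy of the output of to_edge() and pass that copy to the generate_etrecord API. This is necessary because the subsequent call, to_executorch(), mutates the program in place, and debug data would be lost in the process.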
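The need for a copy can be shown with a toy stand-in. ToyProgram and lower_in_place below are hypothetical names, not ExecuTorch APIs. They only mimic a step that mutates its receiver and discards debug metadata. The sketch also shows why the copy has to be deep: a shallow copy still shares the dictionary that gets cleared.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyProgram:
    """Hypothetical stand-in for a program manager; not an ExecuTorch class."""
    # Maps a debug handle to the operator it came from.
    debug_handles: Dict[int, str] = field(default_factory=dict)

    def lower_in_place(self) -> "ToyProgram":
        # Mutates the receiver, drops its debug metadata, then returns it.
        self.debug_handles.clear()
        return self


edge = ToyProgram(debug_handles={0: "aten.convolution", 1: "aten.addmm"})
shallow = copy.copy(edge)    # new object, but shares the same dict
deep = copy.deepcopy(edge)   # independent copy of the dict as well
lowered = edge.lower_in_place()

print(edge.debug_handles)     # {}
print(shallow.debug_handles)  # {}
print(deep.debug_handles)     # {0: 'aten.convolution', 1: 'aten.addmm'}
```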

Generate ETDump

The next step is to generate an ETDump. An ETDump contains the runtime results from executing a Bundled Program model.

In this tutorial, a Bundled Program is created from the example model above.

import torch
from executorch.devtools import BundledProgram

from executorch.devtools.bundled_program.config import MethodTestCase, MethodTestSuite
from executorch.devtools.bundled_program.serialize import (
    serialize_from_bundled_program_to_flatbuffer,
)

from executorch.exir import to_edge
from torch.export import export

# Step 1: ExecuTorch Program Export
m_name = "forward"
method_graphs = {m_name: export(model, (torch.randn(1, 1, 32, 32),), strict=True)}

# Step 2: Construct Method Test Suites
inputs = [[torch.randn(1, 1, 32, 32)] for _ in range(2)]

method_test_suites = [
    MethodTestSuite(
        method_name=m_name,
        test_cases=[
            MethodTestCase(inputs=inp, expected_outputs=getattr(model, m_name)(*inp))
            for inp in inputs
        ],
    )
]

# Step 3: Generate BundledProgram
executorch_program = to_edge(method_graphs).to_executorch()
bundled_program = BundledProgram(executorch_program, method_test_suites)

# Step 4: Serialize BundledProgram to flatbuffer.
serialized_bundled_program = serialize_from_bundled_program_to_flatbuffer(
    bundled_program
)
save_path = "bundled_program.bp"
with open(save_path, "wb") as f:
    f.write(serialized_bundled_program)
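Each MethodTestCase above pairs a set of inputs with the outputs that the eager model produced for them. A runtime can, in principle, compare its own results against these reference outputs. The following plain-Python sketch illustrates that idea. The names reference_model, test_cases, and verify are hypothetical, and an elementwise ReLU stands in for the model.

```python
import math
from typing import Callable, Dict, List, Sequence


def reference_model(x: Sequence[float]) -> List[float]:
    # Stand-in for the eager model: an elementwise ReLU.
    return [max(v, 0.0) for v in x]


# Each case stores inputs plus the eager model's outputs for them, mirroring
# expected_outputs=getattr(model, m_name)(*inp) in the code above.
test_cases: List[Dict[str, List[float]]] = [
    {"inputs": [-1.0, 2.0], "expected": reference_model([-1.0, 2.0])},
    {"inputs": [0.5, -3.0], "expected": reference_model([0.5, -3.0])},
]


def verify(run: Callable[[Sequence[float]], List[float]], cases, atol: float = 1e-6) -> bool:
    """Return True if `run` reproduces every expected output within `atol`."""
    for case in cases:
        actual = run(case["inputs"])
        expected = case["expected"]
        if len(actual) != len(expected):
            return False
        if not all(math.isclose(a, e, abs_tol=atol) for a, e in zip(actual, expected)):
            return False
    return True


print(verify(reference_model, test_cases))    # True
print(verify(lambda x: list(x), test_cases))  # False: identity keeps -1.0
```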

Use CMake (follow these instructions to set up CMake) to execute the Bundled Program and generate the ETDump:

cd executorch
./examples/devtools/build_example_runner.sh
cmake-out/examples/devtools/example_runner --bundled_program_path="bundled_program.bp"

Creating an Inspector

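The final step is to create the Inspector by passing in the artifact paths. The Inspector takes the runtime results from the ETDump and correlates them to the operators of the Edge Dialect Graph.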

Recall: an ETRecord is not required. If no ETRecord is provided, the Inspector shows runtime results without operator correlation.
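The correlation step can be pictured as a join between timed runtime events and a record that maps each event back to an operator. The sketch below is plain Python. It assumes integer debug handles as the join key. The names runtime_events, record, and correlate are hypothetical, the values are made up, and none of this describes the Inspector's internals. Without a record, the timings survive but the operator column stays empty.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical runtime events: (debug_handle, duration). Made-up values that
# play the role of ETDump contents.
runtime_events: List[Tuple[int, float]] = [(0, 1.2), (1, 0.4), (0, 1.1)]

# Hypothetical mapping from debug handle to operator name, playing the role
# of an ETRecord.
record: Dict[int, str] = {0: "aten.convolution.default", 1: "aten.addmm.default"}


def correlate(events: List[Tuple[int, float]], record: Optional[Dict[int, str]] = None):
    """Attach an operator name to each event when a record is available."""
    rows = []
    for handle, duration in events:
        op = record.get(handle) if record is not None else None
        rows.append({"debug_handle": handle, "op": op, "duration": duration})
    return rows


with_record = correlate(runtime_events, record)
without_record = correlate(runtime_events)
print(with_record[0]["op"])     # aten.convolution.default
print(without_record[0]["op"])  # None
```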

To visualize all runtime events, call the Inspector's print_data_tabular.

from executorch.devtools import Inspector

etrecord_path = "etrecord.bin"
etdump_path = "etdump.etdp"
inspector = Inspector(etdump_path=etdump_path, etrecord=etrecord_path)
inspector.print_data_tabular()


Analyzing with an Inspector

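The Inspector provides two ways to access the ingested information: EventBlocks and DataFrames. Both give users the ability to perform custom analysis of their model's performance.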
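Both views expose the same underlying events. EventBlocks present them as Python objects, and DataFrames present them as rows and columns. The difference can be sketched without ExecuTorch using a hypothetical, flattened ToyEvent dataclass. This is a simplification: real events nest their timings under perf_data and carry more fields.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ToyEvent:
    """Hypothetical, flattened event record; not the Inspector's Event class."""
    name: str
    p50: Optional[float]  # None when no timing data was recorded


events: List[ToyEvent] = [
    ToyEvent("native_call_convolution.out", 2.0),
    ToyEvent("native_call_addmm.out", 0.5),
    ToyEvent("native_call_addmm.out", None),
]

# Object view: walk the list and read attributes, as with event_block.events.
addmm_objects = [e for e in events if e.name == "native_call_addmm.out"]

# Tabular view: one dict per row, one key per column. This is a shape that
# pandas accepts via pandas.DataFrame(rows).
rows = [asdict(e) for e in events]
addmm_rows = [r for r in rows if r["name"] == "native_call_addmm.out"]

print(len(addmm_objects), len(addmm_rows))  # 2 2
print(rows[0])  # {'name': 'native_call_convolution.out', 'p50': 2.0}
```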

Below are example usages of both the EventBlock and the DataFrame approaches.

# Set Up
import pprint as pp

import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)

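If a user wants the raw profiling results, they would do something similar to the following, which finds the raw runtime data of an addmm.out event.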

for event_block in inspector.event_blocks:
    # Via EventBlocks
    for event in event_block.events:
        if event.name == "native_call_addmm.out":
            print(event.name, event.perf_data.raw if event.perf_data else "")

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_addmm.out"]
    print(df[["event_name", "raw"]])
    print()
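The raw field holds the individual timings of an event. Summary statistics such as the p50 used in the next example condense those timings. As a rough sketch, the following computes a comparable summary in plain Python. The summarize helper is hypothetical, the timings are made up, and the Inspector's exact percentile interpolation may differ from statistics.quantiles with method="inclusive".

```python
import statistics
from typing import Dict, List


def summarize(raw: List[float]) -> Dict[str, float]:
    """Summarize raw per-iteration timings into percentile-style statistics."""
    # quantiles(n=10) returns the 9 cut points between deciles:
    # index 0 is the 10th percentile and index 8 is the 90th.
    deciles = statistics.quantiles(raw, n=10, method="inclusive")
    return {
        "min": min(raw),
        "p10": deciles[0],
        "p50": statistics.median(raw),
        "p90": deciles[8],
        "avg": statistics.fmean(raw),
        "max": max(raw),
    }


raw_timings = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values for illustration
summary = summarize(raw_timings)
print(summary)
```

The median is robust to a single slow outlier iteration, which is one reason to rank operators by p50 rather than by the mean.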

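If a user wants to trace an operator back to their model code, they would do something similar to the following, which finds the slowest convolution.out call and prints its stack trace and module hierarchy.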

for event_block in inspector.event_blocks:
    # Via EventBlocks
    slowest = None
    for event in event_block.events:
        # Skip events that carry no timing data
        if event.name == "native_call_convolution.out" and event.perf_data is not None:
            if slowest is None or event.perf_data.p50 > slowest.perf_data.p50:
                slowest = event
    if slowest is not None:
        print(slowest.name)
        print()
        pp.pprint(slowest.stack_traces)
        print()
        pp.pprint(slowest.module_hierarchy)

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_convolution.out"]
    if len(df) > 0:
        # idxmax() returns the row label; .loc turns that row into a Series.
        # (Calling `assert` on a Series raises ValueError, so no assert here.)
        slowest = df.loc[df["p50"].idxmax()]
        # Read the column value; Series.name would be the row label instead
        print(slowest["event_name"])
        print()
        pp.pprint(slowest["stack_traces"] if slowest["stack_traces"] else "")
        print()
        pp.pprint(slowest["module_hierarchy"] if slowest["module_hierarchy"] else "")
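The EventBlocks loop above is a max-by-key search. In plain Python it can be written more compactly with max(..., key=..., default=None). The example below uses hypothetical ToyEvent and ToyPerf dataclasses that mimic only the name and perf_data.p50 fields used above, with made-up timings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyPerf:
    p50: float


@dataclass
class ToyEvent:
    """Hypothetical event shape: a name plus optional timing data."""
    name: str
    perf_data: Optional[ToyPerf]


def slowest_by_p50(events: List[ToyEvent], name: str) -> Optional[ToyEvent]:
    """Return the event called `name` with the largest p50, skipping untimed events."""
    candidates = [e for e in events if e.name == name and e.perf_data is not None]
    return max(candidates, key=lambda e: e.perf_data.p50, default=None)


events = [
    ToyEvent("native_call_convolution.out", ToyPerf(p50=3.0)),
    ToyEvent("native_call_convolution.out", ToyPerf(p50=7.5)),
    ToyEvent("native_call_convolution.out", None),
    ToyEvent("native_call_addmm.out", ToyPerf(p50=9.0)),
]
slowest = slowest_by_p50(events, "native_call_convolution.out")
print(slowest.perf_data.p50)  # 7.5
```

Filtering out events whose perf_data is None before calling max avoids an AttributeError on untimed events. The default=None argument covers the case where no event matches the name.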

If a user wants the total runtime of a module, they can use find_total_for_module.

print(inspector.find_total_for_module("L__self__"))
print(inspector.find_total_for_module("L__self___conv2"))

0.0
0.0

Note: find_total_for_module is a special first-class method of the Inspector.
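Conceptually, a module total aggregates the time of every operator that ran inside that module. The following is a toy model of that idea only, not the Inspector's implementation. The total_for_module helper and the list-of-module-names path representation are hypothetical, and the timings are made up. The module names reuse the strings passed to find_total_for_module above.

```python
from typing import List, Tuple

# Hypothetical per-operator timings tagged with the module path they ran in.
timed_ops: List[Tuple[List[str], float]] = [
    (["L__self__", "L__self___conv1"], 2.0),
    (["L__self__", "L__self___conv2"], 3.5),
    (["L__self__", "L__self___fc1"], 1.0),
]


def total_for_module(ops: List[Tuple[List[str], float]], module: str) -> float:
    """Sum the timings of every op whose module path includes `module`."""
    return sum(t for path, t in ops if module in path)


print(total_for_module(timed_ops, "L__self__"))        # 6.5
print(total_for_module(timed_ops, "L__self___conv2"))  # 3.5
```

Because the top-level module appears in every path, its total covers all operators, while a submodule's total covers only the operators nested under it.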

Conclusion

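In this tutorial, we learned the steps required to consume an ExecuTorch model with the ExecuTorch Developer Tools. It also showed how to use the Inspector APIs to analyze the results of a model run.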


Total running time of the script: (0 minutes 1.499 seconds)
