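Understanding CUDA Memory Usage

To debug CUDA memory usage, PyTorch provides a way to generate memory snapshots that record the state of allocated CUDA memory at any point in time, and optionally record the history of allocation events that led up to that snapshot.

The generated snapshots can then be dragged and dropped onto the interactive viewer hosted at pytorch.org/memory_viz, which can be used to explore the snapshot.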

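Generating a Snapshot

The common pattern for recording a snapshot is to enable memory history, run the code to be observed, and then save a file containing a pickled snapshot: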

import torch

# enable memory history, which will
# add tracebacks and event history to snapshots
torch.cuda.memory._record_memory_history()

run_your_code()
torch.cuda.memory._dump_snapshot("my_snapshot.pickle")
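In the pattern above, the snapshot is never written if the observed code raises, which is exactly what happens on an out-of-memory error. The following is a minimal sketch of one way to guard against that, not an official recipe. `memory_snapshot` and `cuda_available` are hypothetical helper names that are not part of PyTorch, and the sketch assumes the private functions behave as described in the API reference on this page. On a machine without PyTorch or without a CUDA device it records nothing.

```python
import contextlib

try:
    import torch
except ImportError:  # lets the sketch be imported on machines without PyTorch
    torch = None


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


@contextlib.contextmanager
def memory_snapshot(path="my_snapshot.pickle"):
    """Hypothetical helper: record memory history for the enclosed block
    and dump a snapshot on exit, even if the block raises."""
    if not cuda_available():
        yield False  # nothing is recorded without CUDA
        return
    torch.cuda.memory._record_memory_history()
    try:
        yield True
    finally:
        # Runs on normal exit and on exceptions such as an OOM.
        torch.cuda.memory._dump_snapshot(path)
        # enabled=None stops recording (see the API reference).
        torch.cuda.memory._record_memory_history(enabled=None)
```

A typical use would be `with memory_snapshot("my_snapshot.pickle"): run_your_code()`, where `run_your_code` is the same placeholder as in the example above.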

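Using the Visualizer

Open pytorch.org/memory_viz and drag and drop the pickled snapshot file into the visualizer. The visualizer is a JavaScript application that runs locally on your computer. It does not upload any snapshot data.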

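Active Memory Timeline

The Active Memory Timeline shows all the live tensors over time in the snapshot on a particular GPU. Pan and zoom over the plot to look at smaller allocations. Mouse over an allocated block to see the stack trace from when that block was allocated, along with details such as its address. When there is a lot of data, the detail slider can be adjusted to render fewer allocations and improve performance.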

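Allocator State History

The Allocator State History shows individual allocator events in a timeline on the left. Select an event in the timeline to see a summary of the allocator state at that event. This summary shows each individual segment returned from cudaMalloc and how it is split up into blocks of individual allocations or free space. Mouse over segments and blocks to see the stack trace from when the memory was allocated. Mouse over events to see the stack trace from when the event occurred, such as when a tensor was freed. Out-of-memory errors are reported as OOM events. Looking at the state of memory during an OOM may provide insight into why an allocation failed even though reserved memory still exists.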
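The same OOM information can be inspected outside the viewer. The sketch below assumes a snapshot dictionary with the `segments` and `device_traces` fields documented in the API reference on this page, and it assumes the outer list of `device_traces` holds one trace per device. The function names and the data are made up for illustration; a real snapshot comes from `torch.cuda.memory._snapshot()`.

```python
def find_oom_events(snapshot):
    """Return (trace_index, requested_bytes, device_free_bytes) per OOM entry."""
    events = []
    for trace_index, trace in enumerate(snapshot.get("device_traces", [])):
        for entry in trace:
            if entry["action"] == "oom":
                events.append((trace_index, entry["size"], entry.get("device_free")))
    return events


def reserved_but_unused(snapshot):
    """Bytes held in cached segments that are not part of an active block."""
    return sum(seg["total_size"] - seg["active_size"] for seg in snapshot["segments"])


# Synthetic data for illustration only.
example = {
    "segments": [
        {"total_size": 2097152, "active_size": 1048576},
        {"total_size": 2097152, "active_size": 1572864},
    ],
    "device_traces": [[
        {"action": "alloc", "addr": 0x7F064C000000, "size": 1048576},
        {"action": "oom", "size": 1572864, "device_free": 524288},
    ]],
}

print(find_oom_events(example))
print(reserved_but_unused(example))
```

In this synthetic case the unused reserved memory adds up to the requested size, yet no single segment has that much free space, which is one way an allocation can fail while reserved memory still exists.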

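The stack trace information also reports the address at which an allocation occurred. The address b7f064c000000_0 refers to the (b)lock at address 7f064c000000, and the "_0" suffix marks this as the 0th time that address was allocated. This unique string can be looked up in the Active Memory Timeline and searched for in the Allocator State History to examine the memory state when a tensor was allocated or freed.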
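To make the label format concrete, here is a small sketch that splits such a string into its parts. The pattern is inferred from the single example in the text (a one-letter kind prefix, a hexadecimal address, an underscore, and a decimal counter), so treat it as an assumption rather than a specification of what the viewer emits. `parse_label` is a hypothetical helper.

```python
import re

# Pattern assumed from the one example in the text: kind letter,
# hexadecimal address, underscore, decimal allocation counter.
_LABEL = re.compile(r"^([a-z])([0-9a-f]+)_(\d+)$")


def parse_label(label):
    """Split a viewer label such as 'b7f064c000000_0' into its parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    kind, address, version = match.groups()
    return kind, int(address, 16), int(version)


kind, address, version = parse_label("b7f064c000000_0")
print(kind, hex(address), version)
```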

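Snapshot API Reference

torch.cuda.memory._record_memory_history(enabled='all', context='all', stacks='all', max_entries=9223372036854775807, device=None)

Enable recording of stack traces associated with memory allocations, so you can tell what allocated any piece of memory in the output of torch.cuda.memory._snapshot().

In addition to keeping stack traces with each current allocation and free, this also enables recording of a history of all alloc/free events.

Use torch.cuda.memory._snapshot() to retrieve this information, and the tools in _memory_viz.py to visualize snapshots.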

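Python trace collection is fast (about 2 µs per trace), so you may consider enabling this on production jobs if you anticipate having to debug memory issues.

C++ trace collection is also fast (~50 ns per frame), which for many typical programs works out to ~2 µs per trace, but this can vary with stack depth.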
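As a rough back-of-the-envelope check of what these figures mean for a job, the sketch below multiplies the per-trace costs quoted above by an event rate. The event rate and stack depth in the example are hypothetical, and adding the Python and C++ costs together is an assumption about how the two combine, not a measured result.

```python
PYTHON_TRACE_SECONDS = 2e-6  # ~2 us per Python trace, as quoted above
CPP_FRAME_SECONDS = 50e-9    # ~50 ns per C++ frame, as quoted above


def tracing_overhead_fraction(events_per_second, cpp_frames_per_trace=0):
    """Estimated fraction of wall time spent collecting traces.

    Assumes one trace per recorded alloc/free event and that Python and
    C++ collection costs simply add up.
    """
    per_trace = PYTHON_TRACE_SECONDS + cpp_frames_per_trace * CPP_FRAME_SECONDS
    return events_per_second * per_trace


# Hypothetical workload: 10,000 allocator events per second, 40 C++ frames deep.
print(f"{tracing_overhead_fraction(10_000, 40):.1%}")
```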

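Parameters

enabled (Literal[None, "state", "all"], optional) – None, disable recording memory history. "state", keep information for currently allocated memory. "all", additionally keep a history of all alloc/free calls. Defaults to "all".
context (Literal[None, "state", "alloc", "all"], optional) – None, do not record any tracebacks. "state", record tracebacks for currently allocated memory. "alloc", additionally keep tracebacks for alloc calls. "all", additionally keep tracebacks for free calls. Defaults to "all".
stacks (Literal["python", "all"], optional) – "python", include Python, TorchScript, and inductor frames in tracebacks. "all", additionally include C++ frames. Defaults to "all".
max_entries (int, optional) – Keep a maximum of max_entries alloc/free events in the recorded history.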

torch.cuda.memory._snapshot(device=None)

Save a snapshot of the CUDA memory state at the time it was called.

The state is represented as a dictionary with the following structure.

class Snapshot(TypedDict):
    segments : List[Segment]
    device_traces: List[List[TraceEntry]]

class Segment(TypedDict):
    # Segments are memory returned from a cudaMalloc call.
    # The size of reserved memory is the sum of all Segments.
    # Segments are cached and reused for future allocations.
    # If the reuse is smaller than the segment, the segment
    # is split into more than one Block.
    # empty_cache() frees Segments that are entirely inactive.
    address: int
    total_size: int #  cudaMalloc'd size of segment
    stream: int
    segment_type: Literal['small', 'large'] # 'large' (>1MB)
    allocated_size: int # size of memory in use
    active_size: int # size of memory in use or in active_awaiting_free state
    blocks : List[Block]

class Block(TypedDict):
    # A piece of memory returned from the allocator, or
    # currently cached but inactive.
    size: int
    requested_size: int # size requested during malloc, may be smaller than
                        # size due to rounding
    address: int
    state: Literal['active_allocated', # used by a tensor
                'active_awaiting_free', # waiting for another stream to finish using
                                        # this, then it will become free
                'inactive',] # free for reuse
    frames: List[Frame] # stack trace from where the allocation occurred

class Frame(TypedDict):
    filename: str
    line: int
    name: str

class TraceEntry(TypedDict):
    # When `torch.cuda.memory._record_memory_history()` is enabled,
    # the snapshot will contain TraceEntry objects that record each
    # action the allocator took.
    action: Literal[
        'alloc',  # memory allocated
        'free_requested',  # the allocator received a call to free memory
        'free_completed',  # the memory that was requested to be freed is now
                           # able to be used in future allocation calls
        'segment_alloc',  # the caching allocator asked cudaMalloc for more memory
                          # and added it as a segment in its cache
        'segment_free',  # the caching allocator called cudaFree to return memory
                         # to CUDA, possibly to free up memory to
                         # allocate more segments or because empty_cache was called
        'oom',  # the allocator threw an OOM exception. 'size' is
                # the requested number of bytes that did not succeed
        'snapshot',  # the allocator generated a memory snapshot,
                     # useful to correlate a previously taken
                     # snapshot with this trace
    ]
    addr: int # not present for OOM
    frames: List[Frame]
    size: int
    stream: int
    device_free: int # only present for OOM, the amount of
                    # memory cuda still reports to be free

Returns: The Snapshot dictionary object
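The structure above can be walked with plain Python. The sketch below totals bytes per block state and estimates how much memory is lost to size rounding, using only fields listed in the structure. `summarize_blocks` is a hypothetical helper and the data is synthetic; on a real run the dictionary would come from `torch.cuda.memory._snapshot()`.

```python
from collections import Counter


def summarize_blocks(snapshot):
    """Return reserved bytes, bytes per block state, and rounding overhead."""
    bytes_by_state = Counter()
    rounding_overhead = 0
    for segment in snapshot["segments"]:
        for block in segment["blocks"]:
            bytes_by_state[block["state"]] += block["size"]
            if block["state"] == "active_allocated":
                # size may exceed requested_size because of rounding
                rounding_overhead += block["size"] - block["requested_size"]
    reserved = sum(segment["total_size"] for segment in snapshot["segments"])
    return reserved, dict(bytes_by_state), rounding_overhead


# Synthetic single-segment snapshot for illustration only.
example = {
    "segments": [{
        "total_size": 2097152,
        "blocks": [
            {"size": 1024, "requested_size": 1000, "state": "active_allocated"},
            {"size": 512, "requested_size": 512, "state": "active_awaiting_free"},
            {"size": 2095616, "requested_size": 0, "state": "inactive"},
        ],
    }],
    "device_traces": [],
}

reserved, by_state, overhead = summarize_blocks(example)
print(reserved, by_state, overhead)
```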

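torch.cuda.memory._dump_snapshot(filename='dump_snapshot.pickle')

Save a pickled version of the torch.cuda.memory._snapshot() dictionary to a file.

This file can be opened by the interactive snapshot viewer at pytorch.org/memory_viz.

Parameters: filename (str, optional) – Name of the file to create. Defaults to "dump_snapshot.pickle".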
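Because the file is a pickle of the snapshot dictionary, it can also be loaded back for offline analysis. The sketch below ranks allocation sites by active bytes. It assumes the file unpickles to the plain dictionary structure documented above and that the first entry of `frames` is the frame of interest; with stacks='all' that entry may be a C++ frame. The helper names are hypothetical, and the example round-trips a synthetic dictionary instead of a real dump.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def load_snapshot(path):
    """Load a snapshot file. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def bytes_by_allocation_site(snapshot):
    """Sum active block sizes keyed by the first frame of each stack trace."""
    totals = defaultdict(int)
    for segment in snapshot["segments"]:
        for block in segment["blocks"]:
            if block["state"] != "active_allocated":
                continue
            frames = block.get("frames") or []
            if frames:
                top = frames[0]
                site = f'{top["filename"]}:{top["line"]} ({top["name"]})'
            else:
                site = "<no traceback>"
            totals[site] += block["size"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Synthetic snapshot standing in for a real dump.
step = {"filename": "train.py", "line": 10, "name": "step"}
forward = {"filename": "model.py", "line": 5, "name": "forward"}
example = {"segments": [{"blocks": [
    {"size": 1024, "state": "active_allocated", "frames": [step]},
    {"size": 2048, "state": "active_allocated", "frames": [step]},
    {"size": 512, "state": "active_allocated", "frames": [forward]},
    {"size": 4096, "state": "inactive", "frames": []},
]}]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic_snapshot.pickle")
    with open(path, "wb") as f:
        pickle.dump(example, f)
    ranked = bytes_by_allocation_site(load_snapshot(path))

for site, size in ranked:
    print(f"{size:>8} bytes  {site}")
```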