Logging to Weights & Biases

This deep-dive will guide you through setting up logging to Weights & Biases (W&B) in torchtune.

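What this deep-dive will cover

How to get started with W&B
How to use the WandBLogger
How to log configs, metrics, and model checkpoints to W&B

torchtune supports logging your training runs to Weights & Biases. An example W&B workspace from a torchtune fine-tuning run can be seen in the screenshot below.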

Note

You will need to install the wandb package to use this feature. You can install it via pip:

pip install wandb

Then log in with your API key using the W&B CLI:

wandb login
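For non-interactive jobs, wandb can also read the key from the WANDB_API_KEY environment variable instead of an interactive login. The following is a minimal sketch of a pre-flight check; the helper name `wandb_api_key_is_set` is hypothetical and not part of torchtune or wandb. It only checks that the variable is non-empty. It does not validate the key, and it will not detect credentials already stored by `wandb login`.

```python
import os
from typing import Mapping, Optional


def wandb_api_key_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a non-empty WANDB_API_KEY is present in the environment.

    Hypothetical helper for illustration; pass a mapping to test it
    without touching the real process environment.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


if not wandb_api_key_is_set():
    print("WANDB_API_KEY is not set; run `wandb login` or export the key first.")
```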

Metric Logger

The only change you need to make is to add the metric logger to your config. Weights & Biases will log the metrics and model checkpoints for you.

# enable logging to the built-in WandBLogger
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  # the W&B project to log to
  project: torchtune
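To make the role of the `_component_` key concrete, here is a simplified sketch of how a config entry with a dotted path can be turned into an object: the path is imported, and the remaining keys are passed as keyword arguments. This is an illustration only, not torchtune's actual instantiation code. The function name `instantiate_component` is hypothetical, and a standard-library class stands in for `WandBLogger` so the snippet runs without torchtune or wandb installed.

```python
import importlib
from typing import Any, Mapping


def instantiate_component(cfg: Mapping[str, Any]) -> Any:
    """Build an object from a mapping that has a dotted ``_component_`` path.

    Hypothetical helper: every other key in the mapping is forwarded
    as a keyword argument to the imported class or function.
    """
    kwargs = dict(cfg)
    dotted_path = kwargs.pop("_component_")
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    factory = getattr(module, attr_name)
    return factory(**kwargs)


# Stand-in example: datetime.timedelta plays the role of the logger class.
delta = instantiate_component({"_component_": "datetime.timedelta", "hours": 2})
print(delta)  # 2:00:00
```

Read this way, the `project: torchtune` entry in the config above is simply a keyword argument handed to `WandBLogger`.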

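We automatically grab the config from the recipe you are running and log it to W&B. You can find it in the W&B Overview tab, and the actual file in the Files tab.

Tip: If your job crashes or otherwise exits without cleaning up resources, you may see straggler wandb processes running in the background. To kill these straggler processes, you can use a command like ps -aux | grep wandb | awk '{ print $2 }' | xargs kill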
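If you would rather inspect the matching processes before killing anything, the same filtering can be sketched in Python. The function below parses text in the `ps -aux` column layout and returns the PIDs from the second column. The helper name `pids_matching` and the sample output are hypothetical. Unlike the shell pipeline, this sketch matches only against the command column and skips the `grep` process itself.

```python
from typing import List


def pids_matching(ps_output: str, needle: str) -> List[int]:
    """Return PIDs of `ps -aux` rows whose command contains `needle`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # the 11th field is the full command
        if len(fields) < 11:
            continue  # not a complete process row
        command = fields[10]
        if needle in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


# Hypothetical, hand-written sample in the `ps -aux` layout.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
alice     4101  0.3  1.2 901234 65432 ?        Sl   10:01   0:04 wandb-service(2-4100-s-41234)
alice     4200  0.0  0.0   6432   720 pts/0    S+   10:05   0:00 grep wandb
alice     4300  1.0  2.0 123456 98765 pts/1    R+   10:06   0:10 python train.py
"""

print(pids_matching(SAMPLE, "wandb"))  # [4101]
```

On a real machine the input would come from running `ps -aux`, and each returned PID could then be passed to `os.kill` once you have confirmed it belongs to a stale run.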

Note

Click on this sample project to see the W&B workspace. The config used to train the models can be found here.

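Logging Model Checkpoints to W&B

You can also log the model checkpoints to W&B by modifying the save_checkpoint method of the desired script.

A suggested approach would be something like this: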

# Imports needed at the top of the recipe file for this snippet
from pathlib import Path

import wandb
from torchtune import training

def save_checkpoint(self, epoch: int) -> None:
    ...
    # Save the checkpoint to W&B.
    # The file name depends on the Checkpointer class in use;
    # this example is for the full_finetune case.
    checkpoint_file = Path.joinpath(
        self._checkpointer._output_dir, f"torchtune_model_{epoch}"
    ).with_suffix(".pt")
    wandb_at = wandb.Artifact(
        name=f"torchtune_model_{epoch}",
        type="model",
        # description of the model checkpoint
        description="Model checkpoint",
        # you can add whatever metadata you want as a dict
        metadata={
            training.SEED_KEY: self.seed,
            training.EPOCHS_KEY: self.epochs_run,
            training.TOTAL_EPOCHS_KEY: self.total_epochs,
            training.MAX_STEPS_KEY: self.max_steps_per_epoch,
        }
    )
    wandb_at.add_file(checkpoint_file)
    wandb.log_artifact(wandb_at)
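The same idea can be factored into standalone helpers, which makes the file-naming assumption explicit and fails early if the checkpoint is missing. This is a sketch under stated assumptions, not torchtune code: the names `checkpoint_path` and `log_checkpoint` are hypothetical, the `torchtune_model_{epoch}.pt` pattern is taken from the full_finetune example above and will differ for other Checkpointer classes, and `wandb.log_artifact` requires an active W&B run.

```python
from pathlib import Path
from typing import Any, Dict, Union


def checkpoint_path(output_dir: Union[str, Path], epoch: int) -> Path:
    """Path of the epoch checkpoint, assuming the full_finetune naming pattern."""
    return Path(output_dir) / f"torchtune_model_{epoch}.pt"


def log_checkpoint(
    output_dir: Union[str, Path], epoch: int, metadata: Dict[str, Any]
) -> None:
    """Upload one checkpoint file to W&B as a model artifact."""
    path = checkpoint_path(output_dir, epoch)
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")

    # Imported lazily so the path helper stays usable without wandb installed.
    import wandb

    artifact = wandb.Artifact(
        name=f"torchtune_model_{epoch}",
        type="model",
        description="Model checkpoint",
        metadata=metadata,
    )
    artifact.add_file(str(path))
    wandb.log_artifact(artifact)  # assumes a W&B run is already active
```

Inside a recipe, `save_checkpoint` could then call `log_checkpoint(self._checkpointer._output_dir, epoch, {...})` with the same metadata dictionary shown above.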
