
Quickstart

This is a self-contained guide on how to build a simple app and component spec and launch it via two different schedulers.

Installation

First, install the TorchX Python package, which includes the CLI and the library.

# install torchx with all dependencies
$ pip install "torchx[dev]"

See the README for more information on installation.
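To confirm the install from Python, you can query the package metadata with the standard library (Python 3.8+). This is a minimal sketch: installed_version is a hypothetical helper name, and it assumes the distribution is published as torchx, matching the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "torchx") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "torchx is not installed")
```

A None result means pip installed into a different environment than the one running the interpreter.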

[1]:
%%sh
torchx --help
usage: torchx [-h] [--log_level LOG_LEVEL] [--version]
              {describe,log,run,builtins,runopts,status,configure} ...

torchx CLI

optional arguments:
  -h, --help            show this help message and exit
  --log_level LOG_LEVEL
                        Python logging log level
  --version             show program's version number and exit

sub-commands:
  Use the following commands to run operations, e.g.: torchx run ${JOB_NAME}

  {describe,log,run,builtins,runopts,status,configure}

Hello World

Let's start by writing a simple "Hello World" Python app. This is just a normal Python program and can contain anything you'd like.

Note

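This example uses the Jupyter Notebook %%writefile magic to create local files for demonstration purposes. Under normal usage you would have these as standalone files.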

[2]:
%%writefile my_app.py

import sys
import argparse

def main(user: str) -> None:
    print(f"Hello, {user}!")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Hello world app"
    )
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    args = parser.parse_args(sys.argv[1:])

    main(args.user)
Writing my_app.py
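Because my_app.py is an ordinary Python program, you can check its behaviour without any scheduler. The sketch below is a hypothetical refactor (the build_parser and run_app helpers are not part of the guide) that parses arguments the same way the script does and captures what main prints.

```python
import argparse
import io
from contextlib import redirect_stdout
from typing import List


def main(user: str) -> None:
    print(f"Hello, {user}!")


def build_parser() -> argparse.ArgumentParser:
    # Same flag as my_app.py, factored into a helper so it can be reused in tests.
    parser = argparse.ArgumentParser(description="Hello world app")
    parser.add_argument(
        "--user", type=str, help="the person to greet", required=True
    )
    return parser


def run_app(argv: List[str]) -> str:
    """Parse argv like the script does, call main(), and return what it printed."""
    args = build_parser().parse_args(argv)
    buf = io.StringIO()
    with redirect_stdout(buf):
        main(args.user)
    return buf.getvalue()


if __name__ == "__main__":
    print(run_app(["--user", "your name"]), end="")
```

Omitting --user makes argparse print a usage error and raise SystemExit, just as running the script without the flag would.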

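Now that we have an app, we can write the component file for it. This function allows us to reuse and share our app in a user-friendly way.

We can use this component from the torchx CLI or programmatically as part of a pipeline.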

[3]:
%%writefile my_component.py

import torchx.specs as specs

def greet(user: str, image: str = "my_app:latest") -> specs.AppDef:
    return specs.AppDef(
        name="hello_world",
        roles=[
            specs.Role(
                name="greeter",
                image=image,
                entrypoint="python",
                args=[
                    "-m", "my_app",
                    "--user", user,
                ],
            )
        ],
    )
Writing my_component.py
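When a component is launched with torchx run, the parameters of its function show up as command-line flags (--user here, and --image later in this guide). TorchX derives these from the function signature. The sketch below only illustrates the idea with inspect and argparse: it is not TorchX's implementation, it treats every parameter as a string, and greet returns a plain dict instead of specs.AppDef so it runs without torchx installed. parser_from_signature and call_from_cli are hypothetical names.

```python
import argparse
import inspect
from typing import Any, Callable, Dict, List


def greet(user: str, image: str = "my_app:latest") -> Dict[str, Any]:
    # Stand-in for my_component.py:greet; a dict replaces specs.AppDef.
    return {
        "name": "hello_world",
        "image": image,
        "entrypoint": "python",
        "args": ["-m", "my_app", "--user", user],
    }


def parser_from_signature(fn: Callable[..., Any]) -> argparse.ArgumentParser:
    """Build a parser with one --flag per parameter of fn."""
    parser = argparse.ArgumentParser(prog=fn.__name__)
    for name, param in inspect.signature(fn).parameters.items():
        if param.default is inspect.Parameter.empty:
            # No default in the signature -> the flag is required.
            parser.add_argument(f"--{name}", type=str, required=True)
        else:
            parser.add_argument(f"--{name}", type=str, default=param.default)
    return parser


def call_from_cli(fn: Callable[..., Any], argv: List[str]) -> Any:
    """Parse argv against fn's signature and call fn with the parsed values."""
    args = parser_from_signature(fn).parse_args(argv)
    return fn(**vars(args))
```

For example, call_from_cli(greet, ["--user", "your name"]) keeps the default image my_app:latest, while adding --image overrides it.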

We can execute our component via torchx run. The local_cwd scheduler executes the component relative to the current directory.

[4]:
%%sh
torchx run --scheduler local_cwd my_component.py:greet --user "your name"
Hello, your name!
local_cwd://torchx/hello_world_49bdfa71
torchx 2021-10-20 19:04:18 INFO     Waiting for the app to finish...
torchx 2021-10-20 19:04:19 INFO     Job finished: SUCCEEDED
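Since local_cwd runs the role's entrypoint and args from the current working directory, python -m my_app can find my_app.py there. The sketch below approximates that single step with subprocess, assuming a throwaway directory holding a copy of the app's source. run_role_in_dir and demo are hypothetical helpers, and sys.executable stands in for the "python" entrypoint. The real scheduler also manages app handles, logs and status, which this skips.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List

APP_SOURCE = '''\
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Hello world app")
    parser.add_argument("--user", type=str, required=True)
    args = parser.parse_args()
    print(f"Hello, {args.user}!")
'''


def run_role_in_dir(workdir: Path, entrypoint: str, args: List[str]) -> str:
    """Run entrypoint + args with workdir as the current directory; return stdout."""
    result = subprocess.run(
        [entrypoint, *args],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def demo(user: str) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "my_app.py").write_text(APP_SOURCE)
        # "python -m my_app" resolves my_app from the current directory.
        return run_role_in_dir(
            workdir, sys.executable, ["-m", "my_app", "--user", user]
        )
```

demo("your name") returns the same greeting the torchx run cell printed.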

If we want to run in other environments, we can build a Docker container so that our component can run in Docker-enabled environments such as Kubernetes, or via the local Docker scheduler.

Note

This requires Docker to be installed and won't work in environments such as Google Colab. If you have not done so already, follow the install instructions at: https://docs.docker.com/get-docker/

[5]:
%%writefile Dockerfile

FROM ghcr.io/pytorch/torchx:0.1.0rc1

ADD my_app.py .
Writing Dockerfile

Once we have created our Dockerfile, we can build our Docker image.

[6]:
%%sh
docker build -t my_app:latest -f Dockerfile .

Step 1/2 : FROM ghcr.io/pytorch/torchx:0.1.0rc1
0.1.0rc1: Pulling from pytorch/torchx
4bbfd2c87b75: Pulling fs layer
d2e110be24e1: Pulling fs layer
889a7173dcfe: Pulling fs layer
6009a622672a: Pulling fs layer
143f80195431: Pulling fs layer
eccbe17c44e1: Pulling fs layer
d4c7af0d4fa7: Pulling fs layer
06b5edd6bf52: Pulling fs layer
f18d016c4ccc: Pulling fs layer
c0ad16d9fa05: Pulling fs layer
30587ba7fd6b: Pulling fs layer
909695be1d50: Pulling fs layer
f119a6d0a466: Pulling fs layer
88d87059c913: Pulling fs layer
143f80195431: Waiting
eccbe17c44e1: Waiting
d4c7af0d4fa7: Waiting
06b5edd6bf52: Waiting
f18d016c4ccc: Waiting
c0ad16d9fa05: Waiting
30587ba7fd6b: Waiting
909695be1d50: Waiting
f119a6d0a466: Waiting
88d87059c913: Waiting
6009a622672a: Waiting
d2e110be24e1: Verifying Checksum
d2e110be24e1: Download complete
6009a622672a: Verifying Checksum
6009a622672a: Download complete
4bbfd2c87b75: Verifying Checksum
4bbfd2c87b75: Download complete
889a7173dcfe: Verifying Checksum
889a7173dcfe: Download complete
eccbe17c44e1: Verifying Checksum
eccbe17c44e1: Download complete
06b5edd6bf52: Verifying Checksum
06b5edd6bf52: Download complete
d4c7af0d4fa7: Verifying Checksum
d4c7af0d4fa7: Download complete
c0ad16d9fa05: Verifying Checksum
c0ad16d9fa05: Download complete
30587ba7fd6b: Verifying Checksum
30587ba7fd6b: Download complete
4bbfd2c87b75: Pull complete
909695be1d50: Verifying Checksum
909695be1d50: Download complete
f119a6d0a466: Verifying Checksum
f119a6d0a466: Download complete
88d87059c913: Verifying Checksum
88d87059c913: Download complete
d2e110be24e1: Pull complete
889a7173dcfe: Pull complete
f18d016c4ccc: Verifying Checksum
f18d016c4ccc: Download complete
143f80195431: Verifying Checksum
143f80195431: Download complete
6009a622672a: Pull complete
143f80195431: Pull complete
eccbe17c44e1: Pull complete
d4c7af0d4fa7: Pull complete
06b5edd6bf52: Pull complete
f18d016c4ccc: Pull complete
c0ad16d9fa05: Pull complete
30587ba7fd6b: Pull complete
909695be1d50: Pull complete
f119a6d0a466: Pull complete
88d87059c913: Pull complete
Digest: sha256:a738949601d82e7f100fa1efeb8dde0c35ce44c66726cf38596f96d78dcd7ad3
Status: Downloaded newer image for ghcr.io/pytorch/torchx:0.1.0rc1
 ---> 3dbec59e8049
Step 2/2 : ADD my_app.py .
 ---> ceb8e40109ec
Successfully built ceb8e40109ec
Successfully tagged my_app:latest

Then we can launch it with the local Docker scheduler.

[7]:
%%sh
torchx run --scheduler local_docker my_component.py:greet --image "my_app:latest" --user "your name"
Client:
 Version:           20.10.9+azure-1
 API version:       1.41
 Go version:        go1.16.8
 Git commit:        c2ea9bc90bacf19bdbe37fd13eec8772432aca99
 Built:             Thu Sep 23 18:26:34 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.9+azure-1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.8
  Git commit:       79ea9d3080181d755855d5924d0f4f116faa9463
  Built:            Thu Sep 23 18:26:18 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.11+azure
  GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
 runc:
  Version:          1.0.2
  GitCommit:        52b36a2dd837e8462de8e01458bf02cf9eea47dd
 docker-init:
  Version:          0.19.0
  GitCommit:
Hello, your name!
local_docker://torchx/hello_world_fce26e07
Error response from daemon: pull access denied for my_app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
torchx 2021-10-20 19:06:48 WARNING  failed to fetch image my_app:latest, falling back to local: Command '['docker', 'pull', 'my_app:latest']' returned non-zero exit status 1.
torchx 2021-10-20 19:06:48 INFO     Waiting for the app to finish...
torchx 2021-10-20 19:06:49 INFO     Job finished: SUCCEEDED
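The WARNING line in this output shows that torchx first tried docker pull my_app:latest, which failed because the image exists only locally, and then fell back to the local image. The sketch below models that try-then-fall-back pattern. It is an illustration, not torchx's code: fetch_image is a hypothetical name, and the runner is injectable so the logic can be exercised without Docker.

```python
import logging
import subprocess
from typing import Callable, List

logger = logging.getLogger("image_fetch_sketch")

Runner = Callable[[List[str]], None]


def docker_pull(cmd: List[str]) -> None:
    # Raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)


def fetch_image(image: str, run: Runner = docker_pull) -> str:
    """Try to pull image; on failure, warn and fall back to the local copy."""
    try:
        run(["docker", "pull", image])
        return "pulled"
    except subprocess.CalledProcessError as e:
        logger.warning(
            "failed to fetch image %s, falling back to local: %s", image, e
        )
        return "local"
```

Because str() of a CalledProcessError reads "Command '[...]' returned non-zero exit status 1.", the logged warning has the same shape as the one above.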

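If you have a Kubernetes cluster, you can use the Kubernetes scheduler to launch this on the cluster instead. The cluster must be able to pull the image, so push it to a registry the cluster can access; a bare name such as my_app:latest refers to Docker Hub.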

$ docker push my_app:latest
$ torchx run --scheduler kubernetes my_component.py:greet --image "my_app:latest" --user "your name"
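Whether an image name points at your own registry is decided by its first path component. The sketch below is a simplified version of Docker's rule: the first component counts as a registry host only if it contains "." or ":" or is "localhost", and otherwise the name resolves to Docker Hub. That is why pushing my_app:latest targets Docker Hub, consistent with the "pull access denied" error earlier. split_registry is a hypothetical helper, and it ignores details such as Docker Hub's implicit library/ prefix.

```python
from typing import Tuple

DEFAULT_REGISTRY = "docker.io"


def split_registry(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, remainder) using a simplified rule."""
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    # No explicit registry host: Docker falls back to Docker Hub.
    return DEFAULT_REGISTRY, image
```

For instance, split_registry("ghcr.io/pytorch/torchx:0.1.0rc1") yields ("ghcr.io", "pytorch/torchx:0.1.0rc1").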

Builtins

TorchX also provides a number of builtin components with premade images. You can discover them via:

[8]:
%%sh
torchx builtins
Found 7 builtin components:
  1. dist.ddp
  2. utils.booth
  3. utils.copy
  4. utils.echo
  5. utils.sh
  6. utils.touch
  7. serve.torchserve
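This guide refers to components in two forms: a file path plus function name (my_component.py:greet) and a builtin group plus name (utils.echo). TorchX resolves these itself. The sketch below is only a hypothetical classifier for the two forms as they appear here, not TorchX's resolver, which may accept other forms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComponentRef:
    function: str
    file_path: Optional[str] = None  # set for "path.py:fn" references
    module: Optional[str] = None  # set for builtin "group.fn" references


def parse_component_ref(ref: str) -> ComponentRef:
    """Classify a component reference as file-based or builtin."""
    if ":" in ref:
        # Split on the last ":" so paths that contain ":" stay intact.
        path, _, fn = ref.rpartition(":")
        if not path or not fn:
            raise ValueError(f"malformed file reference: {ref!r}")
        return ComponentRef(function=fn, file_path=path)
    module, dot, fn = ref.rpartition(".")
    if not dot or not module or not fn:
        raise ValueError(f"expected 'group.name' or 'file.py:fn', got {ref!r}")
    return ComponentRef(function=fn, module=module)
```

For example, parse_component_ref("dist.ddp") gives function "ddp" in module "dist".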

You can use these from the CLI, from a pipeline, or programmatically, just like any other component.

[9]:
%%sh
torchx run utils.echo --msg "Hello :)"
Client:
 Version:           20.10.9+azure-1
 API version:       1.41
 Go version:        go1.16.8
 Git commit:        c2ea9bc90bacf19bdbe37fd13eec8772432aca99
 Built:             Thu Sep 23 18:26:34 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.9+azure-1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.8
  Git commit:       79ea9d3080181d755855d5924d0f4f116faa9463
  Built:            Thu Sep 23 18:26:18 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.11+azure
  GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
 runc:
  Version:          1.0.2
  GitCommit:        52b36a2dd837e8462de8e01458bf02cf9eea47dd
 docker-init:
  Version:          0.19.0
  GitCommit:
Hello :)
local_docker://torchx/echo_1d65ce6f
0.1.0rc2: Pulling from pytorch/torchx
4bbfd2c87b75: Already exists
d2e110be24e1: Already exists
889a7173dcfe: Already exists
6009a622672a: Already exists
143f80195431: Already exists
eccbe17c44e1: Already exists
c8e67195524a: Pulling fs layer
b8164a4c6bc7: Pulling fs layer
e6dc31f41742: Pulling fs layer
72e3ffceef80: Pulling fs layer
2f0f90133f1e: Pulling fs layer
2f47096d98b8: Pulling fs layer
fbb305c7c317: Pulling fs layer
f9cd8d47efcc: Pulling fs layer
2f47096d98b8: Waiting
fbb305c7c317: Waiting
f9cd8d47efcc: Waiting
72e3ffceef80: Waiting
2f0f90133f1e: Waiting
b8164a4c6bc7: Verifying Checksum
b8164a4c6bc7: Download complete
72e3ffceef80: Verifying Checksum
72e3ffceef80: Download complete
c8e67195524a: Verifying Checksum
c8e67195524a: Download complete
2f0f90133f1e: Verifying Checksum
2f0f90133f1e: Download complete
fbb305c7c317: Verifying Checksum
fbb305c7c317: Download complete
f9cd8d47efcc: Verifying Checksum
f9cd8d47efcc: Download complete
c8e67195524a: Pull complete
2f47096d98b8: Verifying Checksum
2f47096d98b8: Download complete
b8164a4c6bc7: Pull complete
e6dc31f41742: Verifying Checksum
e6dc31f41742: Download complete
e6dc31f41742: Pull complete
72e3ffceef80: Pull complete
2f0f90133f1e: Pull complete
2f47096d98b8: Pull complete
fbb305c7c317: Pull complete
f9cd8d47efcc: Pull complete
Digest: sha256:f97df7bd3d2137b3dcb6b85e14f4eef9c1c7cdd826d12508cc7cafb08bb2f704
Status: Downloaded newer image for ghcr.io/pytorch/torchx:0.1.0rc2
ghcr.io/pytorch/torchx:0.1.0rc2
torchx 2021-10-20 19:07:54 INFO     Waiting for the app to finish...
torchx 2021-10-20 19:07:55 INFO     Job finished: SUCCEEDED

Next Steps

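  1. Check out other features of the torchx CLI

  2. Learn how to author more complex app specs by referencing specs

  3. Browse through the collection of builtin components

  4. See the list of schedulers supported by the runner

  5. See which ML pipeline platforms you can run components on

  6. See a training app example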
