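Quickstart
This is a self-contained guide on how to build a simple app and component spec and launch it via two different schedulers.
Installation
The first thing we need to do is install the TorchX Python package, which includes the CLI and the library.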
# install torchx with all dependencies
$ pip install torchx[dev]
For more information on installation, see the README.
[1]:
%%sh
torchx --help
usage: torchx [-h] [--log_level LOG_LEVEL] [--version]
{describe,log,run,builtins,runopts,status,configure} ...
torchx CLI
optional arguments:
-h, --help show this help message and exit
--log_level LOG_LEVEL
Python logging log level
--version show program's version number and exit
sub-commands:
Use the following commands to run operations, e.g.: torchx run ${JOB_NAME}
{describe,log,run,builtins,runopts,status,configure}
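Hello World
Let's start by writing a simple "Hello World" Python app. This is just a normal Python program and can contain anything you'd like.
Note
This example uses the Jupyter Notebook %%writefile magic to create local files for example purposes. Under normal usage you would keep these as standalone files.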
[2]:
%%writefile my_app.py
import sys
import argparse


def main(user: str) -> None:
    print(f"Hello, {user}!")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Hello world app"
    )
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    args = parser.parse_args(sys.argv[1:])
    main(args.user)
Writing my_app.py
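To see how the --user flag reaches main, here is a minimal sketch that builds the same parser but passes the argument list explicitly instead of reading sys.argv. The helper names build_parser and greeting are hypothetical and exist only for this illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same parser definition as my_app.py, wrapped in a hypothetical helper.
    parser = argparse.ArgumentParser(description="Hello world app")
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    return parser


def greeting(user: str) -> str:
    # Returns the text that my_app.main prints.
    return f"Hello, {user}!"


# An explicit list plays the role of sys.argv[1:] in my_app.py.
args = build_parser().parse_args(["--user", "your name"])
print(greeting(args.user))
```

Because --user is declared with required=True, calling parse_args with an empty list makes argparse print a usage message and exit instead of reaching main.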
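Now that we have an app, we can write the component file for it. This function allows us to reuse and share our app in a user-friendly way.
We can use this component from the torchx CLI or programmatically as part of a pipeline.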
[3]:
%%writefile my_component.py
import torchx.specs as specs


def greet(user: str, image: str = "my_app:latest") -> specs.AppDef:
    return specs.AppDef(
        name="hello_world",
        roles=[
            specs.Role(
                name="greeter",
                image=image,
                entrypoint="python",
                args=[
                    "-m", "my_app",
                    "--user", user,
                ],
            )
        ],
    )
Writing my_component.py
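The role's entrypoint and args together describe the command that gets launched. The sketch below illustrates that composition with a plain dataclass stand-in so it runs without TorchX installed. RoleSketch is hypothetical and is not the torchx.specs.Role class, and the way the two fields are joined is an assumption based on the component above.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleSketch:
    # Hypothetical stand-in mirroring only the fields used in my_component.py.
    name: str
    image: str
    entrypoint: str
    args: List[str] = field(default_factory=list)

    def command(self) -> List[str]:
        # Assumption: the process is started as the entrypoint followed by args.
        return [self.entrypoint, *self.args]


role = RoleSketch(
    name="greeter",
    image="my_app:latest",
    entrypoint="python",
    args=["-m", "my_app", "--user", "your name"],
)

# shlex.join quotes arguments that contain spaces, so the line is copy-pasteable.
print(shlex.join(role.command()))
```

Keeping the arguments as a list rather than a single string avoids quoting problems when a value such as the user name contains spaces.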
We can execute our component via torchx run. The local_cwd scheduler executes the component relative to the current directory.
[4]:
%%sh
torchx run --scheduler local_cwd my_component.py:greet --user "your name"
Hello, your name!
local_cwd://torchx/hello_world_f5505d67
torchx 2021-10-21 18:01:59 INFO Waiting for the app to finish...
torchx 2021-10-21 18:02:00 INFO Job finished: SUCCEEDED
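The second line of the output, local_cwd://torchx/hello_world_f5505d67, identifies the launched job. Judging only from the outputs on this page, it appears to follow the shape scheduler://session/app_id. The sketch below splits such a string using the standard library. The names split_handle and HandleParts are hypothetical helpers and not part of TorchX, and the three-part layout is an assumption drawn from the sample output.

```python
from typing import NamedTuple


class HandleParts(NamedTuple):
    scheduler: str
    session: str
    app_id: str


def split_handle(handle: str) -> HandleParts:
    # Assumed layout: <scheduler>://<session>/<app_id>
    scheduler, sep, rest = handle.partition("://")
    session, slash, app_id = rest.partition("/")
    if not (sep and slash and scheduler and session and app_id):
        raise ValueError(f"unexpected handle format: {handle!r}")
    return HandleParts(scheduler, session, app_id)


parts = split_handle("local_cwd://torchx/hello_world_f5505d67")
print(parts.scheduler, parts.session, parts.app_id)
```

Plain string partitioning is used here instead of urllib.parse.urlparse because the standard URL parser does not treat names containing an underscore, such as local_cwd, as a valid scheme.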
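If we want to run in other environments, we can build a Docker container so we can run our component in Docker-enabled environments such as Kubernetes, or via the local Docker scheduler.
Note
This requires Docker to be installed and will not work in environments such as Google Colab. If you have not installed it yet, follow the instructions at https://docs.docker.com/get-docker/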
[5]:
%%writefile Dockerfile
FROM ghcr.io/pytorch/torchx:0.1.0rc1
ADD my_app.py .
Writing Dockerfile
Once we have the Dockerfile created, we can build our Docker image.
[6]:
%%sh
docker build -t my_app:latest -f Dockerfile .
Step 1/2 : FROM ghcr.io/pytorch/torchx:0.1.0rc1
0.1.0rc1: Pulling from pytorch/torchx
4bbfd2c87b75: Pulling fs layer
d2e110be24e1: Pulling fs layer
889a7173dcfe: Pulling fs layer
6009a622672a: Pulling fs layer
143f80195431: Pulling fs layer
eccbe17c44e1: Pulling fs layer
d4c7af0d4fa7: Pulling fs layer
06b5edd6bf52: Pulling fs layer
f18d016c4ccc: Pulling fs layer
c0ad16d9fa05: Pulling fs layer
30587ba7fd6b: Pulling fs layer
909695be1d50: Pulling fs layer
f119a6d0a466: Pulling fs layer
88d87059c913: Pulling fs layer
6009a622672a: Waiting
143f80195431: Waiting
eccbe17c44e1: Waiting
d4c7af0d4fa7: Waiting
06b5edd6bf52: Waiting
f18d016c4ccc: Waiting
c0ad16d9fa05: Waiting
30587ba7fd6b: Waiting
909695be1d50: Waiting
f119a6d0a466: Waiting
88d87059c913: Waiting
889a7173dcfe: Verifying Checksum
889a7173dcfe: Download complete
d2e110be24e1: Verifying Checksum
d2e110be24e1: Download complete
4bbfd2c87b75: Verifying Checksum
4bbfd2c87b75: Download complete
6009a622672a: Verifying Checksum
6009a622672a: Download complete
eccbe17c44e1: Verifying Checksum
eccbe17c44e1: Download complete
06b5edd6bf52: Verifying Checksum
06b5edd6bf52: Download complete
d4c7af0d4fa7: Verifying Checksum
d4c7af0d4fa7: Download complete
c0ad16d9fa05: Verifying Checksum
c0ad16d9fa05: Download complete
30587ba7fd6b: Verifying Checksum
30587ba7fd6b: Download complete
909695be1d50: Verifying Checksum
909695be1d50: Download complete
f119a6d0a466: Verifying Checksum
f119a6d0a466: Download complete
88d87059c913: Verifying Checksum
88d87059c913: Download complete
4bbfd2c87b75: Pull complete
f18d016c4ccc: Verifying Checksum
f18d016c4ccc: Download complete
143f80195431: Verifying Checksum
143f80195431: Download complete
d2e110be24e1: Pull complete
889a7173dcfe: Pull complete
6009a622672a: Pull complete
143f80195431: Pull complete
eccbe17c44e1: Pull complete
d4c7af0d4fa7: Pull complete
06b5edd6bf52: Pull complete
f18d016c4ccc: Pull complete
c0ad16d9fa05: Pull complete
30587ba7fd6b: Pull complete
909695be1d50: Pull complete
f119a6d0a466: Pull complete
88d87059c913: Pull complete
Digest: sha256:a738949601d82e7f100fa1efeb8dde0c35ce44c66726cf38596f96d78dcd7ad3
Status: Downloaded newer image for ghcr.io/pytorch/torchx:0.1.0rc1
---> 3dbec59e8049
Step 2/2 : ADD my_app.py .
---> f77f7d50f1b6
Successfully built f77f7d50f1b6
Successfully tagged my_app:latest
We can then launch it on the local Docker scheduler.
[7]:
%%sh
torchx run --scheduler local_docker my_component.py:greet --image "my_app:latest" --user "your name"
Hello, your name!
local_docker://torchx/hello_world_ba61a83b
Error response from daemon: pull access denied for my_app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
torchx 2021-10-21 18:04:15 WARNING failed to fetch image my_app:latest, falling back to local: Command '['docker', 'pull', 'my_app:latest']' returned non-zero exit status 1.
torchx 2021-10-21 18:04:15 INFO Waiting for the app to finish...
torchx 2021-10-21 18:04:16 INFO Job finished: SUCCEEDED
If you have a Kubernetes cluster, you can use the Kubernetes scheduler to launch this on the cluster instead.
$ docker push my_app:latest
$ torchx run --scheduler kubernetes my_component.py:greet --image "my_app:latest" --user "your name"
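The local_docker and kubernetes invocations differ only in the value passed to --scheduler. The sketch below builds both command lines from Python to make that explicit. torchx_run_argv is a hypothetical helper, not a TorchX API, and it only assembles the same arguments shown in the commands above.

```python
import shlex
from typing import Dict, List


def torchx_run_argv(
    scheduler: str, component: str, component_args: Dict[str, str]
) -> List[str]:
    # Mirrors: torchx run --scheduler <scheduler> <component> --<name> <value> ...
    argv = ["torchx", "run", "--scheduler", scheduler, component]
    for name, value in component_args.items():
        argv += [f"--{name}", value]
    return argv


component_args = {"image": "my_app:latest", "user": "your name"}
commands = [
    torchx_run_argv(scheduler, "my_component.py:greet", component_args)
    for scheduler in ("local_docker", "kubernetes")
]
for argv in commands:
    print(shlex.join(argv))
```

Each list can be handed to subprocess.run(argv, check=True) on a machine where the torchx CLI and the chosen scheduler are available.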
Builtins
TorchX also provides a number of builtin components with premade images. You can discover them via:
[8]:
%%sh
torchx builtins
Found 7 builtin components:
1. dist.ddp
2. utils.booth
3. utils.copy
4. utils.echo
5. utils.sh
6. utils.touch
7. serve.torchserve
You can use these from the CLI, from a pipeline, or programmatically, just as you would any other component.
[9]:
%%sh
torchx run utils.echo --msg "Hello :)"
Hello :)
local_docker://torchx/echo_d795104b
0.1.0: Pulling from pytorch/torchx
4bbfd2c87b75: Already exists
d2e110be24e1: Already exists
889a7173dcfe: Already exists
6009a622672a: Already exists
143f80195431: Already exists
eccbe17c44e1: Already exists
092b2fdc0e35: Pulling fs layer
8ce7d695178d: Pulling fs layer
08c3ec180556: Pulling fs layer
12128a687923: Pulling fs layer
802a2fbcbff3: Pulling fs layer
5888090352af: Pulling fs layer
12128a687923: Waiting
802a2fbcbff3: Waiting
5888090352af: Waiting
8ce7d695178d: Verifying Checksum
8ce7d695178d: Download complete
08c3ec180556: Verifying Checksum
08c3ec180556: Download complete
802a2fbcbff3: Verifying Checksum
802a2fbcbff3: Download complete
092b2fdc0e35: Verifying Checksum
092b2fdc0e35: Download complete
5888090352af: Verifying Checksum
5888090352af: Download complete
092b2fdc0e35: Pull complete
8ce7d695178d: Pull complete
08c3ec180556: Pull complete
12128a687923: Verifying Checksum
12128a687923: Download complete
12128a687923: Pull complete
802a2fbcbff3: Pull complete
5888090352af: Pull complete
Digest: sha256:de22d3014f9a02f5e9c2e71e9a55cf5426d18ce72f91d1d5405b1a75838555d2
Status: Downloaded newer image for ghcr.io/pytorch/torchx:0.1.0
ghcr.io/pytorch/torchx:0.1.0
torchx 2021-10-21 18:05:09 INFO Waiting for the app to finish...
torchx 2021-10-21 18:05:10 INFO Job finished: SUCCEEDED