Quickstart
This is a self-contained guide to building a simple app and component spec and launching it with two different schedulers.
Installation
First, install the TorchX Python package, which includes both the CLI and the library.
# install torchx with all dependencies
$ pip install torchx[dev]
See the README for more information on installation.
[1]:
%%sh
torchx --help
usage: torchx [-h] [--log_level LOG_LEVEL] [--version]
              {describe,log,run,builtins,runopts,status,configure} ...

torchx CLI

optional arguments:
  -h, --help            show this help message and exit
  --log_level LOG_LEVEL
                        Python logging log level
  --version             show program's version number and exit

sub-commands:
  Use the following commands to run operations, e.g.: torchx run ${JOB_NAME}

  {describe,log,run,builtins,runopts,status,configure}
Hello World
Let's start by writing a simple “Hello World” Python app. This is just a normal Python program and can contain anything you'd like.
Note
This example uses the Jupyter Notebook %%writefile magic to create local files for example purposes. Under normal usage you would have these as standalone files.
[2]:
%%writefile my_app.py
import sys
import argparse


def main(user: str) -> None:
    print(f"Hello, {user}!")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Hello world app"
    )
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    args = parser.parse_args(sys.argv[1:])
    main(args.user)
Writing my_app.py
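Since the app is plain Python, its argument handling can be checked without going through a shell. The following is a minimal sketch that rebuilds the same --user parser and passes it an explicit argument list; parse_user and greeting are hypothetical helper names introduced here for illustration and are not part of my_app.py.

```python
import argparse
from typing import List


def parse_user(argv: List[str]) -> str:
    """Parse the same --user flag that my_app.py defines (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Hello world app")
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    return parser.parse_args(argv).user


def greeting(user: str) -> str:
    """Build the same message that my_app.main prints."""
    return f"Hello, {user}!"


if __name__ == "__main__":
    # Pass the arguments as a list instead of reading sys.argv.
    print(greeting(parse_user(["--user", "your name"])))
```

Passing a list to parse_args is the same mechanism the app uses with sys.argv[1:], so the flag behaves identically in both cases.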
Now that we have an app, we can write the component file for it. A component is a function that returns an AppDef; it allows us to reuse and share our app in a user-friendly way.
We can use this component from the torchx CLI or programmatically as part of a pipeline.
[3]:
%%writefile my_component.py
import torchx.specs as specs


def greet(user: str, image: str = "my_app:latest") -> specs.AppDef:
    return specs.AppDef(
        name="hello_world",
        roles=[
            specs.Role(
                name="greeter",
                image=image,
                entrypoint="python",
                args=[
                    "-m", "my_app",
                    "--user", user,
                ],
            )
        ],
    )
Writing my_component.py
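One way to read the greeter role is as a command line: the entrypoint followed by its args. The sketch below only illustrates how those two fields fit together using the standard library; role_command is a hypothetical helper, not TorchX code, and a real scheduler may assemble the command differently.

```python
import shlex
from typing import List


def role_command(entrypoint: str, args: List[str]) -> str:
    """Join an entrypoint and its args into one shell-quoted command line."""
    return shlex.join([entrypoint, *args])


# Values copied from the greeter role in my_component.py.
user = "your name"
print(role_command("python", ["-m", "my_app", "--user", user]))
```

Keeping args as a list rather than a single string means values containing spaces, such as the user name here, stay one argument.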
We can execute our component via torchx run. The local_cwd scheduler executes the component relative to the current directory.
[4]:
%%sh
torchx run --scheduler local_cwd my_component.py:greet --user "your name"
Hello, your name!
local_cwd://torchx/hello_world_49bdfa71
torchx 2021-10-20 19:04:18 INFO Waiting for the app to finish...
torchx 2021-10-20 19:04:19 INFO Job finished: SUCCEEDED
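The working directory matters because the role runs python -m my_app, and Python resolves my_app from the directory it is started in. The following sketch reproduces that effect with the standard library only, under the assumption that a temporary directory stands in for the project directory; run_in_dir is a hypothetical helper and this is not how TorchX itself launches the app.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Same program as my_app.py above, written out so the sketch is self-contained.
APP_SOURCE = '''\
import sys
import argparse

def main(user: str) -> None:
    print(f"Hello, {user}!")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Hello world app")
    parser.add_argument("--user", type=str, help="the person to greet", required=True)
    args = parser.parse_args(sys.argv[1:])
    main(args.user)
'''


def run_in_dir(workdir: Path, user: str) -> str:
    """Run `python -m my_app --user <user>` with workdir as the current directory."""
    result = subprocess.run(
        [sys.executable, "-m", "my_app", "--user", user],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "my_app.py").write_text(APP_SOURCE)
    output = run_in_dir(workdir, "your name")
    print(output, end="")
```

Running the same command from a directory that does not contain my_app.py would fail with a "No module named my_app" error, which is why the Docker image later adds the file explicitly.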
If we want to run in other environments, we can build a Docker container. This lets us run our component in Docker-enabled environments such as Kubernetes, or via the local Docker scheduler.
Note
This requires Docker to be installed and won't work in environments such as Google Colab. If you have not done so already, follow the install instructions at https://docs.docker.com/get-docker/
[5]:
%%writefile Dockerfile
FROM ghcr.io/pytorch/torchx:0.1.0rc1
ADD my_app.py .
Writing Dockerfile
Once the Dockerfile is created, we can build our Docker image.
[6]:
%%sh
docker build -t my_app:latest -f Dockerfile .
Step 1/2 : FROM ghcr.io/pytorch/torchx:0.1.0rc1
0.1.0rc1: Pulling from pytorch/torchx
4bbfd2c87b75: Pulling fs layer
d2e110be24e1: Pulling fs layer
889a7173dcfe: Pulling fs layer
6009a622672a: Pulling fs layer
143f80195431: Pulling fs layer
eccbe17c44e1: Pulling fs layer
d4c7af0d4fa7: Pulling fs layer
06b5edd6bf52: Pulling fs layer
f18d016c4ccc: Pulling fs layer
c0ad16d9fa05: Pulling fs layer
30587ba7fd6b: Pulling fs layer
909695be1d50: Pulling fs layer
f119a6d0a466: Pulling fs layer
88d87059c913: Pulling fs layer
143f80195431: Waiting
eccbe17c44e1: Waiting
d4c7af0d4fa7: Waiting
06b5edd6bf52: Waiting
f18d016c4ccc: Waiting
c0ad16d9fa05: Waiting
30587ba7fd6b: Waiting
909695be1d50: Waiting
f119a6d0a466: Waiting
88d87059c913: Waiting
6009a622672a: Waiting
d2e110be24e1: Verifying Checksum
d2e110be24e1: Download complete
6009a622672a: Verifying Checksum
6009a622672a: Download complete
4bbfd2c87b75: Verifying Checksum
4bbfd2c87b75: Download complete
889a7173dcfe: Verifying Checksum
889a7173dcfe: Download complete
eccbe17c44e1: Verifying Checksum
eccbe17c44e1: Download complete
06b5edd6bf52: Verifying Checksum
06b5edd6bf52: Download complete
d4c7af0d4fa7: Verifying Checksum
d4c7af0d4fa7: Download complete
c0ad16d9fa05: Verifying Checksum
c0ad16d9fa05: Download complete
30587ba7fd6b: Verifying Checksum
30587ba7fd6b: Download complete
4bbfd2c87b75: Pull complete
909695be1d50: Verifying Checksum
909695be1d50: Download complete
f119a6d0a466: Verifying Checksum
f119a6d0a466: Download complete
88d87059c913: Verifying Checksum
88d87059c913: Download complete
d2e110be24e1: Pull complete
889a7173dcfe: Pull complete
f18d016c4ccc: Verifying Checksum
f18d016c4ccc: Download complete
143f80195431: Verifying Checksum
143f80195431: Download complete
6009a622672a: Pull complete
143f80195431: Pull complete
eccbe17c44e1: Pull complete
d4c7af0d4fa7: Pull complete
06b5edd6bf52: Pull complete
f18d016c4ccc: Pull complete
c0ad16d9fa05: Pull complete
30587ba7fd6b: Pull complete
909695be1d50: Pull complete
f119a6d0a466: Pull complete
88d87059c913: Pull complete
Digest: sha256:a738949601d82e7f100fa1efeb8dde0c35ce44c66726cf38596f96d78dcd7ad3
Status: Downloaded newer image for ghcr.io/pytorch/torchx:0.1.0rc1
---> 3dbec59e8049
Step 2/2 : ADD my_app.py .
---> ceb8e40109ec
Successfully built ceb8e40109ec
Successfully tagged my_app:latest
We can then launch it on the local Docker scheduler.
[7]:
%%sh
torchx run --scheduler local_docker my_component.py:greet --image "my_app:latest" --user "your name"
Client:
Version: 20.10.9+azure-1
API version: 1.41
Go version: go1.16.8
Git commit: c2ea9bc90bacf19bdbe37fd13eec8772432aca99
Built: Thu Sep 23 18:26:34 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.9+azure-1
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: 79ea9d3080181d755855d5924d0f4f116faa9463
Built: Thu Sep 23 18:26:18 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.11+azure
GitCommit: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
Version: 1.0.2
GitCommit: 52b36a2dd837e8462de8e01458bf02cf9eea47dd
docker-init:
Version: 0.19.0
GitCommit:
Hello, your name!
local_docker://torchx/hello_world_fce26e07
Error response from daemon: pull access denied for my_app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
torchx 2021-10-20 19:06:48 WARNING failed to fetch image my_app:latest, falling back to local: Command '['docker', 'pull', 'my_app:latest']' returned non-zero exit status 1.
torchx 2021-10-20 19:06:48 INFO Waiting for the app to finish...
torchx 2021-10-20 19:06:49 INFO Job finished: SUCCEEDED
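Each run prints a handle such as local_docker://torchx/hello_world_fce26e07. Reading it as scheduler, session name, and app id is an interpretation of the output shown in this guide; the sketch below splits a handle along those lines. AppHandle and split_handle are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class AppHandle(NamedTuple):
    scheduler: str
    session: str
    app_id: str


def split_handle(handle: str) -> AppHandle:
    """Split '<scheduler>://<session>/<app_id>' into its three parts."""
    scheduler, sep, rest = handle.partition("://")
    session, slash, app_id = rest.partition("/")
    if not (sep and slash and scheduler and app_id):
        raise ValueError(f"unexpected handle format: {handle!r}")
    return AppHandle(scheduler, session, app_id)


handle = split_handle("local_docker://torchx/hello_world_fce26e07")
print(handle.scheduler, handle.session, handle.app_id)
```

The sketch uses str.partition rather than urllib.parse because scheduler names such as local_docker contain an underscore, which urllib does not accept as a URL scheme character.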
If you have a Kubernetes cluster, you can use the Kubernetes scheduler to launch this on the cluster instead.
$ docker push my_app:latest
$ torchx run --scheduler kubernetes my_component.py:greet --image "my_app:latest" --user "your name"
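The three launches in this guide differ only in the --scheduler value and in whether --image is passed. As a rough sketch, the helper below assembles those command lines from the flags already shown above; torchx_run_argv is a hypothetical helper, and it only builds the argument list without running anything.

```python
import shlex
from typing import List, Optional


def torchx_run_argv(
    scheduler: str, user: str, image: Optional[str] = None
) -> List[str]:
    """Build the `torchx run` argument list used in this guide for one scheduler."""
    argv = ["torchx", "run", "--scheduler", scheduler, "my_component.py:greet"]
    if image is not None:
        # Only the Docker-based launches in this guide pass an image.
        argv += ["--image", image]
    argv += ["--user", user]
    return argv


launches = [
    ("local_cwd", None),
    ("local_docker", "my_app:latest"),
    ("kubernetes", "my_app:latest"),
]
for scheduler, image in launches:
    print(shlex.join(torchx_run_argv(scheduler, "your name", image)))
```

The resulting list can be handed to subprocess.run as-is if you want to drive the CLI from a script.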
Builtins
TorchX also provides a number of builtin components with premade images. You can discover them via:
[8]:
%%sh
torchx builtins
Found 7 builtin components:
1. dist.ddp
2. utils.booth
3. utils.copy
4. utils.echo
5. utils.sh
6. utils.touch
7. serve.torchserve
You can use these from the CLI, from a pipeline, or programmatically, just like any other component.
[9]:
%%sh
torchx run utils.echo --msg "Hello :)"
Client:
Version: 20.10.9+azure-1
API version: 1.41
Go version: go1.16.8
Git commit: c2ea9bc90bacf19bdbe37fd13eec8772432aca99
Built: Thu Sep 23 18:26:34 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.9+azure-1
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: 79ea9d3080181d755855d5924d0f4f116faa9463
Built: Thu Sep 23 18:26:18 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.11+azure
GitCommit: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
Version: 1.0.2
GitCommit: 52b36a2dd837e8462de8e01458bf02cf9eea47dd
docker-init:
Version: 0.19.0
GitCommit:
Hello :)
local_docker://torchx/echo_1d65ce6f
0.1.0rc2: Pulling from pytorch/torchx
4bbfd2c87b75: Already exists
d2e110be24e1: Already exists
889a7173dcfe: Already exists
6009a622672a: Already exists
143f80195431: Already exists
eccbe17c44e1: Already exists
c8e67195524a: Pulling fs layer
b8164a4c6bc7: Pulling fs layer
e6dc31f41742: Pulling fs layer
72e3ffceef80: Pulling fs layer
2f0f90133f1e: Pulling fs layer
2f47096d98b8: Pulling fs layer
fbb305c7c317: Pulling fs layer
f9cd8d47efcc: Pulling fs layer
2f47096d98b8: Waiting
fbb305c7c317: Waiting
f9cd8d47efcc: Waiting
72e3ffceef80: Waiting
2f0f90133f1e: Waiting
b8164a4c6bc7: Verifying Checksum
b8164a4c6bc7: Download complete
72e3ffceef80: Verifying Checksum
72e3ffceef80: Download complete
c8e67195524a: Verifying Checksum
c8e67195524a: Download complete
2f0f90133f1e: Verifying Checksum
2f0f90133f1e: Download complete
fbb305c7c317: Verifying Checksum
fbb305c7c317: Download complete
f9cd8d47efcc: Verifying Checksum
f9cd8d47efcc: Download complete
c8e67195524a: Pull complete
2f47096d98b8: Verifying Checksum
2f47096d98b8: Download complete
b8164a4c6bc7: Pull complete
e6dc31f41742: Verifying Checksum
e6dc31f41742: Download complete
e6dc31f41742: Pull complete
72e3ffceef80: Pull complete
2f0f90133f1e: Pull complete
2f47096d98b8: Pull complete
fbb305c7c317: Pull complete
f9cd8d47efcc: Pull complete
Digest: sha256:f97df7bd3d2137b3dcb6b85e14f4eef9c1c7cdd826d12508cc7cafb08bb2f704
Status: Downloaded newer image for ghcr.io/pytorch/torchx:0.1.0rc2
ghcr.io/pytorch/torchx:0.1.0rc2
torchx 2021-10-20 19:07:54 INFO Waiting for the app to finish...
torchx 2021-10-20 19:07:55 INFO Job finished: SUCCEEDED
Next Steps
Check out other features of the torchx CLI
Learn how to author more complex app specs by referencing specs
Browse through the collection of builtin components
Take a look at the list of schedulers supported by the runner
See which ML pipeline platforms you can run components on
See a training app example