Quickstart

Running a Builtin Component

The easiest way to get started with TorchX is through the provided CLI.

$ pip install torchx
$ torchx --help

With TorchX you can bring your own (BYO) application, but TorchX also ships with a collection of builtins. For now, let's take a look at the builtins:

$ torchx builtins
Found <n> builtin configs:
  ...
  i. utils.echo
  j. utils.touch
  ...

Echo looks familiar and simple. Let's see how to run utils.echo:

$ torchx run --scheduler local utils.echo --help
usage: torchx run echo [-h] [--msg MSG]

Echos a message

optional arguments:
  -h, --help  show this help message and exit
  --msg MSG   Message to echo

We can see that it takes a --msg argument. Let's try running it locally:

$ torchx run --scheduler local utils.echo --msg "hello world"

Note

echo in this context is just an app spec. It is not the application logic itself, but rather the “job definition” for running /bin/echo. If you haven’t done so already, this is a good time to read through Basic Concepts to familiarize yourself with the terminology.

Defining Your Own Component

Now let's implement echo ourselves. To make things more interesting, we’ll add two more parameters to our version of echo:

  1. Run a configurable number of replicas in parallel

  2. Prefix the message with the replica id

First, create an app spec file. This is just a regular Python file in which we define the app spec.

$ touch ~/test.py

Now copy and paste the following into test.py:

import torchx.specs as specs


def echo(num_replicas: int, msg: str = "hello world") -> specs.AppDef:
    """
    Echos a message to stdout (calls /bin/echo)

    Args:
       num_replicas: number of copies (in parallel) to run
       msg: message to echo

    """
    return specs.AppDef(
        name="echo",
        roles=[
            specs.Role(
                name="echo",
                entrypoint="/bin/echo",
                image="/tmp",
                args=[f"replica #{specs.macros.replica_id}: {msg}"],
                num_replicas=num_replicas,
            )
        ],
    )

Notice that:

  1. Unlike --msg, --num_replicas does not have a default value, indicating that it is a required argument.

  2. We use a local dir (/tmp) as the image. In practice this will be the identifier of the package (e.g. Docker image) that the scheduler supports.

  3. test.py does not contain the logic of the app; it is simply a job definition.
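
As a quick sanity check, you can also call the component function directly and inspect the resulting specs.AppDef, since it is just a plain data structure. A minimal sketch (it assumes test.py is importable, e.g. you run it from your home directory):

# minimal sketch: build the AppDef in-process and inspect it
from test import echo  # assumes ~/test.py is on the import path (e.g. run from ~)

app = echo(num_replicas=2, msg="hi")
print(app.name)                   # "echo"
print(app.roles[0].num_replicas)  # 2
print(app.roles[0].args)          # the replica_id macro is still an unexpanded placeholder here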

Now let's try running our custom echo:

$ torchx run --scheduler local ~/test.py:echo --num_replicas 4 --msg "foobar"

replica #0: foobar
replica #1: foobar
replica #2: foobar
replica #3: foobar

Running on Other Images

So far we’ve run utils.echo with image=/tmp. This means that the entrypoint we specified is relative to /tmp. That did not matter for us since we specified an absolute path as the entrypoint (entrypoint=/bin/echo). Had we specified entrypoint=echo the local scheduler would have tried to invoke /tmp/echo.

If you have a pre-built application binary, setting the image to a local directory is a quick way to validate the application and the specs.AppDef. But it's not all that useful if you want to run the application on a remote scheduler (see Running On Other Schedulers).

Note

The image string in specs.Role is an identifier for a container image supported by the scheduler. Refer to the scheduler documentation to find out what container images are supported by the scheduler you want to use.

For the local scheduler, we can see that it supports both a local directory and a Docker image as the image:

$ torchx runopts local

{ 'image_type': { 'default': 'dir',
                  'help': 'image type. One of [dir, docker]',
                 'type': 'str'},
... <omitted for brevity> ...

Note

Before proceeding, you will need Docker installed. If you have not done so already, follow the install instructions at https://docs.docker.com/get-docker/

Now let's try running echo from a Docker container. Modify echo’s AppDef in the ~/test.py you created in the previous section, setting image="ubuntu:latest":

import torchx.specs as specs


def echo(num_replicas: int, msg: str = "hello world") -> specs.AppDef:
    """
    Echos a message to stdout (calls /bin/echo)

    Args:
       num_replicas: number of copies (in parallel) to run
       msg: message to echo

    """
    return specs.AppDef(
        name="echo",
        roles=[
            specs.Role(
                name="echo",
                entrypoint="/bin/echo",
                image="ubuntu:latest", # IMAGE NOW POINTS TO THE UBUNTU DOCKER IMAGE
                args=[f"replica #{specs.macros.replica_id}: {msg}"],
                num_replicas=num_replicas,
            )
        ],
    )
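
Optionally, pull the image ahead of time so the first launch does not block on the download (depending on your Docker setup, the image may also be pulled automatically on first use):

$ docker pull ubuntu:latest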

Try running the echo app:

$ torchx run --scheduler local \
             --scheduler_args image_type=docker \
             ~/test.py:echo \
             --num_replicas 4 \
             --msg "foobar from docker!"

Running On Other Schedulers

So far we’ve launched components locally. Lets take a look at how to run this on real schedulers.

Note

This section assumes you have already set up a running instance of one of the supported schedulers.

Let's take a look at which schedulers we can launch into, and pick one that you have already set up:

$ torchx schedulers

For most schedulers you will have to specify an additional --scheduler_args parameter. These are launch-time parameters that are associated with the run instance of your application (the job) rather than with its job definition (the app spec); job priority is one example. Scheduler args are scheduler specific, so you'll have to find out which ones are required by the scheduler you plan to use:

$ torchx runopts <sched_name>
$ torchx runopts local

Now that you’ve figured out what scheduler args are required, launch your app

$ torchx run --scheduler <sched_name> --scheduler_args <k1=v1,k2=v2,...> \
    ~/my_app.py <app_args...>
$ torchx run --scheduler local --scheduler_args image_type=dir,log_dir=/tmp \
    ~/my_app.py --foo=bar
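
The CLI is a thin wrapper around the torchx.runner API, so the same launch can be done programmatically. A rough sketch only (the exact Runner.run signature and the available scheduler names vary across TorchX versions, so treat the calls below as assumptions and consult the torchx.runner docs for your version):

# rough sketch, not a verbatim API reference -- verify against your torchx version
from torchx.runner import get_runner

from test import echo  # the component defined earlier in ~/test.py

app = echo(num_replicas=2, msg="hello")
runner = get_runner()
app_handle = runner.run(app, scheduler="local")  # mirrors `torchx run --scheduler local`
print(app_handle)  # a handle you can use to query the status/logs of the submitted job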

Note

If your app args overlap with the run subcommand’s args, you have to use the -- delimiter so that argparse does not get confused. For example, if your app also takes a --scheduler argument, run it as shown below:

$ torchx run --scheduler local ~/my_app.py -- --scheduler foobar

Next Steps

  1. Check out other features of the torchx CLI

  2. Learn how to author more complex app specs by referencing torchx.specs

  3. Browse through the collection of builtin components

  4. Take a look at the list of schedulers supported by the runner

  5. See which ML pipeline platforms you can run components on
