Quickstart¶
Running a Builtin Component¶
The easiest way to get started with TorchX is through the provided CLI.
# install torchx with all dependencies
$ pip install torchx[dev]
$ torchx --help
With TorchX you can bring your own (BYO) app, but TorchX also ships with a collection of builtins. For now, let's take a look at the builtins:
$ torchx builtins
Found <n> builtin configs:
...
i. utils.echo
j. utils.touch
...
Echo looks familiar and simple, so let's understand how to run utils.echo.
$ torchx run --scheduler local_cwd utils.echo --help
usage: torchx run echo [-h] [--msg MSG]
Echos a message
optional arguments:
-h, --help show this help message and exit
--msg MSG Message to echo
We can see that it takes a --msg argument. Let's try running it locally:
$ torchx run --scheduler local_cwd utils.echo --msg "hello world"
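If you want to see what would be submitted to the scheduler without actually running anything, recent TorchX versions also support a --dryrun flag on the run subcommand (check torchx run --help on your version):

$ torchx run --dryrun --scheduler local_cwd utils.echo --msg "hello world"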
Note

echo in this context is just an app spec. It is not the application logic itself, but rather just the “job definition” for running /bin/echo. If you haven’t done so already, this is a good time to read through Basic Concepts to familiarize yourself with the basic concepts.
Defining Your Own Component¶
Now let's try to implement echo ourselves. To make things more interesting, we'll add two more parameters to our version of echo:

1. Number of replicas to run in parallel
2. Prefix the message with the replica id
First we create an app spec file. This is just a regular Python file where we define the app spec.
$ touch ~/test.py
Now copy and paste the following into test.py:
import torchx.specs as specs

def echo(num_replicas: int, msg: str = "hello world") -> specs.AppDef:
    """
    Echos a message to stdout (calls /bin/echo)

    Args:
        num_replicas: number of copies (in parallel) to run
        msg: message to echo
    """
    return specs.AppDef(
        name="echo",
        roles=[
            specs.Role(
                name="echo",
                entrypoint="/bin/echo",
                image="ubuntu:latest",
                args=[f"replica #{specs.macros.replica_id}: {msg}"],
                num_replicas=num_replicas,
            )
        ],
    )
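TorchX builds the component's command line interface from the function signature and docstring, so once the file is saved you can inspect the generated help for your component the same way you did for the builtin (exact help formatting may vary by version):

$ torchx run --scheduler local_cwd ~/test.py:echo --help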
Notice that:

1. Unlike --msg, --num_replicas does not have a default value, indicating that it is a required argument.
2. test.py does not contain the logic of the app and is simply a job definition.
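Since an app spec is plain Python, you can also sanity-check the component directly in an interpreter before involving any scheduler. A minimal sketch (the sys.path manipulation assumes you saved the file as ~/test.py; adjust as needed):

import os
import sys

# make ~/test.py importable; path handling here is an assumption
sys.path.insert(0, os.path.expanduser("~"))
from test import echo

app = echo(num_replicas=2, msg="hi")
print(app.name)                   # -> echo
print(app.roles[0].num_replicas)  # -> 2
# macros such as specs.macros.replica_id are still unresolved here;
# they get substituted by the runner at launch time
print(app.roles[0].args)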
Now let's try running our custom echo:
$ torchx run --scheduler local_cwd ~/test.py:echo --num_replicas 4 --msg "foobar"
replica #0: foobar
replica #1: foobar
replica #2: foobar
replica #3: foobar
Running on Other Images¶
So far we’ve run utils.echo with the local_cwd scheduler. This means that the entrypoint we specified is relative to the current working directory and ignores the specified image. That did not matter for us since we specified an absolute path as the entrypoint (entrypoint=/bin/echo). Had we specified entrypoint=echo, the local_cwd scheduler would have tried to invoke echo relative to the current directory and the specified PATH.
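To make the difference concrete, here is a hypothetical variant of the role (illustrative only; not part of this guide's example):

import torchx.specs as specs

# with local_cwd, a bare entrypoint is resolved against the
# current working directory and PATH, and the image is ignored
role = specs.Role(
    name="echo",
    entrypoint="echo",       # bare name instead of /bin/echo
    image="ubuntu:latest",   # ignored by local_cwd
    args=["hello"],
)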
If you have a pre-built application binary, using local_cwd is a quick way to validate the application and the specs.AppDef. But it's not all that useful if you want to run the application on a remote scheduler (see Running On Other Schedulers).
Note
The image string in specs.Role is an identifier for a container image supported by the scheduler. Refer to the scheduler documentation to find out which container images are supported by the scheduler you want to use.
To match remote image behavior, we can use the local_docker scheduler, which will launch the image via docker and run the same application.
Note
Before proceeding, you will need docker installed. If you have not done so already, follow the install instructions at https://docs.docker.com/get-docker/
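Optionally, pre-pull the image so the first run does not block on the download:

$ docker pull ubuntu:latest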
Now let's try running echo from a docker container. In the ~/test.py you created in the previous section, make sure the image in echo's AppDef points to image="ubuntu:latest":
import torchx.specs as specs

def echo(num_replicas: int, msg: str = "hello world") -> specs.AppDef:
    """
    Echos a message to stdout (calls /bin/echo)

    Args:
        num_replicas: number of copies (in parallel) to run
        msg: message to echo
    """
    return specs.AppDef(
        name="echo",
        roles=[
            specs.Role(
                name="echo",
                entrypoint="/bin/echo",
                image="ubuntu:latest",  # IMAGE NOW POINTS TO THE UBUNTU DOCKER IMAGE
                args=[f"replica #{specs.macros.replica_id}: {msg}"],
                num_replicas=num_replicas,
            )
        ],
    )
Try running the echo app:
$ torchx run --scheduler local_docker \
~/test.py:echo \
--num_replicas 4 \
--msg "foobar from docker!"
Running On Other Schedulers¶
So far we’ve launched components locally. Let's take a look at how to run them on real schedulers.
Note
This section assumes you have already set up a running instance of one of the supported schedulers.
Let's take a look at which schedulers we can launch into, and pick one that you have already set up:
$ torchx schedulers
For most schedulers you will have to specify an additional --scheduler_args parameter. These are launch-time parameters to the scheduler that are associated with the run instance of your application (the job) rather than the job definition (app spec) of your application, for example job priority. Scheduler args are scheduler specific, so you'll have to find out what scheduler args are required by the scheduler you are planning to use:
$ torchx runopts <sched_name>
$ torchx runopts local_docker
Now that you’ve figured out what scheduler args are required, launch your app:
$ torchx run --scheduler <sched_name> --scheduler_args <k1=v1,k2=v2,...> \
utils.sh ~/my_app.py <app_args...>
$ torchx run --scheduler local_cwd --scheduler_args log_dir=/tmp \
utils.sh ~/my_app.py --foo=bar
Note
If your app args overlap with the run subcommand’s args, you have to use the -- delimiter so that argparse does not get confused. For example, if your app also takes a --scheduler argument, run it as shown below.
$ torchx run --scheduler local_docker ~/my_app.py -- --scheduler foobar
Next Steps¶
Check out other features of the torchx CLI
Learn how to author more complex app specs by referencing torchx.specs
Browse through the collection of builtin components
Take a look at the list of schedulers supported by the runner
See which ML pipeline platforms you can run components on