TorchX¶

TorchX is an SDK for quickly building and deploying ML applications from R&D to production. It offers built-in components that encode MLOps best practices and make advanced features such as distributed training and hyperparameter optimization broadly accessible. Because TorchX supports popular ML schedulers and pipeline orchestrators that are already widely deployed in production, users can get started with no added setup cost.

No two production environments are the same. To accommodate varied use cases, TorchX’s core APIs allow extensive customization at well-defined extension points, so even highly unusual applications can be supported without customizing the whole vertical stack.

GETTING STARTED? First learn the basic concepts and follow the quickstart guide.

In 1-2-3¶

01 DEFINE OR CHOOSE Start by writing a component: a Python function that returns an AppDef object for your application. Alternatively, choose one of the built-in components.

02 RUN AS A JOB Once you’ve defined or chosen a component, run it by submitting it as a job to one of the supported schedulers. TorchX supports several popular ones out of the box, such as Kubernetes and SLURM.

03 CONVERT TO PIPELINE In production, components are often run as part of a workflow (also called a pipeline). TorchX components can be converted to pipeline stages by passing them through the torchx.pipelines adapter. The Pipelines page lists the pipeline orchestrators supported out of the box.
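
Steps 01 and 02 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation. To stay runnable without TorchX installed, `Role` and `AppDef` here are simplified stand-in dataclasses; the real classes live in `torchx.specs` and carry more fields. The component `greet`, the image `my_app:latest`, the module `my_app`, and the helper `torchx_run_command` are hypothetical names. The second half only builds the command line for the `torchx run` CLI with the built-in `utils.echo` component and does not execute it.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Sequence


# Simplified stand-ins for illustration only; the real Role and AppDef
# are defined in torchx.specs and have additional fields.
@dataclass
class Role:
    name: str
    image: str
    entrypoint: str
    args: List[str] = field(default_factory=list)


@dataclass
class AppDef:
    name: str
    roles: List[Role] = field(default_factory=list)


# Step 01: a component is a plain function that returns an AppDef.
def greet(user: str, image: str = "my_app:latest") -> AppDef:
    """Hypothetical component describing a single-role application."""
    return AppDef(
        name="hello_world",
        roles=[
            Role(
                name="greeter",
                image=image,
                entrypoint="python",
                args=["-m", "my_app", "--user", user],
            )
        ],
    )


# Step 02: the same component can be submitted to different schedulers;
# only the scheduler name changes. This helper (hypothetical) builds the
# argv for `torchx run` but does not run it.
def torchx_run_command(
    scheduler: str, component: str, component_args: Sequence[str]
) -> List[str]:
    return ["torchx", "run", "--scheduler", scheduler, component, *component_args]


if __name__ == "__main__":
    app = greet("alice")
    print(app.name, app.roles[0].entrypoint, *app.roles[0].args)

    for scheduler in ("local_cwd", "slurm"):
        argv = torchx_run_command(scheduler, "utils.echo", ["--msg", "hello"])
        print(shlex.join(argv))
```

To actually submit a job, the generated argument list could be passed to `subprocess.run(argv, check=True)` on a machine with the `torchx` CLI installed. A real cluster scheduler usually needs additional scheduler-specific configuration that this sketch leaves out.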

Documentation¶

Usage

Examples

Components Library¶

Components

Runtime Library¶

Application (Runtime)

Works With¶

Schedulers

Pipelines

Kubeflow Pipelines

Reference¶

API

Best Practices

Experimental¶

Experimental Features

(beta) .torchxconfig file