Distributed
Components for applications that run as distributed jobs. Many of the
components in this section are purely topological: they define the layout
of the nodes in a distributed setting and take, as arguments, the actual
binaries that each group of nodes (specs.Role) runs.
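To make "topological" concrete, here is a minimal sketch of a component that only describes a layout and receives the binary from the caller. The Role and AppDef dataclasses and the single_role_app function are hypothetical stand-ins written for this illustration; they are not the torchx.specs classes or any torchx component, and the real objects may carry more fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Role:
    # Hypothetical stand-in for a group of identical nodes; not torchx.specs.Role.
    name: str
    image: str
    entrypoint: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)
    num_replicas: int = 1


@dataclass
class AppDef:
    # Hypothetical stand-in for an application definition; not torchx.specs.AppDef.
    name: str
    roles: List[Role] = field(default_factory=list)


def single_role_app(
    image: str,
    entrypoint: str,
    *script_args: str,
    nnodes: int = 1,
    name: str = "test_name",
    role: str = "worker",
    env: Optional[Dict[str, str]] = None,
) -> AppDef:
    # The component only describes the layout: one role, replicated nnodes times.
    # The binary itself (entrypoint plus its arguments) is passed in by the caller.
    worker = Role(
        name=role,
        image=image,
        entrypoint=entrypoint,
        args=list(script_args),
        env=dict(env or {}),
        num_replicas=nnodes,
    )
    return AppDef(name=name, roles=[worker])


app = single_role_app("example/image:latest", "train.py", "--epochs", "2", nnodes=4)
print(app.roles[0].num_replicas)  # 4
```

The sketch keeps the layout (how many replicas of which role) separate from what each replica runs, which is the split the paragraph above describes.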
- torchx.components.dist.ddp(image: str, entrypoint: str, resource: Optional[str] = None, nnodes: int = 1, nproc_per_node: int = 1, base_image: Optional[str] = None, name: str = 'test_name', role: str = 'worker', env: Optional[Dict[str, str]] = None, *script_args: str) → torchx.specs.api.AppDef
Distributed data parallel style application (one role, multi-replica).
- Parameters
image – Container image.
entrypoint – Script or binary to run within the image.
resource – Registered named resource.
nnodes – Number of nodes.
nproc_per_node – Number of processes per node.
name – Name of the application.
base_image – Container base image (optional).
role – Name of the ddp role.
script – Main script.
env – Environment variables.
script_args – Script arguments.
- Returns
Torchx AppDef
- Return type