Slurm

This contains the TorchX Slurm scheduler which can be used to run TorchX components on a Slurm cluster.

class torchx.schedulers.slurm_scheduler.SlurmScheduler(session_name: str)

SlurmScheduler is a TorchX scheduling interface to slurm. TorchX expects that slurm CLI tools are locally installed and job accounting is enabled.

Each app def is scheduled as a heterogeneous job via sbatch. Each replica of each role gets its own generated shell script containing its resource allocations and args, and sbatch then launches all of them together.
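As a rough illustration of the per-replica script generation described above, the sketch below expands a list of roles into one script per replica. The `Role` dataclass, the `replica_scripts` helper, and the script layout are hypothetical and do not reproduce TorchX's actual output; the only Slurm-specific pieces assumed are the standard `--job-name` and `--cpus-per-task` sbatch options.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Role:
    """Hypothetical stand-in for a TorchX role (not the real torchx.specs.Role)."""
    name: str
    entrypoint: str
    args: List[str]
    num_replicas: int
    cpus: int


def replica_scripts(roles: List[Role]) -> Dict[str, str]:
    """Return {script_name: script_text}, one entry per replica of each role."""
    scripts: Dict[str, str] = {}
    for role in roles:
        for idx in range(role.num_replicas):
            name = f"{role.name}-{idx}"
            lines = [
                "#!/bin/bash",
                f"#SBATCH --job-name={name}",
                f"#SBATCH --cpus-per-task={role.cpus}",
                # Quote each token so arguments with spaces survive the shell.
                shlex.join([role.entrypoint, *role.args]),
            ]
            scripts[f"{name}.sh"] = "\n".join(lines) + "\n"
    return scripts


roles = [
    Role("trainer", "python", ["train.py", "--epochs", "2"], num_replicas=2, cpus=4),
    Role("reader", "python", ["read.py"], num_replicas=1, cpus=1),
]
scripts = replica_scripts(roles)
print(sorted(scripts))
```

Each replica gets a distinct script name and job name, which is what allows per-replica resource settings within a single submission.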

Logs are written to the default slurm log file, slurm-<jobid>.out.

Any scheduler options passed to it are added as SBATCH arguments to each replica. See https://slurm.schedmd.com/sbatch.html#SECTION_OPTIONS for info on the arguments.
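To make the option pass-through concrete, here is a minimal sketch of how a dictionary of scheduler options could be rendered as long-form sbatch flags. The helper name `to_sbatch_args` and the choice to skip unset values are assumptions for illustration, not TorchX's actual code.

```python
from typing import Dict, List, Optional


def to_sbatch_args(opts: Dict[str, Optional[str]]) -> List[str]:
    """Render options as long-form sbatch flags, skipping unset values."""
    return [f"--{key}={value}" for key, value in opts.items() if value is not None]


# Example option names; all three are standard sbatch long options.
opts = {"partition": "compute", "time": "00:10:00", "constraint": None}
print(to_sbatch_args(opts))
```

Because the keys are forwarded as-is, a misspelled option name would only be caught by sbatch itself at submission time.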

For more info, see the sbatch documentation linked above. Example usage:

$ torchx run --scheduler slurm utils.echo --msg hello
slurm://torchx_user/1234
$ torchx status slurm://torchx_user/1234
$ less slurm-1234.out
...
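The session above suggests that an app handle has the shape scheduler://session/app_id and that the Slurm job ID doubles as the app ID. The sketch below splits a handle and derives the default log file name; `parse_handle` and `default_log_file` are hypothetical helpers written for this example, not TorchX functions.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_handle(handle: str) -> Tuple[str, str, str]:
    """Split 'scheduler://session/app_id' into its three parts."""
    parsed = urlparse(handle)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


def default_log_file(app_id: str) -> str:
    """Slurm's default batch output file name is slurm-<jobid>.out."""
    return f"slurm-{app_id}.out"


scheduler, session, app_id = parse_handle("slurm://torchx_user/1234")
print(scheduler, session, app_id, default_log_file(app_id))
```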

| Feature | Scheduler Support |
| --- | --- |
| Fetch Logs | Logs are accessible via the default slurm log file but not the programmatic API. |
| Distributed Jobs | ✔️ |
| Cancel Job | ✔️ |
| Describe Job | Partial support. SlurmScheduler will return job and replica status but does not provide the complete original AppSpec. |

describe(app_id: str) → Optional[torchx.schedulers.api.DescribeAppResponse]

Describes the specified application.

Returns

AppDef description or None if the app does not exist.
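Since describe may return None, callers need to handle the missing-app case explicitly. The sketch below shows one way to do that against a stand-in scheduler; `FakeScheduler` and `FakeDescribeResponse` are hypothetical classes that only mimic the documented contract and carry far fewer fields than the real response type.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeDescribeResponse:
    """Hypothetical stand-in for DescribeAppResponse with only two fields."""
    app_id: str
    state: str


class FakeScheduler:
    """Hypothetical scheduler backed by an in-memory job table."""

    def __init__(self, jobs: Dict[str, str]) -> None:
        self._jobs = jobs

    def describe(self, app_id: str) -> Optional[FakeDescribeResponse]:
        state = self._jobs.get(app_id)
        if state is None:
            return None  # unknown app: mirror the documented contract
        return FakeDescribeResponse(app_id=app_id, state=state)


def status_line(scheduler: FakeScheduler, app_id: str) -> str:
    """Format a status message, treating None as 'app not found'."""
    resp = scheduler.describe(app_id)
    if resp is None:
        return f"{app_id}: not found"
    return f"{resp.app_id}: {resp.state}"


sched = FakeScheduler({"1234": "RUNNING"})
print(status_line(sched, "1234"))
print(status_line(sched, "9999"))
```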

run_opts() → torchx.specs.api.runopts

Returns the run configuration options expected by the scheduler, analogous to a --help listing for the run API.

schedule(dryrun_info: torchx.specs.api.AppDryRunInfo[torchx.schedulers.slurm_scheduler.SlurmBatchRequest]) → str

Same as submit except that it takes an AppDryRunInfo. Implementors are encouraged to implement this method rather than directly implementing submit since submit can be trivially implemented by:

dryrun_info = self.submit_dryrun(app, cfg)
return self.schedule(dryrun_info)
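The split between building a request and scheduling it can be sketched with a toy scheduler. `ToyScheduler` and `ToyDryRunInfo` are hypothetical and record requests in memory instead of calling sbatch; the point is only that submit reduces to submit_dryrun followed by schedule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyDryRunInfo:
    """Hypothetical stand-in for AppDryRunInfo: holds the materialized request."""
    request: List[str]


class ToyScheduler:
    """Hypothetical scheduler; it records requests instead of calling sbatch."""

    def __init__(self) -> None:
        self.submitted: List[List[str]] = []

    def submit_dryrun(self, app: str, cfg: Dict[str, str]) -> ToyDryRunInfo:
        # Build the full request without side effects so it can be inspected.
        flags = [f"--{k}={v}" for k, v in cfg.items()]
        return ToyDryRunInfo(request=["sbatch", *flags, app])

    def schedule(self, dryrun_info: ToyDryRunInfo) -> str:
        # The only step with side effects: hand the request to the backend.
        self.submitted.append(dryrun_info.request)
        return str(len(self.submitted))  # toy app id

    def submit(self, app: str, cfg: Dict[str, str]) -> str:
        dryrun_info = self.submit_dryrun(app, cfg)
        return self.schedule(dryrun_info)


sched = ToyScheduler()
app_id = sched.submit("job.sh", {"partition": "compute"})
print(app_id, sched.submitted)
```

Keeping side effects in schedule means a dry run can print or validate the exact request that would be submitted without touching the cluster.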
