HPO
Overview & Usage
The torchx.runtime.hpo module contains modules and functions that you can use to build a hyperparameter optimization (HPO) application. Note that an HPO application is the entity that orchestrates the HPO search, and should not be confused with the application that each HPO "trial" runs. Typically, a "trial" in an HPO job is a trainer application that trains an ML model using the hyperparameters specified by the HPO job.
For grid search, the HPO job can be as simple as a parallel loop that exhaustively runs every combination of parameters in the user-defined search space. Bayesian optimization, on the other hand, needs to keep optimizer state between trials, which leads to a more complex HPO application implementation.
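To make the grid-search case concrete, the exhaustive loop over a user-defined search space can be sketched in plain Python (a minimal illustration, not TorchX API; the `objective` function here is a stand-in for launching and evaluating a trial):

```python
import itertools

def objective(x1: float, x2: float) -> float:
    # stand-in for running a trial; a simple quadratic bowl with minimum at (1, -2)
    return (x1 - 1.0) ** 2 + (x2 + 2.0) ** 2

# user-defined search space: every combination is tried exactly once
search_space = {
    "x1": [-2.0, 0.0, 1.0, 2.0],
    "x2": [-3.0, -2.0, 0.0, 3.0],
}

names = list(search_space)
best_params, best_val = None, float("inf")
for combo in itertools.product(*search_space.values()):
    params = dict(zip(names, combo))
    val = objective(**params)
    if val < best_val:
        best_params, best_val = params, val

print(best_params, best_val)  # -> {'x1': 1.0, 'x2': -2.0} 0.0
```

Because no state is carried between iterations, each trial in a grid search can be launched independently (or in parallel), which is exactly what a Bayesian optimizer cannot do.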
Currently this module uses Ax as the underlying brains of the HPO and provides a few extension points to integrate Ax with the TorchX runner.
Quick Start Example
The following example demonstrates running an HPO (hyperparameter optimization) job on a TorchX component. We use the builtin utils.booth component, which simply runs an application that evaluates the booth function at (x1, x2). The objective is to find the x1 and x2 that minimize the booth function.
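For reference, the booth test function (see the Wikipedia page linked in the code below) has its global minimum of 0 at (x1, x2) = (1, 3); a direct definition:

```python
def booth(x1: float, x2: float) -> float:
    # Booth test function: global minimum of 0 at (1, 3)
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2

print(booth(1.0, 3.0))  # -> 0.0
print(booth(0.0, 0.0))  # -> 74.0
```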
import os
import tempfile

from ax.core import (
    BatchTrial,
    Experiment,
    Objective,
    OptimizationConfig,
    Parameter,
    ParameterType,
    RangeParameter,
    SearchSpace,
)
from ax.modelbridge.dispatch_utils import choose_generation_strategy
from ax.service.scheduler import SchedulerOptions
from ax.service.utils.best_point import get_best_parameters
from ax.service.utils.report_utils import exp_to_df
from ax.utils.common.constants import Keys
from pyre_extensions import none_throws

from torchx.components import utils
from torchx.runtime.hpo.ax import AppMetric, TorchXRunner, TorchXScheduler
from torchx.specs import RunConfig

# base directory for tracker results and local scheduler logs
tmpdir = tempfile.mkdtemp()

# Run HPO on the booth function
# (https://en.wikipedia.org/wiki/Test_functions_for_optimization)
parameters = [
    RangeParameter(
        name="x1",
        lower=-10.0,
        upper=10.0,
        parameter_type=ParameterType.FLOAT,
    ),
    RangeParameter(
        name="x2",
        lower=-10.0,
        upper=10.0,
        parameter_type=ParameterType.FLOAT,
    ),
]

objective = Objective(metric=AppMetric(name="booth_eval"), minimize=True)

runner = TorchXRunner(
    tracker_base=tmpdir,
    component=utils.booth,
    component_const_params={
        "image": "ghcr.io/pytorch/torchx:0.1.0rc0",
    },
    scheduler="local",  # can also be [kubernetes, slurm, etc]
    scheduler_args=RunConfig({"log_dir": tmpdir, "image_type": "docker"}),
)

experiment = Experiment(
    name="torchx_booth_sequential_demo",
    search_space=SearchSpace(parameters=parameters),
    optimization_config=OptimizationConfig(objective=objective),
    runner=runner,
    is_test=True,
    properties={Keys.IMMUTABLE_SEARCH_SPACE_AND_OPT_CONF: True},
)

scheduler = TorchXScheduler(
    experiment=experiment,
    generation_strategy=(
        choose_generation_strategy(
            search_space=experiment.search_space,
        )
    ),
    options=SchedulerOptions(),
)

for i in range(3):
    scheduler.run_n_trials(max_trials=2)

print(exp_to_df(experiment))
Ax (Adaptive Experimentation)
- class torchx.runtime.hpo.ax.TorchXRunner(tracker_base: str, component: Callable[[...], torchx.specs.api.AppDef], component_const_params: Optional[Dict[str, Any]] = None, scheduler: str = 'local', scheduler_args: Optional[torchx.specs.api.RunConfig] = None)
An implementation of ax.core.runner.Runner that delegates job submission to the TorchX Runner. This runner is coupled with the TorchX component since Ax runners run trials of a single component with different parameters.
It is expected that the experiment parameter names and types match the component function's parameters exactly. Component function parameters that are not part of the search space can be passed as component_const_params.
The following parameters are passed automatically, as long as the component function declares them in its function signature:
trial_idx (int): the index of the current trial
tracker_base (str): the base of the torchx tracker (typically a URL indicating the base dir of the tracker)
Example:
def trainer_component(
    x1: int,
    x2: float,
    trial_idx: int,
    tracker_base: str,
    x3: float,
    x4: str,
) -> spec.AppDef:
    # ... implementation omitted for brevity ...
    pass
The experiment should be set up as:
parameters=[
    {
        "name": "x1",
        "value_type": "int",
        # ... other options...
    },
    {
        "name": "x2",
        "value_type": "float",
        # ... other options...
    },
]
The rest of the parameters can be set as:
TorchXRunner(
    tracker_base="s3://foo/bar",
    component=trainer_component,
    # trial_idx and tracker_base args passed automatically
    # if the function signature declares those args
    component_const_params={"x3": 1.2, "x4": "barbaz"},
)
Running the experiment as set up above effectively runs the following for each trial:
appdef = trainer_component(
    x1=trial.params["x1"],
    x2=trial.params["x2"],
    trial_idx=trial.index,
    tracker_base="s3://foo/bar",
    x3=1.2,
    x4="barbaz",
)
torchx.runner.get_runner().run(appdef, ...)
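The "passed automatically if declared" behavior described above can be illustrated with the standard-library inspect module (a hypothetical helper for illustration only, not the actual TorchXRunner implementation):

```python
import inspect

def call_with_auto_params(component, search_params, const_params, trial_idx, tracker_base):
    """Hypothetical sketch: forward trial_idx / tracker_base to the component
    only if the component function declares them in its signature."""
    kwargs = {**search_params, **const_params}
    declared = inspect.signature(component).parameters
    if "trial_idx" in declared:
        kwargs["trial_idx"] = trial_idx
    if "tracker_base" in declared:
        kwargs["tracker_base"] = tracker_base
    return component(**kwargs)

def trainer(x1: int, trial_idx: int, x3: float) -> dict:
    # toy component: just echoes what it was called with
    return {"x1": x1, "trial_idx": trial_idx, "x3": x3}

result = call_with_auto_params(
    trainer, {"x1": 5}, {"x3": 1.2}, trial_idx=0, tracker_base="s3://foo/bar"
)
print(result)  # tracker_base is dropped since trainer does not declare it
```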
- class torchx.runtime.hpo.ax.TorchXScheduler(experiment: ax.core.experiment.Experiment, generation_strategy: ax.modelbridge.generation_strategy.GenerationStrategy, options: ax.service.scheduler.SchedulerOptions, db_settings: Optional[ax.storage.sqa_store.structs.DBSettings] = None, _skip_experiment_save: bool = False)
An implementation of an Ax Scheduler that works with Experiments hooked up with the TorchXRunner.
This scheduler is not a real scheduler but rather a facade scheduler that delegates to scheduler clients for the various remote/local schedulers. For a list of supported schedulers please refer to the TorchX scheduler docs.
- class torchx.runtime.hpo.ax.AppMetric(name: str, lower_is_better: Optional[bool] = None, properties: Optional[Dict[str, Any]] = None)
Fetches AppMetric (the observation returned by the trial job/app) via the torchx.tracking module. Assumes that the app used the tracker in the following manner:

tracker = torchx.runtime.tracking.FsspecResultTracker(tracker_base)
tracker[str(trial_index)] = {metric_name: value}
# -- or --
tracker[str(trial_index)] = {"metric_name/mean": mean_value, "metric_name/sem": sem_value}
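The key/value contract that AppMetric relies on can be mimicked with a plain dict-backed tracker (a toy sketch only; the real FsspecResultTracker persists results to fsspec-compatible storage so a separate process can read them back):

```python
class DictResultTracker:
    """Toy stand-in for torchx.runtime.tracking.FsspecResultTracker:
    maps str(trial_index) -> {metric_name: value} (or mean/sem keys)."""

    def __init__(self):
        self._store = {}

    def __setitem__(self, key: str, value: dict) -> None:
        self._store[key] = value

    def __getitem__(self, key: str) -> dict:
        return self._store[key]

# the trial app reports its observation under its trial index ...
tracker = DictResultTracker()
tracker[str(0)] = {"booth_eval": 74.0}
tracker[str(1)] = {"booth_eval/mean": 0.5, "booth_eval/sem": 0.1}

# ... and the metric-fetching side reads it back by trial index
print(tracker["0"]["booth_eval"])       # -> 74.0
print(tracker["1"]["booth_eval/mean"])  # -> 0.5
```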