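Using pretrained models
This tutorial explains how to use pretrained models in TorchRL.
By the end of this tutorial, you will be able to use pretrained models for efficient image representation and to fine-tune them.
TorchRL provides pretrained models that can be used either as transforms or as components of a policy. Because the semantics are the same, they can be used interchangeably in either context. In this tutorial, we will use R3M (https://arxiv.org/abs/2203.12601), but other models (e.g. VIP) work equally well.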
import multiprocessing

import torch.cuda
from tensordict.nn import TensorDictSequential
from torch import nn
from torchrl.envs import R3MTransform, TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.modules import Actor

# Use the GPU only when CUDA is available and the multiprocessing start
# method is not "fork"; otherwise fall back to the CPU.
is_fork = multiprocessing.get_start_method() == "fork"
device = (
    torch.device(0)
    if torch.cuda.is_available() and not is_fork
    else torch.device("cpu")
)
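Let us first create an environment. For the sake of simplicity, we will use a common Gym environment. In practice, the same approach works in more challenging embodied AI contexts (see, for example, our Habitat wrappers).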
# Assumption: the line that builds base_env is missing from this section.
# A pixel-based Gym environment such as the one below is consistent with the
# outputs shown later (8-dimensional actions, 480x480 RGB frames).
base_env = GymEnv("Ant-v4", from_pixels=True, device=device)
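Let us fetch our pretrained model. We request the pretrained weights with the `download=True` flag, which is turned off by default.
Next, we append the transform to the environment. In practice, each batch of collected data goes through the transform and is mapped to an `r3m_vec` entry in the output `tensordict`. Our policy, a small MLP with a single hidden layer, then reads this vector and computes the corresponding action.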
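# Pretrained R3M encoder (ResNet-50 backbone) that reads the "pixels" entry.
r3m = R3MTransform(
    "resnet50",
    in_keys=["pixels"],
    download=True,
)
env_transformed = TransformedEnv(base_env, r3m)
# Small MLP head; LazyLinear infers its input size from the first batch.
net = nn.Sequential(
    nn.LazyLinear(128, device=device),
    nn.Tanh(),
    nn.Linear(128, base_env.action_spec.shape[-1], device=device),
)
# The policy reads the embedding written by the transform.
policy = Actor(net, in_keys=["r3m_vec"])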
Downloading: "https://pytorch.s3.amazonaws.com/models/rl/r3m/r3m_50.pt" to /root/.cache/torch/hub/checkpoints/r3m_50.pt
100%|██████████| 374M/374M [00:06<00:00, 63.4MB/s]
Let us check the number of parameters of the policy:
print("number of params:", len(list(policy.parameters())))
number of params: 4
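Note that `len(list(policy.parameters()))` counts parameter tensors, not scalar weights. The plain-Python sketch below reproduces the figure of 4 (two weight matrices and two bias vectors) and, under the assumption that the lazy layer resolves to 2048 inputs (the `r3m_vec` width in the outputs) and that the action dimension is 8, also works out the number of scalar weights. The helper `linear_param_shapes` is hypothetical and only serves the illustration.

```python
import math


def linear_param_shapes(in_features, out_features):
    """Shapes of the weight and bias tensors of a dense (linear) layer."""
    return [(out_features, in_features), (out_features,)]


# Assumed sizes: 2048-dim embedding -> 128 hidden units -> 8-dim action.
shapes = linear_param_shapes(2048, 128) + linear_param_shapes(128, 8)

n_tensors = len(shapes)                        # what the tutorial prints
n_scalars = sum(math.prod(s) for s in shapes)  # total number of weights

print("parameter tensors:", n_tensors)
print("scalar weights:", n_scalars)
```

The `Tanh` activation contributes nothing to either count because it has no learnable parameters.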
We collect a rollout of 32 steps and print its output:
rollout = env_transformed.rollout(32, policy)
print("rollout with transform:", rollout)
rollout with transform: TensorDict(
fields={
action: Tensor(shape=torch.Size([32, 8]), device=cpu, dtype=torch.float32, is_shared=False),
done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
next: TensorDict(
fields={
done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
reward: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([32]),
device=cpu,
is_shared=False),
r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([32]),
device=cpu,
is_shared=False)
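For fine-tuning, we make the parameters trainable and then integrate the transform into the policy. In practice, it may be wiser to restrict this to a subset of the parameters (for example, the last layer of the MLP).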
r3m.train()
policy = TensorDictSequential(r3m, policy)
print("number of params after r3m is integrated:", len(list(policy.parameters())))
number of params after r3m is integrated: 163
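Again, we collect a rollout with R3M. The structure of the output has changed slightly, because the environment now returns pixels (and not an embedding). The "r3m_vec" embedding is an intermediate result produced by our policy.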
rollout = base_env.rollout(32, policy)
print("rollout, fine tuning:", rollout)
rollout, fine tuning: TensorDict(
fields={
action: Tensor(shape=torch.Size([32, 8]), device=cpu, dtype=torch.float32, is_shared=False),
done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
next: TensorDict(
fields={
done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
pixels: Tensor(shape=torch.Size([32, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
reward: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([32]),
device=cpu,
is_shared=False),
r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([32]),
device=cpu,
is_shared=False)
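Swapping the transform from the environment to the policy is this easy because both behave like a TensorDictModule: each has a set of "in_keys" and "out_keys" that make it easy to read inputs and write outputs in different contexts.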
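The idea can be illustrated without TorchRL. The sketch below is a plain-Python analogy, not the library's implementation: `KeyedModule` and `chain` are hypothetical stand-ins that read `in_keys` from a dict and write `out_keys` back, and the "embedding" is just a pixel mean. It shows that the same module yields the same action whether it runs on the environment side or as the first stage of the policy.

```python
class KeyedModule:
    """Hypothetical stand-in for a module that declares in_keys and out_keys."""

    def __init__(self, fn, in_keys, out_keys):
        self.fn = fn
        self.in_keys = list(in_keys)
        self.out_keys = list(out_keys)

    def __call__(self, data):
        # Read the declared inputs, call the wrapped function, write the outputs.
        result = self.fn(*(data[key] for key in self.in_keys))
        if len(self.out_keys) == 1:
            result = (result,)
        for key, value in zip(self.out_keys, result):
            data[key] = value
        return data


def chain(*modules):
    """Run modules in order on the same dict (toy analogue of a sequential module)."""
    def run(data):
        for module in modules:
            data = module(data)
        return data
    return run


# Toy "encoder" and "actor"; real ones would be neural networks.
embed = KeyedModule(lambda pixels: sum(pixels) / len(pixels), ["pixels"], ["vec"])
act = KeyedModule(lambda vec: 2 * vec, ["vec"], ["action"])

frame = [0, 128, 255]

# Context 1: the encoder runs on the environment side; the policy reads "vec".
env_output = embed({"pixels": frame})
action_env_side = act(env_output)["action"]

# Context 2: the encoder is part of the policy; the policy reads raw "pixels".
policy = chain(embed, act)
policy_output = policy({"pixels": frame})
action_policy_side = policy_output["action"]

print(action_env_side == action_policy_side)
```

Because each module only depends on the keys it declares, moving it between contexts requires no change to the module itself.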
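To conclude this tutorial, let us see how R3M can be used to read images stored in a replay buffer (for example, in an offline RL setting). First, let us build our dataset: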
from torchrl.data import LazyMemmapStorage, ReplayBuffer
storage = LazyMemmapStorage(1000)
rb = ReplayBuffer(storage=storage, transform=r3m)
We can now collect the data (random rollouts, for our purposes) and fill the replay buffer with it:
total = 0
while total < 1000:
    tensordict = base_env.rollout(1000)
    rb.extend(tensordict)
    total += tensordict.numel()
Let us check what the replay buffer storage looks like. It should not contain the "r3m_vec" entry, since we have not sampled from the buffer yet:
print("stored data:", storage._storage)
stored data: TensorDict(
fields={
action: MemoryMappedTensor(shape=torch.Size([1000, 8]), device=cpu, dtype=torch.float32, is_shared=False),
done: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
next: TensorDict(
fields={
done: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
pixels: MemoryMappedTensor(shape=torch.Size([1000, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
reward: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([1000]),
device=cpu,
is_shared=False),
pixels: MemoryMappedTensor(shape=torch.Size([1000, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
terminated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: MemoryMappedTensor(shape=torch.Size([1000, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([1000]),
device=cpu,
is_shared=False)
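When sampling, the data goes through the R3M transform, which gives us the processed data we want. In this way, we can train an algorithm offline on a dataset made of images: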
batch = rb.sample(32)
print("data after sampling:", batch)
data after sampling: TensorDict(
fields={
action: Tensor(shape=torch.Size([32, 8]), device=cpu, dtype=torch.float32, is_shared=False),
done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
next: TensorDict(
fields={
done: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
pixels: Tensor(shape=torch.Size([32, 480, 480, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
reward: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([32]),
device=cpu,
is_shared=False),
r3m_vec: Tensor(shape=torch.Size([32, 2048]), device=cpu, dtype=torch.float32, is_shared=False),
terminated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False),
truncated: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
batch_size=torch.Size([32]),
device=cpu,
is_shared=False)
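The pattern of storing raw data and transforming it only at sampling time can be sketched in plain Python. This is a simplified analogy under stated assumptions, not how TorchRL's `ReplayBuffer` is implemented: `TinyReplayBuffer` and `add_embedding` are hypothetical names, and the "embedding" is a pixel mean standing in for R3M.

```python
import random


class TinyReplayBuffer:
    """Toy buffer: stores raw items, applies a transform when sampling."""

    def __init__(self, capacity, transform=None):
        self.capacity = capacity
        self.transform = transform
        self.storage = []

    def extend(self, items):
        # Store items untouched until the buffer is full.
        for item in items:
            if len(self.storage) < self.capacity:
                self.storage.append(item)

    def sample(self, batch_size):
        # Copy each sampled item so that the stored data is never modified.
        batch = [dict(random.choice(self.storage)) for _ in range(batch_size)]
        if self.transform is not None:
            batch = [self.transform(item) for item in batch]
        return batch


def add_embedding(item):
    """Stand-in for an image encoder: writes a 'vec' entry from 'pixels'."""
    item["vec"] = sum(item["pixels"]) / len(item["pixels"])
    return item


tiny_rb = TinyReplayBuffer(capacity=1000, transform=add_embedding)
tiny_rb.extend({"pixels": [i, i + 1, i + 2]} for i in range(10))

batch = tiny_rb.sample(4)
print("stored keys:", sorted(tiny_rb.storage[0]))
print("sampled keys:", sorted(batch[0]))
```

Keeping the storage raw means the encoder can be changed or fine-tuned later without rewriting the dataset, at the cost of recomputing the embedding on every sample.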
Total running time of the script: (1 minute 33.842 seconds)
Estimated memory usage: 4016 MB