PyTorch TensorBoard Support¶
Created On: Nov 30, 2021 | Last Updated: May 29, 2024 | Last Verified: Nov 05, 2024
Follow along with the video below or on YouTube.
Before You Start¶
To run this tutorial, you'll need to install PyTorch, TorchVision, Matplotlib, and TensorBoard.
With conda:
conda install pytorch torchvision -c pytorch
conda install matplotlib tensorboard
With pip:
pip install torch torchvision matplotlib tensorboard
Once the dependencies are installed, restart this notebook in the Python environment where you installed them.
Introduction¶
In this notebook, we'll train a variant of LeNet-5 on the Fashion-MNIST dataset. Fashion-MNIST is a set of image tiles depicting various garments, with ten class labels indicating the type of garment depicted.
# PyTorch model and training necessities
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Image datasets and image manipulation
import torchvision
import torchvision.transforms as transforms
# Image display
import matplotlib.pyplot as plt
import numpy as np
# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter
# In case you are using an environment that has TensorFlow installed,
# such as Google Colab, uncomment the following code to avoid
# a bug with saving embeddings to your TensorBoard directory
# import tensorflow as tf
# import tensorboard as tb
# tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
Showing Images in TensorBoard¶
Let's start by adding sample images from our dataset to TensorBoard:
# Gather datasets and prepare them for consumption
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

# Store separate training and validation splits in ./data
training_set = torchvision.datasets.FashionMNIST('./data',
                                                 download=True,
                                                 train=True,
                                                 transform=transform)
validation_set = torchvision.datasets.FashionMNIST('./data',
                                                   download=True,
                                                   train=False,
                                                   transform=transform)

training_loader = torch.utils.data.DataLoader(training_set,
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=2)
validation_loader = torch.utils.data.DataLoader(validation_set,
                                                batch_size=4,
                                                shuffle=False,
                                                num_workers=2)
# Class labels
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')
# Helper function for inline image display
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
# Extract a batch of 4 images
dataiter = iter(training_loader)
images, labels = next(dataiter)
# Create a grid from the images and show them
img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)
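The `img / 2 + 0.5` step in the helper above inverts the `Normalize((0.5,), (0.5,))` transform applied when the dataset was loaded. As a standalone sketch of that arithmetic (plain Python, no tensors; the function names here are illustrative, not part of the tutorial's API):

```python
def normalize(x, mean=0.5, std=0.5):
    # What transforms.Normalize((0.5,), (0.5,)) does to each pixel
    return (x - mean) / std

def unnormalize(y, mean=0.5, std=0.5):
    # The inverse; with mean = std = 0.5 this is exactly y / 2 + 0.5
    return y * std + mean

pixel = 0.75                       # a value in [0, 1] from ToTensor()
assert normalize(pixel) == 0.5     # mapped into [-1, 1]
assert unnormalize(normalize(pixel)) == pixel
```

Undoing the normalization before plotting is what keeps the displayed pixel values in the [0, 1] range Matplotlib expects.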

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw
Above, we used TorchVision and Matplotlib to create a visual grid of a minibatch of our input data. Below, we use the add_image() call on SummaryWriter to log the image for consumption by TensorBoard, and we also call flush() to make sure it's written to disk right away.
# Default log_dir argument is "runs" - but it's good to be specific
# torch.utils.tensorboard.SummaryWriter is imported above
writer = SummaryWriter('runs/fashion_mnist_experiment_1')
# Write image data to TensorBoard log dir
writer.add_image('Four Fashion-MNIST Images', img_grid)
writer.flush()
# To view, start TensorBoard on the command line with:
# tensorboard --logdir=runs
# ...and open a browser tab to http://localhost:6006/
If you start TensorBoard at the command line and open it in a new browser tab (usually at localhost:6006), you should see the image grid under the IMAGES tab.
Graphing Scalars to Visualize Training¶
TensorBoard is useful for tracking the progress and efficacy of your training. Below, we'll run a training loop, track some metrics, and save the data for TensorBoard's consumption.
Let's define a model to categorize our image tiles, along with an optimizer and loss function for training:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Now let's train a single epoch, and evaluate the training vs. validation set losses every 1000 batches:
print(len(validation_loader))
for epoch in range(1):  # loop over the dataset multiple times
    running_loss = 0.0

    for i, data in enumerate(training_loader, 0):
        # basic training loop
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 1000 == 999:    # Every 1000 mini-batches...
            print('Batch {}'.format(i + 1))
            # Check against the validation set
            running_vloss = 0.0

            # In evaluation mode some model-specific operations can be omitted, e.g. dropout layers
            net.train(False)  # Switch to evaluation mode, e.g. turn off regularization
            for j, vdata in enumerate(validation_loader, 0):
                vinputs, vlabels = vdata
                voutputs = net(vinputs)
                vloss = criterion(voutputs, vlabels)
                running_vloss += vloss.item()
            net.train(True)  # Switch back to training mode, e.g. turn on regularization

            avg_loss = running_loss / 1000
            avg_vloss = running_vloss / len(validation_loader)

            # Log the running loss averaged per batch
            writer.add_scalars('Training vs. Validation Loss',
                               {'Training': avg_loss, 'Validation': avg_vloss},
                               epoch * len(training_loader) + i)
            running_loss = 0.0

print('Finished Training')

writer.flush()
2500
Batch 1000
Batch 2000
Batch 3000
Batch 4000
Batch 5000
Batch 6000
Batch 7000
Batch 8000
Batch 9000
Batch 10000
Batch 11000
Batch 12000
Batch 13000
Batch 14000
Batch 15000
Finished Training
Switch to your open TensorBoard and have a look at the SCALARS tab.
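The x-axis value passed to `add_scalars()` in the loop above is `epoch * len(training_loader) + i`, which flattens each (epoch, batch) pair into a single monotonically increasing step, so curves from later epochs continue where earlier ones left off. A standalone sketch of that bookkeeping (plain Python; the helper name is illustrative):

```python
def global_step(epoch, batch_index, batches_per_epoch):
    # Flatten (epoch, batch) into one increasing step counter
    return epoch * batches_per_epoch + batch_index

# With 15000 batches per epoch, as in the run above, losses are
# logged at i = 999, 1999, ... within each epoch:
steps = [global_step(e, i, 15000) for e in range(2) for i in (999, 1999)]
assert steps == [999, 1999, 15999, 16999]
```

Without this offset, a second epoch would start logging again at step 999 and TensorBoard would overlay the curves instead of extending them.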
Visualizing Your Model¶
TensorBoard can also be used to examine the data flow within your model.
To do this, call the add_graph() method with a model and sample input:
# Again, grab a single mini-batch of images
dataiter = iter(training_loader)
images, labels = next(dataiter)
# add_graph() will trace the sample input through your model,
# and render it as a graph.
writer.add_graph(net, images)
writer.flush()
When you switch over to TensorBoard, you should see a GRAPHS tab. Double-click the "NET" node to see the layers and data flow within your model.
Visualizing Your Dataset with Embeddings¶
The 28-by-28 image tiles we're using can be modeled as 784-dimensional vectors (28 * 28 = 784). It can be instructive to project this down to a lower-dimensional representation. The add_embedding() method will project a set of data onto the three dimensions with the highest variance, and display them as an interactive 3D chart.
Below, we'll take a sample of our data, and generate such an embedding:
# Select a random subset of data and corresponding labels
def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)

    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
# Extract a random subset of data
images, labels = select_n_random(training_set.data, training_set.targets)
# get the class labels for each image
class_labels = [classes[label] for label in labels]
# log embeddings
features = images.view(-1, 28 * 28)
writer.add_embedding(features,
                     metadata=class_labels,
                     label_img=images.unsqueeze(1))
writer.flush()
writer.close()
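`select_n_random()` above applies one shared permutation to both tensors, which is what keeps each image paired with its label after shuffling. The same idea in plain Python (standard `random` module on toy lists; the names and data here are illustrative):

```python
import random

def select_n_random_py(data, labels, n=3, seed=0):
    # One shared shuffle of the indices keeps data and labels aligned
    assert len(data) == len(labels)
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    idx = idx[:n]
    return [data[i] for i in idx], [labels[i] for i in idx]

imgs, labs = select_n_random_py(list("abcdef"), [0, 1, 2, 3, 4, 5])
# Each sampled label is still the original index of its image
assert all(lab == "abcdef".index(img) for img, lab in zip(imgs, labs))
```

Shuffling the two sequences independently would break that pairing, which is why a single permutation is applied to both.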
Now if you switch to TensorBoard and select the PROJECTOR tab, you should see a 3D representation of the projection. You can rotate and zoom the model, examine it at large and small scales, and see whether you can spot patterns in the projected data and the clustering of labels.
For better visibility, it's recommended to:
Select "label" from the "Color by" drop-down on the left.
Click the Night Mode icon along the top to place the light-colored images on a dark background.
Other Resources¶
For more information, have a look at:
PyTorch documentation on torch.utils.tensorboard.SummaryWriter
Tensorboard tutorial content in the PyTorch.org Tutorials
For more information about TensorBoard, see the TensorBoard documentation
Total running time of the script: (2 minutes 34.811 seconds)