如何使用 PyTorch/XLA:GPU 运行¶
PyTorch/XLA 使 PyTorch 用户能够利用 XLA 编译器,该编译器支持包括 TPU、GPU 和 CPU 在内的加速器。本文档将介绍在 nvidia GPU 实例上运行 PyTorch/XLA 的基本步骤。
环境设置¶
确保在主机上安装了 cuda 驱动程序。
码头工人¶
Pytorch/XLA 目前使用 cuda11.8/12.1 和 python 3.8 发布预构建的 docker 镜像和轮子。我们建议用户使用相应的配置创建 docker 容器。有关 docker 镜像和 wheel 的完整列表,请参阅此文档。
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1
# Installing the NVIDIA Container Toolkit per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
# For example
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configuring the NVIDIA Container Toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --shm-size=16g --net=host --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1 bin/bash
sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
请注意,您需要重新启动 docker 才能使 gpu 设备在 docker 容器中可见。登录到 docker 后,您可以使用 来验证设备是否已正确设置。nvidia-smi
(pytorch) root@20ab2c7a2d06:/# nvidia-smi
Thu Dec 8 06:24:29 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P0 38W / 300W | 0MiB / 16384MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
检查环境变量¶
确保 和 环境变量 考虑 cuda。请执行 a 和 验证。如果没有,请点击链接进行。例:PATH
LD_LIBRARY_PATH
echo $PATH
echo $LD_LIBRARY_PATH
echo "export PATH=\$PATH:/usr/local/cuda-12.1/bin" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/cuda-12.1/lib64" >> ~/.bashrc
source ~/.bashrc
轮子¶
**注意:** wheel 文件仅与基于 x86_64 linux 的架构兼容。要检查 Linux 系统的架构,请执行以下命令:
uname -a
pip3 install torch==2.4.0
# GPU whl for python 3.10 + cuda 12.1
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl
可以在此处找到其他 Python 版本和 CUDA 版本的 Wheels。
运行一些简单的模型¶
为了运行以下示例,您需要克隆 pytorch/xla 存储库。
MP_ImageNet示例¶
此示例使用 ImageNet。它包含在我们已经克隆到 Docker 容器中的内容中。
(pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=CUDA
(pytorch) root@20ab2c7a2d06:/# git clone --recursive https://github.com/pytorch/xla.git
(pytorch) root@20ab2c7a2d06:/# python xla/test/test_train_mp_imagenet.py --fake_data
==> Preparing data..
Epoch 1 train begin 06:12:38
| Training Device=xla:0/0 Epoch=1 Step=0 Loss=6.89059 Rate=2.82 GlobalRate=2.82 Time=06:13:23
| Training Device=xla:0/0 Epoch=1 Step=20 Loss=6.79297 Rate=117.16 GlobalRate=45.84 Time=06:13:36
| Training Device=xla:0/0 Epoch=1 Step=40 Loss=6.43628 Rate=281.16 GlobalRate=80.49 Time=06:13:43
| Training Device=xla:0/0 Epoch=1 Step=60 Loss=5.83108 Rate=346.88 GlobalRate=108.82 Time=06:13:49
| Training Device=xla:0/0 Epoch=1 Step=80 Loss=4.99023 Rate=373.62 GlobalRate=132.43 Time=06:13:56
| Training Device=xla:0/0 Epoch=1 Step=100 Loss=3.92699 Rate=384.33 GlobalRate=152.40 Time=06:14:02
| Training Device=xla:0/0 Epoch=1 Step=120 Loss=2.68816 Rate=388.35 GlobalRate=169.49 Time=06:14:09
ResNet 示例¶
此示例使用 ResNet。
(pytorch) root@20ab2c7a2d06:/# python3 /xla/examples/train_resnet_base.py
1:35PM UTC on Jun 08, 2024
epoch: 1, step: 0, loss: 6.887794017791748, rate: 8.746502586051985
epoch: 1, step: 10, loss: 6.877807140350342, rate: 238.4789458412044
epoch: 1, step: 20, loss: 6.867819786071777, rate: 329.86095958663503
epoch: 1, step: 30, loss: 6.857839584350586, rate: 367.3038003653586
epoch: 1, step: 40, loss: 6.847847938537598, rate: 381.53141087190835
epoch: 1, step: 50, loss: 6.837860584259033, rate: 387.80462249591113
...
epoch: 1, step: 260, loss: 6.628140926361084, rate: 391.135639565343
epoch: 1, step: 270, loss: 6.618192195892334, rate: 391.6901797745233
epoch: 1, step: 280, loss: 6.608224391937256, rate: 391.1602680460045
epoch: 1, step: 290, loss: 6.598264217376709, rate: 391.6731498290759
Epoch 1 train end 1:36PM UTC
AMP (自动混合精度)¶
AMP 在 GPU 训练和 PyTorch/XLA 重用 Cuda 的 AMP 规则方面非常有用。您可以查看我们的 mnist 示例 和 imagenet 示例。请注意,我们还使用了修改后的优化器版本,以避免设备和主机之间的额外同步。
在 GPU 实例上开发 PyTorch/XLA(从支持 GPU 的源代码构建 PyTorch/XLA)¶
在 GPU VM 中,从开发 docker 映像创建 docker 容器。例如:
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_12.1
# Installing the NVIDIA Container Toolkit per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
# For example
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configuring the NVIDIA Container Toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --shm-size=16g --net=host --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_12.1
sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
从源构建 PyTorch 和 PyTorch/XLA。
确保 和 环境变量 考虑 cuda。有关更多信息,请参阅上文。PATH
LD_LIBRARY_PATH
git clone https://github.com/pytorch/pytorch.git
cd pytorch
USE_CUDA=1 python setup.py install
USE_CUDA=1 python setup.py bdist_wheel # Required for hermetic Python in PyTorch/XLA build setup.
git clone https://github.com/pytorch/xla.git
cd xla
XLA_CUDA=1 python setup.py install
验证 PyTorch 和 PyTorch/XLA 是否已成功安装。
如果您可以运行成功运行一些简单模型中的测试,则 PyTorch 和 PyTorch/XLA 应该已成功安装。