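Extending TorchScript with Custom C++ Classes¶
Created On: Jan 23, 2020 | Last Updated: Dec 02, 2024 | Last Verified: Nov 05, 2024
Warning
TorchScript is no longer in active development.
This tutorial is a follow-on to the custom operator tutorial. It introduces the API we've built for binding C++ classes into TorchScript and Python simultaneously. The API is very similar to pybind11, and most of the concepts will transfer over if you're familiar with that system.
Implementing and Binding the Class in C++¶
For this tutorial, we are going to define a simple C++ class that maintains persistent state in a member variable.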
// This header is all you need to do the C++ portions of this
// tutorial
#include <torch/script.h>
// This header is what defines the custom class registration
// behavior specifically. script.h already includes this, but
// we include it here so you know it exists in case you want
// to look at the API or implementation.
#include <torch/custom_class.h>

#include <string>
#include <vector>

template <class T>
struct MyStackClass : torch::CustomClassHolder {
  std::vector<T> stack_;
  MyStackClass(std::vector<T> init) : stack_(init.begin(), init.end()) {}

  void push(T x) {
    stack_.push_back(x);
  }
  T pop() {
    auto val = stack_.back();
    stack_.pop_back();
    return val;
  }

  c10::intrusive_ptr<MyStackClass> clone() const {
    return c10::make_intrusive<MyStackClass>(stack_);
  }

  void merge(const c10::intrusive_ptr<MyStackClass>& c) {
    for (auto& elem : c->stack_) {
      push(elem);
    }
  }
};
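There are several things to note:
torch/custom_class.h is the header you need to include to extend TorchScript with your custom class.
Notice that whenever we are working with instances of the custom class, we do it via instances of c10::intrusive_ptr<>. Think of intrusive_ptr as a smart pointer like std::shared_ptr, but with the reference count stored directly in the object rather than in a separate metadata block (as is done in std::shared_ptr). torch::Tensor internally uses the same pointer type, and custom classes must also use this pointer type so that we can consistently manage different object types.
The second thing to notice is that the user-defined class must inherit from torch::CustomClassHolder. This ensures that the custom class has space to store the reference count.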
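The idea of a reference count that lives inside the object itself can be modeled in a few lines. The following is a conceptual sketch only, written in Python for brevity: `RefCounted`, `IntrusiveHandle`, and `Stack` are hypothetical names invented for illustration, and the code does not reflect how c10::intrusive_ptr is actually implemented.

```python
class RefCounted:
    """Hypothetical base class: the object itself stores its reference count."""

    def __init__(self) -> None:
        self.refcount = 0


class IntrusiveHandle:
    """Hypothetical handle that bumps the count stored inside its target.

    There is no separate control block; the handle only holds the target.
    """

    def __init__(self, target: RefCounted) -> None:
        self.target = target
        target.refcount += 1

    def copy(self) -> "IntrusiveHandle":
        # Copying a handle adds one more owner of the same object.
        return IntrusiveHandle(self.target)

    def release(self) -> None:
        # Dropping a handle removes one owner; releasing twice is a no-op.
        if self.target is not None:
            self.target.refcount -= 1
            self.target = None


class Stack(RefCounted):
    """Inheriting from RefCounted is what gives Stack room for the count."""

    def __init__(self, items) -> None:
        super().__init__()
        self.items = list(items)


stack = Stack(["foo", "bar"])
first = IntrusiveHandle(stack)
second = first.copy()
count_with_two_handles = stack.refcount
second.release()
count_after_release = stack.refcount
```

In this model, a class that does not inherit from `RefCounted` has nowhere to keep the count. That is the role the text above attributes to torch::CustomClassHolder.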
Now let's take a look at how we make this class visible to TorchScript, a process called binding the class:
// Notice a few things:
// - We pass the class to be registered as a template parameter to
//   `class_`. In this instance, we've passed the
//   specialization of the MyStackClass class ``MyStackClass<std::string>``.
//   In general, you cannot register a non-specialized template
//   class. For non-templated classes, you can just pass the
//   class name directly as the template parameter.
// - The namespace passed to `TORCH_LIBRARY` and the name passed to
//   `class_` make up the "qualified name" of the class. In this case,
//   the registered class will appear in Python and C++ as
//   `torch.classes.my_classes.MyStackClass`. We call the first part
//   the "namespace" and the second part the actual class name.
TORCH_LIBRARY(my_classes, m) {
  m.class_<MyStackClass<std::string>>("MyStackClass")
    // The following line registers the constructor of our MyStackClass
    // class that takes a single `std::vector<std::string>` argument,
    // i.e. it exposes the C++ method `MyStackClass(std::vector<T> init)`.
    // Currently, we do not support registering overloaded
    // constructors, so for now you can only `def()` one instance of
    // `torch::init`.
    .def(torch::init<std::vector<std::string>>())
    // The next line registers a stateless (i.e. no captures) C++ lambda
    // function as a method. Note that a lambda function must take a
    // `c10::intrusive_ptr<YourClass>` (or some const/ref version of that)
    // as the first argument. Other arguments can be whatever you want.
    .def("top", [](const c10::intrusive_ptr<MyStackClass<std::string>>& self) {
      return self->stack_.back();
    })
    // The following four lines expose methods of the MyStackClass<std::string>
    // class as-is. `torch::class_` will automatically examine the
    // argument and return types of the passed-in method pointers and
    // expose these to Python and TorchScript accordingly. Finally, notice
    // that we must take the *address* of the fully-qualified method name,
    // i.e. use the unary `&` operator, due to C++ typing rules.
    .def("push", &MyStackClass<std::string>::push)
    .def("pop", &MyStackClass<std::string>::pop)
    .def("clone", &MyStackClass<std::string>::clone)
    .def("merge", &MyStackClass<std::string>::merge)
  ;
}
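Using CMake to Build the Example as a C++ Project¶
Now, we're going to build the above C++ code with the CMake build system. First, take all the C++ code we've covered so far and place it in a file called class.cpp.
Then, write a simple CMakeLists.txt file and place it in the same directory. Here is what CMakeLists.txt should look like: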
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
project(custom_class)
find_package(Torch REQUIRED)
# Set the C++ standard before defining the target so that it applies to it.
# Recent LibTorch releases require C++17.
set(CMAKE_CXX_STANDARD 17)
# Define our library target
add_library(custom_class SHARED class.cpp)
# Link against LibTorch
target_link_libraries(custom_class "${TORCH_LIBRARIES}")
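Also, create a build directory. Your file tree should look like this:
custom_class_project/
  class.cpp
  CMakeLists.txt
  build/
We assume you've set up your environment in the same way as described in the previous tutorial. Go ahead and invoke cmake and then make to build the project: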
$ cd build
$ cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')" ..
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found torch: /torchbind_tutorial/libtorch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /torchbind_tutorial/build
$ make -j
Scanning dependencies of target custom_class
[ 50%] Building CXX object CMakeFiles/custom_class.dir/class.cpp.o
[100%] Linking CXX shared library libcustom_class.so
[100%] Built target custom_class
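You'll find that the build directory now contains (among other things) a dynamic library file. On Linux, it is probably named libcustom_class.so. So the file tree should look like this:
custom_class_project/
  class.cpp
  CMakeLists.txt
  build/
    libcustom_class.so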
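Since the library name depends on the platform, it can help to look the file up rather than hard-code the path. The helper below is a minimal sketch, not part of the tutorial's code: `find_custom_class_library` and the list of candidate names are assumptions based on common CMake defaults for a SHARED target named custom_class, so adjust them to your toolchain.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed file names for a SHARED CMake target called `custom_class`.
CANDIDATE_NAMES = (
    "libcustom_class.so",     # Linux
    "libcustom_class.dylib",  # macOS
    "custom_class.dll",       # Windows (MSVC)
)


def find_custom_class_library(build_dir: str) -> Optional[Path]:
    """Return the first candidate library found under build_dir, else None."""
    root = Path(build_dir)
    if not root.is_dir():
        return None
    for name in CANDIDATE_NAMES:
        # rglob also covers generators that put outputs in subdirectories.
        matches = sorted(root.rglob(name))
        if matches:
            return matches[0]
    return None


# Demonstration against a throwaway directory standing in for build/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "libcustom_class.so").touch()
    found = find_custom_class_library(tmp)
    found_name = found.name if found is not None else None
    missing = find_custom_class_library(str(Path(tmp) / "does_not_exist"))
```

The returned path can then be converted with `str()` and passed to `torch.classes.load_library()`.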
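Using the C++ Class from Python and TorchScript¶
Now that we have our class and its registration compiled into an .so file, we can load that .so into Python and try it out. Here's a script that demonstrates that: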
import torch

# `torch.classes.load_library()` allows you to pass the path to your .so file
# to load it in and make the custom C++ classes available to both Python and
# TorchScript
torch.classes.load_library("build/libcustom_class.so")
# You can query the loaded libraries like this:
print(torch.classes.loaded_libraries)
# prints {'/custom_class_project/build/libcustom_class.so'}

# We can find and instantiate our custom C++ class in python by using the
# `torch.classes` namespace:
#
# This instantiation will invoke the MyStackClass(std::vector<T> init)
# constructor we registered earlier
s = torch.classes.my_classes.MyStackClass(["foo", "bar"])

# We can call methods in Python
s.push("pushed")
assert s.pop() == "pushed"

# Test custom operator. This call requires the `manipulate_instance`
# operator, which is registered later in this tutorial.
s.push("pushed")
torch.ops.my_classes.manipulate_instance(s)  # acting as s.pop()
assert s.top() == "bar"

# Returning and passing instances of custom classes works as you'd expect
s2 = s.clone()
s.merge(s2)
for expected in ["bar", "foo", "bar", "foo"]:
    assert s.pop() == expected

# We can also use the class in TorchScript
# For now, we need to assign the class's type to a local in order to
# annotate the type on the TorchScript function. This may change
# in the future.
MyStackClass = torch.classes.my_classes.MyStackClass

@torch.jit.script
def do_stacks(s: MyStackClass):  # We can pass a custom class instance
    # We can instantiate the class
    s2 = torch.classes.my_classes.MyStackClass(["hi", "mom"])
    s2.merge(s)  # We can call a method on the class
    # We can also return instances of the class
    # from TorchScript function/methods
    return s2.clone(), s2.top()

stack, top = do_stacks(torch.classes.my_classes.MyStackClass(["wow"]))
assert top == "wow"
for expected in ["wow", "mom", "hi"]:
    assert stack.pop() == expected
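To trace why the assertions above pop elements in that order, the stack semantics can be mirrored in plain Python. This is only a sketch: `PyStack` is a hypothetical stand-in written for illustration, not the bound `torch.classes.my_classes.MyStackClass`, and it needs neither torch nor the compiled library.

```python
from typing import List


class PyStack:
    """Hypothetical pure-Python stand-in mirroring MyStackClass<std::string>."""

    def __init__(self, init: List[str]) -> None:
        self.stack = list(init)

    def push(self, x: str) -> None:
        self.stack.append(x)

    def pop(self) -> str:
        return self.stack.pop()

    def top(self) -> str:
        return self.stack[-1]

    def clone(self) -> "PyStack":
        return PyStack(self.stack)

    def merge(self, other: "PyStack") -> None:
        # Push the other stack's elements bottom-to-top, like the C++ loop.
        for elem in other.stack:
            self.push(elem)


# Mirrors the clone/merge part of the script.
s = PyStack(["foo", "bar"])
s.merge(s.clone())  # s now holds foo, bar, foo, bar (bottom to top)
popped_after_merge = [s.pop() for _ in range(4)]

# Mirrors the body of do_stacks called with a stack holding "wow".
s2 = PyStack(["hi", "mom"])
s2.merge(PyStack(["wow"]))
top = s2.top()
stack = s2.clone()
popped_from_clone = [stack.pop() for _ in range(3)]
```

Because `merge` appends the argument's elements on top of the receiver, the merged-in elements are the first to be popped.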
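Saving, Loading, and Running TorchScript Code Using Custom Classes¶
We can also use custom-registered C++ classes in a C++ process using libtorch. As an example, let's define a simple nn.Module that instantiates our MyStackClass class and calls a method on it: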
import torch

torch.classes.load_library('build/libcustom_class.so')

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, s: str) -> str:
        stack = torch.classes.my_classes.MyStackClass(["hi", "mom"])
        return stack.pop() + s

scripted_foo = torch.jit.script(Foo())
print(scripted_foo.graph)
scripted_foo.save('foo.pt')
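foo.pt in our filesystem now contains the serialized TorchScript program we've just defined.
Now, we're going to define a new CMake project to show how you can load this model and its required .so file. For a full treatment of how to do this, please have a look at the Loading a TorchScript Model in C++ Tutorial.
Similarly to before, let's create a file structure containing the following:
cpp_inference_example/
  infer.cpp
  CMakeLists.txt
  foo.pt
  build/
  custom_class_project/
    class.cpp
    CMakeLists.txt
    build/
Notice we've copied over the serialized foo.pt file, as well as the source tree from the custom_class_project above. We will add custom_class_project as a dependency of this C++ project so that we can build the custom class into the binary.
Let's populate infer.cpp with the following: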
#include <torch/script.h>

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  torch::jit::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load("foo.pt");
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::vector<c10::IValue> inputs = {"foobarbaz"};
  auto output = module.forward(inputs).toString();
  std::cout << output->string() << std::endl;
}
And similarly let's define our CMakeLists.txt file:
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
project(infer)
find_package(Torch REQUIRED)
# Set the C++ standard before defining any targets so that it applies to them.
# Recent LibTorch releases require C++17.
set(CMAKE_CXX_STANDARD 17)
add_subdirectory(custom_class_project)
# Define our executable target
add_executable(infer infer.cpp)
# Link against LibTorch
target_link_libraries(infer "${TORCH_LIBRARIES}")
# This is where we link in our libcustom_class code, making our
# custom class available in our binary.
target_link_libraries(infer -Wl,--no-as-needed custom_class)
You know the drill: cd build, cmake, and make:
$ cd build
$ cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')" ..
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found torch: /local/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /cpp_inference_example/build
$ make -j
Scanning dependencies of target custom_class
[ 25%] Building CXX object custom_class_project/CMakeFiles/custom_class.dir/class.cpp.o
[ 50%] Linking CXX shared library libcustom_class.so
[ 50%] Built target custom_class
Scanning dependencies of target infer
[ 75%] Building CXX object CMakeFiles/infer.dir/infer.cpp.o
[100%] Linking CXX executable infer
[100%] Built target infer
And now we can run our exciting C++ binary:
$ ./infer
momfoobarbaz
Incredible!
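Moving Custom Classes To/From IValues¶
It's also possible that you may need to move custom classes into or out of IValues, such as when you take or return IValues from TorchScript methods or you want to instantiate a custom class attribute in C++. For creating an IValue from a custom C++ class instance:
torch::make_custom_class<T>() provides an API similar to c10::make_intrusive<T>() in that it takes whatever set of arguments you provide, calls the constructor of T that matches that set of arguments, and wraps up the resulting instance and returns it. However, instead of returning just a pointer to a custom class object, it returns an IValue wrapping the object. You can then pass this IValue directly to TorchScript.
If you already have an intrusive_ptr pointing to your class, you can directly construct an IValue from it using the constructor IValue(intrusive_ptr<T>).
For converting IValues back to custom classes:
IValue::toCustomClass<T>() returns an intrusive_ptr<T> pointing to the custom class that the IValue contains. Internally, this function checks that T is registered as a custom class and that the IValue does in fact contain a custom class. You can check manually whether the IValue contains a custom class by calling isCustomClass().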
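Defining Serialization/Deserialization Methods for Custom C++ Classes¶
If you try to save a ScriptModule with a custom-bound C++ class as an attribute, you'll get the following error: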
# export_attr.py
import torch

torch.classes.load_library('build/libcustom_class.so')

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = torch.classes.my_classes.MyStackClass(["just", "testing"])

    def forward(self, s: str) -> str:
        return self.stack.pop() + s

scripted_foo = torch.jit.script(Foo())

scripted_foo.save('foo.pt')
loaded = torch.jit.load('foo.pt')

print(loaded.stack.pop())
$ python export_attr.py
RuntimeError: Cannot serialize custom bound C++ class __torch__.torch.classes.my_classes.MyStackClass. Please define serialization methods via def_pickle for this class. (pushIValueImpl at ../torch/csrc/jit/pickler.cpp:128)
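This is because TorchScript cannot automatically figure out what information to save from your C++ class. You must specify that manually. The way to do that is to define __getstate__ and __setstate__ methods on the class using the special def_pickle method on class_.
Note
The semantics of __getstate__ and __setstate__ in TorchScript are equivalent to those of the Python pickle module. You can read more about how we use these methods.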
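To make the pickle analogy concrete, here is a pure-Python sketch that walks through the two hooks by hand. `PyStack` is a hypothetical class written for illustration and is unrelated to the bound C++ class; the example only shows the general contract, in which `__getstate__` chooses what to save and `__setstate__` rebuilds an object from exactly that value.

```python
import pickle
from typing import List


class PyStack:
    """Hypothetical pure-Python stack used only to illustrate the pickle hooks."""

    def __init__(self, init: List[str]) -> None:
        self.stack = list(init)

    def pop(self) -> str:
        return self.stack.pop()

    def __getstate__(self) -> List[str]:
        # Decide what to save: here, just the list of elements.
        return list(self.stack)

    def __setstate__(self, state: List[str]) -> None:
        # Rebuild the instance from whatever __getstate__ returned.
        self.stack = list(state)


original = PyStack(["just", "testing"])

# Step 1: ask the object for its state and serialize only that state.
state = original.__getstate__()
payload = pickle.dumps(state)

# Step 2: create an empty instance without calling __init__, then hand
# the deserialized state to __setstate__.
restored = PyStack.__new__(PyStack)
restored.__setstate__(pickle.loads(payload))

restored_top = restored.pop()
```

One difference to keep in mind: the Python hook fills in an existing instance, whereas the `__setstate__` function passed to def_pickle returns an intrusive_ptr to a new instance.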
Here is an example of the def_pickle call we can add to the registration of MyStackClass to include serialization methods:
    // class_<>::def_pickle allows you to define the serialization
    // and deserialization methods for your C++ class.
    // Currently, we only support passing stateless lambda functions
    // as arguments to def_pickle
    .def_pickle(
        // __getstate__
        // This function defines what data structure should be produced
        // when we serialize an instance of this class. The function
        // must take a single `self` argument, which is an intrusive_ptr
        // to the instance of the object. The function can return
        // any type that is supported as a return value of the TorchScript
        // custom operator API. In this instance, we've chosen to return
        // a std::vector<std::string> as the salient data to preserve
        // from the class.
        [](const c10::intrusive_ptr<MyStackClass<std::string>>& self)
            -> std::vector<std::string> {
          return self->stack_;
        },
        // __setstate__
        // This function defines how to create a new instance of the C++
        // class when we are deserializing. The function must take a
        // single argument of the same type as the return value of
        // `__getstate__`. The function must return an intrusive_ptr
        // to a new instance of the C++ class, initialized however
        // you would like given the serialized state.
        [](std::vector<std::string> state)
            -> c10::intrusive_ptr<MyStackClass<std::string>> {
          // A convenient way to instantiate an object and get an
          // intrusive_ptr to it is via `make_intrusive`. We use
          // that here to allocate an instance of MyStackClass<std::string>
          // and call the single-argument std::vector<std::string>
          // constructor with the serialized state.
          return c10::make_intrusive<MyStackClass<std::string>>(std::move(state));
        });
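Note
We take a different approach from pybind11 in the pickle API. Whereas pybind11 has a special function pybind11::pickle() that you pass into class_::def(), we have a separate method, def_pickle, for this purpose. This is because the name torch::jit::pickle was already taken, and we didn't want to cause confusion.
Once we have defined the (de)serialization behavior in this way, our script can now run successfully:
$ python export_attr.py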
testing
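Defining Custom Operators that Take or Return Bound C++ Classes¶
Once you've defined a custom C++ class, you can also use that class as an argument to or return value from a custom operator (i.e. a free function). Suppose you have the following free function:
c10::intrusive_ptr<MyStackClass<std::string>> manipulate_instance(const c10::intrusive_ptr<MyStackClass<std::string>>& instance) {
  instance->pop();
  return instance;
}
You can register it by adding the following code inside your TORCH_LIBRARY block:
  m.def(
    "manipulate_instance(__torch__.torch.classes.my_classes.MyStackClass x) -> __torch__.torch.classes.my_classes.MyStackClass Y",
    manipulate_instance
  );
Refer to the custom op tutorial for more details on the registration API.
Once this is done, you can use the op as in the following example: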
class TryCustomOp(torch.nn.Module):
    def __init__(self):
        super(TryCustomOp, self).__init__()
        self.f = torch.classes.my_classes.MyStackClass(["foo", "bar"])

    def forward(self):
        return torch.ops.my_classes.manipulate_instance(self.f)
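Note
Registering an operator that takes a C++ class as an argument requires that the custom class has already been registered. You can enforce this by making sure the custom class registration and your free function definitions are in the same TORCH_LIBRARY block, and that the custom class registration comes first. In the future, we may relax this requirement so that these can be registered in any order.
Conclusion¶
This tutorial walked you through how to expose a C++ class to TorchScript (and by extension Python), how to register its methods, how to use that class from Python and TorchScript, and how to save and load code using the class and run that code in a standalone C++ process. You are now ready to extend your TorchScript models with C++ classes that interface with third-party C++ libraries or implement any other use case that requires the lines between Python, TorchScript, and C++ to blend smoothly.
As always, if you run into any problems or have questions, you can use our forum or GitHub issues to get in touch. Also, our frequently asked questions (FAQ) page may have helpful information.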