Modules

The standard TorchRec modules represent collections of embedding tables:

EmbeddingBagCollection is a collection of torch.nn.EmbeddingBag
EmbeddingCollection is a collection of torch.nn.Embedding

These modules are constructed through standardized config classes:

EmbeddingBagConfig for EmbeddingBagCollection
EmbeddingConfig for EmbeddingCollection

class torchrec.modules.embedding_configs.EmbeddingBagConfig(num_embeddings: int, embedding_dim: int, name: str = '', data_type: DataType = DataType.FP32, feature_names: List[str] = <factory>, weight_init_max: Optional[float] = None, weight_init_min: Optional[float] = None, num_embeddings_post_pruning: Optional[int] = None, init_fn: Optional[Callable[[torch.Tensor], Optional[torch.Tensor]]] = None, need_pos: bool = False, pooling: PoolingType = PoolingType.SUM)

Bases: BaseEmbeddingConfig

EmbeddingBagConfig is a dataclass that represents a single embedding table whose outputs are meant to be pooled.

Parameters: pooling (PoolingType) – pooling type.

class torchrec.modules.embedding_configs.EmbeddingConfig(num_embeddings: int, embedding_dim: int, name: str = '', data_type: DataType = DataType.FP32, feature_names: List[str] = <factory>, weight_init_max: Optional[float] = None, weight_init_min: Optional[float] = None, num_embeddings_post_pruning: Optional[int] = None, init_fn: Optional[Callable[[torch.Tensor], Optional[torch.Tensor]]] = None, need_pos: bool = False)

Bases: BaseEmbeddingConfig

EmbeddingConfig is a dataclass that represents a single embedding table.

class torchrec.modules.embedding_configs.BaseEmbeddingConfig(num_embeddings: int, embedding_dim: int, name: str = '', data_type: DataType = DataType.FP32, feature_names: List[str] = <factory>, weight_init_max: Optional[float] = None, weight_init_min: Optional[float] = None, num_embeddings_post_pruning: Optional[int] = None, init_fn: Optional[Callable[[torch.Tensor], Optional[torch.Tensor]]] = None, need_pos: bool = False)

Base class for embedding configs.

Parameters

num_embeddings (int) – number of embeddings.
embedding_dim (int) – embedding dimension.
name (str) – name of the embedding table.
data_type (DataType) – data type of the embedding table.
feature_names (List[str]) – list of feature names.
weight_init_max (Optional[float]) – maximum value for weight initialization.
weight_init_min (Optional[float]) – minimum value for weight initialization.
num_embeddings_post_pruning (Optional[int]) – number of embeddings after pruning, used for inference. If None, no pruning is applied.
init_fn (Optional[Callable[[torch.Tensor], Optional[torch.Tensor]]]) – init function for the embedding weights.
need_pos (bool) – whether the table is position weighted.

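class torchrec.modules.embedding_modules.EmbeddingBagCollection(tables: List[EmbeddingBagConfig], is_weighted: bool = False, device: Optional[device] = None)

EmbeddingBagCollection represents a collection of pooled embeddings (EmbeddingBags).

Note

EmbeddingBagCollection is an unsharded module and is not performance optimized. For performance-sensitive scenarios, consider using the sharded version, ShardedEmbeddingBagCollection.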

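It is callable on arguments representing sparse data in the form of a KeyedJaggedTensor with values of shape (F, B, L[f][i]), where:

F: number of features (keys)
B: batch size
L[f][i]: length of the sparse feature (potentially different for each feature f and batch index i, that is, jagged)

and outputs a KeyedTensor with values of shape (B, D), where:

B: batch size
D: sum of the embedding dimensions of all embedding tables, that is, sum([config.embedding_dim for config in tables])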
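The input layout can be made concrete without TorchRec. The following is a minimal pure-Python sketch, not TorchRec's implementation: `split_bags` is a hypothetical helper, and it assumes the offsets are laid out feature-major (all bags of the first key, then all bags of the second), as in the example further down this page.

```python
from typing import Dict, List


def split_bags(
    keys: List[str], values: List[int], offsets: List[int], batch_size: int
) -> Dict[str, List[List[int]]]:
    """Return J[f][i]: the bag of ids for feature f and batch index i."""
    # F * B bags need F * B + 1 offsets.
    assert len(offsets) == len(keys) * batch_size + 1
    bags = {}
    for f, key in enumerate(keys):
        base = f * batch_size  # position of this feature's first bag
        bags[key] = [
            values[offsets[base + i]:offsets[base + i + 1]]
            for i in range(batch_size)
        ]
    return bags


keys = ["f1", "f2"]
values = [0, 1, 2, 3, 4, 5, 6, 7]
offsets = [0, 2, 2, 3, 4, 5, 8]
bags = split_bags(keys, values, offsets, batch_size=3)
print(bags["f1"])  # [[0, 1], [], [2]]
print(bags["f2"])  # [[3], [4], [5, 6, 7]]

# D is the sum of the per-table embedding dimensions (3 and 4 here).
embedding_dims = {"f1": 3, "f2": 4}
D = sum(embedding_dims.values())
print(D)  # 7
```

The bag lengths differ per feature and per batch index, which is what makes the third dimension L[f][i] jagged.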

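Assuming the argument is a KeyedJaggedTensor J with F features, batch size B, and L[f][i] sparse lengths such that J[f][i] is the bag for feature f and batch index i, the output KeyedTensor KT is defined as: KT[i] = torch.cat([emb[f](J[f][i]) for f in J.keys()]), where emb[f] is the EmbeddingBag corresponding to feature f.

Note that J[f][i] is a variable-length list of integer values (a bag), and emb[f](J[f][i]) is the pooled embedding produced by reducing the embeddings of each of the values in J[f][i] using the mode of the EmbeddingBag emb[f]. The mode is set by the pooling field of the table's EmbeddingBagConfig, which defaults to PoolingType.SUM.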
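The definition of KT[i] can be traced by hand with small lookup tables. This is a pure-Python sketch under stated assumptions rather than the module's actual code: `pool` and `forward` are hypothetical helpers, the table contents are made up, and it shows sum pooling (the pooling default in the EmbeddingBagConfig signature) next to mean pooling for comparison.

```python
from typing import Dict, List

Table = List[List[float]]  # one row of floats per id


def pool(table: Table, bag: List[int], mode: str = "sum") -> List[float]:
    """Reduce the embedding rows of the ids in `bag` to a single vector."""
    dim = len(table[0])
    pooled = [0.0] * dim  # an empty bag pools to zeros
    for idx in bag:
        for d in range(dim):
            pooled[d] += table[idx][d]
    if mode == "mean" and bag:
        pooled = [x / len(bag) for x in pooled]
    return pooled


def forward(
    tables: Dict[str, Table],
    bags: Dict[str, List[List[int]]],
    batch_size: int,
    mode: str = "sum",
) -> List[List[float]]:
    """KT[i] = concatenation over features f of pool(emb[f], J[f][i])."""
    return [
        [x for key in bags for x in pool(tables[key], bags[key][i], mode)]
        for i in range(batch_size)
    ]


# Toy tables (hypothetical values): row r of "f1" is [r, 10r], row r of "f2" is [100r].
tables = {
    "f1": [[float(r), float(10 * r)] for r in range(10)],
    "f2": [[float(100 * r)] for r in range(10)],
}
bags = {"f1": [[0, 1], [], [2]], "f2": [[3], [4], [5, 6, 7]]}

kt_sum = forward(tables, bags, batch_size=3, mode="sum")
kt_mean = forward(tables, bags, batch_size=3, mode="mean")
print(kt_sum)   # [[1.0, 10.0, 300.0], [0.0, 0.0, 400.0], [2.0, 20.0, 1800.0]]
print(kt_mean)  # [[0.5, 5.0, 300.0], [0.0, 0.0, 400.0], [2.0, 20.0, 600.0]]
```

Each output row has length 3 here (2 + 1), the sum of the two toy embedding dimensions, and the two modes differ only for bags holding more than one id.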

Parameters

tables (List[EmbeddingBagConfig]) – list of embedding tables.
is_weighted (bool) – whether the input KeyedJaggedTensor is weighted.
device (Optional[torch.device]) – default compute device.

Example:

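import torch
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

table_0 = EmbeddingBagConfig(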
    name="t1", embedding_dim=3, num_embeddings=10, feature_names=["f1"]
)
table_1 = EmbeddingBagConfig(
    name="t2", embedding_dim=4, num_embeddings=10, feature_names=["f2"]
)

ebc = EmbeddingBagCollection(tables=[table_0, table_1])

#        i = 0     i = 1    i = 2  <-- batch indices
# "f1"   [0,1]     None      [2]
# "f2"   [3]       [4]     [5,6,7]
#  ^
# features

features = KeyedJaggedTensor(
    keys=["f1", "f2"],
    values=torch.tensor([0, 1,                  2,    # feature 'f1'
                            3,      4,    5, 6, 7]),  # feature 'f2'
                    #    i = 0    i = 1    i = 2   <--- batch indices
    offsets=torch.tensor([
            0, 2, 2,       # 'f1' bags are values[0:2], values[2:2], and values[2:3]
            3, 4, 5, 8]),  # 'f2' bags are values[3:4], values[4:5], and values[5:8]
)

pooled_embeddings = ebc(features)
print(pooled_embeddings.values())
# Prints a (3, 7) tensor; the actual values depend on the random weight initialization.
# Columns 0-2: f1 pooled embeddings from bags (dim. 3)
# Columns 3-6: f2 pooled embeddings from bags (dim. 4)
# tensor([[-0.8899, -0.1342, -1.9060,  -0.0905, -0.2814, -0.9369, -0.7783],   i = 0
#         [ 0.0000,  0.0000,  0.0000,   0.1598,  0.0695,  1.3265, -0.1011],   i = 1
#         [-0.4256, -1.1846, -2.1648,  -1.0893,  0.3590, -1.9784, -0.7681]],  i = 2
#        grad_fn=<CatBackward0>)
print(pooled_embeddings.keys())
# ['f1', 'f2']
print(pooled_embeddings.offset_per_key())
# tensor([0, 3, 7])  embeddings have dimensions 3 and 4, so they occupy [0, 3) and [3, 7).
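The offsets returned by offset_per_key() mark where each feature's pooled embedding sits inside a row of the output. As a small pure-Python sketch (the helper `split_by_key` is hypothetical and works on plain lists rather than tensors), the first row of the output printed above can be split back into per-feature slices like this:

```python
from typing import Dict, List


def split_by_key(
    row: List[float], keys: List[str], offset_per_key: List[int]
) -> Dict[str, List[float]]:
    """Slice one output row into per-key embeddings using cumulative offsets."""
    return {
        key: row[offset_per_key[k]:offset_per_key[k + 1]]
        for k, key in enumerate(keys)
    }


# Row i = 0 of the example output above.
row_0 = [-0.8899, -0.1342, -1.9060, -0.0905, -0.2814, -0.9369, -0.7783]
per_key = split_by_key(row_0, ["f1", "f2"], [0, 3, 7])
print(len(per_key["f1"]), len(per_key["f2"]))  # 3 4
```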

property device: device

Returns: The compute device.
Return type: torch.device

embedding_bag_configs() → List[EmbeddingBagConfig]

Returns: The embedding bag configs.
Return type: List[EmbeddingBagConfig]

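forward(features: KeyedJaggedTensor) → KeyedTensor

Run the EmbeddingBagCollection forward pass. This method takes in a KeyedJaggedTensor and returns a KeyedTensor, which is the result of pooling the embeddings for each feature.

Parameters: features (KeyedJaggedTensor) – input KJT.
Returns: KeyedTensor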

is_weighted() → bool

Returns: Whether the EmbeddingBagCollection is weighted.
Return type: bool

reset_parameters() → None

Reset the parameters of the EmbeddingBagCollection. Parameter values are initialized with the init_fn of each EmbeddingBagConfig, if one is set.

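class torchrec.modules.embedding_modules.EmbeddingCollection(tables: List[EmbeddingConfig], device: Optional[device] = None, need_indices: bool = False)

EmbeddingCollection represents a collection of non-pooled embeddings.

Note

EmbeddingCollection is an unsharded module and is not performance optimized. For performance-sensitive scenarios, consider using the sharded version, ShardedEmbeddingCollection.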

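It is callable on arguments representing sparse data in the form of a KeyedJaggedTensor with values of shape (F, B, L[f][i]), where:

F: number of features (keys)
B: batch size
L[f][i]: length of the sparse feature (potentially different for each feature f and batch index i, that is, jagged)

and outputs a result of type Dict[Feature, JaggedTensor], where result[f] is a JaggedTensor with shape (EB[f], D[f]), where:

EB[f]: the "expanded batch size" of feature f, equal to the sum of the lengths of its bags, that is, sum([len(J[f][i]) for i in range(B)]).
D[f]: the embedding dimension of feature f.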
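To see where EB[f] comes from, the sketch below derives it from a flat offsets list in plain Python. It is an illustration only: `expanded_batch_sizes` is a hypothetical helper, and the offsets are assumed to be laid out feature-major, matching the example that follows.

```python
from typing import Dict, List


def expanded_batch_sizes(
    keys: List[str], offsets: List[int], batch_size: int
) -> Dict[str, int]:
    """EB[f] = total number of ids across the B bags of feature f."""
    sizes = {}
    for f, key in enumerate(keys):
        start = offsets[f * batch_size]      # first id of feature f
        end = offsets[(f + 1) * batch_size]  # one past its last id
        sizes[key] = end - start
    return sizes


eb = expanded_batch_sizes(["f1", "f2"], [0, 2, 2, 3, 4, 5, 8], batch_size=3)
print(eb)  # {'f1': 3, 'f2': 5}
```

With an embedding dimension of 3 for both tables, result["f1"] would then hold 3 rows and result["f2"] would hold 5 rows of 3 values each, one row per looked-up id.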

Parameters

tables (List[EmbeddingConfig]) – list of embedding tables.
device (Optional[torch.device]) – default compute device.
need_indices (bool) – whether the indices need to be passed to the final lookup dict.

Example:

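import torch
from torchrec.modules.embedding_configs import EmbeddingConfig
from torchrec.modules.embedding_modules import EmbeddingCollection
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

e1_config = EmbeddingConfig(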
    name="t1", embedding_dim=3, num_embeddings=10, feature_names=["f1"]
)
e2_config = EmbeddingConfig(
    name="t2", embedding_dim=3, num_embeddings=10, feature_names=["f2"]
)

ec = EmbeddingCollection(tables=[e1_config, e2_config])

#        i = 0     i = 1    i = 2  <-- batch indices
# "f1"   [0,1]     None      [2]
# "f2"   [3]       [4]     [5,6,7]
#  ^
# features

features = KeyedJaggedTensor.from_offsets_sync(
    keys=["f1", "f2"],
    values=torch.tensor([0, 1,                  2,    # feature 'f1'
                            3,      4,    5, 6, 7]),  # feature 'f2'
                    #    i = 0    i = 1    i = 2   <--- batch indices
    offsets=torch.tensor([
            0, 2, 2,       # 'f1' bags are values[0:2], values[2:2], and values[2:3]
            3, 4, 5, 8]),  # 'f2' bags are values[3:4], values[4:5], and values[5:8]
)

feature_embeddings = ec(features)
print(feature_embeddings['f2'].values())
# Prints a (5, 3) tensor, one row per id in the f2 bags; the actual values depend
# on the random weight initialization.
# tensor([[-0.2050,  0.5478,  0.6054],    embedding for value 3 in f2 bag values[3:4]
#         [ 0.7352,  0.3210, -3.0399],    embedding for value 4 in f2 bag values[4:5]
#         [ 0.1279, -0.1756, -0.4130],    embeddings for values 5, 6, 7
#         [ 0.7519, -0.4341, -0.0499],    in f2 bag values[5:8]
#         [ 0.9329, -1.0697, -0.8095]],
#        grad_fn=<EmbeddingBackward>)

property device: device

Returns: The compute device.
Return type: torch.device

embedding_configs() → List[EmbeddingConfig]

Returns: The embedding configs.
Return type: List[EmbeddingConfig]

embedding_dim() → int

Returns: The embedding dimension.
Return type: int

embedding_names_by_table() → List[List[str]]

Returns: The embedding names by table.
Return type: List[List[str]]

forward(features: KeyedJaggedTensor) → Dict[str, JaggedTensor]

Run the EmbeddingCollection forward pass. This method takes in a KeyedJaggedTensor and returns a Dict[str, JaggedTensor], which holds the individual (non-pooled) embeddings for each feature.

Parameters: features (KeyedJaggedTensor) – KJT of form [F X B X L].
Returns: Dict[str, JaggedTensor]

need_indices() → bool

Returns: Whether the EmbeddingCollection needs indices.
Return type: bool

reset_parameters() → None

Reset the parameters of the EmbeddingCollection. Parameter values are initialized with the init_fn of each EmbeddingConfig, if one is set.
