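Prompt Templates¶

Prompt templates are structured text templates that format user prompts to optimize model performance on specific tasks. They serve several purposes:

Model-specific templates that are required whenever the model is prompted, such as [INST] tags. Models that rely on such tags were pre-trained with them, and using the tags at inference can help ensure optimal performance.
Task-specific templates that adapt a model to the particular task it is expected to perform after training. Examples include grammar correction (GrammarErrorCorrectionTemplate), summarization (SummarizeTemplate), question answering (QuestionAnswerTemplate), and more.
Community-standardized templates, such as ChatMLTemplate.

For example, to fine-tune a model for grammar correction, you can use GrammarErrorCorrectionTemplate to add the text "Correct this to standard English: {prompt} --- Corrected: {response}" to every data sample.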

from torchtune.data import GrammarErrorCorrectionTemplate, Message

sample = {
    "incorrect": "This are a cat",
    "correct": "This is a cat.",
}
msgs = [
    Message(role="user", content=sample["incorrect"]),
    Message(role="assistant", content=sample["correct"]),
]

gec_template = GrammarErrorCorrectionTemplate()
templated_msgs = gec_template(msgs)
for msg in templated_msgs:
    print(msg.text_content)
# Correct this to standard English: This are a cat
# ---
# Corrected:
# This is a cat.

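The added text is distinct from the special tokens added by the model tokenizer. For an extended discussion of the difference between prompt templates and special tokens, see the section on tokenizing prompt templates and special tokens.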

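Using Prompt Templates¶

A prompt template is passed into the tokenizer and applied automatically to the dataset you are fine-tuning on. You can pass it in two ways:

A string dotpath to a prompt template class, e.g. "torchtune.models.mistral.MistralChatTemplate" or "path.to.my.CustomPromptTemplate"
A dictionary that maps each role to a tuple of strings indicating the text to add before and after the message content

Defining via dotpath string¶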

# In code
from torchtune.models.mistral import mistral_tokenizer

m_tokenizer = mistral_tokenizer(
    path="/tmp/Mistral-7B-v0.1/tokenizer.model",
    prompt_template="torchtune.models.mistral.MistralChatTemplate"
)

# In config
tokenizer:
  _component_: torchtune.models.mistral.mistral_tokenizer
  path: /tmp/Mistral-7B-v0.1/tokenizer.model
  prompt_template: torchtune.models.mistral.MistralChatTemplate
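
To make the dotpath form more concrete, here is a minimal sketch of how a string such as "path.to.my.CustomPromptTemplate" could be turned into a Python object. The helper name `resolve_dotpath` is hypothetical, and torchtune's own resolution logic may differ. The sketch only illustrates the general idea of importing the module part and looking up the final attribute.

```python
import importlib


def resolve_dotpath(dotpath: str):
    """Import the module part of a dotpath and return the named attribute."""
    module_path, _, attr_name = dotpath.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.attribute', got {dotpath!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# A standard-library target is used here only so the sketch runs
# without torchtune installed; it is not a prompt template.
ordered_dict_cls = resolve_dotpath("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

A consequence of this style of lookup is that a custom template class has to live in a module that is importable from the environment where the tokenizer is built.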

Defining via dictionary¶

For example, to achieve the following prompt template:

System: {content}\n
User: {content}\n
Assistant: {content}\n
Tool: {content}\n

You pass in a tuple for each role, where PREPEND_TAG is the string added before the text content and APPEND_TAG is the string added after it.

template = {role: (PREPEND_TAG, APPEND_TAG)}

The template is therefore defined as follows:

template = {
    "system": ("System: ", "\n"),
    "user": ("User: ", "\n"),
    "assistant": ("Assistant: ", "\n"),
    "ipython": ("Tool: ", "\n"),
}

Now we can pass it into the tokenizer as a dictionary:

# In code
from torchtune.models.mistral import mistral_tokenizer

template = {
    "system": ("System: ", "\n"),
    "user": ("User: ", "\n"),
    "assistant": ("Assistant: ", "\n"),
    "ipython": ("Tool: ", "\n"),
}
m_tokenizer = mistral_tokenizer(
    path="/tmp/Mistral-7B-v0.1/tokenizer.model",
    prompt_template=template,
)

# In config
tokenizer:
  _component_: torchtune.models.mistral.mistral_tokenizer
  path: /tmp/Mistral-7B-v0.1/tokenizer.model
  prompt_template:
    system:
      - "System: "
      - "\n"
    user:
      - "User: "
      - "\n"
    assistant:
      - "Assistant: "
      - "\n"
    ipython:
      - "Tool: "
      - "\n"

If you don't want to add a prepend or append tag to a role, pass an empty string "" wherever needed.
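
The effect of the dictionary form, including empty-string tags, can be sketched in plain Python. This is not torchtune's implementation: `apply_template` is a hypothetical helper, and messages are modeled as simple (role, text) tuples rather than Message objects, so that the example runs on its own.

```python
from typing import Dict, List, Tuple

# role -> (PREPEND_TAG, APPEND_TAG); an empty string means "add nothing".
Template = Dict[str, Tuple[str, str]]


def apply_template(
    template: Template, messages: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Wrap each (role, text) pair in the tags registered for its role."""
    formatted = []
    for role, text in messages:
        # Assumption of this sketch: roles missing from the template
        # pass through unchanged.
        prepend_tag, append_tag = template.get(role, ("", ""))
        formatted.append((role, f"{prepend_tag}{text}{append_tag}"))
    return formatted


template = {
    "user": ("User: ", "\n"),
    "assistant": ("", "\n"),  # empty prepend tag: nothing added before the text
}
conversation = [("user", "Hello world!"), ("assistant", "Is AI overhyped?")]
formatted = apply_template(template, conversation)
print(formatted)
```

Here the assistant message gains only the trailing newline, because its prepend tag is the empty string.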

Using the `PromptTemplate` class¶

A template dictionary can also be passed into PromptTemplate, so you can use it as a standalone custom prompt template class.

from torchtune.data import Message, PromptTemplate

def my_custom_template() -> PromptTemplate:
    return PromptTemplate(
        template={
            "user": ("User: ", "\n"),
            "assistant": ("Assistant: ", "\n"),
        },
    )

template = my_custom_template()
msgs = [
    Message(role="user", content="Hello world!"),
    Message(role="assistant", content="Is AI overhyped?"),
]
templated_msgs = template(msgs)
for msg in templated_msgs:
    print(msg.role, msg.text_content)
# user User: Hello world!
#
# assistant Assistant: Is AI overhyped?
#

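Custom Prompt Templates¶

For more advanced configuration that doesn't neatly fall into the PREPEND_TAG content APPEND_TAG pattern, you can create a new class that inherits from PromptTemplateInterface and implements the __call__ method.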

from typing import List, Protocol

from torchtune.data import Message

class PromptTemplateInterface(Protocol):
    def __call__(
        self,
        messages: List[Message],
        inference: bool = False,
    ) -> List[Message]:
        """
        Format each role's message(s) according to the prompt template

        Args:
            messages (List[Message]): a single conversation, structured as a list
                of :class:`~torchtune.data.Message` objects
            inference (bool): Whether the template is being used for inference or not.

        Returns:
            The formatted list of messages
        """
        pass

# Contrived example - make all assistant prompts say "Eureka!"
class EurekaTemplate(PromptTemplateInterface):
    def __call__(
        self,
        messages: List[Message],
        inference: bool = False,
    ) -> List[Message]:
        formatted_dialogue = []
        for message in messages:
            if message.role == "assistant":
                content = "Eureka!"
            else:
                content = message.content
            formatted_dialogue.append(
                Message(
                    role=message.role,
                    content=content,
                    masked=message.masked,
                    ipython=message.ipython,
                    eot=message.eot,
                ),
            )
        return formatted_dialogue

template = EurekaTemplate()
msgs = [
    Message(role="user", content="Hello world!"),
    Message(role="assistant", content="Is AI overhyped?"),
]
templated_msgs = template(msgs)
for msg in templated_msgs:
    print(msg.role, msg.text_content)
# user Hello world!
# assistant Eureka!
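
The same __call__ pattern can also add text around a message instead of replacing it. The sketch below is a hypothetical illustration, not the actual MistralChatTemplate or Llama2ChatTemplate: `SimpleMessage` is a stand-in for torchtune's Message so that the example runs without torchtune installed, and `InstructTemplate` and its exact tag spacing are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleMessage:
    """Hypothetical stand-in for torchtune.data.Message (role and text only)."""

    role: str
    content: str


class InstructTemplate:
    """Hypothetical template that wraps user turns in [INST] ... [/INST] tags."""

    def __call__(
        self,
        messages: List[SimpleMessage],
        inference: bool = False,  # kept to match the interface; unused in this sketch
    ) -> List[SimpleMessage]:
        formatted = []
        for message in messages:
            if message.role == "user":
                content = f"[INST] {message.content} [/INST] "
            else:
                # Other roles pass through unchanged.
                content = message.content
            formatted.append(SimpleMessage(role=message.role, content=content))
        return formatted


instruct_template = InstructTemplate()
wrapped = instruct_template(
    [
        SimpleMessage(role="user", content="Hello world!"),
        SimpleMessage(role="assistant", content="Is AI overhyped?"),
    ]
)
for msg in wrapped:
    print(msg.role, repr(msg.content))
```

A real template would build torchtune Message objects and carry over fields such as masked, ipython, and eot, as EurekaTemplate does.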

For more examples, see MistralChatTemplate or Llama2ChatTemplate.

To use a custom template such as EurekaTemplate in the tokenizer, pass it in via its dotpath string:

# In code
from torchtune.models.mistral import mistral_tokenizer

m_tokenizer = mistral_tokenizer(
    path="/tmp/Mistral-7B-v0.1/tokenizer.model",
    prompt_template="path.to.template.EurekaTemplate",
)

# In config
tokenizer:
  _component_: torchtune.models.mistral.mistral_tokenizer
  path: /tmp/Mistral-7B-v0.1/tokenizer.model
  prompt_template: path.to.template.EurekaTemplate
