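UpTrain [github || website || docs] is an open-source platform for evaluating and improving LLM applications. It grades outputs against 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives guidance on resolving them.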

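UpTrain Callback Handler

This notebook shows how the UpTrain callback handler plugs into your pipeline and runs a range of evaluations. We picked a few evaluations suited to evaluating chains; they run automatically, and the results appear in the output. More details on UpTrain's evaluations are available in the UpTrain docs. The following LangChain retrievers are used for demonstration:

1. Vanilla RAG

RAG plays a central role in retrieving context and generating responses. To check its performance and response quality, we run the following evaluations: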

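Context Relevance: Determines whether the context retrieved for the query is relevant to that query.
Factual Accuracy: Assesses whether the LLM is hallucinating or providing incorrect information.
Response Completeness: Checks whether the response contains all the information the query asks for.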

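2. Multi Query Generation

MultiQueryRetriever addresses the problem that a RAG pipeline may not return the best set of documents for a query. It generates several variants of the query that mean roughly the same as the original question, then retrieves documents for each variant. Given this added complexity, we keep the previous evaluations and add:

Multi Query Accuracy: Ensures that the generated multi-queries mean the same as the original query.

3. Context Compression and Reranking

Reranking reorders nodes by their relevance to the query and keeps the top n. Because the number of nodes can shrink once reranking is complete, we run the following evaluations:

Context Reranking: Checks whether the order of the reranked nodes is more relevant to the query than the original order.
Context Conciseness: Examines whether the reduced set of nodes still provides all the required information.

Together, these evaluations help ensure the robustness and effectiveness of the RAG, MultiQueryRetriever, and reranking steps in the chain.

Install Dependencies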

pip install -qU langchain langchain-classic langchain_openai langchain-community uptrain faiss-cpu flashrank


Note: To use the GPU-enabled version of the library, you can install faiss-gpu instead of faiss-cpu.

Import Libraries

from getpass import getpass

from langchain_classic.chains import RetrievalQA
from langchain_classic.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_classic.retrievers.document_compressors import FlashrankRerank
from langchain_classic.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

Load the documents

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

Split the document into chunks

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
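
To make chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size character chunking in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences); the helper name split_fixed is hypothetical and only illustrates what the two parameters control.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Hypothetical helper for illustration only. Consecutive windows share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # no overlap
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))  # windows share 2 characters
```

With chunk_overlap=0, as in this notebook, each character lands in exactly one chunk; a non-zero overlap repeats text across neighbouring chunks so that sentences cut at a boundary still appear whole in one of them.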

Create the retriever

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()
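
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below shows that idea with hand-written 2-D vectors and cosine similarity in plain Python. It is an illustration only: real embeddings come from OpenAIEmbeddings, and FAISS uses its own index structures and distance metric, which may not be cosine similarity. The helper names cosine and top_k are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings.
toy_chunks = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], toy_chunks, k=2))
```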

Define the LLM

llm = ChatOpenAI(temperature=0, model="gpt-4")

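Setup

UpTrain provides you with:

Dashboards with advanced drill-down and filtering options
Insights and common topics among failing cases
Observability and real-time monitoring of production data
Regression testing via seamless integration with your CI/CD pipelines

You can choose between the following options for evaluating with UpTrain:

1. UpTrain's Open-Source Software (OSS)

You can use the open-source evaluation service to evaluate your model. In this case, you need to provide an OpenAI API key, because UpTrain uses GPT models to evaluate the responses generated by the LLM. You can get a key from your OpenAI account. To view your evaluations in the UpTrain dashboard, set it up by running the following commands in your terminal: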

git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh

This starts the UpTrain dashboard on your local machine, available at http://localhost:3000/dashboard. Parameters:

key_type="openai"
api_key="OPENAI_API_KEY"
project_name="PROJECT_NAME"

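2. UpTrain Managed Service and Dashboards

Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account and get free trial credits. If you need more trial credits, you can book a call with the maintainers of UpTrain. Benefits of using the managed service:

No need to set up the UpTrain dashboard on your local machine.
Access to many LLMs without needing their API keys.

Once you have run the evaluations, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard. Parameters:

key_type="uptrain"
api_key="UPTRAIN_API_KEY"
project_name="PROJECT_NAME"

Note: project_name is the name of the project under which the evaluation results appear in the UpTrain dashboard.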
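
The two parameter sets above differ only in key_type and in which key is supplied. As a minimal sketch, the hypothetical helper below (handler_kwargs is not part of UpTrain or LangChain) validates the choice and returns the keyword arguments that this notebook passes to UpTrainCallbackHandler. Restricting key_type to the two values listed above is an assumption of the sketch, and the strings used in the example are the placeholders from the parameter lists, not real keys.

```python
ALLOWED_KEY_TYPES = ("openai", "uptrain")  # the two options described above


def handler_kwargs(key_type: str, api_key: str, project_name: str) -> dict:
    """Build keyword arguments for UpTrainCallbackHandler (hypothetical helper)."""
    if key_type not in ALLOWED_KEY_TYPES:
        raise ValueError(f"key_type must be one of {ALLOWED_KEY_TYPES}, got {key_type!r}")
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"key_type": key_type, "api_key": api_key, "project_name": project_name}


kwargs = handler_kwargs("openai", "OPENAI_API_KEY", "PROJECT_NAME")
print(kwargs)
# Once the handler class is imported, the dictionary can be unpacked into it:
# uptrain_callback = UpTrainCallbackHandler(**kwargs)
```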

Set the API key

The notebook prompts you to enter the API key. To switch between an OpenAI API key and an UpTrain API key, change the KEY_TYPE value in the cell below.

KEY_TYPE = "openai"  # or "uptrain"
API_KEY = getpass()
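
For non-interactive runs (for example in CI), prompting with getpass is inconvenient. The sketch below reads the key from an environment variable first and only prompts as a fallback. The helper resolve_api_key and the mapping of key types to the variable names OPENAI_API_KEY and UPTRAIN_API_KEY are assumptions chosen for illustration; this notebook does not define them.

```python
import os
from getpass import getpass

# Assumed variable names; adjust to whatever your environment uses.
ENV_VARS = {"openai": "OPENAI_API_KEY", "uptrain": "UPTRAIN_API_KEY"}


def resolve_api_key(key_type, env=None, prompt=getpass):
    """Return the key from the environment, prompting only if it is missing."""
    env = os.environ if env is None else env
    var_name = ENV_VARS[key_type]  # raises KeyError for an unknown key_type
    value = env.get(var_name, "").strip()
    return value or prompt(f"{var_name}: ")


# Example with an explicit mapping, so nothing is read from the real environment.
print(resolve_api_key("uptrain", env={"UPTRAIN_API_KEY": "demo-key"}))
```

In the notebook this could replace the prompt with API_KEY = resolve_api_key(KEY_TYPE).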

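1. Vanilla RAG

The UpTrain callback handler automatically captures the query, context, and response once they are generated, and runs the following three evaluations (each graded from 0 to 1):

Context Relevance: Checks whether the context retrieved for the query is relevant to that query.
Factual Accuracy: Checks how factually accurate the response is.
Response Completeness: Checks whether the response contains all the information the query asks for.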

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

# Create the chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Create the uptrain callback handler
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
response = chain.invoke(query, config=config)

2024-04-17 17:03:44.969 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:05.809 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that she is a former top litigator in private practice, a former federal public defender, and comes from a family of public school educators and police officers. He described her as a consensus builder and noted that since her nomination, she has received a broad range of support from various groups, including the Fraternal Order of Police and former judges appointed by both Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

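2. Multi Query Generation

MultiQueryRetriever addresses the problem that a RAG pipeline may not return the best set of documents for a query. It generates several queries that mean roughly the same as the original question, then retrieves documents for each of them. To evaluate this retriever, UpTrain runs the following evaluation:

Multi Query Accuracy: Checks whether the generated multi-queries mean the same as the original query.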
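
To make the retrieval step concrete, here is a minimal sketch of merging per-query results into one de-duplicated list, which is the general idea behind retrieving for several query variants. The fake_results dictionary, its chunk identifiers, and the unique_union helper are hypothetical stand-ins: a real run calls the LLM to generate the variants and the vector store to fetch documents.

```python
def unique_union(results_per_query):
    """Merge result lists, keeping the first occurrence of each document."""
    seen = set()
    merged = []
    for results in results_per_query:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical retrieval results keyed by query variant (chunk ids are made up).
fake_results = {
    "How did the president comment on Ketanji Brown Jackson?": ["chunk-12", "chunk-3"],
    "What were the president's remarks regarding Ketanji Brown Jackson?": ["chunk-3", "chunk-7"],
    "What statements has the president made about Ketanji Brown Jackson?": ["chunk-12", "chunk-9"],
}

merged = unique_union(list(fake_results.values()))
print(merged)
```

Variants that drift from the original meaning pull in unrelated chunks at this step, which is why the accuracy of the generated queries is worth checking.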

# Create the retriever
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

chain = (
    {"context": multi_query_retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Invoke the chain with a query
question = "What did the president say about Ketanji Brown Jackson"
response = chain.invoke(question, config=config)

2024-04-17 17:04:10.675 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:16.804 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Multi Queries:
  - How did the president comment on Ketanji Brown Jackson?
  - What were the president's remarks regarding Ketanji Brown Jackson?
  - What statements has the president made about Ketanji Brown Jackson?

Multi Query Accuracy Score: 0.5

2024-04-17 17:04:22.027 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:44.033 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that since her nomination, she has received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

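3. Context Compression and Reranking

Reranking reorders nodes by their relevance to the query and keeps the top n. Because the number of nodes can shrink once reranking is complete, we run the following evaluations:

Context Reranking: Checks whether the order of the reranked nodes is more relevant to the query than the original order.
Context Conciseness: Checks whether the reduced set of nodes still provides all the required information.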
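
The rerank-then-truncate step can be sketched in a few lines. FlashrankRerank scores documents with a learned ranking model; the word_overlap function below is a deliberately crude stand-in for that model, and both it and the rerank helper are hypothetical names used only to show the reorder-and-keep-top-n mechanics. The sample documents are invented.

```python
def word_overlap(query: str, doc: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def rerank(query, docs, score, top_n):
    """Sort docs by descending score against the query and keep the first top_n."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]


query = "What did the president say about Ketanji Brown Jackson"
docs = [
    "It was a mild day",
    "The president nominated Ketanji Brown Jackson",
    "Jackson is a former public defender",
]
print(rerank(query, docs, word_overlap, top_n=2))
```

Dropping everything past top_n is what makes the conciseness check necessary: a document that held part of the answer may no longer be in the context.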

# Create the retriever
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# Create the chain
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
result = chain.invoke(query, config=config)

2024-04-17 17:04:46.462 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:53.561 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson

Context Conciseness Score: 0.0
Context Reranking Score: 1.0

2024-04-17 17:04:56.947 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:05:16.551 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The President mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5
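
Scores like these are easier to act on with a threshold. The sketch below flags checks that fall below a cutoff, using the scores from the reranking run copied by hand from the output above. The helper flag_low_scores and both threshold values are hypothetical; this notebook does not prescribe a cutoff.

```python
def flag_low_scores(scores, threshold):
    """Return the names of checks scoring below the threshold, in input order."""
    return [name for name, score in scores.items() if score < threshold]


# Scores copied from the reranking run above.
reranking_run = {
    "Context Conciseness": 0.0,
    "Context Reranking": 1.0,
    "Context Relevance": 1.0,
    "Factual Accuracy": 1.0,
    "Response Completeness": 0.5,
}

print(flag_low_scores(reranking_run, threshold=0.5))
print(flag_low_scores(reranking_run, threshold=0.75))
```

A stricter cutoff also surfaces the partially complete response, which is a reasonable prompt to inspect whether the reranker dropped context the answer needed.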

UpTrain's Dashboard and Insights

Here's a short video showcasing the dashboard and the insights:
