RunPod integration

Get started with RunPod chat models.

Overview

This guide shows how to use LangChain's ChatRunPod class to interact with chat models hosted on RunPod Serverless.

Setup

Install the package:
```
pip install -qU langchain-runpod
```
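Deploy a chat model endpoint: Follow the setup steps in the RunPod provider guide to deploy a compatible chat model endpoint on RunPod Serverless and obtain its endpoint ID.

Set environment variables: Make sure RUNPOD_API_KEY and RUNPOD_ENDPOINT_ID (or the chat-specific RUNPOD_CHAT_ENDPOINT_ID) are set.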

import getpass
import os

# Make sure environment variables are set (or pass them directly to ChatRunPod)
if "RUNPOD_API_KEY" not in os.environ:
    os.environ["RUNPOD_API_KEY"] = getpass.getpass("Enter your RunPod API Key: ")

if "RUNPOD_ENDPOINT_ID" not in os.environ:
    os.environ["RUNPOD_ENDPOINT_ID"] = input(
        "Enter your RunPod Endpoint ID (used if RUNPOD_CHAT_ENDPOINT_ID is not set): "
    )

# Optionally use a different endpoint ID specifically for chat models
# if "RUNPOD_CHAT_ENDPOINT_ID" not in os.environ:
#     os.environ["RUNPOD_CHAT_ENDPOINT_ID"] = input("Enter your RunPod Chat Endpoint ID (Optional): ")

chat_endpoint_id = os.environ.get(
    "RUNPOD_CHAT_ENDPOINT_ID", os.environ.get("RUNPOD_ENDPOINT_ID")
)
if not chat_endpoint_id:
    raise ValueError(
        "No RunPod Endpoint ID found. Please set RUNPOD_ENDPOINT_ID or RUNPOD_CHAT_ENDPOINT_ID."
    )

Instantiation

Initialize the ChatRunPod class. You can pass model-specific parameters through model_kwargs and configure the polling behavior.

from langchain_runpod import ChatRunPod

chat = ChatRunPod(
    runpod_endpoint_id=chat_endpoint_id,  # Specify the correct endpoint ID
    model_kwargs={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        # Add other parameters supported by your endpoint handler
    },
    # Optional: Adjust polling
    # poll_interval=0.2,
    # max_polling_attempts=150
)
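
The two commented-out polling options point to a submit-then-poll pattern: the client checks the job status repeatedly, waits `poll_interval` seconds between checks, and gives up after `max_polling_attempts` checks. The sketch below illustrates that pattern in isolation. It is not the library's implementation: `poll_until_done`, `fetch_status`, and the `COMPLETED`/`FAILED` status names are assumptions made for illustration. With the values shown above (0.2 s and 150 attempts), a loop like this would wait roughly 30 seconds at most, not counting request time.

```python
import time
from typing import Callable, Dict

# Assumed names for statuses that end polling; check your endpoint's responses.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 0.2,
    max_polling_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status() until it reports a terminal status or attempts run out."""
    for attempt in range(max_polling_attempts):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        # Wait before the next check, but not after the final attempt.
        if attempt < max_polling_attempts - 1:
            sleep(poll_interval)
    raise TimeoutError(
        f"Job not finished after {max_polling_attempts} polling attempts"
    )


# Usage with a stand-in status function (no network calls are made).
_responses = iter(
    [
        {"status": "IN_QUEUE"},
        {"status": "IN_PROGRESS"},
        {"status": "COMPLETED", "output": "done"},
    ]
)
result = poll_until_done(lambda: next(_responses), poll_interval=0.0)
print(result["output"])
```

Raising either value lengthens the total wait budget, which may help with slow cold starts, while a shorter interval trades extra requests for lower latency.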

Invocation

Call the model with the standard LangChain .invoke() and .ainvoke() methods. Streaming is also supported through .stream() and .astream(), and is simulated by polling the RunPod /stream endpoint.

from langchain.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What is the RunPod Serverless API flow?"),
]

# Invoke (Sync)
try:
    response = chat.invoke(messages)
    print("--- Sync Invoke Response ---")
    print(response.content)
except Exception as e:
    print(
        f"Error invoking Chat Model: {e}. Ensure endpoint ID/API key are correct and endpoint is active/compatible."
    )

# Stream (Sync, simulated via polling /stream)
print("\n--- Sync Stream Response ---")
try:
    for chunk in chat.stream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model: {e}. Ensure endpoint handler supports streaming output format."
    )

### Async Usage

# AInvoke (Async)
try:
    async_response = await chat.ainvoke(messages)
    print("--- Async Invoke Response ---")
    print(async_response.content)
except Exception as e:
    print(f"Error invoking Chat Model asynchronously: {e}.")

# AStream (Async)
print("\n--- Async Stream Response ---")
try:
    async for chunk in chat.astream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model asynchronously: {e}. Ensure endpoint handler supports streaming output format.\n"
    )
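
To make the simulated streaming more concrete, the sketch below shows how text chunks could be pulled out of successive `/stream` poll responses. It assumes each response is a dict with a `status` field and a `stream` list of `{"output": ...}` items, which is the shape this guide's feature table describes. `iter_stream_chunks` and the sample payloads are hypothetical and are not part of `langchain-runpod`.

```python
from typing import Dict, Iterable, Iterator


def iter_stream_chunks(poll_responses: Iterable[Dict]) -> Iterator[str]:
    """Yield text chunks from a sequence of /stream poll responses."""
    for response in poll_responses:
        for item in response.get("stream", []):
            output = item.get("output")
            # Skip empty or non-text items rather than failing mid-stream.
            if isinstance(output, str) and output:
                yield output
        # Stop once the job reports a terminal status (assumed status names).
        if response.get("status") in ("COMPLETED", "FAILED"):
            break


# Stand-in poll responses; a real client would fetch these over HTTP.
sample_polls = [
    {"status": "IN_PROGRESS", "stream": [{"output": "Hello"}, {"output": ", "}]},
    {"status": "IN_PROGRESS", "stream": []},
    {"status": "COMPLETED", "stream": [{"output": "world"}]},
    {"status": "COMPLETED", "stream": [{"output": "ignored"}]},
]

text = "".join(iter_stream_chunks(sample_polls))
print(text)
```

Because chunks only arrive when a poll returns, the perceived latency of this style of streaming is bounded below by the polling interval.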

Chaining

The chat model integrates seamlessly with LangChain Expression Language (LCEL) chains.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
    ]
)

parser = StrOutputParser()

chain = prompt | chat | parser

try:
    chain_response = chain.invoke(
        {"input": "Explain the concept of serverless computing in simple terms."}
    )
    print("--- Chain Response ---")
    print(chain_response)
except Exception as e:
    print(f"Error running chain: {e}")


# Async chain
try:
    async_chain_response = await chain.ainvoke(
        {"input": "What are the benefits of using RunPod for AI/ML workloads?"}
    )
    print("--- Async Chain Response ---")
    print(async_chain_response)
except Exception as e:
    print(f"Error running async chain: {e}")

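Model features (endpoint dependent)

The availability of advanced features depends heavily on how your RunPod endpoint handler is implemented. The ChatRunPod integration provides the basic framework, but the handler must support the underlying functionality.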

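| Feature | Integration support | Endpoint dependent? | Notes |
| --- | --- | --- | --- |
| Tool calling | ❌ | ✅ | Requires the handler to process tool definitions and return tool calls (e.g., in OpenAI format). The integration needs parsing logic. |
| Structured output | ❌ | ✅ | |
| JSON mode | ❌ | ✅ | Requires the handler to accept a `json_mode` parameter (or similar) and guarantee JSON output. |
| Image input | ❌ | ✅ | Requires a multimodal handler that accepts image data (e.g., base64). The integration does not support multimodal messages. |
| Audio input | ❌ | ✅ | Requires a handler that accepts audio data. The integration does not support audio messages. |
| Video input | ❌ | ✅ | Requires a handler that accepts video data. The integration does not support video messages. |
| Streaming | ✅ (simulated) | ✅ | Polls `/stream`. Requires the handler to populate the `stream` list in the status response with token chunks (e.g., `[{"output": "token"}]`). True low-latency streaming is not built in. |
| Async | ✅ | ✅ | Core `ainvoke`/`astream` are implemented. Depends on the performance of the endpoint handler. |
| Token usage | ❌ | ✅ | Requires the handler to return `prompt_tokens` and `completion_tokens` in the final response. The integration does not currently parse this. |
| Log probabilities | ❌ | ✅ | Requires the handler to return log probabilities. The integration does not currently parse this. |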

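Key takeaway: Standard chat invocation and simulated streaming work as long as the endpoint follows the basic RunPod API conventions. Advanced features require specific handler implementations and may require extending or customizing this integration package.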
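
As one example of such an extension, token usage could be normalized by hand if your handler reports it. The sketch below assumes the handler places `prompt_tokens` and `completion_tokens` under a `usage` key in its final output. That location, the `extract_usage` helper, and the sample payload are all assumptions, so adapt them to what your handler actually returns. The output keys mirror the `input_tokens`, `output_tokens`, and `total_tokens` fields that LangChain uses for usage metadata.

```python
from typing import Dict, Optional


def extract_usage(final_output: Dict) -> Optional[Dict[str, int]]:
    """Map handler-reported token counts to LangChain-style usage keys.

    Returns None when the counts are missing or malformed.
    """
    usage = final_output.get("usage")  # assumed location of the counts
    if not isinstance(usage, dict):
        return None
    try:
        prompt = int(usage["prompt_tokens"])
        completion = int(usage["completion_tokens"])
    except (KeyError, TypeError, ValueError):
        return None
    return {
        "input_tokens": prompt,
        "output_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Hypothetical final output from a handler that reports usage.
sample_output = {
    "text": "Serverless means you do not manage servers.",
    "usage": {"prompt_tokens": 12, "completion_tokens": 30},
}
usage = extract_usage(sample_output)
print(usage)
```

Returning `None` instead of raising keeps a missing `usage` field from breaking an otherwise successful call.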

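API reference

For detailed documentation of the ChatRunPod class, its parameters, and its methods, refer to the source code or the generated API reference (if available). Source code: https://github.com/runpod/langchain-runpod/blob/main/langchain_runpod/chat_models.py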
