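Xorbits Inference (Xinference) integration

This page shows how to use Xinference with LangChain. Xinference is a powerful and versatile library designed to serve LLMs, speech recognition models, and multimodal models, even on a laptop. With Xorbits Inference, you can deploy and serve your own models or state-of-the-art built-in models with a single command.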

Installation and setup

Install Xinference from PyPI with pip:

pip install "xinference[all]"

LLM

Xinference supports a variety of GGML-compatible models, including chatglm, baichuan, whisper, vicuna, and orca. To list the built-in models, run:

xinference list --all

Xinference wrapper

To start a local Xinference instance, run:

xinference

You can also deploy Xinference in a distributed cluster. To do so, first start the Xinference supervisor on the server where you want it to run:

xinference-supervisor -H "${supervisor_host}"

Then start an Xinference worker on each of the other servers you want to use:

xinference-worker -e "http://${supervisor_host}:9997"
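In the cluster setup, the supervisor and the workers run different commands, and each worker must be pointed at the supervisor's address on port 9997. The sketch below assembles both command lines as argument lists, for example to pass to subprocess.run on each machine. The helper names supervisor_argv and worker_argv and the example address are hypothetical; only the command names and the -H and -e flags come from this page.

```python
import shlex
from typing import List

# Port taken from the worker command above; adjust it if your supervisor
# listens on a different port (an assumption, not checked here).
SUPERVISOR_PORT = 9997


def supervisor_argv(supervisor_host: str) -> List[str]:
    """Hypothetical helper: argv that starts the supervisor on supervisor_host."""
    return ["xinference-supervisor", "-H", supervisor_host]


def worker_argv(supervisor_host: str, port: int = SUPERVISOR_PORT) -> List[str]:
    """Hypothetical helper: argv for a worker that connects to the supervisor."""
    return ["xinference-worker", "-e", f"http://{supervisor_host}:{port}"]


if __name__ == "__main__":
    host = "192.168.1.10"  # hypothetical supervisor address
    # Print copy-pasteable shell commands. On each machine you could instead
    # run the matching list with subprocess.run(argv, check=True).
    print(shlex.join(supervisor_argv(host)))
    print(shlex.join(worker_argv(host)))
```

Keeping the commands as lists rather than formatted strings avoids shell-quoting issues if the host name ever contains unusual characters.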

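Once Xinference is running, a model-management endpoint is available through the CLI or the Xinference client. For a local deployment the endpoint is http://localhost:9997; for a cluster deployment it is http://${supervisor_host}:9997. Next, launch a model. You can specify the model name and other attributes, including model_size_in_billions and quantization. For example, with the command-line interface (CLI):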

xinference launch -n orca -s 3 -q q4_0
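In this example the short flags appear to map to the attributes named above: -n to the model name, -s to model_size_in_billions, and -q to quantization. That mapping is inferred from this one command, so check xinference launch --help for your version. A sketch that assembles the same command from those attributes, using a hypothetical helper named launch_argv:

```python
import shlex
from typing import List, Optional


def launch_argv(
    model_name: str,
    size_in_billions: Optional[int] = None,
    quantization: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an `xinference launch` argument list.

    The -n / -s / -q mapping is inferred from the example command above.
    """
    argv = ["xinference", "launch", "-n", model_name]
    if size_in_billions is not None:
        argv += ["-s", str(size_in_billions)]
    if quantization is not None:
        argv += ["-q", quantization]
    return argv


# Reproduces the example: xinference launch -n orca -s 3 -q q4_0
print(shlex.join(launch_argv("orca", size_in_billions=3, quantization="q4_0")))
```

Optional attributes are omitted from the command when not given, leaving their defaults to the CLI.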

The xinference launch command returns a model UID, which you pass to the LangChain wrapper. Example usage:

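from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://localhost:9997",  # local endpoint; use http://${supervisor_host}:9997 for a cluster
    model_uid="{model_uid}",  # replace with the model UID returned when launching the model
)

# generate_config is forwarded to the model; with "stream": True the wrapper
# collects the streamed tokens and returns the combined text
answer = llm.invoke(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
print(answer)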
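The wrapper talks to the Xinference server over HTTP. As an assumption to verify against the Xinference REST API documentation, the server also exposes an OpenAI-compatible completions endpoint at /v1/completions that takes the model UID in the model field. The sketch below only builds such a request with the standard library, merging generation options like max_tokens into the body; it does not send anything. The helper name build_completion_request and the UID are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_completion_request(
    server_url: str,
    model_uid: str,
    prompt: str,
    generate_config: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Hypothetical helper: a POST to the assumed /v1/completions endpoint."""
    body: Dict[str, Any] = {"model": model_uid, "prompt": prompt}
    # Generation options (e.g. max_tokens) go at the top level of the JSON body.
    body.update(generate_config or {})
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "http://localhost:9997",
    "my-model-uid",  # hypothetical; use the UID returned by xinference launch
    "Q: where can we visit in the capital of France? A:",
    {"max_tokens": 1024},
)
print(req.get_method(), req.full_url)

# To send it to a running server (assuming an OpenAI-style response shape):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

The sketch leaves out "stream": True on purpose: a streamed reply arrives in chunks rather than as a single JSON document, so it would need different response handling.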

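Usage

For more information and detailed examples, see the xinference LLM example.

Embeddings

Xinference also supports embedding queries and documents. See the xinference embeddings example for a more detailed demo.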
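A common use of query and document embeddings is ranking documents by similarity to a query. The sketch below relies only on the standard LangChain Embeddings methods embed_query and embed_documents, which the XinferenceEmbeddings class in langchain_community.embeddings implements. The names rank_documents and KeywordEmbeddings are hypothetical; the toy keyword-counting class stands in for a real model so the example runs without a server.

```python
import math
from typing import List, Protocol, Sequence, Tuple


class SupportsEmbeddings(Protocol):
    """The two LangChain Embeddings methods this sketch relies on."""

    def embed_query(self, text: str) -> List[float]: ...

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    embedder: SupportsEmbeddings, query: str, docs: List[str]
) -> List[Tuple[float, str]]:
    """Hypothetical helper: sort docs by cosine similarity to the query."""
    q = embedder.embed_query(query)
    vectors = embedder.embed_documents(docs)
    scored = [(cosine(q, v), d) for v, d in zip(vectors, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


class KeywordEmbeddings:
    """Toy stand-in: counts a few keywords so the example runs offline."""

    VOCAB = ("paris", "louvre", "python")

    def embed_query(self, text: str) -> List[float]:
        words = text.lower().split()
        return [float(words.count(w)) for w in self.VOCAB]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]


docs = ["the louvre is in paris", "python is a language"]
for score, doc in rank_documents(KeywordEmbeddings(), "museums in paris", docs):
    print(f"{score:.2f}  {doc}")
```

With a running server, you would pass an XinferenceEmbeddings instance (configured with your server URL and the UID of a launched embedding model) in place of KeywordEmbeddings.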

Install the Xinference LangChain partner package

Install the integration package with:

pip install langchain-xinference

Chat models

from langchain_xinference.chat_models import ChatXinference

LLM

from langchain_xinference.llms import Xinference
