MLX local pipelines integration

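MLX models can be run locally through the MLXPipeline class. The MLX Community hosts more than 150 models, all open source and publicly available on the Hugging Face Model Hub, a platform where people can easily collaborate and build ML projects together. These models can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the MLXPipeline class. For more information on MLX, see the examples repo notebook. Before use, install the mlx-lm Python package along with transformers; you can optionally install huggingface_hub as well.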

pip install -qU mlx-lm transformers huggingface_hub
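
As a quick sanity check after installation, the following minimal sketch reports which of the packages above cannot be found in the current environment. It uses only the standard library; `REQUIRED_MODULES` and `missing_modules` are illustrative names, not part of LangChain or MLX.

```python
from importlib.util import find_spec

# Import names for the packages installed above (pip name mlx-lm imports as mlx_lm).
REQUIRED_MODULES = ("mlx_lm", "transformers", "huggingface_hub")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty list means all three packages can be located; it does not verify that a model can actually be loaded.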

Load a model

Models can be loaded by specifying the model parameters with the from_model_id method.

from langchain_community.llms.mlx_pipeline import MLXPipeline

pipe = MLXPipeline.from_model_id(
    "mlx-community/quantized-gemma-2b-it",
    pipeline_kwargs={"max_tokens": 10, "temp": 0.1},
)
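
Because loading places the model in memory, it can help to load once and reuse the result. The following is a minimal sketch under that assumption, not a LangChain feature: `get_pipe` is a hypothetical helper that wraps the same `from_model_id` call in `functools.lru_cache`, and the import is deferred so the module itself can be imported on machines without MLX installed.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_pipe(model_id="mlx-community/quantized-gemma-2b-it"):
    """Load the pipeline on first call and return the cached instance afterwards."""
    # Deferred import: only needed when a model is actually loaded.
    from langchain_community.llms.mlx_pipeline import MLXPipeline

    return MLXPipeline.from_model_id(
        model_id,
        pipeline_kwargs={"max_tokens": 10, "temp": 0.1},
    )
```

Repeated calls such as `get_pipe()` then return the same object. With `maxsize=1`, requesting a different model id evicts the previously cached pipeline.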

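A model can also be loaded by passing in an existing model and tokenizer directly, for example the pair returned by the load function from mlx_lm.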

from mlx_lm import load

model, tokenizer = load("mlx-community/quantized-gemma-2b-it")
pipe = MLXPipeline(model=model, tokenizer=tokenizer)

Create a chain

With the model loaded into memory, you can compose it with a prompt to form a chain.

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | pipe

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))
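
To make the data flow concrete, here is a minimal plain-Python sketch of what `prompt | pipe` does conceptually: fill the template, then hand the resulting string to the model. It uses neither LangChain nor MLX. `fake_llm` is a hypothetical stand-in for the loaded pipeline, and `format_prompt` and `run_chain` are illustrative helpers rather than library functions.

```python
# Plain-Python illustration of the prompt -> model flow; no LangChain or MLX needed.
TEMPLATE = """Question: {question}

Answer: Let's think step by step."""


def format_prompt(question: str) -> str:
    """Substitute the question into the template, as the prompt step would."""
    return TEMPLATE.format(question=question)


def fake_llm(prompt_text: str) -> str:
    """Hypothetical stand-in for the model: describes the prompt instead of generating."""
    return f"<completion for a {len(prompt_text)}-character prompt>"


def run_chain(inputs: dict) -> str:
    """Mimic `prompt | pipe`: format first, then call the model on the result."""
    return fake_llm(format_prompt(inputs["question"]))


if __name__ == "__main__":
    print(format_prompt("What is electroencephalography?"))
    print(run_chain({"question": "What is electroencephalography?"}))
```

Printing the formatted prompt this way is a simple way to inspect the exact text a model would receive before running a real generation.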
