NanoPQ (Product Quantization) integration

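Product quantization (PQ) is a quantization algorithm that compresses the vectors stored in a database, which makes approximate k-nearest-neighbor (k-NN) semantic search practical on large datasets. In brief, each embedding is split into M sub-vectors, one per subspace, and the sub-vectors in each subspace are clustered. Each sub-vector is then represented by the centroid of the cluster it falls into, so a full embedding can be stored as M small centroid indices.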
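The split, cluster, and encode steps described above can be sketched in a few lines of plain Python. This is an illustrative toy, not how nanopq is implemented (nanopq works on NumPy arrays); the function names `split`, `kmeans`, `train`, `encode`, and `approx_dist` and the sample vectors are made up for this example.

```python
import random

def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=123):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train(vectors, m, k):
    """Build one codebook of k centroids per subspace."""
    parts = [split(v, m) for v in vectors]
    return [kmeans([p[j] for p in parts], k) for j in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
        for sub, cb in zip(subs, codebooks)
    ]

def approx_dist(query, code, codebooks):
    """Distance from an exact query to a quantized (encoded) vector."""
    subs = split(query, len(codebooks))
    return sum(sq_dist(s, cb[i]) for s, cb, i in zip(subs, codebooks, code))

# Toy 4-dimensional "embeddings", split into M=2 subspaces with 2 clusters each.
vectors = [
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 5.1, 5.0],
    [4.0, 4.1, 0.0, 0.1],
    [4.1, 4.0, 0.1, 0.0],
]
codebooks = train(vectors, m=2, k=2)
codes = [encode(v, codebooks) for v in vectors]
print(codes)  # each vector is now stored as 2 small integers
```

At query time, a search ranks the stored codes by `approx_dist` against the query embedding instead of comparing against the full vectors, which is where the memory savings come from.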

This notebook shows how to use a retriever that relies on product quantization under the hood, as implemented by the nanopq package.

pip install -qU langchain-community langchain-openai nanopq spacy
python -m spacy download en_core_web_sm

from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.retrievers import NanoPQRetriever

Create a new retriever from texts

retriever = NanoPQRetriever.from_texts(
    ["Great world", "great words", "world", "planets of the world"],
    SpacyEmbeddings(model_name="en_core_web_sm"),
    clusters=2,
    subspace=2,
)
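When choosing `subspace` and `clusters`, it can help to sanity-check them before building the retriever. The sketch below assumes two constraints that nanopq is understood to enforce during training: the embedding dimension must be divisible by the number of subspaces, and there must be more training vectors than clusters. `check_pq_params` is a hypothetical helper, not part of LangChain or nanopq, and `dim=96` is only an illustrative embedding size.

```python
def check_pq_params(n_texts, dim, subspace, clusters):
    """Return a list of problems with the given PQ settings (empty if none)."""
    problems = []
    # Assumption: each embedding must split evenly into `subspace` parts.
    if dim % subspace != 0:
        problems.append(f"dim={dim} is not divisible by subspace={subspace}")
    # Assumption: k-means needs more training vectors than clusters.
    if clusters >= n_texts:
        problems.append(
            f"clusters={clusters} must be smaller than the number of texts ({n_texts})"
        )
    return problems

# Settings similar to the example above: 4 texts, 2 subspaces, 2 clusters.
print(check_pq_params(n_texts=4, dim=96, subspace=2, clusters=2))

# A configuration that violates both assumed constraints.
print(check_pq_params(n_texts=4, dim=96, subspace=5, clusters=8))
```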

Use the retriever

The retriever is now ready to use.

retriever.invoke("earth")

M: 2, Ks: 2, metric : <class 'numpy.uint8'>, code_dtype: l2
iter: 20, seed: 123
Training the subspace: 0 / 2
Training the subspace: 1 / 2
Encoding the subspace: 0 / 2
Encoding the subspace: 1 / 2

[Document(page_content='world'),
 Document(page_content='Great world'),
 Document(page_content='great words'),
 Document(page_content='planets of the world')]
