PGVector integration - Docs by LangChain

An implementation of LangChain's vector store abstraction that uses postgres as the backend and relies on the pgvector extension.

The code lives in an integration package called langchain-postgres.

Status

This code has been ported from langchain-community into a dedicated package called langchain-postgres. The following changes have been made:

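langchain-postgres works only with psycopg3. Please update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, the driver name is psycopg, not psycopg3, but it will use psycopg3).
The schema of the embedding store and the collection has changed so that add_documents works correctly with user-specified IDs.
You must now pass an explicit connection object.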
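If you have existing connection strings written for the old psycopg2 driver, the scheme change described in the first item can be applied mechanically. Below is a minimal standard-library sketch; `to_psycopg3_url` is a hypothetical helper name for illustration, not a function shipped by langchain-postgres.

```python
from urllib.parse import urlsplit, urlunsplit


def to_psycopg3_url(url: str) -> str:
    """Rewrite a postgresql+psycopg2:// URL into the postgresql+psycopg:// form.

    Hypothetical helper: langchain-postgres does not provide this function.
    URLs that already use another scheme are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "postgresql+psycopg2":
        # The dialect name for psycopg3 is "psycopg", not "psycopg3".
        parts = parts._replace(scheme="postgresql+psycopg")
    return urlunsplit(parts)


print(to_psycopg3_url("postgresql+psycopg2://langchain:langchain@localhost:6024/langchain"))
# postgresql+psycopg://langchain:langchain@localhost:6024/langchain
```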

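Currently, there is no mechanism that supports easy data migration on schema changes. Any schema change in the vector store will require you to recreate the tables and re-add the documents. If this is a concern, please use a different vector store; if not, this implementation should be fine for your use case.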

Setup

First, install the partner package:

pip install -qU langchain-postgres

You can run the following command to spin up a postgres container with the pgvector extension:

docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
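The flags in this command determine the connection string used later on this page: the user, password, and database come from the POSTGRES_* variables, and -p 6024:5432 publishes the container's port 5432 on host port 6024. The sketch below assembles that URL with the standard library and percent-encodes the credentials in case a password contains characters such as "@" or ":". The `build_connection_url` helper is hypothetical, used only for illustration.

```python
from urllib.parse import quote


def build_connection_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Assemble a SQLAlchemy-style URL for the psycopg3 driver.

    Hypothetical helper. Credentials are percent-encoded so that special
    characters in a password do not break the URL structure.
    """
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )


# Values taken from the docker run flags above (host port 6024 -> container port 5432).
print(build_connection_url("langchain", "langchain", "localhost", 6024, "langchain"))
# postgresql+psycopg://langchain:langchain@localhost:6024/langchain
```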

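Credentials

No credentials are needed to run this notebook; just make sure you have installed the langchain-postgres package and started the postgres container correctly. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

import getpass
import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"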

Instantiation

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

# OpenAIEmbeddings reads the OPENAI_API_KEY environment variable.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# See the docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
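With use_jsonb=True, document metadata is stored in a JSONB column, so metadata values should be JSON-serializable. As a rough pre-flight check, assuming that a plain JSON round trip is a reasonable proxy for what JSONB accepts, you could test metadata with the standard library before adding documents. `is_json_metadata` is a hypothetical helper, not part of langchain-postgres.

```python
import json
from datetime import date
from typing import Any, Mapping


def is_json_metadata(metadata: Mapping[str, Any]) -> bool:
    """Return True if metadata survives a JSON round trip unchanged.

    Hypothetical helper: an approximation of what a JSONB column can store.
    Tuples, non-string keys, and objects like datetime.date fail the check.
    """
    try:
        return json.loads(json.dumps(metadata)) == metadata
    except (TypeError, ValueError):
        # json.dumps raises TypeError for objects it cannot serialize.
        return False


print(is_json_metadata({"id": 1, "location": "pond", "topic": "animals"}))  # True
print(is_json_metadata({"id": 1, "published": date(2024, 1, 1)}))  # False
```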

Manage vector store

Add items to vector store

Note that adding documents by ID will overwrite any existing documents that match that ID.

from langchain_core.documents import Document

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": 4, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": 5, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": 6, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": 7, "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": 8, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": 9, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": 10, "location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
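The overwrite rule noted above behaves like an upsert keyed by ID. The sketch below models that behavior with a plain dictionary so it runs without a database; `InMemoryUpsertStore` is hypothetical and does not reflect how PGVector stores rows internally.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryUpsertStore:
    """Hypothetical stand-in that mimics 'adding by ID overwrites' semantics."""

    rows: dict[str, str] = field(default_factory=dict)

    def add(self, texts: list[str], ids: list[str]) -> list[str]:
        if len(texts) != len(ids):
            raise ValueError("texts and ids must have the same length")
        for text, doc_id in zip(texts, ids):
            # Writing to an existing key replaces the text stored under that ID.
            self.rows[doc_id] = text
        return list(ids)


store = InMemoryUpsertStore()
store.add(["there are cats in the pond"], ids=["1"])
store.add(["there are frogs in the pond"], ids=["1"])  # same ID: replaces the first text
print(store.rows)  # {'1': 'there are frogs in the pond'}
```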

Delete items from vector store

vector_store.delete(ids=["3"])

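Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent is running.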

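Filtering support

The vector store supports a set of filters that can be applied against the metadata fields of the documents.

Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special cased (in)
$nin	Special cased (not in)
$between	Special cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)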
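To make the operator semantics in the table concrete, here is a pure-Python sketch that evaluates a filter dictionary against a single metadata dict. PGVector itself translates filters into SQL, so the hypothetical `matches` function is only a model of the semantics, under these assumptions: $between is inclusive on both ends (as SQL BETWEEN is), $like/$ilike use SQL wildcards (% for any run of characters, _ for exactly one), and a missing field never matches.

```python
import re
from typing import Any


def _like_to_regex(pattern: str, ignore_case: bool) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    pieces = []
    for ch in pattern:
        if ch == "%":
            pieces.append(".*")  # % matches any run of characters
        elif ch == "_":
            pieces.append(".")  # _ matches exactly one character
        else:
            pieces.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile("".join(pieces), flags)


_OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
    "$between": lambda a, b: b[0] <= a <= b[1],
    "$like": lambda a, b: _like_to_regex(b, False).fullmatch(a) is not None,
    "$ilike": lambda a, b: _like_to_regex(b, True).fullmatch(a) is not None,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical in-memory model of the filter operators in the table above."""
    results = []
    for key, cond in flt.items():
        if key == "$and":
            results.append(all(matches(metadata, sub) for sub in cond))
        elif key == "$or":
            results.append(any(matches(metadata, sub) for sub in cond))
        elif key not in metadata:
            results.append(False)  # a missing field never matches in this model
        else:
            # cond looks like {"$in": [1, 2]}; every operator in it must hold.
            value = metadata[key]
            results.append(all(_OPS[op](value, arg) for op, arg in cond.items()))
    # Several top-level keys are combined with a logical AND.
    return all(results)


meta = {"id": 3, "location": "market", "topic": "food"}
print(matches(meta, {"id": {"$in": [1, 5, 2, 9]}}))  # False
print(matches(meta, {"location": {"$ilike": "MAR%"}}))  # True
print(matches(meta, {"$or": [{"id": {"$between": [1, 3]}}, {"topic": {"$eq": "art"}}]}))  # True
```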

Query directly

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]

If you provide a dict with multiple fields but no top-level operator, the top level will be interpreted as a logical AND filter:

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"location": {"$in": ["pond", "market"]}},
        ]
    },
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
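The two queries above return the same documents because a dict with several top-level fields is read as an implicit $and. A hypothetical `normalize_filter` helper (not part of langchain-postgres) that makes this rewrite explicit can be sketched as:

```python
from typing import Any


def normalize_filter(flt: dict[str, Any]) -> dict[str, Any]:
    """Rewrite a multi-field filter into an explicit {"$and": [...]} form.

    Hypothetical helper modelling the implicit-AND rule; nested filters are
    left untouched, and single-entry filters are returned as-is.
    """
    if len(flt) <= 1:
        return flt
    return {"$and": [{key: value} for key, value in flt.items()]}


implicit = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
explicit = {
    "$and": [
        {"id": {"$in": [1, 5, 2, 9]}},
        {"location": {"$in": ["pond", "market"]}},
    ]
}
print(normalize_filter(implicit) == explicit)  # True
```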

If you want to execute a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
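Two details about the output above. First, assuming the store uses its default cosine distance strategy, the printed score is a cosine distance (1 minus cosine similarity, which is what pgvector's cosine operator computes), so smaller values mean closer matches despite the SIM label. Second, the format spec `3f` sets a minimum width of 3 with Python's default precision of six digits, which is why six decimals appear; `.3f` would round to three. A standard-library sketch of both points, using made-up toy vectors:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
close_doc = [0.9, 0.1]  # points almost the same way as the query
far_doc = [0.0, 1.0]  # orthogonal to the query
print(cosine_distance(query, close_doc) < cosine_distance(query, far_doc))  # True

score = 0.763449
print(f"{score:3f}")  # 0.763449  (width 3, default precision 6)
print(f"{score:.3f}")  # 0.763     (precision 3)
```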

For a full list of the different searches you can execute on a PGVector vector store, please refer to the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
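With search_type="mmr", the retriever uses maximal marginal relevance: it fetches a candidate set and then greedily picks documents that are relevant to the query but not redundant with those already picked. In LangChain, the candidate count and the relevance/diversity trade-off are typically tuned through search_kwargs as fetch_k and lambda_mult. The pure-Python sketch below illustrates only the greedy selection step on toy vectors; the `lambda_mult` name mirrors LangChain's parameter, but the `mmr` function itself is a hypothetical illustration, not the library's implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return the indices of up to k candidates chosen by maximal marginal relevance."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine_similarity(query, candidates[i])
        # Similarity to the closest already-selected item (0 before the first pick).
        redundancy = max(
            (cosine_similarity(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.5]
candidates = [
    [1.0, 0.4],  # 0: very relevant
    [1.0, 0.41],  # 1: most relevant, near-duplicate of 0
    [0.5, 1.0],  # 2: less relevant but different
]
print(mmr(query, candidates, k=2))  # [1, 2]; plain top-2 by similarity would be [1, 0]
```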

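Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections: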

API reference

For detailed documentation of all PGVector VectorStore features and configurations, head to the API reference.
