YDB integration

YDB is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.

This notebook shows how to use functionality related to the YDB vector store.

Setup

First, set up a local YDB with Docker:

! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk
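
The container may take a few seconds before it accepts connections. As a minimal sketch (the helper name `wait_for_port` is hypothetical and not part of YDB or LangChain), the following polls the port published by the command above, 2136, until a TCP connection succeeds:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 2136)` can be called before creating the vector store. An open port only shows that the process is listening; it does not guarantee that the database has finished initializing.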

You'll need to install langchain-ydb to use this integration:

! pip install -qU langchain-ydb

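Credentials

This notebook requires no credentials; just make sure the package is installed as shown above. To get automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below: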

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Initialization

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings

settings = YDBSettings(
    table="ydb_example",
    strategy=YDBSearchStrategy.COSINE_SIMILARITY,
)
vector_store = YDB(embeddings, config=settings)

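Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting items.

Add items to vector store

Prepare the documents to work with: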

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
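
Random `uuid4` IDs change on every run. If stable IDs are preferable, for example so that the same text always maps to the same ID, one option is to derive them from the content with `uuid5`. This is a sketch of that alternative, not something the integration requires; the helper name `content_id` is hypothetical:

```python
from uuid import NAMESPACE_URL, uuid5


def content_id(text: str, source: str = "") -> str:
    """Derive a stable UUID string from a document's text and source label."""
    # uuid5 is a name-based (SHA-1) UUID: identical input yields an identical ID
    return str(uuid5(NAMESPACE_URL, f"{source}:{text}"))


# (source, text) pairs taken from the documents above
texts = [
    ("tweet", "Building an exciting new project with LangChain - come check it out!"),
    ("news", "Robbers broke into the city bank and stole $1 million in cash."),
]
stable_ids = [content_id(text, source) for source, text in texts]
print(stable_ids)
```

A list built this way could be passed as `ids` in place of `uuids`. How the store treats an ID that already exists is determined by the integration and is not covered here.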

You can add items to the vector store with the add_documents function.

vector_store.add_documents(documents=documents, ids=uuids)

Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]

['947be6aa-d489-44c5-910e-62e4d58d2ffb',
 '7a62904d-9db3-412b-83b6-f01b34dd7de3',
 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',
 '99cf4104-36ab-4bd5-b0da-e210d260e512',
 '5810bcd0-b46e-443e-a663-e888c9e028d1',
 '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',
 'f8912944-f80a-4178-954e-4595bf59e341',
 '34fc7b09-6000-42c9-95f7-7d49f430b904',
 '0f6b6783-f300-4a4d-bb04-8025c4dfd409',
 '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']

Delete items from vector store

You can delete items from the vector store by ID with the delete function.

vector_store.delete(ids=[uuids[-1]])

True

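Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent is running.

Query directly

Similarity search

A simple similarity search can be performed as follows: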

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k=2
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

Similarity search with score

You can also perform a search with scores:

results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=3)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")

* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
* [SIM=0.212] I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
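
The scores above come from a store configured with `YDBSearchStrategy.COSINE_SIMILARITY`. As an illustration of the metric only (not of how YDB computes it internally), cosine similarity between two vectors can be sketched in plain Python. The vectors below are made-up toy values, not real embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings
query = [1.0, 0.0, 1.0]
close = [1.0, 0.1, 0.9]  # points in nearly the same direction as query
far = [0.0, 1.0, 0.0]    # orthogonal to query

print(f"{cosine_similarity(query, close):.3f}")
print(f"{cosine_similarity(query, far):.3f}")
```

Under this metric a higher value means the two texts are closer in embedding space, which matches the ordering of the results shown above.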

Filtering

You can search with a filter as shown below:

results = vector_store.similarity_search_with_score(
    "What did I eat for breakfast?",
    k=4,
    filter={"source": "tweet"},
)
for res, _ in results:
    print(f"* {res.page_content} [{res.metadata}]")

* I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
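
The `filter` argument above restricts results to documents whose metadata matches the given key/value pairs. The exact matching rules are defined by the integration; as a rough mental model, assuming simple equality on every key, the selection behaves like this plain-Python sketch (`matches_filter` is a hypothetical helper, not part of `langchain-ydb`):

```python
def matches_filter(metadata: dict, criteria: dict) -> bool:
    """Return True if every key in criteria is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in criteria.items()
    )


# A few of the documents above, reduced to plain dictionaries
records = [
    {
        "text": "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
        "metadata": {"source": "tweet"},
    },
    {
        "text": "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
        "metadata": {"source": "news"},
    },
    {
        "text": "Building an exciting new project with LangChain - come check it out!",
        "metadata": {"source": "tweet"},
    },
]

# Keep only records whose metadata satisfies the filter
tweets = [r["text"] for r in records if matches_filter(r["metadata"], {"source": "tweet"})]
print(tweets)
```

In the real query the similarity ranking is applied only to documents that pass the filter, which is why every result above carries `'source': 'tweet'`.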

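Query by turning into retriever

You can also transform the vector store into a retriever for easier use in your chains. The following converts the vector store into a retriever and then invokes it with a simple query and a filter.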

retriever = vector_store.as_retriever(
    search_kwargs={"k": 2},
)
results = retriever.invoke(
    "Stealing from the bank is a crime", filter={"source": "news"}
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]
* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]

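Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG tutorials and how-to guides in the LangChain documentation.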
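
As a rough illustration of how retrieved documents feed the generation step, the sketch below joins retriever results into a prompt string. It uses a small stand-in class instead of `langchain_core.documents.Document` so that it runs without a database or model; `Doc` and `build_prompt` are hypothetical names, and the prompt wording is an assumption rather than a recommended template:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document (illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def build_prompt(question: str, docs: list[Doc]) -> str:
    """Join retrieved passages into a numbered context block followed by the question."""
    context = "\n".join(
        f"[{i}] ({doc.metadata.get('source', 'unknown')}) {doc.page_content}"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Stand-ins for what a retriever call might return
retrieved = [
    Doc("Robbers broke into the city bank and stole $1 million in cash.", {"source": "news"}),
    Doc("The stock market is down 500 points today due to fears of a recession.", {"source": "news"}),
]
prompt = build_prompt("What happened at the city bank?", retrieved)
print(prompt)
```

In a real chain, `retrieved` would be the output of `retriever.invoke(...)` and the resulting prompt would be sent to a chat model.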

API reference

For detailed documentation of all YDB features and configurations, head to the API reference: python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html
