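Elasticsearch is a distributed, RESTful search and analytics engine capable of performing both vector and lexical search. It is built on top of the Apache Lucene library.

This notebook shows how to use functionality related to the Elasticsearch vector store.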

Setup

To use Elasticsearch vector search, you must install the langchain-elasticsearch package.

pip install -qU langchain-elasticsearch

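Credentials

There are two main ways to set up an Elasticsearch instance:

Elastic Cloud: Elastic Cloud is a managed Elasticsearch service. Sign up for a free trial.

Local install: get started with Elasticsearch by running it locally. The easiest way is to use the official Elasticsearch Docker image. See the Elasticsearch Docker documentation for more information.

To connect to an Elasticsearch instance that does not require login credentials (for example, a Docker instance started with security disabled), pass the Elasticsearch URL and index name along with the embedding object to the constructor.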

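Running Elasticsearch locally

The simplest way to run Elasticsearch locally for development and testing is the start-local script. It uses Docker to set up Elasticsearch (and, optionally, Kibana) with a single command.

curl -fsSL https://elastic.co/start-local | sh

This creates an elastic-start-local folder containing configuration files and startup scripts. To start Elasticsearch:

cd elastic-start-local
./start.sh

Elasticsearch will then be available at http://localhost:9200. The password for the elastic user and an API key are generated automatically and stored in the .env file in the elastic-start-local folder. If you only need Elasticsearch without Kibana, use the --esonly option:

curl -fsSL https://elastic.co/start-local | sh -s -- --esonly

The start-local setup is intended for local testing only and must not be used in production. For production installations, see the official Elasticsearch documentation.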
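The generated .env file can be read from Python to obtain the values you pass as es_url, es_password or es_api_key. The following is a minimal stdlib sketch: read_env_file is a hypothetical helper that only handles simple KEY=VALUE lines (no multiline values or variable expansion), and the variable names ES_LOCAL_URL, ES_LOCAL_PASSWORD and ES_LOCAL_API_KEY are an assumption about what start-local writes, so check your own generated file.

```python
def read_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so base64 padding in values survives.
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Sample content; the variable names are assumed, not taken from start-local.
sample = """
# generated by start-local
ES_LOCAL_URL=http://localhost:9200
ES_LOCAL_PASSWORD=s3cr3t
ES_LOCAL_API_KEY="abc123=="
"""
env = read_env_file(sample)
print(env["ES_LOCAL_URL"], env["ES_LOCAL_PASSWORD"])

# With a real install you would read the generated file instead, e.g.:
# from pathlib import Path
# env = read_env_file(Path("elastic-start-local/.env").read_text())
```

You could then pass env["ES_LOCAL_URL"] as es_url and env["ES_LOCAL_API_KEY"] as es_api_key, assuming those keys exist in your file.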

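Running with authentication

For production, we recommend running with security enabled. To connect with login credentials, use either the es_api_key parameter or the es_user and es_password parameters.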

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_elasticsearch import ElasticsearchStore

elastic_vector_search = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="langchain_index",
    embedding=embeddings,
    es_user="elastic",
    es_password="changeme",
)
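The two credential options end up as different HTTP Authorization headers on each request to Elasticsearch: es_user/es_password as HTTP Basic auth, and es_api_key with the ApiKey scheme. The Elasticsearch Python client builds these headers for you; the sketch below only illustrates what goes over the wire. basic_auth_header and api_key_header are hypothetical helpers, and the assumption is that the API key is the base64 "encoded" value shown when the key is created.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """HTTP Basic auth: "Basic " + base64("user:password")."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def api_key_header(encoded_api_key: str) -> str:
    """Elasticsearch API keys are sent as "ApiKey <encoded key>"."""
    return f"ApiKey {encoded_api_key}"


print(basic_auth_header("elastic", "changeme"))  # Basic ZWxhc3RpYzpjaGFuZ2VtZQ==
print(api_key_header("my-encoded-api-key"))  # ApiKey my-encoded-api-key
```

Note that Basic auth only encodes the password, it does not encrypt it, which is one reason to use HTTPS and API keys outside local testing.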

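How do I get the password for the default "elastic" user?

To get the password for the default "elastic" user on Elastic Cloud:

Log in to the Elastic Cloud console at cloud.elastic.co
Go to "Security" > "Users"
Find the "elastic" user and click "Edit"
Click "Reset password"
Follow the prompts to reset the password

How do I get an API key?

To get an API key:

Log in to the Elastic Cloud console at cloud.elastic.co
Open Kibana and go to Stack Management > API Keys
Click "Create API key"
Enter a name for the API key and click "Create"
Copy the API key and paste it into the es_api_key parameter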

Elastic Cloud

To connect to an Elasticsearch instance on Elastic Cloud, you can use either the es_cloud_id parameter or es_url.

elastic_vector_search = ElasticsearchStore(
    es_cloud_id="<cloud_id>",
    index_name="test_index",
    embedding=embeddings,
    es_user="elastic",
    es_password="changeme",
)

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

import getpass
import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Initialization

Elasticsearch is running locally on localhost:9200 with Docker. For details on connecting to Elasticsearch on Elastic Cloud, see Running with authentication above.

from langchain_elasticsearch import ElasticsearchStore

vector_store = ElasticsearchStore(
    "langchain-demo", embedding=embeddings, es_url="http://localhost:9200"
)

Managing the vector store

Adding items to the vector store

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

['21cca03c-9089-42d2-b41c-3d156be2b519',
 'a6ceb967-b552-4802-bb06-c0e95fce386e',
 '3a35fac4-e5f0-493b-bee0-9143b41aedae',
 '176da099-66b1-4d6a-811b-dfdfe0808d30',
 'ecfa1a30-3c97-408b-80c0-5c43d68bf5ff',
 'c0f08baa-e70b-4f83-b387-c6e0a0f36f73',
 '489b2c9c-1925-43e1-bcf0-0fa94cf1cbc4',
 '408c6503-9ba4-49fd-b1cc-95584cd914c5',
 '5248c899-16d5-4377-a9e9-736ca443ad4f',
 'ca182769-c4fc-4e25-8f0a-8dd0a525955c']
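Under the hood, documents are indexed through the Elasticsearch bulk API, and the UUIDs you pass become the documents' _id values, which is what makes deletion by ID possible. The sketch below builds a bulk (NDJSON) body by hand to show that shape. build_bulk_body is a hypothetical helper; the "text" and "metadata" field names are assumptions mirroring the metadata.source filters used in this guide, and the embedding vector that ElasticsearchStore also stores is omitted for brevity.

```python
import json
from uuid import uuid4


def build_bulk_body(index: str, texts: list[str], metadatas: list[dict], ids: list[str]) -> str:
    """Build an NDJSON body for the _bulk API: one action line, then one source line per doc."""
    lines = []
    for doc_id, text, metadata in zip(ids, texts, metadatas):
        # Action line: which index to write to and which _id to use.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself (vector field omitted in this sketch).
        lines.append(json.dumps({"text": text, "metadata": metadata}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


texts = [
    "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
]
metadatas = [{"source": "tweet"}, {"source": "news"}]
ids = [str(uuid4()) for _ in texts]
body = build_bulk_body("langchain-demo", texts, metadatas, ids)
print(body)
```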

Deleting items from the vector store

vector_store.delete(ids=[uuids[-1]])

True

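Querying the vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent runs. These examples also show how to filter results at search time.

Query directly

Similarity search

A simple similarity search with filtering on metadata can be performed as follows: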

results = vector_store.similarity_search(
    query="LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter=[{"term": {"metadata.source.keyword": "tweet"}}],
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
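The filter argument takes a list of Elasticsearch query clauses. When you filter on several metadata values, a small helper keeps the clauses consistent. source_filter below is a hypothetical helper: one value produces a term clause like the example above, several values produce a terms clause (match any). The .keyword suffix targets the exact, non-analyzed sub-field that dynamic mapping creates for metadata.source.

```python
def source_filter(*sources: str) -> list[dict]:
    """Build a metadata.source filter for ElasticsearchStore searches."""
    if not sources:
        raise ValueError("at least one source value is required")
    field = "metadata.source.keyword"
    if len(sources) == 1:
        # Exact match on a single value.
        return [{"term": {field: sources[0]}}]
    # Match documents whose source is any of the given values.
    return [{"terms": {field: list(sources)}}]


print(source_filter("tweet"))
print(source_filter("tweet", "news"))
```

The result can be passed directly, e.g. vector_store.similarity_search(query="...", k=2, filter=source_filter("tweet", "news")).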

Similarity search with score

If you want to run a similarity search and receive the corresponding scores, run:

results = vector_store.similarity_search_with_score(
    query="Will it be hot tomorrow",
    k=1,
    filter=[{"term": {"metadata.source.keyword": "news"}}],
)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.765887] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]

Query by turning into a retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.2}
)
retriever.invoke("Stealing from the bank is a crime")

[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.'),
 Document(metadata={'source': 'news'}, page_content='The stock market is down 500 points today due to fears of a recession.'),
 Document(metadata={'source': 'website'}, page_content='Is the new iPhone worth the price? Read this review to find out.'),
 Document(metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')]
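Conceptually, the similarity_score_threshold search type keeps only results whose relevance score is at least score_threshold, highest first. The sketch below reproduces that filtering step in plain Python. apply_score_threshold is a hypothetical helper, the scores are made up for illustration, and k=4 is simply this sketch's parameter value, not a claim about the retriever's default.

```python
def apply_score_threshold(scored_docs, score_threshold, k=4):
    """Keep (doc, score) pairs with score >= score_threshold, best first, at most k."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]


# Hypothetical relevance scores, for illustration only.
scored = [
    ("Robbers broke into the city bank and stole $1 million in cash.", 0.81),
    ("The top 10 soccer players in the world right now.", 0.12),
    ("The stock market is down 500 points today due to fears of a recession.", 0.44),
]
print(apply_score_threshold(scored, score_threshold=0.2))
```

A low threshold such as 0.2 lets loosely related documents through, which is consistent with the mixed results shown above; raising it trades recall for precision.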

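Distance similarity algorithms

Elasticsearch supports the following vector distance similarity algorithms:

cosine
euclidean
dot_product

Cosine similarity is the default. You choose the algorithm with the distance_strategy parameter. Note: depending on the retrieval strategy, the similarity algorithm cannot be changed at query time; it must be set when the index mapping for the field is created. If you need to change it, delete the index and recreate it with the correct distance_strategy.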

db = ElasticsearchStore.from_documents(
    docs,
    embeddings,
    es_url="http://localhost:9200",
    index_name="test",
    distance_strategy="COSINE",
    # distance_strategy="EUCLIDEAN_DISTANCE"
    # distance_strategy="DOT_PRODUCT"
)
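To make the three options concrete, the sketch below computes each similarity for two small vectors and applies the _score conversions that the Elasticsearch dense_vector documentation gives for float vectors. The correspondence between the distance_strategy values COSINE, EUCLIDEAN_DISTANCE and DOT_PRODUCT and the index similarities cosine, l2_norm and dot_product is an assumption here (the l2_norm name appears in the index mapping later in this page).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def es_score(similarity, a, b):
    """Turn a raw similarity into a non-negative _score (float vectors)."""
    if similarity == "cosine":
        return (1 + cosine(a, b)) / 2
    if similarity == "dot_product":
        # Only meaningful for unit-length vectors, which dot_product requires.
        return (1 + dot(a, b)) / 2
    if similarity == "l2_norm":
        return 1 / (1 + l2_distance(a, b) ** 2)
    raise ValueError(f"unknown similarity: {similarity}")


q = [1.0, 0.0]
d = [0.6, 0.8]  # unit length, as dot_product expects
for sim in ("cosine", "dot_product", "l2_norm"):
    print(sim, round(es_score(sim, q, d), 4))
```

For unit-length vectors, cosine and dot_product give the same score (0.8 here), which is why dot_product is often chosen as a cheaper equivalent when embeddings are already normalized.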

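Retrieval strategies

Elasticsearch has a major advantage over vector-only databases: it supports a wide range of retrieval strategies. In this notebook we configure ElasticsearchStore for some of the most common ones. By default, ElasticsearchStore uses DenseVectorStrategy (called ApproxRetrievalStrategy before version 0.2.0).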

DenseVectorStrategy

This returns the top k vectors most similar to the query vector. k is passed at query time as the k argument of similarity_search; the example below uses k=10.

from langchain_elasticsearch import DenseVectorStrategy

db = ElasticsearchStore.from_documents(
    docs,
    embeddings,
    es_url="http://localhost:9200",
    index_name="test",
    strategy=DenseVectorStrategy(),
)

results = db.similarity_search(
    query="What did the president say about Ketanji Brown Jackson?", k=10
)

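Example: hybrid retrieval with dense vector and keyword search

This example shows how to configure ElasticsearchStore for hybrid retrieval, combining approximate semantic search with keyword-based search. To enable hybrid retrieval, set hybrid=True in the DenseVectorStrategy constructor.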

db = ElasticsearchStore.from_documents(
    docs,
    embeddings,
    es_url="http://localhost:9200",
    index_name="test",
    strategy=DenseVectorStrategy(hybrid=True),
)

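When hybrid is enabled, the executed query combines approximate semantic search and keyword-based search, using rrf (Reciprocal Rank Fusion) to balance the scores from the two retrieval methods. Note: RRF requires Elasticsearch 8.9.0 or later.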

{
    "retriever": {
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "bool": {
                                "filter": [],
                                "must": [{"match": {"text": {"query": "foo"}}}],
                            }
                        },
                    },
                },
                {
                    "knn": {
                        "field": "vector",
                        "filter": [],
                        "k": 1,
                        "num_candidates": 50,
                        "query_vector": [1.0, ..., 0.0],
                    },
                },
            ]
        }
    }
}
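RRF ignores the raw scores of each retriever and uses only ranks: a document's fused score is the sum of 1 / (rank_constant + rank) over every list it appears in, and Elasticsearch documents a default rank_constant of 60. The sketch below implements that formula in plain Python; the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, rank_constant=60, top_n=None):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Ranks start at 1. A document missing from a list gets no
    contribution from that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n] if top_n else fused


keyword_hits = ["a", "b", "c"]  # e.g. from the match (BM25) retriever
knn_hits = ["b", "d", "a"]  # e.g. from the knn retriever
for doc_id, score in reciprocal_rank_fusion([keyword_hits, knn_hits]):
    print(doc_id, round(score, 5))
```

"b" wins because it ranks well in both lists, even though "a" is first in the keyword list. Because only ranks matter, the incomparable scales of BM25 and vector scores never need to be normalized.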

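Example: dense vector search with an embedding model deployed in Elasticsearch

This example shows how to configure ElasticsearchStore to use an embedding model deployed in Elasticsearch for dense vector retrieval. To use it, pass the model's ID as the model_id argument of the DenseVectorStrategy constructor. Note: the model must be deployed and running on an Elasticsearch ML node. See the example notebook for how to deploy a model with eland.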

DENSE_SELF_DEPLOYED_INDEX_NAME = "test-dense-self-deployed"

# Note: no embedding function is specified here.
# Instead, the embedding model deployed in Elasticsearch is used.
db = ElasticsearchStore(
    es_cloud_id="<your cloud id>",
    es_user="elastic",
    es_password="<your password>",
    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,
    query_field="text_field",
    vector_query_field="vector_query_field.predicted_value",
    strategy=DenseVectorStrategy(model_id="sentence-transformers__all-minilm-l6-v2"),
)

# Set up an ingest pipeline that embeds the text field
db.client.ingest.put_pipeline(
    id="test_pipeline",
    processors=[
        {
            "inference": {
                "model_id": "sentence-transformers__all-minilm-l6-v2",
                "field_map": {"query_field": "text_field"},
                "target_field": "vector_query_field",
            }
        }
    ],
)

# Create the new index with the pipeline,
# rather than relying on LangChain to create the index
db.client.indices.create(
    index=DENSE_SELF_DEPLOYED_INDEX_NAME,
    mappings={
        "properties": {
            "text_field": {"type": "text"},
            "vector_query_field": {
                "properties": {
                    "predicted_value": {
                        "type": "dense_vector",
                        "dims": 384,
                        "index": True,
                        "similarity": "l2_norm",
                    }
                }
            },
        }
    },
    settings={"index": {"default_pipeline": "test_pipeline"}},
)

db.from_texts(
    ["hello world"],
    es_cloud_id="<cloud id>",
    es_user="elastic",
    es_password="<cloud password>",
    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,
    query_field="text_field",
    vector_query_field="vector_query_field.predicted_value",
    strategy=DenseVectorStrategy(model_id="sentence-transformers__all-minilm-l6-v2"),
)

# Run a search
db.similarity_search("hello world", k=10)

SparseVectorStrategy (ELSER)

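This strategy uses Elasticsearch's sparse vector retrieval to return the top-k results. Currently only Elastic's own ELSER embedding model is supported. Note: the ELSER model must be deployed and running on an Elasticsearch ML node. To use this strategy, pass SparseVectorStrategy (called SparseVectorRetrievalStrategy before version 0.2.0) to the ElasticsearchStore constructor and provide the model ID.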

from langchain_elasticsearch import SparseVectorStrategy

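# Note: this example has no embedding function, because tokens are inferred inside Elasticsearch at both index time and query time.
# This requires the ELSER model to be loaded and running in Elasticsearch.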
db = ElasticsearchStore.from_documents(
    docs,
    es_cloud_id="<cloud id>",
    es_user="elastic",
    es_password="<cloud password>",
    index_name="test-elser",
    strategy=SparseVectorStrategy(model_id=".elser_model_2"),
)

db.client.indices.refresh(index="test-elser")

results = db.similarity_search(
    "What did the president say about Ketanji Brown Jackson", k=4
)
print(results[0])
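Conceptually, ELSER expands both the document and the query into sparse maps of token to weight, and a document's relevance is the dot product over the tokens the two maps share. The sketch below shows that scoring step in plain Python. The token weights are made up for illustration; real ELSER expansions contain many more tokens, and this is not Elasticsearch's internal implementation.

```python
def sparse_dot(query_weights: dict[str, float], doc_weights: dict[str, float]) -> float:
    """Sum of query_weight * doc_weight over tokens present in both maps."""
    # Iterate over the smaller map; tokens missing from either side contribute 0.
    small, large = sorted((query_weights, doc_weights), key=len)
    return sum(weight * large[token] for token, weight in small.items() if token in large)


# Hypothetical token expansions, for illustration only.
query = {"president": 1.2, "ketanji": 2.0, "jackson": 1.5, "said": 0.4}
doc_a = {"ketanji": 1.8, "jackson": 1.1, "judge": 0.9}
doc_b = {"president": 0.7, "economy": 1.3}

print(round(sparse_dot(query, doc_a), 2))  # 5.25
print(round(sparse_dot(query, doc_b), 2))  # 0.84
```

Because only shared tokens contribute, this scoring can be served from an inverted index much like keyword search, while the learned expansion adds related terms that a plain keyword query would miss.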

DenseVectorScriptScoreStrategy

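This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute-force retrieval) and return the top-k results. (It was called ExactRetrievalStrategy before version 0.2.0.) To use it, pass DenseVectorScriptScoreStrategy to the ElasticsearchStore constructor.

from langchain_elasticsearch import DenseVectorScriptScoreStrategy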

db = ElasticsearchStore.from_documents(
    docs,
    embeddings,
    es_url="http://localhost:9200",
    index_name="test",
    strategy=DenseVectorScriptScoreStrategy(),
)
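Exact retrieval compares the query vector with every candidate document instead of searching an approximate (HNSW) graph, so results are exact but cost grows linearly with the number of documents. The sketch below shows that brute-force idea in plain Python with cosine similarity and hypothetical three-dimensional vectors; it is not the script Elasticsearch actually runs.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_knn(query, vectors, k):
    """Score every stored vector against the query and keep the k best ids."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical document vectors, for illustration only.
vectors = {
    "news-1": [0.9, 0.1, 0.0],
    "tweet-1": [0.1, 0.9, 0.2],
    "news-2": [0.7, 0.3, 0.1],
}
print(exact_knn([1.0, 0.0, 0.0], vectors, k=2))  # ['news-1', 'news-2']
```

This trade-off is why exact retrieval tends to suit small indexes or heavily filtered queries, where only a few documents need scoring.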

BM25Strategy

Finally, you can use full-text keyword search. To use it, pass BM25Strategy to the ElasticsearchStore constructor.

from langchain_elasticsearch import BM25Strategy

db = ElasticsearchStore.from_documents(
    docs,
    es_url="http://localhost:9200",
    index_name="test",
    strategy=BM25Strategy(),
)

BM25RetrievalStrategy

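This strategy lets you search with pure BM25, without vector search. To use it, pass BM25RetrievalStrategy to the ElasticsearchStore constructor. Note that the example below specifies no embedding, so the search runs without embeddings.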

from langchain_elasticsearch import ElasticsearchStore

db = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="test_index",
    strategy=ElasticsearchStore.BM25RetrievalStrategy(),
)

db.add_texts(
    ["foo", "foo bar", "foo bar baz", "bar", "bar baz", "baz"],
)

results = db.similarity_search(query="foo", k=10)
print(results)
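BM25 ranks documents by term statistics rather than vectors: rare terms weigh more (idf), and matches in shorter documents weigh more (length normalization). The sketch below scores the toy corpus from the example above with textbook Okapi BM25, using k1=1.2 and b=0.75 (Elasticsearch's defaults). Tokenization is a plain whitespace split instead of an Elasticsearch analyzer, and Lucene's formula differs by constant factors and approximates document lengths, so exact numbers will differ from what Elasticsearch returns.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document in corpus."""
    docs = [doc.split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = ["foo", "foo bar", "foo bar baz", "bar", "bar baz", "baz"]
scores = bm25_scores("foo", corpus)
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print([corpus[i] for i in ranking if scores[i] > 0])  # ['foo', 'foo bar', 'foo bar baz']
```

All three matching documents contain "foo" once, so length normalization alone decides the order: the shortest document ranks first.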

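Custom query

With the custom_query parameter at search time, you can adjust the query used to retrieve documents from Elasticsearch. This is useful if you need a more complex query, for example one that supports linear boosting of fields.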

# Example of a custom query that runs a BM25 search on the text field only.
def custom_query(query_body: dict, query: str):
    """Custom query to be used in Elasticsearch.
    Args:
        query_body (dict): Elasticsearch query body.
        query (str): Query string.
    Returns:
        dict: Elasticsearch query body.
    """
    print("Query created by the retrieval strategy:")
    print(query_body)
    print()

    new_query_body = {"query": {"match": {"text": query}}}

    print("Query actually used in Elasticsearch:")
    print(new_query_body)
    print()

    return new_query_body


results = db.similarity_search(
    "What did the president say about Ketanji Brown Jackson",
    k=4,
    custom_query=custom_query,
)
print("Results:")
print(results[0])
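As an example of field boosting, the callback below replaces the strategy's query with a multi_match over two fields and weights matches in "text" three times higher using the field^boost syntax. boosted_text_query is a hypothetical name, and "metadata.title" is a hypothetical field used only for illustration; substitute fields that exist in your index.

```python
def boosted_text_query(query_body: dict, query: str) -> dict:
    """custom_query callback: multi_match over two fields, boosting "text" by 3."""
    return {
        "query": {
            "multi_match": {
                "query": query,
                # "field^3" multiplies that field's score by 3.
                "fields": ["text^3", "metadata.title"],
            }
        }
    }


body = boosted_text_query({"query": {"match_all": {}}}, "Ketanji Brown Jackson")
print(body)
```

It is passed the same way as above: db.similarity_search("...", k=4, custom_query=boosted_text_query). The incoming query_body is ignored here, so any filter the strategy built is dropped too; merge it into your query if you need it.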

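Custom document builder

With the doc_builder parameter at search time, you can adjust how a Document is built from the data retrieved from Elasticsearch. This is especially useful for indexes that were not created with LangChain.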

from typing import Dict

from langchain_core.documents import Document


def custom_document_builder(hit: Dict) -> Document:
    src = hit.get("_source", {})
    return Document(
        page_content=src.get("content", "Missing content!"),
        metadata={
            "page_number": src.get("page_number", -1),
            "original_filename": src.get("original_filename", "Missing filename!"),
        },
    )


results = db.similarity_search(
    "What did the president say about Ketanji Brown Jackson",
    k=4,
    doc_builder=custom_document_builder,
)
print("Results:")
print(results[0])
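To see what the builder receives, the sketch below feeds it a single search hit shaped like an entry of Elasticsearch's hits.hits array and checks the fallbacks for missing fields. So that it runs without LangChain installed, it uses a minimal Document stand-in dataclass (in real code, import Document from langchain_core.documents). It also copies the hit's _id and _score into metadata, an optional extra that can help when debugging relevance. The index name and field values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:  # minimal stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def custom_document_builder(hit: dict) -> Document:
    src = hit.get("_source", {})
    return Document(
        page_content=src.get("content", "Missing content!"),
        metadata={
            "page_number": src.get("page_number", -1),
            "original_filename": src.get("original_filename", "Missing filename!"),
            # Optional extras taken from the hit itself.
            "_id": hit.get("_id"),
            "_score": hit.get("_score"),
        },
    )


# A hypothetical hit from an index that LangChain did not create.
hit = {
    "_index": "legacy-index",
    "_id": "doc-42",
    "_score": 7.3,
    "_source": {"content": "Some page text", "page_number": 3},
}
doc = custom_document_builder(hit)
print(doc)
```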

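Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections: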

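FAQ

Question: I'm getting timeout errors when indexing documents into Elasticsearch. How do I fix this?

One possible cause is that your documents take longer to index into Elasticsearch. ElasticsearchStore uses the Elasticsearch bulk API, which has a few defaults you can adjust to reduce the chance of timeout errors. This is also a good idea when you use SparseVectorStrategy (formerly SparseVectorRetrievalStrategy). The defaults are: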

chunk_size: 500
max_chunk_bytes: 100MB

To adjust these values, pass chunk_size and max_chunk_bytes through the bulk_kwargs argument of ElasticsearchStore's add_texts method:

# texts: the list of strings to index
vector_store.add_texts(
    texts,
    bulk_kwargs={
        "chunk_size": 50,
        "max_chunk_bytes": 200000000,
    },
)
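Lowering chunk_size helps because each bulk request then carries less work, so any single request is less likely to exceed the timeout. The sketch below illustrates the idea behind the two limits: actions are grouped into chunks that stop at chunk_size actions or max_chunk_bytes of serialized payload, whichever comes first. chunk_actions is a hypothetical helper, not the Elasticsearch Python client's actual code, and counting one JSON line per action is a simplifying assumption.

```python
import json


def chunk_actions(actions, chunk_size=500, max_chunk_bytes=100 * 1024 * 1024):
    """Yield lists of actions bounded by an action count and a byte budget."""
    chunk, chunk_bytes = [], 0
    for action in actions:
        size = len(json.dumps(action).encode("utf-8")) + 1  # +1 for the newline
        # Close the current chunk if adding this action would break a limit.
        if chunk and (len(chunk) >= chunk_size or chunk_bytes + size > max_chunk_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(action)
        chunk_bytes += size
    if chunk:
        yield chunk


docs = [{"text": f"doc {i}"} for i in range(120)]
print([len(c) for c in chunk_actions(docs, chunk_size=50)])  # [50, 50, 20]
```

An action larger than max_chunk_bytes still goes out alone in its own chunk here, so the byte limit bounds typical requests rather than guaranteeing a hard cap.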

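Upgrading to ElasticsearchStore

If you already use Elasticsearch in a LangChain-based project, you may be using the old implementations, ElasticVectorSearch and ElasticKNNSearch, which are now deprecated. We've introduced a new implementation, ElasticsearchStore, which is more flexible and easier to use. This notebook walks you through upgrading to it.

What's new?

The new implementation is a single class, ElasticsearchStore, that supports approximate dense vector, exact dense vector, sparse vector (ELSER), BM25, and hybrid retrieval through strategies.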

I'm using ElasticKNNSearch

Old implementation:

from langchain_community.vectorstores.elastic_vector_search import ElasticKNNSearch

db = ElasticKNNSearch(
    elasticsearch_url="http://localhost:9200",
    index_name="test_index",
    embedding=embedding,
)

New implementation:

from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy

db = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="test_index",
    embedding=embedding,
    # if you use the model_id
    # strategy=DenseVectorStrategy(model_id="test_model")
    # if you use hybrid search
    # strategy=DenseVectorStrategy(hybrid=True)
)

I'm using ElasticVectorSearch

Old implementation:

from langchain_community.vectorstores.elastic_vector_search import ElasticVectorSearch

db = ElasticVectorSearch(
    elasticsearch_url="http://localhost:9200",
    index_name="test_index",
    embedding=embedding,
)

New implementation:

from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy

db = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="test_index",
    embedding=embedding,
    strategy=DenseVectorScriptScoreStrategy(),
)

db.client.indices.delete(
    index="test-metadata, test-elser, test-basic",
    ignore_unavailable=True,
    allow_no_indices=True,
)

API reference

For detailed documentation of all ElasticsearchStore features and configurations, head to the API reference.
