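Couchbase integration

Couchbase is a distributed NoSQL database suited to workloads across cloud, mobile, and edge deployments. It supports vector search, which makes it a fit for applications that need similarity search alongside key-value and JSON document access. Couchbase provides two vector store implementations for LangChain:

Vector store	Index type	Minimum version	Best for
`CouchbaseQueryVectorStore`	Hyperscale Vector Index or Composite Vector Index	Couchbase Server 8.0+	Large-scale pure vector search, or search combining vector similarity with scalar filters
`CouchbaseSearchVectorStore`	Search Vector Index	Couchbase Server 7.6+	Hybrid search combining vector similarity with full-text search (FTS) and geospatial search

This tutorial explains how to use vector search in Couchbase. You can use either Couchbase Capella or a self-managed Couchbase Server.

Setup

To access the Couchbase vector stores, first install the langchain-couchbase partner package: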

pip install langchain-couchbase langchain-openai langchain-community

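Credentials

Go to the Couchbase website and create a new connection, making sure to save your database username and password. You will also need an OpenAI API key for the embeddings; you can get one from OpenAI.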

import getpass
import os

COUCHBASE_CONNECTION_STRING = getpass.getpass(
    "Enter the connection string for the Couchbase cluster: "
)
DB_USERNAME = getpass.getpass("Enter the username for the Couchbase cluster: ")
DB_PASSWORD = getpass.getpass("Enter the password for the Couchbase cluster: ")
OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Enter the connection string for the Couchbase cluster:  ········
Enter the username for the Couchbase cluster:  ········
Enter the password for the Couchbase cluster:  ········
Enter your OpenAI API key:  ········

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

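Create the Couchbase connection object

We first create a connection to the Couchbase cluster and then pass the cluster object to the vector store. Here we connect using the username and password entered above. You can also connect using any other supported method. For more information on connecting to a Couchbase cluster, see the documentation.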

from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
options.apply_profile("wan_development")
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))

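Next, we set the names of the bucket, scope, and collection in the Couchbase cluster to use for vector search. This example uses the default scope and collection.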

BUCKET_NAME = "langchain_bucket"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"

CouchbaseQueryVectorStore

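CouchbaseQueryVectorStore lets you use Couchbase for vector search through the Query and Index services. It supports two types of vector index:

Hyperscale Vector Index - Optimized for pure vector search on large datasets (billions of documents). Best suited to content discovery, recommendations, and applications that need high accuracy with a low memory footprint. A Hyperscale Vector Index compares vector and scalar values at the same time.
Composite Vector Index - Combines a Global Secondary Index (GSI) with a vector column. Well suited to searches that combine vector similarity with scalar filters, where the scalar filters exclude most of the dataset. A Composite Vector Index applies the scalar filters first and then runs the vector search on the filtered results.

For guidance on choosing the right index type, see Choose the Right Vector Index.

Requirements: Couchbase Server version 8.0 or later. For more information on these indexes, see the Couchbase documentation.

Initialization

Below, we create the vector store object using the cluster information and a distance metric. First, set up the embeddings (if you have not already done so):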

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Then create the vector store:

from langchain_couchbase import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy

vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    distance_metric=DistanceStrategy.DOT,
)

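Distance strategies

CouchbaseQueryVectorStore supports the following distance strategies through the DistanceStrategy enum:

Strategy	Description
`DistanceStrategy.DOT`	Dot product similarity
`DistanceStrategy.COSINE`	Cosine similarity
`DistanceStrategy.EUCLIDEAN`	Euclidean distance (equivalent to L2)
`DistanceStrategy.EUCLIDEAN_SQUARED`	Squared Euclidean distance (equivalent to L2_SQUARED)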
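To make the four strategies concrete, here is a minimal pure-Python sketch of the textbook formulas behind them. It is only an illustration: the function names are hypothetical, and it does not reproduce how Couchbase computes or scales the distances it reports.

```python
import math


def dot(a, b):
    """Dot product: larger means more similar for normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors after normalizing their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_squared(a, b):
    """Sum of squared coordinate differences (L2_SQUARED)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance (L2): the square root of the squared form."""
    return math.sqrt(euclidean_squared(a, b))


a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 1.0]
print(dot(a, b))                # 4.0
print(euclidean_squared(a, b))  # 6.0
print(cosine_similarity(a, b))
print(euclidean(a, b))
```

Squared Euclidean distance ranks results in the same order as Euclidean distance while skipping the square root, which is why both are commonly offered.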

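Specifying the text and embedding fields

You can optionally specify the text and embedding fields for the documents using the text_key and embedding_key parameters.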

vector_store_specific = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    distance_metric=DistanceStrategy.COSINE,
    text_key="text",
    embedding_key="embedding",
)

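Managing the vector store

Once you have created the vector store, you can interact with it by adding and deleting items.

Adding items to the vector store

You can add items to the vector store using the add_documents function.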

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)

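Creating a vector index

Important: The vector index must be created after documents have been added to the vector store. Call the create_index() method after adding documents to enable efficient vector search.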

from langchain_couchbase.vectorstores import IndexType

# Create a Hyperscale Vector Index
vector_store.create_index(
    index_type=IndexType.HYPERSCALE,
    index_description="IVF,SQ8",
)

Or create a Composite Vector Index:

# Create a Composite Vector Index
vector_store.create_index(
    index_type=IndexType.COMPOSITE,
    index_description="IVF,SQ8",
)

Deleting items from the vector store

vector_store.delete(ids=["3"])

Querying the vector store

Similarity search

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* thud [{'bar': 'baz'}]

Similarity search with a filter

You can filter results with a SQL++ WHERE clause using the where_str parameter:

results = vector_store.similarity_search(
    query="thud", k=1, where_str="metadata.bar = 'baz'"
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* thud [{'bar': 'baz'}]
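When the filter values come from variables, it can help to assemble the where_str fragment in one place. The sketch below uses a hypothetical helper, build_where_str, that is not part of langchain-couchbase. It handles only equality conditions on metadata fields, and it assumes that doubling a single quote is an acceptable way to escape string literals in SQL++. It does not validate field names, so it should not be fed untrusted input.

```python
def build_where_str(conditions: dict) -> str:
    """Join equality conditions on metadata fields into a WHERE fragment.

    Hypothetical helper for illustration only.
    """
    parts = []
    for field, value in conditions.items():
        if isinstance(value, bool):
            # Check bool before int, because bool is a subclass of int.
            literal = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Assumption: single quotes are escaped by doubling them.
            literal = "'" + str(value).replace("'", "''") + "'"
        parts.append(f"metadata.{field} = {literal}")
    return " AND ".join(parts)


print(build_where_str({"bar": "baz"}))  # metadata.bar = 'baz'
```

The resulting string can then be passed as where_str in the same way as the literal string in the example above.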

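Similarity search with score

You can get the distance scores for the results by calling the similarity_search_with_score method. A lower distance means the document is more similar.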

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [DIST={score:3f}] {doc.page_content} [{doc.metadata}]")

* [DIST=-0.500724] foo [{'baz': 'bar'}]

Async operations

CouchbaseQueryVectorStore supports asynchronous operations:

# add documents
await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
await vector_store.adelete(ids=["3"])

# search
results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [DIST={score:3f}] {doc.page_content} [{doc.metadata}]")

* [DIST=-0.500724] foo [{'baz': 'bar'}]
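The snippet above relies on top-level await, which works in a notebook but not in a plain Python script. A minimal sketch of the script form follows. StubStore is a hypothetical stand-in that only mimics the async method names so the example runs without a cluster; in real code you would pass the actual vector store to main.

```python
import asyncio


class StubStore:
    """Hypothetical stand-in exposing the same async method names."""

    def __init__(self):
        self.docs = {}

    async def aadd_documents(self, documents, ids):
        self.docs.update(zip(ids, documents))
        return list(ids)

    async def adelete(self, ids):
        for doc_id in ids:
            self.docs.pop(doc_id, None)
        return True

    async def asimilarity_search(self, query, k=1):
        # Substring match stands in for a real vector search.
        matches = [text for text in self.docs.values() if query in text]
        return matches[:k]


async def main(store):
    """Run the add, delete, and search calls inside one coroutine."""
    await store.aadd_documents(
        documents=["foo", "thud", "i will be deleted :("], ids=["1", "2", "3"]
    )
    await store.adelete(ids=["3"])
    return await store.asimilarity_search(query="thud", k=1)


# asyncio.run creates the event loop that top-level await would otherwise need.
results = asyncio.run(main(StubStore()))
print(results)  # ['thud']
```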

Using as a retriever

You can convert the vector store into a retriever:

retriever = vector_store.as_retriever(
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")

[Document(id='2', metadata={'bar': 'baz'}, page_content='thud')]

Creating from texts

You can create a CouchbaseQueryVectorStore directly from a list of texts:

texts = ["hello", "world"]

vectorstore = CouchbaseQueryVectorStore.from_texts(
    texts,
    embedding=embeddings,
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    distance_metric=DistanceStrategy.COSINE,
)

CouchbaseSearchVectorStore

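CouchbaseSearchVectorStore lets you use Couchbase for vector search through Search Vector Indexes. A Search Vector Index combines a Couchbase Search index with a vector column, enabling hybrid search that combines vector search with full-text search (FTS) and geospatial search.

Requirements: Couchbase Server version 7.6 or later. For details on how to create a Search index with support for vector fields, see the Couchbase documentation.

Search index field mappings for this tutorial

To follow the examples in this document, your Search index should contain mappings for the following fields:

Field	Type	Description
`text`	text	Document text content
`embedding`	vector	Vector embedding field (dimensions: 3072 for `text-embedding-3-large`)
`metadata`	object (child mapping)	Metadata object containing child fields such as `source`, `author`, `rating`, and `date`

Notes:

The vector field dimensions must match your embedding model (3072 for text-embedding-3-large, which this tutorial uses)
The metadata child fields (source, author, rating, date) are required for the hybrid query examples
You can customize the field names with the text_key and embedding_key parameters when initializing the vector store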
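A dimension mismatch between the embedding model and the index is easy to introduce when switching models, so a small guard can catch it before documents are written. The sketch below uses a hypothetical helper, check_dimensions, and takes the expected size of 3072 from the field mapping table above.

```python
EXPECTED_DIMS = 3072  # text-embedding-3-large, per the field mapping table


def check_dimensions(vector, expected=EXPECTED_DIMS):
    """Return the vector unchanged, or raise if its length is unexpected."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )
    return vector


# With a real embeddings object this could be called as, for example:
#   check_dimensions(embeddings.embed_query("hello"))
check_dimensions([0.0] * 3072)
```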

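Initialization

Below, we create the vector store object using the cluster information and the Search index name. First, set up the embeddings: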

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Then create the vector store:

from langchain_couchbase import CouchbaseSearchVectorStore

SEARCH_INDEX_NAME = "langchain-test-index"

vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=SEARCH_INDEX_NAME,
)

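Specifying the text and embedding fields

You can optionally specify the text and embedding fields for the documents using the text_key and embedding_key parameters.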

vector_store_specific = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=SEARCH_INDEX_NAME,
    text_key="text",
    embedding_key="embedding",
)

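Managing the vector store

Once you have created the vector store, you can interact with it by adding and deleting items.

Adding items to the vector store

You can add items to the vector store using the add_documents function.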

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

['f125b836-f555-4449-98dc-cbda4e77ae3f',
 'a28fccde-fd32-4775-9ca8-6cdb22ca7031',
 'b1037c4b-947f-497f-84db-63a4def5080b',
 'c7082b74-b385-4c4b-bbe5-0740909c01db',
 'a7e31f62-13a5-4109-b881-8631aff7d46c',
 '9fcc2894-fdb1-41bd-9a93-8547747650f4',
 'a5b0632d-abaf-4802-99b3-df6b6c99be29',
 '0475592e-4b7f-425d-91fd-ac2459d48a36',
 '94c6db4e-ba07-43ff-aa96-3a5d577db43a',
 'd21c7feb-ad47-4e7d-84c5-785afb189160']

Deleting items from the vector store

vector_store.delete(ids=[uuids[-1]])

True

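Querying the vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Similarity search

A simple similarity search can be performed as follows: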

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

Similarity search with score

You can also get the scores for the results by calling the similarity_search_with_score method.

results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=1)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")

* [SIM=0.553213] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]

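Filtering results

You can filter the search results by specifying any filter on the text or metadata in the document that is supported by the Couchbase Search service. The filter can be any valid SearchQuery supported by the Couchbase Python SDK. These filters are applied before the vector search is performed. To filter on a field inside the metadata, specify it with dot notation. For example, to refer to the source field in the metadata, specify metadata.source. Note that the filter must be supported by the Search index.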

from couchbase import search

query = "Are there any concerning financial news?"
filter_on_source = search.MatchQuery("news", field="metadata.source")
results = vector_store.similarity_search_with_score(
    query, fields=["metadata.source"], filter=filter_on_source, k=5
)
for res, score in results:
    print(f"* {res.page_content} [{res.metadata}] {score}")

* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}] 0.38733142614364624
* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}] 0.20637883245944977
* The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}] 0.10403035581111908

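Specifying fields to return

You can specify the fields to return from the document using the fields parameter in the search. These fields are returned as part of the metadata object of the returned document. You can fetch any field that is stored in the Search index. The document's text_key is returned as part of the document's page_content. If you do not specify any fields to fetch, all fields stored in the index are returned. To fetch a field inside the metadata, specify it with dot notation. For example, to fetch the source field in the metadata, specify metadata.source.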

query = "What did I eat for breakfast today?"
results = vector_store.similarity_search(query, fields=["metadata.source"])
print(results[0])

page_content='I had chocolate chip pancakes and scrambled eggs for breakfast this morning.' metadata={'source': 'tweet'}
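The dot notation used for fields such as metadata.source is a path into the nested document. As a rough illustration of how such a path resolves, the sketch below walks a plain dictionary; get_field and the sample document are hypothetical and are not part of the Couchbase SDK or langchain-couchbase.

```python
def get_field(doc: dict, dotted_path: str):
    """Follow a dotted path such as 'metadata.source' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        current = current[key]
    return current


# A document shaped like the ones stored in this tutorial (embedding omitted).
stored = {
    "text": "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    "metadata": {"source": "tweet"},
}
print(get_field(stored, "metadata.source"))  # tweet
```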

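Querying by converting to a retriever

You can also convert the vector store into a retriever for easier use in your chains. The example below converts the vector store into a retriever and then invokes it with a simple query and a filter.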

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1, "score_threshold": 0.5},
)
filter_on_source = search.MatchQuery("news", field="metadata.source")
retriever.invoke("Stealing from the bank is a crime", filter=filter_on_source)

[Document(id='b480c9c6-b7df-4a22-ac2e-19287af7562d', metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]

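Hybrid queries

Couchbase lets you run hybrid searches by combining vector search results with searches on non-vector fields of the document, such as the metadata object. The results are based on the combination of the results from the vector search and from the searches supported by the Search service. The scores of the component searches are added together to give the total score of each result. To perform a hybrid search, pass the optional search_options parameter, which all the similarity searches accept. The different search and query possibilities for search_options are described in the Couchbase Search request parameters documentation.

Creating diverse metadata for hybrid search

To demonstrate hybrid search, let's create documents with diverse metadata. We add three fields to the metadata: date between 2010 and 2020, rating between 1 and 5, and author set to either John Doe or Jane Doe.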

from langchain_core.documents import Document

# Create documents with diverse metadata for hybrid search examples
hybrid_docs = [
    Document(
        page_content="The new AI model shows impressive performance on benchmark tests.",
        metadata={"source": "tech", "date": "2019-01-01", "rating": 5, "author": "John Doe"},
    ),
    Document(
        page_content="Stock markets showed mixed results today with tech sector leading gains.",
        metadata={"source": "finance", "date": "2017-01-01", "rating": 3, "author": "Jane Doe"},
    ),
    Document(
        page_content="The annual developer conference announced new framework updates.",
        metadata={"source": "tech", "date": "2018-01-01", "rating": 4, "author": "John Doe"},
    ),
    Document(
        page_content="Weather patterns indicate a mild winter ahead for the region.",
        metadata={"source": "weather", "date": "2016-01-01", "rating": 2, "author": "Jane Doe"},
    ),
    Document(
        page_content="The new smartphone release features advanced camera technology.",
        metadata={"source": "tech", "date": "2020-01-01", "rating": 4, "author": "John Doe"},
    ),
    Document(
        page_content="Economic indicators suggest steady growth in the coming quarter.",
        metadata={"source": "finance", "date": "2017-01-01", "rating": 3, "author": "Jane Doe"},
    ),
]

vector_store.add_documents(hybrid_docs)

query = "Tell me about technology news"
results = vector_store.similarity_search(query)
print(results[0].metadata)

{'author': 'John Doe', 'date': '2020-01-01', 'rating': 4, 'source': 'tech'}

Query by exact value

We can search for exact matches on a textual field such as author in the metadata object.

query = "What are the latest technology updates?"
results = vector_store.similarity_search(
    query,
    search_options={"query": {"field": "metadata.author", "match": "John Doe"}},
    fields=["metadata.author"],
)
print(results[0])

page_content='The new smartphone release features advanced camera technology.' metadata={'author': 'John Doe'}

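Query by partial match

We can search for partial matches by specifying a fuzziness for the search. This is useful when you want to match slight variations or misspellings of a search term. Here, "Jae" is close to "Jane" (a fuzziness of 1).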

query = "What are the financial market updates?"
results = vector_store.similarity_search(
    query,
    search_options={
        "query": {"field": "metadata.author", "match": "Jae", "fuzziness": 1}
    },
    fields=["metadata.author"],
)
print(results[0])

page_content='Stock markets showed mixed results today with tech sector leading gains.' metadata={'author': 'Jane Doe'}
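Fuzziness is commonly defined in terms of edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one term into another. The sketch below is a plain Levenshtein implementation, shown only to illustrate why "Jae" is within a fuzziness of 1 of "Jane". It is not the Search service's own matching code, and it assumes terms are compared in lowercase.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("jae", "jane"))  # 1
print(edit_distance("jae", "john"))  # 3
```

With a fuzziness of 1, "jae" can reach "jane" with a single insertion but is too far from "john", which matches the author returned in the result above.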

Query by date range

We can search for documents that fall within a date range on a date field such as metadata.date.

query = "What happened in the markets?"
results = vector_store.similarity_search(
    query,
    search_options={
        "query": {
            "start": "2016-12-31",
            "end": "2018-01-02",
            "inclusive_start": True,
            "inclusive_end": False,
            "field": "metadata.date",
        }
    },
)
print(results[0])

page_content='Stock markets showed mixed results today with tech sector leading gains.' metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': 'finance'}

Query by numeric range

We can search for documents that fall within a range on a numeric field such as metadata.rating.

query = "What are the economic indicators for the coming quarter?"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "min": 4,
            "max": 5,
            "inclusive_min": True,
            "inclusive_max": True,
            "field": "metadata.rating",
        }
    },
)
print(results[0])

(Document(id='6aeb8413bce340bc893f175cefbb64b3', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': 'finance'}, page_content='Economic indicators suggest steady growth in the coming quarter.'), 0.7944117188453674)

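Combining multiple search queries

Different search queries can be combined using the AND (conjuncts) or OR (disjuncts) operators. In this example, we look for documents with a rating between 3 and 4 and a date in 2017.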

query = "Tell me about finance"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "conjuncts": [
                {"min": 3, "max": 4, "inclusive_max": True, "field": "metadata.rating"},
                {"start": "2016-12-31", "end": "2018-01-01", "field": "metadata.date"},
            ]
        }
    },
)
print(results[0])

(Document(id='0c9af73370c1483caddf9941440edb50', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': 'finance'}, page_content='Stock markets showed mixed results today with tech sector leading gains.'), 0.7275013146103568)
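Nested query dictionaries like the one above become hard to read as they grow. One option is a set of small builder functions; the helpers below (numeric_range, date_range, all_of, any_of) are hypothetical names introduced for this sketch and only produce plain dictionaries using the same keys as the examples in this section.

```python
def numeric_range(field, min_value, max_value, **flags):
    """Numeric range query; flags may carry inclusive_min / inclusive_max."""
    return {"min": min_value, "max": max_value, "field": field, **flags}


def date_range(field, start, end, **flags):
    """Date range query; flags may carry inclusive_start / inclusive_end."""
    return {"start": start, "end": end, "field": field, **flags}


def all_of(*queries):
    """AND: every sub-query contributes to the match."""
    return {"conjuncts": list(queries)}


def any_of(*queries):
    """OR: any sub-query may match."""
    return {"disjuncts": list(queries)}


# Rebuilds the search_options dictionary used in the example above.
search_options = {
    "query": all_of(
        numeric_range("metadata.rating", 3, 4, inclusive_max=True),
        date_range("metadata.date", "2016-12-31", "2018-01-01"),
    )
}
print(search_options)
```

The resulting dictionary can be passed as search_options to any of the similarity search methods.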

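Note that hybrid search results may include documents that do not satisfy all of the search parameters. This is a consequence of how the score is calculated: the score is the sum of the vector search score and the scores of the queries in the hybrid search. If a document's vector search score is high, its combined score can exceed that of results that match every query in the hybrid search. To avoid such results, use the filter parameter instead of hybrid search.

Combining hybrid search queries with filters

Hybrid search can be combined with filters to get the best of both: hybrid ranking over results that meet the required conditions. In this example, we look for documents with a rating between 3 and 5 whose text field matches the string "market".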

filter_text = search.MatchQuery("market", field="text")

query = "Tell me about market updates"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "min": 3,
            "max": 5,
            "inclusive_min": True,
            "inclusive_max": True,
            "field": "metadata.rating",
        }
    },
    filter=filter_text,
)

print(results[0])

(Document(id='0c9af73370c1483caddf9941440edb50', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': 'finance'}, page_content='Stock markets showed mixed results today with tech sector leading gains.'), 0.4503188681265006)

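Other queries

Similarly, you can use any of the supported query methods in the search_options parameter, such as geo distance, polygon search, wildcard, and regular expressions. For more details on the available query methods and their syntax, see the documentation.

Usage for retrieval-augmented generation

For guides on how to use these vector stores for retrieval-augmented generation (RAG), see the RAG guides in the LangChain documentation.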

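Frequently asked questions

Question: Should I create the Search index before or after creating the CouchbaseSearchVectorStore object?

Before. You need to create the Search index before creating the CouchbaseSearchVectorStore object.

Question: Should I create the index before or after adding documents to CouchbaseQueryVectorStore?

After. For CouchbaseQueryVectorStore, add the documents first and then create the index with the create_index() method. This differs from CouchbaseSearchVectorStore.

Question: What is the difference between CouchbaseSearchVectorStore and CouchbaseQueryVectorStore?

Feature	`CouchbaseSearchVectorStore`	`CouchbaseQueryVectorStore`
Minimum version	Couchbase Server 7.6+	Couchbase Server 8.0+
Index type	Search Vector Index	Hyperscale or Composite Vector Index
Index creation	Before the vector store is created	After documents are added
Filtering	`SearchQuery` objects	SQL++ WHERE clause (`where_str`)
Best for	Hybrid search (vector + FTS + geospatial)	Large-scale pure vector search, or vector + scalar filters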

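Question: I am not seeing all the fields that I specified in my search results

In Couchbase, only fields that are stored in the Search index can be returned. Make sure the field you are trying to access in the search results is part of the Search index. One way to handle this is to index and store the document's fields dynamically in the index.

In Capella, go to "Advanced Mode"; then, under the "General Settings" dropdown, check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields"
In Couchbase Server, in the index editor (not the Quick Editor), under the "Advanced" dropdown, check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields"

Note that these options increase the size of the index. For more details on dynamic mappings, see the documentation.

Question: I am unable to see the metadata object in my search results

This is most likely because the metadata field in the document is not indexed and/or stored by the Couchbase Search index. To index the metadata field in the document, add it to the index as a child mapping. If you choose to map all the fields in the mapping, you will be able to search by every metadata field. Alternatively, to optimize the index, you can select specific fields inside the metadata object to index. For more on indexing child mappings, see the documentation on creating child mappings.

Question: What is the difference between filter and search_options / hybrid queries?

A filter is a pre-filter that restricts the documents searched in the Search index. It is available in Couchbase Server 7.6.4 and later. Hybrid queries are additional search queries that can be used to tune the results returned from the Search index. Filters and hybrid search queries offer the same capabilities, with slightly different syntax: filters are SearchQuery objects, whereas hybrid search queries are dictionaries.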
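The practical difference shows up in ranking. The following sketch uses a deliberately simplified scoring model with made-up scores (it is not Couchbase's actual scoring): a hybrid query adds a bonus to matching documents, so a non-matching document with a strong vector score can still rank first, whereas a pre-filter removes non-matching documents before ranking.

```python
# Hypothetical documents with made-up vector scores.
docs = [
    {"id": "A", "vector_score": 0.79, "rating": 3},
    {"id": "B", "vector_score": 0.40, "rating": 4},
]
QUERY_SCORE = 0.30  # hypothetical score contributed by a matching query


def matches(doc):
    """The non-vector condition: rating between 4 and 5 inclusive."""
    return 4 <= doc["rating"] <= 5


def hybrid_rank(documents):
    """Hybrid search: add the query score to the vector score, keep everything."""
    scored = [
        (d["id"], d["vector_score"] + (QUERY_SCORE if matches(d) else 0.0))
        for d in documents
    ]
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


def prefilter_rank(documents):
    """Pre-filter: drop non-matching documents, then rank by vector score."""
    kept = [d for d in documents if matches(d)]
    kept.sort(key=lambda d: d["vector_score"], reverse=True)
    return [d["id"] for d in kept]


print(hybrid_rank(docs))     # ['A', 'B']
print(prefilter_rank(docs))  # ['B']
```

Document A does not satisfy the rating condition, yet it leads the hybrid ranking because its vector score alone exceeds B's combined score. This mirrors the numeric range example earlier, where a document with a rating of 3 was returned first.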

API reference

For detailed documentation of all features and configurations, see the langchain-couchbase API reference.
