Couchbase Query integration

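CouchbaseQueryVectorStore is the preferred implementation for vector search in Couchbase. It runs vector similarity search through the Query service (SQL++) and the Index service rather than the Search service. Because searches are expressed as SQL++ queries that call vector functions, this gives a more powerful and more direct way to work with vectors. For more on Couchbase vector search capabilities, see the official documentation: Choose the Right Vector Index.

This feature is only available in Couchbase 8.0 and later.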

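Key differences from `CouchbaseSearchVectorStore`

(formerly known as CouchbaseVectorStore)

Query and Index services: uses the Couchbase Query service with SQL++ instead of the Search service
No index required: basic operations do not need a preconfigured search index
SQL++ syntax: supports WHERE clauses and SQL++ query syntax for filtering
Vector functions: uses the APPROX_VECTOR_DISTANCE function to compute similarity
Distance strategies: supports several distance strategies (dot product, cosine, Euclidean, squared Euclidean)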
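
To make the list above more concrete, here is a minimal sketch of the kind of SQL++ statement a query-based store could send to the Query service. The helper `buildVectorQuery`, its option names, and the overall statement shape are hypothetical and are not taken from the library; the page does not show the query that `CouchbaseQueryVectorStore` actually generates. The sketch also assumes that `APPROX_VECTOR_DISTANCE` accepts the field, the query vector, and a metric name such as "COSINE" as a string.

```typescript
// Hypothetical helper for illustration; not part of @langchain/community.
interface VectorQueryOptions {
  keyspace: string; // already back-quoted, e.g. "`bucket`.`scope`.`collection`"
  embeddingKey: string; // field that stores the vector
  metric: "DOT" | "COSINE" | "EUCLIDEAN" | "EUCLIDEAN_SQUARED"; // assumed metric names
  k: number; // number of results to return
  where?: string; // optional SQL++ filter
}

function buildVectorQuery(queryVector: number[], opts: VectorQueryOptions): string {
  // Render the query vector as a SQL++ array literal.
  const vectorLiteral = `[${queryVector.join(", ")}]`;
  const distance =
    `APPROX_VECTOR_DISTANCE(\`${opts.embeddingKey}\`, ${vectorLiteral}, "${opts.metric}")`;
  const whereClause = opts.where ? ` WHERE ${opts.where}` : "";
  // Order by the computed distance and keep the k closest documents.
  return (
    `SELECT META().id AS id, ${distance} AS distance ` +
    `FROM ${opts.keyspace}${whereClause} ` +
    `ORDER BY distance LIMIT ${opts.k}`
  );
}
```

Calling `buildVectorQuery([0.1, 0.2], { keyspace: "`b`.`s`.`c`", embeddingKey: "embedding", metric: "COSINE", k: 4 })` returns a single SELECT statement string. In practice the vector store builds and runs its own query, so a helper like this is only useful for understanding or debugging.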

Installation

npm

npm install couchbase @langchain/openai @langchain/community @langchain/core

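Create a Couchbase connection object

We first create a connection to the Couchbase cluster and then pass the cluster object to the vector store. Here we connect with a username and password, but any other supported authentication method works as well. For more on connecting to a Couchbase cluster, see the Node SDK documentation.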

import { Cluster } from "couchbase";

const connectionString = "couchbase://localhost";
const dbUsername = "Administrator"; // valid database user with read access to the bucket being queried
const dbPassword = "Password"; // password for the database user

const couchbaseClient = await Cluster.connect(connectionString, {
  username: dbUsername,
  password: dbPassword,
  configProfile: "wanDevelopment",
});

Basic setup

import { CouchbaseQueryVectorStore, DistanceStrategy } from "@langchain/community/vectorstores/couchbase_query";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Cluster } from "couchbase";

// Connect to Couchbase
const cluster = await Cluster.connect("couchbase://localhost", {
  username: "Administrator",
  password: "password",
});

// Initialize embeddings
const embeddings = new OpenAIEmbeddings();

// Configure the vector store
const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster,
  bucketName: "my-bucket",
  scopeName: "my-scope",
  collectionName: "my-collection",
  textKey: "text", // optional, defaults to "text"
  embeddingKey: "embedding", // optional, defaults to "embedding"
  distanceStrategy: DistanceStrategy.COSINE, // optional, defaults to DOT
});

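Create a vector index

The query vector store can create vector indexes to improve search performance. Two index types are available:

Hyperscale index

A dedicated vector index optimized for vector operations, built on Couchbase's vector indexing capabilities: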

import { IndexType } from "@langchain/community/vectorstores/couchbase_query";

await vectorStore.createIndex({
  indexType: IndexType.HYPERSCALE,
  indexDescription: "IVF,SQ8",
  indexName: "my_vector_index", // optional
  vectorDimension: 1536, // optional, auto-detected from embeddings
  distanceMetric: DistanceStrategy.COSINE, // optional, uses store default
  fields: ["text", "metadata"], // optional, defaults to text field
  whereClause: "type = 'document'", // optional filter
  indexScanNprobes: 10, // optional tuning parameter
  indexTrainlist: 1000, // optional tuning parameter
});

Generated SQL++:

CREATE VECTOR INDEX `my_vector_index` ON `bucket`.`scope`.`collection`
(`embedding` VECTOR) INCLUDE (`text`, `metadata`)
WHERE type = 'document' USING GSI WITH {'dimension': 1536, 'similarity': 'cosine', 'description': 'IVF,SQ8'}

Composite index

A general-purpose GSI index that contains both the vector field and scalar fields:

await vectorStore.createIndex({
  indexType: IndexType.COMPOSITE,
  indexDescription: "IVF1024,SQ8",
  indexName: "my_composite_index",
  vectorDimension: 1536,
  fields: ["text", "metadata.category"],
  whereClause: "created_date > '2023-01-01'",
  indexScanNprobes: 3,
  indexTrainlist: 10000,
});

Generated SQL++:

CREATE INDEX `my_composite_index` ON `bucket`.`scope`.`collection`
(`text`, `metadata.category`, `embedding` VECTOR)
WHERE created_date > '2023-01-01' USING GSI
WITH {'dimension': 1536, 'similarity': 'dot', 'description': 'IVF1024,SQ8', 'scan_nprobes': 3, 'trainlist': 10000}

Key differences

Aspect	Hyperscale Index	Composite Index
SQL++ Syntax	`CREATE VECTOR INDEX`	`CREATE INDEX`
Vector Field	`(field VECTOR)` with `INCLUDE` clause	`(field1, field2, vector_field VECTOR)`
Vector Parameters	Supports all vector parameters	Supports all vector parameters
Optimization	Specialized for vector operations	General-purpose GSI with vector support
Use Case	Pure vector similarity search	Mixed vector and scalar queries
Performance	Optimized for vector distance calculations	Good for hybrid queries
Tuning Parameters	Supports `indexScanNprobes`, `indexTrainlist`	Supports `indexScanNprobes`, `indexTrainlist`
Limitations	Only one vector field, uses INCLUDE for other fields	One vector field among multiple index keys
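
The two generated statements shown earlier differ mainly in the keyword (`CREATE VECTOR INDEX` versus `CREATE INDEX`) and in where the scalar fields go (`INCLUDE` clause versus leading index keys). The sketch below rebuilds both shapes from one options object to make that mapping explicit. `IndexSpec` and `buildCreateIndex` are hypothetical names used for illustration only; this is not the library's implementation, and it simply mirrors the identifier quoting seen in the generated SQL++ above.

```typescript
// Illustrative only: mirrors the statement shapes shown above, not the library source.
interface IndexSpec {
  kind: "hyperscale" | "composite";
  name: string;
  keyspace: string; // already back-quoted
  embeddingKey: string;
  fields: string[]; // scalar fields to include
  dimension: number;
  similarity: string; // e.g. "cosine" or "dot"
  description: string; // e.g. "IVF,SQ8"
  where?: string;
  scanNprobes?: number;
  trainlist?: number;
}

const quote = (name: string): string => `\`${name}\``;

function buildCreateIndex(spec: IndexSpec): string {
  const vectorKey = `${quote(spec.embeddingKey)} VECTOR`;
  const quotedFields = spec.fields.map(quote);

  let head: string;
  if (spec.kind === "hyperscale") {
    // Hyperscale: the vector is the only key; scalar fields go into INCLUDE.
    const include = quotedFields.length > 0 ? ` INCLUDE (${quotedFields.join(", ")})` : "";
    head = `CREATE VECTOR INDEX ${quote(spec.name)} ON ${spec.keyspace} (${vectorKey})${include}`;
  } else {
    // Composite: scalar fields and the vector are all index keys.
    const keys = [...quotedFields, vectorKey].join(", ");
    head = `CREATE INDEX ${quote(spec.name)} ON ${spec.keyspace} (${keys})`;
  }

  const withParts = [
    `'dimension': ${spec.dimension}`,
    `'similarity': '${spec.similarity}'`,
    `'description': '${spec.description}'`,
  ];
  // Tuning parameters are emitted only when the caller supplies them.
  if (spec.scanNprobes !== undefined) withParts.push(`'scan_nprobes': ${spec.scanNprobes}`);
  if (spec.trainlist !== undefined) withParts.push(`'trainlist': ${spec.trainlist}`);

  const where = spec.where ? ` WHERE ${spec.where}` : "";
  return `${head}${where} USING GSI WITH {${withParts.join(", ")}}`;
}
```

Passing the composite example's values (`IVF1024,SQ8`, `scanNprobes: 3`, `trainlist: 10000`) yields the composite statement shown above on a single line. Use `createIndex()` for real index creation; a builder like this is mainly a reading aid.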

Basic vector search example

The following example shows how to set up the Couchbase query vector store and run a similarity search.

import { OpenAIEmbeddings } from "@langchain/openai";
import {
  CouchbaseQueryVectorStore,
  DistanceStrategy,
} from "@langchain/community/vectorstores/couchbase_query";
import { Cluster } from "couchbase";
import { Document } from "@langchain/core/documents";

const connectionString = process.env.COUCHBASE_DB_CONN_STR ?? "couchbase://localhost";
const databaseUsername = process.env.COUCHBASE_DB_USERNAME ?? "Administrator";
const databasePassword = process.env.COUCHBASE_DB_PASSWORD ?? "Password";

const couchbaseClient = await Cluster.connect(connectionString, {
  username: databaseUsername,
  password: databasePassword,
  configProfile: "wanDevelopment",
});

// OpenAI API Key is required to use OpenAIEmbeddings
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});

const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster: couchbaseClient,
  bucketName: "testing",
  scopeName: "_default",
  collectionName: "_default",
  textKey: "text",
  embeddingKey: "embedding",
  distanceStrategy: DistanceStrategy.COSINE,
});

// Add documents
const documents = [
  new Document({
    pageContent: "Couchbase is a NoSQL database",
    metadata: { category: "database", type: "document" }
  }),
  new Document({
    pageContent: "Vector search enables semantic similarity",
    metadata: { category: "ai", type: "document" }
  })
];

await vectorStore.addDocuments(documents);

// Perform similarity search
const query = "What is a NoSQL database?";
const results = await vectorStore.similaritySearch(query, 4);
console.log("Search results:", results[0]);

// Search with scores
const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 4);
console.log("Document:", resultsWithScores[0][0]);
console.log("Score:", resultsWithScores[0][1]);

Searching documents

Basic similarity search

// Basic similarity search
const results = await vectorStore.similaritySearch(
  "What is a NoSQL database?",
  4
);

Search with filters

// Search with filters
const filteredResults = await vectorStore.similaritySearch(
  "database technology",
  4,
  {
    where: "metadata.category = 'database'",
    fields: ["text", "metadata.category"]
  }
);

Search with scores

// Search with scores
const resultsWithScores = await vectorStore.similaritySearchWithScore(
  "vector search capabilities",
  4
);
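
Scored results come back as `[document, score]` pairs, as the earlier example shows with `resultsWithScores[0][0]` and `resultsWithScores[0][1]`. A small post-processing step can keep only the close matches. The sketch below assumes that the score is a distance, so smaller values mean closer matches; the page does not state this, so check the behavior of your distance strategy before relying on it. `Doc` is a local stand-in for the LangChain `Document` type so the snippet runs without dependencies, and `withinDistance` is a hypothetical helper.

```typescript
// Local stand-in for a LangChain Document so the sketch has no dependencies.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type Scored = [Doc, number];

// Assumption: the score is a distance, so lower means more similar.
function withinDistance(results: Scored[], maxDistance: number): Scored[] {
  return results
    .filter(([, score]) => score <= maxDistance) // drop matches that are too far away
    .sort((x, y) => x[1] - y[1]); // closest match first
}
```

With real results this could be called as `withinDistance(resultsWithScores, 0.5)`, where the threshold `0.5` is an arbitrary example value.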

Complex filtering

const results = await vectorStore.similaritySearch(
  "search query",
  10,
  {
    where: "metadata.category IN ['tech', 'science'] AND metadata.rating >= 4",
    fields: ["content", "metadata.title", "metadata.rating"]
  }
);
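
When the filter values come from user input or application state, building the `where` string by hand gets error-prone. The following sketch assembles the same filter used above from small helpers. `sqlString`, `inList`, and `allOf` are hypothetical helpers, not part of the library, and the escaping rule (doubling a single quote inside a single-quoted SQL++ string) is an assumption to verify against the SQL++ reference. Field names are passed through unchanged, so they must come from trusted code.

```typescript
// Hypothetical helpers for assembling a `where` string; not a library API.

// Assumption: SQL++ escapes a single quote inside a single-quoted string by doubling it.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Builds `field IN ['a', 'b']` from a list of string values.
function inList(field: string, values: string[]): string {
  return `${field} IN [${values.map(sqlString).join(", ")}]`;
}

// Joins non-empty conditions with AND.
function allOf(...conditions: string[]): string {
  return conditions.filter((c) => c.length > 0).join(" AND ");
}

const where = allOf(
  inList("metadata.category", ["tech", "science"]),
  "metadata.rating >= 4"
);
```

The resulting `where` string can be passed as the `where` option of `similaritySearch`, in the same way as the literal string in the example above.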

Configuration options

Distance strategies

DistanceStrategy.DOT - Dot Product (default)
DistanceStrategy.COSINE - Cosine Similarity
DistanceStrategy.EUCLIDEAN - Euclidean Distance (also known as L2)
DistanceStrategy.EUCLIDEAN_SQUARED - Euclidean Squared Distance (also known as L2 Squared)
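
For intuition about what the four strategies measure, the sketch below computes each quantity for two plain number arrays. This is only the underlying math; it does not call Couchbase, and the server may apply its own sign or normalization conventions when it turns a similarity into a distance. The function names are chosen for this example.

```typescript
// Plain-math illustration of the four strategies; no Couchbase calls involved.
function assertSameLength(a: number[], b: number[]): void {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
}

// Dot product: sum of element-wise products.
function dot(a: number[], b: number[]): number {
  assertSameLength(a, b);
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Cosine similarity: dot product of the vectors divided by the product of their lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  if (norms === 0) throw new Error("cosine similarity is undefined for a zero vector");
  return dot(a, b) / norms;
}

// Squared Euclidean (L2 squared): sum of squared element-wise differences.
function euclideanSquared(a: number[], b: number[]): number {
  assertSameLength(a, b);
  return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);
}

// Euclidean (L2): square root of the squared distance.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(euclideanSquared(a, b));
}
```

Squared Euclidean distance ranks results in the same order as Euclidean distance while skipping the square root, which is why both are commonly offered.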

Index types

IndexType.HYPERSCALE - Specialized vector index for optimal vector search performance
IndexType.COMPOSITE - General-purpose index that can include vector and scalar fields

Advanced usage

Custom vector fields

const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster,
  bucketName: "my-bucket",
  scopeName: "my-scope",
  collectionName: "my-collection",
  textKey: "content",
  embeddingKey: "vector_embedding",
  distanceStrategy: DistanceStrategy.EUCLIDEAN,
});

Creating from texts

const texts = [
  "Couchbase is a NoSQL database",
  "Vector search enables semantic similarity"
];

const metadatas = [
  { category: "database" },
  { category: "ai" }
];

const vectorStore = await CouchbaseQueryVectorStore.fromTexts(
  texts,
  metadatas,
  embeddings,
  {
    cluster,
    bucketName: "my-bucket",
    scopeName: "my-scope",
    collectionName: "my-collection"
  }
);

Deleting documents

const documentIds = ["doc1", "doc2", "doc3"];
await vectorStore.delete({ ids: documentIds });

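Performance considerations

Create indexes: use createIndex() to create an appropriate vector index for better performance
Choose the index type:
- For pure vector search workloads that mainly run similarity searches, use a Hyperscale index
- For hybrid queries that combine vector similarity with scalar field filtering, use a composite index
Tune parameters: adjust indexScanNprobes and indexTrainlist to match your data size and performance requirements
Filter early: use WHERE clauses to shrink the search space before vector distances are computed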

Error handling

try {
  await vectorStore.createIndex({
    indexType: IndexType.HYPERSCALE,
    indexDescription: "IVF,SQ8",
  });
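} catch (error) {
  // `error` is typed as `unknown` in strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  console.error("Index creation failed:", message);
}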

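Common errors

Insufficient training data

If you see errors related to insufficient training data, you may need to:

Increase the indexTrainlist parameter (suggested default: about 50 vectors per centroid)
Make sure your collection contains enough documents with vector embeddings
For collections with fewer than 1 million vectors, use number_of_vectors / 1000 as the number of centroids
For larger collections, use sqrt(number_of_vectors) as the number of centroids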
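
The sizing rules above can be turned into a small calculator. The sketch below applies them as written: `number_of_vectors / 1000` centroids below one million vectors, `sqrt(number_of_vectors)` otherwise, and roughly 50 training vectors per centroid. `suggestIvfSizing` is a hypothetical helper, and three details are assumptions of this sketch rather than statements from the source: rounding to the nearest whole centroid, using at least one centroid, and capping the training list at the number of vectors available. The `IVF<n>,SQ8` description format follows the `IVF1024,SQ8` example used earlier.

```typescript
// Hypothetical sizing helper based on the rules of thumb listed above.
interface IvfSizing {
  centroids: number; // suggested number of IVF centroids
  trainlist: number; // suggested value for indexTrainlist
  description: string; // value for indexDescription, e.g. "IVF200,SQ8"
}

function suggestIvfSizing(vectorCount: number, quantization = "SQ8"): IvfSizing {
  if (!Number.isInteger(vectorCount) || vectorCount <= 0) {
    throw new Error("vectorCount must be a positive integer");
  }
  // Below 1 million vectors: n / 1000 centroids; otherwise sqrt(n).
  const raw = vectorCount < 1_000_000 ? vectorCount / 1000 : Math.sqrt(vectorCount);
  // Assumption: round to a whole number and never go below one centroid.
  const centroids = Math.max(1, Math.round(raw));
  // About 50 training vectors per centroid, capped at what the collection holds (assumption).
  const trainlist = Math.min(vectorCount, centroids * 50);
  return { centroids, trainlist, description: `IVF${centroids},${quantization}` };
}
```

For a collection of 200,000 vectors this suggests 200 centroids and a training list of 10,000, which could be passed as `indexDescription: "IVF200,SQ8"` and `indexTrainlist: 10000` to `createIndex()`. Treat the output as a starting point for tuning, not a guaranteed fix for the error.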

Comparison with `CouchbaseSearchVectorStore`

Feature	`CouchbaseQueryVectorStore`	`CouchbaseSearchVectorStore`
Service	Query (SQL++)	Search (FTS)
Index Required	Optional (for performance)	Required
Query Language	SQL++ WHERE clauses	Search query syntax
Vector Functions	APPROX_VECTOR_DISTANCE	VectorQuery API
Setup Complexity	Lower	Higher
Performance	Good with indexes	Optimized for search

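Frequently asked questions

Do I need to create an index before using `CouchbaseQueryVectorStore`?

No. Unlike the Search-based CouchbaseSearchVectorStore, the Query-based implementation works without a pre-created index. However, creating an appropriate vector index (Hyperscale or composite) significantly improves query performance.

When should I use a Hyperscale index versus a composite index?

Use a Hyperscale index when you mainly run vector similarity searches with little filtering on other fields
Use a composite index when you often combine vector similarity with scalar field filtering in the same query
Learn more about how to choose the right vector index

Can I use both `CouchbaseQueryVectorStore` and `CouchbaseSearchVectorStore` on the same data?

Yes, both work on the same document structure. However, they use different services (Search versus Query) and have different index requirements.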
