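CouchbaseQueryVectorStore is the preferred implementation for vector search in Couchbase. It runs vector similarity search through the Query Service (SQL++) and the Index Service rather than the Search Service. Because it uses SQL++ queries with vector functions, it offers a more powerful and more direct way to perform vector operations. For more information on Couchbase's vector search capabilities, see the official documentation: Choose the Right Vector Index
This feature is only available in Couchbase 8.0 and later.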

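Key differences from CouchbaseSearchVectorStore

(formerly known as CouchbaseVectorStore)
  • Query and Index Services: uses the Couchbase Query Service with SQL++ instead of the Search Service
  • No index required: basic operations do not need a preconfigured search index
  • SQL++ syntax: supports WHERE clauses and SQL++ query syntax for filtering
  • Vector functions: uses the APPROX_VECTOR_DISTANCE function to compute similarity
  • Distance strategies: supports multiple distance strategies (dot product, cosine, Euclidean, squared Euclidean)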
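To make the four distance strategies concrete, here is a minimal sketch of each measure in plain TypeScript. It is for illustration only: the function names are hypothetical, and the store itself computes distances inside SQL++ rather than in application code.

```typescript
// Plain TypeScript versions of the four distance measures named above.
// Illustrative only; these helpers are not part of the library.
type Vector = readonly number[];

function assertSameLength(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
}

// Dot product: sum of element-wise products.
function dotProduct(a: Vector, b: Vector): number {
  assertSameLength(a, b);
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: Vector, b: Vector): number {
  const norm = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (norm === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dotProduct(a, b) / norm;
}

// Squared Euclidean distance: sum of squared element-wise differences.
function euclideanSquared(a: Vector, b: Vector): number {
  assertSameLength(a, b);
  return a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}

// Euclidean distance: square root of the squared distance.
function euclideanDistance(a: Vector, b: Vector): number {
  return Math.sqrt(euclideanSquared(a, b));
}
```

Squared Euclidean distance ranks neighbors in the same order as Euclidean distance while skipping the square root, which is why both are commonly offered.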

Installation

npm
npm install couchbase @langchain/openai @langchain/community @langchain/core

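Create a Couchbase connection object

We first create a connection to the Couchbase cluster and then pass the cluster object to the vector store. Here we connect with a username and password, but you can use any other supported method of connecting to your cluster. For more information on connecting to a Couchbase cluster, see the Node SDK documentation.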
import { Cluster } from "couchbase";

const connectionString = "couchbase://localhost";
const dbUsername = "Administrator"; // valid database user with read access to the bucket being queried
const dbPassword = "Password"; // password for the database user

const couchbaseClient = await Cluster.connect(connectionString, {
  username: dbUsername,
  password: dbPassword,
  configProfile: "wanDevelopment",
});
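If you prefer not to hard-code credentials, one option is to read them from the environment and validate them before connecting. The sketch below assumes the environment variable names used later on this page; `loadConnectionConfig` is a hypothetical helper, not part of the Couchbase SDK.

```typescript
// Hypothetical helper: collects connection settings from environment-style
// input and fails early if something is missing or malformed.
interface CouchbaseConnectionConfig {
  connectionString: string;
  username: string;
  password: string;
}

function loadConnectionConfig(
  env: Record<string, string | undefined>
): CouchbaseConnectionConfig {
  const connectionString = env["COUCHBASE_DB_CONN_STR"] ?? "couchbase://localhost";
  const username = env["COUCHBASE_DB_USERNAME"];
  const password = env["COUCHBASE_DB_PASSWORD"];

  // couchbase:// is the plain scheme; couchbases:// is the TLS variant.
  if (!/^couchbases?:\/\//.test(connectionString)) {
    throw new Error(`Unexpected connection string scheme: ${connectionString}`);
  }
  if (!username || !password) {
    throw new Error("COUCHBASE_DB_USERNAME and COUCHBASE_DB_PASSWORD must be set");
  }
  return { connectionString, username, password };
}
```

The returned values can then be passed to `Cluster.connect(config.connectionString, { username: config.username, password: config.password })`.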

Basic setup

import { CouchbaseQueryVectorStore, DistanceStrategy } from "@langchain/community/vectorstores/couchbase_query";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Cluster } from "couchbase";

// Connect to Couchbase
const cluster = await Cluster.connect("couchbase://localhost", {
  username: "Administrator",
  password: "password",
});

// Initialize embeddings
const embeddings = new OpenAIEmbeddings();

// Configure the vector store
const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster,
  bucketName: "my-bucket",
  scopeName: "my-scope",
  collectionName: "my-collection",
  textKey: "text", // optional, defaults to "text"
  embeddingKey: "embedding", // optional, defaults to "embedding"
  distanceStrategy: DistanceStrategy.COSINE, // optional, defaults to DOT
});

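Creating vector indexes

The query vector store can create vector indexes to improve search performance. Two index types are available:

Hyperscale index

A dedicated vector index optimized for vector operations, built on Couchbase's vector indexing capabilities: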
import { IndexType } from "@langchain/community/vectorstores/couchbase_query";

await vectorStore.createIndex({
  indexType: IndexType.HYPERSCALE,
  indexDescription: "IVF,SQ8",
  indexName: "my_vector_index", // optional
  vectorDimension: 1536, // optional, auto-detected from embeddings
  distanceMetric: DistanceStrategy.COSINE, // optional, uses store default
  fields: ["text", "metadata"], // optional, defaults to text field
  whereClause: "type = 'document'", // optional filter
  indexScanNprobes: 10, // optional tuning parameter
  indexTrainlist: 1000, // optional tuning parameter
});
Generated SQL++:
CREATE VECTOR INDEX `my_vector_index` ON `bucket`.`scope`.`collection`
(`embedding` VECTOR) INCLUDE (`text`, `metadata`)
WHERE type = 'document' USING GSI WITH {'dimension': 1536, 'similarity': 'cosine', 'description': 'IVF,SQ8'}

Composite index

A general-purpose GSI index that contains both a vector field and scalar fields:
await vectorStore.createIndex({
  indexType: IndexType.COMPOSITE,
  indexDescription: "IVF1024,SQ8",
  indexName: "my_composite_index",
  vectorDimension: 1536,
  fields: ["text", "metadata.category"],
  whereClause: "created_date > '2023-01-01'",
  indexScanNprobes: 3,
  indexTrainlist: 10000,
});
Generated SQL++:
CREATE INDEX `my_composite_index` ON `bucket`.`scope`.`collection`
(`text`, `metadata.category`, `embedding` VECTOR)
WHERE created_date > '2023-01-01' USING GSI
WITH {'dimension': 1536, 'similarity': 'dot', 'description': 'IVF1024,SQ8', 'scan_nprobes': 3, 'trainlist': 10000}

Key differences

| Aspect | Hyperscale Index | Composite Index |
| --- | --- | --- |
| SQL++ Syntax | CREATE VECTOR INDEX | CREATE INDEX |
| Vector Field | (field VECTOR) with INCLUDE clause | (field1, field2, vector_field VECTOR) |
| Vector Parameters | Supports all vector parameters | Supports all vector parameters |
| Optimization | Specialized for vector operations | General-purpose GSI with vector support |
| Use Case | Pure vector similarity search | Mixed vector and scalar queries |
| Performance | Optimized for vector distance calculations | Good for hybrid queries |
| Tuning Parameters | Supports indexScanNprobes, indexTrainlist | Supports indexScanNprobes, indexTrainlist |
| Limitations | Only one vector field, uses INCLUDE for other fields | One vector field among multiple index keys |
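The difference between the two statement shapes can be sketched as a small string builder. This is not the library's internal code: `buildCreateIndexStatement` and the `IndexSpec` type are hypothetical, and the output only mirrors the generated SQL++ shown in the examples above (the tuning entries such as `scan_nprobes` and `trainlist` are left out for brevity). The `whereClause` is interpolated verbatim, so it must come from trusted input.

```typescript
// Hypothetical sketch that reproduces the statement shapes shown above.
type IndexKind = "hyperscale" | "composite";

interface IndexSpec {
  kind: IndexKind;
  indexName: string;
  keyspace: [bucket: string, scope: string, collection: string];
  embeddingKey: string;
  fields: string[];
  dimension: number;
  similarity: string;
  description: string;
  whereClause?: string;
}

// Wrap an identifier in backticks, as in the generated statements.
const backtick = (identifier: string): string => "`" + identifier + "`";

function buildCreateIndexStatement(spec: IndexSpec): string {
  const target = spec.keyspace.map((part) => backtick(part)).join(".");
  const scalarKeys = spec.fields.map((field) => backtick(field));
  const vectorKey = `${backtick(spec.embeddingKey)} VECTOR`;
  const where = spec.whereClause ? `WHERE ${spec.whereClause} ` : "";
  const withClause =
    `WITH {'dimension': ${spec.dimension}, ` +
    `'similarity': '${spec.similarity}', ` +
    `'description': '${spec.description}'}`;

  if (spec.kind === "hyperscale") {
    // The vector key stands alone in the key list; other fields go in INCLUDE.
    const include = scalarKeys.length > 0 ? ` INCLUDE (${scalarKeys.join(", ")})` : "";
    return (
      `CREATE VECTOR INDEX ${backtick(spec.indexName)} ON ${target} ` +
      `(${vectorKey})${include} ${where}USING GSI ${withClause}`
    );
  }
  // Composite: scalar keys first and the vector key last, in one key list.
  const keys = [...scalarKeys, vectorKey].join(", ");
  return (
    `CREATE INDEX ${backtick(spec.indexName)} ON ${target} ` +
    `(${keys}) ${where}USING GSI ${withClause}`
  );
}
```

Printing the statement for a given spec is a convenient way to see how the same fields end up in an INCLUDE clause for one index type and in the key list for the other.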

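Basic vector search example

The following example shows how to set up the Couchbase query vector store, add documents, and run a similarity search.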
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  CouchbaseQueryVectorStore,
  DistanceStrategy,
} from "@langchain/community/vectorstores/couchbase_query";
import { Cluster } from "couchbase";
import { Document } from "@langchain/core/documents";

const connectionString = process.env.COUCHBASE_DB_CONN_STR ?? "couchbase://localhost";
const databaseUsername = process.env.COUCHBASE_DB_USERNAME ?? "Administrator";
const databasePassword = process.env.COUCHBASE_DB_PASSWORD ?? "Password";

const couchbaseClient = await Cluster.connect(connectionString, {
  username: databaseUsername,
  password: databasePassword,
  configProfile: "wanDevelopment",
});

// OpenAI API Key is required to use OpenAIEmbeddings
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});

const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster: couchbaseClient,
  bucketName: "testing",
  scopeName: "_default",
  collectionName: "_default",
  textKey: "text",
  embeddingKey: "embedding",
  distanceStrategy: DistanceStrategy.COSINE,
});

// Add documents
const documents = [
  new Document({
    pageContent: "Couchbase is a NoSQL database",
    metadata: { category: "database", type: "document" }
  }),
  new Document({
    pageContent: "Vector search enables semantic similarity",
    metadata: { category: "ai", type: "document" }
  })
];

await vectorStore.addDocuments(documents);

// Perform similarity search
const query = "What is a NoSQL database?";
const results = await vectorStore.similaritySearch(query, 4);
console.log("Search results:", results[0]);

// Search with scores
const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 4);
console.log("Document:", resultsWithScores[0][0]);
console.log("Score:", resultsWithScores[0][1]);

Searching documents

Basic similarity search

// Basic similarity search
const results = await vectorStore.similaritySearch(
  "What is a NoSQL database?",
  4
);

Search with filters

// Search with filters
const filteredResults = await vectorStore.similaritySearch(
  "database technology",
  4,
  {
    where: "metadata.category = 'database'",
    fields: ["text", "metadata.category"]
  }
);

Search with scores

// Search with scores
const resultsWithScores = await vectorStore.similaritySearchWithScore(
  "vector search capabilities",
  4
);
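Each entry returned by `similaritySearchWithScore` is a `[document, score]` pair, as the earlier example shows with `resultsWithScores[0][0]` and `resultsWithScores[0][1]`. The sketch below flattens such pairs into rows that are easy to log. `DocLike` is a stand-in type that keeps the example self-contained, and `summarizeScoredResults` is a hypothetical helper. It makes no assumption about whether a higher or lower score is better.

```typescript
// Minimal stand-in for the Document shape used in the results.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type ScoredDoc = [DocLike, number];

interface ResultRow {
  rank: number;
  score: number;
  preview: string;
}

// Hypothetical helper: turns [document, score] pairs into loggable rows,
// keeping the order in which the store returned them.
function summarizeScoredResults(
  results: ScoredDoc[],
  previewLength = 40
): ResultRow[] {
  return results.map(([doc, score], index) => ({
    rank: index + 1,
    score,
    preview:
      doc.pageContent.length > previewLength
        ? `${doc.pageContent.slice(0, previewLength)}...`
        : doc.pageContent,
  }));
}
```

For example, `console.table(summarizeScoredResults(resultsWithScores))` prints one row per hit.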

Complex filtering

const results = await vectorStore.similaritySearch(
  "search query",
  10,
  {
    where: "metadata.category IN ['tech', 'science'] AND metadata.rating >= 4",
    fields: ["content", "metadata.title", "metadata.rating"]
  }
);
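Because the `where` option is a raw SQL++ fragment, values that come from users should be validated before they are spliced into it. The following is a minimal sketch of one conservative approach: an allowlist that rejects anything other than letters, digits, underscores, and hyphens. `buildCategoryRatingFilter` is a hypothetical helper, not part of the library.

```typescript
// Only letters, digits, underscores, and hyphens are accepted as category values.
const SAFE_VALUE = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: assembles the WHERE fragment used in the example above
// from validated inputs, so unchecked text never reaches the query.
function buildCategoryRatingFilter(
  categories: string[],
  minRating: number
): string {
  if (categories.length === 0) {
    throw new Error("At least one category is required");
  }
  for (const category of categories) {
    if (!SAFE_VALUE.test(category)) {
      throw new Error(`Unsupported characters in category: ${category}`);
    }
  }
  if (!Number.isFinite(minRating)) {
    throw new Error("minRating must be a finite number");
  }
  const list = categories.map((category) => `'${category}'`).join(", ");
  return `metadata.category IN [${list}] AND metadata.rating >= ${minRating}`;
}
```

The result can be passed as the `where` value, for example `{ where: buildCategoryRatingFilter(["tech", "science"], 4) }`. Rejecting unexpected characters outright is stricter than escaping them, but it avoids relying on escaping rules.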

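Configuration options

Distance strategies

The store supports the dot product, cosine, Euclidean, and squared Euclidean strategies listed earlier. As the setup example notes, the default is DistanceStrategy.DOT.

Index types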

  • IndexType.HYPERSCALE - Specialized vector index for optimal vector search performance
  • IndexType.COMPOSITE - General-purpose index that can include vector and scalar fields

Advanced usage

Custom vector fields

const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster,
  bucketName: "my-bucket",
  scopeName: "my-scope",
  collectionName: "my-collection",
  textKey: "content",
  embeddingKey: "vector_embedding",
  distanceStrategy: DistanceStrategy.EUCLIDEAN,
});

Creating from texts

const texts = [
  "Couchbase is a NoSQL database",
  "Vector search enables semantic similarity"
];

const metadatas = [
  { category: "database" },
  { category: "ai" }
];

const vectorStore = await CouchbaseQueryVectorStore.fromTexts(
  texts,
  metadatas,
  embeddings,
  {
    cluster,
    bucketName: "my-bucket",
    scopeName: "my-scope",
    collectionName: "my-collection"
  }
);

Deleting documents

const documentIds = ["doc1", "doc2", "doc3"];
await vectorStore.delete({ ids: documentIds });

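Performance considerations

  1. Create an index: use createIndex() to create an appropriate vector index for better performance
  2. Choose the index type
    • For pure vector search workloads that mostly run similarity searches, use a Hyperscale index
    • For hybrid queries that combine vector similarity with scalar field filtering, use a Composite index
  3. Tune parameters: adjust indexScanNprobes and indexTrainlist to suit your data size and performance requirements
  4. Filter early: use WHERE clauses to shrink the search space before vector computation

Error handling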

try {
  await vectorStore.createIndex({
    indexType: IndexType.HYPERSCALE,
    indexDescription: "IVF,SQ8",
  });
} catch (error) {
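  // `error` is typed `unknown` under strict TypeScript, so narrow it before reading `.message`
  console.error("Index creation failed:", error instanceof Error ? error.message : error);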
}

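Common errors

Insufficient training data

If you see errors related to insufficient training data, you may need to:
  • Increase the indexTrainlist parameter (default recommendation: about 50 vectors per centroid)
  • Make sure your collection contains enough documents with vector embeddings
  • For collections with fewer than 1 million vectors, use number_of_vectors / 1000 as the number of centroids
  • For larger collections, use sqrt(number_of_vectors) as the number of centroids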
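The sizing rules above can be written out as a small calculation. This is a sketch under stated assumptions: `suggestCentroids` and `suggestIndexSettings` are hypothetical helpers, the `IVF<n>,SQ8` description format is copied from the composite index example on this page, and the training list is capped at the number of vectors actually available.

```typescript
// Number of centroids per the rules above:
// n / 1000 below one million vectors, sqrt(n) at or above it.
function suggestCentroids(vectorCount: number): number {
  if (!Number.isInteger(vectorCount) || vectorCount <= 0) {
    throw new Error("vectorCount must be a positive integer");
  }
  const raw =
    vectorCount < 1_000_000 ? vectorCount / 1000 : Math.sqrt(vectorCount);
  // Always keep at least one centroid for very small collections.
  return Math.max(1, Math.round(raw));
}

// Hypothetical helper: derives createIndex() options from the vector count,
// using roughly 50 training vectors per centroid.
function suggestIndexSettings(vectorCount: number): {
  indexDescription: string;
  indexTrainlist: number;
} {
  const centroids = suggestCentroids(vectorCount);
  return {
    indexDescription: `IVF${centroids},SQ8`,
    // Cannot train on more vectors than the collection holds.
    indexTrainlist: Math.min(vectorCount, centroids * 50),
  };
}
```

For 500,000 vectors this yields 500 centroids and a training list of 25,000. Treat the numbers as a starting point to tune rather than fixed values.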

Comparison with CouchbaseSearchVectorStore

| Feature | CouchbaseQueryVectorStore | CouchbaseSearchVectorStore |
| --- | --- | --- |
| Service | Query (SQL++) | Search (FTS) |
| Index Required | Optional (for performance) | Required |
| Query Language | SQL++ WHERE clauses | Search query syntax |
| Vector Functions | APPROX_VECTOR_DISTANCE | VectorQuery API |
| Setup Complexity | Lower | Higher |
| Performance | Good with indexes | Optimized for search |

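Frequently asked questions

Do I need to create an index before using CouchbaseQueryVectorStore?

No. Unlike the Search-based CouchbaseSearchVectorStore, the Query-based implementation works without a pre-created index. However, creating an appropriate vector index (Hyperscale or Composite) significantly improves query performance.

When should I use a Hyperscale index versus a Composite index?

  • Use a Hyperscale index when you mostly run vector similarity searches with little filtering on other fields
  • Use a Composite index when you often combine vector similarity with scalar field filtering in the same query
  • Learn more about how to Choose the Right Vector Index

Can I use both CouchbaseQueryVectorStore and CouchbaseSearchVectorStore on the same data?

Yes, both work on the same document structure. However, they use different services (Search versus Query) and have different index requirements.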
