Overview

Vector stores store embedded data and perform similarity search.

Interface

LangChain provides a unified interface for vector stores, allowing you to:
  • addDocuments - add documents to the store.
  • delete - delete stored documents by ID.
  • similaritySearch - query for semantically similar documents.
This abstraction lets you switch between implementations without changing your application logic.
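To make that contract concrete, here is a minimal, self-contained sketch of a toy store exposing the same three methods. This is not the LangChain implementation; the embedding function is a stand-in you supply:

```typescript
type Doc = { id: string; pageContent: string; metadata?: Record<string, unknown> };

// Toy store mirroring the three-method surface: addDocuments, delete,
// similaritySearch. Ranks results by cosine similarity over whatever
// vectors `embed` returns.
class ToyVectorStore {
  private entries: { doc: Doc; vector: number[] }[] = [];
  constructor(private embed: (text: string) => number[]) {}

  addDocuments(docs: Doc[]): void {
    for (const doc of docs) {
      this.entries.push({ doc, vector: this.embed(doc.pageContent) });
    }
  }

  delete(ids: string[]): void {
    this.entries = this.entries.filter((e) => !ids.includes(e.doc.id));
  }

  similaritySearch(query: string, k: number): Doc[] {
    const q = this.embed(query);
    const cosine = (a: number[], b: number[]) => {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    };
    return [...this.entries]
      .sort((x, y) => cosine(y.vector, q) - cosine(x.vector, q))
      .slice(0, k)
      .map((e) => e.doc);
  }
}
```

Swapping in a production store means replacing ToyVectorStore with any LangChain vector store; the three calls stay the same.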

Initialization

Most vector stores in LangChain accept an embedding model as an argument when the vector store is initialized.
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});
const vectorStore = new MemoryVectorStore(embeddings);

Adding documents

You can add documents to a vector store with the addDocuments function.
import { Document } from "@langchain/core/documents";
const document = new Document({
  pageContent: "Hello world",
});
await vectorStore.addDocuments([document]);

Deleting documents

You can delete documents from a vector store with the delete function, passing the IDs of the documents to remove.
await vectorStore.delete({ ids: ["id1"] });

Similarity search

Issue a semantic query with similaritySearch, which returns the documents whose embeddings are closest:
const results = await vectorStore.similaritySearch("Hello world", 10);
Many vector stores support parameters such as:
  • k — the number of results to return
  • filter — conditional filtering based on metadata

Similarity metrics and indexing

Embedding similarity can be computed with:
  • cosine similarity
  • Euclidean distance
  • dot product
Efficient search typically relies on indexing methods such as HNSW (Hierarchical Navigable Small World), though the specifics depend on the vector store.
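The three measures above can be written directly for plain number[] vectors (a store-agnostic sketch):

```typescript
// Dot product: larger means more aligned (and longer) vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity: dot product normalized to [-1, 1], ignoring magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}

// Euclidean distance: smaller means closer.
function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}
```

Note that for unit-length embedding vectors, cosine similarity and dot product rank results identically, which is why some stores only expose one of the two.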

Metadata filtering

Filtering by metadata (e.g. source, date) can narrow search results:
await vectorStore.similaritySearch("query", 2, { source: "tweets" });
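The exact filter syntax varies by store (some take an object, some a predicate function), but conceptually it is a metadata predicate combined with the nearest-neighbor ranking. A store-agnostic sketch of that post-filtering step:

```typescript
type Scored = { pageContent: string; metadata: Record<string, unknown>; score: number };

// Keep only candidates whose metadata passes the predicate,
// then return the top-k by similarity score.
function filterTopK(
  candidates: Scored[],
  predicate: (m: Record<string, unknown>) => boolean,
  k: number
): Scored[] {
  return candidates
    .filter((c) => predicate(c.metadata))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In practice, stores that support native filtering apply the predicate inside the index rather than after retrieval, which matters when the filter is highly selective.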

Popular integrations

Select an embedding model:
OpenAI
Install dependencies:
npm i @langchain/openai
Add environment variables:
OPENAI_API_KEY=your-api-key
Instantiate the model:
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-large"
});
Azure OpenAI
Install dependencies:
npm i @langchain/openai
Add environment variables:
AZURE_OPENAI_API_INSTANCE_NAME=<YOUR_INSTANCE_NAME>
AZURE_OPENAI_API_KEY=<YOUR_KEY>
AZURE_OPENAI_API_VERSION="2024-02-01"
Instantiate the model:
import { AzureOpenAIEmbeddings } from "@langchain/openai";

const embeddings = new AzureOpenAIEmbeddings({
  azureOpenAIApiEmbeddingsDeploymentName: "text-embedding-ada-002"
});
AWS Bedrock
Install dependencies:
npm i @langchain/aws
Add environment variables:
BEDROCK_AWS_REGION=your-region
Instantiate the model:
import { BedrockEmbeddings } from "@langchain/aws";

const embeddings = new BedrockEmbeddings({
  model: "amazon.titan-embed-text-v1"
});
Google Gemini
Install dependencies:
npm i @langchain/google-genai
Add environment variables:
GOOGLE_API_KEY=your-api-key
Instantiate the model:
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "text-embedding-004"
});
Google Vertex AI
Install dependencies:
npm i @langchain/google-vertexai
Add environment variables:
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model:
import { VertexAIEmbeddings } from "@langchain/google-vertexai";

const embeddings = new VertexAIEmbeddings({
  model: "gemini-embedding-001"
});
MistralAI
Install dependencies:
npm i @langchain/mistralai
Add environment variables:
MISTRAL_API_KEY=your-api-key
Instantiate the model:
import { MistralAIEmbeddings } from "@langchain/mistralai";

const embeddings = new MistralAIEmbeddings({
  model: "mistral-embed"
});
Cohere
Install dependencies:
npm i @langchain/cohere
Add environment variables:
COHERE_API_KEY=your-api-key
Instantiate the model:
import { CohereEmbeddings } from "@langchain/cohere";

const embeddings = new CohereEmbeddings({
  model: "embed-english-v3.0"
});
Ollama
Install dependencies:
npm i @langchain/ollama
Instantiate the model:
import { OllamaEmbeddings } from "@langchain/ollama";

const embeddings = new OllamaEmbeddings({
  model: "llama2",
  baseUrl: "http://localhost:11434", // Default value
});
Select a vector store:
In-memory
npm i langchain
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";

const vectorStore = new MemoryVectorStore(embeddings);
Chroma
npm i @langchain/community
import { Chroma } from "@langchain/community/vectorstores/chroma";

const vectorStore = new Chroma(embeddings, {
  collectionName: "a-test-collection",
});
FAISS
npm i @langchain/community
import { FaissStore } from "@langchain/community/vectorstores/faiss";

const vectorStore = new FaissStore(embeddings, {});
MongoDB Atlas
npm i @langchain/mongodb
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const collection = client
  .db(process.env.MONGODB_ATLAS_DB_NAME)
  .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);

const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
  collection,
  indexName: "vector_index",
  textKey: "text",
  embeddingKey: "embedding",
});
PGVector
npm i @langchain/community
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";

const vectorStore = await PGVectorStore.initialize(embeddings, {
  // Postgres connection options and table configuration go here.
});
Pinecone
npm i @langchain/pinecone
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone as PineconeClient } from "@pinecone-database/pinecone";

const pinecone = new PineconeClient();
const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);
const vectorStore = new PineconeStore(embeddings, {
  pineconeIndex,
  maxConcurrency: 5,
});
Redis
npm i @langchain/redis
import { RedisVectorStore } from "@langchain/redis";
import { createClient } from "redis";

const client = createClient({ url: process.env.REDIS_URL });
await client.connect();

const vectorStore = new RedisVectorStore(embeddings, {
  redisClient: client,
  indexName: "langchainjs-testing",
});
Qdrant
npm i @langchain/qdrant
import { QdrantVectorStore } from "@langchain/qdrant";

const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
  url: process.env.QDRANT_URL,
  collectionName: "langchainjs-testing",
});
Weaviate
npm i @langchain/weaviate
import { WeaviateStore } from "@langchain/weaviate";

const vectorStore = new WeaviateStore(embeddings, {
  client: weaviateClient, // a previously configured Weaviate client instance
  indexName: "Langchainjs_test",
});
LangChain.js integrates with a wide range of vector stores. You can see the full list below:

All vector stores