LambdaDB is a serverless AI database for building scalable RAG and agentic applications.
This notebook covers how to get started with the LambdaDB vector store in LangChain.
To access the LambdaDB vector store, you'll need to create a LambdaDB account, get your project credentials, and install the langchain-lambdadb integration package.
LambdaDB uses project-based authentication and requires a project URL and an API key:
import getpass
import os
if "LAMBDADB_PROJECT_URL" not in os.environ:
    os.environ["LAMBDADB_PROJECT_URL"] = getpass.getpass("Enter your LambdaDB project URL: ")
if "LAMBDADB_API_KEY" not in os.environ:
    os.environ["LAMBDADB_API_KEY"] = getpass.getpass("Enter your LambdaDB API key: ")
To enable automated tracing of model calls, set your LangSmith API key:
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
os.environ["LANGSMITH_TRACING"] = "true"
The LangChain LambdaDB integration lives in the langchain-lambdadb package:
pip install -U langchain-lambdadb
You'll also need an embedding model. For example, to use OpenAI embeddings:
pip install -U langchain-openai
Instantiation
LambdaDBVectorStore works with an existing collection. You must create the collection ahead of time, with its vector and text indexes configured correctly.
from langchain_lambdadb.vectorstores import LambdaDBVectorStore
from langchain_openai import OpenAIEmbeddings
from lambdadb import LambdaDB
import os

# Initialize the LambdaDB client
client = LambdaDB(
    project_url=os.environ["LAMBDADB_PROJECT_URL"],
    project_api_key=os.environ["LAMBDADB_API_KEY"],
)

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Connect to an existing collection
vector_store = LambdaDBVectorStore(
    client=client,
    collection_name="my_collection",  # Must exist beforehand
    embedding=embeddings,
)
Key parameters
client: the LambdaDB client instance (required)
collection_name: name of an existing collection in LambdaDB (required)
embedding: the embedding function to use (required)
text_field: name of the text field in documents (default: "text")
vector_field: name of the vector field in documents (default: "vector")
validate_collection: whether to verify that the collection exists and is active (default: True)
default_consistent_read: use consistent reads by default for immediate consistency, or eventual consistency for better performance (default: False)
Managing the vector store
Adding items
from langchain_core.documents import Document

document_1 = Document(page_content="LambdaDB is a serverless vector database", metadata={"source": "docs"})
document_2 = Document(page_content="It supports fast similarity search", metadata={"source": "docs"})
document_3 = Document(page_content="Perfect for RAG applications", metadata={"category": "features"})

documents = [document_1, document_2, document_3]
ids = vector_store.add_documents(documents=documents, ids=["1", "2", "3"])
print(f"Added documents with IDs: {ids}")
Documents can be at most 50KB each. The integration automatically batches documents, up to 100 per request, to stay within LambdaDB's 6MB request limit.
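The batching rule above can be sketched in plain Python. This is purely illustrative — the integration performs the batching itself; `BATCH_SIZE` and `batched` are hypothetical names, not part of the langchain-lambdadb API:

```python
# Illustrative sketch of the batching rule: split a document list
# into chunks of at most 100 items per request.
BATCH_SIZE = 100  # per-request document cap from the note above

def batched(items, size=BATCH_SIZE):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

docs = [f"doc-{i}" for i in range(250)]
batches = list(batched(docs))
print([len(b) for b in batches])  # → [100, 100, 50]
```

Note that the 6MB request limit is a separate constraint: a batch of 100 maximally sized 50KB documents stays under it (100 × 50KB ≈ 5MB).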
Deleting items
vector_store.delete(ids=["3"])
Getting items by ID
documents = vector_store.get_by_ids(["1", "2"])
for doc in documents:
    print(f"* {doc.page_content} [{doc.metadata}]")
Querying the vector store
Once your vector store has been created and the relevant documents added, you will most likely want to query it while running your chain or agent.
Similarity search
Perform a simple similarity search:
results = vector_store.similarity_search(
    query="What is LambdaDB?",
    k=2,
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
Similarity search with score
If you want the corresponding relevance scores when performing a similarity search:
results = vector_store.similarity_search_with_score(
    query="vector database features",
    k=2,
)
for doc, score in results:
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")
Similarity search with filtering
LambdaDB supports filtering with a query string syntax:
results = vector_store.similarity_search(
    query="database",
    k=2,
    filter={"queryString": {"query": "source:docs"}},
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
Maximal marginal relevance (MMR) search
MMR optimizes for similarity to the query while also promoting diversity among the selected documents:
results = vector_store.max_marginal_relevance_search(
    query="LambdaDB features",
    k=2,
    fetch_k=10,       # Fetch 10 candidates
    lambda_mult=0.5,  # Balance between relevance (1.0) and diversity (0.0)
)
for doc in results:
    print(f"* {doc.page_content}")
Converting to a retriever
You can also convert the vector store into a retriever for easier use in your chains:
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 10},
)
retriever.invoke("What is LambdaDB?")
Supported search types:
"similarity": standard similarity search (default)
"mmr": maximal marginal relevance search
"similarity_score_threshold": similarity search with a score threshold
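Conceptually, "similarity_score_threshold" keeps only hits whose relevance score meets the cutoff passed as search_kwargs={"score_threshold": ...}. A minimal sketch of that filtering step, using made-up scores (not the library's internals):

```python
# Hypothetical (text, relevance score) pairs from a scored search.
hits = [
    ("LambdaDB is a serverless vector database", 0.91),
    ("It supports fast similarity search", 0.74),
    ("Perfect for RAG applications", 0.42),
]

score_threshold = 0.7  # as in search_kwargs={"score_threshold": 0.7}

# Keep only results at or above the threshold.
kept = [text for text, score in hits if score >= score_threshold]
print(kept)  # → ['LambdaDB is a serverless vector database', 'It supports fast similarity search']
```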
Async operations
LambdaDBVectorStore provides async counterparts for all operations:
# Add documents
ids = await vector_store.aadd_documents(documents=documents)

# Delete documents
await vector_store.adelete(ids=["3"])

# Search
results = await vector_store.asimilarity_search(query="LambdaDB", k=2)
for doc in results:
    print(f"* {doc.page_content}")

# Search with score
results = await vector_store.asimilarity_search_with_score(query="database", k=2)
for doc, score in results:
    print(f"* [SIM={score:.3f}] {doc.page_content}")
Currently, the async methods run synchronously because the LambdaDB client does not yet support async operations.
Consistency control
LambdaDB supports two consistency modes:
Eventual consistency (default): faster, but data may lag writes by about one minute
Consistent reads: immediate consistency, at a slight performance cost
# Use consistent reads for a specific operation
results = vector_store.similarity_search(
    query="LambdaDB",
    k=2,
    consistent_read=True,
)

# Or set consistent reads as the default
vector_store = LambdaDBVectorStore(
    client=client,
    collection_name="my_collection",
    embedding=embeddings,
    default_consistent_read=True,  # All reads will be consistent by default
)
Creating from texts
You can create and populate a vector store from texts in a single step:
from langchain_lambdadb.vectorstores import LambdaDBVectorStore

texts = [
    "LambdaDB is a serverless vector database",
    "It supports fast similarity search",
    "Perfect for RAG applications",
]
metadatas = [
    {"source": "docs"},
    {"source": "docs"},
    {"category": "features"},
]

vector_store = LambdaDBVectorStore.from_texts(
    texts=texts,
    embedding=embeddings,
    metadatas=metadatas,
    client=client,
    collection_name="my_collection",
    ids=["1", "2", "3"],
)
Usage for retrieval-augmented generation
Here is a complete example of RAG with LambdaDB:
from langchain_lambdadb.vectorstores import LambdaDBVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from lambdadb import LambdaDB
import os

# Initialize
client = LambdaDB(
    project_url=os.environ["LAMBDADB_PROJECT_URL"],
    project_api_key=os.environ["LAMBDADB_API_KEY"],
)
embeddings = OpenAIEmbeddings()
vector_store = LambdaDBVectorStore(
    client=client,
    collection_name="my_collection",
    embedding=embeddings,
)

# Create retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Create RAG chain
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Use the chain
response = chain.invoke("What is LambdaDB?")
print(response)
Key features
Document size limits
Maximum size per document: 50KB
The integration validates document sizes and raises an error when the limit is exceeded
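That validation can be sketched as a size check before upload. This is illustrative only — the integration runs its own check, and measuring the limit in UTF-8 bytes is an assumption here:

```python
MAX_DOC_BYTES = 50 * 1024  # 50KB per-document limit

def exceeds_limit(text: str) -> bool:
    """Return True if the UTF-8 encoded text is over the 50KB limit."""
    return len(text.encode("utf-8")) > MAX_DOC_BYTES

print(exceeds_limit("short document"))  # → False
print(exceeds_limit("x" * 60_000))      # → True
```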
Batch processing
Documents are automatically inserted in batches of up to 100
This keeps requests within LambdaDB's 6MB limit
Metadata filtering
Supports LambdaDB's query string syntax for metadata filtering
Example: filter={"queryString": {"query": "field:value"}}
Search options
Similarity search: find documents similar to a query
MMR search: balance similarity against diversity
Score threshold: filter results by similarity score
Consistent reads: control the trade-off between read consistency and performance
API reference
For detailed documentation of all LambdaDBVectorStore features and configurations, head to the API reference.