SingleStoreVectorStore integration

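SingleStore is a high-performance distributed SQL database designed to run well both in the cloud and on-premises. It combines a versatile feature set with flexible deployment options and strong performance.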

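A standout feature of SingleStore is its advanced support for vector storage and operations, which makes it a good fit for applications that need AI capabilities such as text similarity matching. With the built-in vector functions dot_product and euclidean_distance, developers can implement such algorithms efficiently. For developers who want to work with vector data in SingleStore, a comprehensive tutorial is available that walks through the details of handling vector data.

This tutorial covers the vector store in SingleStoreDB and demonstrates its ability to search by vector similarity. With a vector index, queries execute quickly and retrieve relevant data fast. The vector store also integrates with Lucene-based full-text indexes, which enables text similarity search. Users can filter search results on selected fields of the document metadata object to improve query precision.

What sets SingleStore apart is its ability to combine vector and full-text search in several ways. Developers can pre-filter by text or vector similarity and then select the most relevant data, or compute the final similarity score as a weighted sum. Overall, SingleStore provides a comprehensive solution for managing and querying vector data in AI-driven applications.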

Class	Package	JS support
SingleStoreVectorStore	langchain_singlestore	✅

For the langchain-community version of SingleStoreDB (deprecated), see the v0.2 documentation.

Setup

To access the SingleStore vector store, install the langchain-singlestore integration package.

pip install -qU "langchain-singlestore"

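Initialization

To initialize SingleStoreVectorStore, you need an Embeddings object and the connection parameters for your SingleStore database.

Required parameters

embedding (Embeddings): The text embedding model.

Optional parameters

distance_strategy (DistanceStrategy): Strategy for computing vector distance. Defaults to DOT_PRODUCT. Options:
- DOT_PRODUCT: Computes the scalar product of two vectors.
- EUCLIDEAN_DISTANCE: Computes the Euclidean distance between two vectors.
table_name (str): Name of the table. Defaults to embeddings.
content_field (str): Field that stores the content. Defaults to content.
metadata_field (str): Field that stores the metadata. Defaults to metadata.
vector_field (str): Field that stores the vector. Defaults to vector.
id_field (str): Field that stores the ID. Defaults to id.
use_vector_index (bool): Enables the vector index (requires SingleStore 8.5+). Defaults to False.
vector_index_name (str): Name of the vector index. Ignored when use_vector_index is False.
vector_index_options (dict): Options for the vector index. Ignored when use_vector_index is False.
vector_size (int): Size of the vectors. Required when use_vector_index is True.
use_full_text_search (bool): Enables a full-text index on the content. Defaults to False.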
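
The two distance_strategy options rank results in opposite directions: a larger dot product means a closer match, while a smaller Euclidean distance does. The following is a minimal pure-Python sketch of that difference, not the database's implementation; the helper names and the toy vectors are made up for illustration.

```python
import math


def dot_product(a, b):
    """Scalar product of two vectors: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two vectors: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: one query vector and two candidate vectors.
query = [1.0, 0.0]
candidates = {"close": [0.8, 0.6], "far": [0.0, 1.0]}

# Dot product: sort descending (highest score first).
by_dot = sorted(
    candidates, key=lambda name: dot_product(query, candidates[name]), reverse=True
)
# Euclidean distance: sort ascending (smallest distance first).
by_l2 = sorted(candidates, key=lambda name: euclidean_distance(query, candidates[name]))

print(by_dot, by_l2)
```

For these unit-length vectors both strategies produce the same ordering; with unnormalized embeddings the two orderings can differ.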

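Connection pool parameters

pool_size (int): Number of active connections in the pool. Defaults to 5.
max_overflow (int): Maximum number of connections beyond pool_size. Defaults to 10.
timeout (float): Connection timeout in seconds. Defaults to 30.

Database connection parameters

host (str): Hostname, IP address, or URL of the database.
user (str): Database username.
password (str): Database password.
port (int): Database port. Defaults to 3306.
database (str): Database name.

Other options

pure_python (bool): Enables pure Python mode.
local_infile (bool): Allows local file uploads.
charset (str): Character set for string values.
ssl_key, ssl_cert, ssl_ca (str): Paths to the SSL files.
ssl_disabled (bool): Disables SSL.
ssl_verify_cert (bool): Verifies the server certificate.
ssl_verify_identity (bool): Verifies the server identity.
autocommit (bool): Enables autocommit.
results_type (str): Structure of the query results (for example, tuples or dicts).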

import os

from langchain_singlestore.vectorstores import SingleStoreVectorStore

os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

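# `embeddings` is an Embeddings instance created beforehand.
# The constructor parameter is named `embedding` (see the required parameters above).
vector_store = SingleStoreVectorStore(embedding=embeddings)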
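
The SINGLESTOREDB_URL value used above packs the same pieces that the database connection parameters name individually. As a rough illustration of that mapping, the sketch below splits such a URL with the standard library; parse_connection_url is a hypothetical helper and not part of langchain-singlestore.

```python
from urllib.parse import urlsplit


def parse_connection_url(url):
    """Split 'user:password@host:port/database' into named connection parts."""
    # Prefix with '//' so urlsplit treats the string as a network location.
    parts = urlsplit("//" + url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


params = parse_connection_url("root:pass@localhost:3306/db")
print(params)
```

Each key in the resulting dictionary corresponds to one of the host, user, password, port, and database parameters listed earlier.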

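Manage vector store

SingleStoreVectorStore assumes that a Document's ID is an integer. The examples below show how to manage the vector store.

Add items to vector store

You can add documents to the vector store as follows: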

pip install -qU langchain-core

from langchain_core.documents import Document

docs = [
    Document(
        page_content="""In the parched desert, a sudden rainstorm brought relief,
            as the droplets danced upon the thirsty earth, rejuvenating the landscape
            with the sweet scent of petrichor.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Amidst the bustling cityscape, the rain fell relentlessly,
            creating a symphony of pitter-patter on the pavement, while umbrellas
            bloomed like colorful flowers in a sea of gray.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""High in the mountains, the rain transformed into a delicate
            mist, enveloping the peaks in a mystical veil, where each droplet seemed to
            whisper secrets to the ancient rocks below.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Blanketing the countryside in a soft, pristine layer, the
            snowfall painted a serene tableau, muffling the world in a tranquil hush
            as delicate flakes settled upon the branches of trees like nature's own
            lacework.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""In the urban landscape, snow descended, transforming
            bustling streets into a winter wonderland, where the laughter of
            children echoed amidst the flurry of snowballs and the twinkle of
            holiday lights.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""Atop the rugged peaks, snow fell with an unyielding
            intensity, sculpting the landscape into a pristine alpine paradise,
            where the frozen crystals shimmered under the moonlight, casting a
            spell of enchantment over the wilderness below.""",
        metadata={"category": "snow"},
    ),
]


vector_store.add_documents(docs)

Update items in vector store

To update an existing document in the vector store, use the following code:

updated_document = Document(
    page_content="qux", metadata={"source": "https://another-example.com"}
)

vector_store.update_documents(document_id="1", document=updated_document)

Delete items from vector store

To delete documents from the vector store, use the following code:

vector_store.delete(ids=["3"])

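Query vector store

Once the vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

A simple similarity search can be performed as follows: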

results = vector_store.similarity_search(query="trees in the snow", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

If you want to run a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="trees in the snow", k=1)
for doc, score in results:
    # Format the score with three decimal places.
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")

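Metadata filtering

SingleStoreDB supports pre-filtering on metadata fields to refine search results. Filtering on specific metadata attributes narrows a query to the relevant subset of data, so results match the request more precisely. SingleStoreVectorStore supports both simple and advanced metadata filtering with query operators.

Simple metadata filtering

Use the simple dictionary-style syntax for exact matches and backward compatibility: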

# Filter by a single field
query = "trees branches"
docs = vector_store.similarity_search(
    query, filter={"category": "snow"}
)

# Filter by multiple fields (implicit AND)
docs = vector_store.similarity_search(
    query="landmarks",
    filter={"country": "France", "category": "museum"}
)
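
Because a multi-field simple filter is an implicit AND of exact matches, it can be read as shorthand for the operator syntax ($and, $eq) covered under advanced filtering. The sketch below makes that reading explicit. It assumes the two forms are equivalent, which the text implies but does not state outright, and to_explicit_filter is a hypothetical helper rather than a library function.

```python
def to_explicit_filter(simple_filter):
    """Rewrite {"a": 1, "b": 2} as an explicit $and of $eq conditions."""
    if not simple_filter:
        return {}
    conditions = [{field: {"$eq": value}} for field, value in simple_filter.items()]
    # A single condition needs no $and wrapper.
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}


explicit = to_explicit_filter({"country": "France", "category": "museum"})
print(explicit)
```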

Advanced metadata filtering

Use operators such as $eq, $gt, $in, $and, and $or for advanced filtering in complex queries.

Comparison operators:

# Greater than, less than, and other comparisons
results = vector_store.similarity_search(
    query="old structures",
    k=10,
    filter={"year_built": {"$lt": 1900}}  # Built before 1900
)

# Other operators: $eq, $ne, $gt, $gte, $lte
results = vector_store.similarity_search(
    query="landmarks",
    filter={"year_built": {"$gte": 1800, "$lte": 1950}}
)

Collection operators:

# Check if value is in a list
results = vector_store.similarity_search(
    query="landmarks",
    k=10,
    filter={"country": {"$in": ["France", "UK"]}}
)

# Not in ($nin)
results = vector_store.similarity_search(
    query="museums",
    filter={"country": {"$nin": ["USA", "Canada"]}}
)

Existence checks:

# Check if a field exists
results = vector_store.similarity_search(
    query="heritage sites",
    k=10,
    filter={"heritage_status": {"$exists": True}}
)

Logical operators:

# Combine multiple conditions with $and
results = vector_store.similarity_search(
    query="european landmarks",
    k=10,
    filter={
        "$and": [
            {"category": "landmark"},
            {"year_built": {"$gte": 1800}},
            {"country": {"$in": ["France", "UK"]}}
        ]
    }
)

# Use $or for alternative conditions
results = vector_store.similarity_search(
    query="cultural sites",
    filter={
        "$or": [
            {"category": "museum"},
            {"category": "landmark"}
        ]
    }
)

# Complex nested queries
results = vector_store.similarity_search(
    query="cultural sites",
    k=10,
    filter={
        "$or": [
            {
                "$and": [
                    {"category": "museum"},
                    {"country": "France"}
                ]
            },
            {
                "$and": [
                    {"category": "landmark"},
                    {"year_built": {"$lt": 1900}}
                ]
            }
        ]
    }
)
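
To make the operator semantics concrete, here is a small in-memory evaluator that applies a filter dictionary to a single metadata dictionary. It is only a sketch of how the operators could be interpreted, not how SingleStoreVectorStore evaluates filters in the database. The function names, the sample metadata, and the rule that a missing field never matches (except under $exists) are all assumptions.

```python
import operator

# Comparison operators named in the text, mapped to Python equivalents.
COMPARISONS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def check_field(metadata, field, condition):
    """Evaluate one field condition, e.g. {"$gte": 1800, "$lte": 1950}."""
    if not isinstance(condition, dict):
        # Simple syntax: exact match on the field value.
        return field in metadata and metadata[field] == condition
    for op, expected in condition.items():
        if op == "$exists":
            if (field in metadata) != expected:
                return False
        elif field not in metadata:
            return False  # assumption: other operators never match a missing field
        elif op == "$in":
            if metadata[field] not in expected:
                return False
        elif op == "$nin":
            if metadata[field] in expected:
                return False
        elif op in COMPARISONS:
            if not COMPARISONS[op](metadata[field], expected):
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def matches(metadata, flt):
    """Return True if `metadata` satisfies every condition in `flt`."""
    for key, condition in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        else:
            ok = check_field(metadata, key, condition)
        if not ok:
            return False
    return True


# Hypothetical sample metadata for three documents.
sites = {
    "Louvre": {"category": "museum", "country": "France", "year_built": 1793},
    "Tower Bridge": {"category": "landmark", "country": "UK", "year_built": 1894},
    "MoMA": {"category": "museum", "country": "USA", "year_built": 1939},
}

# The nested query from the example above.
nested = {
    "$or": [
        {"$and": [{"category": "museum"}, {"country": "France"}]},
        {"$and": [{"category": "landmark"}, {"year_built": {"$lt": 1900}}]},
    ]
}
selected = [name for name, meta in sites.items() if matches(meta, nested)]
print(selected)
```

Running a filter against a few hand-written metadata dictionaries like this can be a quick way to sanity-check a nested query before sending it to the vector store.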

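Vector index

You can improve search efficiency in SingleStore DB 8.5 or later by using an ANN vector index. Set use_vector_index=True when creating the vector store object to enable it. In addition, if your vectors have a dimensionality different from the default OpenAI embedding size of 1536, specify the vector_size parameter accordingly.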
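
One way to keep vector_size in step with the embedding model is to measure a sample embedding rather than hard-code the number. The sketch below builds the two keyword arguments that way. vector_index_kwargs is a hypothetical helper, and the 768-element list stands in for a real embedding so the snippet runs without a model or a database.

```python
def vector_index_kwargs(sample_vector):
    """Build keyword arguments that enable the ANN index for this embedding size."""
    return {
        "use_vector_index": True,
        "vector_size": len(sample_vector),
    }


# Stand-in for a real embedding, e.g. the result of embeddings.embed_query("any text").
sample_vector = [0.0] * 768
kwargs = vector_index_kwargs(sample_vector)
print(kwargs)

# Intended use (not executed here, since it needs a reachable database):
# vector_store = SingleStoreVectorStore(embeddings, **kwargs)
```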

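Search strategies

SingleStoreDB offers several search strategies, each suited to particular use cases:
- VECTOR_ONLY (default): Computes similarity scores directly between vectors using vector operations such as dot_product or euclidean_distance.
- TEXT_ONLY: Uses Lucene-based full-text search, which suits text-centric applications.
- FILTER_BY_TEXT: Narrows results by text similarity first, then compares vectors. Requires a full-text index.
- FILTER_BY_VECTOR: Filters results by vector similarity first, then evaluates text similarity to find the best matches. Requires a full-text index.
- WEIGHTED_SUM: Computes the final similarity score as a weighted combination of vector and text similarity. It works only with the dot product distance and also requires a full-text index.

The hybrid strategies (FILTER_BY_TEXT, FILTER_BY_VECTOR, and WEIGHTED_SUM) combine vector and text-based search, so you can tune retrieval to the needs of your application.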

from langchain_singlestore.vectorstores import DistanceStrategy

docsearch = SingleStoreVectorStore.from_documents(
    docs,
    embeddings,
    distance_strategy=DistanceStrategy.DOT_PRODUCT,  # Use dot product for similarity search
    use_vector_index=True,  # Use vector index for faster search
    use_full_text_search=True,  # Use full text index
)

vectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.VECTOR_ONLY,
    filter={"category": "rain"},
)
print(vectorResults[0].page_content)

textResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.TEXT_ONLY,
)
print(textResults[0].page_content)

filteredByTextResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.FILTER_BY_TEXT,
    filter_threshold=0.1,
)
print(filteredByTextResults[0].page_content)

filteredByVectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.FILTER_BY_VECTOR,
    filter_threshold=0.1,
)
print(filteredByVectorResults[0].page_content)

weightedSumResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.WEIGHTED_SUM,
    text_weight=0.2,
    vector_weight=0.8,
)
print(weightedSumResults[0].page_content)
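
The hybrid strategies can be easier to reason about with toy numbers. The sketch below mimics WEIGHTED_SUM and FILTER_BY_TEXT over hand-written scores in plain Python. The scores, the document names, and the exact threshold comparison (strictly greater than) are assumptions for illustration; the real scoring happens inside the database.

```python
# Hypothetical per-document similarity scores; in practice these come from the database.
scores = {
    "desert": {"vector": 0.90, "text": 0.40},
    "city": {"vector": 0.70, "text": 0.05},
    "peaks": {"vector": 0.60, "text": 0.80},
}


def weighted_sum(scores, text_weight, vector_weight):
    """Rank documents by text_weight * text score + vector_weight * vector score."""
    combined = {
        doc: text_weight * s["text"] + vector_weight * s["vector"]
        for doc, s in scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


def filter_by_text(scores, filter_threshold):
    """Keep documents whose text score exceeds the threshold, then rank by vector score."""
    kept = [doc for doc, s in scores.items() if s["text"] > filter_threshold]
    return sorted(kept, key=lambda doc: scores[doc]["vector"], reverse=True)


ranked_weighted = weighted_sum(scores, text_weight=0.2, vector_weight=0.8)
ranked_filtered = filter_by_text(scores, filter_threshold=0.1)
print(ranked_weighted)
print(ranked_filtered)
```

With these weights the vector score dominates, so a document with a strong text match but a weaker vector match ("peaks") still ranks below "desert". Under the text filter, "city" is dropped before vector ranking takes place.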

Query by turning into retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(search_kwargs={"k": 1})
retriever.invoke("trees in the snow")

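Multimodal example: using CLIP and OpenClip embeddings

In multimodal data analysis, integrating different types of information such as images and text is increasingly important. CLIP is a powerful tool that embeds images and text into a shared semantic space, so that related content can be retrieved across modalities through similarity search.

As an example, consider an application that needs to analyze multimodal data effectively. Here we use OpenClip multimodal embeddings, which are based on the CLIP framework. With OpenClip, text descriptions can be embedded together with their corresponding images for analysis and retrieval tasks, whether that means finding visually similar images from a text query or finding text passages related to specific visual content.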

pip install -U langchain openai langchain-singlestore langchain-experimental

import os

from langchain_experimental.open_clip import OpenCLIPEmbeddings
from langchain_singlestore.vectorstores import SingleStoreVectorStore

os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

TEST_IMAGES_DIR = "../../modules/images"

docsearch = SingleStoreVectorStore(OpenCLIPEmbeddings())

image_uris = sorted(
    [
        os.path.join(TEST_IMAGES_DIR, image_name)
        for image_name in os.listdir(TEST_IMAGES_DIR)
        if image_name.endswith(".jpg")
    ]
)

# Add images
docsearch.add_images(uris=image_uris)

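Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

API reference

For detailed documentation of all SingleStoreVectorStore features and configurations, see the GitHub page: https://github.com/singlestore-labs/langchain-singlestore/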
