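Alibaba Cloud OpenSearch is a one-stop platform for developing intelligent search services. OpenSearch is built on the large-scale distributed search engine developed by Alibaba, and serves more than 500 business cases within Alibaba Group as well as thousands of Alibaba Cloud customers. OpenSearch helps you develop search services for a range of search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and enterprise big data query.
OpenSearch helps you build high-quality, maintenance-free, high-performance intelligent search services that give your users an efficient and accurate search experience.
OpenSearch provides a vector search feature. In specific scenarios, especially test question search and image search, you can combine vector search with multimodal search to improve the accuracy of search results.
This notebook shows how to use the functionality of Alibaba Cloud OpenSearch Vector Search Edition.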

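Setting up

Purchase and configure an instance

Purchase an OpenSearch Vector Search Edition instance from Alibaba Cloud and configure it according to the help documentation. Before running this notebook, make sure the OpenSearch Vector Search Edition instance is up and running.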

Alibaba Cloud OpenSearch vector store class

The AlibabaCloudOpenSearch class supports the following functions:
  • add_texts
  • add_documents
  • from_texts
  • from_documents
  • similarity_search
  • asimilarity_search
  • similarity_search_by_vector
  • asimilarity_search_by_vector
  • similarity_search_with_relevance_scores
  • delete_doc_by_texts
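Read the help documentation to quickly get familiar with and configure an OpenSearch Vector Search Edition instance.
If you run into any problems during use, feel free to contact xingshaomin.xsm@alibaba-inc.com, and we will do our best to provide assistance and support.
Once the instance is up and running, follow these steps to split documents, get embeddings, connect to the Alibaba Cloud OpenSearch instance, index the documents, and perform vector retrieval.
First, install the following Python packages: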
pip install -qU  langchain-community alibabacloud_ha3engine_vector
We use OpenAIEmbeddings in this notebook, so you need an OpenAI API key.
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Example

from langchain_community.vectorstores import (
    AlibabaCloudOpenSearch,
    AlibabaCloudOpenSearchSettings,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
Split the documents and get embeddings.
from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
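To make the `chunk_size` and `chunk_overlap` parameters above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified illustration only, not the algorithm that `CharacterTextSplitter` implements (the real splitter works on a separator and merges the resulting pieces). `chunk_text` is a hypothetical helper introduced for this example.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Hypothetical helper for illustration: consecutive windows share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4))                   # windows with no shared characters
print(chunk_text(sample, chunk_size=4, chunk_overlap=2))  # neighbouring windows share 2 characters
```

With `chunk_overlap=0`, as in the cell above, no text is duplicated across chunks; a positive overlap trades some duplication for more context at chunk boundaries.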
Create the OpenSearch settings.
settings = AlibabaCloudOpenSearchSettings(
    endpoint="The endpoint of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    instance_id="The identifier of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    protocol="The communication protocol between the SDK and the server. The default is http.",
    username="The username specified when purchasing the instance.",
    password="The password specified when purchasing the instance.",
    namespace="The instance data is partitioned by the namespace field. If namespaces are enabled, specify the namespace field name during initialization; otherwise, queries cannot be executed correctly.",
    table_name="The table name specified during instance configuration.",
    embedding_field_separator="The delimiter used when writing vector field data. The default is a comma.",
    output_fields="The list of fields returned when invoking OpenSearch. By default, it is the list of values in field_name_mapping.",
    field_name_mapping={
        "id": "id",  # Maps to the id field of the index document.
        "document": "document",  # Maps to the text field of the index document.
        "embedding": "embedding",  # Maps to the embedding field of the index document.
        "name_of_the_metadata_specified_during_search": "opensearch_metadata_field_name,=",
        # Maps a metadata key to a field of the index document. You can specify multiple
        # metadata entries. Each value holds the mapped field name and an operator,
        # separated by a comma; the operator is used when executing a metadata filter query.
        # Currently supported comparison operators: > (greater than), < (less than), = (equal to),
        # <= (less than or equal to), >= (greater than or equal to), != (not equal to).
        # Refer to this link: https://help.aliyun.com/zh/open-search/vector-search-edition/filter-expression
    },
)

# for example

# settings = AlibabaCloudOpenSearchSettings(
#     endpoint='ha-cn-5yd3fhdm102.public.ha.aliyuncs.com',
#     instance_id='ha-cn-5yd3fhdm102',
#     username='instance user name',
#     password='instance password',
#     table_name='test_table',
#     field_name_mapping={
#         "id": "id",
#         "document": "document",
#         "embedding": "embedding",
#         "string_field": "string_field,=",
#         "int_field": "int_field,=",
#         "float_field": "float_field,=",
#         "double_field": "double_field,=",
#     },
# )
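The `embedding_field_separator` setting describes how a vector is written as delimited text. As a rough sketch of what that means, the snippet below serializes a list of floats with a separator and parses it back. `encode_vector` and `decode_vector` are hypothetical helpers for illustration; the exact serialization the SDK performs is an assumption here.

```python
from typing import List, Sequence


def encode_vector(vector: Sequence[float], separator: str = ",") -> str:
    """Join vector components into one delimited string (illustrative only)."""
    return separator.join(str(component) for component in vector)


def decode_vector(encoded: str, separator: str = ",") -> List[float]:
    """Parse a delimited string back into a list of floats."""
    return [float(part) for part in encoded.split(separator)]


vector = [0.1, -2.5, 3.0]
encoded = encode_vector(vector)
print(encoded)
print(decode_vector(encoded) == vector)
```

Whatever separator you choose has to match the one configured for the vector field on the instance, otherwise the written data cannot be parsed as a vector.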
Create an OpenSearch access instance from the settings.
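# Option 1: create an OpenSearch vector store and index the split documents in one step.
# docs is a list of Document objects, so use from_documents rather than from_texts.
opensearch = AlibabaCloudOpenSearch.from_documents(
    docs, embeddings, config=settings
)
# Option 2: create an OpenSearch vector store without indexing anything yet.
opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)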
Add texts and build the index.
metadatas = [
    {"string_field": "value1", "int_field": 1, "float_field": 1.0, "double_field": 2.0},
    {"string_field": "value2", "int_field": 2, "float_field": 3.0, "double_field": 4.0},
    {"string_field": "value3", "int_field": 3, "float_field": 5.0, "double_field": 6.0},
]
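# The keys of each metadata dict must match field_name_mapping in settings.
# add_texts expects plain strings, so take the text of the first three chunks
# to line them up with the three metadata entries above.
texts = [doc.page_content for doc in docs[: len(metadatas)]]
opensearch.add_texts(texts=texts, metadatas=metadatas)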
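Because the metadata keys have to match `field_name_mapping`, it can help to check them before writing anything. The following is a minimal sketch of such a pre-flight check. `check_metadatas` is a hypothetical helper, not part of the library, and it assumes one metadata dict per text.

```python
from typing import Any, Dict, Sequence


def check_metadatas(
    texts: Sequence[str],
    metadatas: Sequence[Dict[str, Any]],
    field_name_mapping: Dict[str, str],
) -> None:
    """Raise ValueError if metadatas do not line up with texts or the mapping."""
    # Assumption for this sketch: every text has exactly one metadata dict.
    if len(texts) != len(metadatas):
        raise ValueError(
            f"got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    for index, metadata in enumerate(metadatas):
        # Any key that is absent from the mapping cannot be written or filtered on.
        unknown = sorted(set(metadata) - set(field_name_mapping))
        if unknown:
            raise ValueError(
                f"metadatas[{index}] has keys missing from field_name_mapping: {unknown}"
            )


example_mapping = {
    "id": "id",
    "document": "document",
    "embedding": "embedding",
    "string_field": "string_field,=",
    "int_field": "int_field,=",
}
check_metadatas(
    ["first chunk", "second chunk"],
    [{"string_field": "value1", "int_field": 1}, {"string_field": "value2"}],
    example_mapping,
)
print("metadatas are consistent with the mapping")
```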
Query and retrieve data.
query = "What did the president say about Ketanji Brown Jackson"
docs = opensearch.similarity_search(query)
print(docs[0].page_content)
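Conceptually, a similarity search embeds the query and returns the stored documents whose vectors are closest to it. The sketch below shows that idea with a brute-force cosine similarity ranking over toy two-dimensional vectors. It is an illustration under stated assumptions: the OpenSearch service uses its own index and whatever distance metric is configured on the instance, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: Sequence[float],
    document_vectors: Sequence[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar vectors."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_documents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k([1.0, 0.1], toy_documents, k=2):
    print(index, round(score, 3))
```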
Query and retrieve data with a metadata filter.
query = "What did the president say about Ketanji Brown Jackson"
metadata = {
    "string_field": "value1",
    "int_field": 1,
    "float_field": 1.0,
    "double_field": 2.0,
}
docs = opensearch.similarity_search(query, filter=metadata)
print(docs[0].page_content)
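The filter dict above is combined with the "field name, operator" values from `field_name_mapping`. As a rough sketch of how such a combination could work, the snippet below turns a metadata dict into a filter expression string. `build_filter` is a hypothetical helper; the quoting rules, the `AND` joiner, and the default of `=` when no operator is given are assumptions for illustration, so consult the filter expression documentation for the syntax the service actually accepts.

```python
from numbers import Number
from typing import Any, Dict


def build_filter(metadata: Dict[str, Any], field_name_mapping: Dict[str, str]) -> str:
    """Build a filter expression from metadata and 'field,operator' mappings."""
    clauses = []
    for key, value in metadata.items():
        # Mapping values look like "opensearch_field,operator".
        field, _, operator = field_name_mapping[key].partition(",")
        field = field.strip()
        operator = operator.strip() or "="  # assumption: default to equality
        if isinstance(value, Number) and not isinstance(value, bool):
            clauses.append(f"{field}{operator}{value}")
        else:
            # Assumption: non-numeric values are written as quoted strings.
            clauses.append(f'{field}{operator}"{value}"')
    return " AND ".join(clauses)


example_mapping = {
    "string_field": "string_field,=",
    "int_field": "int_field,>=",
}
print(build_filter({"string_field": "value1", "int_field": 1}, example_mapping))
```

A key that is missing from the mapping raises `KeyError` in this sketch, which surfaces a mismatch early instead of silently dropping the condition.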
If you run into any problems during use, feel free to contact xingshaomin.xsm@alibaba-inc.com, and we will do our best to provide assistance and support.