ApertureDB is a database that stores, indexes, and manages multi-modal data such as text, images, videos, bounding boxes, and embeddings, together with their associated metadata. This notebook explains how to use the embeddings functionality of ApertureDB.

Install ApertureDB Python SDK

Install the Python SDK used to write ApertureDB client code.
pip install -qU aperturedb

Run an ApertureDB instance

To continue, you should have an ApertureDB instance up and running, and your environment configured to use it. There are various ways to do this, for example:
docker run --publish 55555:55555 aperturedata/aperturedb-standalone
adb config create local --active --no-interactive

Download web documents

We're going to do a mini-crawl here of a single web page.
# For loading documents from web
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.aperturedata.io")
docs = loader.load()
USER_AGENT environment variable not set, consider setting it to identify your requests.

Select embeddings model

We're using OllamaEmbeddings here, so we need to import the necessary modules. Ollama can be run as a docker container as described in its documentation, for example:
# Run server
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Tell server to load a specific model
docker exec ollama ollama run llama2
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings()

Split documents into segments

We want to turn our single document into multiple segments.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
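Here the splitter runs with its default settings. As a rough illustration of the idea behind recursive character splitting (a toy sketch only, not the library's actual implementation), the splitter tries coarse separators first and falls back to finer ones until every chunk fits a size limit:

```python
def split_recursive(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Toy sketch of recursive character splitting: split on the coarsest
    separator, merge pieces up to chunk_size, and recurse on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep = separators[0]
    rest = separators[1:] if len(separators) > 1 else separators
    if sep == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = (current + sep + piece) if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece is still too big: retry with finer separators
                chunks.extend(split_recursive(piece, chunk_size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

The real splitter additionally supports chunk overlap and length functions; the point here is only that paragraph boundaries are preferred over word boundaries, which are preferred over hard cuts.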

Create vectorstore from documents and embeddings

This code creates a vector store in the ApertureDB instance. Within the instance, this vector store is represented as a "descriptor set". By default, the descriptor set is named langchain. The following code generates embeddings for each document and stores them as descriptors in ApertureDB. This takes a few seconds because the embeddings are being generated.
from langchain_community.vectorstores import ApertureDB

vector_db = ApertureDB.from_documents(documents, embeddings)
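At this point it can be worth sanity-checking the store directly, before wiring it into a chain. As a sketch (assuming the ApertureDB and Ollama instances set up above are running, and using the generic LangChain VectorStore search API), a similarity query looks like:

```python
# Query the vector store directly; returns the k most similar document chunks.
# Requires the running ApertureDB instance and embeddings configured above.
results = vector_db.similarity_search("What is ApertureDB?", k=4)
for doc in results:
    # Each result is a LangChain Document with page_content and metadata
    print(doc.page_content[:80])
```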

Select LLM

Again we use the Ollama server we configured above for local processing.
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

Build the RAG chain

Now we have all the components we need to create a RAG (Retrieval-Augmented Generation) chain. This chain does the following:
  1. Generate an embedding descriptor for the user query
  2. Use the vector store to find text segments similar to the user's query
  3. Pass the user query and the context documents to the LLM using a prompt template
  4. Return the LLM's answer
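The retrieval step (2) is at heart a nearest-neighbor search over embedding vectors. A minimal stdlib-only sketch of the idea, using cosine similarity over toy 2-d vectors (ApertureDB does this at scale over indexed descriptors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda name: cosine(query_vec, doc_vecs[name]),
                    reverse=True)
    return ranked[:k]

# Toy 2-d "embeddings" standing in for real model output
doc_vecs = {"storage": [1.0, 0.0], "pricing": [0.0, 1.0], "images": [0.9, 0.1]}
print(top_k([1.0, 0.05], doc_vecs))  # most similar first
```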
# Create prompt
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")


# Create a chain that passes documents to an LLM
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)


# Treat the vectorstore as a document retriever
retriever = vector_db.as_retriever()


# Create a RAG chain that connects the retriever to the LLM
from langchain_classic.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)

Run the RAG chain

Finally, we pass a question to the chain and get our answer. This takes a few seconds to run, as the LLM generates an answer from the query and the context documents.
user_query = "How can ApertureDB store images?"
response = retrieval_chain.invoke({"input": user_query})
print(response["answer"])
Based on the provided context, ApertureDB can store images in several ways:

1. Multimodal data management: ApertureDB offers a unified interface to manage multimodal data such as images, videos, documents, embeddings, and associated metadata including annotations. This means that images can be stored along with other types of data in a single database instance.
2. Image storage: ApertureDB provides image storage capabilities through its integration with the public cloud providers or on-premise installations. This allows customers to host their own ApertureDB instances and store images on their preferred cloud provider or on-premise infrastructure.
3. Vector database: ApertureDB also offers a vector database that enables efficient similarity search and classification of images based on their semantic meaning. This can be useful for applications where image search and classification are important, such as in computer vision or machine learning workflows.

Overall, ApertureDB provides flexible and scalable storage options for images, allowing customers to choose the deployment model that best suits their needs.