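ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. ScaNN includes search space pruning and quantization for Maximum Inner Product Search, and it also supports other distance functions such as Euclidean distance. The implementation is optimized for x86 processors with AVX2 support. See its Google Research GitHub for more details. To use this integration, you need to install langchain-community with pip install -qU langchain-community.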
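
To make the Maximum Inner Product Search objective concrete, here is a minimal brute-force sketch in plain Python. It is not how ScaNN works internally: ScaNN uses pruning and quantization precisely to avoid this exhaustive scan. The function names `inner_product` and `brute_force_mips` and the toy vectors are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def brute_force_mips(
    query: Sequence[float], database: Sequence[Sequence[float]], k: int = 1
) -> List[Tuple[int, float]]:
    """Score every database vector against the query and return the top-k
    (index, score) pairs, highest inner product first."""
    scored = [(i, inner_product(query, vec)) for i, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (hypothetical): three 2-dimensional "embeddings".
database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = brute_force_mips([1.0, 0.5], database, k=2)
print(top)
```

The exhaustive scan costs one dot product per stored vector for every query, which is the cost an approximate index such as ScaNN is designed to reduce.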

Installation

Install ScaNN with pip. Alternatively, you can follow the instructions on the ScaNN website to install it from source.
pip install -qU scann

Retrieval demo

Below we show how to use ScaNN together with Huggingface embeddings.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import ScaNN
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)


model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

db = ScaNN.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

docs[0]

RetrievalQA demo

Next, we demonstrate using ScaNN together with the Google PaLM API. You can obtain an API key from developers.generativeai.google/tutorials/setup.
from langchain_classic.chains import RetrievalQA
from langchain_community.chat_models.google_palm import ChatGooglePalm

palm_client = ChatGooglePalm(google_api_key="YOUR_GOOGLE_PALM_API_KEY")

qa = RetrievalQA.from_chain_type(
    llm=palm_client,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 10}),
)
print(qa.run("What did the president say about Ketanji Brown Jackson?"))
The president said that Ketanji Brown Jackson is one of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.
print(qa.run("What did the president say about Michael Phelps?"))
The president did not mention Michael Phelps in his speech.
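
The chain_type="stuff" setting means that all retrieved documents are placed ("stuffed") into a single prompt together with the question. The following is a rough sketch of that idea in plain Python, not LangChain's actual implementation; the function name `build_stuff_prompt`, the prompt wording, and the sample passages are assumptions made for illustration.

```python
from typing import List


def build_stuff_prompt(question: str, retrieved_texts: List[str]) -> str:
    """Concatenate all retrieved passages into one context block and
    append the question, producing a single prompt string."""
    context = "\n\n".join(retrieved_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages standing in for the k retrieved documents.
passages = ["Passage one.", "Passage two."]
prompt = build_stuff_prompt("What was said?", passages)
print(prompt)
```

Because every retrieved passage ends up in the same prompt, a larger k in search_kwargs produces a longer prompt, which still has to fit within the model's context window.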

Save and load a local retrieval index

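db.save_local("/tmp/db", "state_of_union")
# Recent langchain-community releases require opting in to pickle loading; only do so for index files you trust.
restored_db = ScaNN.load_local("/tmp/db", embeddings, index_name="state_of_union", allow_dangerous_deserialization=True)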