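Build a semantic search engine with LangChain

Overview

This tutorial introduces LangChain's document loader, embedding, and vector store abstractions. These abstractions are designed to support retrieval of data, from (vector) databases and other sources, for integration with LLM workflows. They are important for applications that fetch data to be reasoned over as part of model inference, as in retrieval-augmented generation (RAG). Here we will build a search engine over a PDF document, which will let us retrieve passages in the PDF that are similar to an input query. The guide also includes a minimal RAG implementation on top of the search engine.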

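Concepts

This guide focuses on retrieval of text data. We will cover the following concepts:

Documents and document loaders;
Text splitters;
Embeddings;
Vector stores and retrievers.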

Setup

Installation

This tutorial requires the langchain-community and pypdf packages:

pip install langchain-community pypdf

For more details, see our installation guide.

LangSmith

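Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications grow more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith. After you sign up for LangSmith, make sure to set your environment variables to start logging traces: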

export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

Or, if you are working in a notebook, you can set them with:

import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

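1. Documents and document loaders

LangChain implements a Document abstraction, which is intended to represent a unit of text and its associated metadata. It has three attributes:

page_content: a string representing the content;
metadata: a dict containing arbitrary metadata;
id: (optional) a string identifier for the document.

The metadata attribute can capture information about the source of the document, its relationship to other documents, and other information. Note that an individual Document object often represents a chunk of a larger document. We can generate sample documents when desired: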

from langchain_core.documents import Document

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
]

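However, the LangChain ecosystem implements document loaders that integrate with hundreds of common sources. This makes it easy to incorporate data from these sources into your AI application.

Loading documents

Let's load a PDF into a sequence of Document objects. The sample PDF is Nike's 10-K filing for 2023. We can consult the LangChain documentation for available PDF document loaders.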

from langchain_community.document_loaders import PyPDFLoader

file_path = "../example_data/nke-10k-2023.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

print(len(docs))

PyPDFLoader loads one Document object per PDF page. For each, we can easily access:

the string content of the page;
metadata containing the file name and page number.

print(f"{docs[0].page_content[:200]}\n")
print(docs[0].metadata)

Table of Contents
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 10-K
(Mark One)
☑ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934
FO

{'source': '../example_data/nke-10k-2023.pdf', 'page': 0}

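Splitting

For both information retrieval and downstream question answering, a page may be too coarse a representation. Our end goal is to retrieve Document objects that answer an input query, and further splitting the PDF helps ensure that the meaning of relevant portions of the document is not "washed out" by surrounding text.

We can use text splitters for this purpose. Here we will use a simple text splitter that partitions based on characters. We will split our documents into chunks of at most 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use RecursiveCharacterTextSplitter, which recursively splits the document using common separators (such as new lines) until each chunk is an appropriate size. This is the recommended text splitter for generic text use cases.

We set add_start_index=True so that the character index at which each split Document starts within the initial Document is preserved as the metadata attribute "start_index".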

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

print(len(all_splits))
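
To make the roles of chunk_size, chunk_overlap, and the start_index metadata concrete, here is a minimal plain-Python sketch of fixed-size chunking with overlap. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers natural separators such as new lines); the function name chunk_with_overlap and the dictionary layout are assumptions made for illustration only.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        # Record where the chunk begins, mirroring the idea of "start_index".
        chunks.append({"start_index": start, "text": text[start : start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk["start_index"], chunk["text"])
```

Each chunk begins with the last four characters of the previous one, which is the overlap that keeps a statement close to its neighboring context.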

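2. Embeddings

Vector search is a common way to store and search over unstructured data (such as unstructured text). The idea is to store numeric vectors that are associated with the text. Given a query, we can embed it as a vector of the same dimension and use vector similarity metrics (such as cosine similarity) to identify related text.

LangChain supports embeddings from dozens of providers. These models specify how text should be converted into a numeric vector. Let's select a model; each block below configures one alternative provider, and only one is needed: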

pip install -U "langchain-openai"

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

pip install -U "langchain-openai"

import getpass
import os

if not os.environ.get("AZURE_OPENAI_API_KEY"):
    os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass("Enter API key for Azure: ")

from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

pip install -qU langchain-google-genai

import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

pip install -qU langchain-google-vertexai

from langchain_google_vertexai import VertexAIEmbeddings

embeddings = VertexAIEmbeddings(model="text-embedding-005")

pip install -qU langchain-aws

from langchain_aws import BedrockEmbeddings

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

pip install -qU langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

pip install -qU langchain-ollama

from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3")

pip install -qU langchain-cohere

import getpass
import os

if not os.environ.get("COHERE_API_KEY"):
    os.environ["COHERE_API_KEY"] = getpass.getpass("Enter API key for Cohere: ")

from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(model="embed-english-v3.0")

pip install -qU langchain-mistralai

import getpass
import os

if not os.environ.get("MISTRALAI_API_KEY"):
    os.environ["MISTRALAI_API_KEY"] = getpass.getpass("Enter API key for MistralAI: ")

from langchain_mistralai import MistralAIEmbeddings

embeddings = MistralAIEmbeddings(model="mistral-embed")

pip install -qU langchain-nomic

import getpass
import os

if not os.environ.get("NOMIC_API_KEY"):
    os.environ["NOMIC_API_KEY"] = getpass.getpass("Enter API key for Nomic: ")

from langchain_nomic import NomicEmbeddings

embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5")

pip install -qU langchain-nvidia-ai-endpoints

import getpass
import os

if not os.environ.get("NVIDIA_API_KEY"):
    os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter API key for NVIDIA: ")

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings(model="NV-Embed-QA")

pip install -qU langchain-voyageai

import getpass
import os

if not os.environ.get("VOYAGE_API_KEY"):
    os.environ["VOYAGE_API_KEY"] = getpass.getpass("Enter API key for Voyage AI: ")

from langchain_voyageai import VoyageAIEmbeddings

embeddings = VoyageAIEmbeddings(model="voyage-3")

pip install -qU langchain-ibm

import getpass
import os

if not os.environ.get("WATSONX_APIKEY"):
    os.environ["WATSONX_APIKEY"] = getpass.getpass("Enter API key for IBM watsonx: ")

from langchain_ibm import WatsonxEmbeddings

embeddings = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="<WATSONX PROJECT_ID>",
)

pip install -qU langchain-core

from langchain_core.embeddings import DeterministicFakeEmbedding

embeddings = DeterministicFakeEmbedding(size=4096)

pip install -qU langchain-isaacus

import getpass
import os

if not os.environ.get("ISAACUS_API_KEY"):
    os.environ["ISAACUS_API_KEY"] = getpass.getpass("Enter API key for Isaacus: ")

from langchain_isaacus import IsaacusEmbeddings

embeddings = IsaacusEmbeddings(model="kanon-2-embedder")

vector_1 = embeddings.embed_query(all_splits[0].page_content)
vector_2 = embeddings.embed_query(all_splits[1].page_content)

assert len(vector_1) == len(vector_2)
print(f"Generated vectors of length {len(vector_1)}\n")
print(vector_1[:10])

Generated vectors of length 1536

[-0.008586574345827103, -0.03341241180896759, -0.008936782367527485, -0.0036674530711025, 0.010564599186182022, 0.009598285891115665, -0.028587326407432556, -0.015824200585484505, 0.0030416189692914486, -0.012899317778646946]
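
The cosine similarity mentioned at the start of this section can be computed directly from two embedding vectors. The sketch below uses only the standard library; the helper name cosine_similarity and the toy three-dimensional vectors are assumptions for illustration, standing in for real embeddings such as vector_1 and vector_2.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]  # same direction as the query, different length
far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, close))  # close to 1: same direction
print(cosine_similarity(query, far))    # 0: nothing in common
```

Because the measure depends only on direction, the longer vector still scores as a near-perfect match. With real embeddings, calling cosine_similarity(vector_1, vector_2) would compare the first two chunks.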

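With a model for generating text embeddings in hand, we can next store the embeddings in a special data structure that supports efficient similarity search.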

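3. Vector stores

LangChain VectorStore objects contain methods for adding text and Document objects to the store, and for querying them using various similarity metrics. They are often initialized with embedding models, which determine how text data is translated to numeric vectors.

LangChain includes a suite of integrations with different vector store technologies. Some vector stores are hosted by a provider (for example, various cloud providers) and require specific credentials to use; some (such as Postgres) run in separate infrastructure that can be run locally or via a third party; others can run in memory for lightweight workloads. Let's select a vector store; as with the embedding models, each block below is an alternative and only one is needed: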

pip install -U "langchain-core"

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

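pip install -qU boto3 opensearch-py requests-aws4auth langchain-community

import boto3
from langchain_community.vectorstores import OpenSearchVectorSearch
from opensearchpy import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

service = "es"  # the service must be set to 'es'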
region = "us-east-2"
credentials = boto3.Session(
    aws_access_key_id="xxxxxx", aws_secret_access_key="xxxxx"
).get_credentials()
awsauth = AWS4Auth("xxxxx", "xxxxxx", region, service, session_token=credentials.token)

vector_store = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings,
    opensearch_url="host url",
    http_auth=awsauth,
    timeout=300,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    index_name="test-index",
)

pip install -U "langchain-astradb"

from langchain_astradb import AstraDBVectorStore

vector_store = AstraDBVectorStore(
    embedding=embeddings,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    collection_name="astra_vector_langchain",
    token=ASTRA_DB_APPLICATION_TOKEN,
    namespace=ASTRA_DB_NAMESPACE,
)

pip install -qU langchain-chroma

from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # where to save data locally; remove if not needed
)

pip install -qU langchain-community faiss-cpu

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

embedding_dim = len(embeddings.embed_query("hello world"))
index = faiss.IndexFlatL2(embedding_dim)

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

pip install -qU langchain-milvus

from langchain_milvus import Milvus

URI = "./milvus_example.db"

vector_store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": URI},
    index_params={"index_type": "FLAT", "metric_type": "L2"},
)

pip install -qU langchain-mongodb

from langchain_mongodb import MongoDBAtlasVectorSearch

vector_store = MongoDBAtlasVectorSearch(
    embedding=embeddings,
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    relevance_score_fn="cosine",
)

pip install -qU langchain-postgres

from langchain_postgres import PGVector

vector_store = PGVector(
    embeddings=embeddings,
    collection_name="my_docs",
    connection="postgresql+psycopg://...",
)

pip install -qU langchain-postgres

from langchain_postgres import PGEngine, PGVectorStore

pg_engine = PGEngine.from_connection_string(
    url="postgresql+psycopg://..."
)

vector_store = PGVectorStore.create_sync(
    engine=pg_engine,
    table_name='test_table',
    embedding_service=embeddings
)

pip install -qU langchain-pinecone

from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key=...)
index = pc.Index(index_name)

vector_store = PineconeVectorStore(embedding=embeddings, index=index)

pip install -qU langchain-qdrant

from qdrant_client.models import Distance, VectorParams
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(":memory:")

vector_size = len(embeddings.embed_query("sample text"))

if not client.collection_exists("test"):
    client.create_collection(
        collection_name="test",
        vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE)
    )
vector_store = QdrantVectorStore(
    client=client,
    collection_name="test",
    embedding=embeddings,
)

Having instantiated our vector store, we can now index the documents.

ids = vector_store.add_documents(documents=all_splits)
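
Conceptually, indexing stores one vector per document so that a query vector can later be compared against all of them. The following is a toy sketch of that idea in plain Python, not LangChain's InMemoryVectorStore: the class TinyVectorStore, its methods, and the letter-counting toy_embed function are all hypothetical stand-ins, chosen so the example runs without an API key.

```python
import math
from collections import Counter


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Keeps (id, text, vector) rows in a list and searches them by brute force."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    def add_texts(self, texts: list[str]) -> list[str]:
        ids = []
        for text in texts:
            doc_id = str(len(self.rows))  # simple sequential identifier
            self.rows.append((doc_id, text, self.embed(text)))
            ids.append(doc_id)
        return ids

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        query_vector = self.embed(query)
        ranked = sorted(
            self.rows, key=lambda row: cosine(query_vector, row[2]), reverse=True
        )
        return [text for _, text, _ in ranked[:k]]


store = TinyVectorStore(toy_embed)
ids = store.add_texts(["aaaa bbbb", "xxxx yyyy", "aabb"])
print(ids)
print(store.similarity_search("ab", k=2))
```

A letter-count vector carries no real meaning, so the sample strings are deliberately artificial. Production vector stores replace the brute-force scan with index structures built for fast nearest-neighbor lookup.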

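Note that most vector store implementations allow you to connect to an existing vector store, for example by providing a client, index name, or other information. See the documentation for a specific integration for more detail.

Once we have instantiated a VectorStore that contains documents, we can query it. VectorStore includes methods for querying:

synchronously and asynchronously;
by string query and by vector;
with and without returning similarity scores;
by similarity and by maximum marginal relevance (to balance similarity to the query with diversity in the retrieved results).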
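
Maximum marginal relevance can be pictured as a greedy loop: each step picks the candidate that is most similar to the query while being least similar to what has already been picked. The sketch below illustrates that loop in plain Python and is not LangChain's implementation; the function mmr_select, the weight name lambda_mult, and the two-dimensional toy vectors are assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k indices, trading query relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
doc_vectors = [
    [1.0, 0.0],   # index 0: same direction as the query
    [0.99, 0.1],  # index 1: near-duplicate of index 0
    [0.6, 0.8],   # index 2: less similar to the query, but different from index 0
]
print(mmr_select(query, doc_vectors, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, doc_vectors, k=2, lambda_mult=0.3))  # favors diversity
```

With lambda_mult=1.0 the near-duplicate is returned second; lowering the weight lets the more distinct vector take its place.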

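These methods will generally include a list of Document objects in their outputs.

Usage

Embeddings typically represent text as a "dense" vector such that texts with similar meanings are geometrically close. This lets us retrieve relevant information just by passing in a question, without knowing any of the specific keywords used in the document.

Return documents based on similarity to a string query: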

results = vector_store.similarity_search(
    "How many distribution centers does Nike have in the US?"
)

print(results[0])

page_content='direct to consumer operations sell products through the following number of retail stores in the United States:
U.S. RETAIL STORES NUMBER
NIKE Brand factory stores 213
NIKE Brand in-line stores (including employee-only stores) 74
Converse stores (including factory stores) 82
TOTAL 369
In the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information.
2023 FORM 10-K 2' metadata={'page': 4, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 3125}

Async query:

results = await vector_store.asimilarity_search("When was Nike incorporated?")

print(results[0])

page_content='Table of Contents
PART I
ITEM 1. BUSINESS
GENERAL
NIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our,"
"NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise.
Our principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is
the largest seller of athletic footwear and apparel in the world. We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores
and sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales' metadata={'page': 3, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 0}

Return scores:

# Note that providers implement different scores; the score here
# is a distance metric that varies inversely with similarity.

results = vector_store.similarity_search_with_score("What was Nike's revenue in 2023?")
doc, score = results[0]
print(f"Score: {score}\n")
print(doc)

Score: 0.23699893057346344

page_content='Table of Contents
FISCAL 2023 NIKE BRAND REVENUE HIGHLIGHTS
The following tables present NIKE Brand revenues disaggregated by reportable operating segment, distribution channel and major product line:
FISCAL 2023 COMPARED TO FISCAL 2022
•NIKE, Inc. Revenues were $51.2 billion in fiscal 2023, which increased 10% and 16% compared to fiscal 2022 on a reported and currency-neutral basis, respectively.
The increase was due to higher revenues in North America, Europe, Middle East & Africa ("EMEA"), APLA and Greater China, which contributed approximately 7, 6,
2 and 1 percentage points to NIKE, Inc. Revenues, respectively.
•NIKE Brand revenues, which represented over 90% of NIKE, Inc. Revenues, increased 10% and 16% on a reported and currency-neutral basis, respectively. This
increase was primarily due to higher revenues in Men's, the Jordan Brand, Women's and Kids' which grew 17%, 35%,11% and 10%, respectively, on a wholesale
equivalent basis.' metadata={'page': 35, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 0}
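
Because a distance score runs opposite to a similarity score, results ranked by distance put the smallest value first. The sketch below assumes cosine distance defined as 1 minus cosine similarity, which is one common convention; the metric a given vector store actually reports depends on the integration, so check its documentation. The labels and similarity values are made-up data.

```python
def cosine_distance_from_similarity(similarity: float) -> float:
    """Assumed convention: cosine distance = 1 - cosine similarity."""
    return 1.0 - similarity


# Hypothetical (label, cosine similarity) pairs for three chunks.
similarities = [("chunk-a", 0.75), ("chunk-b", 0.5), ("chunk-c", 0.25)]

by_distance = sorted(
    ((label, cosine_distance_from_similarity(sim)) for label, sim in similarities),
    key=lambda pair: pair[1],  # ascending: the smallest distance is the best match
)
for label, distance in by_distance:
    print(label, distance)
```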

Return documents based on similarity to an embedded query:

embedding = embeddings.embed_query("How were Nike's margins impacted in 2023?")

results = vector_store.similarity_search_by_vector(embedding)
print(results[0])

page_content='Table of Contents
GROSS MARGIN
FISCAL 2023 COMPARED TO FISCAL 2022
For fiscal 2023, our consolidated gross profit increased 4% to $22,292 million compared to $21,479 million for fiscal 2022. Gross margin decreased 250 basis points to
43.5% for fiscal 2023 compared to 46.0% for fiscal 2022 due to the following:
*Wholesale equivalent
The decrease in gross margin for fiscal 2023 was primarily due to:
•Higher NIKE Brand product costs, on a wholesale equivalent basis, primarily due to higher input costs and elevated inbound freight and logistics costs as well as
product mix;
•Lower margin in our NIKE Direct business, driven by higher promotional activity to liquidate inventory in the current period compared to lower promotional activity in
the prior period resulting from lower available inventory supply;
•Unfavorable changes in net foreign currency exchange rates, including hedges; and
•Lower off-price margin, on a wholesale equivalent basis.
This was partially offset by:' metadata={'page': 36, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 0}


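4. Retrievers

LangChain VectorStore objects do not subclass Runnable. LangChain Retrievers are Runnables, so they implement a standard set of methods (for example, synchronous and asynchronous invoke and batch operations). Although we can construct retrievers from vector stores, retrievers can also interface with data sources that are not vector stores (such as external APIs).

We can create a simple version of this ourselves, without subclassing Retriever. Once we choose which method to use for retrieving documents, we can easily create a runnable. Below we build one around the similarity_search method: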

from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain


@chain
def retriever(query: str) -> List[Document]:
    return vector_store.similarity_search(query, k=1)


retriever.batch(
    [
        "How many distribution centers does Nike have in the US?",
        "When was Nike incorporated?",
    ],
)

[[Document(metadata={'page': 4, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 3125}, page_content='direct to consumer operations sell products through the following number of retail stores in the United States:\nU.S. RETAIL STORES NUMBER\nNIKE Brand factory stores 213 \nNIKE Brand in-line stores (including employee-only stores) 74 \nConverse stores (including factory stores) 82 \nTOTAL 369 \nIn the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information.\n2023 FORM 10-K 2')],
 [Document(metadata={'page': 3, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 0}, page_content='Table of Contents\nPART I\nITEM 1. BUSINESS\nGENERAL\nNIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our,"\n"NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise.\nOur principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is\nthe largest seller of athletic footwear and apparel in the world. We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores\nand sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales')]]
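
To see how wrapping a search function yields both invoke and batch, here is a stripped-down sketch of the idea in plain Python. It is not LangChain's Runnable implementation: the class SimpleRunnable and the keyword-matching fake_search function are hypothetical, and the real Runnable interface offers much more (async variants, configuration, composition).

```python
from typing import Callable, Generic, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class SimpleRunnable(Generic[In, Out]):
    """Hypothetical wrapper exposing invoke/batch around a plain function."""

    def __init__(self, func: Callable[[In], Out]) -> None:
        self.func = func

    def invoke(self, value: In) -> Out:
        return self.func(value)

    def batch(self, values: List[In]) -> List[Out]:
        # One result per input, in the same order as the inputs.
        return [self.invoke(value) for value in values]


CORPUS = [
    "NIKE has eight significant distribution centers in the United States.",
    "NIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon.",
]


def fake_search(query: str) -> List[str]:
    """Stand-in for vector_store.similarity_search: naive word overlap, k=1."""
    query_words = set(query.lower().replace("?", "").split())
    best = max(CORPUS, key=lambda text: len(query_words & set(text.lower().split())))
    return [best]


toy_retriever = SimpleRunnable(fake_search)
results = toy_retriever.batch(
    ["How many distribution centers does Nike have?", "When was Nike incorporated?"]
)
for result in results:
    print(result)
```

The batch result is a list of lists, one inner list per query, which matches the shape of the output shown above.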

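Vector stores implement an as_retriever method that generates a Retriever, specifically a VectorStoreRetriever. These retrievers include search_type and search_kwargs attributes that identify which methods of the underlying vector store to call and how to parameterize them. For instance, we can replicate the above with the following: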

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)

retriever.batch(
    [
        "How many distribution centers does Nike have in the US?",
        "When was Nike incorporated?",
    ],
)

[[Document(metadata={'page': 4, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 3125}, page_content='direct to consumer operations sell products through the following number of retail stores in the United States:\nU.S. RETAIL STORES NUMBER\nNIKE Brand factory stores 213 \nNIKE Brand in-line stores (including employee-only stores) 74 \nConverse stores (including factory stores) 82 \nTOTAL 369 \nIn the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information.\n2023 FORM 10-K 2')],
 [Document(metadata={'page': 3, 'source': '../example_data/nke-10k-2023.pdf', 'start_index': 0}, page_content='Table of Contents\nPART I\nITEM 1. BUSINESS\nGENERAL\nNIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our,"\n"NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise.\nOur principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is\nthe largest seller of athletic footwear and apparel in the world. We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores\nand sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales')]]

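VectorStoreRetriever supports the search types "similarity" (default), "mmr" (maximum marginal relevance, described above), and "similarity_score_threshold". We can use the last of these to threshold the documents output by the retriever by similarity score.

Retrievers can easily be incorporated into more complex applications, such as retrieval-augmented generation (RAG) applications that combine a given question with retrieved context into a prompt for an LLM. To learn more about building such an application, check out the RAG tutorial.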
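
The thresholding idea behind "similarity_score_threshold" can be illustrated without a vector store: keep only the results whose relevance score meets a cutoff. This is a conceptual sketch with made-up scores; the function filter_by_threshold is hypothetical, and it assumes scores where higher means more relevant.

```python
from typing import List, Tuple


def filter_by_threshold(
    scored: List[Tuple[str, float]], score_threshold: float
) -> List[str]:
    """Keep texts whose score is at least score_threshold, best first."""
    kept = [(text, score) for text, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]


# Made-up (text, relevance score) pairs; higher means more relevant here.
scored_results = [
    ("passage about revenue", 0.82),
    ("passage about store counts", 0.41),
    ("passage about gross margin", 0.67),
]
print(filter_by_threshold(scored_results, score_threshold=0.5))
```

With a real store, the corresponding retriever would be created along the lines of vector_store.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}); a suitable threshold depends on the embedding model and the score the store reports.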

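Next steps

You have now seen how to build a semantic search engine over a PDF document.

For more on document loaders, embeddings, and vector stores, see the corresponding sections of the LangChain documentation.

For more on RAG, see:

Build a Retrieval Augmented Generation (RAG) App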
