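Azure SQL provides a dedicated Vector data type that simplifies creating, storing, and querying vector embeddings directly within a relational database. This eliminates the need for a separate vector database and related integrations, increasing the security of your solutions while reducing overall complexity.
Azure SQL is a robust service that combines scalability, security, and high availability, providing all the benefits of a modern database solution. It uses a sophisticated query optimizer and enterprise features to run vector similarity searches alongside traditional SQL queries, enhancing data analysis and decision-making. Read more on using intelligent applications with Azure SQL Database. This notebook shows how to use this integrated SQL vector database to store documents and run vector search queries using cosine (cosine distance), L2 (Euclidean distance), and IP (inner product) to locate documents close to the query vector.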
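To make the three measures named above concrete, here is a minimal pure-Python sketch of their textbook definitions. It is only an illustration: the function names are made up for this example, and the exact sign or scaling conventions that the database applies to each measure are not shown here and may differ.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Return the dot product; larger values mean stronger alignment."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction as the query, different length
orthogonal = [0.0, 3.0]      # at a right angle to the query

print(cosine_distance(query, same_direction))     # 0.0
print(cosine_distance(query, orthogonal))         # 1.0
print(euclidean_distance(query, same_direction))  # 1.0
print(inner_product(query, same_direction))       # 2.0
```

Note that cosine distance ignores vector length while the other two measures do not, which is one reason the choice of distance strategy can change the ranking.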

Setup

Install the langchain-sqlserver Python package. The code lives in an integration package called langchain-sqlserver.
!pip install langchain-sqlserver==0.1.1

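Credentials

No credentials are needed to run this notebook; just make sure the langchain-sqlserver package is installed. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# import getpass
# import os
#
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"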

Initialization

from langchain_sqlserver import SQLServer_VectorStore
Find your Azure SQL DB connection string in the Azure portal under your database settings. For more info: Connect to Azure SQL DB - Python
import os

import pyodbc

# Define your SQLServer connection string
_CONNECTION_STRING = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<YOUR_DBSERVER>.database.windows.net,1433;"
    "Database=test;"
    "TrustServerCertificate=yes;"
    "Connection Timeout=60;"
    "LongAsMax=yes;"
)

# The connection string can vary:
# "mssql+pyodbc://<username>:<password>@<servername>/<dbname>?driver=ODBC+Driver+18+for+SQL+Server" -> with username and password specified
# "mssql+pyodbc://<servername>/<dbname>?driver=ODBC+Driver+18+for+SQL+Server&Trusted_connection=yes" -> uses a trusted connection
# "mssql+pyodbc://<servername>/<dbname>?driver=ODBC+Driver+18+for+SQL+Server" -> uses an Entra ID connection
# "mssql+pyodbc://<servername>/<dbname>?driver=ODBC+Driver+18+for+SQL+Server&Trusted_connection=no" -> uses an Entra ID connection
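The URL-style variants in the comments above follow the SQLAlchemy convention `dialect+driver://user:password@host/database?options`. As a rough sketch, such a URL could be assembled with the standard library so that special characters in the credentials are percent-encoded. The helper name `build_sqlalchemy_url` and the sample server, user, and password are hypothetical and exist only for this illustration.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_url(server, database, username=None, password=None,
                         driver="ODBC Driver 18 for SQL Server"):
    """Assemble an mssql+pyodbc URL, percent-encoding user, password and driver."""
    credentials = ""
    if username is not None and password is not None:
        # Characters such as '@' or '/' in a password would otherwise break the URL.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return (
        f"mssql+pyodbc://{credentials}{server}/{database}"
        f"?driver={quote_plus(driver)}"
    )


# Hypothetical values, for illustration only
url = build_sqlalchemy_url(
    "myserver.database.windows.net", "test", "alice", "p@ss/word"
)
print(url)
```

Without a username and password the helper produces the credential-free form, to which options such as `&Trusted_connection=yes` can be appended.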
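In this example we use Azure OpenAI to generate embeddings, but you can use any of the other embeddings available in LangChain. You can deploy an Azure OpenAI instance on the Azure portal by following this guide. Once your instance is running, make sure you have the instance name and key. You can find the key in the Azure portal, under the "Keys and Endpoint" section of your instance.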
!pip install langchain-openai
# Import the necessary libraries
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Set your AzureOpenAI details
azure_endpoint = "https://<YOUR_ENDPOINT>.openai.azure.com/"
azure_deployment_name_embedding = "text-embedding-3-small"
azure_deployment_name_chatcompletion = "chatcompletion"
azure_api_version = "2023-05-15"
azure_api_key = "YOUR_KEY"


# Use AzureChatOpenAI for chat completions
llm = AzureChatOpenAI(
    azure_endpoint=azure_endpoint,
    azure_deployment=azure_deployment_name_chatcompletion,
    openai_api_version=azure_api_version,
    openai_api_key=azure_api_key,
)

# Use AzureOpenAIEmbeddings for embeddings
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=azure_endpoint,
    azure_deployment=azure_deployment_name_embedding,
    openai_api_version=azure_api_version,
    openai_api_key=azure_api_key,
)

Manage vector store

from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_sqlserver import SQLServer_VectorStore

# Initialize the vector store
vector_store = SQLServer_VectorStore(
    connection_string=_CONNECTION_STRING,
    distance_strategy=DistanceStrategy.COSINE,  # optional; defaults to COSINE if not provided
    embedding_function=embeddings,  # you can use any of the other embeddings available in LangChain
    embedding_length=1536,
    table_name="langchain_test_table",  # use a table with a custom name
)
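The `embedding_length` passed above has to agree with the size of the vectors the embedding model returns (1536 is the default output size of text-embedding-3-small). A small check like the following sketch can surface a mismatch before any rows are written. `check_embedding_length` and `FakeEmbeddings` are hypothetical names for this illustration; the only assumption about the real embedding object is that it exposes `embed_query`, as LangChain embedding classes do.

```python
def check_embedding_length(embedding_function, expected_length,
                           probe_text="dimension probe"):
    """Embed a short probe string and compare its size to the configured length."""
    vector = embedding_function.embed_query(probe_text)
    if len(vector) != expected_length:
        raise ValueError(
            f"model returned {len(vector)} dimensions, "
            f"but embedding_length is {expected_length}"
        )
    return len(vector)


class FakeEmbeddings:
    """Offline stand-in for a real embedding model (illustration only)."""

    def __init__(self, size):
        self.size = size

    def embed_query(self, text):
        # Returns a constant vector; a real model would return meaningful values.
        return [0.0] * self.size


print(check_embedding_length(FakeEmbeddings(1536), 1536))  # 1536
```

With the real objects from this notebook the call would be `check_embedding_length(embeddings, 1536)`, which costs one embedding request.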

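Add items to vector store

# We will use some artificial data for this example
query = [
    "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.",
    "The candy is just red , No flavor . Just  plan and chewy .  I would never buy them again",
    "Arrived in 6 days and were so stale i could not eat any of the 6 bags!!",
    "Got these on sale for roughly 25 cents per cup, which is half the price of my local grocery stores, plus they rarely stock the spicy flavors. These things are a GREAT snack for my office where time is constantly crunched and sometimes you can't escape for a real meal. This is one of my favorite flavors of Instant Lunch and will be back to buy every time it goes on sale.",
    "If you are looking for a less messy version of licorice for the children, then be sure to try these!  They're soft, easy to chew, and they don't get your hands all sticky and gross in the car, in the summer, at the beach, etc. We love all the flavos and sometimes mix these in with the chocolate to have a very nice snack! Great item, great price too, highly recommend!",
    "We had trouble finding this locally - delivery was fast, no more hunting up and down the flour aisle at our local grocery stores.",
    "Too much of a good thing? We worked this kibble in over time, slowly shifting the percentage of Felidae to national junk-food brand until the bowl was all natural. By this time, the cats couldn't keep it in or down. What a mess. We've moved on.",
    "Hey, the description says 360 grams - that is roughly 13 ounces at under $4.00 per can. No way - that is the approximate price for a 100 gram can.",
    "The taste of these white cheddar flat breads is like a regular cracker - which is not bad, except that I bought them because I wanted a cheese taste.<br /><br />What was a HUGE disappointment? How misleading the packaging of the box is. The photo on the box (I bought these in store) makes it look like it is full of long flatbreads (expanding the length and width of the box). Wrong! The plastic tray that holds the crackers is about 2 smaller all around - leaving you with about 15 or so small flatbreads.<br /><br />What is also bad about this is that the company states they use biodegradable and eco-friendly packaging. FAIL! They used a HUGE box for a ridiculously small amount of crackers. Not ecofriendly at all.<br /><br />Would I buy these again? No - I feel ripped off. The other crackers (like Sesame Tarragon) give you a little<br />more bang for your buck and have more flavor.",
    "I have used this product in smoothies for my son and he loves it. Additionally, I use this oil in the shower as a skin conditioner and it has made my skin look great. Some of the stretch marks on my belly has disappeared quickly. Highly recommend!!!",
    "Been taking Coconut Oil for YEARS.  This is the best on the retail market.  I wish it was in glass, but this is the one.",
]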

query_metadata = [
    {"id": 1, "summary": "Good Quality Dog Food"},
    {"id": 8, "summary": "Nasty No flavor"},
    {"id": 4, "summary": "stale product"},
    {"id": 11, "summary": "Great value and convenient ramen"},
    {"id": 5, "summary": "Great for the kids!"},
    {"id": 2, "summary": "yum falafel"},
    {"id": 9, "summary": "Nearly killed the cats"},
    {"id": 6, "summary": "Price cannot be correct"},
    {"id": 3, "summary": "Taste is neutral, quantity is DECEITFUL!"},
    {"id": 7, "summary": "This stuff is great"},
    {"id": 10, "summary": "The reviews don't lie"},
]
vector_store.add_texts(texts=query, metadatas=query_metadata)
[1, 8, 4, 11, 5, 2, 9, 6, 3, 7, 10]

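Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it while running your chain or agent. A simple similarity search can be performed as follows:
# Perform a similarity search between the embedding of the query and the embeddings of the documents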
simsearch_result = vector_store.similarity_search("Good reviews", k=3)
print(simsearch_result)
[Document(metadata={'id': 1, 'summary': 'Good Quality Dog Food'}, page_content='I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.'), Document(metadata={'id': 7, 'summary': 'This stuff is great'}, page_content='I have used this product in smoothies for my son and he loves it. Additionally, I use this oil in the shower as a skin conditioner and it has made my skin look great. Some of the stretch marks on my belly has disappeared quickly. Highly recommend!!!'), Document(metadata={'id': 5, 'summary': 'Great for the kids!'}, page_content="If you are looking for a less messy version of licorice for the children, then be sure to try these!  They're soft, easy to chew, and they don't get your hands all sticky and gross in the car, in the summer, at the beach, etc. We love all the flavos and sometimes mix these in with the chocolate to have a very nice snack! Great item, great price too, highly recommend!")]

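Filtering support

The vector store supports a set of filters that can be applied to the metadata fields of the documents. This lets developers and data analysts refine their queries so that search results match their needs. By applying filters on specific metadata attributes, users can limit the scope of a search to the most relevant subset of the data.
# hybrid search -> filter for cases where id is not equal to 1.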
hybrid_simsearch_result = vector_store.similarity_search(
    "Good reviews", k=3, filter={"id": {"$ne": 1}}
)
print(hybrid_simsearch_result)
[Document(metadata={'id': 7, 'summary': 'This stuff is great'}, page_content='I have used this product in smoothies for my son and he loves it. Additionally, I use this oil in the shower as a skin conditioner and it has made my skin look great. Some of the stretch marks on my belly has disappeared quickly. Highly recommend!!!'), Document(metadata={'id': 5, 'summary': 'Great for the kids!'}, page_content="If you are looking for a less messy version of licorice for the children, then be sure to try these!  They're soft, easy to chew, and they don't get your hands all sticky and gross in the car, in the summer, at the beach, etc. We love all the flavos and sometimes mix these in with the chocolate to have a very nice snack! Great item, great price too, highly recommend!"), Document(metadata={'id': 3, 'summary': 'Taste is neutral, quantity is DECEITFUL!'}, page_content='The taste of these white cheddar flat breads is like a regular cracker - which is not bad, except that I bought them because I wanted a cheese taste.<br /><br />What was a HUGE disappointment? How misleading the packaging of the box is. The photo on the box (I bought these in store) makes it look like it is full of long flatbreads (expanding the length and width of the box). Wrong! The plastic tray that holds the crackers is about 2 smaller all around - leaving you with about 15 or so small flatbreads.<br /><br />What is also bad about this is that the company states they use biodegradable and eco-friendly packaging. FAIL! They used a HUGE box for a ridiculously small amount of crackers. Not ecofriendly at all.<br /><br />Would I buy these again? No - I feel ripped off. The other crackers (like Sesame Tarragon) give you a little<br />more bang for your buck and have more flavor.')]
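To illustrate what the `{"id": {"$ne": 1}}` filter above expresses, the sketch below applies the same condition to plain metadata dictionaries in Python. It is not how the vector store evaluates filters (that happens inside the database), and only `$ne` is taken from the example above. The `$eq` operator and the helper name `matches` are assumptions added for this illustration.

```python
import operator

# "$ne" comes from the example above; "$eq" is an assumption for this sketch.
OPERATORS = {"$ne": operator.ne, "$eq": operator.eq}


def matches(metadata, filter_spec):
    """Return True when every field condition in filter_spec holds for metadata."""
    for field, condition in filter_spec.items():
        value = metadata.get(field)
        for op_name, operand in condition.items():
            if not OPERATORS[op_name](value, operand):
                return False
    return True


rows = [
    {"id": 1, "summary": "Good Quality Dog Food"},
    {"id": 7, "summary": "This stuff is great"},
    {"id": 5, "summary": "Great for the kids!"},
]

kept_ids = [row["id"] for row in rows if matches(row, {"id": {"$ne": 1}})]
print(kept_ids)  # [7, 5]
```

The document with id 1, which ranked first in the unfiltered search, is excluded, so the next-best matches move up.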

Similarity search with score

If you want to run a similarity search and receive the corresponding scores, you can run:
simsearch_with_score_result = vector_store.similarity_search_with_score(
    "Not a very good product", k=12
)
print(simsearch_with_score_result)
[(Document(metadata={'id': 3, 'summary': 'Taste is neutral, quantity is DECEITFUL!'}, page_content='The taste of these white cheddar flat breads is like a regular cracker - which is not bad, except that I bought them because I wanted a cheese taste.<br /><br />What was a HUGE disappointment? How misleading the packaging of the box is. The photo on the box (I bought these in store) makes it look like it is full of long flatbreads (expanding the length and width of the box). Wrong! The plastic tray that holds the crackers is about 2 smaller all around - leaving you with about 15 or so small flatbreads.<br /><br />What is also bad about this is that the company states they use biodegradable and eco-friendly packaging. FAIL! They used a HUGE box for a ridiculously small amount of crackers. Not ecofriendly at all.<br /><br />Would I buy these again? No - I feel ripped off. The other crackers (like Sesame Tarragon) give you a little<br />more bang for your buck and have more flavor.'), 0.651870006770711), (Document(metadata={'id': 8, 'summary': 'Nasty No flavor'}, page_content='The candy is just red , No flavor . Just  plan and chewy .  I would never buy them again'), 0.6908952973052638), (Document(metadata={'id': 4, 'summary': 'stale product'}, page_content='Arrived in 6 days and were so stale i could not eat any of the 6 bags!!'), 0.7360955776468822), (Document(metadata={'id': 1, 'summary': 'Good Quality Dog Food'}, page_content='I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.'), 0.7408823529514486), (Document(metadata={'id': 9, 'summary': 'Nearly killed the cats'}, page_content="Too much of a good thing? We worked this kibble in over time, slowly shifting the percentage of Felidae to national junk-food brand until the bowl was all natural. 
By this time, the cats couldn't keep it in or down. What a mess. We've moved on."), 0.782995248991772), (Document(metadata={'id': 7, 'summary': 'This stuff is great'}, page_content='I have used this product in smoothies for my son and he loves it. Additionally, I use this oil in the shower as a skin conditioner and it has made my skin look great. Some of the stretch marks on my belly has disappeared quickly. Highly recommend!!!'), 0.7912681479906212), (Document(metadata={'id': 2, 'summary': 'yum falafel'}, page_content='We had trouble finding this locally - delivery was fast, no more hunting up and down the flour aisle at our local grocery stores.'), 0.809213468778896), (Document(metadata={'id': 10, 'summary': "The reviews don't lie"}, page_content='Been taking Coconut Oil for YEARS.  This is the best on the retail market.  I wish it was in glass, but this is the one.'), 0.8281482301097155), (Document(metadata={'id': 5, 'summary': 'Great for the kids!'}, page_content="If you are looking for a less messy version of licorice for the children, then be sure to try these!  They're soft, easy to chew, and they don't get your hands all sticky and gross in the car, in the summer, at the beach, etc. We love all the flavos and sometimes mix these in with the chocolate to have a very nice snack! Great item, great price too, highly recommend!"), 0.8283754326400574), (Document(metadata={'id': 6, 'summary': 'Price cannot be correct'}, page_content='Hey, the description says 360 grams - that is roughly 13 ounces at under $4.00 per can. No way - that is the approximate price for a 100 gram can.'), 0.8323967822635847), (Document(metadata={'id': 11, 'summary': 'Great value and convenient ramen'}, page_content="Got these on sale for roughly 25 cents per cup, which is half the price of my local grocery stores, plus they rarely stock the spicy flavors. These things are a GREAT snack for my office where time is constantly crunched and sometimes you can't escape for a real meal. 
This is one of my favorite flavors of Instant Lunch and will be back to buy every time it goes on sale."), 0.8387189489406939)]
For a full list of the different searches you can execute on an Azure SQL vector store, please refer to the API reference.
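In the scored output above the results are listed from the lowest score upward, which suggests the score is a distance: smaller means closer to the query. Under that reading, one simple post-processing step is to drop results beyond a chosen cutoff. The sketch below uses plain strings in place of `Document` objects and scores rounded from the output above. The helper name `keep_within_distance` and the 0.70 cutoff are arbitrary choices for illustration.

```python
def keep_within_distance(results, max_distance):
    """Keep (document, score) pairs whose distance does not exceed max_distance."""
    return [(doc, score) for doc, score in results if score <= max_distance]


# Stand-ins for (Document, score) tuples, scores rounded from the output above
sample_results = [
    ("review id 3", 0.6519),
    ("review id 8", 0.6909),
    ("review id 4", 0.7361),
    ("review id 1", 0.7409),
]

close_matches = keep_within_distance(sample_results, max_distance=0.70)
print(close_matches)  # [('review id 3', 0.6519), ('review id 8', 0.6909)]
```

A suitable cutoff depends on the embedding model and the distance strategy, so it is usually tuned on real queries rather than fixed in advance.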

Similarity search when you already have embeddings you want to search on

# If you already have embeddings you want to search on
simsearch_by_vector = vector_store.similarity_search_by_vector(
    [-0.0033353185281157494, -0.017689190804958344, -0.01590404286980629, ...]
)
print(simsearch_by_vector)
[Document(metadata={'id': 8, 'summary': 'Nasty No flavor'}, page_content='The candy is just red , No flavor . Just  plan and chewy .  I would never buy them again'), Document(metadata={'id': 4, 'summary': 'stale product'}, page_content='Arrived in 6 days and were so stale i could not eat any of the 6 bags!!'), Document(metadata={'id': 3, 'summary': 'Taste is neutral, quantity is DECEITFUL!'}, page_content='The taste of these white cheddar flat breads is like a regular cracker - which is not bad, except that I bought them because I wanted a cheese taste.<br /><br />What was a HUGE disappointment? How misleading the packaging of the box is. The photo on the box (I bought these in store) makes it look like it is full of long flatbreads (expanding the length and width of the box). Wrong! The plastic tray that holds the crackers is about 2 smaller all around - leaving you with about 15 or so small flatbreads.<br /><br />What is also bad about this is that the company states they use biodegradable and eco-friendly packaging. FAIL! They used a HUGE box for a ridiculously small amount of crackers. Not ecofriendly at all.<br /><br />Would I buy these again? No - I feel ripped off. The other crackers (like Sesame Tarragon) give you a little<br />more bang for your buck and have more flavor.'), Document(metadata={'id': 6, 'summary': 'Price cannot be correct'}, page_content='Hey, the description says 360 grams - that is roughly 13 ounces at under $4.00 per can. No way - that is the approximate price for a 100 gram can.')]
# Similarity search with score, if you already have embeddings you want to search on
simsearch_by_vector_with_score = vector_store.similarity_search_by_vector_with_score(
    [-0.0033353185281157494, -0.017689190804958344, -0.01590404286980629, ...]
)
print(simsearch_by_vector_with_score)
[(Document(metadata={'id': 8, 'summary': 'Nasty No flavor'}, page_content='The candy is just red , No flavor . Just  plan and chewy .  I would never buy them again'), 0.9648153551769503), (Document(metadata={'id': 4, 'summary': 'stale product'}, page_content='Arrived in 6 days and were so stale i could not eat any of the 6 bags!!'), 0.9655108580341948), (Document(metadata={'id': 3, 'summary': 'Taste is neutral, quantity is DECEITFUL!'}, page_content='The taste of these white cheddar flat breads is like a regular cracker - which is not bad, except that I bought them because I wanted a cheese taste.<br /><br />What was a HUGE disappointment? How misleading the packaging of the box is. The photo on the box (I bought these in store) makes it look like it is full of long flatbreads (expanding the length and width of the box). Wrong! The plastic tray that holds the crackers is about 2 smaller all around - leaving you with about 15 or so small flatbreads.<br /><br />What is also bad about this is that the company states they use biodegradable and eco-friendly packaging. FAIL! They used a HUGE box for a ridiculously small amount of crackers. Not ecofriendly at all.<br /><br />Would I buy these again? No - I feel ripped off. The other crackers (like Sesame Tarragon) give you a little<br />more bang for your buck and have more flavor.'), 0.9840511208615808), (Document(metadata={'id': 6, 'summary': 'Price cannot be correct'}, page_content='Hey, the description says 360 grams - that is roughly 13 ounces at under $4.00 per can. No way - that is the approximate price for a 100 gram can.'), 0.9915737524649991)]
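The vectors in the two calls above are shown truncated. One way such a vector might be obtained is to embed the query text once and reuse the result, as in this sketch. The helper name `embed_then_search` is hypothetical, and the `k` keyword is assumed to behave as it does for the other search methods shown earlier.

```python
def embed_then_search(vector_store, embeddings, text, k=4):
    """Embed text once, then search the store with the resulting vector."""
    query_vector = embeddings.embed_query(text)
    # The same vector could be reused for further searches without re-embedding.
    return vector_store.similarity_search_by_vector(query_vector, k=k)
```

With the objects defined in this notebook, a call would look like `embed_then_search(vector_store, embeddings, "Not a very good product", k=4)`.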

Delete items from vector store

Delete rows by ID

# Delete rows by id
vector_store.delete(["3", "7"])
True

Drop the vector store

# Drop the vector store
vector_store.drop()

Load a document from Azure Blob Storage

Below is an example of loading a file from an Azure Blob Storage container into the SQL vector store, splitting the document into chunks along the way. Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data.
!pip install azure-storage-blob
from langchain_community.document_loaders import AzureBlobStorageFileLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Define your connection string and blob details
conn_str = "DefaultEndpointsProtocol=https;AccountName=<YourBlobName>;AccountKey=<YourAccountKey>==;EndpointSuffix=core.windows.net"
container_name = "<YourContainerName>"
blob_name = "01 Harry Potter and the Sorcerers Stone.txt"

# Create an instance of AzureBlobStorageFileLoader
loader = AzureBlobStorageFileLoader(
    conn_str=conn_str, container=container_name, blob_name=blob_name
)

# Load the document from Azure Blob Storage
documents = loader.load()

# If necessary, split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_documents = text_splitter.split_documents(documents)

# Print the number of split documents
print(f"Number of split documents: {len(split_documents)}")
Number of split documents: 528
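The `chunk_size` and `chunk_overlap` settings above control how long each chunk may be and how much text neighbouring chunks share. The sketch below shows the idea with a deliberately simplified fixed-window splitter. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs and sentences, and the function name `fixed_window_chunks` is made up for this example.

```python
def fixed_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


print(fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, so a sentence cut at a boundary still appears with some of its context in the next chunk.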
API Reference: AzureBlobStorageContainerLoader
# Initialize the vector store and insert the documents with their embeddings into Azure SQL DB
vector_store = SQLServer_VectorStore(
    connection_string=_CONNECTION_STRING,
    distance_strategy=DistanceStrategy.COSINE,
    embedding_function=embeddings,
    embedding_length=1536,
    table_name="harrypotter",
)  # Replace with your actual vector store initialization

# Add the split documents to the vector store one at a time
for i, doc in enumerate(split_documents):
    vector_store.add_documents(documents=[doc], ids=[f"doc_{i}"])

print("Documents added to the vector store successfully!")
Documents added to the vector store successfully!
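The loop above sends one document per call. Since `add_documents` takes a list of documents together with a list of ids, the same work could probably be done in larger groups. The sketch below shows only the grouping logic, keeping the `doc_<index>` id scheme from the loop above. The helper name `batched_with_ids` and the batch size are assumptions for illustration, and whether batching helps in practice depends on the embedding service and the database.

```python
def batched_with_ids(items, batch_size, id_prefix="doc_"):
    """Yield (batch, ids) pairs, numbering items by their position in items."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        ids = [f"{id_prefix}{index}" for index in range(start, start + len(batch))]
        yield batch, ids


# Plain strings stand in for Document objects here
for batch, ids in batched_with_ids(["a", "b", "c", "d", "e"], batch_size=2):
    print(ids, batch)

# With the notebook's objects, the loop body might become:
# for batch, ids in batched_with_ids(split_documents, batch_size=50):
#     vector_store.add_documents(documents=batch, ids=ids)
```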

Query directly

from typing import List, Tuple

# Perform a similarity search
query = "Why did the Dursleys not want Harry in their house?"
docs_with_score: List[Tuple[Document, float]] = (
    vector_store.similarity_search_with_score(query)
)

for doc, score in docs_with_score:
    print("-" * 60)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 60)
------------------------------------------------------------
Score:  0.3626232679001803
The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn’t think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley’s sister, but they hadn’t met for several years; in fact, Mrs. Dursley pretended she didn’t have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn’t want Dudley mixing with a child like that.
------------------------------------------------------------
------------------------------------------------------------
Score:  0.44752797298657554
The Dursleys’ house had four bedrooms: one for Uncle Vernon and Aunt Petunia, one for visitors (usually Uncle Vernon’s sister, Marge), one where Dudley slept, and one where Dudley kept all the toys and things that wouldn’t fit into his first bedroom. It only took Harry one trip upstairs to move everything he owned from the cupboard to this room. He sat down on the bed and stared around him. Nearly everything in here was broken. The month-old video camera was lying on top of a small, working tank Dudley had once driven over the next door neighbor’s dog; in the corner was Dudley’s first-ever television set, which he’d put his foot through when his favorite program had been canceled; there was a large birdcage, which had once held a parrot that Dudley had swapped at school for a real air rifle, which was up on a shelf with the end all bent because Dudley had sat on it. Other shelves were full of books. They were the only things in the room that looked as though they’d never been touched.
------------------------------------------------------------
------------------------------------------------------------
Score:  0.4652486419877385
M r. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.

Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere.
------------------------------------------------------------
------------------------------------------------------------
Score:  0.4739086301927252
Hagrid was watching him sadly.

“Took yeh from the ruined house myself, on Dumbledore’s orders. Brought yeh ter this lot….”

“Load of old tosh,” said Uncle Vernon. Harry jumped; he had almost forgotten that the Dursleys were there. Uncle Vernon certainly seemed to have got back his courage. He was glaring at Hagrid and his fists were clenched.

“Now, you listen here, boy,” he snarled, “I accept there’s something strange about you, probably nothing a good beating wouldn’t have cured — and as for all this about your parents, well, they were weirdoes, no denying it, and the world’s better off without them in my opinion — asked for all they got, getting mixed up with these wizarding types — just what I expected, always knew they’d come to a sticky end —”

But at that moment, Hagrid leapt from the sofa and drew a battered pink umbrella from inside his coat. Pointing this at Uncle Vernon like a sword, he said, “I’m warning you, Dursley — I’m warning you — one more word….”
------------------------------------------------------------

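Usage for retrieval-augmented generation

Use case 1: Q&A system based on the story book

The Q&A function allows users to ask specific questions about the story, characters, and events, and get concise, context-rich answers. This not only enhances their understanding of the books but also makes them feel like part of the magical universe.

Query by turning into retriever

LangChain vector stores simplify building sophisticated Q&A systems by enabling efficient similarity searches that find the top 10 relevant documents for a user's query. The retriever is created from the vector_store, and the question-answer chain is built with the create_stuff_documents_chain function. A prompt template is crafted with the ChatPromptTemplate class to ensure structured, context-rich responses. In Q&A applications it is important to show users the sources that were used to generate the answer. LangChain's built-in create_retrieval_chain propagates the retrieved source documents to the output under the "context" key. Read more about LangChain RAG tutorials and terminology.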
from typing import List, Tuple

import pandas as pd
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate


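# Define a function that runs the RAG chain
def get_answer_and_sources(user_query: str):
    # Perform a similarity search with scores
    docs_with_score: List[Tuple[Document, float]] = (
        vector_store.similarity_search_with_score(
            user_query,
            k=10,
        )
    )

    # Extract the context from the top results
    context = "\n".join([doc.page_content for doc, score in docs_with_score])

    # Define the system prompt
    system_prompt = (
        "You are an assistant for question-answering tasks based on the story in the book. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, say that you don't know, but also suggest that the user can use the fan fiction function to generate fun stories. "
        "Use 5 sentences maximum and keep the answer concise by also providing some background context of 1-2 sentences."
        "\n\n"
        "{context}"
    )

    # Create the prompt template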
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{input}"),
        ]
    )

    # Create the retriever and the chain
    retriever = vector_store.as_retriever()
    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    rag_chain = create_retrieval_chain(retriever, question_answer_chain)

    # Define the input
    input_data = {"input": user_query}

    # Invoke the RAG chain
    response = rag_chain.invoke(input_data)

    # Print the answer
    print("Answer:", response["answer"])

    # Prepare the data for the table
    data = {
        "Doc ID": [
            doc.metadata.get("source", "N/A").split("/")[-1]
            for doc in response["context"]
        ],
        "Content": [
            doc.page_content[:50] + "..."
            if len(doc.page_content) > 100
            else doc.page_content
            for doc in response["context"]
        ],
    }

    # Create a DataFrame
    df = pd.DataFrame(data)

    # Print the table
    print("\nSources:")
    print(df.to_markdown(index=False))
# Define the user query
user_query = "How did Harry feel when he first learnt that he was a Wizard?"

# Call the function to get the answer and sources
get_answer_and_sources(user_query)
Answer: When Harry first learned that he was a wizard, he felt quite sure there had been a horrible mistake. He struggled to believe it because he had spent his life being bullied and mistreated by the Dursleys. If he was really a wizard, he wondered why he hadn't been able to use magic to defend himself. This disbelief and surprise were evident when he gasped, “I’m a what?”

Sources:
| Doc ID                                      | Content                                               |
|:--------------------------------------------|:------------------------------------------------------|
| 01 Harry Potter and the Sorcerers Stone.txt | Harry was wondering what a wizard did once he’d fi... |
| 01 Harry Potter and the Sorcerers Stone.txt | Harry realized his mouth was open and closed it qu... |
| 01 Harry Potter and the Sorcerers Stone.txt | “Most of us reckon he’s still out there somewhere ... |
| 01 Harry Potter and the Sorcerers Stone.txt | “Ah, go boil yer heads, both of yeh,” said Hagrid.... |
# Define the user query
user_query = "Did Harry have a pet? What was it"

# Call the function to get the answer and sources
get_answer_and_sources(user_query)
Answer: Yes, Harry had a pet owl named Hedwig. He decided to call her Hedwig after finding the name in a book titled *A History of Magic*.

Sources:
| Doc ID                                      | Content                                               |
|:--------------------------------------------|:------------------------------------------------------|
| 01 Harry Potter and the Sorcerers Stone.txt | Harry sank down next to the bowl of peas. “What di... |
| 01 Harry Potter and the Sorcerers Stone.txt | Harry kept to his room, with his new owl for compa... |
| 01 Harry Potter and the Sorcerers Stone.txt | As the snake slid swiftly past him, Harry could ha... |
| 01 Harry Potter and the Sorcerers Stone.txt | Ron reached inside his jacket and pulled out a fat... |
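Both Sources tables above list four rows, although the function asks for `k=10` in its own `similarity_search_with_score` call. That would be consistent with the retriever using its default number of results, because the chain only sees what the retriever returns. If ten documents are wanted in the chain, one option is to pass the count through `search_kwargs` when creating the retriever, as in this sketch. The helper name `make_retriever` is hypothetical.

```python
def make_retriever(vector_store, k=10):
    """Create a retriever that returns k documents per query."""
    return vector_store.as_retriever(search_kwargs={"k": k})
```

Inside `get_answer_and_sources`, the line `retriever = vector_store.as_retriever()` could then become `retriever = make_retriever(vector_store, k=10)`. More context gives the model more material but also lengthens the prompt.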

API reference

For detailed documentation of SQLServer Vectorstore features and configurations, head to the API reference: https://python.langchain.com/api_reference/sqlserver/index.html
