
Setup and installation

To use this feature, install the langchain-hana package:
pip install langchain_hana
Then create a connection to your SAP HANA Cloud instance.
import os

from dotenv import load_dotenv
from hdbcli import dbapi

# Load environment variables if needed
load_dotenv()

# Establish connection to SAP HANA Cloud
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD")
)
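Note that os.environ.get returns None for an unset variable and a string for the port. A minimal sketch of a helper that fails early on missing settings and converts the port to an integer is shown here. The name load_hana_settings is hypothetical (it is not part of langchain-hana or hdbcli); the environment variable names are the ones used above.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = (
    "HANA_DB_ADDRESS",
    "HANA_DB_PORT",
    "HANA_DB_USER",
    "HANA_DB_PASSWORD",
)


def load_hana_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: build connection keyword arguments from the environment."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way: both are unusable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env["HANA_DB_PORT"]),  # environment values are always strings
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
    }
```

With this in place, the connection could be opened as dbapi.connect(**load_hana_settings()), so a missing variable surfaces as a clear error before the driver is called.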
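HanaSparqlQAChain ties together:
  1. Schema-aware SPARQL generation
  2. Query execution (against SAP HANA)
  3. Natural-language answer formatting

Initialization

You need:
  • An LLM to generate and interpret queries
  • A HanaRdfGraph (with a connection, graph_uri, and ontology)
Follow the steps in HanaRdfGraph to learn more about creating a HanaRdfGraph instance.
Import HanaSparqlQAChain: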
from langchain_hana import HanaSparqlQAChain
qa_chain = HanaSparqlQAChain.from_llm(
    llm=llm, graph=graph, allow_dangerous_requests=True, verbose=True
)

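Pipeline overview

  1. SPARQL generation
    • Uses SPARQL_GENERATION_SELECT_PROMPT
    • Inputs:
      • schema (Turtle format, from graph.get_schema)
      • prompt (the user's question)
  2. Query post-processing
    • Extracts the SPARQL code from the LLM output.
    • Injects FROM <graph_uri> if it is missing.
    • Ensures the common prefixes it needs (rdf:, rdfs:, owl:, xsd:) are declared.
  3. Execution
    • Calls graph.query(generated_sparql)
  4. Answer generation
    • Uses SPARQL_QA_PROMPT
    • Inputs:
      • context (the raw query results)
      • prompt (the original question)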
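The query post-processing step can be illustrated with a short sketch. This is not the library's implementation: the function names are hypothetical, the regular expressions are deliberately naive (they do not handle keywords inside string literals or queries that omit WHERE), and adding only prefixes that are used but undeclared is an assumption.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

COMMON_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def extract_sparql(llm_output: str) -> str:
    """Return the text of the first fenced block, or the whole output if unfenced."""
    pattern = re.escape(FENCE) + r"(?:sparql)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, llm_output, flags=re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else llm_output).strip()


def inject_from(query: str, graph_uri: str) -> str:
    """Insert FROM <graph_uri> before the first WHERE unless a FROM clause exists."""
    if re.search(r"\bFROM\s*<", query, flags=re.IGNORECASE):
        return query
    return re.sub(
        r"\bWHERE\b",
        lambda m: f"FROM <{graph_uri}>\n{m.group(0)}",
        query,
        count=1,
        flags=re.IGNORECASE,
    )


def ensure_prefixes(query: str) -> str:
    """Prepend declarations for common prefixes that are used but not declared."""
    missing = []
    for name, iri in COMMON_PREFIXES.items():
        declared = re.search(rf"\bPREFIX\s+{name}\s*:", query, flags=re.IGNORECASE)
        used = re.search(rf"(?<!\w){name}:", query)
        if used and not declared:
            missing.append(f"PREFIX {name}: <{iri}>")
    return "\n".join(missing + [query])


def postprocess(llm_output: str, graph_uri: str) -> str:
    """Apply the three post-processing steps in order."""
    return ensure_prefixes(inject_from(extract_sparql(llm_output), graph_uri))
```

Calling postprocess(llm_text, "kgdocu_movies") on a fenced LLM reply yields a query string that could then be handed to graph.query.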

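Prompt templates

SPARQL generation prompt

The sparql_generation_prompt guides the LLM to generate a SPARQL query from the user's question and the provided schema.

Answer generation prompt

The qa_prompt instructs the LLM to produce a natural-language answer based solely on the database results.
The default prompts can be found here: prompts.py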

Customizing prompts

You can override the default prompts at initialization:
qa_chain = HanaSparqlQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    verbose=True,
    sparql_generation_prompt=YOUR_SPARQL_PROMPT,
    qa_prompt=YOUR_QA_PROMPT
)
  • The input variables of sparql_generation_prompt must be: ["schema", "prompt"]
  • The input variables of qa_prompt must be: ["context", "prompt"]
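Because the input variables are fixed, it can help to check a custom template before passing it to the chain. The sketch below uses only the standard library and assumes format-style templates with {placeholder} fields. The template texts and the helper names template_variables and check_template are illustrative, not part of langchain-hana.

```python
from string import Formatter


def template_variables(template: str) -> set:
    """Return the named {placeholders} found in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_template(template: str, required: set) -> None:
    """Raise ValueError unless the template uses exactly the required variables."""
    found = template_variables(template)
    if found != required:
        raise ValueError(f"expected variables {sorted(required)}, found {sorted(found)}")


# Illustrative template texts; the default prompts shipped with the package differ.
SPARQL_TEMPLATE = (
    "Write a SPARQL SELECT query that answers the question.\n"
    "Schema (Turtle):\n{schema}\n"
    "Question: {prompt}\n"
)
QA_TEMPLATE = (
    "Answer the question using only the query results.\n"
    "Results:\n{context}\n"
    "Question: {prompt}\n"
)

check_template(SPARQL_TEMPLATE, {"schema", "prompt"})
check_template(QA_TEMPLATE, {"context", "prompt"})
```

Literal braces in a template, such as those in an example SPARQL pattern, must be doubled ({{ and }}) so they are not read as placeholders. A checked string can then be wrapped with PromptTemplate.from_template from langchain_core.prompts, which infers the input variables from the placeholders.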

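Example: Question answering over a "Movies" knowledge graph

Prerequisite: You must have an SAP HANA Cloud instance with the triple store feature enabled. For detailed instructions, see: Enable Triple Store
Load the kgdocu_movies example data. See Knowledge Graph Example.
Below we'll:
  1. Instantiate the HanaRdfGraph pointing at the "movies" data graph
  2. Wrap it in an LLM-powered HanaSparqlQAChain
  3. Ask natural-language questions and print out the chain's responses
This demonstrates how the LLM generates SPARQL behind the scenes, executes it against SAP HANA, and returns a human-readable answer.
First, create a connection to your SAP HANA Cloud instance.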
import os

from dotenv import load_dotenv
from hdbcli import dbapi

# Load environment variables if needed
load_dotenv()

# Establish connection to SAP HANA Cloud
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD")
)
Then, set up the knowledge graph instance.
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from langchain_hana import HanaRdfGraph, HanaSparqlQAChain

# from langchain_openai import ChatOpenAI  # or your chosen LLM
# Set up the Knowledge Graph
graph_uri = "kgdocu_movies"

graph = HanaRdfGraph(
    connection=connection,
    graph_uri=graph_uri,
    auto_extract_ontology=True
)
# A basic graph schema is extracted from the data graph. It guides the LLM toward a valid SPARQL query.
schema_graph = graph.get_schema
print(schema_graph.serialize(format="turtle"))
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://kg.demo.sap.com/acted_in> a owl:ObjectProperty ;
    rdfs:label "acted_in" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range <http://kg.demo.sap.com/Film> .

<http://kg.demo.sap.com/dateOfBirth> a owl:DatatypeProperty ;
    rdfs:label "dateOfBirth" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range xsd:dateTime .

<http://kg.demo.sap.com/directed> a owl:ObjectProperty ;
    rdfs:label "directed" ;
    rdfs:domain <http://kg.demo.sap.com/Director> ;
    rdfs:range <http://kg.demo.sap.com/Film> .

<http://kg.demo.sap.com/genre> a owl:ObjectProperty ;
    rdfs:label "genre" ;
    rdfs:domain <http://kg.demo.sap.com/Film> ;
    rdfs:range <http://kg.demo.sap.com/Genre> .

<http://kg.demo.sap.com/placeOfBirth> a owl:ObjectProperty ;
    rdfs:label "placeOfBirth" ;
    rdfs:domain <http://kg.demo.sap.com/Actor> ;
    rdfs:range <http://kg.demo.sap.com/Place> .

<http://kg.demo.sap.com/title> a owl:DatatypeProperty ;
    rdfs:label "title" ;
    rdfs:domain <http://kg.demo.sap.com/Film> ;
    rdfs:range xsd:string .

rdfs:label a owl:DatatypeProperty ;
    rdfs:label "label" ;
    rdfs:domain <http://kg.demo.sap.com/Actor>,
        <http://kg.demo.sap.com/Director>,
        <http://kg.demo.sap.com/Genre>,
        <http://kg.demo.sap.com/Place> ;
    rdfs:range xsd:string .

<http://kg.demo.sap.com/Director> a owl:Class ;
    rdfs:label "Director" .

<http://kg.demo.sap.com/Genre> a owl:Class ;
    rdfs:label "Genre" .

<http://kg.demo.sap.com/Place> a owl:Class ;
    rdfs:label "Place" .

<http://kg.demo.sap.com/Actor> a owl:Class ;
    rdfs:label "Actor" .

<http://kg.demo.sap.com/Film> a owl:Class ;
    rdfs:label "Film" .
Next, initialize the LLM.
# Initialize the LLM
llm = ChatOpenAI(proxy_model_name="gpt-4o", temperature=0)
Then, create a SPARQL QA chain.
# Create a SPARQL QA Chain
chain = HanaSparqlQAChain.from_llm(
    llm=llm,
    verbose=True,
    allow_dangerous_requests=True,
    graph=graph,
)
# output = chain.invoke("Which movies are in the data?")
# output = chain.invoke("In which movies did Keanu Reeves and Carrie-Anne Moss play in together")
# output = chain.invoke("which movie genres are in the data?")
# output = chain.invoke("which are the two most assigned movie genres?")
# output = chain.invoke('where were the actors of "Blade Runner" born?')
# output = chain.invoke("which actors acted together in a movie and were born in the same city?")
output = chain.invoke("which actors acted in Blade Runner?")

print(output["result"])


> Entering new HanaSparqlQAChain chain...
Generated SPARQL:
\`\`\`
PREFIX kg: <http://kg.demo.sap.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?actor ?actorLabel
WHERE {
    ?movie rdf:type kg:Film .
    ?movie kg:title ?movieTitle .
    ?actor kg:acted_in ?movie .
    ?actor rdfs:label ?actorLabel .
    FILTER(?movieTitle = "Blade Runner")
}
\`\`\`
Final SPARQL:

PREFIX kg: <http://kg.demo.sap.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?actor ?actorLabel

FROM <kgdocu_movies>
WHERE {
    ?movie rdf:type kg:Film .
    ?movie kg:title ?movieTitle .
    ?actor kg:acted_in ?movie .
    ?actor rdfs:label ?actorLabel .
    FILTER(?movieTitle = "Blade Runner")
}

Full Context:
actor,actorLabel
http://www.wikidata.org/entity/Q1353691,Morgan Paull
http://www.wikidata.org/entity/Q1372770,William Sanderson
http://www.wikidata.org/entity/Q358990,James Hong
http://www.wikidata.org/entity/Q498420,M. Emmet Walsh
http://www.wikidata.org/entity/Q81328,Q81328
http://www.wikidata.org/entity/Q723780,Brion James
http://www.wikidata.org/entity/Q207596,Daryl Hannah
http://www.wikidata.org/entity/Q1691628,Joe Turkel
http://www.wikidata.org/entity/Q236702,Joanna Cassidy
http://www.wikidata.org/entity/Q213574,Rutger Hauer
http://www.wikidata.org/entity/Q3143555,Hy Pyke
http://www.wikidata.org/entity/Q211415,Edward James Olmos
http://www.wikidata.org/entity/Q230736,Sean Young


> Finished chain.
The actors who acted in Blade Runner are Morgan Paull, William Sanderson, James Hong, M. Emmet Walsh, Brion James, Daryl Hannah, Joe Turkel, Joanna Cassidy, Rutger Hauer, Hy Pyke, Edward James Olmos, and Sean Young.
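In the run above, the Full Context holds 13 rows but the answer names 12 actors: the row whose label is the bare identifier Q81328 is left out. A minimal sketch for spotting such rows follows. It assumes the context is CSV with the actor and actorLabel columns printed above; the function name is hypothetical.

```python
import csv
import io


def rows_with_placeholder_labels(
    context_csv: str, iri_column: str = "actor", label_column: str = "actorLabel"
) -> list:
    """Return the IRIs whose label is just the last path segment of the IRI."""
    reader = csv.DictReader(io.StringIO(context_csv))
    flagged = []
    for row in reader:
        # "http://www.wikidata.org/entity/Q81328" -> "Q81328"
        entity_id = row[iri_column].rstrip("/").rsplit("/", 1)[-1]
        if row[label_column] == entity_id:
            flagged.append(row[iri_column])
    return flagged
```

Rows flagged this way point at entities in the data graph that lack a human-readable rdfs:label, which is worth knowing before trusting the completeness of a generated answer.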

What's happening under the hood?

  1. SPARQL generation: The chain invokes the LLM with your Turtle-formatted ontology (graph.get_schema) and the user's question, using SPARQL_GENERATION_SELECT_PROMPT. The LLM is expected to return a SELECT query tailored to your schema.
  2. Pre-processing & execution
    • Extract & clean: Pull the raw SPARQL text out of the LLM's response.
    • Inject graph context: Add FROM <graph_uri> if it's missing and ensure common prefixes (rdf:, rdfs:, owl:, xsd:) are declared.
    • Run on HANA: Execute the final query on the named graph via HanaRdfGraph.query().
  3. Answer generation: The returned CSV (or Turtle) results are passed back to the LLM, this time with SPARQL_QA_PROMPT. The prompt instructs the LLM to produce a concise, human-readable answer based only on the retrieved data.
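The data flow between these stages can be sketched with plain callables standing in for the LLM and the graph. This only illustrates how schema, prompt, and context are passed along; it is not the chain's actual code, and run_pipeline and the extra keys in its return value are hypothetical (the source only shows output["result"]).

```python
from typing import Callable


def run_pipeline(
    question: str,
    schema: str,
    generate: Callable[..., str],
    postprocess: Callable[[str], str],
    execute: Callable[[str], str],
    answer: Callable[..., str],
) -> dict:
    """Chain the stages: generate, post-process, execute, answer."""
    raw_query = generate(schema=schema, prompt=question)  # SPARQL generation
    final_query = postprocess(raw_query)  # extract, add FROM and prefixes
    context = execute(final_query)  # run against the graph
    result = answer(context=context, prompt=question)  # answer generation
    return {"query": final_query, "context": context, "result": result}
```

In a test, each callable can be a small lambda that returns canned text, which makes it easy to see what every stage receives without calling an LLM or a database.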