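Kinetica natural-language-to-SQL integration

Kinetica is a database with built-in support for text-to-SQL generation. This notebook demonstrates how to use Kinetica to convert natural language into SQL and simplify data retrieval. The demo is intended to show the mechanics of the generation workflow rather than the capabilities of the LLM.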

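Overview

In the Kinetica LLM workflow, you create an LLM context in the database that holds the information needed for inference, including tables, annotations, rules, and samples. Calling ChatKinetica.load_messages_from_context() retrieves the context information from the database so that it can be used to create a chat prompt. The chat prompt consists of a SystemMessage followed by HumanMessage/AIMessage pairs that contain the samples, which are question/SQL pairs. You can append more sample pairs to this list, but it is not intended to support a typical natural-language conversation. When you create a chain from the chat prompt and execute it, the Kinetica LLM generates SQL from the input. Optionally, you can use KineticaSqlOutputParser to execute the SQL and return the result as a dataframe. Two LLMs are currently supported for SQL generation:

Kinetica SQL-GPT: This LLM is based on the OpenAI ChatGPT API.
Kinetica SqlAssist: This LLM is purpose-built to integrate with the Kinetica database and can run on secure customer premises.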

This demo uses SqlAssist. See the Kinetica documentation site for more information.
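The prompt layout described in the overview can be sketched in plain Python. This only illustrates the shape of the message list (one system message, then question/SQL pairs, then the input placeholder); it is not how ChatKinetica.load_messages_from_context() is implemented. The helper name build_messages, the placeholder system text, and the sample pair are assumptions made for this example.

```python
def build_messages(system_text, samples):
    """Return (role, content) tuples: a system message, then question/SQL pairs."""
    messages = [("system", system_text)]
    for question, sql in samples:
        messages.append(("human", question))  # the sample question
        messages.append(("ai", sql))          # the SQL the model should answer with
    return messages


# Hypothetical sample pair, used only to show the resulting order of roles.
samples = [("How many users are there?", "select count(1) from demo.user_profiles;")]

messages = build_messages("-- table definitions and rules go here", samples)

# The final human message carries the placeholder for the user's question.
messages.append(("human", "{input}"))

for role, content in messages:
    print(f"{role}: {content}")
```

Each additional sample adds one more human/ai pair ahead of the final placeholder, which is why the list works as few-shot guidance rather than as a conversation history.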

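Prerequisites

Before you begin, you need a Kinetica DB instance. If you do not have one, you can obtain a free development instance. You need to install the following packages: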

pip install -qU langchain-kinetica faker

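Database connection

You must set the database connection in the following environment variables. If you are using a virtual environment, you can set them in the project's .env file:

KINETICA_URL: Database connection URL (for example, http://localhost:9191)
KINETICA_USER: Database user name
KINETICA_PASSWD: Secure password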
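Before creating the LLM object, it can help to confirm that the three variables are actually visible to the Python process. The sketch below uses only the standard library; the helper name missing_kinetica_vars is hypothetical and not part of langchain-kinetica.

```python
import os

# The variable names listed above.
REQUIRED_VARS = ("KINETICA_URL", "KINETICA_USER", "KINETICA_PASSWD")


def missing_kinetica_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_kinetica_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All Kinetica connection variables are set.")
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise without changing the real environment.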

If you can create a ChatKinetica instance, the connection succeeded.

from langchain_kinetica import ChatKinetica

kinetica_llm = ChatKinetica()

# Test table we will create
table_name = "demo.user_profiles"

# LLM Context we will create
kinetica_ctx = "demo.test_llm_ctx"

2026-02-02 19:39:09.975 INFO     [GPUdb] Connected to Kinetica! (host=http://localhost:19191 api=7.2.3.3 server=7.2.3.5)

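Create test data

Before we can generate SQL, we need to create a Kinetica table and an LLM context that can run inference against that table.

Create some fake user profiles

We will use the faker package to create a dataframe with 100 fake profiles.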

from collections.abc import Generator

import pandas as pd
from faker import Faker

Faker.seed(5467)
faker = Faker(locale="en-US")


def profile_gen(count: int) -> Generator:
    for p_id in range(count):
        rec = dict(id=p_id, **faker.simple_profile())
        rec["birthdate"] = pd.Timestamp(rec["birthdate"])
        yield rec


load_df = pd.DataFrame.from_records(data=profile_gen(100), index="id")
print(load_df.head())

            username             name sex  \
id
     eduardo69       Haley Beck   F
      lbarrera  Joshua Stephens   M
       bburton     Paula Kaiser   F
     melissa49      Wendy Reese   F
 melissacarter      Manuel Rios   M

                                                address                    mail  \
id
 59836 Carla Causeway Suite 939\nPort Eugene, I...  meltondenise@yahoo.com
 3108 Christina Forges\nPort Timothychester, KY...     erica80@hotmail.com
                  Unit 7405 Box 3052\nDPO AE 09858  timothypotts@gmail.com
 6408 Christopher Hill Apt. 459\nNew Benjamin, ...        dadams@gmail.com
  2241 Bell Gardens Suite 723\nScottside, CA 38463  williamayala@gmail.com

    birthdate
id
1999-08-22
1926-04-17
1935-08-19
1990-07-10
1932-11-30

Create a Kinetica table from the dataframe

from gpudb import GPUdbTable

gpudb_table = GPUdbTable.from_df(
    load_df,
    db=kinetica_llm.kdbc,
    table_name=table_name,
    clear_table=True,
    load_data=True,
)

# See the Kinetica column types
print(gpudb_table.type_as_df())

        name    type   properties
 username  string     [char32]
     name  string     [char32]
      sex  string      [char2]
  address  string     [char64]
     mail  string     [char32]
birthdate    long  [timestamp]
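The type listing shows birthdate stored as a long with the timestamp property. Assuming the usual Kinetica convention that such values are milliseconds since the Unix epoch (worth confirming against the Kinetica type documentation), the mapping between dates and stored values can be sketched as follows. The helper names to_epoch_ms and from_epoch_ms are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_epoch_ms(dt):
    """Convert a naive datetime to whole milliseconds since 1970-01-01."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """Convert milliseconds since 1970-01-01 back to a naive datetime."""
    return EPOCH + timedelta(milliseconds=ms)


print(to_epoch_ms(datetime(1970, 1, 2)))  # 86400000

# Dates before 1970 map to negative values and still round-trip.
print(from_epoch_ms(to_epoch_ms(datetime(1926, 4, 17))))
```

Several of the generated birthdates fall before 1970, so under this assumption their stored values are negative.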

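Create the LLM context

You can create an LLM context with the Kinetica Workbench UI, or create one manually with the CREATE OR REPLACE CONTEXT syntax. Here we create the context using SQL syntax that references the table created above.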

from gpudb import GPUdbSamplesClause, GPUdbSqlContext, GPUdbTableClause

table_ctx = GPUdbTableClause(table=table_name, comment="Contains user profiles.")

samples_ctx = GPUdbSamplesClause(
    samples=[
        (
            "How many users born after 1970 are there?",
            f"""
            select count(1) as num_users
                from {table_name}
                where birthdate > '1970-01-01';
            """,
        )
    ]
)

context_sql = GPUdbSqlContext(
    name=kinetica_ctx, tables=[table_ctx], samples=samples_ctx
).build_sql()

print(context_sql)
count_affected = kinetica_llm.kdbc.execute(context_sql)
count_affected

CREATE OR REPLACE CONTEXT "demo"."test_llm_ctx" (
    TABLE = "demo"."user_profiles",
    COMMENT = 'Contains user profiles.'
),
(
    SAMPLES = (
        'How many users born after 1970 are there?' = 'select count(1) as num_users
    from demo.user_profiles
    where birthdate > ''1970-01-01'';' )
)

1
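In the generated statement, the single quotes inside the sample SQL appear doubled (''1970-01-01''), which is the standard SQL way to embed a quote inside a string literal. build_sql() takes care of this; the hypothetical helper below only illustrates the rule and is not part of the gpudb package.

```python
def sql_string_literal(text):
    """Wrap text in single quotes, doubling any single quotes it contains."""
    return "'" + text.replace("'", "''") + "'"


sample_sql = "where birthdate > '1970-01-01';"
print(sql_string_literal(sample_sql))
```

This is why the sample question and its SQL can both be passed to GPUdbSamplesClause as ordinary Python strings without manual escaping.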

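Use LangChain for inference

In the example below, we create a chain from the table and LLM context created earlier. The chain generates SQL and returns the resulting data as a dataframe.

Load the chat prompt from the Kinetica DB

The load_messages_from_context() function retrieves a context from the database and converts it into a list of chat messages that are used to create a ChatPromptTemplate.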

from langchain_core.prompts import ChatPromptTemplate

# load the context from the database
ctx_messages = kinetica_llm.load_messages_from_context(kinetica_ctx)

# Add the input prompt. This is where input question will be substituted.
ctx_messages.append(("human", "{input}"))

# Create the prompt template.
prompt_template = ChatPromptTemplate.from_messages(ctx_messages)
print(prompt_template.pretty_repr())

================================ System Message ================================

CREATE TABLE demo.user_profiles AS
(
    username VARCHAR (32) NOT NULL,
    name VARCHAR (32) NOT NULL,
    sex VARCHAR (2) NOT NULL,
    address VARCHAR (64) NOT NULL,
    mail VARCHAR (32) NOT NULL,
    birthdate TIMESTAMP NOT NULL
);
COMMENT ON TABLE demo.user_profiles IS 'Contains user profiles.';

================================ Human Message =================================

How many users born after 1970 are there?

================================== Ai Message ==================================

select count(1) as num_users
    from demo.user_profiles
    where birthdate > '1970-01-01';

================================ Human Message =================================

{input}

Create the chain

The last element of the chain is KineticaSqlOutputParser, which executes the SQL and returns a dataframe. It is optional; if you omit it, only the SQL is returned.

from langchain_kinetica import (
    KineticaSqlOutputParser,
    KineticaSqlResponse,
)

chain = prompt_template | kinetica_llm | KineticaSqlOutputParser(kdbc=kinetica_llm.kdbc)

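Generate the SQL

The chain we created takes a question as input and returns a KineticaSqlResponse containing the generated SQL and data. The question must be relevant to the LLM context used to create the prompt.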

# Here you must ask a question relevant to the LLM context provided in the
# prompt template.
response: KineticaSqlResponse = chain.invoke(
    {"input": "What users were born after 1990?"}
)

print(f"SQL: {response.sql}")
print(response.dataframe.head())

SQL: SELECT *
FROM demo.user_profiles
WHERE birthdate > '1990-01-01';
        username             name sex  \
    eduardo69       Haley Beck   F
    melissa49      Wendy Reese   F
      james26  Patricia Potter   F
  mooreandrew    Wendy Ramirez   F
melissabutler      Alexa Kelly   F

                                                address                    mail  \
59836 Carla Causeway Suite 939\nPort Eugene, I...  meltondenise@yahoo.com
6408 Christopher Hill Apt. 459\nNew Benjamin, ...        dadams@gmail.com
        7977 Jonathan Meadow\nJerryside, OH 55205      jpatrick@gmail.com
      8089 Gonzalez Fields\nJordanville, KS 22824    mathew05@hotmail.com
            1904 Burke Roads\nPort Anne, DE 81252     douglas38@yahoo.com

    birthdate
1999-08-25
1990-07-13
2010-03-21
2000-03-25
2023-02-01
