Soniox

Get started with the Soniox audio transcription loader in LangChain.

Installation

Install the package:

pip install langchain-soniox

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:

export SONIOX_API_KEY=your_api_key
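A missing key is easier to diagnose before any audio is uploaded. The sketch below fails fast when the variable is unset. `require_api_key` is a hypothetical helper written for this page, not part of langchain-soniox.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Soniox API key from the environment or raise a clear error.

    Hypothetical helper for illustration only; not part of langchain-soniox.
    """
    env = os.environ if env is None else env
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set; export it before creating the loader."
        )
    return key
```

The returned value can be passed explicitly as `api_key=` to `SonioxDocumentLoader`, or the call can serve purely as a startup check.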

Usage

Basic transcription

The following example shows how to transcribe an audio file with SonioxDocumentLoader and summarize the transcript with an LLM.

from langchain_soniox import SonioxDocumentLoader
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

audio_file_url = "https://soniox.com/media/examples/coffee_shop.mp3"
loader = SonioxDocumentLoader(file_url=audio_file_url)

print(f"Transcribing {audio_file_url}...")
docs = loader.load()

transcript_text = docs[0].page_content
print(f"Transcript: {transcript_text}")

# Build a chain that summarizes the transcript
prompt = ChatPromptTemplate.from_template(
    "Write a concise summary of the following speech:\n\n{transcript}"
)

chain = prompt | ChatOpenAI(model="gpt-5-mini") | StrOutputParser()
summary = chain.invoke({"transcript": transcript_text})
print(summary)

You can also load audio from a local file or from raw bytes:

# Use a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")

# Use binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)
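The loader expects exactly one of file_path, file_data, or file_url (see the constructor parameters). When the source comes from user input, a small guard can make that rule explicit before the loader is built. `source_kwargs` is a hypothetical helper; it mirrors the documented rule but is not the library's own validation.

```python
from typing import Any, Dict, Optional


def source_kwargs(
    file_path: Optional[str] = None,
    file_data: Optional[bytes] = None,
    file_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the single source keyword argument to pass to the loader.

    Hypothetical helper: enforces the "exactly one source" rule up front.
    """
    candidates = {
        "file_path": file_path,
        "file_data": file_data,
        "file_url": file_url,
    }
    # Keep only the sources the caller actually supplied.
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of file_path, file_data, or file_url; "
            f"got {sorted(provided) or 'none'}."
        )
    return provided
```

A call such as `SonioxDocumentLoader(**source_kwargs(file_path="/path/to/audio.mp3"))` would then forward the validated argument.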

Asynchronous transcription

For asynchronous operation, use aload() or alazy_load():

import asyncio
from langchain_soniox import SonioxDocumentLoader

async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )

    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)

asyncio.run(transcribe_async())
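The async methods pay off mainly when several files are transcribed at once. The sketch below runs aload() on a batch of loaders with bounded concurrency. `load_all` and `max_concurrency` are hypothetical names, and the default of 4 is an arbitrary illustration, not a documented Soniox limit. It only assumes that each loader exposes an awaitable aload().

```python
import asyncio
from typing import Any, List, Sequence


async def load_all(loaders: Sequence[Any], max_concurrency: int = 4) -> List[list]:
    """Run aload() on every loader, at most max_concurrency at a time.

    Results are returned in the same order as `loaders`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def load_one(loader: Any) -> list:
        # The semaphore caps how many transcriptions are in flight at once.
        async with semaphore:
            return await loader.aload()

    return await asyncio.gather(*(load_one(loader) for loader in loaders))
```

With real loaders this would be called as `asyncio.run(load_all([loader_a, loader_b]))`, where each element of the result is the list of documents for the corresponding loader.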

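Advanced usage

Language hints

Soniox automatically detects and transcribes speech in more than 60 languages. When you know which languages are likely to appear in the audio, provide language_hints to improve accuracy by biasing recognition toward those languages. Language hints do not restrict recognition: they only bias the model toward the specified languages, and other languages can still be detected if present.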

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)

docs = loader.load()

For more details, see the Soniox language hints documentation.

Speaker diarization

Enable speaker diarization to distinguish between speakers:

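from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)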

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)

docs = loader.load()

# Read speaker information from the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)

# Analyze the conversation
prompt = ChatPromptTemplate.from_template(
    """
    Analyze the following conversation between speakers.
    Identify the intent of each speaker.

    Conversation:
    {conversation}
    """
)

chain = prompt | ChatOpenAI(model="gpt-5-mini") | StrOutputParser()
analysis = chain.invoke({"conversation": output})
print(analysis)
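When the per-speaker text is needed as data rather than as one formatted string, the same loop can collect structured turns with timestamps. This is a minimal sketch that assumes each token carries the text, start_ms, end_ms, and speaker fields described under the return value. `speaker_turns` is a hypothetical helper, and the sample tokens are made up for illustration.

```python
from typing import Any, Dict, List


def speaker_turns(tokens: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive tokens from the same speaker into one turn."""
    turns: List[Dict[str, Any]] = []
    for token in tokens:
        speaker = token.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += token["text"]
            turns[-1]["end_ms"] = token["end_ms"]
        else:
            # Speaker changed: start a new turn.
            turns.append(
                {
                    "speaker": speaker,
                    "text": token["text"].lstrip(),
                    "start_ms": token["start_ms"],
                    "end_ms": token["end_ms"],
                }
            )
    return turns


# Made-up sample tokens, shaped like docs[0].metadata["tokens"].
sample_tokens = [
    {"text": "Hi", "start_ms": 0, "end_ms": 200, "speaker": "1"},
    {"text": " there", "start_ms": 200, "end_ms": 500, "speaker": "1"},
    {"text": " Hello", "start_ms": 600, "end_ms": 900, "speaker": "2"},
]
for turn in speaker_turns(sample_tokens):
    print(f"[{turn['start_ms']}-{turn['end_ms']} ms] Speaker {turn['speaker']}: {turn['text']}")
```

Keeping turns as dictionaries makes it straightforward to filter by speaker or to pass only part of a conversation to the LLM.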

Language identification

Enable automatic language detection and identification:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)

docs = loader.load()

# Read language information from the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
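The per-token language tags can also be aggregated, for example to estimate how much of a recording is in each language. The sketch below sums token durations per language. `language_durations_ms` is a hypothetical helper, and it assumes every token carries start_ms and end_ms alongside language.

```python
from collections import Counter
from typing import Any, Dict, List


def language_durations_ms(tokens: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum token durations (end_ms - start_ms) per detected language."""
    totals: Counter = Counter()
    for token in tokens:
        # Tokens without a language tag are grouped under "unknown".
        language = token.get("language") or "unknown"
        totals[language] += token["end_ms"] - token["start_ms"]
    return dict(totals)
```

With real output this would be called as `language_durations_ms(docs[0].metadata["tokens"])`.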

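Context for improved accuracy

Provide domain-specific context to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary. The context object supports the following four optional sections: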

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)

docs = loader.load()

For more details, see the Soniox context documentation.

Translation

Translate any detected language into a target language:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)

docs = list(loader.lazy_load())

translated_text = ""
original_text = ""

for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        original_text += token["text"]

print(original_text)
print(translated_text)

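You can also use the two_way translation type to transcribe two languages and translate between them. For more information, see the Soniox translation documentation.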

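API reference

Constructor parameters

Parameter	Type	Required	Default	Description
`file_path`	`str`	No*	`None`	Path to a local audio file to transcribe
`file_data`	`bytes`	No*	`None`	Binary data of the audio file to transcribe
`file_url`	`str`	No*	`None`	URL of the audio file to transcribe
`api_key`	`str`	No	`SONIOX_API_KEY` environment variable	Soniox API key
`base_url`	`str`	No	`https://api.soniox.com/v1`	API base URL (see regional endpoints)
`options`	`SonioxTranscriptionOptions`	No	`SonioxTranscriptionOptions()`	Transcription options
`polling_interval_seconds`	`float`	No	`1.0`	Interval between status polls, in seconds
`timeout_seconds`	`float`	No	`300.0` (5 minutes)	Maximum time to wait for the transcription, in seconds
`http_request_timeout_seconds`	`float`	No	`60.0`	Timeout for a single HTTP request, in seconds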

* You must specify exactly one of file_path, file_data, or file_url.
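To make the relationship between polling_interval_seconds and timeout_seconds concrete, here is a generic polling loop. It is an illustration of the pattern only and makes no claim about how langchain-soniox implements it internally. `poll_until_ready`, `check`, `sleep`, and `clock` are hypothetical names; the defaults simply reuse the values from the table above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(
    check: Callable[[], Optional[T]],
    polling_interval_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() until it returns a non-None result or the deadline passes.

    Generic illustration only; not the langchain-soniox implementation.
    """
    deadline = clock() + timeout_seconds
    while True:
        result = check()
        if result is not None:
            return result
        # Give up if the next poll would start after the overall deadline.
        if clock() + polling_interval_seconds > deadline:
            raise TimeoutError(f"No result within {timeout_seconds} seconds.")
        sleep(polling_interval_seconds)
```

In a loop like this, a shorter interval notices completion sooner at the cost of more status requests, while the timeout bounds the total wait regardless of the interval.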

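Transcription options

The SonioxTranscriptionOptions class supports the following parameters:

Parameter	Type	Description
`model`	`str`	Async model to use (see available models)
`language_hints`	`list[str]`	Language hints for transcription (ISO language codes)
`language_hints_strict`	`bool`	Strictly enforce the language hints
`enable_speaker_diarization`	`bool`	Enable speaker diarization
`enable_language_identification`	`bool`	Enable language identification
`translation`	`TranslationConfig`	Translation configuration
`context`	`StructuredContext`	Context for improving accuracy
`client_reference_id`	`str`	Custom reference ID for your own records
`webhook_url`	`str`	Webhook URL for completion notifications
`webhook_auth_header_name`	`str`	Name of the custom authentication header for the webhook
`webhook_auth_header_value`	`str`	Value of the custom authentication header for the webhook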

For the full list of options, see the API documentation.

Return value

The lazy_load() and alazy_load() methods yield a single Document object:

Document(
    page_content=str,  # Transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for the transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)

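The tokens array in the metadata contains detailed information for each transcribed token:

text: The transcribed text
start_ms: Start time in milliseconds
end_ms: End time in milliseconds
speaker: Speaker ID (if diarization is enabled), for example "1" or "2"
language: Detected language (if language identification is enabled), for example "en" or "fr"
translation_status: Translation status ("original", "translation", or "none")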
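The token fields listed above can be written down as a typed structure, which makes timestamp-based lookups easier to get right. This is a sketch under assumptions: `Token` and `text_between` are hypothetical names, and marking every field as optional is a guess about which keys may be absent, not something the library guarantees.

```python
from typing import List, Optional, TypedDict


class Token(TypedDict, total=False):
    """Assumed shape of one entry in metadata["tokens"]."""

    text: str
    start_ms: int
    end_ms: int
    speaker: Optional[str]
    language: Optional[str]
    translation_status: Optional[str]


def text_between(tokens: List[Token], start_ms: int, end_ms: int) -> str:
    """Join the text of tokens that lie fully inside [start_ms, end_ms]."""
    return "".join(
        token["text"]
        for token in tokens
        # Skip tokens without timestamps instead of failing on them.
        if token.get("start_ms") is not None
        and token.get("end_ms") is not None
        and token["start_ms"] >= start_ms
        and token["end_ms"] <= end_ms
    ).strip()
```

For instance, `text_between(docs[0].metadata["tokens"], 0, 5000)` would return whatever was spoken in the first five seconds of the recording.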

For more information, see the Soniox API reference.
