Please read the LangSmith architectural overview and guide on connecting to external ClickHouse before proceeding with this guide.
Architecture overview
The architecture of using LangSmith-managed ClickHouse with your self-hosted LangSmith instance is similar to using a fully self-hosted ClickHouse instance, with a few key differences:
- You will need to set up a private network connection between your LangSmith instance and the LangSmith-managed ClickHouse instance. This is to ensure that your data is secure and that you can connect to the ClickHouse instance from your self-hosted LangSmith instance.
- With this option, sensitive information (inputs and outputs) of your traces will be stored in cloud object storage (S3 or GCS) within your cloud instead of ClickHouse to ensure that sensitive information doesn’t leave your VPC. For more details on where particular data fields are stored, refer to Data storage.
- The LangSmith team will monitor your ClickHouse instance and ensure that it is running smoothly. This allows us to track metrics like run-ingestion delay and query performance.

Requirements
- You must use a supported blob storage option. Read the blob storage guide for more information.
- To use private endpoints, ensure that your VPC is in a ClickHouse Cloud supported region. Otherwise, you will need to use a public endpoint that we will secure with firewall rules. In that case, your VPC will need a NAT gateway so that we can allowlist your traffic.
- You must have a VPC that can connect to the LangSmith-managed ClickHouse service. You will need to work with our team to set up the necessary networking.
- You must have a LangSmith self-hosted instance running. You can use our managed ClickHouse service with both Kubernetes and Docker installations.
Data storage
ClickHouse stores runs and feedback data, specifically:
- All feedback data fields.
- Some run data fields.

Sensitive application data is defined as the inputs, outputs, errors, manifests, extras, and events of a run, since these fields may contain LLM prompts and completions. With LangSmith-managed ClickHouse, these sensitive fields are stored in cloud object storage (S3 or GCS) within your cloud, while the rest of the run data is stored in ClickHouse, ensuring sensitive information never leaves your VPC.
Stored feedback data fields
Because all feedback data is stored in ClickHouse, do not send sensitive information in feedback (scores and annotations/comments) or in any run field that Stored run data fields lists as stored in ClickHouse.
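| Field Name | Type | Description |
|---|---|---|
| id | UUID | Unique identifier for the record itself |
| created_at | datetime | Timestamp when the record was created |
| modified_at | datetime | Timestamp when the record was last modified |
| session_id | UUID | Unique identifier for the experiment or tracing project the run belongs to |
| run_id | UUID | Unique identifier for a specific run within a session |
| key | string | A key describing the criteria of the feedback, e.g., 'correctness' |
| score | number | Numerical score associated with the feedback key |
| value | string | Reserved for storing a value associated with the score. Useful for categorical feedback. |
| comment | string | Any comment or annotation associated with the record. This can be a justification for the score given. |
| correction | object | Reserved for storing correction details, if any |
| feedback_source | object | Object containing information about the feedback source |
| feedback_source.type | string | The type of the feedback source, e.g., 'api', 'app', 'evaluator' |
| feedback_source.metadata | object | Reserved for additional metadata, currently ... |
| feedback_source.user_id | UUID | Unique identifier for the user providing the feedback |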
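As a rough illustration of what a feedback record looks like, the fields in the table can be modeled as a plain Python dataclass. This is only a sketch: the class names `FeedbackRecord` and `FeedbackSource`, the choice of which fields are optional, and the example values are assumptions made for illustration, not LangSmith's actual schema or SDK types.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional
from uuid import UUID, uuid4


@dataclass
class FeedbackSource:
    """Hypothetical model of the feedback_source object."""
    type: str  # e.g. "api", "app", "evaluator"
    metadata: dict[str, Any] = field(default_factory=dict)
    user_id: Optional[UUID] = None


@dataclass
class FeedbackRecord:
    """Hypothetical model of one feedback row; optionality is assumed."""
    id: UUID
    created_at: datetime
    modified_at: datetime
    session_id: UUID
    run_id: UUID
    key: str
    score: Optional[float] = None
    value: Optional[str] = None
    comment: Optional[str] = None
    correction: Optional[dict[str, Any]] = None
    feedback_source: Optional[FeedbackSource] = None


now = datetime.now(timezone.utc)
example = FeedbackRecord(
    id=uuid4(),
    created_at=now,
    modified_at=now,
    session_id=uuid4(),
    run_id=uuid4(),
    key="correctness",
    score=1.0,
    comment="Matches the reference answer.",  # keep free text non-sensitive
    feedback_source=FeedbackSource(type="api"),
)

# Every top-level field of the record is stored in ClickHouse.
stored_fields = sorted(asdict(example).keys())
```

Because none of these fields are offloaded to object storage, free-text fields such as comment deserve the same care as the score itself.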
Stored run data fields
Run data fields are split between the managed ClickHouse database and your cloud object storage (e.g., S3 or GCS).
For run fields stored in object storage, only a reference or pointer is kept in ClickHouse. For example, inputs and outputs content are offloaded to S3/GCS, with the ClickHouse record storing corresponding S3 URLs in the inputs_s3_urls and outputs_s3_urls fields.

| Field | Storage Location |
|---|---|
| id | ClickHouse |
| name | ClickHouse |
| inputs | Object Storage |
| run_type | ClickHouse |
| start_time | ClickHouse |
| end_time | ClickHouse |
| extra | Object Storage |
| error | Object Storage |
| outputs | Object Storage |
| events | Object Storage |
| tags | ClickHouse |
| trace_id | ClickHouse |
| dotted_order | ClickHouse |
| status | ClickHouse |
| child_run_ids | ClickHouse |
| direct_child_run_ids | ClickHouse |
| parent_run_ids | ClickHouse |
| feedback_stats | ClickHouse |
| reference_example_id | ClickHouse |
| total_tokens | ClickHouse |
| prompt_tokens | ClickHouse |
| completion_tokens | ClickHouse |
| total_cost | ClickHouse |
| prompt_cost | ClickHouse |
| completion_cost | ClickHouse |
| first_token_time | ClickHouse |
| session_id | ClickHouse |
| in_dataset | ClickHouse |
| parent_run_id | ClickHouse |
| execution_order (deprecated) | ClickHouse |
| serialized | ClickHouse |
| manifest_id (deprecated) | ClickHouse |
| manifest_s3_id | ClickHouse |
| inputs_s3_urls | ClickHouse |
| outputs_s3_urls | ClickHouse |
| price_model_id | ClickHouse |
| app_path | ClickHouse |
| last_queued_at | ClickHouse |
| share_token | ClickHouse |
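
The pointer pattern described in this section can be sketched in a few lines of Python. This is a conceptual illustration only, not LangSmith's implementation: the function `offload_run`, the in-memory `bucket` dict standing in for S3/GCS, the URL layout, and the use of a single URL string as the pointer value are all assumptions. Only the field names and their storage locations come from the table above.

```python
import json
from typing import Any

# Fields the table above lists under "Object Storage".
OBJECT_STORAGE_FIELDS = {"inputs", "outputs", "extra", "error", "events"}

# Pointer columns named in this guide. How the other offloaded fields are
# referenced is not described here, so the sketch records no pointer for them.
POINTER_COLUMNS = {"inputs": "inputs_s3_urls", "outputs": "outputs_s3_urls"}


def offload_run(run: dict[str, Any], bucket: dict[str, str], prefix: str) -> dict[str, Any]:
    """Split a run into a ClickHouse-side row and object-storage payloads.

    `bucket` is a plain dict standing in for an S3/GCS bucket (assumption).
    Returns the row that would stay in ClickHouse.
    """
    row: dict[str, Any] = {}
    for name, value in run.items():
        if name not in OBJECT_STORAGE_FIELDS:
            row[name] = value  # non-sensitive field: stays in ClickHouse
            continue
        # Hypothetical URL layout: <prefix>/<run id>/<field>.json
        url = f"{prefix}/{run['id']}/{name}.json"
        bucket[url] = json.dumps(value)  # sensitive payload: object storage
        pointer = POINTER_COLUMNS.get(name)
        if pointer is not None:
            row[pointer] = url  # only the reference is kept in the row
    return row


bucket: dict[str, str] = {}
run = {
    "id": "run-1",
    "name": "chat",
    "run_type": "llm",
    "inputs": {"prompt": "hi"},
    "outputs": {"text": "hello"},
    "error": None,
    "tags": ["demo"],
}
row = offload_run(run, bucket, "s3://example-bucket/runs")
```

In this sketch, `row` carries only metadata plus the two URL references, while the prompt and completion content exist only in `bucket`, which mirrors the split between the managed ClickHouse database and the object storage in your own cloud.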


