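Cheerio integration

This notebook provides a quick overview for getting started with the CheerioWebBaseLoader document loader. For detailed documentation of all CheerioWebBaseLoader features and configurations, see the API reference.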

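Overview

Integration details

This example covers how to load data from web pages using Cheerio. One document is created for each web page. Cheerio is a fast and lightweight library that lets you parse and traverse HTML documents with a jQuery-like syntax. You can use it to extract data from web pages without rendering them in a browser. However, Cheerio does not simulate a web browser, so it cannot execute JavaScript on the page. This means it cannot extract data from dynamic pages that require JavaScript to render. For those pages, use PlaywrightWebBaseLoader or PuppeteerWebBaseLoader instead.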
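To make the limitation concrete, here is a minimal sketch that uses the cheerio package directly on a hypothetical HTML string (the markup and variable names are made up for illustration and do not come from the loader itself). It shows jQuery-style selection, and that markup a script would add in a browser never appears because the script is not run.

```typescript
import * as cheerio from "cheerio";

// Hypothetical page: one static paragraph plus a script that would add
// a second paragraph if the page were rendered in a browser.
const html = `
<html>
  <body>
    <p id="static">Rendered on the server</p>
    <div id="app"></div>
    <script>
      document.getElementById("app").innerHTML = "<p>Rendered by JavaScript</p>";
    </script>
  </body>
</html>`;

const $ = cheerio.load(html);

// jQuery-style selection over the parsed markup.
const staticText: string = $("#static").text();

// The script is parsed as text but never executed, so #app stays empty
// and only the static <p> element exists.
const appText: string = $("#app").text();
const paragraphCount: number = $("p").length;

console.log({ staticText, appText, paragraphCount });
```

If the content you need only shows up in the equivalent of `#app`, that is a sign the page needs a browser-based loader.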

Class	Package	Local	Serializable	Python support
`CheerioWebBaseLoader`	@langchain/community	✅	✅	❌

Loader features

Source	Web support	Node support
`CheerioWebBaseLoader`	✅	✅

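Setup

To access the CheerioWebBaseLoader document loader, you need to install the @langchain/community integration package along with the cheerio peer dependency.

Credentials

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below: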

# export LANGSMITH_TRACING="true"
# export LANGSMITH_API_KEY="your-api-key"

Installation

The LangChain CheerioWebBaseLoader integration lives in the @langchain/community package:

npm install @langchain/community @langchain/core cheerio

yarn add @langchain/community @langchain/core cheerio

pnpm add @langchain/community @langchain/core cheerio

Instantiation

Now we can instantiate the loader object and load documents:

import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio"

const loader = new CheerioWebBaseLoader("https://news.ycombinator.com/item?id=34817881", {
  // optional params: ...
})

Load

const docs = await loader.load()
docs[0]

Document {
  pageContent: '\n' +
    '        \n' +
    '                  Hacker News\n' +
    '                            new | past | comments | ask | show | jobs | submit            \n' +
    '                              login\n' +
    '                          \n' +
    '              \n' +
    '\n' +
    '        \n' +
    '            What Lights the Universe’s Standard Candles? (quantamagazine.org)\n' +
    '          75 points by Amorymeltzer on Feb 17, 2023  | hide | past | favorite | 6 comments        \n' +
    '              \n' +
    '        \n' +
    '                  \n' +
    '          \n' +
    '          delta_p_delta_x on Feb 17, 2023           \n' +
    '             | next [–]          \n' +
    '                  \n' +
    "                  Astrophysical and cosmological simulations are often insightful. They're also very cross-disciplinary; besides the obvious astrophysics, there's networking and sysadmin, parallel computing and algorithm theory (so that the simulation programs are actually fast but still accurate), systems design, and even a bit of graphic design for the visualisations.Some of my favourite simulation projects:- IllustrisTNG: https://www.tng-project.org/- SWIFT: https://swift.dur.ac.uk/- CO5BOLD: https://www.astro.uu.se/~bf/co5bold_main.html (which produced these animations of a red-giant star: https://www.astro.uu.se/~bf/movie/AGBmovie.html)- AbacusSummit: https://abacussummit.readthedocs.io/en/latest/And I can add the simulations in the article, too.\n" +
    '                      \n' +
    '                  \n' +
    '      \n' +
    '        \n' +
    '                      \n' +
    '          \n' +
    '          froeb on Feb 18, 2023           \n' +
    '             | parent | next [–]          \n' +
    '                  \n' +
    "                  Supernova simulations are especially interesting too. I have heard them described as the only time in physics when all 4 of the fundamental forces are important. The explosion can be quite finicky too. If I remember right, you can't get supernova to explode properly in 1D simulations, only in higher dimensions. This was a mystery until the realization that turbulence is necessary for supernova to trigger--there is no turbulent flow in 1D.\n" +
    '                      \n' +
    '                  \n' +
    '      \n' +
    '        \n' +
    '                        \n' +
    '          \n' +
    '          andrewflnr on Feb 17, 2023           \n' +
    '             | prev | next [–]          \n' +
    '                  \n' +
    "                  Whoa. I didn't know the accretion theory of Ia supernovae was dead, much less that it had been since 2011.\n" +
    '                      \n' +
    '                  \n' +
    '      \n' +
    '        \n' +
    '                  \n' +
    '          \n' +
    '          andreareina on Feb 17, 2023           \n' +
    '             | prev | next [–]          \n' +
    '                  \n' +
    '                  This seems  to be the paper https://academic.oup.com/mnras/article/517/4/5260/6779709\n' +
    '                      \n' +
    '                  \n' +
    '      \n' +
    '        \n' +
    '                  \n' +
    '          \n' +
    '          andreareina on Feb 17, 2023           \n' +
    '             | prev [–]          \n' +
    '                  \n' +
    "                  Wouldn't double detonation show up as variance in the brightness?\n" +
    '                      \n' +
    '                  \n' +
    '      \n' +
    '        \n' +
    '                      \n' +
    '          \n' +
    '          yencabulator on Feb 18, 2023           \n' +
    '             | parent [–]          \n' +
    '                  \n' +
    '                  Or widening of the peak. If one type Ia supernova goes 1,2,3,2,1, the sum of two could go    1+0=1\n' +
    '    2+1=3\n' +
    '    3+2=5\n' +
    '    2+3=5\n' +
    '    1+2=3\n' +
    '    0+1=1\n' +
    '                      \n' +
    '                  \n' +
    '      \n' +
    '        \n' +
    '                  \n' +
    '  \n' +
    '\n' +
    '\n' +
    'Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact\n' +
    'Search:       \n' +
    '      \n' +
    '  \n',
  metadata: { source: 'https://news.ycombinator.com/item?id=34817881' },
  id: undefined
}

console.log(docs[0].metadata)

{ source: 'https://news.ycombinator.com/item?id=34817881' }
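As the output above shows, the extracted text keeps the page's indentation and blank lines. If that whitespace gets in the way downstream, one option is a small post-processing step. The sketch below is only an illustration: `LoadedDoc` is a minimal stand-in for the two fields printed above, and `tidyPageContent` is a hypothetical helper, not part of LangChain.

```typescript
// Minimal stand-in for the fields shown in the output above
// (an assumption; the real Document class lives in @langchain/core).
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: collapse runs of whitespace within each line,
// trim the line, and drop lines that end up empty.
function tidyPageContent(doc: LoadedDoc): LoadedDoc {
  const pageContent = doc.pageContent
    .split("\n")
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
  return { ...doc, pageContent };
}

// Sample input modeled on the first lines of the output above.
const sample: LoadedDoc = {
  pageContent: "\n        \n                  Hacker News\n              login\n      \n",
  metadata: { source: "https://news.ycombinator.com/item?id=34817881" },
};

const tidied = tidyPageContent(sample);
console.log(tidied.pageContent);
```

The helper returns a new object and leaves `metadata` untouched, so the `source` URL stays attached to the cleaned text.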

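Additional configuration

CheerioWebBaseLoader supports additional configuration when instantiating the loader. Here is an example that uses the selector field so that the loader only keeps content from elements matching the given CSS selector (here, every p element):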

import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio"

const loaderWithSelector = new CheerioWebBaseLoader("https://news.ycombinator.com/item?id=34817881", {
  selector: "p",
});

const docsWithSelector = await loaderWithSelector.load();
docsWithSelector[0].pageContent;

Some of my favourite simulation projects:- IllustrisTNG: https://www.tng-project.org/- SWIFT: https://swift.dur.ac.uk/- CO5BOLD: https://www.astro.uu.se/~bf/co5bold_main.html (which produced these animations of a red-giant star: https://www.astro.uu.se/~bf/movie/AGBmovie.html)- AbacusSummit: https://abacussummit.readthedocs.io/en/latest/And I can add the simulations in the article, too.
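To see what narrowing by selector does to the extracted text, the sketch below runs cheerio directly on a hypothetical HTML string. It assumes the loader applies the selector to the parsed page and keeps the text of the matching elements; the markup and variable names are invented for illustration.

```typescript
import * as cheerio from "cheerio";

// Hypothetical markup standing in for a fetched page.
const page = `
<html>
  <body>
    <h1>Thread title</h1>
    <p>First comment.</p>
    <p>Second comment.</p>
    <footer>Guidelines | FAQ</footer>
  </body>
</html>`;

const $page = cheerio.load(page);

// Text of every <p> element, concatenated in document order.
const paragraphsOnly: string = $page("p").text();

// Text of the whole body, including the heading and the footer.
const wholeBody: string = $page("body").text();

console.log(paragraphsOnly);
console.log(wholeBody.includes("Guidelines | FAQ"));
```

Note that the text of adjacent matches is joined without a separator, which is consistent with the run-together sentences in the output above.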

API reference

For detailed documentation of all CheerioWebBaseLoader features and configurations, see the API reference.
