Google Cloud Storage is a managed service for storing unstructured data. This page covers how to load document objects from a
Google Cloud Storage (GCS) directory (bucket).
pip install -qU "langchain-google-community[gcs]"
from langchain_google_community import GCSDirectoryLoader
loader = GCSDirectoryLoader(project_name="aist", bucket="testing-hwc")
loader.load()
/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpz37njh7u/fake.docx'}, lookup_index=0)]
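Specifying a prefix
You can also specify a prefix for finer-grained control over which files to load, including loading all files from a specific folder.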
loader = GCSDirectoryLoader(project_name="aist", bucket="testing-hwc", prefix="fake")
loader.load()
/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpylg6291i/fake.docx'}, lookup_index=0)]
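To make the effect of a prefix concrete, here is a minimal sketch of prefix matching in plain Python. It is not the loader's implementation and it does not contact GCS. It assumes that object names in a bucket are flat strings and that a "folder" is simply a shared leading part of the name. The object names and the filter_by_prefix helper are hypothetical, chosen only for illustration.

```python
# Hypothetical object names in a bucket. Names are plain strings, so a
# "folder" such as reports/2024/ is just a common leading part of the name.
object_names = [
    "fake.docx",
    "fake/notes.txt",
    "reports/2023/summary.pdf",
    "reports/2024/summary.pdf",
]


def filter_by_prefix(names, prefix):
    """Return the names that start with `prefix` (illustrative helper)."""
    return [name for name in names if name.startswith(prefix)]


# A bare prefix matches both a file and a "folder" that begin with it.
print(filter_by_prefix(object_names, "fake"))

# A trailing slash narrows the match to the contents of one "folder".
print(filter_by_prefix(object_names, "reports/2024/"))
```

Under these assumptions, the prefix "fake" selects both fake.docx and fake/notes.txt, while "reports/2024/" selects only the objects under that path. Adding a trailing slash is the usual way to avoid picking up sibling objects that merely share the same leading characters.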
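Continue on failure to load a single file
Files in a GCS bucket may cause errors during processing. Set continue_on_failure=True to allow silent failure: when a single file fails to process, the function does not stop, and a warning is logged instead.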
loader = GCSDirectoryLoader(
    project_name="aist", bucket="testing-hwc", continue_on_failure=True
)
loader.load()
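The continue-on-failure behavior described in this section can be sketched as a generic pattern in plain Python. This is an illustration under stated assumptions, not the source code of GCSDirectoryLoader: the load_all and fake_load_one functions, the file names, and the log message are all hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def load_all(names, load_one, continue_on_failure=False):
    """Load every item; optionally log and skip the items that fail."""
    docs = []
    for name in names:
        try:
            docs.extend(load_one(name))
        except Exception as exc:
            if not continue_on_failure:
                # Default behavior: the first failure aborts the whole load.
                raise
            # Silent failure: record a warning and move on to the next item.
            logger.warning("Problem processing %s: %s", name, exc)
    return docs


def fake_load_one(name):
    """Hypothetical per-file loader that rejects .bin files."""
    if name.endswith(".bin"):
        raise ValueError("unsupported file type")
    return [f"contents of {name}"]


docs = load_all(
    ["a.txt", "broken.bin", "b.txt"], fake_load_one, continue_on_failure=True
)
print(docs)
```

With continue_on_failure=True the sketch returns the contents of a.txt and b.txt and logs one warning for broken.bin. With the default of False, the same call would raise the ValueError from broken.bin. The trade-off is that skipped files are easy to miss, so it is worth checking the warning log after a load.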

