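Neo4j is an open-source graph database with built-in support for vector similarity search. It supports:
- Approximate nearest neighbor search
- Euclidean similarity and cosine similarity
- Hybrid search combining vector and keyword searches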
This guide shows how to use the Neo4j vector index (Neo4jVector).
See the installation instructions.
# Pip install necessary package
pip install -qU neo4j
pip install -qU langchain-openai langchain-neo4j
pip install -qU tiktoken
This guide uses OpenAIEmbeddings, so you need an OpenAI API key.
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key: ········
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
# Neo4jVector requires the Neo4j database credentials
url = "bolt://localhost:7687"
username = "neo4j"
password = "password"
# You can also use environment variables instead of directly passing named parameters
# os.environ["NEO4J_URI"] = "bolt://localhost:7687"
# os.environ["NEO4J_USERNAME"] = "neo4j"
# os.environ["NEO4J_PASSWORD"] = "pleaseletmein"
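The commented-out lines above point at environment variables as an alternative to passing credentials by name. A minimal sketch of collecting those settings in one place, assuming a hypothetical helper `neo4j_settings_from_env` (it is not part of langchain-neo4j) and the variable names shown above:

```python
import os
from typing import Dict, Mapping, Optional


def neo4j_settings_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: read Neo4j connection settings from the environment."""
    env = os.environ if env is None else env
    return {
        # Fall back to the local defaults used in this guide.
        "url": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        # No default password on purpose: raise KeyError instead of guessing one.
        "password": env["NEO4J_PASSWORD"],
    }


# Demonstrate with an explicit mapping so the example does not depend on the shell.
settings = neo4j_settings_from_env(
    {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_PASSWORD": "pleaseletmein"}
)
print(settings["username"])
```

The resulting dictionary uses the same keyword names (`url`, `username`, `password`) as the constructor calls in this guide, so it could be unpacked into them with `**settings`.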
Similarity search with cosine distance (default)
# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.
db = Neo4jVector.from_documents(
    docs, OpenAIEmbeddings(), url=url, username=username, password=password
)
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score: 0.9076391458511353
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you're at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I'd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 0.8912242650985718
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she's been nominated, she's received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both. At our border, we've installed new technology like cutting-edge scanners to better detect drug smuggling.
We've set up joint patrols with Mexico and Guatemala to catch more human traffickers.
We're putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
We're securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
--------------------------------------------------------------------------------
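The scores printed above are similarity values, where higher means closer. A minimal sketch of the two similarity functions listed at the top of this guide, assuming the normalization described in Neo4j's vector index documentation (cosine mapped to [0, 1] as (1 + cos) / 2, Euclidean as 1 / (1 + d²)). The function names are illustrative and the vectors are toy data, not real embeddings:

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (assumed normalization)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return (1.0 + dot / (norm_a * norm_b)) / 2.0


def euclidean_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance mapped to (0, 1] (assumed normalization)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared_distance)


same = cosine_score([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])  # unrelated direction
near = euclidean_score([0.0, 0.0], [1.0, 0.0])     # distance of 1
print(same, orthogonal, near)
```

Under this assumption, a cosine score near 0.9 like the ones above corresponds to a raw cosine of roughly 0.8.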
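Working with an existing vector store
Above, we created a vector store from scratch. Often, however, we want to work with an existing vector store. To do so, we can initialize it directly.
index_name = "vector"  # default index name
store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
)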
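The from_existing_graph method initializes a vector store from an existing graph. It pulls the relevant text from the database, computes the text embeddings, and stores them back in the database.
# First we create sample data in graph
store.query(
    "CREATE (p:Person {name: 'Tomaz', location:'Slovenia', hobby:'Bicycle', age: 33})"
)

[]
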
# Now we initialize from existing graph
existing_graph = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    node_label="Person",
    text_node_properties=["name", "location"],
    embedding_node_property="embedding",
)
result = existing_graph.similarity_search("Slovenia", k=1)
result[0]

Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})
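The result above suggests how a node is turned into a document: the properties named in `text_node_properties` become `key: value` lines in `page_content`, and the remaining properties end up in `metadata`. A minimal sketch of that mapping in plain Python, assuming a hypothetical helper `node_to_document_parts` and inferring the format only from the output shown here (the real work happens inside the database):

```python
from typing import Any, Dict, List, Tuple


def node_to_document_parts(
    properties: Dict[str, Any],
    text_node_properties: List[str],
    embedding_node_property: str = "embedding",
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: split node properties into text and metadata."""
    # Each text property contributes one "\nkey: value" segment, in the given order.
    text = "".join(
        f"\n{key}: {properties.get(key, '')}" for key in text_node_properties
    )
    # Everything else, except the stored embedding, is kept as metadata.
    metadata = {
        key: value
        for key, value in properties.items()
        if key not in text_node_properties and key != embedding_node_property
    }
    return text, metadata


person = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
text, metadata = node_to_document_parts(person, ["name", "location"])
print(repr(text))
print(metadata)
```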
# First we create sample data and index in graph
store.query(
    "MERGE (p:Person {name: 'Tomaz'}) "
    "MERGE (p1:Person {name:'Leann'}) "
    "MERGE (p1)-[:FRIEND {text:'example text', embedding:$embedding}]->(p)",
    params={"embedding": OpenAIEmbeddings().embed_query("example text")},
)
# Create a vector index
relationship_index = "relationship_vector"
store.query(
    """
CREATE VECTOR INDEX $relationship_index
IF NOT EXISTS
FOR ()-[r:FRIEND]-() ON (r.embedding)
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
""",
    params={"relationship_index": relationship_index},
)

[]

relationship_vector = Neo4jVector.from_existing_relationship_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=relationship_index,
    text_node_property="text",
)
relationship_vector.similarity_search("Example")

[Document(page_content='example text')]
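Metadata filtering
The Neo4j vector store also supports metadata filtering by combining the parallel runtime with exact nearest neighbor search. This requires Neo4j 5.18 or later. Equality filtering uses the following syntax:
existing_graph.similarity_search(
    "Slovenia",
    filter={"hobby": "Bicycle", "name": "Tomaz"},
)
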
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
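The following filter operators are supported:
- $eq: equal to
- $ne: not equal to
- $lt: less than
- $lte: less than or equal to
- $gt: greater than
- $gte: greater than or equal to
- $in: in a list of values
- $nin: not in a list of values
- $between: between two values
- $like: text contains value
- $ilike: lowercased text contains value
existing_graph.similarity_search(
    "Slovenia",
    filter={"hobby": {"$eq": "Bicycle"}, "age": {"$gt": 15}},
)
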
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
OR operator
existing_graph.similarity_search(
    "Slovenia",
    filter={"$or": [{"hobby": {"$eq": "Bicycle"}}, {"age": {"$gt": 15}}]},
)

[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
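To make the filter semantics concrete, here is a minimal pure-Python sketch that evaluates a filter dictionary against one node's metadata. It illustrates the operators listed above and is not the code Neo4jVector runs (the real filter is applied inside the database). The `matches` function is hypothetical, and the inclusive bounds for `$between` are an assumption:

```python
from typing import Any, Callable, Dict

# One predicate per operator: (actual value, expected value) -> bool.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda actual, expected: actual == expected,
    "$ne": lambda actual, expected: actual != expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$in": lambda actual, expected: actual in expected,
    "$nin": lambda actual, expected: actual not in expected,
    # Assumption: both bounds are inclusive.
    "$between": lambda actual, expected: expected[0] <= actual <= expected[1],
    "$like": lambda actual, expected: expected in actual,
    "$ilike": lambda actual, expected: expected.lower() in actual.lower(),
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every top-level clause (implicit AND)."""
    for key, condition in flt.items():
        if key == "$or":
            # At least one of the nested filters has to match.
            if not any(matches(metadata, sub) for sub in condition):
                return False
            continue
        if key not in metadata:
            return False
        # A bare value is shorthand for {"$eq": value}.
        ops = condition if isinstance(condition, dict) else {"$eq": condition}
        for op, expected in ops.items():
            if not OPERATORS[op](metadata[key], expected):
                return False
    return True


person = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
print(matches(person, {"hobby": {"$eq": "Bicycle"}, "age": {"$gt": 15}}))
```

Keeping the operators in a lookup table makes the implicit AND between keys and the explicit `$or` branch the only control flow to reason about.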
Adding documents
We can add documents to an existing vector store.
store.add_documents([Document(page_content="foo")])

['acbd18db4cc2f85cedef654fccc4a4d8']
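The ID returned above equals the MD5 hex digest of the text `foo`, which suggests that IDs are derived from the document content when none are supplied. Treat this as an observation about this output rather than a documented guarantee. A minimal sketch that reproduces the value, using a hypothetical helper `content_id`:

```python
import hashlib


def content_id(text: str) -> str:
    """Hypothetical helper: MD5 hex digest of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


foo_id = content_id("foo")
print(foo_id)  # acbd18db4cc2f85cedef654fccc4a4d8
```

If IDs are indeed content-derived, adding the same text twice would map to the same ID instead of creating a second, differently identified entry.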
docs_with_score = store.similarity_search_with_score("foo")
docs_with_score[0]
(Document(page_content='foo'), 0.9999997615814209)
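Customizing the response with a retrieval query
You can also customize the response by using a custom Cypher snippet that fetches other information from the graph. Under the hood, the final Cypher statement is constructed as follows:
read_query = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
) + retrieval_query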
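The retrieval query is expected to return the following columns:
- text: Union[str, Dict] = value used to populate the page_content of a document
- score: Float = similarity score
- metadata: Dict = additional metadata of a document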
retrieval_query = """
RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1)

[Document(page_content='Name:Tomaz', metadata={'foo': 'bar'})]
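The composition described above can be sketched without a database: the fixed vector-search prefix is concatenated with the custom snippet, and each returned row needs the `text`, `score`, and `metadata` columns. The helpers `build_read_query` and `check_row` below are hypothetical names for illustration, and the sample row is made-up data:

```python
from typing import Any, Dict

# The prefix quoted in this guide; the custom snippet is appended to it.
VECTOR_PREFIX = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
)
REQUIRED_COLUMNS = ("text", "score", "metadata")


def build_read_query(retrieval_query: str) -> str:
    """Hypothetical helper: prepend the vector search call to a retrieval query."""
    return VECTOR_PREFIX + retrieval_query


def check_row(row: Dict[str, Any]) -> None:
    """Hypothetical helper: verify a result row has the expected columns."""
    missing = [column for column in REQUIRED_COLUMNS if column not in row]
    if missing:
        raise ValueError(f"retrieval query is missing columns: {missing}")


read_query = build_read_query(
    'RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata'
)
print(read_query)

# A row shaped like the one the query above would produce (illustrative values).
check_row({"text": "Name:Tomaz", "score": 0.9, "metadata": {"foo": "bar"}})
```

Because the snippet is appended after `YIELD node, score`, it can reference `node` and `score` directly, as the examples in this section do.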
Here is an example of passing all node properties except embedding as a dictionary to the text column:
retrieval_query = """
RETURN node {.name, .age, .hobby} AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1)

[Document(page_content='name: Tomaz\nage: 33\nhobby: Bicycle\n', metadata={'foo': 'bar'})]
retrieval_query = """
RETURN node {.*, embedding:Null, extra: $extra} AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1, params={"extra": "ParamInfo"})

[Document(page_content='location: Slovenia\nextra: ParamInfo\nname: Tomaz\nage: 33\nhobby: Bicycle\nembedding: None\n', metadata={'foo': 'bar'})]
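In the two outputs above, a dictionary returned in the `text` column shows up in `page_content` as one `key: value` line per entry. A minimal sketch of that rendering for flat dictionaries, inferred only from those outputs. The helper `dict_to_text` is hypothetical, and nested values may be rendered differently by the library:

```python
from typing import Any, Dict, Union


def dict_to_text(value: Union[str, Dict[str, Any]]) -> str:
    """Hypothetical helper: render a flat dict as 'key: value' lines."""
    if isinstance(value, str):
        # Plain strings are used as page_content unchanged.
        return value
    return "".join(f"{key}: {item}\n" for key, item in value.items())


rendered = dict_to_text({"name": "Tomaz", "age": 33, "hobby": "Bicycle"})
print(repr(rendered))
```

A property projected as `Null` in Cypher arrives as Python `None`, which is why the last output contains the line `embedding: None`.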
Hybrid search (vector + keyword)
Neo4j integrates both vector and keyword indexes, which allows you to use a hybrid search approach.
# The Neo4jVector Module will connect to Neo4j and create a vector and keyword indices if needed.
hybrid_db = Neo4jVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    search_type="hybrid",
)
index_name = "vector"  # default index name
keyword_index_name = "keyword"  # default keyword index name
store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
    keyword_index_name=keyword_index_name,
    search_type="hybrid",
)
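Vector scores and keyword scores live on different scales, so a hybrid search has to reconcile them before ranking. A minimal sketch of one plausible merge strategy: scale each result list by its best score, keep the higher score for documents found by both searches, and return the top k. This is an illustration under those assumptions, not a statement of what Neo4jVector does internally. The function names and sample hits are made up:

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]  # (document id, score)


def normalize(results: Scored) -> Scored:
    """Scale scores so the best hit in this result list scores 1.0."""
    if not results:
        return []
    top = max(score for _, score in results) or 1.0  # avoid dividing by zero
    return [(doc_id, score / top) for doc_id, score in results]


def hybrid_merge(vector_hits: Scored, keyword_hits: Scored, k: int) -> Scored:
    """Combine two ranked lists, keeping the best normalized score per document."""
    best: Dict[str, float] = {}
    for doc_id, score in normalize(vector_hits) + normalize(keyword_hits):
        best[doc_id] = max(score, best.get(doc_id, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


vector_hits = [("a", 0.9), ("b", 0.6)]   # similarity scores
keyword_hits = [("b", 4.0), ("c", 2.0)]  # full-text relevance scores
merged = hybrid_merge(vector_hits, keyword_hits, k=3)
print(merged)
```

In this toy data, document `b` is a weak vector match but the best keyword match, so it ranks alongside `a` after merging.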
Retriever options
This section shows how to use Neo4jVector as a retriever.
retriever = store.as_retriever()
retriever.invoke(query)[0]
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you're at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I'd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.', metadata={'source': '../../how_to/state_of_the_union.txt'})
Question answering with sources
This section shows how to use RetrievalQAWithSourcesChain to do question answering with sources over an index.
from langchain_classic.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI
chain = RetrievalQAWithSourcesChain.from_chain_type(
    ChatOpenAI(temperature=0), chain_type="stuff", retriever=retriever
)
chain.invoke(
    {"question": "What did the president say about Justice Breyer"},
    return_only_outputs=True,
)

{'answer': 'The president honored Justice Stephen Breyer for his service to the country and mentioned his retirement from the United States Supreme Court.\n',
'sources': '../../how_to/state_of_the_union.txt'}

