Rethinking KG and Graph Databases in the LLM Era: Attempts at Combining NebulaGraph, TigerGraph, and Neo4j with Large Language Models

Today is Thursday, December 7, 2023. Beijing, sunny.

Let's continue looking at work that combines large language models (LLMs) with knowledge graphs.

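Graph databases are an important link in the knowledge graph industry, and in the LLM era existing graph databases also need to rethink their approach and integrate with LLMs. For example, NebulaGraph, Neo4j, Stardog, and TigerGraph have been combined with LangChain and use LLMs to generate query statements, all of which are interesting attempts.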

This article introduces these efforts as food for thought.

I. NebulaGraph's Graph RAG

The article "Knowledge Graphs & LLMs: Integrating Large Language Models with NebulaGraph" (https://www.nebula-graph.io/posts/knowledge-graphs-via-natural-language) presents some of NebulaGraph's experiments with LLMs.

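In our earlier article "Implementation Strategies for Knowledge-Graph-Based Retrieval Augmentation of LLMs: Basic Principles and Optimization Ideas of Graph RAG" (https://mp.weixin.qq.com/s/ulhu7qj93d3PRWoUpNcCug), we introduced the approach of using knowledge graph recall to augment LLM question answering.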

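Graph RAG is a concept proposed by Yueshu (悦数图数据). It is a knowledge-graph-based retrieval augmentation technique: it builds a graph model of the knowledge, representing the links between entities and relationships as a graph, and then uses a large language model (LLM) for retrieval-augmented generation.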

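Graph RAG treats the knowledge graph as a very large vocabulary in which entities and relationships correspond to words. In this way, Graph RAG can jointly model entities and relationships as units during retrieval.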

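The idea behind a simple Graph RAG is to extract entities from the user's query, build a subgraph to form the context, and finally feed it to the LLM for generation, as the following code shows: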

def simple_graph_rag(query_str, nebulagraph_store, llm):
    entities = _get_key_entities(query_str, llm)
    graph_rag_context = _retrieve_subgraph_context(entities)
    return _synthesize_answer(query_str, graph_rag_context, llm)

First, an LLM (or another model) extracts the key entities from the question.

def _get_key_entities(query_str, llm=None, with_llm=True):
    ...
    return _expand_synonyms(entities)

Next, a subgraph is retrieved around these entities, expanding to a certain depth, for example 2 hops or more.

def _retrieve_subgraph_context(entities, depth=2, limit=30):
    ...
    return nebulagraph_store.get_relations(entities, depth, limit)

Finally, the LLM generates an answer from the retrieved context.

def _synthesize_answer(query_str, graph_rag_context, llm):
    return llm.predict(PROMPT_SYNTHESIZE_AND_REFINE, query_str, graph_rag_context)
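The three steps above leave the helper bodies elided. As a rough, self-contained illustration of the same flow, the sketch below replaces the graph database with a hypothetical in-memory `ToyGraphStore`, replaces LLM entity extraction with plain keyword matching, and passes in a fake `llm` callable that simply echoes the prompt. The class, the helper names without underscores, the prompt wording, and the sample triples are all assumptions made for illustration, not NebulaGraph's or LlamaIndex's actual API.

```python
from collections import defaultdict

# Toy triples standing in for a knowledge graph (illustrative data only).
TRIPLES = [
    ("peter quill", "is leader of", "guardians of the galaxy"),
    ("peter quill", "was raised by", "yondu"),
    ("guardians of the galaxy", "includes", "rocket"),
    ("yondu", "leads", "ravagers"),
]


class ToyGraphStore:
    """Hypothetical in-memory stand-in for a graph store."""

    def __init__(self, triples):
        self.out_edges = defaultdict(list)
        for subj, pred, obj in triples:
            self.out_edges[subj].append((pred, obj))

    def entities(self):
        return list(self.out_edges)

    def get_relations(self, entities, depth=2, limit=30):
        """Breadth-first expansion up to `depth` hops, capped at `limit` triples."""
        found, frontier, seen = [], list(entities), set(entities)
        for _ in range(depth):
            next_frontier = []
            for subj in frontier:
                for pred, obj in self.out_edges.get(subj, []):
                    if len(found) >= limit:
                        return found
                    found.append((subj, pred, obj))
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return found


def get_key_entities(query_str, store):
    """Keyword matching used here in place of LLM-based entity extraction."""
    lowered = query_str.lower()
    return [entity for entity in store.entities() if entity in lowered]


def build_prompt(query_str, triples):
    """Serialize the subgraph as text and append the question."""
    context = "\n".join(f"{s} -[{p}]-> {o}" for s, p, o in triples)
    return f"Context:\n{context}\n\nQuestion: {query_str}\nAnswer:"


def simple_graph_rag(query_str, store, llm):
    entities = get_key_entities(query_str, store)
    triples = store.get_relations(entities, depth=2)
    return llm(build_prompt(query_str, triples))


store = ToyGraphStore(TRIPLES)
# The fake "LLM" echoes its prompt so the retrieved context is visible.
print(simple_graph_rag("Tell me about Peter Quill", store, llm=lambda p: p))
```

With `depth=2` the context contains both the direct facts about the matched entity and the facts one hop further out; lowering `depth` or `limit` trades coverage for a shorter prompt.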

In this way, knowledge graph recall can serve as one retrieval route that is fused with traditional recall.

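For example, as the figure in the original post illustrates, when the user enters "Tell me about Peter Quill", the keyword "Quill" is identified first, and then a Cypher statement is written to fetch the two-hop results.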


Here is another example:

User input: Tell me events about NASA

Extracted keywords: Query keywords: ['NASA', 'events']

Two-hop retrieval:

Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`

nasa ['public release date', 'mid-2023']
nasa ['announces', 'future space telescope programs']
nasa ['publishes images of', 'debris disk']
nasa ['discovers', 'exoplanet lhs 475 b']

These are then sent to the LLM to produce the answer.
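To make the text format above concrete, the following minimal sketch renders retrieved paths as `subject [predicate, object, predicate_next_hop, object_next_hop ...]` lines and places them in front of the question. The functions `format_path` and `build_context` and the final prompt wording are hypothetical; this is not the template LlamaIndex uses internally.

```python
HEADER = (
    "The following are knowledge triplets in max depth {depth} in the form of "
    "`subject [predicate, object, predicate_next_hop, object_next_hop ...]`"
)


def format_path(path):
    """Render one path given as (subject, predicate, object, predicate, object, ...)."""
    subject, rest = path[0], list(path[1:])
    if not rest or len(rest) % 2 != 0:
        raise ValueError("path must be a subject followed by (predicate, object) pairs")
    # repr() of a list of strings yields the bracketed, quoted form shown above.
    return f"{subject} {rest!r}"


def build_context(paths, depth=2):
    """Join the header and one formatted line per path."""
    lines = [HEADER.format(depth=depth)]
    lines += [format_path(p) for p in paths]
    return "\n".join(lines)


# The four one-hop triplets from the example above.
paths = [
    ("nasa", "public release date", "mid-2023"),
    ("nasa", "announces", "future space telescope programs"),
    ("nasa", "publishes images of", "debris disk"),
    ("nasa", "discovers", "exoplanet lhs 475 b"),
]

context = build_context(paths)
# Assumed prompt layout: context first, then the user's question.
prompt = f"{context}\n\nQuestion: Tell me events about NASA\nAnswer:"
print(prompt)
```

A two-hop path simply carries more pairs, for example `("a", "p", "b", "q", "c")`, which is what the `predicate_next_hop, object_next_hop` part of the header refers to.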

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me events about NASA
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['NASA', 'events']
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
nasa ['public release date', 'mid-2023']
nasa ['announces', 'future space telescope programs']
nasa ['publishes images of', 'debris disk']
nasa ['discovers', 'exoplanet lhs 475 b']
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 159 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens

II. Combining TigerGraph with LangChain

TigerGraph's "Integrating TigerGraph and Large Language Models for Generative AI" (https://www.tigergraph.com/blog/integrating-tigergraph-and-llms-for-generative-ai/) also discusses the combination with LLMs, from a graph database perspective.

One demo-level attempt integrates LangChain with pyTigerGraph, TigerGraph's Python driver.

A concrete example is available at: https://github.com/tigergraph/graph-ml-notebooks/blob/main/applications/large_language_models/TigerGraph_LangChain_Demo.ipynb

1. Execution logic

There are three tools for interacting with TigerGraph: MapQuestionToSchema, GenerateFunction, and ExecuteFunction.

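LangChain wraps the LLM into an agent that can reason and carry out a series of tasks on its own. In this setup, the agent follows the same general execution flow whatever question is asked of the database.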


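First, the MapQuestionToSchema tool maps the question onto the graph's schema. MapQuestionToSchema asks the LLM to rephrase the user's question using the standard schema elements. For example, if there is a vertex type "Institution" and the user asks "How many universities are there?", the tool returns the question "How many Institution vertices are there?". The LLM inside the tool automatically converts synonyms of schema elements into their canonical form.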


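Next, the normalized question is passed to the GenerateFunction tool, which uses another LLM to fill in the correct pyTigerGraph function call to run against the database. The question "How many Institution vertices are there?" is thus converted into getVertexCount("Institution").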


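Finally, the function call is run in the ExecuteFunction tool, which returns the database's response; the agent then parses it and states the answer in natural English.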
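The three-tool flow described above can be sketched without an LLM or a running database. In the demo each mapping step is performed by an LLM; here, purely as an assumption for illustration, a synonym table and a regular expression stand in for the two LLM calls, and a hypothetical `FakeConnection` stands in for a pyTigerGraph connection. The vertex counts are made-up sample values.

```python
import re

# Hypothetical schema and synonym table; in the demo an LLM does this mapping.
VERTEX_TYPES = {"Institution", "Paper", "Author"}
SYNONYMS = {
    "university": "Institution",
    "universities": "Institution",
    "institutions": "Institution",
    "papers": "Paper",
    "authors": "Author",
}
ALLOWED_METHODS = {"getVertexCount"}


def map_question_to_schema(question):
    """Step 1: rewrite schema synonyms into canonical vertex type names."""
    def swap(match):
        word = match.group(0)
        canonical = SYNONYMS.get(word.lower())
        return f"{canonical} vertices" if canonical else word
    return re.sub(r"[A-Za-z]+", swap, question)


def generate_function(question):
    """Step 2: turn the normalized question into (method name, arguments)."""
    match = re.search(r"how many (\w+) vertices", question, flags=re.IGNORECASE)
    if match and match.group(1) in VERTEX_TYPES:
        return "getVertexCount", (match.group(1),)
    raise ValueError(f"cannot map question to a function: {question!r}")


class FakeConnection:
    """In-memory stand-in for a database connection (assumption)."""

    def __init__(self, counts):
        self._counts = counts

    def getVertexCount(self, vertex_type):
        return self._counts[vertex_type]


def execute_function(conn, name, args):
    """Step 3: run the call, but only if the method is on the allow-list."""
    if name not in ALLOWED_METHODS:
        raise PermissionError(f"method not allowed: {name}")
    return getattr(conn, name)(*args)


def answer(question, conn):
    normalized = map_question_to_schema(question)
    name, args = generate_function(normalized)
    return execute_function(conn, name, args)


conn = FakeConnection({"Institution": 3, "Paper": 10, "Author": 5})
print(map_question_to_schema("How many universities are there?"))
print(answer("How many universities are there?", conn))
```

Keeping the three steps as separate functions mirrors the tool boundaries in the article: each stage can be tested or replaced independently, and the allow-list in the last step limits what a generated call is able to do.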


2. A concrete example

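The example uses the OGB MAG dataset, which contains information about papers, authors, topics, and institutions, together with the relationships shown in the schema diagram of the original post.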


For example, for a simple question such as "How many papers are there?", LangChain can carry out the answer lookup.


Another example is the more complex question "How many papers have a y attribute equal to 1?".
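The call generated for this question is not reproduced in the text here. As a hedged illustration, suppose the generated text were `getVertexCount("Paper", where="y=1")`; that exact string is an assumption. The sketch below parses such a string with Python's `ast` module instead of `eval()`, checks the method name against an allow-list, and runs it on a hypothetical `FakeConnection` whose `where` handling supports only a single `attribute=value` equality. The sample vertices are invented.

```python
import ast

ALLOWED_METHODS = {"getVertexCount"}


def parse_call(call_text):
    """Parse text such as getVertexCount("Paper", where="y=1") without eval()."""
    node = ast.parse(call_text.strip(), mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError("expected a single function call")
    if node.func.id not in ALLOWED_METHODS:
        raise PermissionError(f"method not allowed: {node.func.id}")
    # literal_eval accepts only literals, so arbitrary expressions are rejected.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, args, kwargs


class FakeConnection:
    """In-memory stand-in for a database connection (assumption)."""

    def __init__(self, vertices):
        self._vertices = vertices  # {vertex type: [attribute dict, ...]}

    def getVertexCount(self, vertexType, where=""):
        rows = self._vertices[vertexType]
        if not where:
            return len(rows)
        # Toy filter: only a single "attribute=value" equality is supported.
        attr, value = where.split("=", 1)
        return sum(1 for row in rows if str(row.get(attr.strip())) == value.strip())


conn = FakeConnection({"Paper": [{"y": 1}, {"y": 0}, {"y": 1}]})
name, args, kwargs = parse_call('getVertexCount("Paper", where="y=1")')
print(getattr(conn, name)(*args, **kwargs))
```

Parsing the generated text rather than evaluating it is one possible design choice for an execution step: the model's output is treated as data, so a malformed or unexpected call fails early instead of running.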


III. Attempts at Combining Stardog and Neo4j with LLMs

The article "How AI Uses Stardog" (https://www.stardog.com/blog/how-ai-uses-stardog/) presents a case of combining the Stardog database with LLMs.


In addition, Neo4j's page at https://neo4j.com/generativeAI/ also describes work on integrating graph databases with LLMs:


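It proposes that Neo4j can be used in the following ways:

Use Neo4j directly in orchestration frameworks such as LangChain and LlamaIndex; add vector embeddings to the Neo4j knowledge graph and index them; generate embeddings for user input with any model provider, cloud or local; find the most relevant nodes via similarity search in the vector index and retrieve contextual information from the knowledge graph; prompt any LLM (cloud or local) with the user's question for natural language search; ground the LLM with the contextual information from retrieval-augmented generation.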

from neo4j import GraphDatabase
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate

# Assumed to be defined elsewhere: `user_input` (the user's question), `driver`
# (a neo4j driver created with GraphDatabase.driver(...)), and format_context().
emb = OpenAIEmbeddings()  # VertexAIEmbeddings() or BedrockEmbeddings() or ...
llm = ChatOpenAI()  # ChatVertexAI() or BedrockChat() or ChatOllama() ...

# embed the user's question
vector = emb.embed_query(user_input)

vector_query = """
// find products by similarity search in vector index
CALL db.index.vector.queryNodes('products', 5, $embedding) YIELD node AS product, score

// enrich with additional explicit relationships from the knowledge graph
MATCH (product)-[:HAS_CATEGORY]->(cat), (product)-[:BY_BRAND]->(brand)
MATCH (product)-[:HAS_REVIEW]->(review {rating:5})<-[:WROTE]-(customer)

// return relevant contextual information
RETURN product.Name, product.Description, brand.Name, cat.Name,
       collect(review { .Date, .Text })[0..5] AS reviews, score
"""

# execute_query returns (records, summary, keys); only the records are needed
records, _, _ = driver.execute_query(vector_query, embedding=vector)
context = format_context(records)

template = """
You are a helpful assistant that helps users find information for their shopping needs.
Only use the context provided, do not add any additional information.
Context: {context}
User question: {question}
"""
chain = ChatPromptTemplate.from_template(template) | llm

answer = chain.invoke({"question": user_input, "context": context}).content
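The snippet above calls `format_context(records)` without defining it. One possible shape for such a helper is sketched below, assuming each row can be indexed by the column names from the `RETURN` clause (`product.Name`, `product.Description`, `brand.Name`, `cat.Name`, `reviews`, `score`). Plain dictionaries with invented values stand in for database records here; the helper's layout and the `max_reviews` parameter are assumptions, not part of Neo4j's or LangChain's API.

```python
def format_context(records, max_reviews=2):
    """Render query rows as plain text for the prompt (hypothetical helper)."""
    blocks = []
    for row in records:
        lines = [
            f"Product: {row['product.Name']} (brand: {row['brand.Name']}, "
            f"category: {row['cat.Name']}, similarity: {row['score']:.2f})",
            f"Description: {row['product.Description']}",
        ]
        # Keep only a few reviews per product so the prompt stays short.
        for review in row["reviews"][:max_reviews]:
            lines.append(f"Review ({review['Date']}): {review['Text']}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Invented sample rows shaped like the RETURN clause of the query.
sample_rows = [
    {
        "product.Name": "Trail Runner",
        "product.Description": "Lightweight running shoe",
        "brand.Name": "Acme",
        "cat.Name": "Shoes",
        "reviews": [
            {"Date": "2023-10-01", "Text": "Very comfortable."},
            {"Date": "2023-10-05", "Text": "Good grip."},
            {"Date": "2023-11-02", "Text": "Runs small."},
        ],
        "score": 0.9132,
    },
]

print(format_context(sample_rows))
```

Capping the number of reviews per product is a simple way to bound prompt length; the vector search already limits the number of products to five.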

Summary

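This article introduced some of the moves existing graph databases are making to combine with LLMs; they too are rethinking their approach, and the work is worth a look. NebulaGraph, Neo4j, Stardog, and TigerGraph, for example, are being combined with LangChain and use LLMs to generate query statements. How well this works, and how it can be deployed in real-world scenarios, still requires further observation and practice.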

References

1. https://www.tigergraph.com/blog/integrating-tigergraph-and-llms-for-generative-ai/

2. https://www.nebula-graph.io/

3. https://www.stardog.com/blog/how-ai-uses-stardog/

4. https://neo4j.com/

5. https://www.nebula-graph.io/posts/knowledge-graphs-via-natural-language

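About us

Lao Liu (Liu Huanyong), an NLP open-source enthusiast and practitioner. Homepage: https://liuhuanyong.github.io.

Lao Liu Talks NLP (老刘说NLP) regularly publishes language resources, engineering practice, technical summaries, and more. You are welcome to follow.

If you would like to join higher-quality discussions on knowledge graphs, event graphs, LLM and AIGC practice, and related sharing, follow the public account and, in the backend menu, click Member Community -> Join Member Group.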
