LLM Graph Builder: Neo4j's RAG-to-Graph Pipeline

LLM Graph Builder converts unstructured documents into Neo4j knowledge graphs using LLMs, enabling GraphRAG with entity and relationship extraction.

The limitations of traditional Retrieval-Augmented Generation (RAG) have become increasingly clear as organizations deploy AI systems in production. Vector search – the backbone of conventional RAG – does a reasonable job of finding semantically similar document chunks, but it fundamentally lacks structural understanding. It cannot express that “Apple acquired Beats in 2014” involves a relationship between two entities with a specific type and date. It cannot follow a chain of relationships across multiple documents. It treats the knowledge base as a flat bag of vectors rather than an interconnected web of facts.

Neo4j’s LLM Graph Builder addresses this limitation by bridging the gap between large language models and graph databases. It is an open-source tool that uses LLMs to automatically extract entities and relationships from unstructured documents, then populates a Neo4j knowledge graph with the resulting structured data. The output is a GraphRAG pipeline that combines the semantic understanding of LLMs with the structural precision of graph databases.

The workflow is elegantly simple on the surface: upload documents, select an LLM, click a button, and receive a fully populated knowledge graph. Behind the scenes, LLM Graph Builder orchestrates a complex pipeline of document parsing, chunking, entity extraction, relationship mapping, ontology enforcement, and graph population – all without requiring the user to write any extraction rules or graph schemas.

Pipeline Architecture

The full document-to-graph pipeline operates in six stages:

| Stage | Process | Output |
|---|---|---|
| Ingestion | Load documents from file, URL, or cloud storage | Raw text corpus |
| Chunking | Split documents into LLM-context-sized segments | Text chunks with metadata |
| Extraction | LLM identifies entities and relationships | Extracted triples (subject-predicate-object) |
| Validation | Cross-reference extractions, resolve conflicts | Validated entity graph |
| Ontology Mapping | Map entities to schema nodes and relationships | Graph-compatible structure |
| Population | Write nodes, edges, and properties to Neo4j | Live knowledge graph |
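To make the middle stages concrete, here is a minimal sketch of the chunking and population steps in Python. The `Triple` type, the sentence-boundary chunker, and the Cypher rendering are illustrative assumptions, not LLM Graph Builder's actual internals; the real tool uses an LLM for the extraction stage and writes to Neo4j through a driver.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """A subject-predicate-object fact, as produced by the extraction stage."""
    subject: str
    predicate: str
    obj: str

def chunk(text: str, max_chars: int = 200) -> list[str]:
    """Chunking stage: split text into roughly max_chars-sized pieces
    on sentence boundaries, so each fits in the LLM's context."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += " " + s
    if current.strip():
        chunks.append(current.strip())
    return chunks

def triples_to_cypher(triples: list[Triple]) -> list[str]:
    """Population stage: render triples as idempotent Cypher MERGE statements."""
    return [
        f'MERGE (a:Entity {{name: "{t.subject}"}}) '
        f'MERGE (b:Entity {{name: "{t.obj}"}}) '
        f'MERGE (a)-[:{t.predicate}]->(b)'
        for t in triples
    ]

# Pretend the extraction stage returned this triple for one chunk.
extracted = [Triple("Apple", "ACQUIRED", "Beats")]
for stmt in triples_to_cypher(extracted):
    print(stmt)
```

Using `MERGE` rather than `CREATE` matters in practice: the same entity is typically extracted from many chunks, and `MERGE` deduplicates nodes instead of creating one per mention.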

GraphRAG Query Flow

GraphRAG enhances the standard RAG pipeline by placing a query router in front of two retrieval paths: vector search over chunks and traversal over the knowledge graph.

The query router is the key innovation. Simple factual questions go to vector search for speed. Questions requiring relationship traversal – “Which products were developed by companies acquired by Google in the last five years?” – are routed to the graph query engine. Complex questions use both sources, combining the broad coverage of vector search with the structural precision of graph traversal.
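A minimal routing sketch is shown below. In a real deployment the router would itself be an LLM classifier; the keyword heuristic here is a deliberately simple stand-in, and the keyword lists are assumptions chosen only for illustration.

```python
def route_query(question: str) -> str:
    """Toy query router: returns 'vector', 'graph', or 'hybrid'.
    A keyword heuristic stands in for an LLM-based classifier."""
    q = question.lower()
    # Signals that the question needs relationship traversal.
    relational = any(kw in q for kw in
                     ("acquired", "developed by", "related to",
                      "connected", "between", "who owns"))
    # Signals that the question needs broad semantic coverage.
    broad = any(kw in q for kw in ("summarize", "overview", "explain", "compare"))
    if relational and broad:
        return "hybrid"
    if relational:
        return "graph"
    return "vector"

print(route_query(
    "Which products were developed by companies acquired by Google?"))  # graph
```

The three return values map onto the three paths described above: fast vector lookup for simple factual questions, graph traversal for relational ones, and both for complex questions.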

Entity Extraction Quality

The quality of entity extraction varies significantly by LLM and document type. The following table shows benchmark results across commonly used models:

| Model | Entity Precision | Relationship Accuracy | Coverage | Speed | Cost per 1000 docs |
|---|---|---|---|---|---|
| GPT-4o | 94% | 89% | 92% | Fast | $12.50 |
| Claude 3.5 Sonnet | 96% | 91% | 93% | Fast | $10.00 |
| Claude 4 Sonnet | 97% | 93% | 95% | Very Fast | $10.00 |
| Gemini 1.5 Pro | 91% | 85% | 88% | Moderate | $8.00 |
| Llama 3 (local) | 82% | 74% | 79% | Slow | Free |
| Qwen 2.5 (local) | 80% | 71% | 76% | Slow | Free |

Enterprise users typically prefer Claude 4 Sonnet for its best-in-class entity precision and relationship accuracy, while smaller teams or privacy-sensitive deployments may opt for local Llama models despite lower extraction quality.
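The trade-off between quality, cost, and data locality can be expressed as a simple selection function. The sketch below hardcodes the benchmark figures from the table; the `pick_model` helper and its constraints are illustrative, not part of LLM Graph Builder.

```python
# Benchmark figures copied from the table above; cost is USD per 1,000 documents.
MODELS = {
    "GPT-4o":            {"precision": 0.94, "rel_acc": 0.89, "cost": 12.50, "local": False},
    "Claude 3.5 Sonnet": {"precision": 0.96, "rel_acc": 0.91, "cost": 10.00, "local": False},
    "Claude 4 Sonnet":   {"precision": 0.97, "rel_acc": 0.93, "cost": 10.00, "local": False},
    "Gemini 1.5 Pro":    {"precision": 0.91, "rel_acc": 0.85, "cost":  8.00, "local": False},
    "Llama 3 (local)":   {"precision": 0.82, "rel_acc": 0.74, "cost":  0.00, "local": True},
    "Qwen 2.5 (local)":  {"precision": 0.80, "rel_acc": 0.71, "cost":  0.00, "local": True},
}

def pick_model(max_cost: float, require_local: bool = False) -> str:
    """Return the highest-precision model within budget, optionally
    restricted to locally hosted models for privacy-sensitive deployments."""
    candidates = {
        name: m for name, m in MODELS.items()
        if m["cost"] <= max_cost and (m["local"] or not require_local)
    }
    return max(candidates, key=lambda n: candidates[n]["precision"])

print(pick_model(max_cost=10.00))                      # Claude 4 Sonnet
print(pick_model(max_cost=10.00, require_local=True))  # Llama 3 (local)
```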

Getting Started

To begin building knowledge graphs from your documents, visit the LLM Graph Builder GitHub repository. The repository includes Docker Compose files for a complete stack (LLM Graph Builder + Neo4j), sample documents for testing, and integration guides for connecting to different LLM providers.

The Neo4j GraphRAG documentation provides comprehensive guides for building GraphRAG applications, including query optimization, schema design, and performance tuning.

FAQ

What is LLM Graph Builder?

LLM Graph Builder is an open-source tool from Neo4j Labs that uses large language models to automatically convert unstructured documents into structured knowledge graphs. It extracts entities and relationships from text and maps them directly into a Neo4j graph database.

How does it differ from traditional RAG?

Traditional RAG retrieves document chunks by vector similarity, which lacks structural understanding. GraphRAG, enabled by LLM Graph Builder, preserves entity relationships and hierarchies, enabling multi-hop reasoning queries that vector search alone cannot support accurately.
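Multi-hop reasoning is easiest to see on a tiny graph. The example below uses an in-memory adjacency map as a stand-in for Neo4j; the node names, edge types, and `multi_hop` helper are all hypothetical, chosen to mirror the "products developed by companies acquired by Google" question from earlier.

```python
# Tiny in-memory stand-in for the knowledge graph: node -> [(edge_type, target)].
GRAPH = {
    "Google":   [("ACQUIRED", "DeepMind"), ("ACQUIRED", "Fitbit")],
    "DeepMind": [("DEVELOPED", "AlphaFold")],
    "Fitbit":   [("DEVELOPED", "Charge 6")],
}

def multi_hop(start: str, path: list[str]) -> list[str]:
    """Follow a sequence of relationship types from `start`, returning
    the nodes reached after the final hop."""
    frontier = [start]
    for rel in path:
        frontier = [dst for node in frontier
                    for edge, dst in GRAPH.get(node, []) if edge == rel]
    return frontier

# "Which products were developed by companies acquired by Google?"
print(multi_hop("Google", ["ACQUIRED", "DEVELOPED"]))
```

Vector search over flat chunks would need the acquisition fact and the product fact to co-occur in one retrieved passage; the graph answers the question by composition, regardless of which documents the two facts came from.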

What document formats are supported?

LLM Graph Builder supports PDF, HTML, Markdown, JSON, CSV, XML, and plain text files. Documents can be uploaded directly through the UI or ingested from URLs, S3 buckets, Google Drive, and SharePoint. The system handles both structured and semi-structured content.

Which LLMs can I use for extraction?

The tool supports OpenAI (GPT-4o), Anthropic (Claude 3/4), Google (Gemini), and local models through Ollama. The choice of LLM affects extraction quality and cost, with stronger models typically producing more accurate entity and relationship identification.

What is GraphRAG?

GraphRAG (Graph-based Retrieval-Augmented Generation) is an evolution of RAG that represents knowledge as a graph of entities and relationships rather than flat document chunks. This enables the LLM to traverse connections between concepts, answer multi-hop questions, and provide more contextually grounded responses.
