The limitations of traditional Retrieval-Augmented Generation (RAG) have become increasingly clear as organizations deploy AI systems in production. Vector search – the backbone of conventional RAG – does a reasonable job of finding semantically similar document chunks, but it fundamentally lacks structural understanding. It cannot express that “Apple acquired Beats in 2014” involves a relationship between two entities with a specific type and date. It cannot follow a chain of relationships across multiple documents. It treats the knowledge base as a flat bag of vectors rather than an interconnected web of facts.
Neo4j’s LLM Graph Builder addresses this limitation by bridging the gap between large language models and graph databases. It is an open-source tool that uses LLMs to automatically extract entities and relationships from unstructured documents, then populates a Neo4j knowledge graph with the resulting structured data. The result is a GraphRAG pipeline that combines the semantic understanding of LLMs with the structural precision of graph databases.
The workflow is elegantly simple on the surface: upload documents, select an LLM, click a button, and receive a fully populated knowledge graph. Behind the scenes, LLM Graph Builder orchestrates a complex pipeline of document parsing, chunking, entity extraction, relationship mapping, ontology enforcement, and graph population – all without requiring the user to write any extraction rules or graph schemas.
Pipeline Architecture
The full document-to-graph pipeline operates in six stages:
| Stage | Process | Output |
|---|---|---|
| Ingestion | Load documents from file, URL, or cloud storage | Raw text corpus |
| Chunking | Split documents into LLM-context-sized segments | Text chunks with metadata |
| Extraction | LLM identifies entities and relationships | Extracted triples (subject-predicate-object) |
| Validation | Cross-reference extractions, resolve conflicts | Validated entity graph |
| Ontology Mapping | Map entities to schema nodes and relationships | Graph-compatible structure |
| Population | Write nodes, edges, and properties to Neo4j | Live knowledge graph |
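The chunking and extraction stages can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: `chunk_text`, `extract_triples`, and the `Triple` dataclass are hypothetical names, and a toy regex stands in for the LLM call that performs extraction in the real pipeline.

```python
import re
from dataclasses import dataclass


@dataclass
class Triple:
    """A subject-predicate-object fact extracted from text."""
    subject: str
    predicate: str
    obj: str


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Stage 2: split a document into roughly context-sized chunks,
    breaking on sentence boundaries so facts are not cut in half."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks


def extract_triples(chunk: str) -> list[Triple]:
    """Stage 3, stubbed: in the real pipeline an LLM returns
    subject-predicate-object triples; this toy pattern stands in
    for the model call so the example is self-contained."""
    pattern = re.compile(r"(\w[\w ]*?) (acquired|founded|developed) ([\w ]+?) in \d{4}")
    return [Triple(s, v.upper(), o) for s, v, o in pattern.findall(chunk)]
```

Running the two stages on the example from the introduction yields a structured fact rather than an opaque vector: `extract_triples(chunk_text("Apple acquired Beats in 2014.")[0])` produces a `Triple("Apple", "ACQUIRED", "Beats")` ready for the validation and population stages.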
GraphRAG Query Flow
The following diagram shows how GraphRAG enhances the standard RAG pipeline by leveraging the knowledge graph:
```mermaid
flowchart TD
    Q[User Question] --> Router{Query Router}
    Router -->|"Simple fact<br>lookup"| Vector[Vector Search<br>Semantic Chunks]
    Router -->|"Multi-hop<br>relationship query"| Graph[Graph Query<br>Cypher Traversal]
    Router -->|"Complex<br>reasoning"| Hybrid[Hybrid Search<br>Graph + Vector]
    Vector --> Context1[Retrieved Chunks]
    Graph --> Context2[Graph Subgraph]
    Hybrid --> Context3[Combined Context]
    Context1 --> LLM1[LLM Response]
    Context2 --> LLM2[LLM Response]
    Context3 --> LLM3[LLM Response]
    LLM1 --> Answer[Final Answer]
    LLM2 --> Answer
    LLM3 --> Answer
```

The query router is the key innovation. Simple factual questions go to vector search for speed. Questions requiring relationship traversal – “Which products were developed by companies acquired by Google in the last five years?” – are routed to the graph query engine. Complex questions use both sources, combining the broad coverage of vector search with the structural precision of graph traversal.
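To make the routing decision concrete, here is a deliberately simple Python sketch. Production routers are typically LLM- or classifier-based; the keyword heuristics, cue lists, and function name below are illustrative assumptions, not how any particular router is implemented.

```python
import re

# Toy routing heuristics standing in for an LLM-based query classifier:
# multi-hop relationship cues go to the graph engine, open-ended analytic
# cues go to hybrid search, and everything else falls back to vector search.
GRAPH_CUES = re.compile(
    r"\b(acquired by|developed by|owned by|subsidiar|connected to|related to)\b", re.I
)
HYBRID_CUES = re.compile(r"\b(why|compare|explain|impact|trend)\b", re.I)


def route_query(question: str) -> str:
    """Return which retrieval backend should handle the question."""
    if GRAPH_CUES.search(question):
        return "graph"   # Cypher traversal over the knowledge graph
    if HYBRID_CUES.search(question):
        return "hybrid"  # graph traversal combined with vector search
    return "vector"      # plain semantic chunk lookup
```

The acquisition example from above contains two relationship cues and is routed to the graph engine, while a simple fact lookup such as “What year was Neo4j founded?” takes the fast vector path.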
Entity Extraction Quality
The quality of entity extraction varies significantly by LLM and document type. The following table shows benchmark results across commonly used models:
| Model | Entity Precision | Relationship Accuracy | Coverage | Speed | Cost per 1000 docs |
|---|---|---|---|---|---|
| GPT-4o | 94% | 89% | 92% | Fast | $12.50 |
| Claude 3.5 Sonnet | 96% | 91% | 93% | Fast | $10.00 |
| Claude 4 Sonnet | 97% | 93% | 95% | Very Fast | $10.00 |
| Gemini 1.5 Pro | 91% | 85% | 88% | Moderate | $8.00 |
| Llama 3 (local) | 82% | 74% | 79% | Slow | Free |
| Qwen 2.5 (local) | 80% | 71% | 76% | Slow | Free |
Enterprise users typically prefer Claude 4 Sonnet for its best-in-class entity precision and relationship accuracy, while smaller teams or privacy-sensitive deployments may opt for local Llama models despite lower extraction quality.
Getting Started
To begin building knowledge graphs from your documents, visit the LLM Graph Builder GitHub repository. The repository includes Docker Compose files for a complete stack (LLM Graph Builder + Neo4j), sample documents for testing, and integration guides for connecting to different LLM providers.
The Neo4j GraphRAG documentation provides comprehensive guides for building GraphRAG applications, including query optimization, schema design, and performance tuning.
FAQ
What is LLM Graph Builder?
LLM Graph Builder is an open-source tool from Neo4j Labs that uses large language models to automatically convert unstructured documents into structured knowledge graphs. It extracts entities and relationships from text and maps them directly into a Neo4j graph database.
How does it differ from traditional RAG?
Traditional RAG retrieves document chunks by vector similarity, which lacks structural understanding. GraphRAG, enabled by LLM Graph Builder, preserves entity relationships and hierarchies, enabling multi-hop reasoning queries that vector search alone cannot support accurately.
What document formats are supported?
LLM Graph Builder supports PDF, HTML, Markdown, JSON, CSV, XML, and plain text files. Documents can be uploaded directly through the UI or ingested from URLs, S3 buckets, Google Drive, and SharePoint. The system handles both structured and semi-structured content.
Which LLMs can I use for extraction?
The tool supports OpenAI (GPT-4o), Anthropic (Claude 3/4), Google (Gemini), and local models through Ollama. The choice of LLM affects extraction quality and cost, with stronger models typically producing more accurate entity and relationship identification.
What is GraphRAG?
GraphRAG (Graph-based Retrieval-Augmented Generation) is an evolution of RAG that represents knowledge as a graph of entities and relationships rather than flat document chunks. This enables the LLM to traverse connections between concepts, answer multi-hop questions, and provide more contextually grounded responses.
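The multi-hop traversal that distinguishes GraphRAG from flat chunk retrieval can be shown with a tiny in-memory graph. The triples, entity names, and helper functions below are illustrative stand-ins; in a real deployment the same chain of hops would be a Cypher query against Neo4j.

```python
from collections import defaultdict

# A tiny in-memory stand-in for the knowledge graph: each edge is a
# (subject, predicate, object) triple. The entities are illustrative,
# not taken from any real dataset.
TRIPLES = [
    ("Google", "ACQUIRED", "DeepMind"),
    ("Google", "ACQUIRED", "Fitbit"),
    ("DeepMind", "DEVELOPED", "AlphaFold"),
    ("Fitbit", "DEVELOPED", "Sense"),
]


def build_index(triples):
    """Index edges by (subject, predicate) for O(1) hop expansion."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def multi_hop(index, start, predicates):
    """Follow a chain of relationship types from a start entity.
    E.g. the chain ACQUIRED -> DEVELOPED answers 'products developed
    by companies acquired by <start>' - a question vector similarity
    over flat chunks cannot reliably answer."""
    frontier = [start]
    for pred in predicates:
        frontier = [o for s in frontier for o in index[(s, pred)]]
    return frontier
```

Asking for `multi_hop(index, "Google", ["ACQUIRED", "DEVELOPED"])` walks two hops and returns the products of the acquired companies, regardless of whether any single document chunk mentions both facts together.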
Further Reading
- LLM Graph Builder GitHub Repository – Source code, documentation, and sample projects
- Neo4j GenAI Integration Documentation – Building GraphRAG applications with Neo4j
- Microsoft GraphRAG Paper – The original research behind GraphRAG methodology
- Memgraph Database Guide – Alternative graph database for real-time processing