The limitations of traditional Retrieval-Augmented Generation (RAG) have become increasingly clear as organizations deploy AI systems in production. Vector search – the backbone of conventional RAG – does a reasonable job of finding semantically similar document chunks, but it fundamentally lacks structural understanding. It cannot express that “Apple acquired Beats in 2014” involves a relationship between two entities with a specific type and date. It cannot follow a chain of relationships across multiple documents. It treats the knowledge base as a flat bag of vectors rather than an interconnected web of facts.
Neo4j’s LLM Graph Builder addresses this limitation by bridging the gap between large language models and graph databases. It is an open-source tool that uses LLMs to automatically extract entities and relationships from unstructured documents, then populates a Neo4j knowledge graph with the resulting structured data. The result is a GraphRAG pipeline that combines the semantic understanding of LLMs with the structural precision of graph databases.
The workflow is elegantly simple on the surface: upload documents, select an LLM, click a button, and receive a fully populated knowledge graph. Behind the scenes, LLM Graph Builder orchestrates a complex pipeline of document parsing, chunking, entity extraction, relationship mapping, ontology enforcement, and graph population – all without requiring the user to write any extraction rules or graph schemas.
Pipeline Architecture
The full document-to-graph pipeline operates in six stages:
| Stage | Process | Output |
|---|---|---|
| Ingestion | Load documents from file, URL, or cloud storage | Raw text corpus |
| Chunking | Split documents into LLM-context-sized segments | Text chunks with metadata |
| Extraction | LLM identifies entities and relationships | Extracted triples (subject-predicate-object) |
| Validation | Cross-reference extractions, resolve conflicts | Validated entity graph |
| Ontology Mapping | Map entities to schema nodes and relationships | Graph-compatible structure |
| Population | Write nodes, edges, and properties to Neo4j | Live knowledge graph |
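The chunking and extraction stages can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: `chunk_text`, `extract_triples`, and the `Triple` dataclass are hypothetical names, and a toy regex stands in for the LLM call that performs extraction in the real pipeline.

```python
import re
from dataclasses import dataclass


@dataclass
class Triple:
    """A subject-predicate-object fact extracted from text."""
    subject: str
    predicate: str
    obj: str


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Stage 2: split a document into roughly context-sized chunks,
    breaking on sentence boundaries so facts are not cut in half."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks


def extract_triples(chunk: str) -> list[Triple]:
    """Stage 3, stubbed: in the real pipeline an LLM returns
    subject-predicate-object triples; this toy pattern stands in
    for the model call so the example is self-contained."""
    pattern = re.compile(r"(\w[\w ]*?) (acquired|founded|developed) ([\w ]+?) in \d{4}")
    return [Triple(s, v.upper(), o) for s, v, o in pattern.findall(chunk)]
```

Running the two stages on the example from the introduction yields a structured fact rather than an opaque vector: `extract_triples(chunk_text("Apple acquired Beats in 2014.")[0])` produces a `Triple("Apple", "ACQUIRED", "Beats")` ready for the validation and population stages.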
GraphRAG Query Flow
The following diagram shows how GraphRAG enhances the standard RAG pipeline by leveraging the knowledge graph:
```mermaid
flowchart TD
    Q[User Question] --> Router{Query Router}
    Router -->|"Simple fact<br>lookup"| Vector[Vector Search<br>Semantic Chunks]
    Router -->|"Multi-hop<br>relationship query"| Graph[Graph Query<br>Cypher Traversal]
    Router -->|"Complex<br>reasoning"| Hybrid[Hybrid Search<br>Graph + Vector]
    Vector --> Context1[Retrieved Chunks]
    Graph --> Context2[Graph Subgraph]
    Hybrid --> Context3[Combined Context]
    Context1 --> LLM1[LLM Response]
    Context2 --> LLM2[LLM Response]
    Context3 --> LLM3[LLM Response]
    LLM1 --> Answer[Final Answer]
    LLM2 --> Answer
    LLM3 --> Answer
```

The query router is the key innovation. Simple factual questions go to vector search for speed. Questions requiring relationship traversal – “Which products were developed by companies acquired by Google in the last five years?” – are routed to the graph query engine. Complex questions use both sources, combining the broad coverage of vector search with the structural precision of graph traversal.
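To make the routing decision concrete, here is a deliberately simple Python sketch. Production routers are typically LLM- or classifier-based; the keyword heuristics, cue lists, and function name below are illustrative assumptions, not how any particular router is implemented.

```python
import re

# Toy routing heuristics standing in for an LLM-based query classifier:
# multi-hop relationship cues go to the graph engine, open-ended analytic
# cues go to hybrid search, and everything else falls back to vector search.
GRAPH_CUES = re.compile(
    r"\b(acquired by|developed by|owned by|subsidiar|connected to|related to)\b", re.I
)
HYBRID_CUES = re.compile(r"\b(why|compare|explain|impact|trend)\b", re.I)


def route_query(question: str) -> str:
    """Return which retrieval backend should handle the question."""
    if GRAPH_CUES.search(question):
        return "graph"   # Cypher traversal over the knowledge graph
    if HYBRID_CUES.search(question):
        return "hybrid"  # graph traversal combined with vector search
    return "vector"      # plain semantic chunk lookup
```

The acquisition example from above contains two relationship cues and is routed to the graph engine, while a simple fact lookup such as “What year was Neo4j founded?” takes the fast vector path.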
Entity Extraction Quality
The quality of entity extraction varies significantly by LLM and document type. The following table shows benchmark results across commonly used models:
| Model | Entity Precision | Relationship Accuracy | Coverage | Speed | Cost per 1000 docs |
|---|---|---|---|---|---|
| GPT-4o | 94% | 89% | 92% | Fast | $12.50 |
| Claude 3.5 Sonnet | 96% | 91% | 93% | Fast | $10.00 |
| Claude 4 Sonnet | 97% | 93% | 95% | Very Fast | $10.00 |
| Gemini 1.5 Pro | 91% | 85% | 88% | Moderate | $8.00 |
| Llama 3 (local) | 82% | 74% | 79% | Slow | Free |
| Qwen 2.5 (local) | 80% | 71% | 76% | Slow | Free |
Enterprise users typically prefer Claude 4 Sonnet for its best-in-class entity precision and relationship accuracy, while smaller teams or privacy-sensitive deployments may opt for local Llama models despite lower extraction quality.
Getting Started
To begin building knowledge graphs from your documents, visit the LLM Graph Builder GitHub repository. The repository includes Docker Compose files for a complete stack (LLM Graph Builder + Neo4j), sample documents for testing, and integration guides for connecting to different LLM providers.
The Neo4j GraphRAG documentation provides comprehensive guides for building GraphRAG applications, including query optimization, schema design, and performance tuning.
FAQ
What is LLM Graph Builder?
LLM Graph Builder is an open-source tool from Neo4j Labs that uses large language models to automatically convert unstructured documents into structured knowledge graphs. It extracts entities and relationships from text and maps them directly into a Neo4j graph database.
How does it differ from traditional RAG?
Traditional RAG retrieves document chunks by vector similarity, which lacks structural understanding. GraphRAG, enabled by LLM Graph Builder, preserves entity relationships and hierarchies, enabling multi-hop reasoning queries that vector search alone cannot support accurately.
What document formats are supported?
LLM Graph Builder supports PDF, HTML, Markdown, JSON, CSV, XML, and plain text files. Documents can be uploaded directly through the UI or ingested from URLs, S3 buckets, Google Drive, and SharePoint. The system handles both structured and semi-structured content.
Which LLMs can I use for extraction?
The tool supports OpenAI (GPT-4o), Anthropic (Claude 3/4), Google (Gemini), and local models through Ollama. The choice of LLM affects extraction quality and cost, with stronger models typically producing more accurate entity and relationship identification.
What is GraphRAG?
GraphRAG (Graph-based Retrieval-Augmented Generation) is an evolution of RAG that represents knowledge as a graph of entities and relationships rather than flat document chunks. This enables the LLM to traverse connections between concepts, answer multi-hop questions, and provide more contextually grounded responses.
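The multi-hop traversal that distinguishes GraphRAG from flat chunk retrieval can be shown with a tiny in-memory graph. The triples, entity names, and helper functions below are illustrative stand-ins; in a real deployment the same chain of hops would be a Cypher query against Neo4j.

```python
from collections import defaultdict

# A tiny in-memory stand-in for the knowledge graph: each edge is a
# (subject, predicate, object) triple. The entities are illustrative,
# not taken from any real dataset.
TRIPLES = [
    ("Google", "ACQUIRED", "DeepMind"),
    ("Google", "ACQUIRED", "Fitbit"),
    ("DeepMind", "DEVELOPED", "AlphaFold"),
    ("Fitbit", "DEVELOPED", "Sense"),
]


def build_index(triples):
    """Index edges by (subject, predicate) for O(1) hop expansion."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def multi_hop(index, start, predicates):
    """Follow a chain of relationship types from a start entity.
    E.g. the chain ACQUIRED -> DEVELOPED answers 'products developed
    by companies acquired by <start>' - a question vector similarity
    over flat chunks cannot reliably answer."""
    frontier = [start]
    for pred in predicates:
        frontier = [o for s in frontier for o in index[(s, pred)]]
    return frontier
```

Asking for `multi_hop(index, "Google", ["ACQUIRED", "DEVELOPED"])` walks two hops and returns the products of the acquired companies, regardless of whether any single document chunk mentions both facts together.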
Further Reading
- LLM Graph Builder GitHub Repository – Source code, documentation, and sample projects
- Neo4j GenAI Integration Documentation – Building GraphRAG applications with Neo4j
- Microsoft GraphRAG Paper – The original research behind GraphRAG methodology
- Memgraph Database Guide – Alternative graph database for real-time processing