Understanding unfamiliar codebases is one of the hardest challenges in software development. Code-Graph by FalkorDB tackles this problem in a novel way: by transforming source code repositories into fully queryable knowledge graphs that you can interrogate in natural language.
Instead of reading through files linearly or relying on code search tools that treat code as flat text, Code-Graph analyzes your codebase at the Abstract Syntax Tree (AST) level, extracting every significant entity – classes, functions, methods, modules, arguments, variables – and mapping their relationships into a property graph stored in FalkorDB. The result is a structured, navigable representation of your entire codebase that supports natural language queries like “Show me all classes that depend on the DatabaseConnection class” or “Find unused utility functions in the auth module.”
```mermaid
graph TD
A[GitHub Repository URL] --> B[AST Parser]
B --> C[Entity Extraction]
B --> D[Relationship Mapping]
C --> E[FalkorDB Property Graph]
D --> E
F[Natural Language Query] --> G[LLM Cypher Translation]
G --> E
E --> H[Query Results]
H --> I[LLM Answer Summary]
I --> J[Developer]
```
What Is Code-Graph?
Code-Graph is an open-source developer tool that converts source code repositories into queryable knowledge graphs. It was created by the FalkorDB team to demonstrate the power of graph databases in code analysis scenarios, and it has grown into a standalone tool with active development and a growing feature set.
The tool operates in two phases. In the analysis phase, it downloads a repository, parses every source file with language-specific AST parsers, and builds a graph in FalkorDB where:
- Nodes represent code entities (classes, functions, modules, interfaces, variables)
- Edges represent relationships (CALLS, INHERITS_FROM, CONTAINS, HAS_ARGUMENT, DEPENDS_ON, IMPLEMENTS)
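The data model itself can be illustrated with a small in-memory graph. The following Python code is a sketch of that model, not Code-Graph's implementation. The `CodeGraph`, `Node`, and `Edge` classes and the `name`/`path` properties are hypothetical. The labels and relationship types come from the lists above, and in the real tool the graph lives in FalkorDB rather than in Python objects.

```python
from dataclasses import dataclass, field

# Labels and relationship types taken from the article; properties are hypothetical
NODE_LABELS = {"Module", "Class", "Interface", "Function", "Method", "Argument", "Variable"}
EDGE_TYPES = {"CALLS", "INHERITS_FROM", "CONTAINS", "HAS_ARGUMENT", "DEPENDS_ON", "IMPLEMENTS"}


@dataclass(frozen=True)
class Node:
    id: int
    label: str
    name: str
    path: str = ""  # source file the entity was found in


@dataclass(frozen=True)
class Edge:
    src: int
    rel: str
    dst: int


@dataclass
class CodeGraph:
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, label: str, name: str, path: str = "") -> Node:
        if label not in NODE_LABELS:
            raise ValueError(f"unknown label {label!r}")
        node = Node(len(self.nodes), label, name, path)
        self.nodes[node.id] = node
        return node

    def add_edge(self, src: Node, rel: str, dst: Node) -> None:
        if rel not in EDGE_TYPES:
            raise ValueError(f"unknown relationship {rel!r}")
        self.edges.append(Edge(src.id, rel, dst.id))

    def neighbors(self, node: Node, rel: str) -> list[Node]:
        """Follow outgoing edges of a single relationship type."""
        return [self.nodes[e.dst] for e in self.edges if e.src == node.id and e.rel == rel]


graph = CodeGraph()
auth = graph.add_node("Module", "auth", "auth/__init__.py")
login = graph.add_node("Class", "LoginService", "auth/service.py")
graph.add_edge(auth, "CONTAINS", login)
print([n.name for n in graph.neighbors(auth, "CONTAINS")])  # ['LoginService']
```

Because edges are typed, a question such as "what does this module contain?" becomes a traversal over one relationship type instead of a text search.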
In the query phase, users ask questions in plain English. An LLM (configurable, with options such as GPT-4o and Llama 3 70B) translates each question into an OpenCypher query, Code-Graph executes that query against the FalkorDB graph, and the LLM summarizes the results into a natural-language answer.
Which Programming Languages Are Supported?
Code-Graph currently supports three programming languages, with more on the roadmap:
| Language | Status | AST Parser Used |
|---|---|---|
| Python | Supported | Built-in Python AST module |
| Java | Supported | JavaParser |
| C# | Supported | Roslyn (Microsoft.CodeAnalysis) |
| C/C++ | Planned | TBD |
| JavaScript/TypeScript | Planned | TBD |
| Go | Planned | TBD |
Each language parser extracts language-specific entities and relationships. For example, Java parsing captures interfaces, abstract classes, annotations, and generic type parameters, while Python parsing captures duck-typing patterns, decorators, and module-level functions.
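For Python, the standard library's `ast` module already exposes these constructs. The following sketch shows the idea behind Python entity extraction. It is an illustration, not Code-Graph's parser, and `extract_entities` and its dictionary keys are hypothetical. It collects module-level functions, classes with their base classes, and methods with their decorators.

```python
import ast

# Small sample module used to exercise the extractor
SOURCE = '''
import functools

def helper(x):
    return x * 2

class Base:
    pass

class Service(Base):
    @functools.lru_cache
    def run(self, n: int) -> int:
        return helper(n)
'''


def _decorators(node):
    # Render each decorator expression back to source text, e.g. "functools.lru_cache"
    return [ast.unparse(d) for d in node.decorator_list]


def extract_entities(source: str) -> list:
    """Collect module-level functions, classes (with bases), and their methods."""
    entities = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities.append({"kind": "Function", "name": node.name,
                             "line": node.lineno, "decorators": _decorators(node)})
        elif isinstance(node, ast.ClassDef):
            entities.append({"kind": "Class", "name": node.name, "line": node.lineno,
                             "bases": [ast.unparse(b) for b in node.bases],
                             "decorators": _decorators(node)})
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    entities.append({"kind": "Method", "name": f"{node.name}.{item.name}",
                                     "line": item.lineno, "decorators": _decorators(item)})
    return entities


for entity in extract_entities(SOURCE):
    print(entity)
```

This sketch deliberately ignores nested classes and functions. A full extractor would recurse into them and record a CONTAINS edge for each level. `ast.unparse` requires Python 3.9 or later.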
How Does Code-Graph Work in Detail?
The analysis pipeline consists of several stages:
| Stage | Input | Output | Description |
|---|---|---|---|
| Repository Clone | GitHub URL | Local git clone | Downloads the target repository |
| AST Parsing | Source files | Entity AST nodes | Language-specific parsers break code into structural elements |
| Entity Extraction | AST nodes | Graph nodes | Classes, functions, modules, arguments become FalkorDB nodes |
| Relationship Mapping | AST structure | Graph edges | CALLS, INHERITS_FROM, CONTAINS, DEPENDS_ON become edges |
| Graph Storage | Nodes + Edges | FalkorDB property graph | All data is persisted in FalkorDB with properties |
| Query Translation | Natural language | OpenCypher query | LLM converts English questions to graph queries |
| Result Rendering | Query results | Natural language answer | LLM summarizes returned graph data |
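Taken together, the Entity Extraction, Relationship Mapping, and Graph Storage stages can be sketched end to end for a single Python file. This sketch rests on several assumptions. `module_to_statements` is a hypothetical helper, and the node properties (`name`, `line`) are not taken from Code-Graph's actual schema. The helper emits parameterized OpenCypher `MERGE` statements. A client such as the `falkordb` Python package, through `Graph.query(query, params)`, could then send them to FalkorDB.

```python
import ast


def module_to_statements(module_name: str, source: str) -> list:
    """Turn one Python module into (cypher, params) pairs:
    Module/Class/Function nodes plus CONTAINS edges."""
    stmts = [("MERGE (:Module {name: $name})", {"name": module_name})]
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            label = "Class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            label = "Function"
        else:
            continue
        # Cypher cannot take labels as parameters, so they come from the fixed choice above
        stmts.append((f"MERGE (:{label} {{name: $name, line: $line}})",
                      {"name": node.name, "line": node.lineno}))
        stmts.append((
            f"MATCH (m:Module {{name: $module}}), (e:{label} {{name: $name}}) "
            "MERGE (m)-[:CONTAINS]->(e)",
            {"module": module_name, "name": node.name},
        ))
    return stmts


for query, params in module_to_statements("auth", "class A:\n    pass\ndef f():\n    pass\n"):
    print(query, params)
```

`MERGE` makes re-running the analysis on unchanged code idempotent. Keying entities by bare name is a simplification, though. Two modules that define a class with the same name would be matched together, so a real schema would also need a file path or other unique key.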
The entity extraction is particularly thorough. For each class, Code-Graph records its methods, fields, base classes, implemented interfaces, decorators (Python) or annotations (Java), and file location. For each function, it records parameters, return type, called functions, and access modifiers.
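To illustrate the per-function record, the following Python sketch reads parameters, the return annotation, and called functions from the AST. `FunctionRecord` and `describe_functions` are hypothetical names, not part of Code-Graph. Python has no access modifiers, so the sketch leaves them out. A Java or C# parser would read them from the declaration instead.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionRecord:
    name: str
    params: List[str]
    return_type: Optional[str]  # None when the function has no return annotation
    calls: List[str]            # callee expressions as written, e.g. "json.load"


def describe_functions(source: str) -> List[FunctionRecord]:
    records = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        a = node.args
        # Parameter names in signature order: positional, *args, keyword-only, **kwargs
        params = [p.arg for p in a.posonlyargs + a.args]
        if a.vararg:
            params.append("*" + a.vararg.arg)
        params += [p.arg for p in a.kwonlyargs]
        if a.kwarg:
            params.append("**" + a.kwarg.arg)
        returns = ast.unparse(node.returns) if node.returns else None
        # Every call expression in the body (this also counts calls inside nested functions)
        calls = sorted({ast.unparse(c.func) for c in ast.walk(node) if isinstance(c, ast.Call)})
        records.append(FunctionRecord(node.name, params, returns, calls))
    return records


SOURCE = '''
def load(path: str, *, strict: bool = False) -> dict:
    with open(path) as fh:
        return json.load(fh)
'''
print(describe_functions(SOURCE)[0])
# FunctionRecord(name='load', params=['path', 'strict'], return_type='dict', calls=['json.load', 'open'])
```

The `calls` list holds names as they appear in the source. Resolving `json.load` to a specific function node, which a CALLS edge needs, requires an extra step that follows imports.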
How Does LLM Integration Work?
The LLM integration is the key to Code-Graph’s usability. Instead of requiring developers to learn Cypher syntax, the tool accepts natural language queries and uses an LLM to translate them:
Example workflow:
- Developer asks: “Which classes in the data access layer use the connection pool?”
- LLM generates:
  ```cypher
  MATCH (c:Class)-[:CONTAINS]->(m:Method)-[:CALLS]->(f:Function)
  WHERE f.name CONTAINS 'getConnection'
  RETURN c.name, m.name
  ```
- FalkorDB executes the query, returning matching classes and methods
- LLM summarizes the results in plain English
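In the translation step, the LLM usually needs the graph schema, and its output should be checked before anything runs. The following Python sketch shows one possible approach. The `SCHEMA` text, `build_prompt`, and `is_read_only` are all hypothetical, because this article does not show Code-Graph's actual prompt. The keyword check is coarse and errs on the side of rejection. Where available, a read-only execution path such as FalkorDB's `GRAPH.RO_QUERY` command is a sturdier guard.

```python
import re

# Hypothetical schema summary given to the LLM; Code-Graph's actual prompt is not shown here
SCHEMA = (
    "Node labels: Module, Class, Interface, Function, Method, Variable\n"
    "Relationships: CALLS, INHERITS_FROM, CONTAINS, HAS_ARGUMENT, DEPENDS_ON, IMPLEMENTS"
)

# Cypher clauses that modify the graph
WRITE_CLAUSE = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE)
# Single- or double-quoted string literals, allowing backslash escapes
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")


def build_prompt(question: str) -> str:
    return (
        "Translate the question into one read-only OpenCypher query.\n"
        f"Graph schema:\n{SCHEMA}\n"
        f"Question: {question}\n"
        "Cypher:"
    )


def is_read_only(cypher: str) -> bool:
    """Coarse check: reject generated queries that contain a write clause."""
    # Blank out string literals so a function literally named 'set' is not mistaken for SET
    code_only = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSE.search(code_only) is None


generated = ("MATCH (c:Class)-[:CONTAINS]->(m:Method)-[:CALLS]->(f:Function) "
             "WHERE f.name CONTAINS 'getConnection' RETURN c.name, m.name")
print(is_read_only(generated))                             # True
print(is_read_only("MATCH (f:Function) DETACH DELETE f"))  # False
```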
This approach makes codebase exploration accessible to junior developers, onboarding engineers, and anyone who needs to understand a codebase without memorizing its file structure. It also supports more advanced use cases like impact analysis (“If I change the DatabaseConnection class, which other classes will be affected?”) and dead code detection (“List all public methods that are never called from anywhere in the codebase”).
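Both use cases are graph traversals. Impact analysis follows DEPENDS_ON edges backwards from the changed class, transitively. Dead code detection looks for methods with no incoming CALLS edge. The following Python sketch runs both over a hypothetical edge list that stands in for the FalkorDB graph. The class and method names are invented for illustration.

```python
from collections import deque

# Hypothetical edges (source, relationship, target), standing in for the FalkorDB graph
EDGES = [
    ("OrderRepo", "DEPENDS_ON", "DatabaseConnection"),
    ("UserRepo", "DEPENDS_ON", "DatabaseConnection"),
    ("OrderService", "DEPENDS_ON", "OrderRepo"),
    ("Api", "DEPENDS_ON", "OrderService"),
    ("Api.create_order", "CALLS", "OrderService.place"),
    ("OrderService.place", "CALLS", "OrderRepo.save"),
]
PUBLIC_METHODS = ["Api.create_order", "OrderService.place", "OrderRepo.save", "UserRepo.delete"]


def impacted_by(target, edges, rel="DEPENDS_ON"):
    """Everything that depends on `target` directly or transitively (reverse BFS)."""
    dependents = {}
    for src, r, dst in edges:
        if r == rel:
            dependents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([target])
    while queue:
        for src in dependents.get(queue.popleft(), []):
            if src != target and src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


def never_called(methods, edges):
    """Methods with no incoming CALLS edge."""
    called = {dst for _, r, dst in edges if r == "CALLS"}
    return [m for m in methods if m not in called]


print(sorted(impacted_by("DatabaseConnection", EDGES)))
# ['Api', 'OrderRepo', 'OrderService', 'UserRepo']
print(never_called(PUBLIC_METHODS, EDGES))
# ['Api.create_order', 'UserRepo.delete']
```

The second result shows a known limitation of "never called" queries. `Api.create_order` is probably an entry point that a web framework invokes, so a practical dead-code check needs an allowlist of such entry points.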
Getting Started with Code-Graph
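```bash
git clone https://github.com/FalkorDB/code-graph.git
cd code-graph
npm install

# Start FalkorDB; -it keeps the container in the foreground, so run this in a separate terminal
docker run -p 6379:6379 -it --rm falkordb/falkordb

# Back in the code-graph directory
export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
npm run dev
```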
Open http://localhost:3000/ in your browser, enter any public GitHub repository URL, and Code-Graph will analyze it and present an interactive query interface. A live demo is available at code-graph.falkordb.com.
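If the interface cannot reach the database, first confirm that FalkorDB is answering on port 6379. FalkorDB speaks the Redis wire protocol (RESP), so a plain `PING` is enough. The following standard-library Python sketch sends one. The helper names `encode_command` and `falkordb_is_up` are hypothetical.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (the Redis wire protocol)."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


def falkordb_is_up(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """True if the server on host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_command("PING"))
            return sock.recv(64).startswith(b"+PONG")
    except OSError:  # refused connection, timeout, unknown host, ...
        return False


if __name__ == "__main__":
    print("FalkorDB reachable:", falkordb_is_up())
```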
FAQ
What is Code-Graph? Code-Graph is an open-source tool by FalkorDB that analyzes source code repositories and creates queryable knowledge graphs. It uses static AST analysis to extract code entities and their relationships, stores them in FalkorDB, and enables natural language querying via LLMs.
Which programming languages are supported? Code-Graph currently supports Python, Java, and C#. Support for C, JavaScript, and Go is planned for future releases.
How does Code-Graph work? It parses source files using AST (Abstract Syntax Tree) parsers to extract entities like classes, functions, modules, and variables. These become nodes in a FalkorDB graph, with edges representing relationships like CALLS, INHERITS_FROM, CONTAINS, and DEPENDS_ON.
How does LLM integration work? Code-Graph uses a GraphRAG pipeline that translates natural language questions (e.g., ‘Which classes implement the Strategy pattern?’) into Cypher queries against the code graph. The query results are then summarized by the LLM into human-readable answers.
How do I get started with Code-Graph? Clone the repo at github.com/FalkorDB/code-graph, run npm install, start FalkorDB with docker, set your OPENAI_API_KEY, and run npm run dev. Open http://localhost:3000/ and enter a GitHub URL to analyze any public repository.