Code-Graph: Open-Source Tool for Analyzing Source Code as Queryable Knowledge Graphs

Code-Graph by FalkorDB analyzes source code repositories and creates queryable knowledge graphs for Python, Java, and C# using FalkorDB and LLMs.

Understanding unfamiliar codebases is one of the hardest challenges in software development. Code-Graph by FalkorDB tackles this problem in a novel way: by transforming source code repositories into fully queryable knowledge graphs that you can interrogate in natural language.

Instead of reading through files linearly or relying on code search tools that treat code as flat text, Code-Graph analyzes your codebase at the Abstract Syntax Tree (AST) level, extracting every significant entity – classes, functions, methods, modules, arguments, variables – and mapping their relationships into a property graph stored in FalkorDB. The result is a structured, navigable representation of your entire codebase that supports natural language queries like “Show me all classes that depend on the DatabaseConnection class” or “Find unused utility functions in the auth module.”

What Is Code-Graph?

Code-Graph is an open-source developer tool that converts source code repositories into queryable knowledge graphs. It was created by the FalkorDB team to demonstrate the power of graph databases in code analysis scenarios, and it has since become a standalone, actively developed tool with an expanding feature set.

The tool operates in two phases. In the analysis phase, it downloads a repository, parses every source file with language-specific AST parsers, and builds a graph in FalkorDB where:

  • Nodes represent code entities (classes, functions, modules, interfaces, variables)
  • Edges represent relationships (CALLS, INHERITS_FROM, CONTAINS, HAS_ARGUMENT, DEPENDS_ON, IMPLEMENTS)
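As a rough illustration of this node-and-edge model, the sketch below builds a tiny in-memory graph in Python. It is not Code-Graph's actual schema or storage layer (that lives in FalkorDB); the `Node`, `Edge`, and `CodeGraph` classes and the entity names (`auth`, `UserService`, `BaseService`, `login`) are hypothetical and only show how typed entities and typed relationships fit together.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A code entity such as a class, function, or module."""
    label: str  # e.g. "Class", "Function", "Module"
    name: str


@dataclass(frozen=True)
class Edge:
    """A directed, typed relationship between two entities."""
    source: Node
    rel: str  # e.g. "CALLS", "INHERITS_FROM", "CONTAINS"
    target: Node


@dataclass
class CodeGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source, rel, target):
        # Adding an edge also registers both endpoints as nodes.
        self.nodes.update((source, target))
        self.edges.add(Edge(source, rel, target))

    def neighbors(self, node, rel):
        """Names of targets reachable from `node` over one `rel` edge."""
        return sorted(e.target.name for e in self.edges
                      if e.source == node and e.rel == rel)


# Hypothetical entities, for illustration only.
auth = Node("Module", "auth")
user_service = Node("Class", "UserService")
base_service = Node("Class", "BaseService")
login = Node("Function", "login")

graph = CodeGraph()
graph.add_edge(auth, "CONTAINS", user_service)
graph.add_edge(auth, "CONTAINS", login)
graph.add_edge(user_service, "INHERITS_FROM", base_service)

print(graph.neighbors(auth, "CONTAINS"))  # ['UserService', 'login']
```

A graph database adds what this toy version lacks: persistence, properties on nodes and edges, and a query language for multi-hop traversals.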

In the query phase, users ask questions in plain English. An LLM (configurable, with models such as GPT-4o and Llama 3-70B) translates the question into an OpenCypher query, FalkorDB executes that query against the graph, and the LLM summarizes the results as a natural language answer.

Which Programming Languages Are Supported?

Code-Graph currently supports three programming languages, with more on the roadmap:

| Language | Status | AST Parser Used |
|---|---|---|
| Python | Supported | Built-in Python AST module |
| Java | Supported | JavaParser |
| C# | Supported | Roslyn (Microsoft.CodeAnalysis) |
| C/C++ | Planned | TBD |
| JavaScript/TypeScript | Planned | TBD |
| Go | Planned | TBD |

Each language parser extracts language-specific entities and relationships. For example, Java parsing captures interfaces, abstract classes, annotations, and generic type parameters, while Python parsing captures duck-typing patterns, decorators, and module-level functions.
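To make the Python case concrete, here is a minimal sketch that uses the standard-library `ast` module to pull classes, base classes, methods, decorators, and module-level functions out of a source string. It is an illustration of the general technique, not Code-Graph's own parser; the `extract_entities` helper and the sample source are assumptions made up for this example, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Hypothetical sample source, for illustration only.
SOURCE = '''
import functools

class Base:
    pass

class Repo(Base):
    @functools.lru_cache(maxsize=None)
    def find(self, key):
        return key

def helper(x):
    return x
'''


def extract_entities(source):
    """Return (classes, module-level functions) found in `source`."""
    tree = ast.parse(source)
    classes, functions = [], []
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.ClassDef):
            methods = [n for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            classes.append({
                "name": node.name,
                "line": node.lineno,
                "bases": [ast.unparse(b) for b in node.bases],
                "methods": [m.name for m in methods],
                # Map each decorated method to its decorators' source text.
                "decorators": {m.name: [ast.unparse(d) for d in m.decorator_list]
                               for m in methods if m.decorator_list},
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append({"name": node.name, "line": node.lineno})
    return classes, functions


classes, functions = extract_entities(SOURCE)
for cls in classes:
    print(cls)
print(functions)
```

Each dictionary here would correspond to one graph node, with `bases` and `methods` becoming candidates for INHERITS_FROM and CONTAINS edges.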

How Does Code-Graph Work in Detail?

The analysis pipeline consists of several stages:

| Stage | Input | Output | Description |
|---|---|---|---|
| Repository Clone | GitHub URL | Local git clone | Downloads the target repository |
| AST Parsing | Source files | Entity AST nodes | Language-specific parsers break code into structural elements |
| Entity Extraction | AST nodes | Graph nodes | Classes, functions, modules, arguments become FalkorDB nodes |
| Relationship Mapping | AST structure | Graph edges | CALLS, INHERITS_FROM, CONTAINS, DEPENDS_ON become edges |
| Graph Storage | Nodes + Edges | FalkorDB property graph | All data is persisted in FalkorDB with properties |
| Query Translation | Natural language | OpenCypher query | LLM converts English questions to graph queries |
| Result Rendering | Query results | Natural language answer | LLM summarizes returned graph data |
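The Entity Extraction, Relationship Mapping, and Graph Storage stages could be sketched as a step that turns extracted entities into parameterized Cypher statements. This is one plausible shape, not Code-Graph's actual code: the helper names, the `file` property, the allowlists, and the sample entities and paths are all assumptions. Labels and relationship types cannot be passed as Cypher parameters, so the sketch checks them against an allowlist before interpolating them.

```python
# Assumed allowlists; the real tool's labels and edge types may differ.
ALLOWED_LABELS = {"Class", "Function", "Module"}
ALLOWED_RELS = {"CALLS", "INHERITS_FROM", "CONTAINS", "DEPENDS_ON"}


def node_statement(label, name, file):
    """Build a MERGE for one entity; values travel as query parameters."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    query = f"MERGE (n:{label} {{name: $name}}) SET n.file = $file"
    return query, {"name": name, "file": file}


def edge_statement(src, rel, dst):
    """Build a MERGE for one relationship; src and dst are (label, name)."""
    (src_label, src_name), (dst_label, dst_name) = src, dst
    if rel not in ALLOWED_RELS:
        raise ValueError(f"unexpected relationship: {rel!r}")
    if not {src_label, dst_label} <= ALLOWED_LABELS:
        raise ValueError("unexpected label on edge endpoint")
    query = (
        f"MATCH (a:{src_label} {{name: $src}}), (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return query, {"src": src_name, "dst": dst_name}


# Hypothetical entities and file paths, for illustration only.
statements = [
    node_statement("Class", "UserService", "auth/service.py"),
    node_statement("Class", "BaseService", "core/base.py"),
    edge_statement(("Class", "UserService"), "INHERITS_FROM",
                   ("Class", "BaseService")),
]
for query, params in statements:
    print(query, params)
```

Each `(query, params)` pair would then be handed to a graph client for execution. Using MERGE rather than CREATE keeps the load idempotent if the same repository is analyzed twice.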

The entity extraction is particularly thorough. For each class, Code-Graph records its methods, fields, base classes, implemented interfaces, decorators (Python) or annotations (Java), and file location. For each function, it records parameters, return type, called functions, and access modifiers.
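For the per-function record, a minimal Python sketch using the standard-library `ast` module might look like the following. The `describe_function` and `call_name` helpers and the sample function are hypothetical, and the call resolution is deliberately naive: it records the textual name of each call target without resolving which definition it refers to. Python has no access modifiers, so that field is omitted here.

```python
import ast

# Hypothetical sample source, for illustration only.
SOURCE = '''
def load_user(user_id: int, cache=None) -> dict:
    conn = get_connection()
    row = conn.fetch(user_id)
    return to_dict(row)
'''


def call_name(func):
    """Best-effort name of a call target: `f` for f(), `obj.m` for obj.m()."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return ast.unparse(func)  # requires Python 3.9+
    return None  # e.g. calls on subscripts or lambdas


def describe_function(node):
    """Summarize one FunctionDef: parameters, return annotation, calls."""
    calls = set()
    for child in ast.walk(node):
        if isinstance(child, ast.Call):
            name = call_name(child.func)
            if name:
                calls.add(name)
    return {
        "name": node.name,
        # Regular positional parameters only (no *args, **kwargs, kw-only).
        "parameters": [a.arg for a in node.args.args],
        "returns": ast.unparse(node.returns) if node.returns else None,
        "calls": sorted(calls),
    }


func = ast.parse(SOURCE).body[0]
info = describe_function(func)
print(info)
```

The `calls` list is the raw material for CALLS edges; a real analyzer also has to work out which function or method each name actually refers to.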

How Does LLM Integration Work?

The LLM integration is the key to Code-Graph’s usability. Instead of requiring developers to learn Cypher syntax, the tool accepts natural language queries and uses an LLM to translate them:

Example workflow:

  1. Developer asks: “Which classes in the data access layer use the connection pool?”
  2. LLM generates: MATCH (c:Class)-[:CONTAINS]->(m:Method)-[:CALLS]->(f:Function) WHERE f.name CONTAINS 'getConnection' RETURN c.name, m.name
  3. FalkorDB executes the query, returning matching classes and methods
  4. LLM summarizes the results in plain English
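The four steps above can be sketched as a small pipeline in which translation, execution, and summarization are injected as plain functions. This is an assumption about structure, not Code-Graph's implementation: `answer_question` and the three `fake_*` stand-ins are hypothetical, and the rows returned by `fake_run_query` are made-up sample data. In a real setup the stand-ins would be replaced by an LLM call and a FalkorDB client query.

```python
from typing import Callable, Sequence


def answer_question(
    question: str,
    translate: Callable[[str], str],
    run_query: Callable[[str], Sequence[tuple]],
    summarize: Callable[[str, Sequence[tuple]], str],
) -> str:
    """Natural language in, natural language out, with Cypher in between."""
    cypher = translate(question)      # step 2: LLM writes the query
    rows = run_query(cypher)          # step 3: the graph database runs it
    return summarize(question, rows)  # step 4: LLM explains the rows


# Stand-ins so the sketch runs without an LLM or a database.
def fake_translate(question):
    return (
        "MATCH (c:Class)-[:CONTAINS]->(m:Method)-[:CALLS]->(f:Function) "
        "WHERE f.name CONTAINS 'getConnection' RETURN c.name, m.name"
    )


def fake_run_query(cypher):
    # Made-up result rows: (class name, method name).
    return [("UserRepository", "findById"), ("OrderRepository", "save")]


def fake_summarize(question, rows):
    class_names = sorted({class_name for class_name, _ in rows})
    return f"{len(class_names)} classes match: {', '.join(class_names)}"


answer = answer_question(
    "Which classes in the data access layer use the connection pool?",
    fake_translate,
    fake_run_query,
    fake_summarize,
)
print(answer)  # 2 classes match: OrderRepository, UserRepository
```

Keeping the three steps as separate callables makes it easy to swap models or to test the flow offline. Because the query text comes from a model, a real deployment would likely want to inspect or restrict it before execution.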

This approach makes codebase exploration accessible to junior developers, onboarding engineers, and anyone who needs to understand a codebase without memorizing its file structure. It also supports more advanced use cases like impact analysis (“If I change the DatabaseConnection class, which other classes will be affected?”) and dead code detection (“List all public methods that are never called from anywhere in the codebase”).
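Both of those advanced queries reduce to simple graph operations. The sketch below shows them over in-memory edge lists in Python, as an illustration of the idea rather than of how Code-Graph or FalkorDB evaluates them; the edge data and the function names `impacted_by` and `never_called` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edges as (source, target) pairs, for illustration only.
DEPENDS_ON = [
    ("UserRepository", "DatabaseConnection"),
    ("OrderRepository", "DatabaseConnection"),
    ("UserService", "UserRepository"),
    ("ReportJob", "OrderRepository"),
    ("Mailer", "SmtpClient"),
]
CALLS = [("main", "login"), ("login", "hash_password")]
FUNCTIONS = ["main", "login", "hash_password", "legacy_hash"]


def impacted_by(changed, edges):
    """Everything that depends on `changed`, directly or transitively."""
    dependents = defaultdict(set)
    for source, target in edges:
        dependents[target].add(source)  # reverse the edge direction
    seen, queue = {changed}, deque([changed])
    while queue:  # breadth-first walk over reversed edges
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen - {changed})


def never_called(functions, call_edges):
    """Functions that are not the target of any CALLS edge."""
    called = {target for _, target in call_edges}
    return sorted(set(functions) - called)


print(impacted_by("DatabaseConnection", DEPENDS_ON))
# ['OrderRepository', 'ReportJob', 'UserRepository', 'UserService']
print(never_called(FUNCTIONS, CALLS))
# ['legacy_hash', 'main']
```

Note that `main` shows up as never called: entry points, framework callbacks, and reflective calls have no incoming CALLS edge in a static graph, so dead code results are candidates to review, not a list of things that are safe to delete.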

Getting Started with Code-Graph

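# Fetch the project and install its dependencies
git clone https://github.com/FalkorDB/code-graph.git
cd code-graph
npm install
# Start FalkorDB on port 6379 (runs in the foreground, so use a separate terminal)
docker run -p 6379:6379 -it --rm falkordb/falkordb
# Provide your OpenAI API key for the LLM-backed query step
export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
# Start the development server
npm run dev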

Open http://localhost:3000/ in your browser, enter any public GitHub repository URL, and Code-Graph will analyze it and present an interactive query interface. A live demo is available at code-graph.falkordb.com.

FAQ

What is Code-Graph? Code-Graph is an open-source tool by FalkorDB that analyzes source code repositories and creates queryable knowledge graphs. It uses static AST analysis to extract code entities and their relationships, stores them in FalkorDB, and enables natural language querying via LLMs.

Which programming languages are supported? Code-Graph currently supports Python, Java, and C#. Support for C/C++, JavaScript/TypeScript, and Go is planned for future releases.

How does Code-Graph work? It parses source files using AST (Abstract Syntax Tree) parsers to extract entities like classes, functions, modules, and variables. These become nodes in a FalkorDB graph, with edges representing relationships like CALLS, INHERITS_FROM, CONTAINS, and DEPENDS_ON.

How does LLM integration work? Code-Graph uses a GraphRAG pipeline that translates natural language questions (e.g., ‘Which classes implement the Strategy pattern?’) into Cypher queries against the code graph. The query results are then summarized by the LLM into human-readable answers.

How do I get started with Code-Graph? Clone the repo at github.com/FalkorDB/code-graph, run npm install, start FalkorDB with docker, set your OPENAI_API_KEY, and run npm run dev. Open http://localhost:3000/ and enter a GitHub URL to analyze any public repository.
