Commit eb3e57c

Merge pull request #20 from Jakedismo/feature/architecture-diagrams
Create interactive and animated architecture diagrams
2 parents 91f29fd + fd9a8c9 commit eb3e57c

4 files changed: +266 -0 lines changed

architecture.html

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>CodeGraph Interactive Architecture</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <h1>CodeGraph System Architecture</h1>

    <div id="diagram-container">
        <p><strong>Instructions:</strong> To view the interactive diagram, render the Mermaid diagram from <a href="architecture.md">architecture.md</a> into an SVG file named <code>architecture.svg</code> and place it in the same directory as this HTML file. You can use the <a href="https://mermaid.live" target="_blank" rel="noopener noreferrer">Mermaid Live Editor</a> for this.</p>
        <object id="architecture-svg" type="image/svg+xml" data="architecture.svg">
            The diagram could not be displayed. Check that architecture.svg exists next to this file and that your browser supports SVG.
        </object>
    </div>

    <div id="narrative">
        <h2>Narrative on Capabilities and Performance</h2>

        <h3>Overview</h3>
        <p>CodeGraph is a codebase intelligence platform built on MCP (Model Context Protocol), designed to give any compatible Large Language Model (LLM) expert-level knowledge of a codebase. It does this through semantic analysis powered by the Qwen2.5-Coder-14B-128K model. The system is built with a local-first philosophy, which favors privacy and performance.</p>

        <h3>Core Capabilities</h3>
        <ul>
            <li><strong>Semantic Intelligence</strong>: CodeGraph uses the Qwen2.5-Coder-14B model with a 128K context window, which lets it reason over large portions of a codebase at once.</li>
            <li><strong>Single-Pass Edge Processing</strong>: A unified Abstract Syntax Tree (AST) parsing approach extracts both nodes (code symbols) and edges (relationships) in a single pass rather than in two separate phases, which speeds up processing.</li>
            <li><strong>AI-Enhanced Symbol Resolution</strong>: Links code entities with a reported 85-90% success rate, using a multi-tiered approach that falls back to semantic similarity matching for symbols that cannot otherwise be resolved.</li>
            <li><strong>Conversational AI (RAG)</strong>: A Retrieval-Augmented Generation (RAG) engine lets users query their codebase in natural language. It is exposed through tools such as <code>codebase_qa</code> and <code>code_documentation</code>.</li>
            <li><strong>Intelligent Caching</strong>: A caching layer uses semantic similarity matching to reach cache hit rates of 50-80% or more, which speeds up subsequent queries.</li>
            <li><strong>Pattern Detection</strong>: An ML pipeline analyzes team conventions and coding patterns, providing insight into codebase health and consistency.</li>
            <li><strong>MCP Integration</strong>: CodeGraph works with any MCP-enabled agent, including Claude Code, Codex CLI, and Gemini CLI, so it fits into existing developer workflows.</li>
        </ul>

        <h3>Architecture Deep Dive</h3>
        <p>The CodeGraph system is a modular, multi-crate Rust workspace designed for performance, maintainability, and scalability.</p>

        <h4>Component Breakdown:</h4>
        <ul>
            <li><span class="component" data-component-id="A"><code>codegraph-core</code></span>: The foundational crate. It defines the core data structures, traits, and types used by the other components, ensuring a consistent data model. It has no internal dependencies.</li>
            <li><span class="component" data-component-id="B"><code>codegraph-parser</code></span>: Parses source code into ASTs using Tree-sitter. It supports 11 programming languages and performs the initial extraction of semantic nodes and their relationships (edges).</li>
            <li><span class="component" data-component-id="C"><code>codegraph-graph</code></span>: Manages storage and retrieval of the code graph data (nodes and edges) using RocksDB, a high-performance embedded key-value store. It provides the backbone for dependency analysis and architectural exploration.</li>
            <li><span class="component" data-component-id="D"><code>codegraph-vector</code></span>: Creates vector embeddings from code snippets and provides fast similarity search using FAISS. It supports multiple embedding providers, including local ONNX models and Ollama.</li>
            <li><span class="component" data-component-id="E"><code>codegraph-ai</code></span>: The intelligence layer of the system. It integrates with the Qwen model and uses data from the graph and vector stores to provide features such as AI-powered symbol resolution, impact analysis, and semantic search.</li>
            <li><span class="component" data-component-id="F"><code>codegraph-mcp</code></span>: The main entry point for the command-line interface (CLI) and the primary MCP server. It orchestrates the other components to deliver the full set of CodeGraph tools.</li>
            <li><span class="component" data-component-id="G"><code>codegraph-api</code></span>: Provides a REST and GraphQL API server (built on Axum) for programmatic access to CodeGraph's capabilities, allowing integration with external tools and services.</li>
            <li><span class="component" data-component-id="H"><code>core-rag-mcp-server</code></span>: A dedicated MCP server that exposes the RAG functionality, enabling the conversational AI features.</li>
            <li><span class="component" data-component-id="I"><code>codegraph-cache</code></span>: A caching system that stores and retrieves results from vector operations, improving performance for repeated or similar queries.</li>
            <li><strong>Utility Crates</strong>:
                <ul>
                    <li><span class="component" data-component-id="J"><code>codegraph-concurrent</code></span>: Provides concurrent data structures and utilities for parallel processing.</li>
                    <li><span class="component" data-component-id="K"><code>codegraph-git</code></span>: Integrates with Git repositories to enable features such as incremental indexing based on file changes.</li>
                    <li><span class="component" data-component-id="L"><code>codegraph-queue</code></span>: A priority queue system for managing tasks and operations.</li>
                    <li><span class="component" data-component-id="M"><code>codegraph-lb</code></span>: A load balancer for distributing requests and managing resources.</li>
                    <li><span class="component" data-component-id="N"><code>codegraph-zerocopy</code></span>: Implements zero-copy data structures and serialization for efficient data handling.</li>
                </ul>
            </li>
        </ul>

        <h4>Data Flow (Indexing):</h4>
        <ol>
            <li>The <code>codegraph index</code> command is initiated via the <code>codegraph-mcp</code> CLI.</li>
            <li><code>codegraph-parser</code> recursively scans the target directory, parsing files in supported languages into ASTs.</li>
            <li>In a single pass, it extracts semantic nodes (functions, classes, etc.) and edges (calls, imports).</li>
            <li>The extracted nodes and edges are sent to <code>codegraph-graph</code>, which stores them in a RocksDB database.</li>
            <li>The semantic nodes are also passed to <code>codegraph-vector</code>, which generates 384-dimensional vector embeddings using the configured provider (ONNX or Ollama).</li>
            <li>These embeddings are stored in a FAISS index for fast similarity search.</li>
        </ol>

        <h3>Performance Analysis</h3>
        <p>CodeGraph is engineered for high performance, especially on modern, high-memory systems.</p>
        <ul>
            <li><strong>Indexing Speed</strong>: Parsing and graph extraction are fast: the system can process over 170,000 lines of code in just under half a second. The single-pass extraction process contributes a 50% speed improvement over traditional two-phase methods.</li>
            <li><strong>Embedding Performance</strong>: The choice of embedding provider is a trade-off between speed and quality.
                <ul>
                    <li><strong>ONNX (<code>all-MiniLM-L6-v2</code>)</strong>: Offers fast embedding generation, indexing a 2.5-million-line codebase in about 32 minutes. This suits large codebases and rapid, iterative development.</li>
                    <li><strong>Ollama (<code>nomic-embed-code</code>)</strong>: Provides code-specialized embeddings aimed at maximum retrieval accuracy, at the cost of slower indexing.</li>
                </ul>
            </li>
            <li><strong>High-Memory Optimization</strong>: The system automatically detects the available system memory and adjusts its performance parameters accordingly. On a 128 GB M4 Max system, it can raise the number of workers to 16 and the batch size to 20,480.</li>
            <li><strong>Query Latency</strong>: Vector searches with FAISS typically complete in under a second, and the caching layer reduces latency for repeated queries to milliseconds.</li>
        </ul>

        <h3>Conclusion</h3>
        <p>CodeGraph's architecture is a modular system that combines AI capabilities with high-performance engineering. Its local-first approach, together with its semantic analysis and conversational AI features, gives developers a practical way to gain a deeper understanding of their codebases. The system is also highly configurable, allowing users to balance performance and accuracy to suit their needs.</p>
    </div>

    <script src="interactive.js"></script>
</body>
</html>
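
The Performance Analysis section quotes two figures that are easier to compare as throughput. The snippet below is a back-of-the-envelope calculation only: the inputs are the numbers stated in the narrative (treating "just under half a second" as 0.5 s and "about 32 minutes" as exactly 32), and the helper name `linesPerSecond` is hypothetical, not part of CodeGraph.

```javascript
// Rough throughput derived from the figures quoted in architecture.html.
// Assumption: 0.5 s and 32 min are used as exact values for the arithmetic.
function linesPerSecond(lines, seconds) {
  return lines / seconds;
}

// "over 170,000 lines of code in just under half a second"
const parseRate = linesPerSecond(170_000, 0.5);

// "a 2.5 million line codebase in about 32 minutes" (ONNX embeddings)
const embedRate = linesPerSecond(2_500_000, 32 * 60);

console.log(`parsing:   at least ${Math.round(parseRate)} lines/s`);
console.log(`embedding: about ${Math.round(embedRate)} lines/s`);
console.log(`parsing is roughly ${Math.round(parseRate / embedRate)}x faster`);
```

Read this way, the quoted numbers suggest that embedding generation, not parsing, dominates end-to-end indexing time, which is consistent with the narrative presenting the embedding provider as the main speed/quality trade-off.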

architecture.md

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
# CodeGraph System Architecture

This document provides an overview of the CodeGraph system architecture, including a component-level dependency diagram.

## 1. Interactive Architecture Diagram

An interactive and animated version of the architecture diagram is available in [`architecture.html`](architecture.html).

To view it, first generate `architecture.svg` by rendering the Mermaid diagram below (for example with the [Mermaid Live Editor](https://mermaid.live)) and save it next to `architecture.html`, then open the HTML file in your browser. Some browsers block script access to the SVG when the page is opened directly from disk (`file://`); if hover highlighting does not work, serve the directory from a local HTTP server.

## 2. Component-Level Dependency Architecture

The following diagram shows the dependencies between the crates (components) in the CodeGraph system. An arrow from one crate to another means the first depends on the second.

```mermaid
graph TD
    subgraph "Core"
        A[codegraph-core]
    end

    subgraph "Data Processing"
        B[codegraph-parser] --> A
        C[codegraph-graph] --> A
        D[codegraph-vector] --> A
        E[codegraph-ai] --> A
        E --> C
        E --> D
    end

    subgraph "Application Logic"
        F[codegraph-mcp] --> A
        F --> B
        F --> C
        F --> D
        F --> E
        G[codegraph-api] --> A
        G --> B
        G --> C
        G --> D
        H[core-rag-mcp-server] --> A
        H --> B
        H --> D
        H --> I[codegraph-cache]
    end

    subgraph "Utilities"
        I[codegraph-cache] --> A
        J[codegraph-concurrent] --> A
        K[codegraph-git] --> A
        L[codegraph-queue] --> A
        L --> J
        M[codegraph-lb]
        N[codegraph-zerocopy] --> A
    end

    style F fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#f9f,stroke:#333,stroke-width:2px
```
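
One way to sanity-check the diagram is to treat it as data. The sketch below transcribes the edges above into a plain object and runs a depth-first topological sort, which yields an order in which every crate appears after its dependencies and throws if a cycle is present. The edge list is copied from the diagram; the function name `buildOrder` is hypothetical, and this is not how Cargo or CodeGraph itself resolves the workspace.

```javascript
// Crate dependencies transcribed from the Mermaid diagram (crate -> what it depends on).
const deps = {
  'codegraph-core': [],
  'codegraph-parser': ['codegraph-core'],
  'codegraph-graph': ['codegraph-core'],
  'codegraph-vector': ['codegraph-core'],
  'codegraph-ai': ['codegraph-core', 'codegraph-graph', 'codegraph-vector'],
  'codegraph-mcp': ['codegraph-core', 'codegraph-parser', 'codegraph-graph',
                    'codegraph-vector', 'codegraph-ai'],
  'codegraph-api': ['codegraph-core', 'codegraph-parser', 'codegraph-graph',
                    'codegraph-vector'],
  'core-rag-mcp-server': ['codegraph-core', 'codegraph-parser', 'codegraph-vector',
                          'codegraph-cache'],
  'codegraph-cache': ['codegraph-core'],
  'codegraph-concurrent': ['codegraph-core'],
  'codegraph-git': ['codegraph-core'],
  'codegraph-queue': ['codegraph-core', 'codegraph-concurrent'],
  'codegraph-lb': [],
  'codegraph-zerocopy': ['codegraph-core'],
};

// Depth-first topological sort; throws if the graph contains a cycle.
function buildOrder(graph) {
  const order = [];
  const state = new Map(); // crate -> 'visiting' | 'done'

  const visit = (crate) => {
    if (state.get(crate) === 'done') return;
    if (state.get(crate) === 'visiting') {
      throw new Error(`dependency cycle at ${crate}`);
    }
    state.set(crate, 'visiting');
    for (const dep of graph[crate] ?? []) visit(dep);
    state.set(crate, 'done');
    order.push(crate); // pushed only after all of its dependencies
  };

  Object.keys(graph).forEach(visit);
  return order;
}

const order = buildOrder(deps);
console.log(order.join('\n'));
```

Because the sort completes without throwing, the diagram as drawn is acyclic, with `codegraph-core` at the root and the three highlighted entry points (`codegraph-mcp`, `codegraph-api`, `core-rag-mcp-server`) depended on by nothing else.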

interactive.js

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
document.addEventListener('DOMContentLoaded', () => {
    const svgObject = document.getElementById('architecture-svg');
    if (!svgObject) {
        console.error('No element with id "architecture-svg" found.');
        return;
    }

    svgObject.addEventListener('load', () => {
        const svgDoc = svgObject.contentDocument;
        if (!svgDoc || !svgDoc.documentElement) {
            console.error("Could not access SVG document. Make sure it's from the same origin.");
            return;
        }

        // Rules in style.css do not reach into the <object>'s document,
        // so the highlight rule is injected into the SVG itself.
        const style = svgDoc.createElementNS('http://www.w3.org/2000/svg', 'style');
        style.textContent =
            '.highlight rect { fill: #ffcc00 !important; stroke: #ff6600 !important; stroke-width: 4px !important; }';
        svgDoc.documentElement.appendChild(style);

        // Wire each .component label in the narrative to its node in the diagram.
        document.querySelectorAll('.component').forEach(component => {
            const componentId = component.dataset.componentId;
            // Assumption: Mermaid flowchart nodes get IDs like `flowchart-A-0`.
            // The bare `A-` prefix from the original code is kept as a fallback.
            const node = svgDoc.querySelector(
                `[id^="flowchart-${componentId}-"], [id^="${componentId}-"]`
            );
            if (!node) {
                return;
            }

            component.addEventListener('mouseover', () => node.classList.add('highlight'));
            component.addEventListener('mouseout', () => node.classList.remove('highlight'));
        });
    });

    // Handle case where SVG fails to load
    svgObject.addEventListener('error', () => {
        console.error('SVG file could not be loaded. Please ensure architecture.svg exists.');
    });
});
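
The lookup in this script depends on how the renderer names node elements, which can vary between Mermaid versions. A minimal sketch of a helper that builds the attribute selector from a list of candidate prefixes is shown below; the helper name `nodeSelector` and the `flowchart-` prefix are assumptions for illustration, so inspect the generated `architecture.svg` to confirm the actual ID format before relying on either.

```javascript
// Build a CSS selector matching the SVG node for a diagram ID such as "A".
// Assumption: node IDs start with an optional renderer prefix, then the
// diagram ID, then a hyphen (e.g. "flowchart-A-0" or "A-...").
function nodeSelector(componentId, prefixes = ['flowchart-', '']) {
  // Accept only simple IDs so the value is safe inside an attribute selector.
  if (!/^[A-Za-z0-9_]+$/.test(componentId)) {
    throw new Error(`unexpected component id: ${componentId}`);
  }
  return prefixes
    .map((prefix) => `[id^="${prefix}${componentId}-"]`)
    .join(', ');
}

console.log(nodeSelector('A'));
// prints: [id^="flowchart-A-"], [id^="A-"]
```

The trailing hyphen in each pattern matters: without it, a lookup for `A` could also match a node whose ID merely begins with the same letters.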

style.css

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
body {
    font-family: sans-serif;
    line-height: 1.6;
    margin: 20px;
    background-color: #f4f4f4;
    color: #333;
}

h1, h2, h3, h4 {
    color: #005a9c;
}

#diagram-container {
    margin-bottom: 20px;
    border: 1px solid #ccc;
    padding: 10px;
    background-color: #fff;
    text-align: center;
}

#narrative {
    background-color: #fff;
    padding: 20px;
    border: 1px solid #ccc;
}

.component {
    cursor: pointer;
    color: #005a9c;
    font-weight: bold;
    text-decoration: underline;
}

.component:hover {
    color: #003366;
}

/*
 * Animation for the SVG.
 * Note: these selectors match only when the SVG markup is inlined in the page
 * inside #architecture-svg. Rules in this file do not reach into a document
 * embedded via <object>.
 */
#architecture-svg .node {
    opacity: 0;
    animation: fadeIn 0.5s ease-out forwards;
}

#architecture-svg .edgePath {
    opacity: 0;
    animation: fadeIn 0.8s ease-out forwards;
}

/* Stagger the animation for each node */
#architecture-svg .node:nth-child(1) { animation-delay: 0.1s; }
#architecture-svg .node:nth-child(2) { animation-delay: 0.2s; }
#architecture-svg .node:nth-child(3) { animation-delay: 0.3s; }
#architecture-svg .node:nth-child(4) { animation-delay: 0.4s; }
#architecture-svg .node:nth-child(5) { animation-delay: 0.5s; }
#architecture-svg .node:nth-child(6) { animation-delay: 0.6s; }
#architecture-svg .node:nth-child(7) { animation-delay: 0.7s; }
#architecture-svg .node:nth-child(8) { animation-delay: 0.8s; }
#architecture-svg .node:nth-child(9) { animation-delay: 0.9s; }
#architecture-svg .node:nth-child(10) { animation-delay: 1.0s; }
#architecture-svg .node:nth-child(11) { animation-delay: 1.1s; }
#architecture-svg .node:nth-child(12) { animation-delay: 1.2s; }
#architecture-svg .node:nth-child(13) { animation-delay: 1.3s; }
#architecture-svg .node:nth-child(14) { animation-delay: 1.4s; }

@keyframes fadeIn {
    from {
        opacity: 0;
        transform: translateY(20px);
    }
    to {
        opacity: 1;
        transform: translateY(0);
    }
}

/*
 * Highlighting for interactive diagram.
 * Applies to elements in this page only; for an SVG embedded via <object>,
 * the same rule has to exist inside the SVG document.
 */
.highlight rect {
    fill: #ffcc00 !important;
    stroke: #ff6600 !important;
    stroke-width: 4px !important;
}
}
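
The fourteen `nth-child` rules follow a simple pattern: the n-th node is delayed by n × 0.1 s. As an illustration of that pattern (not part of the commit), the sketch below generates the same rules from a node count, which may be convenient if the diagram gains or loses crates. The function names `staggerDelay` and `staggerRules` are hypothetical.

```javascript
// Delay for the n-th node (1-based), matching the hand-written rules: 0.1s, 0.2s, ...
function staggerDelay(n, stepSeconds = 0.1) {
  // toFixed(1) hides floating-point noise such as 0.30000000000000004.
  return `${(n * stepSeconds).toFixed(1)}s`;
}

// Produce one nth-child rule per node, in the same form as style.css.
function staggerRules(count, selector = '#architecture-svg .node') {
  return Array.from({ length: count }, (_, i) =>
    `${selector}:nth-child(${i + 1}) { animation-delay: ${staggerDelay(i + 1)}; }`
  ).join('\n');
}

console.log(staggerRules(14));
```

The same `staggerDelay` value could instead be assigned to each node's `style.animationDelay` at load time, which would remove the need to keep the stylesheet in sync with the number of nodes.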
