Merge pull request #20 from Jakedismo/feature/architecture-diagrams

Jakedismo · web-flow · commit eb3e57c6b873 · 2025-10-05T00:47:59.000+03:00
Create interactive and animated architecture diagrams
diff --git a/architecture.html b/architecture.html
@@ -0,0 +1,91 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>CodeGraph Interactive Architecture</title>
+    <link rel="stylesheet" href="style.css">
+</head>
+<body>
+    <h1>CodeGraph System Architecture</h1>
+
+    <div id="diagram-container">
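+        <p><strong>Instructions:</strong> To view the interactive diagram, render the Mermaid diagram from <a href="architecture.md">architecture.md</a> (for example with the <a href="https://mermaid.live" target="_blank" rel="noopener">Mermaid Live Editor</a>), save it as <code>architecture.svg</code> in the same directory as this HTML file, and open the page through a local web server such as <code>python3 -m http.server</code>. Some browsers block script access to the SVG when the page is opened directly from a <code>file://</code> URL.</p>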
+        <object id="architecture-svg" type="image/svg+xml" data="architecture.svg">
+            The architecture diagram could not be displayed. Make sure architecture.svg exists and that your browser supports SVG.
+        </object>
+    </div>
+
+    <div id="narrative">
+        <h2>2. Narrative on Capabilities and Performance</h2>
+
+        <h3>Overview</h3>
+        <p>CodeGraph is an MCP-based codebase intelligence platform designed to turn any compatible Large Language Model (LLM) into a codebase expert. It does so through semantic analysis powered by the Qwen2.5-Coder-14B-128K model. The system follows a local-first design, with privacy and performance as its guiding goals.</p>
+
+        <h3>Core Capabilities</h3>
+        <ul>
+            <li><strong>Semantic Intelligence</strong>: CodeGraph uses the Qwen2.5-Coder-14B model, whose 128K-token context window lets it reason over large portions of a codebase at once.</li>
+            <li><strong>Single-Pass Edge Processing</strong>: A unified Abstract Syntax Tree (AST) parsing approach extracts both nodes (code symbols) and edges (relationships) in a single pass, significantly improving processing speed.</li>
+            <li><strong>AI-Enhanced Symbol Resolution</strong>: Achieves an 85-90% success rate in linking code entities, using a multi-tiered approach whose final tier falls back to semantic similarity matching for symbols the earlier tiers cannot resolve.</li>
+            <li><strong>Conversational AI (RAG)</strong>: The system provides a Retrieval-Augmented Generation (RAG) engine, enabling users to interact with their codebase using natural language. This is exposed through tools like <code>codebase_qa</code> and <code>code_documentation</code>.</li>
+            <li><strong>Intelligent Caching</strong>: A caching layer that matches new queries against earlier ones by semantic similarity, reaching cache hit rates of 50-80% or more and greatly speeding up subsequent queries.</li>
+            <li><strong>Pattern Detection</strong>: An advanced ML pipeline analyzes team conventions and coding patterns, providing insights into codebase health and consistency.</li>
+            <li><strong>MCP Protocol Integration</strong>: CodeGraph is compatible with any MCP-enabled agent, including Claude Code, Codex CLI, and Gemini CLI, allowing for seamless integration into existing developer workflows.</li>
+        </ul>
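+        <p>The multi-tiered symbol resolution described above can be pictured with the following JavaScript sketch. It is a hypothetical illustration, not CodeGraph's Rust implementation. The tiers (same file, then imported files, then semantic similarity), the <code>resolveSymbol</code> name and the 0.8 similarity threshold are all assumptions.</p>
+        <pre><code class="language-javascript">// Hypothetical tiered resolver; tiers and threshold are illustrative assumptions.
+// Cosine similarity between two equal-length embedding vectors.
+function cosine(a, b) {
+    let dot = 0, na = 0, nb = 0;
+    a.forEach((x, i) => { dot += x * b[i]; na += x * x; nb += b[i] * b[i]; });
+    return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
+}
+
+// ref: { name, file, imports, embedding }; index: [{ name, file, embedding }]
+function resolveSymbol(ref, index, minSimilarity = 0.8) {
+    const sameName = index.filter(d => d.name === ref.name);
+    // Tier 1: a same-named definition in the referencing file.
+    const local = sameName.find(d => d.file === ref.file);
+    if (local) return { def: local, tier: 'local' };
+    // Tier 2: a same-named definition in a file the reference imports.
+    const imported = sameName.find(d => ref.imports.includes(d.file));
+    if (imported) return { def: imported, tier: 'import' };
+    // Tier 3: fall back to the most similar definition by embedding.
+    let best = null, bestScore = -Infinity;
+    for (const d of index) {
+        const score = cosine(ref.embedding, d.embedding);
+        if (score > bestScore) { best = d; bestScore = score; }
+    }
+    return bestScore >= minSimilarity ? { def: best, tier: 'semantic' } : null;
+}
+</code></pre>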
+
+        <h3>Architecture Deep Dive</h3>
+        <p>The CodeGraph system is a modular, multi-crate Rust workspace, designed for performance, maintainability, and scalability.</p>
+
+        <h4>Component Breakdown:</h4>
+        <ul>
+            <li><span class="component" data-component-id="A"><code>codegraph-core</code></span>: The foundational crate of the system. It defines the core data structures, traits, and types used by all other components, ensuring a consistent data model. It has no internal dependencies.</li>
+            <li><span class="component" data-component-id="B"><code>codegraph-parser</code></span>: Parses source code into ASTs using Tree-sitter. It supports 11 programming languages and performs the initial extraction of semantic nodes and their relationships (edges).</li>
+            <li><span class="component" data-component-id="C"><code>codegraph-graph</code></span>: Stores and retrieves the code graph (nodes and edges) in RocksDB, a high-performance embedded key-value store. It is the backbone for dependency analysis and architectural exploration.</li>
+            <li><span class="component" data-component-id="D"><code>codegraph-vector</code></span>: Creates vector embeddings from code snippets and provides fast similarity search using FAISS. It supports multiple embedding providers, including local ONNX models and Ollama.</li>
+            <li><span class="component" data-component-id="E"><code>codegraph-ai</code></span>: The intelligence layer of the system. It integrates with the Qwen model and uses data from the graph and vector stores to provide AI-powered symbol resolution, impact analysis, and semantic search.</li>
+            <li><span class="component" data-component-id="F"><code>codegraph-mcp</code></span>: The main entry point, providing both the command-line interface (CLI) and the primary MCP server. It orchestrates the other components to deliver the full set of CodeGraph tools.</li>
+            <li><span class="component" data-component-id="G"><code>codegraph-api</code></span>: A REST and GraphQL API server built on Axum, giving external tools and services programmatic access to CodeGraph's capabilities.</li>
+            <li><span class="component" data-component-id="H"><code>core-rag-mcp-server</code></span>: A dedicated, production-ready MCP server that exposes the Retrieval-Augmented Generation (RAG) functionality behind the conversational AI features.</li>
+            <li><span class="component" data-component-id="I"><code>codegraph-cache</code></span>: An AI-powered caching system that stores and retrieves the results of vector operations, speeding up repeated or similar queries.</li>
+            <li><strong>Utility Crates</strong>:
+                <ul>
+                    <li><span class="component" data-component-id="J"><code>codegraph-concurrent</code></span>: Provides concurrent data structures and utilities for parallel processing.</li>
+                    <li><span class="component" data-component-id="K"><code>codegraph-git</code></span>: Integrates with Git repositories to enable features like incremental indexing based on file changes.</li>
+                    <li><span class="component" data-component-id="L"><code>codegraph-queue</code></span>: A priority queue system for managing tasks and operations.</li>
+                    <li><span class="component" data-component-id="M"><code>codegraph-lb</code></span>: An intelligent load balancer for distributing requests and managing resources.</li>
+                    <li><span class="component" data-component-id="N"><code>codegraph-zerocopy</code></span>: Implements zero-copy data structures and serialization for highly efficient data handling.</li>
+                </ul>
+            </li>
+        </ul>
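+        <p>The semantic-similarity lookup behind <code>codegraph-cache</code> can be sketched as follows. This is a hypothetical JavaScript illustration, not the crate's API. The <code>SemanticCache</code> class, the linear scan and the 0.9 threshold are assumptions. A production cache would also need eviction and an index instead of a scan.</p>
+        <pre><code class="language-javascript">// Hypothetical semantic cache: returns a stored answer when a new query's
+// embedding is close enough (cosine similarity) to a previous query's.
+class SemanticCache {
+    constructor(minSimilarity = 0.9) {
+        this.minSimilarity = minSimilarity;
+        this.entries = []; // [{ embedding, value }]
+    }
+
+    static cosine(a, b) {
+        let dot = 0, na = 0, nb = 0;
+        a.forEach((x, i) => { dot += x * b[i]; na += x * x; nb += b[i] * b[i]; });
+        return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
+    }
+
+    // Returns the cached value of the most similar entry, or undefined on a miss.
+    get(embedding) {
+        let best = null, bestScore = -Infinity;
+        for (const entry of this.entries) {
+            const score = SemanticCache.cosine(embedding, entry.embedding);
+            if (score > bestScore) { best = entry; bestScore = score; }
+        }
+        return bestScore >= this.minSimilarity ? best.value : undefined;
+    }
+
+    set(embedding, value) {
+        this.entries.push({ embedding, value });
+    }
+}
+</code></pre>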
+
+        <h4>Data Flow (Indexing):</h4>
+        <ol>
+            <li>The <code>codegraph index</code> command is initiated via the <code>codegraph-mcp</code> CLI.</li>
+            <li><code>codegraph-parser</code> recursively scans the target directory, parsing files for supported languages into ASTs.</li>
+            <li>In a single pass, it extracts semantic nodes (functions, classes, etc.) and edges (calls, imports).</li>
+            <li>The extracted nodes and edges are sent to <code>codegraph-graph</code>, which stores them in a RocksDB database.</li>
+            <li>The semantic nodes are also passed to <code>codegraph-vector</code>, which generates vector embeddings with the configured provider (ONNX or Ollama); with the ONNX <code>all-MiniLM-L6-v2</code> model these are 384-dimensional.</li>
+            <li>These embeddings are stored in a FAISS index for fast similarity search.</li>
+        </ol>
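+        <p>Steps 2 and 3 depend on collecting nodes and edges during one traversal rather than two. The JavaScript sketch below shows the idea on a hypothetical, simplified AST. It is not Tree-sitter's API or CodeGraph's schema, and the node types (<code>function</code>, <code>class</code>, <code>call</code>, <code>import</code>) are assumptions.</p>
+        <pre><code class="language-javascript">// Hypothetical single-pass extractor over a toy AST: { type, name, children }.
+function extract(ast, file) {
+    const nodes = [];
+    const edges = [];
+    // One depth-first walk; `scope` is the nearest enclosing symbol (or the file).
+    function visit(n, scope) {
+        let inner = scope;
+        if (n.type === 'function' || n.type === 'class') {
+            nodes.push({ name: n.name, kind: n.type, file });
+            inner = n.name;
+        } else if (n.type === 'call') {
+            edges.push({ from: scope, to: n.name, kind: 'calls' });
+        } else if (n.type === 'import') {
+            edges.push({ from: file, to: n.name, kind: 'imports' });
+        }
+        (n.children || []).forEach(child => visit(child, inner));
+    }
+    visit(ast, file);
+    return { nodes, edges };
+}
+
+// Usage: one walk yields both the symbol and the call/import edges.
+const ast = { type: 'file', children: [
+    { type: 'import', name: 'std::fs' },
+    { type: 'function', name: 'main', children: [{ type: 'call', name: 'run' }] },
+] };
+const { nodes, edges } = extract(ast, 'main.rs');
+// nodes: [{ name: 'main', kind: 'function', file: 'main.rs' }]
+// edges: [{ from: 'main.rs', to: 'std::fs', kind: 'imports' }, { from: 'main', to: 'run', kind: 'calls' }]
+</code></pre>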
+
+        <h3>Performance Analysis</h3>
+        <p>CodeGraph is engineered for high performance, especially on modern, high-memory systems.</p>
+        <ul>
+            <li><strong>Indexing Speed</strong>: Parsing and indexing are fast: the system can process more than 170,000 lines of code in just under half a second, and single-pass extraction is credited with a 50% speed improvement over traditional two-phase methods.</li>
+            <li><strong>Embedding Performance</strong>: The choice of embedding provider trades speed against quality.
+                <ul>
+                    <li><strong>ONNX (<code>all-MiniLM-L6-v2</code>)</strong>: Generates embeddings fast enough to index a 2.5-million-line codebase in about 32 minutes, which suits large codebases and rapid, iterative development.</li>
+                    <li><strong>Ollama (<code>nomic-embed-code</code>)</strong>: Produces code-specialized embeddings for higher retrieval accuracy, at the cost of slower indexing.</li>
+                </ul>
+            </li>
+            <li><strong>High-Memory Optimization</strong>: The system detects available memory and tunes its parameters accordingly. On a 128 GB M4 Max system, for example, it raises the worker count to 16 and the batch size to 20,480.</li>
+            <li><strong>Query Latency</strong>: FAISS vector searches typically complete in under a second, and cache hits reduce latency for repeated queries to milliseconds.</li>
+        </ul>
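+        <p>The two throughput figures above can be compared directly. The JavaScript arithmetic below uses only the numbers stated in this section. It rests on two assumptions that the text does not state: that the 170,000-line figure covers parsing and graph extraction without embeddings, and that parsing speed scales linearly with codebase size.</p>
+        <pre><code class="language-javascript">// Throughput from the figures in this section (lines per second).
+const linesPerSecond = (lines, seconds) => lines / seconds;
+const parseRate = linesPerSecond(170000, 0.5);        // 340,000 lines/s (slightly more, since the time is "just under" 0.5 s)
+const fullRate = linesPerSecond(2500000, 32 * 60);    // about 1,302 lines/s end to end with ONNX embeddings
+// Fraction of the 32-minute run that parsing alone would take, under the linear-scaling assumption.
+const parseShare = (2500000 / parseRate) / (32 * 60); // about 0.0038, i.e. under 0.4%
+</code></pre>
+        <p>Under those assumptions, embedding generation rather than parsing accounts for nearly all of the indexing time on large codebases. This is consistent with the embedding provider being the main lever for trading speed against quality.</p>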
+
+        <h3>Conclusion</h3>
+        <p>CodeGraph's architecture is a modular system that combines modern AI capabilities with high-performance engineering. Its local-first approach, semantic analysis, and conversational AI features help developers gain a deeper understanding of their codebases, and its configuration options let users balance indexing speed against retrieval accuracy.</p>
+    </div>
+
+    <script src="interactive.js"></script>
+</body>
+</html>
diff --git a/architecture.md b/architecture.md
@@ -0,0 +1,59 @@
+# CodeGraph System Architecture
+
+This document provides a detailed overview of the CodeGraph system architecture, including a component-level dependency diagram.
+
+## 1. Interactive Architecture Diagram
+
+An interactive and animated version of the architecture diagram is available in [`architecture.html`](architecture.html).
+
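+To view it, first render the Mermaid diagram below with the [Mermaid Live Editor](https://mermaid.live), export it as SVG, and save it as `architecture.svg` next to `architecture.html`. Then serve the directory over HTTP (for example with `python3 -m http.server`) and open the HTML file in your browser; some browsers block the script from reading the SVG when the page is opened from a `file://` URL.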
+
+## 2. Component-Level Dependency Architecture
+
+The following diagram illustrates the dependencies between the various crates (components) in the CodeGraph system.
+
+```mermaid
+graph TD
+    subgraph "Core"
+        A[codegraph-core]
+    end
+
+    subgraph "Data Processing"
+        B[codegraph-parser]
+        C[codegraph-graph]
+        D[codegraph-vector]
+        E[codegraph-ai]
+    end
+
+    subgraph "Application Logic"
+        F[codegraph-mcp]
+        G[codegraph-api]
+        H[core-rag-mcp-server]
+    end
+
+    subgraph "Utilities"
+        I[codegraph-cache]
+        J[codegraph-concurrent]
+        K[codegraph-git]
+        L[codegraph-queue]
+        M[codegraph-lb]
+        N[codegraph-zerocopy]
+    end
+
+    %% Edges are listed outside the subgraphs so each node stays in the group that declares it.
+    B --> A
+    C --> A
+    D --> A
+    E --> A & C & D
+    F --> A & B & C & D & E
+    G --> A & B & C & D
+    H --> A & B & D & I
+    I --> A
+    J --> A
+    K --> A
+    L --> A & J
+    N --> A
+    style F fill:#f9f,stroke:#333,stroke-width:2px
+    style G fill:#f9f,stroke:#333,stroke-width:2px
+    style H fill:#f9f,stroke:#333,stroke-width:2px
+```
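+
+The edges in the diagram can also be checked mechanically. The JavaScript sketch below is not part of the CodeGraph codebase, since Cargo already computes the real build order for the workspace. It copies the diagram's edges into a plain object (the name `deps` is illustrative) and computes a level-by-level topological order with Kahn's algorithm, which also confirms that the graph is acyclic.
+
+```javascript
+// Edges copied from the diagram: each crate maps to the crates it depends on.
+const deps = {
+  'codegraph-core': [],
+  'codegraph-parser': ['codegraph-core'],
+  'codegraph-graph': ['codegraph-core'],
+  'codegraph-vector': ['codegraph-core'],
+  'codegraph-ai': ['codegraph-core', 'codegraph-graph', 'codegraph-vector'],
+  'codegraph-mcp': ['codegraph-core', 'codegraph-parser', 'codegraph-graph', 'codegraph-vector', 'codegraph-ai'],
+  'codegraph-api': ['codegraph-core', 'codegraph-parser', 'codegraph-graph', 'codegraph-vector'],
+  'core-rag-mcp-server': ['codegraph-core', 'codegraph-parser', 'codegraph-vector', 'codegraph-cache'],
+  'codegraph-cache': ['codegraph-core'],
+  'codegraph-concurrent': ['codegraph-core'],
+  'codegraph-git': ['codegraph-core'],
+  'codegraph-queue': ['codegraph-core', 'codegraph-concurrent'],
+  'codegraph-lb': [],
+  'codegraph-zerocopy': ['codegraph-core'],
+};
+
+// Level-by-level Kahn's algorithm: emit every crate whose dependencies are all
+// emitted, then repeat. Throws on a cycle or on a dependency missing from the graph.
+function buildOrder(graph) {
+  const remaining = new Map(Object.entries(graph).map(([crate, d]) => [crate, new Set(d)]));
+  const order = [];
+  while (remaining.size > 0) {
+    const ready = [...remaining].filter(([, d]) => d.size === 0).map(([crate]) => crate).sort();
+    if (ready.length === 0) {
+      throw new Error('unresolvable dependencies among: ' + [...remaining.keys()].join(', '));
+    }
+    for (const crate of ready) {
+      remaining.delete(crate);
+      order.push(crate);
+    }
+    // Emitted crates no longer block anything that depends on them.
+    for (const d of remaining.values()) ready.forEach(crate => d.delete(crate));
+  }
+  return order;
+}
+```
+
+Calling `buildOrder(deps)` yields `codegraph-core` and `codegraph-lb` first, then the crates that depend only on `codegraph-core`, and finally `codegraph-mcp`.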
diff --git a/interactive.js b/interactive.js
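@@ -0,0 +1,59 @@
+document.addEventListener('DOMContentLoaded', () => {
+    const svgObject = document.getElementById('architecture-svg');
+
+    // Rules in style.css do not reach inside the separate document that an
+    // <object> loads, so the animation and highlight styles are injected into
+    // the SVG itself. Only opacity is animated: a CSS transform on a Mermaid
+    // node would override the transform="translate(...)" that positions it.
+    const svgStyles = `
+        .node { opacity: 0; animation: cg-fade-in 0.5s ease-out forwards; }
+        .edgePath, .edgePaths path { opacity: 0; animation: cg-fade-in 0.8s ease-out forwards; }
+        .highlight rect { fill: #ffcc00 !important; stroke: #ff6600 !important; stroke-width: 4px !important; }
+        @keyframes cg-fade-in { from { opacity: 0; } to { opacity: 1; } }
+    `;
+
+    function setup() {
+        const svgDoc = svgObject.contentDocument;
+        if (!svgDoc || !svgDoc.querySelector('svg')) {
+            // contentDocument is null for cross-origin content; some browsers
+            // also treat file:// pages that way, so serve the files over HTTP.
+            console.error('Could not access the SVG document. Serve it from the same origin as this page.');
+            return;
+        }
+
+        const style = svgDoc.createElementNS('http://www.w3.org/2000/svg', 'style');
+        style.textContent = svgStyles;
+        svgDoc.documentElement.appendChild(style);
+
+        // Stagger the fade-in by 100 ms per node.
+        svgDoc.querySelectorAll('.node').forEach((node, i) => {
+            node.style.animationDelay = `${(i + 1) * 100}ms`;
+        });
+
+        document.querySelectorAll('.component').forEach(component => {
+            const id = component.dataset.componentId;
+            // Mermaid node ids vary by version (e.g. "flowchart-A-0"). Restricting
+            // the match to .node keeps edge ids such as "L-B-A-0" from matching "L".
+            const node = svgDoc.querySelector(`.node[id*="flowchart-${id}-"], .node[id^="${id}-"]`);
+            if (!node) {
+                return;
+            }
+
+            // mouseenter/mouseleave do not re-fire when the pointer crosses child elements.
+            component.addEventListener('mouseenter', () => node.classList.add('highlight'));
+            component.addEventListener('mouseleave', () => node.classList.remove('highlight'));
+        });
+    }
+
+    // The load event may already have fired before this handler was attached.
+    if (svgObject.contentDocument && svgObject.contentDocument.querySelector('svg')) {
+        setup();
+    } else {
+        svgObject.addEventListener('load', setup);
+    }
+
+    // Handle case where SVG fails to load
+    svgObject.addEventListener('error', () => {
+        console.error('SVG file could not be loaded. Please ensure architecture.svg exists.');
+    });
+});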
diff --git a/style.css b/style.css
@@ -0,0 +1,82 @@
+body {
+    font-family: sans-serif;
+    line-height: 1.6;
+    margin: 20px;
+    background-color: #f4f4f4;
+    color: #333;
+}
+
+h1, h2, h3, h4 {
+    color: #005a9c;
+}
+
+#diagram-container {
+    margin-bottom: 20px;
+    border: 1px solid #ccc;
+    padding: 10px;
+    background-color: #fff;
+    text-align: center;
+}
+
+#narrative {
+    background-color: #fff;
+    padding: 20px;
+    border: 1px solid #ccc;
+}
+
+.component {
+    cursor: pointer;
+    color: #005a9c;
+    font-weight: bold;
+    text-decoration: underline;
+}
+
+.component:hover {
+    color: #003366;
+}
+
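+/* Animation for the SVG. These rules only apply to an SVG inlined in this page; page styles do not cross into the document an <object> loads. */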
+#architecture-svg .node {
+    opacity: 0;
+    animation: fadeIn 0.5s ease-out forwards;
+}
+
+#architecture-svg .edgePath {
+    opacity: 0;
+    animation: fadeIn 0.8s ease-out forwards;
+}
+
+/* Stagger the animation for each node */
+#architecture-svg .node:nth-child(1) { animation-delay: 0.1s; }
+#architecture-svg .node:nth-child(2) { animation-delay: 0.2s; }
+#architecture-svg .node:nth-child(3) { animation-delay: 0.3s; }
+#architecture-svg .node:nth-child(4) { animation-delay: 0.4s; }
+#architecture-svg .node:nth-child(5) { animation-delay: 0.5s; }
+#architecture-svg .node:nth-child(6) { animation-delay: 0.6s; }
+#architecture-svg .node:nth-child(7) { animation-delay: 0.7s; }
+#architecture-svg .node:nth-child(8) { animation-delay: 0.8s; }
+#architecture-svg .node:nth-child(9) { animation-delay: 0.9s; }
+#architecture-svg .node:nth-child(10) { animation-delay: 1.0s; }
+#architecture-svg .node:nth-child(11) { animation-delay: 1.1s; }
+#architecture-svg .node:nth-child(12) { animation-delay: 1.2s; }
+#architecture-svg .node:nth-child(13) { animation-delay: 1.3s; }
+#architecture-svg .node:nth-child(14) { animation-delay: 1.4s; }
+
+
+@keyframes fadeIn {
+    /* Opacity only: a CSS transform would override the transform="translate(...)"
+       attribute Mermaid uses to position each node. */
+    from {
+        opacity: 0;
+    }
+    to {
+        opacity: 1;
+    }
+}
+
+/* Highlighting for the interactive diagram (applies only to an SVG inlined in this page, not one loaded via <object>) */
+.highlight rect {
+    fill: #ffcc00 !important;
+    stroke: #ff6600 !important;
+    stroke-width: 4px !important;
+}