RishvinReddy/HandMatrix

✋ HandMatrix Neural Engine

AI-Powered Gesture Control System · Real-Time Computer Vision · Multi-Modal Human-Computer Interaction

Touchless Control. Real-Time Intelligence. Customizable Interaction.


📋 Table of Contents

Section Description
🚀 Overview Project introduction and vision
🎯 Problem Statement What we solve and why
🧠 System Architecture Full system design & data flow
⚙️ Tech Stack Technologies and libraries used
✋ Gesture Library Complete gesture-to-action mapping
🔁 Data Flow Pipeline Frame-to-action signal chain
📦 Module Breakdown Component responsibilities
📁 Project Structure Directory layout
⚡ Installation Setup and run locally
🧪 Modes & Profiles Control modes overview
📊 Performance Metrics Benchmarks and targets
⚠️ Challenges & Solutions Known issues and mitigations
🔮 Roadmap Future development plan
🤝 Contributing How to contribute

🚀 Overview

HandMatrix Neural Engine is a production-grade, AI-powered multi-modal gesture control system that enables users to interact with computers and digital environments entirely through natural movement — no physical input devices required.

It combines Google MediaPipe's landmark detection, Gemini AI's reasoning layer, and a React + TypeScript frontend dashboard to deliver a fully customizable, real-time touchless interaction engine.

What makes it different?

Traditional Input    →    Static, physical, limited, inaccessible
HandMatrix           →    Dynamic, AI-driven, touchless, fully customizable
Capability Description
✋ Single-Hand Control Cursor movement, clicking, scrolling via index finger and pinch
🤚 Two-Hand Gestures Zoom, pan, volume, brightness via relative hand distance/angle
👤 Face-Based Control Head tilt scroll, blink click, nod shortcuts
🧠 AI Gesture Learning Gemini AI analyzes gesture patterns and adapts mappings
🎛️ Custom Mappings JSON-configurable gesture → OS action bindings
📊 Live Dashboard Real-time React UI for landmark visualization and control

🎯 Problem Statement

The Accessibility Gap in Human-Computer Interaction

mindmap
  root((HCI Problem))
    Physical Barriers
      Motor disabilities
      Limited dexterity
      Post-injury recovery
    Context Limitations
      Sterile environments
      Hands-free scenarios
      Industrial operation
    Immersion Deficits
      Gaming latency
      Non-intuitive controls
      No spatial awareness
    Technology Gaps
      No AI adaptation
      No personalization
      Static keybindings

Traditional input devices (mouse, keyboard, touchpad) suffer from:

Problem Impact HandMatrix Solution
❌ No touchless input Inaccessible to motor-impaired users ✅ Full gesture-based OS control
❌ Static bindings Can't adapt to user behavior ✅ AI-powered dynamic remapping
❌ Single modality One type of input only ✅ Hand + Face + Voice hybrid
❌ No spatial awareness 2D only, no depth ✅ 3D landmark tracking (x, y, z)
❌ Not immersive Breaks gaming flow ✅ Gaming mode with spatial controls
❌ Device dependency Fails when hardware fails ✅ Camera-only input fallback

🧠 System Architecture

High-Level System Design

flowchart TD
    subgraph INPUT["📡 INPUT LAYER"]
        CAM["🎥 Webcam\n(60fps Stream)"]
        MIC["🎙️ Microphone\n(Voice Input)"]
    end

    subgraph VISION["🧠 VISION PROCESSING LAYER"]
        MP_HANDS["MediaPipe Hands\n21 Landmarks"]
        MP_FACE["MediaPipe FaceMesh\n468 Landmarks"]
        MP_POSE["MediaPipe Pose\n33 Landmarks"]
        FRAME["Frame Preprocessor\n(Canvas API)"]
    end

    subgraph ENGINE["⚙️ GESTURE ENGINE"]
        GR["Gesture Recognizer\n(Pattern Matching)"]
        AI["Gemini AI Reasoner\n(Context Aware)"]
        FILTER["Kalman Filter\n(Noise Reduction)"]
        BUFFER["Temporal Buffer\n(30-frame window)"]
    end

    subgraph CUSTOM["🎛️ CUSTOMIZATION LAYER"]
        CONFIG["JSON Config Engine"]
        PROFILES["User Profile Manager"]
        MAPPER["Action Mapper"]
    end

    subgraph OUTPUT["💻 OUTPUT LAYER"]
        CURSOR["Cursor Control\n(Mouse API)"]
        KEYBOARD["Keyboard Events\n(Key Simulation)"]
        VOLUME["System Volume\n(OS Control)"]
        SCROLL["Scroll Engine"]
        SHORTCUTS["Custom Shortcuts"]
    end

    subgraph DASHBOARD["📊 REACT DASHBOARD"]
        VIZ["Landmark Visualizer"]
        STATS["Real-time Stats"]
        LOG["Action Log"]
        SETTINGS["Settings Panel"]
    end

    CAM --> FRAME
    MIC --> AI
    FRAME --> MP_HANDS
    FRAME --> MP_FACE
    FRAME --> MP_POSE

    MP_HANDS --> GR
    MP_FACE --> GR
    MP_POSE --> GR

    GR --> FILTER
    FILTER --> BUFFER
    BUFFER --> AI
    AI --> MAPPER

    CONFIG --> MAPPER
    PROFILES --> MAPPER

    MAPPER --> CURSOR
    MAPPER --> KEYBOARD
    MAPPER --> VOLUME
    MAPPER --> SCROLL
    MAPPER --> SHORTCUTS

    MAPPER --> VIZ
    MAPPER --> STATS
    MAPPER --> LOG
    SETTINGS --> CONFIG

Component Interaction Diagram

C4Context
    title HandMatrix — Component Interaction Overview

    Person(user, "User", "Moves hands/face in front of camera")
    
    System_Boundary(handmatrix, "HandMatrix Neural Engine") {
        Component(webcam, "Webcam Module", "Browser Media API", "Captures real-time video stream")
        Component(mediapipe, "MediaPipe Engine", "WASM + TFLite", "Detects 21+468+33 landmarks")
        Component(gesture, "Gesture Classifier", "Custom Algorithm", "Interprets landmark patterns")
        Component(ai, "Gemini AI Layer", "Google GenAI SDK", "Contextual reasoning & adaptation")
        Component(mapper, "Action Mapper", "TypeScript", "Maps gestures to OS actions")
        Component(dashboard, "React Dashboard", "React 19 + Vite", "Real-time UI visualization")
    }

    System_Ext(os, "Operating System", "macOS/Windows/Linux")
    System_Ext(gemini, "Gemini API", "Google Cloud AI")

    Rel(user, webcam, "Performs gestures")
    Rel(webcam, mediapipe, "Raw frames")
    Rel(mediapipe, gesture, "Landmark data")
    Rel(gesture, ai, "Pattern context")
    Rel(ai, gemini, "API calls")
    Rel(ai, mapper, "Classified gesture")
    Rel(mapper, os, "System events")
    Rel(mapper, dashboard, "Live data stream")
    Rel(dashboard, user, "Visual feedback")

⚙️ Tech Stack

Full-Stack Technology Overview

graph LR
    subgraph FE["🖥️ Frontend"]
        R["React 19"]
        TS["TypeScript 5.8"]
        TW["Tailwind CSS 4"]
        VT["Vite 6.2"]
        LR["Lucide React Icons"]
        MO["Motion (Framer)"]
    end

    subgraph CV["👁️ Computer Vision"]
        MH["MediaPipe Hands\n(21 landmarks)"]
        MF["MediaPipe FaceMesh\n(468 landmarks)"]
        MC["MediaPipe Camera Utils"]
        MD["MediaPipe Drawing Utils"]
        MTV["MediaPipe Tasks Vision"]
    end

    subgraph AI["🤖 AI Layer"]
        GA["@google/genai\nGemini 1.5 Flash"]
    end

    subgraph SYS["⚙️ System Layer"]
        PY["Python Backend (optional)"]
        PAG["PyAutoGUI"]
        PN["pynput"]
        EX["Express.js API"]
        DOT["dotenv"]
    end

    subgraph TOOLS["🛠️ Dev Tools"]
        TSX["tsx (TS runner)"]
        ESL["ESLint"]
        SHD["Shadcn UI"]
        CVA["class-variance-authority"]
    end

Dependency Table

Category Package Version Purpose
Core react ^19.0.0 UI framework
Core react-dom ^19.0.0 DOM rendering
Core typescript ~5.8.2 Type safety
Build vite ^6.2.0 Dev server + bundler
CV @mediapipe/hands ^0.4.1675469240 Hand landmark detection
CV @mediapipe/face_mesh ^0.4.1633559619 Face landmark detection
CV @mediapipe/tasks-vision ^0.10.34 Unified vision tasks
CV @mediapipe/camera_utils ^0.3.1675466862 Camera stream control
CV @mediapipe/drawing_utils ^0.3.1675466124 Canvas rendering
AI @google/genai ^1.29.0 Gemini AI integration
UI tailwindcss ^4.1.14 Utility-first styling
UI lucide-react ^0.546.0 Icon library
UI motion ^12.23.24 Animation engine
UI shadcn ^4.2.0 Component library
API express ^4.21.2 Backend REST server
Util clsx ^2.1.1 Conditional classnames
Util dotenv ^17.2.3 Environment variables

✋ Gesture Library

Complete Gesture-to-Action Mapping

flowchart LR
    subgraph HAND_SINGLE["✋ Single Hand Gestures"]
        G1["☝️ Index Up\n→ Move Cursor"]
        G2["🤏 Pinch\n→ Left Click"]
        G3["✌️ Two Fingers\n→ Scroll"]
        G4["✋ Open Palm\n→ Pause/Stop"]
        G5["👊 Fist\n→ Drag"]
        G6["🤟 Three Fingers\n→ Right Click"]
        G7["🖐️ All Fingers\n→ Screenshot"]
    end

    subgraph HAND_DUAL["🤚 Two-Hand Gestures"]
        G8["↔️ Spread Apart\n→ Zoom In"]
        G9["🔁 Both Pinch\n→ Zoom Out"]
        G10["↕️ Vertical Spread\n→ Volume Up/Down"]
        G11["🔄 Rotate Hands\n→ Rotate Screen"]
        G12["👐 Both Open\n→ Fullscreen"]
    end

    subgraph FACE["👤 Face Gestures"]
        G13["↕️ Head Tilt\n→ Scroll Page"]
        G14["👁️ Single Blink\n→ Left Click"]
        G15["👀 Double Blink\n→ Right Click"]
        G16["😮 Mouth Open\n→ Play/Pause"]
        G17["↔️ Head Turn\n→ Next/Prev Tab"]
    end

Landmark Reference Map

graph TD
    subgraph HAND_LANDMARKS["Hand — 21 Landmark Points"]
        WRIST["0: Wrist"]
        THUMB["1-4: Thumb MCP→Tip"]
        INDEX["5-8: Index MCP→Tip"]
        MIDDLE["9-12: Middle MCP→Tip"]
        RING["13-16: Ring MCP→Tip"]
        PINKY["17-20: Pinky MCP→Tip"]
        WRIST --> THUMB
        WRIST --> INDEX
        WRIST --> MIDDLE
        WRIST --> RING
        WRIST --> PINKY
    end

Detection Logic Table

Gesture Landmarks Used Condition Confidence Threshold
Index Pointing L5–L8 Only index finger extended > 0.85
Pinch L4 + L8 Distance thumb-tip to index-tip < 30px > 0.90
Two Finger Scroll L8 + L12 Index + middle extended, others closed > 0.80
Open Palm L4,8,12,16,20 All fingertips above MCP nodes > 0.75
Fist L4,8,12,16,20 All fingertips below MCP nodes > 0.80
Zoom In/Out Both L8s Inter-hand distance delta > 0.70
Head Tilt Face L10,152 Roll angle > ±15° > 0.85
Blink Eye L159,145 Eye aspect ratio < 0.25 > 0.90
Mouth Open Face L13,14 Mouth aspect ratio > 0.50 > 0.80
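The threshold checks in the table above reduce to simple landmark geometry. The TypeScript sketch below is illustrative, not the project's actual code: the `Landmark` shape, the eye-corner indices (33/133) used to normalize the eye aspect ratio, and the normalized-coordinate pinch threshold (standing in for the table's "30px") are all assumptions.

```typescript
interface Landmark { x: number; y: number; z: number }

// Euclidean distance between two landmarks (normalized [0..1] coordinates).
function dist(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Pinch: thumb tip (4) close to index tip (8). The "30px" threshold from the
// table becomes roughly 0.05 in normalized coordinates (assumed frame width).
function isPinch(hand: Landmark[], threshold = 0.05): boolean {
  return dist(hand[4], hand[8]) < threshold;
}

// Eye Aspect Ratio from FaceMesh lids 159 (upper) and 145 (lower), normalized
// by eye width — corner indices 33/133 are an illustrative choice.
function eyeAspectRatio(face: Landmark[]): number {
  return dist(face[159], face[145]) / dist(face[33], face[133]);
}

function isBlink(face: Landmark[], threshold = 0.25): boolean {
  return eyeAspectRatio(face) < threshold;
}
```

The real engine additionally gates these booleans behind the confidence thresholds listed in the table before any action fires.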

🔁 Data Flow Pipeline

Frame-to-Action Signal Chain

sequenceDiagram
    autonumber
    participant CAM as 🎥 Camera
    participant CANVAS as 🖼️ Canvas API
    participant MP as 🧠 MediaPipe
    participant FILTER as 📐 Kalman Filter
    participant BUFFER as 💾 Temporal Buffer
    participant GE as ⚙️ Gesture Engine
    participant AI as 🤖 Gemini AI
    participant MAPPER as 🗺️ Action Mapper
    participant OS as 💻 OS / Browser
    participant UI as 📊 React Dashboard

    CAM->>CANVAS: Raw video frame (60fps)
    CANVAS->>MP: Preprocessed image bitmap
    MP->>MP: Run TFLite inference (Hands + Face)
    MP-->>FILTER: 21+468 raw landmark coordinates (x,y,z)
    FILTER->>FILTER: Smooth noise with Kalman equations
    FILTER->>BUFFER: Stabilized landmark positions
    BUFFER->>GE: 30-frame window of landmarks
    GE->>GE: Pattern match against gesture templates
    GE->>AI: Ambiguous gesture context (optional)
    AI->>AI: Gemini classifies intent from context
    AI-->>MAPPER: Resolved gesture label + confidence
    MAPPER->>MAPPER: Lookup JSON binding config
    MAPPER->>OS: Dispatch mouse/keyboard/system event
    MAPPER->>UI: Push landmark + action data (WebSocket)
    UI-->>CAM: User sees feedback overlay
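The Kalman filtering step in the chain above can be illustrated with a minimal one-dimensional filter, run independently per landmark coordinate. This is a hedged sketch: the class name and the Q/R noise constants are illustrative, not values from the repository.

```typescript
// Minimal 1-D constant-position Kalman filter, one instance per coordinate.
class Kalman1D {
  private estimate = 0;
  private errorCov = 1;          // current estimate uncertainty
  private initialized = false;
  constructor(
    private readonly processNoise = 1e-3,    // Q: how fast the true value drifts
    private readonly measurementNoise = 1e-2 // R: how noisy each reading is
  ) {}
  update(measurement: number): number {
    if (!this.initialized) {
      this.estimate = measurement;
      this.initialized = true;
      return this.estimate;
    }
    // Predict: uncertainty grows by the process noise.
    this.errorCov += this.processNoise;
    // Update: blend prediction and measurement by the Kalman gain.
    const gain = this.errorCov / (this.errorCov + this.measurementNoise);
    this.estimate += gain * (measurement - this.estimate);
    this.errorCov *= 1 - gain;
    return this.estimate;
  }
}
```

Raising R relative to Q smooths harder but adds lag — the same trade-off the latency budget below has to absorb.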

Latency Budget

gantt
    title Frame Processing Latency Budget (Target: <50ms)
    dateFormat  X
    axisFormat  %Lms

    section Camera Capture
    Frame Acquisition       :0, 5

    section Vision Processing
    Canvas Preprocessing    :5, 8
    MediaPipe Inference     :8, 28

    section Gesture Engine
    Kalman Filtering        :28, 32
    Pattern Matching        :32, 38
    AI Reasoning (cached)   :38, 42

    section Output
    Action Dispatch         :42, 45
    UI Update               :45, 50

📦 Module Breakdown

Responsibility Matrix

graph TB
    subgraph CORE["Core Modules"]
        HT["HandTracker\n• Initializes MediaPipe Hands\n• Manages landmark stream\n• Handles multi-hand detection"]
        FT["FaceTracker\n• FaceMesh initialization\n• Eye/mouth ratio calc\n• Head pose estimation"]
        GE["GestureEngine\n• Pattern recognition\n• Temporal smoothing\n• Confidence scoring"]
    end

    subgraph PROCESSING["Processing Modules"]
        KF["KalmanFilter\n• Noise reduction\n• Position smoothing\n• Velocity estimation"]
        TB["TemporalBuffer\n• 30-frame sliding window\n• Gesture onset detection\n• Hold duration tracking"]
        GC["GestureClassifier\n• Template matching\n• Threshold comparison\n• Multi-label output"]
    end

    subgraph INTEGRATION["Integration Modules"]
        AM["ActionMapper\n• Reads JSON bindings\n• Maps gesture → event\n• Debounce management"]
        AI["GeminiAdapter\n• Ambiguity resolution\n• Context reasoning\n• Adaptive learning"]
        WS["WebSocket Bridge\n• React ↔ Engine comm\n• Event streaming\n• State sync"]
    end

    subgraph UI_MODS["UI Modules"]
        LD["LandmarkDrawer\n• Canvas overlay\n• Skeleton rendering\n• Debug visualization"]
        DB["Dashboard\n• Live stats\n• Mode switcher\n• Profile editor"]
        LOG["ActionLogger\n• Event timeline\n• Confidence log\n• Export to JSON"]
    end

    HT --> GE
    FT --> GE
    GE --> KF
    KF --> TB
    TB --> GC
    GC --> AM
    GC --> AI
    AI --> AM
    AM --> WS
    WS --> DB
    WS --> LD
    WS --> LOG
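The TemporalBuffer responsibilities listed above (30-frame sliding window, hold-duration tracking) reduce to a small ring-buffer pattern. A minimal sketch with illustrative names:

```typescript
class TemporalBuffer<T> {
  private frames: T[] = [];
  constructor(private readonly capacity = 30) {}

  push(frame: T): void {
    this.frames.push(frame);
    if (this.frames.length > this.capacity) this.frames.shift(); // drop oldest
  }

  // Count of consecutive trailing frames satisfying a predicate — used to
  // require a gesture to be held for N frames before its action fires.
  holdDuration(pred: (f: T) => boolean): number {
    let count = 0;
    for (let i = this.frames.length - 1; i >= 0 && pred(this.frames[i]); i--) count++;
    return count;
  }

  get window(): readonly T[] { return this.frames; }
}
```

Onset detection falls out of the same structure: a gesture "starts" on the frame where `holdDuration` crosses from 0 to 1.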

📁 Project Structure

handmatrix-neural-engine/
│
├── 📄 index.html                    # App entry point
├── 📄 package.json                  # Dependencies & scripts
├── 📄 vite.config.ts                # Vite + Tailwind config
├── 📄 tsconfig.json                 # TypeScript config
├── 📄 components.json               # Shadcn component registry
├── 📄 metadata.json                 # Project metadata
├── 📄 .env.example                  # Environment variable template
├── 📄 .gitignore
│
├── 📁 src/
│   ├── 📄 main.tsx                  # React app bootstrap
│   ├── 📄 App.tsx                   # Root component (41KB — core engine)
│   ├── 📄 index.css                 # Global styles + Tailwind layers
│   │
│   ├── 📁 components/               # React UI Components
│   │   ├── 📄 LandmarkOverlay.tsx   # Canvas-based landmark renderer
│   │   ├── 📄 GestureLog.tsx        # Real-time action event log
│   │   ├── 📄 ModeSelector.tsx      # Cursor/Gaming/Media mode UI
│   │   ├── 📄 SettingsPanel.tsx     # Gesture mapping configurator
│   │   ├── 📄 StatsDashboard.tsx    # Performance metrics display
│   │   └── 📄 ProfileManager.tsx    # User profile CRUD
│   │
│   ├── 📁 lib/                      # Core engine library
│   │   ├── 📄 gesture-engine.ts     # Pattern matching core
│   │   ├── 📄 kalman-filter.ts      # Noise smoothing algorithm
│   │   ├── 📄 action-mapper.ts      # Gesture → OS action dispatch
│   │   ├── 📄 gemini-adapter.ts     # Gemini AI integration layer
│   │   └── 📄 utils.ts              # Shared utilities
│   │
├── 📁 components/                   # Shadcn UI components
│   └── 📄 ui/                       # Button, Card, Dialog, etc.
│
└── 📁 lib/                          # Shared non-src libraries

⚡ Installation

Prerequisites

Requirement Minimum Version Recommended
Node.js 18.0 20+ LTS
npm 9.0 10+
Browser Chrome 90+ Chrome 120+
Camera 720p 1080p 60fps
CPU 4 cores 8 cores
RAM 4GB 8GB+

Step-by-Step Setup

# 1. Clone the repository
git clone https://github.com/rishvinreddy/handmatrix-neural-engine.git
cd handmatrix-neural-engine

# 2. Install dependencies
npm install

# 3. Set up environment variables
cp .env.example .env
# Edit .env and add your Gemini API key:
# VITE_GEMINI_API_KEY=your_gemini_api_key_here

# 4. Start the development server
npm run dev
# → Runs at http://localhost:3000

Environment Variables

# .env.example
VITE_GEMINI_API_KEY=          # Required: Google Gemini AI API key
VITE_MODEL_NAME=gemini-1.5-flash   # AI model to use
VITE_DETECTION_CONFIDENCE=0.8      # MediaPipe detection threshold
VITE_TRACKING_CONFIDENCE=0.7       # MediaPipe tracking threshold
VITE_MAX_HANDS=2                   # Max simultaneous hands tracked
VITE_CAMERA_FPS=60                 # Target camera frame rate
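In a Vite app these variables surface as strings on `import.meta.env`, so the numeric ones need parsing, with the defaults above as fallbacks. A small hedged sketch — the helper name is illustrative:

```typescript
// Parse a numeric env var, falling back when it is unset, empty, or not a number.
function numFromEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// In the app itself these would read from import.meta.env, e.g. (illustrative):
//   const apiKey = import.meta.env.VITE_GEMINI_API_KEY;
//   const maxHands = numFromEnv(import.meta.env.VITE_MAX_HANDS, 2);
//   const detectConf = numFromEnv(import.meta.env.VITE_DETECTION_CONFIDENCE, 0.8);
```

Note that only variables prefixed `VITE_` are exposed to client code by Vite.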

Deployment Flow

flowchart LR
    DEV["👨‍💻 Development\nnpm run dev\nlocalhost:3000"] 
    --> BUILD["📦 Production Build\nnpm run build\ndist/ folder"]
    --> PREVIEW["🔍 Preview\nnpm run preview"]
    --> DEPLOY["🚀 Deploy\nGitHub Pages / Vercel / Netlify"]

    style DEV fill:#1e293b,color:#60a5fa
    style BUILD fill:#1e293b,color:#a78bfa
    style PREVIEW fill:#1e293b,color:#34d399
    style DEPLOY fill:#1e293b,color:#fb923c

🧪 Modes & Profiles

Control Mode State Machine

stateDiagram-v2
    [*] --> IDLE : App Launch

    IDLE --> CURSOR_MODE : Mode Select (Default)
    IDLE --> GAMING_MODE : Press G
    IDLE --> MEDIA_MODE : Press M
    IDLE --> ACCESSIBILITY_MODE : Press A
    IDLE --> CUSTOM_MODE : Press C

    CURSOR_MODE --> GAMING_MODE : Gesture Switch
    GAMING_MODE --> CURSOR_MODE : Gesture Switch
    MEDIA_MODE --> CURSOR_MODE : Gesture Switch
    ACCESSIBILITY_MODE --> CURSOR_MODE : Gesture Switch

    CURSOR_MODE --> IDLE : Pause Gesture
    GAMING_MODE --> IDLE : Pause Gesture
    MEDIA_MODE --> IDLE : Pause Gesture

    state CURSOR_MODE {
        [*] --> tracking
        tracking --> clicking
        clicking --> scrolling
        scrolling --> tracking
    }

    state GAMING_MODE {
        [*] --> wasd_control
        wasd_control --> action_triggers
        action_triggers --> camera_look
    }

    state MEDIA_MODE {
        [*] --> playback
        playback --> volume
        volume --> seek
    }

Mode Feature Matrix

Feature 🖱️ Cursor Mode 🎮 Gaming Mode 🎵 Media Mode ♿ Accessibility Mode
Cursor Movement
Click (Pinch)
Scroll
WASD Keys
Jump (Open Palm)
Attack (Fist)
Volume Control
Play/Pause
Track Seek
Face Control
Blink Click
Dwell Select

⚙️ Customization Engine

Configuration Architecture

flowchart TD
    subgraph INPUT_CONFIG["Configuration Sources"]
        DEF["Default Config\n(Built-in templates)"]
        USR["User Profile\n(JSON in localStorage)"]
        CLOUD["Cloud Sync\n(Future: Firebase)"]
    end

    subgraph MERGE["Config Merge Engine"]
        PRI["Priority Resolver\n(User > Default)"]
        VAL["Schema Validator\n(Zod)"]
        CACHE["Config Cache\n(In-memory)"]
    end

    subgraph RUNTIME["Runtime Layer"]
        MAP["Action Mapper"]
        DEBOUND["Debounce Controller"]
        SENS["Sensitivity Scaler"]
    end

    DEF --> PRI
    USR --> PRI
    CLOUD --> PRI
    PRI --> VAL
    VAL --> CACHE
    CACHE --> MAP
    CACHE --> DEBOUND
    CACHE --> SENS

Example Config JSON

{
  "profile": "Default",
  "version": "1.0.0",
  "mode": "cursor",
  "sensitivity": {
    "cursor_speed": 1.5,
    "scroll_speed": 2.0,
    "gesture_confidence": 0.80,
    "debounce_ms": 150
  },
  "gesture_bindings": {
    "pinch": "left_click",
    "three_fingers": "right_click",
    "two_fingers_up": "scroll_up",
    "two_fingers_down": "scroll_down",
    "open_palm": "pause_control",
    "fist": "drag_start",
    "spread_both_hands": "zoom_in",
    "pinch_both_hands": "zoom_out",
    "head_tilt_right": "next_tab",
    "head_tilt_left": "prev_tab",
    "single_blink": "left_click",
    "mouth_open": "play_pause"
  },
  "face_control": {
    "enabled": true,
    "head_tilt_threshold_degrees": 15,
    "blink_ear_threshold": 0.25,
    "mouth_mar_threshold": 0.50
  }
}
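The merge step described in the Customization Engine ("User > Default" priority) can be sketched as a recursive key-by-key override. This is illustrative only; the real engine also runs schema validation, which is omitted here:

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// User values win; nested objects are merged key-by-key so a partial user
// profile only overrides the keys it actually sets.
function mergeConfig(defaults: Json, user: Json): Json {
  if (
    typeof defaults === "object" && defaults !== null && !Array.isArray(defaults) &&
    typeof user === "object" && user !== null && !Array.isArray(user)
  ) {
    const out: { [key: string]: Json } = { ...defaults };
    for (const [k, v] of Object.entries(user)) {
      out[k] = k in defaults ? mergeConfig((defaults as { [key: string]: Json })[k], v) : v;
    }
    return out;
  }
  return user; // scalars, arrays, and brand-new keys: user value replaces default
}
```

For example, a user profile that only sets `sensitivity.cursor_speed` keeps every other default, including `sensitivity.debounce_ms`.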

📊 Performance Metrics

System Performance Targets

Metric Target Acceptable Poor
Frame Processing Time < 16ms < 33ms > 50ms
End-to-End Latency < 50ms < 100ms > 200ms
Gesture Accuracy > 95% > 85% < 75%
False Positive Rate < 2% < 5% > 10%
CPU Usage (idle) < 20% < 40% > 60%
RAM Footprint < 200MB < 400MB > 600MB
Camera FPS 60fps 30fps < 15fps
Landmark Detect/sec > 60 > 30 < 15

Feature Distribution

pie title HandMatrix — Module Size Distribution
    "Gesture Engine & AI" : 35
    "MediaPipe Integration" : 20
    "React Dashboard UI" : 18
    "Action Mapper & OS Control" : 12
    "Customization System" : 10
    "Utilities & Config" : 5

Accuracy by Gesture Category

xychart-beta
    title "Gesture Recognition Accuracy by Category (%)"
    x-axis ["Pinch", "Open Palm", "Fist", "Two Finger", "Head Tilt", "Blink", "Dual Hand"]
    y-axis "Accuracy (%)" 0 --> 100
    bar [97, 94, 91, 93, 88, 90, 85]
    line [97, 94, 91, 93, 88, 90, 85]

⚠️ Challenges & Solutions

Risk Matrix

quadrantChart
    title Risk vs Impact Matrix
    x-axis "Low Likelihood" --> "High Likelihood"
    y-axis "Low Impact" --> "High Impact"

    quadrant-1 Critical Risks
    quadrant-2 High Impact / Low Likelihood
    quadrant-3 Low Priority
    quadrant-4 Monitor

    Lighting Variance: [0.85, 0.75]
    Camera Quality: [0.60, 0.65]
    CPU Overload: [0.55, 0.80]
    False Positives: [0.70, 0.70]
    Gesture Ambiguity: [0.75, 0.60]
    API Rate Limits: [0.30, 0.55]
    Browser Compat: [0.40, 0.50]

Mitigation Strategies

flowchart LR
    subgraph PROBLEMS["⚠️ Known Challenges"]
        P1["Poor Lighting"]
        P2["Gesture Ambiguity"]
        P3["CPU Performance"]
        P4["Multi-Hand Conflict"]
        P5["False Positives"]
    end

    subgraph SOLUTIONS["✅ Implemented Solutions"]
        S1["Adaptive brightness\nnormalization in preprocessing"]
        S2["Temporal smoothing +\nGemini AI disambiguation"]
        S3["Web Workers for\noff-thread inference"]
        S4["Priority queue +\nprimary hand dominance"]
        S5["Debounce engine +\nconfidence gating"]
    end

    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5
Challenge Root Cause Mitigation Status
Lighting sensitivity MediaPipe relies on contrast Histogram equalization + brightness normalization ✅ Implemented
Gesture ambiguity Similar landmark configs Temporal buffer + Gemini reasoning ✅ Implemented
CPU bottleneck WASM inference on main thread Offload to Web Worker 🔄 In Progress
Jitter/tremor Raw coordinates noisy Kalman filter smoothing ✅ Implemented
False positives Unintentional gestures Debounce + hold duration gating ✅ Implemented
Multi-hand conflict Two hands competing Dominant hand priority system ✅ Implemented
Camera permission Browser security model Graceful degradation UI ✅ Implemented
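The debounce + hold-duration row above combines two independent checks: confidence must clear a threshold, and the same action must not have fired within the debounce window. A minimal sketch (class name and defaults are illustrative):

```typescript
class ActionGate {
  private lastFired = new Map<string, number>();
  constructor(
    private readonly debounceMs = 150,
    private readonly minConfidence = 0.8
  ) {}

  // Returns true only when both gates pass; records the firing time per action.
  shouldFire(action: string, confidence: number, nowMs: number): boolean {
    if (confidence < this.minConfidence) return false;      // confidence gate
    const last = this.lastFired.get(action);
    if (last !== undefined && nowMs - last < this.debounceMs) return false; // debounce
    this.lastFired.set(action, nowMs);
    return true;
  }
}
```

Keeping the timestamp per action means a pinch click being debounced doesn't suppress an unrelated scroll gesture.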

🔮 Roadmap

Development Timeline

gantt
    title HandMatrix Neural Engine — Development Roadmap
    dateFormat  YYYY-MM
    axisFormat  %b %Y

    section Phase 1 — MVP (Complete)
    Single-hand gesture detection     :done, p1a, 2025-10, 2025-11
    Cursor + click control            :done, p1b, 2025-11, 2025-12
    React dashboard foundation        :done, p1c, 2025-11, 2025-12

    section Phase 2 — Enhanced (Complete)
    Two-hand gesture support          :done, p2a, 2025-12, 2026-01
    Face mesh integration             :done, p2b, 2026-01, 2026-02
    Gemini AI disambiguation          :done, p2c, 2026-01, 2026-02
    JSON customization engine         :done, p2d, 2026-02, 2026-03

    section Phase 3 — Pro (In Progress)
    Gaming mode keybindings          :active, p3a, 2026-03, 2026-05
    User profile management          :active, p3b, 2026-03, 2026-05
    Web Worker performance           :p3c, 2026-04, 2026-06
    Accessibility mode               :p3d, 2026-05, 2026-07

    section Phase 4 — Future
    Voice + gesture hybrid            :p4a, 2026-07, 2026-09
    AI gesture learning               :p4b, 2026-08, 2026-10
    Cloud profile sync                :p4c, 2026-09, 2026-11
    Mobile / AR/VR support           :p4d, 2026-10, 2027-01

Feature Versioning

Version Features Status
v1.0 Single-hand cursor + click, basic dashboard ✅ Released
v1.5 Two-hand gestures, face control ✅ Released
v2.0 Gemini AI, customization engine, modes ✅ Released
v2.5 Gaming mode, user profiles, Web Workers 🔄 In Progress
v3.0 Voice hybrid, AI learning, mobile 📅 Planned
v4.0 AR/VR integration, cloud sync 🔮 Future

🧩 Use Cases

mindmap
  root((HandMatrix\nUse Cases))
    Accessibility
      Motor-impaired users
      Post-surgery recovery
      ALS/Parkinson's patients
    Professional
      Surgical theater control
      Clean-room environments
      Industrial control panels
    Entertainment
      PC gaming
      VR navigation
      Live performance
    Education
      Touchless presentations
      Interactive whiteboards
      Remote teaching
    Smart Home
      Gesture-based IoT
      TV/media control
      Lighting control
    Health & Fitness
      Hands-free workout tracking
      Rehab exercise tracking

🆚 Competitive Comparison

Feature HandMatrix Leap Motion Kinect Eye Gaze Trackers
Hardware Required ❌ Camera only ✅ Special device ✅ Special device ✅ Special device
Cost $0 $79+ Discontinued $500+
AI Disambiguation ✅ Gemini AI
Face Control ✅ Limited
Custom Bindings ✅ JSON Config ✅ Limited
Browser Native ✅ WebApp ❌ Desktop only ❌ Desktop only
Open Source ✅ MIT
Accessibility Mode ✅ Limited
Frames Per Second 60fps 200fps 30fps 60fps
Setup Complexity ⭐ Simple ⭐⭐ Medium ⭐⭐⭐ Complex ⭐⭐⭐ Complex

🧠 AI Integration

Gemini AI Role in HandMatrix

flowchart TB
    subgraph INPUTS["Gemini Input Context"]
        LM["Landmark sequence\n(last 10 frames)"]
        MODE["Current mode\n(cursor/gaming/media)"]
        HIST["Action history\n(last 5 actions)"]
        ENV["Environment state\n(active app, time)"]
    end

    GEMINI["🤖 Gemini 1.5 Flash\nAI Reasoning Engine"]

    subgraph OUTPUTS["Gemini Decisions"]
        CLASSIFY["Gesture classification\n(high-confidence)"]
        RESOLVE["Ambiguity resolution\n(similar gestures)"]
        SUGGEST["Adaptive suggestions\n(new mappings)"]
        EXPLAIN["Natural language\nexplanation to user"]
    end

    LM --> GEMINI
    MODE --> GEMINI
    HIST --> GEMINI
    ENV --> GEMINI

    GEMINI --> CLASSIFY
    GEMINI --> RESOLVE
    GEMINI --> SUGGEST
    GEMINI --> EXPLAIN

📜 Learning Outcomes

Domain Skills Developed
Computer Vision MediaPipe WASM, landmark extraction, OpenCV normalization
AI Engineering Gemini API integration, prompt engineering, context management
Real-Time Systems Frame-perfect processing, Kalman filtering, temporal buffers
System Design Multi-modal fusion, event-driven architecture, state machines
HCI Accessibility design, gesture UX, feedback loops
Frontend React 19, TypeScript 5.8, Vite, Tailwind, Canvas API
Performance Web Workers, WASM optimization, 60fps rendering
Product User profiling, mode systems, customization engines

🤝 Contributing

flowchart LR
    FORK["🍴 Fork Repo"] 
    --> CLONE["📥 Clone Locally"]
    --> BRANCH["🌿 Create Feature Branch\ngit checkout -b feat/your-feature"]
    --> CODE["👨‍💻 Write Code\n+ Tests"]
    --> COMMIT["📝 Conventional Commit\nfeat: add new gesture"]
    --> PUSH["📤 Push Branch"]
    --> PR["🔀 Open Pull Request\nwith description"]
    --> REVIEW["👀 Code Review"]
    --> MERGE["✅ Merged!"]

Contribution Guidelines

  • Follow Conventional Commits (feat:, fix:, docs:, perf:)
  • Write TypeScript — no any types allowed
  • Add JSDoc comments to all exported functions
  • Test gestures manually before submitting PR
  • Keep PR scope small and focused

👨‍💻 Author

Rishvin Reddy
B.Tech CSE (BIC) · Woxsen University

Portfolio GitHub LinkedIn


📜 License

MIT License

Copyright (c) 2026 Rishvin Reddy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, subject to the following conditions: ...

⭐ HandMatrix is not just a project —

It is the future of how humans interact with machines.
Touch becomes optional. Intention becomes the interface.


If this project inspires you:

Star Fork Issues

Built with ❤️ by Rishvin Reddy · Woxsen University · 2026

About

HandMatrix is a real-time AI-based hand gesture control system built using computer vision, enabling touchless interaction with computers, applications, and games.
