xuruiray/stackchan-local

StackChan Local

StackChan Local is a local-first desktop daemon, WebUI, and ESP-IDF firmware for M5Stack StackChan on ESP32-S3. Its goal is to let users quickly verify from a desktop that every hardware sensor and onboard module is working normally, then try a small set of local applications such as Codex status alerts, hardware expression control, and face-position tracking. The hardware connects to a Mac on the LAN over WebSocket, while the browser console, Codex integration, and optional local vision services run on the desktop.

The current architecture has three explicit firmware layers:

  • hardware: board profile, buses, chip drivers, and device-facing IO.
  • services: hardware application behavior such as display, motion, sensors, power, audio, network, and local companion protocol handling.
  • system: boot, lifecycle, settings, diagnostics, and ESP-IDF platform adapters.

Visual Overview

| Hardware target | Local hardware console |
| --- | --- |
| [Image: StackChan hardware] | [Image: StackChan Local hardware console] |
| M5Stack StackChan on ESP32-S3 with GC0308 camera, touch, IMU, servos, RGB, audio, and power modules. | React + Vite console for modules, applications, raw snapshots, logs, and camera streams. |

Project Snapshot

| Dimension | Current design |
| --- | --- |
| Project goal | Desktop-first hardware validation for all StackChan sensors and modules, plus simple local applications |
| Hardware target | M5Stack StackChan / ESP32-S3 only |
| Firmware stack | ESP-IDF 5.5.4 with system, hardware, and services layers |
| Desktop stack | TypeScript daemon, React + Vite WebUI, local Python vision sidecar |
| Transport | LAN WebSocket at ws://<mac-ip>:8787/stackchan/local plus HTTP/SSE/MJPEG on 8788 |
| Control model | Structured safe commands; no raw JSON command console |
| Vision and avatar model | Local face-position tracking, hardware expression presets, optional avatar JSON, and no identity recognition |
| Hardware observability | Per-module pages, public snapshot, logs, I2C scan, stream metrics, and sensor availability reasons |

What It Does

  • Provides a desktop console for quickly checking whether StackChan hardware sensors and modules are connected, available, and returning valid data.
  • Runs simple local applications on top of the verified hardware, including Codex status alerts, hardware expression control, optional completion TTS, RGB light alerts, and face-position tracking.
  • Runs StackChan locally over LAN with desktop-side control and application orchestration.
  • Mirrors Codex activity into hardware states such as idle, thinking, and speaking.
  • Streams camera frames to the desktop for local face-position tracking.
  • Exposes a componentized React console at http://localhost:8788.
  • Reports hardware telemetry for power, touch, IMU with BMM150 magnetometer data, camera, servos, audio, RTC, NFC, IR, LTR553, INA226, Wi-Fi, BLE, RGB, and IO expander state.
  • Provides MCP tools so Codex can query status, speak, move the head, capture images, set modes, and control face tracking.

Face tracking uses local position detection only and does not perform identity recognition.

Repository Layout

.
├── assets/              README assets
├── desktop/             TypeScript daemon, WebSocket server, MCP server, vision, TTS, WebUI server
│   ├── src/
│   │   ├── codex/       Codex session watcher
│   │   ├── device/      Device registry and snapshots
│   │   ├── mcp/         MCP tool server
│   │   ├── preview/     8788 HTTP/SSE/MJPEG/API server
│   │   ├── robot/       Command controller and motion arbitration
│   │   ├── tts/         Completion announcer and provider integration
│   │   ├── vision/      Face detector sidecar and tracking controller
│   │   └── ws/          Firmware WebSocket protocol server
│   └── preview-ui/      React + Vite hardware console
├── firmware/            ESP-IDF firmware project for M5Stack StackChan / ESP32-S3
│   └── main/
│       ├── hardware/    Board profile, bus, driver, and sensor modules
│       ├── services/    Display, sensors, motion, audio, power, network, local companion
│       ├── system/      Boot, core context, lifecycle, diagnostics, ESP-IDF adapters
│       ├── third_party/ Passive chip libraries
│       └── app/         Local Companion UI entry
├── protocol/            Shared TypeScript protocol types and JSON schema validation
└── scripts/             Build, flash, and hygiene scripts

Architecture

Runtime Topology

flowchart LR
  Codex["Codex / MCP"] --> Desktop["desktop daemon"]
  Browser["React WebUI :8788"] --> Desktop
  Desktop --> Vision["Python OpenCV detector"]
  Desktop <-->|"ws://<mac-ip>:8787/stackchan/local"| Firmware["ESP32-S3 firmware"]
  Firmware --> System["system"]
  Firmware --> Services["services"]
  Firmware --> Hardware["hardware"]
  Hardware --> Devices["PMIC, display, touch, camera, audio, servos, sensors, network"]

Firmware Ownership

flowchart TB
  AppMain["app_main"] --> Boot["system/boot"]
  Boot --> Context["system/core/SystemContext"]
  Context --> BoardProfile["hardware/board/m5stack_stackchan/BoardProfile"]
  BoardProfile --> Registry["HardwareRegistry"]
  Registry --> Bus["hardware/bus"]
  Registry --> Drivers["hardware drivers"]
  Context --> ServiceRegistry["ServiceRegistry"]
  ServiceRegistry --> SensorService["services/sensors"]
  ServiceRegistry --> MotionService["services/motion"]
  ServiceRegistry --> DisplayService["services/display"]
  ServiceRegistry --> CompanionService["services/local_companion"]
  CompanionService --> Telemetry["snapshots, camera frames, logs"]
  SensorService --> Telemetry
  MotionService --> Drivers
  DisplayService --> Drivers

Command And Telemetry Loop

sequenceDiagram
  participant UI as React WebUI
  participant Desktop as Desktop daemon
  participant Firmware as ESP32-S3 firmware
  participant Service as Firmware service
  participant Driver as Hardware driver

  UI->>Desktop: POST /api/* or SSE subscribe
  Desktop->>Firmware: structured robot command over WebSocket
  Firmware->>Service: dispatch command
  Service->>Driver: read, write, or control
  Driver-->>Service: hardware result
  Service-->>Firmware: snapshot or ACK
  Firmware-->>Desktop: telemetry, camera frame, command ACK
  Desktop-->>UI: /status, /events, /debug/*, MJPEG/JPEG
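
The loop above relies on every command being matched to its ACK. A minimal sketch of that bookkeeping on the desktop side, assuming a string `id` field ties an ACK to its command. The message shapes and the `PendingCommands` class are hypothetical; the real definitions live in protocol/.

```typescript
// Hypothetical message shapes for illustration only.
type RobotCommand = {
  type: "command";
  id: string;
  name: string;
  args: Record<string, unknown>;
};
type CommandAck = { type: "ack"; id: string; ok: boolean; reason?: string };

class PendingCommands {
  private nextId = 1;
  private pending = new Map<string, RobotCommand>();

  // Build a command envelope and remember it until its ACK arrives.
  create(name: string, args: Record<string, unknown> = {}): RobotCommand {
    const cmd: RobotCommand = {
      type: "command",
      id: `cmd-${this.nextId++}`,
      name,
      args,
    };
    this.pending.set(cmd.id, cmd);
    return cmd;
  }

  // Return the command this ACK settles, or undefined for an unknown
  // or duplicate ACK.
  settle(ack: CommandAck): RobotCommand | undefined {
    const cmd = this.pending.get(ack.id);
    if (cmd !== undefined) this.pending.delete(ack.id);
    return cmd;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

A daemon built this way could time out entries that stay pending too long and surface them on a debug page; that policy is not specified here.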

Desktop Responsibilities

  • Listen for firmware WebSocket connections on 8787.
  • Validate protocol messages with shared schemas from protocol/.
  • Maintain device sessions, heartbeat state, command ACKs, and public snapshots.
  • Expose the React WebUI, status APIs, debug logs, SSE updates, and raw/processed camera streams on 8788.
  • Run MCP tools for Codex.
  • Watch Codex session state and dispatch companion mode changes.
  • Run optional completion TTS and RGB light alerts.
  • Run local face-position tracking through desktop/scripts/face_detector.py.
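
Session and heartbeat tracking from the list above can be sketched as follows. This is an illustration only: the `SessionRegistry` class, the 5-second heartbeat interval, and the three-missed-beats rule are assumptions, not values taken from the project.

```typescript
// Hypothetical session record; the real daemon may track more fields.
interface DeviceSession {
  deviceId: string;
  lastHeartbeatMs: number;
}

// Assumed policy: offline after three missed 5-second heartbeats.
const HEARTBEAT_INTERVAL_MS = 5000;
const OFFLINE_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS;

class SessionRegistry {
  private sessions = new Map<string, DeviceSession>();

  // Record a heartbeat, creating the session on first contact.
  heartbeat(deviceId: string, nowMs: number): void {
    this.sessions.set(deviceId, { deviceId, lastHeartbeatMs: nowMs });
  }

  // A device is online if it is known and its last heartbeat is recent.
  isOnline(deviceId: string, nowMs: number): boolean {
    const session = this.sessions.get(deviceId);
    return (
      session !== undefined &&
      nowMs - session.lastHeartbeatMs <= OFFLINE_AFTER_MS
    );
  }
}
```

Passing the clock in as `nowMs` keeps the logic deterministic and easy to test without timers.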

Firmware Responsibilities

  • Boot the M5Stack StackChan board profile and initialize hardware drivers.
  • Compose drivers into services for display, motion, sensors, power, audio, network, and local companion transport.
  • Connect to the desktop daemon using mDNS discovery or a saved fallback WebSocket URL.
  • Send heartbeat, state, hardware status, touch, IMU, battery, Wi-Fi, camera, and audio telemetry.
  • Execute commands for mode, audio playback, camera stream, RGB, servo motion, and face tracking.
  • Keep local avatar rendering, blinking, idle behavior, and power policy on-device.

Runtime Endpoints

| Surface | Default | Owner | Purpose |
| --- | --- | --- | --- |
| Firmware WebSocket | ws://<mac-ip>:8787/stackchan/local | desktop/src/ws | Firmware session, heartbeat, telemetry, commands |
| Preview WebUI | http://localhost:8788/ | desktop/src/preview + desktop/preview-ui | Browser console for modules, applications, and debug pages |
| Public status | GET /status | Preview server | Device/session summary consumed by the UI |
| Public snapshot | GET /debug/snapshot | Preview server | Raw public JSON snapshot for diagnostics |
| Logs | GET /debug/logs, GET /debug/log-events | Preview server | Daemon logs and streaming log updates |
| Camera streams | GET /frame.jpg, GET /stream.mjpg | Preview server | Latest raw/processed camera frames |
| Service discovery | _stackchan-local._tcp | Desktop daemon | mDNS discovery for firmware pairing |
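
For scripting against these surfaces, the default URLs can be derived from a host and the two ports. The sketch below uses the paths listed in the table; the `buildEndpoints` helper and the `Endpoints` type are hypothetical and not part of the project.

```typescript
interface Endpoints {
  firmwareWs: string;
  webUi: string;
  status: string;
  snapshot: string;
  logs: string;
  frame: string;
  stream: string;
}

// Derive endpoint URLs from a host plus the documented default ports.
function buildEndpoints(
  host: string,
  localPort = 8787,
  previewPort = 8788,
): Endpoints {
  const http = `http://${host}:${previewPort}`;
  return {
    firmwareWs: `ws://${host}:${localPort}/stackchan/local`,
    webUi: `${http}/`,
    status: `${http}/status`,
    snapshot: `${http}/debug/snapshot`,
    logs: `${http}/debug/logs`,
    frame: `${http}/frame.jpg`,
    stream: `${http}/stream.mjpg`,
  };
}

// Example use in a runtime that provides fetch (not executed here):
//   const res = await fetch(buildEndpoints("localhost").status);
//   console.log(res.status);
```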

Firmware Layering

firmware/main/
  hardware/
    board/m5stack_stackchan/   pinmap, hardware_config, BoardProfile
    bus/                       I2C device/bus helpers
    power/                     AXP2101 and backlight
    display/                   ILI9342/LVGL driver boundary
    touch/                     FT6336 screen touch
    audio/                     ES7210/AW88298/CoreS3 codec surface
    camera/                    GC0308 camera
    motion/                    SCS servo driver surface
    io_expander/               AW9523/PY32 IO expander
    lighting/                  RGB strip driver boundary
    sensors/                   SI12T, BMI270, BMM150, RTC, INA226, LTR553, NFC, IR, mic level
    network/                   Wi-Fi, BLE, provisioning helpers

  services/
    display/                   LVGL runtime, avatar binding, status display, RGB behavior
    sensors/                   Polling, snapshots, I2C diagnostics, sensor events
    motion/                    Servo calibration and expression-motion output
    power/                     Servo power and IO expander power policy
    audio/                     Codec service, wake word/audio runtime, mic level
    network/                   Wi-Fi, SNTP, BLE provisioning
    expression_motion/         Avatar, animation, modifiers, StackChan motion engine
    local_companion/           WebSocket session, command dispatch, telemetry, media streams

  system/
    boot/                      Startup sequence and runtime boot
    core/                      SystemContext, settings, event bus, service registry, diagnostics
    lifecycle/                 Reboot, power off, factory reset/runtime state
    power_policy/              Idle power policy namespace
    platform/esp_idf/          ESP-IDF adapters

  third_party/                 Passive chip libraries only

Current firmware boundaries:

  • hardware drivers take bus/config dependencies and expose begin, available, read, write, or control style APIs.
  • hardware must not depend on LVGL app objects, Local Companion services, desktop protocol code, or Board::GetInstance().
  • services compose drivers and publish application-level behavior, telemetry, and events.
  • system owns boot order, shared context, lifecycle, settings, diagnostics, and ESP-IDF adapters.
  • third_party contains passive chip libraries only.

WebUI

The WebUI is served by the desktop daemon at http://localhost:8788. It is a React + Vite app under desktop/preview-ui/.

The console is designed as the primary hardware validation surface. It has three groups:

  • Modules: one page per chip or hardware module (Power / PMIC, Display, Screen Touch, Head Touch, IMU, Camera, Servo, IO Expander, RGB LED, RTC, ALS/Proximity, NFC, IR, Audio, and Wi-Fi/BLE). The Power page covers AXP2101 and INA226; the IMU page covers BMI270 and BMM150 magnetometer data.
  • Applications: simple app flows built on verified hardware, currently Codex announcer/light alert, hardware expression control, and face-position tracking.
  • Debug: system counters, raw public snapshot, and daemon logs.

Camera pages expose separate raw and processed streams:

  • Raw preview: camera stream before face detection.
  • Face tracking: processed stream with face-position overlay.

Capability Matrix

| Group | Pages or services | Data shown | Safe commands |
| --- | --- | --- | --- |
| Power | AXP2101, INA226, backlight policy | Battery, charge state, rail current/power, availability reasons | Read-only status |
| Touch and motion sensors | FT6336, SI12T, BMI270, BMM150, LTR553 | Touch state, accel/gyro, fused attitude, magnetometer, ALS/proximity | Read-only status |
| Camera and vision | GC0308 raw stream, face-position processed stream | FPS, frame interval, latency, JPEG size, face target | Stream on/off, capture, FPS selection |
| Actuators | SCS servos, RGB LED, IO expander | Servo power, RGB state, expander availability | Move head, set RGB color/brightness |
| Audio and time | ES7210, AW88298, RTC | Mic availability, codec status, RTC time | TTS/say through MCP |
| Network | Wi-Fi, BLE provisioning, mDNS | Link state, SSID/IP, RSSI, reconnect counters | Provisioning and runtime network commands through services |
| Applications | Codex announcer/light alert, hardware expression control, face-position tracking | App state, enabled flags, expression capability, tracking target and latency | Send expression presets/avatar JSON, toggle tracking, adjust FPS, companion mode commands |
| Debug | System, raw snapshot, logs | Heap, counters, command ACKs, public JSON, daemon logs | Read-only diagnostics |
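
The camera row lists FPS and frame interval among the stream metrics. One plausible way to derive both from frame arrival timestamps is sketched below; the `streamMetrics` function is hypothetical, and the project may compute these values differently (for example over a sliding window).

```typescript
interface StreamMetrics {
  fps: number;
  meanIntervalMs: number;
}

// Derive FPS and mean frame interval from arrival timestamps in ms.
// Returns undefined when fewer than two frames have arrived or when
// the timestamps do not advance.
function streamMetrics(arrivalsMs: number[]): StreamMetrics | undefined {
  if (arrivalsMs.length < 2) return undefined;
  const spanMs = arrivalsMs[arrivalsMs.length - 1] - arrivalsMs[0];
  if (spanMs <= 0) return undefined;
  const meanIntervalMs = spanMs / (arrivalsMs.length - 1);
  return { fps: 1000 / meanIntervalMs, meanIntervalMs };
}
```

Four frames arriving 100 ms apart give a mean interval of 100 ms and 10 FPS.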

Quick Start

1. Install Desktop Dependencies

npm install
cp .env.example .env

Edit .env before using real hardware. At minimum, replace the default pairing token with your own value:

STACKCHAN_PAIRING_TOKEN=dev-local-token

2. Start Desktop Daemon

npm run dev

Default endpoints:

  • Firmware WebSocket: ws://<mac-ip>:8787/stackchan/local
  • WebUI: http://localhost:8788
  • mDNS service: _stackchan-local._tcp

3. Optional Face Tracking Setup

npm run vision:install
STACKCHAN_FACE_TRACKING=1 npm run dev

Face tracking uses the local Python OpenCV YuNet sidecar and fixed 320 x 240 camera input. The WebUI exposes stream options and center-point PID controls.
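
A minimal sketch of one center-point PID step with a deadband, for a single horizontal axis. It assumes the deadband is a fraction of frame width (matching the 0.045 default in .env.example); the `CenterPointPid` class, the gains, and the output units are hypothetical, not the project's tuning.

```typescript
interface PidGains {
  kp: number;
  ki: number;
  kd: number;
}

class CenterPointPid {
  private gains: PidGains;
  private deadband: number;
  private integral = 0;
  private prevError = 0;

  constructor(gains: PidGains, deadband = 0.045) {
    this.gains = gains;
    this.deadband = deadband;
  }

  // faceCenterX is in pixels on the fixed 320-pixel-wide frame.
  // Returns a signed control output, or 0 while the face stays inside
  // the deadband around the frame center.
  update(faceCenterX: number, dtSec: number, frameWidth = 320): number {
    const error = faceCenterX / frameWidth - 0.5; // normalized, -0.5..0.5
    if (Math.abs(error) < this.deadband) {
      this.integral = 0; // drop accumulated error while holding still
      this.prevError = error;
      return 0;
    }
    this.integral += error * dtSec;
    const derivative = (error - this.prevError) / dtSec;
    this.prevError = error;
    const { kp, ki, kd } = this.gains;
    return kp * error + ki * this.integral + kd * derivative;
  }
}
```

The deadband keeps the servos from chasing detector jitter when the face is already near the center; a second instance could drive the vertical axis with the 240-pixel frame height.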

4. Build And Flash Firmware

Use ESP-IDF 5.5.4 for the current firmware tree:

source ~/esp/esp-idf-v5.5.4/export.sh
npm run firmware:build
npm run firmware:check-local-only
npm run firmware:flash

Equivalent raw ESP-IDF commands:

cd firmware
idf.py set-target esp32s3
idf.py build
idf.py -p /dev/cu.usbmodem21301 flash monitor

If the device has no saved Wi-Fi credentials, it starts a StackChan-XXXX provisioning AP. Connect to it and open http://192.168.4.1.

Configuration

Common settings live in .env.example.

Important defaults:

  • STACKCHAN_LOCAL_PORT=8787
  • STACKCHAN_PREVIEW_PORT=8788
  • STACKCHAN_PAIRING_TOKEN=dev-local-token
  • STACKCHAN_FACE_TRACKING=0
  • STACKCHAN_FACE_TRACKING_CAMERA_PRESET=fast
  • STACKCHAN_FACE_TRACKING_SPEED=420
  • STACKCHAN_FACE_TRACKING_DEADBAND=0.045
  • STACKCHAN_FACE_TRACKING_TRACE_LOG=logs/face-tracking.ndjson
  • STACKCHAN_CODEX_STATUS=1
  • STACKCHAN_VOLCENGINE_TTS_ENABLED=0

Do not commit real pairing tokens or provider API keys.
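
The defaults above could be loaded and validated along these lines. The `loadConfig` and `usesDefaultToken` helpers and the `DaemonConfig` shape are hypothetical and cover only a subset of the settings; the daemon's actual loader may differ.

```typescript
interface DaemonConfig {
  localPort: number;
  previewPort: number;
  pairingToken: string;
  faceTracking: boolean;
  faceTrackingDeadband: number;
}

type Env = Record<string, string | undefined>;

// Read a numeric setting, falling back to the documented default.
function numberFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be a number, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): DaemonConfig {
  return {
    localPort: numberFrom(env, "STACKCHAN_LOCAL_PORT", 8787),
    previewPort: numberFrom(env, "STACKCHAN_PREVIEW_PORT", 8788),
    pairingToken: env["STACKCHAN_PAIRING_TOKEN"] ?? "dev-local-token",
    faceTracking: env["STACKCHAN_FACE_TRACKING"] === "1",
    faceTrackingDeadband: numberFrom(
      env,
      "STACKCHAN_FACE_TRACKING_DEADBAND",
      0.045,
    ),
  };
}

// True when the token is still the documented development default.
function usesDefaultToken(config: DaemonConfig): boolean {
  return config.pairingToken === "dev-local-token";
}
```

In Node, `loadConfig(process.env)` would be the typical call; a startup warning when `usesDefaultToken` returns true is one way to catch an unedited .env.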

MCP Tools

Run MCP mode with:

npm run mcp

Available tools:

  • stackchan_status
  • stackchan_say
  • stackchan_react
  • stackchan_move_head
  • stackchan_play_animation
  • stackchan_capture_image
  • stackchan_set_mode
  • stackchan_face_tracking
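
An MCP client such as Codex invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for illustration; the tool names come from the list above, while `buildToolCall` and the `yaw`/`pitch` argument names are hypothetical, since the tools' input schemas are not documented here.

```typescript
const TOOLS = [
  "stackchan_status",
  "stackchan_say",
  "stackchan_react",
  "stackchan_move_head",
  "stackchan_play_animation",
  "stackchan_capture_image",
  "stackchan_set_mode",
  "stackchan_face_tracking",
] as const;

type ToolName = (typeof TOOLS)[number];

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

// Build a tools/call request envelope for one of the listed tools.
function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are placeholders, not the tool's real schema.
const moveHead = buildToolCall(1, "stackchan_move_head", { yaw: 10, pitch: -5 });
```

In normal use the MCP client constructs these requests itself; a client can list each tool's real input schema with a `tools/list` request.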

Development Commands

| Goal | Command |
| --- | --- |
| Install workspace dependencies | npm install |
| Start desktop daemon and WebUI | npm run dev |
| Run MCP server mode | npm run mcp |
| Type-check all TypeScript packages | npm run typecheck |
| Run protocol and desktop tests | npm test |
| Build preview UI and desktop TypeScript | npm run build |
| Install local vision dependencies | npm run vision:install |
| Build firmware | source ~/esp/esp-idf-v5.5.4/export.sh && npm run firmware:build |
| Flash firmware | source ~/esp/esp-idf-v5.5.4/export.sh && npm run firmware:flash |
| Check firmware local-only boundaries | npm run firmware:check-local-only |

Testing

Desktop and protocol:

npm run typecheck
npm test
npm run check

Targeted checks:

npm test -w @stackchan-local/protocol
npm test -w @stackchan-local/desktop
npm run typecheck -w @stackchan-local/desktop

Firmware:

source ~/esp/esp-idf-v5.5.4/export.sh
npm run firmware:build
npm run firmware:check-local-only

Hardware acceptance after flashing:

  • http://localhost:8788/, /status, /debug/snapshot, and /debug/logs return 200.
  • Device shows online in the WebUI.
  • Module pages update for PMIC/battery, INA226 power monitor, touch, head touch, IMU with magnetometer, RTC, mic, camera, RGB/IO expander, servos, NFC, IR, and LTR553.
  • Present hardware reports valid non-NaN values and reacts to touch, motion, light, sound, and camera stimuli.
  • Missing or unsupported modules report available:false with a clear reason.
  • /frame.jpg or /stream.mjpg returns a valid JPEG stream when camera streaming is enabled.
  • Firmware serial logs and desktop logs have no unexplained ERROR, no persistent WARN spam, no reconnect loop, and no repeated sensor init timeout.
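
Two of the checks above (no NaN values from present hardware, and a clear reason for every unavailable module) lend themselves to automation. The sketch below assumes a snapshot with one entry per module carrying `available`, `reason`, and numeric `values`; that shape and the `findProblems` helper are hypothetical and would need adapting to the real /debug/snapshot payload.

```typescript
// Hypothetical per-module status; adjust to the real snapshot schema.
interface ModuleStatus {
  available: boolean;
  reason?: string;
  values?: Record<string, number>;
}

type Snapshot = Record<string, ModuleStatus>;

// Return human-readable problems; an empty array means both checks pass.
function findProblems(snapshot: Snapshot): string[] {
  const problems: string[] = [];
  for (const [name, mod] of Object.entries(snapshot)) {
    if (!mod.available) {
      if (mod.reason === undefined || mod.reason.trim() === "") {
        problems.push(`${name}: unavailable without a reason`);
      }
      continue;
    }
    for (const [key, value] of Object.entries(mod.values ?? {})) {
      if (!Number.isFinite(value)) {
        problems.push(`${name}.${key}: non-finite value`);
      }
    }
  }
  return problems;
}
```

Reactions to touch, motion, light, and sound still need a person at the device; this only covers what a snapshot can show.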

Privacy And Safety

  • Camera frames stay on the LAN between the hardware and desktop daemon.
  • Face tracking is local position detection only, not identity recognition.
  • Cloud TTS is optional and disabled by default.
  • Pairing tokens and API keys belong in .env, not in Git.

Project Status

This is an experimental local hardware/software project for macOS plus M5Stack StackChan on ESP32-S3. The current focus is fast desktop-side validation of hardware sensors and modules, stable local control, simple companion applications, and a clean firmware layering model. Cross-platform desktop packaging and production-grade firmware release flow still need hardening.

License

MIT for StackChan Local project code unless a subdirectory or managed dependency states otherwise.

