StackChan Local is a local-first desktop daemon, WebUI, and ESP-IDF firmware for M5Stack StackChan on ESP32-S3. Its goal is to let users quickly verify from a desktop that every hardware sensor and onboard module is working normally, then try a small set of local applications such as Codex status alerts, hardware expression control, and face-position tracking. The hardware connects to a Mac on the LAN over WebSocket, while the browser console, Codex integration, and optional local vision services run on the desktop.
The current architecture has three explicit firmware layers:
- hardware: board profile, buses, chip drivers, and device-facing IO.
- services: hardware application behavior such as display, motion, sensors, power, audio, network, and local companion protocol handling.
- system: boot, lifecycle, settings, diagnostics, and ESP-IDF platform adapters.
| Dimension | Current design |
|---|---|
| Project goal | Desktop-first hardware validation for all StackChan sensors and modules, plus simple local applications |
| Hardware target | M5Stack StackChan / ESP32-S3 only |
| Firmware stack | ESP-IDF 5.5.4 with system, hardware, and services layers |
| Desktop stack | TypeScript daemon, React + Vite WebUI, local Python vision sidecar |
| Transport | LAN WebSocket at ws://<mac-ip>:8787/stackchan/local plus HTTP/SSE/MJPEG on 8788 |
| Control model | Structured safe commands; no raw JSON command console |
| Vision and avatar model | Local face-position tracking, hardware expression presets, optional avatar JSON, and no identity recognition |
| Hardware observability | Per-module pages, public snapshot, logs, I2C scan, stream metrics, and sensor availability reasons |
- What It Does
- Repository Layout
- Architecture
- Runtime Endpoints
- Firmware Layering
- WebUI
- Capability Matrix
- Quick Start
- Development Commands
- Testing
- Provides a desktop console for quickly checking whether StackChan hardware sensors and modules are connected, available, and returning valid data.
- Runs simple local applications on top of the verified hardware, including Codex status alerts, hardware expression control, optional completion TTS, RGB light alerts, and face-position tracking.
- Runs StackChan locally over LAN with desktop-side control and application orchestration.
- Mirrors Codex activity into hardware states such as idle, thinking, and speaking.
- Streams camera frames to the desktop for local face-position tracking.
- Exposes a componentized React console at http://localhost:8788.
- Reports hardware telemetry for power, touch, IMU with BMM150 magnetometer data, camera, servos, audio, RTC, NFC, IR, LTR553, INA226, Wi-Fi, BLE, RGB, and IO expander state.
- Provides MCP tools so Codex can query status, speak, move the head, capture images, set modes, and control face tracking.
Face tracking uses local position detection only and does not perform identity recognition.
.
├── assets/ README assets
├── desktop/ TypeScript daemon, WebSocket server, MCP server, vision, TTS, WebUI server
│ ├── src/
│ │ ├── codex/ Codex session watcher
│ │ ├── device/ Device registry and snapshots
│ │ ├── mcp/ MCP tool server
│ │ ├── preview/ 8788 HTTP/SSE/MJPEG/API server
│ │ ├── robot/ Command controller and motion arbitration
│ │ ├── tts/ Completion announcer and provider integration
│ │ ├── vision/ Face detector sidecar and tracking controller
│ │ └── ws/ Firmware WebSocket protocol server
│ └── preview-ui/ React + Vite hardware console
├── firmware/ ESP-IDF firmware project for M5Stack StackChan / ESP32-S3
│ └── main/
│ ├── hardware/ Board profile, bus, driver, and sensor modules
│ ├── services/ Display, sensors, motion, audio, power, network, local companion
│ ├── system/ Boot, core context, lifecycle, diagnostics, ESP-IDF adapters
│ ├── third_party/ Passive chip libraries
│ └── app/ Local Companion UI entry
├── protocol/ Shared TypeScript protocol types and JSON schema validation
└── scripts/ Build, flash, and hygiene scripts
flowchart LR
Codex["Codex / MCP"] --> Desktop["desktop daemon"]
Browser["React WebUI :8788"] --> Desktop
Desktop --> Vision["Python OpenCV detector"]
Desktop <-->|"ws://<mac-ip>:8787/stackchan/local"| Firmware["ESP32-S3 firmware"]
Firmware --> System["system"]
Firmware --> Services["services"]
Firmware --> Hardware["hardware"]
Hardware --> Devices["PMIC, display, touch, camera, audio, servos, sensors, network"]
flowchart TB
AppMain["app_main"] --> Boot["system/boot"]
Boot --> Context["system/core/SystemContext"]
Context --> BoardProfile["hardware/board/m5stack_stackchan/BoardProfile"]
BoardProfile --> Registry["HardwareRegistry"]
Registry --> Bus["hardware/bus"]
Registry --> Drivers["hardware drivers"]
Context --> ServiceRegistry["ServiceRegistry"]
ServiceRegistry --> SensorService["services/sensors"]
ServiceRegistry --> MotionService["services/motion"]
ServiceRegistry --> DisplayService["services/display"]
ServiceRegistry --> CompanionService["services/local_companion"]
CompanionService --> Telemetry["snapshots, camera frames, logs"]
SensorService --> Telemetry
MotionService --> Drivers
DisplayService --> Drivers
sequenceDiagram
participant UI as React WebUI
participant Desktop as Desktop daemon
participant Firmware as ESP32-S3 firmware
participant Service as Firmware service
participant Driver as Hardware driver
UI->>Desktop: POST /api/* or SSE subscribe
Desktop->>Firmware: structured robot command over WebSocket
Firmware->>Service: dispatch command
Service->>Driver: read, write, or control
Driver-->>Service: hardware result
Service-->>Firmware: snapshot or ACK
Firmware-->>Desktop: telemetry, camera frame, command ACK
Desktop-->>UI: /status, /events, /debug/*, MJPEG/JPEG
- Listen for firmware WebSocket connections on 8787.
- Validate protocol messages with shared schemas from protocol/.
- Maintain device sessions, heartbeat state, command ACKs, and public snapshots.
- Expose the React WebUI, status APIs, debug logs, SSE updates, and raw/processed camera streams on 8788.
- Run MCP tools for Codex.
- Watch Codex session state and dispatch companion mode changes.
- Run optional completion TTS and RGB light alerts.
- Run local face-position tracking through desktop/scripts/face_detector.py.
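The heartbeat bookkeeping in the list above could look roughly like the following. This is a minimal sketch, not the daemon's actual code: `SessionRegistry`, its method names, and the 5000 ms default timeout are all assumptions made for illustration.

```typescript
// Hypothetical session tracker; names and the default timeout are illustrative.
interface DeviceSession {
  deviceId: string;
  lastHeartbeatMs: number;
}

class SessionRegistry {
  private sessions = new Map<string, DeviceSession>();
  private timeoutMs: number;

  constructor(timeoutMs: number = 5000) {
    this.timeoutMs = timeoutMs;
  }

  // Record a heartbeat, creating the session on first contact.
  heartbeat(deviceId: string, nowMs: number): void {
    this.sessions.set(deviceId, { deviceId, lastHeartbeatMs: nowMs });
  }

  // A device counts as online while its last heartbeat is within the timeout.
  isOnline(deviceId: string, nowMs: number): boolean {
    const session = this.sessions.get(deviceId);
    return session !== undefined && nowMs - session.lastHeartbeatMs <= this.timeoutMs;
  }

  // Remove sessions that have gone quiet and return their device ids.
  prune(nowMs: number): string[] {
    const expired: string[] = [];
    this.sessions.forEach((session) => {
      if (nowMs - session.lastHeartbeatMs > this.timeoutMs) {
        expired.push(session.deviceId);
      }
    });
    for (const deviceId of expired) {
      this.sessions.delete(deviceId);
    }
    return expired;
  }
}
```

Passing the current time in as an argument, rather than calling `Date.now()` inside the class, keeps the timeout logic easy to test without real delays.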
- Boot the M5Stack StackChan board profile and initialize hardware drivers.
- Compose drivers into services for display, motion, sensors, power, audio, network, and local companion transport.
- Connect to the desktop daemon using mDNS or a saved fallback WebSocket URL.
- Send heartbeat, state, hardware status, touch, IMU, battery, Wi-Fi, camera, and audio telemetry.
- Execute commands for mode, audio playback, camera stream, RGB, servo motion, and face tracking.
- Keep local avatar rendering, blinking, idle behavior, and power policy on-device.
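One way to picture the structured command model on the desktop side is a discriminated union that is sanitized before anything is sent to the firmware. The sketch below is an assumption throughout: the command names, field names, and head limits are hypothetical and are not taken from protocol/. Only the mode names idle, thinking, and speaking come from this README.

```typescript
// Hypothetical command shapes; the real types live in protocol/ and may differ.
type RobotCommand =
  | { type: "set_mode"; mode: "idle" | "thinking" | "speaking" }
  | { type: "move_head"; yawDeg: number; pitchDeg: number }
  | { type: "set_rgb"; r: number; g: number; b: number };

// Assumed head limits in degrees, chosen only for illustration.
const HEAD_LIMITS = { yaw: 90, pitch: 30 };

// Clamp a value into [min, max], rejecting NaN and infinities outright.
function clamp(value: number, min: number, max: number): number {
  if (!Number.isFinite(value)) {
    throw new Error("command values must be finite numbers");
  }
  return Math.min(max, Math.max(min, value));
}

// Bring numeric fields into their allowed ranges before a command is sent.
function sanitizeCommand(cmd: RobotCommand): RobotCommand {
  switch (cmd.type) {
    case "set_mode":
      return cmd;
    case "move_head":
      return {
        type: "move_head",
        yawDeg: clamp(cmd.yawDeg, -HEAD_LIMITS.yaw, HEAD_LIMITS.yaw),
        pitchDeg: clamp(cmd.pitchDeg, -HEAD_LIMITS.pitch, HEAD_LIMITS.pitch),
      };
    case "set_rgb":
      return {
        type: "set_rgb",
        r: clamp(Math.round(cmd.r), 0, 255),
        g: clamp(Math.round(cmd.g), 0, 255),
        b: clamp(Math.round(cmd.b), 0, 255),
      };
  }
}
```

Because every command is a member of a closed union, the compiler flags any command type the switch forgets to handle, which is the property a raw JSON console would not give.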
| Surface | Default | Owner | Purpose |
|---|---|---|---|
| Firmware WebSocket | ws://<mac-ip>:8787/stackchan/local | desktop/src/ws | Firmware session, heartbeat, telemetry, commands |
| Preview WebUI | http://localhost:8788/ | desktop/src/preview + desktop/preview-ui | Browser console for modules, applications, and debug pages |
| Public status | GET /status | Preview server | Device/session summary consumed by the UI |
| Public snapshot | GET /debug/snapshot | Preview server | Raw public JSON snapshot for diagnostics |
| Logs | GET /debug/logs, GET /debug/log-events | Preview server | Daemon logs and streaming log updates |
| Camera streams | GET /frame.jpg, GET /stream.mjpg | Preview server | Latest raw/processed camera frames |
| Service discovery | _stackchan-local._tcp | Desktop daemon | mDNS discovery for firmware pairing |
firmware/main/
hardware/
board/m5stack_stackchan/ pinmap, hardware_config, BoardProfile
bus/ I2C device/bus helpers
power/ AXP2101 and backlight
display/ ILI9342/LVGL driver boundary
touch/ FT6336 screen touch
audio/ ES7210/AW88298/CoreS3 codec surface
camera/ GC0308 camera
motion/ SCS servo driver surface
io_expander/ AW9523/PY32 IO expander
lighting/ RGB strip driver boundary
sensors/ SI12T, BMI270, BMM150, RTC, INA226, LTR553, NFC, IR, mic level
network/ Wi-Fi, BLE, provisioning helpers
services/
display/ LVGL runtime, avatar binding, status display, RGB behavior
sensors/ Polling, snapshots, I2C diagnostics, sensor events
motion/ Servo calibration and expression-motion output
power/ Servo power and IO expander power policy
audio/ Codec service, wake word/audio runtime, mic level
network/ Wi-Fi, SNTP, BLE provisioning
expression_motion/ Avatar, animation, modifiers, StackChan motion engine
local_companion/ WebSocket session, command dispatch, telemetry, media streams
system/
boot/ Startup sequence and runtime boot
core/ SystemContext, settings, event bus, service registry, diagnostics
lifecycle/ Reboot, power off, factory reset/runtime state
power_policy/ Idle power policy namespace
platform/esp_idf/ ESP-IDF adapters
third_party/ Passive chip libraries only
Current firmware boundaries:
- hardware drivers take bus/config dependencies and expose begin, available, read, write, or control style APIs.
- hardware must not depend on LVGL app objects, Local Companion services, desktop protocol code, or Board::GetInstance().
- services compose drivers and publish application-level behavior, telemetry, and events.
- system owns boot order, shared context, lifecycle, settings, diagnostics, and ESP-IDF adapters.
- third_party contains passive chip libraries only.
The WebUI is served by the desktop daemon at http://localhost:8788. It is a React + Vite app under desktop/preview-ui/.
The console is designed as the primary hardware validation surface. It has three groups:
- Modules: one page per chip or hardware module: Power / PMIC, Display, Screen Touch, Head Touch, IMU, Camera, Servo, IO Expander, RGB LED, RTC, ALS/Proximity, NFC, IR, Audio, Wi-Fi/BLE. Power includes AXP2101 and INA226; IMU includes BMI270 and BMM150 magnetometer data.
- Applications: simple app flows built on verified hardware, currently Codex announcer/light alert, hardware expression control, and face-position tracking.
- Debug: system counters, raw public snapshot, and daemon logs.
Camera pages expose separate raw and processed streams:
- Raw preview: camera stream before face detection.
- Face tracking: processed stream with face-position overlay.
| Group | Pages or services | Data shown | Safe commands |
|---|---|---|---|
| Power | AXP2101, INA226, backlight policy | Battery, charge state, rail current/power, availability reasons | Read-only status |
| Touch and motion sensors | FT6336, SI12T, BMI270, BMM150, LTR553 | Touch state, accel/gyro, fused attitude, magnetometer, ALS/proximity | Read-only status |
| Camera and vision | GC0308 raw stream, face-position processed stream | FPS, frame interval, latency, JPEG size, face target | Stream on/off, capture, FPS selection |
| Actuators | SCS servos, RGB LED, IO expander | Servo power, RGB state, expander availability | Move head, set RGB color/brightness |
| Audio and time | ES7210, AW88298, RTC | Mic availability, codec status, RTC time | TTS/say through MCP |
| Network | Wi-Fi, BLE provisioning, mDNS | Link state, SSID/IP, RSSI, reconnect counters | Provisioning and runtime network commands through services |
| Applications | Codex announcer/light alert, hardware expression control, face-position tracking | App state, enabled flags, expression capability, tracking target and latency | Send expression presets/avatar JSON, toggle tracking, adjust FPS, companion mode commands |
| Debug | System, raw snapshot, logs | Heap, counters, command ACKs, public JSON, daemon logs | Read-only diagnostics |
npm install
cp .env.example .env
Edit .env before using real hardware. At minimum, change:
STACKCHAN_PAIRING_TOKEN=dev-local-token
npm run dev
Default endpoints:
- Firmware WebSocket: ws://<mac-ip>:8787/stackchan/local
- WebUI: http://localhost:8788
- mDNS service: _stackchan-local._tcp
npm run vision:install
STACKCHAN_FACE_TRACKING=1 npm run dev
Face tracking uses the local Python OpenCV YuNet sidecar and fixed 320 x 240 camera input. The WebUI exposes stream options and center-point PID controls.
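As a rough illustration of center-point tracking on a 320 x 240 frame, the sketch below turns a detected face center into a normalized error, applies a deadband, and runs one PID step per axis. It is not the project's tracking controller: the function names, the PID structure, and the idea that the deadband applies to the normalized error are assumptions.

```typescript
const FRAME_WIDTH = 320;
const FRAME_HEIGHT = 240;

interface PidGains { kp: number; ki: number; kd: number }
interface PidState { integral: number; previousError: number }

// Offset of the face center from the frame center, normalized to [-1, 1] per axis.
function normalizedError(faceX: number, faceY: number): { x: number; y: number } {
  return {
    x: (faceX - FRAME_WIDTH / 2) / (FRAME_WIDTH / 2),
    y: (faceY - FRAME_HEIGHT / 2) / (FRAME_HEIGHT / 2),
  };
}

// Treat small errors as zero so the head does not jitter around the target.
function applyDeadband(error: number, deadband: number): number {
  return Math.abs(error) < deadband ? 0 : error;
}

// One PID update for a single axis; returns the control output and the next state.
function pidStep(
  error: number,
  state: PidState,
  gains: PidGains,
  dtSeconds: number,
): { output: number; state: PidState } {
  const integral = state.integral + error * dtSeconds;
  const derivative = dtSeconds > 0 ? (error - state.previousError) / dtSeconds : 0;
  const output = gains.kp * error + gains.ki * integral + gains.kd * derivative;
  return { output, state: { integral, previousError: error } };
}
```

A caller would compute `normalizedError` for each detection, pass each axis through `applyDeadband`, and feed the result to `pidStep` with the time since the previous frame. How the output maps to servo speed is left open here.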
Use ESP-IDF 5.5.4 for the current firmware tree:
source ~/esp/esp-idf-v5.5.4/export.sh
npm run firmware:build
npm run firmware:check-local-only
npm run firmware:flash
Equivalent raw ESP-IDF commands:
cd firmware
idf.py set-target esp32s3
idf.py build
idf.py -p /dev/cu.usbmodem21301 flash monitor
Replace /dev/cu.usbmodem21301 with the serial port of your own board. If the device has no saved Wi-Fi credentials, it starts a StackChan-XXXX provisioning AP. Connect to it and open http://192.168.4.1.
Common settings live in .env.example.
Important defaults:
- STACKCHAN_LOCAL_PORT=8787
- STACKCHAN_PREVIEW_PORT=8788
- STACKCHAN_PAIRING_TOKEN=dev-local-token
- STACKCHAN_FACE_TRACKING=0
- STACKCHAN_FACE_TRACKING_CAMERA_PRESET=fast
- STACKCHAN_FACE_TRACKING_SPEED=420
- STACKCHAN_FACE_TRACKING_DEADBAND=0.045
- STACKCHAN_FACE_TRACKING_TRACE_LOG=logs/face-tracking.ndjson
- STACKCHAN_CODEX_STATUS=1
- STACKCHAN_VOLCENGINE_TTS_ENABLED=0
Do not commit real pairing tokens or provider API keys.
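To show how these settings might be read with their documented defaults, here is a small sketch of a config loader. The variable names and default values come from this README, but `DesktopConfig`, `loadConfig`, and `usesDefaultPairingToken` are hypothetical names and the daemon's real loader may work differently.

```typescript
interface DesktopConfig {
  localPort: number;
  previewPort: number;
  pairingToken: string;
  faceTracking: boolean;
  faceTrackingDeadband: number;
}

type Env = Record<string, string | undefined>;

const DEFAULT_PAIRING_TOKEN = "dev-local-token";

// Parse a numeric variable, falling back when it is unset or blank.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    throw new Error(`${key} must be a number, got "${raw}"`);
  }
  return parsed;
}

// Build a typed config from an environment-like object.
function loadConfig(env: Env): DesktopConfig {
  return {
    localPort: readNumber(env, "STACKCHAN_LOCAL_PORT", 8787),
    previewPort: readNumber(env, "STACKCHAN_PREVIEW_PORT", 8788),
    pairingToken: env.STACKCHAN_PAIRING_TOKEN ?? DEFAULT_PAIRING_TOKEN,
    faceTracking: env.STACKCHAN_FACE_TRACKING === "1",
    faceTrackingDeadband: readNumber(env, "STACKCHAN_FACE_TRACKING_DEADBAND", 0.045),
  };
}

// True when the placeholder token is still in use and should be changed.
function usesDefaultPairingToken(config: DesktopConfig): boolean {
  return config.pairingToken === DEFAULT_PAIRING_TOKEN;
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`, with a startup warning when `usesDefaultPairingToken` returns true.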
Run MCP mode with:
npm run mcp
Available tools:
- stackchan_status
- stackchan_say
- stackchan_react
- stackchan_move_head
- stackchan_play_animation
- stackchan_capture_image
- stackchan_set_mode
- stackchan_face_tracking
| Goal | Command |
|---|---|
| Install workspace dependencies | npm install |
| Start desktop daemon and WebUI | npm run dev |
| Run MCP server mode | npm run mcp |
| Type-check all TypeScript packages | npm run typecheck |
| Run protocol and desktop tests | npm test |
| Build preview UI and desktop TypeScript | npm run build |
| Install local vision dependencies | npm run vision:install |
| Build firmware | source ~/esp/esp-idf-v5.5.4/export.sh && npm run firmware:build |
| Flash firmware | source ~/esp/esp-idf-v5.5.4/export.sh && npm run firmware:flash |
| Check firmware local-only boundaries | npm run firmware:check-local-only |
Desktop and protocol:
npm run typecheck
npm test
npm run check
Targeted checks:
npm test -w @stackchan-local/protocol
npm test -w @stackchan-local/desktop
npm run typecheck -w @stackchan-local/desktop
Firmware:
source ~/esp/esp-idf-v5.5.4/export.sh
npm run firmware:build
npm run firmware:check-local-only
Hardware acceptance after flashing:
- http://localhost:8788/, /status, /debug/snapshot, and /debug/logs return 200.
- Device shows online in the WebUI.
- Module pages update for PMIC/battery, INA226 power monitor, touch, head touch, IMU with magnetometer, RTC, mic, camera, RGB/IO expander, servos, NFC, IR, and LTR553.
- Present hardware reports valid non-NaN values and reacts to touch, motion, light, sound, and camera stimuli.
- Missing or unsupported modules report available:false with a clear reason.
- /frame.jpg or /stream.mjpg returns a valid JPEG stream when camera streaming is enabled.
- Firmware serial logs and desktop logs have no unexplained ERROR, no persistent WARN spam, no reconnect loop, and no repeated sensor init timeout.
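Two of the acceptance checks above (valid readings, and a reason for every unavailable module) lend themselves to a small automated audit of the snapshot. The sketch below assumes a simplified, hypothetical snapshot shape keyed by module name. The real /debug/snapshot schema is not documented here and may differ.

```typescript
// Hypothetical module entry; the real /debug/snapshot schema may differ.
interface ModuleReport {
  available: boolean;
  reason?: string;
  values?: Record<string, number | null>;
}

type Snapshot = Record<string, ModuleReport>;

// Return a human-readable list of acceptance problems found in a snapshot.
function auditSnapshot(snapshot: Snapshot): string[] {
  const problems: string[] = [];
  for (const [name, report] of Object.entries(snapshot)) {
    if (!report.available) {
      // Unavailable modules are acceptable only when they explain why.
      if (!report.reason || report.reason.trim() === "") {
        problems.push(`${name}: unavailable without a reason`);
      }
      continue;
    }
    for (const [key, value] of Object.entries(report.values ?? {})) {
      // JSON has no NaN literal, so this sketch also treats null as invalid.
      if (value === null || !Number.isFinite(value)) {
        problems.push(`${name}.${key}: invalid reading`);
      }
    }
  }
  return problems;
}
```

An empty result means the snapshot passed both checks. The remaining acceptance items, such as reacting to touch or light, still need a person in front of the hardware.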
- Camera frames stay on the LAN between the hardware and desktop daemon.
- Face tracking is local position detection only, not identity recognition.
- Cloud TTS is optional and disabled by default.
- Pairing tokens and API keys belong in .env, not in Git.
This is an experimental local hardware/software project for macOS plus M5Stack StackChan on ESP32-S3. The current focus is fast desktop-side validation of hardware sensors and modules, stable local control, simple companion applications, and a clean firmware layering model. Cross-platform desktop packaging and production-grade firmware release flow still need hardening.
MIT for StackChan Local project code unless a subdirectory or managed dependency states otherwise.

