Hand gestures from your webcam → MIDI CC on a virtual port, aimed at macOS Apple Silicon (M1+) and Logic Pro X (or any DAW) with low in-app latency and on-screen telemetry.
Phase 1 includes: OpenCV capture (latest-frame draining), MediaPipe Hands, left/right + up/down on two CCs, “openness” on a third CC with hysteresis, rolling p50/p95 latency stats, and a debug overlay.
- macOS (primary target; virtual MIDI uses CoreMIDI).
- Python 3.10+ recommended (Apple Silicon homebrew Python is ideal). Python 3.8+ is supported for the package metadata, but use a current Python for best wheel support and latency work.
cd air-controller
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip setuptools wheel
pip install -e .
air-controllerPress q in the preview window to quit.
- Virtual MIDI port name defaults to
Air Controller(override insrc/air_controller/config/default.yamlor a custom YAML via--config). - Camera index defaults to
0in config. Override at launch with--camera 1, or switch while running:[/]cycles through indices 0–9 that actually open;0–9jumps to that index if it opens. - List devices:
air-controller --list-camerasprints which indices OpenCV can open (Continuity Camera / built-in order varies).
The OpenCV window title is Air Controller. If the camera LED is on but you see no window:
- Use Cmd+` (cycle app windows) or Mission Control to find the Python process window (it may open behind Cursor or the terminal).
- Run from Terminal.app (or iTerm) instead of an embedded IDE terminal if the window still never appears.
- Install
opencv-python(default in this repo), notopencv-python-headlessalone: the headless wheel disables HighGUI on many platforms, soimshowwill not open a window. - On first launch the app primes the window (
imshow+waitKeybefore resize) for macOS Cocoa, opens HighGUI before MediaPipe init, and tries to bring Python to the front via AppleScript. If macOS asks to control System Events, allow it once so the preview can surface. - The app always calls
waitKeyeach frame (even when the camera has not delivered a frame yet), so the GUI should stay responsive.
Many setups use index 0 for iPhone (Continuity Camera) and 1 for the Mac built-in camera—confirm with --list-cameras. Set the default in YAML or use --camera / runtime keys as above.
- Launch
air-controllerfirst so the virtual port exists, then open Logic. - Create a software instrument track (or MIDI-controlled track).
- In the track header / inspector, set MIDI input to
Air Controller(or your configured port name). - Map CC 16 / 17 / 18 (defaults) in your instrument or use Smart Controls / Controller Assignments as you prefer.
Analog Lab / Arturia (and similar): each preset maps CC numbers to parameters however the sound designer chose—CC16 is not inherently “time.” Watch the overlay’s orange / green / purple rows and move one hand axis at a time, then use MIDI learn in Analog Lab (or Logic’s MIDI assignment tools) to route each CC to the knob you want. Logic’s MIDI monitor helps confirm the stream matches the bars.
- The Python MediaPipe Hands stack used here runs inference on the CPU. There is no Metal / CoreML / GPU delegate exposed for this
mediapipesolutions path on macOS in v1. - Frame rate is dominated by hand model cost and camera resolution. Lower
camera.max_frame_sidein YAML (e.g.320) for higher FPS; raise it if you need more precision at the frame edge. tracking.model_complexity:0(default) is fastest;1is heavier.- The preview shows FPS and cap→infer ms so you can see the effect of changing YAML.
In-app ms (overlay) is only part of the story. For playing through plugins:
- Reduce the I/O buffer size (e.g. 64 or 128 samples) when your CPU allows it.
- Enable Low Latency Mode while tracking if you use latency-inducing plugins.
- Prefer instruments/plugins without large lookahead or disable oversampling while testing.
Re-baseline capture→MIDI p95 from the overlay after changing buffer size; document your typical numbers in ROADMAP.md Notes once stable.
Example baseline (fill in on your machine): on first ship, record something like: capture→MIDI p95 ≈ ___ ms @ 384px long side, M-series, Logic buffer 128 samples.
Copy and edit defaults:
cp src/air_controller/config/default.yaml my.yaml
air-controller --config my.yaml --camera 0Key fields:
| Key | Meaning |
|---|---|
camera.probe_max_index |
Highest index used by [/] and --list-cameras (default 1 = only 0 and 1; raise if you have more capture devices). |
camera.max_frame_side |
Longest edge before inference (smaller → higher FPS; default 384). Try 320 if needed. |
tracking.model_complexity |
MediaPipe Hands 0 (fast) or 1 (heavier). |
midi.port_name |
Virtual port visible in Logic. |
midi.cc_x / cc_y / cc_open |
CC numbers (0–127). |
gestures.open_score_* |
Hysteresis thresholds on normalized openness (tune if open/closed is finicky). |
gestures.smoothing_alpha |
EMA on axes/openness when smoothing_preset is custom. |
gestures.smoothing_preset |
responsive (faster EMA, ~more jitter) or steady (smoother, ~more lag) or custom (use smoothing_alpha). |
mapping.mode |
absolute (whole frame → MIDI) or relative (hand vs n-key neutral, scaled by relative_span). |
mapping.invert_x / invert_y |
Flip axes before MIDI (defaults match Phase 1 L/R and up/down intent). |
mapping.contrast |
>1 exaggerates motion around center; <1 compresses. |
mapping.output_dead_zone |
Snap to MIDI center when mapped value sits near middle. |
midi.min_interval_ms |
Per-CC send spacing (0 = off). |
calibration.* |
Usually written by profile / o; load with --profile. |
--profile path.yamlmerges a file on top of--config(or bundled defaults). Use it for per-instrument or per-camera setups (calibration + mapping + gesture hints).- In the preview:
ncaptures the current smoothed wrist position as neutral (shown as “neutral captured”). Pair withmapping.mode: relativeso MIDI is centered when your hand is in that pose. owritescalibration,mapping, and agesturessnippet to--dump-profileif set, else./air-controller-profile.yaml. Reload later with--profilethat file.--log-csv out.csvappends coarse latency rows about every 15 frames (cap_to_midip50/p95 and FPS).
Smoothing presets (rule-of-thumb lag): the overlay shows ~EMA lag ≈ ((1-\alpha)/\alpha) × frame period. At 60 FPS, responsive (α≈0.52) is on the order of ~15 ms, steady (α≈0.18) ~75 ms—heuristic only; measure in your DAW if it matters.
Auto-exposure can confuse landmark stability. In Photo Booth or System Settings → Camera, avoid extreme backlight; for repeatable tests, prefer consistent lighting and a plain background. (Future: lock exposure via AVFoundation if needed.)
- Left / right, up / down: smoothed wrist position → CC X and CC Y. CC X is inverted: left side of the image → higher MIDI, right → lower (so mirror/selfie framing often feels like “CW vs CCW” on a knob). CC Y uses top of frame → higher MIDI (same as before).
- Open vs closed: from mean fingertip–wrist spread with hysteresis; CC open carries a smoothed 0–127 openness value.
When no hand is detected, MIDI is not spammed; smoothing state resets when the hand disappears.
pip install -e .
python -m air_controller.main --helpSee ROADMAP.md for phased delivery and agent workflow.
Specify your license here (not set in Phase 1).