Skip to content

aselekoglu/air-controller

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Air Controller

Hand gestures from your webcam → MIDI CC on a virtual port, aimed at macOS Apple Silicon (M1+) and Logic Pro X (or any DAW) with low in-app latency and on-screen telemetry.

Phase 1 includes: OpenCV capture (latest-frame draining), MediaPipe Hands, left/right + up/down on two CCs, “openness” on a third CC with hysteresis, rolling p50/p95 latency stats, and a debug overlay.

Requirements

  • macOS (primary target; virtual MIDI uses CoreMIDI).
  • Python 3.10+ recommended (Apple Silicon homebrew Python is ideal). Python 3.8+ is supported for the package metadata, but use a current Python for best wheel support and latency work.

Quick start

cd air-controller
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip setuptools wheel
pip install -e .
air-controller

Press q in the preview window to quit.

  • Virtual MIDI port name defaults to Air Controller (override in src/air_controller/config/default.yaml or a custom YAML via --config).
  • Camera index defaults to 0 in config. Override at launch with --camera 1, or switch while running: [ / ] cycles through indices 0–9 that actually open; 09 jumps to that index if it opens.
  • List devices: air-controller --list-cameras prints which indices OpenCV can open (Continuity Camera / built-in order varies).

Preview window not visible (macOS)

The OpenCV window title is Air Controller. If the camera LED is on but you see no window:

  • Use Cmd+` (cycle app windows) or Mission Control to find the Python process window (it may open behind Cursor or the terminal).
  • Run from Terminal.app (or iTerm) instead of an embedded IDE terminal if the window still never appears.
  • Install opencv-python (default in this repo), not opencv-python-headless alone: the headless wheel disables HighGUI on many platforms, so imshow will not open a window.
  • On first launch the app primes the window (imshow + waitKey before resize) for macOS Cocoa, opens HighGUI before MediaPipe init, and tries to bring Python to the front via AppleScript. If macOS asks to control System Events, allow it once so the preview can surface.
  • The app always calls waitKey each frame (even when the camera has not delivered a frame yet), so the GUI should stay responsive.

iPhone vs Mac camera

Many setups use index 0 for iPhone (Continuity Camera) and 1 for the Mac built-in camera—confirm with --list-cameras. Set the default in YAML or use --camera / runtime keys as above.

Logic Pro X routing

  1. Launch air-controller first so the virtual port exists, then open Logic.
  2. Create a software instrument track (or MIDI-controlled track).
  3. In the track header / inspector, set MIDI input to Air Controller (or your configured port name).
  4. Map CC 16 / 17 / 18 (defaults) in your instrument or use Smart Controls / Controller Assignments as you prefer.

Analog Lab / Arturia (and similar): each preset maps CC numbers to parameters however the sound designer chose—CC16 is not inherently “time.” Watch the overlay’s orange / green / purple rows and move one hand axis at a time, then use MIDI learn in Analog Lab (or Logic’s MIDI assignment tools) to route each CC to the knob you want. Logic’s MIDI monitor helps confirm the stream matches the bars.

FPS and performance (MediaPipe on CPU)

  • The Python MediaPipe Hands stack used here runs inference on the CPU. There is no Metal / CoreML / GPU delegate exposed for this mediapipe solutions path on macOS in v1.
  • Frame rate is dominated by hand model cost and camera resolution. Lower camera.max_frame_side in YAML (e.g. 320) for higher FPS; raise it if you need more precision at the frame edge.
  • tracking.model_complexity: 0 (default) is fastest; 1 is heavier.
  • The preview shows FPS and cap→infer ms so you can see the effect of changing YAML.

Low latency in Logic (heard latency)

In-app ms (overlay) is only part of the story. For playing through plugins:

  • Reduce the I/O buffer size (e.g. 64 or 128 samples) when your CPU allows it.
  • Enable Low Latency Mode while tracking if you use latency-inducing plugins.
  • Prefer instruments/plugins without large lookahead or disable oversampling while testing.

Re-baseline capture→MIDI p95 from the overlay after changing buffer size; document your typical numbers in ROADMAP.md Notes once stable.

Example baseline (fill in on your machine): on first ship, record something like: capture→MIDI p95 ≈ ___ ms @ 384px long side, M-series, Logic buffer 128 samples.

Configuration

Copy and edit defaults:

cp src/air_controller/config/default.yaml my.yaml
air-controller --config my.yaml --camera 0

Key fields:

Key Meaning
camera.probe_max_index Highest index used by [/] and --list-cameras (default 1 = only 0 and 1; raise if you have more capture devices).
camera.max_frame_side Longest edge before inference (smaller → higher FPS; default 384). Try 320 if needed.
tracking.model_complexity MediaPipe Hands 0 (fast) or 1 (heavier).
midi.port_name Virtual port visible in Logic.
midi.cc_x / cc_y / cc_open CC numbers (0–127).
gestures.open_score_* Hysteresis thresholds on normalized openness (tune if open/closed is finicky).
gestures.smoothing_alpha EMA on axes/openness when smoothing_preset is custom.
gestures.smoothing_preset responsive (faster EMA, ~more jitter) or steady (smoother, ~more lag) or custom (use smoothing_alpha).
mapping.mode absolute (whole frame → MIDI) or relative (hand vs n-key neutral, scaled by relative_span).
mapping.invert_x / invert_y Flip axes before MIDI (defaults match Phase 1 L/R and up/down intent).
mapping.contrast >1 exaggerates motion around center; <1 compresses.
mapping.output_dead_zone Snap to MIDI center when mapped value sits near middle.
midi.min_interval_ms Per-CC send spacing (0 = off).
calibration.* Usually written by profile / o; load with --profile.

Phase 2 — profiles, calibration, mapping

  • --profile path.yaml merges a file on top of --config (or bundled defaults). Use it for per-instrument or per-camera setups (calibration + mapping + gesture hints).
  • In the preview: n captures the current smoothed wrist position as neutral (shown as “neutral captured”). Pair with mapping.mode: relative so MIDI is centered when your hand is in that pose.
  • o writes calibration, mapping, and a gestures snippet to --dump-profile if set, else ./air-controller-profile.yaml. Reload later with --profile that file.
  • --log-csv out.csv appends coarse latency rows about every 15 frames (cap_to_midi p50/p95 and FPS).

Smoothing presets (rule-of-thumb lag): the overlay shows ~EMA lag ≈ ((1-\alpha)/\alpha) × frame period. At 60 FPS, responsive (α≈0.52) is on the order of ~15 ms, steady (α≈0.18) ~75 ms—heuristic only; measure in your DAW if it matters.

Camera exposure (stability)

Auto-exposure can confuse landmark stability. In Photo Booth or System Settings → Camera, avoid extreme backlight; for repeatable tests, prefer consistent lighting and a plain background. (Future: lock exposure via AVFoundation if needed.)

Gestures (Phase 1)

  • Left / right, up / down: smoothed wrist position → CC X and CC Y. CC X is inverted: left side of the image → higher MIDI, right → lower (so mirror/selfie framing often feels like “CW vs CCW” on a knob). CC Y uses top of frame → higher MIDI (same as before).
  • Open vs closed: from mean fingertip–wrist spread with hysteresis; CC open carries a smoothed 0–127 openness value.

When no hand is detected, MIDI is not spammed; smoothing state resets when the hand disappears.

Development

pip install -e .
python -m air_controller.main --help

See ROADMAP.md for phased delivery and agent workflow.

License

Specify your license here (not set in Phase 1).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages