A small experimental computer vision workspace for gesture/object detection and data collection. Don't ask why it's called six-seven. (Disclaimer: This README is AI-assisted)
Three primary scripts live in this repo:
- src/capture_training_data.py: A fast image capture utility for labeled gesture datasets with continuous mode, single-shot capture, adjustable FPS, and per-gesture folders.
- src/detect.py: Real-time YOLOv8 + DeepFace emotion analysis on webcam frames, with emotion confidence overlays.
- src/infer.py: Runs a Roboflow workflow (InferencePipeline) for detection + classification, annotates frames with boxes/labels, and can display per-class reference images.
- Lightweight YOLOv8 model (yolov8n.pt) bundled for quick prototyping.
- Keyboard-driven gesture dataset capture (g, c, space, +/-, q).
- Live DeepFace emotion percentage overlays.
- Roboflow workflow integration for composed inference.
- Reference image caching to cut I/O overhead.
Create and activate a virtual environment, then install dependencies:
    pip install ultralytics deepface python-dotenv supervision networkx
If you use Roboflow workflows, set your API key in a .env file at the project root:
    ROBOFLOW_API_KEY="YOUR_KEY"
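As a rough illustration of how a script might pick up that key, here is a minimal sketch. The helper name `get_roboflow_api_key` is hypothetical and is not taken from the repo's scripts; it assumes python-dotenv's `load_dotenv` is used when the package is installed and falls back to the plain process environment otherwise.

```python
import os

try:
    # Provided by the python-dotenv package listed in the install step.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_roboflow_api_key() -> str:
    """Return ROBOFLOW_API_KEY (hypothetical helper, not part of the repo)."""
    if load_dotenv is not None:
        # Looks for a .env file; variables already set in the
        # environment are not overridden by default.
        load_dotenv()
    key = os.environ.get("ROBOFLOW_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ROBOFLOW_API_KEY is not set; add it to .env or export it"
        )
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised deep inside the inference pipeline.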
Capture gesture training data:
    python src/capture_training_data.py
Run YOLO + emotion detection:
    python src/detect.py
Run Roboflow workflow inference:
    python src/infer.py
Exit any live window with q.
Captured gesture images are written to public/training-data/<gesture_name>/ with timestamped filenames. Reference images for display should live in public/reference-images/ and be named by class (e.g., thumbs-up.png).
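The layout described above could be expressed with a couple of small path helpers. This is only a sketch: the function names, the `.jpg` extension for captures, and the exact timestamp format are assumptions for illustration, since the README only says that filenames are timestamped.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

TRAINING_ROOT = Path("public/training-data")
REFERENCE_ROOT = Path("public/reference-images")


def capture_path(gesture: str, when: Optional[datetime] = None,
                 root: Path = TRAINING_ROOT) -> Path:
    """Build <root>/<gesture>/<gesture>_<timestamp>.jpg (assumed naming)."""
    when = when or datetime.now()
    # Microseconds keep names unique during fast continuous capture.
    stamp = when.strftime("%Y%m%d_%H%M%S_%f")
    return root / gesture / f"{gesture}_{stamp}.jpg"


def reference_path(class_name: str, root: Path = REFERENCE_ROOT) -> Path:
    """Map a class name such as 'thumbs-up' to its reference image."""
    return root / f"{class_name}.png"
```

Before writing a capture, the per-gesture folder would still need to exist, for example via `capture_path("wave").parent.mkdir(parents=True, exist_ok=True)`.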
- Reduce frame resolution before inference for higher FPS.
- Use FRAME_SKIP in infer.py (if present) to process every Nth frame.
- Cache reference images (already implemented) to avoid repeated disk reads.
- Throttle expensive emotion analysis (run it only every N frames) if you add it to other scripts.
- Log predictions to CSV / JSON.
- Integrate a model training pipeline for collected gestures.
- Optional GPU acceleration / ONNX export.
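The frame-skipping and reference-caching tips listed earlier can be sketched with the standard library alone. This is an illustrative sketch, not the code in infer.py: `every_nth`, `load_reference_bytes`, and the value of `FRAME_SKIP` are assumptions, and the cache stores raw file bytes rather than decoded images to keep the example dependency-free.

```python
from functools import lru_cache
from pathlib import Path
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
FRAME_SKIP = 3  # assumed value; process one frame out of every three


def every_nth(frames: Iterable[T], n: int = FRAME_SKIP) -> Iterator[Tuple[int, T]]:
    """Yield (index, frame) for frames 0, n, 2n, ... and skip the rest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, frame in enumerate(frames):
        if index % n == 0:
            yield index, frame


@lru_cache(maxsize=32)
def load_reference_bytes(path: str) -> bytes:
    """Read a reference image once; repeat calls are served from memory."""
    return Path(path).read_bytes()
```

The same `every_nth` wrapper could also throttle emotion analysis: run the expensive call only on the yielded frames and reuse the last result for the frames in between.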