HandWave is a real-time American Sign Language (ASL) recognition web application. It uses MediaPipe for hand landmark extraction, a scikit-learn RandomForest classifier for gesture classification, and a Flask backend to serve predictions to a browser-based frontend. The full pipeline — from data collection to model training to live inference — is self-contained and Dockerized.
.
├── .github
│   └── workflows
│       └── ci.yml
├── Dockerfile
├── LICENSE
├── README.md
├── collecting_data.py
├── data.pickle
├── label_map.pickle
├── model.p
├── model_test.py
├── model_train.py
├── processing_data.py
├── pytest.ini
├── requirements.txt
├── src
│   └── handwave_asl.gif
├── tests
│   ├── conftest.py
│   └── test_model_loading.py
└── webapp
    ├── app.py
    ├── asl_model.py
    ├── static
    │   ├── css
    │   │   └── style.css
    │   └── js
    │       └── app.js
    └── templates
        └── index.html
| Layer | Technology |
|---|---|
| Hand landmark detection | MediaPipe Hands (Python + JS) |
| Feature extraction | OpenCV, MediaPipe (processing_data.py) |
| ML model | scikit-learn RandomForestClassifier |
| Backend | Flask + Gunicorn |
| Frontend | Vanilla JS, HTML5 Canvas, MediaPipe JS SDK |
| Containerization | Docker (python:3.10-slim) |
| CI/CD | GitHub Actions (lint → test → Docker Hub push) |
- Real-time webcam feed with hand landmark overlay (MediaPipe JS)
- Server-side ASL letter prediction via the `/predict` POST endpoint
- Word builder with Manual and Hold input modes
- Sentence history with per-word copy and copy-all support
- Keyboard shortcuts (`A`, `S`, `Backspace`, `Enter`, `Esc`)
- Dockerized: single image, no local setup required
- CI/CD pipeline: lint (flake8) → test (pytest) → push to Docker Hub on `main`
graph LR
subgraph Browser["Browser (Client)"]
A[User webcam] --> B[MediaPipe Hands JS]
B --> C[Hand landmark overlay\non Canvas]
B --> D[Capture JPEG frame]
D --> E[POST /predict\nbase64 image]
E --> P[Word Builder UI]
end
subgraph Server["Flask Server"]
F[app.py /predict] --> G[base64 decode\nPIL Image]
G --> H[asl_model.py\nASLModel.predict]
H --> I[processing_data.py\npreprocess → 42-float vector]
I --> J[RandomForestClassifier\nmodel.predict]
J --> K[idx_to_class lookup]
K --> L[JSON response\nprediction label]
end
E --> F
L --> P
sequenceDiagram
participant U as User
participant JS as Browser JS
participant MP as MediaPipe JS
participant Flask as Flask /predict
participant ASL as ASLModel
participant RF as RandomForest
U->>JS: Start Camera
JS->>MP: Send video frame (onFrame)
MP-->>JS: onResults (landmarks + multiHandLandmarks)
JS->>JS: Draw landmarks on Canvas (mirrored)
JS->>Flask: POST /predict {image: base64 JPEG}
Flask->>Flask: base64 decode → PIL Image
Flask->>ASL: model.predict(pil_image)
ASL->>ASL: preprocess() → 42-float vector
ASL->>RF: model.predict(X)
RF-->>ASL: pred_idx
ASL-->>Flask: idx_to_class[pred_idx]
Flask-->>JS: {"prediction": "A"}
JS->>JS: Push to predictionBuffer (size 5)
JS->>JS: getStablePrediction() → majority vote
JS-->>U: Display stable letter on canvas + UI
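The last two steps of the sequence above smooth out per-frame noise by voting over the most recent predictions. The real logic lives in `webapp/static/js/app.js`; the sketch below only illustrates the idea in Python. The buffer size of 5 comes from the diagram, while the tie-breaking behaviour and the function name `get_stable_prediction` are assumptions.

```python
from collections import Counter, deque

BUFFER_SIZE = 5  # matches the buffer size shown in the diagram


def get_stable_prediction(buffer):
    """Return the most common label in the buffer, or None if it is empty."""
    if not buffer:
        return None
    label, _count = Counter(buffer).most_common(1)[0]
    return label


# A deque with maxlen drops the oldest entry once the buffer is full.
prediction_buffer = deque(maxlen=BUFFER_SIZE)
for label in ["A", "A", "B", "A", "C", "A"]:
    prediction_buffer.append(label)

stable = get_stable_prediction(prediction_buffer)
```

A single misread frame (the `B` or `C` above) does not change the displayed letter as long as the majority of the buffer agrees.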
flowchart TD
A["collecting_data.py<br/>Webcam → JPEG images per label"] --> B["processing_data.py<br/>MediaPipe → 42-float vectors<br/>data.pickle + label_map.pickle"]
B --> C["model_train.py<br/>RandomForestClassifier<br/>n_estimators=100"]
C --> D["model.p + label_map.pickle<br/>model serialized separately from label map"]
D --> E["webapp/asl_model.py<br/>ASLModel wraps classifier"]
E --> F["POST /predict endpoint<br/>real-time inference"]
Feature vector: 21 hand landmarks × 2 (x, y) = 42 floats, normalized by subtracting the minimum x and y of the detected hand so the vector is position-invariant.
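As a rough illustration of that normalization, the sketch below builds a 42-float vector from 21 `(x, y)` pairs. The function name `landmarks_to_features` and the interleaved `x, y` ordering are assumptions for illustration; the actual code in `processing_data.py` may order or scale the values differently.

```python
def landmarks_to_features(landmarks):
    """Flatten 21 (x, y) landmarks into 42 floats, offset by the hand's min x and y."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    min_x = min(x for x, _ in landmarks)
    min_y = min(y for _, y in landmarks)
    features = []
    for x, y in landmarks:
        features.append(x - min_x)  # assumed ordering: x0, y0, x1, y1, ...
        features.append(y - min_y)
    return features


# Synthetic hand, plus the same hand shifted elsewhere in the frame.
hand = [(0.30 + 0.01 * i, 0.50 + 0.02 * i) for i in range(21)]
shifted_hand = [(x + 0.2, y - 0.1) for x, y in hand]

features = landmarks_to_features(hand)
shifted_features = landmarks_to_features(shifted_hand)
```

Both vectors come out the same up to floating-point rounding, which is what position invariance means here: moving the hand around the frame does not change the features.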
Artifact formats:
- `model_train.py` saves `model.p` as `{"model": ..., "label_map": ...}` (both bundled).
- `webapp/asl_model.py`'s `load_model()` loads the classifier from `model.p` and the label map from the separate `label_map.pickle` file. Any label map embedded inside `model.p` is ignored by the web app.
- `model_test.py`'s `load_model()` first tries the label map embedded in `model.p`, then falls back to a standalone `label_map.pickle` if none is found there.
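The fallback behaviour described for `model_test.py` can be sketched roughly as follows. This is not the project's code: `load_with_fallback` is a hypothetical name, and the demo pickles a plain string in place of a fitted classifier so it runs without scikit-learn.

```python
import pickle
import tempfile
from pathlib import Path


def load_with_fallback(model_path, label_map_path):
    """Return (classifier, label_map), preferring a label map bundled in the model file."""
    with open(model_path, "rb") as f:
        bundle = pickle.load(f)
    classifier = bundle["model"]
    label_map = bundle.get("label_map")
    if label_map is None:
        # Nothing embedded: fall back to the standalone label map file.
        with open(label_map_path, "rb") as f:
            label_map = pickle.load(f)
    return classifier, label_map


# Demo with stand-in objects; a real model.p would hold a fitted classifier.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.p"
    label_map_path = Path(tmp) / "label_map.pickle"
    model_path.write_bytes(pickle.dumps({"model": "stand-in classifier"}))
    label_map_path.write_bytes(pickle.dumps({"A": 0, "B": 1}))
    classifier, label_map = load_with_fallback(model_path, label_map_path)
```

The web app's loader would skip the `bundle.get("label_map")` step and always read `label_map.pickle`.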
Label map format: label_map.pickle stores {class_name: index} (e.g. {"A": 0, "B": 1, ...}). ASLModel inverts this to idx_to_class for decoding predictions.
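A minimal sketch of that inversion, using a three-letter subset as illustrative data (the helper name `decode` is an assumption, not part of `ASLModel`):

```python
label_map = {"A": 0, "B": 1, "C": 2}  # illustrative subset of label_map.pickle

# Invert {class_name: index} into {index: class_name} for decoding.
idx_to_class = {index: name for name, index in label_map.items()}


def decode(pred_idx):
    """Map a predicted class index back to its label."""
    return idx_to_class[int(pred_idx)]  # int() also accepts NumPy integer types


decoded = decode(1)
```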
flowchart LR
PR[Push / PR to main] --> L[flake8 lint]
L --> T["pytest<br/>test_model_loading.py"]
T --> B{Branch = main?}
B -- Yes --> D["docker/build-push-action<br/>asap2016asap/handwave-app:latest"]
B -- No --> Skip[Skip Docker push]
Secrets required in GitHub repository settings:
| Secret | Purpose |
|---|---|
| `DOCKER_USERNAME` | Docker Hub username |
| `DOCKER_PASSWORD` | Docker Hub password / access token |
git clone https://github.com/AnupamKumar-1/HandWave.git
cd HandWave
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

python collecting_data.py
# Enter labels when prompted, e.g.: A,B,C,space,del
# Press Q to start recording each label (100 images per label)

Images are saved to `./data/<label>/`.

python processing_data.py
# Outputs: data.pickle, label_map.pickle

python model_train.py
# Outputs: model.p (contains {"model": ...} with the RandomForest classifier)
# label_map.pickle is read as input and must already exist
# Displays confusion matrix and classification report

python model_test.py
# Press Esc to quit

cd webapp
python app.py
# Listening on http://localhost:5000

docker build -t handwave .

Note: `model.p` and `label_map.pickle` must exist at the project root before building the image, as they are copied in via `COPY . .`.

docker run -e PORT=5000 -p 5000:5000 handwave

Open http://localhost:5000.
The container uses gunicorn as the WSGI server (webapp.app:app) and requires the PORT environment variable to be set — there is no default inside the container since gunicorn bypasses the app.py __main__ block entirely.
docker run -e PORT=8080 -p 8080:8080 handwave

`GET /` serves `index.html` (the main UI).

`POST /predict` request body (JSON):

{ "image": "data:image/jpeg;base64,<base64-encoded frame>" }

Response (JSON):

{ "prediction": "A" }

On error:

{ "error": "<error message>" }

The endpoint strips the `data:image/...;base64,` prefix, decodes the payload to a PIL image, runs `ASLModel.predict()` (MediaPipe preprocessing → RandomForest), and returns the predicted ASL letter.
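To make the request format concrete, here is a standard-library sketch of both sides of the exchange: building the JSON body from JPEG bytes, and stripping the data-URL prefix to recover them. The helper names `encode_frame` and `decode_frame` are hypothetical, and the bytes used are a stand-in rather than a real image.

```python
import base64
import json


def encode_frame(jpeg_bytes):
    """Build the JSON request body for /predict from raw JPEG bytes."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({"image": f"data:image/jpeg;base64,{b64}"})


def decode_frame(body):
    """Recover the raw image bytes from a /predict request body."""
    data_url = json.loads(body)["image"]
    # Keep only what follows the first comma; works with or without a prefix.
    b64 = data_url.split(",", 1)[-1]
    return base64.b64decode(b64)


fake_jpeg = b"\xff\xd8\xff\xe0 not a real image \xff\xd9"
body = encode_frame(fake_jpeg)
roundtrip = decode_frame(body)
```

On the server, the recovered bytes would then be opened as an image, for example with Pillow's `Image.open(io.BytesIO(raw))`, before being handed to the model.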
pytest --maxfail=1 --disable-warnings -q

Tests live in `tests/test_model_loading.py` and cover:

- `load_label_map` loads and returns the correct dict from a pickle file
- `load_model` returns an `ASLModel` instance with a `model` attribute
- Model file existence check
conftest.py adds the project root to sys.path so webapp.* imports resolve correctly.
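A `conftest.py` that does this typically needs only a few lines. The sketch below shows one common way to write it; the helper `add_project_root` and the example path are assumptions, and the project's actual file may differ.

```python
import sys
from pathlib import Path


def add_project_root(conftest_file):
    """Put the directory above tests/ (the project root) on sys.path."""
    root = str(Path(conftest_file).resolve().parent.parent)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# In a real tests/conftest.py this would be add_project_root(__file__).
root = add_project_root("/tmp/HandWave/tests/conftest.py")
```

With the root on `sys.path`, `from webapp.asl_model import ...` resolves whether pytest is started from the project root or from inside `tests/`.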
| Key | Action |
|---|---|
| `A` | Add current detected letter to word |
| `S` | Add space |
| `Backspace` | Undo last character |
| `Enter` | Commit word to sentence history |
| `Esc` | Clear current word |
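The shortcuts amount to a small state machine over the current word and the sentence history. The Python sketch below models that behaviour for illustration only. The real handler is in `webapp/static/js/app.js`, and two details here are assumptions: committing clears the current word, and an empty word is not committed.

```python
class WordBuilder:
    """Toy model of the word builder driven by the keyboard shortcuts."""

    def __init__(self):
        self.word = ""
        self.history = []

    def handle_key(self, key, detected_letter=None):
        if key == "A" and detected_letter:
            self.word += detected_letter      # add current detected letter
        elif key == "S":
            self.word += " "                  # add space
        elif key == "Backspace":
            self.word = self.word[:-1]        # undo last character
        elif key == "Enter" and self.word:
            self.history.append(self.word)    # commit word to history
            self.word = ""                    # assumed: word resets after commit
        elif key == "Esc":
            self.word = ""                    # clear current word


builder = WordBuilder()
for key, letter in [("A", "H"), ("A", "I"), ("A", "X"), ("Backspace", None), ("Enter", None)]:
    builder.handle_key(key, letter)
```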
This project is licensed under the MIT License.
Feel free to open issues or submit pull requests!
