An AI-powered, hands-free study assistant built for students with physical disabilities. LearnMate runs as a Chrome extension overlay on any webpage — listening to your voice, watching your eye gestures, and responding with spoken explanations, concept diagrams, and lip-synced avatar teaching videos. No keyboard required.
Status: In Production · actively being built
Students with motor impairments (limited hand mobility, cerebral palsy, muscular dystrophy) are underserved by existing edtech tools. Screen readers address content consumption, not comprehension. LearnMate addresses that gap: a student who can't type should still be able to ask a question, get an explanation, and watch it taught to them, entirely hands-free, on any page they're already reading.
- Voice control — say "Hey LearnMate, explain binary search trees" and the panel responds
- Eye gesture control — wink left eye (2s) to trigger an explanation; wink right eye (2s) to generate a video
- AI explanations — Groq LLM (llama-3.1-8b-instant) generates clear, beginner-friendly explanations
- Diagram generation — Mermaid concept maps + Wikipedia images combined into a visual slideshow
- Lip-synced avatar video — Wav2Lip animates a face speaking the explanation, streamed directly into the browser panel
- Works on any webpage — Chrome extension injects the overlay without disrupting the page
Chrome Extension (overlay.js)
↓ HTTPS fetch
Flask Backend (app.py)
├── Groq LLM → explanation / summary text
├── gTTS → text-to-speech MP3
├── Mermaid (mmdc) → concept diagram PNG
├── Wikipedia API → topic images
├── FFmpeg → composite slideshow + avatar
└── Wav2Lip → lip-sync MP3 to avatar face → MP4
↓
Video streamed back → played inline in extension panel
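As an illustration of the diagram step in the pipeline above, here is a minimal sketch of building a Mermaid concept map and handing it to the Mermaid CLI. The function names, file names, and the flat "topic with subtopics" layout are assumptions made for this example, not the code in app.py; only the `mmdc -i <input> -o <output>` invocation is the standard Mermaid CLI usage.

```python
import subprocess
from pathlib import Path


def mermaid_source(topic: str, subtopics: list[str]) -> str:
    """Build a small Mermaid flowchart linking a topic to its subtopics."""

    def label(text: str) -> str:
        # Double quotes would end the Mermaid label early, so swap them out.
        return text.replace('"', "'")

    lines = ["graph TD", f'    root["{label(topic)}"]']
    for i, sub in enumerate(subtopics):
        lines.append(f'    root --> n{i}["{label(sub)}"]')
    return "\n".join(lines) + "\n"


def mmdc_command(src: Path, out: Path) -> list[str]:
    """Argument list for the Mermaid CLI: -i is the input, -o the output."""
    return ["mmdc", "-i", str(src), "-o", str(out)]


def render_diagram(topic: str, subtopics: list[str], workdir: Path) -> Path:
    """Write the .mmd source and render it to PNG (requires mmdc on PATH)."""
    src = workdir / "diagram.mmd"  # hypothetical file names
    out = workdir / "diagram.png"
    src.write_text(mermaid_source(topic, subtopics), encoding="utf-8")
    subprocess.run(mmdc_command(src, out), check=True)
    return out
```

Passing the command as an argument list rather than a shell string avoids quoting problems when a topic contains spaces or punctuation.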
Voice (wake_listener.py) → /command → overlay_state
Gesture (eye_blink_cursor.py) → /command → overlay_state
↓ polled every 1.5s by extension
Panel reacts: explain / video / minimize / navigate
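The command flow above (listeners post to /command, the extension polls the shared state) can be sketched without any web framework. This is a hedged illustration only: the `OverlayState` class, its field names, and the sequence-number scheme are assumptions for the example, and the README does not show how overlay_state is actually stored in app.py.

```python
import threading
from typing import Optional


class OverlayState:
    """Thread-safe holder for the latest command, read by a polling client."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seq = 0
        self._command: Optional[dict] = None

    def push(self, action: str, payload: str = "") -> int:
        """Record a new command (what a /command handler might do)."""
        with self._lock:
            self._seq += 1
            self._command = {"seq": self._seq, "action": action, "payload": payload}
            return self._seq

    def poll(self, last_seen: int) -> Optional[dict]:
        """Return the latest command if it is newer than last_seen, else None."""
        with self._lock:
            if self._command is not None and self._command["seq"] > last_seen:
                return dict(self._command)
            return None


state = OverlayState()
seq = state.push("explain", "binary search trees")  # voice or gesture side
command = state.poll(last_seen=0)                   # extension side, every 1.5s
```

A monotonically increasing sequence number lets the poller ignore a command it has already acted on, so a panel polling every 1.5s does not re-run the same explanation.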
- Python 3.10+
- Node.js (for mmdc, the Mermaid CLI)
- FFmpeg on PATH
- A Groq API key
- Chrome browser
git clone https://github.com/s3ak6i-dev/LearnMate.git
cd LearnMate
pip install -r requirements.txt
npm install -g @mermaid-js/mermaid-cli
Install Wav2Lip dependencies:
pip install mediapipe face-alignment
Replace the Groq API key in app.py:
client = Groq(api_key="YOUR_GROQ_API_KEY", ...)
Add your avatar image as avatar.jpg in the project root.
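As an alternative to editing the key into app.py, the key could be read from the environment so it never lands in version control. This is a sketch of one possible approach, not how the project is currently set up; the `groq_api_key` helper is hypothetical, and `GROQ_API_KEY` is assumed here as the variable name.

```python
import os
from typing import Mapping


def groq_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Groq API key from the environment, or fail with a clear message."""
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the README placeholder as "not configured".
    if not key or key == "YOUR_GROQ_API_KEY":
        raise RuntimeError("Set the GROQ_API_KEY environment variable first.")
    return key


# Assumed usage inside app.py:
# client = Groq(api_key=groq_api_key())
```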
Start everything at once:
python launch.py
Options:
python launch.py --no-gesture # voice only (no webcam)
python launch.py --no-voice # gesture only (no microphone)
First-time HTTPS trust (required once):
- Open Chrome → go to https://127.0.0.1:5000
- Click Advanced → Proceed to 127.0.0.1 (unsafe)
- Done. The extension will now work on all HTTPS sites
- Open chrome://extensions
- Enable Developer mode
- Click Load unpacked → select the learnmate_extension/ folder
- The LearnMate panel will appear on every webpage
| Say | Action |
|---|---|
| "Hey LearnMate" | Open / expand the panel |
| "LearnMate leave" | Minimize the panel |
| "Explain binary search trees" | Get an AI explanation |
| "Summarize this" | Get a short bullet-point summary |
| "Generate a video for this" | Start Wav2Lip video generation |
| "Scroll down" | Scroll the page |
| "Go back" / "Go forward" | Browser navigation |
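The mapping in the table above can be sketched as a small parser that turns a speech transcript into an action and an argument. The action names and the `parse_command` function are assumptions for illustration; the actual matching logic in voice_input.py and wake_listener.py is not shown in this README.

```python
import re

WAKE = "hey learnmate"


def parse_command(transcript: str) -> tuple[str, str]:
    """Map a recognised phrase to (action, argument), following the table above."""
    # Lowercase, drop punctuation, and collapse whitespace.
    text = re.sub(r"[^a-z0-9' ]+", " ", transcript.lower())
    text = " ".join(text.split())

    if text.startswith(WAKE):
        rest = text[len(WAKE):].strip()
        if not rest:
            return ("open", "")
        text = rest  # e.g. "hey learnmate explain ..." falls through

    if text == "learnmate leave":
        return ("minimize", "")
    if text.startswith("explain "):
        return ("explain", text[len("explain "):])
    if text.startswith("summarize"):
        return ("summarize", "")
    if text.startswith("generate a video"):
        return ("video", "")
    if text == "scroll down":
        return ("scroll", "down")
    if text in ("go back", "go forward"):
        return ("navigate", text.split()[1])
    return ("unknown", text)


print(parse_command("Hey LearnMate, explain binary search trees"))
```

Normalising the transcript first keeps the matching tolerant of the punctuation and capitalisation that speech recognisers tend to add.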
| Gesture | Action |
|---|---|
| Left eye wink, hold 2s | Explain current topic |
| Right eye wink, hold 2s | Generate teaching video |
| Mouth open | Scroll (wider = faster) |
| Both eyes blink | Left click |
| Right eye blink | Right click |
| Left eye hold | Click and drag |
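The two "hold 2s" gestures in the table above amount to a timing check: exactly one eye stays closed for at least two seconds. Below is a minimal sketch of that check, assuming per-frame booleans for each eye are already available (for example from thresholded face-landmark distances, which is not shown). The `WinkHoldDetector` class and its return values are hypothetical and are not taken from eye_blink_cursor.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WinkHoldDetector:
    hold_seconds: float = 2.0
    _side: Optional[str] = None   # which eye is currently winking, if any
    _since: float = 0.0           # timestamp when the current wink started
    _fired: bool = False          # ensures one action per continuous wink

    def update(self, t: float, left_closed: bool, right_closed: bool) -> Optional[str]:
        """Feed one frame; return "explain" or "video" when a hold completes."""
        # A wink means exactly one eye is closed; both closed is a blink.
        if left_closed and not right_closed:
            side = "left"
        elif right_closed and not left_closed:
            side = "right"
        else:
            side = None

        if side != self._side:
            # State changed: restart the timer for the new state.
            self._side, self._since, self._fired = side, t, False
            return None

        if side is not None and not self._fired and t - self._since >= self.hold_seconds:
            self._fired = True
            return "explain" if side == "left" else "video"
        return None


detector = WinkHoldDetector()
for frame in range(31):  # 3 seconds of frames at 10 fps, left eye closed
    action = detector.update(frame / 10, left_closed=True, right_closed=False)
    if action:
        print(frame / 10, action)
```

Firing once per continuous wink, rather than on every frame past the threshold, prevents a long hold from triggering repeated explanations.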
LearnMate/
├── app.py # Flask backend — all AI pipelines
├── launch.py # Single launcher for all components
├── voice_input.py # Always-on voice command loop
├── wake_listener.py # Wake-word ("Hey LearnMate") listener
├── eye_blink_cursor.py # Gesture + iris cursor control
├── avatar.jpg # Avatar face used by Wav2Lip
├── requirements.txt
├── learnmate_extension/
│ ├── manifest.json
│ ├── overlay.js # Extension UI logic + Flask polling
│ └── overlay.css # Extension panel styles
└── Wav2Lip/ # Wav2Lip model + inference
| Layer | Technology |
|---|---|
| Browser extension | Chrome MV3, Vanilla JS |
| Backend | Python, Flask |
| LLM | Groq API (llama-3.1-8b-instant) |
| Text-to-speech | gTTS |
| Lip-sync | Wav2Lip |
| Diagram generation | Mermaid CLI (mmdc) |
| Image enrichment | Wikipedia API |
| Video compositing | FFmpeg, imageio |
| Gesture tracking | MediaPipe Face Mesh, OpenCV |
| Voice recognition | SpeechRecognition (Google Web Speech API) |
- Calibration UI — first-time setup to tune gesture thresholds per user
- Conversational follow-up — "explain it simpler", "give me an example" in context
- Institutional pilots — disability support centres at universities
Built over 12 weeks by a team of engineers focused on accessible edtech.
MIT