Pitch your idea, see the picture.
Web app that turns a spoken brainstorm into a visual diagram. Record yourself narrating an idea → audio is transcribed → Claude picks a diagram type and writes Mermaid → diagram is rendered on a shareable page.
When one person describes an idea verbally to a team, others often have trouble visualizing it. Pitch Picture gives the speaker — and their team — a visual artifact instead of a 15-minute audio file.
| Layer | Tech |
|---|---|
| Frontend | React 19 · TypeScript · Vite · React Router (HashRouter) |
| Backend | Node · Express · TypeScript |
| Auth + DB + Storage | Supabase (Postgres, private audio-recordings bucket) |
| Speech-to-text | Deepgram Nova-3 |
| LLM | Anthropic claude-sonnet-4-5 |
| Diagram render | mermaid (client-side) |
| Package manager | pnpm (each package has its own lockfile) |
PitchPicture/
├── react/ # Vite frontend
│ └── src/
│ ├── pages/ # Home, Recording, Processing, Result, Share, History
│ ├── components/ # Recorder, DiagramView
│ └── lib/ # auth, api, supabase, types
├── api/ # Express backend
│ └── src/
│ ├── routes/ # sessions.ts, share.ts
│ ├── services/ # claude.ts, deepgram.ts, mermaid.ts, supabase.ts
│ ├── middleware/ # auth.ts (Supabase JWT)
│ ├── lib/types.ts # API contract (source of truth)
│ ├── pipeline.ts # transcribe → analyze orchestration
│ ├── app.ts # Express setup + CORS
│ └── index.ts # port listen
├── supabase/migrations/ # schema, RLS, storage bucket
└── scripts/ # test-analysis.ts, test-pipeline.ts
You'll need accounts for Supabase, Anthropic, and Deepgram. All three have free tiers that cover development.
Each of root, react/, and api/ has its own package.json:
pnpm install # root devDeps (used by scripts/)
pnpm -C react install # frontend deps
pnpm -C api install # backend deps- Create a new project at supabase.com
- Open SQL Editor → paste
supabase/migrations/20260512_init.sql→ Run - Verify: Table Editor shows
sessionstable, Storage showsaudio-recordingsbucket - Authentication → Users → Add user, create a test user with Auto Confirm User checked
Copy the example env files and fill them in:
cp api/.env.example api/.env
cp react/.env.example react/.envapi/.env (backend secrets — never commit):
ANTHROPIC_API_KEY— from console.anthropic.comDEEPGRAM_API_KEY— from console.deepgram.com ($200 free credits)SUPABASE_URL— Supabase Project Settings → APISUPABASE_SERVICE_ROLE_KEY— same page (⚠️ bypasses RLS, backend only)
react/.env (browser-safe):
VITE_API_URL=http://localhost:3001VITE_SUPABASE_URL— same as backendVITE_SUPABASE_ANON_KEY— Supabase Project Settings → API (⚠️ NOT the service-role key)
In two terminals:
pnpm dev:web # terminal 1 — Vite on localhost:5173
pnpm dev:api # terminal 2 — Express on localhost:3001Open the browser, sign in with your test user, hit "Start recording".
pnpm dev:web # frontend only (Vite on 5173)
pnpm dev:api # backend only (Express on 3001)
pnpm build # build both
pnpm test:analysis scripts/transcripts/<file>.txt # transcript → Claude
pnpm test:pipeline <path-to-audio>.{webm,mp3,m4a,wav,ogg,flac} # audio → ClaudeThe test scripts load api/.env automatically via tsx --env-file.
mic → MediaRecorder (webm/opus @ 64kbps)
→ POST /api/sessions/:id/audio
→ Supabase Storage (private, scoped per user)
→ Deepgram Nova-3 → transcript
→ Claude Sonnet 4.5 → Mermaid + summary + key concepts
→ mermaid.render() in browser → SVG diagram
The backend returns immediately after upload; processSession(id) runs in the background. The frontend polls GET /api/sessions/:id every 2s until status flips to ready.
While recording, the browser's SpeechRecognition API shows live captions for the speaker's benefit. These are display-only — the canonical transcript still comes from Deepgram after upload, so the two can differ. Captions work in Chromium browsers and Safari; Firefox simply hides the caption box.
You can pause and resume mid-recording to gather your thoughts — paused time doesn't count toward the 30-minute cap. After stopping, you get to listen back and either use the take or discard it and start over; nothing is uploaded (and no session is created) until you confirm.
From a finished diagram you can refine it: record a short follow-up ("add a step between X and Y") and Claude updates the existing diagram instead of starting over. A failed refine is non-destructive — the original diagram stays intact.
| Method | Path | Auth | Purpose |
|---|---|---|---|
POST |
/api/sessions |
✅ | Create session, returns id + uploading |
POST |
/api/sessions/:id/audio |
✅ | Upload audio (multipart, field audio), kicks off pipeline |
POST |
/api/sessions/:id/retry |
✅ | Re-run pipeline against existing audio (status failed) |
POST |
/api/sessions/:id/refine |
✅ | Refine the diagram with a follow-up recording (status ready) |
GET |
/api/sessions/:id |
✅ | Full session — frontend polls every 2s |
GET |
/api/sessions |
✅ | List current user's sessions |
DELETE |
/api/sessions/:id |
✅ | Delete row + audio file |
GET |
/api/share/:token |
❌ | Public read-only view |
All authenticated routes need Authorization: Bearer <supabase_jwt>.
- Single speaker, no diarization
- Post-session processing (no streaming)
- Web only
- Email/password auth (Google OAuth coming)
- Recordings capped at 30 min
Out of scope for v1: multi-speaker, real-time, diagram editing, team workspaces, integrations, mobile, PNG export.
Private — not yet licensed.