
imcuttle/flipbook-app


🎨 Flipbook Canvas

English · 中文 (Chinese)


Browse fully interactive exported flipbooks right in your browser: click hotspots to drill in, no install needed.

✨ Click anywhere on a generated image. The backend infers what you clicked, searches the web when useful, generates a child diagram, and links it back. A flipbook of explorable knowledge β€” one click at a time.

💡 Inspired by, and a re-implementation of, the product idea behind flipbook.page. Credit to the original team for the click-to-explore canvas concept.

A long-running web product: Express + SSE backend, Vite + React + TS frontend, a pluggable multi-model image pipeline, web-search augmented planning, per-node concurrency, read-only share links, fullscreen casting and a fully responsive mobile layout.


✨ Why this is fun

Most "AIη”»ε›Ύ" demos stop at one image. This one turns each image into a playable knowledge surface:

  • 🖱️ Long-press anywhere on a picture → the model reads what's under your finger, decides whether the topic needs fresh sources, optionally searches the web, then paints a brand-new annotated diagram zoomed into that concept.
  • 📚 Encyclopedia-style output: every node ships with a 150–220-character caption and 20–40 in-image labels (place names, dates, numbers…), all OCR'd back into a transparent text layer so you can drag-select and copy any fragment straight off the picture.
  • 🌳 Infinite tree of canvases: every click spawns a child node; the whole exploration tree is persisted, shareable, and replayable.
  • ⏳ Watch it think: a node is saved and linkable the instant you click, then its title, caption, and scene prompt type out live; share the link and a friend on another device watches the same stream fill in.

📸 Screenshots

Screenshot: click-to-explore (long-press any region to drill in)
Screenshot: woodpecker walkthrough, the end-to-end pipeline (search → planner → ImageGen → drill-down)
Screenshot: gallery + canvas (every canvas is persisted, shareable, replayable)

🚀 Highlights

  • 🖱️ Click-to-explore: long-press (1 s) anywhere on a node's image. The backend infers the label, decides whether to web-search, then generates a child node. Spatial + semantic dedup means that clicking the same region again jumps straight into the existing child.
  • ⏳ Live-streaming, linkable generating nodes: the moment you click, the child node is persisted under its final id and its parent hotspot links to it immediately, so it is shareable and openable on any device while still generating. Its title, caption, and image prompt type out live (token-streamed via SSE), the catalog shows a spinner row, and a refresh or cross-device open resumes the stream from the on-disk snapshot. On failure the half-built node is auto-deleted.
  • 🌫️ Progressive image loading: every PNG gets blur → thumbnail → medium → full variants (via sharp). Gallery cards blur up and the canvas swaps to full resolution when ready, so there are no broken-image flashes and first paint is fast.
  • 🖼️ Portrait & landscape canvases: pick the orientation per canvas (mobile portrait viewports default to portrait); filter the gallery by All / Landscape / Portrait, with the choice synced to the URL.
  • ⚡ Per-node parallelism: up to 4 different spots per parent can expand in parallel (configurable). Each in-flight click streams a phase chip (Inferring label… → Searching the web… → Generating image…) on its hotspot. Hit the cap and the cursor turns into ⌛.
  • 📖 Encyclopedia register: the planner produces 150–220-character captions with 20–40 in-image text fragments, like reading a richly annotated diagram in a children's encyclopedia. Long captions clamp to 2 lines with a 查看更多 / Show more toggle.
  • 🌐 Web-search augmented: a "decide-then-search" gate asks the LLM whether a topic benefits from up-to-date sources. If so, results are fetched and fed into the planner; sources are persisted to disk + DB and rendered as a 📚 hover badge over the canvas.
  • 🔁 Resilient SSE: Last-Event-ID replay + per-job snapshot resume. A dropped connection or a page refresh mid-generation reconnects and catches up on everything it missed, including the in-flight typewriter text.
  • 🎬 Scene transitions: drill-in / drill-out / fade animations make navigation feel like a zooming flipbook rather than a page swap.
  • 🔗 Share as preview: any canvas → read-only ?s=<token> URL. Viewers can navigate and watch live SSE updates from in-flight generations, but cannot trigger new ones.
  • 📺 Fullscreen casting: ⛶ requests browser fullscreen; toggle the chrome (breadcrumb + caption + hint) on or off for a clean projection view.
  • 🔀 Selectable in-image text: every label baked into the diagram is OCR'd with Apple Vision (zh-Hans + en-US) and overlaid as invisible HTML, so users can drag-select and Cmd-C copy any text directly off the picture while the painted pixels remain the visual ground truth.
  • 🔊 Voice narration: each node's title + caption is synthesised to speech with Microsoft Edge neural voices (msedge-tts: free, no API key). Pick a character voice per flipbook from the live Edge catalogue (filtered to the UI language); the picker shows friendly labels such as "晓晓 · 女声" (Xiaoxiao · female voice) instead of raw locale IDs. Switching voices re-narrates the whole book and restarts in-flight playback. Auto-narration is on by default (toggleable) and is bundled into exports, so the static site speaks offline too.
  • 📱 Mobile responsive: a sticky top bar that pins on scroll, a single-column gallery, a pinch-zoom image lightbox, and smaller hotspots and pending bubbles.
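
The Resilient SSE highlight depends on Last-Event-ID replay. Below is a minimal sketch of that idea, assuming a per-canvas in-memory ring buffer; the `EventLog` class, `formatSse` helper, and the capacity of 500 are hypothetical and not taken from server/src/sse/. Every event gets a monotonically increasing id, and a reconnecting client that sends `Last-Event-ID` receives only the events it missed.

```javascript
// Hypothetical per-canvas event log supporting SSE Last-Event-ID replay.
class EventLog {
  constructor(capacity = 500) {
    this.capacity = capacity; // keep at most this many recent events in memory
    this.events = []; // [{ id, type, data }], oldest first
    this.nextId = 1; // monotonically increasing event id
  }

  // Record an event, assign it the next id, and evict the oldest if over capacity.
  push(type, data) {
    const event = { id: this.nextId++, type, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events newer than the client's Last-Event-ID header (a string, or undefined
  // on a first connection). A malformed header replays everything still buffered.
  since(lastEventId) {
    const last = Number.parseInt(lastEventId ?? '0', 10);
    if (Number.isNaN(last)) return [...this.events];
    return this.events.filter((e) => e.id > last);
  }
}

// Serialise one event in the text/event-stream wire format.
// JSON.stringify escapes newlines, so the payload always fits on one data: line.
function formatSse({ id, type, data }) {
  return `id: ${id}\nevent: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

On reconnect, the browser's EventSource resends the last id it received in the `Last-Event-ID` header, so a handler can write `formatSse(e)` for each event in `log.since(req.headers['last-event-id'])` before streaming new ones. Events already evicted from the buffer cannot be replayed this way, which is the gap a snapshot-based resume covers.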

🤖 Multimodal × Mainstream LLMs

Flipbook Canvas is built around a pluggable multimodal pipeline. Five capabilities are wired end-to-end:

| Modality | What it does | Pluggable into |
|---|---|---|
| 📝 Text / JSON | LLM planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| 🖼️ Image generation | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| 🌐 Web search | rephrased query → top-N normalized results → planner context + 📚 sources panel | any search backend |
| 👁️ OCR (Apple Vision) | zh-Hans + en-US recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
| 🔊 TTS (Edge neural voices) | synthesises each node's title + caption to an mp3, with a per-flipbook character voice | Microsoft Edge online voices via msedge-tts, no API key |

The image layer is a provider chain (IMAGE_PROVIDER=...,svg): the first enabled provider wins, and svg is always appended last as a placeholder so the UI never breaks. Adding a new model takes a single file:

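```javascript
// server/src/generation/providers/<name>.js
export default {
  // Name used in the IMAGE_PROVIDER chain, e.g. IMAGE_PROVIDER=my-model,svg
  name: 'my-model',

  // Only providers whose enabled() returns true are eligible in the chain.
  enabled(config) {
    return Boolean(config.MY_API_KEY);
  },

  // Call your model, write <hash>.png into outputDir, and push phase events via onEvent.
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // TODO: implement the model call here
  },
};
```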

Out of the box:

| Provider | Trigger to enable | Status |
|---|---|---|
| openai | OPENAI_API_KEY set | 🔌 stub; implement in providers/openai.js |
| nanobanana | NANOBANANA_API_KEY or GEMINI_API_KEY | 🔌 stub |
| seeddance | SEEDDANCE_API_KEY or ARK_API_KEY | 🔌 stub |
| codebuddy | ENABLE_CODEBUDDY=1 | ✅ reference impl (used in the demo GIF) |
| svg | always | ✅ fallback placeholder |
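
The chain rule (first enabled provider wins, svg always last) can be sketched as follows. The provider objects mirror the `name` / `enabled(config)` shape of the skeleton above, with `generate()` omitted; `parseProviderChain`, `pickProvider`, and this `registry` object are hypothetical names for illustration, not the real image.js API.

```javascript
// Parse IMAGE_PROVIDER (e.g. "openai,nanobanana") into an ordered list that always ends with svg.
function parseProviderChain(value = 'codebuddy') {
  const names = value
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean)
    .filter((name) => name !== 'svg'); // svg is re-appended at the end below
  // Set keeps the first occurrence of each name, in order.
  return [...new Set(names), 'svg'];
}

// Return the first provider, in chain order, whose enabled(config) check passes.
function pickProvider(chain, registry, config) {
  for (const name of chain) {
    const provider = registry[name];
    if (provider && provider.enabled(config)) return provider;
  }
  throw new Error('no enabled image provider (svg should always be enabled)');
}

// Example registry using the enable triggers from the table above.
const registry = {
  openai: { name: 'openai', enabled: (c) => Boolean(c.OPENAI_API_KEY) },
  nanobanana: { name: 'nanobanana', enabled: (c) => Boolean(c.NANOBANANA_API_KEY || c.GEMINI_API_KEY) },
  seeddance: { name: 'seeddance', enabled: (c) => Boolean(c.SEEDDANCE_API_KEY || c.ARK_API_KEY) },
  codebuddy: { name: 'codebuddy', enabled: (c) => c.ENABLE_CODEBUDDY === '1' },
  svg: { name: 'svg', enabled: () => true },
};
```

Because svg is forced onto the end of every chain and its `enabled()` always returns true, `pickProvider` can only throw if the registry itself is misconfigured.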

🎯 The reference implementation wires the codebuddy CLI as a subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle (concurrency cap, per-call timeouts, single retry, file-size sanity check on generated PNGs, graceful degradation) lives in server/src/codebuddyClient.js and is a useful template if you ever shell out to any CLI-based model.


🐦 Walkthrough β€” generating a woodpecker flipbook from zero

Type 啄木鸟 (woodpecker) into the top bar and watch the entire pipeline run: decide-then-search → planner → ImageGen, then click to drill into the tongue anatomy, the nest cavity, or the ant-foraging zones, each spawning its own annotated diagram with its own sources.


🗂️ Layout

```text
.
├── prompts/                        # system / planner / click-label / image-prompt / decide-search
├── scripts/
│   ├── sync-prompts.mjs
│   ├── serve-preview.mjs           # build + serve one canvas's static preview
│   └── example-doc-publish.mjs     # publish canvases to GitHub Pages
├── server/
│   └── src/
│       ├── routes/                 # canvas, click, events (SSE), assets, share
│       ├── export/                 # static-site exporter + viewer template
│       │   ├── buildExport.js      # buildCanvasSite / buildCanvasExport (zip)
│       │   └── template/           # self-contained index.html + viewer.js/css
│       ├── lib/zip.js              # dependency-free ZIP writer
│       ├── generation/
│       │   ├── pipeline.js         # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js     # decide-then-search gate
│       │   ├── webSearch.js        # WebSearch subprocess + result normaliser
│       │   ├── queue.js            # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js            # provider-chain orchestrator
│       │   └── providers/          # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                     # Sequelize models + hydrateFromDisk
│       ├── store/                  # filesystem layer
│       ├── sse/                    # event hub
│       └── codebuddyClient.js      # reference CLI-subprocess wrapper
└── web/                            # Vite + React + TS
```

💾 Storage

  • 📁 Filesystem (source of truth for big artifacts): server/data/canvases/<id>/{data/tree.json, data/nodes/<hash>.json, images/<hash>.{png,svg}, manifest.json}.
  • 🗃️ SQLite (server/data/flipbook.sqlite, via Sequelize): a metadata index with Canvases / Nodes / Hotspots / ShareLinks / Sources tables. It drives the gallery, spatial dedup, share lookup, and the sources hover panel. On boot the server runs hydrateFromDisk() to rebuild this index if it's missing.
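
A rough sketch of what rebuilding the index from disk involves, relying only on the directory layout listed above. It returns plain row objects instead of writing Sequelize models, treats manifest.json as opaque (its schema is not documented here), and `scanCanvases` is a hypothetical name; the real hydrateFromDisk() may read different fields.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Walk <dataDir>/canvases/<id>/ and return plain index rows.
function scanCanvases(dataDir) {
  const canvasesDir = path.join(dataDir, 'canvases');
  if (!fs.existsSync(canvasesDir)) return { canvases: [], nodes: [] };

  const canvases = [];
  const nodes = [];
  for (const entry of fs.readdirSync(canvasesDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const id = entry.name; // the directory name is the canvas id
    const canvasDir = path.join(canvasesDir, id);

    // manifest.json is kept as-is; a missing manifest yields an empty object.
    const manifestPath = path.join(canvasDir, 'manifest.json');
    const manifest = fs.existsSync(manifestPath)
      ? JSON.parse(fs.readFileSync(manifestPath, 'utf8'))
      : {};
    canvases.push({ id, manifest });

    // Each data/nodes/<hash>.json file becomes one node row keyed by its hash.
    const nodesDir = path.join(canvasDir, 'data', 'nodes');
    if (!fs.existsSync(nodesDir)) continue;
    for (const file of fs.readdirSync(nodesDir)) {
      if (!file.endsWith('.json')) continue;
      const hash = path.basename(file, '.json');
      const node = JSON.parse(fs.readFileSync(path.join(nodesDir, file), 'utf8'));
      nodes.push({ canvasId: id, hash, node });
    }
  }
  return { canvases, nodes };
}
```

Because the filesystem stays the source of truth, an index built this way can be thrown away and regenerated at any time.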

🛠️ Develop

```bash
npm install
npm run dev           # server on :8787 + Vite on :5173 in parallel
```

Open http://127.0.0.1:5173.

By default ENABLE_CODEBUDDY=0 (stub mode: fast, SVG placeholders, no LLM). Set ENABLE_CODEBUDDY=1 to use the reference CLI provider for planner + ImageGen + WebSearch:

```bash
ENABLE_CODEBUDDY=1 npm run dev:server
```

⏱️ With the reference provider, each node takes ~70–95 s end-to-end (planner ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs). ImageGen produces 2752Γ—1536 PNG (~6 MB).

Per-node parallelism

Up to 4 click expansions per parent node run in parallel; excess clicks queue. Different parents and different canvases run independently. A per-parent write lock serializes only the short read-modify-write of the parent node JSON. Tunable via MAX_PARALLEL_CLICKS_PER_NODE (default 4).
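
The layout lists PerKeySemaphore in queue.js. Here is a hedged sketch of how a per-key semaphore can implement "up to N per parent, excess clicks queue", keyed by parent node id. It is an illustration, not the project's code, and the `run(key, task)` API is hypothetical. The per-parent write lock described above is the same primitive with a limit of 1.

```javascript
// Counting semaphore: at most `limit` holders; extra acquirers wait in FIFO order.
class Semaphore {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.waiters = [];
  }
  acquire() {
    if (this.active < this.limit) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot straight to the next waiter
    else this.active--;
  }
}

// One semaphore per key (e.g. parent node id); different keys never block each other.
class PerKeySemaphore {
  constructor(limit) {
    this.limit = limit;
    this.byKey = new Map();
  }
  async run(key, task) {
    let sem = this.byKey.get(key);
    if (!sem) {
      sem = new Semaphore(this.limit);
      this.byKey.set(key, sem);
    }
    await sem.acquire();
    try {
      return await task();
    } finally {
      sem.release();
      // Drop idle keys so the map does not grow with every parent ever clicked.
      if (sem.active === 0 && sem.waiters.length === 0) this.byKey.delete(key);
    }
  }
}
```

A click handler would wrap its expansion in `pks.run(parentId, task)` with `new PerKeySemaphore(4)`; clicks on other parents or canvases use other keys and proceed independently.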

🔍 Web search

A pre-planner gate (decideSearch.js + prompts/decide-search.md) calls the LLM with the proposed subject and asks: do recent / authoritative sources materially improve this node? The default leans yes; only clearly abstract or timeless subjects skip search. When the answer is yes:

  1. The web-search backend runs with the rephrased query.
  2. Results are normalised into {title, url, snippet, source}.
  3. Top results are passed into the planner prompt.
  4. Sources are persisted both into nodes/<hash>.json and into the SQLite Sources table.
  5. The frontend renders a 📚 badge near the breadcrumb. Hovering it shows a popover with the source list (with a 220 ms grace period so the mouse can reach the popover).
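
The gate and steps 1–3 can be sketched like this. The verdict format (a JSON object with a `search` boolean and an optional `query`), the raw result field names, and using the hostname as `source` are all assumptions; only the normalised `{title, url, snippet, source}` shape and the lean-yes default come from the description above.

```javascript
// Parse the gate's LLM reply. Assumed format: {"search": true, "query": "..."}.
// Unparseable replies fall back to searching, matching the "default leans yes" policy.
function parseSearchVerdict(reply, subject) {
  try {
    const verdict = JSON.parse(reply);
    const query = typeof verdict.query === 'string' ? verdict.query.trim() : '';
    return {
      search: verdict.search !== false, // only an explicit false skips search
      query: query || subject, // fall back to the subject when no rephrase is given
    };
  } catch {
    return { search: true, query: subject };
  }
}

// Normalise raw backend results into {title, url, snippet, source} and keep the top N.
function normaliseResults(rawResults, topN = 5) {
  const seen = new Set();
  const results = [];
  for (const r of rawResults) {
    const url = r.url || r.link; // field names vary between backends (assumption)
    if (!url || seen.has(url)) continue; // skip entries without a URL and duplicates
    seen.add(url);
    let source;
    try {
      source = new URL(url).hostname; // hypothetical choice: the site's hostname
    } catch {
      continue; // skip malformed URLs
    }
    results.push({
      title: String(r.title || url).trim(),
      url,
      snippet: String(r.snippet || r.description || '').trim(),
      source,
    });
    if (results.length >= topN) break;
  }
  return results;
}
```

Failing open (searching when the verdict cannot be parsed) trades a few seconds of latency for never silently skipping sources the gate meant to request.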

📦 Export as a standalone static site

Any canvas can be exported as a fully self-contained static site: a read-only replica of the preview with all data and images bundled, openable directly from file:// with zero network requests.

  • In-app: the ··· More menu → Export preview downloads a .zip (index.html / viewer.js / viewer.css / data.js + images/).

  • Serve one locally for quick viewing in a browser:

    ```bash
    npm run serve-preview -- <canvasId> [--lang en] [--port 8088]
    ```

    Builds the static site to a temp dir, starts a tiny static HTTP server, prints the URL. Ctrl-C cleans up.

  • Publish to GitHub Pages (one or more canvases β†’ a routed gallery landing page at /, each example at /<canvasId>/):

    ```bash
    npm run example:publish -- <canvasId> [<canvasId> ...] [--lang en] [--no-push]
    ```

    Builds each canvas, regenerates the landing index, and pushes to the gh-pages branch (accumulating: re-publishing with a new id keeps the others). See the result at https://imcuttle.github.io/flipbook-app/.

The exported viewer mirrors the live read-only preview: image stage with collision-avoiding hotspot labels, leader lines, selectable OCR text overlay, caption, breadcrumb, catalog, and sources, plus progressive image loading, scene transitions, and next-layer image prefetch. Per-node narration mp3s are bundled too, so the static site auto-narrates offline (toggleable in the top bar). It never calls the server.
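
The export zip comes from lib/zip.js, described only as a dependency-free ZIP writer. One way to write such a thing is the "stored" (uncompressed, method 0) ZIP format, which needs just a CRC-32 and three record types. This is a sketch under that assumption, not the project's implementation: it skips ZIP64 (so it is limited to archives under 4 GB and 65,535 entries) and fixes every timestamp to 1980-01-01, the DOS epoch. Storing rather than deflating costs little here, since PNGs are already compressed.

```javascript
// CRC-32 (reflected IEEE polynomial 0xEDB88320), as required by the ZIP format.
const CRC_TABLE = Array.from({ length: 256 }, (_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  return (crc ^ 0xffffffff) >>> 0;
}

// Build a ZIP archive using method 0 (stored). files: [{ name, data: Buffer | string }]
function buildZip(files) {
  const localParts = [];
  const centralParts = [];
  let offset = 0;
  const DOS_DATE = (0 << 9) | (1 << 5) | 1; // year 1980 + 0, month 1, day 1

  for (const file of files) {
    const name = Buffer.from(file.name, 'utf8');
    const data = Buffer.isBuffer(file.data) ? file.data : Buffer.from(file.data, 'utf8');
    const crc = crc32(data);

    const local = Buffer.alloc(30); // local file header (unset fields stay 0)
    local.writeUInt32LE(0x04034b50, 0); // signature
    local.writeUInt16LE(20, 4); // version needed to extract (2.0)
    local.writeUInt16LE(0x0800, 6); // flag bit 11: names are UTF-8
    local.writeUInt16LE(DOS_DATE, 12); // method (8) and time (10) stay 0
    local.writeUInt32LE(crc, 14);
    local.writeUInt32LE(data.length, 18); // compressed size
    local.writeUInt32LE(data.length, 22); // uncompressed size
    local.writeUInt16LE(name.length, 26); // extra field length (28) stays 0
    localParts.push(local, name, data);

    const central = Buffer.alloc(46); // central directory header
    central.writeUInt32LE(0x02014b50, 0); // signature
    central.writeUInt16LE(20, 4); // version made by
    central.writeUInt16LE(20, 6); // version needed to extract
    central.writeUInt16LE(0x0800, 8);
    central.writeUInt16LE(DOS_DATE, 14);
    central.writeUInt32LE(crc, 16);
    central.writeUInt32LE(data.length, 20);
    central.writeUInt32LE(data.length, 24);
    central.writeUInt16LE(name.length, 28);
    central.writeUInt32LE(offset, 42); // where this entry's local header starts
    centralParts.push(central, name);

    offset += local.length + name.length + data.length;
  }

  const centralDir = Buffer.concat(centralParts);
  const end = Buffer.alloc(22); // end of central directory record
  end.writeUInt32LE(0x06054b50, 0);
  end.writeUInt16LE(files.length, 8); // entries on this disk
  end.writeUInt16LE(files.length, 10); // total entries
  end.writeUInt32LE(centralDir.length, 12);
  end.writeUInt32LE(offset, 16); // central directory starts right after the file data
  return Buffer.concat([...localParts, centralDir, end]);
}
```

Usage: `fs.writeFileSync('export.zip', buildZip([{ name: 'index.html', data: html }, { name: 'images/a.png', data: pngBuffer }]))`, where `html` and `pngBuffer` are placeholders.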

🔗 Share / preview links

  • POST /api/canvas/:id/share → {token, url}. Reuses the existing token if the canvas has already been shared.
  • GET /api/share/:token β†’ {canvasId, topic, readOnly:true}.
  • Frontend: opening …?s=<token> puts the UI in read-only preview mode: no topic input, no clicks on the image, and a "👁 Preview" badge in the corner. SSE stays connected, so a viewer watching mid-generation sees images stream in in real time.
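
A minimal in-memory sketch of the share contract above (one reusable token per canvas, read-only resolution, and `?s=` detection on the frontend). The real server persists ShareLinks in SQLite; `createShare`, `resolveShare`, `shareTokenFromUrl`, and the `${origin}/?s=` URL shape are hypothetical names chosen for illustration.

```javascript
const crypto = require('node:crypto');

const tokensByCanvas = new Map(); // canvasId -> token
const canvasesByToken = new Map(); // token -> { canvasId, topic }

// POST /api/canvas/:id/share: reuse the canvas's token if one exists, else mint one.
function createShare(canvasId, topic, origin) {
  let token = tokensByCanvas.get(canvasId);
  if (!token) {
    token = crypto.randomBytes(16).toString('hex'); // 128 random bits, URL-safe
    tokensByCanvas.set(canvasId, token);
    canvasesByToken.set(token, { canvasId, topic });
  }
  return { token, url: `${origin}/?s=${token}` };
}

// GET /api/share/:token: resolve to the canvas in read-only mode, or null if unknown.
function resolveShare(token) {
  const entry = canvasesByToken.get(token);
  return entry ? { ...entry, readOnly: true } : null;
}

// Frontend: a ?s=<token> query parameter switches the UI into read-only preview.
function shareTokenFromUrl(href) {
  return new URL(href).searchParams.get('s'); // null when the parameter is absent
}
```

Since the token is the only credential, it should be long and random; the read-only guarantee itself still has to be enforced server-side by rejecting click requests made under a share token.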

📺 Fullscreen / casting

  • ⛶ button in the TopBar requests browser fullscreen; on iOS Safari, where the Fullscreen API isn't supported, it falls back to CSS-only fullscreen.
  • 👁 / 🚫 button (visible while in fullscreen) toggles the breadcrumb + caption + hint, which is useful for clean projection.
  • Long-press hint is suppressed in fullscreen by default; the press still works.
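
A hedged sketch of a fullscreen toggle with a CSS-only fallback. It uses the standard Fullscreen API (`element.requestFullscreen()`, `document.exitFullscreen()`, `document.fullscreenElement`); the `is-pseudo-fullscreen` class name is hypothetical, and a stylesheet would need to pin that element to the viewport (for example `position: fixed; inset: 0`).

```javascript
// Toggle fullscreen for `el`, falling back to a CSS class where the Fullscreen API
// is missing or the request is refused. Returns 'native', 'css', or 'exit'.
async function toggleFullscreen(el, doc = document) {
  const PSEUDO = 'is-pseudo-fullscreen'; // hypothetical class, styled as fixed + inset: 0

  // Leave whichever fullscreen mode is currently active.
  if (doc.fullscreenElement) {
    await doc.exitFullscreen();
    return 'exit';
  }
  if (el.classList.contains(PSEUDO)) {
    el.classList.remove(PSEUDO);
    return 'exit';
  }

  // Prefer the real API when the browser exposes it.
  if (typeof el.requestFullscreen === 'function') {
    try {
      await el.requestFullscreen();
      return 'native';
    } catch {
      // Request refused (e.g. not triggered by a user gesture): use the CSS fallback.
    }
  }
  el.classList.add(PSEUDO);
  return 'css';
}
```

Feature detection (checking for `requestFullscreen`) is used instead of user-agent sniffing, so any browser lacking the API gets the same fallback.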

🧹 Cleaning local state

```bash
npm run clean:data    # reset server/data (all canvases)
npm run clean:dist    # reset web/dist
npm run clean         # both
```

📦 Build for production

```bash
npm run build         # builds web/dist
npm start             # serves web/dist + API from :8787
```

🌐 LAN access via a fixed domain (macOS)

Give the app a stable hostname (e.g. http://flipbook.lan) reachable from any device on your LAN, with no port number needed. It uses dnsmasq (resolves the domain to this machine's LAN IP) plus Caddy (reverse-proxies :80 to the app).

```bash
npm run lan:up        # flipbook.lan → dev :5173 (preferred), falls back to prod :8787
npm run lan:down      # tear it down

# custom: scripts/lan-domain-setup.sh <domain> <devPort> <prodPort>
bash scripts/lan-domain-setup.sh studio.lan 5173 8787
```

The proxy tries the dev port (5173) first and automatically falls back to the prod port (8787) when dev isn't running (passive health check, 3s blacklist). So npm run dev and npm start both work behind the same domain.
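
Caddy's reverse proxy provides this behaviour itself through its load-balancing and passive health-check options. To make the policy concrete, here is a small JavaScript model of it under those assumptions: try upstreams in priority order, and after a failed request skip that upstream for 3 seconds. `UpstreamPicker` is a hypothetical name; the actual behaviour comes from Caddy's configuration, not from code like this.

```javascript
// Model of "first healthy upstream wins" with a passive blacklist after failures.
class UpstreamPicker {
  constructor(upstreams, failDurationMs = 3000, now = () => Date.now()) {
    this.upstreams = upstreams; // priority order, e.g. dev first, then prod
    this.failDurationMs = failDurationMs;
    this.now = now; // injectable clock, handy for tests
    this.downUntil = new Map(); // upstream -> time at which it may be tried again
  }

  // First upstream that is not currently blacklisted, or null if all are down.
  pick() {
    const t = this.now();
    return this.upstreams.find((u) => (this.downUntil.get(u) ?? 0) <= t) ?? null;
  }

  // Passive health check: a failed proxied request marks the upstream down for a while.
  reportFailure(upstream) {
    this.downUntil.set(upstream, this.now() + this.failDurationMs);
  }
}
```

The check is passive because no probe requests are sent: an upstream is marked down only when real traffic to it fails, and it is retried automatically once the blacklist window expires.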

lan:up installs dnsmasq/caddy via Homebrew if missing and needs sudo (dnsmasq binds port 53, Caddy binds port 80). It only configures this machine; to reach the domain from other devices, point their DNS at this machine's LAN IP (router DHCP DNS, per-device DNS, or a hosts entry; the script prints the exact options and your IP).

⚙️ Configuration (env)

| Var | Default | Purpose |
|---|---|---|
| PORT | 8787 | server port |
| HOST | 127.0.0.1 | server bind address |
| DATA_DIR | server/data | canvas state on disk |
| PROMPTS_DIR | prompts | prompt files |
| DB_PATH | <DATA_DIR>/flipbook.sqlite | SQLite file |
| MAX_PARALLEL_CLICKS_PER_NODE | 4 | concurrent click expansions per parent |
| MAX_PARALLEL_CODEBUDDY | 20 | concurrent planner/LLM subprocesses |
| MAX_PARALLEL_IMAGE | 20 | concurrent image-generation jobs (a separate pool from the LLM limit) |
| PLANNER_TIMEOUT_MS | 90000 | per-call planner timeout |
| IMAGE_TIMEOUT_MS | 180000 | per-call ImageGen timeout |
| WEB_SEARCH_TIMEOUT_MS | 60000 | per-call WebSearch timeout |
| IMAGE_PROVIDER | codebuddy | provider chain (e.g. openai,nanobanana,svg) |
| IMAGE_SIZE | 1920x1080 | requested size (the provider may pick its own) |
| ENABLE_CODEBUDDY | 0 | set to 1 to enable the reference CLI provider |
| ENABLE_WEB_SEARCH | follows ENABLE_CODEBUDDY | set to 0 to force-disable |
| ENABLE_OCR | 1 | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to 0 to skip |
| OCR_TIMEOUT_MS | 25000 | per-call OCR timeout |
| OCR_MIN_CONFIDENCE | 0.4 | drop OCR spans below this confidence |
| ENABLE_AUDIO | 1 | synthesise Edge neural-voice narration (mp3) for each node; set to 0 to skip. Non-blocking: failures never stop image generation |
| AUDIO_TIMEOUT_MS | 30000 | per-call TTS synthesis timeout |
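
A sketch of reading the variables above with their documented defaults. The parsing rules (integers via `Number.parseInt`, `'1'`/`'0'` flags, IMAGE_SIZE as `WIDTHxHEIGHT`) and the `loadConfig` name are assumptions for illustration, not the server's actual config module.

```javascript
// Parse an integer variable; unset or malformed values use the fallback.
function intVar(env, name, fallback) {
  const n = Number.parseInt(env[name] ?? '', 10);
  return Number.isFinite(n) ? n : fallback;
}

// Parse a float variable; unset or malformed values use the fallback.
function floatVar(env, name, fallback) {
  const n = Number.parseFloat(env[name] ?? '');
  return Number.isFinite(n) ? n : fallback;
}

// Parse a 0/1 flag; any other value (including unset) uses the fallback.
function flagVar(env, name, fallback) {
  if (env[name] === '1') return true;
  if (env[name] === '0') return false;
  return fallback;
}

// Hypothetical loader mirroring the table's variable names and defaults.
function loadConfig(env = process.env) {
  const dataDir = env.DATA_DIR || 'server/data';
  const enableCodebuddy = flagVar(env, 'ENABLE_CODEBUDDY', false);
  const size = /^(\d+)x(\d+)$/.exec(env.IMAGE_SIZE || '1920x1080');
  return {
    port: intVar(env, 'PORT', 8787),
    host: env.HOST || '127.0.0.1',
    dataDir,
    promptsDir: env.PROMPTS_DIR || 'prompts',
    dbPath: env.DB_PATH || `${dataDir}/flipbook.sqlite`,
    maxParallelClicksPerNode: intVar(env, 'MAX_PARALLEL_CLICKS_PER_NODE', 4),
    maxParallelCodebuddy: intVar(env, 'MAX_PARALLEL_CODEBUDDY', 20),
    maxParallelImage: intVar(env, 'MAX_PARALLEL_IMAGE', 20),
    plannerTimeoutMs: intVar(env, 'PLANNER_TIMEOUT_MS', 90000),
    imageTimeoutMs: intVar(env, 'IMAGE_TIMEOUT_MS', 180000),
    webSearchTimeoutMs: intVar(env, 'WEB_SEARCH_TIMEOUT_MS', 60000),
    imageProvider: env.IMAGE_PROVIDER || 'codebuddy',
    imageSize: size ? { width: Number(size[1]), height: Number(size[2]) } : { width: 1920, height: 1080 },
    enableCodebuddy,
    // Web search follows ENABLE_CODEBUDDY; ENABLE_WEB_SEARCH=0 force-disables it.
    enableWebSearch: enableCodebuddy && env.ENABLE_WEB_SEARCH !== '0',
    enableOcr: flagVar(env, 'ENABLE_OCR', true),
    ocrTimeoutMs: intVar(env, 'OCR_TIMEOUT_MS', 25000),
    ocrMinConfidence: floatVar(env, 'OCR_MIN_CONFIDENCE', 0.4),
    enableAudio: flagVar(env, 'ENABLE_AUDIO', true),
    audioTimeoutMs: intVar(env, 'AUDIO_TIMEOUT_MS', 30000),
  };
}
```

Deriving DB_PATH's default from DATA_DIR means that moving the data directory also moves the SQLite file, matching the `<DATA_DIR>/flipbook.sqlite` default in the table.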


About

🎨 η‚Ήε‡»εΌζŽ’η΄’ηš„ηŸ₯θ―†η”»ε†ŒοΌŒι•ΏζŒ‰ε›Ύη‰‡η”ŸζˆεΈ¦ζ ‡ζ³¨ε­ε›Ύ | Flipbook Canvas β€” click-to-explore knowledge picture-book. Long-press any image to spawn an annotated child diagram via a pluggable multimodal pipeline (LLM + image gen + web search + OCR).
