fix: replace deprecated Workers AI model + repair hosted recall/query/crawl #34
Merged
Cloudflare deprecated @cf/meta/llama-3.1-8b-instruct on 2026-05-30 (error
5028), which broke the AI Query endpoint (it hardcoded that model and never
used the OpenRouter key). Extract a single aiChatComplete() helper that prefers
the OpenAI-compatible endpoint when OPENAI_API_KEY is set and otherwise falls
back to Workers AI on @cf/meta/llama-3.1-8b-instruct-fast (overridable via
WORKERS_AI_MODEL). Route
aiQueryRun, the facts extractor, and memwalChat through it. The facts model
keeps null-on-failure so extraction still degrades to deterministic heuristics.
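A minimal sketch of how a helper like aiChatComplete() could implement this routing. It is illustrative, not the PR's code: OPENAI_BASE_URL, OPENAI_MODEL, the "gpt-4o-mini" default, the injected fetchImpl parameter, and the simplified AiBinding interface are assumptions. It relies only on the standard OpenAI-compatible POST /chat/completions request shape and on Workers AI text models returning a { response } object from env.AI.run().

```typescript
// Minimal sketch of a shared chat helper, not the PR's actual code.
// OPENAI_BASE_URL, OPENAI_MODEL and the "gpt-4o-mini" default are hypothetical.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Simplified structural stand-in for the Workers AI binding (env.AI).
interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

// Injected fetch so the helper can be exercised without the network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface Env {
  OPENAI_API_KEY?: string;
  OPENAI_BASE_URL?: string; // hypothetical: any OpenAI-compatible base URL
  OPENAI_MODEL?: string; // hypothetical
  WORKERS_AI_MODEL?: string;
  AI?: AiBinding;
}

const DEFAULT_WORKERS_AI_MODEL = "@cf/meta/llama-3.1-8b-instruct-fast";

// Throws on failure; callers decide how to degrade.
async function aiChatComplete(env: Env, messages: ChatMessage[], fetchImpl: FetchLike): Promise<string> {
  if (env.OPENAI_API_KEY) {
    // Prefer the OpenAI-compatible chat completions endpoint when a key is set.
    const base = env.OPENAI_BASE_URL ?? "https://api.openai.com/v1";
    const res = await fetchImpl(`${base}/chat/completions`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.OPENAI_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: env.OPENAI_MODEL ?? "gpt-4o-mini", messages }),
    });
    if (!res.ok) throw new Error(`chat endpoint returned HTTP ${res.status}`);
    const data = (await res.json()) as { choices?: { message?: { content?: string } }[] };
    const text = data.choices?.[0]?.message?.content;
    if (!text) throw new Error("chat endpoint returned no content");
    return text;
  }
  if (!env.AI) throw new Error("neither OPENAI_API_KEY nor a Workers AI binding is configured");
  // Otherwise fall back to Workers AI; the model is overridable via WORKERS_AI_MODEL.
  const model = env.WORKERS_AI_MODEL ?? DEFAULT_WORKERS_AI_MODEL;
  const out = await env.AI.run(model, { messages });
  if (!out.response) throw new Error(`Workers AI model ${model} returned no response`);
  return out.response;
}

// The facts path keeps null-on-failure so extraction can fall back to heuristics.
async function factsModel(env: Env, messages: ChatMessage[], fetchImpl: FetchLike): Promise<string | null> {
  try {
    return await aiChatComplete(env, messages, fetchImpl);
  } catch {
    return null;
  }
}
```

Inside a Worker, the global fetch can be passed as fetchImpl. Keeping the throwing helper separate from the null-returning facts wrapper lets aiQueryRun surface errors while extraction quietly falls back to its deterministic heuristics.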
Also in this change set:
- Add per-run memory routes POST /api/runs/:id/memwal/{recall,query} so the
Memory page stops returning "Not found" on hosted runs; resolve the run's
namespace and proxy to the global recall/chat handlers.
- Load namespace facts (context/facts.json / manifest) for grounded recall and
chat instead of an in-memory seed table.
- Rewrite htmlToText to scope to <main>/<article>, drop nav/header/footer/
script/style/svg/form blocks, and preserve headings and list structure, so
the fetch fallback no longer flattens nav/llms.txt boilerplate.
- Add a live ticking elapsed timer to the build console (it no longer sits frozen at 0s).
- Make the hosted "Remember" button honest and clickable instead of silently
no-opping behind a delegate gate it can never satisfy.
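The htmlToText rewrite can be illustrated without a DOM, which is consistent with keeping jsdom out of the runtime bundle. This is a minimal regex-only sketch, not the PR's implementation: the name htmlToTextSketch, the "#"-prefixed heading lines, the "- " list-item lines, and the small entity table are assumptions, and regexes do not handle nested same-name tags or malformed markup.

```typescript
// Regex-only sketch; not a real HTML parser. Nested same-name tags
// (e.g. <main> inside <main>) and malformed markup are not handled.
const DROP_TAGS = ["nav", "header", "footer", "script", "style", "svg", "form"];

function decodeEntities(s: string): string {
  return s
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" is not double-decoded
}

function htmlToTextSketch(html: string): string {
  // 1. Scope to <main>, else <article>, else the whole document.
  const scoped =
    /<main\b[^>]*>([\s\S]*?)<\/main>/i.exec(html)?.[1] ??
    /<article\b[^>]*>([\s\S]*?)<\/article>/i.exec(html)?.[1] ??
    html;

  // 2. Drop boilerplate blocks together with their contents.
  let s = scoped;
  for (const tag of DROP_TAGS) {
    s = s.replace(new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}>`, "gi"), "");
  }

  // 3. Preserve structure: headings become "#" lines, list items "- " lines.
  s = s.replace(
    /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
    (_m: string, level: string, inner: string) => `\n${"#".repeat(Number(level))} ${inner.trim()}\n`,
  );
  s = s.replace(/<li\b[^>]*>/gi, "\n- ");
  s = s.replace(/<br\s*\/?>|<\/(p|div|ul|ol|li|section)>/gi, "\n");

  // 4. Strip remaining tags, decode a few entities, collapse whitespace.
  s = decodeEntities(s.replace(/<[^>]+>/g, ""));
  return s
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

On a page whose navigation, header, and footer surround a <main> element, only the main content survives, and headings and list items stay on their own lines instead of being flattened into one run of text.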
Why
Live prod showed five bugs after building a context package:
- aiQueryRun hardcoded @cf/meta/llama-3.1-8b-instruct, which Cloudflare deprecated, and never used the OpenRouter key.
- htmlToText grabbed the whole document.
What
- aiChatComplete() helper: prefers the OpenAI-compatible endpoint when OPENAI_API_KEY is set, else Workers AI on @cf/meta/llama-3.1-8b-instruct-fast (overridable via WORKERS_AI_MODEL). Routes aiQueryRun, the facts extractor, and memwalChat through it. The facts model keeps null-on-failure, so extraction still degrades to deterministic heuristics.
- Add POST /api/runs/:id/memwal/{recall,query} and load namespace facts (context/facts.json / manifest) for grounded recall + chat.
- Rewrite htmlToText to scope to <main>/<article>, drop nav/header/footer/script/style/svg/form, and preserve headings + lists.
Verification
- tsc --noEmit clean (api + web)
- vitest run apps/api/src/worker.test.ts → 22/22 pass (incl. Firecrawl)
- wrangler deploy --dry-run clean, no jsdom in runtime bundle
Follow-up (infra, not in this PR)
- Set FIRECRAWL_API_KEY as a worker secret to activate clean JS-rendered crawling (code path already wired).
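The follow-up works because the crawl code only takes the Firecrawl path when FIRECRAWL_API_KEY is present. A hedged sketch of that selection, with both crawlers injected so that no particular Firecrawl HTTP API is assumed: the names crawlPage, Crawlers, firecrawl, and fetchFallback are hypothetical, and falling back to plain fetch when Firecrawl errors is an assumption the PR does not state.

```typescript
// Hypothetical shapes: both crawlers are injected, so this sketch shows only
// the selection logic and assumes nothing about Firecrawl's actual API.
interface CrawlEnv {
  FIRECRAWL_API_KEY?: string;
}

interface Crawlers {
  firecrawl(url: string, apiKey: string): Promise<string>; // JS-rendered content
  fetchFallback(url: string): Promise<string>; // plain fetch + htmlToText
}

type CrawlResult = { via: "firecrawl" | "fetch"; text: string };

async function crawlPage(url: string, env: CrawlEnv, crawlers: Crawlers): Promise<CrawlResult> {
  if (env.FIRECRAWL_API_KEY) {
    try {
      // Only reachable once the secret is set; otherwise this path stays dormant.
      return { via: "firecrawl", text: await crawlers.firecrawl(url, env.FIRECRAWL_API_KEY) };
    } catch {
      // Assumption: a Firecrawl failure degrades to the fetch fallback.
    }
  }
  return { via: "fetch", text: await crawlers.fetchFallback(url) };
}
```

With no secret configured, every crawl goes through the fetch fallback, so adding FIRECRAWL_API_KEY is purely an infrastructure change.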