fix: Render and index <img> as first-class inline content#744
Conversation
Closes the silent-strip bug where curating a topic with `<img src alt/>`
embedded inside a `<bv-*>` element succeeded on disk but vanished on
`brv read` and was missed by `brv query`. Empirically reproduced
2026-05-29: writer kept the tag intact; `brv read` showed a gap where
the image was; query for the alt text returned zero matches.
Cause: `html-renderer.ts` and the BM25 indexer's `bodyText` extraction
both rely on `getInnerText()`, which walks text-node descendants only.
Void elements like `<img>` have attribute data (`src`, `alt`) but no
text children → contribute nothing. Worst-class UX (write succeeds,
content disappears).
Approach: treat `<img>` as first-class inline content.
- `html-renderer.ts` — new private `getInlineMarkdown(node)` that walks
like `getInnerText` but translates `<img>` to CommonMark ``.
`renderChild` uses it for inline content; a top-level `<img>` case is
added for the rare bv-sibling shape. Defensive on malformed input:
* missing `src` → empty string (no broken `![alt]()` syntax)
* missing `alt` → `` (valid CommonMark click target)
* `]` in alt → collapsed to space (only `]` closes the alt span)
* `)` in src → CommonMark autolink form `<src>` (parens-tolerant)
- `html-reader.ts` — new exported `extractImageContent(elements)` that
aggregates every `<img>`'s alt + src into a space-joined string.
Surfaced via the new `HtmlTopicRead.imageContent` field. Does NOT
mutate `getInnerText` — separate focused helper with no surprise
blast-radius on shared infrastructure.
- `search-knowledge-service.ts` — concatenate `parsed.imageContent`
into the indexed-content array alongside bodyText / summary / tags /
keywords / related. URLs go in verbatim; the BM25 tokenizer's
whitespace/punctuation split decomposes them into useful tokens
(host, path segments, filename, extension).
- `INDEX_SCHEMA_VERSION` bumped 6 → 7 so cached indexes built pre-fix
invalidate on next daemon start. Previously-curated `<img>` content
becomes searchable retroactively without a manual `brv index rebuild`.
- `system-prompt.yml` — extend the inline-HTML allowlist note to
document `<img>` is supported.
Tests (16 new + 6 integration cases):
- `html-renderer.test.ts` (7 cases): canonical `<img>` rendering inside
`<bv-decision>`; `` for missing alt; silent drop for missing
src; `]` escape in alt; `)` autolink fallback in src; top-level
`<img>` sibling rendering; URL tokens present in rendered output for
BM25 friendliness.
- `html-reader.test.ts` (6 cases): empty topic → empty imageContent;
single image alt + src aggregated; multiple images preserve document
order; empty attrs don't produce double spaces; `readHtmlTopicSync`
surfaces `imageContent` on the parsed result.
- `test/integration/scenarios/img-roundtrip.test.ts` (6 cases): full
read + index pipeline. Curate-shaped HTML on tmp disk → `readHtmlTopic`
→ renderer shows markdown image syntax; indexer (MiniSearch with the
same option shape as production) finds the topic for queries on alt
phrase, URL host token, URL path segment, surrounding prose. Plus a
regression guard for topics with zero images (no double spaces in
the indexed content).
42/42 affected-surface tests green; 54/54 search-knowledge regression
tests pass. Typecheck + lint clean; the `renderChild` complexity
warning was 31 pre-fix and is 32 now — +1 unavoidable for the new
top-level `<img>` branch.
This is the first task in the post-merge inline-html-support
milestone (`features/html-memory-conversion/milestones/02-...`).
The matching fix for `<a href>` (same shape, different bug surface)
is tracked as a follow-up task.
[ENG-3021] Render and index <img> as first-class inline content
…v curate
Lets agents author the curate continuation envelope as a JSON file (no shell-escape
hell) and optionally have brv unlink it after local validation succeeds.
- --response-file <path>: alternative to --response. Reads {html, meta?} from file,
lstat-guarded against symlinks / directories / devices.
- --delete-response-file (opt-in): unlinks the file once envelope parse + session
lookup succeed locally, BEFORE daemon dispatch. Validation failures preserve the
file so the agent can fix and retry; unlink failures abort the curate so we never
claim success when the requested cleanup did not happen.
- Mutex'd at runtime against --response so failures emit a structured envelope
error agents can switch on, not oclif's generic stderr line.
- Kickoff-side guard: any continuation-only flag (--response, --response-file,
--delete-response-file) without --session is rejected up front. Also closes a
pre-existing silent-ignore on --response without --session.
- All local validation (envelope parse + session existence) now runs before any
daemon connect; bad JSON no longer triggers withDaemonRetry.
Stdin (--response-file -) deliberately deferred to a follow-up; no codebase
precedent for the dash-as-stdin convention yet.
Docs + skill markdown updated. brv curate --help EXAMPLES shows both flags.
Four cleanups from PR review (one per finding):
1. Defer --response-file unlink until after daemon dispatch returns.
Previously the unlink fired between local validation and dispatch, so
a transport error (daemon down, malformed payload, daemon-side throw)
destroyed the envelope file the agent had paid an LLM call to author.
Now the unlink only fires on a non-throw dispatch (success OR
validation-failed); a daemon-error throw skips it and the file
survives for retry. Unlink failure post-success appends a warning to
the dispatch envelope instead of aborting.
2. Replace `as Error & {kind?: string}` cast on the parse-error catch
with `instanceof InvalidResponseFormatError`. Required exporting the
class from curate-session.ts. Matches the existing pattern for
InvalidResponseFileError and respects the no-`as`-Type convention.
3. Pre-check whitespace-only payload in the command and emit the
transient `empty-response` envelope directly instead of letting
JSON.parse(' ') throw and collapse it into the terminal
`invalid-response-format` kind. Preserves the documented contract
where `empty-response` keeps the session alive for retry.
4. Replace the one-line buildUnknownSessionEnvelope wrapper with a
direct export of the internal unknownSessionEnvelope. Same shape,
one fewer indirection.
Interactive verification: file preserved on whitespace-only payload
(both inline and file), happy-path file deleted post-dispatch, real
envelope still curates clean. 8877 tests pass (1 new for the
parseCurateResponse-throws-InvalidResponseFormatError contract).
… gaps
Second round of PR review fixes (four findings on the previous fix
commit):
1. Switch cleanup-failure-post-success from warnings[] (free-text) to
errors[] (structured kind). The previous fix moved
response-file-delete-error into warnings, which buried the machine-
readable kind under a substring-match. Now the cleanup error is
appended to the dispatch envelope's errors[] so consumers can switch
on `kind === 'response-file-delete-error'` programmatically. The
`ok`/`status` fields stay as the daemon set them, so
`{ok: true, status: 'done', errors: [{kind: 'response-file-delete-error', …}]}`
is the success-with-cleanup-hiccup shape.
2. Update docs/curate-protocol.md row for response-file-delete-error to
describe the new "non-terminal companion" lifecycle. Previously said
"the curate is aborted" which mismatched the post-fix behavior.
3. Document the warnings[] field in the wire-envelope JSON skeleton.
Was undocumented (pre-existing gap); this PR actively uses the field
so now's the time to surface it.
4. Add a doc note under --delete-response-file explaining the
correct-html re-authoring tax: on a validation-failed continuation,
the file is unlinked but the session is still live, so the calling
agent must author a fresh envelope and write a new file before the
next continuation. Inline --response has no equivalent overhead.
Interactive verification: forced unlink failure via chmod-protected
parent dir. Result: status=done, ok=true, errors=[{kind:response-file-
delete-error, message:EACCES…}], warnings=None. Structured kind is
where consumers can find it.
…eton Real regression caught in third-round review: when the previous fix moved --delete-response-file cleanup failures from `warnings[]` to `errors[]`, the text-mode emitter's `done` branch was left iterating `warnings[]` only. JSON consumers saw the new `errors[0].kind` cleanly, but `--format text` users with a successful curate + failed cleanup saw `✓ Curated to ...` and nothing else — the cleanup failure vanished. Fix the emitter to iterate companion `errors[]` on the `done` branch too, prefixed with `⚠` to match the warnings convention. Same "non-fatal companion" semantics the docs already describe. Plus two doc lines tracking the new `done`-with-errors[] shape: - JSON-skeleton comment for `errors[]` now lists the three valid placements (correct-html, failed, done-with-companion). - Status-values table's `done` row notes the companion-errors case with a "treat as success; surface but don't abandon" hint so a contract author reading top-of-doc material doesn't assume any non-empty `errors[]` means failure. Interactive verification: chmod-555 parent directory → text-mode emits "✓ Curated to ..." plus "⚠ response-file-delete-error: ..." on the next line.
…estration Closes the four-round-carryover finding by pinning the load-bearing orchestration invariants. Three consecutive review rounds had each introduced a subtle behavioral change in the emitter / orchestration surface caught only by manual interactive verification: abd82ba — deferred unlink until after dispatch 316ece0 — switched cleanup-failure from warnings[] to errors[] 60b613d — made text-mode iterate companion errors[] on the done branch Each would have been caught by a small command-level test with stubbed seams. Adding both the seams and the tests. Refactor (src/oclif/commands/curate/index.ts): - Extract the dispatch path into a protected dispatchContinuation() method that wraps withDaemonRetry + continueSession. Throws on daemon-error so the caller can map to a daemon-error envelope AND skip the post-success unlink — that ordering is the load-bearing invariant. - Extract loadCurateResponseFile, deleteCurateResponseFile, peekCurateSession, and resolveProjectRoot calls into protected methods so a TestableCurate subclass can inject in-memory doubles. - Switch the removed-flag check from process.argv.slice(2) to this.argv so testable subclasses scope to their own argv (mocha's --timeout was leaking into the curate flag parser otherwise). New test file (test/commands/curate/index.test.ts) — 13 tests pinning: flag combinations: - --response + --response-file together - --delete-response-file without --response-file - continuation-only flags on kickoff (no --session) pre-dispatch local validation: - whitespace payload → empty-response, no dispatch - invalid JSON → invalid-response-format, no dispatch - unknown session → unknown-session, no unlink - --response-file read error → response-file-read-error, no dispatch daemon-error path preserves the response file (regression abd82ba): - dispatch throws → daemon-error envelope, NO unlink call fired cleanup-failure-post-success → done with appended errors[] (316ece0): - done + unlink throws → ok=true, status=done, errors[0].kind=response-file-delete-error - failed + unlink throws → existing errors preserved, cleanup appended - done + unlink succeeds → clean done envelope (no companion errors) text-mode emitter surfaces companion errors on done (60b613d): - text-mode done + cleanup error → both ✓ and ⚠ lines surface - text-mode done + writer warnings → existing behavior unchanged Pattern follows test/commands/status.test.ts: TestableCurate subclasses Curate, overrides protected seams via constructor-injected behavior callbacks, captures stdout via sinon.stub(process.stdout, 'write') because writeJsonResponse bypasses this.log. Full suite: 8890 passing (13 new, 1 added earlier this session).
Accept curate continuation envelope from a JSON file, with opt-in cleanup
Claude PR Review — in progress
|
|
Claude finished @danhdoan's task in 4m 39s —— View job Claude PR Review
SummaryThis is a project-to-main merge of two independently-reviewed features ( Strengths
Findings (all non-blocking; details in inline comments)
Prior-round findings I concur with
Score: 4.3 / 5Solid merge of two well-scoped features with strong test coverage and good documentation hygiene. The notable gaps are (1) the writer/renderer URL-scheme sanitization gap (acknowledged in PR body and tracked as a follow-up) and (2) the autolink fallback information loss — neither blocks the merge. |

proj: merge byterover-tool-mode into main —
first-class rendering + curate
--response-file flags
Body:
Summary
proj/byterover-tool-modeand needs toland on
main.access them on
mainyet.project-to-main merge.
Type of change
Scope (select all touched areas)
Linked issues
N/A
Root cause (bug fixes only, otherwise write
N/A)N/A — feature merge, not a bug fix.
Test plan
test/unit/server/infra/render/reader/html-renderer.test.ts(img rendering)test/unit/server/infra/render/reader/html-reader.test.ts(img indexing extractor)test/integration/scenarios/img-roundtrip.test.ts(full read + index pipeline)test/unit/oclif/lib/curate-session.test.ts(response-file helpers + envelope parsing)test/commands/curate/index.test.ts(Curate command orchestration)<img>rendering inside<bv-*>elements: canonical, missing alt, missing src, specialchars in alt, autolink fallback for src with parens, top-level img sibling
--response-filehappy path + inline equivalence + meta passthrough--delete-response-filesemantics: deletes on non-throw dispatch, preserves on parse-fail/ unknown-session / invalid-uuid, surfaces cleanup-error in both JSON
errors[]andtext-mode stderr
User-visible changes
<img>rendered and indexed as first-class inline content. Previously, curating atopic with
<img src alt/>inside a<bv-*>element wrote the tag to disk but the image'ssrc/altdid not show up inbrv readoutput or BM25 search. After this merge,<img>is rendered as CommonMark
inrendered_mdat the correct inline position, andthe alt + src text is tokenized into the BM25 index. Index schema version bumped 6 → 7 so
previously-curated img content becomes searchable retroactively without a manual
brv index rebuild.brv curate --response-file <path>and--delete-response-file. Previously, an agenthad to inline-escape its entire JSON envelope on the shell with
--response '<json>'. The new--response-fileflag reads the same{html, meta?}envelope from disk, sidesteppingshell-escape complexity for non-trivial HTML.
--delete-response-filecleans up the envelopefile after a successful daemon dispatch; the file is preserved on dispatch failure so the
caller can retry. New documented error kinds:
invalid-response-format,response-file-not-regular,response-file-read-error,response-file-delete-error.Evidence
Checklist
npm test)npm run lint)npm run typecheck)npm run build)docs/curate-protocol.md+ skill templates for new--response-fileflags; system prompt updated to document<img>in the inline-HTMLallowlist)
mainRisks and mitigations
with no parseable JSON output on stdout. Discovered during the post-merge E2E sweep; fix in
progress on a separate branch.
<img>writer-validation defects — event-handler attributesand non-HTTP URL schemes pass through the writer without sanitization. Verified
non-exploitable in current consumers: WebUI sanitizes via DOMPurify + React
<img>onlyforwards known props; terminal and LLM-as-text consumers don't execute the content.
follow-up.