Merged
32 changes: 16 additions & 16 deletions BUGS.md
@@ -2,20 +2,21 @@

These are bugs (or missing features) I've observed while working with `multi checks`.

- [ ] Output is now hanging. I suspect this is recent (within the last few commits) and it started
happening after implementing the changes to the `Presenter` actor to fix writing text off-screen without wrapping.

- [ ] Remove the `Claude -p` executor.

- [ ] Running 16 agents seems to nearly freeze the computer. Use an OTel profile to determine if this is true.
- [ ] No use of Cersei workflows to chain multiple prompts together.

- [ ] Temperature not configured.
- [ ] Logs no longer report the id of the check that failed (or the number of attempted retries)

- [ ] No limit on max turns.
- [ ] Assemble_instructions is hard-coded: src/checks/executor/mod.rs:98 (definition), called from src/checks/executor/cersei.rs:110

- [ ] Logs no longer report the id of the check that failed (or the number of attempted retries)
- [ ] No system prompt provided.

- [ ] No use of Cersei workflows to chain multiple prompts together.
- [ ] Not sure if prompt caching is enabled at all.

- [ ] Running 16 agents seems to nearly freeze the computer. Use an OTel profile to determine if this is true.

- [ ] No limit on max turns.

- [ ] No support for Fireworks AI.

@@ -25,14 +26,6 @@ happening after implement the changes to the `Presenter` actor to fix writing te

- [ ] No loading of RULES.md files from the .claude directory.

- [ ] Assemble_instructions is hard-coded: src/checks/executor/mod.rs:98 (definition), called from src/checks/executor/cersei.rs:110

- [ ] No system prompt provided.

- [ ] Not sure if prompt caching is enabled at all.

- [ ] No trace capture. We need a way to record all session traces so that we can analyze why they failed.

- CERSEI: `append_system_prompt()` function is dead unless routed through the separate build_system_prompt() composer.

- [ ] `Ctrl-C` (shutdown signals) needs to be handled gracefully and cross-platform.
@@ -50,6 +43,13 @@ guaranteeing the terminal is restored on the way out.

## Fixes

- [x] No trace capture. We need a way to record all session traces so that we can analyze why they failed.

- [x] Output is now hanging. I suspect this is recent (within the last few commits) and it started
happening after implementing the changes to the `Presenter` actor to fix writing text off-screen without wrapping.

- [x] Temperature not configured.

- [x] No loading of CLAUDE.md files

- [x] Concurrency still not respected.
49 changes: 32 additions & 17 deletions Cargo.lock


7 changes: 4 additions & 3 deletions guides/checks.md
@@ -187,9 +187,10 @@ base_url = "https://..."
 An unset flag contributes nothing — it never overrides a value from the
 environment or file. The `model` is validated against a hardcoded allowlist of
 known IDs for the selected provider; an unknown ID is a clear error. `effort`
-currently maps to the in-process agent's sampling temperature (`low` → most
-deterministic, `high` → most exploratory); mapping it to an extended-thinking
-budget is pending an upstream provider fix.
+maps to the in-process agent's extended-thinking budget: `medium` and `high`
+enable extended thinking (4096- and 8192-token budgets respectively), while
+`low` — the default — keeps thinking off for speed and cost, running the agent
+deterministically instead.
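
Reviewer note: the new `effort` wording can be read as the small mapping sketched below. This is only an illustration of the documented behavior, not the code in this PR. The names `Effort`, `thinking_budget`, and `resolve_effort` are hypothetical, and the sketch assumes the environment and file values have already been merged into one optional value before the flag is considered.

```rust
/// Effort levels named in guides/checks.md.
/// The enum itself is hypothetical; the real type in the crate may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Effort {
    /// The documented default: extended thinking stays off.
    #[default]
    Low,
    Medium,
    High,
}

/// Extended-thinking budget in tokens, as stated in the guide.
/// `None` means thinking is disabled.
pub fn thinking_budget(effort: Effort) -> Option<u32> {
    match effort {
        Effort::Low => None,
        Effort::Medium => Some(4096),
        Effort::High => Some(8192),
    }
}

/// An unset flag (`None`) contributes nothing, so a value from the
/// environment or file survives. If nothing is set, the default applies.
/// Assumption: `env_or_file` is the already-merged lower-priority value.
pub fn resolve_effort(flag: Option<Effort>, env_or_file: Option<Effort>) -> Effort {
    flag.or(env_or_file).unwrap_or_default()
}
```

Under these assumptions, `resolve_effort(None, Some(Effort::High))` keeps the configured `High`, and `thinking_budget` then yields the 8192-token budget the guide describes. Modelling "off" as `None` rather than a zero budget keeps the disabled case distinct from a small budget.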

The **`executor`** selects the execution engine. The default `cersei` runs each
check as an in-process agent (native multi-provider model swapping, no external