Chaos: Adding a few MCP endpointSpec and some refactoring & fixes#218
Chaos: Adding a few MCP endpointSpec and some refactoring & fixes#218priyanshuagarwal-harness wants to merge 10 commits into
Conversation
… bodyBuilder coercion
- New resources: discovered_namespace, discovered_service (list-only)
- New extractor sdPageExtract with empty-result diagnostic hint
- Reshape chaosDRTestListExtract to consume { items, pagination } envelope
- Expand chaos_fault.list with 8 new filters (search, type, tags, category, etc.)
- Add coerceBody helper to defensively parse string-serialized request bodies
- Add include_all_secrets_accessible_at_scope filter to secret.list
- Remove redundant bodySchema blocks on 4 template execute actions
…d discovered_network_map
- Rename resource: chaos_network_map -> chaos_application_map (old name kept as search alias)
- chaos_application_map.list switches GET -> POST; adds environment_id/infra_id/search/all/minimal filters
- chaos_application_map.get requires environment_id + infra_id (composite Mongo key)
- New extractor chaosAppMapPageExtract for the { data, page } envelope
- New resource: discovered_network_map (Service Discovery raw inventory)
- Expand chaos_experiment with target_network_map_ids, my_experiments, exclude_automation filters
- Document system-tag semantics (fault=, probe=, workload=, service=) in descriptions
- Add multi-step reasoning playbook to descToolsetChaos
Co-authored-by: Cursor <cursoragent@cursor.com>
bce790b to
480f61e
Compare
There was a problem hiding this comment.
Stale comment
Found 4 issues against the architecture standard.
- High:
src/registry/toolsets/chaos.tsnow swallows malformed JSON incoerceBody(), which lets bad string bodies reach write actions with defaulted values instead of failing locally.- High:
src/registry/toolsets/chaos.tsexposeschaos_application_map.getwithout validating the composite{environment_id, infra_id}key, even though the new description says missing either yields a backend 500.- Medium:
src/registry/toolsets/chaos.tsoverwrites explicitruntime_inputswhenexperiment_variablesortasksare also present, silently dropping caller data.- Medium:
src/registry/toolsets/chaos.tsmarkschaos_experiment.create.bodySchema.idas required even though the builder auto-generates it, so the agent-facing structured contract is stricter than runtime.Assumptions:
- None beyond the behavior reproduced from the PR head.
Change summary:
- The Chaos expansion is useful and the branch builds/tests cleanly, but the four items above still miss Sunil’s fail-loud + metadata/runtime-alignment bar.
Verification:
pnpm buildpnpm test -- --runInBand tests/registry/chaos-experiment.test.ts(Vitest actually ran the full suite: 57 files / 1354 tests passed)pnpm docs:check- Local repro scripts against
origin/pr-218confirmed the malformed-body, missing-app-map-scope, andruntime_inputsoverwrite behaviors.Sent by Cursor Automation: Sunil On Demand Architecture Review
| function coerceBody(input: Record<string, unknown>): Record<string, unknown> { | ||
| const raw = input.body ?? input; | ||
| if (typeof raw === "string") { | ||
| try { return JSON.parse(raw) as Record<string, unknown>; } catch { /* fall through */ } |
There was a problem hiding this comment.
coerceBody() currently swallows malformed JSON and keeps going as if the body were an object. I reproduced this on the PR head with registry.dispatchExecute(..., 'chaos_probe', 'enable', { body: '{bad json' }), and it still emitted a real write request with {"isEnabled": true}. For this repo's fail-loud standard, I think this helper needs to throw on parse failure instead of falling through.
| bodySchema: { | ||
| description: descBodyExperimentCreate, | ||
| fields: [ | ||
| { name: "id", type: "string", required: true, description: descExperimentIdUUID }, |
There was a problem hiding this comment.
The builder below auto-generates id when it is absent, but the structured schema marks it required: true. That means harness_describe / elicitation will tell agents they must provide an ID even though runtime does not need one. Can we make this field optional in bodySchema so the metadata stays aligned with the actual request builder?
| }, | ||
| }, | ||
| executeActions: { | ||
| run: { |
There was a problem hiding this comment.
This assignment overwrites any caller-supplied runtime_inputs from lines 316-317. I reproduced it on the PR head: passing both runtime_inputs and experiment_variables drops the original runtime_inputs object completely. We should either merge the two representations or reject mixed usage, otherwise the shared-tool contract silently loses data.
| path: `${CHAOS}/rest/v2/applicationmaps/{mapId}`, | ||
| operationPolicy: { risk: "read", retryPolicy: "safe" }, | ||
| pathParams: { map_id: "mapId" }, | ||
| queryParams: { |
There was a problem hiding this comment.
The new description says environment_id + infra_id are effectively required here because the backend lookup is keyed on all three values, but there is still no local validation. I verified that registry.dispatch(client, 'chaos_application_map', 'get', { map_id: 'map-1' }) emits a request with neither query param, which pushes callers into the backend 500 path instead of failing locally with a clear missing-parameter error.
There was a problem hiding this comment.
Stale comment
Found 3 architecture issues worth fixing before merge:
chaos_dr_test.listnow parses the wrong response envelope and will silently return empty results.coerceBody()swallows malformed JSON on mutation paths, so bad agent input can turn into defaulted writes instead of a clear validation error.chaos_application_map.getdocumentsenvironment_id+infra_idas required but never validates them locally, so callers still hit the backend 500 described in the docs.I also noticed CI already has a red
build-and-test (22)shard (docs:checkexited 1), but I stopped at the architecture blockers above.Sent by Cursor Automation: Sunil On Demand Architecture Review
| const r = raw as { drtests?: unknown[] }; | ||
| const items = r.drtests ?? []; | ||
| return { items, total: items.length }; | ||
| const r = raw as { items?: unknown[]; pagination?: { totalItems?: number } }; |
There was a problem hiding this comment.
Changing this parser from { drtests: [...] } to { items, pagination } looks like a regression. The DR-test API docs still return drtests, and main was parsing that shape. With a real response like { drtests: [...] }, this now produces { items: [], total: 0 }, so chaos_dr_test.list silently empties out instead of failing loudly. If the backend contract hasn't changed, this extractor (and the updated descListDRTests pagination prose) should stay on the drtests envelope.
| function coerceBody(input: Record<string, unknown>): Record<string, unknown> { | ||
| const raw = input.body ?? input; | ||
| if (typeof raw === "string") { | ||
| try { return JSON.parse(raw) as Record<string, unknown>; } catch { /* fall through */ } |
There was a problem hiding this comment.
Swallowing JSON.parse failures here violates the fail-loud rule for write paths. A malformed string body now falls through as a plain string, and callers with defaults like chaos_probe.enable / verify still build valid mutation payloads ({ isEnabled: true }, { verify: true }) instead of rejecting bad input locally. Can we throw an actionable error for invalid JSON strings rather than catch {} and continue?
| environment_id: "environmentIdentifier", | ||
| infra_id: "infraId", |
There was a problem hiding this comment.
This spec already knows environment_id + infra_id are required for get (descGetApplicationMap says missing either one yields a backend mongo: no documents 500), but nothing here enforces that locally. Registry.dispatch() only checks required listFilterFields on list and required bodySchema fields on the built body, so harness_get(resource_type="chaos_application_map", resource_id=...) without both params still goes straight to that backend 500. I'd add a local guard/preflight so the contract fails loudly.
Co-authored-by: Cursor <cursoragent@cursor.com>
Runs pnpm docs:generate against the merged tree (PR chaos changes + main's iacm + ai-evals sync) to update the README counts surfaced by docs:check: 187 -> 191 resource types in narrative and the architecture diagram. Toolset count (32 default), opt-in iacm (33 total), and 31 prompt templates already match the regenerated values. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Found 3 architecture issues worth fixing before merge:
- High:
chaosDRTestListExtract()now expects{ items, pagination }, but the publishedGET /v3/dr-testscontract still returnsdrtests. On this head,chaosDRTestListExtract({ drtests:[{ identity:"dr1" }], pagination:{ totalItems:1 } })returns{"items":[],"total":1}, sochaos_dr_test.listsilently drops real results. - High: the new shared
coerceBody()helper swallows JSON parse failures and can turn malformed string bodies into real writes. Repro on this head:dispatchExecute("chaos_probe","verify",{ body:"{" })emits{ verify: true }instead of failing locally. - Medium:
chaos_application_map.getnow documentsenvironment_id+infra_idas required, but the resource still doesn’t validate them before dispatch. Repro on this head issues/applicationmaps/{id}with only org/project params, so callers still hit the backend error path the description warns about.
Assumptions:
- Reviewed against the current PR head
ab341469from the contributor fork and the public Harness API docs for DR tests and application maps.
Change summary:
- The README/docs refresh is now in sync, the POST +
page.totalItemsapplication-map list wiring matches the public API docs, and the focused local checks passed.
Verification:
pnpm buildpnpm docs:checkpnpm test tests/registry/chaos-probe.test.ts- Targeted
node --input-type=modulerepros against the built registry/extractors for the three items above
Sent by Cursor Automation: Sunil On Demand Architecture Review
| export const chaosDRTestListExtract = (raw: unknown): { items: unknown[]; total: number } => { | ||
| const r = raw as { drtests?: unknown[] }; | ||
| const items = r.drtests ?? []; | ||
| return { items, total: items.length }; | ||
| const r = raw as { items?: unknown[]; pagination?: { totalItems?: number } }; | ||
| const items = r.items ?? []; | ||
| return { items, total: r.pagination?.totalItems ?? items.length }; |
There was a problem hiding this comment.
The published GET /v3/dr-tests contract still returns the list under drtests, not items. On this PR head, the built extractor reproduces the regression:
chaosDRTestListExtract({ drtests: [{ identity: "dr1" }], pagination: { totalItems: 1 } })
// => { items: [], total: 1 }That means chaos_dr_test.list will silently drop real results, which misses the fail-loud / contract-alignment bar. Can we keep reading drtests here (and optionally pagination.totalItems) until the API contract actually changes?
| function coerceBody(input: Record<string, unknown>): Record<string, unknown> { | ||
| const raw = input.body ?? input; | ||
| if (typeof raw === "string") { | ||
| try { return JSON.parse(raw) as Record<string, unknown>; } catch { /* fall through */ } | ||
| } | ||
| return raw as Record<string, unknown>; |
There was a problem hiding this comment.
This helper improves the valid double-serialized case, but the empty catch means malformed JSON still falls through as a string. On mutation paths with defaults, that becomes a real write instead of a local validation error. Example on this head:
await reg.dispatchExecute(client, "chaos_probe", "verify", {
probe_id: "probe-1",
org_id: "org",
project_id: "proj",
body: "{",
});
// request body => { verify: true }That misses Sunil's fail-loud standard. Could we rethrow when input.body is a string that fails JSON.parse(), rather than silently treating it as an object?
| queryParams: { | ||
| environment_id: "environmentIdentifier", | ||
| infra_id: "infraId", |
There was a problem hiding this comment.
The new description correctly documents environment_id + infra_id as required for chaos_application_map.get, but the resource still doesn't enforce them locally. Repro on this head:
await reg.dispatch(client, "chaos_application_map", "get", {
map_id: "map-1",
org_id: "org",
project_id: "proj",
});
// params => { organizationIdentifier: "org", projectIdentifier: "proj" }So agents still fall through to the backend error path the description warns about. Can we add a local guard for the composite key before dispatch?


Description
Type of Change
Checklist