fix: remove hardcoded chat model from public example by cursor[bot] · Pull Request #234 · harness/mcp-server

cursor · 2026-05-20T14:50:56Z

Description

Removes a hardcoded provider chat model slug from the public agent-pipeline example and replaces it with a neutral runtime input placeholder. Adds a regression test so public examples do not expose concrete provider chat model slugs.

Type of Change

Checklist

Tests pass
Typecheck passes

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

CLAassistant · 2026-05-20T14:52:14Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

cursor

Stale comment

Found 1 architecture-standards issue:

README.md now says 32 of 32 toolsets are enabled, but the same public docs still describe ai-evals as an opt-in toolset excluded from defaults. The registry marks ai-evals with optIn: false, so this count refresh leaves the README in a partially updated, contradictory state. Under Sunil's minimal-impact/docs-match-runtime standard, this should either be dropped from the bugfix PR or completed in the same change.

Assumptions:

Reviewed against the live PR head 248095d.

Verification:

pnpm test tests/tools/harness-schema-examples.test.ts

pnpm typecheck

Change summary:

The public-example fix itself looks good; the review concern is limited to the unrelated README refresh.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

cursor

Stale comment

Findings

Warning: README.md was updated to advertise the all-toolsets inventory (187 resource types / 32 toolsets), but the generated docs intentionally publish the default inventory (169 resource types / 31 toolsets, with 31 of 32 enabled by default because ai-evals is opt-in). On this branch, pnpm build && pnpm docs:check fails locally and pnpm docs:generate rewrites the README back to those default counts, so the PR currently leaves the public docs out of sync with the repository's generated-docs standard.

Open Questions

None.

Change Summary

src/data/examples/pipeline-v1.ts now uses a neutral model: <+input> placeholder instead of a concrete provider slug.

src/registry/toolsets/ai-evals.ts correctly restores ai-evals to opt-in-only, and the focused registry/example tests pass.

Local verification: pnpm test tests/registry/registry.test.ts tests/tools/harness-schema-examples.test.ts, pnpm build, and pnpm typecheck passed; pnpm docs:check failed because README.md is stale.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Stale comment

Important — README.md: the generated resource/toolset counts are stale on the current PR head. In an isolated checkout of origin/cursor/mcp-v2-bug-resolution-76a3, pnpm build && pnpm docs:check fails, and pnpm docs:generate rewrites the file back to 169 resource types, 31 toolsets, and 31 of 32 default-enabled toolsets. That means the current README overstates both the total/default footprint after the later ai-evals opt-in change, so the docs no longer match the checked-in generator output or runtime defaults. Please regenerate README.md from the current branch state before merge.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Stale comment

Important

tests/tools/harness-schema-examples.test.ts: the new regression only exercises getExample(), but the public surface this bug affects is harness_schema(..., example="agent-pipeline") in src/tools/harness-schema.ts. That means the PR body and tasks/todo.md currently overstate the coverage: a tool-layer regression could still break public example delivery while this test stays green.

Assumption

I reviewed the requested snapshot 31c119d..248095d. That snapshot also briefly carried an unrelated README.md count refresh that later commits on this PR appear to have corrected, so I’m not treating it as an active issue on the current head.

Assessment

The functional change in src/data/examples/pipeline-v1.ts matches the stated intent and is nicely minimal. The remaining gap is verification depth on the public tool surface.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Stale comment

No architecture-standard issues found on the current PR head (2ceb94c).

Assumptions / residual risk:

The commits after the code-bearing fix are docs/task-only (README.md, tasks/todo.md), so the local verification below still applies to the live head.

GitHub build-and-test matrix checks are still pending as of this review.

What I checked:

src/data/examples/pipeline-v1.ts now uses a neutral runtime placeholder (model: <+input>) instead of a provider slug, which matches the repo rule to avoid hardcoded public agent guidance.

src/registry/toolsets/ai-evals.ts restores ai-evals to optIn: true, and tests/registry/registry.test.ts covers both default exclusion and explicit +ai-evals enablement.

tests/tools/harness-schema-examples.test.ts adds regression coverage that the public agent-pipeline example does not expose a concrete chat model slug.

Verification:

pnpm test tests/registry/registry.test.ts tests/tools/harness-schema-examples.test.ts

pnpm typecheck

pnpm build

pnpm test (59 files, 1398 tests passed)

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Stale comment

Findings

Important — src/registry/toolsets/ai-evals.ts:583
Flipping ai-evals to optIn: true fixes the registry defaults, but the default server still unconditionally calls registerAllPrompts(server) from src/index.ts, and src/prompts/index.ts still always registers the AI Evals prompts. After this change, default startup hides the eval_* resources while prompts like run-evaluation and create-eval-suite remain discoverable and tell agents to call those now-disabled resource types, which will fail with Unknown resource_type. That means the documented opt-in behavior is only partially enforced and the default prompt surface regresses.

Open Questions / Assumptions

I’m assuming ai-evals being opt-in is meant to apply to all public AI Evals surfaces, not just resource/tool registration. If prompts are intentionally exempt, the docs and PR title should say so explicitly.

Scope / Tests / Verdict

Title vs. scope: only a partial match. The src/data/examples/pipeline-v1.ts change is the titled bugfix, but src/registry/toolsets/ai-evals.ts changes default server behavior in a broader way, and tasks/todo.md is unrelated process noise for a focused public bugfix PR.

Test coverage: the new example test is good coverage for the example scrub itself. Coverage is not sufficient for the broader opt-in change, because the added registry test only exercises Registry.getResource() and would not catch prompt-registration/default-startup regressions like the one above.

Verdict: needs changes.

Secondary context: the model: <+input> replacement itself is nicely minimal and aligned with the stated example fix.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Stale comment

No blocking findings.

This PR is aligned with Sunil's architecture standards on the current head:

src/data/examples/pipeline-v1.ts removes the provider-specific chat model slug and replaces it with the neutral model: <+input> placeholder, which keeps the public example generic and avoids hardcoded internal defaults.

tests/tools/harness-schema-examples.test.ts adds focused regression coverage for that public example surface.

src/registry/toolsets/ai-evals.ts restores ai-evals to opt-in-only, and tests/registry/registry.test.ts covers the default-vs-+ai-evals behavior so the exposed registry surface stays consistent with the documented architecture.

Verification I ran on the live PR head:

pnpm test tests/tools/harness-schema-examples.test.ts tests/registry/registry.test.ts

pnpm typecheck

pnpm build

pnpm docs:check

pnpm test (59 files / 1398 tests passed)

GitHub CI is also green, including the +ai-evals smoke-test path.

Residual risk is low. The only meaningful remaining limitation is that the model-slug guard is regex-based, but for this narrowly scoped public-example regression that is an appropriate level of coverage.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Stale comment

Findings

Important — src/registry/toolsets/ai-evals.ts
This flips AI Evals out of the default registry, but the server still exposes the AI Evals prompts on every startup. createHarnessServer() always calls registerAllPrompts(server), and registerAllPrompts() always registers run-evaluation, create-eval-suite, and add-metric. Those prompts instruct agents to call eval_* resource types that now fail under the default config with Unknown resource_type.

I verified this on the current PR head after pnpm build: a default Registry reports evalDatasetAvailable: false while prompt registration still includes both run-evaluation and create-eval-suite. Please either gate AI Evals prompt registration on the enabled toolsets or defer the optIn: true change until prompt filtering exists.

Open Questions / Assumptions

I’m assuming ai-evals being opt-in is meant to apply to the whole public AI Evals surface, not just resource registration.

Change Summary

The model: <+input> replacement in src/data/examples/pipeline-v1.ts is a good minimal fix for the original bug report.

The new example/registry coverage is useful, but it does not cover the broader startup prompt-surface regression introduced by the ai-evals opt-in flip.

Local verification on the current PR head: pnpm test tests/tools/harness-schema-examples.test.ts tests/registry/registry.test.ts tests/tools/harness-schema-tool.test.ts, pnpm typecheck, pnpm build, pnpm docs:check.

Live PR CI is green as of this review; this issue is a coverage/architecture gap rather than an already-failing check.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor

Findings

Important: making ai-evals opt-in is incomplete. This diff correctly hides the eval_* resources by default, but the server still unconditionally calls registerAllPrompts(server), and src/prompts/index.ts still always registers the AI Evals prompts. On a default startup, prompts like run-evaluation and create-eval-suite remain discoverable even though they instruct agents to call resource types that now fail with Unknown resource_type. That leaves the public surface inconsistent with the documented opt-in behavior and does not fully meet the architecture standard for keeping defaults aligned across surfaces.

Open Questions / Assumptions

I’m assuming ai-evals being opt-in is intended to apply to prompts as well as resources. If prompts are intentionally always-on, that distinction should be documented explicitly because the current prompt text still assumes the resources are available.

Change Summary

The model: <+input> replacement in src/data/examples/pipeline-v1.ts is nicely minimal and aligned with the PR title.
The new tests are good coverage for the example scrub and the default resource gating, but they do not cover prompt registration, which is where the remaining regression sits.
Verification on the current head:
- pnpm test tests/tools/harness-schema-examples.test.ts tests/registry/registry.test.ts
- pnpm typecheck
- pnpm build
- pnpm docs:check
- pnpm test (59 files / 1398 tests passed)

GitHub CI is green on the current head as well.

_{Sent by Cursor Automation: Sunil On Demand Architecture Review}

cursor · 2026-05-20T15:09:36Z

  description:
    "Harness AI Evals control plane: datasets, evaluations, runs, metrics, metric sets, suites, targets, models, annotations, analytics, registry, git settings.",
-  optIn: false,
+  optIn: true,


Switching ai-evals back to opt-in here makes the registry defaults correct, but the server still exposes the AI Evals prompts on every startup (createHarnessServer() always calls registerAllPrompts(server), and registerAllPrompts() always registers run-evaluation, create-eval-suite, and add-metric). After this change, default sessions can still discover those prompts even though the eval_* resource types they instruct agents to use are now disabled, which leads to Unknown resource_type failures. Can we gate the AI Evals prompts behind the same opt-in decision and add a regression test for default vs. +ai-evals prompt registration?

cursoragent and others added 3 commits May 20, 2026 14:49

fix: remove hardcoded chat model from public example

e0bcff3

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

docs: record chat model exposure triage

cf3015c

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

docs: note slack reply channel blocker

695c5fc

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

cursoragent and others added 4 commits May 20, 2026 14:53

docs: refresh generated resource counts

640276b

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

docs: avoid repeating removed model slug

248095d

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

fix: keep ai evals opt-in by default

58f6803

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>

docs: record ai evals opt-in fix

e3dac02

Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>