Skip to content

fix: remove hardcoded chat model from public example#234

Draft
cursor[bot] wants to merge 8 commits into
mainfrom
cursor/mcp-v2-bug-resolution-76a3
Draft

fix: remove hardcoded chat model from public example#234
cursor[bot] wants to merge 8 commits into
mainfrom
cursor/mcp-v2-bug-resolution-76a3

Conversation

@cursor
Copy link
Copy Markdown
Contributor

@cursor cursor Bot commented May 20, 2026

Description

Removes a hardcoded provider chat model slug from the public agent-pipeline example and replaces it with a neutral runtime input placeholder. Adds a regression test so public examples do not expose concrete provider chat model slugs.

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • Other

Checklist

  • Tests pass
  • Typecheck passes
Open in Web View Automation 

cursoragent and others added 3 commits May 20, 2026 14:49
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

cursoragent and others added 4 commits May 20, 2026 14:53
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Found 1 architecture-standards issue:

  • README.md now says 32 of 32 toolsets are enabled, but the same public docs still describe ai-evals as an opt-in toolset excluded from defaults. The registry marks ai-evals with optIn: false, so this count refresh leaves the README in a partially updated, contradictory state. Under Sunil's minimal-impact/docs-match-runtime standard, this should either be dropped from the bugfix PR or completed in the same change.

Assumptions:

  • Reviewed against the live PR head 248095d.

Verification:

  • pnpm test tests/tools/harness-schema-examples.test.ts
  • pnpm typecheck

Change summary:

  • The public-example fix itself looks good; the review concern is limited to the unrelated README refresh.
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Comment thread README.md Outdated
Co-authored-by: Rohan Gupta <thisrohangupta@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Findings

  1. Warning: README.md was updated to advertise the all-toolsets inventory (187 resource types / 32 toolsets), but the generated docs intentionally publish the default inventory (169 resource types / 31 toolsets, with 31 of 32 enabled by default because ai-evals is opt-in). On this branch, pnpm build && pnpm docs:check fails locally and pnpm docs:generate rewrites the README back to those default counts, so the PR currently leaves the public docs out of sync with the repository's generated-docs standard.

Open Questions

None.

Change Summary

  • src/data/examples/pipeline-v1.ts now uses a neutral model: <+input> placeholder instead of a concrete provider slug.
  • src/registry/toolsets/ai-evals.ts correctly restores ai-evals to opt-in-only, and the focused registry/example tests pass.
  • Local verification: pnpm test tests/registry/registry.test.ts tests/tools/harness-schema-examples.test.ts, pnpm build, and pnpm typecheck passed; pnpm docs:check failed because README.md is stale.
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment
  1. Important — README.md: the generated resource/toolset counts are stale on the current PR head. In an isolated checkout of origin/cursor/mcp-v2-bug-resolution-76a3, pnpm build && pnpm docs:check fails, and pnpm docs:generate rewrites the file back to 169 resource types, 31 toolsets, and 31 of 32 default-enabled toolsets. That means the current README overstates both the total/default footprint after the later ai-evals opt-in change, so the docs no longer match the checked-in generator output or runtime defaults. Please regenerate README.md from the current branch state before merge.
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Important

  • tests/tools/harness-schema-examples.test.ts: the new regression only exercises getExample(), but the public surface this bug affects is harness_schema(..., example="agent-pipeline") in src/tools/harness-schema.ts. That means the PR body and tasks/todo.md currently overstate the coverage: a tool-layer regression could still break public example delivery while this test stays green.

Assumption

  • I reviewed the requested snapshot 31c119d..248095d. That snapshot also briefly carried an unrelated README.md count refresh that later commits on this PR appear to have corrected, so I’m not treating it as an active issue on the current head.

Assessment

  • The functional change in src/data/examples/pipeline-v1.ts matches the stated intent and is nicely minimal. The remaining gap is verification depth on the public tool surface.
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Comment thread tests/tools/harness-schema-examples.test.ts
Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

No architecture-standard issues found on the current PR head (2ceb94c).

Assumptions / residual risk:

  • The commits after the code-bearing fix are docs/task-only (README.md, tasks/todo.md), so the local verification below still applies to the live head.
  • GitHub build-and-test matrix checks are still pending as of this review.

What I checked:

  • src/data/examples/pipeline-v1.ts now uses a neutral runtime placeholder (model: <+input>) instead of a provider slug, which matches the repo rule to avoid hardcoded public agent guidance.
  • src/registry/toolsets/ai-evals.ts restores ai-evals to optIn: true, and tests/registry/registry.test.ts covers both default exclusion and explicit +ai-evals enablement.
  • tests/tools/harness-schema-examples.test.ts adds regression coverage that the public agent-pipeline example does not expose a concrete chat model slug.

Verification:

  • pnpm test tests/registry/registry.test.ts tests/tools/harness-schema-examples.test.ts
  • pnpm typecheck
  • pnpm build
  • pnpm test (59 files, 1398 tests passed)
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Findings

  1. Importantsrc/registry/toolsets/ai-evals.ts:583
    Flipping ai-evals to optIn: true fixes the registry defaults, but the default server still unconditionally calls registerAllPrompts(server) from src/index.ts, and src/prompts/index.ts still always registers the AI Evals prompts. After this change, default startup hides the eval_* resources while prompts like run-evaluation and create-eval-suite remain discoverable and tell agents to call those now-disabled resource types, which will fail with Unknown resource_type. That means the documented opt-in behavior is only partially enforced and the default prompt surface regresses.

Open Questions / Assumptions

  • I’m assuming ai-evals being opt-in is meant to apply to all public AI Evals surfaces, not just resource/tool registration. If prompts are intentionally exempt, the docs and PR title should say so explicitly.

Scope / Tests / Verdict

  • Title vs. scope: only a partial match. The src/data/examples/pipeline-v1.ts change is the titled bugfix, but src/registry/toolsets/ai-evals.ts changes default server behavior in a broader way, and tasks/todo.md is unrelated process noise for a focused public bugfix PR.
  • Test coverage: the new example test is good coverage for the example scrub itself. Coverage is not sufficient for the broader opt-in change, because the added registry test only exercises Registry.getResource() and would not catch prompt-registration/default-startup regressions like the one above.
  • Verdict: needs changes.

Secondary context: the model: <+input> replacement itself is nicely minimal and aligned with the stated example fix.

Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Comment thread src/registry/toolsets/ai-evals.ts
Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

No blocking findings.

This PR is aligned with Sunil's architecture standards on the current head:

  • src/data/examples/pipeline-v1.ts removes the provider-specific chat model slug and replaces it with the neutral model: <+input> placeholder, which keeps the public example generic and avoids hardcoded internal defaults.
  • tests/tools/harness-schema-examples.test.ts adds focused regression coverage for that public example surface.
  • src/registry/toolsets/ai-evals.ts restores ai-evals to opt-in-only, and tests/registry/registry.test.ts covers the default-vs-+ai-evals behavior so the exposed registry surface stays consistent with the documented architecture.

Verification I ran on the live PR head:

  • pnpm test tests/tools/harness-schema-examples.test.ts tests/registry/registry.test.ts
  • pnpm typecheck
  • pnpm build
  • pnpm docs:check
  • pnpm test (59 files / 1398 tests passed)

GitHub CI is also green, including the +ai-evals smoke-test path.

Residual risk is low. The only meaningful remaining limitation is that the model-slug guard is regex-based, but for this narrowly scoped public-example regression that is an appropriate level of coverage.

Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Findings

  1. Importantsrc/registry/toolsets/ai-evals.ts
    This flips AI Evals out of the default registry, but the server still exposes the AI Evals prompts on every startup. createHarnessServer() always calls registerAllPrompts(server), and registerAllPrompts() always registers run-evaluation, create-eval-suite, and add-metric. Those prompts instruct agents to call eval_* resource types that now fail under the default config with Unknown resource_type.

    I verified this on the current PR head after pnpm build: a default Registry reports evalDatasetAvailable: false while prompt registration still includes both run-evaluation and create-eval-suite. Please either gate AI Evals prompt registration on the enabled toolsets or defer the optIn: true change until prompt filtering exists.

Open Questions / Assumptions

  • I’m assuming ai-evals being opt-in is meant to apply to the whole public AI Evals surface, not just resource registration.

Change Summary

  • The model: <+input> replacement in src/data/examples/pipeline-v1.ts is a good minimal fix for the original bug report.
  • The new example/registry coverage is useful, but it does not cover the broader startup prompt-surface regression introduced by the ai-evals opt-in flip.
  • Local verification on the current PR head: pnpm test tests/tools/harness-schema-examples.test.ts tests/registry/registry.test.ts tests/tools/harness-schema-tool.test.ts, pnpm typecheck, pnpm build, pnpm docs:check.
  • Live PR CI is green as of this review; this issue is a coverage/architecture gap rather than an already-failing check.
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Comment thread src/registry/toolsets/ai-evals.ts
Copy link
Copy Markdown
Contributor Author

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  1. Important: making ai-evals opt-in is incomplete. This diff correctly hides the eval_* resources by default, but the server still unconditionally calls registerAllPrompts(server), and src/prompts/index.ts still always registers the AI Evals prompts. On a default startup, prompts like run-evaluation and create-eval-suite remain discoverable even though they instruct agents to call resource types that now fail with Unknown resource_type. That leaves the public surface inconsistent with the documented opt-in behavior and does not fully meet the architecture standard for keeping defaults aligned across surfaces.

Open Questions / Assumptions

  • I’m assuming ai-evals being opt-in is intended to apply to prompts as well as resources. If prompts are intentionally always-on, that distinction should be documented explicitly because the current prompt text still assumes the resources are available.

Change Summary

  • The model: <+input> replacement in src/data/examples/pipeline-v1.ts is nicely minimal and aligned with the PR title.
  • The new tests are good coverage for the example scrub and the default resource gating, but they do not cover prompt registration, which is where the remaining regression sits.
  • Verification on the current head:
    • pnpm test tests/tools/harness-schema-examples.test.ts tests/registry/registry.test.ts
    • pnpm typecheck
    • pnpm build
    • pnpm docs:check
    • pnpm test (59 files / 1398 tests passed)

GitHub CI is green on the current head as well.

Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

description:
"Harness AI Evals control plane: datasets, evaluations, runs, metrics, metric sets, suites, targets, models, annotations, analytics, registry, git settings.",
optIn: false,
optIn: true,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching ai-evals back to opt-in here makes the registry defaults correct, but the server still exposes the AI Evals prompts on every startup (createHarnessServer() always calls registerAllPrompts(server), and registerAllPrompts() always registers run-evaluation, create-eval-suite, and add-metric). After this change, default sessions can still discover those prompts even though the eval_* resource types they instruct agents to use are now disabled, which leads to Unknown resource_type failures. Can we gate the AI Evals prompts behind the same opt-in decision and add a regression test for default vs. +ai-evals prompt registration?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants