Skip to content

feat: add knowledge-graph and semantic-layer toolsets to OSS#255

Open
sunilgattupalle wants to merge 4 commits into
mainfrom
feat/knowledge-graph-toolsets
Open

feat: add knowledge-graph and semantic-layer toolsets to OSS#255
sunilgattupalle wants to merge 4 commits into
mainfrom
feat/knowledge-graph-toolsets

Conversation

@sunilgattupalle
Copy link
Copy Markdown
Collaborator

Summary

  • Moves knowledge-graph toolset from mcpServerInternal to OSS: adds kg_queryable_type_summary, kg_grammar, and hql_query (validate/run execute actions)
  • Moves semantic-layer toolset from mcpServerInternal to OSS: adds kg_type and kg_related_type
  • Adds src/utils/strip-meta.ts shared utility (consolidates two identical inline copies)
  • Adds harness_batch_validate_hql and harness_batch_execute_hql tools (parallel fan-out for HQL)
  • Both toolsets are default-on; batch tools registered in registerAllTools()

Notes

  • A corresponding cleanup PR in mcpServerInternal follows after this is published as 3.1.0
  • Until the internal dep bump, mcpServerInternal keeps its own strip-meta.ts copy for data-connector.ts

Test plan

  • pnpm typecheck passes
  • pnpm test passes (1473/1473 tests)

🤖 Generated with Claude Code

Moves the Knowledge Graph query toolsets from mcpServerInternal into the
public mcp-server package so all users get HQL query capabilities by default.

- Add `knowledge-graph` toolset: kg_queryable_type_summary, kg_grammar, hql_query (validate/run)
- Add `semantic-layer` toolset: kg_type, kg_related_type
- Add `src/utils/strip-meta.ts` shared utility (consolidates duplicate inline copies)
- Add `harness_batch_validate_hql` and `harness_batch_execute_hql` tools in hql-batch.ts
- Register both toolsets in ALL_TOOLSETS (default-on) and batch tools in registerAllTools()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e
AI-Tool: claude-code
AI-Model: unknown
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

…lsets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e
AI-Tool: claude-code
AI-Model: unknown
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Found 2 architecture-standard issues.

  1. src/tools/index.ts:34 registers the new HQL batch tools outside the registry/toolset path. On the built branch, a HARNESS_TOOLSETS=logs registry still exposes harness_batch_validate_hql and harness_batch_execute_hql, so toolset filtering no longer matches the actual public surface.
  2. README.md:11 still documents the 11-tool consolidated architecture, but this branch now publishes 13 tools and does not document the new HQL batch tools or the new knowledge-graph / semantic-layer surfaces in the README tables.

Assumptions:

  • Treating the registry-backed generic-tool model and HARNESS_TOOLSETS filtering as intentional architecture constraints, per the repo guidance and README.

Change summary:

  • Reviewed the PR diff, then verified with pnpm typecheck, pnpm build, pnpm docs:check, and a post-build tool-registration probe.
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Comment thread src/tools/index.ts Outdated
Comment thread README.md
- Replace standalone harness_batch_validate_hql/harness_batch_execute_hql tools
  with a `queries` batch parameter on harness_execute — keeps tool count at 11,
  routes through registry for HARNESS_READ_ONLY, elicitation, audit, and retry policy
- Fix semantic-layer kg_type list extractor to return a compact-safe projection
  (identifier, name, category, kind, description) instead of full type objects;
  full field metadata remains on get — no skipCompact needed
- Update kg_related_type get description; strip unused skipCompact flags

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e
AI-Tool: claude-code
AI-Model: unknown
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Found 3 issues that I think should be fixed before this PR is considered architecture-compliant:

  1. High: src/tools/hql-batch.ts adds standalone HQL batch-tool code, but it is not wired into the live registration surface, so the file is dead code on the shipped MCP server and drifts from the repo's consolidated-tool / registry-backed design.
  2. Medium: src/registry/toolsets/semantic-layer.ts advertises a kg_type.list contract that the extractor does not actually return, and it points agents at kg_queryable_type, which is not a registered resource type.
  3. Medium: src/registry/toolsets/semantic-layer.ts relies on a dcs_enrichment list signal, but default harness_list compaction strips that key, so agents cannot see the signal the new docs describe.

Open questions / assumptions:

  • Reviewed against live PR head ecd2018456156ca52a75073eae2812a6c2d34100.
  • If src/tools/hql-batch.ts is intentionally staged for a follow-up, I would keep it out of this PR until it is either registry-backed or fully integrated and tested.

Verification:

  • pnpm build
  • pnpm exec vitest run tests/registry/registry.test.ts tests/tools/tool-handlers.test.ts
  • Built-surface probe: HARNESS_TOOLSETS=logs still registers only the 11 consolidated tools (no harness_batch_* tools)
  • Built runtime probe: compactItems([{..., dcs_enrichment: true}]) drops dcs_enrichment
Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

Comment thread src/tools/hql-batch.ts Outdated

// ─── Registration ────────────────────────────────────────────────────────────

export function registerHqlBatchTools(server: McpServer, client: HarnessClient): void {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registerHqlBatchTools() is never wired into registerAllTools() / server startup on the current head, so the two harness_batch_* tools never appear on the live MCP surface. That leaves this whole file as dead code in the OSS build and also drifts from the repo's documented 11-tool, registry-backed architecture. If HQL batching is meant to ship, it should either go through the existing registry / harness_execute path or be integrated and tested end-to-end; otherwise it should stay out of this PR.

Comment thread src/registry/toolsets/semantic-layer.ts Outdated
Comment on lines +208 to +209
"config, or data model). List returns all types with field metadata including non-queryable types. " +
"Use for exploring the full data model. For HQL query building, use kg_queryable_type instead.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is out of sync with the runtime contract in two ways: schemaTypesExtract() only returns summary fields (identifier, name, category, kind, description), not field metadata, and the HQL-oriented summary resource is kg_queryable_type_summary, not kg_queryable_type. Because harness_describe exposes this string directly, agents following it will both overestimate what kg_type.list returns and be pointed at a non-existent resource_type.

Comment thread src/registry/toolsets/semantic-layer.ts Outdated
Comment on lines +58 to +65
// Surface dcs_enrichment flag on relationship types so agents know
// to fetch full details via get before orchestrating DCS calls.
if (category === "relationship") {
const annotations = typeObj.annotations as Record<string, unknown>[] | undefined;
const hasDcsEnrichment = (annotations ?? []).some(
(a) => (a.key as string) === "dcs_enrichment",
);
if (hasDcsEnrichment) item.dcs_enrichment = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schemaTypesExtract() adds dcs_enrichment: true here, but default harness_list output immediately runs through compactItems(), and that whitelist drops dcs_enrichment. I verified on the built branch that compactItems([{..., dcs_enrichment: true}]) returns no dcs_enrichment key, so the signal this comment describes never reaches agents unless callers remember to set compact: false. If this flag is part of the agent contract, this resource needs skipCompact: true or the compact whitelist needs to preserve dcs_enrichment.

- Batch HQL: fail fast on HARNESS_READ_ONLY and policy errors before fan-out
  (preflight dispatch on first query surfaces registry/policy errors as hard
  failures rather than per-item success:false results)
- Remove hql-batch.ts from git index (was deleted but still appeared in PR diff)
- semantic-layer kg_type list: drop dcs_enrichment flag (not compact-safe, was
  stripped by harness_list compaction anyway — available on get)
- semantic-layer kg_type description: clarify list returns compact summary,
  get returns full field metadata
- README: fix architecture diagram (32→34 toolsets), add knowledge-graph and
  semantic-layer to toolset reference table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e
AI-Tool: claude-code
AI-Model: unknown
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 3 issues that still keep this short of Sunil's architecture bar:

  1. harness_execute batch HQL drops the first query's real response (and its per-query options), so the first item in every batch returns incomplete data.
  2. The KG discovery hints advertise a harness_get call shape that the public tool doesn't accept (type_id / kind as top-level args instead of resource_id + params), so harness_describe is teaching agents a broken contract.
  3. The new queries schema regresses the repo's Zod 4 metadata rule by calling .describe(...).optional(), which strips the description agents rely on.

Verification: pnpm install --frozen-lockfile, pnpm typecheck, pnpm build, pnpm test (1473 passing), plus stdio startup smoke tests with default toolsets and HARNESS_TOOLSETS=logs.

Residual risk: there is still no dedicated coverage for the new KG/semantic-layer resources or the batch execute path, so these issues were not exercised by the current suite.

Open in Web View Automation 

Sent by Cursor Automation: Sunil On Demand Architecture Review

// committing to the full fan-out.
try {
const firstQuery = batchQueries[0]!;
await registry.dispatchExecute(client, "hql_query", args.action, { body: { query_string: firstQuery.query_string } }, auditCtxBatch, extra.signal);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This preflight call is executing the first query for real, but the response is discarded. The batch result later reports query 0 as just { query_string, success: true }, so the first item never returns its actual is_valid/errors or rows/columns/stats, and its timeout_ms / max_results are ignored as well. Can we capture and reuse this response (or do a non-executing preflight) before building results?

executeHint:
"1. Learn grammar: harness_get(resource_type='kg_grammar'). " +
"2. Discover types: harness_list(resource_type='kg_queryable_type_summary') — note the 'identifier' and 'kind' fields. " +
"3. Get fields per type: harness_get(resource_type='kg_type', type_id='<identifier>', kind='<kind>'). " +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hint doesn't match the public harness_get contract. The tool only accepts resource_id plus params, not top-level type_id / kind, so agents following this example will either fail validation or silently drop the fields. Can we switch the example to something like harness_get(resource_type='kg_type', resource_id='<identifier>', params={ kind: '<kind>' }) and update the similar description above as well?

timeout_ms: z.number().optional().describe("Per-query timeout in milliseconds"),
max_results: z.number().optional().describe("Maximum rows to return"),
})
).max(20).describe("Batch HQL queries — use with resource_type='hql_query' and action='validate' or 'run'. Fans out in parallel, returns per-query results.").optional(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small standards issue: this uses .describe(...).optional(). Per the repo's Zod 4 rule in CLAUDE.md / AGENTS.md, .describe() needs to be last or the description is lost on the optional wrapper. This means agents won't actually see the queries guidance here. ...max(20).optional().describe(...) would preserve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants