feat: add knowledge-graph and semantic-layer toolsets to OSS#255
feat: add knowledge-graph and semantic-layer toolsets to OSS#255sunilgattupalle wants to merge 4 commits into
Conversation
Moves the Knowledge Graph query toolsets from mcpServerInternal into the public mcp-server package so all users get HQL query capabilities by default. - Add `knowledge-graph` toolset: kg_queryable_type_summary, kg_grammar, hql_query (validate/run) - Add `semantic-layer` toolset: kg_type, kg_related_type - Add `src/utils/strip-meta.ts` shared utility (consolidates duplicate inline copies) - Add `harness_batch_validate_hql` and `harness_batch_execute_hql` tools in hql-batch.ts - Register both toolsets in ALL_TOOLSETS (default-on) and batch tools in registerAllTools() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e AI-Tool: claude-code AI-Model: unknown
|
|
…lsets Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e AI-Tool: claude-code AI-Model: unknown
There was a problem hiding this comment.
Stale comment
Found 2 architecture-standard issues.
src/tools/index.ts:34registers the new HQL batch tools outside the registry/toolset path. On the built branch, aHARNESS_TOOLSETS=logsregistry still exposesharness_batch_validate_hqlandharness_batch_execute_hql, so toolset filtering no longer matches the actual public surface.README.md:11still documents the 11-tool consolidated architecture, but this branch now publishes 13 tools and does not document the new HQL batch tools or the newknowledge-graph/semantic-layersurfaces in the README tables.Assumptions:
- Treating the registry-backed generic-tool model and
HARNESS_TOOLSETSfiltering as intentional architecture constraints, per the repo guidance and README.Change summary:
- Reviewed the PR diff, then verified with
pnpm typecheck,pnpm build,pnpm docs:check, and a post-build tool-registration probe.Sent by Cursor Automation: Sunil On Demand Architecture Review
- Replace standalone harness_batch_validate_hql/harness_batch_execute_hql tools with a `queries` batch parameter on harness_execute — keeps tool count at 11, routes through registry for HARNESS_READ_ONLY, elicitation, audit, and retry policy - Fix semantic-layer kg_type list extractor to return a compact-safe projection (identifier, name, category, kind, description) instead of full type objects; full field metadata remains on get — no skipCompact needed - Update kg_related_type get description; strip unused skipCompact flags Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e AI-Tool: claude-code AI-Model: unknown
There was a problem hiding this comment.
Stale comment
Found 3 issues that I think should be fixed before this PR is considered architecture-compliant:
- High:
src/tools/hql-batch.tsadds standalone HQL batch-tool code, but it is not wired into the live registration surface, so the file is dead code on the shipped MCP server and drifts from the repo's consolidated-tool / registry-backed design.- Medium:
src/registry/toolsets/semantic-layer.tsadvertises akg_type.listcontract that the extractor does not actually return, and it points agents atkg_queryable_type, which is not a registered resource type.- Medium:
src/registry/toolsets/semantic-layer.tsrelies on adcs_enrichmentlist signal, but defaultharness_listcompaction strips that key, so agents cannot see the signal the new docs describe.Open questions / assumptions:
- Reviewed against live PR head
ecd2018456156ca52a75073eae2812a6c2d34100.- If
src/tools/hql-batch.tsis intentionally staged for a follow-up, I would keep it out of this PR until it is either registry-backed or fully integrated and tested.Verification:
pnpm buildpnpm exec vitest run tests/registry/registry.test.ts tests/tools/tool-handlers.test.ts- Built-surface probe:
HARNESS_TOOLSETS=logsstill registers only the 11 consolidated tools (noharness_batch_*tools)- Built runtime probe:
compactItems([{..., dcs_enrichment: true}])dropsdcs_enrichmentSent by Cursor Automation: Sunil On Demand Architecture Review
|
|
||
| // ─── Registration ──────────────────────────────────────────────────────────── | ||
|
|
||
| export function registerHqlBatchTools(server: McpServer, client: HarnessClient): void { |
There was a problem hiding this comment.
registerHqlBatchTools() is never wired into registerAllTools() / server startup on the current head, so the two harness_batch_* tools never appear on the live MCP surface. That leaves this whole file as dead code in the OSS build and also drifts from the repo's documented 11-tool, registry-backed architecture. If HQL batching is meant to ship, it should either go through the existing registry / harness_execute path or be integrated and tested end-to-end; otherwise it should stay out of this PR.
| "config, or data model). List returns all types with field metadata including non-queryable types. " + | ||
| "Use for exploring the full data model. For HQL query building, use kg_queryable_type instead.", |
There was a problem hiding this comment.
This description is out of sync with the runtime contract in two ways: schemaTypesExtract() only returns summary fields (identifier, name, category, kind, description), not field metadata, and the HQL-oriented summary resource is kg_queryable_type_summary, not kg_queryable_type. Because harness_describe exposes this string directly, agents following it will both overestimate what kg_type.list returns and be pointed at a non-existent resource_type.
| // Surface dcs_enrichment flag on relationship types so agents know | ||
| // to fetch full details via get before orchestrating DCS calls. | ||
| if (category === "relationship") { | ||
| const annotations = typeObj.annotations as Record<string, unknown>[] | undefined; | ||
| const hasDcsEnrichment = (annotations ?? []).some( | ||
| (a) => (a.key as string) === "dcs_enrichment", | ||
| ); | ||
| if (hasDcsEnrichment) item.dcs_enrichment = true; |
There was a problem hiding this comment.
schemaTypesExtract() adds dcs_enrichment: true here, but default harness_list output immediately runs through compactItems(), and that whitelist drops dcs_enrichment. I verified on the built branch that compactItems([{..., dcs_enrichment: true}]) returns no dcs_enrichment key, so the signal this comment describes never reaches agents unless callers remember to set compact: false. If this flag is part of the agent contract, this resource needs skipCompact: true or the compact whitelist needs to preserve dcs_enrichment.
- Batch HQL: fail fast on HARNESS_READ_ONLY and policy errors before fan-out (preflight dispatch on first query surfaces registry/policy errors as hard failures rather than per-item success:false results) - Remove hql-batch.ts from git index (was deleted but still appeared in PR diff) - semantic-layer kg_type list: drop dcs_enrichment flag (not compact-safe, was stripped by harness_list compaction anyway — available on get) - semantic-layer kg_type description: clarify list returns compact summary, get returns full field metadata - README: fix architecture diagram (32→34 toolsets), add knowledge-graph and semantic-layer to toolset reference table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> AI-Session-Id: c131f059-3ca1-4319-b21d-1162415e111e AI-Tool: claude-code AI-Model: unknown
There was a problem hiding this comment.
Found 3 issues that still keep this short of Sunil's architecture bar:
harness_executebatch HQL drops the first query's real response (and its per-query options), so the first item in every batch returns incomplete data.- The KG discovery hints advertise a
harness_getcall shape that the public tool doesn't accept (type_id/kindas top-level args instead ofresource_id+params), soharness_describeis teaching agents a broken contract. - The new
queriesschema regresses the repo's Zod 4 metadata rule by calling.describe(...).optional(), which strips the description agents rely on.
Verification: pnpm install --frozen-lockfile, pnpm typecheck, pnpm build, pnpm test (1473 passing), plus stdio startup smoke tests with default toolsets and HARNESS_TOOLSETS=logs.
Residual risk: there is still no dedicated coverage for the new KG/semantic-layer resources or the batch execute path, so these issues were not exercised by the current suite.
Sent by Cursor Automation: Sunil On Demand Architecture Review
| // committing to the full fan-out. | ||
| try { | ||
| const firstQuery = batchQueries[0]!; | ||
| await registry.dispatchExecute(client, "hql_query", args.action, { body: { query_string: firstQuery.query_string } }, auditCtxBatch, extra.signal); |
There was a problem hiding this comment.
This preflight call is executing the first query for real, but the response is discarded. The batch result later reports query 0 as just { query_string, success: true }, so the first item never returns its actual is_valid/errors or rows/columns/stats, and its timeout_ms / max_results are ignored as well. Can we capture and reuse this response (or do a non-executing preflight) before building results?
| executeHint: | ||
| "1. Learn grammar: harness_get(resource_type='kg_grammar'). " + | ||
| "2. Discover types: harness_list(resource_type='kg_queryable_type_summary') — note the 'identifier' and 'kind' fields. " + | ||
| "3. Get fields per type: harness_get(resource_type='kg_type', type_id='<identifier>', kind='<kind>'). " + |
There was a problem hiding this comment.
This hint doesn't match the public harness_get contract. The tool only accepts resource_id plus params, not top-level type_id / kind, so agents following this example will either fail validation or silently drop the fields. Can we switch the example to something like harness_get(resource_type='kg_type', resource_id='<identifier>', params={ kind: '<kind>' }) and update the similar description above as well?
| timeout_ms: z.number().optional().describe("Per-query timeout in milliseconds"), | ||
| max_results: z.number().optional().describe("Maximum rows to return"), | ||
| }) | ||
| ).max(20).describe("Batch HQL queries — use with resource_type='hql_query' and action='validate' or 'run'. Fans out in parallel, returns per-query results.").optional(), |
There was a problem hiding this comment.
Small standards issue: this uses .describe(...).optional(). Per the repo's Zod 4 rule in CLAUDE.md / AGENTS.md, .describe() needs to be last or the description is lost on the optional wrapper. This means agents won't actually see the queries guidance here. ...max(20).optional().describe(...) would preserve it.


Summary
knowledge-graphtoolset frommcpServerInternalto OSS: addskg_queryable_type_summary,kg_grammar, andhql_query(validate/run execute actions)semantic-layertoolset frommcpServerInternalto OSS: addskg_typeandkg_related_typesrc/utils/strip-meta.tsshared utility (consolidates two identical inline copies)harness_batch_validate_hqlandharness_batch_execute_hqltools (parallel fan-out for HQL)registerAllTools()Notes
mcpServerInternalfollows after this is published as 3.1.0mcpServerInternalkeeps its ownstrip-meta.tscopy fordata-connector.tsTest plan
pnpm typecheckpassespnpm testpasses (1473/1473 tests)🤖 Generated with Claude Code