Conversation
Implement reusable Agent Skills (agentskills.io format) that package the repo's domain knowledge for use by any SKILL.md-compatible agent: - markdown-ld-authoring: Frontmatter, entity_hints, wikilinks, and content structuring for maximum RDF extraction quality - sparql-query-writer: KB vocabulary, query patterns, case-insensitive matching, and safety constraints for the SPARQL endpoint - rdf-jsonld-engineer: JSON-LD context design, Turtle serialization, entity ID minting, and sameAs alignment with Wikidata - shacl-shape-designer: SHACL shape writing, pySHACL integration, and validation debugging patterns - llm-rdf-extraction: Prompt engineering for structured RDF output, confidence calibration, chunking strategy, and failure modes - knowledge-graph-mcp: MCP server exposing the KB as agent-callable tools (SPARQL, NL query, entity listing) via FastMCP Each skill follows the Agent Skills spec: SKILL.md with YAML frontmatter (name, description) + markdown instructions, with references/ for detailed schemas and examples. All names pass spec validation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a set of reusable Agent Skills (SKILL.md format) that document and package this repo’s Markdown-LD → RDF/JSON-LD → SPARQL knowledge-bank conventions, including a reference MCP server for querying the graph.
Changes:
- Introduces skill docs for authoring Markdown-LD, extracting RDF with LLMs, designing JSON-LD/RDF artifacts, writing SPARQL, and designing SHACL shapes.
- Adds reference “vocabulary / context / prompt patterns / existing shapes” documents to support those skills.
- Adds a Python FastMCP server example intended to expose the knowledge graph via MCP tools.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/sparql-query-writer/SKILL.md | SPARQL skill guide (prefixes, patterns, safety constraints, endpoint usage). |
| skills/sparql-query-writer/references/vocabulary.md | Vocabulary/prefix and ID/type conventions reference. |
| skills/shacl-shape-designer/SKILL.md | SHACL skill guide with pySHACL usage and design guidelines. |
| skills/shacl-shape-designer/references/existing-shapes.md | Snapshot/reference of current shapes and recommended additions. |
| skills/rdf-jsonld-engineer/SKILL.md | JSON-LD context, graph structure, Turtle serialization, ID minting, sameAs alignment. |
| skills/rdf-jsonld-engineer/references/context-design.md | JSON-LD context design patterns and validation tips. |
| skills/markdown-ld-authoring/SKILL.md | Content authoring conventions for extraction quality (frontmatter, entity_hints, wikilinks, structure). |
| skills/markdown-ld-authoring/references/example-article.md | Annotated example article + “what this produces” examples. |
| skills/llm-rdf-extraction/SKILL.md | Prompt architecture, schema enforcement, confidence, chunking/caching, failure modes. |
| skills/llm-rdf-extraction/references/prompt-patterns.md | Current prompt template + iteration checklist + API call configuration. |
| skills/knowledge-graph-mcp/SKILL.md | MCP server skill guide and tool definitions intended for querying the KB. |
| skills/knowledge-graph-mcp/scripts/server.py | FastMCP server reference implementation (local RDFLib dataset querying). |
| skills/knowledge-graph-mcp/references/api-reference.md | Reference docs for the existing /api/sparql and /api/ask endpoints. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ```sparql | ||
| PREFIX schema: <https://schema.org/> | ||
| SELECT ?subject ?predicate ?object WHERE { | ||
| ?subject ?predicate ?object . | ||
| FILTER(?predicate != rdf:type) | ||
| } |
There was a problem hiding this comment.
In this example query, rdf:type is referenced but the rdf: prefix is not declared in the snippet, so the query as written won’t parse. Add PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> to the snippet (or remove the rdf:type filter).
| ## Safety Constraints | ||
|
|
||
| The endpoint enforces these rules: | ||
|
|
||
| 1. **Only `SELECT` and `ASK` queries** — `INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, `CREATE` are blocked | ||
| 2. **Always include `LIMIT`** — default to `LIMIT 100` unless the user asks for all results | ||
| 3. **No mutating operations** — the graph is read-only at query time | ||
|
|
There was a problem hiding this comment.
The “Safety Constraints” section says the /api/sparql endpoint enforces LIMITs and allows only SELECT/ASK, but the current implementation only blocks mutating keywords and does not enforce query form or LIMIT injection. Either update the documentation to match actual behavior, or tighten the endpoint/server enforcement to match these stated constraints.
| Two deployment modes: | ||
|
|
||
| 1. **Local mode** — MCP server loads `.ttl` files directly into RDFLib | ||
| 2. **Remote mode** — MCP server proxies to the deployed Azure Functions API | ||
|
|
There was a problem hiding this comment.
This section describes a “Remote mode” that proxies to a deployed API, but the reference implementation in scripts/server.py does not currently implement an --api-url option or any HTTP proxy behavior. Please either implement the remote mode or remove/clarify these deployment-mode claims so the skill stays accurate.
| ### 2. `ask_question` | ||
|
|
||
| Natural language query — the server translates to SPARQL. | ||
|
|
||
| ```python | ||
| @mcp.tool( | ||
| annotations={ | ||
| "readOnlyHint": True, | ||
| "openWorldHint": False, | ||
| } | ||
| ) | ||
| def ask_question(question: str) -> str: | ||
| """Ask a natural language question about the knowledge graph. | ||
|
|
||
| The question is translated to SPARQL and executed. The response | ||
| includes both the generated SPARQL and the results. | ||
|
|
||
| Example questions: | ||
| - "What entities are in the knowledge graph?" | ||
| - "Which articles mention SPARQL?" | ||
| - "Find all organizations" | ||
|
|
||
| Args: | ||
| question: A natural language question about the knowledge. | ||
|
|
||
| Returns: | ||
| JSON with 'question', 'sparql', and 'results' fields. | ||
| """ | ||
| ``` |
There was a problem hiding this comment.
The skill defines an ask_question MCP tool, but scripts/server.py does not implement an ask_question function/tool. This is likely to confuse consumers following the skill; either add the tool (proxying to /api/ask or performing local NL→SPARQL) or remove this tool definition from the doc.
| """MCP server for a Markdown-LD knowledge graph. | ||
|
|
||
| Exposes the knowledge bank as MCP tools that AI agents can discover | ||
| and call. Supports both local mode (loads .ttl files directly) and | ||
| remote mode (proxies to a deployed API). | ||
|
|
||
| Usage: | ||
| # Local mode (stdio transport) | ||
| python server.py --graph-dir ./graph/articles | ||
|
|
||
| # Local mode (HTTP transport) | ||
| python server.py --graph-dir ./graph/articles --transport http --port 8080 | ||
|
|
||
| # Remote mode (proxy to deployed API) | ||
| python server.py --api-url https://your-swa.azurestaticapps.net | ||
|
|
There was a problem hiding this comment.
The module docstring/usage mentions a --api-url flag and “remote mode”, but argparse does not define --api-url and there is no proxy implementation. Running the documented command will fail. Align the docstring with the actual CLI, or add the missing CLI flag and remote proxy code.
| def _enforce_safety(sparql: str) -> tuple[bool, str, str]: | ||
| """Validate safety constraints. Returns (is_safe, sanitized, error).""" | ||
| upper = sparql.strip().upper() | ||
| for kw in MUTATING_KEYWORDS: | ||
| if kw in upper: | ||
| return False, sparql, f"Mutating keyword '{kw}' is not allowed" | ||
| if "LIMIT" not in upper and "SELECT" in upper: | ||
| sparql = sparql.rstrip().rstrip(";") + "\nLIMIT 100" | ||
| return True, sparql, "" |
There was a problem hiding this comment.
_enforce_safety only blocks mutating keywords; it does not actually enforce the documented “Only SELECT and ASK queries” rule, so CONSTRUCT/DESCRIBE queries can slip through. Consider validating the parsed query type (e.g., via RDFLib parsing) and explicitly rejecting anything other than SELECT/ASK, in addition to mutating keyword checks.
| @mcp.tool() | ||
| def list_entities(entity_type: str = "schema:Thing", limit: int = 50) -> str: | ||
| """List entities in the knowledge graph, optionally filtered by type. | ||
|
|
||
| Available types: schema:Person, schema:Organization, | ||
| schema:SoftwareApplication, schema:CreativeWork, schema:Thing | ||
|
|
||
| Args: | ||
| entity_type: Schema.org type to filter by (default: all non-Article entities). | ||
| limit: Maximum number of results (default: 50). | ||
|
|
||
| Returns: | ||
| JSON array of entities with id, name, and type. | ||
| """ | ||
| if entity_type == "schema:Thing": | ||
| query = f""" | ||
| PREFIX schema: <https://schema.org/> | ||
| SELECT DISTINCT ?entity ?name ?type WHERE {{ | ||
| ?entity a ?type ; schema:name ?name . | ||
| FILTER(?type != schema:Article) | ||
| }} LIMIT {limit} | ||
| """ | ||
| else: | ||
| query = f""" | ||
| PREFIX schema: <https://schema.org/> | ||
| SELECT DISTINCT ?entity ?name WHERE {{ | ||
| ?entity a {entity_type} ; schema:name ?name . | ||
| }} LIMIT {limit} | ||
| """ |
There was a problem hiding this comment.
list_entities interpolates entity_type directly into the SPARQL string. Because this parameter is user-controlled, it enables SPARQL injection (changing the WHERE clause, UNIONs, SERVICE calls, etc.) and can bypass intended restrictions. Validate entity_type against an allowlist of supported types (or map known tokens to full IRIs) and reject anything else; also consider clamping limit to a reasonable maximum.
| escaped = entity_name.replace('"', '\\"') | ||
| query = f""" | ||
| PREFIX schema: <https://schema.org/> | ||
| PREFIX kb: <https://example.com/vocab/kb#> | ||
| SELECT ?entity ?type ?sameAs ?article ?articleTitle ?related ?relatedName WHERE {{ | ||
| ?entity schema:name ?name . | ||
| FILTER(LCASE(STR(?name)) = LCASE("{escaped}")) | ||
| ?entity a ?type . |
There was a problem hiding this comment.
get_entity_details only escapes double quotes in entity_name. Backslashes and newlines can still break the SPARQL string literal, and large inputs could also create very expensive queries. Escape backslashes/newlines (or use a safer literal-encoding helper) and consider applying a reasonable max length to entity_name.
| { | ||
| "id": "https://example.com/id/neo4j", | ||
| "type": "schema:SoftwareApplication", | ||
| "schema:name": "Neo4j", | ||
| "schema:sameAs": "https://www.wikidata.org/entity/Q7071552" | ||
| }, |
There was a problem hiding this comment.
In the JSON-LD example, schema:sameAs (and other relationship/object properties like schema:mentions) are shown with plain string values. Without @type: "@id" coercion in the context (or using { "id": "..." } node references), JSON-LD processors will treat these as string literals instead of IRIs. Update the context/example to ensure these properties are encoded as IRI references.
Summary
Implements 6 reusable Agent Skills (SKILL.md format) that package the repo's domain knowledge for use by any compatible agent (Claude Code, Copilot, Codex, Cursor, etc.).
Skills
Structure
Each skill follows the Agent Skills spec:
eferences/\ for detailed schemas, vocabularies, and examples
Validation
Files
13 files added, 2,113 lines total. No existing files modified.