
Add 6 Agent Skills for Markdown-LD knowledge bank #1

Open
lqdev wants to merge 1 commit into main from feature/agent-skills

Conversation


@lqdev (Owner) commented Apr 5, 2026

Summary

Implements 6 reusable Agent Skills (SKILL.md format) that package the repo's domain knowledge for use by any compatible agent (Claude Code, Copilot, Codex, Cursor, etc.).

Skills

| Skill | For | What it teaches |
| --- | --- | --- |
| `markdown-ld-authoring` | Content authors | Frontmatter, entity_hints, wikilinks, content structuring for RDF extraction |
| `sparql-query-writer` | KB queriers | Vocabulary, query patterns, case-insensitive matching, safety constraints |
| `rdf-jsonld-engineer` | Knowledge engineers | JSON-LD context design, Turtle serialization, entity ID minting, sameAs alignment |
| `shacl-shape-designer` | Ontology maintainers | SHACL shape writing, pySHACL integration, validation debugging |
| `llm-rdf-extraction` | Pipeline developers | Prompt engineering, confidence calibration, chunking strategy, failure modes |
| `knowledge-graph-mcp` | Agent integrators | MCP server wrapping SPARQL + NL query endpoints via FastMCP |

Structure

Each skill follows the Agent Skills spec:

  • `SKILL.md` with YAML frontmatter (name, description) + markdown instructions
  • `references/` for detailed schemas, vocabularies, and examples
  • `scripts/` for executable code (knowledge-graph-mcp includes a working FastMCP server)

Validation

  • All 6 skills pass Agent Skills spec validation (name format, description length, body < 500 lines)
  • All 58 existing tests pass — no changes to existing code
  • All content derived from actual repo code and conventions (not generic boilerplate)
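
The spec checks listed above can be sketched as a small validator. The exact limits used here (the name pattern and the 1024-character description cap) are illustrative assumptions, not quoted from the Agent Skills spec:

```python
import re

def validate_skill(text: str) -> list[str]:
    """Check a SKILL.md string against basic spec-style constraints.

    Assumed limits: lowercase-hyphenated name, description <= 1024
    characters, body under 500 lines.
    """
    errors = []
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter block"]
    frontmatter, body = match.groups()
    # Naive key: value parsing; a real validator would use a YAML parser.
    fields = dict(
        line.split(":", 1) for line in frontmatter.splitlines() if ":" in line
    )
    name = fields.get("name", "").strip()
    description = fields.get("description", "").strip()
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
        errors.append(f"name {name!r} is not lowercase-hyphenated")
    if not 0 < len(description) <= 1024:
        errors.append("description missing or too long")
    if len(body.splitlines()) >= 500:
        errors.append("body must be under 500 lines")
    return errors
```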

Files

13 files added, 2,113 lines total. No existing files modified.

Implement reusable Agent Skills (agentskills.io format) that package
the repo's domain knowledge for use by any SKILL.md-compatible agent:

- markdown-ld-authoring: Frontmatter, entity_hints, wikilinks, and
  content structuring for maximum RDF extraction quality
- sparql-query-writer: KB vocabulary, query patterns, case-insensitive
  matching, and safety constraints for the SPARQL endpoint
- rdf-jsonld-engineer: JSON-LD context design, Turtle serialization,
  entity ID minting, and sameAs alignment with Wikidata
- shacl-shape-designer: SHACL shape writing, pySHACL integration,
  and validation debugging patterns
- llm-rdf-extraction: Prompt engineering for structured RDF output,
  confidence calibration, chunking strategy, and failure modes
- knowledge-graph-mcp: MCP server exposing the KB as agent-callable
  tools (SPARQL, NL query, entity listing) via FastMCP

Each skill follows the Agent Skills spec: SKILL.md with YAML frontmatter
(name, description) + markdown instructions, with references/ for
detailed schemas and examples. All names pass spec validation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 5, 2026 03:49

Copilot AI left a comment


Pull request overview

Adds a set of reusable Agent Skills (SKILL.md format) that document and package this repo’s Markdown-LD → RDF/JSON-LD → SPARQL knowledge-bank conventions, including a reference MCP server for querying the graph.

Changes:

  • Introduces skill docs for authoring Markdown-LD, extracting RDF with LLMs, designing JSON-LD/RDF artifacts, writing SPARQL, and designing SHACL shapes.
  • Adds reference “vocabulary / context / prompt patterns / existing shapes” documents to support those skills.
  • Adds a Python FastMCP server example intended to expose the knowledge graph via MCP tools.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `skills/sparql-query-writer/SKILL.md` | SPARQL skill guide (prefixes, patterns, safety constraints, endpoint usage). |
| `skills/sparql-query-writer/references/vocabulary.md` | Vocabulary/prefix and ID/type conventions reference. |
| `skills/shacl-shape-designer/SKILL.md` | SHACL skill guide with pySHACL usage and design guidelines. |
| `skills/shacl-shape-designer/references/existing-shapes.md` | Snapshot/reference of current shapes and recommended additions. |
| `skills/rdf-jsonld-engineer/SKILL.md` | JSON-LD context, graph structure, Turtle serialization, ID minting, sameAs alignment. |
| `skills/rdf-jsonld-engineer/references/context-design.md` | JSON-LD context design patterns and validation tips. |
| `skills/markdown-ld-authoring/SKILL.md` | Content authoring conventions for extraction quality (frontmatter, entity_hints, wikilinks, structure). |
| `skills/markdown-ld-authoring/references/example-article.md` | Annotated example article + “what this produces” examples. |
| `skills/llm-rdf-extraction/SKILL.md` | Prompt architecture, schema enforcement, confidence, chunking/caching, failure modes. |
| `skills/llm-rdf-extraction/references/prompt-patterns.md` | Current prompt template + iteration checklist + API call configuration. |
| `skills/knowledge-graph-mcp/SKILL.md` | MCP server skill guide and tool definitions intended for querying the KB. |
| `skills/knowledge-graph-mcp/scripts/server.py` | FastMCP server reference implementation (local RDFLib dataset querying). |
| `skills/knowledge-graph-mcp/references/api-reference.md` | Reference docs for the existing /api/sparql and /api/ask endpoints. |


Comment on lines +106 to +111
```sparql
PREFIX schema: <https://schema.org/>
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object .
FILTER(?predicate != rdf:type)
}
```

Copilot AI Apr 5, 2026


In this example query, rdf:type is referenced but the rdf: prefix is not declared in the snippet, so the query as written won’t parse. Add PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> to the snippet (or remove the rdf:type filter).

Comment on lines +144 to +151
## Safety Constraints

The endpoint enforces these rules:

1. **Only `SELECT` and `ASK` queries** — `INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, `CREATE` are blocked
2. **Always include `LIMIT`** — default to `LIMIT 100` unless the user asks for all results
3. **No mutating operations** — the graph is read-only at query time


Copilot AI Apr 5, 2026


The “Safety Constraints” section says the /api/sparql endpoint enforces LIMITs and allows only SELECT/ASK, but the current implementation only blocks mutating keywords and does not enforce query form or LIMIT injection. Either update the documentation to match actual behavior, or tighten the endpoint/server enforcement to match these stated constraints.

Comment on lines +29 to +33
Two deployment modes:

1. **Local mode** — MCP server loads `.ttl` files directly into RDFLib
2. **Remote mode** — MCP server proxies to the deployed Azure Functions API


Copilot AI Apr 5, 2026


This section describes a “Remote mode” that proxies to a deployed API, but the reference implementation in scripts/server.py does not currently implement an --api-url option or any HTTP proxy behavior. Please either implement the remote mode or remove/clarify these deployment-mode claims so the skill stays accurate.

Comment on lines +65 to +93
### 2. `ask_question`

Natural language query — the server translates to SPARQL.

```python
@mcp.tool(
    annotations={
        "readOnlyHint": True,
        "openWorldHint": False,
    }
)
def ask_question(question: str) -> str:
    """Ask a natural language question about the knowledge graph.

    The question is translated to SPARQL and executed. The response
    includes both the generated SPARQL and the results.

    Example questions:
    - "What entities are in the knowledge graph?"
    - "Which articles mention SPARQL?"
    - "Find all organizations"

    Args:
        question: A natural language question about the knowledge.

    Returns:
        JSON with 'question', 'sparql', and 'results' fields.
    """
```

Copilot AI Apr 5, 2026


The skill defines an ask_question MCP tool, but scripts/server.py does not implement an ask_question function/tool. This is likely to confuse consumers following the skill; either add the tool (proxying to /api/ask or performing local NL→SPARQL) or remove this tool definition from the doc.
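
If the tool is kept, the proxy variant might be sketched as follows. The request/response shape of /api/ask ({"question": ...} in, JSON with 'question'/'sparql'/'results' out) is assumed from the docstring above, and the injected `post` transport is a hypothetical helper, not part of the repo:

```python
import json
from typing import Callable

def ask_question(question: str, post: Callable[[str, dict], dict]) -> str:
    """Proxy a natural-language question to an assumed /api/ask endpoint.

    `post` is an injected transport (e.g. a thin wrapper over
    urllib.request or httpx) so the tool stays testable offline.
    """
    payload = post("/api/ask", {"question": question})
    # Surface only the documented fields, in a stable order.
    return json.dumps(
        {
            "question": payload.get("question", question),
            "sparql": payload.get("sparql"),
            "results": payload.get("results", []),
        }
    )
```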

Comment on lines +1 to +16
"""MCP server for a Markdown-LD knowledge graph.

Exposes the knowledge bank as MCP tools that AI agents can discover
and call. Supports both local mode (loads .ttl files directly) and
remote mode (proxies to a deployed API).

Usage:
# Local mode (stdio transport)
python server.py --graph-dir ./graph/articles

# Local mode (HTTP transport)
python server.py --graph-dir ./graph/articles --transport http --port 8080

# Remote mode (proxy to deployed API)
python server.py --api-url https://your-swa.azurestaticapps.net


Copilot AI Apr 5, 2026


The module docstring/usage mentions a --api-url flag and “remote mode”, but argparse does not define --api-url and there is no proxy implementation. Running the documented command will fail. Align the docstring with the actual CLI, or add the missing CLI flag and remote proxy code.
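
A minimal sketch of a CLI that would match the docstring (flag names are taken from the usage text above; the remote proxy logic itself would still need to be written):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI matching the documented usage: exactly one mode flag required."""
    parser = argparse.ArgumentParser(description="Knowledge-graph MCP server")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--graph-dir", help="Directory of .ttl files (local mode)")
    mode.add_argument("--api-url", help="Base URL of the deployed API (remote mode)")
    parser.add_argument("--transport", choices=["stdio", "http"], default="stdio")
    parser.add_argument("--port", type=int, default=8080)
    return parser
```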

Comment on lines +60 to +68
```python
def _enforce_safety(sparql: str) -> tuple[bool, str, str]:
    """Validate safety constraints. Returns (is_safe, sanitized, error)."""
    upper = sparql.strip().upper()
    for kw in MUTATING_KEYWORDS:
        if kw in upper:
            return False, sparql, f"Mutating keyword '{kw}' is not allowed"
    if "LIMIT" not in upper and "SELECT" in upper:
        sparql = sparql.rstrip().rstrip(";") + "\nLIMIT 100"
    return True, sparql, ""
```

Copilot AI Apr 5, 2026


_enforce_safety only blocks mutating keywords; it does not actually enforce the documented “Only SELECT and ASK queries” rule, so CONSTRUCT/DESCRIBE queries can slip through. Consider validating the parsed query type (e.g., via RDFLib parsing) and explicitly rejecting anything other than SELECT/ASK, in addition to mutating keyword checks.
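
A lightweight version of the query-form check can be done without a parser; this is a sketch under the caveat that a production endpoint should prefer rdflib's SPARQL parser over keyword matching:

```python
import re

QUERY_FORM_RE = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)

def check_query_form(sparql: str) -> tuple[bool, str]:
    """Return (is_allowed, error) for a read-only SPARQL endpoint."""
    # Strip comments so a keyword inside a comment does not trip the check.
    stripped = "\n".join(line.split("#", 1)[0] for line in sparql.splitlines())
    match = QUERY_FORM_RE.search(stripped)
    if match is None:
        return False, "no query form found"
    form = match.group(1).upper()
    if form not in ("SELECT", "ASK"):
        return False, f"query form '{form}' is not allowed"
    return True, ""
```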

Comment on lines +108 to +136
```python
@mcp.tool()
def list_entities(entity_type: str = "schema:Thing", limit: int = 50) -> str:
    """List entities in the knowledge graph, optionally filtered by type.

    Available types: schema:Person, schema:Organization,
    schema:SoftwareApplication, schema:CreativeWork, schema:Thing

    Args:
        entity_type: Schema.org type to filter by (default: all non-Article entities).
        limit: Maximum number of results (default: 50).

    Returns:
        JSON array of entities with id, name, and type.
    """
    if entity_type == "schema:Thing":
        query = f"""
            PREFIX schema: <https://schema.org/>
            SELECT DISTINCT ?entity ?name ?type WHERE {{
                ?entity a ?type ; schema:name ?name .
                FILTER(?type != schema:Article)
            }} LIMIT {limit}
        """
    else:
        query = f"""
            PREFIX schema: <https://schema.org/>
            SELECT DISTINCT ?entity ?name WHERE {{
                ?entity a {entity_type} ; schema:name ?name .
            }} LIMIT {limit}
        """
```

Copilot AI Apr 5, 2026


list_entities interpolates entity_type directly into the SPARQL string. Because this parameter is user-controlled, it enables SPARQL injection (changing the WHERE clause, UNIONs, SERVICE calls, etc.) and can bypass intended restrictions. Validate entity_type against an allowlist of supported types (or map known tokens to full IRIs) and reject anything else; also consider clamping limit to a reasonable maximum.
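
An allowlist plus limit clamp might be sketched like this (the type list mirrors the docstring above; the 500-row cap is an assumption, not a value from the repo):

```python
# Documented type tokens; anything else is rejected before interpolation.
ALLOWED_TYPES = {
    "schema:Thing",
    "schema:Person",
    "schema:Organization",
    "schema:SoftwareApplication",
    "schema:CreativeWork",
}

def sanitize_list_args(entity_type: str, limit: int) -> tuple[str, int]:
    """Validate list_entities arguments before building the SPARQL string."""
    if entity_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported entity_type: {entity_type!r}")
    # Clamp limit to an assumed maximum of 500 rows.
    return entity_type, max(1, min(int(limit), 500))
```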

Comment on lines +175 to +182
```python
escaped = entity_name.replace('"', '\\"')
query = f"""
    PREFIX schema: <https://schema.org/>
    PREFIX kb: <https://example.com/vocab/kb#>
    SELECT ?entity ?type ?sameAs ?article ?articleTitle ?related ?relatedName WHERE {{
        ?entity schema:name ?name .
        FILTER(LCASE(STR(?name)) = LCASE("{escaped}"))
        ?entity a ?type .
```

Copilot AI Apr 5, 2026


get_entity_details only escapes double quotes in entity_name. Backslashes and newlines can still break the SPARQL string literal, and large inputs could also create very expensive queries. Escape backslashes/newlines (or use a safer literal-encoding helper) and consider applying a reasonable max length to entity_name.
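
A safer literal encoder along the suggested lines, as a sketch (the 200-character cap is an illustrative assumption; the escape set follows SPARQL's string-escape rules):

```python
def sparql_string_literal(value: str, max_len: int = 200) -> str:
    """Encode a Python string as a double-quoted SPARQL string literal.

    Escapes backslashes first, then quotes and line terminators, and
    rejects oversized inputs to bound query cost.
    """
    if len(value) > max_len:
        raise ValueError(f"value longer than {max_len} characters")
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'
```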

Comment on lines +81 to +86
```json
{
  "id": "https://example.com/id/neo4j",
  "type": "schema:SoftwareApplication",
  "schema:name": "Neo4j",
  "schema:sameAs": "https://www.wikidata.org/entity/Q7071552"
},
```

Copilot AI Apr 5, 2026


In the JSON-LD example, schema:sameAs (and other relationship/object properties like schema:mentions) are shown with plain string values. Without @type: "@id" coercion in the context (or using { "id": "..." } node references), JSON-LD processors will treat these as string literals instead of IRIs. Update the context/example to ensure these properties are encoded as IRI references.
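
One way to apply the suggested coercion, sketched as a context fragment (the surrounding context terms are assumptions about the repo's actual @context, not copied from it):

```json
{
  "@context": {
    "schema": "https://schema.org/",
    "id": "@id",
    "type": "@type",
    "schema:sameAs": { "@type": "@id" },
    "schema:mentions": { "@type": "@id" }
  }
}
```

With this context, the plain string values above expand to IRI node references rather than string literals.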
