
Add 6 Agent Skills for Markdown-LD knowledge bank #1

Open
lqdev wants to merge 1 commit into main from feature/agent-skills

Conversation


@lqdev (Owner) commented Apr 5, 2026

Summary

Implements 6 reusable Agent Skills (SKILL.md format) that package the repo's domain knowledge for use by any compatible agent (Claude Code, Copilot, Codex, Cursor, etc.).

Skills

| Skill | For | What it teaches |
| --- | --- | --- |
| `markdown-ld-authoring` | Content authors | Frontmatter, entity_hints, wikilinks, content structuring for RDF extraction |
| `sparql-query-writer` | KB queriers | Vocabulary, query patterns, case-insensitive matching, safety constraints |
| `rdf-jsonld-engineer` | Knowledge engineers | JSON-LD context design, Turtle serialization, entity ID minting, sameAs alignment |
| `shacl-shape-designer` | Ontology maintainers | SHACL shape writing, pySHACL integration, validation debugging |
| `llm-rdf-extraction` | Pipeline developers | Prompt engineering, confidence calibration, chunking strategy, failure modes |
| `knowledge-graph-mcp` | Agent integrators | MCP server wrapping SPARQL + NL query endpoints via FastMCP |

Structure

Each skill follows the Agent Skills spec:

  • `SKILL.md` with YAML frontmatter (name, description) + markdown instructions
  • `references/` for detailed schemas, vocabularies, and examples
  • `scripts/` for executable code (knowledge-graph-mcp includes a working FastMCP server)

Validation

  • All 6 skills pass Agent Skills spec validation (name format, description length, body < 500 lines)
  • All 58 existing tests pass — no changes to existing code
  • All content derived from actual repo code and conventions (not generic boilerplate)
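
The spec checks listed above can be sketched as a small validator. The exact limits used here (the name pattern and the 1024-character description cap) are illustrative assumptions, not quoted from the Agent Skills spec:

```python
import re

def validate_skill(text: str) -> list[str]:
    """Check a SKILL.md string against basic spec-style constraints.

    Assumed limits: lowercase-hyphenated name, description <= 1024
    characters, body under 500 lines.
    """
    errors = []
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter block"]
    frontmatter, body = match.groups()
    # Naive key: value parsing; a real validator would use a YAML parser.
    fields = dict(
        line.split(":", 1) for line in frontmatter.splitlines() if ":" in line
    )
    name = fields.get("name", "").strip()
    description = fields.get("description", "").strip()
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
        errors.append(f"name {name!r} is not lowercase-hyphenated")
    if not 0 < len(description) <= 1024:
        errors.append("description missing or too long")
    if len(body.splitlines()) >= 500:
        errors.append("body must be under 500 lines")
    return errors
```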

Files

13 files added, 2,113 lines total. No existing files modified.

Implement reusable Agent Skills (agentskills.io format) that package
the repo's domain knowledge for use by any SKILL.md-compatible agent:

- markdown-ld-authoring: Frontmatter, entity_hints, wikilinks, and
  content structuring for maximum RDF extraction quality
- sparql-query-writer: KB vocabulary, query patterns, case-insensitive
  matching, and safety constraints for the SPARQL endpoint
- rdf-jsonld-engineer: JSON-LD context design, Turtle serialization,
  entity ID minting, and sameAs alignment with Wikidata
- shacl-shape-designer: SHACL shape writing, pySHACL integration,
  and validation debugging patterns
- llm-rdf-extraction: Prompt engineering for structured RDF output,
  confidence calibration, chunking strategy, and failure modes
- knowledge-graph-mcp: MCP server exposing the KB as agent-callable
  tools (SPARQL, NL query, entity listing) via FastMCP

Each skill follows the Agent Skills spec: SKILL.md with YAML frontmatter
(name, description) + markdown instructions, with references/ for
detailed schemas and examples. All names pass spec validation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 5, 2026 03:49

Copilot AI left a comment


Pull request overview

Adds a set of reusable Agent Skills (SKILL.md format) that document and package this repo’s Markdown-LD → RDF/JSON-LD → SPARQL knowledge-bank conventions, including a reference MCP server for querying the graph.

Changes:

  • Introduces skill docs for authoring Markdown-LD, extracting RDF with LLMs, designing JSON-LD/RDF artifacts, writing SPARQL, and designing SHACL shapes.
  • Adds reference “vocabulary / context / prompt patterns / existing shapes” documents to support those skills.
  • Adds a Python FastMCP server example intended to expose the knowledge graph via MCP tools.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `skills/sparql-query-writer/SKILL.md` | SPARQL skill guide (prefixes, patterns, safety constraints, endpoint usage). |
| `skills/sparql-query-writer/references/vocabulary.md` | Vocabulary/prefix and ID/type conventions reference. |
| `skills/shacl-shape-designer/SKILL.md` | SHACL skill guide with pySHACL usage and design guidelines. |
| `skills/shacl-shape-designer/references/existing-shapes.md` | Snapshot/reference of current shapes and recommended additions. |
| `skills/rdf-jsonld-engineer/SKILL.md` | JSON-LD context, graph structure, Turtle serialization, ID minting, sameAs alignment. |
| `skills/rdf-jsonld-engineer/references/context-design.md` | JSON-LD context design patterns and validation tips. |
| `skills/markdown-ld-authoring/SKILL.md` | Content authoring conventions for extraction quality (frontmatter, entity_hints, wikilinks, structure). |
| `skills/markdown-ld-authoring/references/example-article.md` | Annotated example article + “what this produces” examples. |
| `skills/llm-rdf-extraction/SKILL.md` | Prompt architecture, schema enforcement, confidence, chunking/caching, failure modes. |
| `skills/llm-rdf-extraction/references/prompt-patterns.md` | Current prompt template + iteration checklist + API call configuration. |
| `skills/knowledge-graph-mcp/SKILL.md` | MCP server skill guide and tool definitions intended for querying the KB. |
| `skills/knowledge-graph-mcp/scripts/server.py` | FastMCP server reference implementation (local RDFLib dataset querying). |
| `skills/knowledge-graph-mcp/references/api-reference.md` | Reference docs for the existing /api/sparql and /api/ask endpoints. |


Comment on lines +106 to +111
```sparql
PREFIX schema: <https://schema.org/>
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object .
FILTER(?predicate != rdf:type)
}
```

Copilot AI Apr 5, 2026


In this example query, rdf:type is referenced but the rdf: prefix is not declared in the snippet, so the query as written won’t parse. Add PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> to the snippet (or remove the rdf:type filter).

Comment on lines +144 to +151
## Safety Constraints

The endpoint enforces these rules:

1. **Only `SELECT` and `ASK` queries** — `INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, `CREATE` are blocked
2. **Always include `LIMIT`** — default to `LIMIT 100` unless the user asks for all results
3. **No mutating operations** — the graph is read-only at query time


Copilot AI Apr 5, 2026


The “Safety Constraints” section says the /api/sparql endpoint enforces LIMITs and allows only SELECT/ASK, but the current implementation only blocks mutating keywords and does not enforce query form or LIMIT injection. Either update the documentation to match actual behavior, or tighten the endpoint/server enforcement to match these stated constraints.

Comment on lines +29 to +33
Two deployment modes:

1. **Local mode** — MCP server loads `.ttl` files directly into RDFLib
2. **Remote mode** — MCP server proxies to the deployed Azure Functions API


Copilot AI Apr 5, 2026


This section describes a “Remote mode” that proxies to a deployed API, but the reference implementation in scripts/server.py does not currently implement an --api-url option or any HTTP proxy behavior. Please either implement the remote mode or remove/clarify these deployment-mode claims so the skill stays accurate.

Comment on lines +65 to +93
### 2. `ask_question`

Natural language query — the server translates to SPARQL.

```python
@mcp.tool(
    annotations={
        "readOnlyHint": True,
        "openWorldHint": False,
    }
)
def ask_question(question: str) -> str:
    """Ask a natural language question about the knowledge graph.

    The question is translated to SPARQL and executed. The response
    includes both the generated SPARQL and the results.

    Example questions:
    - "What entities are in the knowledge graph?"
    - "Which articles mention SPARQL?"
    - "Find all organizations"

    Args:
        question: A natural language question about the knowledge.

    Returns:
        JSON with 'question', 'sparql', and 'results' fields.
    """
```

Copilot AI Apr 5, 2026


The skill defines an ask_question MCP tool, but scripts/server.py does not implement an ask_question function/tool. This is likely to confuse consumers following the skill; either add the tool (proxying to /api/ask or performing local NL→SPARQL) or remove this tool definition from the doc.
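
If the tool is kept, the proxy variant might be sketched as follows. The request/response shape of /api/ask ({"question": ...} in, JSON with 'question'/'sparql'/'results' out) is assumed from the docstring above, and the injected `post` transport is a hypothetical helper, not part of the repo:

```python
import json
from typing import Callable

def ask_question(question: str, post: Callable[[str, dict], dict]) -> str:
    """Proxy a natural-language question to an assumed /api/ask endpoint.

    `post` is an injected transport (e.g. a thin wrapper over
    urllib.request or httpx) so the tool stays testable offline.
    """
    payload = post("/api/ask", {"question": question})
    # Surface only the documented fields, in a stable order.
    return json.dumps(
        {
            "question": payload.get("question", question),
            "sparql": payload.get("sparql"),
            "results": payload.get("results", []),
        }
    )
```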

Comment on lines +1 to +16
"""MCP server for a Markdown-LD knowledge graph.

Exposes the knowledge bank as MCP tools that AI agents can discover
and call. Supports both local mode (loads .ttl files directly) and
remote mode (proxies to a deployed API).

Usage:
# Local mode (stdio transport)
python server.py --graph-dir ./graph/articles

# Local mode (HTTP transport)
python server.py --graph-dir ./graph/articles --transport http --port 8080

# Remote mode (proxy to deployed API)
python server.py --api-url https://your-swa.azurestaticapps.net


Copilot AI Apr 5, 2026


The module docstring/usage mentions a --api-url flag and “remote mode”, but argparse does not define --api-url and there is no proxy implementation. Running the documented command will fail. Align the docstring with the actual CLI, or add the missing CLI flag and remote proxy code.
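
A minimal sketch of a CLI that would match the docstring (flag names are taken from the usage text above; the remote proxy logic itself would still need to be written):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI matching the documented usage: exactly one mode flag required."""
    parser = argparse.ArgumentParser(description="Knowledge-graph MCP server")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--graph-dir", help="Directory of .ttl files (local mode)")
    mode.add_argument("--api-url", help="Base URL of the deployed API (remote mode)")
    parser.add_argument("--transport", choices=["stdio", "http"], default="stdio")
    parser.add_argument("--port", type=int, default=8080)
    return parser
```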

Comment on lines +60 to +68
```python
def _enforce_safety(sparql: str) -> tuple[bool, str, str]:
    """Validate safety constraints. Returns (is_safe, sanitized, error)."""
    upper = sparql.strip().upper()
    for kw in MUTATING_KEYWORDS:
        if kw in upper:
            return False, sparql, f"Mutating keyword '{kw}' is not allowed"
    if "LIMIT" not in upper and "SELECT" in upper:
        sparql = sparql.rstrip().rstrip(";") + "\nLIMIT 100"
    return True, sparql, ""
```

Copilot AI Apr 5, 2026


_enforce_safety only blocks mutating keywords; it does not actually enforce the documented “Only SELECT and ASK queries” rule, so CONSTRUCT/DESCRIBE queries can slip through. Consider validating the parsed query type (e.g., via RDFLib parsing) and explicitly rejecting anything other than SELECT/ASK, in addition to mutating keyword checks.
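
A lightweight version of the query-form check can be done without a parser; this is a sketch under the caveat that a production endpoint should prefer rdflib's SPARQL parser over keyword matching:

```python
import re

QUERY_FORM_RE = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)

def check_query_form(sparql: str) -> tuple[bool, str]:
    """Return (is_allowed, error) for a read-only SPARQL endpoint."""
    # Strip comments so a keyword inside a comment does not trip the check.
    stripped = "\n".join(line.split("#", 1)[0] for line in sparql.splitlines())
    match = QUERY_FORM_RE.search(stripped)
    if match is None:
        return False, "no query form found"
    form = match.group(1).upper()
    if form not in ("SELECT", "ASK"):
        return False, f"query form '{form}' is not allowed"
    return True, ""
```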

Comment on lines +108 to +136
```python
@mcp.tool()
def list_entities(entity_type: str = "schema:Thing", limit: int = 50) -> str:
    """List entities in the knowledge graph, optionally filtered by type.

    Available types: schema:Person, schema:Organization,
    schema:SoftwareApplication, schema:CreativeWork, schema:Thing

    Args:
        entity_type: Schema.org type to filter by (default: all non-Article entities).
        limit: Maximum number of results (default: 50).

    Returns:
        JSON array of entities with id, name, and type.
    """
    if entity_type == "schema:Thing":
        query = f"""
            PREFIX schema: <https://schema.org/>
            SELECT DISTINCT ?entity ?name ?type WHERE {{
                ?entity a ?type ; schema:name ?name .
                FILTER(?type != schema:Article)
            }} LIMIT {limit}
        """
    else:
        query = f"""
            PREFIX schema: <https://schema.org/>
            SELECT DISTINCT ?entity ?name WHERE {{
                ?entity a {entity_type} ; schema:name ?name .
            }} LIMIT {limit}
        """
```

Copilot AI Apr 5, 2026


list_entities interpolates entity_type directly into the SPARQL string. Because this parameter is user-controlled, it enables SPARQL injection (changing the WHERE clause, UNIONs, SERVICE calls, etc.) and can bypass intended restrictions. Validate entity_type against an allowlist of supported types (or map known tokens to full IRIs) and reject anything else; also consider clamping limit to a reasonable maximum.
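
An allowlist plus limit clamp might be sketched like this (the type list mirrors the docstring above; the 500-row cap is an assumption, not a value from the repo):

```python
# Documented type tokens; anything else is rejected before interpolation.
ALLOWED_TYPES = {
    "schema:Thing",
    "schema:Person",
    "schema:Organization",
    "schema:SoftwareApplication",
    "schema:CreativeWork",
}

def sanitize_list_args(entity_type: str, limit: int) -> tuple[str, int]:
    """Validate list_entities arguments before building the SPARQL string."""
    if entity_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported entity_type: {entity_type!r}")
    # Clamp limit to an assumed maximum of 500 rows.
    return entity_type, max(1, min(int(limit), 500))
```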

Comment on lines +175 to +182
```python
escaped = entity_name.replace('"', '\\"')
query = f"""
    PREFIX schema: <https://schema.org/>
    PREFIX kb: <https://example.com/vocab/kb#>
    SELECT ?entity ?type ?sameAs ?article ?articleTitle ?related ?relatedName WHERE {{
        ?entity schema:name ?name .
        FILTER(LCASE(STR(?name)) = LCASE("{escaped}"))
        ?entity a ?type .
```

Copilot AI Apr 5, 2026


get_entity_details only escapes double quotes in entity_name. Backslashes and newlines can still break the SPARQL string literal, and large inputs could also create very expensive queries. Escape backslashes/newlines (or use a safer literal-encoding helper) and consider applying a reasonable max length to entity_name.
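
A safer literal encoder along the suggested lines, as a sketch (the 200-character cap is an illustrative assumption; the escape set follows SPARQL's string-escape rules):

```python
def sparql_string_literal(value: str, max_len: int = 200) -> str:
    """Encode a Python string as a double-quoted SPARQL string literal.

    Escapes backslashes first, then quotes and line terminators, and
    rejects oversized inputs to bound query cost.
    """
    if len(value) > max_len:
        raise ValueError(f"value longer than {max_len} characters")
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'
```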

Comment on lines +81 to +86
```json
{
  "id": "https://example.com/id/neo4j",
  "type": "schema:SoftwareApplication",
  "schema:name": "Neo4j",
  "schema:sameAs": "https://www.wikidata.org/entity/Q7071552"
},
```

Copilot AI Apr 5, 2026


In the JSON-LD example, schema:sameAs (and other relationship/object properties like schema:mentions) are shown with plain string values. Without @type: "@id" coercion in the context (or using { "id": "..." } node references), JSON-LD processors will treat these as string literals instead of IRIs. Update the context/example to ensure these properties are encoded as IRI references.
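
One way to apply the suggested coercion, sketched as a context fragment (the surrounding context terms are assumptions about the repo's actual @context, not copied from it):

```json
{
  "@context": {
    "schema": "https://schema.org/",
    "id": "@id",
    "type": "@type",
    "schema:sameAs": { "@type": "@id" },
    "schema:mentions": { "@type": "@id" }
  }
}
```

With this context, the plain string values above expand to IRI node references rather than string literals.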
