Lint auto-fix: deterministic fixes and AI-assisted suggestions #81

@JohnRDOrazio

Summary

Add auto-fix capabilities to the ontology linter. Where a rule violation has a single unambiguous fix, apply it deterministically. Where multiple valid fixes exist or semantic understanding is required, use AI-assisted suggestions that the user can review and accept.

Ref: originally tracked in #6 (now closed — lint limiting addressed by CatholicOS/ontokit-web#41).


API Design

Endpoints

  • POST /api/v1/projects/{id}/lint/fix — apply deterministic fixes for specified rule IDs (or all auto-fixable rules)
  • POST /api/v1/projects/{id}/lint/suggest — generate AI-assisted fix suggestions for specified issues

Request/Response

from pydantic import BaseModel

# Fix request
class LintFixRequest(BaseModel):
    rule_ids: list[str] | None = None  # None = all auto-fixable rules
    issue_ids: list[str] | None = None  # specific issues to fix
    branch: str
    dry_run: bool = False  # preview changes without applying

# Fix response
# (LintFixApplied, LintFixSkipped, and LintFixError are per-item models still to be specified)
class LintFixResult(BaseModel):
    fixed: list[LintFixApplied]
    skipped: list[LintFixSkipped]
    errors: list[LintFixError]

# Suggestion request
class LintSuggestRequest(BaseModel):
    issue_ids: list[str]
    branch: str
    use_reasoning: bool = False  # run OWL reasoning before generating suggestions

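# Suggestion response
# SuggestedFix is defined first so that LintSuggestion can reference it
class SuggestedFix(BaseModel):
    description: str
    diff_preview: str  # Turtle diff showing the change
    confidence: float  # 0.0–1.0

class LintSuggestion(BaseModel):
    issue_id: str
    rule_id: str
    # True: reasoning confirmed a real violation; False: reasoning dismissed it;
    # None: reasoning was not run (use_reasoning=False or reasoner unavailable)
    reasoning_validated: bool | None = None
    options: list[SuggestedFix]  # ranked by confidence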
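
The "ranked by confidence" contract on options could be enforced server-side before the response is built. The following is a minimal sketch only: SuggestedFixSketch is a dataclass stand-in for the pydantic model above, and rank_options is a hypothetical helper, not part of the proposed API.

```python
from dataclasses import dataclass


@dataclass
class SuggestedFixSketch:
    """Stand-in for SuggestedFix, used here to avoid a pydantic dependency."""
    description: str
    diff_preview: str
    confidence: float


def rank_options(options: list[SuggestedFixSketch]) -> list[SuggestedFixSketch]:
    """Validate the 0.0-1.0 range, then order options from most to least confident."""
    for option in options:
        if not 0.0 <= option.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {option.confidence}")
    return sorted(options, key=lambda o: o.confidence, reverse=True)


ranked = rank_options([
    SuggestedFixSketch("Widen the domain declaration", "<diff>", 0.4),
    SuggestedFixSketch("Add the expected type to the subject", "<diff>", 0.9),
])
ranked_descriptions = [o.description for o in ranked]
```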

Frontend Actions

All AI-assisted suggestions present the project admin with four actions:

  • Accept — apply the suggested fix as-is
  • Edit — modify the suggestion before applying (e.g., tweak a generated label)
  • Reject — dismiss the suggestion, keep the lint issue open
  • Remove triple — delete the offending triple entirely (destructive fallback)

Phase 1 — Deterministic Fixes

These rules have a single correct fix that can be applied programmatically without user review.

duplicate-triple (INFO)

  • Fix: Deduplicate by removing redundant predicate-object pairs for the same subject
  • Confidence: 100% — identical triples carry no additional meaning
  • Implementation: Keep first occurrence, remove subsequent duplicates
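
The "keep first occurrence" rule can be sketched as follows. This assumes triples arrive as an ordered sequence of hashable tuples (for example, parsed statement by statement from the source file); the function name dedupe_keep_first is hypothetical.

```python
from typing import Hashable, Iterable, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_keep_first(triples: Iterable[T]) -> tuple[list[T], list[T]]:
    """Return (kept, removed), preserving the order of first occurrences."""
    seen: set[T] = set()
    kept: list[T] = []
    removed: list[T] = []
    for triple in triples:
        if triple in seen:
            removed.append(triple)  # a later copy of a triple already kept
        else:
            seen.add(triple)
            kept.append(triple)
    return kept, removed


triples = [
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Dog", "rdfs:label", '"Dog"@en'),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),  # duplicate of the first triple
]
kept, removed = dedupe_keep_first(triples)
```

Returning the removed triples alongside the kept ones would let a dry run report exactly what the fix deletes.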

inverse-property-inconsistency (WARNING)

  • Fix: Add the missing symmetric inverse assertion
  • Confidence: 100%, because owl:inverseOf is symmetric: if A inverseOf B is asserted, B inverseOf A is entailed and can safely be stated explicitly
  • Implementation: For each missing inverse, add (object, inverseProperty, subject) triple
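
One reading of this rule is that it targets the owl:inverseOf declarations themselves. Under that assumption, the missing symmetric declarations could be computed as below; the triples are plain string tuples and missing_inverse_declarations is a hypothetical name.

```python
INVERSE_OF = "owl:inverseOf"


def missing_inverse_declarations(
    triples: set[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    """List the (B, owl:inverseOf, A) triples absent for each asserted (A, owl:inverseOf, B)."""
    declared = {(s, o) for s, p, o in triples if p == INVERSE_OF}
    return sorted(
        (o, INVERSE_OF, s) for s, o in declared if (o, s) not in declared
    )


graph = {
    ("ex:hasParent", INVERSE_OF, "ex:hasChild"),  # reverse declaration is missing
    ("ex:knows", INVERSE_OF, "ex:knows"),         # self-inverse, already symmetric
    ("ex:hasParent", "rdfs:label", '"has parent"@en'),
}
to_add = missing_inverse_declarations(graph)
```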

undefined-prefix (ERROR)

  • Fix: Register the undefined prefix in namespace bindings
  • Confidence: 90% — can resolve common prefixes (rdfs, owl, skos, dc, dcterms, foaf, schema, etc.) from a well-known registry; flag unknown prefixes for user review
  • Implementation: Maintain a lookup table of standard prefixes → namespace URIs; add binding to graph
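
A minimal sketch of the lookup table, assuming a small hard-coded registry. The table contents and the resolve_prefixes helper are illustrative; a real registry would likely be larger, and the schema entry uses the https form although the http form is also in circulation.

```python
# Standard prefix -> namespace URI (illustrative subset)
WELL_KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "https://schema.org/",  # assumption: https variant chosen
}


def resolve_prefixes(undefined: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split undefined prefixes into auto-resolvable bindings and ones needing review."""
    resolved: dict[str, str] = {}
    unknown: list[str] = []
    for prefix in undefined:
        uri = WELL_KNOWN_PREFIXES.get(prefix)
        if uri is None:
            unknown.append(prefix)  # flag for user review, do not guess
        else:
            resolved[prefix] = uri
    return resolved, unknown


resolved, unknown = resolve_prefixes(["skos", "myproj", "foaf"])
```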

Phase 2 — AI-Assisted Suggestions

These rules require semantic understanding to produce meaningful fixes. The API returns ranked suggestions for the admin to Accept / Edit / Reject / Remove triple.

undefined-parent (ERROR)

  • Approach: An undefined parent likely indicates a typo, a missing import, or a renamed class — not that the relationship is unwanted. Use embedding similarity search to find the closest matching class IRI in the ontology.
  • When a candidate exists: Present the top candidate(s) ranked by similarity score with actions: Accept (replace the undefined parent IRI with the candidate) / Reject (keep the issue open for manual fix) / Remove triple
  • When no candidate exists: Present only Remove triple or Reject (manual fix) — no suggestion to offer
  • Implementation: Embed the undefined parent IRI (and its local name), query the project's indexed entity embeddings for nearest neighbors, apply a similarity threshold to filter noise
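
The nearest-neighbor step with a similarity threshold might look like the sketch below. It uses in-memory cosine similarity over toy two-dimensional vectors purely for illustration; in the proposed design the search would run against the project's indexed embeddings, and the threshold of 0.8 and the function names are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_classes(
    query_vec: list[float],
    indexed: dict[str, list[float]],
    threshold: float = 0.8,  # assumed cut-off for filtering noise
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_k (iri, score) pairs at or above the threshold, best first."""
    scored = [(iri, cosine(query_vec, vec)) for iri, vec in indexed.items()]
    hits = [(iri, score) for iri, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]


indexed = {"ex:Person": [1.0, 0.0], "ex:Organization": [0.0, 1.0]}
candidates = nearest_classes([0.9, 0.1], indexed)   # e.g. embedding of "ex:Persn"
no_candidates = nearest_classes([0.0, 0.0], indexed)
```

An empty result corresponds to the "no candidate exists" case, where only Remove triple or Reject would be offered.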

empty-label (WARNING)

  • Approach: An empty label signals intent to label the entity — generate a suggested label rather than silently deleting the triple
  • Deterministic fallback: Split the IRI local name at case boundaries, e.g. "givenName" → "Given Name"
  • AI enhancement: Generate context-aware label considering parent class, domain, and sibling labels
  • Remove triple option: Available as explicit fallback if no label is appropriate

missing-label (WARNING)

  • Approach: Parse local name from IRI (camelCase/snake_case splitting), then optionally use LLM to refine into a natural-language label
  • Deterministic fallback: Split the IRI local name at case boundaries, e.g. "givenName" → "Given Name"
  • AI enhancement: Generate context-aware label considering parent class, domain, and sibling labels
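
The deterministic fallback shared by empty-label and missing-label can be sketched with a few regular expressions. The helper names are hypothetical, and the handling of acronyms (kept upper-case) is an assumption rather than something this issue specifies.

```python
import re


def local_name(iri: str) -> str:
    """Return the part of the IRI after the last '#' or '/'."""
    return re.split(r"[#/]", iri.rstrip("#/"))[-1]


def label_from_iri(iri: str) -> str:
    """Derive a title-cased label from a camelCase or snake_case local name."""
    name = local_name(iri)
    name = re.sub(r"[_\-]+", " ", name)                    # snake_case / kebab-case
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)    # camelCase boundary
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", name)  # "HTTPRequest" -> "HTTP Request"
    # Keep acronyms as-is, capitalize everything else
    return " ".join(w if w.isupper() else w.capitalize() for w in name.split())


camel = label_from_iri("http://example.org/onto#givenName")
snake = label_from_iri("http://example.org/onto/postal_code")
acronym = label_from_iri("http://example.org/onto#HTTPRequest")
```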

missing-comment (INFO)

  • Approach: Generate a description from the class's position in the hierarchy, its properties, and its labels
  • AI only: No reliable deterministic approach — requires natural language generation
  • Context input: Class labels, parent classes, properties with domain/range, sibling classes

missing-english-label (WARNING)

  • Approach: Translate existing labels from other languages, or generate from IRI
  • Deterministic fallback: IRI local name splitting (same as missing-label)
  • AI enhancement: Translate existing @fr, @de, etc. labels to English

orphan-class (WARNING)

  • Approach: Suggest potential parent classes based on label similarity, namespace grouping, and property overlap
  • AI only: Requires understanding ontology structure to suggest meaningful placement
  • Options presented: Top 3–5 candidate parents ranked by semantic similarity, plus "make subclass of owl:Thing" as explicit choice
  • 🦉 owlready2 enhancement: After reasoning, inferred subsumptions may reveal that the class has inferred parents — if so, the issue is a false positive and is auto-dismissed (see owlready2 integration below)

duplicate-label (WARNING)

  • Approach: Suggest disambiguated labels based on each class's position in the hierarchy
  • AI enhancement: Generate distinguishing qualifiers (e.g., "Title (Book)" vs "Title (Publication)")
  • Options: Rename each duplicate with a qualifying suffix, or flag for manual review
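
A non-AI baseline for the qualifying-suffix option could use each entity's parent label as the qualifier. This is a sketch under that assumption; disambiguate and its two input mappings are hypothetical, and entities without a parent label are simply left for manual review.

```python
from collections import defaultdict


def disambiguate(
    labels: dict[str, str],         # entity IRI -> current label
    parent_labels: dict[str, str],  # entity IRI -> label of its parent or domain class
) -> dict[str, str]:
    """Suggest 'Label (Parent)' for every entity whose label is shared with another."""
    by_label: dict[str, list[str]] = defaultdict(list)
    for iri, label in labels.items():
        by_label[label].append(iri)

    suggestions: dict[str, str] = {}
    for label, iris in by_label.items():
        if len(iris) < 2:
            continue  # label is unique, nothing to fix
        for iri in iris:
            parent = parent_labels.get(iri)
            if parent:
                suggestions[iri] = f"{label} ({parent})"
    return suggestions


suggested = disambiguate(
    {"ex:bookTitle": "Title", "ex:pubTitle": "Title", "ex:author": "Author"},
    {"ex:bookTitle": "Book", "ex:pubTitle": "Publication"},
)
```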

label-per-language (ERROR)

  • Approach: Present conflicting labels and suggest which to keep based on usage context
  • AI enhancement: Assess which label better describes the entity given its properties and hierarchy
  • Options: Keep each conflicting label (user picks), or merge into a single preferred label

circular-hierarchy (ERROR)

  • Approach: Identify the cycle, suggest which edge to remove based on hierarchy depth and class relationships
  • AI enhancement: Analyze which subclass relationship is likely erroneous based on naming patterns and broader ontology structure
  • Options: Remove each edge in the cycle (user picks which breaks the cycle most logically)
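  • 🦉 owlready2 enhancement: A subclass cycle on its own makes the classes in the cycle mutually equivalent rather than unsatisfiable. Where the cycle also conflicts with other axioms (e.g., disjointness), the reasoner reports the affected classes as unsatisfiable: after reasoning, owl:Nothing appearing in my_class.equivalent_to confirms which classes in the cycle are affected, helping pinpoint the problematic edge (see owlready2 integration below)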
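
Identifying the cycle is a structural step that needs no reasoner. A depth-first search over the asserted (child, parent) subclass edges is one way to list the candidate edges to offer for removal; find_cycle is a hypothetical name, and the recursive form is a sketch that may need an iterative rewrite for very deep hierarchies.

```python
def find_cycle(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the edges of one subclass cycle, or [] if the hierarchy is acyclic."""
    graph: dict[str, list[str]] = {}
    for child, parent in edges:
        graph.setdefault(child, []).append(parent)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / fully explored
    state: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> list[tuple[str, str]]:
        state[node] = GREY
        path.append(node)
        for parent in graph.get(node, []):
            if state.get(parent, WHITE) == GREY:
                # Back edge: the cycle runs from `parent` along the path and back
                nodes = path[path.index(parent):] + [parent]
                return list(zip(nodes, nodes[1:]))
            if state.get(parent, WHITE) == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return []

    for start in list(graph):
        if state.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return []


cycle = find_cycle([
    ("ex:A", "ex:B"), ("ex:B", "ex:C"), ("ex:C", "ex:A"), ("ex:D", "ex:A"),
])
acyclic = find_cycle([("ex:A", "ex:B"), ("ex:B", "ex:C")])
```

Each returned edge maps to one "remove this subclass relationship" option.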

domain-violation / range-violation (WARNING)

  • Approach: Suggest either (a) adding the expected type to the subject/object, (b) widening the domain/range declaration, or (c) using a different property
  • AI enhancement: Determine which fix aligns with the ontology's design intent
  • Options: Multiple fix strategies ranked by invasiveness (least disruptive first)
  • 🦉 owlready2 enhancement: Current checks are purely structural (asserted types only). After reasoning, inferred types through subsumption may satisfy the domain/range constraint — these violations are false positives and are auto-dismissed (see owlready2 integration below)

cardinality-violation (ERROR)

  • Approach: Present excess values and suggest which to remove, or suggest relaxing the constraint
  • AI enhancement: Assess which values are likely duplicates or errors based on content similarity
  • Options: Remove each excess value (user picks), or modify cardinality constraint
  • 🦉 owlready2 enhancement: Reasoning considers inferred properties and equivalences — a cardinality "violation" may be satisfied by inferred values, or new violations may surface from inferred triples. Post-reasoning validation gives more accurate violation counts (see owlready2 integration below)

disjoint-violation (ERROR)

  • Approach: Present conflicting types and suggest which to remove
  • AI enhancement: Determine which type assignment is likely erroneous based on the entity's other properties
  • Options: Remove each conflicting type (user picks), or relax disjointness axiom
  • 🦉 owlready2 enhancement: sync_reasoner() + inconsistent_classes() catches indirect disjointness through class expressions that structural checks miss. The reasoner can explain why classes are disjoint (e.g., through inherited disjointness axioms), producing more informative fix suggestions (see owlready2 integration below)

🦉 owlready2-Enhanced Detection

Five lint rules (counting domain-violation / range-violation as one) benefit from OWL DL reasoning (HermiT/Pellet) via the owlready2 integration plan (ontokit-web/docs/plans/owlready2-integration.md). When reasoning is available (Phase 2+ of that plan), the auto-fix service can optionally run a reasoning pass before generating suggestions. This has two effects:

1. False positive elimination

Current lint checks are structural — they only consider asserted triples. OWL reasoning infers implicit facts (subsumptions, type memberships, equivalences) that may satisfy constraints the structural checker flagged as violations.

| Rule | How reasoning eliminates false positives |
| --- | --- |
| domain-violation / range-violation | A subject may satisfy the domain via an inferred parent class (e.g., Dog asserted, Animal inferred, which satisfies domain: Animal) |
| cardinality-violation | Inferred property values from equivalences or property chains may satisfy minimum cardinality constraints |
| orphan-class | Inferred subsumptions may reveal that a class has inferred parents, making it not truly orphaned |

When reasoning dismisses a violation, the suggestion response marks reasoning_validated: false and the frontend can auto-dismiss or visually deprioritize these issues.
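
The Dog/Animal case can be illustrated without a reasoner by closing the asserted types over the asserted subclass axioms. This is only a stand-in for what HermiT or Pellet would infer (a DL reasoner derives far more than this transitive closure), and both function names are hypothetical.

```python
def superclasses(cls: str, parents: dict[str, set[str]]) -> set[str]:
    """All ancestors of cls reachable through asserted subclass links (cycle-safe)."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def satisfies_domain(
    asserted_types: set[str], domain: str, parents: dict[str, set[str]]
) -> bool:
    """True if any asserted type is the domain class or one of its subclasses."""
    return any(
        t == domain or domain in superclasses(t, parents) for t in asserted_types
    )


parents = {"ex:Dog": {"ex:Mammal"}, "ex:Mammal": {"ex:Animal"}}
dog_ok = satisfies_domain({"ex:Dog"}, "ex:Animal", parents)    # false positive, dismissed
rock_ok = satisfies_domain({"ex:Rock"}, "ex:Animal", parents)  # violation still holds
```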

2. More accurate fix suggestions

For violations that survive reasoning (confirmed real problems), the reasoner provides richer context for generating better suggestions:

| Rule | How reasoning improves suggestions |
| --- | --- |
| circular-hierarchy | sync_reasoner() identifies which classes become equivalent to owl:Nothing (unsatisfiable) when the cycle conflicts with other axioms such as disjointness (a subclass cycle alone only makes its classes mutually equivalent), pinpointing the problematic edge rather than presenting all edges as equally suspect |
| disjoint-violation | inconsistent_classes() catches indirect disjointness through class expressions (e.g., inherited axioms). The reasoner's explanation of why classes are disjoint informs which type to suggest removing |
| domain-violation / range-violation | Post-reasoning, the full inferred type hierarchy is available; suggestions can reference inferred types when recommending which type assertion to add |

Integration approach

  • The LintSuggestRequest accepts an optional use_reasoning: bool flag (default false)
  • When true, the suggest endpoint loads the ontology into an EphemeralWorld (owlready2 bridge), runs sync_reasoner_hermit(), then re-evaluates the flagged issues against the inferred graph
  • Issues that no longer hold post-reasoning are returned with reasoning_validated: false
  • Issues that still hold are returned with reasoning_validated: true and enhanced context from the reasoner
  • This is an optional enhancement — auto-fix works without reasoning, but produces better results with it
  • Requires owlready2 bridge layer (Phase 1) and reasoning task infrastructure (Phase 2) from the owlready2 integration plan
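
The flag handling and graceful fallback described above could be structured roughly as follows. Everything here is a sketch: ReasonerUnavailable, validate_with_reasoning, and the recheck callback are hypothetical, and representing "reasoning was not run" as None is an assumption that goes beyond the two boolean outcomes listed in the bullets.

```python
from typing import Callable, Iterable, Optional


class ReasonerUnavailable(RuntimeError):
    """Hypothetical error raised when Java or the reasoner cannot be started."""


def validate_with_reasoning(
    issue_ids: Iterable[str],
    use_reasoning: bool,
    recheck: Optional[Callable[[str], bool]] = None,
) -> dict[str, Optional[bool]]:
    """Map each issue ID to its reasoning_validated value.

    True  -> issue still holds against the inferred graph
    False -> issue no longer holds post-reasoning
    None  -> reasoning was not run (assumed representation)
    """
    ids = list(issue_ids)
    if not use_reasoning or recheck is None:
        return {issue_id: None for issue_id in ids}
    try:
        return {issue_id: recheck(issue_id) for issue_id in ids}
    except ReasonerUnavailable:
        # Fall back to structural-only analysis for the whole batch
        return {issue_id: None for issue_id in ids}


still_holds = {"issue-1": True, "issue-2": False}
validated = validate_with_reasoning(still_holds, True, still_holds.__getitem__)
skipped = validate_with_reasoning(still_holds, False)


def broken_reasoner(issue_id: str) -> bool:
    raise ReasonerUnavailable("java not found")


fallback = validate_with_reasoning(still_holds, True, broken_reasoner)
```

In the real endpoint, recheck would wrap the re-evaluation of a flagged issue against the graph produced by the reasoning pass.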

Rules unaffected by reasoning

The remaining 10 rules operate at the annotation/syntactic level and gain no benefit from OWL reasoning:

  • duplicate-triple — syntactic deduplication
  • inverse-property-inconsistency — structural symmetry check
  • undefined-prefix — namespace binding lookup
  • undefined-parent — IRI existence check (embedding search, not reasoning)
  • empty-label / missing-label / missing-comment / missing-english-label — annotation generation
  • duplicate-label / label-per-language — annotation conflict resolution

Implementation Notes

  • Worker integration: Both fix and suggest endpoints should dispatch to ARQ worker tasks (consistent with lint/normalization pattern)
  • Atomicity: Deterministic fixes should be applied as a single git commit with a descriptive message
  • Undo: Each fix commit can be reverted via existing git infrastructure
  • AI provider: Use the project's configured embedding/LLM service for AI-assisted suggestions; fall back to deterministic heuristics if unavailable
  • Embeddings: The undefined-parent and orphan-class fixes leverage the project's entity embeddings (pgvector) for similarity search — these must be indexed before suggestions can be generated
  • owlready2: When use_reasoning=true, the suggest endpoint depends on the owlready2 bridge layer and reasoning infrastructure. The EphemeralWorld context manager handles SQLite lifecycle; each reasoning pass is fully isolated. Falls back gracefully to structural-only analysis if Java/reasoner is unavailable.
  • Per-project config: Respect project_lint_config (#26, Per-project lint rule configuration) and only offer fixes for enabled rules
  • Frontend counterpart: Will need UI for reviewing and accepting suggestions with Accept / Edit / Reject / Remove triple actions (separate ontokit-web issue). When reasoning_validated is available, the UI should visually distinguish confirmed violations from likely false positives.

🤖 Generated with Claude Code
