Lint auto-fix: deterministic fixes and AI-assisted suggestions #81

## Summary

Add auto-fix capabilities to the ontology linter. Where a rule violation has a single unambiguous fix, apply it deterministically. Where multiple valid fixes exist or semantic understanding is required, use AI-assisted suggestions that the user can review and accept.

Ref: originally tracked in #6 (now closed — lint limiting addressed by CatholicOS/ontokit-web#41).

---

## API Design

### Endpoints

- `POST /api/v1/projects/{id}/lint/fix` — apply deterministic fixes for specified rule IDs (or all auto-fixable rules)
- `POST /api/v1/projects/{id}/lint/suggest` — generate AI-assisted fix suggestions for specified issues

### Request/Response

```python
from pydantic import BaseModel, Field


# Per-item entries for LintFixResult (fields still to be specified)
class LintFixApplied(BaseModel): ...


class LintFixSkipped(BaseModel): ...


class LintFixError(BaseModel): ...


# Fix request
class LintFixRequest(BaseModel):
    rule_ids: list[str] | None = None  # None = all auto-fixable rules
    issue_ids: list[str] | None = None  # specific issues to fix
    branch: str
    dry_run: bool = False  # preview changes without applying


# Fix response
class LintFixResult(BaseModel):
    fixed: list[LintFixApplied]
    skipped: list[LintFixSkipped]
    errors: list[LintFixError]


# Suggestion request
class LintSuggestRequest(BaseModel):
    issue_ids: list[str]
    branch: str
    use_reasoning: bool = False  # run OWL reasoning before generating suggestions


# Suggestion response (SuggestedFix is declared first so LintSuggestion can reference it)
class SuggestedFix(BaseModel):
    description: str
    diff_preview: str  # Turtle diff showing the change
    confidence: float = Field(ge=0.0, le=1.0)


class LintSuggestion(BaseModel):
    issue_id: str
    rule_id: str
    # True: reasoning confirmed a real violation; False: reasoning dismissed it;
    # None: reasoning was not run (use_reasoning=False)
    reasoning_validated: bool | None = None
    options: list[SuggestedFix]  # ranked by confidence
```

### Frontend Actions

All AI-assisted suggestions present the project admin with four actions:
- **Accept** — apply the suggested fix as-is
- **Edit** — modify the suggestion before applying (e.g., tweak a generated label)
- **Reject** — dismiss the suggestion, keep the lint issue open
- **Remove triple** — delete the offending triple entirely (destructive fallback)

---

## Phase 1 — Deterministic Fixes

These rules have a single correct fix that can be applied programmatically without user review.

### `duplicate-triple` (INFO)
- **Fix:** Deduplicate by removing redundant predicate-object pairs for the same subject
- **Confidence:** 100% — identical triples carry no additional meaning
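- **Implementation:** Keep the first occurrence and remove subsequent duplicates; duplicates can only exist in the serialized source, since a parsed RDF graph is a set of triples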
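A minimal sketch of the keep-first rule, assuming the fixer sees the source as `(subject, predicate, object)` string tuples in serialization order; `dedupe_triples` is a hypothetical helper, not an existing ontokit function:

```python
def dedupe_triples(triples):
    """Return (kept, removed): triples in source order, later duplicates dropped."""
    seen = set()
    kept, removed = [], []
    for triple in triples:
        if triple in seen:
            removed.append(triple)  # a repeated triple adds no meaning
        else:
            seen.add(triple)
            kept.append(triple)
    return kept, removed


ex = "http://example.org/"
rdfs_label = "http://www.w3.org/2000/01/rdf-schema#label"
source = [
    (ex + "Dog", rdfs_label, '"Dog"@en'),
    (ex + "Cat", rdfs_label, '"Cat"@en'),
    (ex + "Dog", rdfs_label, '"Dog"@en'),  # exact repeat of the first triple
]
kept, removed = dedupe_triples(source)
```

The `removed` list is the kind of data a `dry_run` request could report without committing anything.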

### `inverse-property-inconsistency` (WARNING)
- **Fix:** Add the missing symmetric inverse assertion
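- **Confidence:** 100%: `owl:inverseOf` is symmetric in OWL semantics, so `A inverseOf B` already entails `B inverseOf A`; asserting the mirror explicitly cannot change the ontology's meaning
- **Implementation:** For each `(A, owl:inverseOf, B)` triple whose mirror is missing, add `(B, owl:inverseOf, A)`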
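A sketch of the mirror check over plain string tuples; the tuple representation and `missing_inverse_triples` are illustrative assumptions:

```python
OWL_INVERSE_OF = "http://www.w3.org/2002/07/owl#inverseOf"


def missing_inverse_triples(triples):
    """Return the mirrored owl:inverseOf triples that are not yet asserted."""
    asserted = set(triples)
    missing = []
    for s, p, o in triples:
        if p != OWL_INVERSE_OF:
            continue
        mirror = (o, OWL_INVERSE_OF, s)
        if mirror not in asserted and mirror not in missing:
            missing.append(mirror)
    return missing


ex = "http://example.org/"
graph = [
    (ex + "hasPart", OWL_INVERSE_OF, ex + "partOf"),
    (ex + "hasParent", OWL_INVERSE_OF, ex + "hasChild"),
    (ex + "hasChild", OWL_INVERSE_OF, ex + "hasParent"),  # already mirrored
]
to_add = missing_inverse_triples(graph)
```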

### `undefined-prefix` (ERROR)
- **Fix:** Register the undefined prefix in namespace bindings
- **Confidence:** 90%: common prefixes (rdfs, owl, skos, dc, dcterms, foaf, schema, etc.) resolve from a well-known registry; unknown prefixes are never guessed and are reported as skipped for user review
- **Implementation:** Maintain a lookup table of standard prefixes → namespace URIs; add binding to graph
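A sketch of the registry lookup, assuming a small hard-coded table (the namespace IRIs are the standard ones for each vocabulary); `resolve_prefixes` is hypothetical. Anything outside the table lands in `unresolved`, which maps naturally onto the `skipped` list of `LintFixResult`:

```python
# Well-known prefixes and their namespace IRIs (extend as needed)
WELL_KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "https://schema.org/",  # http://schema.org/ is also in use
}


def resolve_prefixes(undefined):
    """Split undefined prefixes into auto-fixable bindings and ones needing review."""
    bindings, unresolved = {}, []
    for prefix in undefined:
        namespace = WELL_KNOWN_PREFIXES.get(prefix)
        if namespace is None:
            unresolved.append(prefix)  # never guess: leave for the user
        else:
            bindings[prefix] = namespace
    return bindings, unresolved


bindings, unresolved = resolve_prefixes(["skos", "foaf", "myorg"])
```

If the service holds the ontology in an rdflib `Graph`, each resolved entry could then be applied with `graph.bind(prefix, namespace)`.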

---

## Phase 2 — AI-Assisted Suggestions

These rules require semantic understanding to produce meaningful fixes. The API returns ranked suggestions for the admin to **Accept / Edit / Reject / Remove triple**.

### `undefined-parent` (ERROR)
- **Approach:** An undefined parent likely indicates a typo, a missing import, or a renamed class — not that the relationship is unwanted. Use embedding similarity search to find the closest matching class IRI in the ontology.
- **When a candidate exists:** Present the top candidate(s) ranked by similarity score with actions: **Accept** (replace the undefined parent IRI with the candidate) / **Reject** (keep the issue open for manual fix) / **Remove triple**
- **When no candidate exists:** Present only **Remove triple** or **Reject** (manual fix) — no suggestion to offer
- **Implementation:** Embed the undefined parent IRI (and its local name), query the project's indexed entity embeddings for nearest neighbors, apply a similarity threshold to filter noise
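An in-memory sketch of the nearest-neighbour step, assuming embeddings are already computed; the toy 3-dimensional vectors, the 0.8 threshold, and `parent_candidates` are placeholders. In production the same ranking would come from a pgvector query ordered by its `<=>` cosine-distance operator:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def parent_candidates(query_vec, entity_vecs, threshold=0.8, top_k=3):
    """Rank known class IRIs by similarity, dropping matches below `threshold`."""
    scored = [(iri, cosine(query_vec, vec)) for iri, vec in entity_vecs.items()]
    scored = [item for item in scored if item[1] >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors; real ones come from the project's embedding model
entities = {
    "http://example.org/Animal": [0.9, 0.1, 0.0],
    "http://example.org/Vehicle": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of the typo "ex:Animl"
candidates = parent_candidates(query, entities)
```

An empty result corresponds to the "no candidate" case, where only **Remove triple** or **Reject** is offered.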

### `empty-label` (WARNING)
- **Approach:** An empty label signals intent to label the entity — generate a suggested label rather than silently deleting the triple
- **Deterministic fallback:** Split IRI local name by case boundaries → `"givenName"` → `"Given Name"`
- **AI enhancement:** Generate context-aware label considering parent class, domain, and sibling labels
- **Remove triple option:** Available as explicit fallback if no label is appropriate

### `missing-label` (WARNING)
- **Approach:** Parse the local name from the IRI (camelCase/snake_case splitting), then optionally use an LLM to refine it into a natural-language label
- **Deterministic fallback:** Split IRI local name by case boundaries → `"givenName"` → `"Given Name"`
- **AI enhancement:** Generate context-aware label considering parent class, domain, and sibling labels
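A standard-library sketch of the deterministic fallback shared by `empty-label`, `missing-label`, and `missing-english-label`; the function names are hypothetical, and the regex covers camelCase, PascalCase with acronyms, and snake_case/kebab-case, not every naming convention:

```python
import re


def local_name(iri):
    """Return the part of an IRI (or CURIE) after the last '#', '/' or ':'."""
    return re.split(r"[#/:]", iri)[-1]


def label_from_local_name(name):
    """Turn camelCase, PascalCase, snake_case or kebab-case into a Title Case label."""
    # Break before a capital that follows a lowercase letter or digit ("givenName"),
    # and before the last capital of an acronym run ("HTTPRequest").
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ", name)
    words = [w for w in re.split(r"[\s_\-]+", spaced) if w]
    # Upper-case only the first letter so acronyms such as "HTTP" stay intact
    return " ".join(w[0].upper() + w[1:] for w in words)


label = label_from_local_name(local_name("http://xmlns.com/foaf/0.1/givenName"))
# "Given Name"
```

Results such as "Date Of Birth" from `dateOfBirth` show where the optional LLM refinement adds value.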

### `missing-comment` (INFO)
- **Approach:** Generate a description from the class's position in the hierarchy, its properties, and its labels
- **AI only:** No reliable deterministic approach — requires natural language generation
- **Context input:** Class labels, parent classes, properties with domain/range, sibling classes
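A sketch of how the listed context inputs might be packed into a prompt; `ClassContext`, `build_comment_prompt`, and the prompt wording are hypothetical, and no particular LLM client is assumed:

```python
from dataclasses import dataclass, field


@dataclass
class ClassContext:
    """Hypothetical bundle of the context inputs for one class."""
    labels: list[str]
    parents: list[str]
    properties: list[tuple[str, str, str]] = field(default_factory=list)  # (property, domain, range)
    siblings: list[str] = field(default_factory=list)


def build_comment_prompt(ctx: ClassContext) -> str:
    """Render the context as a plain-text prompt for the configured LLM."""
    lines = [
        "Write a one-sentence rdfs:comment for the OWL class below.",
        f"Labels: {', '.join(ctx.labels)}",
        f"Parent classes: {', '.join(ctx.parents) or 'none'}",
    ]
    for prop, domain, rng in ctx.properties:
        lines.append(f"Property {prop}: domain {domain}, range {rng}")
    if ctx.siblings:
        lines.append(f"Sibling classes: {', '.join(ctx.siblings)}")
    return "\n".join(lines)


prompt = build_comment_prompt(
    ClassContext(
        labels=["Dog"],
        parents=["Mammal"],
        properties=[("hasOwner", "Dog", "Person")],
        siblings=["Cat", "Horse"],
    )
)
```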

### `missing-english-label` (WARNING)
- **Approach:** Translate existing labels from other languages, or generate from IRI
- **Deterministic fallback:** IRI local name splitting (same as `missing-label`)
- **AI enhancement:** Translate existing `@fr`, `@de`, etc. labels to English

### `orphan-class` (WARNING)
- **Approach:** Suggest potential parent classes based on label similarity, namespace grouping, and property overlap
- **AI only:** Requires understanding ontology structure to suggest meaningful placement
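- **Options presented:** Top 3–5 candidate parents ranked by semantic similarity, plus "make subclass of owl:Thing" as an explicit choice (semantically a no-op, since every class is already a subclass of `owl:Thing`, but it records that the class is intentionally top-level)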
- **🦉 owlready2 enhancement:** After reasoning, inferred subsumptions may reveal that the class has inferred parents — if so, the issue is a false positive and is auto-dismissed (see [owlready2 integration](#owlready2-enhanced-detection) below)
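A sketch of the candidate ranking, with `difflib` string similarity standing in for the semantic similarity the plan calls for; the 0.4/0.2/0.4 weights and all names are placeholders to be tuned:

```python
from difflib import SequenceMatcher

OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def namespace(iri):
    """Everything up to and including the last '#' or '/'."""
    return iri[: max(iri.rfind("#"), iri.rfind("/")) + 1]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_parents(orphan, candidates, labels, props, top_k=5):
    """Rank candidate parents for `orphan`; owl:Thing is always appended last."""
    scored = []
    for cand in candidates:
        label_sim = SequenceMatcher(
            None, labels[orphan].lower(), labels[cand].lower()
        ).ratio()
        same_ns = 1.0 if namespace(cand) == namespace(orphan) else 0.0
        overlap = jaccard(props.get(orphan, set()), props.get(cand, set()))
        # Placeholder weights: label similarity, namespace grouping, property overlap
        scored.append((cand, 0.4 * label_sim + 0.2 * same_ns + 0.4 * overlap))
    scored.sort(key=lambda item: item[1], reverse=True)
    # owl:Thing carries no score: it is the explicit "top-level on purpose" choice
    return scored[:top_k] + [(OWL_THING, None)]


ex = "http://example.org/"
foaf_person = "http://xmlns.com/foaf/0.1/Person"
labels = {ex + "SportsCar": "Sports Car", ex + "Car": "Car", foaf_person: "Person"}
props = {
    ex + "SportsCar": {"hasEngine", "hasWheel"},
    ex + "Car": {"hasEngine", "hasWheel", "hasDoor"},
    foaf_person: {"hasName"},
}
ranked = rank_parents(ex + "SportsCar", [foaf_person, ex + "Car"], labels, props)
```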

### `duplicate-label` (WARNING)
- **Approach:** Suggest disambiguated labels based on each class's position in the hierarchy
- **AI enhancement:** Generate distinguishing qualifiers (e.g., "Title (Book)" vs "Title (Publication)")
- **Options:** Rename each duplicate with a qualifying suffix, or flag for manual review
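A deterministic baseline for the qualifier idea, assuming the parent's label is used as the qualifier; `disambiguate` is hypothetical. When duplicates share a parent, the qualifier cannot separate them, so those entities fall through to manual review:

```python
from collections import defaultdict


def disambiguate(labels, parent_label):
    """Propose "Label (Parent)" for entities sharing a label in the same language.

    `labels` maps entity -> (text, lang); `parent_label` maps entity -> the label
    of its direct parent. Returns (proposals, needs_review).
    """
    groups = defaultdict(list)
    for entity, (text, lang) in labels.items():
        groups[(text.casefold(), lang)].append(entity)
    proposals, needs_review = {}, []
    for members in groups.values():
        if len(members) < 2:
            continue  # already unique within its language
        qualified = {e: f"{labels[e][0]} ({parent_label[e]})" for e in members}
        if len(set(qualified.values())) < len(qualified):
            needs_review.extend(members)  # same parent: the qualifier does not help
        else:
            proposals.update(qualified)
    return proposals, needs_review


proposals, needs_review = disambiguate(
    {"ex:BookTitle": ("Title", "en"), "ex:PubTitle": ("Title", "en"), "ex:Author": ("Author", "en")},
    {"ex:BookTitle": "Book", "ex:PubTitle": "Publication", "ex:Author": "Agent"},
)
```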

### `label-per-language` (ERROR)
- **Approach:** Present conflicting labels and suggest which to keep based on usage context
- **AI enhancement:** Assess which label better describes the entity given its properties and hierarchy
- **Options:** Keep one of the conflicting labels (user picks) and remove the others, or merge them into a single preferred label
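A sketch of conflict detection, assuming the rule means at most one `rdfs:label` per language tag (the same constraint SKOS places on `skos:prefLabel`); `label_conflicts` is hypothetical:

```python
from collections import defaultdict


def label_conflicts(labels):
    """Return {lang: [texts]} for every language tag carrying more than one label.

    `labels` holds (text, lang) pairs for one entity; lang is None when untagged.
    Language tags compare case-insensitively, so they are lower-cased first.
    """
    by_lang = defaultdict(list)
    for text, lang in labels:
        by_lang[lang.lower() if lang else None].append(text)
    return {lang: texts for lang, texts in by_lang.items() if len(texts) > 1}


conflicts = label_conflicts([("Dog", "en"), ("Hound", "EN"), ("Chien", "fr")])
# One "keep this label" option per conflicting text
options = [
    f"Keep {text!r} as the only @{lang} label"
    for lang, texts in conflicts.items()
    for text in texts
]
```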

### `circular-hierarchy` (ERROR)
- **Approach:** Identify the cycle, suggest which edge to remove based on hierarchy depth and class relationships
- **AI enhancement:** Analyze which subclass relationship is likely erroneous based on naming patterns and broader ontology structure
- **Options:** One removal option per edge in the cycle; the user picks the edge whose removal breaks the cycle most logically
- **🦉 owlready2 enhancement:** A pure `rdfs:subClassOf` cycle is not unsatisfiable in OWL: the reasoner infers that every class in the cycle is equivalent to the others. When the cycle interacts with other axioms (e.g., disjointness), the affected classes become unsatisfiable, and after reasoning `my_class.equivalent_to` containing `owl:Nothing` identifies them, helping pinpoint the problematic edge (see [owlready2 integration](#owlready2-enhanced-detection) below)
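A sketch of the structural step (finding one cycle and turning each of its edges into a removal option) using depth-first search with white/grey/black colouring; `find_cycle` and the dict-of-sets input are assumptions, and the recursion would need to become iterative for very deep hierarchies:

```python
def find_cycle(subclass_of):
    """Return the edges of one rdfs:subClassOf cycle, or [] if there is none.

    `subclass_of` maps each class to the set of its asserted direct parents;
    an edge (child, parent) means "child rdfs:subClassOf parent".
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for parent in sorted(subclass_of.get(node, ())):
            state = color.get(parent, WHITE)
            if state == GREY:  # back edge closes a cycle on the current path
                loop = path[path.index(parent):] + [parent]
                return list(zip(loop, loop[1:]))
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return []

    for start in sorted(subclass_of):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return []


cycle = find_cycle({"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}})
# One removal option per edge in the cycle
options = [f"Remove {child} rdfs:subClassOf {parent}" for child, parent in cycle]
```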

### `domain-violation` / `range-violation` (WARNING)
- **Approach:** Suggest either (a) adding the expected type to the subject/object, (b) widening the domain/range declaration, or (c) using a different property
- **AI enhancement:** Determine which fix aligns with the ontology's design intent
- **Options:** Multiple fix strategies ranked by invasiveness (least disruptive first)
- **🦉 owlready2 enhancement:** Current checks are purely structural (asserted types only). After reasoning, inferred types through subsumption may satisfy the domain/range constraint — these violations are false positives and are auto-dismissed (see [owlready2 integration](#owlready2-enhanced-detection) below)
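A sketch of why asserted-types-only checks over-report, using just the transitive `rdfs:subClassOf` closure (a small subset of what HermiT infers) on the `Dog`/`Animal` case; all names are illustrative:

```python
def superclasses(cls, subclass_of):
    """All transitive superclasses of `cls` (RDFS-style subclass closure)."""
    seen, todo = set(), [cls]
    while todo:
        for parent in subclass_of.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def violates_domain(subject_types, domain, subclass_of, use_inference):
    """True when no asserted (or, with inference, inherited) type matches the domain."""
    types = set(subject_types)
    if use_inference:
        for t in subject_types:
            types |= superclasses(t, subclass_of)
    return domain not in types


hierarchy = {"Dog": {"Mammal"}, "Mammal": {"Animal"}}
structural = violates_domain({"Dog"}, "Animal", hierarchy, use_inference=False)
inferred = violates_domain({"Dog"}, "Animal", hierarchy, use_inference=True)
```

Here `structural` is `True` (flagged) and `inferred` is `False`, which is the case the plan would report with `reasoning_validated: false`.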

### `cardinality-violation` (ERROR)
- **Approach:** Present excess values and suggest which to remove, or suggest relaxing the constraint
- **AI enhancement:** Assess which values are likely duplicates or errors based on content similarity
- **Options:** One removal option per excess value (user picks), or modify the cardinality constraint
- **🦉 owlready2 enhancement:** Reasoning considers inferred properties and equivalences — a cardinality "violation" may be satisfied by inferred values, or new violations may surface from inferred triples. Post-reasoning validation gives more accurate violation counts (see [owlready2 integration](#owlready2-enhanced-detection) below)
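A closed-world sketch of the max-cardinality check the structural linter performs, with `difflib` standing in for "content similarity" when ordering removal candidates; the names and the tuple input are hypothetical:

```python
from collections import defaultdict
from difflib import SequenceMatcher


def closest_ratio(value, values):
    """Highest string similarity between `value` and any other value."""
    return max(
        (SequenceMatcher(None, value, other).ratio() for other in values if other != value),
        default=0.0,
    )


def excess_values(triples, prop, max_card):
    """Map each subject over `max_card` values of `prop` to its values, likeliest duplicates first."""
    values = defaultdict(list)
    for s, p, o in triples:
        if p == prop and o not in values[s]:
            values[s].append(o)
    return {
        s: sorted(vals, key=lambda v: closest_ratio(v, vals), reverse=True)
        for s, vals in values.items()
        if len(vals) > max_card
    }


report = excess_values(
    [
        ("ex:p1", "ex:hasName", "Jon Smith"),
        ("ex:p1", "ex:hasName", "John Smith"),
        ("ex:p1", "ex:hasName", "Mary Jones"),
        ("ex:p2", "ex:hasName", "Ada Lovelace"),
    ],
    prop="ex:hasName",
    max_card=1,
)
```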

### `disjoint-violation` (ERROR)
- **Approach:** Present conflicting types and suggest which to remove
- **AI enhancement:** Determine which type assignment is likely erroneous based on the entity's other properties
- **Options:** One removal option per conflicting type (user picks), or relax the disjointness axiom
- **🦉 owlready2 enhancement:** `sync_reasoner()` + `inconsistent_classes()` catches indirect disjointness through class expressions (e.g., inherited disjointness axioms) that structural checks miss, and the inferred hierarchy helps show *why* classes are disjoint, producing more informative fix suggestions. An individual asserted into two disjoint classes makes the whole ontology inconsistent, in which case owlready2 raises `OwlReadyInconsistentOntologyError` instead of returning inferences (see [owlready2 integration](#owlready2-enhanced-detection) below)
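A structural sketch of direct and inherited disjointness clashes for one individual; the returned axiom is the "why" an admin would see. `disjoint_clashes`, `ancestors_or_self`, and the plain-string inputs are illustrative assumptions:

```python
from itertools import combinations


def ancestors_or_self(cls, subclass_of):
    """`cls` plus all of its transitive superclasses."""
    seen, todo = {cls}, [cls]
    while todo:
        for parent in subclass_of.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def disjoint_clashes(types, disjoint_pairs, subclass_of):
    """Return (type_a, type_b, axiom) for every pair of asserted types that clash.

    The clash may be direct or inherited from a disjointness axiom declared on
    superclasses.
    """
    clashes = []
    for a, b in combinations(sorted(types), 2):
        up_a = ancestors_or_self(a, subclass_of)
        up_b = ancestors_or_self(b, subclass_of)
        for x, y in disjoint_pairs:
            if (x in up_a and y in up_b) or (y in up_a and x in up_b):
                clashes.append((a, b, (x, y)))
    return clashes


hierarchy = {"Dog": {"Animal"}, "Sedan": {"Vehicle"}}
disjoint = [("Animal", "Vehicle")]
clashes = disjoint_clashes({"Dog", "Sedan"}, disjoint, hierarchy)
```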

---

## 🦉 owlready2-Enhanced Detection

Five lint rules benefit from OWL DL reasoning (HermiT/Pellet) via the owlready2 integration plan ([`ontokit-web/docs/plans/owlready2-integration.md`](https://github.com/CatholicOS/ontokit-web/blob/dev/docs/plans/owlready2-integration.md)). When reasoning is available (Phase 2+ of that plan), the auto-fix service can optionally run a reasoning pass before generating suggestions. This has two effects:

### 1. False positive elimination

Current lint checks are structural — they only consider asserted triples. OWL reasoning infers implicit facts (subsumptions, type memberships, equivalences) that may satisfy constraints the structural checker flagged as violations.

| Rule | How reasoning eliminates false positives |
|------|------------------------------------------|
| `domain-violation` / `range-violation` | A subject may satisfy the domain via an inferred parent class (e.g., `Dog` asserted, `Animal` inferred — satisfies `domain: Animal`) |
| `cardinality-violation` | Inferred property values from equivalences or property chains may satisfy minimum cardinality constraints |
| `orphan-class` | Inferred subsumptions may reveal that a class has inferred parents, making it not truly orphaned |

When reasoning dismisses a violation, the suggestion response marks `reasoning_validated: false` and the frontend can auto-dismiss or visually deprioritize these issues.

### 2. More accurate fix suggestions

For violations that survive reasoning (confirmed real problems), the reasoner provides richer context for generating better suggestions:

| Rule | How reasoning improves suggestions |
|------|-------------------------------------|
| `circular-hierarchy` | `sync_reasoner()` infers that classes in a pure subclass cycle are equivalent to one another; where the cycle combines with other axioms (e.g., disjointness), it identifies which classes become equivalent to `owl:Nothing` (unsatisfiable), pinpointing the problematic edge rather than presenting all edges as equally suspect |
| `disjoint-violation` | `inconsistent_classes()` catches indirect disjointness through class expressions (e.g., inherited axioms). The inferred hierarchy helps show *why* classes are disjoint, which informs which type to suggest removing |
| `domain-violation` / `range-violation` | Post-reasoning, the full inferred type hierarchy is available — suggestions can reference inferred types when recommending which type assertion to add |

### Integration approach

- The `LintSuggestRequest` accepts an optional `use_reasoning: bool` flag (default `false`)
- When `true`, the suggest endpoint loads the ontology into an `EphemeralWorld` (owlready2 bridge), runs `sync_reasoner_hermit()`, then re-evaluates the flagged issues against the inferred graph
- Issues that no longer hold post-reasoning are returned with `reasoning_validated: false`
- Issues that still hold are returned with `reasoning_validated: true` and enhanced context from the reasoner
- This is an **optional enhancement** — auto-fix works without reasoning, but produces better results with it
- Requires owlready2 bridge layer (Phase 1) and reasoning task infrastructure (Phase 2) from the [owlready2 integration plan](https://github.com/CatholicOS/ontokit-web/blob/dev/docs/plans/owlready2-integration.md)

### Rules unaffected by reasoning

The remaining 10 rules operate at the annotation/syntactic level and gain no benefit from OWL reasoning:

- `duplicate-triple` — syntactic deduplication
- `inverse-property-inconsistency` — structural symmetry check
- `undefined-prefix` — namespace binding lookup
- `undefined-parent` — IRI existence check (embedding search, not reasoning)
- `empty-label` / `missing-label` / `missing-comment` / `missing-english-label` — annotation generation
- `duplicate-label` / `label-per-language` — annotation conflict resolution

---

## Implementation Notes

- **Worker integration:** Both fix and suggest endpoints should dispatch to ARQ worker tasks (consistent with the existing lint and normalization tasks)
- **Atomicity:** Deterministic fixes should be applied as a single git commit with a descriptive message
- **Undo:** Each fix commit can be reverted via existing git infrastructure
- **AI provider:** Use the project's configured embedding/LLM service for AI-assisted suggestions; fall back to deterministic heuristics if unavailable
- **Embeddings:** The `undefined-parent` and `orphan-class` fixes leverage the project's entity embeddings (pgvector) for similarity search — these must be indexed before suggestions can be generated
- **owlready2:** When `use_reasoning=true`, the suggest endpoint depends on the owlready2 bridge layer and reasoning infrastructure. The `EphemeralWorld` context manager handles SQLite lifecycle; each reasoning pass is fully isolated. Falls back gracefully to structural-only analysis if Java/reasoner is unavailable.
- **Per-project config:** Respect `project_lint_config` (#26) — only offer fixes for enabled rules
- **Frontend counterpart:** Will need UI for reviewing and accepting suggestions with Accept / Edit / Reject / Remove triple actions (separate ontokit-web issue). When `reasoning_validated` is available, the UI should visually distinguish confirmed violations from likely false positives.

## Related

- #26 — Per-project lint rule configuration (controls which rules are active)
- CatholicOS/ontokit-web#56 — Lint rule configuration UI
- CatholicOS/ontokit-web#41 — Lint display limiting (closed the original #6)
- [`ontokit-web/docs/plans/owlready2-integration.md`](https://github.com/CatholicOS/ontokit-web/blob/dev/docs/plans/owlready2-integration.md) — owlready2 integration plan (Phases 1–2 required for reasoning-enhanced detection)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
