diff --git a/.gitignore b/.gitignore index 2a8975c..aba5071 100644 --- a/.gitignore +++ b/.gitignore @@ -73,3 +73,6 @@ Thumbs.db *.nt !tests/fixtures/*.ttl !tests/fixtures/*.owl + +# Local git worktrees +.worktrees/ diff --git a/.serena/.gitignore b/.serena/.gitignore new file mode 100644 index 0000000..2e510af --- /dev/null +++ b/.serena/.gitignore @@ -0,0 +1,2 @@ +/cache +/project.local.yml diff --git a/.serena/memories/code_style_and_conventions.md b/.serena/memories/code_style_and_conventions.md new file mode 100644 index 0000000..e1cdc2b --- /dev/null +++ b/.serena/memories/code_style_and_conventions.md @@ -0,0 +1,53 @@ +# Code Style & Conventions — ontokit-api + +## Formatting +- **Line length**: 100 characters +- **Formatter**: `ruff format` (enforced by pre-commit) +- **Target**: Python 3.11 + +## Linting (ruff) +Selected rule sets: +- `E`, `W` — pycodestyle errors/warnings +- `F` — Pyflakes +- `I` — isort (`known-first-party = ["ontokit"]`) +- `B` — flake8-bugbear +- `C4` — flake8-comprehensions +- `UP` — pyupgrade +- `ARG` — flake8-unused-arguments +- `SIM` — flake8-simplify + +Ignored: `E501` (line length handled by formatter, not linter). 
+ +## Type checking +- **mypy** in **strict mode** (`strict = true`) +- `warn_return_any = true`, `warn_unused_ignores = true` +- Plugin: `pydantic.mypy` +- Pyright also configured (uses `.venv`, py 3.11) + +## Pydantic conventions +- Pydantic v2 (>=2.13.3, <2.14) +- `init_forbid_extra = true`, `init_typed = true` +- Strict validation everywhere, computed fields where appropriate + +## Architectural patterns +- **Async-first**: all I/O uses async/await +- **Dependency injection**: FastAPI's `Depends()` +- **Service singletons**: obtained via `get_service_name()` dependency providers +- **UTC-aware datetimes** throughout (no naive datetimes) +- **Layered**: routes → services → models / schemas / core + +## URL versioning +The `/api/v1/` prefix is set in `main.py` router registration — do NOT recreate the version in the directory tree. + +## Git module guideline +Use `ontokit/git/bare_repository.py` (pygit2-based) for new code. +The GitPython-based `repository.py` is **deprecated** and kept only for backward compat. + +## Pre-commit +Enabled hooks: ruff (lint + format) and mypy. Installed via `make setup`. + +## Testing conventions +- Pytest with `asyncio_mode = "auto"` (no need for `@pytest.mark.asyncio`) +- `testpaths = ["tests"]` +- Default args: `-v --cov=ontokit --cov-report=term-missing` +- Layout: `tests/unit/` and `tests/integration/` diff --git a/.serena/memories/project_overview.md b/.serena/memories/project_overview.md new file mode 100644 index 0000000..57495e0 --- /dev/null +++ b/.serena/memories/project_overview.md @@ -0,0 +1,31 @@ +# OntoKit API — Project Overview + +Collaborative OWL ontology curation API built with **FastAPI** (Python 3.11+, target 3.13). Distributed as the `ontokit` package on PyPI. + +## Purpose +Provides a RESTful API for managing ontologies, semantic web knowledge graphs, and team collaboration with git-based version control. Sister project to `ontokit-web` (frontend). 
+ +## Core Capabilities +- REST endpoints for ontologies, classes, properties, individuals +- Project management (public/private visibility, member roles) +- Git-based version control with branches + PR workflow (pygit2 bare repos for concurrent access) +- 20+ ontology linting/validation rules +- Semantic search via `sentence-transformers` + `pgvector` +- Real-time collaboration over WebSockets +- Background job queue (ARQ + Redis) +- GitHub App integration for syncing remote repos + +## Tech Stack +- **Framework**: FastAPI, async-first +- **Database**: PostgreSQL 17 + SQLAlchemy 2.0 (async via asyncpg) + Alembic migrations +- **Cache/Queue**: Redis 7 + ARQ +- **Object Storage**: MinIO (S3-compatible) +- **Auth**: Zitadel (OIDC/OAuth2, JWT validation) +- **RDF**: RDFLib 7.1+, OWLReady2 +- **Git**: pygit2 (bare repos); legacy GitPython implementation deprecated +- **Validation**: Pydantic v2 (strict mode) +- **Package Mgmt**: uv + +## Repo Location + +Companion repos in same parent directory: `ontokit-web` (frontend), `folio-api`, `ontokit-api.wiki`. 
diff --git a/.serena/memories/project_structure.md b/.serena/memories/project_structure.md new file mode 100644 index 0000000..43dbd64 --- /dev/null +++ b/.serena/memories/project_structure.md @@ -0,0 +1,55 @@ +# Codebase Structure — ontokit-api + +## Top-level layout +```text +ontokit-api/ +├── ontokit/ # main package +├── tests/ # unit + integration tests +├── alembic/ # DB migrations +├── scripts/ # release, migration, setup scripts +├── data/ # data assets +├── docs/ # documentation +├── config/ # config files +├── compose.yaml # docker compose (dev) +├── compose.prod.yaml # docker compose (prod infra) +├── Dockerfile / Dockerfile.prod +├── pyproject.toml / uv.lock +├── alembic.ini +├── Makefile +├── .env.example +├── CLAUDE.md, AGENTS.md, GEMINI.md, README.md, SECURITY.md, RELEASING.md +``` + +## ontokit/ package layout (layered architecture) +```text +ontokit/ +├── api/routes/ # REST endpoints (FastAPI routers) +├── services/ # Business logic layer +├── models/ # SQLAlchemy ORM models +├── schemas/ # Pydantic v2 request/response schemas +├── core/ # Config, database, auth infrastructure +├── git/ # Git repository management (bare repos via pygit2) +├── collab/ # WebSocket real-time collaboration +├── version.py # Version mgmt (Weblate-style, with -dev/-rc suffix support) +├── runner.py # CLI entry point (`ontokit` script) +├── worker.py # ARQ background job worker +└── main.py # FastAPI app + router registration +``` + +URL prefix `/api/v1/` is registered in `main.py`, NOT in directory structure. 
+ +## Key services (ontokit/services/) +- **ontology.py** — RDF/OWL graph operations (RDFLib + OWLReady2) +- **linter.py** — 20+ ontology validation rules +- **pull_request_service.py** — git-based PR workflow with diff generation +- **github_service.py** — GitHub App integration for remote sync +- **project_service.py** — project CRUD + member management + +## Git module (ontokit/git/) +- **bare_repository.py** — `BareOntologyRepository` + `BareGitRepositoryService`; pygit2-based, no working dir, supports concurrent multi-user branch work +- **repository.py** — Legacy GitPython implementation (DEPRECATED) + +## Tests +- `tests/unit/` — unit tests +- `tests/integration/` — integration tests +- `tests/conftest.py` — shared fixtures diff --git a/.serena/memories/suggested_commands.md b/.serena/memories/suggested_commands.md new file mode 100644 index 0000000..c55461f --- /dev/null +++ b/.serena/memories/suggested_commands.md @@ -0,0 +1,82 @@ +# Suggested Commands — ontokit-api + +## First-time setup +```bash +make setup # uv sync --extra dev + pre-commit install +cp .env.example .env # then edit .env +./scripts/setup-zitadel.sh --update-env # provision Zitadel OIDC apps +``` + +## Dev server +```bash +uvicorn ontokit.main:app --reload +ontokit --reload # equivalent CLI (installed entry point) +``` + +## Docker (full stack) +```bash +docker compose up -d # full local stack +docker compose -f compose.prod.yaml up -d # infra only (hybrid mode) +docker compose exec api alembic upgrade head # migrate inside container +docker compose up -d --force-recreate api worker # restart after .env change +``` + +## Linting / Formatting / Type checking +```bash +make lint # uv run ruff check ontokit/ tests/ --fix +make format # uv run ruff format ontokit/ tests/ +make typecheck # uv run mypy ontokit/ + +# raw equivalents: +ruff check ontokit/ --fix +ruff format ontokit/ +mypy ontokit/ +``` + +## Tests +```bash +make test # full suite w/ coverage +pytest tests/ -v --cov=ontokit # 
explicit +pytest tests/unit/test_health.py -v # single file +pytest tests/ -k "test_name" -v # by keyword +``` + +## Security scan (Semgrep) +With Pro: +```bash +semgrep --pro --config p/default --config p/owasp-top-ten --config p/python --config p/fastapi --config p/jwt +``` +Without Pro: drop `--pro`. + +## DB migrations +```bash +alembic upgrade head +alembic downgrade -1 +alembic revision --autogenerate -m "description" +``` + +## Build / Publish +```bash +uv build +uv run twine check --strict dist/* +uv publish +``` + +## Release flow +```bash +python scripts/prepare-release.py # strip -dev suffix, commit +git tag -s ontokit-X.Y.Z +git push --tags # CI/CD publishes +python scripts/set-version.py X.Y.Z # set next dev version (adds -dev) +``` + +## Migration: old → bare git repos +```bash +python scripts/migrate_to_bare_repos.py --dry-run +python scripts/migrate_to_bare_repos.py +python scripts/migrate_to_bare_repos.py --keep-old +``` + +## System utilities (Linux/WSL2) +Standard GNU coreutils. `cd`, `ls`, `grep`, `find`, `git` behave normally. +Prefer Serena's `find_file`, `search_for_pattern`, `find_symbol` over shell `find`/`grep` when working inside the repo. diff --git a/.serena/memories/task_completion_checklist.md b/.serena/memories/task_completion_checklist.md new file mode 100644 index 0000000..053bc8c --- /dev/null +++ b/.serena/memories/task_completion_checklist.md @@ -0,0 +1,49 @@ +# When a Coding Task Is Complete — ontokit-api + +Run these BEFORE declaring work done or committing: + +1. **Lint + auto-fix** + ```bash + make lint # or: ruff check ontokit/ tests/ --fix + ``` + +2. **Format** + ```bash + make format # or: ruff format ontokit/ tests/ + ``` + +3. **Type check (strict mypy)** + ```bash + make typecheck # or: mypy ontokit/ + ``` + +4. **Tests with coverage** + ```bash + make test # or: pytest tests/ -v --cov=ontokit + ``` + +5. 
**(Optional/CI) Security scan** + ```bash + semgrep --pro --config p/default --config p/owasp-top-ten --config p/python --config p/fastapi --config p/jwt + # drop --pro if no Pro entitlement + ``` + +6. **DB schema changes** — generate + commit a migration: + ```bash + alembic revision --autogenerate -m "description" + alembic upgrade head # verify it applies cleanly + ``` + +7. **Pre-commit** runs ruff + mypy automatically on commit (installed via `make setup`). + Don't bypass with `--no-verify` unless explicitly authorized. + +8. **CI** runs `semgrep ci` (diff-aware) — keep `.semgrepignore` honest. + +## Release-specific +For a release, follow `RELEASING.md`: +```bash +python scripts/prepare-release.py +git tag -s ontokit-X.Y.Z +git push --tags +python scripts/set-version.py X.Y.Z +``` diff --git a/.serena/project.yml b/.serena/project.yml new file mode 100644 index 0000000..107567a --- /dev/null +++ b/.serena/project.yml @@ -0,0 +1,158 @@ +# the name by which the project can be referenced within Serena +project_name: "ontokit-api" + + +# list of languages for which language servers are started; choose from: +# al bash clojure cpp csharp +# csharp_omnisharp dart elixir elm erlang +# fortran fsharp go groovy haskell +# haxe java julia kotlin lua +# markdown +# matlab nix pascal perl php +# php_phpactor powershell python python_jedi r +# rego ruby ruby_solargraph rust scala +# swift terraform toml typescript typescript_vts +# vue yaml zig +# (This list may be outdated. For the current list, see values of Language enum here: +# https://github.com/oraios/serena/blob/main/src/solidlsp/ls_config.py +# For some languages, there are alternative language servers, e.g. csharp_omnisharp, ruby_solargraph.) +# Note: +# - For C, use cpp +# - For JavaScript, use typescript +# - For Free Pascal/Lazarus, use pascal +# Special requirements: +# Some languages require additional setup/installations. 
+# See here for details: https://oraios.github.io/serena/01-about/020_programming-languages.html#language-servers +# When using multiple languages, the first language server that supports a given file will be used for that file. +# The first language is the default language and the respective language server will be used as a fallback. +# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored. +languages: +- python + +# the encoding used by text files in the project +# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings +encoding: "utf-8" + +# line ending convention to use when writing source files. +# Possible values: unset (use global setting), "lf", "crlf", or "native" (platform default) +# This does not affect Serena's own files (e.g. memories and configuration files), which always use native line endings. +line_ending: + +# The language backend to use for this project. +# If not set, the global setting from serena_config.yml is used. +# Valid values: LSP, JetBrains +# Note: the backend is fixed at startup. If a project with a different backend +# is activated post-init, an error will be returned. +language_backend: + +# whether to use project's .gitignore files to ignore files +ignore_all_files_in_gitignore: true + +# advanced configuration option allowing to configure language server-specific options. +# Maps the language key to the options. +# Have a look at the docstring of the constructors of the LS implementations within solidlsp (e.g., for C# or PHP) to see which options are available. +# No documentation on options means no options are available. +ls_specific_settings: {} + +# list of additional paths to ignore in this project. +# Same syntax as gitignore, so you can use * and **. +# Note: global ignored_paths from serena_config.yml are also applied additively. 
+ignored_paths: [] + +# whether the project is in read-only mode +# If set to true, all editing tools will be disabled and attempts to use them will result in an error +# Added on 2025-04-18 +read_only: false + +# list of tool names to exclude. +# This extends the existing exclusions (e.g. from the global configuration) +# +# Below is the complete list of tools for convenience. +# To make sure you have the latest list of tools, and to view their descriptions, +# execute `uv run scripts/print_tool_overview.py`. +# +# * `activate_project`: Activates a project based on the project name or path. +# * `check_onboarding_performed`: Checks whether project onboarding was already performed. +# * `create_text_file`: Creates/overwrites a file in the project directory. +# * `delete_memory`: Delete a memory file. Should only happen if a user asks for it explicitly, +# for example by saying that the information retrieved from a memory file is no longer correct +# or no longer relevant for the project. +# * `edit_memory`: Replaces content matching a regular expression in a memory. +# * `execute_shell_command`: Executes a shell command. +# * `find_file`: Finds files in the given relative paths +# * `find_referencing_symbols`: Finds symbols that reference the given symbol using the language server backend +# * `find_symbol`: Performs a global (or local) search using the language server backend. +# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes. +# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file. +# * `initial_instructions`: Provides instructions on Serena usage (i.e. the 'Serena Instructions Manual') +# for clients that do not read the initial instructions when the MCP server is connected. +# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol. 
+# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol. +# * `list_dir`: Lists files and directories in the given directory (optionally with recursion). +# * `list_memories`: List available memories. Any memory can be read using the `read_memory` tool. +# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building). +# * `read_file`: Reads a file within the project directory. +# * `read_memory`: Read the content of a memory file. This tool should only be used if the information +# is relevant to the current task. You can infer whether the information +# is relevant from the memory file name. +# You should not read the same memory file multiple times in the same conversation. +# * `rename_memory`: Renames or moves a memory. Moving between project and global scope is supported +# (e.g., renaming "global/foo" to "bar" moves it from global to project scope). +# * `rename_symbol`: Renames a symbol throughout the codebase using language server refactoring capabilities. +# For JB, we use a separate tool. +# * `replace_content`: Replaces content in a file (optionally using regular expressions). +# * `replace_symbol_body`: Replaces the full definition of a symbol using the language server backend. +# * `safe_delete_symbol`: +# * `search_for_pattern`: Performs a search for a pattern in the project. +# * `write_memory`: Write some information (utf-8-encoded) about this project that can be useful for future tasks to a memory in md format. +# The memory name should be meaningful. +excluded_tools: [] + +# list of tools to include that would otherwise be disabled (particularly optional tools that are disabled by default). +# This extends the existing inclusions (e.g. from the global configuration). +included_optional_tools: [] + +# fixed set of tools to use as the base tool set (if non-empty), replacing Serena's default set of tools. 
+# This cannot be combined with non-empty excluded_tools or included_optional_tools. +fixed_tools: [] + +# list of mode names that are always to be included in the set of active modes +# The full set of modes to be activated is base_modes + default_modes. +# If the setting is undefined, the base_modes from the global configuration (serena_config.yml) apply. +# Otherwise, this setting overrides the global configuration. +# Set this to [] to disable base modes for this project. +# Set this to a list of mode names to always include the respective modes for this project. +base_modes: + +# list of mode names that are to be activated by default. +# The full set of modes to be activated is base_modes + default_modes. +# If the setting is undefined, the default_modes from the global configuration (serena_config.yml) apply. +# Otherwise, this overrides the setting from the global configuration (serena_config.yml). +# This setting can, in turn, be overridden by CLI parameters (--mode). +default_modes: + +# initial prompt for the project. It will always be given to the LLM upon activating the project +# (contrary to the memories, which are loaded on demand). +initial_prompt: | + - API URL prefix `/api/v1/` is set in `ontokit/main.py` router registration; do NOT recreate it in the directory tree. + - Git operations: use `ontokit/git/bare_repository.py` (pygit2). The legacy `ontokit/git/repository.py` (GitPython) is DEPRECATED — do not extend it. + - All I/O is async/await; all datetimes are UTC-aware (no naive datetimes). + - Default branch is `dev`; PRs target `dev`. + - Pre-commit runs ruff + mypy strict; never bypass with `--no-verify`. +# time budget (seconds) per tool call for the retrieval of additional symbol information +# such as docstrings or parameter information. +# This overrides the corresponding setting in the global configuration; see the documentation there. +# If null or missing, use the setting from the global configuration. 
+symbol_info_budget: + +# list of regex patterns which, when matched, mark a memory entry as read‑only. +# Extends the list from the global configuration, merging the two lists. +read_only_memory_patterns: [] + +# list of regex patterns for memories to completely ignore. +# Matching memories will not appear in list_memories or activate_project output +# and cannot be accessed via read_memory or write_memory. +# To access ignored memory files, use the read_file tool on the raw file path. +# Extends the list from the global configuration, merging the two lists. +# Example: ["_archive/.*", "_episodes/.*"] +ignored_memory_patterns: []
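
Reviewer note on `ignored_memory_patterns` and `read_only_memory_patterns`: both are empty in this change, so nothing is filtered yet. If patterns are added later, a quick local check of which memory names they would catch may help. The sketch below is illustrative only: `is_ignored` is a hypothetical helper, not part of Serena, and it assumes a pattern must match the whole memory name (`re.fullmatch`). Whether Serena itself uses full-match, prefix-match, or search semantics is not stated in the config comments, so verify against Serena's documentation before relying on this.

```python
import re


def is_ignored(memory_name: str, patterns: list[str]) -> bool:
    """Return True if any regex pattern matches the entire memory name.

    Hypothetical helper: full-match semantics are an assumption,
    not confirmed Serena behavior.
    """
    return any(re.fullmatch(pattern, memory_name) for pattern in patterns)


# Patterns taken from the example in the project.yml comment.
example_patterns = ["_archive/.*", "_episodes/.*"]

for name in ["_archive/old_notes", "project_overview", "notes/_archive/x"]:
    print(f"{name}: ignored={is_ignored(name, example_patterns)}")
```

Under the full-match assumption, a name such as `notes/_archive/x` is not caught by `_archive/.*`; under search semantics it would be, which is why the matching mode is worth confirming.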