Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 39 additions & 27 deletions Dockerfile.prod
Original file line number Diff line number Diff line change
@@ -1,56 +1,68 @@
# OntoKit API Production Dockerfile
FROM python:3.13-slim
# OntoKit API Production Dockerfile (multi-stage)
# ---------- Stage 1: builder ----------
FROM python:3.13-slim AS builder
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Compare Docker production runtime against declared Python support/tooling targets.

rg -n -C2 'python:3\.13|Programming Language :: Python|python_version|target-version' Dockerfile.prod pyproject.toml

Repository: CatholicOS/ontokit-api

Length of output: 1202


Align the production Python version with supported/tested versions.

The Dockerfile runs Python 3.13, but pyproject.toml declares support for only Python 3.11 and 3.12, and tooling targets Python 3.11. This creates a production runtime that has never been tested or validated.

Change to python:3.12-slim on lines 3 and 33, or add Python 3.13 to classifiers and CI coverage before deployment.

Suggested change for Python 3.12
-FROM python:3.13-slim AS builder
+FROM python:3.12-slim AS builder
-FROM python:3.13-slim AS runtime
+FROM python:3.12-slim AS runtime
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.prod` at line 3, The Dockerfile.prod currently uses the untested
base image "python:3.13-slim" (seen in the FROM python:3.13-slim AS builder and
the later python:3.13-slim runtime stage); update both occurrences to
"python:3.12-slim" so the production image matches the supported/tested Python
versions in pyproject.toml and your tooling, or alternatively add Python 3.13 to
pyproject classifiers and CI before keeping 3.13.


# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies (git + libgit2-dev required by pygit2/gitpython)
# Build-time deps only: gcc is a safety net for any source-only sdists.
# pygit2 ships manylinux wheels with libgit2 bundled, so no libgit2-dev is needed.
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
git \
libgit2-dev \
gcc \
&& rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd --create-home --shell /bin/bash ontokit
WORKDIR /home/ontokit/app
# Install dependencies into a dedicated virtualenv so the runtime stage can
# copy a single self-contained tree.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Create git repo storage directory
RUN mkdir -p /data/repos && chown ontokit:ontokit /data/repos

# Install Python dependencies
WORKDIR /build
COPY pyproject.toml README.md ./
COPY ontokit/version.py ./ontokit/version.py
COPY ontokit/ ./ontokit/
RUN pip install --upgrade pip && \
pip install .
Comment thread
coderabbitai[bot] marked this conversation as resolved.

# Copy application code
COPY --chown=ontokit:ontokit ontokit/ ./ontokit/
# Strip bytecode caches and test directories from site-packages to shrink the venv.
RUN find /opt/venv -type d -name "__pycache__" -exec rm -rf {} + && \
find /opt/venv -type d -name "tests" -exec rm -rf {} + && \
find /opt/venv -type d -name "test" -exec rm -rf {} +

# ---------- Stage 2: runtime ----------
FROM python:3.13-slim AS runtime

# Copy alembic configuration for migrations
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PATH="/opt/venv/bin:$PATH"

# Runtime-only deps: curl for healthcheck, git for GitPython subprocess calls.
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
git \
&& rm -rf /var/lib/apt/lists/*

RUN useradd --create-home --shell /bin/bash ontokit && \
mkdir -p /data/repos && \
chown ontokit:ontokit /data/repos

COPY --from=builder /opt/venv /opt/venv

WORKDIR /home/ontokit/app

# The ontokit package is already installed in /opt/venv by the builder stage,
# so we only need the alembic migration files in the workdir at runtime.
COPY --chown=ontokit:ontokit alembic.ini ./
COPY --chown=ontokit:ontokit alembic/ ./alembic/

# Copy entrypoint script (runs migrations before starting the app)
COPY --chown=ontokit:ontokit scripts/entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

# Switch to non-root user
USER ontokit

# Pre-download sentence-transformers model to avoid first-request latency
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1

# Auto-migrate on startup, then run the app
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["uvicorn", "ontokit.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
9 changes: 8 additions & 1 deletion ontokit/services/embedding_providers/local_provider.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,14 @@
def _get_model(model_name: str) -> object:
"""Get or load a sentence-transformers model (cached)."""
if model_name not in _models:
from sentence_transformers import SentenceTransformer
try:
from sentence_transformers import SentenceTransformer
except ImportError as exc:
raise RuntimeError(
"The local embedding provider requires 'sentence-transformers'. "
"Install it with: pip install 'ontokit[local-embeddings]' "
"(or choose the 'openai' or 'voyage' provider instead)."
) from exc
Comment on lines +17 to +24
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Confirm that sentence-transformers is optional while local provider remains reachable
# from default embedding configuration paths.

printf '\nOptional dependency declaration:\n'
rg -n -C2 'local-embeddings|sentence-transformers' pyproject.toml

printf '\nStatic optional import sites:\n'
rg -n -C3 'from sentence_transformers import SentenceTransformer' --iglob '*.py'

printf '\nDefault local provider paths:\n'
rg -n -C3 'cfg else "local"|provider_name = cfg.provider|get_embedding_provider\(config.provider' --iglob '*.py'

Repository: CatholicOS/ontokit-api

Length of output: 2851


🏁 Script executed:

#!/bin/bash

# Check the LocalProvider class structure and when _get_model is called
printf '\n=== LocalProvider class initialization ===\n'
sed -n '1,60p' ontokit/services/embedding_providers/local_provider.py | cat -n

# Check if _get_model is called in __init__ or lazily
printf '\n=== Check for properties/methods that trigger _get_model ===\n'
rg -n 'def (dimensions|__init__|embed)' ontokit/services/embedding_providers/local_provider.py -A5

# Check the EmbeddingService singleton pattern
printf '\n=== EmbeddingService instantiation pattern ===\n'
rg -n 'def get_embedding_service|class EmbeddingService' ontokit/services/embedding_service.py -A3

# Check base provider to see if dimensions is a property
printf '\n=== Base EmbeddingProvider ===\n'
rg -n 'class EmbeddingProvider|def dimensions' ontokit/services/embedding_providers/base.py -A3

Repository: CatholicOS/ontokit-api

Length of output: 3884


🏁 Script executed:

#!/bin/bash

# Check if MyPy is configured and what rules are enabled
printf '\n=== MyPy/Ruff configuration ===\n'
if [ -f pyproject.toml ]; then
  sed -n '/\[tool.mypy\]/,/^\[/p' pyproject.toml | head -20
  echo '---'
  sed -n '/\[tool.ruff\]/,/^\[/p' pyproject.toml | head -30
fi

# Check how get_embedding_provider is called
printf '\n=== get_embedding_provider function signature ===\n'
rg -n 'def get_embedding_provider' ontokit/services/embedding_providers/__init__.py -A10

Repository: CatholicOS/ontokit-api

Length of output: 889


🏁 Script executed:

#!/bin/bash

# Search for any type stubs or imports that might handle optional dependencies
printf '\n=== Optional dependency handling patterns ===\n'
rg -n 'TYPE_CHECKING|Optional\[|try:.*import|importlib' ontokit/services/embedding_providers/ --iglob '*.py' -B2 -A2

Repository: CatholicOS/ontokit-api

Length of output: 112


Use dynamic import for optional sentence-transformers dependency.

With sentence-transformers optional in the default install, the static import at line 18 will fail MyPy strict type checking. Additionally, the dimensions property is accessed immediately when get_embedding_provider() is called (embedding_service.py:110), meaning default projects without the optional extra will hit this code path at runtime.

Use importlib.import_module() to defer the import and avoid breaking strict type checking in default environments:

Proposed fix
 import asyncio
+from importlib import import_module
 import logging
+from typing import Any, cast
 
 from ontokit.services.embedding_providers.base import EmbeddingProvider
@@
     """Get or load a sentence-transformers model (cached)."""
     if model_name not in _models:
         try:
-            from sentence_transformers import SentenceTransformer
-        except ImportError as exc:
+            sentence_transformers = import_module("sentence_transformers")
+        except ModuleNotFoundError as exc:
+            if exc.name != "sentence_transformers":
+                raise
             raise RuntimeError(
                 "The local embedding provider requires 'sentence-transformers'. "
                 "Install it with: pip install 'ontokit[local-embeddings]' "
                 "(or choose the 'openai' or 'voyage' provider instead)."
             ) from exc
+        SentenceTransformer = cast(Any, getattr(sentence_transformers, "SentenceTransformer"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/services/embedding_providers/local_provider.py` around lines 17 - 24,
Replace the static top-level import of SentenceTransformer with a dynamic import
using importlib.import_module so the optional dependency is only required when
actually instantiating the provider: inside the code path that constructs or
accesses the LocalEmbeddingProvider (e.g., within get_embedding_provider() or
the LocalEmbeddingProvider.__init__/factory method) call
importlib.import_module("sentence_transformers") and then get
SentenceTransformer = module.SentenceTransformer; keep the same
ImportError->RuntimeError conversion (raise RuntimeError(...) from exc) so
callers still get the friendly message, and ensure you only access the
dimensions property after the dynamic import so default installs without the
optional extra don’t trigger MyPy/ runtime import failures.


logger.info(f"Loading sentence-transformers model: {model_name}")
_models[model_name] = SentenceTransformer(model_name)
Expand Down
12 changes: 11 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -43,11 +43,15 @@ dependencies = [
"minio>=7.2.0",
"websockets>=14.0",
"slowapi>=0.1.9",
"sentence-transformers>=5.4.0",
"pgvector>=0.3.0",
"numpy>=2.4.4",
]

[project.optional-dependencies]
local-embeddings = [
"sentence-transformers>=5.4.0",
]

[dependency-groups]
dev = [
"pytest>=9.0.3",
Expand Down Expand Up @@ -112,6 +116,12 @@ init_forbid_extra = true
init_typed = true
warn_required_dynamic_aliases = true

# sentence-transformers is an optional extra ([local-embeddings]); it won't be
# present in the default environment, so mypy should not fail on the import.
[[tool.mypy.overrides]]
module = ["sentence_transformers"]
ignore_missing_imports = true

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
Expand Down
10 changes: 9 additions & 1 deletion tests/unit/test_embedding_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,8 @@ async def test_creates_new_config_when_none_exists(
self, service: EmbeddingService, mock_db: AsyncMock
) -> None:
"""Creates a new ProjectEmbeddingConfig when none exists."""
from unittest.mock import patch

result = MagicMock()
result.scalar_one_or_none.return_value = None
mock_db.execute.return_value = result
Expand All @@ -118,7 +120,13 @@ async def test_creates_new_config_when_none_exists(
update.api_key = None
update.auto_embed_on_save = True

await service.update_config(PROJECT_ID, update)
mock_provider = MagicMock()
mock_provider.dimensions = 384
with patch(
"ontokit.services.embedding_service.get_embedding_provider",
return_value=mock_provider,
):
await service.update_config(PROJECT_ID, update)
mock_db.add.assert_called_once()
added = mock_db.add.call_args[0][0]
assert added.auto_embed_on_save is True
Expand Down
31 changes: 31 additions & 0 deletions tests/unit/test_local_embedding_provider.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
"""Tests for the local embedding provider (ontokit/services/embedding_providers/local_provider.py)."""

from __future__ import annotations

import sys

import pytest

from ontokit.services.embedding_providers import local_provider
from ontokit.services.embedding_providers.local_provider import LocalEmbeddingProvider


class TestSentenceTransformersMissing:
"""sentence-transformers is an optional extra; verify the friendly error path."""

def test_dimensions_raises_runtime_error_pointing_at_extras(
self, monkeypatch: pytest.MonkeyPatch
) -> None:
# Poison sys.modules so `from sentence_transformers import ...` fails
# with ImportError even if the package is installed in the dev env.
monkeypatch.setitem(sys.modules, "sentence_transformers", None)
# Reset the model cache so _get_model actually attempts the import.
monkeypatch.setattr(local_provider, "_models", {})

provider = LocalEmbeddingProvider(model_name="all-MiniLM-L6-v2")
with pytest.raises(RuntimeError, match="local-embeddings") as excinfo:
_ = provider.dimensions

# The RuntimeError preserves the original ImportError as its cause,
# so callers (and logs) can trace back to the real failure.
assert isinstance(excinfo.value.__cause__, ImportError)
9 changes: 7 additions & 2 deletions uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading