feat: database abstraction layer for Oracle 23ai support #947
Open
Conversation
| f" ORDER BY embedding <=> $1::vector" | ||
| f" LIMIT {hnsw_fetch})" | ||
| ) | ||
| if not _is_pg: |
Collaborator
We should move this query code into a function on the abstraction so that every implementation owns its own syntax.
All these `if not _is_pg:` checks are going to be a mess to maintain.
Introduce DatabaseBackend ABC and SQLDialect ABC to decouple the engine from asyncpg/PostgreSQL, enabling Oracle 23ai as a second database platform.

- engine/db/: DatabaseBackend (pool lifecycle), DatabaseConnection (query execution), ResultRow (uniform row access)
- engine/db/postgresql.py: asyncpg implementation
- engine/db/oracle.py: python-oracledb implementation (thin mode)
- engine/sql/: SQLDialect ABC with 20+ dialect methods (params, vector ops, JSON, upsert, FTS, etc.)
- engine/sql/postgresql.py: PG dialect ($N, <=>, @>, unnest, ON CONFLICT, etc.)
- engine/sql/oracle.py: Oracle dialect (:N, MERGE INTO, VECTOR_DISTANCE, JSON_MERGEPATCH, etc.)
- config.py: HINDSIGHT_API_DATABASE_BACKEND env var (default: postgresql)
- memory_engine.py: uses DatabaseBackend.initialize() instead of raw asyncpg.create_pool(); exposes get_pool() for backward compat
- 52 unit tests covering all abstractions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
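As a rough sketch of the boundary this commit introduces (method names beyond DatabaseBackend / DatabaseConnection / ResultRow / SQLDialect are illustrative, not the exact engine/db/ and engine/sql/ API):

```python
# Minimal sketch only; the real ABCs expose far more methods (20+ dialect hooks).
from abc import ABC, abstractmethod
from typing import Any, Sequence


class ResultRow(ABC):
    """Uniform dict-like row access over asyncpg Records and oracledb rows."""

    @abstractmethod
    def __getitem__(self, key: str) -> Any: ...


class DatabaseConnection(ABC):
    @abstractmethod
    async def execute(self, query: str, *args: Any) -> Any: ...

    @abstractmethod
    async def fetch(self, query: str, *args: Any) -> Sequence[ResultRow]: ...


class DatabaseBackend(ABC):
    @abstractmethod
    async def initialize(self) -> None:
        """Create the underlying pool (asyncpg or python-oracledb)."""

    @abstractmethod
    def acquire(self) -> Any:
        """Async context manager yielding a DatabaseConnection."""


class SQLDialect(ABC):
    @abstractmethod
    def param(self, n: int) -> str:
        """Bind placeholder: '$1' on PostgreSQL, ':1' on Oracle."""

    @abstractmethod
    def vector_cosine_distance(self, column: str, param: str) -> str:
        """'embedding <=> $1' vs 'VECTOR_DISTANCE(embedding, :1, COSINE)'."""
```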
22 tests exercising OracleBackend + OracleDialect abstractions:

- Pool lifecycle (initialize, acquire, shutdown)
- Transaction commit/rollback semantics
- All DatabaseConnection methods (execute, executemany, fetch, fetchrow, fetchval)
- ResultRow dict-like access on real Oracle rows
- Vector insert + VECTOR_DISTANCE cosine search
- JSON insert/extract/MERGEPATCH
- MERGE INTO upsert via dialect
- ILIKE via UPPER() dialect
- UTL_MATCH fuzzy similarity via dialect
- FOR UPDATE SKIP LOCKED
- OFFSET/FETCH FIRST pagination via dialect
- Concurrent pool operations
- Dialect SQL fragments accepted by real Oracle (SYS_GUID, SYSTIMESTAMP, GREATEST)

All skip cleanly when ORACLE_TEST_DSN is not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace raw asyncpg pool usage across all SQL-executing consumers with the DatabaseBackend interface, enabling Oracle 23ai support without changing business logic.

Key changes:
- memory_engine.py passes self._backend instead of self._pool to all consumers (orchestrator, entity_resolver, bank_utils, config_resolver, webhooks, file storage, task backend, audit logger)
- BudgetedPool updated to handle both DatabaseBackend and raw pool, with _wraps_backend marker for acquire_with_retry dispatch
- acquire_with_retry recognizes BudgetedPool via _wraps_backend attr
- WebhookManager uses conn-based queries instead of pool convenience methods
- consolidator, storage, task_backend, audit use acquire_with_retry instead of direct pool.acquire()/pool.execute()
- Oracle backend: transparent query rewriting ($N->:N, strip ::casts, NOW()->SYSTIMESTAMP), transaction support via savepoints, cursor fixes
- DatabaseConnection.copy_records_to_table with asyncpg native COPY override and executemany INSERT fallback for Oracle
- Oracle query rewriter tests and backend integration test fixes

1344 tests pass; all failures are Groq API rate limiting, not migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
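The transparent rewriting mentioned above roughly amounts to transforms like the following; this is a simplified sketch, and the real rewriter in engine/db/oracle.py handles many more cases:

```python
import re

_PLACEHOLDER = re.compile(r"\$(\d+)")
_PG_CAST = re.compile(r"::[a-zA-Z_][a-zA-Z0-9_\[\]]*")


def rewrite_pg_to_oracle(query: str) -> str:
    """Sketch of the $N -> :N, ::cast-strip, NOW() -> SYSTIMESTAMP rewrites."""
    query = _PLACEHOLDER.sub(r":\1", query)  # $1 -> :1
    query = _PG_CAST.sub("", query)          # drop ::jsonb, ::vector, ::text[] ...
    query = re.sub(r"\bNOW\(\)", "SYSTIMESTAMP", query, flags=re.IGNORECASE)
    return query


print(rewrite_pg_to_oracle(
    "UPDATE memory_units SET metadata = $1::jsonb, updated_at = NOW() WHERE id = $2"
))
# UPDATE memory_units SET metadata = :1, updated_at = SYSTIMESTAMP WHERE id = :2
```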
Move all Oracle-specific logic behind conn.backend_type dispatch so the PostgreSQL path is entirely untouched when running on PG.

Key changes:
- engine/schema.py: centralized fq_table/fq_table_explicit with Oracle awareness (Oracle skips schema prefix since it uses ALTER SESSION)
- engine/db/base.py: backend_type property, bulk_insert_from_arrays method, capability flags (supports_partial_indexes, supports_bm25, supports_unnest, supports_pg_trgm) on DatabaseConnection/Backend
- engine/db/oracle.py: massive expansion (+782) with transparent PG→Oracle query rewriting, RETURNING INTO handling, UUID RAW(16) conversion, ANY() expansion, array-contains rewriting, executemany bulk insert
- entity_resolver, link_utils, fact_storage, chunk_storage, bank_utils: Oracle fallback paths using conn.backend_type dispatch
- search/retrieval, link_expansion_retrieval: Oracle-aware search CTE dispatch (no BM25, different vector distance syntax)
- task_backend, storage/postgresql, webhooks/manager, worker/poller, admin/cli: delegate to schema.py for table qualification
- api/http.py: use backend instead of raw pool throughout
- migrations_oracle.py: stub for Oracle DDL migrations
- tests/test_db_abstraction.py: updated for tuple return from _rewrite_pg_to_oracle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests

Implement full Oracle DDL migrations (13 tables, 30+ indexes, HNSW vector index, Oracle Text index), Oracle-specific graph retrieval using JSON_TABLE for observation expansion, and query rewriter fixes for JSONB boolean patterns and quoted identifiers.

Key changes:
- migrations_oracle.py: full idempotent DDL replacing the stub
- link_expansion_retrieval.py: Oracle graph retrieval via JSON_TABLE/ROW_NUMBER
- oracle.py: JSONB boolean rewrite, CTE MATERIALIZED strip, COALESCE fix
- retrieval.py: subquery alias fix for Oracle derived tables
- 49 Oracle integration tests + 10 HTTP integration tests (all passing)
- conftest.py: Oracle test fixtures, fixed import ordering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Per PR review feedback, moved backend-specific retrieval query SQL from inline if/else blocks in retrieval.py into build_semantic_arm(), build_bm25_arm(), and prepare_bm25_text() methods on the SQLDialect ABC. Each database (PostgreSQL, Oracle) now owns its own query arm construction, eliminating all _is_pg checks from retrieval.py.

Also fixes two migration bugs:
- h3c4d5e6f7g8: CREATE TABLE IF NOT EXISTS was a no-op when mental_models already existed from the reflections rename chain, so v4 columns (subtype, description, etc.) were never added. Added idempotent ALTER TABLE ADD COLUMN IF NOT EXISTS for each column.
- o0j1k2l3m4n5: CHECK constraint restricted subtype to 'directive' only, but memory_engine.py creates mental models with subtype='pinned'. Updated constraint to allow both.

Other fixes:
- Added session-scoped test cleanup to drop accumulated per-bank vector indexes and truncate stale data from pg0 between test runs
- Fixed pg_trgm tests: MagicMock conn needs explicit backend_type="postgresql" to prevent Oracle dispatch path from triggering
- Added RewriteResult NamedTuple to replace bare tuple return from _rewrite_pg_to_oracle (per project coding standards)
- Added type hints to _resolve_entities_batch_oracle_fuzzy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
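A hedged sketch of what "each dialect owns its query arm" looks like; the method name build_semantic_arm comes from the commit, while the class names and SQL bodies here are placeholders rather than the real retrieval arms:

```python
class PostgreSQLDialectSketch:
    def build_semantic_arm(self, table: str, limit: int) -> str:
        return (
            f"SELECT id, embedding <=> $1 AS dist FROM {table} "
            f"ORDER BY embedding <=> $1 LIMIT {limit}"
        )


class OracleDialectSketch:
    def build_semantic_arm(self, table: str, limit: int) -> str:
        return (
            f"SELECT id, VECTOR_DISTANCE(embedding, :1, COSINE) AS dist FROM {table} "
            f"ORDER BY VECTOR_DISTANCE(embedding, :1, COSINE) "
            f"FETCH FIRST {limit} ROWS ONLY"
        )


# retrieval.py composes arms without knowing which backend it is on:
def semantic_sql(dialect, table: str, limit: int) -> str:
    return dialect.build_semantic_arm(table, limit)
```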
…st flakiness

Thread ops through the retain pipeline (orchestrator → link_creation → link_utils) via DatabaseBackend.ops property, eliminating scattered create_data_access_ops() calls.

Fix 14 test failures caused by LLM non-determinism (mock deterministic facts), stale pg0 state (stamp migrations), missing backend.ops access (use DatabaseBackend instead of raw Pool), and incorrect default assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test-api-oracle job that mirrors test-api but runs against an Oracle 23ai Free container service. Only runs when the PR has the "oracle-tests" label, so it's off by default and doesn't affect normal CI. Uses pytest -m oracle marker to run Oracle-specific tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration h3c4d5e6f7g8 used CREATE TABLE IF NOT EXISTS which was a no-op on databases where mental_models already existed from the reflections rename chain. The fix added to h3c4d5e6f7g8 (Step 4b) only helps fresh databases — existing ones already have the migration stamped as applied. This new migration adds the missing columns idempotently for databases in that state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add migration d5y6z7a8b9c0 to idempotently add missing columns to the mental_models table for databases where h3c4d5e6f7g8 was already stamped as applied before the fix
- Remove redundant "Wait for Oracle" step in CI — services health check already ensures Oracle is ready before steps run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test-api-oracle CI job was failing because oracledb wasn't installed. Added it as an optional 'oracle' dependency group and regenerated the lockfile so --all-extras picks it up. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Oracle's VECTOR type requires automatic segment space management (ASSM). The SYSTEM tablespace uses manual SSM, causing ORA-43853 on table creation. Fix: create a dedicated tablespace with ASSM and a test user before running Oracle tests. Also fix DSN format to use path-based service name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
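The CI fix boils down to setup statements along these lines (a sketch; the tablespace, user, and datafile names here are assumptions, not the exact ones in the workflow):

```python
# Executed once against the Oracle 23ai Free container before tests run.
SETUP_STATEMENTS = [
    # ASSM tablespace so VECTOR columns don't hit ORA-43853
    "CREATE TABLESPACE hindsight_data DATAFILE 'hindsight_data.dbf' SIZE 512M "
    "AUTOEXTEND ON EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO",
    "CREATE USER hindsight_test IDENTIFIED BY hindsight_test "
    "DEFAULT TABLESPACE hindsight_data QUOTA UNLIMITED ON hindsight_data",
    "GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE, CREATE PROCEDURE TO hindsight_test",
]

# Path-based service-name DSN, e.g.:
ORACLE_TEST_DSN = "localhost:1521/FREEPDB1"
```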
The mental_models table in Oracle migrations was missing structured_content and last_refreshed_source_query columns that memory_engine.py actively reads/writes. This caused mental model create/refresh to fail on Oracle. The Oracle HTTP test for mental models was asserting `status in (200, 500)` which silently accepted this failure. Tightened to assert 200 so missing columns can't hide behind lenient assertions again. Also documented known PG-only code paths (task_backend, poller, webhooks, audit, config_resolver) in the Oracle backend docstring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WorkerPoller and BrokerTaskBackend still use raw asyncpg pool APIs and PG-specific SQL (FOR UPDATE SKIP LOCKED, ::jsonb, NOW(), etc). Starting them on an Oracle backend would crash immediately. Guard both the embedded worker (API startup) and standalone worker (hindsight-worker command) to skip/exit on Oracle. Operations that normally go through the async worker will run synchronously instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move _get_mu_table() from PG/Oracle ops duplicates into DataAccessOps base class (DRY)
- Remove unnecessary getattr() fallback for conn.backend_type in retrieval.py — it's a guaranteed property on DatabaseConnection
- Promote _oracle_special set to _ORACLE_TEXT_SPECIAL frozenset class constant on OracleDialect (avoid recreation per call)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BrokerTaskBackend queues tasks into async_operations for the WorkerPoller to pick up, but the poller is disabled on Oracle (it uses raw asyncpg). This meant mental model refresh, consolidation, and async retain would be queued as "pending" but never executed. Fix: use SyncTaskBackend on Oracle, which executes tasks inline in the request. This matches what the test fixtures already do and ensures all operations complete on Oracle, just synchronously rather than in the background. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… test

- Restore getattr(conn, "backend_type", "postgresql") in retrieval.py so raw asyncpg connections (used in test_hnsw_indexes) don't crash
- Guard RETURNING→INTO rewrite to only apply on INSERT/UPDATE/DELETE, preventing false matches on SELECT queries
- Add _safe_cleanup helpers in Oracle tests to suppress ORA-00060 deadlock errors during test teardown
- Add TestOracleEndToEnd::test_full_lifecycle that verifies the complete retain→recall→reflect→mental model refresh flow, checking final state (not just HTTP 200) to catch issues like the SyncTaskBackend bug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Oracle's thin driver defaults short strings like '[]' to VARCHAR2, which fails with ORA-00932 when the target column is CLOB (e.g. tags, metadata). Detect JSON-shaped strings in _make_bind_params and use setinputsizes to explicitly type them as DB_TYPE_CLOB. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
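A sketch of this workaround, assuming positional binds; cursor.setinputsizes and oracledb.DB_TYPE_CLOB are real python-oracledb APIs, while the helper name and the "JSON-shaped" heuristic are illustrative:

```python
import oracledb


def apply_clob_input_sizes(cursor: oracledb.Cursor, params: list) -> None:
    """Type JSON-shaped string binds as CLOB so they can land in CLOB columns."""
    sizes = [None] * len(params)
    for i, value in enumerate(params):
        if isinstance(value, str) and value[:1] in ("[", "{"):
            sizes[i] = oracledb.DB_TYPE_CLOB  # e.g. '[]' for tags, '{}' for metadata
    if any(sizes):
        cursor.setinputsizes(*sizes)
```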
setinputsizes must run after _expand_any_lists — otherwise it registers types for params that get removed during expansion, causing DPY-4008 "no bind placeholder" errors. Move CLOB typing to a separate method called after expansion in each DML path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite COALESCE(col, '[]'::jsonb) || :N::jsonb to use JSON_MERGEPATCH with TO_CLOB for CLOB column compatibility
- Escape Oracle Text reserved words (about, near, not, etc.) with curly braces in CONTAINS queries to prevent DRG-50901 parse errors
- Guard json.loads() calls for result_metadata/task_payload to handle Oracle's pre-parsed JSON dicts alongside PG strings
- Rewrite FOR SHARE → FOR UPDATE (Oracle doesn't support FOR SHARE)
- Fix e2e test: replace nonexistent DELETE memory endpoint with list assertion, use substantive content for reliable fact extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
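The reserved-word escaping is roughly the following sketch; the real _ORACLE_TEXT_SPECIAL set is larger, and curly braces make Oracle Text treat the token as a literal term:

```python
_ORACLE_TEXT_SPECIAL = frozenset({"about", "near", "not", "and", "or", "within"})


def escape_contains_term(term: str) -> str:
    """Prevent DRG-50901 by bracing Oracle Text operators used as search terms."""
    return f"{{{term}}}" if term.lower() in _ORACLE_TEXT_SPECIAL else term


print(escape_contains_term("near"))    # {near}
print(escape_contains_term("oracle"))  # oracle
```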
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run Oracle CI tests sequentially (-n0) to prevent ORA-00060 deadlocks from concurrent transactions against the same Oracle Free container.
- Fix trigger column quoting: previous guard skipped quoting bare `trigger` in SELECT when `"trigger"` already appeared in JSON_VALUE. Use negative lookbehind/lookahead to quote only unquoted occurrences.
- Use direct delete_bank (not _safe_cleanup) in test_retain_and_delete_cycle so mid-test bank deletion failures aren't silently swallowed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
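The lookaround fix works roughly like this simplified pattern; the rewriter's real regex may be stricter about context:

```python
import re

# Quote bare `trigger` only when it is not already wrapped in double quotes.
_TRIGGER = re.compile(r'(?<!")\btrigger\b(?!")')

sql = "SELECT trigger, JSON_VALUE(doc, '$.\"trigger\"') FROM mental_models"
print(_TRIGGER.sub('"trigger"', sql))
# SELECT "trigger", JSON_VALUE(doc, '$."trigger"') FROM mental_models
```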
Replace 6 explicit backend-type string checks in business logic with polymorphic dispatch through DatabaseBackend methods:

- api/http.py: use backend.supports_worker_poller instead of config check
- worker/main.py: use backend.supports_worker_poller instead of config check
- memory_engine.py: use backend.run_migrations() for migration dispatch
- memory_engine.py: use backend.create_task_backend() for task backend selection
- memory_engine.py: use conn.parse_json() for JSON column normalization (6 sites)
- config.py: type database_backend as Literal["postgresql", "oracle"]
- db/__init__.py: refactor factories to use helper functions
- db/base.py: add supports_worker_poller, run_migrations(), create_task_backend(), parse_json() to DatabaseBackend/DatabaseConnection ABCs

No backend-type string checks remain in business logic files. They only exist in the db/ and sql/ abstraction layer factories where they belong.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
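Sketch of the dispatch style: backend-specific choices hang off the backend/connection objects rather than string checks in business logic. Only supports_worker_poller, create_task_backend(), and parse_json() come from the commit; the class bodies are illustrative:

```python
import json


class OracleBackendSketch:
    supports_worker_poller = False  # poller still speaks raw asyncpg at this point

    def create_task_backend(self) -> str:
        return "SyncTaskBackend"  # placeholder: execute tasks inline on Oracle


class OracleConnectionSketch:
    @staticmethod
    def parse_json(value):
        """Oracle JSON columns may arrive pre-parsed; PG returns JSON text."""
        return value if isinstance(value, (dict, list)) else json.loads(value)


print(OracleConnectionSketch.parse_json('{"tags": []}'))  # {'tags': []}
print(OracleConnectionSketch.parse_json({"tags": []}))    # {'tags': []}
```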
Resolves conflicts in:
- memory_engine.py: integrated statement_timeout init callback with backend.initialize() abstraction
- retrieval.py: added extra_where param to dialect build_semantic_arm() and build_bm25_arm() methods to support created_after/created_before time range filtering from main
- openapi.json: took main's version (0.5.4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…a safety

1. Oracle rewriter: fix _UPSERT_RE regex to handle nested function calls in the VALUES clause (e.g. COALESCE($7, NOW())). The previous regex used [^)]+ which broke on nested parentheses. Also added _split_respecting_parens() to correctly split values with nested calls.
2. Alembic: add merge migration e6f7g8h9i0j1 to unify two heads (8c6fa6f7230b from main, d5y6z7a8b9c0 from the oracle branch).
3. task_backend.py: rephrase docstring to avoid a false positive from test_sql_schema_safety (matched "INSERTed into async_operations").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
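The paren-aware split behaves like this sketch (whitespace and quote handling in the real _split_respecting_parens() may differ):

```python
def split_respecting_parens(values: str) -> list[str]:
    """Split a VALUES list on top-level commas only, so COALESCE($7, NOW()) stays whole."""
    parts: list[str] = []
    depth = 0
    current: list[str] = []
    for ch in values:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


print(split_respecting_parens("$1, COALESCE($7, NOW()), $3"))
# ['$1', 'COALESCE($7, NOW())', '$3']
```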
1. Oracle COALESCE + SYSTIMESTAMP type mismatch: when a NULL datetime bind param appears in COALESCE(:N, SYSTIMESTAMP), Oracle's thin driver defaults the NULL to VARCHAR2, causing ORA-00932. Fixed by detecting this pattern in _apply_clob_input_sizes and hinting the param as DB_TYPE_TIMESTAMP_TZ.
2. result_metadata JSON parsing: main added json.loads(row["result_metadata"]) in list_operations, which fails on Oracle where JSON columns return pre-parsed dicts. Replaced with conn.parse_json() for backend-agnostic handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
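The datetime fix, sketched under the assumption of positional binds; the COALESCE-detection regex is illustrative, while DB_TYPE_TIMESTAMP_TZ and setinputsizes are real python-oracledb APIs:

```python
import re

import oracledb

_COALESCE_TS = re.compile(r"COALESCE\(\s*:(\d+)\s*,\s*SYSTIMESTAMP", re.IGNORECASE)


def hint_null_timestamps(query: str, params: list, sizes: list) -> list:
    """Type NULL binds feeding COALESCE(:N, SYSTIMESTAMP) as TIMESTAMP WITH TIME ZONE."""
    for match in _COALESCE_TS.finditer(query):
        idx = int(match.group(1)) - 1
        if idx < len(params) and params[idx] is None:
            sizes[idx] = oracledb.DB_TYPE_TIMESTAMP_TZ  # avoid ORA-00932 (VARCHAR2 default)
    return sizes
```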
…ON_TABLE entity resolution

- Refactor OracleOps.insert_facts_batch from N row-by-row fetchval calls to a single executemany with client-side UUID generation (single network round-trip)
- Replace PG-only unnest($2::text[]) in Oracle fuzzy entity resolution with native JSON_TABLE to expand entity texts into rows
- Add unit tests verifying column mapping correctness, SQL structure, data transformation, fallback behavior, and candidate grouping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
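The JSON_TABLE replacement for unnest looks roughly like this (column and alias names are placeholders, not the real entity-resolution query):

```python
import json

# One JSON-array bind replaces PG's text[] parameter; JSON_TABLE expands it into rows.
ORACLE_EXPAND_ENTITY_TEXTS = """
SELECT jt.query_text
FROM JSON_TABLE(:1, '$[*]' COLUMNS (query_text VARCHAR2(4000) PATH '$')) jt
"""

entity_texts = ["acme corp", "jane doe"]
bind_value = json.dumps(entity_texts)  # passed as the single :1 bind
```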
…document stats)

Merge origin/main into feature/database-abstraction. Resolves conflicts by keeping the DatabaseBackend abstraction while incorporating main's new features: statement_timeout init, per-fact-type document counts, and cancel_operation status guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Merge two alembic heads (oracle merge + cancelled status) created by merging main into the database-abstraction branch.
- Fix get_document query that used GROUP BY on CLOB columns, which Oracle cannot handle. Rewrote with a subquery for counts and portable CASE WHEN syntax instead of PG-specific FILTER (WHERE ...).
- Add ty: ignore for Windows-only subprocess attrs in daemon_embed_manager.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ruff reformatted a multi-line conditional to single line. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1: Refactor WorkerPoller from raw asyncpg pool to the DatabaseBackend abstraction. Oracle now uses BrokerTaskBackend (async worker/poller) instead of SyncTaskBackend (inline). Portable SQL patterns replace PG-specific COUNT(*) FILTER and pg_stat_activity.

Phase 2: Add observation_sources junction table replacing source_memory_ids array/CLOB queries. Both PG and Oracle now use identical standard SQL joins instead of dialect-specific unnest/&& (PG) or JSON_TABLE (Oracle). Dual-write maintains backward compat.

Phase 3: Add automatic list partitioning on memory_units(bank_id) for Oracle. Enables partition pruning on bank-scoped queries. HNSW vector index uses ORGANIZATION NEIGHBOR PARTITIONS. Text index (CTXSYS.CONTEXT) remains global — LOCAL not supported on LIST partitioned tables.

All 5 Known Limitations from the PR #947 review are now resolved:
1. Worker poller decoupled from asyncpg
2. Entity resolver (already done — JSON_TABLE + UTL_MATCH)
3. source_memory_ids array queries (junction table)
4. Per-bank vector index isolation (automatic partitioning)
5. Bulk insert optimization (already done — executemany)

Tests: 1805 PG passed, 60 Oracle passed, lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
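For illustration, the junction-table query from Phase 2 is plain portable SQL of this shape; observation_sources and memory_units come from the commit, while the other table and column names are assumptions:

```python
# Identical on PostgreSQL and Oracle apart from the bind placeholder style.
PORTABLE_SOURCE_JOIN = """
SELECT obs.id, mu.id AS source_memory_id
FROM observations obs
JOIN observation_sources os ON os.observation_id = obs.id
JOIN memory_units mu ON mu.id = os.memory_unit_id
WHERE obs.bank_id = :1
"""
```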
Summary
- `DatabaseBackend`/`DatabaseConnection` ABCs and a `SQLDialect` ABC decouple the engine from asyncpg/PostgreSQL, enabling Oracle 23ai as a second database backend
- Dialect methods own backend-specific SQL, eliminating inline `_is_pg` checks

What works on Oracle
Benchmark results (2,000 items, cross-encoder reranker, 10 iterations)
Note: PG runs against pg0 (native/embedded) while Oracle runs in Docker, so the gap includes Docker virtualization overhead. CPU-bound phases (reranking, embedding) are identical across both backends. An apples-to-apples comparison (both in Docker or both native) would narrow the difference.
Remaining work (follow-up PRs)
Decouple worker poller from asyncpg
The worker poller (`worker/poller.py`) currently uses raw asyncpg pool methods (`pool.fetch`, `pool.execute`, `conn.transaction()`) instead of going through the `DatabaseBackend` abstraction. This needs to be refactored to use the backend interface — `python-oracledb` has full async support (`create_pool_async()`) and Oracle supports `SELECT ... FOR UPDATE SKIP LOCKED` for the same atomic claiming semantics. Until then, Oracle must run with `HINDSIGHT_API_WORKER_ENABLED=false`. Blocks: async/batch retain, background consolidation, async mental model refresh. Estimate: 2-3 days.
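For reference, the atomic claim the poller relies on is the standard SKIP LOCKED pattern, shown here in PostgreSQL form as a sketch; async_operations and task_payload come from the PR, the other columns are assumptions, and Oracle expresses the row limit differently:

```python
# One worker claims one pending task; concurrent workers skip rows already locked.
CLAIM_PENDING_TASK = """
SELECT id, task_payload
FROM async_operations
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED
"""
```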
Port entity resolver fuzzy matching to dialect layer

`_resolve_entities_batch_oracle_fuzzy` still uses PG-only `unnest($2::text[]) AS q(query_text)` syntax. Oracle equivalents exist (`JSON_TABLE` or `TABLE()` for unnest, `UTL_MATCH.JARO_WINKLER_SIMILARITY` for fuzzy scoring). Currently falls back to a full entity scan (correct but slower for large banks). Estimate: 1-2 days.
Refactor source_memory_ids array queries

10+ queries across `memory_engine.py` and `link_expansion_retrieval.py` use PG array operators (`ANY()`, `&&`, `unnest()`) on `source_memory_ids`. Two approaches: (A) switch the Oracle column from CLOB to the native JSON type and use `JSON_EXISTS`/`JSON_TABLE`, or (B) normalize into a junction table (`fact_source_memories`), which would simplify both backends and eliminate the dialect-specific rewrites entirely. Blocks `include_source_facts` and `include_chunks` in recall. Estimate: 3-5 days.
Per-bank vector index isolation

Oracle doesn't support partial indexes on the VECTOR type, but the idiomatic Oracle approach is list partitioning by `bank_id` with a local HNSW index per partition — this gives equivalent tenant isolation plus partition pruning at query time. Currently uses a single global HNSW index (functional, not optimal). Estimate: 2-3 days including benchmarking.
Extend bulk insert to last remaining site

2 of 3 insert sites already have Oracle bulk paths via `executemany` in `db/ops_oracle.py`. `fact_storage.py` still uses a row-by-row fallback. Straightforward to extend the existing pattern. Estimate: < 1 day.
Key design decisions

- `self._pool = backend.get_pool()` keeps all existing PG consumers working unchanged during the transition
- `python-oracledb` is only loaded when selected, no hard dependency
- `build_semantic_arm()`/`build_bm25_arm()`/`prepare_bm25_text()` build complete UNION ALL subquery arms per database, eliminating all inline backend checks from retrieval.py

Test plan
- … (`backend_type` issue)
- … (`h3c4d5e6f7g8`, `o0j1k2l3m4n5`)
- … (`ruff check`)

🤖 Generated with Claude Code