feat(transfers): add NM_Wells 1:1 staging mirror + ref-table lexicon loader by jirhiker · Pull Request #686 · DataIntegrationGroup/OcotilloAPI

jirhiker · 2026-06-06T22:21:09Z

Summary

Phase 1 of the NM_Wells → Ocotillo migration: a 1:1 staging-mirror schema for the legacy NM_Wells SQL Server database, loaders that populate it (from a SQL dump or CSV), the ref_* → lexicon loader, a standalone orchestrator, and pygeoapi OGC views over the geothermal data.

Two-phase by design:

Phase 1 (this PR): land NM_Wells source tables unchanged into NMW_* mirror tables (faithful, column-for-column), following the db/nma_legacy.py (NM_Aquifer) convention.
Phase 2 (later): transform mirror rows into the Ocotillo model (Location → Thing → FieldEvent → Sample → Observation, etc.). Not built here; per-column targets and lexicon mappings are flagged inline for it.

What's included

Mirror schema — `db/nmw_legacy.py` + 2 migrations

17 of 22 "Migrate First" source tables, NMW_ prefix, original column names preserved, each column annotated with its Phase-2 Ocotillo target:

Main (5): WellLocations, WellHeaders, WellRecords, WellZDatum, WellSamples (columns/types from the planning workbook field map).
Geothermal (7) + DST (5): Gt{BhtHeaders,BhtData,Conductivity,HeatFlow,SumHeatFlow,TempDepths}, WsIntervals, WsDst{Headers,Intervals,FlowHistory,FluidProperties,Pressure} (columns/lengths/PKs straight from the SQL-dump DDL).
PKs verified against the dump DDL; SSMA_TimeStamp rowversion dropped.
Migrations: u7v8w9x0y1z2 (Main), v8w9x0y1z2a3 (Geothermal/DST); bodies generated from model metadata.

Loaders

transfers/nmw_mirror_transfer.py: data-driven CSV/SQL → NMW_* loader; model-driven type coercion (NULL/NaN/NaT → None, rowversion dropped); chunked idempotent insert (ON CONFLICT DO NOTHING, so re-runs skip existing rows rather than update them).
transfers/nmw_sql_dump.py: streams INSERT [dbo].[tbl_*] (...) VALUES (...) statements from a SQL Server data dump (handles N'...' strings with escaped '', embedded commas/parens, CAST, multi-row VALUES, 0x binary, UTF-16/UTF-8 BOM).
transfers/reference_lexicon_transfer.py: loads all 49 ref_* lookups into the lexicon (one category per table; idempotent like init_lexicon).
The mirror and lexicon loaders share one row source: the SQL dump when NMW_SQL_DUMP is set, otherwise per-table CSVs.
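A minimal sketch of the coercion, chunking, and source-selection behaviour described for the loaders, using only the standard library. The helper names (`coerce_null`, `clean_row`, `chunked`, `pick_source`) and the set of NULL-like spellings are assumptions for illustration, not the code in this PR.

```python
import math
import os
from itertools import islice
from typing import Any, Iterable, Iterator, List, Mapping

# Assumed spellings of "missing" in CSV/dump text; the real loader may differ.
NULL_TOKENS = {"NULL", "NaN", "NaT"}


def coerce_null(value: Any) -> Any:
    """Map NULL-like markers and float NaN to None; pass other values through."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, str) and value.strip() in NULL_TOKENS:
        return None
    return value


def clean_row(row: Mapping[str, Any], drop=frozenset({"SSMA_TimeStamp"})) -> dict:
    """Drop the rowversion column and coerce the remaining values."""
    return {k: coerce_null(v) for k, v in row.items() if k not in drop}


def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows so each insert stays bounded."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def pick_source(env: Mapping[str, str] = os.environ) -> str:
    """SQL dump when NMW_SQL_DUMP is set (non-empty), else per-table CSV."""
    return "sql_dump" if env.get("NMW_SQL_DUMP") else "csv"
```

Each chunk would then be handed to a single INSERT ... ON CONFLICT DO NOTHING statement, which is what makes a re-run safe.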

Orchestration

transfers/transfer_geothermal.py — standalone orchestrator (python -m transfers.transfer_geothermal): runs the reference→lexicon load then the mirror load. Flags TRANSFER_GEOTHERMAL_REFERENCE, TRANSFER_NMW_MIRROR, TRANSFER_LIMIT, NMW_SQL_DUMP.
transfers/transfer.py (the legacy NM_Aquifer driver) is deprecated — module docstring + DeprecationWarning; no NM_Wells wiring added to it.
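The flag handling can be pictured roughly as below. This is a sketch under assumptions: the function names are hypothetical, both steps default to on (as the orchestrator commit states), and TRANSFER_LIMIT is assumed to be an optional per-table row cap, which the PR text does not spell out.

```python
from typing import Callable, List, Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def env_flag(env: Mapping[str, str], name: str, default: bool = True) -> bool:
    """Read a boolean flag; unset or blank falls back to the default."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in TRUTHY


def env_limit(env: Mapping[str, str], name: str = "TRANSFER_LIMIT") -> Optional[int]:
    """Return the limit as an int, or None when no limit is set."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else None


def run_geothermal(
    env: Mapping[str, str],
    load_reference: Callable[[Optional[int]], None],
    load_mirror: Callable[[Optional[int]], None],
) -> List[str]:
    """Run reference -> lexicon first, then the mirror load; return the steps run."""
    ran = []
    limit = env_limit(env)
    if env_flag(env, "TRANSFER_GEOTHERMAL_REFERENCE"):
        load_reference(limit)
        ran.append("reference")
    if env_flag(env, "TRANSFER_NMW_MIRROR"):
        load_mirror(limit)
        ran.append("mirror")
    return ran
```

Passing the environment and the two loaders in as arguments keeps the ordering logic testable without a database.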

Lexicon flagging (for Phase 2)

db/nmw_legacy.py exposes LEXICON_REF_BY_COLUMN (40 coded attributes → their ref_* source table; will become lexicon_term FKs / enums) and LEXICON_CANDIDATES_NO_REF (8 coded columns with no ref table → need a new category/enum). Validated: every column + ref table exists.
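The "every column + ref table exists" check could look something like the following. The shape of the two maps here (keys as `(table, column)` pairs) is an assumption, as are the example table and column pairings; only the names LEXICON_REF_BY_COLUMN and LEXICON_CANDIDATES_NO_REF come from the PR.

```python
from typing import Dict, Iterable, List, Set, Tuple

Column = Tuple[str, str]  # (mirror table, column) - assumed key shape


def validate_lexicon_flags(
    ref_by_column: Dict[Column, str],
    candidates_no_ref: Iterable[Column],
    model_columns: Dict[str, Set[str]],
    ref_tables: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the flag maps are consistent."""
    errors = []
    for (table, column), ref in ref_by_column.items():
        if column not in model_columns.get(table, set()):
            errors.append(f"unknown column {table}.{column}")
        if ref not in ref_tables:
            errors.append(f"unknown ref table {ref} for {table}.{column}")
    for table, column in candidates_no_ref:
        if column not in model_columns.get(table, set()):
            errors.append(f"unknown column {table}.{column}")
        if (table, column) in ref_by_column:
            errors.append(f"{table}.{column} is listed in both maps")
    return errors


# Hypothetical sample data: the column-to-table pairings are made up.
model_columns = {"NMW_WellHeaders": {"State", "DrillFluid"}}
ref_by_column = {("NMW_WellHeaders", "State"): "ref_states"}
candidates_no_ref = [("NMW_WellHeaders", "DrillFluid")]
problems = validate_lexicon_flags(
    ref_by_column, candidates_no_ref, model_columns, {"ref_states"}
)
```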

OGC views (pygeoapi): 4 geothermal layers

ogc_geothermal_wells_bht — wells with bottom-hole-temperature data (NMW_GtBhtData), aggregate BHT stats.
ogc_geothermal_wells_temperature_profile — materialized + indexed (unique well_data_id, GiST geom); downhole temp-vs-depth series (NMW_GtTempDepths, ~370k source rows) as an ordered JSON array.
ogc_geothermal_wells_summary_heat_flow — summary heat-flow determinations (NMW_GtSumHeatFlow): aggregates + a measurements JSON series.
ogc_geothermal_wells_interval_heat_flow — per-interval heat-flow values (NMW_GtHeatFlow): aggregates + a measurements JSON series.
All point geometry from NMW_WellLocations Lat/Long_dd83; required-table guards; follow the existing ogc_* migration pattern.
Migrations: w9x0y1z2a3b4 (BHT + profile), x0y1z2a3b4c5 (summary heat flow), y1z2a3b4c5d6 (interval heat flow).

Design notes

Source access: NM_Wells delivered as a SQL dump. The provided NMWells.sql is schema-only (no rows) — it seeded the models/migrations; NMW_SQL_DUMP should point at a separate data dump (or use CSV exports).
FKs: legacy link GUIDs (SamplSetID, BHTGUID, IntrvlGUID, DSTGUID, DSTInterval, RecrdSetID) kept as indexed plain columns, not enforced FKs — staging fidelity.
Types: uniqueidentifier→UUID, nvarchar(n)→String(n), datetime2→DateTime, timestamp(rowversion)→dropped.
Materialized profile view: snapshot — REFRESH MATERIALIZED VIEW after a data reload.
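The type rules above can be written as a small lookup. This sketch returns the target type as a string so it runs without SQLAlchemy; the function name is hypothetical, and the handling of `nvarchar(max)` is an assumption, since the PR does not say how unbounded strings are mapped.

```python
import re
from typing import Optional

# Matches e.g. "[nvarchar](50)", "datetime2(7)", "uniqueidentifier".
_TYPE_RE = re.compile(r"^\s*\[?(\w+)\]?\s*(?:\(\s*(\w+)\s*\))?\s*$")


def map_sqlserver_type(ddl_type: str) -> Optional[str]:
    """Map a SQL Server DDL type to a mirror column type name.

    Returns None for rowversion columns, which the mirror drops.
    """
    match = _TYPE_RE.match(ddl_type)
    if match is None:
        raise ValueError(f"unparseable type: {ddl_type!r}")
    name, arg = match.group(1).lower(), match.group(2)
    if name == "uniqueidentifier":
        return "UUID"
    if name == "nvarchar":
        if arg is None or arg.lower() == "max":
            return "Text"  # assumption: unbounded strings become Text
        return f"String({int(arg)})"
    if name == "datetime2":
        return "DateTime"  # fractional-seconds precision is ignored here
    if name in {"timestamp", "rowversion"}:
        return None  # SSMA_TimeStamp-style rowversion: dropped
    raise ValueError(f"no mapping for type: {ddl_type!r}")
```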

Not in this PR

Phase-2 transform logic (flagged/documented only).
Remaining "Migrate First" tables: tbl_sources + 4 Subsurface Library tables.

Testing

All models import and mappers configure; mirror column counts verified against source DDL / field map; PKs verified against the dump.
SQL-dump parser + type coercion unit-checked (incl. escaped quotes, multi-row VALUES, UTF-16, NaN/NaT→None).
Lexicon flag map validated against models + ref tables (0 errors).
All migrations compile; single Alembic head (y1z2a3b4c5d6); pre-commit (black/flake8) green.
Not run against a live database (source data dump not yet available; OGC view SQL is exercised by CI's Postgres jobs).

Note: an internal migration-notes doc (docs/nm_wells-migration.md) is kept locally but docs/ is gitignored, so it is not part of this PR; the same notes live in the model docstrings.

🤖 Generated with Claude Code

Phase 1 of the NM_Wells -> Ocotillo migration: faithful column-for-column staging mirror of the legacy NM_Wells SQL Server DB, plus loaders. The transform into the Ocotillo model (Phase 2) is documented inline but not built.

- db/nmw_legacy.py: 17 NMW_* mirror models (5 Main, 7 Geothermal, 5 DST), source column names preserved, per-column Phase-2 transform-target notes. Main columns from the planning workbook field map; Geothermal/DST columns, lengths and PKs taken directly from the SQL-dump DDL.
- alembic: two migrations (Main; Geothermal+DST) chained off current head, bodies generated from model metadata. Single head.
- transfers/nmw_mirror_transfer.py: data-driven CSV -> NMW_* loader with type coercion (NaN/NaT -> None, rowversion dropped), chunked ON CONFLICT upsert. Gated by TRANSFER_NMW_MIRROR (default off; separate source DB).
- transfers/reference_lexicon_transfer.py: loads all 49 ref_* lookups into the lexicon (category per table), idempotent like init_lexicon; registered as a foundational transfer.
- db/__init__.py, transfers/transfer.py, .env.example: wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Phase 1 of the NM_Wells → Ocotillo migration: introduces a 1:1 staging-mirror schema for NM_Wells legacy tables, plus transfer loaders to (a) populate the staging mirror from CSV exports and (b) load ref_* reference tables into the lexicon as foundational data.

Changes:

Add NM_Wells legacy staging mirror ORM models and Alembic migrations for Main + Geothermal/DST tables.
Add a generic, chunked mirror loader (transfer_nmw_mirror) and gate it behind TRANSFER_NMW_MIRROR (default off).
Add a foundational “reference → lexicon” loader (transfer_reference_tables) and run it as part of Phase 1 foundational transfers.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| transfers/transfer.py | Wires in new foundational reference→lexicon transfer and optional NM_Wells mirror load; adjusts foundational parallelism. |
| transfers/reference_lexicon_transfer.py | New transfer to ingest legacy `ref_*` lookups into lexicon categories/terms. |
| transfers/nmw_mirror_transfer.py | New generic CSV→NMW_* staging mirror loader with type coercion + chunked idempotent inserts. |
| db/nmw_legacy.py | New SQLAlchemy models for 1:1 NM_Wells staging mirror tables. |
| db/__init__.py | Exposes NM_Wells legacy models via db package import. |
| alembic/versions/u7v8w9x0y1z2_nmw_legacy_staging_mirror_tables.py | Migration creating the 5 “Main” staging mirror tables. |
| alembic/versions/v8w9x0y1z2a3_nmw_geothermal_dst_mirror_tables.py | Migration creating Geothermal + DST staging mirror tables. |
| .env.example | Adds `TRANSFER_NMW_MIRROR` toggle (default false). |

- nmw_mirror_transfer: parse DateTime values with pd.to_datetime(errors=coerce) since read_csv does not parse_dates (avoids driver-dependent insert failures).
- db/nmw_legacy: fix attribute typos (dst_operator, recov_column, resistivity) while preserving the legacy DB column names; fix latitude_dd27 comment typo.
- reference_lexicon_transfer: correct stale exclusion comment (ref_date_drilled is included; only ref_nm_quads is excluded).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The SSMA_TimeStamp column is a SQL Server rowversion artifact with no value as staging data (the loader already skipped it). Remove it from the NMW_* mirror models and both migrations; drop the now-unused LargeBinary import. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Confirmed source PKs from the NM_Wells SQL dump DDL:

- WellHeaders/WellRecords/WellSamples have declared PRIMARY KEY constraints (WellDataID / RecrdSetID / SamplSetID) matching the models.
- WellLocations and WellZDatum declare no PK, only unique indexes on OBJECTID and GlobalID.

Switch WellZDatum PK from GlobalID to OBJECTID for consistency with WellLocations and safety (OBJECTID identity is never NULL; the GlobalID unique index permits one NULL). Update the migration accordingly. Remove the TODO(verify) note; PKs are now confirmed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add transfers/nmw_sql_dump.py: streams INSERT [dbo].[tbl_*] (...) VALUES (...) statements out of a SQL Server data-dump .sql file, yielding {column: value} dicts. Handles N'...' / escaped '', embedded commas/parens, CAST(expr AS type), multi-row VALUES, 0x binary -> None, and UTF-16/UTF-8 (BOM auto-detect). Refactor transfer_nmw_mirror to be source-agnostic: when NMW_SQL_DUMP points at a .sql data dump it loads from there, otherwise falls back to per-table CSVs. Same model-driven type coercion and chunked ON CONFLICT upsert for both. Note: the provided NMWells.sql is schema-only; NMW_SQL_DUMP expects a separate data dump containing INSERT statements. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
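To illustrate the parsing cases listed in this commit, here is a small standalone sketch that turns one INSERT statement into {column: value} dicts. It is not the PR's nmw_sql_dump implementation: the function names are hypothetical, CAST is handled by unwrapping the inner literal without applying the target type, and the table and column names in the usage line are illustrative.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple

_INSERT_RE = re.compile(
    r"INSERT\s+(?:INTO\s+)?\[dbo\]\.\[(\w+)\]\s*\((.*?)\)\s*VALUES\s*(.*)",
    re.I | re.S,
)
_CAST_RE = re.compile(
    r"^CAST\s*\((.*)\s+AS\s+[^()]+(?:\([^()]*\))?\s*\)$", re.I | re.S
)


def split_top_level(text: str) -> List[str]:
    """Split on commas that sit outside quoted strings and nested parens."""
    parts, buf, depth, in_str, i = [], [], 0, False, 0
    while i < len(text):
        ch = text[i]
        if in_str:
            buf.append(ch)
            if ch == "'":
                if text[i + 1 : i + 2] == "'":  # '' is an escaped quote
                    buf.append("'")
                    i += 1
                else:
                    in_str = False
        elif ch == "," and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            if ch == "'":
                in_str = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            buf.append(ch)
        i += 1
    parts.append("".join(buf).strip())
    return parts


def convert(token: str) -> Any:
    """Turn one SQL literal into a Python value."""
    if token[:2] in ("N'", "n'"):
        token = token[1:]
    if token.startswith("'") and token.endswith("'"):
        return token[1:-1].replace("''", "'")
    cast = _CAST_RE.match(token)
    if cast:
        return convert(cast.group(1).strip())  # unwrap; target type ignored
    if token.upper() == "NULL" or token[:2].lower() == "0x":
        return None  # NULL and 0x binary literals both become None
    for caster in (int, float):
        try:
            return caster(token)
        except ValueError:
            pass
    return token


def parse_values(values_sql: str) -> List[List[Any]]:
    """Parse '(a, b), (c, d)' into a list of value lists (multi-row VALUES)."""
    rows = []
    for tup in split_top_level(values_sql):
        inner = tup.strip()[1:-1]  # strip the tuple's outer parens
        rows.append([convert(tok) for tok in split_top_level(inner)])
    return rows


def iter_insert_rows(statement: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (table, {column: value}) for each row of one INSERT statement."""
    match = _INSERT_RE.match(statement.strip())
    if match is None:
        return
    table, cols, values = match.groups()
    columns = [c.strip().strip("[]") for c in cols.split(",")]
    for row in parse_values(values.rstrip().rstrip(";")):
        yield table, dict(zip(columns, row))


rows = list(iter_insert_rows(
    "INSERT [dbo].[tbl_GtBhtData] ([BHTGUID], [Depth]) "
    "VALUES (N'abc', 10), (N'O''Neil (Jr), x', NULL);"
))
```

A character-level scanner is used here because a plain `split(",")` breaks on commas and parentheses embedded in quoted strings.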

…er.py Move the NM_Wells (geothermal) orchestration out of transfers/transfer.py into a new standalone transfers/transfer_geothermal.py. Revert all NM_Wells wiring from transfer.py and mark that module deprecated (module docstring + DeprecationWarning in transfer_all) so new migrations get their own orchestrator. transfer_geothermal.py runs the reference->lexicon load (TRANSFER_GEOTHERMAL_REFERENCE) and the NMW_* mirror load (TRANSFER_NMW_MIRROR); both default on. Run: python -m transfers.transfer_geothermal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

reference_lexicon_transfer now selects its row source the same way as nmw_mirror_transfer: a SQL Server data dump when NMW_SQL_DUMP is set (parsed by nmw_sql_dump.iter_table_rows), otherwise per-table CSV. _pick_columns operates on a column-name list and rows are processed as dicts so both sources share one path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add LEXICON_REF_BY_COLUMN mapping every coded mirror attribute to its ref_* source table (which reference_lexicon_transfer loads as a lexicon category whose rows become terms). These 40 attributes will become lexicon_term FKs / enums in the Phase-2 transform. Add LEXICON_CANDIDATES_NO_REF for 8 coded columns that have no ref_* table and will need a new category/enum (DrillFluid, TestType, Operation, etc.). Validated: every column + ref table exists. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Remove the dead `category = table[4:]` line and fix the stale docstring; the category is nmw_<table> (e.g. nmw_ref_states). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ile) Two pygeoapi point layers over the NMW_* staging mirror, geometry from NMW_WellLocations Lat/Long_dd83: - ogc_geothermal_wells_bht: one feature per geothermal well with bottom-hole temperature data (NMW_GtBhtData), aggregate BHT stats. - ogc_geothermal_wells_temperature_profile: one feature per geothermal well with a downhole temperature-vs-depth series (NMW_GtTempDepths) as an ordered JSON array. Wells link via gt_*.SamplSetID -> NMW_WellSamples -> NMW_WellRecords -> NMW_WellLocations. Guards required tables; drops views on downgrade. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The temperature-vs-depth profile view scans/groups NMW_GtTempDepths (~370k source rows) and builds a per-well JSON series — too heavy to recompute per pygeoapi request. Convert it to a MATERIALIZED view with a unique index on well_data_id (enables REFRESH CONCURRENTLY) and a GiST index on geom. The BHT view stays a regular view (small source). REFRESH after a data reload. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pygeoapi point layer ogc_geothermal_wells_heat_flow: one feature per geothermal well with summary heat-flow determinations (NMW_GtSumHeatFlow) - aggregate heat flow, thermal gradient, thermal conductivity and quality. Geometry from NMW_WellLocations; linked via NMW_GtSumHeatFlow.RecrdSetID -> NMW_WellRecords. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pygeoapi point layer ogc_geothermal_wells_interval_heat_flow from NMW_GtHeatFlow (per-interval values: Q heat flow, gradient, Kpr conductivity, Ka diffusivity), one feature per well. Distinct from ogc_geothermal_wells_heat_flow (summary, NMW_GtSumHeatFlow). Linked via IntrvlGUID -> NMW_WsIntervals -> NMW_WellSamples -> NMW_WellRecords -> NMW_WellLocations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Rename ogc_geothermal_wells_heat_flow -> ogc_geothermal_wells_summary_heat_flow.
- Add a `measurements` JSON series to both heat-flow views: one element per determination/interval (depth range, heat flow, gradient, conductivity, etc.), ordered by depth, alongside the existing per-well aggregates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

When NMW_SQL_DUMP is set, the mirror now parses the dump with sqlparse (nmw_sql_dump.write_table_csv) into a CSV per table, then bulk-loads each via Postgres COPY ... FROM STDIN (truncate + COPY; Postgres casts text -> types) — far faster than row-by-row ORM inserts. CSV dir defaults to a temp dir (override NMW_CSV_DIR). The CSV-exports fallback (no dump) keeps the row-insert path. Adds sqlparse dependency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
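The truncate + COPY path might be prepared along these lines. This sketch only builds the CSV text and the SQL statements; actually streaming the CSV to Postgres needs a driver call that is left out. The function names and the `Notes` column are hypothetical.

```python
import csv
import io
from typing import Any, Iterable, List, Mapping, Sequence


def rows_to_csv(columns: Sequence[str], rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize rows to CSV with a header line; None becomes an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow(["" if row.get(c) is None else row[c] for c in columns])
    return buf.getvalue()


def copy_statements(table: str, columns: Sequence[str]) -> List[str]:
    """Build the TRUNCATE and COPY statements for one mirror table.

    Identifiers are double-quoted because the mirror keeps the mixed-case
    source names. In COPY's csv format an unquoted empty field is read as
    NULL by default, so both None and '' end up as NULL.
    """
    cols = ", ".join(f'"{c}"' for c in columns)
    return [
        f'TRUNCATE TABLE "{table}"',
        f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)',
    ]


csv_text = rows_to_csv(
    ["OBJECTID", "Notes"],
    [{"OBJECTID": 1, "Notes": "a, b"}, {"OBJECTID": 2, "Notes": None}],
)
statements = copy_statements("NMW_WellZDatum", ["OBJECTID", "Notes"])
```

Letting Postgres cast the text columns during COPY is what removes the per-row Python coercion from this path.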

Add refresh_materialized_views (REFRESH the geothermal materialized views, currently ogc_geothermal_wells_temperature_profile; skip any not present). The transfer_geothermal orchestrator calls it after the NMW_* mirror load so the materialized view reflects the freshly loaded data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
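A rough sketch of the "refresh what exists, skip the rest" behaviour. The signature is an assumption: here the set of existing materialized views and the statement executor are passed in, whereas the PR's function presumably talks to the database directly (for example by checking the `pg_matviews` catalog).

```python
from typing import Callable, Iterable, List, Set

# The only geothermal materialized view named in this PR so far.
GEOTHERMAL_MATVIEWS = ("ogc_geothermal_wells_temperature_profile",)


def refresh_materialized_views(
    existing: Set[str],
    execute: Callable[[str], None],
    views: Iterable[str] = GEOTHERMAL_MATVIEWS,
    concurrently: bool = False,
) -> List[str]:
    """Refresh each listed view that exists; return the names refreshed.

    CONCURRENTLY needs a unique index on the view (the profile view has one
    on well_data_id) and a view that has already been populated once.
    """
    refreshed = []
    keyword = "CONCURRENTLY " if concurrently else ""
    for view in views:
        if view not in existing:
            continue  # e.g. the view migration has not been applied yet
        execute(f"REFRESH MATERIALIZED VIEW {keyword}{view}")
        refreshed.append(view)
    return refreshed


issued: List[str] = []
done = refresh_materialized_views(
    existing={"ogc_geothermal_wells_temperature_profile"},
    execute=issued.append,
)
```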

…er-93c916 # Conflicts: # pyproject.toml

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Resolve requirements.txt conflict by regenerating from the merged uv.lock (uv export). Brings staging fixes incl. the CLI test update. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 6, 2026 22:21

Copilot started reviewing on behalf of jirhiker June 6, 2026 22:21

Copilot AI reviewed Jun 6, 2026

Comment thread transfers/reference_lexicon_transfer.py Outdated

Comment thread transfers/nmw_mirror_transfer.py

Comment thread db/nmw_legacy.py Outdated

Comment thread db/nmw_legacy.py

Comment thread db/nmw_legacy.py Outdated

Comment thread db/nmw_legacy.py

jirhiker changed the title ~~feat(transfers): NM_Wells 1:1 staging mirror + ref-table lexicon loader~~ feat(transfers): add NM_Wells 1:1 staging mirror + ref-table lexicon loader Jun 6, 2026

jirhiker marked this pull request as draft June 7, 2026 04:21

jirhiker and others added 9 commits June 6, 2026 22:24

Formatting changes

6d7e92c

Formatting changes

cff5d52

chore(transfers): clean up _spec category derivation

a63179e

Remove the dead `category = table[4:]` line and fix the stale docstring; the category is nmw_<table> (e.g. nmw_ref_states). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jirhiker added this to the Geothermal OGC Features API milestone Jun 7, 2026

jirhiker and others added 11 commits June 7, 2026 00:39

feat(alembic): update comments for geothermal OGC views

9b8037a

Merge remote-tracking branch 'origin/staging' into claude/serene-beav…

408a927

…er-93c916 # Conflicts: # pyproject.toml

chore: regenerate requirements.txt with sqlparse after staging merge

d24667e

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge origin/staging into claude/serene-beaver-93c916

9ca82c5

Resolve requirements.txt conflict by regenerating from the merged uv.lock (uv export). Brings staging fixes incl. the CLI test update. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jirhiker wants to merge 22 commits into staging from claude/serene-beaver-93c916
