BDMS-787 site_name script by jeremyzilar · Pull Request #668 · DataIntegrationGroup/OcotilloAPI

jeremyzilar · 2026-05-01T15:17:46Z

What is changing

The site_name field on well detail pages always returned null because the legacy AMP Location table has a SiteNames column that was never included in the data transfer.
site_name is derived from the alternate_ids array by finding the entry where alternate_organization == "NMBGMR" and returning its alternate_id value. No such entries existed in the database, so the lookup never found a match.
This PR adds a one-time migration script (transfers/migrate_nmbgmr_site_names.py) that reads SiteNames from the legacy Location.csv and inserts the missing ThingIdLink rows with alternate_organization = "NMBGMR".
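
For illustration, the lookup described above can be sketched as follows. This is a minimal sketch, not the API's actual code: the function name derive_site_name, the dict shape of each entry, and the "EXAMPLE-ORG" entry are assumptions made for the example.

```python
from typing import Optional

# Organization value taken from the PR description.
NMBGMR = "NMBGMR"


def derive_site_name(alternate_ids: list[dict]) -> Optional[str]:
    """Return the alternate_id of the first NMBGMR entry, or None if there is none.

    Hypothetical helper: the real API may structure this differently.
    """
    for entry in alternate_ids:
        if entry.get("alternate_organization") == NMBGMR:
            return entry.get("alternate_id")
    return None


# Before the migration: no NMBGMR entry exists, so site_name comes back as None.
before = [{"alternate_organization": "EXAMPLE-ORG", "alternate_id": "placeholder-id"}]

# After the migration: the inserted link supplies the name.
after = before + [
    {"alternate_organization": "NMBGMR", "alternate_id": "Zwager domestic"}
]

print(derive_site_name(before))
print(derive_site_name(after))
```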

What the script does

Reads PointID and SiteNames from transfers/data/nma_csv_cache/Location.csv
Filters to rows with a non-null, non-empty SiteNames value (31,775 rows in the source)
Matches each PointID to a Thing in the database (8,487 matched locally)
Inserts a ThingIdLink row per match with alternate_organization = "NMBGMR" and alternate_id = SiteNames
Skips rows that already exist, so it is safe to re-run after future well transfers
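
The read, filter, and match steps listed above could look roughly like this. It is a sketch under stated assumptions, not the script in this PR: it uses the standard csv module, an in-memory sample instead of Location.csv, a hypothetical PointID-to-Thing lookup dict in place of a database query, and it assumes whitespace-only values count as empty.

```python
import csv
import io

# Hypothetical stand-in for transfers/data/nma_csv_cache/Location.csv.
SAMPLE_CSV = "\n".join(
    [
        "PointID,SiteNames",
        "XX-0001,Example Well A",
        "XX-0002,",
        "XX-0003,   ",
        "XX-0004,Example Well B",
    ]
)


def read_site_names(csv_text: str) -> list[tuple[str, str]]:
    """Return (PointID, SiteNames) pairs, keeping only non-empty names."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: whitespace-only values are treated as empty.
        site_name = (row.get("SiteNames") or "").strip()
        if site_name:
            pairs.append((row["PointID"], site_name))
    return pairs


# Hypothetical PointID -> Thing.id lookup standing in for the database.
thing_id_by_point_id = {"XX-0001": 10, "XX-0003": 11}

site_names = read_site_names(SAMPLE_CSV)

# One candidate ThingIdLink row per CSV row that matches a known Thing.
candidates = [
    {
        "thing_id": thing_id_by_point_id[point_id],
        "alternate_id": name,
        "alternate_organization": "NMBGMR",
    }
    for point_id, name in site_names
    if point_id in thing_id_by_point_id
]
```

Here XX-0002 and XX-0003 are dropped by the filter, and XX-0004 is dropped because no Thing matches it, which leaves one candidate row.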

Test plan

Confirm site_name returns the correct value for WL-0029 (Zwager domestic), RA-077 (Swingle Domestic), and AR-0056 (McDaniel Irrigation) after running the script on staging
Run the script on staging: python -m transfers.migrate_nmbgmr_site_names
Verify the script is idempotent by running it a second time and confirming it inserts 0 new rows
Run on production after staging is confirmed
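
The idempotency check in the test plan can be rehearsed in miniature. The sketch below uses an in-memory SQLite table with a unique constraint and INSERT OR IGNORE; this is a stand-in chosen for the example, since the PR text does not show how the script skips existing rows. The table name, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the ThingIdLink table.
conn.execute(
    """
    CREATE TABLE thing_id_link (
        thing_id INTEGER NOT NULL,
        alternate_id TEXT NOT NULL,
        alternate_organization TEXT NOT NULL,
        UNIQUE (thing_id, alternate_id, alternate_organization)
    )
    """
)


def count_links(conn: sqlite3.Connection) -> int:
    """Return the current number of link rows."""
    return conn.execute("SELECT COUNT(*) FROM thing_id_link").fetchone()[0]


def insert_links(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert rows, skipping any that already exist; return how many were new."""
    before = count_links(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO thing_id_link "
        "(thing_id, alternate_id, alternate_organization) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return count_links(conn) - before


rows = [(1, "Example Well A", "NMBGMR"), (2, "Example Well B", "NMBGMR")]
first_run = insert_links(conn, rows)   # inserts both rows
second_run = insert_links(conn, rows)  # finds both rows already present
print(first_run, second_run)
```

A second run that reports zero new rows is the behavior the test plan asks to confirm on staging.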

The legacy Location.csv has a SiteNames column that was never transferred into the ThingIdLink table. This left site_name null for all wells in the API response. The script reads SiteNames from the CSV and inserts NMBGMR ThingIdLink rows for all matched wells. It is idempotent and safe to re-run after future well transfers.

jacob-a-brown

Overall this looks good to me, though see comment below about using LocationId/nma_pk_location instead of PointID. I read through transfers/link_ids_transfer.py and the SiteNames column is never addressed.

jacob-a-brown

Consider using LocationId from the Locations table and matching it to nma_pk_location in the Thing table to get the correct id. LocationId is the primary key from NM_Aquifer, so it should have more fidelity than PointID (there are some non-unique PointIDs in Locations).

something like

from sqlalchemy import select

# session_ctx, Thing, and df are assumed to be defined by the migration script.
with session_ctx() as session:
    # Build a LocationId -> thing_id map for all matching wells in one query.
    location_ids = df["LocationId"].tolist()
    thing_id_by_location_id: dict[str, int] = {
        location_id: thing_id
        for location_id, thing_id in session.execute(
            select(Thing.nma_pk_location, Thing.id).where(
                Thing.nma_pk_location.in_(location_ids)
            )
        ).all()
    }

Then use LocationId instead of PointID from that point on. Something like

# Build candidate rows.
candidates: list[dict] = []
for row in df.itertuples(index=False):
    thing_id = thing_id_by_location_id.get(row.LocationId)
    if thing_id is None:
        continue
    candidates.append(
        {
            "thing_id": thing_id,
            "relation": RELATION,
            "alternate_id": row.SiteNames,
            "alternate_organization": ALTERNATE_ORGANIZATION,
            "release_status": RELEASE_STATUS,
        }
    )

PointID is not unique across all rows in Location.csv (MB-1005 appears twice with different SiteNames). Switch to matching LocationId against Thing.nma_pk_location, which is the UUID primary key from NM_Aquifer and has higher fidelity. Suggested by jacob-a-brown in PR #668.
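
A quick way to confirm which key is safe to match on is to group rows by each candidate key and look for values that occur more than once. The sketch below is illustrative only: the rows are hypothetical placeholders shaped like Location.csv (only the PointID MB-1005 comes from this PR), and duplicated_values is a helper invented for the example.

```python
from collections import defaultdict

# Hypothetical rows; LocationId values and site names are placeholders.
rows = [
    {"LocationId": "loc-a", "PointID": "MB-1005", "SiteNames": "placeholder name 1"},
    {"LocationId": "loc-b", "PointID": "MB-1005", "SiteNames": "placeholder name 2"},
    {"LocationId": "loc-c", "PointID": "XX-0001", "SiteNames": "placeholder name 3"},
]


def duplicated_values(rows: list[dict], key: str) -> dict[str, list[dict]]:
    """Group rows by `key` and keep only the values that occur more than once."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return {value: group for value, group in grouped.items() if len(group) > 1}


dup_point_ids = duplicated_values(rows, "PointID")
dup_location_ids = duplicated_values(rows, "LocationId")

print(sorted(dup_point_ids))     # PointID values shared by several rows
print(sorted(dup_location_ids))  # empty when LocationId is unique
```

Matching on a key that shows up in the first result can attach a site name to the wrong Thing; a key with an empty result avoids that ambiguity.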

jeremyzilar · 2026-05-01T16:04:33Z

Thank you @jacob-a-brown 🎉
I fixed it and reran it locally. It looks like only one of the 8,487 records resolved to a different location under the old PointID approach (MB-1005 was the duplicate).

ksmuczynski

Looks good! Thanks @jacob-a-brown for the comment re: the Locations table and the nma_pk field. I was thinking this, too.

jeremyzilar and others added 2 commits May 1, 2026 11:15

Formatting changes

e669892

jeremyzilar self-assigned this May 1, 2026

jeremyzilar requested review from jacob-a-brown, jirhiker and ksmuczynski May 1, 2026 15:18

jacob-a-brown reviewed May 1, 2026

jeremyzilar requested a review from jacob-a-brown May 1, 2026 16:04

jacob-a-brown approved these changes May 1, 2026

ksmuczynski approved these changes May 1, 2026

jeremyzilar merged commit 55f593c into staging May 1, 2026
8 checks passed

jeremyzilar deleted the BDMS-787-site_name-script branch May 1, 2026 18:29

jeremyzilar merged 3 commits into staging from BDMS-787-site_name-script
