BDMS-787 site_name script #668

Merged
jeremyzilar merged 3 commits into staging from BDMS-787-site_name-script
May 1, 2026

@jeremyzilar

Contributor

What is changing

  • The site_name field on well detail pages was always returning null because the legacy AMP Location table has a SiteNames column that was never included in the data transfer.
  • site_name is derived from the alternate_ids array by finding the entry where alternate_organization == "NMBGMR" and returning its alternate_id value. No such entries existed in the database.
  • This PR adds a one-time migration script (transfers/migrate_nmbgmr_site_names.py) that reads SiteNames from the legacy Location.csv and inserts the missing ThingIdLink rows with alternate_organization = "NMBGMR".
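A minimal sketch of the derivation described above, to show why site_name came back null: with no NMBGMR entry in alternate_ids there is nothing to return. The `AlternateId` dataclass and `derive_site_name` function are hypothetical stand-ins for the API's own code, which the PR does not show; returning the first matching entry is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AlternateId:
    # Hypothetical stand-in for one entry of a thing's alternate_ids array.
    alternate_id: str
    alternate_organization: str


def derive_site_name(alternate_ids: Sequence[AlternateId]) -> Optional[str]:
    """Return the alternate_id of the first NMBGMR entry, or None if there is none."""
    for entry in alternate_ids:
        if entry.alternate_organization == "NMBGMR":
            return entry.alternate_id
    return None


# Before the migration: no NMBGMR entry, so site_name is None (null in the API).
print(derive_site_name([AlternateId("OTHER-ID", "OTHER_ORG")]))  # None
# After the migration: the inserted NMBGMR link supplies the value.
print(derive_site_name([
    AlternateId("OTHER-ID", "OTHER_ORG"),
    AlternateId("Zwager domestic", "NMBGMR"),
]))  # Zwager domestic
```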

What the script does

  • Reads PointID and SiteNames from transfers/data/nma_csv_cache/Location.csv
  • Filters to rows with a non-null, non-empty SiteNames value (31,775 rows in the source)
  • Matches each PointID to a Thing in the database (8,487 matched locally)
  • Inserts a ThingIdLink row per match with alternate_organization = "NMBGMR" and alternate_id = SiteNames
  • Skips rows that already exist, so it is safe to re-run after future well transfers
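The steps above can be sketched as a pure planning function that decides which link rows to insert. This is an illustration under assumptions, not the merged script: it uses the standard-library `csv` module instead of the pandas DataFrame the script appears to use, treats a link as "already existing" when (thing_id, alternate_organization, alternate_id) matches, and `plan_site_name_links` and the sample data are hypothetical. The `key_column` parameter lets the same logic match on LocationId, as adopted later in this PR.

```python
import csv
from typing import Iterable

ALTERNATE_ORGANIZATION = "NMBGMR"


def plan_site_name_links(
    csv_lines: Iterable[str],
    thing_id_by_key: dict[str, int],
    existing: set[tuple[int, str, str]],
    key_column: str = "PointID",
) -> list[dict]:
    """Return the NMBGMR link rows that still need to be inserted.

    existing holds (thing_id, alternate_organization, alternate_id) tuples
    already in the database; it is copied, not modified.
    """
    seen = set(existing)
    planned: list[dict] = []
    for row in csv.DictReader(csv_lines):
        site_name = row.get("SiteNames") or ""
        if not site_name:
            continue  # skip null or empty SiteNames
        thing_id = thing_id_by_key.get(row[key_column])
        if thing_id is None:
            continue  # no matching Thing in this database
        key = (thing_id, ALTERNATE_ORGANIZATION, site_name)
        if key in seen:
            continue  # already linked, so re-runs plan nothing new
        seen.add(key)
        planned.append(
            {
                "thing_id": thing_id,
                "alternate_organization": ALTERNATE_ORGANIZATION,
                "alternate_id": site_name,
            }
        )
    return planned


# Made-up sample rows; a real run would pass open(path, newline="") instead.
sample = ["PointID,SiteNames", "WL-0029,Zwager domestic", "RA-077,", "ZZ-0000,Unmatched"]
ids = {"WL-0029": 1, "RA-077": 2}  # hypothetical thing ids
first = plan_site_name_links(sample, ids, existing=set())
done = {(r["thing_id"], r["alternate_organization"], r["alternate_id"]) for r in first}
print(len(first), len(plan_site_name_links(sample, ids, existing=done)))  # 1 0
```

Keeping the planning step free of database calls makes the filtering and idempotency rules easy to check in isolation; the actual insert can then be a single bulk write of the planned rows.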

Test plan

  • Confirm site_name returns the correct value for WL-0029 (Zwager domestic), RA-077 (Swingle Domestic), and AR-0056 (McDaniel Irrigation) after running the script on staging
  • Run the script on staging: python -m transfers.migrate_nmbgmr_site_names
  • Verify the script is idempotent by running it a second time and confirming it inserts 0 new rows
  • Run on production after staging is confirmed
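The idempotency check in the test plan can be illustrated against an in-memory SQLite database. This is a hedged sketch: the simplified `thing_id_link` table, the `INSERT ... WHERE NOT EXISTS` guard, the `run_migration` helper, and the thing ids are all assumptions, since the PR does not show how the real script detects existing rows. It only demonstrates the expected behavior: the first run inserts every row and the second inserts none.

```python
import sqlite3

# Hypothetical, simplified stand-in for the ThingIdLink table.
SCHEMA = """
CREATE TABLE thing_id_link (
    id INTEGER PRIMARY KEY,
    thing_id INTEGER NOT NULL,
    alternate_id TEXT NOT NULL,
    alternate_organization TEXT NOT NULL
)
"""

# Insert the row only when an identical link is not already present.
INSERT_IF_MISSING = """
INSERT INTO thing_id_link (thing_id, alternate_id, alternate_organization)
SELECT ?, ?, ?
WHERE NOT EXISTS (
    SELECT 1 FROM thing_id_link
    WHERE thing_id = ? AND alternate_id = ? AND alternate_organization = ?
)
"""


def count_links(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM thing_id_link").fetchone()[0]


def run_migration(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
    """Insert (thing_id, site_name) pairs as NMBGMR links; return the number of new rows."""
    before = count_links(conn)
    for thing_id, site_name in rows:
        params = (thing_id, site_name, "NMBGMR")
        conn.execute(INSERT_IF_MISSING, params + params)
    conn.commit()
    return count_links(conn) - before


# Hypothetical thing ids paired with the site names named in the test plan.
rows = [(1, "Zwager domestic"), (2, "Swingle Domestic"), (3, "McDaniel Irrigation")]
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(run_migration(conn, rows))  # 3 on the first run
print(run_migration(conn, rows))  # 0 on the second run
```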

jeremyzilar and others added 2 commits May 1, 2026 11:15
The legacy Location.csv has a SiteNames column that was never transferred
into the ThingIdLink table. This left site_name null for all wells in the
API response. The script reads SiteNames from the CSV and inserts NMBGMR
ThingIdLink rows for all matched wells. It is idempotent and safe to re-run
after future well transfers.

@jacob-a-brown jacob-a-brown left a comment

Contributor

Overall this looks good to me, though see comment below about using LocationId/nma_pk_location instead of PointID. I read through transfers/link_ids_transfer.py and the SiteNames column is never addressed.

@jacob-a-brown jacob-a-brown left a comment

Contributor

Consider using LocationId from the Locations table and matching it to nma_pk_location in the Thing table to get the correct id. LocationId is the primary key from NM_Aquifer, so it should have more fidelity than PointID (some PointIDs in Locations are not unique).

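Something like:

```python
from sqlalchemy import select

# session_ctx and Thing come from the project's own modules (import paths not shown);
# df is the pandas DataFrame read from Location.csv.
with session_ctx() as session:
    # Build a LocationId -> thing_id map for all matching wells in one query.
    location_ids = df["LocationId"].tolist()
    thing_id_by_location_id: dict[str, int] = {
        location_id: thing_id
        for location_id, thing_id in session.execute(
            select(Thing.nma_pk_location, Thing.id).where(
                Thing.nma_pk_location.in_(location_ids)
            )
        ).all()
    }
```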

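Then use LocationId instead of PointID in the subsequent steps. Something like:

```python
# Build candidate rows, matching on LocationId instead of PointID.
# RELATION, ALTERNATE_ORGANIZATION and RELEASE_STATUS are constants defined
# elsewhere in the script (not shown here).
candidates: list[dict] = []
for row in df.itertuples(index=False):
    thing_id = thing_id_by_location_id.get(row.LocationId)
    if thing_id is None:
        continue  # no Thing with this nma_pk_location; skip the row
    candidates.append(
        {
            "thing_id": thing_id,
            "relation": RELATION,
            "alternate_id": row.SiteNames,
            "alternate_organization": ALTERNATE_ORGANIZATION,
            "release_status": RELEASE_STATUS,
        }
    )
```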

PointID is not unique across all rows in Location.csv (MB-1005 appears
twice with different SiteNames). Switch to matching LocationId against
Thing.nma_pk_location, which is the UUID primary key from NM_Aquifer
and has higher fidelity. Suggested by jacob-a-brown in PR #668.
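A quick way to see the problem this commit fixes is to group Location.csv rows by PointID and report any PointID that maps to more than one LocationId. This is a minimal sketch using the standard-library `csv` module; `find_duplicate_point_ids` is a hypothetical helper, and the LocationIds and SiteNames in the sample are placeholders (only the MB-1005 duplication mirrors the PR).

```python
import csv
from collections import defaultdict
from typing import Iterable


def find_duplicate_point_ids(csv_lines: Iterable[str]) -> dict[str, list[str]]:
    """Return {PointID: [LocationId, ...]} for PointIDs that occur on more than one row."""
    location_ids_by_point: defaultdict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        location_ids_by_point[row["PointID"]].append(row["LocationId"])
    return {pid: locs for pid, locs in location_ids_by_point.items() if len(locs) > 1}


# Placeholder LocationIds and SiteNames; only the MB-1005 duplication mirrors the PR.
sample = [
    "LocationId,PointID,SiteNames",
    "loc-a,MB-1005,Site A",
    "loc-b,MB-1005,Site B",
    "loc-c,WL-0029,Zwager domestic",
]
print(find_duplicate_point_ids(sample))  # {'MB-1005': ['loc-a', 'loc-b']}

# A PointID-keyed dict silently keeps only the last MB-1005 row,
# while LocationId keys keep both rows distinct.
by_point = {row["PointID"]: row["SiteNames"] for row in csv.DictReader(sample)}
by_location = {row["LocationId"]: row["SiteNames"] for row in csv.DictReader(sample)}
print(len(by_point), len(by_location))  # 2 3
```

This is why keying on the NM_Aquifer primary key is safer: a lookup table built from a non-unique column loses rows without raising an error.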
@jeremyzilar

Contributor Author

Thank you @jacob-a-brown 🎉
I fixed it and then ran it again locally, and it looks like only one record out of the 8,487 records had a different location when using the old PointID approach (MB-1005 was the duplicate).

@jeremyzilar jeremyzilar requested a review from jacob-a-brown May 1, 2026 16:04

@ksmuczynski ksmuczynski left a comment

Contributor

Looks good! Thanks @jacob-a-brown for the comment re: the Locations table and the nma_pk field, I was thinking this, too.

@jeremyzilar jeremyzilar merged commit 55f593c into staging May 1, 2026
8 checks passed
@jeremyzilar jeremyzilar deleted the BDMS-787-site_name-script branch May 1, 2026 18:29
3 participants