BDMS-787 site_name script #668
The legacy Location.csv has a SiteNames column that was never transferred into the ThingIdLink table. This left site_name null for all wells in the API response. The script reads SiteNames from the CSV and inserts NMBGMR ThingIdLink rows for all matched wells. It is idempotent and safe to re-run after future well transfers.
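A minimal sketch of what "idempotent and safe to re-run" can mean for an insert like this, using an in-memory SQLite table in place of the real database. The table layout, the `insert_missing_links` helper, and the sample rows are hypothetical and only illustrate the check-before-insert idea; the actual script may implement it differently.

```python
import sqlite3

ORG = "NMBGMR"


def insert_missing_links(conn, candidates):
    """Insert one NMBGMR link per thing_id that has none yet; return rows inserted."""
    # Collect thing_ids that already carry an NMBGMR link.
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT thing_id FROM thing_id_link WHERE alternate_organization = ?",
            (ORG,),
        )
    }
    new_rows = []
    for thing_id, site_name in candidates:
        if thing_id in existing:
            continue  # already linked, either in the table or earlier in this batch
        existing.add(thing_id)
        new_rows.append((thing_id, site_name, ORG))
    conn.executemany(
        "INSERT INTO thing_id_link (thing_id, alternate_id, alternate_organization) "
        "VALUES (?, ?, ?)",
        new_rows,
    )
    conn.commit()
    return len(new_rows)


# Hypothetical stand-in for the ThingIdLink table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thing_id_link "
    "(thing_id INTEGER, alternate_id TEXT, alternate_organization TEXT)"
)
candidates = [(1, "Example Well A"), (2, "Example Well B")]  # made-up sample data

first_run = insert_missing_links(conn, candidates)
second_run = insert_missing_links(conn, candidates)
print(first_run, second_run)
```

The second call finds both wells already linked and inserts nothing, which is the property that makes re-running after a later well transfer safe.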
Consider using LocationId from the Locations table and matching it to nma_pk_location in the Thing table to get the correct id. LocationId is the primary key from NM_Aquifer, so it should have more fidelity than PointID (there are some non-unique PointIDs in Locations).
Something like:

with session_ctx() as session:
    # Build a LocationId -> thing_id map for all matching wells in one query.
    location_ids = df["LocationId"].tolist()
    thing_id_by_location_id: dict[str, int] = {
        location_id: thing_id
        for location_id, thing_id in session.execute(
            select(Thing.nma_pk_location, Thing.id).where(Thing.nma_pk_location.in_(location_ids))
        ).all()
    }

then use LocationId subsequently instead of PointID. Something like:

# Build candidate rows.
candidates: list[dict] = []
for row in df.itertuples(index=False):
    thing_id = thing_id_by_location_id.get(row.LocationId)
    if thing_id is None:
        continue
    candidates.append(
        {
            "thing_id": thing_id,
            "relation": RELATION,
            "alternate_id": row.SiteNames,
            "alternate_organization": ALTERNATE_ORGANIZATION,
            "release_status": RELEASE_STATUS,
        }
    )

PointID is not unique across all rows in Location.csv (MB-1005 appears twice with different SiteNames). Switch to matching LocationId against Thing.nma_pk_location, which is the UUID primary key from NM_Aquifer and has higher fidelity. Suggested by jacob-a-brown in PR #668.
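To illustrate why a non-unique PointID is a problem, here is a small standard-library sketch that lists PointIDs carrying more than one distinct SiteNames value. The inline CSV is made-up sample data: the LocationId values are placeholders (the real ones are UUIDs) and the two site names for MB-1005 are invented. The `point_ids_with_conflicts` helper is hypothetical and not part of the PR.

```python
import csv
import io
from collections import defaultdict

# Made-up sample shaped like Location.csv; only the column names come from the PR.
SAMPLE_CSV = """LocationId,PointID,SiteNames
loc-a,MB-1005,Example Site One
loc-b,MB-1005,Example Site Two
loc-c,WL-0029,Zwager domestic
"""


def point_ids_with_conflicts(csv_file):
    """Return {PointID: sorted site names} for PointIDs with more than one name."""
    names_by_point_id = defaultdict(set)
    for row in csv.DictReader(csv_file):
        names_by_point_id[row["PointID"]].add(row["SiteNames"])
    return {
        point_id: sorted(names)
        for point_id, names in names_by_point_id.items()
        if len(names) > 1
    }


conflicts = point_ids_with_conflicts(io.StringIO(SAMPLE_CSV))
print(conflicts)
# → {'MB-1005': ['Example Site One', 'Example Site Two']}
```

A lookup keyed on PointID would have to pick one of the two names arbitrarily, whereas a lookup keyed on LocationId keeps the rows distinct.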
Thank you @jacob-a-brown 🎉
ksmuczynski left a comment
Looks good! Thanks @jacob-a-brown for the comment re: the Locations table and the nma_pk field. I was thinking the same thing.
What is changing
- The site_name field on well detail pages was always returning null because the legacy AMP Location table has a SiteNames column that was never included in the data transfer.
- site_name is derived from the alternate_ids array by finding the entry where alternate_organization == "NMBGMR" and returning its alternate_id value. No such entries existed in the database.
- Adds a script (transfers/migrate_nmbgmr_site_names.py) that reads SiteNames from the legacy Location.csv and inserts the missing ThingIdLink rows with alternate_organization = "NMBGMR".

What the script does
- Reads PointID and SiteNames from transfers/data/nma_csv_cache/Location.csv
- Keeps rows with a SiteNames value (31,775 rows in the source)
- Matches each PointID to a Thing in the database (8,487 matched locally)
- Inserts one ThingIdLink row per match with alternate_organization = "NMBGMR" and alternate_id = SiteNames

Test plan
- site_name returns the correct value for WL-0029 (Zwager domestic), RA-077 (Swingle Domestic), and AR-0056 (McDaniel Irrigation) after running the script on staging
- python -m transfers.migrate_nmbgmr_site_names
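
The site_name derivation described above can be sketched as a small pure function. This is an illustration under assumptions, not the API's actual code: `derive_site_name` is a hypothetical name, the entries are modelled as plain dicts, and the "USGS" entry is an invented example of a non-NMBGMR alternate id.

```python
from typing import Optional


def derive_site_name(alternate_ids: list[dict]) -> Optional[str]:
    """Return the alternate_id of the first NMBGMR entry, or None if there is none."""
    for entry in alternate_ids:
        if entry.get("alternate_organization") == "NMBGMR":
            return entry.get("alternate_id")
    return None


# Before the script runs: no NMBGMR entry exists, so site_name is null.
before = derive_site_name(
    [{"alternate_organization": "USGS", "alternate_id": "hypothetical-id"}]
)

# After the script runs: the inserted NMBGMR link supplies the site name.
after = derive_site_name(
    [
        {"alternate_organization": "USGS", "alternate_id": "hypothetical-id"},
        {"alternate_organization": "NMBGMR", "alternate_id": "Zwager domestic"},
    ]
)
print(before, after)
```

Because the lookup depends only on the presence of an NMBGMR entry, inserting the missing ThingIdLink rows is enough to populate site_name without changing the API code.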