NO TICKET fix(importers): prevent duplicate well-name collisions in well inventory and water-level imports by ksmuczynski · Pull Request #633 · DataIntegrationGroup/OcotilloAPI

ksmuczynski · 2026-04-01T20:57:06Z

Summary

This PR hardens the CSV importers around duplicate water well names.

It updates the well inventory importer so it:

preserves the current re-import/idempotency behavior
reuses an existing well when the row clearly matches a prior import
fails the row if a water well with the same Thing.name already exists elsewhere in the database
avoids silently creating duplicate Thing records

It also keeps the water-level importer defensive so ambiguous well lookups no longer crash the CLI and instead return a row-level validation error.

Why

This PR addresses the following problem / context:

I found that duplicate Thing.name records for water well could exist in the database, and at least one duplicate appears to have been introduced by the well inventory import path.

That caused two issues:

the well inventory importer created a second Thing with the same well name
the water-level importer crashed with MultipleResultsFound when trying to resolve a well by name

How

Implementation summary - the following was changed / added / removed:

Added a database-wide duplicate-name check in the well inventory import flow after the existing re-import detection logic
Fail the row with a clear well_name_point_id validation error when the well already exists
Kept the water-level duplicate-well guard so ambiguous lookups fail gracefully instead of aborting the command
Added regression coverage for both behaviors

Why This Approach

The well inventory importer still needs to support safe reruns of the same import, so the existing _find_existing_imported_well() logic is evaluated first.

Only if the row does not match a prior import do we check whether a well with the same name already exists in the database and block creation.

That gives us:

idempotent re-imports
no silent duplicate Thing creation
clearer operator-facing errors

Notes

Any special considerations, workarounds, or follow-up work to note?

This change does not clean up existing duplicate Thing.name rows already in the database. Any pre-existing duplicates should be identified and resolved separately.
This PR intentionally treats well inventory as a create-oriented importer, not a general update/upsert workflow for existing wells. If updating existing wells via CSV is needed, that should be scoped as a separate feature with explicit merge rules.

…orts Preserve well inventory re-import idempotency while blocking creation of a new water well when the same `Thing.name` already exists in the database. Also keep the water-level importer defensive against ambiguous well lookups so duplicate well names produce row-level validation errors instead of crashing the CLI with `MultipleResultsFound`. This reduces the risk of creating duplicate `Thing` records and makes import failures clearer for operators.

Copilot

Pull request overview

Hardens the well inventory and water-level CSV import flows against duplicate Thing.name collisions for water well records, preventing silent duplicate creation and ensuring ambiguous well lookups fail as row-level validation errors instead of crashing.

Changes:

Well inventory import: after existing “re-import/idempotency” detection, block creation when a water well with the same Thing.name already exists elsewhere in the DB.
Water-level import: catch MultipleResultsFound during well resolution and surface a row-level validation error instead of aborting.
Adds regression tests covering both the “existing DB well name” inventory case and the “duplicate name matches” water-level case.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`services/well_inventory_csv.py`	Adds DB-wide duplicate-name guard and maps the resulting row error to `well_name_point_id`.
`services/water_level_csv.py`	Catches `MultipleResultsFound` from ambiguous well-name lookups and records a row-level validation error.
`tests/test_well_inventory.py`	Adds coverage asserting inventory import fails when the well name already exists in the DB.
`tests/test_water_level_csv_service.py`	Adds coverage ensuring water-level import reports duplicate well-name matches instead of crashing.

Copilot · 2026-04-01T21:01:26Z

+
+        with session_ctx() as session:
+            session.add(Thing(name=row["well_name_point_id"], thing_type="water well"))
+            session.commit()
+
+        file_path = tmp_path / "well-inventory-existing-db-well.csv"
+        with file_path.open("w", encoding="utf-8", newline="") as f:
+            writer = csv.DictWriter(f, fieldnames=list(row.keys()))
+            writer.writeheader()
+            writer.writerow(row)
+
+        result = well_inventory_csv(file_path)
+
+        assert result.exit_code == 1, result.stderr
+        errors = result.payload.get("validation_errors", [])
+        assert errors
+        assert errors[0]["field"] == "well_name_point_id"
+        assert (
+            errors[0]["error"]
+            == "Well already exists in database for well_name_point_id 'TEST-0001'"
+        )
+
+        with session_ctx() as session:
+            things = (
+                session.query(Thing)
+                .filter(
+                    Thing.name == row["well_name_point_id"],
+                    Thing.thing_type == "water well",
+                )
+                .all()
+            )
+            assert len(things) == 1


This test inserts a Thing into the shared session-scoped test database but never deletes it. Because the test suite shares DB state across tests (see tests/conftest.py session fixture), leaving this row behind can make other well-inventory tests flaky by causing unexpected name collisions. Clean up the created Thing (e.g., capture its id and delete it in a finally, or use a fixture that yields and deletes).

Suggested change

with session_ctx() as session:

session.add(Thing(name=row["well_name_point_id"], thing_type="water well"))

session.commit()

file_path = tmp_path / "well-inventory-existing-db-well.csv"

with file_path.open("w", encoding="utf-8", newline="") as f:

writer = csv.DictWriter(f, fieldnames=list(row.keys()))

writer.writeheader()

writer.writerow(row)

result = well_inventory_csv(file_path)

assert result.exit_code == 1, result.stderr

errors = result.payload.get("validation_errors", [])

assert errors

assert errors[0]["field"] == "well_name_point_id"

assert (

errors[0]["error"]

== "Well already exists in database for well_name_point_id 'TEST-0001'"

)

with session_ctx() as session:

things = (

session.query(Thing)

.filter(

Thing.name == row["well_name_point_id"],

Thing.thing_type == "water well",

)

.all()

)

assert len(things) == 1

thing_name = row["well_name_point_id"]

try:

with session_ctx() as session:

session.add(Thing(name=thing_name, thing_type="water well"))

session.commit()

file_path = tmp_path / "well-inventory-existing-db-well.csv"

with file_path.open("w", encoding="utf-8", newline="") as f:

writer = csv.DictWriter(f, fieldnames=list(row.keys()))

writer.writeheader()

writer.writerow(row)

result = well_inventory_csv(file_path)

assert result.exit_code == 1, result.stderr

errors = result.payload.get("validation_errors", [])

assert errors

assert errors[0]["field"] == "well_name_point_id"

assert (

errors[0]["error"]

== "Well already exists in database for well_name_point_id 'TEST-0001'"

)

with session_ctx() as session:

things = (

session.query(Thing)

.filter(

Thing.name == thing_name,

Thing.thing_type == "water well",

)

.all()

)

assert len(things) == 1

finally:

# Clean up the Thing created for this test to avoid leaking state

with session_ctx() as session:

(

session.query(Thing)

.filter(

Thing.name == thing_name,

Thing.thing_type == "water well",

)

.delete(synchronize_session=False)

)

session.commit()

Copilot · 2026-04-01T21:01:26Z

+                            if error_text.startswith(
+                                "Well already exists in database for well_name_point_id "
+                            ):
+                                field = "well_name_point_id"
+                            else:
+                                field = _extract_field_from_value_error(error_text)


Mapping the validation field by checking error_text.startswith(...) is brittle and couples control flow to an exact error-message string. Consider raising a structured/typed exception (or returning a structured error) that carries the target field explicitly, so future message wording changes don’t silently regress field attribution.

Suggested change

if error_text.startswith(

"Well already exists in database for well_name_point_id "

):

field = "well_name_point_id"

else:

field = _extract_field_from_value_error(error_text)

field = _extract_field_from_value_error(error_text)

jirhiker

both of changes seem related to querying the db to check for an existing well. this should be a shared function

ksmuczynski · 2026-04-08T02:38:21Z

There are two related checks here:

well-inventory-specific re-import detection via _find_existing_imported_well()
the generic database lookup for an existing water well by Thing.name

I think only the second one should be shared.

_find_existing_imported_well() is importer-specific idempotency logic. It is intentionally checking for evidence that the row matches a prior well-inventory import, which is different from a general “does a well with this
name already exist?” query.

Example for _find_existing_imported_well():

a prior well inventory import already created well TEST-0001
that import also created the expected well inventory field event on the same inventory datetime, or the expected groundwater-level sample name for the same measurement datetime
in that case, a rerun of the same inventory row should reuse the existing imported well and succeed

That is different from this case:

a Thing named TEST-0001 already exists in the database
but it was created by some other workflow, or the expected inventory lineage does not match
in that case, the row should not be treated as a safe rerun

The overlap between these changes is the generic Thing.name == well_name_point_id and thing_type == "water well" lookup. I agree that should be extracted into a shared helper and then each importer can apply its own
behavior to the result set.

Example for the generic lookup in water-level import:

CSV row references TEST-0001
if 0 wells match, report Unknown well_name_point_id
if 1 well matches, attach the measurement to that well
if 2+ wells match, report that multiple wells were found and fail the row

Example for the generic lookup in well-inventory import:

after _find_existing_imported_well() does not find a prior import match
if any water well already exists with Thing.name = TEST-0001
fail the row with “well already exists in the database” rather than creating a duplicate Thing

So I agree the generic water-well lookup should be shared, but I would keep _find_existing_imported_well() separate because it encodes well-inventory-specific rerun behavior rather than just a name-based existence check.

I can make that refactor in a follow-up commit.

ksmuczynski · 2026-04-08T15:17:11Z

@jirhiker Okay, I added a helper find_water_wells_by_name(...) in services/thing_helper.py. It is reused in the well inventory and water-level CSV importers in the following way:

water-level importer (services/water_level_csv.py): uses it for 0 / 1 / many resolution
well inventory importer (services/well_inventory_csv.py): uses it only for the generic “already exists in DB” check after _find_existing_imported_well(...)

So the shared query is centralized, but the two importer-specific behaviors remain separate.

Extract the generic water-well lookup by `Thing.name` into a shared helper and reuse it in the well inventory and water-level CSV importers. Keep importer-specific behavior separate: - water-level import still distinguishes unknown vs ambiguous matches - well-inventory import still applies its own re-import detection before enforcing the duplicate-name check

jacob-a-brown · 2026-04-08T15:17:46Z

+                            if error_text.startswith(
+                                "Well already exists in database for well_name_point_id "


Since this string is used in multiple places I would recommend creating a global constant at the beginning and then inserting it into f-strings wherever it is used. This way you don't have to worry about having the text correspond between its different invocations and can easily change it in the future. Something like

WELL_EXISTS_SUBSTRING = "Well already in database for well_name_point_id" ... ... ... if error_text.startswith(WELL_EXISTS_SUBSTRING) ... ... raise ValueError(f"{WELL_EXISTS_SUBSTRING} `{model.well_name_point_id}`")

jacob-a-brown · 2026-04-08T15:32:46Z

    )
+
+
+def test_bulk_upload_water_levels_reports_duplicate_well_name_matches():


This test raises what appears to be an omission in the DB. Shouldn't names be unique? That would prevent this error from even occuring. There's even a TODO above the name column:

# TODO: should name be unique?

ksmuczynski requested review from Copilot, jacob-a-brown and jirhiker April 1, 2026 20:57

Copilot started reviewing on behalf of ksmuczynski April 1, 2026 20:57 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

jirhiker requested changes Apr 6, 2026

View reviewed changes

ksmuczynski requested a review from jirhiker April 8, 2026 15:17

jacob-a-brown reviewed Apr 8, 2026

View reviewed changes

jirhiker approved these changes Apr 8, 2026

View reviewed changes

jirhiker merged commit 8cd1103 into staging Apr 13, 2026
8 checks passed

jirhiker deleted the kas-reinforce-duplicate-logic-check branch April 13, 2026 15:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NO TICKET fix(importers): prevent duplicate well-name collisions in well inventory and water-level imports#633

NO TICKET fix(importers): prevent duplicate well-name collisions in well inventory and water-level imports#633
jirhiker merged 2 commits into
stagingfrom
kas-reinforce-duplicate-logic-check

ksmuczynski commented Apr 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

jirhiker left a comment

Uh oh!

ksmuczynski commented Apr 8, 2026

Uh oh!

ksmuczynski commented Apr 8, 2026

Uh oh!

jacob-a-brown Apr 8, 2026

Uh oh!

jacob-a-brown Apr 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		if error_text.startswith(
		"Well already exists in database for well_name_point_id "

		)


		def test_bulk_upload_water_levels_reports_duplicate_well_name_matches():

Conversation

ksmuczynski commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

How

Why This Approach

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

jirhiker left a comment

Choose a reason for hiding this comment

Uh oh!

ksmuczynski commented Apr 8, 2026

Uh oh!

ksmuczynski commented Apr 8, 2026

Uh oh!

jacob-a-brown Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

jacob-a-brown Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ksmuczynski commented Apr 1, 2026 •

edited

Loading