Add clean_name utility and fallback_func support to lookup #1

Draft
jcarbaugh wants to merge 1 commit into main from APR-30-add-clean-name-fallback-func

@jcarbaugh
Member

Summary

Adds two related improvements to us.states for handling messy, free-form state input.

clean_name

A new utility that normalizes free-form text down to a bare state name. It lowercases the input, replaces punctuation with spaces, removes the filler words the, state, commonwealth, and of, and joins the remaining tokens into a single space-separated string.

>>> us.states.clean_name(' The state OF idaho ')
'idaho'
>>> us.states.lookup(us.states.clean_name('Commonwealth of Virginia'))
<State:Virginia>

clean_name is a standalone normalizer. It is not wired into lookup automatically, so existing callers are unaffected; callers who want both combine the two explicitly.
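
For reviewers who want the normalization steps spelled out, here is a minimal sketch of the behavior described above. It is not the code in this PR: `clean_name_sketch` and `STOP_WORDS` are hypothetical names, and the sketch assumes that only ASCII letters, digits, and whitespace survive the punctuation step.

```python
import re

# Assumed stop-word set, taken from the PR description.
STOP_WORDS = {"the", "state", "commonwealth", "of"}


def clean_name_sketch(text: str) -> str:
    """Illustrative stand-in for us.states.clean_name; not the PR's code."""
    lowered = text.lower()
    # Replace anything other than ASCII letters, digits, and whitespace
    # with a space, so "North-Dakota" keeps its word boundary.
    spaced = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # split() with no arguments drops leading/trailing whitespace and
    # collapses internal runs, so joining on " " yields single spaces.
    tokens = [tok for tok in spaced.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_name_sketch(" The state OF idaho "))      # idaho
print(clean_name_sketch("Commonwealth of Virginia"))  # virginia
print(clean_name_sketch("North-Dakota"))              # north dakota
```

Replacing punctuation with a space rather than deleting it is what keeps hyphenated names as two tokens.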

fallback_func argument on lookup

lookup now accepts an optional fallback_func. When the built-in FIPS / abbreviation / metaphone matching finds nothing, the fallback is called with the original, untransformed lookup value and its return value is used as the result. This lets callers plug in their own match logic. Fallback results are not cached.

A startswith_fallback helper is included as a ready-made fallback_func: it returns the first state or territory whose name starts with the given value (case-insensitive). An empty string matches nothing.

>>> us.states.lookup('verm', fallback_func=us.states.startswith_fallback)
<State:Vermont>
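
The fallback contract can be illustrated on toy data. The sketch below is an assumption-laden model, not the library's implementation: `NAMES`, `ABBRS`, `lookup_sketch`, and `startswith_fallback_sketch` are hypothetical, plain strings stand in for State objects, and the built-in matching is reduced to abbreviation and exact-name checks.

```python
from typing import Callable, Optional

# Toy data standing in for the library's states and territories.
NAMES = ["Vermont", "Virginia", "Washington"]
ABBRS = {"VT": "Vermont", "VA": "Virginia", "WA": "Washington"}


def startswith_fallback_sketch(val: str) -> Optional[str]:
    """Return the first name starting with val (case-insensitive)."""
    if not val:
        # An empty string would be a prefix of every name, so reject it.
        return None
    prefix = val.lower()
    for name in NAMES:
        if name.lower().startswith(prefix):
            return name  # first match wins
    return None


def lookup_sketch(
    val: str,
    fallback_func: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Try the simplified built-in matching, then the optional fallback."""
    match = ABBRS.get(val.upper())
    if match is None:
        for name in NAMES:
            if name.lower() == val.lower():
                match = name
                break
    if match is None and fallback_func is not None:
        # The fallback only runs when nothing built-in matched.
        return fallback_func(val)
    return match


print(lookup_sketch("verm"))                                            # None
print(lookup_sketch("verm", fallback_func=startswith_fallback_sketch))  # Vermont
print(lookup_sketch("v", fallback_func=startswith_fallback_sketch))     # Vermont
```

The last call shows the ambiguity of short prefixes: "v" also matches Virginia, and the result depends on iteration order.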

Testing

  • 12 new tests in us/tests/test_us.py covering clean_name, the fallback_func behavior, and startswith_fallback.
  • Full suite: 38 passed, 1 skipped. ruff check clean.

Notes

  • README and CHANGELOG updated.

Linear issue: https://linear.app/apricotdotcool/issue/APR-30/add-clean-name-utility-and-fallback-func-support-to-lookup

Add a `clean_name` helper that normalizes free-form text down to a bare
state name by stripping punctuation and the filler words "the", "state",
"commonwealth", and "of", then collapsing whitespace.

Add a `fallback_func` argument to `lookup`. When no built-in match is
found, the fallback is called with the original lookup value so callers
can supply custom match logic. Include a `startswith_fallback` helper
that matches on a state name prefix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jcarbaugh
Member Author

Draft PR for review. Two additions to us/states.py, both backward-compatible.

What to focus on:

  • lookup regression surface — the new fallback_func defaults to None and only runs after the existing match loop finds nothing, so existing callers are unaffected. original_val is captured before the metaphone transform so the fallback receives the caller's raw input, not the transformed value.
  • clean_name normalization — punctuation is replaced with spaces (not deleted) so North-Dakota becomes north dakota rather than northdakota. Stop words are the, state, commonwealth, of.
  • startswith_fallback ambiguity — short prefixes can match multiple states; first match wins by design. Empty string returns None.
  • Caching — fallback results are intentionally not written to _lookup_cache.
