Small follow-ups from v4 audit#312
Conversation
vahid-ahmadi
left a comment
There was a problem hiding this comment.
Three independently useful follow-ups, all correctly scoped. Approve in spirit; one substantive comment and a couple of nits.
Substantive: the script duplicates manifest discovery instead of reusing get_release_manifest.
provenance/manifest.py already exposes get_release_manifest(country_id), which loads the bundled JSON via importlib.resources.files("policyengine").joinpath("data", "release_manifests", ...). The script instead computes MANIFEST_DIR = Path(__file__).resolve().parent.parent / "src" / "policyengine" / ... and re-validates the JSON itself.
That works in a source checkout but won't work post-install — the src/ walk won't find anything in a wheel-installed package. Switching to get_release_manifest(country) makes the script work from anywhere and removes the duplicated discovery logic.
The reason it's structured this way is the drift test (test__check_data_staleness_script_detects_drift) reassigns module.MANIFEST_DIR = tmp_manifest_dir. If the script used get_release_manifest, the test would need to monkey-patch the helper instead — slightly more code, but the runtime behaviour gets a lot more robust. Worth the swap.
Smaller comments
COUNTRIES = ("us", "uk")is hardcoded. If a third country is ever bundled, this also needs editing. Could just list*.jsonunder the manifest dir; not blocking.- The STALE verdict ends with "consider a release-bundle refresh" regardless of direction. If the installed package is older than the pinned one (engineer working in an older env), refresh is the wrong advice — the verdict ideally distinguishes installed > pin (refresh bundle) vs installed < pin (upgrade local package). Cheap to add, optional.
find_spec("policyengine_us")andmetadata.version("policyengine-us")answer different questions (import vs distribution). Edge case where a vendored copy is on PYTHONPATH but no dist is installed will throwPackageNotFoundErrorfrommetadata.version. Wrap it or accept the rare crash; either is fine.tests/test_small_follow_ups.pyis a fine bin for these three but reads as a sediment file. If a fourth small follow-up shows up, prefer splitting by topic (test_simulation_save.py, test_check_data_staleness.py) rather than growing this file.
Verified correct
- The
save()fix is the right shape — explicitValueErrorwith the actionable next step. - The
load()docstring honestly retracts the round-trip claim. Good. - Script exit code mirrors "any country stale", suitable for CI gating as advertised.
- The drift test exercises the failure path cleanly without needing to break the installed environment.
LGTM once the manifest-discovery swap is in (or you've decided to keep the test-friendly version with a comment explaining why).
|
Addressed Vahid's manifest-discovery review in 49a42fd: Verification:
|
49a42fd to
a77d7b4
Compare
|
Rebased the PR onto current main in a77d7b4 and kept the manifest-discovery fix from Vahid's review. This version loads release manifests via the packaged get_release_manifest() path, so it should work from an installed wheel instead of relying on a source-tree src/ path.\n\nLocal verification on the clean rebased worktree:\n- uv run --extra dev python -m pytest -q tests/test_small_follow_ups.py -> 3 passed\n- uv run --extra dev ruff check scripts/check_data_staleness.py tests/test_small_follow_ups.py src/policyengine/tax_benefit_models/common/model_version.py -> passed\n- uv run --extra dev ruff format --check scripts/check_data_staleness.py tests/test_small_follow_ups.py src/policyengine/tax_benefit_models/common/model_version.py -> passed\n- uv run --extra dev python scripts/check_data_staleness.py -> US/UK ok against bundled pins\n\nI intentionally did not carry over the old uv.lock churn; current main already pins and locks policyengine-us==1.687.0, matching the bundled US manifest. |
Three small quality-of-life items the v4 audit flagged:
1.
Simulation.save()unhelpful errorBefore:
AttributeError: 'NoneType' object has no attribute 'save'After:
ValueError: Simulation.save() called with no output_dataset. Run simulation.run() or simulation.ensure() first so there is something to persist.One-line fix in
common/model_version.py.2.
scripts/check_data_staleness.py~60 lines of Python. Prints one verdict line per country comparing the bundled release manifest's
model_package.versionagainst the installed country package. Exits non-zero when any country is stale so CI can gate on it. No network access; reads both sides locally viaimportlib.metadata+ the bundled JSON.Example output:
Targeted fix for the "is our data stale?" pain flagged in #311. Doesn't require the larger storage-substrate design to ship.
3. Clarified
load()docstringThe old docstring claimed
save()+load()round-tripcreated_at/updated_at. They don't —save()doesn't persist them;load()reads filesystemctime/mtimeinstead. Docstring now says so.Tests
tests/test_small_follow_ups.py— 3 tests:save()raisesValueErrorwith "no output_dataset" in the messagecheck_data_stalenessscript emits one line per countrycheck_country()test against a tmp manifest with a drifted pin430/430 existing tests still pass.
🤖 Generated with Claude Code