Fix TCTracks.from_FAST duplicate loading from year loop#1269
Fix TCTracks.from_FAST duplicate loading from year loop#1269simonameiler wants to merge 2 commits intodevelopfrom
Conversation
NicolasColombi
left a comment
There was a problem hiding this comment.
Hi Simona, thanks for catching this and proposing to fix it! 🙌
As mentioned in the review, my only remark is on the test file, I believe it would be less cumbersome to have a single file and single test.
| with tempfile.TemporaryDirectory() as tmpdir: | ||
| ds = xr.Dataset( | ||
| { | ||
| "lon_trks": ( | ||
| ("n_trk", "time"), | ||
| np.array( | ||
| [ | ||
| [290.0, 291.0, 292.0], | ||
| [300.0, 301.0, 302.0], | ||
| ], | ||
| dtype=float, | ||
| ), | ||
| ), | ||
| "lat_trks": ( | ||
| ("n_trk", "time"), | ||
| np.array( | ||
| [ | ||
| [10.0, 10.5, 11.0], | ||
| [15.0, 15.5, 16.0], | ||
| ], | ||
| dtype=float, | ||
| ), | ||
| ), | ||
| "vmax_trks": ( | ||
| ("n_trk", "time"), | ||
| np.array( | ||
| [ | ||
| [20.0, 21.0, 22.0], | ||
| [25.0, 26.0, 27.0], | ||
| ], | ||
| dtype=float, | ||
| ), | ||
| ), | ||
| "tc_month": ("n_trk", np.array([8, 9], dtype=np.int64)), | ||
| "tc_basins": ("n_trk", np.array(["NA", "NA"], dtype="<U2")), | ||
| "tc_years": ("n_trk", np.array([1998, 1999], dtype=np.int64)), | ||
| "seeds_per_month": ( | ||
| ("year", "basin", "month"), | ||
| np.zeros((4, 1, 12), dtype=float), | ||
| ), | ||
| }, | ||
| coords={ | ||
| "n_trk": ("n_trk", np.array([0, 1], dtype=np.int64)), | ||
| "time": ("time", np.array([0, 10800, 21600], dtype=float)), | ||
| "year": ( | ||
| "year", | ||
| np.array([1998, 1999, 2000, 2001], dtype=np.int64), | ||
| ), | ||
| "basin": ("basin", np.array(["NA"], dtype="<U2")), | ||
| "month": ("month", np.arange(1, 13, dtype=np.int64)), | ||
| }, | ||
| ) | ||
|
|
||
| path = DATA_DIR.joinpath(tmpdir, "fast_regression.nc") | ||
| ds.to_netcdf(path) |
There was a problem hiding this comment.
Since there is already a test file named FAST_test_tracks.nc, with only one year (hence the test did not reveled the bug you are suggesting to fix), it might be a good idea to update such file to contain 2 years, and then test the updated from_FAST function on such file with only one test. I do not think this would be an issue size wise, as it is only 62KB at the moment. This would require, either reproducing the small file for two years, or simply fabricating such file by duplicating the existing one, concatenate it, and manually modify the second year number. Lastly, you will need to update the original test to capture this.
This way I think we can have a single file and single test, reducing the code, since the majority of the code in your test is there to create a temporary file.
There was a problem hiding this comment.
Thanks for the suggestion, this makes a lot of sense.
I’ve updated the existing fixture file (FAST_test_tracks.nc) to include two years (year = [2025, 2026]) by duplicating the seeds_per_month data along the year dimension. The track-related variables (tc_years, lon_trks, etc.) remain unchanged, so the file structure stays consistent with the original intent.
With this update, I removed the separate regression test and its temporary-file setup. The existing assertion in test_from_FAST:
self.assertEqual(len(tc_track.data), 5)
now ensures that tracks are not duplicated when a year dimension is present. With two years present, the previous buggy implementation would have returned 10 tracks instead of 5.
This keeps the test setup simpler and avoids duplicating logic for temporary file creation.
…egression test The test fixture FAST_test_tracks.nc now has year=[2025,2026] (only seeds_per_month is extended; track variables retain their n_trk dim). The existing len(tc_track.data)==5 assertion now acts as the regression check: the buggy year-loop code would return 5x2=10 tracks. The separate test_from_FAST_not_multiplied_by_year_dim (with its temporary-file scaffolding) is removed.
NicolasColombi
left a comment
There was a problem hiding this comment.
Thanks for fixing this Simona! 🙌 This is ready to merge from my side.
@chahank do we have your blessing ?
chahank
left a comment
There was a problem hiding this comment.
Please update the CHANGELOG file : https://github.com/CLIMADA-project/climada_python/blob/main/CHANGELOG.md
|
Thanks for the bugfix! |
Changes proposed in this PR:
This PR fixes duplicate track creation in
TCTracks.from_FASTfor FAST NetCDF files containing bothn_trkandyeardimensions.Root cause:
from_FASTiterated over bothdataset.yearanddataset.n_trk,then selecteddataset.sel(n_trk=i, year=year).For FAST files where track variables are indexed by
n_trkandtime,this repeats each track once per year index.Changes:
In climada/hazard/tc_tracks.py:
removed outer loop over
dataset.yeariterate only over
dataset.n_trkselect tracks with
dataset.sel(n_trk=i)Tests:
Added regression test
test_from_FAST_not_multiplied_by_year_dimin climada/hazard/test/test_tc_tracks.py.Test builds a tiny temporary FAST-like NetCDF with both
n_trkandyeardims and checks that output size equalsn_trk(notn_trk * n_year).Validation
Ran:
python -m unittest climada.hazard.test.test_tc_tracks.TestIO.test_from_FAST
python -m unittest climada.hazard.test.test_tc_tracks.TestIO.test_from_FAST_not_multiplied_by_year_dim
Both passed locally.
This PR fixes #1268
PR Author Checklist
develop)PR Reviewer Checklist