Skip to content

Fix TCTracks.from_FAST duplicate loading from year loop#1269

Open
simonameiler wants to merge 2 commits intodevelopfrom
fix/fast-loader-duplication
Open

Fix TCTracks.from_FAST duplicate loading from year loop#1269
simonameiler wants to merge 2 commits intodevelopfrom
fix/fast-loader-duplication

Conversation

@simonameiler
Copy link
Copy Markdown
Collaborator

Changes proposed in this PR:
This PR fixes duplicate track creation in TCTracks.from_FAST for FAST NetCDF files containing both n_trk and year dimensions.

Root cause:
from_FAST iterated over both dataset.year and dataset.n_trk, then selected dataset.sel(n_trk=i, year=year).
For FAST files where track variables are indexed by n_trk and time, this repeats each track once per year index.

Changes:
In climada/hazard/tc_tracks.py:
removed outer loop over dataset.year
iterate only over dataset.n_trk
select tracks with dataset.sel(n_trk=i)

Tests:
Added regression test test_from_FAST_not_multiplied_by_year_dim in climada/hazard/test/test_tc_tracks.py.
Test builds a tiny temporary FAST-like NetCDF with both n_trk and year dims and checks that output size equals n_trk (not n_trk * n_year).
Validation

Ran:
python -m unittest climada.hazard.test.test_tc_tracks.TestIO.test_from_FAST
python -m unittest climada.hazard.test.test_tc_tracks.TestIO.test_from_FAST_not_multiplied_by_year_dim
Both passed locally.

This PR fixes #1268

PR Author Checklist

PR Reviewer Checklist

Copy link
Copy Markdown
Collaborator

@NicolasColombi NicolasColombi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Simona, thanks for catching this and proposing to fix it! 🙌

As mentioned in the review, my only remark is on the test file, I believe it would be less cumbersome to have a single file and single test.

Comment on lines +710 to +764
with tempfile.TemporaryDirectory() as tmpdir:
ds = xr.Dataset(
{
"lon_trks": (
("n_trk", "time"),
np.array(
[
[290.0, 291.0, 292.0],
[300.0, 301.0, 302.0],
],
dtype=float,
),
),
"lat_trks": (
("n_trk", "time"),
np.array(
[
[10.0, 10.5, 11.0],
[15.0, 15.5, 16.0],
],
dtype=float,
),
),
"vmax_trks": (
("n_trk", "time"),
np.array(
[
[20.0, 21.0, 22.0],
[25.0, 26.0, 27.0],
],
dtype=float,
),
),
"tc_month": ("n_trk", np.array([8, 9], dtype=np.int64)),
"tc_basins": ("n_trk", np.array(["NA", "NA"], dtype="<U2")),
"tc_years": ("n_trk", np.array([1998, 1999], dtype=np.int64)),
"seeds_per_month": (
("year", "basin", "month"),
np.zeros((4, 1, 12), dtype=float),
),
},
coords={
"n_trk": ("n_trk", np.array([0, 1], dtype=np.int64)),
"time": ("time", np.array([0, 10800, 21600], dtype=float)),
"year": (
"year",
np.array([1998, 1999, 2000, 2001], dtype=np.int64),
),
"basin": ("basin", np.array(["NA"], dtype="<U2")),
"month": ("month", np.arange(1, 13, dtype=np.int64)),
},
)

path = DATA_DIR.joinpath(tmpdir, "fast_regression.nc")
ds.to_netcdf(path)
Copy link
Copy Markdown
Collaborator

@NicolasColombi NicolasColombi Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is already a test file named FAST_test_tracks.nc, with only one year (hence the test did not reveled the bug you are suggesting to fix), it might be a good idea to update such file to contain 2 years, and then test the updated from_FAST function on such file with only one test. I do not think this would be an issue size wise, as it is only 62KB at the moment. This would require, either reproducing the small file for two years, or simply fabricating such file by duplicating the existing one, concatenate it, and manually modify the second year number. Lastly, you will need to update the original test to capture this.

This way I think we can have a single file and single test, reducing the code, since the majority of the code in your test is there to create a temporary file.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, this makes a lot of sense.

I’ve updated the existing fixture file (FAST_test_tracks.nc) to include two years (year = [2025, 2026]) by duplicating the seeds_per_month data along the year dimension. The track-related variables (tc_years, lon_trks, etc.) remain unchanged, so the file structure stays consistent with the original intent.

With this update, I removed the separate regression test and its temporary-file setup. The existing assertion in test_from_FAST:

self.assertEqual(len(tc_track.data), 5)

now ensures that tracks are not duplicated when a year dimension is present. With two years present, the previous buggy implementation would have returned 10 tracks instead of 5.

This keeps the test setup simpler and avoids duplicating logic for temporary file creation.

…egression test

The test fixture FAST_test_tracks.nc now has year=[2025,2026] (only
seeds_per_month is extended; track variables retain their n_trk dim).
The existing len(tc_track.data)==5 assertion now acts as the regression
check: the buggy year-loop code would return 5x2=10 tracks.

The separate test_from_FAST_not_multiplied_by_year_dim (with its
temporary-file scaffolding) is removed.
Copy link
Copy Markdown
Collaborator

@NicolasColombi NicolasColombi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this Simona! 🙌 This is ready to merge from my side.

@chahank do we have your blessing ?

Copy link
Copy Markdown
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chahank
Copy link
Copy Markdown
Member

chahank commented Mar 27, 2026

Thanks for the bugfix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants