From 1ae9037a747fe7d4cdbdb3e59b1f1109fd5cc0cb Mon Sep 17 00:00:00 2001 From: Kyle Mandli Date: Thu, 16 Apr 2026 09:24:46 -0400 Subject: [PATCH 1/4] Repurpose netcdf.rst for input only description Adds both user and developer level documentation. Need to still fix references to this document. --- doc/netcdf.rst | 411 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 403 insertions(+), 8 deletions(-) diff --git a/doc/netcdf.rst b/doc/netcdf.rst index fdcb9af..bfd16d6 100644 --- a/doc/netcdf.rst +++ b/doc/netcdf.rst @@ -1,12 +1,407 @@ -:orphan: +.. toctree:: + :maxdepth: 2 -.. _netcdf: + user_guide + developer_reference + descriptor_format + unit_contract + coordinate_normalization + technical_debt + adding_met_field + test_structure + next_steps -.. seealso:: :ref:`topo_netcdf`. +GeoClaw NetCDF Input System +=========================== -========================== -Using NetCDF output -========================== +This document covers the NetCDF input pipeline introduced in the +``refactor-netcdf-support`` PR. It has two sections: a user guide for +scientists who want to use NetCDF files as input, and a developer +reference for those working on the implementation. -NetCDF output is not currently supported in Clawpack. This is not a suitable -format for AMR style data. +-------------- + +User Guide +---------- + +Topography from NetCDF +~~~~~~~~~~~~~~~~~~~~~~ + +GeoClaw can read topography/bathymetry directly from a NetCDF file +(``topo_type=4``). The file must contain a 2D elevation variable on a +regular latitude/longitude grid. Common sources include GEBCO, ETOPO, +NOAA coastal DEMs, and any file produced by standard GIS tools. + +What GeoClaw handles automatically +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- Longitude conventions: both [-180, 180] and [0, 360] are detected and + normalized at runtime. You do not need to preprocess the file. +- Latitude ordering: both S-to-N and N-to-S are handled correctly. 
+- Dimension ordering: ``(lat, lon)`` and ``(lon, lat)`` are both + supported. +- Fill values: ``_FillValue`` and ``missing_value`` attributes are + resolved automatically. If a fill value is found within your + simulation domain, GeoClaw will abort with an error -- this is + intentional, as missing bathymetry is a silent correctness hazard. + +What your file must provide +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- A 2D variable containing elevation in **meters** (positive up, + negative for ocean). If your file uses feet or another unit, use + ``CFNormalizer`` to convert before running (see below). +- 1D coordinate variables for latitude and longitude with recognizable + names (``lat``/``latitude``/``y`` and ``lon``/``longitude``/``x`` are + all detected). Curvilinear (2D coordinate) grids are not currently + supported for topography. + +Registering a NetCDF topo file in setrun.py +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + from clawpack.geoclaw.netcdf_utils import TopoInterrogator + meta = TopoInterrogator('bathy.nc', var_name='z', + crop_bounds=(-100, -60, 15, 35)).interrogate_topo() + rundata.topo_data.topofiles.append([4, 'bathy.nc', meta]) + +That is the only change required in most cases. GeoClaw's Python layer +interrogates the file when you run ``setrun.py`` and writes the +necessary descriptor information into ``topo.data`` automatically. +Variable and coordinate names are auto-detected from CF attributes where +possible; pass ``var_name``, ``lon_name``, or ``lat_name`` explicitly to +``TopoInterrogator`` if auto-detection fails. + +Domain subsetting (crop) +^^^^^^^^^^^^^^^^^^^^^^^^ + +If your NetCDF file covers a larger area than your simulation domain +(common with global or regional datasets), you can crop at read time +without creating a smaller file: + +.. code:: python + + topo_data.topofiles.append([4, 'gebco_global.nc', + {'crop_bounds': [-100, -80, 20, 35]}]) + +Only the subset is read into memory at runtime. 
The full file is never +loaded. + +Checking CF compliance +^^^^^^^^^^^^^^^^^^^^^^ + +If you are unsure whether your file will be read correctly, the +``CFNormalizer`` utility can inspect and repair common issues: + +.. code:: python + + from clawpack.geoclaw.netcdf_utils import CFNormalizer + + cf = CFNormalizer('path/to/bathymetry.nc') + cf.report() # prints any issues found + cf.normalize('path/to/bathymetry_cf.nc') # writes a corrected copy + +``CFNormalizer`` adds missing ``standard_name``, ``axis``, and ``units`` +attributes, and resolves ``_FillValue``/``missing_value`` conflicts. It +does not resample or reproject. + +-------------- + +Storm surge met forcing from NetCDF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For storm surge simulations using full gridded met forcing (wind and +pressure fields), GeoClaw can read directly from a NetCDF file. This +replaces the need to convert to OWI ASCII format. + +Supported source formats: + +- **ERA5** (ECMWF reanalysis): detected automatically from CF attributes +- **NWS13** (OWI NetCDF): detected automatically +- **Generic CF-compliant NetCDF**: requires variable name mapping if + names are non-standard + +Not yet supported: raw WRF output (requires preprocessing due to +curvilinear grid and string-encoded time axis). + +Required variables and units +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +GeoClaw expects the following in the NetCDF file **after any unit +conversion** (conversion happens automatically during Python +preprocessing): + +============================= ================================ +Variable Unit +============================= ================================ +Wind (u-component, eastward) m/s +Wind (v-component, northward) m/s +Surface pressure Pa +Time seconds from user-defined offset +============================= ================================ + +If your file uses hPa/mbar for pressure or knots for wind, +``MetInterrogator`` will convert automatically. 
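As an illustration of the unit contract above, here is a minimal sketch of role-based conversion. The factor table and ``to_contract`` helper are hypothetical -- the actual conversion is done inside ``MetInterrogator`` and may differ in detail:

```python
# Hypothetical sketch: map (role, source unit) to a factor into the
# contract units above (m/s for wind, Pa for pressure). Illustrative
# only -- not GeoClaw's actual conversion table.
_TO_CONTRACT = {
    ("pressure", "Pa"): 1.0,
    ("pressure", "hPa"): 100.0,
    ("pressure", "mbar"): 100.0,    # 1 mbar == 1 hPa == 100 Pa
    ("wind_u", "m/s"): 1.0,
    ("wind_v", "m/s"): 1.0,
    ("wind_u", "knots"): 0.514444,  # 1 knot ~= 0.514444 m/s
    ("wind_v", "knots"): 0.514444,
}

def to_contract(role, units, values):
    """Scale raw values into the contract unit for this role."""
    try:
        factor = _TO_CONTRACT[(role, units)]
    except KeyError:
        raise ValueError(f"no conversion for {role!r} in units {units!r}")
    return [v * factor for v in values]
```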
+ +Registering a NetCDF storm file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In ``setrun.py``: + +.. code:: python + + surge_data.storm_specification_type = 'data' + surge_data.storm_file = 'isaac.storm' + +Then create the storm descriptor file, *e.g.* ``isaac.storm``, using the +Python API: + +.. code:: python + + from clawpack.geoclaw.surge.storm import Storm + import numpy as np + + storm = Storm() + storm.time_offset = np.datetime64('2012-08-29') + storm.file_format = 'netcdf' + storm.file_paths = ['path/to/forcing.nc'] + storm.write('isaac.storm', file_format='data') + +If your variable or dimension names are non-standard, provide a mapping: + +.. code:: python + + storm.write('isaac.storm', file_format='data', + dim_mapping={'t': 'valid_time'}, + var_mapping={'wind_u': 'u10', 'wind_v': 'v10', + 'pressure': 'msl'}) + +Time handling +^^^^^^^^^^^^^ + +GeoClaw works in seconds from a user-defined offset (typically landfall +or storm genesis). All CF time decoding -- including calendar handling +and unit conversion from hours/days -- is done in Python. The Fortran +runtime sees only seconds from offset. + +Set the offset when constructing the storm: + +.. code:: python + + storm.time_offset = np.datetime64('2012-08-29T00:00') # landfall time + +-------------- + +Developer Reference +------------------- + +Architecture overview +~~~~~~~~~~~~~~~~~~~~~ + +The system has a strict Python/Fortran split: + +**Python** handles: file interrogation, CF attribute parsing, coordinate +convention detection, fill value resolution, unit conversion, time +decoding, crop bound validation, and descriptor writing. + +**Fortran** handles: opening the NetCDF file at runtime using +information from the descriptor, index arithmetic for coordinate +normalization (no data copies), domain subsetting via +``start``/``count`` arguments to ``nf90_get_var``, and time-slice reads +for met forcing. + +Fortran assumes the unit contract from ``GEOCLAW_NETCDF_UNITS`` in +``units.py`` without checking. 
Python enforces it. + +Class hierarchy (``netcdf_utils.py``) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + NetCDFInterrogator + - open file (xarray, Dask-lazy) + - discover coordinate variables by name heuristics + CF standard_name + - detect lon convention, lat order, dim order + - resolve fill value (_FillValue wins over missing_value per CF spec) + - validate crop bounds against file extent + - output: NetCDFDescriptor dataclass + + TopoInterrogator(NetCDFInterrogator) + - detect fill values within crop region (hard error) + - verify and convert units to contract + - no multi-file coverage logic (Fortran handles compositing) + + MetInterrogator(NetCDFInterrogator) + - check wind_u, wind_v, pressure present and on same grid/time axis + - convert units to contract + - decode CF time to seconds from offset + - detect ensemble/member dimensions (hard error if non-singleton) + - partial domain coverage is allowed (Fortran fills edges) + + DescriptorWriter + - topo: writes key=value lines for inline inclusion in topo.data, using a + blank line to terminate the Fortran read loop + - met: writes &file_info namelist block + repeated &variable_info + blocks for *.storm file body + + CFNormalizer + - adds/fixes CF attributes without modifying data + - idempotent + +Descriptor format +~~~~~~~~~~~~~~~~~ + +Topo (lines in topo.data after topo_type) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:: + + var_name = z + lon_name = longitude + lat_name = latitude + lon_convention = 180 + lat_order = S_to_N + dim_order = lat,lon + fill_value = -9999.0 + fill_action = abort + crop_bounds = -100.0 -80.0 20.0 35.0 + +``fill_action = abort`` is the only supported value for topography. +``crop_bounds`` is omitted if no crop is specified. 
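The blank-line termination rule can be illustrated with a small parser sketch. ``parse_topo_descriptor`` is a hypothetical helper, not the actual Fortran reader; it only mirrors the stated contract (key=value lines until the first blank line):

```python
# Hypothetical sketch of the topo descriptor contract: 'key = value'
# lines, terminated by the first blank line (as the Fortran read loop is).
def parse_topo_descriptor(lines):
    out = {}
    for line in lines:
        line = line.strip()
        if not line:              # blank line terminates the read loop
            break
        key, _, value = line.partition("=")
        out[key.strip()] = value.strip()
    return out

desc = parse_topo_descriptor([
    "var_name = z",
    "lon_convention = 180",
    "crop_bounds = -100.0 -80.0 20.0 35.0",
    "",
    "ignored = anything after the blank line",
])
```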
+ +Met forcing (body of \*.storm after format header) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:: + + # format: netcdf + &file_info + source_file = /path/to/forcing.nc + lon_name = longitude + lat_name = latitude + time_name = valid_time + dim_order = time,lat,lon + lon_convention = 360 + lat_order = S_to_N + fill_value = -9999.0 + fill_action = warn + time_offset = 0.0 + crop_bounds = -100.0 -80.0 20.0 35.0 + / + &variable_info var_name=u10 geoclaw_role=wind_u / + &variable_info var_name=v10 geoclaw_role=wind_v / + &variable_info var_name=msl geoclaw_role=pressure / + +The ``&variable_info`` blocks use a manually parsed key=value format +(not true repeated Fortran namelists) for compiler portability. Fortran +reads them in a loop until EOF. Adding new roles (precipitation, +friction) requires no change to the Fortran parser. + +Unit contract +~~~~~~~~~~~~~ + +Defined in ``units.py``: + +.. code:: python + + GEOCLAW_NETCDF_UNITS = { + "topo": "m", + "wind_u": "m/s", + "wind_v": "m/s", + "pressure": "Pa", + "time": "s", + } + +All conversion happens in Python before the descriptor is written. +Fortran trusts the descriptor and never checks units. If you add a new +``geoclaw_role``, add its contract unit here first. + +Fortran coordinate normalization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Fortran applies coordinate normalization via index arithmetic only -- no +arrays are duplicated. The descriptor provides ``lon_convention`` and +``lat_order``; the reader uses these to compute correct indices when +calling ``nf90_get_var``. For crop bounds, ``start`` and ``count`` are +computed from the coordinate arrays at runtime. + +All ``nf90_*`` calls go through a checked interface in +``topo_module.f90`` that prints a meaningful diagnostic and stops +cleanly on failure. This is important for SLURM batch jobs where a +silent Fortran ``STOP`` produces no useful output. 
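The ``start``/``count`` computation can be sketched in Python for intuition. ``crop_start_count`` is a hypothetical helper (the real logic lives in Fortran), but the index arithmetic is the same idea: locate the bracketing grid points on the unmodified coordinate array and hand 1-based indices to the read call:

```python
import bisect

def crop_start_count(coords, lo, hi):
    """1-based NetCDF start/count covering [lo, hi] on an ascending axis.

    Hypothetical sketch of how start/count for nf90_get_var could be
    derived from a coordinate array read at runtime.
    """
    i0 = bisect.bisect_right(coords, lo) - 1   # last grid point <= lo
    i1 = bisect.bisect_left(coords, hi)        # first grid point >= hi
    i0 = max(i0, 0)
    i1 = min(i1, len(coords) - 1)
    return i0 + 1, i1 - i0 + 1                 # NetCDF indexing is 1-based

# e.g. a 0.5-degree longitude axis from -110 to -70
lons = [-110 + 0.5 * i for i in range(81)]
start, count = crop_start_count(lons, -100.0, -80.0)
```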
+ +Known technical debt +~~~~~~~~~~~~~~~~~~~~ + +- ``util.get_netcdf_names`` and the coordinate/variable discovery logic + in ``NetCDFInterrogator`` are parallel implementations. They should + be consolidated -- ``NetCDFInterrogator`` should be the single source + and ``util.get_netcdf_names`` should delegate to it. This is deferred + to the surge module refactor. +- ``Storm.write(file_format="data")`` does not yet call + ``DescriptorWriter`` directly for NetCDF storm entries. Currently the + tests write descriptors manually. The integration should happen as + part of the surge refactor. +- WRF raw output requires ``MetPreprocessor`` for string-time decoding + (``Times`` character array, not a numeric CF time variable) and + curvilinear grid handling (``XLAT``/``XLONG`` are 2D time-varying + arrays). A skip-marked test stub documents the gap in + ``test_storm.py``. + +Adding a new met forcing field (e.g. precipitation) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +1. Add the role and unit to ``GEOCLAW_NETCDF_UNITS`` in ``units.py`` +2. Add detection logic to ``MetInterrogator`` for the new variable +3. ``DescriptorWriter`` requires no change -- new ``&variable_info`` + blocks are written automatically for any role the interrogator + returns +4. Add the Fortran reader logic in the storm NetCDF module to consume + the new ``geoclaw_role`` value +5. Add unit tests in ``tests/netcdf/test_met_interrogator.py`` +6. 
Add a regression test in the appropriate storm surge example + +Test structure +~~~~~~~~~~~~~~ + +:: + + tests/netcdf/ unit tests for netcdf_utils.py + conftest.py in-memory NetCDF fixtures + test_base_interrogator.py + test_topo_interrogator.py + test_met_interrogator.py + test_descriptor_writer.py + test_cf_normalizer.py + + tests/test_storm.py extended for ERA5 and NWS13 variants + + examples/tsunami/bowl-slosh-netcdf/ coordinate variant regression tests + examples/tsunami/chile2010/ topotools write path + coord variants + examples/storm-surge/isaac/ ERA5 and NWS13 met forcing regression + +All test NetCDF files are generated in-memory or in ``tmp_path`` -- no +binary files are committed to the repository. + +-------------- + +Next steps +---------- + +- **Surge module refactor**: consolidate ``util.get_netcdf_names`` into + ``NetCDFInterrogator``, wire ``Storm.write`` to use + ``DescriptorWriter``, clean up parallel discovery paths in + ``storm.py`` +- **WRF support**: implement ``MetPreprocessor`` to handle string-time + axis and curvilinear grid; the skip-marked test stub in + ``test_storm.py`` documents the expected interface +- **Precipitation field**: reserved in ``GEOCLAW_NETCDF_UNITS``, add + ``MetInterrogator`` detection and Fortran consumer +- **Friction field**: same pattern as precipitation +- **CLI tool**: expose ``CFNormalizer`` as a command-line utility for + users who want to check or repair their NetCDF files before running + GeoClaw +- **Duplicate code audit**: other places in the GeoClaw codebase that + do ad-hoc NetCDF variable discovery should be identified and migrated + to ``NetCDFInterrogator`` From d28de7035dbe971e49b43b43824bfdc25a29da04 Mon Sep 17 00:00:00 2001 From: Kyle Mandli Date: Thu, 16 Apr 2026 17:07:51 -0400 Subject: [PATCH 2/4] Add docs for new netcdf input for GeoClaw --- doc/netcdf.rst | 33 +++++++++++++++++++++++++-------- doc/topo.rst | 48 +++++++++++++++++++++++------------------------- 2 files changed, 48 insertions(+), 
33 deletions(-)

diff --git a/doc/netcdf.rst b/doc/netcdf.rst
index bfd16d6..7ef3bb6 100644
--- a/doc/netcdf.rst
+++ b/doc/netcdf.rst
@@ -11,12 +11,14 @@
     test_structure
     next_steps
 
+.. _netcdf_input:
+
 GeoClaw NetCDF Input System
 ===========================
 
 This document covers the NetCDF input pipeline introduced in the
-``refactor-netcdf-support`` PR. It has two sections: a user guide for
-scientists who want to use NetCDF files as input, and a developer
+``refactor-netcdf-support`` PR (VERSION ?5.15?). It has two sections: a user
+guide for scientists who want to use NetCDF files as input, and a developer
 reference for those working on the implementation.
 
 --------------
@@ -59,6 +61,20 @@ What your file must provide
 Registering a NetCDF topo file in setrun.py
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+If your file meets the above requirements and either has the variable ``z`` or
+only one 2D variable in the file, you can simply use the following in
+``setrun.py``:
+
+.. code:: python
+
+    rundata.topo_data.topofiles.append([4, 'bathy.nc'])
+
+This matches the behavior expected for backwards compatibility.
+
+If your file meets the above requirements but has a non-standard variable name,
+or you want to specify crop bounds, you can use the Python API to interrogate
+the file and write a descriptor:
+
 .. code:: python
 
     from clawpack.geoclaw.netcdf_utils import TopoInterrogator
@@ -67,11 +83,12 @@ Registering a NetCDF topo file in setrun.py
     rundata.topo_data.topofiles.append([4, 'bathy.nc', meta])
 
 That is the only change required in most cases. GeoClaw's Python layer
-interrogates the file when you run ``setrun.py`` and writes the
-necessary descriptor information into ``topo.data`` automatically.
-Variable and coordinate names are auto-detected from CF attributes where
-possible; pass ``var_name``, ``lon_name``, or ``lat_name`` explicitly to
-``TopoInterrogator`` if auto-detection fails.
+interrogates the file when you run ``setrun.py`` and writes the necessary
+descriptor information into ``topo.data`` automatically. Variable and coordinate
+names are auto-detected from CF attributes where possible; pass ``var_name``,
+``lon_name``, or ``lat_name`` explicitly to ``TopoInterrogator`` if
+auto-detection fails. This should also handle multiple coordinate
+layouts.
 
 Domain subsetting (crop)
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -86,7 +103,7 @@ without creating a smaller file:
                                 {'crop_bounds': [-100, -80, 20, 35]}])
 
 Only the subset is read into memory at runtime. The full file is never
-loaded.
+loaded. This is the same as passing ``crop_bounds`` to ``TopoInterrogator``.
 
 Checking CF compliance
 ^^^^^^^^^^^^^^^^^^^^^^
diff --git a/doc/topo.rst b/doc/topo.rst
index 42d4853..22a15b8 100644
--- a/doc/topo.rst
+++ b/doc/topo.rst
@@ -12,12 +12,13 @@ Topography data
   - :ref:`topo_order`
   - :ref:`tsunamidata`
   - :ref:`dtopo`
+  - :ref:`g_input`
 
 The :ref:`geoclaw` software for flow over topography requires at least one
 topo file to be input, see :ref:`setrun_geoclaw`.
 
-Currently topo files are restricted to three possible formats as ASCII files, or
-NetCDF files are also allowed.
+Currently topo files are restricted to three possible ASCII formats,
+or to netCDF files.
 
 In the descriptions below it is assumed that the topo file gives the elevation
 of the topography (relative to some reference level) as a value of
@@ -169,14 +170,8 @@ The recognized topotypes are:
 
 **topotype = 4**
 
-    This file type is not ASCII but rather in a NetCDF4 format supported by the
-    `CF MetaData conventions (v. 1.6) `_. Files
-    that conform to this standard can be read in by GeoClaw. The `topotools`
-    module also has support for reading and writing (including therefore
-    conversion) of these types of bathymetry files (see :ref:`topo_netcdf`
-    below). To use this functionality
-    you will need to add *-DNETCDF* to the *FFLAGS* variable either by the
-    command line or in the Makefile.
+ Read in netCDF formatted topography data. See :ref:`netcdf_input` for + NetCDF topography format information. The Fortran code will recognize headers for *topotype* 2 or 3 that have the labels first and then the parameter values. @@ -214,21 +209,24 @@ Several on-line databases are available for topograpy, see Some Python tools for working with topography files are available, see :ref:`topotools`. -.. _topo_netcdf: - -NetCDF format -^^^^^^^^^^^^^ - -Topofiles can be read in netCDF format, either from local `.nc` files or -from some online databases that provide netCDF servers, e.g. the -`NOAA THREDDS server `_. -Use the -`topotools.read_netcdf `_ -function. Note that this also allows reading in only a subset of the data, -both limiting the extent and the resolution, e.g. by sampling every other -point (by setting `coarsen=2`). This is particularly useful if you only want -a subset of a huge online netCDF file (e.g. coastal DEMs at 1/3 arcsecond -resolution are typically several gigabytes). +.. _noaa_thredds: + +NOAA THREDDS server +^^^^^^^^^^^^^^^^^^^^^ + +The `NOAA THREDDS server +`_ +can be used to access a variety of topography data sets, including the etopo1 +global data set at 1 arcminute resolution and the etopo2 global data set at 2 +arcminute resolution. These are available in netCDF format and can be read +directly into GeoClaw. As a convenience, you can use the `topotools.read_netcdf +`_ function. Note +that this also allows reading in only a subset of the data, both limiting the +extent and the resolution, e.g. by sampling every other point (by setting +`coarsen=2`). This is particularly useful if you only want a subset of a huge +online netCDF file (e.g. coastal DEMs at 1/3 arcsecond resolution are typically +several gigabytes). See :ref:`netcdf_input` for more details on working with +netCDF files. The dictionary `topotools.remote_topo_urls` contains some useful URLs for etopo and a few other NOAA THREDDS datasets. 
This allows reading etopo From 68acd0a45fe72d4b73d9934093aec06e33979ea7 Mon Sep 17 00:00:00 2001 From: Kyle Mandli Date: Thu, 16 Apr 2026 17:08:06 -0400 Subject: [PATCH 3/4] Remove mention of NetCDF output for AMR --- doc/output_styles.rst | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/doc/output_styles.rst b/doc/output_styles.rst index fefb5f8..ac6cae8 100644 --- a/doc/output_styles.rst +++ b/doc/output_styles.rst @@ -134,8 +134,6 @@ Each patch has a unique `grid_number` that usually isn't needed for visualization purposes. - - .. _output_binary: Raw binary output data formats @@ -163,15 +161,6 @@ Unlike the ASCII data files, the binary output files contain ghost cells as well as the interior cells (since a contiguous block of memory is dumped for each patch with a single `write` statement). - -.. _output_netcdf: - -NetCDF output data format ------------------------------- - -NetCDF output is not currently supported in Clawpack. This is not a suitable -format for AMR style data. - .. _output_aux: Output of aux arrays @@ -185,5 +174,3 @@ have the same format as the `q` data, as specifed by `output_format`. Set `output_aux_onlyonce` to `True` if the `aux` arrays do not change with time and you only want to output these arrays once. - - From 893bce5ab30a5a12138399b7c973640dd7c3dd29 Mon Sep 17 00:00:00 2001 From: Kyle Mandli Date: Fri, 17 Apr 2026 19:27:22 -0400 Subject: [PATCH 4/4] Add changes needed for met forcing Some of the less common cases are now handled better with auto-detection of variables and better support for time offsets. 
--- doc/netcdf.rst | 160 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 128 insertions(+), 32 deletions(-) diff --git a/doc/netcdf.rst b/doc/netcdf.rst index 7ef3bb6..3050123 100644 --- a/doc/netcdf.rst +++ b/doc/netcdf.rst @@ -171,43 +171,136 @@ In ``setrun.py``: surge_data.storm_specification_type = 'data' surge_data.storm_file = 'isaac.storm' -Then create the storm descriptor file, *e.g.* ``isaac.storm``, using the -Python API: +Then create the storm descriptor file, *e.g.* ``isaac.storm``. GeoClaw +uses a two-stage discovery process: standard variable names are found +automatically from CF ``axis`` / ``standard_name`` attributes and +built-in fallback lists; you only need to supply ``var_mapping`` for +roles whose names are non-standard. Coordinate names (``lon``, ``lat``, +``time``) are always discovered automatically and never need to be +specified. + +**Case 1 — all standard names (ERA5 and similar):** + +ERA5 variable names (``u10``, ``v10``, ``msl``) are in the built-in +fallback lists and are found automatically. No ``var_mapping`` is +required: .. code:: python - from clawpack.geoclaw.surge.storm import Storm import numpy as np + from clawpack.geoclaw.surge.storm import Storm storm = Storm() storm.time_offset = np.datetime64('2012-08-29') storm.file_format = 'netcdf' - storm.file_paths = ['path/to/forcing.nc'] + storm.file_paths = ['path/to/era5_forcing.nc'] storm.write('isaac.storm', file_format='data') -If your variable or dimension names are non-standard, provide a mapping: +**Case 2 — mixed: standard wind names, non-standard pressure:** + +A common pattern when a file has multiple pressure fields (e.g. mean +sea-level and surface pressure) or a non-standard pressure variable +name. Specify only the roles that cannot be discovered automatically; +the rest are still found from the fallback lists: .. 
code:: python storm.write('isaac.storm', file_format='data', - dim_mapping={'t': 'valid_time'}, - var_mapping={'wind_u': 'u10', 'wind_v': 'v10', - 'pressure': 'msl'}) + var_mapping={'pressure': 'prmsl'}) -Time handling -^^^^^^^^^^^^^ +Any name supplied in ``var_mapping`` is validated against the variables +actually present in the file before the descriptor is written, so a +typo raises an informative error immediately rather than producing +incorrect output silently. + +**Case 3 — all non-standard names (e.g. NWS13/OWI-NetCDF):** + +Supply all three roles explicitly when none of the variable names match +the built-in fallback lists: + +.. code:: python + + storm = Storm() + storm.time_offset = np.datetime64('2012-08-29') + storm.file_format = 'nws13' + storm.file_paths = ['path/to/nws13_forcing.nc'] + storm.write('isaac.storm', file_format='data', + var_mapping={'wind_u': 'uwnd', + 'wind_v': 'vwnd', + 'pressure': 'press'}) -GeoClaw works in seconds from a user-defined offset (typically landfall -or storm genesis). All CF time decoding -- including calendar handling -and unit conversion from hours/days -- is done in Python. The Fortran -runtime sees only seconds from offset. +**Advanced: pre-built MetInterrogator:** -Set the offset when constructing the storm: +If you need direct control over CF validation, unit checking, or +lon/lat convention detection before the descriptor is written, you can +construct a :class:`~clawpack.geoclaw.netcdf_utils.MetInterrogator` +explicitly and pass it via ``met_interrogator``. When a pre-built +interrogator is supplied, auto-discovery and ``var_mapping`` validation +are bypassed entirely: + +.. code:: python + + from clawpack.geoclaw.netcdf_utils import MetInterrogator + + mi = MetInterrogator('path/to/forcing.nc', + variable_map={'wind_u': 'u10', + 'wind_v': 'v10', + 'pressure': 'msl'}) + storm.write('isaac.storm', file_format='data', met_interrogator=mi) + +.. 
note:: + + ``dim_mapping`` is accepted by ``write()`` / ``write_data()`` for + backwards compatibility but has no effect -- coordinate names are + always discovered automatically via CF conventions. + +.. note:: + + ``storm.window`` (ramp width and application domain) and + ``MetInterrogator``'s ``crop_bounds`` (read-time spatial subset of + the NetCDF file) are independent. Setting one does not affect the + other. + + +Time handling +^^^^^^^^^^^^^^^ + +GeoClaw works in seconds relative to a user-defined reference time +(typically landfall or storm genesis). Set it when constructing the +storm: .. code:: python storm.time_offset = np.datetime64('2012-08-29T00:00') # landfall time +Python computes ``nc_time_offset`` — the elapsed seconds from +``storm.time_offset`` to the first record in the file — and writes it +to the descriptor. This value is independent of how the file encodes +time internally (Unix epoch, local epoch, hours-since-reference, etc.). + +Fortran converts raw time values using: + +.. code:: + + storm_time[i] = raw[i] − raw[0] + nc_time_offset + +Subtracting ``raw[0]`` converts any absolute encoding to +elapsed-since-first-record; adding ``nc_time_offset`` then anchors that +to ``storm.time_offset``. No unit conversion (hours→seconds, +days→seconds) is performed in Fortran: the descriptor stores +``nc_time_offset`` in seconds and the raw values are read as stored. +``write_data`` passes ``time_reference=storm.time_offset`` to +``MetInterrogator`` automatically; no additional user action is needed. + +Longitude convention +^^^^^^^^^^^^^^^^^^^^^^ + +Longitude convention ([0, 360] vs [-180, 180]) is detected and +normalized automatically by ``MetInterrogator``. ERA5 files that store +longitude in [0, 360] do not need to be preprocessed to [-180, 180] +before use. The detected convention is written to the descriptor and +Fortran applies index normalization at runtime without copying arrays. 
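To illustrate the convention handling, here is a sketch of convention-aware index lookup on an unmodified stored axis. ``lon_index`` and the 1-degree axis are hypothetical; the real normalization is index arithmetic in Fortran, but the spirit is the same -- remap the query, never the array:

```python
# Hypothetical sketch: find the stored-axis index for a query longitude
# without rewriting the axis. Not GeoClaw's actual Fortran code.
def lon_index(file_lons, query_lon, file_convention=360):
    """Index of the file grid point nearest to query_lon."""
    if file_convention == 360:
        q = query_lon % 360.0                       # e.g. -90 -> 270
    else:
        q = ((query_lon + 180.0) % 360.0) - 180.0   # e.g. 270 -> -90
    # nearest-neighbor search on the unmodified stored axis
    return min(range(len(file_lons)), key=lambda i: abs(file_lons[i] - q))

lons_360 = [float(i) for i in range(360)]  # axis stored as 0..359
idx = lon_index(lons_360, -90.0)           # query given in [-180, 180]
```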
+ -------------- Developer Reference @@ -219,14 +312,16 @@ Architecture overview The system has a strict Python/Fortran split: **Python** handles: file interrogation, CF attribute parsing, coordinate -convention detection, fill value resolution, unit conversion, time -decoding, crop bound validation, and descriptor writing. +convention detection, fill value resolution, unit conversion, +``nc_time_offset`` computation (elapsed seconds from ``storm.time_offset`` +to the first record), crop bound validation, and descriptor writing. **Fortran** handles: opening the NetCDF file at runtime using information from the descriptor, index arithmetic for coordinate normalization (no data copies), domain subsetting via -``start``/``count`` arguments to ``nf90_get_var``, and time-slice reads -for met forcing. +``start``/``count`` arguments to ``nf90_get_var``, time-slice reads +for met forcing, and converting raw time values to seconds from +``storm.time_offset`` via ``raw[i] − raw[0] + nc_time_offset``. Fortran assumes the unit contract from ``GEOCLAW_NETCDF_UNITS`` in ``units.py`` without checking. Python enforces it. @@ -351,21 +446,21 @@ silent Fortran ``STOP`` produces no useful output. Known technical debt ~~~~~~~~~~~~~~~~~~~~ -- ``util.get_netcdf_names`` and the coordinate/variable discovery logic - in ``NetCDFInterrogator`` are parallel implementations. They should - be consolidated -- ``NetCDFInterrogator`` should be the single source - and ``util.get_netcdf_names`` should delegate to it. This is deferred - to the surge module refactor. -- ``Storm.write(file_format="data")`` does not yet call - ``DescriptorWriter`` directly for NetCDF storm entries. Currently the - tests write descriptors manually. The integration should happen as - part of the surge refactor. +- ``util.get_netcdf_names`` (in ``util.py``) remains a parallel + implementation of variable name discovery alongside + ``NetCDFInterrogator``. 
``Storm.write_data`` now uses + ``MetInterrogator`` for NetCDF met forcing (format 2), but falls back + to ``util.get_netcdf_names`` for variable name discovery when no + explicit ``var_mapping`` is provided. Full consolidation -- making + ``util.get_netcdf_names`` delegate to ``NetCDFInterrogator`` and + removing the duplicate -- remains a TODO. - WRF raw output requires ``MetPreprocessor`` for string-time decoding (``Times`` character array, not a numeric CF time variable) and curvilinear grid handling (``XLAT``/``XLONG`` are 2D time-varying arrays). A skip-marked test stub documents the gap in ``test_storm.py``. + Adding a new met forcing field (e.g. precipitation) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -406,10 +501,11 @@ binary files are committed to the repository. Next steps ---------- -- **Surge module refactor**: consolidate ``util.get_netcdf_names`` into - ``NetCDFInterrogator``, wire ``Storm.write`` to use - ``DescriptorWriter``, clean up parallel discovery paths in - ``storm.py`` +- **Consolidate util.get_netcdf_names**: ``Storm.write_data`` now + calls ``MetInterrogator`` and ``DescriptorWriter`` for NetCDF met + forcing; the remaining step is to make ``util.get_netcdf_names`` + delegate to ``NetCDFInterrogator`` and remove the duplicate discovery + logic in ``storm.py`` - **WRF support**: implement ``MetPreprocessor`` to handle string-time axis and curvilinear grid; the skip-marked test stub in ``test_storm.py`` documents the expected interface