Release/0.7.0 #152

Open
jackiryan wants to merge 11 commits into main from release/0.7.0

@jackiryan
Contributor

Description

This release primarily adds support for the GHRSST L4 MUR product from PODAAC.

Refactored the pipeline to pass more concise messages between lambdas. The "Browse Image Transfer" (BIT) workflow is now consolidated: the following lambdas from versions <=0.6.0 have been merged into a single lambda:

  • process_harmony_results (reading result URLs from the Harmony API request and generating checksums)
  • generate_image_metadata (producing the image metadata XML for GIBS with data start and end times)
  • build_image_sets (associating sets of browse image, world file, and image metadata XML)
  • save_cnm_message (saving the CNM JSON message with image set information for GIBS)

All of these functions run synchronously, so they have been combined into the new "Handle BIG Result" step of the pipeline, which produces CNM messages and saves them to S3. This was done to reduce the size of the messages that the bignbit pipeline passes between steps, since these messages are limited to 256 KB. For datasets like GHRSST MUR, where many tiled images are produced, the old system caused Step Functions to fail the workflow due to oversized messages. Instead of introducing a new database or saving additional intermediate files to S3, this approach simplifies the workflow and mitigates the message size issue.
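To illustrate the size constraint, here is a minimal Python sketch (not taken from bignbit) that compares a payload carrying full image sets inline with one carrying only references to objects saved in S3. The field layout of `inline_image_set`, the bucket name, the tile count, and the helpers `payload_size` and `fits_in_state` are all hypothetical, and the limit is assumed to be 256 * 1024 bytes.

```python
import json

# Assumed payload limit between steps: 256 KB, taken here as 256 * 1024 bytes.
MAX_STATE_BYTES = 256 * 1024


def payload_size(payload) -> int:
    """Size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_in_state(payload) -> bool:
    """True if the serialized payload is within the assumed limit."""
    return payload_size(payload) <= MAX_STATE_BYTES


def inline_image_set(i: int) -> dict:
    """Hypothetical image set carried inline (image, world file, metadata)."""
    base = f"s3://example-bucket/tile_{i:04d}"
    checksum = "0" * 32  # placeholder with the length of an MD5 hex digest
    return {
        "image": {"url": base + ".png", "checksum": checksum},
        "world_file": {"url": base + ".pgw", "checksum": checksum},
        "image_metadata": {"url": base + ".xml", "checksum": checksum},
    }


def cnm_reference(i: int) -> dict:
    """Hypothetical compact reference to a CNM message already saved in S3."""
    return {
        "cnm_bucket": "example-bucket",
        "cnm_key": f"bignbit-cnm-output/tile_{i:04d}.cnm.json",
    }


tiles = range(1000)  # made-up tile count for a heavily tiled granule
inline = [inline_image_set(i) for i in tiles]
refs = [cnm_reference(i) for i in tiles]

print(payload_size(inline), fits_in_state(inline))
print(payload_size(refs), fits_in_state(refs))
```

With these made-up sizes, the inline list exceeds the assumed limit while the reference list stays under it. Real entries also carry the SQS send response, so actual sizes will differ.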

In practice, this means that the final state output of the workflow has changed. Previously, the pobit item of the state payload contained an array of image_set objects. Now it contains an array of references to CNM messages that have been sent to GIBS over the SQS queue:

  "pobit": [
      {
        "cmr_provider": "LARC_CLOUD",
        "collection_name": "PREFIRE_SAT2_2B-FLX_EEDTEST",
        "cnm_bucket": "podaac-sit-svc-internal",
        "cnm_key": "bignbit-cnm-output/PREFIRE_SAT2_2B-FLX_EEDTEST_flx_LL/PREFIRE_SAT2_2B-FLX_S07_R00_20210721013413_03040.nc.G00.2026-02-19T17:27:01.816Z.cnm.json",
        "gibs": {
          "cnmContent": {
            "MD5OfMessageBody": "9597edb67114477b7c1133aec065062d",
            "MD5OfMessageAttributes": "f7aec4559a577e6f7a0ba62823347d93",
            "MessageId": "ffa61002-f475-48b6-b02f-fe89b408e3ea",
            "SequenceNumber": "18900253714877423616",
            "ResponseMetadata": {
              "RequestId": "e900647d-6804-5d8d-b4a2-066c2237bebc",
              "HTTPStatusCode": 200,
              "HTTPHeaders": {
                "x-amzn-requestid": "e900647d-6804-5d8d-b4a2-066c2237bebc",
                "date": "Thu, 19 Feb 2026 17:27:15 GMT",
                "content-type": "text/xml",
                "content-length": "512",
                "connection": "keep-alive"
              },
              "RetryAttempts": 0
            }
          }
        }
      },
      ...
  ]
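Consumers of the final state output may need to adapt to the new shape. As a rough Python sketch, each entry can be mapped to the S3 location of its CNM message and to the SQS MessageId returned when it was sent. The helper names `cnm_s3_uri` and `sqs_message_id` are hypothetical and not part of bignbit; the sample entry is a trimmed copy of the example above.

```python
from typing import Optional


def cnm_s3_uri(entry: dict) -> str:
    """Build the S3 URI of the saved CNM message from a pobit entry."""
    return f"s3://{entry['cnm_bucket']}/{entry['cnm_key']}"


def sqs_message_id(entry: dict) -> Optional[str]:
    """Return the SQS MessageId recorded for the entry, or None if absent."""
    return entry.get("gibs", {}).get("cnmContent", {}).get("MessageId")


# Trimmed copy of the example entry shown above.
entry = {
    "cmr_provider": "LARC_CLOUD",
    "collection_name": "PREFIRE_SAT2_2B-FLX_EEDTEST",
    "cnm_bucket": "podaac-sit-svc-internal",
    "cnm_key": (
        "bignbit-cnm-output/PREFIRE_SAT2_2B-FLX_EEDTEST_flx_LL/"
        "PREFIRE_SAT2_2B-FLX_S07_R00_20210721013413_03040.nc.G00."
        "2026-02-19T17:27:01.816Z.cnm.json"
    ),
    "gibs": {
        "cnmContent": {
            "MessageId": "ffa61002-f475-48b6-b02f-fe89b408e3ea",
        }
    },
}

print(cnm_s3_uri(entry))
print(sqs_message_id(entry))
```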

Additionally, warnings are now issued when a Harmony API call reports success but produces no data. This situation is occasionally encountered with some data sets, though its root cause is unknown. For now, we want to capture the occurrences without failing the overall workflow since other variables or projections in the browse image workflow may still succeed.
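The warn-but-continue behavior could look roughly like the Python sketch below. This is not the bignbit implementation: the function name `extract_data_urls` and the logger name are hypothetical, and the job status shape (a `status` field, a `jobID` field, and `links` entries with `rel` set to `data`) is an assumption about the Harmony response.

```python
import logging

logger = logging.getLogger("bignbit.sketch")  # hypothetical logger name


def extract_data_urls(job_status: dict) -> list:
    """Collect data URLs from a job status dict (assumed shape).

    Logs a warning instead of raising when the job reports success
    but contains no data links, so the rest of the workflow can proceed.
    """
    links = job_status.get("links", [])
    urls = [link["href"] for link in links if link.get("rel") == "data"]
    if job_status.get("status") == "successful" and not urls:
        logger.warning(
            "Harmony job %s reported success but produced no data",
            job_status.get("jobID", "<unknown>"),
        )
    return urls


# A successful job with no data links yields an empty list plus a warning.
empty_job = {"jobID": "example-job", "status": "successful", "links": []}
print(extract_data_urls(empty_job))
```

Returning an empty list rather than raising leaves it to the caller to skip that variable or projection while other branches continue.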

Added

  • issues/108: Handle the case where no data is returned from a Harmony job by issuing a warning that can be tracked in CloudWatch logs.

Changed

  • issues/148: Refactored the message passing system after browse image generation to handle large tiled outputs (hundreds of output files).

Removed

  • issues/148: Removed the "Generate Image Metadata", "Build Image Sets", "Process Harmony Results", and "Save CNM Message" lambdas in favor of a consolidated "Process BIG Result" lambda that generates and saves CNM messages for the entire result of a browse image generation workflow.

Overview of verification done

  • Tested the following data sets in the SIT venue:
    • OPERA HLS
    • PREFIRE COG
    • TEMPO NO2
    • GHRSST L4 MUR (new!!)

Overview of integration done

Integration testing in UAT is TBD.

PR checklist:

  • Linted
  • Updated unit tests
  • Updated changelog
  • Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request

jackiryan and others added 11 commits January 7, 2026 17:38
# Conflicts:
#	pyproject.toml

* issue/108: handle case when no data is returned from a Harmony job

* Updated step function graph

This workflow automatically:
- Adds new issues and PRs to the podaac project
- Sets their status to 'needs:triage'

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* issue/148: Refactored format of harmony job status passed to generate image metadata

* Removed process_harmony_results.py lambda and associated tests

* Fix linter error

* Updated terraform scripts to remove references to process_harmony_job_output and provide env variables to generate_image_metadata

* Refactored generate_image_metadata, build_image_sets, and save_cnm_message into one lambda

* Fixed bug with collection name key

* issues/148: Updated unit tests, changelog, and step function graph

* issue/148: fixed issue with CICD pipeline workflow not creating manifest list

* issue/148: set provenance to false in build and publish docker image step

* issue/148: fixed a bug so that image sets with no world file are allowed

* issue/148: fixed bug where HarmonyJobNoDataError pass state triggered KeyError

* issue/148: Minor changes to address copilot code review comments

@tloubrieu-jpl moved this to needs:triage in podaac on Feb 19, 2026
@jackiryan requested a review from jamesfwood on February 19, 2026 21:39
@tloubrieu-jpl moved this from needs:triage to routed in podaac on Feb 21, 2026