[GH-2662] Implement a pure Java single-thread COG writer #2663

Merged: jiayuasu merged 1 commit into master from feature/cog-writer on Feb 20, 2026

jiayuasu (Member) commented on Feb 19, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

This PR adds a pure Java, single-threaded Cloud Optimized GeoTIFF (COG) writer to Apache Sedona. The implementation generates COG files by:

  1. Computing overview decimation factors (powers of 2), ported from GeoTrellis
  2. Generating downsampled overview images via GeoTools resample
  3. Writing each level as a tiled GeoTIFF via GeoTools GeoTiffWriter
  4. Parsing the TIFF IFD structure of each level (TiffIfdParser)
  5. Reassembling all levels into COG byte order with IFDs at file start and image data in reverse size order (CogAssembler)
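
Step 1 can be illustrated with a small sketch. It is not the code from CogWriter or GeoTrellis: the class name `OverviewDecimations` is hypothetical, and the stopping rule (keep doubling the factor until the longest side fits in one tile) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Sedona's CogWriter): lists power-of-2 overview decimation factors. */
final class OverviewDecimations {

  /**
   * Doubles the decimation factor until the longest side of the reduced image
   * fits within a single tile. The stopping rule is an assumption of this sketch.
   */
  static List<Integer> compute(int width, int height, int tileSize) {
    if (width <= 0 || height <= 0 || tileSize <= 0) {
      throw new IllegalArgumentException("width, height and tileSize must be positive");
    }
    List<Integer> factors = new ArrayList<>();
    int longestSide = Math.max(width, height);
    int factor = 1;
    while (longestSide > tileSize) {
      factor *= 2;
      longestSide = longestSide - longestSide / 2; // halve, rounding up
      factors.add(factor);
    }
    return factors;
  }

  public static void main(String[] args) {
    System.out.println(compute(1024, 1024, 256)); // prints [2, 4]
    System.out.println(compute(200, 100, 256));   // prints []
  }
}
```

Under this rule a raster that already fits in one tile gets no overviews, which is consistent with the "small raster (no overviews)" test case listed further down.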

New files

  • CogOptions.java — Immutable builder-pattern options class with 5 configurable parameters:
    • compression (default: Deflate) — validated against allow-list: Deflate, LZW, JPEG, PackBits
    • compressionQuality (default: 0.2) — quality from 0.0 to 1.0
    • tileSize (default: 256) — tile dimensions, must be power of 2
    • resampling (default: Nearest) — overview resampling: Nearest, Bilinear, or Bicubic
    • overviewCount (default: -1/auto) — number of overview levels, 0 for none
  • CogWriter.java — Main orchestrator with write(GridCoverage2D, CogOptions) and write(GridCoverage2D, CogOptions, OutputStream) methods. Pipeline-per-level processing releases each overview for GC before generating the next.
  • CogAssembler.java — Reassembles parsed TIFFs into COG-compliant byte layout, injects NewSubfileType tag for overviews. Supports both byte[] and OutputStream output.
  • TiffIfdParser.java — Parses TIFF byte arrays to extract IFD entries, image data references, and overflow areas. Uses zero-copy image data references (stores offset + length into source array instead of copying).
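
The validation rules listed for CogOptions can be sketched as an immutable builder. This is a stand-in, not the actual CogOptions source: the class name `CogOptionsSketch`, its method names, the exact-match comparison of compression names, and the lower bound on overviewCount are assumptions. Resampling is left out to keep the sketch short. Only the defaults and the allow-list come from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an immutable options class; not the actual CogOptions. */
final class CogOptionsSketch {
  private static final List<String> ALLOWED_COMPRESSION =
      Arrays.asList("Deflate", "LZW", "JPEG", "PackBits");

  final String compression;
  final double compressionQuality;
  final int tileSize;
  final int overviewCount;

  private CogOptionsSketch(Builder b) {
    this.compression = b.compression;
    this.compressionQuality = b.compressionQuality;
    this.tileSize = b.tileSize;
    this.overviewCount = b.overviewCount;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Defaults taken from the PR description.
    private String compression = "Deflate";
    private double compressionQuality = 0.2;
    private int tileSize = 256;
    private int overviewCount = -1; // -1 means "choose automatically"

    Builder compression(String value) { this.compression = value; return this; }
    Builder compressionQuality(double value) { this.compressionQuality = value; return this; }
    Builder tileSize(int value) { this.tileSize = value; return this; }
    Builder overviewCount(int value) { this.overviewCount = value; return this; }

    /** Validates all fields once, then returns an immutable instance. */
    CogOptionsSketch build() {
      if (!ALLOWED_COMPRESSION.contains(compression)) {
        throw new IllegalArgumentException("unsupported compression: " + compression);
      }
      // The negated range check also rejects NaN.
      if (!(compressionQuality >= 0.0 && compressionQuality <= 1.0)) {
        throw new IllegalArgumentException("compressionQuality must be in [0.0, 1.0]");
      }
      // A positive power of 2 has exactly one bit set.
      if (tileSize <= 0 || (tileSize & (tileSize - 1)) != 0) {
        throw new IllegalArgumentException("tileSize must be a positive power of 2");
      }
      if (overviewCount < -1) { // assumed lower bound
        throw new IllegalArgumentException("overviewCount must be -1 (auto) or >= 0");
      }
      return new CogOptionsSketch(this);
    }
  }
}
```

Validating in `build()` rather than in each setter keeps the builder free of ordering constraints and guarantees that every constructed instance is valid.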

Modified files

  • RasterOutputs.java — Single new public API method:
    • asCloudOptimizedGeoTiff(GridCoverage2D raster, CogOptions options)

How was this patch tested?

  • 25 unit tests in CogWriterTest.java covering:
    • Overview decimation computation
    • Overview generation with all resampling modes (Nearest, Bilinear, Bicubic)
    • Small raster (no overviews), medium raster (with overviews), multiband raster
    • LZW compression, existing GeoTIFF conversion
    • TIFF IFD parsing, NewSubfileType tag presence in overview IFDs
    • Input validation (compression allow-list, compressionQuality, tileSize, malformed TIFF)
    • COG tile offsets are forward-pointing
    • CogOptions builder defaults, validation, and resampling normalization
    • overviewCount=0 (no overviews), overviewCount=1 (specific count)
    • tileSize=512, RasterOutputs CogOptions API path
  • External validation: generated COGs confirmed valid by rio cogeo validate and gdalinfo
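
Checks such as "NewSubfileType tag presence in overview IFDs" depend on walking the IFD chain of a classic TIFF. The sketch below shows one way to do that with `java.nio.ByteBuffer`. It is not the TiffIfdParser from this PR: the class and method names are hypothetical, and it handles only classic (non-BigTIFF) files. The header layout, the 12-byte entry layout, and tag number 254 for NewSubfileType follow the TIFF 6.0 specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal IFD walker for classic TIFF; not the PR's TiffIfdParser. */
final class TiffIfdSketch {
  static final int TAG_NEW_SUBFILE_TYPE = 254;

  /**
   * Returns one map per IFD, from tag number to the raw 4-byte value/offset field.
   * The raw field is the value itself only for tags holding a single LONG,
   * such as NewSubfileType.
   */
  static List<Map<Integer, Long>> readIfds(byte[] tiff) {
    if (tiff.length < 8) {
      throw new IllegalArgumentException("too short for a TIFF header");
    }
    ByteBuffer buf = ByteBuffer.wrap(tiff);
    if (tiff[0] == 'I' && tiff[1] == 'I') {
      buf.order(ByteOrder.LITTLE_ENDIAN);
    } else if (tiff[0] == 'M' && tiff[1] == 'M') {
      buf.order(ByteOrder.BIG_ENDIAN);
    } else {
      throw new IllegalArgumentException("missing TIFF byte-order mark");
    }
    if (buf.getShort(2) != 42) {
      throw new IllegalArgumentException("not a classic TIFF (magic number is not 42)");
    }
    List<Map<Integer, Long>> ifds = new ArrayList<>();
    long offset = buf.getInt(4) & 0xFFFFFFFFL; // offset of the first IFD
    while (offset != 0) {
      if (ifds.size() >= 1000 || offset + 2 > tiff.length) {
        throw new IllegalArgumentException("malformed IFD chain");
      }
      int pos = (int) offset;
      int entryCount = buf.getShort(pos) & 0xFFFF;
      if (offset + 2 + 12L * entryCount + 4 > tiff.length) {
        throw new IllegalArgumentException("IFD extends past end of data");
      }
      Map<Integer, Long> entries = new LinkedHashMap<>();
      for (int i = 0; i < entryCount; i++) {
        int entryPos = pos + 2 + 12 * i; // entry layout: tag(2) type(2) count(4) value(4)
        int tag = buf.getShort(entryPos) & 0xFFFF;
        entries.put(tag, buf.getInt(entryPos + 8) & 0xFFFFFFFFL);
      }
      ifds.add(entries);
      offset = buf.getInt(pos + 2 + 12 * entryCount) & 0xFFFFFFFFL; // next IFD, 0 ends the chain
    }
    return ifds;
  }
}
```

With such a walker, a test can assert that every IFD after the first contains `TAG_NEW_SUBFILE_TYPE` with the reduced-resolution bit (bit 0) set.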

Did this PR include necessary documentation updates?

  • Yes, this PR adds a new public Java API method (asCloudOptimizedGeoTiff in RasterOutputs). End-user documentation for the Spark SQL surface will be added in a follow-up PR when the SQL function is wired up.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a pure-Java Cloud Optimized GeoTIFF (COG) writer implementation to Sedona’s common raster module, exposing new RasterOutputs entrypoints and introducing a TIFF-IFD parser + COG byte-layout assembler.

Changes:

  • Introduces CogWriter, CogAssembler, TiffIfdParser, and CogOptions to generate COGs from GridCoverage2D.
  • Adds RasterOutputs.asCloudOptimizedGeoTiff(...) public APIs that delegate to the new writer.
  • Adds CogWriterTest coverage for overview generation, assembly structure, and option validation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogWriter.java | Implements overview computation/resampling and writes tiled GeoTIFF levels for later COG assembly. |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogAssembler.java | Rebuilds multi-level TIFFs into COG-compliant byte layout and patches IFD offsets/tags. |
| common/src/main/java/org/apache/sedona/common/raster/cog/TiffIfdParser.java | Parses TIFF header/IFD/overflow/image segment regions needed for assembly. |
| common/src/main/java/org/apache/sedona/common/raster/cog/CogOptions.java | Adds an immutable options/builder API for controlling compression/tiling/resampling/overviews. |
| common/src/main/java/org/apache/sedona/common/raster/RasterOutputs.java | Exposes new public asCloudOptimizedGeoTiff overloads to produce COG bytes. |
| common/src/test/java/org/apache/sedona/common/raster/cog/CogWriterTest.java | Adds unit tests validating decimations, overview creation, COG layout properties, and options validation. |
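
The image segment regions that the parser hands to the assembler are described in the PR as zero-copy references (offset plus length into the source array). A rough sketch of that idea follows. The class name `SegmentRef` and its methods are hypothetical and are not taken from the PR.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical zero-copy reference to a byte range inside a larger TIFF buffer. */
final class SegmentRef {
  final byte[] source; // shared with the caller, never copied
  final int offset;
  final int length;

  SegmentRef(byte[] source, int offset, int length) {
    // Written this way to avoid integer overflow in offset + length.
    if (offset < 0 || length < 0 || offset > source.length - length) {
      throw new IllegalArgumentException("segment out of bounds");
    }
    this.source = source;
    this.offset = offset;
    this.length = length;
  }

  /** Streams the referenced bytes directly from the source array, with no intermediate buffer. */
  void writeTo(OutputStream out) throws IOException {
    out.write(source, offset, length);
  }
}
```

Because `OutputStream.write(byte[], int, int)` accepts a range, an assembler built on such references can emit each level's image data without first concatenating everything into one output array.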

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated no new comments.

Adds a COG writer that produces spec-compliant Cloud Optimized GeoTIFF files
with configurable compression, tile size, resampling, and overview count.

Architecture:
- CogOptions: immutable builder with validated options
- CogWriter: orchestrates overview generation, GeoTIFF encoding, and assembly
- TiffIfdParser: zero-copy TIFF IFD structure parser
- CogAssembler: reassembles parsed TIFFs into COG byte order

Key features:
- IFD-first layout with forward-pointing TileOffsets (COG spec)
- NewSubfileType injection for overview IFDs
- Pipeline-per-level processing for GC-friendly memory behavior
- Zero-copy image data references (no System.arraycopy)
- Streaming OutputStream API to avoid final byte[] allocation
- Compression allow-list validation (Deflate, LZW, JPEG, PackBits)

Public API: RasterOutputs.asCloudOptimizedGeoTiff(raster, CogOptions)

Closes #2662
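
The IFD-first layout can be made concrete with a small offset planner. This is a sketch under assumptions, not the CogAssembler logic: the class name `CogLayoutSketch` is hypothetical, levels are assumed to be ordered full resolution first, "reverse size order" is read as smallest overview first, and word alignment and the 4 GiB classic-TIFF limit are ignored.

```java
/** Hypothetical offset planner for an IFD-first layout; not the PR's CogAssembler. */
final class CogLayoutSketch {
  static final int TIFF_HEADER_SIZE = 8;

  /**
   * Levels are ordered full resolution first. Returns, per level,
   * {ifdStart, dataStart}: all IFD regions follow the header in level order,
   * then image data is placed smallest level first (assumed ordering).
   */
  static long[][] plan(long[] ifdRegionSizes, long[] imageDataSizes) {
    if (ifdRegionSizes.length != imageDataSizes.length) {
      throw new IllegalArgumentException("one IFD region and one data size per level");
    }
    int levels = ifdRegionSizes.length;
    long[][] offsets = new long[levels][2];
    long cursor = TIFF_HEADER_SIZE;
    for (int i = 0; i < levels; i++) { // IFDs: full resolution first
      offsets[i][0] = cursor;
      cursor += ifdRegionSizes[i];
    }
    for (int i = levels - 1; i >= 0; i--) { // data: last (smallest) level first
      offsets[i][1] = cursor;
      cursor += imageDataSizes[i];
    }
    return offsets;
  }

  public static void main(String[] args) {
    long[][] p = plan(new long[] {200, 150, 150}, new long[] {4000, 1000, 250});
    for (long[] level : p) {
      System.out.println("ifd@" + level[0] + " data@" + level[1]);
    }
  }
}
```

Since every data offset lies past the end of the last IFD region, the TileOffsets rewritten from such a plan are forward-pointing by construction.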

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@jiayuasu jiayuasu marked this pull request as ready for review February 20, 2026 07:03
@jiayuasu jiayuasu merged commit 150c531 into master Feb 20, 2026
46 checks passed
Linked issue: Implement a pure Java single thread COG writer