[GH-2664] GeoParquet writer utilizes geometry SRID to produce projjson CRS metadata#2667

Merged
jiayuasu merged 2 commits into apache:master from jiayuasu:fix/geoparquet-srid-to-projjson
Feb 21, 2026

@jiayuasu
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

When writing GeoParquet files, the writer now automatically derives PROJJSON CRS metadata from the geometry SRID, instead of always writing null (unknown CRS) when no explicit geoparquet.crs option is provided.

Behavior

  • SRID 4326: The crs field is omitted from the GeoParquet metadata, because the GeoParquet spec defines the default CRS as OGC:CRS84 (the same WGS 84 datum as EPSG:4326, with longitude/latitude axis order). Omitting the field keeps the file metadata minimal.
  • SRID > 0 (non-4326): Generates PROJJSON via CRSSerializer.toProjJson() from proj4sedona and writes it to the crs field. The PROJJSON includes the id field (e.g., {"authority":"EPSG","code":32632}) for interoperability with other tools.
  • SRID 0 or mixed SRIDs: Falls back to null (unknown CRS), consistent with GeoPandas behavior.
  • Explicit geoparquet.crs option: Always takes precedence over SRID-derived CRS.
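
The four rules above can be modeled in a few lines. The following is a minimal Python sketch of the decision only, not the Scala code in this PR: `resolve_crs`, `column_metadata`, `OMIT`, and `projjson_placeholder` are hypothetical names, the placeholder stands in for `CRSSerializer.toProjJson()` and returns only the `id` field, and treating "no SRID observed" or a negative SRID like SRID 0 is an assumption.

```python
OMIT = object()  # sentinel: leave the "crs" key out of the column metadata


def projjson_placeholder(srid):
    # Hypothetical stand-in for CRSSerializer.toProjJson(); real PROJJSON
    # carries many more fields than the identifier shown here.
    return {"id": {"authority": "EPSG", "code": srid}}


def resolve_crs(explicit_crs, observed_srid, mixed_srids):
    """Return the value for the "crs" field, None for JSON null, or OMIT."""
    if explicit_crs is not None:
        return explicit_crs  # an explicit option always takes precedence
    if mixed_srids or observed_srid is None or observed_srid <= 0:
        return None  # unknown CRS is written as null
    if observed_srid == 4326:
        return OMIT  # rely on the GeoParquet default CRS
    return projjson_placeholder(observed_srid)


def column_metadata(explicit_crs=None, observed_srid=None, mixed_srids=False):
    """Build a toy per-column metadata dict that applies the rules above."""
    meta = {"encoding": "WKB"}
    crs = resolve_crs(explicit_crs, observed_srid, mixed_srids)
    if crs is not OMIT:
        meta["crs"] = crs
    return meta
```

The sentinel is what keeps "omit the field" distinct from "write null": the two cases produce different metadata, so a plain `None` cannot represent both.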

Changes

pom.xml

  • Bump proj4sedona version from 0.0.5 to 0.0.6 (adds id field support in CRSSerializer.toProjJson()).

GeoParquetMetaData.scala

  • Added sridToProjJson(srid: Int): Option[JValue] utility method. It returns None for SRID 0 (unknown) and SRID 4326 (the default CRS), and generates PROJJSON for all other SRIDs using proj4sedona CRSSerializer.toProjJson().

GeoParquetWriteSupport.scala

  • Track observed SRID per geometry column during writing (_srid, _mixedSrids, observedSrid).
  • Added userExplicitlySetDefaultCrs flag to distinguish "no option provided" from "user explicitly set CRS".
  • In finalizeWrite(): when no explicit CRS option is provided, derive CRS from the observed SRID. For SRID 4326, omit CRS entirely. For other SRIDs, generate PROJJSON. For SRID 0 or mixed SRIDs, write null.
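
Per-column SRID tracking of this kind can be sketched as a small accumulator. This Python model is an illustration under stated assumptions rather than the code in GeoParquetWriteSupport.scala: the class name `SridTracker` and its methods are hypothetical, any two different SRID values (including 0 next to a non-zero SRID) are assumed to count as mixed, and a column with no observed geometries is assumed to report no SRID.

```python
class SridTracker:
    """Remembers the single SRID seen in a column, or flags the column as mixed."""

    def __init__(self):
        self._srid = None    # first SRID observed, if any
        self._mixed = False  # set once a second, different SRID shows up

    def observe(self, srid):
        if self._mixed:
            return  # already mixed; nothing more to learn
        if self._srid is None:
            self._srid = srid
        elif self._srid != srid:
            self._mixed = True

    @property
    def mixed(self):
        return self._mixed

    @property
    def observed_srid(self):
        # None means "nothing usable": no geometries seen, or mixed SRIDs.
        return None if self._mixed else self._srid


# Usage: feed the SRID of every geometry written to the column.
tracker = SridTracker()
for srid in (32632, 32632, 32632):
    tracker.observe(srid)
```

Tracking costs one comparison per geometry and needs no second pass over the data, which is why the result is only available when the write is finalized.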

geoparquetIOTests.scala

  • "GeoParquet save should omit CRS for SRID 4326 per GeoParquet default": verifies CRS is omitted and round-trip preserves SRID 4326.
  • "GeoParquet save should auto-generate projjson from non-default SRID": verifies PROJJSON with EPSG:32632 identifier and round-trip.
  • "GeoParquet save should keep crs null when geometry SRID is 0": verifies null CRS for unknown SRID.
  • "GeoParquet save should use explicit CRS option over SRID-derived CRS": verifies explicit option takes precedence.
  • "GeoParquet save should keep crs null for mixed SRIDs in one column": verifies null CRS for mixed SRIDs.
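
The three outcomes these tests distinguish (crs omitted, crs null, crs carrying an EPSG identifier) can be told apart by inspecting the file-level "geo" metadata JSON. The helper below is a hedged Python sketch, not part of the test suite: `crs_state` is a hypothetical name, and the sample metadata is hand-written for illustration rather than output captured from Sedona.

```python
import json


def crs_state(geo_metadata_json, column):
    """Classify the crs entry of one geometry column in GeoParquet "geo" metadata."""
    col = json.loads(geo_metadata_json)["columns"][column]
    if "crs" not in col:
        return "omitted"  # reader should assume the default CRS (OGC:CRS84)
    crs = col["crs"]
    if crs is None:
        return "null"  # CRS is explicitly unknown
    ident = crs.get("id")
    if ident is None:
        return "projjson without id"
    return f'{ident["authority"]}:{ident["code"]}'


def sample_geo(column_meta):
    # Hand-written, illustrative metadata; the version string is an assumption.
    return json.dumps({
        "version": "1.0.0",
        "primary_column": "geometry",
        "columns": {"geometry": column_meta},
    })


omitted = sample_geo({"encoding": "WKB", "geometry_types": ["Point"]})
unknown = sample_geo({"encoding": "WKB", "geometry_types": ["Point"], "crs": None})
utm32n = sample_geo({
    "encoding": "WKB",
    "geometry_types": ["Point"],
    "crs": {"id": {"authority": "EPSG", "code": 32632}},
})
```

Checking for the key before reading its value matters here, because a lookup with `dict.get` would collapse the omitted and null cases into one.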

geoparquet-sedona-spark.md

  • Document the automatic derivation of CRS from the geometry SRID.

How was this patch tested?

All 46 geoparquetIOTests pass:

mvn test -pl spark/common -Dlog4j.version=2.19.0 -Dsuites=org.apache.sedona.sql.geoparquetIOTests -DfailIfNoTests=false

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

Copilot AI left a comment

Pull request overview

This PR implements automatic CRS (Coordinate Reference System) metadata derivation from geometry SRID when writing GeoParquet files, addressing issue #2664. Previously, the GeoParquet writer always wrote null CRS unless an explicit geoparquet.crs option was provided.

Changes:

  • Bumped proj4sedona dependency from 0.0.5 to 0.0.6 to support PROJJSON generation with EPSG identifier
  • Added automatic SRID-to-PROJJSON conversion for non-default SRIDs during GeoParquet write operations
  • Enhanced GeoParquetWriteSupport to track observed SRID per geometry column and generate appropriate CRS metadata
  • Added comprehensive test coverage for various SRID scenarios and CRS behaviors
  • Updated documentation to reflect the new automatic CRS derivation feature

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pom.xml | Bumps proj4sedona version to 0.0.6 for PROJJSON id field support |
| spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala | Adds sridToProjJson utility method to convert SRID to PROJJSON using proj4sedona |
| spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala | Implements SRID tracking and automatic CRS derivation logic in write support |
| spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala | Adds 5 comprehensive tests covering SRID 4326 omission, PROJJSON generation, SRID 0, explicit options, and mixed SRIDs |
| docs/tutorial/files/geoparquet-sedona-spark.md | Documents the new automatic CRS from SRID behavior and updates option descriptions |

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@jiayuasu jiayuasu merged commit b288870 into apache:master Feb 21, 2026
45 of 46 checks passed

Development

Successfully merging this pull request may close these issues.

GeoParquet writer should try to utilize SRID to produce projjson