[GH-2664] GeoParquet writer utilizes geometry SRID to produce projjson CRS metadata #2667
Merged
jiayuasu merged 2 commits into apache:master on Feb 21, 2026
Pull request overview
This PR implements automatic CRS (Coordinate Reference System) metadata derivation from geometry SRID when writing GeoParquet files, addressing issue #2664. Previously, the GeoParquet writer always wrote null CRS unless an explicit geoparquet.crs option was provided.
Changes:
- Bumped proj4sedona dependency from 0.0.5 to 0.0.6 to support PROJJSON generation with an EPSG identifier
- Added automatic SRID-to-PROJJSON conversion for non-default SRIDs during GeoParquet write operations
- Enhanced GeoParquetWriteSupport to track observed SRID per geometry column and generate appropriate CRS metadata
- Added comprehensive test coverage for various SRID scenarios and CRS behaviors
- Updated documentation to reflect the new automatic CRS derivation feature
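As a reading aid, the CRS decision rules stated in the PR description can be sketched in Scala: an explicit `geoparquet.crs` option always wins; otherwise SRID 4326 omits the `crs` field, SRID 0 or mixed SRIDs write `null`, and any other single SRID yields PROJJSON. This is a minimal sketch, not the code in `GeoParquetWriteSupport`: every name in it (`CrsDecision`, `CrsRules.resolveCrs`, the case objects) is hypothetical, and treating a column with no observed SRID as unknown is an assumption.

```scala
// Hypothetical model of the CRS decision; names do not mirror Sedona's internals.
sealed trait CrsDecision
case object OmitCrs extends CrsDecision                         // no "crs" key written
case object NullCrs extends CrsDecision                         // "crs": null (unknown CRS)
final case class ExplicitCrs(raw: String) extends CrsDecision   // user-supplied option value
final case class ProjJsonFromSrid(srid: Int) extends CrsDecision // PROJJSON derived from SRID

object CrsRules {
  /**
   * @param explicitCrs   value of the geoparquet.crs option, if the user set it
   * @param observedSrids distinct SRIDs seen in one geometry column during the write
   */
  def resolveCrs(explicitCrs: Option[String], observedSrids: Set[Int]): CrsDecision =
    explicitCrs match {
      case Some(raw) => ExplicitCrs(raw) // explicit option takes precedence
      case None =>
        observedSrids.toList match {
          case List(4326) => OmitCrs                // default CRS: omit the field
          case List(0)    => NullCrs                // SRID 0: unknown CRS
          case List(srid) => ProjJsonFromSrid(srid) // any other single SRID
          case _          => NullCrs                // mixed SRIDs (or, by assumption, none seen)
        }
    }
}

println(CrsRules.resolveCrs(None, Set(32632)))       // ProjJsonFromSrid(32632)
println(CrsRules.resolveCrs(None, Set(4326, 32632))) // NullCrs
```

Modeling the outcome as a sealed trait keeps "omit the field" and "write `null`" as distinct cases, which is the distinction the PR relies on.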
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pom.xml | Bumps proj4sedona version to 0.0.6 for PROJJSON id field support |
| spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala | Adds sridToProjJson utility method to convert SRID to PROJJSON using proj4sedona |
| spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala | Implements SRID tracking and automatic CRS derivation logic in write support |
| spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala | Adds 5 comprehensive tests covering SRID 4326 omission, PROJJSON generation, SRID 0, explicit options, and mixed SRIDs |
| docs/tutorial/files/geoparquet-sedona-spark.md | Documents the new automatic CRS from SRID behavior and updates option descriptions |
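The contract of the `sridToProjJson` helper listed for `GeoParquetMetaData.scala` (return `None` for SRID 0 and 4326, PROJJSON otherwise) can be sketched as below. To stay self-contained, this sketch returns `Option[String]` rather than `Option[JValue]` and takes the converter as a parameter; `fakeProjJson` is a hypothetical stand-in for proj4sedona's `CRSSerializer.toProjJson()`, whose exact parameters are not shown here, and it emits only the `id` member rather than a complete PROJJSON document.

```scala
// Sketch of the sridToProjJson contract; NOT the Sedona implementation.
// Returns Option[String] instead of Option[JValue] to avoid a JSON library dependency.
def sridToProjJson(srid: Int, toProjJson: Int => String): Option[String] =
  srid match {
    case 0 | 4326 => None                    // unknown (0) or default (4326) CRS: no PROJJSON
    case other    => Some(toProjJson(other)) // any other SRID: delegate to the converter
  }

// Hypothetical stand-in for proj4sedona's CRSSerializer.toProjJson();
// it produces only the "id" member, not a full PROJJSON document.
def fakeProjJson(srid: Int): String =
  s"""{"id":{"authority":"EPSG","code":$srid}}"""

println(sridToProjJson(4326, fakeProjJson))  // None
println(sridToProjJson(32632, fakeProjJson)) // Some({"id":{"authority":"EPSG","code":32632}})
```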
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Did you read the Contributor Guide?
Is this PR related to a ticket?
Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2664 (GeoParquet writer should try to utilize SRID to produce projjson).
What changes were proposed in this PR?
When writing GeoParquet files, the writer now automatically derives PROJJSON CRS metadata from the geometry SRID, instead of always writing `null` (unknown CRS) when no explicit `geoparquet.crs` option is provided.
Behavior
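- SRID 4326 (default CRS): the `crs` field is omitted.
- Other SRIDs: the writer generates PROJJSON via `CRSSerializer.toProjJson()` from proj4sedona and writes it to the `crs` field. The PROJJSON includes the `id` field (e.g., `{"authority":"EPSG","code":32632}`) for interoperability with other tools.
- SRID 0 or mixed SRIDs: writes `null` (unknown CRS), consistent with GeoPandas behavior.
- Explicit `geoparquet.crs` option: always takes precedence over the SRID-derived CRS.
Changes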
pom.xml
- Bumped proj4sedona from 0.0.5 to 0.0.6 (adds `id` field support in `CRSSerializer.toProjJson()`).
GeoParquetMetaData.scala
- Added a `sridToProjJson(srid: Int): Option[JValue]` utility method. It returns `None` for SRID 0 and 4326 (default CRS) and generates PROJJSON for other SRIDs using proj4sedona's `CRSSerializer.toProjJson()`.
GeoParquetWriteSupport.scala
- Tracks the SRID observed in each geometry column (`_srid`, `_mixedSrids`, `observedSrid`).
- Added a `userExplicitlySetDefaultCrs` flag to distinguish "no option provided" from "user explicitly set CRS".
- In `finalizeWrite()`: when no explicit CRS option is provided, derive the CRS from the observed SRID. For SRID 4326, omit the CRS entirely; for other SRIDs, generate PROJJSON; for SRID 0 or mixed SRIDs, write `null`.
geoparquetIOTests.scala
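- SRID 4326: the CRS is omitted.
- Non-default SRID: a PROJJSON CRS is generated.
- SRID 0: `null` CRS for unknown SRID.
- Explicit `geoparquet.crs` option: takes precedence over the SRID.
- Mixed SRIDs: `null` CRS for mixed SRIDs.
geoparquet-sedona-spark.md
- Documents the automatic CRS-from-SRID behavior and updates the option descriptions.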
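The per-column SRID tracking described for `GeoParquetWriteSupport.scala` can be pictured with a small Scala sketch: remember the first SRID seen in a column and flag the column as mixed as soon as a different one appears. `ColumnSridTracker`, `observe`, and `isMixed` are hypothetical names; the sketch only loosely echoes the `_srid` / `_mixedSrids` state named in the PR and is not the actual implementation.

```scala
// Hypothetical per-column SRID tracker; not Sedona's actual code.
final class ColumnSridTracker {
  private var firstSrid: Option[Int] = None // first SRID seen in the column
  private var mixed: Boolean = false        // set once a different SRID appears

  /** Record the SRID of one non-null geometry written to this column. */
  def observe(srid: Int): Unit = firstSrid match {
    case None                 => firstSrid = Some(srid)
    case Some(s) if s != srid => mixed = true
    case _                    => () // same SRID again: nothing to update
  }

  /** The single SRID seen in the column, or None if mixed or nothing was observed. */
  def observedSrid: Option[Int] = if (mixed) None else firstSrid

  def isMixed: Boolean = mixed
}

// Usage: one tracker per geometry column, keyed by column name (illustrative).
val trackers = Map("geom" -> new ColumnSridTracker, "other_geom" -> new ColumnSridTracker)
Seq(32632, 32632).foreach(trackers("geom").observe)
Seq(4326, 3857).foreach(trackers("other_geom").observe)
println(trackers("geom").observedSrid)       // Some(32632)
println(trackers("other_geom").observedSrid) // None
```

In this sketch, a `None` from `observedSrid` would map to a `null` CRS at finalize time, matching the mixed-SRID rule the PR describes.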
How was this patch tested?
All 46 `geoparquetIOTests` pass.
Did this PR include necessary documentation updates?