[SPARK-57225][SS] Reject write operations on state data sources with clear error #56319

Open
shrirangmhalgi wants to merge 1 commit into apache:master from shrirangmhalgi:SPARK-57225-state-datasource-readonly

@shrirangmhalgi
Contributor

What changes were proposed in this pull request?

Make StateDataSource and StateMetadataSource explicitly reject write operations with a clear error message (STDS_WRITE_UNSUPPORTED) instead of falling through to the V1 write path and producing a confusing internal error.

Both sources now implement CreatableRelationProvider and throw immediately in createRelation. For statestore, for example, the message reads: "The state data source 'statestore' is read-only and does not support write operations."
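
A minimal sketch of the pattern described above, not the code in this PR: a source that implements CreatableRelationProvider and fails in createRelation before any write logic runs. The class name `ReadOnlySketchSource`, its short name, and the use of `UnsupportedOperationException` are assumptions made for illustration; the PR itself raises STDS_WRITE_UNSUPPORTED through Spark's error framework. The sketch assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister}

// Hypothetical stand-in for a read-only source; not the real StateDataSource.
class ReadOnlySketchSource extends CreatableRelationProvider with DataSourceRegister {

  // Hypothetical short name, used as df.write.format("readonly-sketch").
  override def shortName(): String = "readonly-sketch"

  // The V1 write path calls this method for df.write...save().
  // Throwing here rejects the write before any data is touched.
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // Assumption: a plain JVM exception stands in for the
    // STDS_WRITE_UNSUPPORTED error that the PR raises.
    throw new UnsupportedOperationException(
      s"The data source '${shortName()}' is read-only and does not support write operations.")
  }
}
```

Rejecting in createRelation keeps the check in the source itself, so the failure carries the source name instead of surfacing later as a generic planner error.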

Why are the changes needed?

StateDataSource (statestore) and StateMetadataSource (state-metadata) are read-only checkpoint inspection sources. Previously, df.write.format("statestore").save() fell through to the V1 write path and threw an INTERNAL_ERROR ("does not allow create table as select"), a confusing message that does not tell the user the source is intentionally read-only.

Does this PR introduce any user-facing change?

Yes. Write attempts on state data sources now produce a clear error:

[STDS_WRITE_UNSUPPORTED] The state data source 'statestore' is read-only and does not support write operations. State store checkpoint data should not be modified externally.

Previously the error was a generic internal error.

How was this patch tested?

Added two tests in StateDataSourceNegativeTestSuite verifying that df.write.format("statestore").save() and df.write.format("state-metadata").save() each throw STDS_WRITE_UNSUPPORTED with sqlState 0A000.
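
For readers who want to observe the behaviour outside the test suite, the following is a rough standalone check, not the tests added in this PR. The object name `StateSourceWriteCheck` and the helper `describe` are hypothetical. It assumes a Spark build that includes this patch and a local master; on an unpatched build the printed failure would be the older internal error instead.

```scala
import scala.util.Try

import org.apache.spark.SparkThrowable
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of Spark or of this PR.
object StateSourceWriteCheck {

  // Summarise a failure: the SQLSTATE when Spark exposes one, plus the message.
  def describe(t: Throwable): String = t match {
    case st: SparkThrowable => s"sqlState=${st.getSqlState} message=${t.getMessage}"
    case _ => s"sqlState=<none> message=${t.getMessage}"
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("state-source-write-check")
      .getOrCreate()
    try {
      val df = spark.range(3).toDF("id")
      // Both short names come from the PR description.
      Seq("statestore", "state-metadata").foreach { source =>
        Try(df.write.format(source).save()).failed.toOption match {
          case Some(e) => println(s"$source -> ${describe(e)}")
          case None => println(s"$source -> write unexpectedly succeeded")
        }
      }
    } finally {
      spark.stop()
    }
  }
}
```

With the patch applied, both lines should report the 0A000 SQLSTATE described above.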

Was this patch authored or co-authored using generative AI tooling?

Yes. Using Claude-Opus 4.6

…clear error

StateDataSource (statestore) and StateMetadataSource (state-metadata) are read-only streaming checkpoint inspection sources. Previously, attempting to write via df.write.format("statestore").save() would fall through to the V1 write path and produce a confusing internal error.

This PR makes both sources implement CreatableRelationProvider to intercept write attempts early and throw a clear STDS_WRITE_UNSUPPORTED error: "The state data source '<sourceName>' is read-only and does not support write operations."

Changes:
- Add STDS_WRITE_UNSUPPORTED error class with sqlState 0A000
- StateDataSource implements CreatableRelationProvider, throws in createRelation
- StateMetadataSource implements CreatableRelationProvider, throws in createRelation
- Add regression tests in StateDataSourceNegativeTestSuite
@shrirangmhalgi
Contributor Author

@HeartSaVioR / @liviazhu Could you please review this PR which addresses SPARK-57225? It makes the state data sources (statestore and state-metadata) explicitly reject write operations with a clear STDS_WRITE_UNSUPPORTED error instead of falling through to a confusing internal error. Thanks!
