Skip to content

[SPARK-57274][CONNECT] Support fetch/type accessors and getMoreResults for SparkConnectStatement#56341

Open
j1wonpark wants to merge 5 commits into
apache:masterfrom
j1wonpark:SPARK-57274
Open

[SPARK-57274][CONNECT] Support fetch/type accessors and getMoreResults for SparkConnectStatement#56341
j1wonpark wants to merge 5 commits into
apache:masterfrom
j1wonpark:SPARK-57274

Conversation

@j1wonpark
Copy link
Copy Markdown
Contributor

@j1wonpark j1wonpark commented Jun 5, 2026

What changes were proposed in this pull request?

Implement the SparkConnectStatement accessors that previously threw SQLFeatureNotSupportedException (setFetchSize/getFetchSize,setFetchDirection/getFetchDirection, getResultSetType, setQueryTimeout/getQueryTimeout), fix getMoreResults (which threw, breaking JDBC result-drain loops), and implement the createStatement type/concurrency overloads on SparkConnectConnection. Follow-up to SPARK-54108 / SPARK-54014.

Fetch size / query timeout are stored as hints (Spark Connect is forward-only and server-paginated); createStatement accepts forward-only / scroll-insensitive and rejects updatable / scroll-sensitive, mirroring the Spark Thrift Server's Hive JDBC policy. See SPARK-57274 for details.

Why are the changes needed?

JDBC client tools (e.g. DataGrip) call these methods around every query, so the throws abort the query path. getMoreResults throwing in particular makes the standard drain loop while (getMoreResults() || getUpdateCount() != -1) error out or spin forever; DataGrip hangs on a result-less command such as USE <db>.

Does this PR introduce any user-facing change?

Yes, in the unreleased master only: methods that previously threw SQLFeatureNotSupportedException are now implemented. No existing behavior is removed.

How was this patch tested?

New cases in SparkConnectStatementSuite: accessor defaults/validation, drain-loop termination, and the typed createStatement overloads.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…s for SparkConnectStatement

Signed-off-by: Jiwon Park <jpark92@outlook.kr>
if (seconds < 0) {
throw new SQLException("Query timeout must be zero or a positive integer.")
}
queryTimeout = seconds
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, DataGrip fails immediately when calling this method, because the JDBC API javadoc does not say it may throw SQLFeatureNotSupportedException, so the fix is indeed required.

For the implementation, I feel we should not store a dummy state, the API javadoc does not say the get* method must return the same value accepted by the set* the, so getQueryTimeout should always return 0 to reflect the real state.

So the suggestion here is: keep the checkOpen() and if (seconds < 0) check, silent drop the value, and keep existing getQueryTimeout

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Removed the stored field — getQueryTimeout now always returns 0 to reflect the real state, and setQueryTimeout keeps the checkOpen()/seconds < 0 checks but silently drops the value.

if (resultSetConcurrency != ResultSet.CONCUR_READ_ONLY) {
throw new SQLFeatureNotSupportedException(
s"ResultSet concurrency $resultSetConcurrency is not supported; " +
"only CONCUR_READ_ONLY.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resultSetConcurrency is an int, you should stringify it to a readable string, see JdbcErrorUtils.stringify* methods

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added JdbcErrorUtils.stringifyResultSetConcurrency and applied it here.

resultSetHoldability: Int): Statement =
throw new SQLFeatureNotSupportedException
resultSetHoldability: Int): Statement = {
// holdability is ignored
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to override this method? the javadoc clearly states that SQLFeatureNotSupportedException may be thrown – if the JDBC driver does not support this method or this method is not supported for the specified result set type, result set holdability and result set concurrency.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Reverted the 3-arg createStatement(type, concurrency, holdability) to throw SQLFeatureNotSupportedException, matching the javadoc and the Hive JDBC driver. Holdability has no meaning without transaction support, so accepting it while ignoring the value was misleading.

override def setFetchSize(rows: Int): Unit =
throw new SQLFeatureNotSupportedException
// stored as a hint; Spark Connect results are forward-only and server-paginated
override def setFetchSize(rows: Int): Unit = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to get/setQueryTimeout

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Same treatment as get/setQueryTimeout: getFetchSize returns 0 and setFetchSize validates then silently drops the value. Also removed the now-unused DEFAULT_FETCH_SIZE.

override def setFetchDirection(direction: Int): Unit = {
checkOpen()
if (direction != ResultSet.FETCH_FORWARD) {
throw new SQLException(s"Fetch direction $direction is not supported.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: stringify, and please check all places

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Applied stringify* here and audited all call sites: the remaining ones (ResultSet fetch direction, Connection transaction isolation / holdability) were already using stringify*. No raw int constants are left in error messages.

j1wonpark added 2 commits June 5, 2026 19:08
…t/fetchSize state

getQueryTimeout and getFetchSize now always return 0 to reflect the real state;
the setters validate and silently drop the value instead of echoing it back.
Remove the now-unused DEFAULT_FETCH_SIZE constant.

Signed-off-by: Jiwon Park <jpark92@outlook.kr>
… in error messages

Use JdbcErrorUtils.stringify* so error messages show readable names instead of
raw int codes. Add stringifyResultSetConcurrency and apply it, along with the
existing stringifyFetchDirection/stringifyResultSetType, to the remaining call
sites in SparkConnectStatement and SparkConnectConnection.

Signed-off-by: Jiwon Park <jpark92@outlook.kr>
}

// SCROLL_INSENSITIVE is accepted but the returned statement is forward-only.
// Mirrors the Hive JDBC driver policy used by the Spark Thrift Server.
Copy link
Copy Markdown
Member

@pan3793 pan3793 Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's correct. If you look at the Hive JDBC driver implementation, you will find it has the corresponding code for SCROLL_INSENSITIVE in ResultSet impl, which is different from FORWARD_ONLY

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Spark Connect has no server-side scrollable cursor (unlike HiveServer2's FETCH_FIRST), so the code accepted SCROLL_INSENSITIVE but silently downgraded it to forward-only, which—as you said—doesn't match Hive. I've changed checkSupportedResultSet to only allow TYPE_FORWARD_ONLY. This also makes it consistent with supportsResultSetType/supportsResultSetConcurrency, which already advertise FORWARD_ONLY + READ_ONLY only. Proper scrollable support would need client-side materialization, so I'd prefer to do it in a follow-up. Does that sound reasonable?

j1wonpark added 2 commits June 5, 2026 20:11
… with holdability

Revert the 3-arg createStatement(type, concurrency, holdability) to throw
SQLFeatureNotSupportedException, matching the JDBC javadoc (which permits it) and
the Hive JDBC driver. Holdability has no meaning without transaction support, so
accepting it while ignoring the value is misleading.

Signed-off-by: Jiwon Park <jpark92@outlook.kr>
… result sets

Reject TYPE_SCROLL_INSENSITIVE instead of accepting it and silently downgrading
to forward-only. Spark Connect has no server-side scrollable cursor, and the
previous code claimed to mirror the Hive JDBC driver while behaving differently
from it (Hive backs SCROLL_INSENSITIVE with distinct ResultSet logic). Scrollable
result sets can be added in a follow-up with proper client-side support.

Signed-off-by: Jiwon Park <jpark92@outlook.kr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants