Skip to content

[FEAT] Cancel server-side query when caller context is cancelled (#126)#182

Open
40u5 wants to merge 1 commit into
apache:masterfrom
40u5:fix/issue-126-context-cancel
Open

[FEAT] Cancel server-side query when caller context is cancelled (#126)#182
40u5 wants to merge 1 commit into
apache:masterfrom
40u5:fix/issue-126-context-cancel

Conversation

@40u5
Copy link
Copy Markdown

@40u5 40u5 commented Jun 3, 2026

What changes were proposed in this pull request?

Fixes #126.

  • Adds Interrupt(ctx, interruptType, operationIdOrTag) to the SparkConnectClient interface (spark/client/base/base.go) and implements it on sparkConnectClientImpl (spark/client/client.go), wrapping the proto-level Interrupt RPC and branching on INTERRUPT_TYPE_ALL / TAG / OPERATION_ID.
  • ExecutePlanClient now remembers the caller's context.Context, the operation id, and a back-reference to the client. ToTable arms a watcher goroutine that, on callerCtx.Done(), sends InterruptRequest{OPERATION_ID, operationId} using a detached 10 s context so the cancellation actually reaches the server.
  • Exposes InterruptAll, InterruptTag, and InterruptOperation on SparkSession (spark/sql/sparksession.go) so users can also kill operations explicitly, mirroring PySpark.
  • Updates spark/mocks/mock_executor.go to satisfy the extended interface.

Why are the changes needed?

As reported in #126, dataframe.Collect(ctx) previously dropped the gRPC stream locally when the caller cancelled ctx, but the Spark executor kept running the query until the server's idle timeout (5 min). This addresses all three follow-ups from @grundprinzip's reply on the issue:

  1. Surfaces InterruptTag on SparkSession.
  2. Surfaces InterruptOperation on SparkSession.
  3. Wires up automatic interrupt on client-side context cancellation.

Does this PR introduce any user-facing change?

Yes — additive only:

  • SparkSession gains InterruptAll(ctx), InterruptTag(ctx, tag), and InterruptOperation(ctx, operationId), each returning the server-reported list of interrupted operation ids.
  • Cancelling the context.Context passed to Collect / ExecutePlan / ExecuteCommand now causes the server-side operation to be interrupted instead of running to idle timeout.

No backward-incompatible breakages for users — the SparkConnectClient interface gains a method, but it has no known external implementers; the in-tree mocks.TestExecutor is updated in the same commit.

How was this patch tested?

Added three unit tests in spark/client/client_test.go:

  • TestExecutePlanCancellingContextSendsInterrupt — uses a Recv-blocking stream and verifies that cancelling ctx triggers an InterruptRequest with INTERRUPT_TYPE_OPERATION_ID and the matching operation id within 2 s.
  • TestInterruptAllCallsClient — verifies Interrupt(ALL, "") plumbs through.
  • TestInterruptOperationCallsClient — verifies Interrupt(OPERATION_ID, opID) plumbs through.

go build ./..., go vet ./..., gofmt -l ./spark (no diff), and go test ./spark/... (all packages PASS) run cleanly. The only failure under go test ./... is the existing internal/tests/integration suite, which requires SPARK_HOME to be set and is unrelated to this change.

Fixes apache#126. dataframe.Collect(ctx) previously dropped the gRPC stream
locally on ctx cancellation but left the Spark executor running until
the 5-minute idle timeout. This change:

- Adds Interrupt(ctx, type, idOrTag) to the SparkConnectClient
  interface and to sparkConnectClientImpl, wrapping the existing
  proto-level Interrupt RPC.
- Records the operation id and the caller's context on
  ExecutePlanClient and arms a watcher in ToTable that sends
  InterruptRequest{OPERATION_ID} when the caller's ctx is cancelled.
- Exposes InterruptAll, InterruptTag, and InterruptOperation on
  SparkSession so users can also kill operations explicitly.

Includes regression tests covering ctx-cancellation -> Interrupt and
the new InterruptAll / InterruptOperation paths.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Context cancel doesn't work

1 participant