
test: stabilize nightly stream and TLS specs #2994

Closed
He-Pin wants to merge 3 commits into apache:main from He-Pin:fix/jdk25-input-stream-source-tck-shutdown

Conversation

Member

@He-Pin He-Pin commented May 25, 2026

No description provided.

…eTest for JDK 25 virtualized nightly

Motivation:
JDK 25 nightly runs abort the stream TCK with `Failed to stop
[InputStreamSourceTest] within [40000 milliseconds]` after the
CoordinatedShutdown `actor-system-terminate` phase times out at its
default 10 seconds. The dump shows two `flow-X-0-take` ActorGraphInterpreter
children stuck mid-termination under the StreamSupervisor.

The test feeds a CPU-busy `InputStream` whose `read()` always returns a
fresh byte without blocking or yielding, so each `onPull` runs up to
`chunkSize` synchronous `read()` calls. The nightly JDK 25 build forces
`pekko.test.stream-dispatcher.fork-join-executor.virtualize=on`, which is
the very dispatcher the test pins via `ActorAttributes.dispatcher(...)`.
On a virtualized dispatcher this combination slows cancellation
propagation through `take(elements)` enough that the 10 second phase
timeout fires before the lingering flow actors finish terminating, even
though the outer `ActorSystemLifecycle.shutdownTimeout` is already scaled
to 40 seconds by `pekko.test.timefactor`.

Modification:
Override `additionalConfig` in `InputStreamSourceTest` to extend
`pekko.coordinated-shutdown.phases.actor-system-terminate.timeout` to
30 seconds, mirroring the pattern already used in
`MixedProtocolClusterSpec` for the same JDK 25 virtualized failure mode.
The override layers on top of `PekkoPublisherVerification.additionalConfig`
via `withFallback` so existing buffer-size settings are preserved.
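
A minimal sketch of the layering described above, assuming the Typesafe Config library that Pekko uses. The object name and the buffer-size values are illustrative stand-ins, not copied from `PekkoPublisherVerification`; in the test itself the fallback would presumably be `super.additionalConfig`.

```scala
import com.typesafe.config.{ Config, ConfigFactory }

object ShutdownConfigSketch {
  // Hypothetical stand-in for the settings inherited from the parent class.
  // The concrete values are assumptions made for illustration only.
  val inherited: Config = ConfigFactory.parseString(
    """
    pekko.stream.materializer.initial-input-buffer-size = 512
    pekko.stream.materializer.max-input-buffer-size = 512
    """)

  // The override is parsed first, so its key wins; every key it does not
  // define is still resolved from the fallback.
  val additionalConfig: Config =
    ConfigFactory
      .parseString("pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 30s")
      .withFallback(inherited)
}
```

Because `withFallback` only fills in keys that the receiver does not define, the 30 second phase timeout and the inherited buffer sizes end up in the same `Config`.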

Result:
The phase has enough headroom to drain in-flight cancellation traffic on
virtualized dispatchers before the outer shutdown await fires. Verified
locally on JDK 25 (Oracle OpenJDK 25.0.2) with the same virtualize/timefactor
flags as `nightly-builds.yml`: `sbt "project stream-tests-tck"
"testOnly org.apache.pekko.stream.tck.InputStreamSourceTest"` reports
26 passing / 0 failing / 12 canceled (TCK optional multi-subscriber
specs).

References:
nightly-builds.yml `jdk-nightly-build` matrix entry javaVersion=25

He-Pin requested a review from pjfanning on May 27, 2026 08:58

Motivation:
Recent nightly builds fail repeatedly on JDK 21/25 in stream TCK and TLS rotating-key tests.

Modification:
Make InputStreamSourceTest model the TCK element count directly by emitting one byte per ByteString without relying on take(elements) cancellation. Allow RotatingKeysSSLEngineProviderSpec.contact to ignore retry ActorIdentity messages while waiting for the echo response.
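
One way the first change could look, as a sketch only: a bounded `InputStream` that ends after `elements` bytes, read with `chunkSize = 1` so that each `ByteString` carries exactly one byte and the stream completes by itself instead of being cut off by `take(elements)`. `BoundedInputStream` and `source` are hypothetical names, and this is not necessarily how the commit implements it.

```scala
import java.io.InputStream

import scala.concurrent.Future

import org.apache.pekko.stream.IOResult
import org.apache.pekko.stream.scaladsl.{ Source, StreamConverters }
import org.apache.pekko.util.ByteString

object BoundedInputStreamSketch {
  // Hypothetical helper: returns `total` bytes, then signals end of stream.
  final class BoundedInputStream(total: Long) extends InputStream {
    private var remaining = total

    override def read(): Int =
      if (remaining <= 0) -1 // end of stream, which completes the source
      else {
        remaining -= 1
        42 // arbitrary payload byte
      }
  }

  // chunkSize = 1 means every emitted ByteString holds a single byte, so the
  // number of stream elements equals the number of bytes in the InputStream.
  def source(elements: Long): Source[ByteString, Future[IOResult]] =
    StreamConverters.fromInputStream(() => new BoundedInputStream(elements), chunkSize = 1)
}
```

With this shape the element count requested by the TCK maps directly onto the length of the input, so no downstream cancellation is needed to stop the source.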

Result:
The affected specs no longer fail when Identify responses arrive late or when they run on the JDK 25 virtualized test-stream-dispatcher.

Tests:
- scalafmt --mode diff-ref=origin/main
- scalafmt --list --mode diff-ref=origin/main
- git diff --check
- sbt with JDK 25 nightly-style virtualized dispatcher flags: stream-tests-tck / Test / testOnly org.apache.pekko.stream.tck.InputStreamSourceTest; remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec
- sbt with JDK 21 nightly-style virtualized dispatcher flags: remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec
- sbt with JDK 25 nightly-style virtualized test-stream-dispatcher flags: stream-tests-tck / Test / testOnly org.apache.pekko.stream.tck.InputStreamSourceTest

References:
None - nightly-builds.yml failure analysis

He-Pin changed the title from "test: extend actor-system-terminate phase timeout in InputStreamSourceTest for JDK 25 nightly" to "test: stabilize nightly stream and TLS specs" on May 27, 2026

Motivation:
Recent nightly builds repeatedly time out in TlsGraphStageEdgeCasesSpec under JDK 25 while running early-cancellation TLS edge cases.

Modification:
Have collectExactly materialize a KillSwitch and watch stream termination, then shut the stream down after collecting the expected bytes so repeated early-cancellation tests do not leave previous TLS materializations draining in the same actor system.

Result:
The TLS edge case suite no longer accumulates lingering TlsGraphStage/headOptionSink actors during repeated early-cancellation checks.

Tests:
- scalafmt --mode diff-ref=origin/main
- scalafmt --list --mode diff-ref=origin/main
- git diff --check
- sbt with JDK 25 nightly-style virtualized test-stream-dispatcher flags: stream-tests / Test / testOnly org.apache.pekko.stream.io.TlsGraphStageEdgeCasesSpec

References:
None - nightly-builds.yml failure analysis
Member Author

He-Pin commented May 28, 2026

Superseded by smaller single-commit PRs for easier review:

Keeping the fixes split so each failure domain can be reviewed independently.

Member Author

He-Pin commented May 28, 2026

Closing as superseded by the split single-commit PRs listed above.

@He-Pin He-Pin closed this May 28, 2026
He-Pin added a commit that referenced this pull request May 28, 2026
Motivation:
JDK 25 nightly builds time out in repeated TlsGraphStageEdgeCasesSpec early-cancellation scenarios because earlier materializations can keep draining after the expected bytes have been collected.

Modification:
Materialize collectExactly with a KillSwitch and watch stream termination, then shut down and await the stream in finally after the expected bytes are collected.
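
A rough sketch of that pattern, assuming a plain `Source[ByteString, Any]` in place of the TLS graph. The signature of `collectExactly` and the accumulation via `scan`/`dropWhile` are assumptions for illustration and are not taken from `TlsGraphStageEdgeCasesSpec`.

```scala
import scala.concurrent.Await
import scala.concurrent.duration.FiniteDuration

import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.KillSwitches
import org.apache.pekko.stream.scaladsl.{ Keep, Sink, Source }
import org.apache.pekko.util.ByteString

object CollectExactlySketch {
  // Hypothetical helper: collects the first `expected` bytes, then makes sure
  // the stream has stopped before returning to the caller.
  def collectExactly(source: Source[ByteString, Any], expected: Int, timeout: FiniteDuration)(
      implicit system: ActorSystem): ByteString = {
    val ((killSwitch, terminated), collected) =
      source
        .viaMat(KillSwitches.single)(Keep.right) // handle for stopping the stream
        .watchTermination()(Keep.both)           // future completed on termination
        .scan(ByteString.empty)(_ ++ _)          // running concatenation of chunks
        .dropWhile(_.length < expected)          // wait until enough bytes arrived
        .toMat(Sink.head)(Keep.both)
        .run()

    try Await.result(collected, timeout).take(expected)
    finally {
      // Shut down explicitly and wait, so that no earlier materialization is
      // still draining when the next test case starts.
      killSwitch.shutdown()
      Await.result(terminated, timeout)
    }
  }
}
```

Calling `shutdown()` on a stream that has already finished is a no-op, so the `finally` block is safe on both the success and the failure path.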

Result:
Repeated TLS edge-case checks do not leave prior materializations running in the same actor system.

Tests:
- JDK 25 nightly-style virtualized stream-dispatcher flags: stream-tests / Test / testOnly org.apache.pekko.stream.io.TlsGraphStageEdgeCasesSpec
- scalafmt --mode diff-ref=origin/main --quiet
- scalafmt --list --mode diff-ref=origin/main
- git diff --check

References:
Refs #2994
He-Pin added a commit that referenced this pull request May 29, 2026
Motivation:
RotatingKeysSSLEngineProviderSpec can receive a delayed ActorIdentity from an earlier Identify attempt after the target actor ref has already been resolved. Nightly retry policy still fails builds when that stale identity arrives before the ping reply.

Modification:
Wait for the expected ping with fishForMessage and ignore late ActorIdentity messages while keeping the same per-attempt timeout budget.
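
A minimal sketch of that waiting logic using the TestKit `fishForMessage` method. `awaitPing` and its parameters are hypothetical names chosen for illustration; the spec's own helper may be shaped differently.

```scala
import scala.concurrent.duration.FiniteDuration

import org.apache.pekko.actor.ActorIdentity
import org.apache.pekko.testkit.TestProbe

object FishForPingSketch {
  // Hypothetical helper: skips late ActorIdentity replies and returns once the
  // expected message arrives. `max` bounds the whole wait, not each message.
  def awaitPing(probe: TestProbe, expected: Any, max: FiniteDuration): Any =
    probe.fishForMessage(max, hint = s"waiting for $expected") {
      case _: ActorIdentity       => false // stale Identify reply, keep fishing
      case msg if msg == expected => true  // the reply the test is waiting for
    }
}
```

Any message that matches neither case still fails the assertion, so only the known-harmless `ActorIdentity` replies are tolerated.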

Result:
The test accepts the intended ping reply without being failed by harmless delayed Identify responses.

Tests:
- JDK 21: remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec
- JDK 25: remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec (fails on existing JDK 25 EKU certificate validation behavior)
- scalafmt --mode diff-ref=origin/main --quiet
- scalafmt --list --mode diff-ref=origin/main
- git diff --check

References:
Refs #2994