[GSoC 2026] Kafka Streams runner skeleton module + portable entry points#38534
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request lays the groundwork for a new Apache Beam runner that utilizes Apache Kafka Streams. It focuses on setting up the necessary build infrastructure and initial Java classes to define the runner's behavior, including job submission, pipeline options, and a basic translation framework. The current stage provides a functional skeleton that can identify unsupported pipeline operations, preparing the module for incremental feature development. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new runners/kafka-streams Gradle module to Apache Beam, providing a portable job server and runner entry points for Kafka Streams. The implementation includes the job invoker, server driver, pipeline options, and a basic translator that currently validates the pipeline graph and fails fast for unsupported transforms. Review feedback focuses on improving the robustness of resource management, specifically suggesting the use of finally blocks to ensure the embedded job server is stopped during pipeline cancellation or completion. Additionally, the reviewer identified potential resource collisions when using default values for the Kafka Streams application ID and state directory in concurrent execution environments.
|
Assigning reviewers: R: @kennknowles added as fallback since no labels match configuration Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
- cancel(): wrap delegate.cancel() in try/finally so the embedded job server is always stopped, even if cancellation throws IOException. - waitUntilFinish(Duration): only stop the job server when the returned state is terminal, so a timed-out wait does not prematurely kill the job server while the pipeline is still running. - waitUntilFinish(): wrap in try/finally for the same defensive cleanup reason as cancel(). - KafkaStreamsPipelineOptions.StateDirDefaultFactory: include the job name in the default Kafka Streams state directory so that multiple pipelines on the same host (e.g. parallel tests) do not collide and hit a LockException.
There was a problem hiding this comment.
Generally this looks good, but three concerns (besides my added notes) remain:
a) please remove the nullness suppression from the new code. This is used in the codebase for legacy code that should be ported when updated. We should not add this to new code, we should instead make the code compliant with the check.
b) virtually every PR should contain tests. There are exceptions to this, but this PR introduces concepts that can be run in tests (although will throw exception). We can assert that the code throws the exception we expect and when we change the implementation then we update the tests.
c) ideally, we should integrate running our tests in PreCommit check and make it run when we update the kafka-streams-runner path
| import org.slf4j.LoggerFactory; | ||
|
|
||
| /** Driver that starts a Beam job server for the Kafka Streams portable runner. */ | ||
| @SuppressWarnings({ |
There was a problem hiding this comment.
We are adding new code, we should not suppress these warnings as that way we would create fresh technical debt in the code base. This applies to all classes.
| @Description( | ||
| "Kafka Streams processing.guarantee setting, for example at_least_once or exactly_once_v2.") | ||
| @Default.String("exactly_once_v2") | ||
| String getProcessingGuarantee(); |
There was a problem hiding this comment.
Also here, this was on the previous review, I'd remove this, as the guarantee is required for the Beam model to work.
|
@je-ik I have pushed a follow-up commit addressing your three concerns : (a) dropped all file-level nullness suppressions; made the classes (b) added three test classes including one that asserts the (c) added a PreCommit workflow that runs
|
|
@je-ik Pushed a follow-up addressing both threads. dropped the applicationId default and added @Validation.Required, so the validator now fails fast if it's not set instead of letting jobs share a consumer group. For the setRunner(null) bit: just removed the call (and the small suppression with it). The skeleton translator throws on every URN before anything reaches a harness, so clearing the runner isn't really doing anything yet. I'll bring it back in a non-suppressing way when the first real translator lands and the proto actually goes somewhere. No @SuppressWarnings left in the new code. ./gradlew :runners:kafka-streams:check is green locally. |
Summary
runners/kafka-streamsGradle module wired into settings + root build.KafkaStreamsRunner, options, registrar, job server driver, job invoker, translation context.No translator registered for URN ...for unsupported transforms.CHANGES.mdentry.Validation
./gradlew :runners:kafka-streams:compileJava./gradlew :runners:kafka-streams:checkScope
Skeleton plumbing only. No transform translators, watermark manager, or state implementation yet.
Targets the GSoC integration branch
feat/18479-kafka-streams-runner-skeleton.Fixes #38465
cc @je-ik