Skip to content

[GSoC 2026] Kafka Streams runner skeleton module + portable entry points#38534

Open
junaiddshaukat wants to merge 4 commits into
apache:feat/18479-kafka-streams-runner-skeletonfrom
junaiddshaukat:feat/kafka-streams-runner
Open

[GSoC 2026] Kafka Streams runner skeleton module + portable entry points#38534
junaiddshaukat wants to merge 4 commits into
apache:feat/18479-kafka-streams-runner-skeletonfrom
junaiddshaukat:feat/kafka-streams-runner

Conversation

@junaiddshaukat
Copy link
Copy Markdown
Contributor

@junaiddshaukat junaiddshaukat commented May 19, 2026

Summary

  • Adds runners/kafka-streams Gradle module wired into settings + root build.
  • Initial runner surface: KafkaStreamsRunner, options, registrar, job server driver, job invoker, translation context.
  • Translator stub fails fast with No translator registered for URN ... for unsupported transforms.
  • Adds CHANGES.md entry.

Validation

  • ./gradlew :runners:kafka-streams:compileJava
  • ./gradlew :runners:kafka-streams:check

Scope

Skeleton plumbing only. No transform translators, watermark manager, or state implementation yet.
Targets the GSoC integration branch feat/18479-kafka-streams-runner-skeleton.

Fixes #38465

cc @je-ik

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for a new Apache Beam runner that utilizes Apache Kafka Streams. It focuses on setting up the necessary build infrastructure and initial Java classes to define the runner's behavior, including job submission, pipeline options, and a basic translation framework. The current stage provides a functional skeleton that can identify unsupported pipeline operations, preparing the module for incremental feature development.

Highlights

  • New Kafka Streams Runner Module: Added a new Gradle module for the Kafka Streams runner, integrating it into the build system and CHANGES.md.
  • Initial Runner Components: Introduced core components for the Kafka Streams runner, including KafkaStreamsRunner, KafkaStreamsPipelineOptions, KafkaStreamsJobServerDriver, and KafkaStreamsJobInvoker.
  • Fail-Fast Translator Skeleton: Implemented a skeleton pipeline translator that explicitly fails fast with an 'unsupported URN' message for transforms not yet implemented, facilitating future development.
  • Portable Entry Points: Established portable job server and runner entry points for the new Kafka Streams runner.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new runners/kafka-streams Gradle module to Apache Beam, providing a portable job server and runner entry points for Kafka Streams. The implementation includes the job invoker, server driver, pipeline options, and a basic translator that currently validates the pipeline graph and fails fast for unsupported transforms. Review feedback focuses on improving the robustness of resource management, specifically suggesting the use of finally blocks to ensure the embedded job server is stopped during pipeline cancellation or completion. Additionally, the reviewer identified potential resource collisions when using default values for the Kafka Streams application ID and state directory in concurrent execution environments.

@github-actions
Copy link
Copy Markdown
Contributor

Assigning reviewers:

R: @kennknowles added as fallback since no labels match configuration

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

- cancel(): wrap delegate.cancel() in try/finally so the embedded job
  server is always stopped, even if cancellation throws IOException.
- waitUntilFinish(Duration): only stop the job server when the returned
  state is terminal, so a timed-out wait does not prematurely kill the
  job server while the pipeline is still running.
- waitUntilFinish(): wrap in try/finally for the same defensive cleanup
  reason as cancel().
- KafkaStreamsPipelineOptions.StateDirDefaultFactory: include the job
  name in the default Kafka Streams state directory so that multiple
  pipelines on the same host (e.g. parallel tests) do not collide and
  hit a LockException.
@je-ik je-ik self-requested a review May 20, 2026 06:01
Copy link
Copy Markdown
Contributor

@je-ik je-ik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks good, but three concerns (besides my added notes) remain:
a) please remove the nullness suppression from the new code. This is used in the codebase for legacy code that should be ported when updated. We should not add this to new code, we should instead make the code compliant with the check.
b) virtually every PR should contain tests. There are exceptions to this, but this PR introduces concepts that can be run in tests (although will throw exception). We can assert that the code throws the exception we expect and when we change the implementation then we update the tests.
c) ideally, we should integrate running our tests in PreCommit check and make it run when we update the kafka-streams-runner path

import org.slf4j.LoggerFactory;

/** Driver that starts a Beam job server for the Kafka Streams portable runner. */
@SuppressWarnings({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding new code, we should not suppress these warnings as that way we would create fresh technical debt in the code base. This applies to all classes.

@Description(
"Kafka Streams processing.guarantee setting, for example at_least_once or exactly_once_v2.")
@Default.String("exactly_once_v2")
String getProcessingGuarantee();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, this was on the previous review, I'd remove this, as the guarantee is required for the Beam model to work.

@github-actions github-actions Bot added the build label May 21, 2026
@junaiddshaukat
Copy link
Copy Markdown
Contributor Author

@je-ik I have pushed a follow-up commit addressing your three concerns :

(a) dropped all file-level nullness suppressions; made the classes
checker-compliant. One narrow one-line suppression remains on a
private helper wrapping setRunner(null) — that's the same Beam idiom
Flink/Spark use, happy to refactor if you'd rather avoid it.
Also removed the processingGuarantee option per your inline note.

(b) added three test classes including one that asserts the
translator throws UnsupportedOperationException with "No translator
registered for URN ..." on a one-transform Impulse pipeline. These
get replaced incrementally as real translators land.

(c) added a PreCommit workflow that runs
:runners:kafka-streams:build on changes under runners/kafka-streams/**,
plus the matching trigger file and registry row.

./gradlew :runners:kafka-streams:check is green locally.
Ready for re-review whenever you have time.

@github-actions github-actions Bot added build and removed build labels May 21, 2026
@junaiddshaukat
Copy link
Copy Markdown
Contributor Author

@je-ik Pushed a follow-up addressing both threads.

dropped the applicationId default and added @Validation.Required, so the validator now fails fast if it's not set instead of letting jobs share a consumer group.

For the setRunner(null) bit: just removed the call (and the small suppression with it). The skeleton translator throws on every URN before anything reaches a harness, so clearing the runner isn't really doing anything yet. I'll bring it back in a non-suppressing way when the first real translator lands and the proto actually goes somewhere.

No @SuppressWarnings left in the new code.

./gradlew :runners:kafka-streams:check is green locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants