
DOC-2130 Warn against max_in_flight > 1 with batching processors#426

Open
micheleRP wants to merge 6 commits into main from doc-2130-max-in-flight-batching-warning

@micheleRP micheleRP commented May 6, 2026

Summary

  • Adds a NOTE to the max_in_flight reference on every batched output (46 connectors registered via MustRegisterBatchOutput) explaining that values > 1 alongside a batching block with processors risk shipping raw, unprocessed messages to the output if a batching processor errors at runtime.
  • Per CON-461 the underlying behavior is in shared benthos framework code; until the next Connect v5 major can fix it, every batched output gets the same advisory.
  • One source change in docs-data/overrides.json plus the regenerated reference partials. Cloud Connect docs single-source from this repo, so this PR covers both sites.

Resolves DOC-2130.

Approved copy

NOTE: Set max_in_flight: 1 when a batching block with processors is configured on this output. If a batching processor (for example parquet_encode, archive, or compress) errors at runtime, Redpanda Connect can write raw, unprocessed messages to the output instead of the encoded batch, producing corrupt downstream data (for example, a .parquet file containing raw JSON). Because the batching processor is single threaded, higher values do not improve performance.
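
For illustration only (not part of the approved copy), a configuration that follows this advice might look like the sketch below. It assumes an aws_s3 output. The bucket name, path, and batch sizes are hypothetical placeholders, and the archive and compress processors stand in for any batching processor.

```yaml
output:
  aws_s3:
    bucket: example-bucket                                  # hypothetical bucket name
    path: 'batches/${! timestamp_unix_nano() }.json.gz'     # hypothetical object path
    max_in_flight: 1          # keep at 1 because batching.processors is set below
    batching:
      count: 1000             # illustrative batch size
      period: 10s             # illustrative flush interval
      processors:
        # Batching processors: if one of these errors at runtime with
        # max_in_flight > 1, raw messages can reach the output instead.
        - archive:
            format: lines
        - compress:
            algorithm: gzip
```

Outputs without a processors list under batching are outside the scope of the NOTE as written.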

How the override is wired

  • New shared definitions.max_in_flight_batched description.
  • 32 batched outputs that had no prior max_in_flight override now $ref it.
  • elasticsearch_v8 and elasticsearch_v9 repointed from the removed elasticsearch_max_in_flight definition to the new shared one.
  • 12 connectors with bespoke max_in_flight prose (couchbase, cypher, gcp_bigquery, kafka_franz, mongodb, questdb, redpanda, redpanda_common, redpanda_migrator, salesforce_sink, snowflake_streaming, sql_raw) get the NOTE appended in place.
  • ockam_kafka exposes the field nested as kafka.max_in_flight; the NOTE is appended to that nested description rather than added at the top level.
  • Two batched outputs (zmq4, gcp_bigquery_write_api) don't expose a public max_in_flight field in the generated reference, so no override is applied.

Preview links (representative sample)

Test plan

  • `jq . docs-data/overrides.json > /dev/null` passes
  • `npx doc-tools generate rpcn-connector-docs` regenerates 44 output partials with the NOTE
  • Local `npm run build` succeeds; rendered HTML shows the NOTE as a styled admonitionblock note
  • Spot-checked: aws_s3 ($ref only), redpanda_migrator (multi-paragraph custom + appended NOTE), elasticsearch_v8 (repointed $ref), kafka_franz (single-line custom + appended NOTE), ockam_kafka (nested kafka.max_in_flight)
  • Reviewer to verify wording on a few additional pages (any Connect output reference under outputs/)

🤖 Generated with Claude Code

@micheleRP micheleRP requested a review from a team as a code owner May 6, 2026 00:05
netlify Bot commented May 6, 2026

Deploy Preview for redpanda-connect ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | ffa56af |
| 🔍 Latest deploy log | https://app.netlify.com/projects/redpanda-connect/deploys/6a03a90d6c6c3d00084ca61b |
| 😎 Deploy Preview | https://deploy-preview-426--redpanda-connect.netlify.app |

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This pull request updates documentation and configuration metadata for Connect 4.89.0. It introduces new public components, processors, inputs, and outputs (including gcp_bigquery_write_api, a2a_message, ffi processors, tigerbeetle_cdc and zmq4 inputs/outputs, and open_telemetry_collector metrics/tracers). The primary focus is standardizing max_in_flight field documentation across 50+ output configurations, replacing inline descriptions with references to a batched definition and adding consistent warnings about data corruption risks when batching processors are misconfigured. Platform transitions and binary analysis metadata are also updated.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

The changes span many files (70+) but follow highly repetitive patterns: documentation additions that are consistent across output types, schema reference updates to max_in_flight_batched, and metadata entries in JSON. The functional scope is straightforward—no complex logic, control flow changes, or intricate interactions—making this a low-complexity, pattern-based update despite the file count.

Possibly related PRs

Suggested reviewers

  • josephwoodward
  • Jeffail
  • JakeSCahill
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding warnings about using max_in_flight > 1 with batching processors across batched outputs. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the purpose of adding a NOTE to max_in_flight references across batched outputs, details the implementation approach, includes approved copy, and explains the test plan. |


@micheleRP micheleRP requested a review from rjustice-rp May 6, 2026 00:06
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs-data/overrides.json`:
- Around line 90-91: The description for the "max_in_flight_batched" override is
inaccurate—change its opening sentence to refer to "message batches" rather than
"messages"; edit the "description" value for the max_in_flight_batched object so
it begins "The maximum number of message batches to have in flight at a given
time." and keep the rest of the explanatory note intact to preserve guidance
about batching processors and max_in_flight behavior.
- Around line 6291-6299: Remove the spurious `max_in_flight` override from the
zmq4 output entry: locate the JSON object with "summary": "Writes messages to a
ZeroMQ socket." and delete the child entry whose "name" is "max_in_flight" and
"$ref" is "#/definitions/max_in_flight_batched" so the generated reference for
zmq4 no longer exposes an option users cannot set.
📥 Commits

Reviewing files that changed from the base of the PR and between 5d68a60 and f62df66.

📒 Files selected for processing (47)
  • docs-data/connect-diff-4.88.0_to_4.89.0.json
  • docs-data/overrides.json
  • modules/components/attachments/connect-4.89.0.json
  • modules/components/partials/fields/outputs/arc.adoc
  • modules/components/partials/fields/outputs/aws_dynamodb.adoc
  • modules/components/partials/fields/outputs/aws_kinesis.adoc
  • modules/components/partials/fields/outputs/aws_kinesis_firehose.adoc
  • modules/components/partials/fields/outputs/aws_s3.adoc
  • modules/components/partials/fields/outputs/aws_sqs.adoc
  • modules/components/partials/fields/outputs/azure_cosmosdb.adoc
  • modules/components/partials/fields/outputs/azure_queue_storage.adoc
  • modules/components/partials/fields/outputs/azure_table_storage.adoc
  • modules/components/partials/fields/outputs/cassandra.adoc
  • modules/components/partials/fields/outputs/couchbase.adoc
  • modules/components/partials/fields/outputs/cyborgdb.adoc
  • modules/components/partials/fields/outputs/cypher.adoc
  • modules/components/partials/fields/outputs/elasticsearch_v8.adoc
  • modules/components/partials/fields/outputs/elasticsearch_v9.adoc
  • modules/components/partials/fields/outputs/gcp_bigquery.adoc
  • modules/components/partials/fields/outputs/gcp_cloud_storage.adoc
  • modules/components/partials/fields/outputs/gcp_pubsub.adoc
  • modules/components/partials/fields/outputs/hdfs.adoc
  • modules/components/partials/fields/outputs/iceberg.adoc
  • modules/components/partials/fields/outputs/kafka.adoc
  • modules/components/partials/fields/outputs/kafka_franz.adoc
  • modules/components/partials/fields/outputs/mongodb.adoc
  • modules/components/partials/fields/outputs/ockam_kafka.adoc
  • modules/components/partials/fields/outputs/opensearch.adoc
  • modules/components/partials/fields/outputs/otlp_grpc.adoc
  • modules/components/partials/fields/outputs/otlp_http.adoc
  • modules/components/partials/fields/outputs/pinecone.adoc
  • modules/components/partials/fields/outputs/pusher.adoc
  • modules/components/partials/fields/outputs/qdrant.adoc
  • modules/components/partials/fields/outputs/questdb.adoc
  • modules/components/partials/fields/outputs/redis_list.adoc
  • modules/components/partials/fields/outputs/redis_pubsub.adoc
  • modules/components/partials/fields/outputs/redis_streams.adoc
  • modules/components/partials/fields/outputs/redpanda.adoc
  • modules/components/partials/fields/outputs/redpanda_common.adoc
  • modules/components/partials/fields/outputs/redpanda_migrator.adoc
  • modules/components/partials/fields/outputs/salesforce_sink.adoc
  • modules/components/partials/fields/outputs/snowflake_put.adoc
  • modules/components/partials/fields/outputs/snowflake_streaming.adoc
  • modules/components/partials/fields/outputs/splunk_hec.adoc
  • modules/components/partials/fields/outputs/sql.adoc
  • modules/components/partials/fields/outputs/sql_insert.adoc
  • modules/components/partials/fields/outputs/sql_raw.adoc

Comment thread docs-data/overrides.json Outdated
Comment thread docs-data/overrides.json Outdated
@rjustice-rp

Given this significantly expands the comments on max_in_flight past what I previously imagined (just the aws_s3 output for self-hosted and cloud) to what we see below, it's now a pretty big and very noticeable change in the docs. I'm requesting that @Jeffail and @josephwoodward confirm this should go across all the outputs and connectors below. Thanks claude for being thorough :)

  • 32 batched outputs that had no prior max_in_flight override now $ref it.
  • 12 connectors with bespoke max_in_flight prose (couchbase, cypher, gcp_bigquery, kafka_franz, mongodb, questdb, redpanda, redpanda_common, redpanda_migrator, salesforce_sink, snowflake_streaming, sql_raw) get the NOTE appended in place.

micheleRP and others added 5 commits May 12, 2026 16:17
Add a NOTE to the max_in_flight reference of every batched output (46
connectors registered via MustRegisterBatchOutput) explaining that
setting max_in_flight > 1 alongside a batching block with processors
risks shipping raw, unprocessed messages to the output if a batching
processor errors at runtime. Per CON-461, the underlying behavior
lives in shared benthos framework code; until the next Connect v5
major can fix it, every affected output gets the same advisory.

Implementation: introduce a shared
definitions.max_in_flight_batched override and $ref it from the 32
batched outputs that lacked an existing max_in_flight override.
Repoint elasticsearch_v8 and elasticsearch_v9 from the now-removed
elasticsearch_max_in_flight definition to the new shared one. Append
the NOTE to the 12 connectors with bespoke max_in_flight prose
(couchbase, cypher, gcp_bigquery, kafka_franz, mongodb, questdb,
redpanda, redpanda_common, redpanda_migrator, salesforce_sink,
snowflake_streaming, sql_raw) plus ockam_kafka.kafka.max_in_flight
where the field is nested. Cloud Connect docs single-source from
this repo, so the same change covers both sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Shared `max_in_flight_batched` description now says
  "message batches" instead of "messages" — `max_in_flight` controls
  parallel batches, not parallel messages, matching the upstream
  benthos description correction (commit 77a2bba44).
* Drop the dead `max_in_flight` override on `zmq4`. Although the
  connector source registers via `MustRegisterBatchOutput`, the
  generated configspec does not expose `max_in_flight`, so the
  override would never render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@micheleRP micheleRP force-pushed the doc-2130-max-in-flight-batching-warning branch from ac82c35 to 81b587f Compare May 12, 2026 22:25