DOC-2130 Warn against max_in_flight > 1 with batching processors #426
micheleRP wants to merge 6 commits
Walkthrough
This pull request updates documentation and configuration metadata for Connect 4.89.0. It introduces new public components, processors, inputs, and outputs (including gcp_bigquery_write_api, a2a_message, ffi processors, tigerbeetle_cdc and zmq4 inputs/outputs, and open_telemetry_collector metrics/tracers). The primary focus is standardizing max_in_flight field documentation across 50+ output configurations, replacing inline descriptions with references to a batched definition and adding consistent warnings about data corruption risks when batching processors are misconfigured. Platform transitions and binary analysis metadata are also updated.

Estimated code review effort: 2 (Simple) | ~15 minutes
The changes span many files (70+) but follow highly repetitive patterns: documentation additions that are consistent across output types, schema reference updates to max_in_flight_batched, and metadata entries in JSON. The functional scope is straightforward (no complex logic, control flow changes, or intricate interactions), making this a low-complexity, pattern-based update despite the file count.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs-data/overrides.json`:
- Around line 90-91: The description for the "max_in_flight_batched" override is
inaccurate—change its opening sentence to refer to "message batches" rather than
"messages"; edit the "description" value for the max_in_flight_batched object so
it begins "The maximum number of message batches to have in flight at a given
time." and keep the rest of the explanatory note intact to preserve guidance
about batching processors and max_in_flight behavior.
- Around line 6291-6299: Remove the spurious `max_in_flight` override from the
zmq4 output entry: locate the JSON object with "summary": "Writes messages to a
ZeroMQ socket." and delete the child entry whose "name" is "max_in_flight" and
"$ref" is "#/definitions/max_in_flight_batched" so the generated reference for
zmq4 no longer exposes an option users cannot set.
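
The `$ref` mechanism these comments refer to can be sketched as follows. The surrounding layout (`outputs`, `fields`) is a hypothetical, trimmed-down stand-in for `docs-data/overrides.json`, and the rule that keys beside a `$ref` take precedence over the shared definition is an assumption made for illustration, not confirmed behavior of the docs tooling.

```python
import copy

# Hypothetical, trimmed-down stand-in for docs-data/overrides.json.
overrides = {
    "definitions": {
        "max_in_flight_batched": {
            "description": (
                "The maximum number of message batches to have in flight "
                "at a given time."
            )
        }
    },
    "outputs": [
        {
            "name": "aws_s3",
            "fields": [
                {
                    "name": "max_in_flight",
                    "$ref": "#/definitions/max_in_flight_batched",
                }
            ],
        }
    ],
}


def resolve_ref(doc, ref):
    """Follow a local '#/a/b' reference and return the node it points to."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled: {ref!r}")
    node = doc
    for key in ref[2:].split("/"):
        node = node[key]
    return node


def expand(doc, node):
    """Return a copy of node with each '$ref' replaced by its target's keys."""
    if isinstance(node, list):
        return [expand(doc, item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {k: expand(doc, v) for k, v in node.items() if k != "$ref"}
    if "$ref" in node:
        # Assumption: keys written next to the $ref win over the shared ones.
        out = {**copy.deepcopy(resolve_ref(doc, node["$ref"])), **out}
    return out


expanded = expand(overrides, overrides["outputs"])
print(expanded[0]["fields"][0]["description"])
```

Because every output points at one definition, a wording fix such as "messages" to "message batches" is made once and reaches every output that references it.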
📒 Files selected for processing (47)
docs-data/connect-diff-4.88.0_to_4.89.0.json
docs-data/overrides.json
modules/components/attachments/connect-4.89.0.json
modules/components/partials/fields/outputs/arc.adoc
modules/components/partials/fields/outputs/aws_dynamodb.adoc
modules/components/partials/fields/outputs/aws_kinesis.adoc
modules/components/partials/fields/outputs/aws_kinesis_firehose.adoc
modules/components/partials/fields/outputs/aws_s3.adoc
modules/components/partials/fields/outputs/aws_sqs.adoc
modules/components/partials/fields/outputs/azure_cosmosdb.adoc
modules/components/partials/fields/outputs/azure_queue_storage.adoc
modules/components/partials/fields/outputs/azure_table_storage.adoc
modules/components/partials/fields/outputs/cassandra.adoc
modules/components/partials/fields/outputs/couchbase.adoc
modules/components/partials/fields/outputs/cyborgdb.adoc
modules/components/partials/fields/outputs/cypher.adoc
modules/components/partials/fields/outputs/elasticsearch_v8.adoc
modules/components/partials/fields/outputs/elasticsearch_v9.adoc
modules/components/partials/fields/outputs/gcp_bigquery.adoc
modules/components/partials/fields/outputs/gcp_cloud_storage.adoc
modules/components/partials/fields/outputs/gcp_pubsub.adoc
modules/components/partials/fields/outputs/hdfs.adoc
modules/components/partials/fields/outputs/iceberg.adoc
modules/components/partials/fields/outputs/kafka.adoc
modules/components/partials/fields/outputs/kafka_franz.adoc
modules/components/partials/fields/outputs/mongodb.adoc
modules/components/partials/fields/outputs/ockam_kafka.adoc
modules/components/partials/fields/outputs/opensearch.adoc
modules/components/partials/fields/outputs/otlp_grpc.adoc
modules/components/partials/fields/outputs/otlp_http.adoc
modules/components/partials/fields/outputs/pinecone.adoc
modules/components/partials/fields/outputs/pusher.adoc
modules/components/partials/fields/outputs/qdrant.adoc
modules/components/partials/fields/outputs/questdb.adoc
modules/components/partials/fields/outputs/redis_list.adoc
modules/components/partials/fields/outputs/redis_pubsub.adoc
modules/components/partials/fields/outputs/redis_streams.adoc
modules/components/partials/fields/outputs/redpanda.adoc
modules/components/partials/fields/outputs/redpanda_common.adoc
modules/components/partials/fields/outputs/redpanda_migrator.adoc
modules/components/partials/fields/outputs/salesforce_sink.adoc
modules/components/partials/fields/outputs/snowflake_put.adoc
modules/components/partials/fields/outputs/snowflake_streaming.adoc
modules/components/partials/fields/outputs/splunk_hec.adoc
modules/components/partials/fields/outputs/sql.adoc
modules/components/partials/fields/outputs/sql_insert.adoc
modules/components/partials/fields/outputs/sql_raw.adoc
This significantly expands the `max_in_flight` comments beyond what I previously imagined (just the `aws_s3` output, for self-hosted and cloud) to what we see below, so it's now a pretty big and very noticeable change in the docs. I'm requesting that @Jeffail and @josephwoodward confirm this applies across all the outputs and connectors below. Thanks, Claude, for being thorough :)
Add a NOTE to the max_in_flight reference of every batched output (46 connectors registered via MustRegisterBatchOutput) explaining that setting max_in_flight > 1 alongside a batching block with processors risks shipping raw, unprocessed messages to the output if a batching processor errors at runtime.

Per CON-461, the underlying behavior lives in shared benthos framework code; until the next Connect v5 major can fix it, every affected output gets the same advisory.

Implementation: introduce a shared definitions.max_in_flight_batched override and $ref it from the 32 batched outputs that lacked an existing max_in_flight override. Repoint elasticsearch_v8 and elasticsearch_v9 from the now-removed elasticsearch_max_in_flight definition to the new shared one. Append the NOTE to the 12 connectors with bespoke max_in_flight prose (couchbase, cypher, gcp_bigquery, kafka_franz, mongodb, questdb, redpanda, redpanda_common, redpanda_migrator, salesforce_sink, snowflake_streaming, sql_raw) plus ockam_kafka.kafka.max_in_flight where the field is nested.

Cloud Connect docs single-source from this repo, so the same change covers both sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Shared `max_in_flight_batched` description now says "message batches" instead of "messages": `max_in_flight` controls parallel batches, not parallel messages, matching the upstream benthos description correction (commit 77a2bba44).
* Drop the dead `max_in_flight` override on `zmq4`. Although the connector source registers via `MustRegisterBatchOutput`, the generated configspec does not expose `max_in_flight`, so the override would never render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ac82c35 to 81b587f
Summary
- Adds a NOTE to the `max_in_flight` reference on every batched output (46 connectors registered via `MustRegisterBatchOutput`) explaining that values > 1 alongside a `batching` block with `processors` risk shipping raw, unprocessed messages to the output if a batching processor errors at runtime.
- Changes: `docs-data/overrides.json` plus the regenerated reference partials. Cloud Connect docs single-source from this repo, so this PR covers both sites.

Resolves DOC-2130.
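
A minimal sketch of a lint for the combination the NOTE warns about, assuming the output section of a pipeline config has already been parsed into a Python dict. The function name `flags_batching_risk` and the `default_max_in_flight` parameter are hypothetical; actual defaults may differ per output, and nested fields such as `ockam_kafka`'s `kafka.max_in_flight` are not handled here.

```python
def flags_batching_risk(output_cfg, default_max_in_flight=1):
    """Return names of outputs that combine max_in_flight > 1 with batching processors."""
    flagged = []
    for name, fields in output_cfg.items():
        # Skip non-component keys such as a string label or a processors list.
        if not isinstance(fields, dict):
            continue
        # Assumption: the caller supplies the output's real default if the field is unset.
        max_in_flight = fields.get("max_in_flight", default_max_in_flight)
        processors = fields.get("batching", {}).get("processors") or []
        if max_in_flight > 1 and processors:
            flagged.append(name)
    return flagged


# Illustrative config only: bucket name and batching values are made up.
risky = {
    "aws_s3": {
        "bucket": "example-bucket",
        "max_in_flight": 64,
        "batching": {
            "count": 100,
            "processors": [{"archive": {"format": "lines"}}],
        },
    }
}

# Same output with parallelism reduced to a single in-flight batch.
safe = {"aws_s3": {**risky["aws_s3"], "max_in_flight": 1}}

print(flags_batching_risk(risky))  # ['aws_s3']
print(flags_batching_risk(safe))   # []
```

The check only reports the configuration shape; it does not reproduce the runtime failure itself, which per the PR lives in shared framework code.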
Approved copy
How the override is wired
- New shared `definitions.max_in_flight_batched` description.
- Batched outputs that lacked an existing `max_in_flight` override now `$ref` it.
- `elasticsearch_v8` and `elasticsearch_v9` repointed from the removed `elasticsearch_max_in_flight` definition to the new shared one.
- Connectors with bespoke `max_in_flight` prose (`couchbase`, `cypher`, `gcp_bigquery`, `kafka_franz`, `mongodb`, `questdb`, `redpanda`, `redpanda_common`, `redpanda_migrator`, `salesforce_sink`, `snowflake_streaming`, `sql_raw`) get the NOTE appended in place.
- `ockam_kafka` exposes the field nested as `kafka.max_in_flight`; the NOTE is appended to that nested description rather than added at the top level.
- Some batched outputs (`zmq4`, `gcp_bigquery_write_api`) don't expose a public `max_in_flight` field in the generated reference, so no override is applied.

Preview links (representative sample)
- (`$ref`-only): the original ticket scope
- `$ref`-only case
- `$ref`

Test plan
- `jq . docs-data/overrides.json > /dev/null` passes
- `npx doc-tools generate rpcn-connector-docs` regenerates 44 output partials with the NOTE
- `npm run build` succeeds; rendered HTML shows the NOTE as a styled `admonitionblock note`
- `aws_s3` (`$ref` only), `redpanda_migrator` (multi-paragraph custom + appended NOTE), `elasticsearch_v8` (repointed `$ref`), `kafka_franz` (single-line custom + appended NOTE), `ockam_kafka` (nested `kafka.max_in_flight`)
- outputs/)

🤖 Generated with Claude Code