oplogPopulator: route oplog key through TransformObjectKey SMT when available by delthas · Pull Request #2741 · scality/backbeat

delthas · 2026-05-21T14:41:48Z

Summary

Adds an oplogPopulator.transformObjectKey config flag (default false). When enabled, oplog source connectors are configured with the com.scality.kafka.connect.transforms.TransformObjectKey SMT (shipped by Zenko via ZENKO-5274), keying messages by the raw S3 object key.
extensions/oplogPopulator/constants.js: defaultConnectorConfig (legacy {ns, fullDocument.value.key} key schema) + smtKeyConfig (projects documentKey._id, adds the SMT + key.converter=StringConverter), applied on top when the flag is set.
ConnectorsManager._getDefaultConnectorConfiguration picks the SMT key config from the flag. _processOldConnectors scrubs SMT-only keys from a stale connector config when the flag is off, so a connector that previously ran with the SMT doesn't keep referencing a class that isn't there.
OplogPopulatorConfigValidator: validates the new boolean flag.

Context

BB-768: the oplog Kafka topic keys messages from fullDocument.value.key, which is null on update/delete events (BB-355 removed change.stream.full.document=updateLookup). Those op types collapse onto hash({ns, null}) while inserts spread across partitions, breaking per-object ordering across op types.

The fix lives in the Zenko-side SMT (ZENKO-5274): it derives the key from documentKey._id (always populated) and strips the arsenal master/version encoding, so master and all versions of an S3 object hash to the same partition. This PR is the Backbeat side that configures the connector to use it.

Enablement model

Rather than Backbeat probing the Kafka Connect plugin path at runtime, the SMT is gated by an explicit config flag owned by the operator (ZKOP / zenkoversion CR). The flag is flipped on once the Connect image ships the TransformObjectKey plugin and the supporting Backbeat version is deployed. This keeps Backbeat free of auto-detection/runtime-flip machinery and gives a controlled rollout. A config change goes through the existing reconciliation: the connector is updated in place via PUT /connectors/{name}/config (no recreate, no resume-token loss — the change touches only the key schema + transforms, not the pipeline match stage).

Follow-up

ZKOP / zenko-operator must surface transformObjectKey into the generated Backbeat config (tracked separately).

Issue: BB-768

bert-e · 2026-05-21T14:41:52Z

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

name	description	privileged	authored
`/after_pull_request`	Wait for the given pull request id to be merged before continuing with the current one.
`/bypass_author_approval`	Bypass the pull request author's approval	⭐
`/bypass_build_status`	Bypass the build and test status	⭐
`/bypass_commit_size`	Bypass the check on the size of the changeset `TBA`	⭐
`/bypass_incompatible_branch`	Bypass the check on the source branch prefix	⭐
`/bypass_jira_check`	Bypass the Jira issue check	⭐
`/bypass_peer_approval`	Bypass the pull request peers' approval	⭐
`/bypass_leader_approval`	Bypass the pull request leaders' approval	⭐
`/approve`	Instruct Bert-E that the author has approved the pull request.		✍️
`/create_pull_requests`	Allow the creation of integration pull requests.
`/create_integration_branches`	Allow the creation of integration branches.
`/no_octopus`	Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
`/unanimity`	Change review acceptance criteria from `one reviewer at least` to `all reviewers`
`/wait`	Instruct Bert-E not to run until further notice.

Available commands

name	description	privileged
`/help`	Print Bert-E's manual in the pull request.
`/status`	Print Bert-E's current status in the pull request `TBA`
`/clear`	Remove all comments from Bert-E from the history `TBA`
`/retry`	Re-start a fresh build `TBA`
`/build`	Re-start a fresh build `TBA`
`/force_reset`	Delete integration branches & pull requests, and restart merge process from the beginning.
`/reset`	Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

codecov · 2026-05-21T14:43:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.66%. Comparing base (c52fbcc) to head (5bbc166).

Additional details and impacted files

Files with missing lines	Coverage Δ
extensions/oplogPopulator/OplogPopulator.js	`89.14% <ø> (ø)`
...ns/oplogPopulator/OplogPopulatorConfigValidator.js	`100.00% <ø> (ø)`
extensions/oplogPopulator/constants.js	`100.00% <100.00%> (ø)`
...nsions/oplogPopulator/modules/ConnectorsManager.js	`94.32% <100.00%> (+0.25%)`	⬆️

... and 7 files with indirect coverage changes

Components	Coverage Δ
Bucket Notification	`80.22% <ø> (ø)`
Core Library	`80.82% <ø> (-0.16%)`	⬇️
Ingestion	`71.24% <ø> (ø)`
Lifecycle	`79.06% <ø> (ø)`
Oplog Populator	`86.04% <100.00%> (+0.20%)`	⬆️
Replication	`59.71% <ø> (-0.08%)`	⬇️
Bucket Scanner	`85.76% <ø> (ø)`

@@                 Coverage Diff                 @@
##           development/9.5    #2741      +/-   ##
===================================================
- Coverage            74.73%   74.66%   -0.07%     
===================================================
  Files                  199      199              
  Lines                13650    13659       +9     
===================================================
- Hits                 10201    10199       -2     
- Misses                3439     3450      +11     
  Partials                10       10

Flag	Coverage Δ
api:retry	`9.12% <0.00%> (-0.01%)`	⬇️
api:routes	`8.94% <0.00%> (-0.01%)`	⬇️
bucket-scanner	`85.76% <ø> (ø)`
ft_test:queuepopulator	`9.07% <0.00%> (-1.05%)`	⬇️
ingestion	`12.56% <0.00%> (-0.01%)`	⬇️
lib	`7.76% <0.00%> (-0.03%)`	⬇️
lifecycle	`18.98% <0.00%> (-0.02%)`	⬇️
notification	`1.02% <0.00%> (-0.01%)`	⬇️
oplogPopulator	`0.16% <33.33%> (+0.02%)`	⬆️
replication	`18.71% <0.00%> (-0.01%)`	⬇️
unit	`51.26% <100.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

bert-e · 2026-05-21T15:41:08Z

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

francoisferrand

Do we need to make this dynamic, or should we just let ZKOP enable the plugin depending on backbeat/zenko version (could be a "feature flag" in zenkoversion cr, so it can be enabled when Zenko has both the the plugin, the supporting backbeat version, and we decide it is time) ?
How critical/dangerous is this whole fix (SMT...) ? Should it land in a patch release, or should we be more cautious?

…a config flag Adds an oplogPopulator 'transformObjectKey' config flag (default false). When enabled, oplog source connectors are configured with the com.scality.kafka.connect.transforms.TransformObjectKey SMT (shipped by Zenko via ZENKO-5274), keying messages by the raw S3 object key derived from documentKey._id with the arsenal master/version encoding stripped. Master and all versions of the same S3 object then hash to the same partition, regardless of op type — fixing BB-768, where update/delete events collapse onto one partition because fullDocument (today's key source) is null on those op types. The flag is owned by the operator (ZKOP / zenkoversion CR) and flipped on once the Kafka Connect image ships the TransformObjectKey plugin and the supporting Backbeat version is deployed — rather than Backbeat probing the Connect plugin path at runtime. This keeps Backbeat free of the auto-detection / runtime-flip machinery and gives a controlled rollout. * extensions/oplogPopulator/constants: defaultConnectorConfig (legacy key schema) + smtKeyConfig (documentKey._id projection + transforms + key.converter), applied on top when the flag is set. * ConnectorsManager: _getDefaultConnectorConfiguration picks the SMT key config from the flag; _processOldConnectors scrubs SMT-only keys from a stale oldConfig when the flag is off (so a connector that previously ran with the SMT doesn't keep referencing a missing class). * OplogPopulatorConfigValidator: validate the new boolean flag. Issue: BB-768

bert-e · 2026-05-27T09:44:24Z

Waiting for approval

The following approvals are needed before I can proceed with the merge:

the author
2 peers

claude · 2026-05-27T09:47:36Z

PR description vs. implementation mismatch: The Summary section describes adding getConnectorPlugins() to KafkaConnectWrapper, runtime probing for the SMT plugin on each reconciliation tick, an _smtAvailable flag, and automatic SMT activation without operator action. None of this exists in the diff — the actual implementation uses a static transformObjectKey config flag that the operator must set manually and restart Backbeat. The Context section's claim that "an upgrade rollout that bumps the Kafka Connect image after Backbeat auto-activates the SMT config on the very next tick, without operator action" does not match the code. Either update the PR description to reflect the simpler config-flag approach, or the probing logic was intended but is missing from this PR.
- Code is otherwise clean: config spread order is correct, SMT-key scrubbing on downgrade handles the legacy restore properly (output.schema.key is overwritten by the default while SMT-only keys are deleted), test coverage for both directions (enable/disable) is solid, and the Joi schema + validator tests are adequate.

Review by Claude Code

delthas · 2026-05-27T09:53:30Z

Pivoted per your first point: dropped the runtime /connector-plugins probe and the auto-flip machinery entirely. The SMT is now gated by an oplogPopulator.transformObjectKey config flag (default off), which ZKOP/zenkoversion flips on once the Connect image ships the plugin and the supporting Backbeat is deployed. This also dissolves the _applyKeyConfig / smtOnlyConfigKeys complexity flagged inline — constants.js is now just defaultConnectorConfig + smtKeyConfig (the base + additions shape you suggested).

Per your second point: retargeted to development/9.5 only (dropped the 9.4.1 patch); fix version updated to 9.5.0.

Follow-up: the zenkoversion CR → Backbeat config is tracked in ZKOP-553.

francoisferrand · 2026-05-27T10:29:47Z

+        // SMT so the oplog message key is the raw S3 object key. Enabled via
+        // the oplogPopulator 'transformObjectKey' config flag, set by the
+        // operator once Kafka Connect ships the TransformObjectKey plugin.
+        this._transformObjectKey = params.transformObjectKey || false;


defaulting is not needed here, already handling by joi ?

francoisferrand · 2026-05-27T10:43:01Z

+    'value.converter': 'org.apache.kafka.connect.storage.StringConverter',
+    // Kafka message key config (legacy).
+    // The key schema projects {ns.coll, fullDocument.value.key}. fullDocument
+    // is null on update/delete events (BB-355 removed updateLookup), so for


c.f. https://scality.atlassian.net/browse/BB-768?focusedCommentId=477005 :

we always pass the full document (AFAIK) on update, so the key should be available (but maybe not in "fullDocument")

the delete event is not used AFAIK (we generate an update with the previous document right before it), and may be dropped instead

→ is there a simpler (as in "less maitenance") fix?

delthas · 2026-05-29T10:24:42Z

Replaced by #2744

…line The current Kafka message key projects fullDocument.value.key, which is null on update events since BB-355 removed change.stream.full.document=updateLookup. Every update for a given bucket therefore serialises to the same key Struct ({ns:{coll:<bucket>}, fullDocument:null}) and lands on one partition, breaking per-object ordering across op types (insert vs update) and pinning a hot bucket's update traffic to a single partition (a blocker for the oplog scaling work tracked in BB-756). Fix: append a $addFields stage to the MongoDB connector's change-stream pipeline that synthesises a top-level s3Key field by $ifNull coalescing $fullDocument.value.key and $updateDescription.updatedFields.value.key. The connector's output.schema.key is replaced with a single-field Avro record {s3Key: [string, null]} so the Kafka message key is just that field. All events for the same logical S3 object — insert, master/version updates, replication-status updates, delete-marker updates — yield the same s3Key value and hash to the same partition. Master and version documents share the same value.key, so master/version events also collapse to the same partition without prefix-stripping. The $ifNull variant relies on metadata always writing the whole 'value' subdocument (full $set). Confirmed against arsenal: there is no partial dotted-$set path for object MD today. A hypothetical future partial $set would not populate updateDescription.updatedFields.value.key and that event would mis-partition — accepted risk per the ticket discussion. The change is propagated to existing connectors via the existing in-place PUT /connectors/{name}/config reconciliation (no recreate, no resume-token loss — the change touches the key schema + pipeline only, not the pipeline match stage). Downstream oplog consumers do not read the Kafka message key, so the new key shape is consumer-transparent. History / discussion: https://scality.atlassian.net/browse/BB-768?focusedCommentId=477122 This supersedes the earlier SMT-based approach (a Kafka Connect Single Message Transform deriving the key from documentKey._id with master/version prefix-stripping), which would have spanned three repos and added a new Java artifact + operator feature flag. Closed in favour of this single-repo fix after a per-event-cost measurement showed the $addFields adds ~600–900 ns/event ≈ ~1–2% of one core at 20k ops/s on the mongod — not a throughput concern at our target rates. Superseded work (to be closed): #2741 (SMT-track Backbeat PR) scality/Zenko#2410 (ZENKO-5274 — Java SMT in kafka-connect image) https://scality.atlassian.net/browse/ZKOP-553 (operator feature flag — no longer needed) Issue: BB-768

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js

delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 650dd4e to 396d23b Compare May 21, 2026 15:05

claude Bot reviewed May 21, 2026

View reviewed changes

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated

scality deleted a comment from claude Bot May 21, 2026

delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 396d23b to 55af2e9 Compare May 21, 2026 15:41

scality deleted a comment from bert-e May 21, 2026

scality deleted a comment from claude Bot May 21, 2026

delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 55af2e9 to 3f61602 Compare May 21, 2026 15:46

scality deleted a comment from claude Bot May 21, 2026

delthas marked this pull request as ready for review May 22, 2026 15:36

delthas requested a review from francoisferrand May 22, 2026 15:40

francoisferrand reviewed May 26, 2026

View reviewed changes

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated

francoisferrand reviewed May 26, 2026

View reviewed changes

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated

delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from 3f61602 to aa29999 Compare May 27, 2026 09:38

delthas changed the base branch from development/9.4 to development/9.5 May 27, 2026 09:38

claude Bot reviewed May 27, 2026

View reviewed changes

Comment thread extensions/oplogPopulator/modules/ConnectorsManager.js Outdated

delthas force-pushed the improvement/BB-768/oplog-partition-by-object-key branch from aa29999 to 5bbc166 Compare May 27, 2026 09:43

scality deleted a comment from claude Bot May 27, 2026

scality deleted a comment from bert-e May 27, 2026

francoisferrand reviewed May 27, 2026

View reviewed changes

delthas closed this May 29, 2026

delthas mentioned this pull request May 29, 2026

oplogPopulator: synthesise oplog key via $addFields in connector pipeline #2744

Open

Conversation

delthas commented May 21, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Context

Enablement model

Follow-up

Uh oh!

bert-e commented May 21, 2026

Hello delthas,

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bert-e commented May 21, 2026

Request integration branches

Uh oh!

francoisferrand left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bert-e commented May 27, 2026

Waiting for approval

Uh oh!

claude Bot commented May 27, 2026

Uh oh!

delthas commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

francoisferrand May 27, 2026

Choose a reason for hiding this comment

Uh oh!

francoisferrand May 27, 2026

Choose a reason for hiding this comment

Uh oh!

delthas commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

delthas commented May 21, 2026 •

edited by atlassian Bot

Loading

codecov Bot commented May 21, 2026 •

edited

Loading

delthas commented May 27, 2026 •

edited

Loading