
feat(sinks): add new databricks_zerobus for Databricks ingestion #24840

Open
flaviofcruz wants to merge 8 commits into vectordotdev:master from flaviofcruz:upstream-databricks-zerobus

Conversation

@flaviofcruz

@flaviofcruz flaviofcruz commented Mar 3, 2026

Summary

Databricks provides a Zerobus ingest connector [1], a push-based API that writes data directly into Unity Catalog Delta tables. This PR introduces a new Vector sink that integrates with Databricks, allowing Vector to push data into Databricks. We use the Databricks-provided SDK to implement the sink [2].

Zerobus supports row-level ingestion, and that's what we do here. Zerobus also has an experimental Arrow batch mode, but we didn't add support for it. We will swap out row-level ingestion once Arrow batch becomes stable, and Arrow batch will be the future default.

For row-based ingestion, we extended BatchSerializerConfig to support a batch serializer that creates vectors of protocol buffer bytes. This makes it the second option for batch serialization, alongside Arrow batch.

Users do not have to specify the schema at all: we fetch the schema for them from Unity Catalog and then use it with the API. If users want to change the schema, they should update their table as needed. We don't have much support for dynamic schema changes at the moment.

Vector configuration

[sinks.databricks_zerobus]
type = "databricks_zerobus"
inputs = ["logs"]
ingestion_endpoint = "https://91041497925470.zerobus.us-west-2.cloud.databricks.com"
table_name = "main.default.zerobus_table"
unity_catalog_endpoint = "https://logfood-us-west-2-mt.cloud.databricks.com/"
[sinks.databricks_zerobus.auth]
strategy = "oauth"
client_id = "<client id>"
client_secret = "<secret>"

How did you test this PR?

Unit tests, running small toy examples and using it in production for actual traffic.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

[1] https://docs.databricks.com/aws/en/ingestion/zerobus-overview
[2] https://github.com/databricks/zerobus-sdk

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@github-actions github-actions Bot added domain: sinks Anything related to the Vector's sinks domain: external docs Anything related to Vector's external, public documentation labels Mar 3, 2026
@flaviofcruz flaviofcruz changed the title feat(databricks zerobus): add new databricks_zerobus for ingesting da… feat(databricks zerobus): add new databricks_zerobus for Databricks ingestion Mar 3, 2026
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch 3 times, most recently from 2368e4a to 42bf043 Compare March 12, 2026 17:04
@flaviofcruz flaviofcruz marked this pull request as ready for review March 12, 2026 17:05
@flaviofcruz flaviofcruz requested review from a team as code owners March 12, 2026 17:05
@flaviofcruz flaviofcruz changed the title feat(databricks zerobus): add new databricks_zerobus for Databricks ingestion feat(sinks): add new databricks_zerobus for Databricks ingestion Mar 12, 2026
@github-actions github-actions Bot added the domain: ci Anything related to Vector's CI environment label Mar 12, 2026
@drichards-87 drichards-87 self-assigned this Mar 12, 2026
@drichards-87 drichards-87 removed their assignment Mar 12, 2026
@pront
Member

pront commented Apr 3, 2026

Thanks @flaviofcruz for this new integration! Apologies for the slow review on this one.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 436d0da4bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@pront pront added the meta: awaiting author Pull requests that are awaiting their author. label Apr 3, 2026
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 436d0da to 9dcb5d1 Compare April 10, 2026 17:10
@github-actions github-actions Bot removed the meta: awaiting author Pull requests that are awaiting their author. label Apr 10, 2026
@github-actions
Contributor

github-actions Bot commented Apr 10, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9dcb5d1e71

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@pront
Member

pront commented Apr 10, 2026

FYI I am waiting for @hsuanyi to sign the CLA (see comment) before reviewing this further. Also, there is an unresolved review comment.

@pront
Member

pront commented Apr 13, 2026

FYI I am waiting for @hsuanyi to sign the CLA (see comment) before reviewing this further. Also, there is an unresolved review comment.

@flaviofcruz in case you missed the above, we will require all profiles who contributed to this PR to sign the CLA. Happy to review once that is done.

@hsuanyi

hsuanyi commented Apr 13, 2026

I have read the CLA Document and I hereby sign the CLA

@pront
Member

pront commented Apr 13, 2026

recheck

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 050409defd

Comment thread src/sinks/databricks_zerobus/config.rs Outdated
Comment thread lib/codecs/src/encoding/format/proto_batch.rs
@flaviofcruz
Author

@pront really appreciate your work on the review. However, I was looking at the Zerobus SDK license and it could be problematic: https://github.com/databricks/zerobus-sdk/blob/main/LICENSE Do you know if this could be a blocker? Let me know if it is.

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 050409d to 36a74ef Compare April 13, 2026 22:47
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36a74ef530

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d7b620bfb

Comment thread src/sinks/databricks_zerobus/service.rs Outdated
// Remove the stream from the slot so the next retry creates a fresh one,
// then try to close gracefully. Dropping the slot's Arc first means our
// local `stream` may be the sole owner, allowing `Arc::get_mut` to succeed.
self.stream.lock().await.take();

P2 Badge Guard stream reset against stale concurrent failures

When a retryable ingest error occurs, this unconditionally clears the shared stream slot, even if another task has already replaced it with a fresh stream. In concurrent ingest (which this sink uses), an old in-flight request can fail after recovery and then remove the new healthy stream, causing unnecessary reconnect churn and extra retries. Reset logic should only clear the slot if it still points to the same Arc that failed.

Author

This seems like a real issue. The code now checks whether the existing stream is the same as the one we hold, and takes it only in that case.
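
A minimal sketch of the guarded reset discussed in this thread, assuming the shared slot holds an `Option<Arc<...>>`. It uses `std::sync::Mutex` instead of the async mutex shown in the PR snippet so that it stays self-contained, and `Stream`, `StreamSlot`, and `take_if_same` are hypothetical names rather than the PR's actual code.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the SDK stream type.
#[derive(Debug)]
pub struct Stream {
    pub id: u32,
}

/// Shared slot holding the current stream, if any.
pub type StreamSlot = Mutex<Option<Arc<Stream>>>;

/// Clears the slot only if it still points at the stream that failed.
/// Returns true when the slot was cleared.
pub fn take_if_same(slot: &StreamSlot, failed: &Arc<Stream>) -> bool {
    let mut guard = slot.lock().expect("stream slot mutex poisoned");
    // Compare by pointer identity, not by value: a replacement stream
    // installed by another task is a different allocation.
    let same = matches!(&*guard, Some(current) if Arc::ptr_eq(current, failed));
    if same {
        // Dropping the slot's reference lets the next retry create a fresh stream.
        guard.take();
    }
    same
}
```

With this shape, a stale in-flight request that fails after recovery leaves the newer stream in place, because its `Arc` no longer matches the one in the slot.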

client_secret: &str,
) -> Result<UnityCatalogTableSchema, ZerobusSinkError> {
let http_client =
HttpClient::new(TlsSettings::default(), &ProxyConfig::default()).map_err(|e| {

P2 Badge Use configured proxy for Unity Catalog schema fetches

Schema discovery always constructs its HTTP client with ProxyConfig::default(), so user-configured proxy settings from sink context are ignored. In environments that require egress via proxy, schema.type = "unity_catalog" can fail at startup even though the sink is otherwise configured correctly. This should use the sink/runtime proxy configuration instead of hardcoded defaults.

Author

Done for the UC call. Sadly, the Zerobus SDK doesn't seem to support passing a separate ProxyConfig.

@flaviofcruz
Author

@pront really appreciate your work on the review. However, I was looking at the Zerobus SDK license and it could be problematic: databricks/zerobus-sdk@main/LICENSE Do you know if this could be a blocker? Let me know if it is.

Hi @flaviofcruz, still thinking about this. I would pause work on this PR until we have a clear answer.

If it's a major concern, we can work on changing the license of the SDK.

Switching to any of the following Apache 2, MIT, MPL, BSD3 would immediately unblock this PR. If not possible to switch then we can discuss more complex alternatives such as FFI.

License is being changed to Apache 2 here: databricks/zerobus-sdk#198 Thanks for the patience :)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 059cb46452

Comment on lines +408 to +409
let element_type = parse_complex_type(element_type_json)?;
Ok(ComplexType::Array(Box::new(element_type)))

P2 Badge Preserve array element nullability from Unity schema

parse_array_type only reads elementType and immediately constructs ComplexType::Array, so Unity Catalog's containsNull flag is discarded. That makes schemas like array<...> with containsNull = true look fully representable even though protobuf repeated fields cannot encode a true null element, which can cause runtime encode failures (or forced coercion) when events include null entries in those arrays. The descriptor generator should either reject nullable-element arrays up front or model them with an explicit wrapper that can encode nullability.

Author

This shouldn't matter because the source has non-nullable map keys/values and non-nullable array elements.
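
For reference, the "reject up front" option from the review comment could look roughly like the sketch below. This is an illustration only: `ComplexType` is reduced to two variants, `build_array_type` is a hypothetical helper, and the `contains_null` parameter is assumed to carry Unity Catalog's `containsNull` flag.

```rust
/// Simplified, hypothetical model of a parsed Unity Catalog type.
#[derive(Debug, PartialEq)]
pub enum ComplexType {
    Primitive(String),
    Array(Box<ComplexType>),
}

/// Builds an array type, rejecting arrays whose elements may be null,
/// since a protobuf `repeated` field cannot carry a null element.
pub fn build_array_type(
    element: ComplexType,
    contains_null: bool,
) -> Result<ComplexType, String> {
    if contains_null {
        return Err(
            "array elements must be non-nullable (containsNull = true is unsupported)"
                .to_string(),
        );
    }
    Ok(ComplexType::Array(Box::new(element)))
}
```

If the source data never contains null elements, as the reply above states, this check would simply never fire.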

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch 2 times, most recently from 05e3f2e to a738888 Compare April 20, 2026 20:16
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a738888965

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from a738888 to 171d5cd Compare April 20, 2026 23:11
@flaviofcruz
Author

I have read the CLA Document and I hereby sign the CLA

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 171d5cd2ea

Comment on lines +281 to +284
Ok(Some(offset)) if self.require_acknowledgements => {
stream.wait_for_offset(offset).await.map(|_| ())
}
Ok(_) => Ok(()),

P2 Badge Fail when ack mode receives no ingest offset

When acknowledgements are enabled, this branch still treats ingest_records_offset returning Ok(None) as success, so events are marked delivered without waiting for any server-side confirmation. If the SDK ever returns None (e.g., offset not available for a successful ingest call), the sink violates its advertised acknowledgement semantics and can acknowledge data before it is durably accepted; in ack mode this should be surfaced as an error instead of falling through to Ok(()).

Author

This only happens if we ever send a batch with 0 elements. Still, I added an error case for this situation, but the code should never reach this state.
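
A sketch of the decision logic being discussed, under stated assumptions: the offset is modeled as `u64` (the SDK's real offset type is not shown in this thread), and `AckAction` and `ack_action` are hypothetical names. The point is only that a missing offset becomes an error when acknowledgements are required.

```rust
/// What the caller should do after an ingest call returns.
#[derive(Debug, PartialEq)]
pub enum AckAction {
    /// Wait until the server confirms this offset.
    WaitFor(u64),
    /// Nothing to wait for.
    Done,
}

/// Maps the returned offset and the ack setting to an action.
/// A missing offset is an error only when acknowledgements are required.
pub fn ack_action(
    offset: Option<u64>,
    require_acknowledgements: bool,
) -> Result<AckAction, String> {
    match (offset, require_acknowledgements) {
        (Some(offset), true) => Ok(AckAction::WaitFor(offset)),
        (None, true) => {
            Err("ingest returned no offset while acknowledgements are enabled".to_string())
        }
        (_, false) => Ok(AckAction::Done),
    }
}
```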

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 171d5cd to 2971be3 Compare April 20, 2026 23:33
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2971be31bb

Comment thread src/sinks/databricks_zerobus/config.rs Outdated
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 2971be3 to 14d3847 Compare April 21, 2026 14:52
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14d3847388

Comment thread src/sinks/databricks_zerobus/service.rs Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e0c4e45c1

Comment thread website/cue/reference/components/sinks/databricks_zerobus.cue Outdated
Comment thread Cargo.toml Outdated
prost-types = { workspace = true, optional = true }

# Databricks Zerobus
databricks-zerobus-ingest-sdk = { git = "https://github.com/databricks/zerobus-sdk-rs", rev = "a963e81", optional = true }

P2 Badge Regenerate the third-party license inventory

This adds the new databricks-zerobus-ingest-sdk dependency and updates Cargo.lock, but LICENSE-3rdparty.csv was not updated (a repo search for databricks/zerobus only finds the lockfile entry). Vector's dependency license check expects new locked crates to be reflected in the license inventory, so this PR can fail CI or ship the SDK without the recorded license metadata; please run make build-licenses and commit the generated changes.

Author

I assume we might still need databricks/zerobus-sdk#225

Author

It passed now, licenses file updated!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dae9ad473

Comment thread src/sinks/databricks_zerobus/config.rs Outdated
Comment on lines +239 to +241
if self.table_name.matches('.').count() != 2 {
return Err(ZerobusSinkError::ConfigError {
message: "table_name must be in format 'catalog.schema.table' (exactly 3 parts)"

P2 Badge Reject empty table-name segments during config validation

The table_name validation only counts dots, so values like catalog..table, .schema.table, or catalog.schema. pass validation even though they are not valid catalog.schema.table identifiers. These malformed values then fail later during Unity Catalog calls with a less actionable remote error, despite this function claiming to enforce exactly three parts. Split on . and ensure all three segments are non-empty to keep startup failures deterministic and user-facing.

Author

Fixed
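
The fix itself is not shown in this thread. One way to implement the suggested check is sketched below, assuming plain unquoted identifiers; `validate_table_name` and its `String` error type are hypothetical and stand in for the sink's own validation and `ZerobusSinkError`.

```rust
/// Validates that `table_name` has the form `catalog.schema.table`
/// with exactly three non-empty segments.
pub fn validate_table_name(table_name: &str) -> Result<(), String> {
    let parts: Vec<&str> = table_name.split('.').collect();
    // Counting segments and checking each one rejects inputs such as
    // `catalog..table` that a dot count alone would accept.
    if parts.len() != 3 || parts.iter().any(|part| part.trim().is_empty()) {
        return Err(format!(
            "table_name must be in format 'catalog.schema.table' \
             (exactly 3 non-empty parts), got '{table_name}'"
        ));
    }
    Ok(())
}
```

This keeps the failure at startup with a local, actionable message instead of a remote Unity Catalog error.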

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 2dae9ad to c73e6c9 Compare April 22, 2026 16:14
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c73e6c9565

Comment thread src/sinks/databricks_zerobus/service.rs Outdated
Comment on lines +34 to +35
if !proxy.enabled {
return None;

P1 Badge Override SDK proxy autodetection when proxy is disabled

When proxy.enabled is false, this function returns None, which leaves ZerobusSdk on its default environment-variable proxy autodetection path (the behavior is even documented in the surrounding comments). In environments where HTTP_PROXY/HTTPS_PROXY is set globally, Zerobus traffic will still be proxied despite explicit Vector config to disable proxies, causing unexpected routing and startup/ingest failures in no-proxy deployments. Instead of returning None in this branch, provide an explicit connector factory that forces direct connections.

Author

Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36e4d92f8a

Comment thread src/sinks/databricks_zerobus/service.rs Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1941f62f41

Comment thread website/cue/reference/components/sinks/generated/clickhouse.cue Outdated
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from f7494f4 to 83c74f4 Compare April 22, 2026 18:56
@pront
Member

pront commented Apr 22, 2026

FYI I will come back for a final review after @petere-datadog approves.

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 0e202a3 to 73f567a Compare April 22, 2026 20:58
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 73f567a675

}
}

if let Some(max_bytes) = self.batch.max_bytes {

P2 Badge Enforce Zerobus 10MB cap when batch.max_bytes is unset

The validation only checks the 10MB SDK limit when batch.max_bytes is Some(...), so a config that explicitly sets batch.max_bytes: null can bypass this guard and produce oversized requests. Because into_batcher_settings() will then allow size-unbounded batches (bounded only by timeout/max_events), the sink can send >10MB payloads that the Zerobus SDK rejects at ingest time, turning a deterministic startup validation error into runtime delivery failures.

Author

AFAIK validate will merge the default. Added a test to confirm that this is indeed what happens: we always cap the batch size at 10MB.
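
The behavior described here can be illustrated with a simplified sketch. It does not use Vector's batch configuration types: `effective_max_bytes` is a hypothetical helper, and `MAX_BATCH_BYTES` assumes that 10MB means 10 * 1024 * 1024 bytes, which the thread does not spell out.

```rust
/// Assumed SDK limit; the thread says 10MB without giving the exact byte count.
pub const MAX_BATCH_BYTES: usize = 10 * 1024 * 1024;

/// Resolves the effective batch size: an unset value falls back to the cap,
/// and an explicit value above the cap is rejected.
pub fn effective_max_bytes(configured: Option<usize>) -> Result<usize, String> {
    // Merging the default first means `None` can never yield an unbounded batch.
    let max_bytes = configured.unwrap_or(MAX_BATCH_BYTES);
    if max_bytes > MAX_BATCH_BYTES {
        return Err(format!(
            "batch.max_bytes ({max_bytes}) exceeds the {MAX_BATCH_BYTES}-byte limit"
        ));
    }
    Ok(max_bytes)
}
```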

Labels

  • domain: ci (Anything related to Vector's CI environment)
  • domain: external docs (Anything related to Vector's external, public documentation)
  • domain: sinks (Anything related to the Vector's sinks)
  • work in progress

5 participants