Skip to content

oracledb_cdc: support cache resources for internal transaction buffer#4409

Open
josephwoodward wants to merge 18 commits into
mainfrom
jw/oraclecdctransactioncache
Open

oracledb_cdc: support cache resources for internal transaction buffer#4409
josephwoodward wants to merge 18 commits into
mainfrom
jw/oraclecdctransactioncache

Conversation

@josephwoodward
Copy link
Copy Markdown
Contributor

@josephwoodward josephwoodward commented May 7, 2026

Currently OracleDB CDC buffers transactions via an internal, in-memory cache. It does this because it has to wait to see if the transaction rows are committed or rolled back before it can decide how to process them.

This change adds new behaviour, expanding the support of this internal buffer to be configurable via Connect's various cache resources, reducing memory footprint and improving the overall reliability of the connector when working with workflows featuring long-running transactions.

Performance:

Below graphs show streaming 5,000,000 records (see screenshot for different cache types):

image image

Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Commits
LGTM

Review
Two issues found in the new ConnectCacheResource: a precision-loss bug in the JSON value envelope, and the absence of any tests for the new serialization and cache lifecycle code.

  1. int64 values larger than 2^53 silently lose precision when round-tripping through decodeVal, because encoding/json decodes JSON numbers into float64 when unmarshaling into any. This is reachable in practice because OracleValueConverter produces int64 for bare numeric literals, so Oracle large-numeric IDs end up corrupted on the way back out of the cache. See inline comment.
  2. The PR adds ~349 lines of new code (custom typed JSON serialization plus a cache-backed TransactionCache lifecycle) with no unit or integration tests. The project's test patterns call for tests on new components. See inline comment.

@Jeffail
Copy link
Copy Markdown
Contributor

Jeffail commented May 12, 2026

Hi @josephwoodward — a few concerns worth surfacing on this change, ordered by severity.

P0 — Correctness

LowWatermarkSCN safe-checkpoint protection is bypassed for cache-backed transactions

In logminer.go:350-360, the safe-checkpoint computation is gated on the *InMemoryCache concrete type:

if cache, ok := lm.txnCache.(*InMemoryCache); ok {
    if lowestOpenSCN := cache.LowWatermarkSCN(redoEvent.TransactionID); ... {
        safeCheckpointSCN = lowestOpenSCN - 1
    }
}

When the new ConnectCacheResource is in use this branch is skipped entirely, and the checkpoint can advance past the start SCN of any concurrently open transaction. For an external persistent cache the in-flight events survive a restart, so the COMMIT at a higher SCN can re-resolve them. For a non-persistent backing — for example a memory cache resource — the buffer is gone after restart, and the events between the missed transaction's start and the advanced checkpoint are permanently lost.

The safe-checkpoint logic should apply to both implementations. The simplest fix is to maintain an in-memory txnID → startSCN side-index inside ConnectCacheResource — it scales with the number of concurrent open transactions, not the size of any single transaction, so it stays bounded.

txnID cache keys are not unique across multiple oracledb_cdc inputs

The cache key at cache_resource.go:50 is "txn:" + txnID. Oracle USN.SLOT.SEQ transaction identifiers are unique only within a single Oracle instance, so two oracledb_cdc inputs pointed at different databases that share one cache resource will silently collide. A configurable key prefix (mirroring the existing checkpoint_cache_key) would resolve this.

P1 — Reliability / performance

AddEvent is O(N²) in transaction size

Each call to AddEvent performs a Get + JSON decode + append + JSON encode + Set on the entire transaction. For a transaction of N events the total marshal/unmarshal work is on the order of N²/2, and against a remote cache the bytes-on-the-wire grow the same way. This is the worst case for the long-running transaction workload the change is intended to support. At minimum it deserves a note in the documentation; preferably each event would be stored under its own key with a small index, or appended via a list-style cache operation where the underlying implementation supports it.

Cache I/O errors are silently swallowed

The TransactionCache interface methods do not return errors, so Set / Get / Delete failures inside ConnectCacheResource are logged and dropped. With the in-memory implementation this was safe because the operations could not fail. With a network-backed cache a transient Set failure causes the next AddEvent to hit the "not found, creating..." path on GetTransaction, which starts a fresh transaction in the cache and drops every event that came before the failure. The interface likely needs to surface these errors so the read loop can decide on retry / fail semantics.

nil TTL is ambiguous

c.cache.Set(ctx, key, data, nil) at cache_resource.go:73 and :118. Depending on the cache resource implementation nil may mean "no expiry" or "use the cache's configured default." If a user configures Redis (or similar) with a default TTL shorter than their longest open transaction, entries will expire mid-flight and the transaction will be silently corrupted when it eventually commits. Either pass an explicit no-expire sentinel, validate the cache's default TTL at startup, or document the requirement loudly.

Happy to discuss any of these.

@josephwoodward josephwoodward force-pushed the jw/oraclecdctransactioncache branch 2 times, most recently from f091a40 to ea02135 Compare May 20, 2026 11:57
Comment thread internal/impl/oracledb/input_oracledb_cdc.go Outdated
Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Commits
LGTM — both messages follow system: message form and scope cleanly (production code in commit 1, tests in commit 2). Minor nit: commit 1 has a typo ("resoueces" → "resources") in the headline.

Review
The PR cleanly adds a pluggable transaction buffer backed by a Connect cache resource. Existing in-memory behaviour is preserved as default. Concerns are concentrated in the new cache_resource.go: the cache reference is captured outside its AccessCache guard, context.Background() is used for every cache call, JSON-encoded int64 values can lose precision on the way back through interface{}, and GetTransaction cannot distinguish "key not found" from a transient error.

  1. service.Cache is held outside AccessCache, bypassing the resource manager's serialization that every other cache consumer in the repo respects (input_oracledb_cdc.go#L509-L519).
  2. int64 precision loss on JSON round-trip in cache_resource.go (decodeVal case typeInt64) — reachable for any Oracle NUMBER wider than ~16 digits since sqlredo returns those as int64 (valueconverter.go#L78-L82).
  3. context.Background() used for every cache operation in ConnectCacheResource — violates the project rule on context propagation and risks stalling Close() against a slow / hung backend.
  4. GetTransaction collapses every cache error to nil, so any non-ErrKeyNotFound failure causes silent transaction reset and downstream data corruption.

Comment thread internal/impl/oracledb/input_oracledb_cdc.go Outdated
Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Commits

  1. The first commit subject oracledb_cdc: support cache resoueces for internal transaction buffer has a typo — "resoueces" should be "resources".

Review
Two issues around cache-resource usage. The service.Cache reference is retained across calls instead of being used only inside its AccessCache callback, which is inconsistent with the existing SCN cache code in the same file. Separately, every cache operation uses context.Background() rather than propagating the framework ctx, which violates the project's godev guidance and prevents cancellation of blocking cache backends.

  1. Cache reference captured outside AccessCache callback in internal/impl/oracledb/input_oracledb_cdc.go — see inline comment.
  2. context.Background() used in all ConnectCacheResource cache operations in internal/impl/oracledb/logminer/cache_resource.go — see inline comment.

Comment thread internal/impl/oracledb/input_oracledb_cdc.go Outdated
Comment thread internal/impl/oracledb/input_oracledb_cdc.go Outdated
Comment thread internal/impl/oracledb/logminer/cache_resource.go
@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Commits

  1. oracledb_cdc: support cache resoueces for internal transaction buffer — typo "resoueces" → "resources".
  2. oracle_db: support shared cache resources — system prefix oracle_db is inconsistent with the oracledb_cdc prefix used by the other three commits in this PR.
  3. oracledb_cdc: refactor and add integration test — mixes a non-trivial integration-test refactor with the new test case; consider splitting, or squashing all four commits into a single self-contained feature commit, since they all add the same feature and the intermediate states are not independently buildable as discrete logical changes.
  4. oracledb_cdc: address lowestwatermarkscn use cases — message is vague; it doesn't convey what behaviour changed (moving the watermark computation behind the TransactionCache interface).

Review
Adds a cache-resource-backed transaction buffer for oracledb_cdc LogMiner so long-running transactions can be offloaded from process memory.

  1. The new field's Contains check uses the wrong namespace — see input_oracledb_cdc.go#L748. The feature never activates, including in the new cache-resource integration test.
  2. JSON serialization loses precision for int64 values > 2^53 — see cache_resource.go#L252-L259. Affects Oracle NUMBER columns parsed to int64.
  3. service.Cache reference is captured outside its AccessCache callback and reused for the input's lifetime — see input_oracledb_cdc.go#L506-L511. The codebase convention is to call AccessCache per-operation (cf. internal/impl/mongodb/cdc/checkpoint_cache.go).

Comment thread internal/impl/oracledb/logminer/cache_resource.go
Comment thread internal/impl/oracledb/logminer/logminer_test.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Commits

  1. The first commit message has a typo — oracledb_cdc: support cache resoueces for internal transaction buffer should be resources.
  2. The fourth commit uses a different system prefix (oracle_db: support shared cache resources) than the others (oracledb_cdc:). Either pick oracledb_cdc: for consistency, or use the appropriate scope.
  3. The six commits look like an iterative development trail on a single feature (initial impl → refactor → fix watermark cases → share cache → docs → key restructuring). They would be more reviewable squashed into one or two logical commits per the CLAUDE.md granularity guidance.

Review

LogMiner's transaction buffer can now be backed by a Connect cache resource, which is a sensible addition. The serialization envelope covers all the types the SQL-redo parser actually emits (string, int64, json.Number, []byte, time.Time, nil), the new integration test exercises the full insert/update/delete cycle against the cache-resource path, and the unit test for concurrent transaction checkpointing is a nice addition.

Two issues worth addressing before merge:

  1. ConnectCacheResource.LowWatermarkSCN diverges semantically from InMemoryCache.LowWatermarkSCN — the new implementation considers every started transaction, while the in-memory one filters on len(txn.Events) > 0. Because StartTransaction is invoked for every Oracle OpStart regardless of whether DML follows on subscribed tables, the cache-resource path will hold back the checkpoint for transactions the in-memory path would let through. Details inline at cache_resource.go#L222-L230. The stale doc comment at cache.go#L139 ("not part of the cache interface") should also be updated since LowWatermarkSCN is now on the interface.
  2. The unit test comment at logminer_test.go#L129-L132 refers to a type assertion in processRedoEvent that was removed in this PR. The comment is now misleading.

Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Commits

  1. Commit oracledb_cdc: support cache resoueces for internal transaction buffer has a typo (resouecesresources).
  2. Commit oracle_db: support shared cache resources uses prefix oracle_db; the component is oracledb_cdc and the other commits use that prefix — pick one and be consistent.
  3. Commit oracledb_cdc: generate docs and address config issue mixes documentation changes with code changes. Per the commit policy ("In multi-commit PRs, documentation changes must be in a separate commit from code changes"), the regenerated docs should be in their own commit.
  4. Several commits read like unsquashed PR-review fixups against the original feature commit rather than self-contained logical changes (refactor and add integration test, address lowestwatermarkscn use cases, move cache_resource into cache and reference by key, address precision loss issue with int64, propagate ctx for network bound transaction caches, update outdated test code comment). Consider squashing the iterative fixups into the original support cache resources… commit so the history reflects logical changes, not the development process. Several of these messages are also vague (refactor, address … use cases, address config issue).

Review
The new ConnectCacheResource (internal/impl/oracledb/logminer/cache_resource.go) cleanly mirrors the InMemoryCache shape and serialization preserves the awkward int64/json.Number/time.Time/[]byte types correctly. Unit tests cover the concurrent-transaction checkpoint hold-back through the new cache path, and there is an integration test exercising the full pipeline.

  1. GetTransaction swallows cache errors and returns nil, which causes AddEvent to clobber existing buffered transactions on transient cache errors — potential silent data loss. See inline comment on cache_resource.go#L159-L179.

Comment thread internal/impl/oracledb/logminer/cache_resource.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Commits

  1. Commit oracledb_cdc: support cache resoueces for internal transaction buffer has a typo (resouecesresources) in the message.
  2. Commit oracle_db: support shared cache resources uses a different system prefix (oracle_db) than the other commits in this PR which use oracledb_cdc. Pick one for consistency.
  3. Several commits read as iterative fixups against earlier work in the same PR rather than self-contained logical changes — oracledb_cdc: address lowestwatermarkscn use cases, oracledb_cdc: generate docs and address config issue, oracledb_cdc: move cache_resource into cache and reference by key, oracledb_cdc: address precision loss issue with int64, oracledb_cdc: propagate ctx for network bound transaction caches, oracledb_cdc: update outdated test code comment, oracledb_cdc: further align cache resource to in memory version. Consider squashing these into the relevant feature commits.

Review

The change extends the LogMiner transaction buffer to use a configurable service.Cache resource via a new ConnectCacheResource implementing TransactionCache, threads context.Context through the cache interface, lifts LowWatermarkSCN onto the interface (with both implementations now filtering to txns with events), and adds an integration test variant exercising the file cache. Serialization uses a typed-value envelope to preserve int64/json.Number/[]byte/time.Time round-trips.

  1. cache_resource.go:175 compares getErr \!= service.ErrKeyNotFound rather than using errors.Is. See inline comment.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Commits

  1. oracledb_cdc: support cache resoueces for internal transaction buffer — typo in commit subject: resouecesresources.
  2. oracle_db: support shared cache resources — uses oracle_db: while every other commit in this PR uses oracledb_cdc:. The system name should match the component name (oracledb_cdc) per the commit policy convention.
  3. oracledb_cdc: generate docs and address config issue — mixes a generated docs update with a code change. Per repo convention, documentation changes belong in a separate commit from code changes.

The remaining 8 commits look like incremental fix-up commits on top of the original change (e.g. address lowestwatermarkscn use cases, further align cache resource to in memory version, add error handling). These would read more cleanly squashed into the primary change before merge, since reviewers and git blame get more value from a single coherent commit than from a bisect-able trail of in-progress fix-ups.

Review

The new ConnectCacheResource is well-isolated behind the existing TransactionCache interface, the LowWatermarkSCN semantics now match InMemoryCache (both correctly skip transactions with no events), and the typed-envelope JSON serialization preserves fidelity for every concrete type the SQL_REDO parser produces (string, int64, json.Number, []byte, time.Time, nil). The new TestProcessRedoEventWithConnectCacheResource covers the watermark cases and the integration test exercises the file-cache path end-to-end.

LGTM

@josephwoodward josephwoodward marked this pull request as ready for review May 21, 2026 12:11
@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Commits

  1. The first commit headline oracledb_cdc: support cache resoueces for internal transaction buffer has a typo — resouecesresources.
  2. Commit oracle_db: support shared cache resources uses oracle_db as the system prefix, but the component is oracledb_cdc and every other commit in this PR uses that name. Per the commit policy, the system name should match across commits.
  3. Most of the 11 commits are iterative refinements of the same feature within this PR (address lowestwatermarkscn use cases, address config issue, address precision loss issue with int64, update outdated test code comment, further align cache resource to in memory version, add error handling to support cache resources). These read as fixup-style follow-ups and should be squashed into a small number of self-contained logical commits before merging.
  4. Commit oracledb_cdc: generate docs and address config issue mixes generated documentation with a code fix. Per the commit policy, in multi-commit PRs documentation changes must be in a separate commit from code changes.

Review
LGTM. The new ConnectCacheResource mirrors InMemoryCache semantics (including the len(Events) > 0 filter in LowWatermarkSCN), the typed-value serialization preserves the documented set of value types across a JSON round-trip without losing int64 precision, ctx is now correctly threaded through every cache call, and the transactions_active / events_inflight gauges stay consistent across the start / add / discard / commit / rollback paths. Integration coverage exercises both the in-memory and cache-resource buffers end to end.

LGTM

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Commits

  1. The first commit message has a typo: oracledb_cdc: support cache resoueces for internal transaction bufferresoueces should be resources.
  2. The fourth commit (ab8c474c) uses the wrong system prefix: oracle_db: support shared cache resources should be oracledb_cdc: to match the rest of the series and the component name.
  3. Many of the trailing commits appear to be unsquashed fixups against the original feature commit, with vague titles that do not describe the change in isolation: tidy up, change how we store data, further align cache resource to in memory version, address lowestwatermarkscn use cases, generate docs and address config issue, address precision loss issue with int64, update outdated test code comment. Per the commit policy, each commit should be a small, self-contained, logical change with an imperative message describing what the commit does; please squash these into the introducing commits before merge.

Review

The change adds a transaction_cache/transaction_cache_key option on oracledb_cdc.logminer that swaps the in-memory transaction buffer for a serializing service.Cache-backed buffer, generalizes the TransactionCache interface to be context-aware and to expose LowWatermarkSCN, and adds unit and integration coverage for the new path. Implementation follows established conventions (RCL header on the new file, oci…Field… constants, ParsedConfig extraction, Optional()/Default() matching the existing checkpoint_cache_key).

LGTM

@josephwoodward josephwoodward force-pushed the jw/oraclecdctransactioncache branch from 1690032 to aec1e77 Compare May 22, 2026 14:00
@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Commits

  1. Several commits look like fixups that should be squashed into the original support cache resources commit before merge — they each address review/iteration on the same logical change: address lowestwatermarkscn use cases, address precision loss issue with int64, propagate ctx for network bound transaction caches, update outdated test code comment, further align cache resource to in memory version, add error handling to support cache resources, tidy up, change how we store data, refactor to return errors from cache. This violates the granularity rule that each commit be one small self-contained logical change.
  2. Typo in commit subject: oracledb_cdc: support cache resoueces for internal transaction buffer (resouecesresources).
  3. Typo in commit subject: oraclerdb_cdc: add cache to benchmark (oraclerdb_cdcoracledb_cdc).
  4. Inconsistent system prefix: oracle_db: support shared cache resources and oracle_db: tidy up benchmark running use oracle_db while every other commit (and the component name) uses oracledb_cdc.
  5. Vague/uninformative messages: oracledb_cdc: tidy up, oracledb_cdc: change how we store data, oracle_db: tidy up benchmark running — these don't convey what changed or why.

Review
Adds an optional service.Cache-backed transaction buffer (ConnectCacheResource) for the oracledb_cdc LogMiner, with JSON envelope serialization that preserves Go types across the cache boundary, integration test coverage for both buffer modes, and a benchmark Taskfile harness. The interface change to accept context.Context and return error from cache operations is threaded through LogMiner and the in-memory implementation cleanly.

LGTM

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Commits

  1. Typos in commit messages: oracledb_cdc: support cache resoueces... ("resoueces" → "resources") and oraclerdb_cdc: add cache to benchmark ("oraclerdb_cdc" → "oracledb_cdc").
  2. Inconsistent system prefix: oracle_db: support shared cache resources and oracle_db: tidy up benchmark running should use the component name oracledb_cdc: to match the other commits.
  3. Several fixup-style commits should be squashed into the logical parent commit they amend, e.g. address precision loss issue with int64, update outdated test code comment, further align cache resource to in memory version, add error handling to support cache resources, tidy up, change how we store data, refactor to return errors from cache, clean up cache_resource.go. This violates the granularity rule that each commit should be one small, self-contained, logical change rather than an iterative WIP history.
  4. Vague messages: oracledb_cdc: tidy up, oracle_db: tidy up benchmark running, and oracledb_cdc: change how we store data don't communicate the actual change. Per policy, messages must not be vague.

Review

The new transaction_cache cache_resource buffer is implemented cleanly, with the interface evolved consistently across InMemoryCache and the new ConnectCacheResource, context propagation through the lifecycle, and tests that mirror the existing in-memory cache coverage. The serialization wrapper around typedVal preserves the known concrete types produced by OracleValueConverter (string/int64/json.Number/[]byte/time.Time/nil) on a JSON round-trip.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants