
[WIP] Incremental processing storage v3 #585

Draft
rkistner wants to merge 37 commits into main from incremental-processing-storage

Conversation

@rkistner
Contributor

rkistner commented Mar 24, 2026

This rewrites the MongoDB storage for version 3, in preparation for incremental reprocessing.

Postgres storage will follow in a future PR - this one is already big enough.

On a high level:

  1. We partition bucket_data and parameter_indexes (previously bucket_parameters) by source definition instead of by replication stream (previously group_id).
  2. We partition source_records (previously current_data) by source table.
  3. The partitioning is now physical, using separate collections, instead of only a logical separation by _id or foreign keys.
  4. The specific storage format is slightly adjusted to account for the changes required for incremental reprocessing (detailed write-up to follow).
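To make the physical partitioning concrete, here is a small sketch of what collection-per-partition naming could look like. The naming scheme and identifier parameters below are hypothetical, not the PR's actual scheme; the point is only that the shared identifier moves into the collection name instead of being repeated in each document.

```typescript
// Hypothetical naming helpers illustrating physical partitioning.
// The actual v3 collection naming may differ.

// One bucket_data collection per source definition.
function bucketDataCollection(sourceDefinitionId: string): string {
  return `bucket_data_${sourceDefinitionId}`;
}

// One parameter_indexes collection per source definition.
function parameterIndexesCollection(sourceDefinitionId: string): string {
  return `parameter_indexes_${sourceDefinitionId}`;
}

// One source_records collection per source table.
function sourceRecordsCollection(sourceTableId: string): string {
  return `source_records_${sourceTableId}`;
}
```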

All of this is implemented in storage version 3 only; the storage formats for versions 1 and 2 are unchanged. However, the implementation is restructured to account for the logic now differing between versions.

The collection split has some advantages:

  1. Removing data becomes very cheap - just drop the collection.
  2. Reads and writes become faster, since common data moves into the collection name instead of being duplicated in each document.

The split is primarily for performance; it is not functionally required for incremental reprocessing. It just makes sense to make these changes together, while we're already making significant storage changes. There are caveats: some code paths now query across multiple collections, which could be slower, and we still need to optimize those cases.
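The "removing data becomes very cheap" advantage can be sketched as follows. This is an illustrative sketch only: the collection-name prefixes and the `MinimalDb` interface are hypothetical stand-ins so the example runs without MongoDB; a real implementation would use the driver's collection-drop operation.

```typescript
// Stand-in database interface, so the sketch runs without MongoDB.
interface MinimalDb {
  listCollectionNames(): string[];
  dropCollection(name: string): void;
}

// Removing all data for a source definition is just dropping its collections:
// a metadata-level operation, with no per-document deletes.
// The name prefixes here are hypothetical.
function removeSourceDefinitionData(db: MinimalDb, sourceDefinitionId: string): string[] {
  const targets = new Set([
    `bucket_data_${sourceDefinitionId}`,
    `parameter_indexes_${sourceDefinitionId}`,
  ]);
  const dropped: string[] = [];
  for (const name of db.listCollectionNames()) {
    if (targets.has(name)) {
      db.dropCollection(name);
      dropped.push(name);
    }
  }
  return dropped;
}
```

Contrast this with a logically partitioned layout, where the same removal is a `deleteMany` touching every document of that source definition.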

This PR departs from the storage structure used in the incremental reprocessing POC in #468:

  1. The POC made changes on the storage structure directly, without regard for storage versions or backwards-compatibility.
  2. The POC did not use the collection splits.

TODO:

  • Properly document.
  • Clean up code - we may need to further split the implementations between V1 and V3.
  • More tests?
  • Plan for performance improvements for (1) source_records pending_deletes, (2) parameter_index change detection between checkpoints.
  • Test performance of bucket checksum and data reads with respect to clustered collections and split collections, compared to V1.

@changeset-bot

changeset-bot bot commented Mar 24, 2026

⚠️ No Changeset found

Latest commit: 41c59b6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

