
Less quadratic half_join #706

Open
frankmcsherry wants to merge 3 commits into TimelyDataflow:master-next from frankmcsherry:half_join_ordered


frankmcsherry commented Mar 31, 2026

The current half_join.rs is a dog's breakfast of code. In particular, it has accidentally quadratic behavior when there is a substantial backlog of input updates and the user chooses to yield before doing much of the work. For example, with 1B input updates and yielding after 1M units of work, there will be ~1K reactivations, each of which does work linear in the remaining updates. Not great.
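To put a number on the example above, here is a minimal sketch of the cost model it describes. The function name `rescan_cost` and the model itself (each reactivation scans every remaining update, then retires `budget` of them) are assumptions made for illustration; this is not code from half_join.rs.

```rust
/// Hypothetical cost model: each reactivation scans all remaining updates,
/// then retires at most `budget` of them before yielding.
/// Returns (number of reactivations, total updates scanned).
fn rescan_cost(total: u64, budget: u64) -> (u64, u64) {
    let mut remaining = total;
    let mut activations = 0u64;
    let mut scanned = 0u64;
    while remaining > 0 {
        activations += 1;
        // Work linear in the backlog that is still outstanding.
        scanned += remaining;
        // Only `budget` updates are actually retired before yielding.
        remaining = remaining.saturating_sub(budget);
    }
    (activations, scanned)
}
```

Under this model, `rescan_cost(1_000_000_000, 1_000_000)` gives 1,000 reactivations and roughly 5 × 10^11 scanned updates, about 500 times the size of the backlog itself.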

This new version is ... really quite different. It leans into the total order on time that all uses of this operator have, and maintains blobs of pending updates ordered by time, alongside blobs of active updates ordered by key. We work our way through the active blobs subject to the yield logic, and when we run out we harvest more active blobs from the pending blobs.
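The pending/active split can be sketched roughly as follows. Everything here is hypothetical and much simplified: the `Staged` type, the `(key, time, diff)` update shape, `u64` times, a single pending list and a single active queue rather than multiple blobs, and the rule that an update is unlocked once its time is strictly less than a `frontier` value. The PR's actual data structures may differ on all of these points.

```rust
use std::collections::VecDeque;

/// Hypothetical update shape: (key, time, diff), with totally ordered times.
type Update = (u64, u64, i64);

#[derive(Default)]
struct Staged {
    /// Updates not yet unlocked, kept sorted by time.
    pending: Vec<Update>,
    /// Unlocked updates awaiting processing, sorted by key.
    active: VecDeque<Update>,
}

impl Staged {
    /// Accept new input. A real implementation would merge sorted blobs
    /// rather than re-sort everything on each call.
    fn push(&mut self, mut batch: Vec<Update>) {
        self.pending.append(&mut batch);
        self.pending.sort_by_key(|&(_, time, _)| time);
    }

    /// Move every pending update with time < frontier into `active`.
    /// Because times are totally ordered, these form a prefix of `pending`.
    fn harvest(&mut self, frontier: u64) {
        let split = self.pending.partition_point(|&(_, time, _)| time < frontier);
        let mut unlocked: Vec<Update> = self.pending.drain(..split).collect();
        unlocked.sort_by_key(|&(key, _, _)| key);
        self.active.extend(unlocked);
    }

    /// Process at most `budget` active updates, harvesting only when the
    /// active queue has run dry. Returns how many updates were processed.
    fn work(&mut self, frontier: u64, budget: usize, mut logic: impl FnMut(Update)) -> usize {
        if self.active.is_empty() {
            self.harvest(frontier);
        }
        let mut done = 0;
        while done < budget {
            match self.active.pop_front() {
                Some(update) => {
                    logic(update);
                    done += 1;
                }
                None => break,
            }
        }
        done
    }
}
```

In this sketch a call to `work` that yields early leaves the rest of `active` in place, so the next reactivation resumes where it left off instead of rescanning the whole backlog.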

The total order on time is meant to make it easy to reason about which updates are unlocked when. Relatively little funny frontier logic here.

There is some remaining complexity around the API, which needs to provide an active session to the user closure (rather than just a container builder); combined with the potential surprise of yielding, this makes it hard for us to plan work ahead of time. Changing this to a container builder, which we can hand out without creating a session for each closure call, seems like a clear win.
