Skip to content

Implement per-pixel linked list for OIT#21831

Merged
alice-i-cecile merged 30 commits intobevyengine:mainfrom
beicause:oit-opt
Feb 5, 2026
Merged

Implement per-pixel linked list for OIT#21831
alice-i-cecile merged 30 commits intobevyengine:mainfrom
beicause:oit-opt

Conversation

@beicause
Copy link
Contributor

@beicause beicause commented Nov 14, 2025

Objective

The current OIT stores viewport-sized fragments per layer. It uses much more memory than it can be.

Solution

Implements per-pixel linked list for OIT, which saves memory and can handle more layers. The implementation references https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/api/oit_linked_lists

Testing

Tested with the order_independent_transparency example. I also added a new scene in it.

Details 屏幕截图_20251114_100337

@IceSentry IceSentry self-assigned this Nov 14, 2025
@IceSentry IceSentry self-requested a review November 14, 2025 04:32
@IceSentry IceSentry removed their assignment Nov 14, 2025
@IceSentry IceSentry added C-Feature A new feature, making something new possible A-Rendering Drawing game state to the screen labels Nov 14, 2025
@IceSentry IceSentry added S-Needs-Review Needs reviewer attention (from anyone!) to move forward D-Shaders This code uses GPU shader languages labels Nov 14, 2025
Copy link
Contributor

@IceSentry IceSentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. Thank you so much for working on this. Sorry it took so long for me to review, I got sick in the same week you opened the PR and haven't had time to come back to it since.

This is very close to what I had in mind as a follow up to my original OIT impl so I'm really happy to see it in action.

I managed to review the PR because I'm very familiar with OIT but to make the diff a bit simpler to follow I would suggest adding the depth prepass support in a separate PR to the current OIT impl. This way the linked list PR will be a bit easier to follow since it won't be mixed with the depth prepass changes.

Add `reserve_internal` to `BufferVec`
Add `capacity` `set_label` `get_label` to `UninitBufferVec`
Use `Vec::reserve` to reduce some allocation
@alice-i-cecile alice-i-cecile added C-Performance A change motivated by improving speed, memory usage or compile times C-Refinement Improves output quality, without fixing a clear bug or adding new functionality. and removed C-Feature A new feature, making something new possible labels Feb 2, 2026
@github-project-automation github-project-automation bot moved this to Needs SME Triage in Rendering (2026 Proposal) Feb 2, 2026
@IceSentry
Copy link
Contributor

@goodartistscopy btw, we highly encourage community reviews. Since it seems like you already looked over all the changes feel free to leave a review/approve.

@IceSentry
Copy link
Contributor

We need 2 reviews for a maintainer to look at it and merge it.

@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Feb 2, 2026
Copy link
Member

@tychedelia tychedelia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a memory leak in the example on macOS, but that's on main so not blocking here. Looks great! Thanks for your work @beicause

@group(0) @binding(2) var<storage, read_write> layer_ids: array<atomic<i32>>;
@group(0) @binding(1) var<storage, read> nodes: array<OitFragmentNode>;
@group(0) @binding(2) var<storage, read_write> heads: array<u32>; // No need to be atomic
@group(0) @binding(3) var<storage, read_write> atomic_counter: u32; // No need to be atomic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for other reviewers, was curious if contention on this is ever a source of concern, but seems that drivers very well optimize this case and so it's preferred over more complicated optimizations.

Copy link
Contributor

@goodartistscopy goodartistscopy Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contention on atomic_counter ("the allocator") is highest at the draw/accumulation phase. In the resolve shader, each thread has it's own head (and list of nodes), so no contention. However all threads also reset atomic_counter non atomically and I wonder if its very pedantically UB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from correctness, I wonder if it costs the full bandwidth too ? Like, would

let screen_index = u32(floor(in.position.x) + floor(in.position.y) * view.viewport.z);
if screen_index == 0 {
    atomic_counter = 0u;
}

be more efficient ?

@tychedelia tychedelia added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Feb 4, 2026
@beicause
Copy link
Contributor Author

beicause commented Feb 5, 2026

There appears to be a memory leak in the example on macOS

I think it's due to OitBuffers is never released if OIT is disabled. It should be resolved.

Copy link
Contributor

@goodartistscopy goodartistscopy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • atomic_counter is a non informative name (allocator or next_node would be better)
  • I'll put the clean up of the shader loops along with #22781 but would be better included here

@IceSentry
Copy link
Contributor

I'll put the clean up of the shader loops

I'd prefer if this was in a self contained small PR. Large PRs always take a long time to merge just because it's harder to review but keeping things small makes the process faster.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Feb 5, 2026
Merged via the queue into bevyengine:main with commit afe0e5d Feb 5, 2026
38 checks passed
@github-project-automation github-project-automation bot moved this to Done in Rendering Feb 5, 2026
@github-project-automation github-project-automation bot moved this from Needs SME Triage to Done in Rendering (2026 Proposal) Feb 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times C-Refinement Improves output quality, without fixing a clear bug or adding new functionality. D-Shaders This code uses GPU shader languages S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants