Implement per-pixel linked list for OIT#21831
Implement per-pixel linked list for OIT#21831alice-i-cecile merged 30 commits intobevyengine:mainfrom
Conversation
IceSentry
left a comment
There was a problem hiding this comment.
This is awesome. Thank you so much for working on this. Sorry it took so long for me to review, I got sick in the same week you opened the PR and haven't had time to come back to it since.
This is very close to what I had in mind as a follow up to my original OIT impl so I'm really happy to see it in action.
I managed to review the PR because I'm very familiar with OIT but to make the diff a bit simpler to follow I would suggest adding the depth prepass support in a separate PR to the current OIT impl. This way the linked list PR will be a bit easier to follow since it won't be mixed with the depth prepass changes.
crates/bevy_core_pipeline/src/core_3d/main_transparent_pass_3d_node.rs
Outdated
Show resolved
Hide resolved
Add `reserve_internal` to `BufferVec` Add `capacity` `set_label` `get_label` to `UninitBufferVec` Use `Vec::reserve` to reduce some allocation
|
@goodartistscopy btw, we highly encourage community reviews. Since it seems like you already looked over all the changes feel free to leave a review/approve. |
|
We need 2 reviews for a maintainer to look at it and merge it. |
tychedelia
left a comment
There was a problem hiding this comment.
There appears to be a memory leak in the example on macOS, but that's on main so not blocking here. Looks great! Thanks for your work @beicause
| @group(0) @binding(2) var<storage, read_write> layer_ids: array<atomic<i32>>; | ||
| @group(0) @binding(1) var<storage, read> nodes: array<OitFragmentNode>; | ||
| @group(0) @binding(2) var<storage, read_write> heads: array<u32>; // No need to be atomic | ||
| @group(0) @binding(3) var<storage, read_write> atomic_counter: u32; // No need to be atomic |
There was a problem hiding this comment.
Note for other reviewers, was curious if contention on this is ever a source of concern, but seems that drivers very well optimize this case and so it's preferred over more complicated optimizations.
There was a problem hiding this comment.
Contention on atomic_counter ("the allocator") is highest at the draw/accumulation phase. In the resolve shader, each thread has it's own head (and list of nodes), so no contention. However all threads also reset atomic_counter non atomically and I wonder if its very pedantically UB.
There was a problem hiding this comment.
There was a problem hiding this comment.
Aside from correctness, I wonder if it costs the full bandwidth too ? Like, would
let screen_index = u32(floor(in.position.x) + floor(in.position.y) * view.viewport.z);
if screen_index == 0 {
atomic_counter = 0u;
}be more efficient ?
I think it's due to OitBuffers is never released if OIT is disabled. It should be resolved. |
There was a problem hiding this comment.
atomic_counteris a non informative name (allocatorornext_nodewould be better)- I'll put the clean up of the shader loops along with #22781 but would be better included here
I'd prefer if this was in a self contained small PR. Large PRs always take a long time to merge just because it's harder to review but keeping things small makes the process faster. |
Objective
The current OIT stores viewport-sized fragments per layer. It uses much more memory than it can be.
Solution
Implements per-pixel linked list for OIT, which saves memory and can handle more layers. The implementation references https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/api/oit_linked_lists
Testing
Tested with the
order_independent_transparencyexample. I also added a new scene in it.Details