Switch clustering to use per-froxel linked lists if storage buffers are in use. #22811
Open
pcwalton wants to merge 2 commits into bevyengine:main from
The data structure used for clustering currently consists of a heap of clusterable object indices, plus an *offset and counts* structure for each froxel. That is, each froxel stores an offset pointing to its first entry in the index heap, followed by the number of point lights, spot lights, reflection probes, irradiance volumes, and decals belonging to that froxel, in that order. The indices of spot lights are assumed to immediately follow the indices of point lights, the indices of reflection probes are assumed to immediately follow those of spot lights, and so on. This tightly packed structure is cache- and memory-efficient, which is especially important on WebGL 2, where uniform buffer size is extremely limited.
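To make the packed layout concrete, here is a minimal Rust sketch of how one object type's run is located in the shared index heap. The names `ClusterableKind` and `OffsetAndCounts` are hypothetical, chosen for illustration; Bevy's actual types and field names may differ. The point it shows is that the start of each type's run is derived from the counts of every type packed before it, which is exactly what makes the layout hard to build incrementally.

```rust
/// Clusterable object types, in the order their indices are packed in the heap.
/// (Hypothetical enum for illustration; Bevy's real types differ.)
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ClusterableKind {
    PointLight = 0,
    SpotLight = 1,
    ReflectionProbe = 2,
    IrradianceVolume = 3,
    Decal = 4,
}

/// Per-froxel record in the offset-and-counts layout (hypothetical name).
#[derive(Clone, Copy, Debug)]
struct OffsetAndCounts {
    /// Index of the froxel's first entry in the shared index heap.
    offset: u32,
    /// Number of objects of each kind, in `ClusterableKind` order.
    counts: [u32; 5],
}

impl OffsetAndCounts {
    /// Returns the slice of the heap holding this froxel's objects of `kind`.
    /// Runs are packed back to back, so a run starts at the froxel offset
    /// plus the counts of every kind that precedes it.
    fn objects<'a>(&self, heap: &'a [u32], kind: ClusterableKind) -> &'a [u32] {
        let k = kind as usize;
        let start = self.offset + self.counts[..k].iter().sum::<u32>();
        let end = start + self.counts[k];
        &heap[start as usize..end as usize]
    }
}

fn main() {
    // Froxel 0: two point lights and one spot light. Froxel 1: one point light, one decal.
    let heap = vec![7, 9, 3, 4, 12];
    let froxels = [
        OffsetAndCounts { offset: 0, counts: [2, 1, 0, 0, 0] },
        OffsetAndCounts { offset: 3, counts: [1, 0, 0, 0, 1] },
    ];
    assert_eq!(froxels[0].objects(&heap, ClusterableKind::PointLight), &[7, 9]);
    assert_eq!(froxels[0].objects(&heap, ClusterableKind::SpotLight), &[3]);
    assert_eq!(froxels[1].objects(&heap, ClusterableKind::Decal), &[12]);
    println!("ok");
}
```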
Unfortunately, this data structure inhibits *GPU clustering*, which we would like to perform in the future. In GPU clustering, we process every froxel-light pair in parallel, and because the number of froxels that a light covers may exceed the workgroup size, we're limited to atomic memory accesses for synchronization. There's no easy way I can see to build such a tightly packed data structure in parallel; the best we could do would be to build a linked list or a chunked linked list and then run a second pass that compacts it, but that second pass would itself add unnecessary overhead.
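GPU clustering isn't part of this patch, but a linked list is the kind of structure that can be built with atomics alone. The sketch below is a CPU analog under that assumption: each thread handles one froxel-object pair, claims a heap slot with `fetch_add` (the analog of WGSL `atomicAdd`), and pushes onto its froxel's list with `swap` (the analog of `atomicExchange`). The function names `build_lists` and `walk` are hypothetical, and for brevity it builds a single list per froxel rather than one per clusterable object type.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Sentinel marking the end of a list, matching the `0xffffffffu` terminator.
const END: u32 = u32::MAX;

/// One heap entry: a clusterable object ID plus the heap offset of the next entry.
struct Node {
    object: AtomicU32,
    next: AtomicU32,
}

/// Builds one singly linked list per froxel from (froxel, object) pairs.
/// Every pair runs on its own thread, and the only synchronization is atomics.
/// Returns (list heads, heap of (object, next) pairs). Hypothetical sketch.
fn build_lists(pairs: &[(usize, u32)], froxel_count: usize) -> (Vec<u32>, Vec<(u32, u32)>) {
    let heads: Vec<AtomicU32> = (0..froxel_count).map(|_| AtomicU32::new(END)).collect();
    let nodes: Vec<Node> = (0..pairs.len())
        .map(|_| Node { object: AtomicU32::new(0), next: AtomicU32::new(END) })
        .collect();
    let allocator = AtomicU32::new(0);

    thread::scope(|s| {
        for &(froxel, object) in pairs {
            let (heads, nodes, allocator) = (&heads, &nodes, &allocator);
            s.spawn(move || {
                // Claim a fresh heap slot (atomicAdd on the GPU).
                let slot = allocator.fetch_add(1, Ordering::Relaxed);
                nodes[slot as usize].object.store(object, Ordering::Relaxed);
                // Push onto the froxel's list (atomicExchange on the GPU).
                let prev = heads[froxel].swap(slot, Ordering::Relaxed);
                nodes[slot as usize].next.store(prev, Ordering::Relaxed);
            });
        }
    }); // Leaving the scope joins every thread, so all stores are visible below.

    let heads = heads.into_iter().map(AtomicU32::into_inner).collect();
    let nodes = nodes
        .into_iter()
        .map(|n| (n.object.into_inner(), n.next.into_inner()))
        .collect();
    (heads, nodes)
}

/// Walks one froxel's list and returns its object IDs.
fn walk(heads: &[u32], nodes: &[(u32, u32)], froxel: usize) -> Vec<u32> {
    let mut out = Vec::new();
    let mut cursor = heads[froxel];
    while cursor != END {
        let (object, next) = nodes[cursor as usize];
        out.push(object);
        cursor = next;
    }
    out
}

fn main() {
    let pairs = [(0, 10), (1, 11), (0, 12), (2, 13), (0, 14)];
    let (heads, nodes) = build_lists(&pairs, 3);
    let mut froxel0 = walk(&heads, &nodes, 0);
    froxel0.sort(); // list order depends on thread scheduling
    assert_eq!(froxel0, vec![10, 12, 14]);
    assert_eq!(walk(&heads, &nodes, 1), vec![11]);
    println!("ok");
}
```

The order of entries within a list depends on which thread wins each `swap`, which is why the example sorts before comparing. No compaction pass is needed: the lists are usable as soon as every pair has been processed.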
To fix this problem and prepare for GPU clustering, this patch changes the data structure used for clustering to instead have one singly linked list per clusterable object type. The offset and counts structure is replaced by five linked-list heads, one per clusterable object type, each pointing to an offset in the heap. Each element in the heap is a pair that contains the ID of a clusterable object and the offset in the heap of the next pair in the list. Each list is terminated by `0xffffffffu`.

The CPU clustering code is unchanged; `assign_objects_to_clusters` still creates offsets and counts in the same way. During extraction, the offset-and-count model is converted to linked lists. The reason for this is that the uniform cluster data structure (as opposed to the storage cluster data structure), which is still used on WebGL 2, needs to remain tightly packed because uniform space is still at a premium. It was easier to keep the code identical for now than to add complexity to `assign_objects_to_clusters`. Unfortunately, the shader code did incur a fair bit of complexity through added `#ifdef`s; when we drop WebGL 2, these can be removed.

I tested the relevant examples and verified that their output is unchanged.
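The extraction-time conversion described above can be sketched as follows. This is a hedged illustration, not the patch's code: `OffsetAndCounts`, `LinkedClusters`, `convert`, and `walk` are hypothetical names, and the real storage-buffer layout may pack the pairs differently. Each froxel gets five list heads (one per clusterable object type) pointing into a heap of (object ID, next offset) pairs, with `u32::MAX` standing in for the `0xffffffffu` terminator.

```rust
/// List terminator, matching the `0xffffffffu` sentinel in the shaders.
const END: u32 = u32::MAX;
/// Point lights, spot lights, reflection probes, irradiance volumes, decals.
const KINDS: usize = 5;

/// Offset-and-counts record produced by CPU clustering (hypothetical name).
struct OffsetAndCounts {
    offset: u32,
    counts: [u32; KINDS],
}

/// Storage-buffer layout: five list heads per froxel plus a heap of
/// (object ID, next offset) pairs (hypothetical names).
struct LinkedClusters {
    heads: Vec<[u32; KINDS]>,
    heap: Vec<(u32, u32)>,
}

/// Converts packed offset-and-counts data into per-kind linked lists.
fn convert(froxels: &[OffsetAndCounts], indices: &[u32]) -> LinkedClusters {
    let mut out = LinkedClusters { heads: Vec::with_capacity(froxels.len()), heap: Vec::new() };
    for froxel in froxels {
        let mut heads = [END; KINDS];
        let mut cursor = froxel.offset as usize;
        for kind in 0..KINDS {
            let count = froxel.counts[kind] as usize;
            let run = &indices[cursor..cursor + count];
            cursor += count;
            if run.is_empty() {
                continue; // head stays END: empty list
            }
            heads[kind] = out.heap.len() as u32;
            for (i, &object) in run.iter().enumerate() {
                // Entries of one run are appended back to back, so each points
                // at the slot after it; the last one gets the terminator.
                let next = if i + 1 < run.len() { out.heap.len() as u32 + 1 } else { END };
                out.heap.push((object, next));
            }
        }
        out.heads.push(heads);
    }
    out
}

/// Walks one froxel's list of the given kind, as a shader loop would.
fn walk(clusters: &LinkedClusters, froxel: usize, kind: usize) -> Vec<u32> {
    let mut out = Vec::new();
    let mut cursor = clusters.heads[froxel][kind];
    while cursor != END {
        let (object, next) = clusters.heap[cursor as usize];
        out.push(object);
        cursor = next;
    }
    out
}

fn main() {
    // Froxel 0: two point lights and a spot light; froxel 1: one decal.
    let indices = [7, 9, 3, 12];
    let froxels = [
        OffsetAndCounts { offset: 0, counts: [2, 1, 0, 0, 0] },
        OffsetAndCounts { offset: 3, counts: [0, 0, 0, 0, 1] },
    ];
    let clusters = convert(&froxels, &indices);
    assert_eq!(walk(&clusters, 0, 0), vec![7, 9]);
    assert_eq!(walk(&clusters, 0, 1), vec![3]);
    assert_eq!(walk(&clusters, 1, 4), vec![12]);
    assert_eq!(walk(&clusters, 1, 0), Vec::<u32>::new());
    println!("ok");
}
```

Because CPU clustering already groups each run contiguously, the converted lists happen to occupy consecutive heap slots. A GPU builder would not have that property, but the traversal loop in `walk` would be the same, which is what lets the storage-buffer shader path stay agnostic to how the lists were produced.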