Switch clustering to use per-froxel linked lists if storage buffers are in use. #22811

Open
pcwalton wants to merge 2 commits into bevyengine:main from pcwalton:per-froxel-linked-lists

@pcwalton pcwalton commented Feb 5, 2026

The data structure used for clustering currently consists of a heap of clusterable object indices, plus an *offset and counts* structure for each froxel. That is, each froxel's data consists of an offset that represents the first index in a heap of indices, followed by the number of point lights, spot lights, reflection probes, irradiance volumes, and decals belonging to that froxel, in that order. The indices of spot lights are assumed to immediately follow the indices of point lights, the indices of reflection probes are assumed to immediately follow the indices of spot lights, and so on. This tightly packed structure is cache- and memory-efficient, which is especially important on WebGL 2, where the size of uniform buffers is extremely limited.
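As a rough illustration of the layout described above, here is a minimal Rust sketch. The names `ObjectKind`, `FroxelOffsetAndCounts`, and `objects_of_kind` are hypothetical and are not Bevy's actual types; the only thing taken from the description is that each froxel stores one offset plus five counts, and that the indices for each object type follow one another in the heap.

```rust
/// Hypothetical enum; the order matches the order of the counts.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ObjectKind {
    PointLight = 0,
    SpotLight = 1,
    ReflectionProbe = 2,
    IrradianceVolume = 3,
    Decal = 4,
}

/// Hypothetical per-froxel record: one offset into the heap plus one count
/// per object kind.
struct FroxelOffsetAndCounts {
    offset: u32,
    counts: [u32; 5],
}

impl FroxelOffsetAndCounts {
    /// Range of heap slots holding this froxel's objects of `kind`.
    /// Each kind starts right after the previous kinds' indices.
    fn range(&self, kind: ObjectKind) -> std::ops::Range<usize> {
        let k = kind as usize;
        let skipped: u32 = self.counts[..k].iter().sum();
        let start = (self.offset + skipped) as usize;
        start..start + self.counts[k] as usize
    }
}

/// Returns the object indices of one kind for one froxel.
fn objects_of_kind<'a>(
    heap: &'a [u32],
    froxel: &FroxelOffsetAndCounts,
    kind: ObjectKind,
) -> &'a [u32] {
    &heap[froxel.range(kind)]
}
```

Because every lookup is a contiguous slice, reads are cheap, but the position of each index depends on all the counts before it, which is what makes the layout hard to fill in out of order.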

Unfortunately, this data structure inhibits GPU clustering, which we would like to perform in the future. In GPU clustering, we process every froxel-light pair in parallel, and the number of froxels that a light covers may exceed the workgroup size, so we're limited to atomic memory accesses for synchronization. There's no easy way I can see to build up such a tightly-packed data structure in parallel like this; the best we could do would be to build a linked list or a chunked linked list and have a second pass that compresses the linked list down, but the second pass would itself add unnecessary overhead.
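To make the contrast concrete, the sketch below shows why a linked list is easy to build with nothing but atomics: each insertion needs one atomic add to claim a heap slot and one atomic exchange on the list head. This is a CPU analogue using Rust threads, not shader code, and it is not part of this patch (GPU clustering is future work). `SharedLists`, `push`, and `build_in_parallel` are hypothetical names, and the sketch keeps a single list per froxel for brevity.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

const LIST_END: u32 = 0xffff_ffff;

/// Hypothetical shared state: one list head per froxel and a heap of
/// (object id, next) pairs stored as two parallel arrays.
struct SharedLists {
    heads: Vec<AtomicU32>,
    ids: Vec<AtomicU32>,
    nexts: Vec<AtomicU32>,
    allocated: AtomicU32,
}

impl SharedLists {
    fn new(froxel_count: usize, capacity: usize) -> Self {
        Self {
            heads: (0..froxel_count).map(|_| AtomicU32::new(LIST_END)).collect(),
            ids: (0..capacity).map(|_| AtomicU32::new(0)).collect(),
            nexts: (0..capacity).map(|_| AtomicU32::new(LIST_END)).collect(),
            allocated: AtomicU32::new(0),
        }
    }

    /// Prepends `object_id` to a froxel's list using only atomic operations.
    /// Lists must only be read after every push has finished.
    fn push(&self, froxel: usize, object_id: u32) {
        // Claim a unique heap slot.
        let slot = self.allocated.fetch_add(1, Ordering::Relaxed);
        self.ids[slot as usize].store(object_id, Ordering::Relaxed);
        // Make the slot the new head and link it to the previous head.
        let previous = self.heads[froxel].swap(slot, Ordering::AcqRel);
        self.nexts[slot as usize].store(previous, Ordering::Release);
    }

    /// Walks one froxel's list (call only after building has completed).
    fn froxel_objects(&self, froxel: usize) -> Vec<u32> {
        let mut out = Vec::new();
        let mut cursor = self.heads[froxel].load(Ordering::Acquire);
        while cursor != LIST_END {
            out.push(self.ids[cursor as usize].load(Ordering::Relaxed));
            cursor = self.nexts[cursor as usize].load(Ordering::Acquire);
        }
        out
    }
}

/// Processes (froxel, object id) pairs on several threads at once.
fn build_in_parallel(
    froxel_count: usize,
    pairs: &[(usize, u32)],
    thread_count: usize,
) -> SharedLists {
    let lists = SharedLists::new(froxel_count, pairs.len());
    let threads = thread_count.max(1);
    let chunk_len = ((pairs.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        for chunk in pairs.chunks(chunk_len) {
            let shared = &lists;
            scope.spawn(move || {
                for &(froxel, object_id) in chunk {
                    shared.push(froxel, object_id);
                }
            });
        }
    });
    lists
}
```

The order of elements within a list depends on thread scheduling, so a consumer of such a list should not rely on it. No equivalent two-instruction insertion exists for the tightly packed layout, since the final position of each index depends on the counts of everything placed before it.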

To fix this problem and prepare for GPU clustering, this patch changes the data structure used for clustering so that each froxel instead has one singly linked list per clusterable object type. The offset and counts structure is changed to five linked list heads that point to offsets in the heap. Each element in the heap is a pair that contains the ID of a clusterable object and the offset in the heap of the next pair in the list. Each list is terminated by `0xffffffffu`.
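The new layout can be pictured with the following minimal Rust sketch. `HeapElement`, `FroxelListHeads`, and `walk_list` are hypothetical names chosen for illustration, not the types used in the patch; the sentinel value and the pair-per-element layout come from the description above.

```rust
/// Terminator value for every list, as stated in the description.
const LIST_END: u32 = 0xffff_ffff;

/// Hypothetical heap element: an object ID plus the heap offset of the
/// next element in the same list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HeapElement {
    object_id: u32,
    next: u32,
}

/// Hypothetical per-froxel record: one list head per clusterable object
/// type (point lights, spot lights, reflection probes, irradiance volumes,
/// decals).
struct FroxelListHeads {
    heads: [u32; 5],
}

/// Follows `next` links from `head` until the terminator is reached and
/// returns the object IDs in list order.
fn walk_list(heap: &[HeapElement], head: u32) -> Vec<u32> {
    let mut ids = Vec::new();
    let mut cursor = head;
    while cursor != LIST_END {
        let element = heap[cursor as usize];
        ids.push(element.object_id);
        cursor = element.next;
    }
    ids
}
```

An empty list is simply a head equal to the terminator, so no separate count is needed.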

The CPU clustering code is unchanged; `assign_objects_to_clusters` still creates offsets and counts in the same way. During extraction, the offset-and-count model is converted to a linked list. The reason for this is that the uniform cluster data structure (as opposed to the storage cluster data structure), which is still used on WebGL 2, needs to remain tightly packed because uniform space is still at a premium. It was easier to keep the code identical for now than to add complexity to `assign_objects_to_clusters`. Unfortunately, the shader code did incur a fair bit of complexity through added `#ifdef`s; when we drop WebGL 2, these can be removed.
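One way such a conversion could look is sketched below. This is not the patch's extraction code; `OffsetAndCounts`, `HeapElement`, and `to_linked_lists` are hypothetical names, and the sketch assumes the tightly packed index heap and per-froxel records described earlier.

```rust
const LIST_END: u32 = 0xffff_ffff;
const KIND_COUNT: usize = 5;

/// Hypothetical tightly packed per-froxel record.
struct OffsetAndCounts {
    offset: u32,
    counts: [u32; KIND_COUNT],
}

/// Hypothetical linked list element: object ID plus offset of the next one.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HeapElement {
    object_id: u32,
    next: u32,
}

/// Converts offsets and counts into per-froxel list heads plus a heap of
/// (object ID, next) pairs. Elements of one list end up adjacent in the
/// heap, each pointing at the slot that follows it.
fn to_linked_lists(
    indices: &[u32],
    froxels: &[OffsetAndCounts],
) -> (Vec<[u32; KIND_COUNT]>, Vec<HeapElement>) {
    let mut heads = Vec::with_capacity(froxels.len());
    let mut heap = Vec::new();
    for froxel in froxels {
        let mut froxel_heads = [LIST_END; KIND_COUNT];
        let mut start = froxel.offset as usize;
        for (kind, &count) in froxel.counts.iter().enumerate() {
            let end = start + count as usize;
            let first = heap.len() as u32;
            for (position, &object_id) in indices[start..end].iter().enumerate() {
                let here = heap.len() as u32;
                let is_last = position + 1 == count as usize;
                let next = if is_last { LIST_END } else { here + 1 };
                heap.push(HeapElement { object_id, next });
            }
            // An empty kind keeps the terminator as its head.
            if count > 0 {
                froxel_heads[kind] = first;
            }
            // The next kind's indices follow immediately in the packed heap.
            start = end;
        }
        heads.push(froxel_heads);
    }
    (heads, heap)
}
```

Doing the conversion at this stage leaves the tightly packed form available for the uniform path while only the storage path pays for the larger pair-based heap.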

I tested the relevant examples and verified that they're unchanged.

@pcwalton pcwalton requested review from atlv24 and tychedelia February 5, 2026 06:11
@pcwalton pcwalton added the A-Rendering Drawing game state to the screen label Feb 5, 2026
@github-project-automation github-project-automation bot moved this to Needs SME Triage in Rendering (2026 Proposal) Feb 5, 2026
@pcwalton pcwalton added S-Needs-Review Needs reviewer attention (from anyone!) to move forward C-Performance A change motivated by improving speed, memory usage or compile times labels Feb 5, 2026