From d0eb0a549fe8a70b0042578fa75905ca06982d97 Mon Sep 17 00:00:00 2001 From: spencer-lunarg Date: Sat, 14 Mar 2026 16:59:07 -0400 Subject: [PATCH 01/11] Add a chapter on Compute Shaders --- README.adoc | 2 + antora/modules/ROOT/nav.adoc | 1 + chapters/compute_shaders.adoc | 150 ++++++++++++++++++++++++++++++++++ 3 files changed, 153 insertions(+) create mode 100644 chapters/compute_shaders.adoc diff --git a/README.adoc b/README.adoc index 72d6f4d..7ab42d5 100644 --- a/README.adoc +++ b/README.adoc @@ -154,6 +154,8 @@ The Vulkan Guide content is also viewable from https://docs.vulkan.org/guide/lat === xref:{chapters}dynamic_state_map.adoc[Dynamic State Map] +== xref:{chapters}compute_shaders.adoc[Compute Shaders] + == xref:{chapters}subgroups.adoc[Subgroups] * `VK_EXT_subgroup_size_control`, `VK_KHR_shader_subgroup_extended_types`, `VK_EXT_shader_subgroup_ballot`, `VK_EXT_shader_subgroup_vote` diff --git a/antora/modules/ROOT/nav.adoc b/antora/modules/ROOT/nav.adoc index 6fd940a..c5662fe 100644 --- a/antora/modules/ROOT/nav.adoc +++ b/antora/modules/ROOT/nav.adoc @@ -60,6 +60,7 @@ ** xref:{chapters}robustness.adoc[] ** xref:{chapters}dynamic_state.adoc[] *** xref:{chapters}dynamic_state_map.adoc[] +** xref:{chapters}compute_shaders.adoc[] ** xref:{chapters}subgroups.adoc[] ** xref:{chapters}shader_memory_layout.adoc[] ** xref:{chapters}atomics.adoc[] diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc new file mode 100644 index 0000000..28a0b68 --- /dev/null +++ b/chapters/compute_shaders.adoc @@ -0,0 +1,150 @@ +// Copyright 2026 The Khronos Group, Inc. 
+// SPDX-License-Identifier: CC-BY-4.0 + +// Required for both single-page and combined guide xrefs to work +ifndef::chapters[:chapters:] +ifndef::images[:images: images/] + +// the [] in the URL messes up asciidoc +:max-compute-workgroup-size-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupSize[0]&platform=all +:max-compute-workgroup-count-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupCount[0]&platform=all + +[[compute-shaders]] += Compute Shaders + +This chapter is **not** a "how to use compute shader" article, there are plenty of resources online around GPGPU and compute. + +What this chapter is for is all the "Vulkan-ism", terms, etc that are associated with compute shaders. + +There is also a xref:{chapters}decoder_ring.adoc[Decoder Ring] created to help people transition from other APIs that use different terminology. + +[NOTE] +==== +If you want to play around with a simple compute example, suggest taking a look at the link:https://github.com/charles-lunarg/vk-bootstrap/blob/main/example/simple_compute.cpp[vk-bootstrap sample]. +==== + +== Coming from Vulkan Graphics + +For those who are more familiar with graphics in Vulkan, compute will be a simple transition. Basically everything is the same except: + +- No vertex buffers, render passes, swapchains needed +- Call `vkCmdDispatch` instead of `vkCmdDraw` +- Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines` +- Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT` +- When binding descriptors or pipeline to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE` + +== SPIR-V Terminology + +The smallest unit of work that is done is called an `invocation`. It is a "thread" or "lane" of work. + +`Invocations` are partitioned into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. 
(See more in the xref:{chapters}subgroups.adoc[subgroup chapter]) + +Next we have `workgroups` which is the smallest unit of work that an application can define. A `workgroup` is a collection of `invocations` that execute the same shader. + +[NOTE] +==== +While slightly annoying, Vulkan spec uses `WorkGroup` while the SPIR-V spec spells it as `Workgroup`. It has no significant meaning, other than a potential typo when going between the two. +==== + +=== Workgroup Size + +Setting the `workgroup` size can be done in 3 ways + +1. Using the `WorkgroupSize` built-in (link:https://godbolt.org/z/ees83eT7x[example]) +2. Using the `LocalSize` execution mode (link:https://godbolt.org/z/3zn1Preb8[example]) +3. Using the `LocalSizeId` execution mode (link:https://godbolt.org/z/dP7daqTas[example]) + +A few important things to note: + +- The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module. +- `LocalSizeId` was added in `VK_KHR_maintenance4` (Vulkan 1.3) to allow the ability to use specialization constants to set the size. +- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be (most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]) +- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be (most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]) + +=== Local and Global Workgroups + +When a `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables and issue memory and control flow barriers to synchronize with other members of the `local workgroup`. 
+ +[NOTE] +==== +There is a `maxComputeWorkGroupCount` limit link:{max-compute-workgroup-count-link}[some hardware] supports only 64k, but newer hardware can basically be unlimited here. +==== + +== Shared memory + +When inside a single `local workgroup` "shared memory" can be used. In SPIR-V this is referenced with the `Workgroup` storage class. + +Shared memory is essentially the "L1 cache you can control" in your compute shader and an important part of any performant shader. + +There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeSharedMemorySize&platform=all[mainly around 32k bytes]) that needs to be accounted for. + +=== Shared Memory Race Conditions + +It is very easy to get into race conditions with shared memory. + +The classic example is having multiple invocations initializing something to the same value. + +[source,glsl] +---- +shared uint my_var; +void main() { + // All the invocations in the workgroup are going to try to write to the same memory. + // RACE CONDITION + my_var = 0; +} +---- + +If you are asking "why?", the "technically correct" answer is "because the link:https://docs.vulkan.org/spec/latest/appendices/memorymodel.html[memory model] says so". + +When you do a weak store to a memory location, that invocation "owns" that memory location until synchronization occurs. The compiler **can** use that information and choose to reuse that location as temporary storage for another value. + +Luckily the "fix" is simple, make sure to use atomics + +[source,glsl] +---- +shared uint my_var; +void main() { + atomicStore(temp, 0u, gl_ScopeWorkgroup, 0, 0); +} +---- + +Another option is to use a `OpControlBarrier` with `Workgroup` scope (link:https://godbolt.org/z/WcsvjYfPx[see online]). 
+ +[source,glsl] +---- +layout(local_size_x = 32) in; // 32x1x1 workgroup +shared uint my_var[32]; // one slot for each invocation + +void main() { + my_var[gl_LocalInvocationIndex] = 0; + barrier(); // will generate an OpControlBarrier for you + uint x = my_var[gl_LocalInvocationIndex ^ 1]; +} +---- + +==== Detecting shared memory data races + +Luckily this problem can be caught automatically using the link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/gpu_validation.md[GPU-AV] feature in Vulkan Validation Layers! + +As of March 2026 (TODO - Add SDK version when released in May) when using GPU-AV, it will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but highly suggest trying out if having strange issues around your shared memory accesses. + +=== Explicit Layout of shared memory + +The xref:{chapters}extensions/shader_features.adoc#VK_KHR_workgroup_memory_explicit_layout[VK_KHR_workgroup_memory_explicit_layout] extension was added to allow link:https://github.com/KhronosGroup/SPIRV-Guide/blob/main/chapters/explicit_layout.md[explicit layout] of shared memory. + +== Finding the invocation in your shader + +There are many SPIR-V built-in values that can be used to find the invocation in your shader. + +The following built-ins are well defined in the link:https://docs.vulkan.org/spec/latest/chapters/interfaces.html#interfaces-builtin-variables[builtin chapter] of the Vulkan spec. + +- `GlobalInvocationId` +- `LocalInvocationId` +- `LocalInvocationIndex` +- `NumSubgroups` +- `NumWorkgroups` +- `SubgroupId` +- `WorkgroupId` + +For those who want a more "hands on" example, link:https://godbolt.org/z/qhPrE6o5b[the following GLSL] demonstrates using most of these built-ins. 
+ From aff6fdb47a7ee278df6ce69e7474dad702b1945b Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:14 -0700 Subject: [PATCH 02/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 28a0b68..ddda741 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -31,7 +31,7 @@ For those who are more familiar with graphics in Vulkan, compute will be a simpl - Call `vkCmdDispatch` instead of `vkCmdDraw` - Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines` - Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT` -- When binding descriptors or pipeline to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE` +- When binding descriptors and pipelines to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE` == SPIR-V Terminology From 9ccfb65988f65efc38de8e7b7a3cd2671becfee0 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:28 -0700 Subject: [PATCH 03/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index ddda741..1174ad9 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -37,7 +37,7 @@ For those who are more familiar with graphics in Vulkan, compute will be a simpl The smallest unit of work that is done is called an `invocation`. It is a "thread" or "lane" of work. -`Invocations` are partitioned into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. 
(See more in the xref:{chapters}subgroups.adoc[subgroup chapter]) +Multiple `Invocations` are organized into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. (See more in the xref:{chapters}subgroups.adoc[subgroup chapter]) Next we have `workgroups` which is the smallest unit of work that an application can define. A `workgroup` is a collection of `invocations` that execute the same shader. From c12d90720cd046c9af10ccd4b961a62be4e679e1 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:39 -0700 Subject: [PATCH 04/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 1174ad9..8258e51 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -57,7 +57,7 @@ Setting the `workgroup` size can be done in 3 ways A few important things to note: - The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module. -- `LocalSizeId` was added in `VK_KHR_maintenance4` (Vulkan 1.3) to allow the ability to use specialization constants to set the size. +- `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size. 
- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be (most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]) - There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be (most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]) From 2c985f4df9494dd3a5d4eb88ada6adb52d15fb87 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:59 -0700 Subject: [PATCH 05/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 8258e51..b8baa9e 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -58,8 +58,8 @@ A few important things to note: - The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module. - `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size. -- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be (most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]) -- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be (most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]) +- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be. 
Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension] +- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024] === Local and Global Workgroups From 07b2e8b41de5c7b10b534e5a23bdf52acc5b87d2 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:35:16 -0700 Subject: [PATCH 06/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index b8baa9e..a3851e0 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -63,7 +63,7 @@ A few important things to note: === Local and Global Workgroups -When a `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables and issue memory and control flow barriers to synchronize with other members of the `local workgroup`. +When `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables as well as issue memory and control flow barriers to synchronize with other members of the `local workgroup`. 
[NOTE] ==== From f236747198f8b573bd9ccbaf71524e4cd8ea5c79 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:35:58 -0700 Subject: [PATCH 07/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> From ed173ead4455d249579002202e5ff97e67a1b8d7 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:36:15 -0700 Subject: [PATCH 08/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index a3851e0..3920bf4 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -80,7 +80,7 @@ There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/d === Shared Memory Race Conditions -It is very easy to get into race conditions with shared memory. +It is very easy to have race conditions when using shared memory. The classic example is having multiple invocations initializing something to the same value. From 080fd5a2c3f758368e39b1079dc5b7fd2c808e04 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:36:30 -0700 Subject: [PATCH 09/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 3920bf4..6bf8346 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -82,7 +82,7 @@ There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/d It is very easy to have race conditions when using shared memory. -The classic example is having multiple invocations initializing something to the same value. 
+The classic example is when multiple invocations initialize something to the same value.
 
 [source,glsl]
 ----
 shared uint my_var;
 void main() {

From 167676c43e0cfbb9144149f662325493c3bd7b8a Mon Sep 17 00:00:00 2001
From: Steven Winston
Date: Fri, 20 Mar 2026 14:36:47 -0700
Subject: [PATCH 10/11] Update chapters/compute_shaders.adoc

Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com>
---
 chapters/compute_shaders.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc
index 6bf8346..a59dc30 100644
--- a/chapters/compute_shaders.adoc
+++ b/chapters/compute_shaders.adoc
@@ -126,7 +126,7 @@
 Luckily this problem can be caught automatically using the link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/gpu_validation.md[GPU-AV] feature in Vulkan Validation Layers!
 
-As of March 2026 (TODO - Add SDK version when released in May) when using GPU-AV, it will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but highly suggest trying out if having strange issues around your shared memory accesses.
+As of March 2026 (TODO - Add SDK version when released in May), GPU-AV will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but we highly suggest trying it out if you are seeing strange issues around your shared memory accesses.
 === Explicit Layout of shared memory

From e1c6ece30db88c5e4e5762b919a561e083c3a588 Mon Sep 17 00:00:00 2001
From: spencer-lunarg
Date: Sun, 22 Mar 2026 11:19:01 -0400
Subject: [PATCH 11/11] final review

---
 chapters/compute_shaders.adoc | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc
index a59dc30..718b397 100644
--- a/chapters/compute_shaders.adoc
+++ b/chapters/compute_shaders.adoc
@@ -27,7 +27,6 @@ If you want to play around with a simple compute example, suggest taking a look
 
 For those who are more familiar with graphics in Vulkan, compute will be a simple transition. Basically everything is the same except:
 
-- No vertex buffers, render passes, swapchains needed
 - Call `vkCmdDispatch` instead of `vkCmdDraw`
 - Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines`
 - Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT`
@@ -58,8 +57,8 @@ A few important things to note:
 
 - The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module.
 - `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size.
-- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be. Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]
-- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]
+- There is a `maxComputeWorkGroupSize` limit on how large the `X`, `Y`, and `Z` size can each be. Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension].
+- There is a `maxComputeWorkGroupInvocations` limit on how large the product of `X` * `Y` * `Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024].
 
 === Local and Global Workgroups
 
@@ -70,6 +69,26 @@ When `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. Th
 There is a `maxComputeWorkGroupCount` limit link:{max-compute-workgroup-count-link}[some hardware] supports only 64k, but newer hardware can basically be unlimited here.
 ====
 
+=== Dispatching size from a buffer
+
+The `vkCmdDispatchIndirect` command (and the newer `vkCmdDispatchIndirect2KHR`) allows the number of workgroups to be read from a buffer. This means the GPU itself can set the number of workgroups to dispatch.
+
+[source,cpp]
+----
+// A compute dispatch (or any other work) first writes a VkDispatchIndirectCommand into my_buffer
+vkCmdDispatch(command_buffer, 1, 1, 1);
+
+// Make the shader write visible to the indirect command read
+VkMemoryBarrier barrier = {VK_STRUCTURE_TYPE_MEMORY_BARRIER, NULL,
+                           VK_ACCESS_SHADER_WRITE_BIT,           // srcAccessMask
+                           VK_ACCESS_INDIRECT_COMMAND_READ_BIT}; // dstAccessMask
+vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+                     VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, 0, 1, &barrier, 0, NULL, 0, NULL);
+
+// Reads the VkDispatchIndirectCommand at offset 0 of my_buffer to set the number of workgroups
+vkCmdDispatchIndirect(command_buffer, my_buffer, 0);
+----
+
 == Shared memory
 
 When inside a single `local workgroup` "shared memory" can be used. In SPIR-V this is referenced with the `Workgroup` storage class.
@@ -98,13 +117,13 @@ If you are asking "why?", the "technically correct" answer is "because the link:
 
 When you do a weak store to a memory location, that invocation "owns" that memory location until synchronization occurs. The compiler **can** use that information and choose to reuse that location as temporary storage for another value.
-Luckily the "fix" is simple, make sure to use atomics
+Luckily the fix is simple: make sure to use atomics.
 
 [source,glsl]
 ----
 shared uint my_var;
 void main() {
-    atomicStore(temp, 0u, gl_ScopeWorkgroup, 0, 0);
+    atomicStore(my_var, 0u, gl_ScopeWorkgroup, 0, 0);
 }
 ----