2 changes: 2 additions & 0 deletions README.adoc
@@ -154,6 +154,8 @@ The Vulkan Guide content is also viewable from https://docs.vulkan.org/guide/lat

=== xref:{chapters}dynamic_state_map.adoc[Dynamic State Map]

== xref:{chapters}compute_shaders.adoc[Compute Shaders]

== xref:{chapters}subgroups.adoc[Subgroups]

* `VK_EXT_subgroup_size_control`, `VK_KHR_shader_subgroup_extended_types`, `VK_EXT_shader_subgroup_ballot`, `VK_EXT_shader_subgroup_vote`
1 change: 1 addition & 0 deletions antora/modules/ROOT/nav.adoc
@@ -60,6 +60,7 @@
** xref:{chapters}robustness.adoc[]
** xref:{chapters}dynamic_state.adoc[]
*** xref:{chapters}dynamic_state_map.adoc[]
** xref:{chapters}compute_shaders.adoc[]
** xref:{chapters}subgroups.adoc[]
** xref:{chapters}shader_memory_layout.adoc[]
** xref:{chapters}atomics.adoc[]
169 changes: 169 additions & 0 deletions chapters/compute_shaders.adoc
@@ -0,0 +1,169 @@
// Copyright 2026 The Khronos Group, Inc.
// SPDX-License-Identifier: CC-BY-4.0

// Required for both single-page and combined guide xrefs to work
ifndef::chapters[:chapters:]
ifndef::images[:images: images/]

// the [] in the URL messes up asciidoc
:max-compute-workgroup-size-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupSize[0]&platform=all
:max-compute-workgroup-count-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupCount[0]&platform=all

[[compute-shaders]]
= Compute Shaders

This chapter is **not** a "how to use compute shaders" tutorial; there are plenty of resources online covering GPGPU and compute.

What this chapter covers instead are the "Vulkan-isms", terminology, and other details associated with compute shaders.

There is also a xref:{chapters}decoder_ring.adoc[Decoder Ring] created to help people transition from other APIs that use different terminology.

[NOTE]
====
If you want to play around with a simple compute example, we suggest taking a look at the link:https://github.com/charles-lunarg/vk-bootstrap/blob/main/example/simple_compute.cpp[vk-bootstrap sample].
====

== Coming from Vulkan Graphics

For those who are already familiar with graphics in Vulkan, compute is a simple transition. Basically everything is the same, except:

- Call `vkCmdDispatch` instead of `vkCmdDraw`
- Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines`
- Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT`
- When binding descriptors and pipelines to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE`
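Putting those differences together, recording a compute dispatch might look like the following sketch (assuming `cmd`, `compute_pipeline`, `pipeline_layout`, and `descriptor_set` were created elsewhere):

[source,c]
----
// Bind with the COMPUTE bind point rather than GRAPHICS.
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, compute_pipeline);
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE,
                        pipeline_layout, 0, 1, &descriptor_set, 0, NULL);

// vkCmdDraw's counterpart: launch a 64 x 1 x 1 grid of workgroups.
vkCmdDispatch(cmd, 64, 1, 1);
----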

== SPIR-V Terminology

The smallest unit of work is called an `invocation`. It is a single "thread" or "lane" of work.

Multiple `invocations` are organized into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. (See the xref:{chapters}subgroups.adoc[subgroup chapter] for more.)

Next are `workgroups`, the smallest unit of work an application can dispatch. A `workgroup` is a collection of `invocations` that execute the same shader.

[NOTE]
====
While slightly annoying, the Vulkan spec uses `WorkGroup` while the SPIR-V spec spells it `Workgroup`. The difference has no significance, other than being a potential source of typos when going between the two.
====

=== Workgroup Size

Setting the `workgroup` size can be done in three ways:

1. Using the `WorkgroupSize` built-in (link:https://godbolt.org/z/ees83eT7x[example])
2. Using the `LocalSize` execution mode (link:https://godbolt.org/z/3zn1Preb8[example])
3. Using the `LocalSizeId` execution mode (link:https://godbolt.org/z/dP7daqTas[example])

A few important things to note:

- The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module.
- `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size.
- There is a `maxComputeWorkGroupSize` limit on how large the `X`, `Y`, and `Z` sizes can each be. Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension].
- There is a `maxComputeWorkGroupInvocations` limit on how large the product `X * Y * Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024].
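
For example, in GLSL the specialization-constant form (option 3) looks like the following sketch (the `constant_id` values 0 through 2 are arbitrary choices):

[source,glsl]
----
#version 450

// Compiles to the LocalSizeId execution mode; the actual sizes are
// supplied at pipeline creation through VkSpecializationInfo.
// A fixed size (option 2) would instead be: layout(local_size_x = 8) in;
layout(local_size_x_id = 0, local_size_y_id = 1, local_size_z_id = 2) in;

void main() {}
----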

=== Local and Global Workgroups

When `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables as well as issue memory and control flow barriers to synchronize with other members of the `local workgroup`.

[NOTE]
====
There is a `maxComputeWorkGroupCount` limit on the number of workgroups per dispatch. link:{max-compute-workgroup-count-link}[Some hardware] supports only 64k per dimension, but newer hardware is basically unlimited here.
====
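
When mapping a problem, such as an image, onto the global workgroup space, the workgroup count is usually a rounded-up division of the problem size by the workgroup size. A minimal host-side sketch (the helper name `group_count` is our own, not a Vulkan API):

[source,c]
----
#include <stdint.h>

// Round up so that every work item is covered by at least one invocation.
// Any extra invocations in the last workgroup must be masked off in the shader.
static uint32_t group_count(uint32_t work_items, uint32_t local_size)
{
    return (work_items + local_size - 1) / local_size;
}
----

With an 8x8x1 workgroup size, a 1920x1080 image would then be dispatched as `vkCmdDispatch(cmd, group_count(1920, 8), group_count(1080, 8), 1)`.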

=== Dispatching size from a buffer

`vkCmdDispatchIndirect` (and the newer `vkCmdDispatchIndirect2KHR`) allows the dispatch size to be read from a buffer. This means the GPU itself can set the number of workgroups to dispatch.

[source,c]
----
// Some earlier dispatch (or draw) writes the VkDispatchIndirectCommand
// into my_buffer on the GPU.
vkCmdDispatch(command_buffer, x, y, z);

// Make the shader writes visible to the indirect command read.
VkMemoryBarrier barrier = {
    .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT,
};
vkCmdPipelineBarrier(command_buffer,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, // src stage
                     VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,  // dst stage
                     0, 1, &barrier, 0, NULL, 0, NULL);

// Reads the VkDispatchIndirectCommand in the buffer to set the
// number of local workgroups.
vkCmdDispatchIndirect(command_buffer, my_buffer, 0);
----

== Shared memory

Within a single `local workgroup`, "shared memory" can be used. In SPIR-V this is referenced with the `Workgroup` storage class.

Shared memory is essentially the "L1 cache you can control" in your compute shader and an important part of any performant shader.

There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeSharedMemorySize&platform=all[mainly around 32k bytes]) that needs to be accounted for.
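
Before looking at what can go wrong, here is a sketch of shared memory used for cross-invocation communication: a workgroup-wide sum reduction (the buffer binding and the size of 64 are illustrative choices):

[source,glsl]
----
#version 450

layout(local_size_x = 64) in;

layout(set = 0, binding = 0) buffer Data { float values[]; };

shared float temp[64];

void main() {
    uint tid = gl_LocalInvocationIndex;
    temp[tid] = values[gl_GlobalInvocationID.x];

    // Butterfly reduction: each pass adds in a partner's value,
    // halving the stride every iteration.
    for (uint s = 64u / 2u; s > 0u; s >>= 1u) {
        barrier(); // wait for all writes to temp
        float other = temp[tid ^ s];
        barrier(); // wait for all reads before overwriting
        temp[tid] += other;
    }

    // Every invocation now holds the workgroup's total in temp[tid].
    if (tid == 0u) {
        values[gl_WorkGroupID.x] = temp[0];
    }
}
----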

=== Shared Memory Race Conditions

It is very easy to have race conditions when using shared memory.

The classic example is when multiple invocations initialize something to the same value.

[source,glsl]
----
shared uint my_var;

void main() {
    // All the invocations in the workgroup are going to try to
    // write to the same memory location.
    my_var = 0; // RACE CONDITION
}
----

If you are asking "why?", the "technically correct" answer is "because the link:https://docs.vulkan.org/spec/latest/appendices/memorymodel.html[memory model] says so".

When you do a weak store to a memory location, that invocation "owns" that memory location until synchronization occurs. The compiler **can** use that information and choose to reuse that location as temporary storage for another value.

Luckily, the fix is simple: use atomics.

[source,glsl]
----
#extension GL_KHR_memory_scope_semantics : enable

shared uint my_var;

void main() {
    // A relaxed atomic store is not a data race, even though every
    // invocation writes the same value.
    atomicStore(my_var, 0u, gl_ScopeWorkgroup, 0, 0);
}
----

Another option is to use an `OpControlBarrier` with `Workgroup` scope (link:https://godbolt.org/z/WcsvjYfPx[see online]).

[source,glsl]
----
layout(local_size_x = 32) in; // 32x1x1 workgroup
shared uint my_var[32];       // one slot for each invocation

void main() {
    my_var[gl_LocalInvocationIndex] = 0;
    barrier(); // will generate an OpControlBarrier for you
    uint x = my_var[gl_LocalInvocationIndex ^ 1];
}
----

==== Detecting shared memory data races

Luckily this problem can be caught automatically using the link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/gpu_validation.md[GPU-AV] feature in Vulkan Validation Layers!

As of March 2026 (TODO - Add SDK version when released in May), GPU-AV will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but we highly suggest trying it out if you are seeing strange issues around your shared memory accesses.

=== Explicit Layout of shared memory

The xref:{chapters}extensions/shader_features.adoc#VK_KHR_workgroup_memory_explicit_layout[VK_KHR_workgroup_memory_explicit_layout] extension was added to allow link:https://github.com/KhronosGroup/SPIRV-Guide/blob/main/chapters/explicit_layout.md[explicit layout] of shared memory.

== Finding the invocation in your shader

There are many SPIR-V built-in values that can be used to identify the current invocation in your shader.

The following built-ins are well defined in the link:https://docs.vulkan.org/spec/latest/chapters/interfaces.html#interfaces-builtin-variables[builtin chapter] of the Vulkan spec.

- `GlobalInvocationId`
- `LocalInvocationId`
- `LocalInvocationIndex`
- `NumSubgroups`
- `NumWorkgroups`
- `SubgroupId`
- `WorkgroupId`

For those who want a more "hands on" example, link:https://godbolt.org/z/qhPrE6o5b[the following GLSL] demonstrates using most of these built-ins.
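
These built-ins are related by simple arithmetic: `GlobalInvocationId` is `WorkgroupId * WorkgroupSize + LocalInvocationId`, and `LocalInvocationIndex` is the flattening of `LocalInvocationId` with `X` varying fastest. A small host-side sketch of these two relationships (the struct and function names are our own):

[source,c]
----
#include <stdint.h>

typedef struct { uint32_t x, y, z; } uvec3;

// GlobalInvocationId = WorkgroupId * WorkgroupSize + LocalInvocationId
static uvec3 global_invocation_id(uvec3 wg_id, uvec3 wg_size, uvec3 local_id)
{
    uvec3 id = { wg_id.x * wg_size.x + local_id.x,
                 wg_id.y * wg_size.y + local_id.y,
                 wg_id.z * wg_size.z + local_id.z };
    return id;
}

// LocalInvocationIndex flattens LocalInvocationId with X varying fastest.
static uint32_t local_invocation_index(uvec3 wg_size, uvec3 local_id)
{
    return local_id.z * wg_size.x * wg_size.y
         + local_id.y * wg_size.x
         + local_id.x;
}
----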
