From d0eb0a549fe8a70b0042578fa75905ca06982d97 Mon Sep 17 00:00:00 2001 From: spencer-lunarg Date: Sat, 14 Mar 2026 16:59:07 -0400 Subject: [PATCH 01/11] Add a chapter on Compute Shaders --- README.adoc | 2 + antora/modules/ROOT/nav.adoc | 1 + chapters/compute_shaders.adoc | 150 ++++++++++++++++++++++++++++++++++ 3 files changed, 153 insertions(+) create mode 100644 chapters/compute_shaders.adoc diff --git a/README.adoc b/README.adoc index 72d6f4d..7ab42d5 100644 --- a/README.adoc +++ b/README.adoc @@ -154,6 +154,8 @@ The Vulkan Guide content is also viewable from https://docs.vulkan.org/guide/lat === xref:{chapters}dynamic_state_map.adoc[Dynamic State Map] +== xref:{chapters}compute_shaders.adoc[Compute Shaders] + == xref:{chapters}subgroups.adoc[Subgroups] * `VK_EXT_subgroup_size_control`, `VK_KHR_shader_subgroup_extended_types`, `VK_EXT_shader_subgroup_ballot`, `VK_EXT_shader_subgroup_vote` diff --git a/antora/modules/ROOT/nav.adoc b/antora/modules/ROOT/nav.adoc index 6fd940a..c5662fe 100644 --- a/antora/modules/ROOT/nav.adoc +++ b/antora/modules/ROOT/nav.adoc @@ -60,6 +60,7 @@ ** xref:{chapters}robustness.adoc[] ** xref:{chapters}dynamic_state.adoc[] *** xref:{chapters}dynamic_state_map.adoc[] +** xref:{chapters}compute_shaders.adoc[] ** xref:{chapters}subgroups.adoc[] ** xref:{chapters}shader_memory_layout.adoc[] ** xref:{chapters}atomics.adoc[] diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc new file mode 100644 index 0000000..28a0b68 --- /dev/null +++ b/chapters/compute_shaders.adoc @@ -0,0 +1,150 @@ +// Copyright 2026 The Khronos Group, Inc. 
+// SPDX-License-Identifier: CC-BY-4.0 + +// Required for both single-page and combined guide xrefs to work +ifndef::chapters[:chapters:] +ifndef::images[:images: images/] + +// the [] in the URL messes up asciidoc +:max-compute-workgroup-size-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupSize[0]&platform=all +:max-compute-workgroup-count-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupCount[0]&platform=all + +[[compute-shaders]] += Compute Shaders + +This chapter is **not** a "how to use compute shader" article, there are plenty of resources online around GPGPU and compute. + +What this chapter is for is all the "Vulkan-ism", terms, etc that are associated with compute shaders. + +There is also a xref:{chapters}decoder_ring.adoc[Decoder Ring] created to help people transition from other APIs that use different terminology. + +[NOTE] +==== +If you want to play around with a simple compute example, suggest taking a look at the link:https://github.com/charles-lunarg/vk-bootstrap/blob/main/example/simple_compute.cpp[vk-bootstrap sample]. +==== + +== Coming from Vulkan Graphics + +For those who are more familiar with graphics in Vulkan, compute will be a simple transition. Basically everything is the same except: + +- No vertex buffers, render passes, swapchains needed +- Call `vkCmdDispatch` instead of `vkCmdDraw` +- Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines` +- Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT` +- When binding descriptors or pipeline to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE` + +== SPIR-V Terminology + +The smallest unit of work that is done is called an `invocation`. It is a "thread" or "lane" of work. + +`Invocations` are partitioned into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. 
(See more in the xref:{chapters}subgroups.adoc[subgroup chapter]) + +Next we have `workgroups` which is the smallest unit of work that an application can define. A `workgroup` is a collection of `invocations` that execute the same shader. + +[NOTE] +==== +While slightly annoying, Vulkan spec uses `WorkGroup` while the SPIR-V spec spells it as `Workgroup`. It has no significant meaning, other than a potential typo when going between the two. +==== + +=== Workgroup Size + +Setting the `workgroup` size can be done in 3 ways + +1. Using the `WorkgroupSize` built-in (link:https://godbolt.org/z/ees83eT7x[example]) +2. Using the `LocalSize` execution mode (link:https://godbolt.org/z/3zn1Preb8[example]) +3. Using the `LocalSizeId` execution mode (link:https://godbolt.org/z/dP7daqTas[example]) + +A few important things to note: + +- The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module. +- `LocalSizeId` was added in `VK_KHR_maintenance4` (Vulkan 1.3) to allow the ability to use specialization constants to set the size. +- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be (most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]) +- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be (most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]) + +=== Local and Global Workgroups + +When a `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables and issue memory and control flow barriers to synchronize with other members of the `local workgroup`. 
+ +[NOTE] +==== +There is a `maxComputeWorkGroupCount` limit link:{max-compute-workgroup-count-link}[some hardware] supports only 64k, but newer hardware can basically be unlimited here. +==== + +== Shared memory + +When inside a single `local workgroup` "shared memory" can be used. In SPIR-V this is referenced with the `Workgroup` storage class. + +Shared memory is essentially the "L1 cache you can control" in your compute shader and an important part of any performant shader. + +There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeSharedMemorySize&platform=all[mainly around 32k bytes]) that needs to be accounted for. + +=== Shared Memory Race Conditions + +It is very easy to get into race conditions with shared memory. + +The classic example is having multiple invocations initializing something to the same value. + +[source,glsl] +---- +shared uint my_var; +void main() { + // All the invocations in the workgroup are going to try to write to the same memory. + // RACE CONDITION + my_var = 0; +} +---- + +If you are asking "why?", the "technically correct" answer is "because the link:https://docs.vulkan.org/spec/latest/appendices/memorymodel.html[memory model] says so". + +When you do a weak store to a memory location, that invocation "owns" that memory location until synchronization occurs. The compiler **can** use that information and choose to reuse that location as temporary storage for another value. + +Luckily the "fix" is simple, make sure to use atomics + +[source,glsl] +---- +shared uint my_var; +void main() { + atomicStore(temp, 0u, gl_ScopeWorkgroup, 0, 0); +} +---- + +Another option is to use a `OpControlBarrier` with `Workgroup` scope (link:https://godbolt.org/z/WcsvjYfPx[see online]). 
+ +[source,glsl] +---- +layout(local_size_x = 32) in; // 32x1x1 workgroup +shared uint my_var[32]; // one slot for each invocation + +void main() { + my_var[gl_LocalInvocationIndex] = 0; + barrier(); // will generate an OpControlBarrier for you + uint x = my_var[gl_LocalInvocationIndex ^ 1]; +} +---- + +==== Detecting shared memory data races + +Luckily this problem can be caught automatically using the link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/gpu_validation.md[GPU-AV] feature in Vulkan Validation Layers! + +As of March 2026 (TODO - Add SDK version when released in May) when using GPU-AV, it will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but highly suggest trying out if having strange issues around your shared memory accesses. + +=== Explicit Layout of shared memory + +The xref:{chapters}extensions/shader_features.adoc#VK_KHR_workgroup_memory_explicit_layout[VK_KHR_workgroup_memory_explicit_layout] extension was added to allow link:https://github.com/KhronosGroup/SPIRV-Guide/blob/main/chapters/explicit_layout.md[explicit layout] of shared memory. + +== Finding the invocation in your shader + +There are many SPIR-V built-in values that can be used to find the invocation in your shader. + +The following built-ins are well defined in the link:https://docs.vulkan.org/spec/latest/chapters/interfaces.html#interfaces-builtin-variables[builtin chapter] of the Vulkan spec. + +- `GlobalInvocationId` +- `LocalInvocationId` +- `LocalInvocationIndex` +- `NumSubgroups` +- `NumWorkgroups` +- `SubgroupId` +- `WorkgroupId` + +For those who want a more "hands on" example, link:https://godbolt.org/z/qhPrE6o5b[the following GLSL] demonstrates using most of these built-ins. 
+ From aff6fdb47a7ee278df6ce69e7474dad702b1945b Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:14 -0700 Subject: [PATCH 02/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 28a0b68..ddda741 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -31,7 +31,7 @@ For those who are more familiar with graphics in Vulkan, compute will be a simpl - Call `vkCmdDispatch` instead of `vkCmdDraw` - Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines` - Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT` -- When binding descriptors or pipeline to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE` +- When binding descriptors and pipelines to your command buffer, make sure to use `VK_PIPELINE_BIND_POINT_COMPUTE` == SPIR-V Terminology From 9ccfb65988f65efc38de8e7b7a3cd2671becfee0 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:28 -0700 Subject: [PATCH 03/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index ddda741..1174ad9 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -37,7 +37,7 @@ For those who are more familiar with graphics in Vulkan, compute will be a simpl The smallest unit of work that is done is called an `invocation`. It is a "thread" or "lane" of work. -`Invocations` are partitioned into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. 
(See more in the xref:{chapters}subgroups.adoc[subgroup chapter]) +Multiple `Invocations` are organized into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. (See more in the xref:{chapters}subgroups.adoc[subgroup chapter]) Next we have `workgroups` which is the smallest unit of work that an application can define. A `workgroup` is a collection of `invocations` that execute the same shader. From c12d90720cd046c9af10ccd4b961a62be4e679e1 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:39 -0700 Subject: [PATCH 04/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 1174ad9..8258e51 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -57,7 +57,7 @@ Setting the `workgroup` size can be done in 3 ways A few important things to note: - The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module. -- `LocalSizeId` was added in `VK_KHR_maintenance4` (Vulkan 1.3) to allow the ability to use specialization constants to set the size. +- `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size. 
- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be (most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]) - There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be (most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]) From 2c985f4df9494dd3a5d4eb88ada6adb52d15fb87 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:34:59 -0700 Subject: [PATCH 05/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 8258e51..b8baa9e 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -58,8 +58,8 @@ A few important things to note: - The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module. - `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size. -- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be (most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]) -- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be (most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]) +- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be. 
Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension] +- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024] === Local and Global Workgroups From 07b2e8b41de5c7b10b534e5a23bdf52acc5b87d2 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:35:16 -0700 Subject: [PATCH 06/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index b8baa9e..a3851e0 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -63,7 +63,7 @@ A few important things to note: === Local and Global Workgroups -When a `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables and issue memory and control flow barriers to synchronize with other members of the `local workgroup`. +When `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space that the GPU will work on. Each single workgroup is a `local workgroup`. An `invocation` within a `local workgroup` can share data with other members of the `local workgroup` through shared variables as well as issue memory and control flow barriers to synchronize with other members of the `local workgroup`. 
[NOTE] ==== From f236747198f8b573bd9ccbaf71524e4cd8ea5c79 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:35:58 -0700 Subject: [PATCH 07/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> From ed173ead4455d249579002202e5ff97e67a1b8d7 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:36:15 -0700 Subject: [PATCH 08/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index a3851e0..3920bf4 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -80,7 +80,7 @@ There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/d === Shared Memory Race Conditions -It is very easy to get into race conditions with shared memory. +It is very easy to have race conditions when using shared memory. The classic example is having multiple invocations initializing something to the same value. From 080fd5a2c3f758368e39b1079dc5b7fd2c808e04 Mon Sep 17 00:00:00 2001 From: Steven Winston Date: Fri, 20 Mar 2026 14:36:30 -0700 Subject: [PATCH 09/11] Update chapters/compute_shaders.adoc Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com> --- chapters/compute_shaders.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc index 3920bf4..6bf8346 100644 --- a/chapters/compute_shaders.adoc +++ b/chapters/compute_shaders.adoc @@ -82,7 +82,7 @@ There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/d It is very easy to have race conditions when using shared memory. -The classic example is having multiple invocations initializing something to the same value. 
+The classic example is when multiple invocations initialize something to the same value.
 
 [source,glsl]
 ----
 shared uint my_var;
 void main() {

From 167676c43e0cfbb9144149f662325493c3bd7b8a Mon Sep 17 00:00:00 2001
From: Steven Winston
Date: Fri, 20 Mar 2026 14:36:47 -0700
Subject: [PATCH 10/11] Update chapters/compute_shaders.adoc

Co-authored-by: Charles Giessen <46324611+charles-lunarg@users.noreply.github.com>
---
 chapters/compute_shaders.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc
index 6bf8346..a59dc30 100644
--- a/chapters/compute_shaders.adoc
+++ b/chapters/compute_shaders.adoc
@@ -126,7 +126,7 @@
 Luckily this problem can be caught automatically using the link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/gpu_validation.md[GPU-AV] feature in Vulkan Validation Layers!
 
-As of March 2026 (TODO - Add SDK version when released in May) when using GPU-AV, it will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but highly suggest trying out if having strange issues around your shared memory accesses.
+As of March 2026 (TODO - Add SDK version when released in May), GPU-AV will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but we highly suggest trying it out if you are seeing strange issues around your shared memory accesses.
 === Explicit Layout of shared memory

From e1c6ece30db88c5e4e5762b919a561e083c3a588 Mon Sep 17 00:00:00 2001
From: spencer-lunarg
Date: Sun, 22 Mar 2026 11:19:01 -0400
Subject: [PATCH 11/11] final review

---
 chapters/compute_shaders.adoc | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/chapters/compute_shaders.adoc b/chapters/compute_shaders.adoc
index a59dc30..718b397 100644
--- a/chapters/compute_shaders.adoc
+++ b/chapters/compute_shaders.adoc
@@ -27,7 +27,6 @@ If you want to play around with a simple compute example, suggest taking a look
 
 For those who are more familiar with graphics in Vulkan, compute will be a simple transition. Basically everything is the same except:
 
-- No vertex buffers, render passes, swapchains needed
 - Call `vkCmdDispatch` instead of `vkCmdDraw`
 - Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines`
 - Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT`
@@ -58,8 +57,8 @@ A few important things to note:
 
 - The `WorkgroupSize` decoration will take precedence over any `LocalSize` or `LocalSizeId` in the same module.
 - `LocalSizeId` was added in the extension `VK_KHR_maintenance4` (made core in Vulkan 1.3) to allow the ability to use specialization constants to set the size.
-- There is a `maxComputeWorkGroupSize` limit how large the `X`, `Y`, and `Z` size can each be. Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension]
-- There is a `maxComputeWorkGroupInvocations` limit how large the product of `X` * `Y` * `Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024]
+- There is a `maxComputeWorkGroupSize` limit on how large the `X`, `Y`, and `Z` size can each be. Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension].
+- There is a `maxComputeWorkGroupInvocations` limit on how large the product of `X` * `Y` * `Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024].
 
 === Local and Global Workgroups
 
@@ -70,6 +69,26 @@ When `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. Th
 There is a `maxComputeWorkGroupCount` limit link:{max-compute-workgroup-count-link}[some hardware] supports only 64k, but newer hardware can basically be unlimited here.
 ====
 
+=== Dispatching size from a buffer
+
+The `vkCmdDispatchIndirect` command (and the newer `vkCmdDispatchIndirect2KHR`) allows the number of workgroups to be read from a buffer. This means the GPU itself can set the number of workgroups to dispatch.
+
+[source,cpp]
+----
+// A compute dispatch (or any other work) first writes a VkDispatchIndirectCommand into my_buffer
+vkCmdDispatch(command_buffer, 1, 1, 1);
+
+// Make the shader write visible to the indirect command read
+VkMemoryBarrier barrier = {VK_STRUCTURE_TYPE_MEMORY_BARRIER, NULL,
+                           VK_ACCESS_SHADER_WRITE_BIT,           // srcAccessMask
+                           VK_ACCESS_INDIRECT_COMMAND_READ_BIT}; // dstAccessMask
+vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+                     VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, 0, 1, &barrier, 0, NULL, 0, NULL);
+
+// Reads the VkDispatchIndirectCommand at offset 0 of my_buffer to set the number of workgroups
+vkCmdDispatchIndirect(command_buffer, my_buffer, 0);
+----
+
 == Shared memory
 
 When inside a single `local workgroup` "shared memory" can be used. In SPIR-V this is referenced with the `Workgroup` storage class.
@@ -98,13 +117,13 @@ If you are asking "why?", the "technically correct" answer is "because the link:
 
 When you do a weak store to a memory location, that invocation "owns" that memory location until synchronization occurs. The compiler **can** use that information and choose to reuse that location as temporary storage for another value.
-Luckily the "fix" is simple, make sure to use atomics
+Luckily the fix is simple: make sure to use atomics.
 
 [source,glsl]
 ----
 shared uint my_var;
 void main() {
-    atomicStore(temp, 0u, gl_ScopeWorkgroup, 0, 0);
+    atomicStore(my_var, 0u, gl_ScopeWorkgroup, 0, 0);
 }
 ----