Add a chapter on Compute Shaders #366

// Copyright 2026 The Khronos Group, Inc.
// SPDX-License-Identifier: CC-BY-4.0

// Required for both single-page and combined guide xrefs to work
ifndef::chapters[:chapters:]
ifndef::images[:images: images/]

// the [] in the URL messes up asciidoc
:max-compute-workgroup-size-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupSize[0]&platform=all
:max-compute-workgroup-count-link: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupCount[0]&platform=all

[[compute-shaders]]
= Compute Shaders

This chapter is **not** a "how to use compute shaders" tutorial; there are plenty of resources online about GPGPU and compute.

Instead, this chapter covers the "Vulkan-isms", terms, and concepts that are associated with compute shaders.

There is also a xref:{chapters}decoder_ring.adoc[Decoder Ring] created to help people transition from other APIs that use different terminology.

[NOTE]
====
If you want to play around with a simple compute example, we suggest taking a look at the link:https://github.com/charles-lunarg/vk-bootstrap/blob/main/example/simple_compute.cpp[vk-bootstrap sample].
====

== Coming from Vulkan Graphics

For those who are more familiar with graphics in Vulkan, compute is a simple transition. Almost everything is the same, except:

- Call `vkCmdDispatch` instead of `vkCmdDraw`
- Use `vkCreateComputePipelines` instead of `vkCreateGraphicsPipelines`
- Make sure your `VkQueue` xref:{chapters}queues.adoc[supports] `VK_QUEUE_COMPUTE_BIT`
- When binding descriptors and pipelines to your command buffer, use `VK_PIPELINE_BIND_POINT_COMPUTE`

== SPIR-V Terminology

The smallest unit of work is called an `invocation`; it is a single "thread" or "lane" of work.

Multiple `invocations` are organized into `subgroups`, where `invocations` within a `subgroup` can synchronize and share data with each other efficiently. (See the xref:{chapters}subgroups.adoc[subgroup chapter] for more.)

Next we have `workgroups`, the smallest unit of work that an application can define. A `workgroup` is a collection of `invocations` that execute the same shader.

[NOTE]
====
Somewhat confusingly, the Vulkan spec spells it `WorkGroup` while the SPIR-V spec spells it `Workgroup`. The difference has no significance, but watch for typos when moving between the two.
====

=== Workgroup Size

Setting the `workgroup` size can be done in one of three ways:
1. Using the `WorkgroupSize` built-in (link:https://godbolt.org/z/ees83eT7x[example])
2. Using the `LocalSize` execution mode (link:https://godbolt.org/z/3zn1Preb8[example])
3. Using the `LocalSizeId` execution mode (link:https://godbolt.org/z/dP7daqTas[example])

A few important things to note:

- The `WorkgroupSize` built-in will take precedence over any `LocalSize` or `LocalSizeId` in the same module.
- `LocalSizeId` was added in the `VK_KHR_maintenance4` extension (made core in Vulkan 1.3) to allow specialization constants to set the size.
- The `maxComputeWorkGroupSize` limit caps how large the `X`, `Y`, and `Z` sizes can each be. Most implementations {max-compute-workgroup-size-link}[support around 1024 for each dimension].
- The `maxComputeWorkGroupInvocations` limit caps how large the product `X * Y * Z` can be. Most implementations link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations&platform=all[support around 1024].
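
To make the two limits above concrete, here is a minimal CPU-side sketch that validates a candidate workgroup size. The helper name `workgroup_size_is_valid` and the hard-coded limit values are hypothetical; real code would read `maxComputeWorkGroupSize` and `maxComputeWorkGroupInvocations` from `VkPhysicalDeviceLimits`.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: checks a candidate workgroup size against the two
// limits discussed above. In real code these come from VkPhysicalDeviceLimits;
// 1024 is only a common value, not a guarantee.
static bool workgroup_size_is_valid(uint32_t x, uint32_t y, uint32_t z,
                                    const uint32_t max_size[3],
                                    uint32_t max_invocations) {
    if (x == 0 || y == 0 || z == 0) return false;
    // maxComputeWorkGroupSize limits each dimension individually
    if (x > max_size[0] || y > max_size[1] || z > max_size[2]) return false;
    // maxComputeWorkGroupInvocations limits the product X * Y * Z
    return (uint64_t)x * y * z <= max_invocations;
}

int main(void) {
    const uint32_t max_size[3] = {1024, 1024, 64}; // a common set of limits

    assert(workgroup_size_is_valid(256, 1, 1, max_size, 1024));
    assert(workgroup_size_is_valid(32, 32, 1, max_size, 1024));   // 32*32 == 1024, ok
    assert(!workgroup_size_is_valid(64, 64, 1, max_size, 1024));  // product 4096 > 1024
    assert(!workgroup_size_is_valid(2048, 1, 1, max_size, 1024)); // exceeds a dimension
    return 0;
}
----

Note that both checks are needed: a size of 64x64x1 passes the per-dimension limit but fails the invocation-count limit.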

=== Local and Global Workgroups

When `vkCmdDispatch` is called, it sets the number of workgroups to dispatch. This produces a `global workgroup` space for the GPU to work on. Each single workgroup within it is a `local workgroup`. An `invocation` within a `local workgroup` can share data with the other members of its `local workgroup` through shared variables, as well as issue memory and control flow barriers to synchronize with them.
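
A common pattern when choosing the dispatch counts is to ceil-divide the problem size by the local workgroup size, then guard the excess invocations inside the shader. A small sketch (the `group_count` helper and the sizes are illustrative, not from the chapter):

[source,c]
----
#include <assert.h>
#include <stdint.h>

// Ceil-divide the problem size by the local workgroup size to get the
// global workgroup count for one dimension.
static uint32_t group_count(uint32_t elements, uint32_t local_size) {
    return (elements + local_size - 1) / local_size;
}

int main(void) {
    // e.g. a local_size_x = 64 shader processing 1000 elements needs 16
    // workgroups; the shader must guard the last 24 extra invocations
    // (if (gl_GlobalInvocationID.x >= 1000) return;).
    assert(group_count(1000, 64) == 16);
    assert(group_count(1024, 64) == 16);
    assert(group_count(1025, 64) == 17);

    // The result would then feed the dispatch, e.g.:
    // vkCmdDispatch(command_buffer, group_count(1000, 64), 1, 1);
    return 0;
}
----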

[NOTE]
====
There is a `maxComputeWorkGroupCount` limit on the number of workgroups per dimension. link:{max-compute-workgroup-count-link}[Some hardware] supports only 64k, but newer hardware is effectively unlimited here.
====

=== Dispatching size from a buffer

`vkCmdDispatchIndirect` (and the newer `vkCmdDispatchIndirect2KHR`) allows the dispatch size to be read from a buffer. This means the GPU itself can set the number of workgroups to dispatch.

[source,c]
----
// Some other draw/dispatch that writes a VkDispatchIndirectCommand into my_buffer
vkCmdDispatch(command_buffer, 1, 1, 1);

// Make the compute shader writes visible to the indirect command read
VkMemoryBarrier barrier = {
    .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT,
};
vkCmdPipelineBarrier(command_buffer,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, // src stage
                     VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,  // dst stage
                     0,                                    // dependency flags
                     1, &barrier, 0, NULL, 0, NULL);

// Reads the VkDispatchIndirectCommand in my_buffer to set the number of local workgroups
vkCmdDispatchIndirect(command_buffer, my_buffer, 0);
----
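
The indirect buffer must contain a `VkDispatchIndirectCommand`: three tightly packed `uint32_t` workgroup counts. A minimal sketch of the layout, using a local stand-in struct and a plain byte array in place of a mapped `VkBuffer`:

[source,c]
----
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Mirrors the layout of VkDispatchIndirectCommand:
// three tightly packed uint32_t workgroup counts.
typedef struct {
    uint32_t x, y, z;
} DispatchIndirectCommand;

int main(void) {
    // The GPU (or CPU) writes these 12 bytes into the indirect buffer;
    // vkCmdDispatchIndirect then reads them as the workgroup counts.
    DispatchIndirectCommand cmd = { .x = 16, .y = 1, .z = 1 };
    assert(sizeof(cmd) == 3 * sizeof(uint32_t));

    uint8_t buffer[sizeof(cmd)];   // stand-in for the mapped VkBuffer memory
    memcpy(buffer, &cmd, sizeof(cmd));

    DispatchIndirectCommand readback;
    memcpy(&readback, buffer, sizeof(readback));
    assert(readback.x == 16 && readback.y == 1 && readback.z == 1);
    return 0;
}
----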

== Shared memory

Within a single `local workgroup`, "shared memory" can be used. In SPIR-V this is referenced with the `Workgroup` storage class.

Shared memory is essentially the "L1 cache you can control" in your compute shader and an important part of any performant shader.

There is a `maxComputeSharedMemorySize` limit (link:https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeSharedMemorySize&platform=all[commonly around 32 KiB]) that needs to be accounted for.
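
A quick back-of-the-envelope check of a shader's shared memory budget against that limit (the 32 KiB value and the array sizes here are illustrative; query `VkPhysicalDeviceLimits` for the real number):

[source,c]
----
#include <assert.h>
#include <stdint.h>

int main(void) {
    // A common (but not guaranteed) value of maxComputeSharedMemorySize
    const uint32_t max_shared = 32 * 1024;

    // `shared vec4 data[1024];` costs 1024 elements * 4 floats * 4 bytes = 16 KiB
    uint32_t bytes = 1024 * 4 * 4;
    assert(bytes == 16384);
    assert(bytes <= max_shared);      // fits

    // `shared vec4 data[4096];` would cost 64 KiB and exceed the limit
    assert(4096u * 4 * 4 > max_shared);
    return 0;
}
----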

=== Shared Memory Race Conditions

It is very easy to create race conditions when using shared memory.

The classic example is multiple invocations initializing something to the same value.

[source,glsl]
----
shared uint my_var;
void main() {
    // All invocations in the workgroup try to write to the same memory.
    // RACE CONDITION
    my_var = 0;
}
----

If you are asking "why?", the "technically correct" answer is "because the link:https://docs.vulkan.org/spec/latest/appendices/memorymodel.html[memory model] says so".

When you do a weak (non-atomic) store to a memory location, that invocation "owns" the location until synchronization occurs. The compiler **can** use that information and choose to reuse the location as temporary storage for another value.

Luckily, the fix is simple: use atomics.

[source,glsl]
----
#extension GL_KHR_memory_scope_semantics : enable
shared uint my_var;
void main() {
    atomicStore(my_var, 0u, gl_ScopeWorkgroup, 0, 0);
}
----
|
|
||
| Another option is to use a `OpControlBarrier` with `Workgroup` scope (link:https://godbolt.org/z/WcsvjYfPx[see online]). | ||
|
|
||

[source,glsl]
----
layout(local_size_x = 32) in; // 32x1x1 workgroup
shared uint my_var[32];       // one slot for each invocation

void main() {
    my_var[gl_LocalInvocationIndex] = 0;
    barrier(); // will generate an OpControlBarrier for you
    uint x = my_var[gl_LocalInvocationIndex ^ 1];
}
----

==== Detecting shared memory data races

Luckily, this problem can be caught automatically using the link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/gpu_validation.md[GPU-AV] feature in the Vulkan Validation Layers!

As of March 2026 (TODO - Add SDK version when released in May), GPU-AV will attempt to detect these races for you. There are a link:https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/layers/gpuav/shaders/instrumentation/shared_memory_data_race.comp#L47[few limitations], but we highly suggest trying it out if you are seeing strange issues around your shared memory accesses.

=== Explicit Layout of shared memory

The xref:{chapters}extensions/shader_features.adoc#VK_KHR_workgroup_memory_explicit_layout[VK_KHR_workgroup_memory_explicit_layout] extension was added to allow link:https://github.com/KhronosGroup/SPIRV-Guide/blob/main/chapters/explicit_layout.md[explicit layout] of shared memory.

== Finding the invocation in your shader

There are many SPIR-V built-in values that can be used to find the current invocation in your shader.

The following built-ins are well defined in the link:https://docs.vulkan.org/spec/latest/chapters/interfaces.html#interfaces-builtin-variables[built-in chapter] of the Vulkan spec.

- `GlobalInvocationId`
- `LocalInvocationId`
- `LocalInvocationIndex`
- `NumSubgroups`
- `NumWorkgroups`
- `SubgroupId`
- `WorkgroupId`

For those who want a more "hands on" example, link:https://godbolt.org/z/qhPrE6o5b[the following GLSL] demonstrates using most of these built-ins.
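
As a worked example of how these built-ins relate, the spec defines `GlobalInvocationId = WorkgroupId * WorkgroupSize + LocalInvocationId`, and `LocalInvocationIndex` as the flattened form of `LocalInvocationId`. A small CPU-side sketch of that arithmetic (the example sizes are arbitrary):

[source,c]
----
#include <assert.h>
#include <stdint.h>

// Relationships between the compute built-ins:
//   GlobalInvocationId   = WorkgroupId * WorkgroupSize + LocalInvocationId
//   LocalInvocationIndex = z * size_x * size_y + y * size_x + x
int main(void) {
    const uint32_t size_x = 8, size_y = 8;            // an 8x8x1 workgroup
    const uint32_t wg_x = 2, wg_y = 3;                // WorkgroupId = (2, 3, 0)
    const uint32_t local_x = 5, local_y = 1, local_z = 0;

    uint32_t global_x = wg_x * size_x + local_x;      // 2*8 + 5 = 21
    uint32_t global_y = wg_y * size_y + local_y;      // 3*8 + 1 = 25
    uint32_t local_index = local_z * size_x * size_y
                         + local_y * size_x
                         + local_x;                   // 0 + 8 + 5 = 13

    assert(global_x == 21);
    assert(global_y == 25);
    assert(local_index == 13);
    return 0;
}
----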