feat: Add @typegpu/sort scaffolding with simple bitonic sort implementation (#2142)
📊 Bundle Size Comparison

👀 Notable results
Static test results: no major changes.
Dynamic test results: no major changes.
cieplypolar left a comment
Amazing work! Love the bitwise operations within the sorting kernel.
Left some nits.
In the future, we could optimize this further, for example by using workgroup shared memory.
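The bitwise partner indexing the review praises can be illustrated with a CPU reference implementation. This is a sketch for intuition only, not the PR's actual kernel code: the GPU version runs the inner loop body across invocations, one compare-exchange per thread.

```typescript
// CPU reference of bitonic sort, mirroring the bitwise indexing a GPU
// kernel would use. Illustrative sketch only; not the PR's kernel.
function bitonicSortCpu(data: number[]): number[] {
  const out = [...data];
  const n = out.length; // assumed to be a power of two
  for (let k = 2; k <= n; k <<= 1) {       // stage: size of bitonic runs
    for (let j = k >> 1; j > 0; j >>= 1) { // pass: compare distance
      for (let i = 0; i < n; i++) {
        const partner = i ^ j;             // XOR yields the compare partner
        if (partner > i) {
          const ascending = (i & k) === 0; // stage bit picks direction
          if ((out[i] > out[partner]) === ascending) {
            [out[i], out[partner]] = [out[partner], out[i]];
          }
        }
      }
    }
  }
  return out;
}
```

On a GPU, each iteration of the inner loop maps to one compute invocation, which is why the partner index must be derivable from the invocation id with pure bit operations.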
apps/typegpu-docs/src/examples/algorithms/bitonic-sort/index.ts — 3 outdated review threads (resolved)
General remark: I don't remember the maximum WGSL buffer size, but using only one dimension of the compute grid seems limiting. I believe we utilized all three dimensions during the development of the parallel scan.
The dimensions don't really matter. We should get good occupancy with 256 threads in a workgroup regardless of whether it's 256×1 or 16×16.
I think it's not about occupancy, but about the limits imposed by maxComputeWorkgroupSizeX and maxComputeWorkgroupsPerDimension (which, at worst, is only 2^24, so approximately 16 million).
Great catch! Added decomposition logic for big arrays and increased the buffer sizes in the examples.
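The decomposition described here could look like the following. This is a hypothetical sketch, assuming a default per-dimension dispatch limit of 65535; the PR's actual decomposeWorkgroups helper may differ.

```typescript
// Split a flat workgroup count into an (x, y) dispatch grid so that
// neither dimension exceeds maxComputeWorkgroupsPerDimension.
// Hypothetical sketch; not necessarily the PR's implementation.
function decomposeWorkgroups(
  total: number,
  maxPerDim = 65535,
): { x: number; y: number } {
  if (total <= maxPerDim) {
    return { x: total, y: 1 };
  }
  const y = Math.ceil(total / maxPerDim);
  const x = Math.ceil(total / y);
  // x * y may slightly overshoot total, so the kernel must
  // bounds-check its flattened invocation index.
  return { x, y };
}
```

The overshoot (x · y ≥ total) is the usual trade-off of grid decomposition: the kernel recomputes a flat index from (x, y) and early-returns when it falls past the array end.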
apps/typegpu-docs/src/examples/algorithms/bitonic-sort/decomposeWorkgroups.ts — outdated review thread (resolved)
apps/typegpu-docs/src/examples/algorithms/bitonic-sort/index.ts — outdated review thread (resolved)
Co-authored-by: Szymon Szulc <103948576+cieplypolar@users.noreply.github.com>
Pull request overview
This PR introduces a new @typegpu/sort workspace package that bundles GPU sorting (bitonic sort) plus the existing scan/prefix-scan implementation, and updates docs/examples/tests to consume the new package while removing @typegpu/concurrent-scan.
Changes:
- Added @typegpu/sort package scaffolding and a bitonic sort implementation (with padding + comparator slot support).
- Migrated/rehomed scan/prefix-scan APIs into @typegpu/sort (including renaming initCache → createPrefixScanComputer).
- Updated the TypeGPU docs app sandbox module mapping and examples/tests to use @typegpu/sort; removed the old @typegpu/concurrent-scan package references.
Reviewed changes
Copilot reviewed 29 out of 32 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pnpm-lock.yaml | Removes @typegpu/concurrent-scan workspace link and adds @typegpu/sort. |
| packages/typegpu/tests/utils/extendedIt.ts | Extends WebGPU adapter mock limits with maxBufferSize for new example needs. |
| packages/typegpu/tests/examples/individual/bitonic-sort.test.ts | Adds an example shader-generation snapshot test for the new bitonic sort example. |
| packages/typegpu-sort/tsconfig.json | New package TS config. |
| packages/typegpu-sort/src/scan/types.ts | Introduces BinaryOp type in its own module. |
| packages/typegpu-sort/src/scan/schemas.ts | Updates TypeGPU imports and keeps scan bind group layouts/slots. |
| packages/typegpu-sort/src/scan/prefixScan.ts | Renames cache initializer API and adjusts imports; core scan logic remains. |
| packages/typegpu-sort/src/scan/index.ts | Re-exports scan APIs/types. |
| packages/typegpu-sort/src/scan/compute/shared.ts | Consolidates TypeGPU imports for scan compute helpers. |
| packages/typegpu-sort/src/scan/compute/scan.ts | Consolidates TypeGPU imports for scan kernel generation. |
| packages/typegpu-sort/src/scan/compute/applySums.ts | Consolidates TypeGPU imports for scan “apply sums” kernel. |
| packages/typegpu-sort/src/index.ts | Public package surface for bitonic + scan exports. |
| packages/typegpu-sort/src/bitonic/utils.ts | Adds nextPowerOf2 and dispatch-grid decomposition helper. |
| packages/typegpu-sort/src/bitonic/types.ts | Defines public sorter options/run options/interfaces. |
| packages/typegpu-sort/src/bitonic/slots.ts | Defines comparator slot and default comparator. |
| packages/typegpu-sort/src/bitonic/index.ts | Bitonic module exports. |
| packages/typegpu-sort/src/bitonic/bitonicSort.ts | Implements the bitonic sorter (padding copy, step kernel, timestamps). |
| packages/typegpu-sort/package.json | Renames/defines the new package metadata and export map. |
| packages/typegpu-sort/deno.json | Adjusts Deno fmt exclusions (adds dist). |
| packages/typegpu-sort/build.config.ts | Updates unbuild config default export shape. |
| packages/typegpu-sort/README.md | Adds initial usage docs for bitonic sort and prefix scan. |
| packages/typegpu-concurrent-scan/src/index.ts | Removes legacy package entrypoint export. |
| packages/typegpu-concurrent-scan/README.md | Removes legacy package README. |
| apps/typegpu-docs/src/utils/examples/sandboxModules.ts | Updates sandbox module routing from @typegpu/concurrent-scan to @typegpu/sort. |
| apps/typegpu-docs/src/examples/tests/prefix-scan/index.ts | Switches scan imports to @typegpu/sort. |
| apps/typegpu-docs/src/examples/tests/prefix-scan/functions.ts | Switches BinaryOp import to @typegpu/sort. |
| apps/typegpu-docs/src/examples/algorithms/concurrent-chart/calculator.ts | Updates API usage to createPrefixScanComputer from @typegpu/sort. |
| apps/typegpu-docs/src/examples/algorithms/bitonic-sort/meta.json | Adds metadata for the new bitonic sort example. |
| apps/typegpu-docs/src/examples/algorithms/bitonic-sort/index.ts | Adds the bitonic sort interactive example implementation. |
| apps/typegpu-docs/src/examples/algorithms/bitonic-sort/index.html | Adds the example HTML + overlay UI. |
| apps/typegpu-docs/package.json | Swaps dependency from @typegpu/concurrent-scan to @typegpu/sort. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
```ts
const originalSize = data.dataType.elementCount;
const paddedSize = nextPowerOf2(originalSize);
const wasPadded = paddedSize !== originalSize;

const paddingValue = options?.paddingValue ?? 0xffffffff;
const compareFunc = options?.compare ?? defaultCompare;
```
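For reference, a nextPowerOf2 like the one this excerpt calls can be written with the classic bit-smearing trick. This is a sketch; the actual utility in packages/typegpu-sort/src/bitonic/utils.ts may be implemented differently.

```typescript
// Round n up to the nearest power of two (for 1 <= n <= 2^31) by
// smearing the highest set bit of n - 1 across all lower bits.
// Sketch only; the @typegpu/sort utility may differ.
function nextPowerOf2(n: number): number {
  let v = n - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}
```

Padding to a power of two matters here because bitonic sort's compare network is only defined for power-of-two lengths; the padded slots are filled with the sentinel paddingValue so they sink to the end.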
createBitonicSorter derives originalSize from data.dataType.elementCount, but TypeGPU uses elementCount === 0 for runtime-sized arrays (e.g. d.arrayOf(d.u32)), which would make the sorter silently treat the input as length 0/1 and compute incorrect dispatch/padding. Consider validating elementCount > 0 (throw a clear error) and/or changing the API to accept an explicit length / require a fixed-size d.arrayOf(d.u32, N) buffer type so this can’t happen accidentally.
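The suggested guard could be a one-line validation at the top of createBitonicSorter. The helper name and error wording below are hypothetical; only the elementCount === 0 convention for runtime-sized arrays comes from the comment above.

```typescript
// Reject runtime-sized arrays, whose elementCount is 0 in TypeGPU,
// before any padding/dispatch math runs on a bogus length.
// Hypothetical sketch of the validation suggested above.
function assertFixedSize(elementCount: number): number {
  if (!Number.isInteger(elementCount) || elementCount <= 0) {
    throw new Error(
      'createBitonicSorter requires a fixed-size array type ' +
        '(e.g. d.arrayOf(d.u32, N)); got elementCount = ' + elementCount,
    );
  }
  return elementCount;
}
```

Failing loudly here is preferable to the silent length-0 path, since the sorter would otherwise "succeed" while sorting nothing.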
```ts
const maxBufferSize = await navigator.gpu.requestAdapter().then((adapter) => {
  if (!adapter) {
    throw new Error('No GPU adapter found');
  }
  const limits = adapter.limits;
  return Math.min(limits.maxStorageBufferBindingSize, limits.maxBufferSize);
});

const root = await tgpu.init({
  device: {
    optionalFeatures: ['timestamp-query'],
    requiredLimits: {
      maxStorageBufferBindingSize: maxBufferSize,
      maxBufferSize: maxBufferSize,
    },
  },
});
```
This example calls navigator.gpu.requestAdapter() to compute limits, but then tgpu.init() will call navigator.gpu.requestAdapter() again internally. If the browser returns a different adapter the second time, the requiredLimits computed from the first adapter may exceed the second adapter’s limits and cause initialization to fail. Prefer requesting the device once (using the adapter you queried) and passing it to tgpu.initFromDevice, or otherwise ensuring the same adapter/device is used for both limit discovery and initialization.
Suggested change:

```diff
-const maxBufferSize = await navigator.gpu.requestAdapter().then((adapter) => {
-  if (!adapter) {
-    throw new Error('No GPU adapter found');
-  }
-  const limits = adapter.limits;
-  return Math.min(limits.maxStorageBufferBindingSize, limits.maxBufferSize);
-});
-const root = await tgpu.init({
-  device: {
-    optionalFeatures: ['timestamp-query'],
-    requiredLimits: {
-      maxStorageBufferBindingSize: maxBufferSize,
-      maxBufferSize: maxBufferSize,
-    },
-  },
-});
+const adapter = await navigator.gpu.requestAdapter();
+if (!adapter) {
+  throw new Error('No GPU adapter found');
+}
+const limits = adapter.limits;
+const maxBufferSize = Math.min(
+  limits.maxStorageBufferBindingSize,
+  limits.maxBufferSize,
+);
+const requiredFeatures: GPUFeatureName[] = [];
+if (adapter.features.has('timestamp-query' as GPUFeatureName)) {
+  requiredFeatures.push('timestamp-query');
+}
+const device = await adapter.requestDevice({
+  requiredLimits: {
+    maxStorageBufferBindingSize: maxBufferSize,
+    maxBufferSize: maxBufferSize,
+  },
+  requiredFeatures,
+});
+const root = await tgpu.initFromDevice(device);
```
It also eats the concurrent-scan library.