
feat: Add @typegpu/sort scaffolding with simple bitonic sort implementation#2142

Open
reczkok wants to merge 31 commits into main from feat/tgpu-sort

Conversation

@reczkok
Contributor

@reczkok reczkok commented Feb 3, 2026

It also eats the concurrent-scan library

@reczkok reczkok changed the title feat: Add @typegpu/sort scaffolding with simple bitonic sort implementation feat: Add @typegpu/sort scaffolding with simple bitonic sort implementation Feb 3, 2026
@github-actions

github-actions bot commented Feb 3, 2026

📊 Bundle Size Comparison

🟢 Decreased: 0 · ➖ Unchanged: 345 · 🔴 Increased: 0 · ❔ Unknown: 0

👀 Notable results

Static test results:

No major changes.

Dynamic test results:

No major changes.

📋 All results

Results table (344 entries):
Test tsdown
dataImportEverything.ts 85.95 kB (➖)
dataImportOneDirect.ts 55.52 kB (➖)
dataImportOneStar.ts 55.52 kB (➖)
functionWithUseGpu.ts 282 B (➖)
functionWithoutUseGpu.ts 24 B (➖)
importEntireLibrary.ts 270.04 kB (➖)
stdImportEverything.ts 98.84 kB (➖)
stdImportOneDirect.ts 44.00 kB (➖)
stdImportOneStar.ts 44.00 kB (➖)
tgpuImportEverything.ts 254.94 kB (➖)
tgpuImportOne.ts 254.95 kB (➖)
MissingBindGroupsError from typegpu.ts 44.21 kB (➖)
MissingLinksError from typegpu.ts 44.17 kB (➖)
MissingSlotValueError from typegpu.ts 44.11 kB (➖)
MissingVertexBuffersError from typegpu.ts 44.22 kB (➖)
NotUniformError from typegpu.ts 44.17 kB (➖)
ResolutionError from typegpu.ts 44.41 kB (➖)
Void from typegpudata.ts 44.04 kB (➖)
abs from typegpustd.ts 92.81 kB (➖)
acos from typegpustd.ts 92.81 kB (➖)
acosh from typegpustd.ts 92.81 kB (➖)
add from typegpustd.ts 44.00 kB (➖)
align from typegpudata.ts 55.55 kB (➖)
alignmentOf from typegpudata.ts 55.52 kB (➖)
allEq from typegpustd.ts 92.80 kB (➖)
all from typegpustd.ts 92.80 kB (➖)
and from typegpustd.ts 92.79 kB (➖)
any from typegpustd.ts 92.81 kB (➖)
arrayLength from typegpustd.ts 92.80 kB (➖)
arrayOf from typegpudata.ts 55.49 kB (➖)
asin from typegpustd.ts 92.81 kB (➖)
asinh from typegpustd.ts 92.81 kB (➖)
atan2 from typegpustd.ts 92.81 kB (➖)
atan from typegpustd.ts 92.81 kB (➖)
atanh from typegpustd.ts 92.81 kB (➖)
atomicAdd from typegpustd.ts 92.81 kB (➖)
atomicAnd from typegpustd.ts 92.81 kB (➖)
atomicLoad from typegpustd.ts 92.79 kB (➖)
atomicMax from typegpustd.ts 92.81 kB (➖)
atomicMin from typegpustd.ts 92.81 kB (➖)
atomicOr from typegpustd.ts 92.81 kB (➖)
atomicStore from typegpustd.ts 92.80 kB (➖)
atomicSub from typegpustd.ts 92.81 kB (➖)
atomicXor from typegpustd.ts 92.80 kB (➖)
atomic from typegpudata.ts 44.73 kB (➖)
bitcastU32toF32 from typegpustd.ts 92.81 kB (➖)
bitcastU32toI32 from typegpustd.ts 92.81 kB (➖)
bool from typegpudata.ts 43.99 kB (➖)
builtin from typegpudata.ts 55.75 kB (➖)
ceil from typegpustd.ts 92.81 kB (➖)
clamp from typegpustd.ts 92.81 kB (➖)
common from typegpu.ts 63.13 kB (➖)
comparisonSampler from typegpudata.ts 44.71 kB (➖)
cos from typegpustd.ts 92.81 kB (➖)
cosh from typegpustd.ts 92.81 kB (➖)
countLeadingZeros from typegpustd.ts 92.81 kB (➖)
countOneBits from typegpustd.ts 92.81 kB (➖)
countTrailingZeros from typegpustd.ts 92.81 kB (➖)
cross from typegpustd.ts 92.81 kB (➖)
d from typegpu.ts 83.66 kB (➖)
deepEqual from typegpudata.ts 45.65 kB (➖)
degrees from typegpustd.ts 92.81 kB (➖)
determinant from typegpustd.ts 92.81 kB (➖)
disarrayOf from typegpudata.ts 44.65 kB (➖)
discard from typegpustd.ts 92.80 kB (➖)
distance from typegpustd.ts 92.80 kB (➖)
div from typegpustd.ts 44.00 kB (➖)
dot4I8Packed from typegpustd.ts 92.81 kB (➖)
dot4U8Packed from typegpustd.ts 92.80 kB (➖)
dot from typegpustd.ts 92.80 kB (➖)
dpdxCoarse from typegpustd.ts 92.81 kB (➖)
dpdxFine from typegpustd.ts 92.81 kB (➖)
dpdx from typegpustd.ts 92.80 kB (➖)
dpdyCoarse from typegpustd.ts 92.81 kB (➖)
dpdyFine from typegpustd.ts 92.81 kB (➖)
dpdy from typegpustd.ts 92.81 kB (➖)
eq from typegpustd.ts 92.80 kB (➖)
exp2 from typegpustd.ts 92.81 kB (➖)
exp from typegpustd.ts 92.81 kB (➖)
extensionEnabled from typegpustd.ts 92.81 kB (➖)
extractBits from typegpustd.ts 92.81 kB (➖)
f16 from typegpudata.ts 43.99 kB (➖)
f32 from typegpudata.ts 43.99 kB (➖)
faceForward from typegpustd.ts 92.81 kB (➖)
firstLeadingBit from typegpustd.ts 92.81 kB (➖)
firstTrailingBit from typegpustd.ts 92.81 kB (➖)
float16 from typegpudata.ts 55.50 kB (➖)
float16x2 from typegpudata.ts 55.50 kB (➖)
float16x4 from typegpudata.ts 55.50 kB (➖)
float32 from typegpudata.ts 55.50 kB (➖)
float32x2 from typegpudata.ts 55.50 kB (➖)
float32x3 from typegpudata.ts 55.50 kB (➖)
float32x4 from typegpudata.ts 55.50 kB (➖)
floor from typegpustd.ts 92.81 kB (➖)
fma from typegpustd.ts 92.81 kB (➖)
formatToWGSLType from typegpudata.ts 55.49 kB (➖)
fract from typegpustd.ts 92.80 kB (➖)
frexp from typegpustd.ts 92.80 kB (➖)
fwidthCoarse from typegpustd.ts 92.81 kB (➖)
fwidthFine from typegpustd.ts 92.81 kB (➖)
fwidth from typegpustd.ts 92.81 kB (➖)
ge from typegpustd.ts 92.80 kB (➖)
getLongestContiguousPrefix from typegpudata.ts 56.11 kB (➖)
gt from typegpustd.ts 92.81 kB (➖)
i32 from typegpudata.ts 43.99 kB (➖)
identity2 from typegpustd.ts 43.99 kB (➖)
identity3 from typegpustd.ts 43.99 kB (➖)
identity4 from typegpustd.ts 43.99 kB (➖)
insertBits from typegpustd.ts 92.81 kB (➖)
interpolate from typegpudata.ts 55.56 kB (➖)
invariant from typegpudata.ts 55.82 kB (➖)
inverseSqrt from typegpustd.ts 92.81 kB (➖)
isAccessor from typegpu.ts 44.04 kB (➖)
isAlignAttrib from typegpudata.ts 44.03 kB (➖)
isAtomic from typegpudata.ts 44.03 kB (➖)
isBufferShorthand from typegpu.ts 166.23 kB (➖)
isBuffer from typegpu.ts 165.79 kB (➖)
isBuiltinAttrib from typegpudata.ts 44.04 kB (➖)
isBuiltin from typegpudata.ts 55.49 kB (➖)
isCloseTo from typegpustd.ts 92.81 kB (➖)
isComparisonSampler from typegpu.ts 165.82 kB (➖)
isContiguous from typegpudata.ts 56.09 kB (➖)
isData from typegpudata.ts 44.73 kB (➖)
isDecorated from typegpudata.ts 43.99 kB (➖)
isDisarray from typegpudata.ts 44.04 kB (➖)
isInterpolateAttrib from typegpudata.ts 44.04 kB (➖)
isLazy from typegpu.ts 44.04 kB (➖)
isLocationAttrib from typegpudata.ts 44.04 kB (➖)
isLooseData from typegpudata.ts 44.08 kB (➖)
isLooseDecorated from typegpudata.ts 43.99 kB (➖)
isMutableAccessor from typegpu.ts 44.05 kB (➖)
isPackedData from typegpudata.ts 55.53 kB (➖)
isPtr from typegpudata.ts 43.99 kB (➖)
isSampler from typegpu.ts 165.81 kB (➖)
isSizeAttrib from typegpudata.ts 44.03 kB (➖)
isSlot from typegpu.ts 44.04 kB (➖)
isTexture from typegpu.ts 165.80 kB (➖)
isTgpuComputeFn from typegpu.ts 165.80 kB (➖)
isTgpuFn from typegpu.ts 165.80 kB (➖)
isTgpuFragmentFn from typegpu.ts 165.80 kB (➖)
isTgpuVertexFn from typegpu.ts 62.98 kB (➖)
isUnstruct from typegpudata.ts 44.04 kB (➖)
isUsableAsRender from typegpu.ts 165.79 kB (➖)
isUsableAsSampled from typegpu.ts 165.79 kB (➖)
isUsableAsStorage from typegpu.ts 165.79 kB (➖)
isUsableAsUniform from typegpu.ts 165.79 kB (➖)
isUsableAsVertex from typegpu.ts 165.79 kB (➖)
isVariable from typegpu.ts 165.78 kB (➖)
isWgslArray from typegpudata.ts 44.03 kB (➖)
isWgslData from typegpudata.ts 44.60 kB (➖)
isWgslStruct from typegpudata.ts 44.03 kB (➖)
ldexp from typegpustd.ts 92.81 kB (➖)
le from typegpustd.ts 92.81 kB (➖)
length from typegpustd.ts 92.80 kB (➖)
location from typegpudata.ts 55.49 kB (➖)
log2 from typegpustd.ts 92.81 kB (➖)
log from typegpustd.ts 92.81 kB (➖)
lt from typegpustd.ts 92.80 kB (➖)
mat2x2f from typegpudata.ts 43.99 kB (➖)
mat3x3f from typegpudata.ts 43.99 kB (➖)
mat4x4f from typegpudata.ts 43.99 kB (➖)
matToArray from typegpudata.ts 44.13 kB (➖)
max from typegpustd.ts 92.81 kB (➖)
memoryLayoutOf from typegpudata.ts 73.37 kB (➖)
min from typegpustd.ts 92.81 kB (➖)
mix from typegpustd.ts 92.80 kB (➖)
mod from typegpustd.ts 44.00 kB (➖)
modf from typegpustd.ts 92.80 kB (➖)
mul from typegpustd.ts 44.00 kB (➖)
ne from typegpustd.ts 92.80 kB (➖)
neg from typegpustd.ts 44.00 kB (➖)
normalize from typegpustd.ts 92.81 kB (➖)
not from typegpustd.ts 92.79 kB (➖)
or from typegpustd.ts 92.79 kB (➖)
pack2x16float from typegpustd.ts 92.81 kB (➖)
pack4x8unorm from typegpustd.ts 92.81 kB (➖)
packedFormats from typegpudata.ts 55.49 kB (➖)
pow from typegpustd.ts 92.81 kB (➖)
ptrFn from typegpudata.ts 44.04 kB (➖)
ptrHandle from typegpudata.ts 44.03 kB (➖)
ptrPrivate from typegpudata.ts 44.04 kB (➖)
ptrStorage from typegpudata.ts 44.04 kB (➖)
ptrUniform from typegpudata.ts 44.03 kB (➖)
ptrWorkgroup from typegpudata.ts 44.04 kB (➖)
quantizeToF16 from typegpustd.ts 92.81 kB (➖)
radians from typegpustd.ts 92.81 kB (➖)
ref from typegpudata.ts 44.00 kB (➖)
reflect from typegpustd.ts 92.81 kB (➖)
refract from typegpustd.ts 92.81 kB (➖)
reverseBits from typegpustd.ts 92.81 kB (➖)
rotateX4 from typegpustd.ts 92.80 kB (➖)
rotateY4 from typegpustd.ts 92.81 kB (➖)
rotateZ4 from typegpustd.ts 92.81 kB (➖)
rotationX4 from typegpustd.ts 43.99 kB (➖)
rotationY4 from typegpustd.ts 43.99 kB (➖)
rotationZ4 from typegpustd.ts 43.99 kB (➖)
round from typegpustd.ts 92.81 kB (➖)
sampler from typegpudata.ts 44.69 kB (➖)
saturate from typegpustd.ts 92.81 kB (➖)
scale4 from typegpustd.ts 92.80 kB (➖)
scaling4 from typegpustd.ts 43.99 kB (➖)
select from typegpustd.ts 92.81 kB (➖)
sign from typegpustd.ts 92.81 kB (➖)
sin from typegpustd.ts 92.81 kB (➖)
sinh from typegpustd.ts 92.81 kB (➖)
sint16 from typegpudata.ts 55.50 kB (➖)
sint16x2 from typegpudata.ts 55.50 kB (➖)
sint16x4 from typegpudata.ts 55.50 kB (➖)
sint32 from typegpudata.ts 55.50 kB (➖)
sint32x2 from typegpudata.ts 55.50 kB (➖)
sint32x3 from typegpudata.ts 55.50 kB (➖)
sint32x4 from typegpudata.ts 55.50 kB (➖)
sint8 from typegpudata.ts 55.50 kB (➖)
sint8x2 from typegpudata.ts 55.50 kB (➖)
sint8x4 from typegpudata.ts 55.50 kB (➖)
sizeOf from typegpudata.ts 55.52 kB (➖)
size from typegpudata.ts 55.55 kB (➖)
smoothstep from typegpustd.ts 92.81 kB (➖)
snorm16 from typegpudata.ts 55.50 kB (➖)
snorm16x2 from typegpudata.ts 55.50 kB (➖)
snorm16x4 from typegpudata.ts 55.50 kB (➖)
snorm8 from typegpudata.ts 55.50 kB (➖)
snorm8x2 from typegpudata.ts 55.50 kB (➖)
snorm8x4 from typegpudata.ts 55.50 kB (➖)
sqrt from typegpustd.ts 92.81 kB (➖)
std from typegpu.ts 96.15 kB (➖)
step from typegpustd.ts 92.81 kB (➖)
storageBarrier from typegpustd.ts 92.81 kB (➖)
struct from typegpudata.ts 46.10 kB (➖)
sub from typegpustd.ts 44.00 kB (➖)
subgroupAdd from typegpustd.ts 92.80 kB (➖)
subgroupAll from typegpustd.ts 92.81 kB (➖)
subgroupAnd from typegpustd.ts 92.81 kB (➖)
subgroupAny from typegpustd.ts 92.81 kB (➖)
subgroupBallot from typegpustd.ts 92.81 kB (➖)
subgroupBroadcastFirst from typegpustd.ts 92.81 kB (➖)
subgroupBroadcast from typegpustd.ts 92.81 kB (➖)
subgroupElect from typegpustd.ts 92.81 kB (➖)
subgroupExclusiveAdd from typegpustd.ts 92.81 kB (➖)
subgroupExclusiveMul from typegpustd.ts 92.81 kB (➖)
subgroupInclusiveAdd from typegpustd.ts 92.81 kB (➖)
subgroupInclusiveMul from typegpustd.ts 92.81 kB (➖)
subgroupMax from typegpustd.ts 92.81 kB (➖)
subgroupMin from typegpustd.ts 92.81 kB (➖)
subgroupMul from typegpustd.ts 92.81 kB (➖)
subgroupOr from typegpustd.ts 92.81 kB (➖)
subgroupShuffleDown from typegpustd.ts 92.81 kB (➖)
subgroupShuffleUp from typegpustd.ts 92.81 kB (➖)
subgroupShuffleXor from typegpustd.ts 92.81 kB (➖)
subgroupShuffle from typegpustd.ts 92.81 kB (➖)
subgroupXor from typegpustd.ts 92.81 kB (➖)
tan from typegpustd.ts 92.81 kB (➖)
tanh from typegpustd.ts 92.81 kB (➖)
texture1d from typegpudata.ts 44.44 kB (➖)
texture2dArray from typegpudata.ts 44.45 kB (➖)
texture2d from typegpudata.ts 44.44 kB (➖)
texture3d from typegpudata.ts 44.44 kB (➖)
textureBarrier from typegpustd.ts 92.80 kB (➖)
textureCubeArray from typegpudata.ts 44.46 kB (➖)
textureCube from typegpudata.ts 44.44 kB (➖)
textureDepth2dArray from typegpudata.ts 44.44 kB (➖)
textureDepth2d from typegpudata.ts 44.42 kB (➖)
textureDepthCubeArray from typegpudata.ts 44.45 kB (➖)
textureDepthCube from typegpudata.ts 44.43 kB (➖)
textureDepthMultisampled2d from typegpudata.ts 44.45 kB (➖)
textureDimensions from typegpustd.ts 92.80 kB (➖)
textureExternal from typegpudata.ts 44.17 kB (➖)
textureGather from typegpustd.ts 92.80 kB (➖)
textureLoad from typegpustd.ts 92.81 kB (➖)
textureMultisampled2d from typegpudata.ts 44.46 kB (➖)
textureSampleBaseClampToEdge from typegpustd.ts 92.80 kB (➖)
textureSampleBias from typegpustd.ts 92.81 kB (➖)
textureSampleCompareLevel from typegpustd.ts 92.81 kB (➖)
textureSampleCompare from typegpustd.ts 92.81 kB (➖)
textureSampleLevel from typegpustd.ts 92.81 kB (➖)
textureSample from typegpustd.ts 92.81 kB (➖)
textureStorage1d from typegpudata.ts 44.34 kB (➖)
textureStorage2dArray from typegpudata.ts 44.35 kB (➖)
textureStorage2d from typegpudata.ts 44.34 kB (➖)
textureStorage3d from typegpudata.ts 44.34 kB (➖)
textureStore from typegpustd.ts 92.81 kB (➖)
tgpu.accessor from typegpu.ts 254.95 kB (➖)
tgpu.bindGroupLayout from typegpu.ts 254.96 kB (➖)
tgpu.comptime from typegpu.ts 254.95 kB (➖)
tgpu.computeFn from typegpu.ts 254.95 kB (➖)
tgpu.const from typegpu.ts 254.95 kB (➖)
tgpu.fn from typegpu.ts 254.94 kB (➖)
tgpu.fragmentFn from typegpu.ts 254.95 kB (➖)
tgpu.initFromDevice from typegpu.ts 254.95 kB (➖)
tgpu.init from typegpu.ts 254.94 kB (➖)
tgpu.lazy from typegpu.ts 254.94 kB (➖)
tgpu.mutableAccessor from typegpu.ts 254.96 kB (➖)
tgpu.privateVar from typegpu.ts 254.95 kB (➖)
tgpu.resolveWithContext from typegpu.ts 254.96 kB (➖)
tgpu.resolve from typegpu.ts 254.95 kB (➖)
tgpu.slot from typegpu.ts 254.94 kB (➖)
tgpu.unroll from typegpu.ts 254.95 kB (➖)
tgpu.vertexFn from typegpu.ts 254.95 kB (➖)
tgpu.vertexLayout from typegpu.ts 254.95 kB (➖)
tgpu.workgroupVar from typegpu.ts 254.95 kB (➖)
tgpu from typegpu.ts 254.94 kB (➖)
translate4 from typegpustd.ts 92.80 kB (➖)
translation4 from typegpustd.ts 43.99 kB (➖)
transpose from typegpustd.ts 92.81 kB (➖)
trunc from typegpustd.ts 92.80 kB (➖)
u16 from typegpudata.ts 44.01 kB (➖)
u32 from typegpudata.ts 43.99 kB (➖)
uint16 from typegpudata.ts 55.50 kB (➖)
uint16x2 from typegpudata.ts 55.50 kB (➖)
uint16x4 from typegpudata.ts 55.50 kB (➖)
uint32 from typegpudata.ts 55.50 kB (➖)
uint32x2 from typegpudata.ts 55.50 kB (➖)
uint32x3 from typegpudata.ts 55.50 kB (➖)
uint32x4 from typegpudata.ts 55.50 kB (➖)
uint8 from typegpudata.ts 55.49 kB (➖)
uint8x2 from typegpudata.ts 55.50 kB (➖)
uint8x4 from typegpudata.ts 55.50 kB (➖)
unorm10 10 10 2 from typegpudata.ts 55.50 kB (➖)
unorm16 from typegpudata.ts 55.50 kB (➖)
unorm16x2 from typegpudata.ts 55.50 kB (➖)
unorm16x4 from typegpudata.ts 55.50 kB (➖)
unorm8 from typegpudata.ts 55.50 kB (➖)
unorm8x2 from typegpudata.ts 55.50 kB (➖)
unorm8x4 bgra from typegpudata.ts 55.49 kB (➖)
unorm8x4 from typegpudata.ts 55.50 kB (➖)
unpack2x16float from typegpustd.ts 92.81 kB (➖)
unpack4x8unorm from typegpustd.ts 92.81 kB (➖)
unstruct from typegpudata.ts 44.90 kB (➖)
vec2b from typegpudata.ts 43.99 kB (➖)
vec2f from typegpudata.ts 43.99 kB (➖)
vec2h from typegpudata.ts 43.99 kB (➖)
vec2i from typegpudata.ts 43.99 kB (➖)
vec2u from typegpudata.ts 43.99 kB (➖)
vec3b from typegpudata.ts 43.99 kB (➖)
vec3f from typegpudata.ts 43.99 kB (➖)
vec3h from typegpudata.ts 43.99 kB (➖)
vec3i from typegpudata.ts 43.99 kB (➖)
vec3u from typegpudata.ts 43.99 kB (➖)
vec4b from typegpudata.ts 43.99 kB (➖)
vec4f from typegpudata.ts 43.99 kB (➖)
vec4h from typegpudata.ts 43.99 kB (➖)
vec4i from typegpudata.ts 43.99 kB (➖)
vec4u from typegpudata.ts 43.99 kB (➖)
workgroupBarrier from typegpustd.ts 92.81 kB (➖)

If you wish to run a comparison for other, slower bundlers, run the 'Tree-shake test' from the GitHub Actions menu.

@github-actions

github-actions bot commented Feb 3, 2026

pkg.pr.new

packages
Ready to be installed by your favorite package manager ⬇️

https://pkg.pr.new/software-mansion/TypeGPU/typegpu@96edde44b7ba359b84683c36ccdac33e007e74f4
https://pkg.pr.new/software-mansion/TypeGPU/@typegpu/noise@96edde44b7ba359b84683c36ccdac33e007e74f4
https://pkg.pr.new/software-mansion/TypeGPU/unplugin-typegpu@96edde44b7ba359b84683c36ccdac33e007e74f4

benchmark
view benchmark

commit
view commit

@reczkok reczkok marked this pull request as ready for review February 4, 2026 09:10
Contributor

@aleksanderkatan aleksanderkatan left a comment

Great job!

Collaborator

@cieplypolar cieplypolar left a comment

Amazing work! Love the bitwise operations within the sorting kernel.

Left some nits.

In the future, we could optimize this, for example by using workgroup shared memory.
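For readers unfamiliar with the bitwise indexing mentioned above, here is a minimal CPU-side sketch of a bitonic sorting network. It is not the kernel from this PR; the function name `bitonicSortInPlace` is hypothetical, and the sketch assumes a power-of-two input length. Each `(k, j)` pair plays the role of one GPU step, and the body of the innermost loop is roughly what a single GPU thread would do for its index `i`.

```typescript
// CPU reference of a bitonic sorting network (illustrative only, not the PR's kernel).
// Assumption: the input length is a non-zero power of two.
function bitonicSortInPlace(data: Uint32Array): void {
  const n = data.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error('Length must be a non-zero power of two');
  }
  // k is the size of the bitonic blocks being merged in this stage.
  for (let k = 2; k <= n; k *= 2) {
    // j is the distance between the two elements of a compare-exchange pair.
    for (let j = k / 2; j > 0; j >>= 1) {
      for (let i = 0; i < n; i++) {
        const partner = i ^ j; // flipping one bit yields the compare partner
        if (partner <= i) continue; // handle each pair once, from its lower index
        const ascending = (i & k) === 0; // sort direction of the block containing i
        const a = data[i]!;
        const b = data[partner]!;
        if (ascending ? a > b : a < b) {
          data[i] = b;
          data[partner] = a;
        }
      }
    }
  }
}

const sample = new Uint32Array([5, 3, 8, 1, 9, 2, 7, 0]);
bitonicSortInPlace(sample);
console.log(Array.from(sample)); // [0, 1, 2, 3, 5, 7, 8, 9]
```

Within one `(k, j)` step the pairs are independent of each other, which is why each step can be dispatched as one parallel pass.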

Collaborator

General remark: I don't remember the maximum size of a WGSL buffer, but using only one dimension of the compute grid seems limiting. I believe we used all three dimensions when developing the parallel scan.

Contributor Author

The dimensions do not really matter. We should get good occupancy with 256 threads in a workgroup regardless of whether it's 256 or 16x16.

Contributor

I think it's not about occupancy, but about the limits imposed by maxComputeWorkgroupSizeX and maxComputeWorkgroupsPerDimension (whose product, at worst, is only about 2^24, so roughly 16 million invocations along a single dimension).
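The arithmetic behind that figure can be checked quickly. The sketch below assumes the WebGPU default limits (256 for `maxComputeWorkgroupSizeX`, 65535 for `maxComputeWorkgroupsPerDimension`); a real adapter may report higher values, and the variable names are only illustrative.

```typescript
// WebGPU default limits, i.e. what a device is guaranteed to support
// when no higher limits are requested. Actual adapters may allow more.
const maxComputeWorkgroupSizeX = 256;
const maxComputeWorkgroupsPerDimension = 65535;

// Upper bound on invocations when only the X dimension is used for both
// the workgroup size and the dispatch grid.
const maxInvocations1D =
  maxComputeWorkgroupSizeX * maxComputeWorkgroupsPerDimension;

console.log(maxInvocations1D); // 16776960
console.log(2 ** 24); // 16777216, so the 1D bound sits just below 2^24
```

Any input that needs more invocations than this cannot be covered by a single one-dimensional dispatch under default limits.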

Contributor Author

@reczkok reczkok Feb 13, 2026

Great catch! Added decomposition logic for big arrays and increased the buffer sizes in examples
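One way such a decomposition could look is sketched below. This is not the helper added in the PR; `decomposeWorkgroups` and its parameters are hypothetical, and the default of 65535 per dimension is the WebGPU default limit. The idea is to spread a flat workgroup count over up to three grid dimensions, accepting that the grid may overshoot the requested count.

```typescript
// Hypothetical helper: split a flat workgroup count into an (x, y, z) dispatch
// grid where no dimension exceeds maxPerDim. The grid may contain more
// workgroups than requested, so the kernel must bounds-check its flat index.
function decomposeWorkgroups(
  total: number,
  maxPerDim = 65535,
): [number, number, number] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError('total must be a positive integer');
  }
  const x = Math.min(total, maxPerDim);
  const remaining = Math.ceil(total / x); // workgroups still needed across y * z
  const y = Math.min(remaining, maxPerDim);
  const z = Math.ceil(remaining / y);
  if (z > maxPerDim) {
    throw new RangeError('Workgroup count does not fit into a 3D dispatch grid');
  }
  return [x, y, z];
}

console.log(decomposeWorkgroups(100)); // [100, 1, 1]
console.log(decomposeWorkgroups(65536)); // [65535, 2, 1]
```

Inside the shader, the flat workgroup index would then be rebuilt as `x + y * dimX + z * dimX * dimY` and compared against the real workgroup count before doing any work.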

@reczkok reczkok requested review from cieplypolar and removed request for cieplypolar February 13, 2026 23:14

Copilot AI left a comment

Pull request overview

This PR introduces a new @typegpu/sort workspace package that bundles GPU sorting (bitonic sort) plus the existing scan/prefix-scan implementation, and updates docs/examples/tests to consume the new package while removing @typegpu/concurrent-scan.

Changes:

  • Added @typegpu/sort package scaffolding and a bitonic sort implementation (with padding + comparator slot support).
  • Migrated/rehomed scan/prefix-scan APIs into @typegpu/sort (including renaming initCache → createPrefixScanComputer).
  • Updated TypeGPU docs app sandbox module mapping and examples/tests to use @typegpu/sort; removed the old @typegpu/concurrent-scan package references.

Reviewed changes

Copilot reviewed 29 out of 32 changed files in this pull request and generated 2 comments.

File Description
pnpm-lock.yaml Removes @typegpu/concurrent-scan workspace link and adds @typegpu/sort.
packages/typegpu/tests/utils/extendedIt.ts Extends WebGPU adapter mock limits with maxBufferSize for new example needs.
packages/typegpu/tests/examples/individual/bitonic-sort.test.ts Adds an example shader-generation snapshot test for the new bitonic sort example.
packages/typegpu-sort/tsconfig.json New package TS config.
packages/typegpu-sort/src/scan/types.ts Introduces BinaryOp type in its own module.
packages/typegpu-sort/src/scan/schemas.ts Updates TypeGPU imports and keeps scan bind group layouts/slots.
packages/typegpu-sort/src/scan/prefixScan.ts Renames cache initializer API and adjusts imports; core scan logic remains.
packages/typegpu-sort/src/scan/index.ts Re-exports scan APIs/types.
packages/typegpu-sort/src/scan/compute/shared.ts Consolidates TypeGPU imports for scan compute helpers.
packages/typegpu-sort/src/scan/compute/scan.ts Consolidates TypeGPU imports for scan kernel generation.
packages/typegpu-sort/src/scan/compute/applySums.ts Consolidates TypeGPU imports for scan “apply sums” kernel.
packages/typegpu-sort/src/index.ts Public package surface for bitonic + scan exports.
packages/typegpu-sort/src/bitonic/utils.ts Adds nextPowerOf2 and dispatch-grid decomposition helper.
packages/typegpu-sort/src/bitonic/types.ts Defines public sorter options/run options/interfaces.
packages/typegpu-sort/src/bitonic/slots.ts Defines comparator slot and default comparator.
packages/typegpu-sort/src/bitonic/index.ts Bitonic module exports.
packages/typegpu-sort/src/bitonic/bitonicSort.ts Implements the bitonic sorter (padding copy, step kernel, timestamps).
packages/typegpu-sort/package.json Renames/defines the new package metadata and export map.
packages/typegpu-sort/deno.json Adjusts Deno fmt exclusions (adds dist).
packages/typegpu-sort/build.config.ts Updates unbuild config default export shape.
packages/typegpu-sort/README.md Adds initial usage docs for bitonic sort and prefix scan.
packages/typegpu-concurrent-scan/src/index.ts Removes legacy package entrypoint export.
packages/typegpu-concurrent-scan/README.md Removes legacy package README.
apps/typegpu-docs/src/utils/examples/sandboxModules.ts Updates sandbox module routing from @typegpu/concurrent-scan to @typegpu/sort.
apps/typegpu-docs/src/examples/tests/prefix-scan/index.ts Switches scan imports to @typegpu/sort.
apps/typegpu-docs/src/examples/tests/prefix-scan/functions.ts Switches BinaryOp import to @typegpu/sort.
apps/typegpu-docs/src/examples/algorithms/concurrent-chart/calculator.ts Updates API usage to createPrefixScanComputer from @typegpu/sort.
apps/typegpu-docs/src/examples/algorithms/bitonic-sort/meta.json Adds metadata for the new bitonic sort example.
apps/typegpu-docs/src/examples/algorithms/bitonic-sort/index.ts Adds the bitonic sort interactive example implementation.
apps/typegpu-docs/src/examples/algorithms/bitonic-sort/index.html Adds the example HTML + overlay UI.
apps/typegpu-docs/package.json Swaps dependency from @typegpu/concurrent-scan to @typegpu/sort.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported


Comment on lines +140 to +145
const originalSize = data.dataType.elementCount;
const paddedSize = nextPowerOf2(originalSize);
const wasPadded = paddedSize !== originalSize;

const paddingValue = options?.paddingValue ?? 0xffffffff;
const compareFunc = options?.compare ?? defaultCompare;

Copilot AI Mar 4, 2026

createBitonicSorter derives originalSize from data.dataType.elementCount, but TypeGPU uses elementCount === 0 for runtime-sized arrays (e.g. d.arrayOf(d.u32)), which would make the sorter silently treat the input as length 0/1 and compute incorrect dispatch/padding. Consider validating elementCount > 0 (throw a clear error) and/or changing the API to accept an explicit length / require a fixed-size d.arrayOf(d.u32, N) buffer type so this can’t happen accidentally.
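A minimal sketch of the validation suggested above, written as a standalone function so it can run without TypeGPU. The name `resolvePaddedSize` is hypothetical, and this `nextPowerOf2` is a stand-in that may differ from the PR's implementation in `utils.ts`. It also takes the comment's claim at face value that a runtime-sized array reports `elementCount === 0`.

```typescript
// Stand-in for the PR's nextPowerOf2 helper; the real implementation may differ.
function nextPowerOf2(n: number): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

interface PaddedSize {
  originalSize: number;
  paddedSize: number;
  wasPadded: boolean;
}

// Hypothetical guard: reject runtime-sized arrays (assumed to report
// elementCount === 0) instead of silently sorting zero elements.
function resolvePaddedSize(elementCount: number): PaddedSize {
  if (!Number.isInteger(elementCount) || elementCount <= 0) {
    throw new Error(
      'Bitonic sort needs a fixed-size array; got elementCount = ' + elementCount,
    );
  }
  const paddedSize = nextPowerOf2(elementCount);
  return {
    originalSize: elementCount,
    paddedSize,
    wasPadded: paddedSize !== elementCount,
  };
}

console.log(resolvePaddedSize(1000)); // { originalSize: 1000, paddedSize: 1024, wasPadded: true }
```

Failing early with a descriptive error keeps the problem at the call site rather than surfacing later as an empty or incorrect dispatch.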

Comment on lines +12 to +28
const maxBufferSize = await navigator.gpu.requestAdapter().then((adapter) => {
  if (!adapter) {
    throw new Error('No GPU adapter found');
  }
  const limits = adapter.limits;
  return Math.min(limits.maxStorageBufferBindingSize, limits.maxBufferSize);
});

const root = await tgpu.init({
  device: {
    optionalFeatures: ['timestamp-query'],
    requiredLimits: {
      maxStorageBufferBindingSize: maxBufferSize,
      maxBufferSize: maxBufferSize,
    },
  },
});

Copilot AI Mar 4, 2026

This example calls navigator.gpu.requestAdapter() to compute limits, but then tgpu.init() will call navigator.gpu.requestAdapter() again internally. If the browser returns a different adapter the second time, the requiredLimits computed from the first adapter may exceed the second adapter’s limits and cause initialization to fail. Prefer requesting the device once (using the adapter you queried) and passing it to tgpu.initFromDevice, or otherwise ensuring the same adapter/device is used for both limit discovery and initialization.

Suggested change
- const maxBufferSize = await navigator.gpu.requestAdapter().then((adapter) => {
-   if (!adapter) {
-     throw new Error('No GPU adapter found');
-   }
-   const limits = adapter.limits;
-   return Math.min(limits.maxStorageBufferBindingSize, limits.maxBufferSize);
- });
- const root = await tgpu.init({
-   device: {
-     optionalFeatures: ['timestamp-query'],
-     requiredLimits: {
-       maxStorageBufferBindingSize: maxBufferSize,
-       maxBufferSize: maxBufferSize,
-     },
-   },
- });
+ const adapter = await navigator.gpu.requestAdapter();
+ if (!adapter) {
+   throw new Error('No GPU adapter found');
+ }
+ const limits = adapter.limits;
+ const maxBufferSize = Math.min(
+   limits.maxStorageBufferBindingSize,
+   limits.maxBufferSize,
+ );
+ const requiredFeatures: GPUFeatureName[] = [];
+ if (adapter.features.has('timestamp-query' as GPUFeatureName)) {
+   requiredFeatures.push('timestamp-query');
+ }
+ const device = await adapter.requestDevice({
+   requiredLimits: {
+     maxStorageBufferBindingSize: maxBufferSize,
+     maxBufferSize: maxBufferSize,
+   },
+   requiredFeatures,
+ });
+ const root = await tgpu.initFromDevice(device);

4 participants