Add PDL to cub::DeviceRadixSort #9247
Draft
gonidelis wants to merge 15 commits into
…ring further inside the init kernel onesweep pair
…anges give perf gain now
Final full-sweep perf results on B200, here or below. Compared files: ['main.json', 'pdl_final.json']
# base
## [0] NVIDIA B200
| T{ct} | OffsetT{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---------|---------------|----------------|-----------|------------|-------------|------------|-------------|------------|---------|----------|
| I8 | I32 | 2^16 | 1 | 24.470 us | 1.42% | 20.490 us | 1.12% | -3.980 us | -16.26% | 🟢 FAST |
| I8 | I32 | 2^20 | 1 | 31.670 us | 1.84% | 27.926 us | 3.37% | -3.745 us | -11.82% | 🟢 FAST |
| I8 | I32 | 2^24 | 1 | 120.148 us | 0.65% | 120.133 us | 0.64% | -0.014 us | -0.01% | 🔵 SAME |
| I8 | I32 | 2^28 | 1 | 1.310 ms | 0.25% | 1.311 ms | 0.26% | 0.588 us | 0.04% | 🔵 SAME |
| I8 | I32 | 2^16 | 0.544 | 23.532 us | 4.30% | 20.389 us | 1.35% | -3.143 us | -13.36% | 🟢 FAST |
| I8 | I32 | 2^20 | 0.544 | 31.454 us | 2.86% | 28.008 us | 3.01% | -3.446 us | -10.96% | 🟢 FAST |
| I8 | I32 | 2^24 | 0.544 | 122.896 us | 0.40% | 123.446 us | 0.69% | 0.550 us | 0.45% | 🔴 SLOW |
| I8 | I32 | 2^28 | 0.544 | 1.339 ms | 0.11% | 1.339 ms | 0.11% | 0.376 us | 0.03% | 🔵 SAME |
| I8 | I32 | 2^16 | 0.201 | 23.941 us | 2.21% | 20.041 us | 3.24% | -3.901 us | -16.29% | 🟢 FAST |
| I8 | I32 | 2^20 | 0.201 | 29.798 us | 2.18% | 25.989 us | 3.65% | -3.810 us | -12.79% | 🟢 FAST |
| I8 | I32 | 2^24 | 0.201 | 114.650 us | 0.90% | 114.747 us | 0.89% | 0.098 us | 0.09% | 🔵 SAME |
| I8 | I32 | 2^28 | 0.201 | 1.257 ms | 0.27% | 1.257 ms | 0.28% | 0.165 us | 0.01% | 🔵 SAME |
| I8 | I64 | 2^16 | 1 | 24.170 us | 2.69% | 20.412 us | 1.62% | -3.758 us | -15.55% | 🟢 FAST |
| I8 | I64 | 2^20 | 1 | 31.593 us | 1.17% | 27.142 us | 3.36% | -4.451 us | -14.09% | 🟢 FAST |
| I8 | I64 | 2^24 | 1 | 120.438 us | 0.80% | 120.312 us | 0.74% | -0.126 us | -0.10% | 🔵 SAME |
| I8 | I64 | 2^28 | 1 | 1.327 ms | 0.11% | 1.327 ms | 0.12% | 0.012 us | 0.00% | 🔵 SAME |
| I8 | I64 | 2^16 | 0.544 | 24.046 us | 3.67% | 20.260 us | 2.19% | -3.787 us | -15.75% | 🟢 FAST |
| I8 | I64 | 2^20 | 0.544 | 31.989 us | 2.97% | 27.746 us | 3.26% | -4.243 us | -13.26% | 🟢 FAST |
| I8 | I64 | 2^24 | 0.544 | 122.254 us | 0.68% | 122.166 us | 0.61% | -0.089 us | -0.07% | 🔵 SAME |
| I8 | I64 | 2^28 | 0.544 | 1.339 ms | 0.24% | 1.339 ms | 0.25% | 0.047 us | 0.00% | 🔵 SAME |
| I8 | I64 | 2^16 | 0.201 | 23.570 us | 1.05% | 20.060 us | 4.08% | -3.510 us | -14.89% | 🟢 FAST |
| I8 | I64 | 2^20 | 0.201 | 29.678 us | 0.27% | 25.735 us | 2.83% | -3.943 us | -13.29% | 🟢 FAST |
| I8 | I64 | 2^24 | 0.201 | 112.713 us | 0.55% | 112.600 us | 0.35% | -0.114 us | -0.10% | 🔵 SAME |
| I8 | I64 | 2^28 | 0.201 | 1.271 ms | 0.12% | 1.271 ms | 0.12% | 0.048 us | 0.00% | 🔵 SAME |
| I16 | I32 | 2^16 | 1 | 35.083 us | 1.86% | 29.884 us | 1.90% | -5.200 us | -14.82% | 🟢 FAST |
| I16 | I32 | 2^20 | 1 | 46.326 us | 1.48% | 41.028 us | 2.48% | -5.298 us | -11.44% | 🟢 FAST |
| I16 | I32 | 2^24 | 1 | 201.302 us | 0.59% | 201.490 us | 0.59% | 0.188 us | 0.09% | 🔵 SAME |
| I16 | I32 | 2^28 | 1 | 2.535 ms | 0.10% | 2.538 ms | 0.10% | 3.033 us | 0.12% | 🔴 SLOW |
| I16 | I32 | 2^16 | 0.544 | 35.363 us | 2.34% | 30.131 us | 3.01% | -5.232 us | -14.80% | 🟢 FAST |
| I16 | I32 | 2^20 | 0.544 | 47.517 us | 1.74% | 41.986 us | 1.14% | -5.532 us | -11.64% | 🟢 FAST |
| I16 | I32 | 2^24 | 0.544 | 206.325 us | 0.54% | 206.578 us | 0.53% | 0.253 us | 0.12% | 🔵 SAME |
| I16 | I32 | 2^28 | 0.544 | 2.590 ms | 0.11% | 2.593 ms | 0.12% | 3.309 us | 0.13% | 🔴 SLOW |
| I16 | I32 | 2^16 | 0.201 | 34.282 us | 2.57% | 29.142 us | 2.96% | -5.140 us | -14.99% | 🟢 FAST |
| I16 | I32 | 2^20 | 0.201 | 44.916 us | 2.03% | 38.927 us | 0.47% | -5.989 us | -13.33% | 🟢 FAST |
| I16 | I32 | 2^24 | 0.201 | 192.663 us | 0.59% | 192.581 us | 0.60% | -0.082 us | -0.04% | 🔵 SAME |
| I16 | I32 | 2^28 | 0.201 | 2.419 ms | 0.12% | 2.423 ms | 0.13% | 3.264 us | 0.13% | 🔴 SLOW |
| I16 | I64 | 2^16 | 1 | 34.525 us | 1.67% | 28.691 us | 1.28% | -5.834 us | -16.90% | 🟢 FAST |
| I16 | I64 | 2^20 | 1 | 47.076 us | 1.85% | 40.859 us | 1.19% | -6.217 us | -13.21% | 🟢 FAST |
| I16 | I64 | 2^24 | 1 | 200.712 us | 0.59% | 200.729 us | 0.52% | 0.017 us | 0.01% | 🔵 SAME |
| I16 | I64 | 2^28 | 1 | 2.482 ms | 0.12% | 2.481 ms | 0.12% | -0.783 us | -0.03% | 🔵 SAME |
| I16 | I64 | 2^16 | 0.544 | 34.807 us | 0.30% | 28.669 us | 0.27% | -6.138 us | -17.63% | 🟢 FAST |
| I16 | I64 | 2^20 | 0.544 | 47.345 us | 1.67% | 41.963 us | 0.67% | -5.382 us | -11.37% | 🟢 FAST |
| I16 | I64 | 2^24 | 0.544 | 203.470 us | 0.50% | 203.102 us | 0.62% | -0.368 us | -0.18% | 🔵 SAME |
| I16 | I64 | 2^28 | 0.544 | 2.541 ms | 0.12% | 2.540 ms | 0.12% | -0.538 us | -0.02% | 🔵 SAME |
| I16 | I64 | 2^16 | 0.201 | 33.856 us | 2.61% | 28.597 us | 1.18% | -5.260 us | -15.54% | 🟢 FAST |
| I16 | I64 | 2^20 | 0.201 | 45.026 us | 2.18% | 39.322 us | 2.28% | -5.704 us | -12.67% | 🟢 FAST |
| I16 | I64 | 2^24 | 0.201 | 191.861 us | 0.67% | 192.502 us | 0.66% | 0.641 us | 0.33% | 🔵 SAME |
| I16 | I64 | 2^28 | 0.201 | 2.379 ms | 0.13% | 2.378 ms | 0.12% | -0.747 us | -0.03% | 🔵 SAME |
| I32 | I32 | 2^16 | 1 | 54.705 us | 1.15% | 46.829 us | 2.06% | -7.876 us | -14.40% | 🟢 FAST |
| I32 | I32 | 2^20 | 1 | 76.069 us | 1.27% | 68.633 us | 0.85% | -7.437 us | -9.78% | 🟢 FAST |
| I32 | I32 | 2^24 | 1 | 372.268 us | 0.29% | 372.443 us | 0.28% | 0.174 us | 0.05% | 🔵 SAME |
| I32 | I32 | 2^28 | 1 | 4.929 ms | 0.06% | 4.929 ms | 0.06% | -0.391 us | -0.01% | 🔵 SAME |
| I32 | I32 | 2^16 | 0.544 | 55.053 us | 1.66% | 47.265 us | 1.88% | -7.788 us | -14.15% | 🟢 FAST |
| I32 | I32 | 2^20 | 0.544 | 77.516 us | 0.92% | 69.610 us | 1.18% | -7.906 us | -10.20% | 🟢 FAST |
| I32 | I32 | 2^24 | 0.544 | 381.053 us | 0.25% | 380.514 us | 0.27% | -0.539 us | -0.14% | 🔵 SAME |
| I32 | I32 | 2^28 | 0.544 | 5.081 ms | 0.13% | 5.080 ms | 0.12% | -0.654 us | -0.01% | 🔵 SAME |
| I32 | I32 | 2^16 | 0.201 | 54.069 us | 1.40% | 46.044 us | 0.73% | -8.024 us | -14.84% | 🟢 FAST |
| I32 | I32 | 2^20 | 0.201 | 74.387 us | 1.02% | 66.381 us | 1.16% | -8.006 us | -10.76% | 🟢 FAST |
| I32 | I32 | 2^24 | 0.201 | 359.566 us | 0.23% | 359.762 us | 0.25% | 0.197 us | 0.05% | 🔵 SAME |
| I32 | I32 | 2^28 | 0.201 | 4.732 ms | 0.09% | 4.731 ms | 0.09% | -0.376 us | -0.01% | 🔵 SAME |
| I32 | I64 | 2^16 | 1 | 54.325 us | 0.87% | 47.078 us | 2.03% | -7.247 us | -13.34% | 🟢 FAST |
| I32 | I64 | 2^20 | 1 | 76.001 us | 1.16% | 68.927 us | 0.84% | -7.074 us | -9.31% | 🟢 FAST |
| I32 | I64 | 2^24 | 1 | 378.250 us | 0.29% | 378.293 us | 0.31% | 0.043 us | 0.01% | 🔵 SAME |
| I32 | I64 | 2^28 | 1 | 5.080 ms | 0.05% | 5.079 ms | 0.05% | -1.095 us | -0.02% | 🔵 SAME |
| I32 | I64 | 2^16 | 0.544 | 54.686 us | 1.49% | 47.763 us | 1.57% | -6.922 us | -12.66% | 🟢 FAST |
| I32 | I64 | 2^20 | 0.544 | 76.806 us | 0.36% | 69.245 us | 0.86% | -7.561 us | -9.84% | 🟢 FAST |
| I32 | I64 | 2^24 | 0.544 | 384.241 us | 0.24% | 383.871 us | 0.24% | -0.370 us | -0.10% | 🔵 SAME |
| I32 | I64 | 2^28 | 0.544 | 5.224 ms | 0.06% | 5.223 ms | 0.06% | -1.048 us | -0.02% | 🔵 SAME |
| I32 | I64 | 2^16 | 0.201 | 54.221 us | 0.87% | 46.080 us | 0.00% | -8.141 us | -15.02% | 🟢 FAST |
| I32 | I64 | 2^20 | 0.201 | 73.408 us | 1.23% | 66.647 us | 0.72% | -6.761 us | -9.21% | 🟢 FAST |
| I32 | I64 | 2^24 | 0.201 | 363.703 us | 0.35% | 363.556 us | 0.34% | -0.148 us | -0.04% | 🔵 SAME |
| I32 | I64 | 2^28 | 0.201 | 4.876 ms | 0.06% | 4.876 ms | 0.06% | -0.315 us | -0.01% | 🔵 SAME |
| I64 | I32 | 2^16 | 1 | 90.076 us | 1.13% | 78.653 us | 0.94% | -11.423 us | -12.68% | 🟢 FAST |
| I64 | I32 | 2^20 | 1 | 158.963 us | 0.38% | 147.786 us | 0.41% | -11.177 us | -7.03% | 🟢 FAST |
| I64 | I32 | 2^24 | 1 | 1.412 ms | 0.15% | 1.412 ms | 0.15% | -0.150 us | -0.01% | 🔵 SAME |
| I64 | I32 | 2^28 | 1 | 21.218 ms | 0.26% | 21.752 ms | 0.04% | 534.063 us | 2.52% | 🔴 SLOW |
| I64 | I32 | 2^16 | 0.544 | 91.178 us | 0.52% | 79.203 us | 0.74% | -11.975 us | -13.13% | 🟢 FAST |
| I64 | I32 | 2^20 | 0.544 | 158.479 us | 0.55% | 146.142 us | 0.46% | -12.337 us | -7.78% | 🟢 FAST |
| I64 | I32 | 2^24 | 0.544 | 1.410 ms | 0.14% | 1.410 ms | 0.15% | -0.371 us | -0.03% | 🔵 SAME |
| I64 | I32 | 2^28 | 0.544 | 21.162 ms | 0.24% | 21.162 ms | 0.23% | -0.318 us | -0.00% | 🔵 SAME |
| I64 | I32 | 2^16 | 0.201 | 87.034 us | 0.16% | 75.519 us | 0.72% | -11.515 us | -13.23% | 🟢 FAST |
| I64 | I32 | 2^20 | 0.201 | 152.515 us | 0.23% | 141.271 us | 0.35% | -11.244 us | -7.37% | 🟢 FAST |
| I64 | I32 | 2^24 | 0.201 | 1.360 ms | 0.13% | 1.360 ms | 0.13% | 0.555 us | 0.04% | 🔵 SAME |
| I64 | I32 | 2^28 | 0.201 | 20.410 ms | 0.22% | 20.409 ms | 0.23% | -1.461 us | -0.01% | 🔵 SAME |
| I64 | I64 | 2^16 | 1 | 90.883 us | 0.61% | 78.886 us | 0.34% | -11.997 us | -13.20% | 🟢 FAST |
| I64 | I64 | 2^20 | 1 | 159.396 us | 0.46% | 148.083 us | 0.39% | -11.313 us | -7.10% | 🟢 FAST |
| I64 | I64 | 2^24 | 1 | 1.429 ms | 0.12% | 1.429 ms | 0.14% | -0.062 us | -0.00% | 🔵 SAME |
| I64 | I64 | 2^28 | 1 | 21.378 ms | 0.26% | 21.386 ms | 0.25% | 8.195 us | 0.04% | 🔵 SAME |
| I64 | I64 | 2^16 | 0.544 | 91.210 us | 0.43% | 79.484 us | 0.97% | -11.726 us | -12.86% | 🟢 FAST |
| I64 | I64 | 2^20 | 0.544 | 158.677 us | 0.24% | 147.065 us | 0.59% | -11.612 us | -7.32% | 🟢 FAST |
| I64 | I64 | 2^24 | 0.544 | 1.428 ms | 0.12% | 1.427 ms | 0.12% | -0.710 us | -0.05% | 🔵 SAME |
| I64 | I64 | 2^28 | 0.544 | 21.341 ms | 0.22% | 21.343 ms | 0.23% | 2.216 us | 0.01% | 🔵 SAME |
| I64 | I64 | 2^16 | 0.201 | 87.940 us | 0.95% | 76.529 us | 0.84% | -11.411 us | -12.98% | 🟢 FAST |
| I64 | I64 | 2^20 | 0.201 | 152.811 us | 0.42% | 141.461 us | 0.35% | -11.350 us | -7.43% | 🟢 FAST |
| I64 | I64 | 2^24 | 0.201 | 1.371 ms | 0.13% | 1.370 ms | 0.12% | -0.217 us | -0.02% | 🔵 SAME |
| I64 | I64 | 2^28 | 0.201 | 20.561 ms | 0.23% | 20.561 ms | 0.22% | -0.366 us | -0.00% | 🔵 SAME |
| I128 | I32 | 2^16 | 1 | 149.840 us | 0.52% | 130.056 us | 0.26% | -19.784 us | -13.20% | 🟢 FAST |
| I128 | I32 | 2^20 | 1 | 339.795 us | 0.36% | 322.952 us | 0.36% | -16.843 us | -4.96% | 🟢 FAST |
| I128 | I32 | 2^24 | 1 | 3.469 ms | 0.07% | 3.469 ms | 0.07% | 0.010 us | 0.00% | 🔵 SAME |
| I128 | I32 | 2^28 | 1 | 53.036 ms | 0.05% | 53.033 ms | 0.05% | -2.701 us | -0.01% | 🔵 SAME |
| I128 | I32 | 2^16 | 0.544 | 148.759 us | 0.47% | 128.477 us | 0.67% | -20.282 us | -13.63% | 🟢 FAST |
| I128 | I32 | 2^20 | 0.544 | 337.840 us | 0.33% | 320.703 us | 0.39% | -17.137 us | -5.07% | 🟢 FAST |
| I128 | I32 | 2^24 | 0.544 | 3.449 ms | 0.07% | 3.448 ms | 0.08% | -1.093 us | -0.03% | 🔵 SAME |
| I128 | I32 | 2^28 | 0.544 | 53.403 ms | 0.02% | 53.399 ms | 0.02% | -3.940 us | -0.01% | 🔵 SAME |
| I128 | I32 | 2^16 | 0.201 | 147.245 us | 0.68% | 127.980 us | 0.32% | -19.265 us | -13.08% | 🟢 FAST |
| I128 | I32 | 2^20 | 0.201 | 333.038 us | 0.34% | 316.622 us | 0.39% | -16.417 us | -4.93% | 🟢 FAST |
| I128 | I32 | 2^24 | 0.201 | 3.404 ms | 0.08% | 3.404 ms | 0.07% | 0.371 us | 0.01% | 🔵 SAME |
| I128 | I32 | 2^28 | 0.201 | 52.567 ms | 0.01% | 52.567 ms | 0.02% | -0.165 us | -0.00% | 🔵 SAME |
| I128 | I64 | 2^16 | 1 | 150.992 us | 0.59% | 131.833 us | 0.49% | -19.159 us | -12.69% | 🟢 FAST |
| I128 | I64 | 2^20 | 1 | 341.711 us | 0.35% | 324.675 us | 0.38% | -17.036 us | -4.99% | 🟢 FAST |
| I128 | I64 | 2^24 | 1 | 3.483 ms | 0.07% | 3.484 ms | 0.06% | 1.286 us | 0.04% | 🔵 SAME |
| I128 | I64 | 2^28 | 1 | 53.840 ms | 0.02% | 53.847 ms | 0.01% | 7.642 us | 0.01% | 🔵 SAME |
| I128 | I64 | 2^16 | 0.544 | 150.362 us | 0.39% | 130.196 us | 0.45% | -20.166 us | -13.41% | 🟢 FAST |
| I128 | I64 | 2^20 | 0.544 | 339.335 us | 0.35% | 322.322 us | 0.37% | -17.013 us | -5.01% | 🟢 FAST |
| I128 | I64 | 2^24 | 0.544 | 3.464 ms | 0.06% | 3.465 ms | 0.06% | 0.158 us | 0.00% | 🔵 SAME |
| I128 | I64 | 2^28 | 0.544 | 52.832 ms | 0.05% | 52.836 ms | 0.05% | 4.200 us | 0.01% | 🔵 SAME |
| I128 | I64 | 2^16 | 0.201 | 149.118 us | 0.66% | 129.597 us | 0.52% | -19.521 us | -13.09% | 🟢 FAST |
| I128 | I64 | 2^20 | 0.201 | 335.076 us | 0.35% | 318.236 us | 0.41% | -16.840 us | -5.03% | 🟢 FAST |
| I128 | I64 | 2^24 | 0.201 | 3.418 ms | 0.06% | 3.418 ms | 0.07% | -0.270 us | -0.01% | 🔵 SAME |
| I128 | I64 | 2^28 | 0.201 | 52.034 ms | 0.05% | 52.036 ms | 0.05% | 2.157 us | 0.00% | 🔵 SAME |
| F32 | I32 | 2^16 | 1 | 55.666 us | 0.96% | 47.824 us | 1.47% | -7.842 us | -14.09% | 🟢 FAST |
| F32 | I32 | 2^20 | 1 | 76.503 us | 1.11% | 68.807 us | 1.23% | -7.696 us | -10.06% | 🟢 FAST |
| F32 | I32 | 2^24 | 1 | 433.350 us | 0.22% | 433.219 us | 0.22% | -0.132 us | -0.03% | 🔵 SAME |
| F32 | I32 | 2^28 | 1 | 6.083 ms | 0.07% | 6.080 ms | 0.06% | -2.483 us | -0.04% | 🔵 SAME |
| F32 | I32 | 2^16 | 0.544 | 55.337 us | 0.55% | 48.104 us | 0.41% | -7.233 us | -13.07% | 🟢 FAST |
| F32 | I32 | 2^20 | 0.544 | 76.991 us | 1.17% | 68.716 us | 0.71% | -8.274 us | -10.75% | 🟢 FAST |
| F32 | I32 | 2^24 | 0.544 | 439.117 us | 0.38% | 438.765 us | 0.36% | -0.352 us | -0.08% | 🔵 SAME |
| F32 | I32 | 2^28 | 0.544 | 6.151 ms | 0.11% | 6.149 ms | 0.10% | -2.513 us | -0.04% | 🔵 SAME |
| F32 | I32 | 2^16 | 0.201 | 55.202 us | 1.17% | 47.495 us | 1.23% | -7.707 us | -13.96% | 🟢 FAST |
| F32 | I32 | 2^20 | 0.201 | 74.127 us | 1.04% | 66.667 us | 0.71% | -7.461 us | -10.06% | 🟢 FAST |
| F32 | I32 | 2^24 | 0.201 | 424.965 us | 0.26% | 424.872 us | 0.25% | -0.094 us | -0.02% | 🔵 SAME |
| F32 | I32 | 2^28 | 0.201 | 5.923 ms | 0.13% | 5.922 ms | 0.13% | -1.246 us | -0.02% | 🔵 SAME |
| F32 | I64 | 2^16 | 1 | 54.190 us | 0.71% | 46.626 us | 1.22% | -7.565 us | -13.96% | 🟢 FAST |
| F32 | I64 | 2^20 | 1 | 77.952 us | 0.91% | 70.679 us | 0.39% | -7.273 us | -9.33% | 🟢 FAST |
| F32 | I64 | 2^24 | 1 | 393.466 us | 0.27% | 393.244 us | 0.28% | -0.223 us | -0.06% | 🔵 SAME |
| F32 | I64 | 2^28 | 1 | 5.293 ms | 0.05% | 5.291 ms | 0.05% | -1.399 us | -0.03% | 🔵 SAME |
| F32 | I64 | 2^16 | 0.544 | 54.238 us | 0.78% | 46.537 us | 1.52% | -7.701 us | -14.20% | 🟢 FAST |
| F32 | I64 | 2^20 | 0.544 | 76.881 us | 0.53% | 70.126 us | 1.27% | -6.755 us | -8.79% | 🟢 FAST |
| F32 | I64 | 2^24 | 0.544 | 398.899 us | 0.25% | 398.591 us | 0.23% | -0.307 us | -0.08% | 🔵 SAME |
| F32 | I64 | 2^28 | 0.544 | 5.378 ms | 0.05% | 5.376 ms | 0.05% | -1.672 us | -0.03% | 🔵 SAME |
| F32 | I64 | 2^16 | 0.201 | 53.044 us | 1.90% | 45.904 us | 1.12% | -7.139 us | -13.46% | 🟢 FAST |
| F32 | I64 | 2^20 | 0.201 | 74.775 us | 0.56% | 67.827 us | 1.44% | -6.948 us | -9.29% | 🟢 FAST |
| F32 | I64 | 2^24 | 0.201 | 383.314 us | 0.30% | 382.994 us | 0.29% | -0.320 us | -0.08% | 🔵 SAME |
| F32 | I64 | 2^28 | 0.201 | 5.144 ms | 0.05% | 5.142 ms | 0.04% | -1.541 us | -0.03% | 🔵 SAME |
| F64 | I32 | 2^16 | 1 | 95.176 us | 0.38% | 82.810 us | 0.70% | -12.365 us | -12.99% | 🟢 FAST |
| F64 | I32 | 2^20 | 1 | 147.506 us | 0.60% | 135.783 us | 0.53% | -11.723 us | -7.95% | 🟢 FAST |
| F64 | I32 | 2^24 | 1 | 1.053 ms | 0.16% | 1.053 ms | 0.16% | -0.037 us | -0.00% | 🔵 SAME |
| F64 | I32 | 2^28 | 1 | 15.328 ms | 0.03% | 15.335 ms | 0.03% | 6.902 us | 0.05% | 🔴 SLOW |
| F64 | I32 | 2^16 | 0.544 | 95.445 us | 0.65% | 83.444 us | 0.79% | -12.001 us | -12.57% | 🟢 FAST |
| F64 | I32 | 2^20 | 0.544 | 147.682 us | 0.56% | 135.873 us | 0.46% | -11.809 us | -8.00% | 🟢 FAST |
| F64 | I32 | 2^24 | 0.544 | 1.071 ms | 0.14% | 1.071 ms | 0.14% | -0.214 us | -0.02% | 🔵 SAME |
| F64 | I32 | 2^28 | 0.544 | 15.567 ms | 0.03% | 15.567 ms | 0.03% | 0.498 us | 0.00% | 🔵 SAME |
| F64 | I32 | 2^16 | 0.201 | 94.843 us | 0.81% | 82.650 us | 1.00% | -12.193 us | -12.86% | 🟢 FAST |
| F64 | I32 | 2^20 | 0.201 | 144.343 us | 0.25% | 132.518 us | 0.56% | -11.825 us | -8.19% | 🟢 FAST |
| F64 | I32 | 2^24 | 0.201 | 1.037 ms | 0.15% | 1.037 ms | 0.15% | 0.183 us | 0.02% | 🔵 SAME |
| F64 | I32 | 2^28 | 0.201 | 15.026 ms | 0.03% | 15.027 ms | 0.03% | 0.571 us | 0.00% | 🔵 SAME |
| F64 | I64 | 2^16 | 1 | 95.325 us | 0.51% | 83.689 us | 0.64% | -11.636 us | -12.21% | 🟢 FAST |
| F64 | I64 | 2^20 | 1 | 147.956 us | 0.43% | 136.437 us | 0.51% | -11.519 us | -7.79% | 🟢 FAST |
| F64 | I64 | 2^24 | 1 | 1.048 ms | 0.15% | 1.048 ms | 0.14% | 0.261 us | 0.02% | 🔵 SAME |
| F64 | I64 | 2^28 | 1 | 15.259 ms | 0.03% | 15.264 ms | 0.03% | 5.380 us | 0.04% | 🔴 SLOW |
| F64 | I64 | 2^16 | 0.544 | 95.158 us | 0.45% | 83.515 us | 0.94% | -11.643 us | -12.24% | 🟢 FAST |
| F64 | I64 | 2^20 | 0.544 | 147.271 us | 0.57% | 135.626 us | 0.57% | -11.645 us | -7.91% | 🟢 FAST |
| F64 | I64 | 2^24 | 0.544 | 1.056 ms | 0.15% | 1.057 ms | 0.14% | 0.233 us | 0.02% | 🔵 SAME |
| F64 | I64 | 2^28 | 0.544 | 15.490 ms | 0.03% | 15.495 ms | 0.03% | 5.296 us | 0.03% | 🔴 SLOW |
| F64 | I64 | 2^16 | 0.201 | 95.195 us | 0.42% | 83.021 us | 0.60% | -12.175 us | -12.79% | 🟢 FAST |
| F64 | I64 | 2^20 | 0.201 | 144.181 us | 0.51% | 132.892 us | 0.69% | -11.289 us | -7.83% | 🟢 FAST |
| F64 | I64 | 2^24 | 0.201 | 1.023 ms | 0.14% | 1.023 ms | 0.13% | -0.064 us | -0.01% | 🔵 SAME |
| F64 | I64 | 2^28 | 0.201 | 14.979 ms | 0.03% | 14.983 ms | 0.03% | 4.474 us | 0.03% | 🔴 SLOW |
# Summary
- Total Matches: 168
- Pass (diff <= min_noise): 75
- Unknown (infinite noise): 0
- Failure (diff > min_noise): 93
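
For anyone cross-checking the Status column, here is a minimal sketch of how Diff, %Diff, and Status appear to be derived from the four measured columns. It assumes, based only on the "diff <= min_noise" / "diff > min_noise" wording in the Summary, that a row is SAME when |%Diff| is at most the smaller of the two noise values. The function name `classify` is hypothetical and this is not the benchmark tool's actual code.

```python
def classify(ref_time, cmp_time, ref_noise_pct, cmp_noise_pct):
    """Return (diff, pct_diff, status) for one benchmark row.

    Both times must use the same unit (e.g. microseconds); noise values
    are percentages. Assumption: the threshold is min(ref_noise, cmp_noise),
    as suggested by the Summary lines, not confirmed from the tool's source.
    """
    diff = cmp_time - ref_time
    pct_diff = 100.0 * diff / ref_time
    min_noise = min(ref_noise_pct, cmp_noise_pct)
    if abs(pct_diff) <= min_noise:
        status = "SAME"   # difference is within measurement noise
    elif diff < 0:
        status = "FAST"   # Cmp run is faster than Ref
    else:
        status = "SLOW"   # Cmp run is slower than Ref
    return diff, pct_diff, status


# Three I8 / I32 rows copied from the table above.
rows = [
    (24.470, 20.490, 1.42, 1.12),    # 2^16, entropy 1
    (120.148, 120.133, 0.65, 0.64),  # 2^24, entropy 1
    (122.896, 123.446, 0.40, 0.69),  # 2^24, entropy 0.544
]
for row in rows:
    diff, pct, status = classify(*row)
    print(f"{diff:+.3f} us  {pct:+.2f}%  {status}")
```

Under that assumption the 93 "Failure" rows are the 84 FAST rows plus the 9 SLOW rows, so "Failure" here only means "outside noise", not a regression. Recomputed diffs can differ from the table in the last digit because the table times are already rounded.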
Codex suggests that this breaks the C Parallel API :'| I will ping the C Parallel folks to keep an eye on it.
do not review
fixes #8765
[Images not preserved: BEFORE, AFTER, and a zoomed-in view]