make quantized_max_pool2d_nhwc handle case of C>64 (#19238)
wl1026sun wants to merge 1 commit into pytorch:main
@wl1026sun has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103096179.
Summary:
The TIE quantized_max_pool2d_nhwc general path now processes channels in chunks of 16 groups (64 bytes) at a time, using a fixed-size stack array inside an outer loop. This supports arbitrary C (any multiple of 4).
Also adds test cases for C=128, C=256, k=3x3, and padding to cover all TIE kernel dispatch paths.
Reviewed By: khazaei
Differential Revision: D103096179
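The chunking idea in the summary can be illustrated with a portable scalar sketch. This is not the code from the diff: the real kernel uses TIE vector operations, which are not shown here. The function name `max_pool2d_nhwc_chunked`, the constant `kChunkBytes`, the batch size of 1, and the choice to skip padded positions are all assumptions made for illustration. The sketch also reads "16 groups (64 bytes)" as 16 groups of 4 int8 channels, which is an inference from the summary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical chunk size: 16 groups of 4 int8 channels = 64 bytes.
constexpr int kChunkBytes = 64;

// Scalar sketch of max pooling over an int8 NHWC tensor (batch size 1).
// Assumptions: C is a multiple of 4, dilation is 1, padded positions are skipped.
std::vector<int8_t> max_pool2d_nhwc_chunked(
    const std::vector<int8_t>& in, int H, int W, int C,
    int kh, int kw, int stride, int pad) {
  const int OH = (H + 2 * pad - kh) / stride + 1;
  const int OW = (W + 2 * pad - kw) / stride + 1;
  std::vector<int8_t> out(static_cast<size_t>(OH) * OW * C);
  for (int oh = 0; oh < OH; ++oh) {
    for (int ow = 0; ow < OW; ++ow) {
      // Outer loop over channel chunks: the stack buffer stays 64 bytes
      // no matter how large C is.
      for (int c0 = 0; c0 < C; c0 += kChunkBytes) {
        const int n = std::min(kChunkBytes, C - c0);
        int8_t acc[kChunkBytes];
        std::fill(acc, acc + n, std::numeric_limits<int8_t>::min());
        for (int i = 0; i < kh; ++i) {
          const int ih = oh * stride - pad + i;
          if (ih < 0 || ih >= H) continue;  // row lies in the padding
          for (int j = 0; j < kw; ++j) {
            const int iw = ow * stride - pad + j;
            if (iw < 0 || iw >= W) continue;  // column lies in the padding
            const int8_t* src =
                &in[(static_cast<size_t>(ih) * W + iw) * C + c0];
            for (int c = 0; c < n; ++c) acc[c] = std::max(acc[c], src[c]);
          }
        }
        // Write the finished chunk to the output pixel.
        std::copy(acc, acc + n,
                  &out[(static_cast<size_t>(oh) * OW + ow) * C + c0]);
      }
    }
  }
  return out;
}
```

With C=128 the chunk loop runs twice per output pixel, and with C=256 four times, which is the kind of case the new tests in this PR are described as covering.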