support 4d mat in more operators #6737
Add 4d Mat handling across generic and optimized backends, extend operator tests, and document supported input and output Mat dimensions.
Codecov Report: ❌ Patch coverage is …

| Coverage Diff | master | #6737 | +/- |
|---|---|---|---|
| Coverage | 95.98% | 96.01% | +0.03% |
| Files | 965 | 965 | |
| Lines | 404957 | 405109 | +152 |
| Hits | 388681 | 388968 | +287 |
| Misses | 16276 | 16141 | -135 |
@codex review
Pull request overview
This PR extends NCNN’s operator implementations and the pnnx conversion/test suite to support 4D Mat (w, h, d, c) across more layers and backends (generic CPU + arch-specific + Vulkan), with expanded coverage via both C++ layer tests and pnnx Torch operator tests.
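
To illustrate what "size whd" means in the per-file notes, here is a minimal sketch of an elementwise operator over a 4D blob. `Blob4D` and `absval_inplace` are hypothetical stand-ins written for this example; they are not the real `ncnn::Mat` API and are not taken from the PR. The sketch assumes each channel stores `w*h*d` contiguous floats and ignores any per-channel alignment padding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4D blob; NOT the real ncnn::Mat API.
// Elements are stored channel by channel, each channel holding w*h*d floats.
// Any per-channel alignment padding a real implementation may use is ignored.
struct Blob4D
{
    int w, h, d, c;
    std::vector<float> data;

    Blob4D(int w_, int h_, int d_, int c_)
        : w(w_), h(h_), d(d_), c(c_), data(static_cast<std::size_t>(w_) * h_ * d_ * c_, 0.f)
    {
        assert(w_ > 0 && h_ > 0 && d_ > 0 && c_ > 0);
    }

    // Pointer to the first element of channel q.
    float* channel(int q)
    {
        return data.data() + static_cast<std::size_t>(q) * w * h * d;
    }
};

// Elementwise absolute value. The per-channel element count is w*h*d,
// so the same loop serves 3D blobs (d == 1) and 4D blobs (d > 1).
inline void absval_inplace(Blob4D& blob)
{
    const int size = blob.w * blob.h * blob.d;
    for (int q = 0; q < blob.c; q++)
    {
        float* ptr = blob.channel(q);
        for (int i = 0; i < size; i++)
        {
            ptr[i] = std::fabs(ptr[i]);
        }
    }
}
```

With this layout, an elementwise operator needs no dedicated 4D branch beyond including `d` in the per-channel size, which seems to match how most of the simple math layers are described in the file summary.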
Changes:
- Add/extend 4D tensor handling in multiple layers (math ops, normalization/norm layers, shuffle/split, quantize/dequantize/requantize) across generic, x86/ARM/RISC-V/MIPS/LoongArch, and Vulkan paths.
- Update Vulkan pipelines/shaders to treat 4D as flattened height×depth where appropriate and adjust indexing/grouping logic for norm-related kernels.
- Expand C++ layer tests and pnnx Torch tests to include 4D-shaped inputs; add new tests and a new pnnx NCNN pass for nn.InstanceNorm3d.
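
The "flattened height×depth" treatment mentioned for the Vulkan path can be sketched as an index mapping. The helpers below are hypothetical and are not copied from the PR's shaders or pipeline code. They assume depth slices are stored one after another inside a channel, so a 4D channel of shape (w, h, d) can be read as a 3D channel of shape (w, h*d).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; not taken from the PR.

// Row (y) and depth slice (z) recovered from a flattened row index.
struct RowIndex
{
    int y;
    int z;
};

// Offset of element (x, y, z) inside one channel of a 4D blob,
// assuming depth slices of h*w elements are stored back to back.
inline std::size_t offset_4d(int x, int y, int z, int w, int h)
{
    return (static_cast<std::size_t>(z) * h + y) * w + x;
}

// Row index of (y, z) when the channel is viewed as h*d rows of width w.
inline int flat_row(int y, int z, int h)
{
    return z * h + y;
}

// Offset of element (x, row) in the flattened (w, h*d) view.
inline std::size_t offset_flat(int x, int row, int w)
{
    return static_cast<std::size_t>(row) * w + x;
}

// Inverse mapping from a flattened row back to (y, z).
inline RowIndex unflatten_row(int row, int h)
{
    RowIndex r;
    r.y = row % h;
    r.z = row / h;
    return r;
}
```

Under that assumption both views address the same memory, so a kernel written for 3D input can be dispatched over `h*d` rows without a separate depth loop; only code that needs the original `y` or `z` has to call something like `unflatten_row`.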
Reviewed changes
Copilot reviewed 155 out of 155 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/pnnx/tests/ncnn/test_torch_pow.py | Extend pow test to include 4D input/output cases. |
| tools/pnnx/tests/ncnn/test_torch_erf.py | New Torch erf test covering 1D/3D/4D inputs. |
| tools/pnnx/tests/ncnn/test_torch_cumsum.py | Extend cumsum test to cover 4D and multiple axes. |
| tools/pnnx/tests/ncnn/test_nn_Softplus.py | New nn.Softplus module test including 4D input. |
| tools/pnnx/tests/ncnn/test_nn_RMSNorm.py | Extend RMSNorm module test to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_nn_PReLU.py | Extend PReLU module test to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_nn_LayerNorm.py | Extend LayerNorm module test to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_nn_InstanceNorm3d.py | New nn.InstanceNorm3d module test (batch+4D -> 4D Mat). |
| tools/pnnx/tests/ncnn/test_nn_GLU.py | Extend GLU module test to include 4D input and more dims. |
| tools/pnnx/tests/ncnn/test_nn_ChannelShuffle.py | Extend ChannelShuffle module test to include 4D input. |
| tools/pnnx/tests/ncnn/test_F_softplus.py | New functional softplus test including 4D input. |
| tools/pnnx/tests/ncnn/test_F_rms_norm.py | Extend functional rms_norm to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_F_prelu.py | Extend functional prelu to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_F_pad.py | Extend functional pad test with 4D padding modes. |
| tools/pnnx/tests/ncnn/test_F_normalize.py | Extend functional normalize test to include 4D input. |
| tools/pnnx/tests/ncnn/test_F_layer_norm.py | Extend functional layer_norm to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_F_instance_norm.py | New functional instance_norm test covering 3D+4D mats. |
| tools/pnnx/tests/ncnn/test_F_glu.py | Extend functional GLU test to include 4D input and axes. |
| tools/pnnx/tests/ncnn/CMakeLists.txt | Register new/extended pnnx-ncnn tests (erf, softplus, instance_norm, InstanceNorm3d). |
| tools/pnnx/src/pass_ncnn/nn_InstanceNorm3d.cpp | New pass mapping nn.InstanceNorm3d to NCNN InstanceNorm. |
| tools/pnnx/src/pass_ncnn/F_normalize.cpp | Extend normalize pass to accept rank-4 tensors (after batch removal). |
| tools/pnnx/src/CMakeLists.txt | Build-system wiring for the new InstanceNorm3d pass. |
| tests/test_statisticspooling.cpp | New layer test including 4D inputs for StatisticsPooling. |
| tests/test_split.cpp | New layer test covering 4D Split output counts. |
| tests/test_softplus.cpp | Add 4D RandomMat coverage and improved debug print. |
| tests/test_shufflechannel.cpp | Refactor helper to accept Mat; add 4D ShuffleChannel coverage. |
| tests/test_scale.cpp | Add dims==4 scale sizing + new 4D test cases. |
| tests/test_rmsnorm.cpp | Extend RMSNorm tests to include 4D mats + improved debug print. |
| tests/test_requantize.cpp | Add 4D requantize tests and adjust pack8 forcing for riscv. |
| tests/test_quantize.cpp | Add 4D quantize randomization + new 4D test cases. |
| tests/test_prelu.cpp | Add 4D PReLU test cases + improved debug print. |
| tests/test_power.cpp | Add 4D Power test cases + improved debug print. |
| tests/test_padding.cpp | Add additional padding test parameter coverage (incl. 3D padding params). |
| tests/test_normalize.cpp | Add 4D Normalize tests + improved debug print. |
| tests/test_noop.cpp | Add explicit 4D Noop tests + improved debug print. |
| tests/test_mvn.cpp | New MVN layer test with 4D inputs. |
| tests/test_memorydata.cpp | Add 4D MemoryData coverage (incl. param 11=d) + helper factoring. |
| tests/test_log.cpp | Add 4D Log test cases. |
| tests/test_layernorm.cpp | Add 4D LayerNorm tests + improved debug print. |
| tests/test_instancenorm.cpp | Add 4D InstanceNorm tests + improved debug print. |
| tests/test_input.cpp | New Input layer test including 4D inputs (param 11=d). |
| tests/test_hardswish.cpp | Add 4D HardSwish tests + improved debug print. |
| tests/test_hardsigmoid.cpp | Fix HardSigmoid beta ParamDict index; add 4D tests + improved debug print. |
| tests/test_glu.cpp | Add explicit 4D GLU axis coverage + improved debug print. |
| tests/test_expanddims.cpp | Add additional axis coverage (incl. negative axes) relevant to higher-rank behavior. |
| tests/test_exp.cpp | Add 4D Exp test cases. |
| tests/test_erf.cpp | Add 4D Erf test cases + improved debug print. |
| tests/test_dropout.cpp | Add 4D Dropout test cases + improved debug print. |
| tests/test_dequantize.cpp | Add 4D dequantize tests and adjust pack8 forcing for riscv. |
| tests/test_deepcopy.cpp | Add 4D DeepCopy test cases + improved debug print. |
| tests/test_cumulativesum.cpp | Add explicit 4D CumulativeSum axis coverage + improved debug print. |
| tests/test_cast.cpp | Skip GPU fp16p cast test if fp16 packed not supported. |
| tests/test_bnll.cpp | Add 4D BNLL test cases + improved debug print. |
| tests/test_absval.cpp | Add 4D AbsVal test cases + improved debug print. |
| tests/CMakeLists.txt | Register new tests (Input/MVN/Split/StatisticsPooling). |
| src/layer/x86/shufflechannel_x86.cpp | Extend ShuffleChannel x86 to treat 4D as whd per channel and preserve shape. |
| src/layer/x86/rmsnorm_x86.cpp | Add 4D RMSNorm handling paths (fp32 + bf16). |
| src/layer/x86/requantize_x86.cpp | Extend requantize x86 to support dims==4 allocation and processing size whd. |
| src/layer/x86/quantize_x86.cpp | Extend quantize x86 to support dims==4 allocation and processing size whd. |
| src/layer/x86/quantize_bf16s.h | Extend bf16 quantize helper to handle dims==4. |
| src/layer/x86/prelu_x86.cpp | Extend PReLU x86 to apply over dims==4 (size whd). |
| src/layer/x86/layernorm_x86.cpp | Add 4D LayerNorm handling (fp32 + bf16). |
| src/layer/x86/instancenorm_x86.cpp | Extend InstanceNorm x86 to compute stats over whd. |
| src/layer/x86/dropout_x86.cpp | Extend Dropout x86 to process dims==4 (size whd). |
| src/layer/x86/dequantize_x86.cpp | Extend dequantize x86 to process dims==4 (size whd). |
| src/layer/x86/dequantize_bf16s.h | Extend bf16 dequantize helper to allocate/process dims==4. |
| src/layer/vulkan/shufflechannel_vulkan.cpp | Treat 4D as flattened h*d in pipeline constants/specialization; preserve shape on output. |
| src/layer/vulkan/scale_vulkan.cpp | Treat 4D as flattened h*d for pipeline sizing/constants. |
| src/layer/vulkan/rmsnorm_vulkan.cpp | Extend grouping logic for dims==4 and flattened h*d layout. |
| src/layer/vulkan/prelu_vulkan.cpp | Treat 4D as flattened h*d in pipeline sizing/constants. |
| src/layer/vulkan/normalize_vulkan.cpp | Treat 4D as flattened h*d for reductions and constants. |
| src/layer/vulkan/layernorm_vulkan.cpp | Extend grouping logic for dims==4 and flattened h*d layout. |
| src/layer/vulkan/innerproduct_vulkan.cpp | Flatten includes depth dimension when preparing shape for GEMM path. |
| src/layer/vulkan/deepcopy_vulkan.cpp | Treat 4D as flattened h*d in pipeline sizing/constants. |
| src/layer/vulkan/shader/rmsnorm_norm.comp | Generalize coeff indexing for flattened h*d layout. |
| src/layer/vulkan/shader/rmsnorm_norm_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/padding_3d.comp | Implement replicate/reflect for 3D padding shader. |
| src/layer/vulkan/shader/padding_3d_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/layernorm_sub_mean_square.comp | Generalize group indexing for flattened h*d layout. |
| src/layer/vulkan/shader/layernorm_sub_mean_square_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/layernorm_norm.comp | Generalize group/inner indexing for flattened h*d layout. |
| src/layer/vulkan/shader/layernorm_norm_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/instancenorm_reduce_sum4_fp32.comp | Fix tail handling for reduce-sum4 over width. |
| src/layer/statisticspooling.cpp | Include depth in pooling size and use faster variance accumulation. |
| src/layer/shufflechannel.cpp | Extend generic ShuffleChannel to treat 4D as whd per channel. |
| src/layer/scale.cpp | Extend Scale to support dims==4 (size whd). |
| src/layer/rmsnorm.cpp | Add generic 4D RMSNorm implementation. |
| src/layer/requantize.cpp | Extend generic requantize to allocate/process dims==4 (size whd). |
| src/layer/quantize.cpp | Extend generic quantize to allocate/process dims==4 (size whd). |
| src/layer/prelu.cpp | Extend generic PReLU to support dims==4 (size whd). |
| src/layer/power.cpp | Extend Power to process dims==4 (size whd). |
| src/layer/normalize.cpp | Extend Normalize to process dims==4 (size whd). |
| src/layer/mvn.cpp | Extend MVN output shape and internal size to include depth. |
| src/layer/log.cpp | Extend Log to process dims==4 (size whd). |
| src/layer/layernorm.cpp | Add generic 4D LayerNorm implementation. |
| src/layer/instancenorm.cpp | Extend InstanceNorm to compute stats over whd. |
| src/layer/innerproduct.cpp | Include depth in flattened size computation (whd). |
| src/layer/hardswish.cpp | Extend HardSwish to process dims==4 (size whd). |
| src/layer/hardsigmoid.cpp | Extend HardSigmoid to process dims==4 (size whd). |
| src/layer/glu.cpp | Add explicit dims==4 GLU implementations for all axes. |
| src/layer/exp.cpp | Extend Exp to process dims==4 (size whd). |
| src/layer/erf.cpp | Extend Erf to process dims==4 (size whd). |
| src/layer/dropout.cpp | Extend Dropout to process dims==4 (size whd). |
| src/layer/dequantize.cpp | Extend Dequantize to process dims==4 (size whd). |
| src/layer/cumulativesum.cpp | Add explicit dims==4 cumulative-sum implementations for all axes. |
| src/layer/bnll.cpp | Extend BNLL to process dims==4 (size whd). |
| src/layer/absval.cpp | Extend AbsVal to process dims==4 (size whd). |
| src/layer/riscv/requantize_riscv.cpp | Extend requantize riscv to allocate/process dims==4 (size whd). |
| src/layer/riscv/quantize_riscv.cpp | Extend quantize riscv to allocate/process dims==4 (size whd). |
| src/layer/riscv/quantize_riscv_zfh.cpp | Extend fp16 quantize riscv (zfh) to allocate/process dims==4. |
| src/layer/riscv/prelu_riscv.cpp | Extend PReLU riscv to support dims==4 (size whd). |
| src/layer/riscv/prelu_riscv_zfh.cpp | Extend fp16 PReLU riscv (zfh) to support dims==4. |
| src/layer/riscv/layernorm_riscv.cpp | Add 4D LayerNorm handling on riscv. |
| src/layer/riscv/layernorm_riscv_zfh.cpp | Add 4D fp16 LayerNorm handling on riscv (zfh). |
| src/layer/riscv/instancenorm_riscv.cpp | Extend InstanceNorm riscv to compute stats over whd. |
| src/layer/riscv/instancenorm_riscv_zfh.cpp | Extend fp16 InstanceNorm riscv (zfh) to compute stats over whd. |
| src/layer/riscv/dequantize_riscv.cpp | Extend dequantize riscv to process dims==4 (size whd). |
| src/layer/riscv/dequantize_riscv_zfh.cpp | Extend fp16 dequantize riscv (zfh) to allocate/process dims==4. |
| src/layer/mips/shufflechannel_mips.cpp | Extend ShuffleChannel mips to treat 4D as whd and preserve shape. |
| src/layer/mips/scale_mips.cpp | Extend Scale mips to support dims==4 (size whd). |
| src/layer/mips/rmsnorm_mips.cpp | Add 4D RMSNorm handling on mips (fp32 + bf16). |
| src/layer/mips/requantize_mips.cpp | Extend requantize mips to allocate/process dims==4 (size whd). |
| src/layer/mips/prelu_mips.cpp | Extend PReLU mips to support dims==4 (size whd). |
| src/layer/mips/layernorm_mips.cpp | Add 4D LayerNorm handling on mips (fp32 + bf16). |
| src/layer/mips/instancenorm_mips.cpp | Extend InstanceNorm mips to compute stats over whd. |
| src/layer/mips/dequantize_mips.cpp | Extend dequantize mips to allocate/process dims==4 and compute size whd. |
| src/layer/loongarch/scale_loongarch.cpp | Extend Scale loongarch to support dims==4 (size whd). |
| src/layer/loongarch/rmsnorm_loongarch.cpp | Add 4D RMSNorm handling on loongarch (fp32 + bf16). |
| src/layer/loongarch/requantize_loongarch.cpp | Extend requantize loongarch to allocate/process dims==4 (size whd). |
| src/layer/loongarch/prelu_loongarch.cpp | Extend PReLU loongarch to support dims==4 (size whd). |
| src/layer/loongarch/layernorm_loongarch.cpp | Add 4D LayerNorm handling on loongarch (fp32 + bf16). |
| src/layer/loongarch/instancenorm_loongarch.cpp | Extend InstanceNorm loongarch to compute stats over whd. |
| src/layer/loongarch/dequantize_loongarch.cpp | Extend dequantize loongarch to allocate/process dims==4 and compute size whd. |
| src/layer/arm/shufflechannel_arm.cpp | Extend ShuffleChannel arm to treat 4D as whd and preserve shape. |
| src/layer/arm/scale_arm.cpp | Extend Scale arm to support dims==4; update fallback condition. |
| src/layer/arm/rmsnorm_arm.cpp | Add 4D RMSNorm handling on arm (fp32 + bf16). |
| src/layer/arm/rmsnorm_arm_asimdhp.cpp | Add 4D fp16 RMSNorm handling on arm. |
| src/layer/arm/requantize_arm.cpp | Extend requantize arm to allocate/process dims==4 (size whd). |
| src/layer/arm/quantize_arm_asimdhp.cpp | Extend fp16 quantize arm to allocate/process dims==4. |
| src/layer/arm/prelu_arm.cpp | Extend PReLU arm to support dims==4 (size whd). |
| src/layer/arm/prelu_arm_asimdhp.cpp | Extend fp16 PReLU arm to support dims==4. |
| src/layer/arm/layernorm_arm.cpp | Add 4D LayerNorm handling on arm (fp32 + bf16). |
| src/layer/arm/layernorm_arm_asimdhp.cpp | Add 4D fp16 LayerNorm handling on arm. |
| src/layer/arm/instancenorm_arm.cpp | Extend InstanceNorm arm to compute stats over whd. |
| src/layer/arm/instancenorm_arm_asimdhp.cpp | Extend fp16 InstanceNorm arm to compute stats over whd. |
| src/layer/arm/dequantize_arm.cpp | Extend dequantize arm to allocate/process dims==4 (incl. bf16). |
| src/layer/arm/dequantize_arm_asimdhp.cpp | Extend fp16 dequantize arm to allocate/process dims==4. |
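
Several rows above describe InstanceNorm as computing statistics "over whd". A minimal single-channel sketch of that idea follows. The function name `instance_norm_channel` and its signature are hypothetical, and the biased variance with `gamma / sqrt(var + eps)` scaling is the usual instance-norm formulation, assumed here rather than copied from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical single-channel instance norm; not the PR's implementation.
// Mean and (biased) variance are taken over all w*h*d elements of the channel,
// then each element is rescaled as x * a + b.
inline void instance_norm_channel(float* ptr, int w, int h, int d, float gamma, float beta, float eps)
{
    const int size = w * h * d;
    assert(size > 0);

    float sum = 0.f;
    for (int i = 0; i < size; i++)
    {
        sum += ptr[i];
    }
    const float mean = sum / size;

    float sqsum = 0.f;
    for (int i = 0; i < size; i++)
    {
        const float v = ptr[i] - mean;
        sqsum += v * v;
    }
    const float var = sqsum / size;

    // Fold normalization and the affine parameters into one multiply-add.
    const float a = gamma / std::sqrt(var + eps);
    const float b = beta - mean * a;
    for (int i = 0; i < size; i++)
    {
        ptr[i] = ptr[i] * a + b;
    }
}
```

For a 3D input the caller would pass `d = 1`, which reduces the statistics window to the familiar `w*h`.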
Codex Review: Didn't find any major issues. Keep them coming!