support 4d mat in more operators #6737

Merged: nihui merged 4 commits into Tencent:master from nihui:op4d on May 22, 2026

nihui commented May 22, 2026

Add 4d Mat handling across generic and optimized backends, extend operator tests, and document supported input and output Mat dimensions.


codecov-commenter commented May 22, 2026

Codecov Report

❌ Patch coverage is 99.36190% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.01%. Comparing base (866b73d) to head (6d81c8a).
⚠️ Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/layer/loongarch/shufflechannel_loongarch.cpp | 87.50% | 2 Missing ⚠️ |
| src/layer/arm/dequantize_arm.cpp | 90.00% | 1 Missing ⚠️ |
| src/layer/arm/dequantize_arm_asimdhp.cpp | 85.71% | 1 Missing ⚠️ |
| src/layer/loongarch/quantize_loongarch.cpp | 93.75% | 1 Missing ⚠️ |
| src/layer/mips/quantize_mips.cpp | 93.75% | 1 Missing ⚠️ |
| src/layer/mvn.cpp | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6737      +/-   ##
==========================================
+ Coverage   95.98%   96.01%   +0.03%     
==========================================
  Files         965      965              
  Lines      404957   405109     +152     
==========================================
+ Hits       388681   388968     +287     
+ Misses      16276    16141     -135     


nihui commented May 22, 2026

@codex review

Copilot AI left a comment

Pull request overview

This PR extends NCNN’s operator implementations and the pnnx conversion/test suite to support 4D Mat (w, h, d, c) across more layers and backends (generic CPU + arch-specific + Vulkan), with expanded coverage via both C++ layer tests and pnnx Torch operator tests.

Changes:

  • Add/extend 4D tensor handling in multiple layers (math ops, normalization/norm layers, shuffle/split, quantize/dequantize/requantize) across generic, x86/ARM/RISC-V/MIPS/LoongArch, and Vulkan paths.
  • Update Vulkan pipelines/shaders to treat 4D as flattened height×depth where appropriate and adjust indexing/grouping logic for norm-related kernels.
  • Expand C++ layer tests and pnnx Torch tests to include 4D-shaped inputs; add new tests and a new pnnx NCNN pass for nn.InstanceNorm3d.
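
The two ideas that recur in the summary above, "size whd" for elementwise layers and "flattened h*d" for the Vulkan paths, can be sketched as follows. This is only an illustration under stated assumptions: `Blob4D`, `absval_inplace`, `offset_4d` and `offset_flat` are hypothetical names, not ncnn API, and the dense layout ignores the per-channel padding (`cstep`) and element packing that a real `ncnn::Mat` may use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4D blob; NOT ncnn::Mat (no cstep padding, no packing).
struct Blob4D
{
    int w, h, d, c;
    std::vector<float> data; // c planes, each holding d*h*w contiguous floats

    Blob4D(int w_, int h_, int d_, int c_)
        : w(w_), h(h_), d(d_), c(c_), data((size_t)w_ * h_ * d_ * c_, 0.f)
    {
    }

    float* channel(int q)
    {
        return data.data() + (size_t)q * w * h * d;
    }
};

// Elementwise op: one flat loop of w*h*d elements per channel,
// so the depth axis needs no dedicated code path.
void absval_inplace(Blob4D& m)
{
    const int size = m.w * m.h * m.d;
    for (int q = 0; q < m.c; q++)
    {
        float* ptr = m.channel(q);
        for (int i = 0; i < size; i++)
            ptr[i] = std::fabs(ptr[i]);
    }
}

// Offset of element (x, y, z) inside one channel of the 4D layout.
size_t offset_4d(const Blob4D& m, int x, int y, int z)
{
    return ((size_t)z * m.h + y) * m.w + x;
}

// The same element addressed as a 3D blob of shape (w, h*d): row = z*h + y.
size_t offset_flat(const Blob4D& m, int x, int row)
{
    return (size_t)row * m.w + x;
}
```

Under this dense-layout assumption, `offset_4d(m, x, y, z)` and `offset_flat(m, x, z * m.h + y)` address the same element, which is why a kernel written for 3D blobs could, in principle, be reused by passing `h * d` as its height.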

Reviewed changes

Copilot reviewed 155 out of 155 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tools/pnnx/tests/ncnn/test_torch_pow.py | Extend pow test to include 4D input/output cases. |
| tools/pnnx/tests/ncnn/test_torch_erf.py | New Torch erf test covering 1D/3D/4D inputs. |
| tools/pnnx/tests/ncnn/test_torch_cumsum.py | Extend cumsum test to cover 4D and multiple axes. |
| tools/pnnx/tests/ncnn/test_nn_Softplus.py | New nn.Softplus module test including 4D input. |
| tools/pnnx/tests/ncnn/test_nn_RMSNorm.py | Extend RMSNorm module test to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_nn_PReLU.py | Extend PReLU module test to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_nn_LayerNorm.py | Extend LayerNorm module test to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_nn_InstanceNorm3d.py | New nn.InstanceNorm3d module test (batch+4D -> 4D Mat). |
| tools/pnnx/tests/ncnn/test_nn_GLU.py | Extend GLU module test to include 4D input and more dims. |
| tools/pnnx/tests/ncnn/test_nn_ChannelShuffle.py | Extend ChannelShuffle module test to include 4D input. |
| tools/pnnx/tests/ncnn/test_F_softplus.py | New functional softplus test including 4D input. |
| tools/pnnx/tests/ncnn/test_F_rms_norm.py | Extend functional rms_norm to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_F_prelu.py | Extend functional prelu to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_F_pad.py | Extend functional pad test with 4D padding modes. |
| tools/pnnx/tests/ncnn/test_F_normalize.py | Extend functional normalize test to include 4D input. |
| tools/pnnx/tests/ncnn/test_F_layer_norm.py | Extend functional layer_norm to include (batch+4D) shapes. |
| tools/pnnx/tests/ncnn/test_F_instance_norm.py | New functional instance_norm test covering 3D+4D mats. |
| tools/pnnx/tests/ncnn/test_F_glu.py | Extend functional GLU test to include 4D input and axes. |
| tools/pnnx/tests/ncnn/CMakeLists.txt | Register new/extended pnnx-ncnn tests (erf, softplus, instance_norm, InstanceNorm3d). |
| tools/pnnx/src/pass_ncnn/nn_InstanceNorm3d.cpp | New pass mapping nn.InstanceNorm3d to NCNN InstanceNorm. |
| tools/pnnx/src/pass_ncnn/F_normalize.cpp | Extend normalize pass to accept rank-4 tensors (after batch removal). |
| tools/pnnx/src/CMakeLists.txt | Build-system wiring for the new InstanceNorm3d pass. |
| tests/test_statisticspooling.cpp | New layer test including 4D inputs for StatisticsPooling. |
| tests/test_split.cpp | New layer test covering 4D Split output counts. |
| tests/test_softplus.cpp | Add 4D RandomMat coverage and improved debug print. |
| tests/test_shufflechannel.cpp | Refactor helper to accept Mat; add 4D ShuffleChannel coverage. |
| tests/test_scale.cpp | Add dims==4 scale sizing + new 4D test cases. |
| tests/test_rmsnorm.cpp | Extend RMSNorm tests to include 4D mats + improved debug print. |
| tests/test_requantize.cpp | Add 4D requantize tests and adjust pack8 forcing for riscv. |
| tests/test_quantize.cpp | Add 4D quantize randomization + new 4D test cases. |
| tests/test_prelu.cpp | Add 4D PReLU test cases + improved debug print. |
| tests/test_power.cpp | Add 4D Power test cases + improved debug print. |
| tests/test_padding.cpp | Add additional padding test parameter coverage (incl. 3D padding params). |
| tests/test_normalize.cpp | Add 4D Normalize tests + improved debug print. |
| tests/test_noop.cpp | Add explicit 4D Noop tests + improved debug print. |
| tests/test_mvn.cpp | New MVN layer test with 4D inputs. |
| tests/test_memorydata.cpp | Add 4D MemoryData coverage (incl. param 11=d) + helper factoring. |
| tests/test_log.cpp | Add 4D Log test cases. |
| tests/test_layernorm.cpp | Add 4D LayerNorm tests + improved debug print. |
| tests/test_instancenorm.cpp | Add 4D InstanceNorm tests + improved debug print. |
| tests/test_input.cpp | New Input layer test including 4D inputs (param 11=d). |
| tests/test_hardswish.cpp | Add 4D HardSwish tests + improved debug print. |
| tests/test_hardsigmoid.cpp | Fix HardSigmoid beta ParamDict index; add 4D tests + improved debug print. |
| tests/test_glu.cpp | Add explicit 4D GLU axis coverage + improved debug print. |
| tests/test_expanddims.cpp | Add additional axis coverage (incl. negative axes) relevant to higher-rank behavior. |
| tests/test_exp.cpp | Add 4D Exp test cases. |
| tests/test_erf.cpp | Add 4D Erf test cases + improved debug print. |
| tests/test_dropout.cpp | Add 4D Dropout test cases + improved debug print. |
| tests/test_dequantize.cpp | Add 4D dequantize tests and adjust pack8 forcing for riscv. |
| tests/test_deepcopy.cpp | Add 4D DeepCopy test cases + improved debug print. |
| tests/test_cumulativesum.cpp | Add explicit 4D CumulativeSum axis coverage + improved debug print. |
| tests/test_cast.cpp | Skip GPU fp16p cast test if fp16 packed not supported. |
| tests/test_bnll.cpp | Add 4D BNLL test cases + improved debug print. |
| tests/test_absval.cpp | Add 4D AbsVal test cases + improved debug print. |
| tests/CMakeLists.txt | Register new tests (Input/MVN/Split/StatisticsPooling). |
| src/layer/x86/shufflechannel_x86.cpp | Extend ShuffleChannel x86 to treat 4D as whd per channel and preserve shape. |
| src/layer/x86/rmsnorm_x86.cpp | Add 4D RMSNorm handling paths (fp32 + bf16). |
| src/layer/x86/requantize_x86.cpp | Extend requantize x86 to support dims==4 allocation and processing size whd. |
| src/layer/x86/quantize_x86.cpp | Extend quantize x86 to support dims==4 allocation and processing size whd. |
| src/layer/x86/quantize_bf16s.h | Extend bf16 quantize helper to handle dims==4. |
| src/layer/x86/prelu_x86.cpp | Extend PReLU x86 to apply over dims==4 (size whd). |
| src/layer/x86/layernorm_x86.cpp | Add 4D LayerNorm handling (fp32 + bf16). |
| src/layer/x86/instancenorm_x86.cpp | Extend InstanceNorm x86 to compute stats over whd. |
| src/layer/x86/dropout_x86.cpp | Extend Dropout x86 to process dims==4 (size whd). |
| src/layer/x86/dequantize_x86.cpp | Extend dequantize x86 to process dims==4 (size whd). |
| src/layer/x86/dequantize_bf16s.h | Extend bf16 dequantize helper to allocate/process dims==4. |
| src/layer/vulkan/shufflechannel_vulkan.cpp | Treat 4D as flattened h*d in pipeline constants/specialization; preserve shape on output. |
| src/layer/vulkan/scale_vulkan.cpp | Treat 4D as flattened h*d for pipeline sizing/constants. |
| src/layer/vulkan/rmsnorm_vulkan.cpp | Extend grouping logic for dims==4 and flattened h*d layout. |
| src/layer/vulkan/prelu_vulkan.cpp | Treat 4D as flattened h*d in pipeline sizing/constants. |
| src/layer/vulkan/normalize_vulkan.cpp | Treat 4D as flattened h*d for reductions and constants. |
| src/layer/vulkan/layernorm_vulkan.cpp | Extend grouping logic for dims==4 and flattened h*d layout. |
| src/layer/vulkan/innerproduct_vulkan.cpp | Flatten includes depth dimension when preparing shape for GEMM path. |
| src/layer/vulkan/deepcopy_vulkan.cpp | Treat 4D as flattened h*d in pipeline sizing/constants. |
| src/layer/vulkan/shader/rmsnorm_norm.comp | Generalize coeff indexing for flattened h*d layout. |
| src/layer/vulkan/shader/rmsnorm_norm_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/padding_3d.comp | Implement replicate/reflect for 3D padding shader. |
| src/layer/vulkan/shader/padding_3d_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/layernorm_sub_mean_square.comp | Generalize group indexing for flattened h*d layout. |
| src/layer/vulkan/shader/layernorm_sub_mean_square_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/layernorm_norm.comp | Generalize group/inner indexing for flattened h*d layout. |
| src/layer/vulkan/shader/layernorm_norm_pack4.comp | Same as above for pack4 path. |
| src/layer/vulkan/shader/instancenorm_reduce_sum4_fp32.comp | Fix tail handling for reduce-sum4 over width. |
| src/layer/statisticspooling.cpp | Include depth in pooling size and use faster variance accumulation. |
| src/layer/shufflechannel.cpp | Extend generic ShuffleChannel to treat 4D as whd per channel. |
| src/layer/scale.cpp | Extend Scale to support dims==4 (size whd). |
| src/layer/rmsnorm.cpp | Add generic 4D RMSNorm implementation. |
| src/layer/requantize.cpp | Extend generic requantize to allocate/process dims==4 (size whd). |
| src/layer/quantize.cpp | Extend generic quantize to allocate/process dims==4 (size whd). |
| src/layer/prelu.cpp | Extend generic PReLU to support dims==4 (size whd). |
| src/layer/power.cpp | Extend Power to process dims==4 (size whd). |
| src/layer/normalize.cpp | Extend Normalize to process dims==4 (size whd). |
| src/layer/mvn.cpp | Extend MVN output shape and internal size to include depth. |
| src/layer/log.cpp | Extend Log to process dims==4 (size whd). |
| src/layer/layernorm.cpp | Add generic 4D LayerNorm implementation. |
| src/layer/instancenorm.cpp | Extend InstanceNorm to compute stats over whd. |
| src/layer/innerproduct.cpp | Include depth in flattened size computation (whd). |
| src/layer/hardswish.cpp | Extend HardSwish to process dims==4 (size whd). |
| src/layer/hardsigmoid.cpp | Extend HardSigmoid to process dims==4 (size whd). |
| src/layer/glu.cpp | Add explicit dims==4 GLU implementations for all axes. |
| src/layer/exp.cpp | Extend Exp to process dims==4 (size whd). |
| src/layer/erf.cpp | Extend Erf to process dims==4 (size whd). |
| src/layer/dropout.cpp | Extend Dropout to process dims==4 (size whd). |
| src/layer/dequantize.cpp | Extend Dequantize to process dims==4 (size whd). |
| src/layer/cumulativesum.cpp | Add explicit dims==4 cumulative-sum implementations for all axes. |
| src/layer/bnll.cpp | Extend BNLL to process dims==4 (size whd). |
| src/layer/absval.cpp | Extend AbsVal to process dims==4 (size whd). |
| src/layer/riscv/requantize_riscv.cpp | Extend requantize riscv to allocate/process dims==4 (size whd). |
| src/layer/riscv/quantize_riscv.cpp | Extend quantize riscv to allocate/process dims==4 (size whd). |
| src/layer/riscv/quantize_riscv_zfh.cpp | Extend fp16 quantize riscv (zfh) to allocate/process dims==4. |
| src/layer/riscv/prelu_riscv.cpp | Extend PReLU riscv to support dims==4 (size whd). |
| src/layer/riscv/prelu_riscv_zfh.cpp | Extend fp16 PReLU riscv (zfh) to support dims==4. |
| src/layer/riscv/layernorm_riscv.cpp | Add 4D LayerNorm handling on riscv. |
| src/layer/riscv/layernorm_riscv_zfh.cpp | Add 4D fp16 LayerNorm handling on riscv (zfh). |
| src/layer/riscv/instancenorm_riscv.cpp | Extend InstanceNorm riscv to compute stats over whd. |
| src/layer/riscv/instancenorm_riscv_zfh.cpp | Extend fp16 InstanceNorm riscv (zfh) to compute stats over whd. |
| src/layer/riscv/dequantize_riscv.cpp | Extend dequantize riscv to process dims==4 (size whd). |
| src/layer/riscv/dequantize_riscv_zfh.cpp | Extend fp16 dequantize riscv (zfh) to allocate/process dims==4. |
| src/layer/mips/shufflechannel_mips.cpp | Extend ShuffleChannel mips to treat 4D as whd and preserve shape. |
| src/layer/mips/scale_mips.cpp | Extend Scale mips to support dims==4 (size whd). |
| src/layer/mips/rmsnorm_mips.cpp | Add 4D RMSNorm handling on mips (fp32 + bf16). |
| src/layer/mips/requantize_mips.cpp | Extend requantize mips to allocate/process dims==4 (size whd). |
| src/layer/mips/prelu_mips.cpp | Extend PReLU mips to support dims==4 (size whd). |
| src/layer/mips/layernorm_mips.cpp | Add 4D LayerNorm handling on mips (fp32 + bf16). |
| src/layer/mips/instancenorm_mips.cpp | Extend InstanceNorm mips to compute stats over whd. |
| src/layer/mips/dequantize_mips.cpp | Extend dequantize mips to allocate/process dims==4 and compute size whd. |
| src/layer/loongarch/scale_loongarch.cpp | Extend Scale loongarch to support dims==4 (size whd). |
| src/layer/loongarch/rmsnorm_loongarch.cpp | Add 4D RMSNorm handling on loongarch (fp32 + bf16). |
| src/layer/loongarch/requantize_loongarch.cpp | Extend requantize loongarch to allocate/process dims==4 (size whd). |
| src/layer/loongarch/prelu_loongarch.cpp | Extend PReLU loongarch to support dims==4 (size whd). |
| src/layer/loongarch/layernorm_loongarch.cpp | Add 4D LayerNorm handling on loongarch (fp32 + bf16). |
| src/layer/loongarch/instancenorm_loongarch.cpp | Extend InstanceNorm loongarch to compute stats over whd. |
| src/layer/loongarch/dequantize_loongarch.cpp | Extend dequantize loongarch to allocate/process dims==4 and compute size whd. |
| src/layer/arm/shufflechannel_arm.cpp | Extend ShuffleChannel arm to treat 4D as whd and preserve shape. |
| src/layer/arm/scale_arm.cpp | Extend Scale arm to support dims==4; update fallback condition. |
| src/layer/arm/rmsnorm_arm.cpp | Add 4D RMSNorm handling on arm (fp32 + bf16). |
| src/layer/arm/rmsnorm_arm_asimdhp.cpp | Add 4D fp16 RMSNorm handling on arm. |
| src/layer/arm/requantize_arm.cpp | Extend requantize arm to allocate/process dims==4 (size whd). |
| src/layer/arm/quantize_arm_asimdhp.cpp | Extend fp16 quantize arm to allocate/process dims==4. |
| src/layer/arm/prelu_arm.cpp | Extend PReLU arm to support dims==4 (size whd). |
| src/layer/arm/prelu_arm_asimdhp.cpp | Extend fp16 PReLU arm to support dims==4. |
| src/layer/arm/layernorm_arm.cpp | Add 4D LayerNorm handling on arm (fp32 + bf16). |
| src/layer/arm/layernorm_arm_asimdhp.cpp | Add 4D fp16 LayerNorm handling on arm. |
| src/layer/arm/instancenorm_arm.cpp | Extend InstanceNorm arm to compute stats over whd. |
| src/layer/arm/instancenorm_arm_asimdhp.cpp | Extend fp16 InstanceNorm arm to compute stats over whd. |
| src/layer/arm/dequantize_arm.cpp | Extend dequantize arm to allocate/process dims==4 (incl. bf16). |
| src/layer/arm/dequantize_arm_asimdhp.cpp | Extend fp16 dequantize arm to allocate/process dims==4. |
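
Several InstanceNorm entries in the table describe computing statistics "over whd". A minimal sketch of what that reduction extent means is given below. It rests on assumptions: `instance_norm_4d` is a hypothetical helper rather than ncnn code, the buffer is dense (no `cstep` padding or packing), and the affine gamma/beta step is left out to keep the reduction visible.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not ncnn API: normalizes each channel of a dense
// (c, d, h, w) buffer using statistics taken over all w*h*d elements.
void instance_norm_4d(std::vector<float>& data, int w, int h, int d, int c, float eps)
{
    const int size = w * h * d; // depth is part of the reduction extent
    for (int q = 0; q < c; q++)
    {
        float* ptr = data.data() + (size_t)q * size;

        // Mean over the whole w*h*d volume of this channel.
        float sum = 0.f;
        for (int i = 0; i < size; i++)
            sum += ptr[i];
        const float mean = sum / size;

        // Biased variance over the same volume.
        float sqsum = 0.f;
        for (int i = 0; i < size; i++)
            sqsum += (ptr[i] - mean) * (ptr[i] - mean);
        const float var = sqsum / size;

        // Normalize in place; a real layer would also apply gamma and beta.
        const float a = 1.f / std::sqrt(var + eps);
        for (int i = 0; i < size; i++)
            ptr[i] = (ptr[i] - mean) * a;
    }
}
```

With `d == 1` the loop bounds reduce to the familiar `w * h` case, so under these assumptions a 3D input takes the same code path as a 4D one.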

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!

@nihui nihui merged commit 706c4d1 into Tencent:master May 22, 2026
171 of 176 checks passed