support 4d mat in more operators by nihui · Pull Request #6737 · Tencent/ncnn

nihui · 2026-05-22T03:18:12Z

Add 4d Mat handling across generic and optimized backends, extend operator tests, and document supported input and output Mat dimensions.

tencent-adm · 2026-05-22T03:18:28Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2026-05-22T03:21:59Z

Codecov Report

❌ Patch coverage is 99.36190% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.01%. Comparing base (866b73d) to head (6d81c8a).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
src/layer/loongarch/shufflechannel_loongarch.cpp	87.50%	2 Missing ⚠️
src/layer/arm/dequantize_arm.cpp	90.00%	1 Missing ⚠️
src/layer/arm/dequantize_arm_asimdhp.cpp	85.71%	1 Missing ⚠️
src/layer/loongarch/quantize_loongarch.cpp	93.75%	1 Missing ⚠️
src/layer/mips/quantize_mips.cpp	93.75%	1 Missing ⚠️
src/layer/mvn.cpp	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6737      +/-   ##
==========================================
+ Coverage   95.98%   96.01%   +0.03%     
==========================================
  Files         965      965              
  Lines      404957   405109     +152     
==========================================
+ Hits       388681   388968     +287     
+ Misses      16276    16141     -135
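
The figures in the diff above can be cross-checked with a few lines of arithmetic. The sketch below is not Codecov's implementation; `coverage_hundredths` is a hypothetical helper, and truncating (rather than rounding) to two decimals is an assumption chosen because it reproduces the reported 95.98% and 96.01%.

```cpp
#include <cassert>

// Hypothetical helper: coverage in hundredths of a percent, truncated,
// so 9598 stands for 95.98%. Integer math avoids floating-point noise.
long long coverage_hundredths(long long hits, long long lines)
{
    return hits * 10000 / lines;
}

// Hits plus misses should account for every tracked line.
bool totals_consistent(long long hits, long long misses, long long lines)
{
    return hits + misses == lines;
}
```

With the numbers from the report, base is `coverage_hundredths(388681, 404957)` and head is `coverage_hundredths(388968, 405109)`; their difference of 3 hundredths matches the +0.03% shown.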

nihui · 2026-05-22T04:52:56Z

@codex review

Copilot

Pull request overview

This PR extends NCNN’s operator implementations and the pnnx conversion/test suite to support 4D Mat (w, h, d, c) across more layers and backends (generic CPU + arch-specific + Vulkan), with expanded coverage via both C++ layer tests and pnnx Torch operator tests.
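
To make the (w, h, d, c) ordering concrete, here is a minimal sketch of how an element offset could be computed in a densely packed 4D buffer, and how the same buffer can be viewed as 3D with a flattened height of h×d. `Mat4D` and `offset_flattened` are hypothetical names for illustration, not the `ncnn::Mat` class; the sketch also assumes no per-channel padding, which the real class may add.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense stand-in for a 4D Mat (assumption: no channel padding).
struct Mat4D
{
    int w, h, d, c;
    std::vector<float> data;

    Mat4D(int w_, int h_, int d_, int c_)
        : w(w_), h(h_), d(d_), c(c_), data(static_cast<std::size_t>(w_) * h_ * d_ * c_, 0.f)
    {
    }

    // Linear offset of element (x, y, z) in channel q; x varies fastest.
    std::size_t offset(int x, int y, int z, int q) const
    {
        return ((static_cast<std::size_t>(q) * d + z) * h + y) * w + x;
    }
};

// Same buffer viewed as 3D with height h*d: the row index is z*h + y.
std::size_t offset_flattened(const Mat4D& m, int x, int row, int q)
{
    return (static_cast<std::size_t>(q) * (m.h * m.d) + row) * m.w + x;
}
```

For a `Mat4D m(5, 4, 3, 2)`, `m.offset(1, 2, 1, 1)` and `offset_flattened(m, 1, 1 * 4 + 2, 1)` address the same element, which is why a kernel written for 3D data can often be reused once depth is folded into height.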

Changes:

Add/extend 4D tensor handling in multiple layers (math ops, normalization/norm layers, shuffle/split, quantize/dequantize/requantize) across generic, x86/ARM/RISC-V/MIPS/LoongArch, and Vulkan paths.
Update Vulkan pipelines/shaders to treat 4D as flattened height×depth where appropriate and adjust indexing/grouping logic for norm-related kernels.
Expand C++ layer tests and pnnx Torch tests to include 4D-shaped inputs; add new tests and a new pnnx NCNN pass for nn.InstanceNorm3d.
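
As an illustration of the "size whd" pattern that recurs in the file summaries, the sketch below applies a per-channel PReLU over a dense 4D buffer by treating each channel as one run of w×h×d elements. `prelu_4d` is a hypothetical free function, not code from the PR, and the shared-slope behavior for a single-element slope vector is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place PReLU over a dense (w, h, d, c) buffer, one slope per channel.
// Assumption: a slope vector of length 1 is shared by all channels.
void prelu_4d(std::vector<float>& data, int w, int h, int d, int c, const std::vector<float>& slope)
{
    // Elements per channel: depth simply multiplies into the 3D size.
    const std::size_t size = static_cast<std::size_t>(w) * h * d;
    for (int q = 0; q < c; q++)
    {
        float* ptr = data.data() + q * size;
        const float s = slope.size() > 1 ? slope[q] : slope[0];
        for (std::size_t i = 0; i < size; i++)
        {
            if (ptr[i] < 0.f)
                ptr[i] *= s; // negative inputs are scaled, others pass through
        }
    }
}
```

The inner loop never looks at w, h, or d individually, so supporting a fourth dimension in such elementwise layers mostly comes down to computing the per-channel size correctly.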

Reviewed changes

Copilot reviewed 155 out of 155 changed files in this pull request and generated no comments.

File	Description
tools/pnnx/tests/ncnn/test_torch_pow.py	Extend pow test to include 4D input/output cases.
tools/pnnx/tests/ncnn/test_torch_erf.py	New Torch erf test covering 1D/3D/4D inputs.
tools/pnnx/tests/ncnn/test_torch_cumsum.py	Extend cumsum test to cover 4D and multiple axes.
tools/pnnx/tests/ncnn/test_nn_Softplus.py	New nn.Softplus module test including 4D input.
tools/pnnx/tests/ncnn/test_nn_RMSNorm.py	Extend RMSNorm module test to include (batch+4D) shapes.
tools/pnnx/tests/ncnn/test_nn_PReLU.py	Extend PReLU module test to include (batch+4D) shapes.
tools/pnnx/tests/ncnn/test_nn_LayerNorm.py	Extend LayerNorm module test to include (batch+4D) shapes.
tools/pnnx/tests/ncnn/test_nn_InstanceNorm3d.py	New nn.InstanceNorm3d module test (batch+4D -> 4D Mat).
tools/pnnx/tests/ncnn/test_nn_GLU.py	Extend GLU module test to include 4D input and more dims.
tools/pnnx/tests/ncnn/test_nn_ChannelShuffle.py	Extend ChannelShuffle module test to include 4D input.
tools/pnnx/tests/ncnn/test_F_softplus.py	New functional softplus test including 4D input.
tools/pnnx/tests/ncnn/test_F_rms_norm.py	Extend functional rms_norm to include (batch+4D) shapes.
tools/pnnx/tests/ncnn/test_F_prelu.py	Extend functional prelu to include (batch+4D) shapes.
tools/pnnx/tests/ncnn/test_F_pad.py	Extend functional pad test with 4D padding modes.
tools/pnnx/tests/ncnn/test_F_normalize.py	Extend functional normalize test to include 4D input.
tools/pnnx/tests/ncnn/test_F_layer_norm.py	Extend functional layer_norm to include (batch+4D) shapes.
tools/pnnx/tests/ncnn/test_F_instance_norm.py	New functional instance_norm test covering 3D+4D mats.
tools/pnnx/tests/ncnn/test_F_glu.py	Extend functional GLU test to include 4D input and axes.
tools/pnnx/tests/ncnn/CMakeLists.txt	Register new/extended pnnx-ncnn tests (erf, softplus, instance_norm, InstanceNorm3d).
tools/pnnx/src/pass_ncnn/nn_InstanceNorm3d.cpp	New pass mapping `nn.InstanceNorm3d` to NCNN InstanceNorm.
tools/pnnx/src/pass_ncnn/F_normalize.cpp	Extend normalize pass to accept rank-4 tensors (after batch removal).
tools/pnnx/src/CMakeLists.txt	Build-system wiring for the new InstanceNorm3d pass.
tests/test_statisticspooling.cpp	New layer test including 4D inputs for StatisticsPooling.
tests/test_split.cpp	New layer test covering 4D Split output counts.
tests/test_softplus.cpp	Add 4D RandomMat coverage and improved debug print.
tests/test_shufflechannel.cpp	Refactor helper to accept Mat; add 4D ShuffleChannel coverage.
tests/test_scale.cpp	Add dims==4 scale sizing + new 4D test cases.
tests/test_rmsnorm.cpp	Extend RMSNorm tests to include 4D mats + improved debug print.
tests/test_requantize.cpp	Add 4D requantize tests and adjust pack8 forcing for riscv.
tests/test_quantize.cpp	Add 4D quantize randomization + new 4D test cases.
tests/test_prelu.cpp	Add 4D PReLU test cases + improved debug print.
tests/test_power.cpp	Add 4D Power test cases + improved debug print.
tests/test_padding.cpp	Add additional padding test parameter coverage (incl. 3D padding params).
tests/test_normalize.cpp	Add 4D Normalize tests + improved debug print.
tests/test_noop.cpp	Add explicit 4D Noop tests + improved debug print.
tests/test_mvn.cpp	New MVN layer test with 4D inputs.
tests/test_memorydata.cpp	Add 4D MemoryData coverage (incl. param 11=d) + helper factoring.
tests/test_log.cpp	Add 4D Log test cases.
tests/test_layernorm.cpp	Add 4D LayerNorm tests + improved debug print.
tests/test_instancenorm.cpp	Add 4D InstanceNorm tests + improved debug print.
tests/test_input.cpp	New Input layer test including 4D inputs (param 11=d).
tests/test_hardswish.cpp	Add 4D HardSwish tests + improved debug print.
tests/test_hardsigmoid.cpp	Fix HardSigmoid beta ParamDict index; add 4D tests + improved debug print.
tests/test_glu.cpp	Add explicit 4D GLU axis coverage + improved debug print.
tests/test_expanddims.cpp	Add additional axis coverage (incl. negative axes) relevant to higher-rank behavior.
tests/test_exp.cpp	Add 4D Exp test cases.
tests/test_erf.cpp	Add 4D Erf test cases + improved debug print.
tests/test_dropout.cpp	Add 4D Dropout test cases + improved debug print.
tests/test_dequantize.cpp	Add 4D dequantize tests and adjust pack8 forcing for riscv.
tests/test_deepcopy.cpp	Add 4D DeepCopy test cases + improved debug print.
tests/test_cumulativesum.cpp	Add explicit 4D CumulativeSum axis coverage + improved debug print.
tests/test_cast.cpp	Skip GPU fp16p cast test if fp16 packed not supported.
tests/test_bnll.cpp	Add 4D BNLL test cases + improved debug print.
tests/test_absval.cpp	Add 4D AbsVal test cases + improved debug print.
tests/CMakeLists.txt	Register new tests (Input/MVN/Split/StatisticsPooling).
src/layer/x86/shufflechannel_x86.cpp	Extend ShuffleChannel x86 to treat 4D as whd per channel and preserve shape.
src/layer/x86/rmsnorm_x86.cpp	Add 4D RMSNorm handling paths (fp32 + bf16).
src/layer/x86/requantize_x86.cpp	Extend requantize x86 to support dims==4 allocation and processing size whd.
src/layer/x86/quantize_x86.cpp	Extend quantize x86 to support dims==4 allocation and processing size whd.
src/layer/x86/quantize_bf16s.h	Extend bf16 quantize helper to handle dims==4.
src/layer/x86/prelu_x86.cpp	Extend PReLU x86 to apply over dims==4 (size whd).
src/layer/x86/layernorm_x86.cpp	Add 4D LayerNorm handling (fp32 + bf16).
src/layer/x86/instancenorm_x86.cpp	Extend InstanceNorm x86 to compute stats over whd.
src/layer/x86/dropout_x86.cpp	Extend Dropout x86 to process dims==4 (size whd).
src/layer/x86/dequantize_x86.cpp	Extend dequantize x86 to process dims==4 (size whd).
src/layer/x86/dequantize_bf16s.h	Extend bf16 dequantize helper to allocate/process dims==4.
src/layer/vulkan/shufflechannel_vulkan.cpp	Treat 4D as flattened h*d in pipeline constants/specialization; preserve shape on output.
src/layer/vulkan/scale_vulkan.cpp	Treat 4D as flattened h*d for pipeline sizing/constants.
src/layer/vulkan/rmsnorm_vulkan.cpp	Extend grouping logic for dims==4 and flattened h*d layout.
src/layer/vulkan/prelu_vulkan.cpp	Treat 4D as flattened h*d in pipeline sizing/constants.
src/layer/vulkan/normalize_vulkan.cpp	Treat 4D as flattened h*d for reductions and constants.
src/layer/vulkan/layernorm_vulkan.cpp	Extend grouping logic for dims==4 and flattened h*d layout.
src/layer/vulkan/innerproduct_vulkan.cpp	Flatten includes depth dimension when preparing shape for GEMM path.
src/layer/vulkan/deepcopy_vulkan.cpp	Treat 4D as flattened h*d in pipeline sizing/constants.
src/layer/vulkan/shader/rmsnorm_norm.comp	Generalize coeff indexing for flattened h*d layout.
src/layer/vulkan/shader/rmsnorm_norm_pack4.comp	Same as above for pack4 path.
src/layer/vulkan/shader/padding_3d.comp	Implement replicate/reflect for 3D padding shader.
src/layer/vulkan/shader/padding_3d_pack4.comp	Same as above for pack4 path.
src/layer/vulkan/shader/layernorm_sub_mean_square.comp	Generalize group indexing for flattened h*d layout.
src/layer/vulkan/shader/layernorm_sub_mean_square_pack4.comp	Same as above for pack4 path.
src/layer/vulkan/shader/layernorm_norm.comp	Generalize group/inner indexing for flattened h*d layout.
src/layer/vulkan/shader/layernorm_norm_pack4.comp	Same as above for pack4 path.
src/layer/vulkan/shader/instancenorm_reduce_sum4_fp32.comp	Fix tail handling for reduce-sum4 over width.
src/layer/statisticspooling.cpp	Include depth in pooling size and use faster variance accumulation.
src/layer/shufflechannel.cpp	Extend generic ShuffleChannel to treat 4D as whd per channel.
src/layer/scale.cpp	Extend Scale to support dims==4 (size whd).
src/layer/rmsnorm.cpp	Add generic 4D RMSNorm implementation.
src/layer/requantize.cpp	Extend generic requantize to allocate/process dims==4 (size whd).
src/layer/quantize.cpp	Extend generic quantize to allocate/process dims==4 (size whd).
src/layer/prelu.cpp	Extend generic PReLU to support dims==4 (size whd).
src/layer/power.cpp	Extend Power to process dims==4 (size whd).
src/layer/normalize.cpp	Extend Normalize to process dims==4 (size whd).
src/layer/mvn.cpp	Extend MVN output shape and internal size to include depth.
src/layer/log.cpp	Extend Log to process dims==4 (size whd).
src/layer/layernorm.cpp	Add generic 4D LayerNorm implementation.
src/layer/instancenorm.cpp	Extend InstanceNorm to compute stats over whd.
src/layer/innerproduct.cpp	Include depth in flattened size computation (whd).
src/layer/hardswish.cpp	Extend HardSwish to process dims==4 (size whd).
src/layer/hardsigmoid.cpp	Extend HardSigmoid to process dims==4 (size whd).
src/layer/glu.cpp	Add explicit dims==4 GLU implementations for all axes.
src/layer/exp.cpp	Extend Exp to process dims==4 (size whd).
src/layer/erf.cpp	Extend Erf to process dims==4 (size whd).
src/layer/dropout.cpp	Extend Dropout to process dims==4 (size whd).
src/layer/dequantize.cpp	Extend Dequantize to process dims==4 (size whd).
src/layer/cumulativesum.cpp	Add explicit dims==4 cumulative-sum implementations for all axes.
src/layer/bnll.cpp	Extend BNLL to process dims==4 (size whd).
src/layer/absval.cpp	Extend AbsVal to process dims==4 (size whd).
src/layer/riscv/requantize_riscv.cpp	Extend requantize riscv to allocate/process dims==4 (size whd).
src/layer/riscv/quantize_riscv.cpp	Extend quantize riscv to allocate/process dims==4 (size whd).
src/layer/riscv/quantize_riscv_zfh.cpp	Extend fp16 quantize riscv (zfh) to allocate/process dims==4.
src/layer/riscv/prelu_riscv.cpp	Extend PReLU riscv to support dims==4 (size whd).
src/layer/riscv/prelu_riscv_zfh.cpp	Extend fp16 PReLU riscv (zfh) to support dims==4.
src/layer/riscv/layernorm_riscv.cpp	Add 4D LayerNorm handling on riscv.
src/layer/riscv/layernorm_riscv_zfh.cpp	Add 4D fp16 LayerNorm handling on riscv (zfh).
src/layer/riscv/instancenorm_riscv.cpp	Extend InstanceNorm riscv to compute stats over whd.
src/layer/riscv/instancenorm_riscv_zfh.cpp	Extend fp16 InstanceNorm riscv (zfh) to compute stats over whd.
src/layer/riscv/dequantize_riscv.cpp	Extend dequantize riscv to process dims==4 (size whd).
src/layer/riscv/dequantize_riscv_zfh.cpp	Extend fp16 dequantize riscv (zfh) to allocate/process dims==4.
src/layer/mips/shufflechannel_mips.cpp	Extend ShuffleChannel mips to treat 4D as whd and preserve shape.
src/layer/mips/scale_mips.cpp	Extend Scale mips to support dims==4 (size whd).
src/layer/mips/rmsnorm_mips.cpp	Add 4D RMSNorm handling on mips (fp32 + bf16).
src/layer/mips/requantize_mips.cpp	Extend requantize mips to allocate/process dims==4 (size whd).
src/layer/mips/prelu_mips.cpp	Extend PReLU mips to support dims==4 (size whd).
src/layer/mips/layernorm_mips.cpp	Add 4D LayerNorm handling on mips (fp32 + bf16).
src/layer/mips/instancenorm_mips.cpp	Extend InstanceNorm mips to compute stats over whd.
src/layer/mips/dequantize_mips.cpp	Extend dequantize mips to allocate/process dims==4 and compute size whd.
src/layer/loongarch/scale_loongarch.cpp	Extend Scale loongarch to support dims==4 (size whd).
src/layer/loongarch/rmsnorm_loongarch.cpp	Add 4D RMSNorm handling on loongarch (fp32 + bf16).
src/layer/loongarch/requantize_loongarch.cpp	Extend requantize loongarch to allocate/process dims==4 (size whd).
src/layer/loongarch/prelu_loongarch.cpp	Extend PReLU loongarch to support dims==4 (size whd).
src/layer/loongarch/layernorm_loongarch.cpp	Add 4D LayerNorm handling on loongarch (fp32 + bf16).
src/layer/loongarch/instancenorm_loongarch.cpp	Extend InstanceNorm loongarch to compute stats over whd.
src/layer/loongarch/dequantize_loongarch.cpp	Extend dequantize loongarch to allocate/process dims==4 and compute size whd.
src/layer/arm/shufflechannel_arm.cpp	Extend ShuffleChannel arm to treat 4D as whd and preserve shape.
src/layer/arm/scale_arm.cpp	Extend Scale arm to support dims==4; update fallback condition.
src/layer/arm/rmsnorm_arm.cpp	Add 4D RMSNorm handling on arm (fp32 + bf16).
src/layer/arm/rmsnorm_arm_asimdhp.cpp	Add 4D fp16 RMSNorm handling on arm.
src/layer/arm/requantize_arm.cpp	Extend requantize arm to allocate/process dims==4 (size whd).
src/layer/arm/quantize_arm_asimdhp.cpp	Extend fp16 quantize arm to allocate/process dims==4.
src/layer/arm/prelu_arm.cpp	Extend PReLU arm to support dims==4 (size whd).
src/layer/arm/prelu_arm_asimdhp.cpp	Extend fp16 PReLU arm to support dims==4.
src/layer/arm/layernorm_arm.cpp	Add 4D LayerNorm handling on arm (fp32 + bf16).
src/layer/arm/layernorm_arm_asimdhp.cpp	Add 4D fp16 LayerNorm handling on arm.
src/layer/arm/instancenorm_arm.cpp	Extend InstanceNorm arm to compute stats over whd.
src/layer/arm/instancenorm_arm_asimdhp.cpp	Extend fp16 InstanceNorm arm to compute stats over whd.
src/layer/arm/dequantize_arm.cpp	Extend dequantize arm to allocate/process dims==4 (incl. bf16).
src/layer/arm/dequantize_arm_asimdhp.cpp	Extend fp16 dequantize arm to allocate/process dims==4.
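
Several rows above describe InstanceNorm as computing statistics "over whd". A minimal sketch of that idea, assuming a dense (w, h, d, c) float buffer and the usual instance-norm formula, is shown below. `instancenorm_4d` is a hypothetical function written for illustration; it is not taken from the PR and ignores packing, fp16/bf16, and threading.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-place instance normalization: each channel is normalized with the
// mean and (biased) variance of its own w*h*d elements.
void instancenorm_4d(std::vector<float>& data, int w, int h, int d, int c, float eps,
                     const std::vector<float>& gamma, const std::vector<float>& beta)
{
    const std::size_t size = static_cast<std::size_t>(w) * h * d;
    for (int q = 0; q < c; q++)
    {
        float* ptr = data.data() + q * size;

        float sum = 0.f;
        for (std::size_t i = 0; i < size; i++)
            sum += ptr[i];
        const float mean = sum / size;

        float sqsum = 0.f;
        for (std::size_t i = 0; i < size; i++)
            sqsum += (ptr[i] - mean) * (ptr[i] - mean);
        const float var = sqsum / size;

        // Fold normalization and affine transform into y = x * a + b.
        const float a = gamma[q] / std::sqrt(var + eps);
        const float b = -mean * a + beta[q];
        for (std::size_t i = 0; i < size; i++)
            ptr[i] = ptr[i] * a + b;
    }
}
```

For a single channel holding {1, 3, 1, 3} with gamma 1, beta 0, and eps 0, the mean is 2 and the variance is 1, so the output is {-1, 1, -1, 1}.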

chatgpt-codex-connector · 2026-05-22T04:58:29Z

Codex Review: Didn't find any major issues. Keep them coming!

support 4d mat in more operators

2859f27

Add 4d Mat handling across generic and optimized backends, extend operator tests, and document supported input and output Mat dimensions.

github-actions Bot added the riscv, vulkan, test, layer, arm, loongarch, mips, x86, and doc labels on May 22, 2026

nihui added 2 commits May 22, 2026 11:38

add pnnx ncnn 4d coverage

5f6c2fd

f

0a9d5e5

github-actions Bot added the tool and pnnx labels on May 22, 2026

nihui requested a review from Copilot May 22, 2026 04:52

Copilot started reviewing on behalf of nihui May 22, 2026 04:52

Copilot AI reviewed May 22, 2026

f

6d81c8a

nihui merged commit 706c4d1 into Tencent:master May 22, 2026
171 of 176 checks passed

nihui merged 4 commits into Tencent:master from nihui:op4d

4 participants
