[NE16] Add GAP9_w_NE16 platform: NE16 accelerator Engine on GAP9 by runwangdl · Pull Request #183 · pulp-platform/Deeploy

runwangdl · 2026-04-13T22:29:53Z

Adds the NE16 neural engine as an accelerator Engine on top of the existing GAP9 platform, registered as a new composite platform GAP9_w_NE16 that mirrors the Siracusa_w_neureka pattern.

Added

Deeploy/Targets/NE16/ — full Target: Platform.py (NE16Platform extends GAP9Platform, engines=[NE16Engine, GAP9ClusterEngine]), Engine.py, Bindings.py, Parsers.py, Tiler.py, Deployer.py (extends GAP9Deployer to reuse ClDma transformers), Templates/{Allocate,Conv}Template.py, TileConstraints/{NE16Pointwise,NE16Depthwise,NE16Dense}Constraint.py, TopologyOptimizationPasses/Passes.py (incl. _weightEncode ported from pulp-nnx/test/Ne16Weight.py).
DeeployTest/deeployRunner_tiled_gap9_w_ne16.py and DeeployTest/test_gap9_ne16_tiled_config.py — new runner + minimal test config covering PW 1x1, DW 3x3, Dense 3x3 RQ Conv.
TargetLibraries/GAP9/CMakeLists.txt — for GAP9_w_NE16 platform, add_subdirectory on pulp-nnx with USE_NE16=ON and link it into deeploygap9.
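The engine list `engines=[NE16Engine, GAP9ClusterEngine]` suggests an accelerator-first arrangement with the cluster as fallback. A minimal sketch of that idea follows, assuming first-match priority over the list. The `Engine`, `CompositePlatform`, `assign`, and `can_execute` names, the node dictionaries, and the support predicate are all hypothetical stand-ins for illustration, not Deeploy's actual classes or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Node = Dict[str, object]  # hypothetical stand-in for a graph node


@dataclass
class Engine:
    name: str
    can_execute: Callable[[Node], bool]  # hypothetical support predicate


@dataclass
class CompositePlatform:
    engines: List[Engine]

    def assign(self, node: Node) -> Optional[str]:
        # Assumption: the first engine in the list that supports the node wins.
        for engine in self.engines:
            if engine.can_execute(node):
                return engine.name
        return None


def ne16_can_execute(node: Node) -> bool:
    # Illustrative rule only: requantized convs with 1x1 or 3x3 kernels.
    return node["op"] == "RQConv" and node["kernel"] in ((1, 1), (3, 3))


def cluster_can_execute(node: Node) -> bool:
    # Illustrative rule only: the cluster acts as a catch-all fallback.
    return True


platform = CompositePlatform(engines=[
    Engine("NE16", ne16_can_execute),
    Engine("GAP9Cluster", cluster_can_execute),
])

nodes = [
    {"op": "RQConv", "kernel": (1, 1)},
    {"op": "RQConv", "kernel": (5, 5)},
    {"op": "Gemm", "kernel": None},
]
assignment = [platform.assign(node) for node in nodes]
print(assignment)
```

Under this assumption, list order encodes priority: swapping the two engines would send every node to the cluster, because its predicate accepts everything.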

Changed

DeeployTest/testUtils/platformMapping.py — register GAP9_w_NE16 in the platforms list, mapPlatform, setupMemoryPlatform, and mapDeployer.
DeeployTest/testMVP.py — include GAP9_w_NE16 in the EngineColoringDeployerWrapper branch (without it, NE16AdjustWeightMemoryLayoutPass never fires and parsing backtracks).
DeeployTest/testUtils/core/execution.py — build the GAP9 SDK image target for GAP9_w_NE16 too (so chip.soc.mram.bin is produced before gvsoc run).
CMakeLists.txt, DeeployTest/CMakeLists.txt — accept GAP9_w_NE16 alongside GAP9 in the platform branches.
Deeploy/Targets/NE16/Templates/ConvTemplate.py — swap Neureka-inherited subtile constants for NE16 spec per ne16_task_defs.h: CIN_SUBTILE 32/28 → 16 (single mode), output 6 → 3, per-(cout,cinMajor) weight bytes PW 16, DW/Dense 144.
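The per-(cout, cinMajor) weight byte counts quoted above follow from the 16-channel input subtile if weights are 8 bits wide: 1x1 gives 16 bytes and 3x3 gives 144 bytes. The sketch below checks that arithmetic and uses it to size an encoded weight buffer. The 8-bit weight width and the helper names (`bytes_per_cout_cinmajor`, `encoded_weight_bytes`) are assumptions for illustration and are not taken from the PR's code; the depthwise layout is not modelled here.

```python
import math

CIN_SUBTILE = 16  # NE16 input-channel subtile, as stated in the PR
WEIGHT_BYTES = {"PW": 16, "Dense": 144}  # per (cout, cinMajor), as stated in the PR


def bytes_per_cout_cinmajor(kernel_h: int, kernel_w: int, weight_bits: int = 8) -> int:
    # Assumption: one CIN_SUBTILE-wide group stores kernel_h * kernel_w taps
    # at weight_bits bits per input channel.
    return kernel_h * kernel_w * CIN_SUBTILE * weight_bits // 8


def encoded_weight_bytes(mode: str, cout: int, cin: int) -> int:
    # Input channels are padded up to a whole number of subtiles (cinMajor).
    cin_major = math.ceil(cin / CIN_SUBTILE)
    return cout * cin_major * WEIGHT_BYTES[mode]


# The derived values agree with the constants quoted in the PR.
pw_bytes = bytes_per_cout_cinmajor(1, 1)
dense_bytes = bytes_per_cout_cinmajor(3, 3)

# Example sizes: a 64 -> 32 pointwise layer and a 20 -> 32 dense 3x3 layer.
pw_total = encoded_weight_bytes("PW", cout=32, cin=64)
dense_total = encoded_weight_bytes("Dense", cout=32, cin=20)
print(pw_bytes, dense_bytes, pw_total, dense_total)
```

Note that `cin=20` is rounded up to two subtiles, so the dense layer pays for 32 input channels of weight storage.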

Fixed

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py — work around a pre-existing ImportError: cannot import name 'float32_tPtr' from 'Deeploy.AbstractDataTypes' by defining it locally via PointerClass(float32_t).

Test plan

Ran on gvsoc gap9.evk (inside ghcr.io/pulp-platform/deeploy-gap9:devel), full pipeline gen -> parse -> lower -> codegen -> CMake -> build -> gapy image -> gvsoc flash run:

| Test | Errors | Runtime (cycles) |
| --- | --- | --- |
| `Kernels/Integer/Conv/PW_2D_RQ/Regular_RQ` | 0 / 1152 | 901 917 |
| `Kernels/Integer/Conv/DW_2D_RQ` (`--enable-3x3`) | 0 / 1280 | 27 339 |
| `Kernels/Integer/Conv/Regular_2D_RQ` (`--enable-3x3`) | 0 / 6372 | 244 595 |

Follow-up (out of scope for this PR): PW Unsigned_RQ variant currently produces non-zero errors — likely unsigned weight-offset or out_type conf0 handling; tracked separately.

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR is reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

runwangdl wants to merge 14 commits into pulp-platform:devel from runwangdl:gap9-ne16
