[NE16] Add GAP9_w_NE16 platform: NE16 accelerator Engine on GAP9 #183

Open
runwangdl wants to merge 14 commits into pulp-platform:devel from runwangdl:gap9-ne16

@runwangdl commented Apr 13, 2026

Adds the NE16 neural engine as an accelerator Engine on top of the existing GAP9 platform, registered as a new composite platform GAP9_w_NE16 that mirrors the Siracusa_w_neureka pattern.

Added

  • Deeploy/Targets/NE16/ — full Target: Platform.py (NE16Platform extends GAP9Platform, engines=[NE16Engine, GAP9ClusterEngine]), Engine.py, Bindings.py, Parsers.py, Tiler.py, Deployer.py (extends GAP9Deployer to reuse ClDma transformers), Templates/{Allocate,Conv}Template.py, TileConstraints/{NE16Pointwise,NE16Depthwise,NE16Dense}Constraint.py, TopologyOptimizationPasses/Passes.py (incl. _weightEncode ported from pulp-nnx/test/Ne16Weight.py).
  • DeeployTest/deeployRunner_tiled_gap9_w_ne16.py and DeeployTest/test_gap9_ne16_tiled_config.py — new runner + minimal test config covering PW 1x1, DW 3x3, Dense 3x3 RQ Conv.
  • TargetLibraries/GAP9/CMakeLists.txt — for GAP9_w_NE16 platform, add_subdirectory on pulp-nnx with USE_NE16=ON and link it into deeploygap9.
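The engine list in Platform.py is ordered, with the accelerator first and the cluster as the general-purpose fallback. A minimal sketch of how such a priority list could assign nodes to engines follows. `Node`, `Engine`, `ne16_supports`, and `color_node` are hypothetical stand-ins written for this illustration, not Deeploy's classes, and the support predicate is an assumption based only on the kernel shapes covered by the test config (PW 1x1, DW 3x3, Dense 3x3 RQ Conv).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node; not Deeploy's node type."""
    op: str
    kernel_shape: Tuple[int, ...] = ()


@dataclass(frozen=True)
class Engine:
    """Hypothetical stand-in for an engine: a name plus a support predicate."""
    name: str
    can_execute: Callable[[Node], bool]


def ne16_supports(node: Node) -> bool:
    # Assumption: only requantized convolutions with 1x1 or 3x3 kernels
    # are offloaded to the accelerator.
    return node.op == "RequantizedConv" and node.kernel_shape in {(1, 1), (3, 3)}


def color_node(node: Node, engines: List[Engine]) -> str:
    """Return the name of the first engine, in priority order, that accepts the node."""
    for engine in engines:
        if engine.can_execute(node):
            return engine.name
    raise ValueError(f"no engine can execute {node.op}")


# Accelerator first, cluster last as the catch-all fallback.
ENGINES = [
    Engine("NE16", ne16_supports),
    Engine("GAP9Cluster", lambda node: True),
]
```

With this ordering, a 3x3 requantized convolution is assigned to the accelerator, while any other node falls through to the cluster.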

Changed

  • DeeployTest/testUtils/platformMapping.py — register GAP9_w_NE16 in the platforms list, mapPlatform, setupMemoryPlatform, and mapDeployer.
  • DeeployTest/testMVP.py — include GAP9_w_NE16 in the EngineColoringDeployerWrapper branch (without it, NE16AdjustWeightMemoryLayoutPass never fires and parsing backtracks).
  • DeeployTest/testUtils/core/execution.py — build the GAP9 SDK image target for GAP9_w_NE16 too (so chip.soc.mram.bin is produced before gvsoc run).
  • CMakeLists.txt, DeeployTest/CMakeLists.txt — accept GAP9_w_NE16 alongside GAP9 in the platform branches.
  • Deeploy/Targets/NE16/Templates/ConvTemplate.py — swap Neureka-inherited subtile constants for the NE16 spec per ne16_task_defs.h: CIN_SUBTILE 32/28 → 16 (single mode), output 6 → 3, per-(cout, cinMajor) weight bytes PW 16, DW/Dense 144.
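The weight-byte constants in the last item are consistent with a simple packing calculation, sketched below. This is an illustration under stated assumptions, not the template's code: it assumes 8-bit weights and that each block always covers a full group of 16 input channels. The function names are hypothetical.

```python
NE16_CIN_SUBTILE = 16  # input channels per cinMajor group, as stated in the PR


def weight_bytes_per_block(kernel_h: int, kernel_w: int,
                           weight_bits: int = 8,
                           cin_subtile: int = NE16_CIN_SUBTILE) -> int:
    """Bytes needed for one (cout, cinMajor) weight block.

    Assumption: every block stores kernel_h * kernel_w * cin_subtile weights
    of weight_bits bits each, padded to the full subtile.
    """
    total_bits = kernel_h * kernel_w * cin_subtile * weight_bits
    return total_bits // 8


def cin_major_count(cin: int, cin_subtile: int = NE16_CIN_SUBTILE) -> int:
    """Number of cinMajor groups needed to cover cin input channels (ceiling division)."""
    return -(-cin // cin_subtile)


pointwise_bytes = weight_bytes_per_block(1, 1)  # 1x1 kernel
dense_bytes = weight_bytes_per_block(3, 3)      # 3x3 kernel (DW and Dense)
```

Under these assumptions the 1x1 case yields 16 bytes and the 3x3 case yields 144 bytes, matching the constants quoted above.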

Fixed

  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py — work around a pre-existing ImportError: cannot import name 'float32_tPtr' from 'Deeploy.AbstractDataTypes' by defining it locally via PointerClass(float32_t).
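One way to express this kind of workaround is an import-with-fallback helper, sketched below. It is a variation on the fix described above, which defines the name locally outright. `import_or_define` is a hypothetical helper introduced for this example and is not part of Deeploy.

```python
import importlib
from typing import Any, Callable


def import_or_define(module_name: str, symbol: str,
                     define: Callable[[], Any]) -> Any:
    """Return module_name.symbol if it can be imported, else the result of define().

    Covers both failure modes: the module is missing (ImportError) or the
    module exists but does not export the symbol (AttributeError).
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, symbol)
    except (ImportError, AttributeError):
        return define()
```

In the template, the fallback callable would build the pointer type from `PointerClass(float32_t)`, so the module keeps working whether or not the shared name is exported.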

Test plan

Ran on gvsoc gap9.evk (inside ghcr.io/pulp-platform/deeploy-gap9:devel), full pipeline gen -> parse -> lower -> codegen -> CMake -> build -> gapy image -> gvsoc flash run:

| Test | Errors | Runtime (cycles) |
| --- | --- | --- |
| Kernels/Integer/Conv/PW_2D_RQ/Regular_RQ | 0 / 1152 | 901 917 |
| Kernels/Integer/Conv/DW_2D_RQ (--enable-3x3) | 0 / 1280 | 27 339 |
| Kernels/Integer/Conv/Regular_2D_RQ (--enable-3x3) | 0 / 6372 | 244 595 |

Follow-up (out of scope for this PR): PW Unsigned_RQ variant currently produces non-zero errors — likely unsigned weight-offset or out_type conf0 handling; tracked separately.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.
