
[Backend][Relax] Add NPU BYOC backend example#19425

Open
Aristide021 wants to merge 3 commits into apache:main from Aristide021:contrib-npu-generic-v2

Conversation

@Aristide021

Supersedes #18247. Per maintainer guidance, resubmitting as a fresh PR due to CI workflow changes affecting old PRs.

Summary

This PR adds an example NPU BYOC backend for Relax, including end-to-end integration points:

  • pattern registration (python/tvm/relax/backend/contrib/example_npu/patterns.py)
  • backend registration (python/tvm/relax/backend/contrib/example_npu/__init__.py)
  • codegen entrypoint (src/relax/backend/contrib/example_npu/codegen.cc)
  • runtime module (src/runtime/contrib/example_npu/example_npu_runtime.cc)
  • CMake integration (cmake/modules/contrib/ExampleNPU.cmake, CMakeLists.txt, cmake/modules/LibInfo.cmake, src/support/libinfo.cc)
  • tutorial/docs (docs/how_to/tutorials/byoc_npu_example.py, README under contrib path)
  • tests (tests/python/contrib/test_example_npu.py)
  • CI build config enablement (tests/scripts/task_config_build_cpu.sh)

Review feedback addressed from #18247

  • Test location under tests/python/contrib/
  • README includes explicit enable instructions for USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME
  • README quick-start uses inline MatmulReLU (no import from test module)
  • Added CMake source wiring and feature flags for runtime/codegen
  • Added docs tutorial under docs/how_to/tutorials/ (not only README)
  • Reorganized motivation/context section near top of README
  • Extended pattern coverage to include example_npu.softmax with tests
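The softmax coverage mentioned above is CPU-emulated in this example backend. As a rough, hypothetical sketch (not the PR's actual kernel) of what a numerically stable per-row softmax emulation computes:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax over one row: subtracting the row max
// before exponentiating keeps large logits from overflowing.
std::vector<float> Softmax(const std::vector<float>& x) {
  float mx = *std::max_element(x.begin(), x.end());
  std::vector<float> out(x.size());
  float sum = 0.0f;
  for (size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - mx);
    sum += out[i];
  }
  for (float& v : out) v /= sum;  // normalize so the row sums to 1
  return out;
}
```

A pattern test for such an op would typically compare the partitioned backend's output against this reference on random inputs.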

Validation

Local checks run:

  • pre-commit on touched files (pass)
  • PYTHONPATH=python python -m pytest -q tests/python/contrib/test_example_npu.py (pass)

Notes

This backend is an example/tutorial implementation (CPU-emulated) intended to document modern NPU-oriented BYOC integration patterns and provide a reference path for future hardware-specific backends.

Adds a vendor-neutral example NPU backend demonstrating the BYOC
(Bring Your Own Codegen) pattern for custom accelerator integration
in TVM's Relax framework.

Components added:
- python/tvm/relax/backend/contrib/example_npu/: pattern registry with
  op support for matmul, conv1d/2d, depthwise conv2d, pooling, batch
  norm, softmax, activations, elementwise ops, quantization, and a
  fused conv2d+relu pattern
- src/relax/backend/contrib/example_npu/codegen.cc: JSON serializer
  registered as relax.ext.example_npu
- src/runtime/contrib/example_npu/example_npu_runtime.cc: JSON runtime
  demonstrating NPU architectural concepts (memory hierarchy, tiling,
  execution engines, quantization) via CPU emulation
- cmake/modules/contrib/ExampleNPU.cmake: build integration via
  USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME flags
- docs/how_to/tutorials/byoc_npu_example.py: tutorial walking through
  the full BYOC flow from pattern registration to runtime execution
- tests/python/contrib/test_example_npu.py: test suite covering pattern
  registration, graph partitioning, codegen, and end-to-end execution

CI is enabled via tests/scripts/task_config_build_cpu.sh.

Addresses reviewer feedback from apache#18247: cmake integration, self-contained README with build instructions, tutorial in docs/how_to, and Context section reorganization.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces an example NPU backend for TVM's Relax framework, providing a JSON-based codegen, a C++ runtime that demonstrates architectural concepts such as memory hierarchy and tiling, and Relax pattern registrations for offloading operations. The review identified a potential division-by-zero bug in the quantization parameter calculation, suggested replacing placeholder logic in the depthwise convolution check with an actual attribute verification, and recommended moving global function lookups in the compiler outside of loops.

Comment threads:

  • src/runtime/contrib/example_npu/example_npu_runtime.cc
  • python/tvm/relax/backend/contrib/example_npu/patterns.py
  • src/relax/backend/contrib/example_npu/codegen.cc

Fix three issues identified in automated code review of apache#19425:

- Fix division-by-zero in CalculateQuantizationParams when all tensor
  values are identical (zero range); clamp scale floor to 1e-7f, guard
  against empty input, and use std::round for zero_point accuracy
- Implement actual groups attribute check in _check_depthwise instead
  of relying solely on placeholder constraints; demonstrates how to
  access op attributes from PatternCheckContext
- Move GetGlobalRequired lookup outside the compiler loop in codegen.cc
  so the registry hash-map is queried once rather than per-function
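A self-contained sketch of the kind of guarded calculation the first fix describes (the function name follows the commit message, but the body is an assumed illustration of asymmetric uint8 calibration, not the PR's actual code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantParams {
  float scale;
  int32_t zero_point;
};

// Asymmetric uint8 quantization parameters. Guards against an empty
// tensor and a zero range (all values identical), clamps the scale
// floor to 1e-7f so the zero-point division cannot divide by zero,
// and uses std::round instead of truncation for the zero point.
QuantParams CalculateQuantizationParams(const std::vector<float>& data) {
  if (data.empty()) return {1.0f, 0};  // guard: nothing to calibrate
  auto [mn_it, mx_it] = std::minmax_element(data.begin(), data.end());
  float mn = std::min(*mn_it, 0.0f);  // range must include zero
  float mx = std::max(*mx_it, 0.0f);
  float scale = std::max((mx - mn) / 255.0f, 1e-7f);  // clamped floor
  int32_t zp = static_cast<int32_t>(std::round(-mn / scale));
  zp = std::min(std::max(zp, 0), 255);  // keep zero point in uint8 range
  return {scale, zp};
}
```

Without the clamp, a constant input tensor yields `scale == 0` and the zero-point division faults; with it, calibration degrades gracefully to a tiny but valid scale.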
@Aristide021 marked this pull request as ready for review April 20, 2026 15:49
Fully qualify make_object as tvm::ffi::make_object to fix GCC build
failure on CI. Clang accepted the unqualified form as a C++20 extension
but GCC requires explicit namespace resolution.