[Backend][Relax] Add NPU BYOC backend example#19425
Open
Aristide021 wants to merge 3 commits into apache:main from
Conversation
Adds a vendor-neutral example NPU backend demonstrating the BYOC (Bring Your Own Codegen) pattern for custom accelerator integration in TVM's Relax framework. Components added:

- python/tvm/relax/backend/contrib/example_npu/: pattern registry with op support for matmul, conv1d/2d, depthwise conv2d, pooling, batch norm, softmax, activations, elementwise ops, quantization, and a fused conv2d+relu pattern
- src/relax/backend/contrib/example_npu/codegen.cc: JSON serializer registered as relax.ext.example_npu
- src/runtime/contrib/example_npu/example_npu_runtime.cc: JSON runtime demonstrating NPU architectural concepts (memory hierarchy, tiling, execution engines, quantization) via CPU emulation
- cmake/modules/contrib/ExampleNPU.cmake: build integration via USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME flags
- docs/how_to/tutorials/byoc_npu_example.py: tutorial walking through the full BYOC flow from pattern registration to runtime execution
- tests/python/contrib/test_example_npu.py: test suite covering pattern registration, graph partitioning, codegen, and end-to-end execution

CI is enabled via tests/scripts/task_config_build_cpu.sh. Addresses reviewer feedback from apache#18247: cmake integration, self-contained README with build instructions, tutorial in docs/how_to, and Context section reorganization.
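The pattern-registry component above can be pictured with a stdlib-only sketch. Everything here (`PATTERN_TABLE`, `register_pattern`, `partition`, the attribute names) is hypothetical and for illustration only; the actual backend registers patterns through TVM's Relax dataflow-pattern API, not through this code.

```python
# Illustrative BYOC-style pattern registry (hypothetical names, not TVM API).
# A composite pattern name maps to a predicate that decides whether a given
# op, with its attributes, may be offloaded to the example NPU backend.

PATTERN_TABLE = {}

def register_pattern(name):
    """Register a check function under a composite pattern name."""
    def wrap(check):
        PATTERN_TABLE[name] = check
        return check
    return wrap

@register_pattern("example_npu.conv2d")
def _check_conv2d(attrs):
    # Claim only standard (groups == 1) convolutions here.
    return attrs.get("groups", 1) == 1

@register_pattern("example_npu.depthwise_conv2d")
def _check_depthwise(attrs):
    # Depthwise convolution: groups equals the input channel count.
    return attrs.get("groups", 1) == attrs.get("in_channels", -1)

def partition(ops):
    """Return the names of (name, attrs) ops the example backend claims."""
    return [name for name, attrs in ops
            if name in PATTERN_TABLE and PATTERN_TABLE[name](attrs)]
```

In the real backend, the check functions receive a `PatternCheckContext` from Relax rather than a plain attribute dict, and partitioning is performed by `FuseOpsByPattern` over the module.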
Contributor
Code Review
This pull request introduces an example NPU backend for TVM's Relax framework, providing a JSON-based codegen, a C++ runtime that demonstrates architectural concepts like memory hierarchy and tiling, and Relax pattern registrations for offloading operations. The reviewer identified a potential division-by-zero bug in the quantization parameter calculation, suggested replacing placeholder logic in depthwise convolution checks with actual attribute verification, and recommended optimizing global function lookups in the compiler by moving them outside of loops.
…mple Fix three issues identified in automated code review of apache#19425:

- Fix division-by-zero in CalculateQuantizationParams when all tensor values are identical (zero range); clamp scale floor to 1e-7f, guard against empty input, and use std::round for zero_point accuracy
- Implement actual groups attribute check in _check_depthwise instead of relying solely on placeholder constraints; demonstrates how to access op attributes from PatternCheckContext
- Move GetGlobalRequired lookup outside the compiler loop in codegen.cc so the registry hash-map is queried once rather than per-function
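The division-by-zero fix described in the first bullet amounts to clamping the scale floor before it is used as a divisor. The actual implementation is C++ in example_npu_runtime.cc; this is a Python sketch of the same arithmetic, with the function name and signature assumed for illustration:

```python
def calculate_quantization_params(values, qmin=0, qmax=255, eps=1e-7):
    """Affine (asymmetric) uint8 quantization params with a clamped scale.

    When all input values are identical the value range is zero; clamping
    the scale floor to `eps` (mirroring the 1e-7f clamp in the C++ fix)
    avoids dividing by zero when computing the zero point.
    """
    if not values:                       # guard against empty input
        return 1.0, 0
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # representable range must include 0
    scale = max((hi - lo) / (qmax - qmin), eps)
    zero_point = int(round(qmin - lo / scale))  # round, don't truncate
    return scale, max(qmin, min(qmax, zero_point))
```

Without the `eps` clamp, an all-constant tensor (e.g. all zeros) would make `scale` zero and the `lo / scale` division would fault, which is exactly the case the commit fixes.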
Fully qualify make_object as tvm::ffi::make_object to fix GCC build failure on CI. Clang accepted the unqualified form as a C++20 extension but GCC requires explicit namespace resolution.
Supersedes #18247. Per maintainer guidance, resubmitting as a fresh PR due to CI workflow changes affecting old PRs.
Summary
This PR adds an example NPU BYOC backend for Relax, including end-to-end integration points:
- python/tvm/relax/backend/contrib/example_npu/patterns.py
- python/tvm/relax/backend/contrib/example_npu/__init__.py
- src/relax/backend/contrib/example_npu/codegen.cc
- src/runtime/contrib/example_npu/example_npu_runtime.cc
- cmake/modules/contrib/ExampleNPU.cmake, CMakeLists.txt, cmake/modules/LibInfo.cmake, src/support/libinfo.cc
- docs/how_to/tutorials/byoc_npu_example.py, README under contrib path
- tests/python/contrib/test_example_npu.py
- tests/scripts/task_config_build_cpu.sh

Review feedback addressed from #18247
- tests/python/contrib/
- USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME
- MatmulReLU (no import from test module)
- docs/how_to/tutorials/ (not only README)
- example_npu.softmax with tests

Validation
Local checks run:
- pre-commit on touched files (pass)
- PYTHONPATH=python python -m pytest -q tests/python/contrib/test_example_npu.py (pass)

Notes
This backend is an example/tutorial implementation (CPU-emulated) intended to document modern NPU-oriented BYOC integration patterns and provide a reference path for future hardware-specific backends.
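One of the architectural concepts the CPU-emulated runtime demonstrates is tiled execution. A stdlib-only sketch of the idea (illustrative only; the actual runtime is the C++ JSON runtime in src/runtime/contrib/example_npu/, and `tiled_matmul` here is a hypothetical name):

```python
def tiled_matmul(a, b, tile=2):
    """Emulate NPU-style tiled matmul on nested lists.

    Each (tile x tile) block of the output is accumulated from tile-sized
    slices of A and B, mimicking an NPU staging operands through a small
    on-chip buffer instead of streaming whole matrices.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # iterate output row tiles
        for j0 in range(0, m, tile):        # iterate output column tiles
            for k0 in range(0, k, tile):    # accumulate over K tiles
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            c[i][j] += a[i][kk] * b[kk][j]
    return c
```

The result is identical to an untiled matmul for any tile size; only the traversal order (and, on real hardware, the memory traffic) changes.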