Beta Support for XDNA2 Platform #179

Open
Victor-Jung wants to merge 27 commits into pulp-platform:devel from Victor-Jung:pr/xdna-support

Conversation

@Victor-Jung
Member

@Victor-Jung Victor-Jung commented Mar 24, 2026

This PR adds beta support for the XDNA2 platform. It is also the first attempt at adding a new MLIR backend to Deeploy.

Currently, it passes a single-Add-node ONNX graph. On this platform, untiled execution is not possible, as the AIE cores cannot read outside their local L1 memory; the XDNA platform is therefore inherently tiled.

The CI currently uses self-hosted runners to run directly on the XDNA2 NPU of a Minisforum X1 Pro.

Added

  • A Dockerfile specifically for the XDNA platform. It only requires the mlir-aie and llvm-aie packages, installable with pip. You have to run this container with specific arguments to forward the NPU device and driver libraries; the procedure is described in README_XDNA.md.
  • A new target library for XDNA2. It uses the LLVM compiler for AIE (named Peano) to compile the Add kernel.

Changed

Fixed

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

@Victor-Jung Victor-Jung added this to the Release 0.2.2 milestone Mar 24, 2026
@Victor-Jung Victor-Jung self-assigned this Mar 24, 2026
@Victor-Jung Victor-Jung added the Feature Addition of new features label Mar 24, 2026
@Victor-Jung Victor-Jung mentioned this pull request Apr 7, 2026
5 tasks
@Victor-Jung Victor-Jung changed the title from "DRAFT: Beta Support for XDNA2 Platform" to "Beta Support for XDNA2 Platform" Apr 9, 2026
@Victor-Jung Victor-Jung marked this pull request as ready for review April 9, 2026 11:25
@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added XDNA2 NPU target with MLIR-based deployer, tiling/engine/platform support, BF16 Add kernel and runtime codegen, toolchain integration, and a local Docker image for XDNA2 development.
  • Tests / CI

    • Integrated XDNA2 into test tooling and generation scripts, added host testbench and pytest marker, and added a CI workflow for NPU-enabled runs.
  • Documentation

    • Added XDNA2 setup and usage guide (README_XDNA.md).

Walkthrough

Adds complete XDNA2 (AIE2p) platform support: MLIR data types and passes, node templates/bindings, XDNA2 deployer and platform integration, BF16 Add AIE kernel and CMake wiring, Docker and pinned Python deps, test generation and host test harness, pytest integration, and CI reusable workflow for NPU runner.

Changes

XDNA2 Platform & MLIR Codegen

Layer / File(s) Summary
Data Shape / Types
Deeploy/MLIRDataTypes.py
Introduce MLIRExecutionBlock, MLIRCodeTransformationPass, MLIRCodeTransformation, MLIRNodeTemplate.
Device Passes / Core Impl
Deeploy/Targets/XDNA2/CodeTransformationPasses/...
Add MLIRObjectFifoPass, MLIRComputeCorePass, MLIRRuntimeSequencePass and package __init__.py implementing FIFO/kernel declarations, compute-core region (acquire/release), and DMA runtime sequencing.
Node Templates & Bindings
Deeploy/Targets/XDNA2/Templates/AddTemplate.py, Deeploy/Targets/XDNA2/Bindings.py, Deeploy/Targets/XDNA2/Tiler.py
Add BF16 Add template (emits func_d.call), XDNA2 bindings wiring checker→template→transformation, and tiling-ready bindings.
Platform / Deployer / Type System
Deeploy/Targets/XDNA2/Platform.py, Deeploy/Targets/XDNA2/Deployer.py, Deeploy/Targets/XDNA2/TypeCheckers.py, Deeploy/Targets/XDNA2/Parsers.py
Add XDNA2Platform, memory-aware wrappers, XDNA2Deployer.generateMLIR() (device + runtime_sequence phases), and XDNA2FloatAddChecker.
Kernel Sources & CMake Wiring
TargetLibraries/XDNA2/..., CMakeLists.txt, DeeployTest/Platforms/XDNA2/CMakeLists.txt
Add AIE kernel kernels/add.cc, per-kernel cross-compile custom commands, xdna2_kernels target, top-level CMake branch for XDNA2 and testbench build steps producing xclbin and instruction binary.
Container / Build Environment
Container/Dockerfile.deeploy-xdna, requirements-xdna.txt, README_XDNA.md
Add Dockerfile for Ubuntu 24.04 with XRT/MLIR-AIE/LLVM-AIE deps, pinned Python packages, and README documenting local/CI usage and self-hosted runner setup.
Test Generation & Host Harness
DeeployTest/generateNetwork_xdna2.py, DeeployTest/Platforms/XDNA2/main.cpp
Generator emits BF16 test inputs/outputs and network.mlir; host XRT test executable loads xclbin/instructions, runs kernel, and validates BF16 outputs.
Test Integration & Utilities
DeeployTest/test_platforms.py, DeeployTest/test_xdna2_config.py, DeeployTest/conftest.py, DeeployTest/deeployRunner_xdna2.py
Register pytest xdna2 marker, add kernel test entries, XDNA2 runner wrapper and test config.
Test Runner / CLI Mapping
DeeployTest/testUtils/*.py, DeeployTest/testUtils/core/execution.py
Add --l3 tiling arg, tiling_arguments flag for parsers, map xdna2 → XDNA2, lazy XDNA2 deployer wiring, choose generateNetwork_xdna2.py for platform, and propagate -v to host binary when verbose.
CI Workflows
.github/workflows/_runner-xdna2.yml, .github/workflows/ci-platform-xdna2.yml
Add reusable workflow for running tests on xdna2-npu runner (Docker run with device passthrough) and CI workflow delegating to it.
Repo Hygiene
.gitignore
Ignore CLAUDE.md and Container/xrt-debs/.

Sequence Diagram(s)

sequenceDiagram
    participant Deployer as XDNA2Deployer.generateMLIR
    participant DevicePass as MLIRObjectFifoPass/MLIRComputeCorePass
    participant RuntimePass as MLIRRuntimeSequencePass
    participant Kernel as AIE Kernel (eltwise_add)
    participant Host as XRT Host (main.cpp)

    Deployer->>DevicePass: create device blocks (FIFOs, kernel decl, compute core)
    DevicePass-->>Deployer: embed device regions in MLIR
    Deployer->>RuntimePass: create runtime_sequence (DMA tasks, seq args)
    RuntimePass-->>Deployer: embed runtime sections in MLIR
    Host->>Kernel: load xclbin & instructions, start kernel with buffers
    Kernel-->>Host: finish and signal
    Host->>Deployer: validate outputs against generated golden data

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • runwangdl
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 29.31%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'Beta Support for XDNA2 Platform' directly and clearly summarizes the main change: adding support for the XDNA2 platform. It is concise, specific, and accurately reflects the primary objective of the changeset.
  • Description check: ✅ Passed. The description is comprehensive and directly related to the changeset. It explains what XDNA2 is, why tiling is necessary, what was added (Dockerfile, target library), and CI setup details. It also includes a merge checklist.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (16)
DeeployTest/testUtils/testRunner.py (1)

95-114: Validate tiling CLI inputs at parse-time.

--l1/--l3 currently accept non-positive values, and --defaultMemLevel accepts arbitrary strings. Adding argument-level validation avoids late failures in generation passes.

Suggested refactor
+def _positive_int(value: str) -> int:
+    ivalue = int(value)
+    if ivalue <= 0:
+        raise argparse.ArgumentTypeError("Value must be a positive integer")
+    return ivalue
+
 class TestGeneratorArgumentParser(argparse.ArgumentParser):
@@
             self.add_argument('--l1',
                               metavar = '<size>',
                               dest = 'l1',
-                              type = int,
+                              type = _positive_int,
                               default = None,
                               help = 'Set L1 memory size in bytes (enables tiling if specified).\n')
             self.add_argument('--l3',
                               metavar = '<size>',
                               dest = 'l3',
-                              type = int,
+                              type = _positive_int,
                               default = None,
                               help = 'Set L3 memory size in bytes.\n')
             self.add_argument('--defaultMemLevel',
                               metavar = '<level>',
                               dest = 'defaultMemLevel',
                               type = str,
+                              choices = ["L1", "L2", "L3"],
                               default = "L3",
                               help = 'Set default memory level (default: L3)\n')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/testUtils/testRunner.py` around lines 95 - 114, The CLI accepts
invalid tiling inputs; validate them at argument-parse time by replacing the raw
int/str types with validation helpers: for '--l1' and '--l3' use a
positive-integer validator (e.g., positive_int) that raises
argparse.ArgumentTypeError for <=0 values, and for '--defaultMemLevel' restrict
values via choices (e.g., ['L1','L3']) or a validator that normalizes and
rejects unknown values; update the self.add_argument calls for l1, l3, and
defaultMemLevel in testRunner.py accordingly so parsing fails fast with clear
messages.
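To make the suggested refactor concrete, here is a minimal, self-contained sketch of parse-time validation with argparse; the parser below is a stand-in for TestGeneratorArgumentParser, and the option names mirror the ones in the diff:

```python
import argparse

def positive_int(value: str) -> int:
    """argparse type callable: accept only strictly positive integers."""
    ivalue = int(value)
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} must be a positive integer")
    return ivalue

parser = argparse.ArgumentParser()
# type=positive_int rejects 0 and negatives at parse time.
parser.add_argument('--l1', type=positive_int, default=None)
# choices restricts --defaultMemLevel to known memory levels.
parser.add_argument('--defaultMemLevel', choices=['L1', 'L2', 'L3'], default='L3')

args = parser.parse_args(['--l1', '65536'])
assert args.l1 == 65536 and args.defaultMemLevel == 'L3'

# Invalid values now fail immediately with a clear usage message
# (argparse turns ArgumentTypeError into a SystemExit):
try:
    parser.parse_args(['--l1', '-8'])
except SystemExit:
    pass
```

This fails fast at the CLI boundary instead of deep inside a generation pass.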
DeeployTest/test_platforms.py (1)

128-150: Consider updating the markers summary to include xdna2.

The markers summary comment documents available platform markers but doesn't include the newly added xdna2 marker.

Suggested addition
 #   snitch: tests from the Snitch platform (untiled)
 #   snitch_tiled: tests from the Snitch platform (tiled)
 #   siracusa: tests from the Siracusa platform (untiled)
 #   siracusa_tiled: tests from the Siracusa platform (tiled)
 #   siracusa_neureka_tiled: tests from the Siracusa + Neureka platform (tiled)
 #   gap9: tests from the GAP9 platform (untiled)
 #   gap9_tiled: tests from the GAP9 platform (tiled)
+#   xdna2: tests from the XDNA2 platform
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/test_platforms.py` around lines 128 - 150, Update the markers
summary comment in the Platform markers section of DeeployTest/test_platforms.py
to include the new platform marker `xdna2`; specifically add `xdna2: tests from
the XDNA2 platform` (or a similarly descriptive phrase) alongside the existing
platform marker lines so the comment accurately documents the new `xdna2`
marker.
CMakeLists.txt (1)

23-24: Add "XDNA2" to the platform property strings list.

The XDNA2 platform is handled in the build logic (lines 49-50 and 304-317), but it's not included in the set_property(CACHE platform PROPERTY STRINGS ...) list. This affects CMake GUI/ccmake users who rely on the dropdown for valid platform options.

Proposed fix
 set(platform MemPool CACHE STRING "Platform (MemPool, SoftHier, QEMU, Siracusa, Siracusa_w_neureka, PULP-Open, GAP9, Generic, Snitch)")
-set_property(CACHE platform PROPERTY STRINGS MemPool SoftHier QEMU Siracusa Siracusa_w_neureka PULP-Open GAP9 Generic Snitch)
+set_property(CACHE platform PROPERTY STRINGS MemPool SoftHier QEMU Siracusa Siracusa_w_neureka PULP-Open GAP9 Generic Snitch XDNA2)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CMakeLists.txt` around lines 23 - 24, The CMake cache property list for the
platform option is missing "XDNA2", causing GUI/ccmake users to not see it as a
valid choice; update the set_property(CACHE platform PROPERTY STRINGS ...) call
to include "XDNA2" (and keep it consistent with the initial set(platform ...)
default if you want it selectable there too) so the platform dropdown contains
XDNA2 alongside MemPool, SoftHier, QEMU, Siracusa, Siracusa_w_neureka,
PULP-Open, GAP9, Generic, and Snitch.
Deeploy/Targets/XDNA2/CodeTransformationPasses/__init__.py (1)

5-7: Consider explicit imports for better IDE support and static analysis.

While wildcard imports in __init__.py are a common re-export pattern, explicit imports with __all__ provide better discoverability and make the public API explicit. This is a minor maintainability suggestion.

Optional: Explicit imports alternative
-from Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRComputeCorePass import *
-from Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRObjectFifoPass import *
-from Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRRuntimeSequencePass import *
+from Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRComputeCorePass import MLIRComputeCorePass
+from Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRObjectFifoPass import MLIRObjectFifoPass
+from Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRRuntimeSequencePass import MLIRRuntimeSequencePass
+
+__all__ = ['MLIRComputeCorePass', 'MLIRObjectFifoPass', 'MLIRRuntimeSequencePass']
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/__init__.py` around lines 5 -
7, Replace the three wildcard imports in __init__.py with explicit imports of
the public symbols exported by MLIRComputeCorePass, MLIRObjectFifoPass, and
MLIRRuntimeSequencePass (e.g., import the specific classes/functions you intend
to re-export) and then define an explicit __all__ list containing those symbol
names; update the import lines referencing
Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRComputeCorePass,
MLIRObjectFifoPass, and MLIRRuntimeSequencePass to import only the public
symbols and set __all__ = [...names...] so IDEs and static analyzers can
discover the package API.
DeeployTest/testUtils/platformMapping.py (1)

286-287: Chain the exception for better debugging.

When re-raising an exception, use `raise ... from` to preserve the original traceback. This helps with debugging when the import fails for unexpected reasons.

🔧 Proposed fix
-        except ImportError:
-            raise RuntimeError(f"Deployer for platform {platform} is not implemented")
+        except ImportError as err:
+            raise RuntimeError(f"Deployer for platform {platform} is not implemented") from err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/testUtils/platformMapping.py` around lines 286 - 287, The except
ImportError handler in the function that loads deployers currently raises
RuntimeError(f"Deployer for platform {platform} is not implemented") without
chaining the original ImportError; modify the except block to capture the
ImportError (e.g., except ImportError as e:) and re-raise the RuntimeError using
exception chaining (raise RuntimeError(f"Deployer for platform {platform} is not
implemented") from e) so the original traceback is preserved for debugging.
Container/Dockerfile.deeploy-xdna (2)

14-35: Consider adding --no-install-recommends to reduce image size.

Adding --no-install-recommends can significantly reduce the Docker image size by avoiding installation of suggested but non-essential packages.

📦 Proposed fix
 RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
     software-properties-common \
     && add-apt-repository -y ppa:amd-team/xrt \
-    && apt-get update && apt-get install -y \
+    && apt-get update && apt-get install -y --no-install-recommends \
     cmake \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Container/Dockerfile.deeploy-xdna` around lines 14 - 35, Update the RUN layer
that invokes apt-get to add --no-install-recommends to both apt-get install -y
invocations in the multi-step RUN command (the line starting with "RUN apt-get
update && apt-get install -y \" and the subsequent "&& apt-get update && apt-get
install -y \" sequence) so that unnecessary recommended packages are not
installed and image size is reduced; keep the existing rm -rf
/var/lib/apt/lists/* cleanup as-is.

12-12: Clarify the purpose of LLVM_INSTALL_DIR="nope".

This environment variable with a placeholder value may cause confusion. Consider adding a comment explaining why this is set or whether it's a workaround for a specific issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Container/Dockerfile.deeploy-xdna` at line 12, The ENV line setting
LLVM_INSTALL_DIR="nope" is ambiguous; update the Dockerfile to either set a
meaningful path or add an inline comment explaining why "nope" is used (e.g., as
a deliberate sentinel to disable LLVM lookup or to work around a specific build
issue). Locate the ENV LLVM_INSTALL_DIR declaration and either replace "nope"
with the correct install path or append a short comment describing the reason
and any related workarounds or downstream expectations so future maintainers
aren’t confused.
README_XDNA.md (2)

21-21: Consider clarifying the temporary IRON dependency.

This line references a specific developer's path (/scratch/jungvi/IRON) and an internal workaround. Consider either removing this line for public documentation or adding context about when this workaround is needed vs. the standard flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README_XDNA.md` at line 21, Update the README_XDNA.md sentence that embeds a
personal developer path by either removing the specific `/scratch/jungvi/IRON`
example or replacing it with a generic, documented workaround: describe that
mounting an IRON repo is a temporary step only required until Deeploy
midend/backend support MLIR, show a generic volume example like `-v
/path/to/IRON:/opt/IRON`, and explain the corresponding env var
`IRON_OPERATORS_DIR=/opt/IRON/iron/operators` and when users should use it
versus the standard flow; ensure the text clearly labels it as a temporary,
developer-only workaround.

6-8: Add language specifiers to fenced code blocks.

The code blocks are missing language specifiers, which affects syntax highlighting and accessibility. Add bash or shell to the command blocks.

📝 Proposed fix
-```
+```bash
 docker build -f Container/Dockerfile.deeploy-xdna -t deeploy-xdna:local .
 ```

Apply similar changes to other code blocks at lines 11, 24, 38, and 44.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README_XDNA.md` around lines 6 - 8, Add language specifiers (e.g., bash or
shell) to the fenced code blocks that contain shell commands so they get proper
syntax highlighting; for example, change the block around the docker build
command (`docker build -f Container/Dockerfile.deeploy-xdna -t
deeploy-xdna:local .`) to start with ```bash and similarly update the other
command/code blocks referenced (lines with commands at the other snippets) to
use ```bash or ```shell so all shell snippets in README_XDNA.md are consistently
labeled.
DeeployTest/generateNetwork_xdna2.py (2)

191-196: Hardcoded Add operation limits reusability.

The golden output computation hardcodes elementwise addition, which will break for other operations. The TODO comment acknowledges this should be moved to ONNX4Deeploy.

Consider either:

  1. Running ONNX inference with BF16-quantized inputs to compute operation-agnostic golden outputs
  2. Extracting the operation type from the graph and dispatching to the appropriate NumPy function

Would you like me to help design a more generic golden output computation approach?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/generateNetwork_xdna2.py` around lines 191 - 196, The
golden-output computation currently hardcodes elementwise addition using
bf16_inputs -> bf16_input_f32 -> golden_f32 and will fail for non-add ops;
replace this with an operation-agnostic approach: either run the ONNX graph
(using the BF16-quantized inputs bf16_inputs) to produce test_outputs_bf16, or
inspect the model graph to extract the root operator(s) and dispatch to the
corresponding NumPy routine (e.g., use op_type from nodes and call the
appropriate handler) so that test_outputs_bf16 is produced generically instead
of summing elements.
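For context on what "BF16-quantized inputs" means here, this is a minimal dependency-free sketch of bfloat16 round-tripping for golden-output computation; `f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical helpers (not the generator's actual code), implementing round-to-nearest-even truncation of the low 16 mantissa bits:

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Round a float32 to the nearest bfloat16 bit pattern (round-to-nearest-even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 keeps the top 16 bits of the float32 encoding; the bias
    # implements round-to-nearest, ties-to-even on the dropped low bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(bits: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by zero-filling low bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value

# Golden output for an elementwise Add, computed after quantizing the inputs,
# mirroring the bf16_inputs -> f32 -> golden flow described above:
a = [bf16_bits_to_f32(f32_to_bf16_bits(v)) for v in (0.1, 1.5, -2.25)]
b = [bf16_bits_to_f32(f32_to_bf16_bits(v)) for v in (0.2, 0.5, 2.25)]
golden_bf16 = [f32_to_bf16_bits(x + y) for x, y in zip(a, b)]
```

An op-agnostic generator would swap the `x + y` step for a dispatch on the graph's op_type, or run ONNX inference on the quantized inputs.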

129-137: Fix unused loop variable and add strict=True for safety.

The name variable is unused, and zip() without strict= can silently produce incorrect results if the iterables have different lengths.

🔧 Proposed fix
-    for index, (name, values) in enumerate(zip(inputs_npz.files, test_inputs_f32)):
+    for index, (_name, values) in enumerate(zip(inputs_npz.files, test_inputs_f32, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/generateNetwork_xdna2.py` around lines 129 - 137, In the loop
that currently reads "for index, (name, values) in
enumerate(zip(inputs_npz.files, test_inputs_f32))", mark the unused filename
variable as _ and make zip strict to avoid length-mismatch bugs: change the
header to "for index, (_, values) in enumerate(zip(inputs_npz.files,
test_inputs_f32, strict=True))" so you still populate inputTypes/inputOffsets
with PointerClass(bfloat16_t) as before; no other logic changes needed.
Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py (1)

47-47: Use next(iter(...)) instead of list(...)[0].

Per static analysis suggestion, this is more efficient as it avoids creating an intermediate list.

♻️ Proposed fix
-                firstOutputName = list(outputConstraints.keys())[0]
+                firstOutputName = next(iter(outputConstraints.keys()))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py` at line
47, Replace the inefficient list(...) indexing when obtaining the first key from
outputConstraints: locate the assignment to firstOutputName in
MLIRObjectFifoPass.py and use next(iter(outputConstraints)) (or
next(iter(outputConstraints.keys()))) instead of
list(outputConstraints.keys())[0] so you avoid creating an intermediate list;
update the code where firstOutputName is set accordingly.
Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRComputeCorePass.py (1)

68-88: Assumes uniform tile type for all input/output tensors.

Line 70 uses firstKey's type as tileTy, and line 74 creates a single subviewTy used for all FIFO acquisitions. This works for the current Add operation where all tensors have identical shapes and types, but will need generalization for operators with mixed tensor types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRComputeCorePass.py` around
lines 68 - 88, The code assumes a single representative tile type (tileTy
derived from firstKey) and reuses one subviewTy for all FIFO acquisitions, which
breaks when tensors have different types; update the loop so you look up the
per-key memref type via mlirBlock.fifoTypes[key] and create a matching subview
type for each acquisition (i.e., compute subviewTyPerKey from
mlirBlock.fifoTypes[key] before calling
aie_d.objectfifo_acquire/objectfifo_subview_access), handling inputs
(self.inputTensorKeys) and outputs (self.outputTensorKeys) independently rather
than reusing tileTy/subviewTy for every key.
Deeploy/Targets/XDNA2/Platform.py (1)

126-138: Duplicated getTargetMemoryLevel logic.

Both MemoryXDNA2Platform and MemoryXDNA2PlatformWrapper implement identical logic for checking _engine_assignment. Consider extracting to a shared helper function to maintain consistency.

♻️ Proposed refactor
def _get_xdna2_target_memory_level(node: gs.Node, default_level_name: str) -> str:
    """Shared logic for XDNA2 memory level selection."""
    if hasattr(node, '_engine_assignment'):
        engine = node._engine_assignment
        if isinstance(engine, XDNA2AIECoreEngine) and hasattr(engine, 'preferredMemoryLevel'):
            return engine.preferredMemoryLevel
    return default_level_name

Then both classes can call this helper in their getTargetMemoryLevel methods.

Also applies to: 150-157

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Deeploy/Targets/XDNA2/Platform.py` around lines 126 - 138, Extract the
duplicated engine-assignment logic from both
MemoryXDNA2Platform.getTargetMemoryLevel and
MemoryXDNA2PlatformWrapper.getTargetMemoryLevel into a shared helper (e.g.,
_get_xdna2_target_memory_level) that accepts the node and the default level
name; have each getTargetMemoryLevel call this helper instead of repeating the
hasattr/instanceof checks against XDNA2AIECoreEngine and preferredMemoryLevel,
returning engine.preferredMemoryLevel when present or
defaultTargetMemoryLevel.name otherwise.
Deeploy/Targets/XDNA2/Deployer.py (1)

159-171: Tensor type derived from first node only.

Line 163 derives tensorTy from firstEb.numElements and hardcodes BF16Type. This assumes all runtime sequence arguments have the same element count and type. For the current Add-only scope this is fine, but consider documenting this assumption or adding validation when extending to multi-node graphs with varying tensor sizes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Deeploy/Targets/XDNA2/Deployer.py` around lines 159 - 171, The code derives
tensorTy from the first block (mlirBlocks[0]) using firstEb.numElements and a
hardcoded ir.BF16Type, which assumes every runtime-sequence arg shares the same
element count and type; update the runtime-sequence setup in Deployer.py to
validate that all blocks in mlirBlocks have matching numElements (and, if
applicable, matching element types) before creating tensorTy (or document this
assumption clearly), and emit a clear error or fallback if a mismatch is
detected so aiex_d.runtime_sequence(_seq) is not created with inconsistent
tensor shapes.
DeeployTest/Platforms/XDNA2/CMakeLists.txt (1)

29-69: LGTM! Toolchain resolution is well-structured.

The fallback chain (cache → environment → Python query) for LLVM_AIE_INSTALL_DIR and MLIR_AIE_INSTALL_DIR provides good flexibility.

One minor inconsistency: XRT_INSTALL_DIR isn't declared as a cache variable like the others. Consider adding CACHE PATH for consistency:

-if(NOT XRT_INSTALL_DIR)
+set(XRT_INSTALL_DIR "" CACHE PATH "XRT install dir")
+if(NOT XRT_INSTALL_DIR)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/Platforms/XDNA2/CMakeLists.txt` around lines 29 - 69, The
XRT_INSTALL_DIR variable should be declared as a cache variable like
LLVM_AIE_INSTALL_DIR and MLIR_AIE_INSTALL_DIR for consistent configuration and
overrides; change the declaration so XRT_INSTALL_DIR is initialized with
set(XRT_INSTALL_DIR "$ENV{XRT_INSTALL_DIR}" CACHE PATH "XRT install dir") (or
similar) and preserve the existing fallback logic in the if(NOT XRT_INSTALL_DIR)
block that checks ENV{XILINX_XRT} and defaults to "/opt/xilinx/xrt", ensuring
the cache variable is used/checked instead of an undeclared plain variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Deeploy/MLIRDataTypes.py`:
- Around line 187-201: The generate method uses a mutable default
operatorRepresentation = {} which can lead to shared-state bugs; change the
signature of generate (the method named generate in MLIRDataTypes.py) to accept
operatorRepresentation = None (or Optional[Dict]) and inside the method do a
guarded assignment like "if operatorRepresentation is None:
operatorRepresentation = {}" before using it, ensuring you preserve the existing
behavior and return value while avoiding the mutable default argument.
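The mutable-default bug described above can be reproduced in isolation; `generate_buggy` and `generate` are simplified stand-ins for the method in MLIRDataTypes.py:

```python
# Buggy version: the default dict is created once at function definition
# time, so the same instance is shared across all calls.
def generate_buggy(operatorRepresentation={}):
    operatorRepresentation.setdefault("calls", 0)
    operatorRepresentation["calls"] += 1
    return operatorRepresentation

assert generate_buggy()["calls"] == 1
assert generate_buggy()["calls"] == 2  # state leaked from the first call

# Fixed version: None sentinel, fresh dict per call.
def generate(operatorRepresentation=None):
    if operatorRepresentation is None:
        operatorRepresentation = {}
    operatorRepresentation.setdefault("calls", 0)
    operatorRepresentation["calls"] += 1
    return operatorRepresentation

assert generate()["calls"] == 1
assert generate()["calls"] == 1  # independent calls stay independent
```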

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRRuntimeSequencePass.py`:
- Around line 85-89: The current sequence awaits outputTasks but never frees
them and frees inputTasks before ensuring completion; change the order so both
inputTasks and outputTasks are awaited before calling aiex_d.dma_free_task on
them. Specifically, use aiex_d.dma_await_task(...) for inputTasks as well (or
otherwise ensure their completion), then call aiex_d.dma_free_task(...) on each
input task, and after awaiting each output task call aiex_d.dma_free_task(...)
on each output task; update the block handling outputTasks/inputTasks (and
consider the origin of issue_token) so freeing only happens after dma_await_task
confirms completion.

In `@DeeployTest/Platforms/XDNA2/main.cpp`:
- Around line 79-112: Wrap the main logic of main (the code that calls
read_instr_binary(), xrt::device, xrt::xclbin, device.register_xclbin,
xrt::hw_context and xrt::kernel) in a try-catch block that catches
std::exception (and optionally a catch-all) and prints a clear error message
including ex.what() to stderr (or via your logging mechanism) and returns a
non-zero exit code; ensure any thrown exceptions from read_instr_binary() or XRT
constructors are handled so the program exits gracefully with an informative
message.
- Around line 65-77: In read_instr_binary, validate the file size and read
result: use std::streampos/istream::pos_type for tellg, check for error (e.g.,
tellg()==-1) and throw if so, verify byte_size is a multiple of sizeof(uint32_t)
and throw a descriptive std::runtime_error if not aligned, and after file.read
confirm gcount/streams state to ensure the full byte_size was read and throw on
short reads or IO errors; update uses of byte_size to a safely cast size_t only
after validation.

In `@DeeployTest/testUtils/core/execution.py`:
- Around line 171-173: The current code appends "-v" whenever config.verbose >=
1 which breaks the Generic/host binary that doesn't accept flags; change the
condition to only add "-v" when the run is for the XDNA2 binary (e.g. check the
simulator/platform identifier used in this module — e.g. simulator == "XDNA2" or
platform == "XDNA2") so replace the existing guard with something like: only
append "-v" when config.verbose >= 1 AND the simulator/platform equals the XDNA2
identifier (leave the cmd and config.verbose symbols intact and reference
XDNA2/main.cpp as the binary that accepts -v).

In `@requirements-xdna.txt`:
- Around line 6-10: requirements-xdna.txt currently lists llvm-aie without a
version, causing nondeterministic installs from the nightly index; update the
dependency by replacing the unpinned "llvm-aie" entry with a specific pinned
version compatible with mlir_aie (for example
"llvm-aie==18.0.0.2024050100+59d494d3" or a released tag you verified), so the
file contains the exact version string for llvm-aie alongside mlir_aie==v1.2.1
to ensure reproducible, compatible builds.

In `@TargetLibraries/XDNA2/kernels/add.cc`:
- Around line 20-40: The vector loop in eltwise_vadd uses F = size / vec_factor
and drops trailing elements when size isn't a multiple of vec_factor; add a
scalar tail loop after the existing vector loop that processes indices from F *
vec_factor up to size, performing element-wise addition using the same types
(T_in/T_out) and either the pointer cursors (pA1/pB1/pC1) or direct indexing
(a[i], b[i], c[i]) so all remaining elements are handled.
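The vector-plus-tail pattern the comment asks for can be sketched in Python (the real kernel is C++ with AIE vector intrinsics; `vec_factor` and the names below are illustrative):

```python
def eltwise_vadd(a, b, vec_factor=16):
    """Element-wise add: vectorized main loop plus a scalar tail loop."""
    size = len(a)
    c = [0] * size
    full = (size // vec_factor) * vec_factor
    # Main loop: whole vec_factor-sized chunks (stands in for SIMD lanes).
    for i in range(0, full, vec_factor):
        for j in range(i, i + vec_factor):
            c[j] = a[j] + b[j]
    # Scalar tail: the remaining size % vec_factor elements, previously dropped.
    for i in range(full, size):
        c[i] = a[i] + b[i]
    return c
```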

---

Nitpick comments:
In `@CMakeLists.txt`:
- Around line 23-24: The CMake cache property list for the platform option is
missing "XDNA2", causing GUI/ccmake users to not see it as a valid choice;
update the set_property(CACHE platform PROPERTY STRINGS ...) call to include
"XDNA2" (and keep it consistent with the initial set(platform ...) default if
you want it selectable there too) so the platform dropdown contains XDNA2
alongside MemPool, SoftHier, QEMU, Siracusa, Siracusa_w_neureka, PULP-Open,
GAP9, Generic, and Snitch.

In `@Container/Dockerfile.deeploy-xdna`:
- Around line 14-35: Update the RUN layer that invokes apt-get to add
--no-install-recommends to both apt-get install -y invocations in the multi-step
RUN command (the line starting with "RUN apt-get update && apt-get install -y \"
and the subsequent "&& apt-get update && apt-get install -y \" sequence) so that
unnecessary recommended packages are not installed and image size is reduced;
keep the existing rm -rf /var/lib/apt/lists/* cleanup as-is.
- Line 12: The ENV line setting LLVM_INSTALL_DIR="nope" is ambiguous; update the
Dockerfile to either set a meaningful path or add an inline comment explaining
why "nope" is used (e.g., as a deliberate sentinel to disable LLVM lookup or to
work around a specific build issue). Locate the ENV LLVM_INSTALL_DIR declaration
and either replace "nope" with the correct install path or append a short
comment describing the reason and any related workarounds or downstream
expectations so future maintainers aren’t confused.

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/__init__.py`:
- Around line 5-7: Replace the three wildcard imports in __init__.py with
explicit imports of the public symbols exported by MLIRComputeCorePass,
MLIRObjectFifoPass, and MLIRRuntimeSequencePass (e.g., import the specific
classes/functions you intend to re-export) and then define an explicit __all__
list containing those symbol names; update the import lines referencing
Deeploy.Targets.XDNA2.CodeTransformationPasses.MLIRComputeCorePass,
MLIRObjectFifoPass, and MLIRRuntimeSequencePass to import only the public
symbols and set __all__ = [...names...] so IDEs and static analyzers can
discover the package API.
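The effect of an explicit `__all__` can be demonstrated with a stand-in module (the module and class names below are illustrative, not the real Deeploy ones):

```python
import sys
import types

# Build a stand-in module to show how __all__ controls `from pkg import *`.
mod = types.ModuleType("fifo_passes")
exec(
    "class MLIRObjectFifoPass: pass\n"
    "class HelperNotForExport: pass\n"
    "__all__ = ['MLIRObjectFifoPass']\n",
    mod.__dict__,
)
sys.modules["fifo_passes"] = mod

ns = {}
exec("from fifo_passes import *", ns)
# Only the names listed in __all__ are re-exported by the wildcard import.
assert "MLIRObjectFifoPass" in ns
assert "HelperNotForExport" not in ns
```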

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRComputeCorePass.py`:
- Around line 68-88: The code assumes a single representative tile type (tileTy
derived from firstKey) and reuses one subviewTy for all FIFO acquisitions, which
breaks when tensors have different types; update the loop so you look up the
per-key memref type via mlirBlock.fifoTypes[key] and create a matching subview
type for each acquisition (i.e., compute subviewTyPerKey from
mlirBlock.fifoTypes[key] before calling
aie_d.objectfifo_acquire/objectfifo_subview_access), handling inputs
(self.inputTensorKeys) and outputs (self.outputTensorKeys) independently rather
than reusing tileTy/subviewTy for every key.

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py`:
- Line 47: Replace the inefficient list(...) indexing when obtaining the first
key from outputConstraints: locate the assignment to firstOutputName in
MLIRObjectFifoPass.py and use next(iter(outputConstraints)) (or
next(iter(outputConstraints.keys()))) instead of
list(outputConstraints.keys())[0] so you avoid creating an intermediate list;
update the code where firstOutputName is set accordingly.
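The difference is easy to verify with a toy dict standing in for `outputConstraints`:

```python
output_constraints = {"out0": (1, 64), "out1": (1, 64)}  # illustrative

# Builds a full list of keys just to read the first one:
first_via_list = list(output_constraints.keys())[0]
# Reads the first key straight from the dict's iterator, no intermediate list:
first_via_iter = next(iter(output_constraints))

assert first_via_list == first_via_iter == "out0"
```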

In `@Deeploy/Targets/XDNA2/Deployer.py`:
- Around line 159-171: The code derives tensorTy from the first block
(mlirBlocks[0]) using firstEb.numElements and a hardcoded ir.BF16Type, which
assumes every runtime-sequence arg shares the same element count and type;
update the runtime-sequence setup in Deployer.py to validate that all blocks in
mlirBlocks have matching numElements (and, if applicable, matching element
types) before creating tensorTy (or document this assumption clearly), and emit
a clear error or fallback if a mismatch is detected so
aiex_d.runtime_sequence(_seq) is not created with inconsistent tensor shapes.

In `@Deeploy/Targets/XDNA2/Platform.py`:
- Around line 126-138: Extract the duplicated engine-assignment logic from both
MemoryXDNA2Platform.getTargetMemoryLevel and
MemoryXDNA2PlatformWrapper.getTargetMemoryLevel into a shared helper (e.g.,
_get_xdna2_target_memory_level) that accepts the node and the default level
name; have each getTargetMemoryLevel call this helper instead of repeating the
hasattr/instanceof checks against XDNA2AIECoreEngine and preferredMemoryLevel,
returning engine.preferredMemoryLevel when present or
defaultTargetMemoryLevel.name otherwise.

In `@DeeployTest/generateNetwork_xdna2.py`:
- Around line 191-196: The golden-output computation currently hardcodes
elementwise addition using bf16_inputs -> bf16_input_f32 -> golden_f32 and will
fail for non-add ops; replace this with an operation-agnostic approach: either
run the ONNX graph (using the BF16-quantized inputs bf16_inputs) to produce
test_outputs_bf16, or inspect the model graph to extract the root operator(s)
and dispatch to the corresponding NumPy routine (e.g., use op_type from nodes
and call the appropriate handler) so that test_outputs_bf16 is produced
generically instead of summing elements.
- Around line 129-137: In the loop that currently reads "for index, (name,
values) in enumerate(zip(inputs_npz.files, test_inputs_f32))", mark the unused
filename variable as _ and make zip strict to avoid length-mismatch bugs: change
the header to "for index, (_, values) in enumerate(zip(inputs_npz.files,
test_inputs_f32, strict=True))" so you still populate inputTypes/inputOffsets
with PointerClass(bfloat16_t) as before; no other logic changes needed.

In `@DeeployTest/Platforms/XDNA2/CMakeLists.txt`:
- Around line 29-69: The XRT_INSTALL_DIR variable should be declared as a cache
variable like LLVM_AIE_INSTALL_DIR and MLIR_AIE_INSTALL_DIR for consistent
configuration and overrides; change the declaration so XRT_INSTALL_DIR is
initialized with set(XRT_INSTALL_DIR "$ENV{XRT_INSTALL_DIR}" CACHE PATH "XRT
install dir") (or similar) and preserve the existing fallback logic in the
if(NOT XRT_INSTALL_DIR) block that checks ENV{XILINX_XRT} and defaults to
"/opt/xilinx/xrt", ensuring the cache variable is used/checked instead of an
undeclared plain variable.

In `@DeeployTest/test_platforms.py`:
- Around line 128-150: Update the markers summary comment in the Platform
markers section of DeeployTest/test_platforms.py to include the new platform
marker `xdna2`; specifically add `xdna2: tests from the XDNA2 platform` (or a
similarly descriptive phrase) alongside the existing platform marker lines so
the comment accurately documents the new `xdna2` marker.

In `@DeeployTest/testUtils/platformMapping.py`:
- Around line 286-287: The except ImportError handler in the function that loads
deployers currently raises RuntimeError(f"Deployer for platform {platform} is
not implemented") without chaining the original ImportError; modify the except
block to capture the ImportError (e.g., except ImportError as e:) and re-raise
the RuntimeError using exception chaining (raise RuntimeError(f"Deployer for
platform {platform} is not implemented") from e) so the original traceback is
preserved for debugging.
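Exception chaining keeps the original ImportError reachable as `__cause__`, which is exactly what the comment asks for (the module name below is hypothetical):

```python
import importlib

def load_deployer(platform: str):
    try:
        # Hypothetical module name, standing in for the real deployer import.
        return importlib.import_module(f"deeploy_targets_{platform}")
    except ImportError as e:
        # "from e" chains the ImportError so its traceback survives.
        raise RuntimeError(
            f"Deployer for platform {platform} is not implemented") from e

try:
    load_deployer("XDNA2")
    chained = None
except RuntimeError as err:
    chained = err.__cause__
```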

In `@DeeployTest/testUtils/testRunner.py`:
- Around line 95-114: The CLI accepts invalid tiling inputs; validate them at
argument-parse time by replacing the raw int/str types with validation helpers:
for '--l1' and '--l3' use a positive-integer validator (e.g., positive_int) that
raises argparse.ArgumentTypeError for <=0 values, and for '--defaultMemLevel'
restrict values via choices (e.g., ['L1','L3']) or a validator that normalizes
and rejects unknown values; update the self.add_argument calls for l1, l3, and
defaultMemLevel in testRunner.py accordingly so parsing fails fast with clear
messages.
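A sketch of the fail-fast parsing (argument names taken from the review; defaults are illustrative):

```python
import argparse

def positive_int(value: str) -> int:
    ivalue = int(value)
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(f"{value} is not a positive integer")
    return ivalue

parser = argparse.ArgumentParser()
parser.add_argument("--l1", type=positive_int, default=64000)
parser.add_argument("--l3", type=positive_int, default=1024000)
parser.add_argument("--defaultMemLevel", choices=["L1", "L3"], default="L3")

args = parser.parse_args(["--l1", "1024", "--defaultMemLevel", "L1"])
```

Invalid values now fail at parse time: `--l1 0` triggers an ArgumentTypeError, which argparse reports before any tiling code runs, and an unknown `--defaultMemLevel` is rejected by `choices`.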

In `@README_XDNA.md`:
- Line 21: Update the README_XDNA.md sentence that embeds a personal developer
path by either removing the specific `/scratch/jungvi/IRON` example or replacing
it with a generic, documented workaround: describe that mounting an IRON repo is
a temporary step only required until Deeploy midend/backend support MLIR, show a
generic volume example like `-v /path/to/IRON:/opt/IRON`, and explain the
corresponding env var `IRON_OPERATORS_DIR=/opt/IRON/iron/operators` and when
users should use it versus the standard flow; ensure the text clearly labels it
as a temporary, developer-only workaround.
- Around line 6-8: Add language specifiers (e.g., bash or shell) to the fenced
code blocks that contain shell commands so they get proper syntax highlighting;
for example, change the block around the docker build command (`docker build -f
Container/Dockerfile.deeploy-xdna -t deeploy-xdna:local .`) to start with
```bash and similarly update the other command/code blocks referenced (lines
with commands at the other snippets) to use ```bash or ```shell so all shell
snippets in README_XDNA.md are consistently labeled.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e910d040-c1e1-4193-9705-285039a5ece9

📥 Commits

Reviewing files that changed from the base of the PR and between 09993fb and a82fd52.

📒 Files selected for processing (36)
  • .github/workflows/_runner-xdna2.yml
  • .github/workflows/ci-platform-xdna2.yml
  • .gitignore
  • CMakeLists.txt
  • Container/Dockerfile.deeploy-xdna
  • Deeploy/MLIRDataTypes.py
  • Deeploy/Targets/XDNA2/Bindings.py
  • Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRComputeCorePass.py
  • Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py
  • Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRRuntimeSequencePass.py
  • Deeploy/Targets/XDNA2/CodeTransformationPasses/__init__.py
  • Deeploy/Targets/XDNA2/Deployer.py
  • Deeploy/Targets/XDNA2/Parsers.py
  • Deeploy/Targets/XDNA2/Platform.py
  • Deeploy/Targets/XDNA2/Templates/AddTemplate.py
  • Deeploy/Targets/XDNA2/Templates/__init__.py
  • Deeploy/Targets/XDNA2/Tiler.py
  • Deeploy/Targets/XDNA2/TypeCheckers.py
  • DeeployTest/Platforms/XDNA2/CMakeLists.txt
  • DeeployTest/Platforms/XDNA2/main.cpp
  • DeeployTest/Tests/Kernels/BF16/Add/Regular/inputs.npz
  • DeeployTest/Tests/Kernels/BF16/Add/Regular/network.onnx
  • DeeployTest/Tests/Kernels/BF16/Add/Regular/outputs.npz
  • DeeployTest/conftest.py
  • DeeployTest/deeployRunner_xdna2.py
  • DeeployTest/generateNetwork_xdna2.py
  • DeeployTest/testUtils/core/execution.py
  • DeeployTest/testUtils/deeployRunner.py
  • DeeployTest/testUtils/platformMapping.py
  • DeeployTest/testUtils/testRunner.py
  • DeeployTest/test_platforms.py
  • DeeployTest/test_xdna2_config.py
  • README_XDNA.md
  • TargetLibraries/XDNA2/CMakeLists.txt
  • TargetLibraries/XDNA2/kernels/add.cc
  • requirements-xdna.txt

Comment thread Deeploy/MLIRDataTypes.py
Comment on lines +85 to +89
# Await output tasks, then free input tasks
for task in outputTasks:
aiex_d.dma_await_task(task)
for task in inputTasks:
aiex_d.dma_free_task(task)
Contributor

@coderabbitai coderabbitai Bot Apr 9, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

mlir-aie dma_await_task dma_free_task semantics

💡 Result:

In mlir-aie’s AIEX dialect:

  • aiex.dma_await_task(%task): Blocks the enclosing aie.runtime_sequence until the referenced, previously-started DMA task has completed. Waiting is only possible if the task was configured to emit a task completion token (TCT)—i.e., the task’s configuration/start must have issue_token = true. [1]

  • aiex.dma_free_task(%task): Does not wait or synchronize. It is a compiler-visible “lifetime end” marker that tells the static buffer-descriptor (BD) allocator pass that any BD IDs allocated for BDs inside %task may be reused after this point. (A future dynamic allocator could lower it to an actual runtime free-like instruction.) Practically, you typically place it after an aiex.dma_await_task (or otherwise after you can prove the DMA is done) if you intend to reuse BD IDs. [1]

Source(s):
[1] Xilinx/AMD MLIR-AIE docs – AIEX dialect ops (aiex.dma_await_task, aiex.dma_free_task). (xilinx.github.io)

🏁 Script executed:

cat -n Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRRuntimeSequencePass.py | sed -n '60,95p'

Repository: pulp-platform/Deeploy

Length of output: 1467


Output tasks must also be freed after awaiting, and input tasks should be awaited before freeing.

The code awaits output tasks but doesn't free them—a resource leak. Additionally, input tasks are freed without being awaited first, which violates MLIR-AIE semantics: dma_free_task should only be called after task completion is guaranteed. Since only output tasks have issue_token = true (line 71), they emit completion tokens and can be awaited. Both output and input tasks should be freed after their completion is verified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRRuntimeSequencePass.py`
around lines 85 - 89, The current sequence awaits outputTasks but never frees
them and frees inputTasks before ensuring completion; change the order so both
inputTasks and outputTasks are awaited before calling aiex_d.dma_free_task on
them. Specifically, use aiex_d.dma_await_task(...) for inputTasks as well (or
otherwise ensure their completion), then call aiex_d.dma_free_task(...) on each
input task, and after awaiting each output task call aiex_d.dma_free_task(...)
on each output task; update the block handling outputTasks/inputTasks (and
consider the origin of issue_token) so freeing only happens after dma_await_task
confirms completion.

Member

Please check; I am not familiar enough with AIE to judge this.

Contributor

Let me look up the exact MLIR-AIE semantics and find canonical usage examples to give a precise answer.

@Xeratec, here's my analysis after reviewing the official MLIR-AIE AIEX dialect documentation:


The original CodeRabbit comment raises two concerns, but they need to be evaluated separately:

Concern 1: Output tasks not freed after awaiting ✅ Valid

The current code awaits output tasks but never calls dma_free_task on them:

for task in outputTasks:
    aiex_d.dma_await_task(task)
# ← missing dma_free_task for outputTasks!
for task in inputTasks:
    aiex_d.dma_free_task(task)

dma_free_task informs the static BD allocator pass that BD IDs allocated to the task can be reused thereafter, and potential future implementations of dynamic BD allocators may lower this to a free instruction. Not freeing output tasks means their BD IDs are never signaled as reclaimable. In the current single-invocation beta this doesn't break anything, but it is incorrect practice.

Concern 2: Input tasks must be awaited before freeing ❌ Incorrect

The original comment claims input tasks should be awaited before calling dma_free_task. This is not accurate for two reasons:

  1. dma_await_task requires the task to issue a task completion token (TCT); tasks only emit these tokens if issue_token is set to true. Input tasks in this code are created without issue_token=True, so they cannot be awaited.
  2. dma_free_task is only a compiler-visible BD allocator hint — it does not perform or require runtime synchronization. Calling it after output tasks have been awaited (guaranteeing the compute core has finished consuming inputs) is semantically correct.

Suggested fix

Only concern 1 needs addressing — add dma_free_task for output tasks after awaiting them:

# Await output tasks and free them, then free input tasks
for task in outputTasks:
    aiex_d.dma_await_task(task)
    aiex_d.dma_free_task(task)   # <-- add this
for task in inputTasks:
    aiex_d.dma_free_task(task)   # correct as-is

Member Author

Done in 381b1e3

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +65 to +77
static std::vector<uint32_t> read_instr_binary(const std::string &path) {
std::ifstream file(path, std::ios::binary);
if (!file.is_open()) {
throw std::runtime_error("Cannot open instruction file: " + path);
}
file.seekg(0, std::ios::end);
size_t byte_size = file.tellg();
file.seekg(0, std::ios::beg);

std::vector<uint32_t> instr(byte_size / sizeof(uint32_t));
file.read(reinterpret_cast<char *>(instr.data()), byte_size);
return instr;
}
Contributor

@coderabbitai coderabbitai Bot Apr 9, 2026

⚠️ Potential issue | 🟡 Minor

Add validation for instruction file size alignment.

The code assumes byte_size is a multiple of sizeof(uint32_t). If the file has trailing bytes, they'll be silently dropped. Also, file.tellg() returns std::streampos which may not directly convert to size_t on all platforms.

🛡️ Proposed fix to add validation
 static std::vector<uint32_t> read_instr_binary(const std::string &path) {
   std::ifstream file(path, std::ios::binary);
   if (!file.is_open()) {
     throw std::runtime_error("Cannot open instruction file: " + path);
   }
   file.seekg(0, std::ios::end);
-  size_t byte_size = file.tellg();
+  std::streampos pos = file.tellg();
+  if (pos < 0) {
+    throw std::runtime_error("Cannot determine file size: " + path);
+  }
+  size_t byte_size = static_cast<size_t>(pos);
+  if (byte_size % sizeof(uint32_t) != 0) {
+    throw std::runtime_error("Instruction file size not aligned to 4 bytes: " + path);
+  }
   file.seekg(0, std::ios::beg);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/Platforms/XDNA2/main.cpp` around lines 65 - 77, In
read_instr_binary, validate the file size and read result: use
std::streampos/istream::pos_type for tellg, check for error (e.g., tellg()==-1)
and throw if so, verify byte_size is a multiple of sizeof(uint32_t) and throw a
descriptive std::runtime_error if not aligned, and after file.read confirm
gcount/streams state to ensure the full byte_size was read and throw on short
reads or IO errors; update uses of byte_size to a safely cast size_t only after
validation.

Member

@Victor-Jung, please check whether this comment is worth addressing, then resolve it.

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Member Author

@Xeratec The instruction binary is produced by aiecc.py --aie-generate-npu-insts and is a stream of 32-bit instructions, so it is naturally 4-byte aligned. As for the file.tellg() concern, I don't plan to support platforms other than Linux x86.

Comment on lines +79 to +112
int main(int argc, char **argv) {
// Paths to the compiled artefacts: default to the directory containing
// this binary so the test works regardless of the working directory or
// whether it is run inside a container.
std::string bin_dir;
{
std::string argv0(argv[0]);
auto sep = argv0.rfind('/');
bin_dir = (sep == std::string::npos) ? "." : argv0.substr(0, sep);
}
std::string xclbin_path = bin_dir + "/network.xclbin";
std::string instr_path = bin_dir + "/npu_insts.bin";

bool verbose = false;
for (int i = 1; i < argc; ++i) {
std::string arg = argv[i];
if (arg == "-v" || arg == "--verbose" || arg == "-vv") {
verbose = true;
}
}
if (argc >= 2 && argv[1][0] != '-')
xclbin_path = argv[1];
if (argc >= 3 && argv[2][0] != '-')
instr_path = argv[2];

// -----------------------------------------------------------------------
// 1. Open XRT device, register xclbin, create hw_context
// (matches mlir-aie test_utils::init_xrt_load_kernel pattern)
// -----------------------------------------------------------------------
auto device = xrt::device(0);
auto xclbin = xrt::xclbin(xclbin_path);
device.register_xclbin(xclbin);
xrt::hw_context context(device, xclbin.get_uuid());
auto kernel = xrt::kernel(context, "MLIR_AIE");
Contributor

@coderabbitai coderabbitai Bot Apr 9, 2026

⚠️ Potential issue | 🟡 Minor

Add exception handling for clearer error messages.

read_instr_binary() and XRT constructors can throw. Without a try-catch, exceptions will terminate with an unhelpful message. This was also flagged by static analysis.

🛡️ Proposed fix to wrap main logic in try-catch
 int main(int argc, char **argv) {
+  try {
   // Paths to the compiled artefacts...
   // ... (existing code) ...
+  } catch (const std::exception &e) {
+    std::cerr << "Error: " << e.what() << "\n";
+    return 1;
+  }
 }

Wrap the main body (lines 83-209) in a try-catch block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/Platforms/XDNA2/main.cpp` around lines 79 - 112, Wrap the main
logic of main (the code that calls read_instr_binary(), xrt::device,
xrt::xclbin, device.register_xclbin, xrt::hw_context and xrt::kernel) in a
try-catch block that catches std::exception (and optionally a catch-all) and
prints a clear error message including ex.what() to stderr (or via your logging
mechanism) and returns a non-zero exit code; ensure any thrown exceptions from
read_instr_binary() or XRT constructors are handled so the program exits
gracefully with an informative message.

Member

It could be worth implementing a proper try-catch statement. Please address it or resolve the comment.

Contributor

🧩 Analysis chain

I need to see the full main() body to generate the correct fix. Let me fetch it.


🏁 Script executed:

#!/bin/bash
fd -t f "main.cpp" --full-path --search-path DeeployTest/Platforms/XDNA2 | xargs cat -n

Length of output: 9958


@Xeratec, here is the proposed fix wrapping the entire main() body in a try-catch block:

int main(int argc, char **argv) {
  try {
    // Paths to the compiled artefacts: default to the directory containing
    // this binary so the test works regardless of the working directory or
    // whether it is run inside a container.
    std::string bin_dir;
    {
      std::string argv0(argv[0]);
      auto sep = argv0.rfind('/');
      bin_dir = (sep == std::string::npos) ? "." : argv0.substr(0, sep);
    }
    std::string xclbin_path = bin_dir + "/network.xclbin";
    std::string instr_path = bin_dir + "/npu_insts.bin";

    bool verbose = false;
    for (int i = 1; i < argc; ++i) {
      std::string arg = argv[i];
      if (arg == "-v" || arg == "--verbose" || arg == "-vv") {
        verbose = true;
      }
    }
    if (argc >= 2 && argv[1][0] != '-')
      xclbin_path = argv[1];
    if (argc >= 3 && argv[2][0] != '-')
      instr_path = argv[2];

    // -----------------------------------------------------------------------
    // 1. Open XRT device, register xclbin, create hw_context
    //    (matches mlir-aie test_utils::init_xrt_load_kernel pattern)
    // -----------------------------------------------------------------------
    auto device = xrt::device(0);
    auto xclbin = xrt::xclbin(xclbin_path);
    device.register_xclbin(xclbin);
    xrt::hw_context context(device, xclbin.get_uuid());
    auto kernel = xrt::kernel(context, "MLIR_AIE");

    // -----------------------------------------------------------------------
    // 2. Read NPU instruction binary
    // -----------------------------------------------------------------------
    std::vector<uint32_t> instr_v = read_instr_binary(instr_path);
    size_t n_instr = instr_v.size();

    // -----------------------------------------------------------------------
    // 3. Derive element counts from the testinputs/testoutputs header defines.
    //    N_ELEMENTS_INPUT0, N_ELEMENTS_INPUT1, N_ELEMENTS_OUTPUT0 are set
    //    by generateNetwork_xdna2.py.
    // -----------------------------------------------------------------------
    // JUNGVI: TODO: Remove this assert and make it scalable for N I/Os graphs
    // (with respect to the amount of bo available)
    static_assert(N_ELEMENTS_INPUT0 == N_ELEMENTS_INPUT1,
                  "Input 0 and input 1 must have the same number of elements");
    static_assert(N_ELEMENTS_INPUT0 == N_ELEMENTS_OUTPUT0,
                  "Inputs and output must have the same number of elements");

    const size_t n_elem = N_ELEMENTS_OUTPUT0;
    const size_t elem_size = sizeof(uint16_t); // BF16 = 2 bytes
    const size_t buf_bytes = n_elem * elem_size;

    // -----------------------------------------------------------------------
    // 4. Allocate XRT buffer objects
    //    Kernel args: (0:opcode, 1:instr_bo, 2:instr_len, 3:in0, 4:in1, 5:out)
    // -----------------------------------------------------------------------
    auto bo_instr = xrt::bo(device, n_instr * sizeof(uint32_t),
                            XCL_BO_FLAGS_CACHEABLE, kernel.group_id(1));
    auto bo_in0 =
        xrt::bo(device, buf_bytes, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(3));
    auto bo_in1 =
        xrt::bo(device, buf_bytes, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(4));
    auto bo_out =
        xrt::bo(device, buf_bytes, XRT_BO_FLAGS_HOST_ONLY, kernel.group_id(5));

    // -----------------------------------------------------------------------
    // 5. Copy data into device buffers
    // -----------------------------------------------------------------------
    std::memcpy(bo_instr.map<uint32_t *>(), instr_v.data(),
                n_instr * sizeof(uint32_t));
    std::memcpy(bo_in0.map<void *>(), testInputVector0, buf_bytes);
    std::memcpy(bo_in1.map<void *>(), testInputVector1, buf_bytes);

    bo_instr.sync(XCL_BO_SYNC_BO_TO_DEVICE);
    bo_in0.sync(XCL_BO_SYNC_BO_TO_DEVICE);
    bo_in1.sync(XCL_BO_SYNC_BO_TO_DEVICE);

    // -----------------------------------------------------------------------
    // 6. Launch kernel and wait for completion
    //    opcode 3 = execute NPU instruction stream
    // -----------------------------------------------------------------------
    // JUNGVI: TODO: Collect runtime and display it
    // JUNGVI: TODO: Enable warmup iterations
    unsigned int opcode = 3;
    auto run = kernel(opcode, bo_instr, static_cast<uint32_t>(n_instr), bo_in0,
                      bo_in1, bo_out);
    run.wait();

    // -----------------------------------------------------------------------
    // 7. Sync output back and compare against golden reference
    // -----------------------------------------------------------------------
    bo_out.sync(XCL_BO_SYNC_BO_FROM_DEVICE);

    const uint16_t *hw_out = bo_out.map<const uint16_t *>();
    const uint16_t *golden_out = testOutputVector0;

    int errors = 0;
    for (size_t i = 0; i < n_elem; ++i) {
      bool match = bf16_nearly_equal(hw_out[i], golden_out[i]);
      if (!match) {
        ++errors;
        if (errors <= 10) {
          std::cerr << "  Mismatch at index " << i
                    << ": hw=" << bf16_to_float(hw_out[i]) << " (0x" << std::hex
                    << hw_out[i] << std::dec << ")"
                    << "  ref=" << bf16_to_float(golden_out[i]) << " (0x"
                    << std::hex << golden_out[i] << std::dec << ")"
                    << "  diff="
                    << std::fabs(bf16_to_float(hw_out[i]) -
                                 bf16_to_float(golden_out[i]))
                    << "\n";
        }
      }
      if (verbose) {
        float hw_f = bf16_to_float(hw_out[i]);
        float ref_f = bf16_to_float(golden_out[i]);
        std::cout << "[" << i << "] hw=" << hw_f << "  ref=" << ref_f
                  << "  diff=" << std::fabs(hw_f - ref_f)
                  << (match ? "" : "  *** MISMATCH") << "\n";
      }
    }

    // Output format required by testUtils/core/output_parser.py
    std::cout << "Errors: " << errors << " out of " << n_elem << "\n";

    return (errors == 0) ? 0 : 1;

  } catch (const std::exception &e) {
    std::cerr << "Error: " << e.what() << "\n";
    return 1;
  }
}

The entire main() body is wrapped in a single try block. This ensures that any exception from read_instr_binary(), the XRT constructors/methods (xrt::device, xrt::xclbin, xrt::hw_context, xrt::kernel, xrt::bo, sync, run.wait()), or standard library allocations will be caught, a human-readable message printed to stderr, and the process will exit with code 1 rather than terminating with an unhelpful std::terminate message.

Member Author

@Xeratec This is an AI hallucination: std::terminate with libstdc++ does print the exception type and the what() string. See the libstdc++ manual:

"The __verbose_terminate_handler function obtains the name of the current exception, attempts to demangle it, and prints it to stderr. If the exception is derived from std::exception then the output from what() will be included."

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +171 to +173
# Propagate verbosity to the host binary (e.g. XDNA2 main.cpp uses -v)
if config.verbose >= 1:
cmd.append("-v")
Contributor

@coderabbitai coderabbitai Bot Apr 9, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Host-simulator platform entries:"
rg -n -C2 '"simulator"\s*:\s*"host"' DeeployTest/test_platforms.py || true

echo
echo "Potential host binaries and verbosity-flag handling:"
fd -i 'main.cpp' DeeployTest/Platforms | while read -r f; do
  echo "---- $f ----"
  rg -n -C2 '(-v|--verbose|getopt|argc|argv)' "$f" || true
done

Repository: pulp-platform/Deeploy

Length of output: 1416


🏁 Script executed:

# First, check if there's a generic platform binary and what simulator it uses
rg -A 10 '"generic"' DeeployTest/test_platforms.py | head -20

Repository: pulp-platform/Deeploy

Length of output: 736


🏁 Script executed:

# Search for any main.cpp or binary in generic directory
find DeeployTest/Platforms/Generic -type f -name 'main.*' -o -name 'sim*' 2>/dev/null | head -20

Repository: pulp-platform/Deeploy

Length of output: 101


🏁 Script executed:

# Check the execution.py file to understand how the host binary is called
grep -n "simulator" DeeployTest/testUtils/core/execution.py | head -20

Repository: pulp-platform/Deeploy

Length of output: 381


🏁 Script executed:

# Check if Generic/main.c accepts -v flag
cat -n DeeployTest/Platforms/Generic/main.c | head -150

Repository: pulp-platform/Deeploy

Length of output: 2268


🏁 Script executed:

# Check all platforms using "host" simulator
rg -B 3 '"simulator"\s*:\s*"host"' DeeployTest/test_platforms.py

Repository: pulp-platform/Deeploy

Length of output: 245


Gate -v flag to XDNA2 platform only.

The Generic platform's binary has int main() with no parameters and cannot accept command-line flags. Passing -v unconditionally for all simulator == "host" runs will break Generic platform tests when config.verbose >= 1. Only XDNA2's binary (XDNA2/main.cpp) explicitly handles -v flags.

Suggested fix
-        if config.verbose >= 1:
+        if config.platform == "XDNA2" and config.verbose >= 1:
             cmd.append("-v")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DeeployTest/testUtils/core/execution.py` around lines 171 - 173, The current
code appends "-v" whenever config.verbose >= 1 which breaks the Generic/host
binary that doesn't accept flags; change the condition to only add "-v" when the
run is for the XDNA2 binary (e.g. check the simulator/platform identifier used
in this module — e.g. simulator == "XDNA2" or platform == "XDNA2") so replace
the existing guard with something like: only append "-v" when config.verbose >=
1 AND the simulator/platform equals the XDNA2 identifier (leave the cmd and
config.verbose symbols intact and reference XDNA2/main.cpp as the binary that
accepts -v).

✅ Addressed in commits 381b1e3 to 7e8736e

Member

Might be worth addressing or at least checking.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Generic main.c signature
echo "=== Generic main.c ==="
rg -n 'int main' DeeployTest/Platforms/Generic/main.c || true

# Check all platforms in test_platforms.py using host simulator
echo ""
echo "=== Platforms using host simulator ==="
rg -B5 '"host"' DeeployTest/test_platforms.py || true

# Check all main.cpp/main.c files for -v handling
echo ""
echo "=== All main files and -v handling ==="
fd -e cpp -e c -n 'main' DeeployTest/Platforms | xargs grep -l "" | while read f; do
  echo "--- $f ---"
  rg -n '\-v|verbose|argv|argc' "$f" || echo "(no -v/argv/argc found)"
done

Length of output: 657


@Xeratec, the concern is confirmed and worth fixing. Here's what the investigation found:

  • DeeployTest/Platforms/Generic/main.c Line 14 has int main() — no parameters, so it cannot accept any CLI flags.
  • Both "generic" and "xdna2" entries in PLATFORM_CONFIGS use "simulator": "host".

This means any Generic platform test run with config.verbose >= 1 would pass -v to a binary that doesn't accept arguments, which would likely cause it to exit with an error or undefined behavior.

The fix is straightforward — gate the flag to the XDNA2 platform:

if config.simulator == 'host':
    # Run binary directly
    binary_path = Path(config.build_dir) / "bin" / config.test_name
    cmd = [str(binary_path)]
    # Propagate verbosity to the host binary (XDNA2 main.cpp uses -v)
    if config.platform == "XDNA2" and config.verbose >= 1:
        cmd.append("-v")

Member Author

Done in f8c1eaa


Comment thread requirements-xdna.txt Outdated
Comment on lines +6 to +10
--extra-index-url https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly
--extra-index-url https://pypi.org/simple

mlir_aie==v1.2.1
llvm-aie
Contributor

@coderabbitai coderabbitai Bot Apr 9, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

for url in \
  "https://github.com/Xilinx/mlir-aie/releases/expanded_assets/v1.2.1" \
  "https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly"
do
  echo "=== $url ==="
  curl -fsSL "$url" \
    | rg -o '([A-Za-z0-9._-]+)-(v?[0-9][A-Za-z0-9._-]*)-cp[0-9]+[^"]*\.whl' \
    | sort -u || true
  echo
done

Repository: pulp-platform/Deeploy

Length of output: 482


🏁 Script executed:

# First, verify the nightly URL is accessible and check its structure
curl -fsSL -I "https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly" | head -20

Repository: pulp-platform/Deeploy

Length of output: 4889


🏁 Script executed:

# Get the raw HTML to see what's actually available (no regex filtering)
curl -fsSL "https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly" 2>&1 | head -100

Repository: pulp-platform/Deeploy

Length of output: 18348


🏁 Script executed:

# Also check the requirements file to understand the full context
cat requirements-xdna.txt

Repository: pulp-platform/Deeploy

Length of output: 413


Pin llvm-aie to a specific version; nightly sourcing with no version constraint creates reproducibility risk.

The dependency llvm-aie is currently unpinned while pulled from the nightly feed. The nightly repository contains multiple builds of the same base version with different git hashes, meaning each dependency resolution can pull a different build. Combined with the pinned mlir_aie==v1.2.1, this creates a mismatch: the MLIR dialect version is locked, but the LLVM AIE backend is floating, risking incompatibility and non-reproducible builds.

Identify a compatible llvm-aie version (e.g., llvm-aie==18.0.0.2024050100+59d494d3 or equivalent from a released tag if available), and update the requirement to explicitly pin it.
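As a quick way to audit requirement files for this kind of drift, a small script can flag entries that lack an exact `==` pin. This is an illustrative helper only (the function name and parsing rules are mine, not part of Deeploy):

```python
import re

def unpinned_requirements(lines):
    """Return requirement names that lack an exact '==' pin.

    Skips comments, blank lines, and pip options such as --extra-index-url.
    Illustrative helper only; not part of the Deeploy codebase.
    """
    unpinned = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue
        # Cut the name off at the first version/extras/marker character.
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]
        if "==" not in line:
            unpinned.append(name)
    return unpinned

reqs = [
    "--extra-index-url https://pypi.org/simple",
    "mlir_aie==v1.2.1",
    "llvm-aie",
]
print(unpinned_requirements(reqs))  # → ['llvm-aie']
```

Run against requirements-xdna.txt, such a check would surface the floating llvm-aie entry while leaving pinned packages and index options alone.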

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@requirements-xdna.txt` around lines 6 - 10, requirements-xdna.txt currently
lists llvm-aie without a version, causing nondeterministic installs from the
nightly index; update the dependency by replacing the unpinned "llvm-aie" entry
with a specific pinned version compatible with mlir_aie (for example
"llvm-aie==18.0.0.2024050100+59d494d3" or a released tag you verified), so the
file contains the exact version string for llvm-aie alongside mlir_aie==v1.2.1
to ensure reproducible, compatible builds.

✅ Addressed in commits 381b1e3 to 7e8736e

Member

Please check and fix/resolve this comment.

Contributor

Let me look up the compatible llvm-aie version for mlir_aie==v1.2.1 and check the current file state.

@Xeratec, after investigating the upstream mlir-aie project's own documentation and requirements, the current requirements-xdna.txt actually follows the exact pattern recommended by the upstream project.

The official mlir-aie installation instructions — for both the latest wheels and specific releases — consistently recommend installing llvm-aie with an unpinned version from the nightly feed:

python3 -m pip install mlir_aie -f https://github.com/Xilinx/mlir-aie/releases/expanded_assets/<version>
python3 -m pip install llvm-aie -f https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly

The upstream docs note that it is important to sync the mlir-aie wheels version, the GitHub repo commit, and requirements versions — but they do not provide a stable, pinnable release tag for llvm-aie itself. The llvm-aie (Peano) project only ships nightly builds, not versioned stable releases, so there is no stable "release" wheel to pin against.

Pinning to a specific nightly hash such as llvm-aie==18.0.0.2024050100+59d494d3 (as suggested in the original review comment) would be fragile because:

  1. That specific nightly build may be removed from GitHub Releases at any time.
  2. The version string encodes a date + commit hash, making it difficult to verify compatibility.
  3. The mlir-aie project itself does not document a specific compatible llvm-aie hash for v1.2.1.

Given that the current approach mirrors the upstream-recommended installation pattern, I'd suggest resolving this comment as-is, since there is no better pinnable target available. If stricter reproducibility is needed in the future, one option would be to mirror a specific llvm-aie wheel into the project's own container or artifact store — but that is out of scope for this PR.

Would you like me to resolve this review comment?

Member Author

Done in a20ec7f


Comment on lines +20 to +40
template <typename T_in, typename T_out>
void eltwise_vadd(T_in *a, T_in *b, T_out *c, int size) {
constexpr int vec_factor = 16;
event0();
T_in *__restrict pA1 = a;
T_in *__restrict pB1 = b;
T_out *__restrict pC1 = c;
const int F = size / vec_factor;
AIE_PREPARE_FOR_PIPELINING
AIE_LOOP_MIN_ITERATION_COUNT(16)
for (int i = 0; i < F; i++) {
aie::vector<T_in, vec_factor> A0 = aie::load_v<vec_factor>(pA1);
pA1 += vec_factor;
aie::vector<T_in, vec_factor> B0 = aie::load_v<vec_factor>(pB1);
pB1 += vec_factor;
aie::vector<T_out, vec_factor> cout = aie::add(A0, B0);
aie::store_v(pC1, cout);
pC1 += vec_factor;
}
event1();
}
Contributor

@coderabbitai coderabbitai Bot Apr 9, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for tile size configuration in XDNA2 code
rg -n -A5 'tileSize|tile_size' --type=py Deeploy/Targets/XDNA2/

Repository: pulp-platform/Deeploy

Length of output: 6085


🏁 Script executed:

#!/bin/bash
# Check MAX_TILE_SIZE value
rg -n "MAX_TILE_SIZE" Deeploy/Targets/XDNA2/

Repository: pulp-platform/Deeploy

Length of output: 433


🏁 Script executed:

#!/bin/bash
# Look for vec_factor or vectorization constraints in the codebase
rg -n -A3 "vec_factor\|vector.*16\|alignment.*16" --type=py --type=cc Deeploy/Targets/XDNA2/

Repository: pulp-platform/Deeploy

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Check the full _deriveTileSize function
sed -n '30,65p' Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py

Repository: pulp-platform/Deeploy

Length of output: 1463


🏁 Script executed:

#!/bin/bash
# Search for constraint definitions related to tile sizing
rg -n -B2 -A2 "l1Constraint\|tileSize.*constraint" Deeploy/Targets/XDNA2/

Repository: pulp-platform/Deeploy

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check if there's any handling for remainders or tail loops in kernels
rg -n "remainder\|tail.*loop\|% vec_factor\|% 16" TargetLibraries/XDNA2/kernels/

Repository: pulp-platform/Deeploy

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Search for _legalizeTransfers and normalization functions
rg -n "_legalizeTransfers|legalize.*[Tt]ransfer|normalize.*[Tt]ransfer" Deeploy/

Repository: pulp-platform/Deeploy

Length of output: 857


🏁 Script executed:

#!/bin/bash
# Check the full add.cc kernel file
cat -n TargetLibraries/XDNA2/kernels/add.cc

Repository: pulp-platform/Deeploy

Length of output: 1979


🏁 Script executed:

#!/bin/bash
# Search for any constraints that mention BF16 and alignment
rg -n -B3 -A3 "BF16.*align|align.*BF16|bf16.*multiple|multiple.*bf16" Deeploy/ --type=py

Repository: pulp-platform/Deeploy

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Look for DMA transfer handling code
rg -n -B2 -A4 "transfer.*size|tileSize.*transfer" Deeploy/Targets/XDNA2/ --type=py

Repository: pulp-platform/Deeploy

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Find and read the _legalizeTransfers function
rg -n -A 50 "def _legalizeTransfers" Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py

Repository: pulp-platform/Deeploy

Length of output: 2912


🏁 Script executed:

#!/bin/bash
# Also check if there's any explicit vec_factor or 16-element alignment logic
rg -n "16\|vec_factor\|alignment" Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py | head -30

Repository: pulp-platform/Deeploy

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check how tileSize is used after _legalizeTransfers
rg -n -B5 -A5 "modifiedOpRepr.*size.*tileSize" Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRComputeCorePass.py

Repository: pulp-platform/Deeploy

Length of output: 662


Add a scalar tail loop to handle remainders when size is not a multiple of vec_factor (16).

The vectorized kernel processes size / vec_factor iterations using integer division, which silently drops trailing elements if size is not a multiple of 16. The tiling infrastructure does not guarantee tile sizes are multiples of 16 (it only ensures tileSize divides numElements). Add a scalar loop after the vectorized loop to process any remainder:

Suggested fix
for (int i = 0; i < F; i++) {
  aie::vector<T_in, vec_factor> A0 = aie::load_v<vec_factor>(pA1);
  pA1 += vec_factor;
  aie::vector<T_in, vec_factor> B0 = aie::load_v<vec_factor>(pB1);
  pB1 += vec_factor;
  aie::vector<T_out, vec_factor> cout = aie::add(A0, B0);
  aie::store_v(pC1, cout);
  pC1 += vec_factor;
}
// Scalar tail loop for remainders
for (int i = F * vec_factor; i < size; i++) {
  c[i] = a[i] + b[i];
}
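To see that the vector-plus-tail scheme covers every element, here is a pure-Python reference model of the same control flow (illustrative only; this is not the AIE kernel, and the function name is mine):

```python
def vadd_reference(a, b, vec_factor=16):
    """Model of the vectorized add: full chunks of vec_factor, then a scalar tail."""
    size = len(a)
    full = size // vec_factor          # F in the kernel
    out = [0.0] * size
    # "Vector" body: process whole chunks of vec_factor elements.
    for i in range(0, full * vec_factor, vec_factor):
        for j in range(i, i + vec_factor):
            out[j] = a[j] + b[j]
    # Scalar tail: the remaining size % vec_factor elements.
    for i in range(full * vec_factor, size):
        out[i] = a[i] + b[i]
    return out

a = [float(i) for i in range(50)]      # 50 = 3 * 16 + 2: exercises the tail
b = [1.0] * 50
assert vadd_reference(a, b) == [x + 1.0 for x in a]
```

Without the tail loop, the last `size % vec_factor` outputs (2 elements in this example) would stay untouched, which is exactly the silent drop the review flags.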
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TargetLibraries/XDNA2/kernels/add.cc` around lines 20 - 40, The vector loop
in eltwise_vadd uses F = size / vec_factor and drops trailing elements when size
isn't a multiple of vec_factor; add a scalar tail loop after the existing vector
loop that processes indices from F * vec_factor up to size, performing
element-wise addition using the same types (T_in/T_out) and either the pointer
cursors (pA1/pB1/pC1) or direct indexing (a[i], b[i], c[i]) so all remaining
elements are handled.

✅ Addressed in commits 381b1e3 to 7e8736e

Member

I think this is a valid point for you. Either ensure padding, failing, or implement the remainder computation.

Contributor

🧩 Analysis chain

@Xeratec, agreed — the tail loop is the safest and most local fix. Here are the three options with trade-offs:

| Option | Pros | Cons |
| --- | --- | --- |
| Scalar tail loop | No changes outside kernel; always correct | Tiny runtime overhead for small remainders |
| Pad tileSize to multiple of 16 | Zero overhead, kernel stays clean | Requires changes in MLIRObjectFifoPass._deriveTileSize; wastes some compute on padding elements |
| Assert size % 16 == 0 | Catches mis-configuration loudly | Does not fix the bug; crash instead of correctness |

The most defensive approach is both: snap tileSize to a multiple of vec_factor in _deriveTileSize (so the vector path is always clean) and add a scalar tail loop as a safety net. For this beta PR the tail loop alone is the minimal correct fix.

  • Fix eltwise_vadd in add.cc to add a scalar tail loop for remainders
  • Also snap tileSize to the nearest multiple of vec_factor (16) in _deriveTileSize to avoid remainders at the tiling level

🏁 Script executed:

#!/bin/bash
# Read the full _deriveTileSize function to understand current rounding logic
sed -n '1,70p' Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py

Length of output: 2590


@Xeratec, here are the concrete diffs for both files. The approach is defense-in-depth: snap tileSize to a multiple of 16 at the tiling layer where possible, and keep a scalar tail loop in the kernel as an unconditional safety net.


1. TargetLibraries/XDNA2/kernels/add.cc — add scalar tail loop

 template <typename T_in, typename T_out>
 void eltwise_vadd(T_in *a, T_in *b, T_out *c, int size) {
   constexpr int vec_factor = 16;
   event0();
   T_in *__restrict pA1 = a;
   T_in *__restrict pB1 = b;
   T_out *__restrict pC1 = c;
   const int F = size / vec_factor;
   AIE_PREPARE_FOR_PIPELINING
   AIE_LOOP_MIN_ITERATION_COUNT(16)
   for (int i = 0; i < F; i++) {
     aie::vector<T_in, vec_factor> A0 = aie::load_v<vec_factor>(pA1);
     pA1 += vec_factor;
     aie::vector<T_in, vec_factor> B0 = aie::load_v<vec_factor>(pB1);
     pB1 += vec_factor;
     aie::vector<T_out, vec_factor> cout = aie::add(A0, B0);
     aie::store_v(pC1, cout);
     pC1 += vec_factor;
   }
+  // Scalar tail loop: handles remainder when size is not a multiple of vec_factor
+  for (int i = F * vec_factor; i < size; i++) {
+    c[i] = a[i] + b[i];
+  }
   event1();
 }

2. Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRObjectFifoPass.py — snap tileSize to multiple of 16

 def _deriveTileSize(numElements: int, patternMemoryConstraint) -> int:
     """Extract tile size from the tiling solution, or fall back to MAX_TILE_SIZE."""
+    VEC_FACTOR = 16  # must match vec_factor in eltwise_vadd kernel
     tileSize = min(numElements, MAX_TILE_SIZE)

     if patternMemoryConstraint is not None:
         try:
             nodeConstraint = patternMemoryConstraint.nodeConstraints[0]
             outputConstraints = nodeConstraint.outputTensorMemoryConstraints
             if outputConstraints:
                 firstOutputName = list(outputConstraints.keys())[0]
                 tensorConstraint = outputConstraints[firstOutputName]
                 if "L1" in tensorConstraint.memoryConstraints:
                     l1Constraint = tensorConstraint.memoryConstraints["L1"]
                     if l1Constraint.shape is not None:
                         tileSize = int(np.prod(l1Constraint.shape))
         except (AttributeError, IndexError, KeyError):
             pass

     # Ensure tileSize evenly divides numElements
     if numElements % tileSize != 0:
         tileSize = max(d for d in range(1, tileSize + 1) if numElements % d == 0)

+    # Snap tileSize to a multiple of VEC_FACTOR so the vector kernel loop has no
+    # remainder. If no such divisor exists (e.g. numElements itself is not a
+    # multiple of VEC_FACTOR), keep the current tileSize — the scalar tail loop
+    # in the kernel handles the residual elements.
+    if tileSize % VEC_FACTOR != 0:
+        candidates = [
+            d for d in range(VEC_FACTOR, tileSize + 1, VEC_FACTOR)
+            if numElements % d == 0
+        ]
+        if candidates:
+            tileSize = max(candidates)

     return tileSize

The two changes together mean:

  • When numElements is a multiple of 16 the tiling layer produces clean tile sizes with no remainder, and the tail loop is a no-op.
  • When numElements is not a multiple of 16 (unlikely in practice but possible) the tail loop silently handles the residual elements correctly.
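The snapping rule from the proposed _deriveTileSize change can be exercised standalone. This is an illustrative re-implementation under my own names, not the Deeploy source:

```python
def snap_tile_size(num_elements, tile_size, vec_factor=16):
    """Prefer the largest divisor of num_elements that is <= tile_size and a
    multiple of vec_factor; otherwise keep tile_size unchanged (the kernel's
    scalar tail loop then covers the remainder)."""
    if tile_size % vec_factor == 0:
        return tile_size
    candidates = [d for d in range(vec_factor, tile_size + 1, vec_factor)
                  if num_elements % d == 0]
    return max(candidates) if candidates else tile_size

# 24 is not a multiple of 16, but 16 divides 1024, so the tile snaps to 16.
assert snap_tile_size(1024, 24) == 16
# No multiple of 16 divides 100, so the tile size is kept and the tail loop
# handles the residual elements.
assert snap_tile_size(100, 10) == 10
```

Tile sizes that are already vec_factor-aligned pass through untouched, so the change only affects the problematic remainder cases.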


Member Author

I added scalar remainder support in add.cc in 1add41c

Note that in #186 I enforce the tiler not to generate remainder tiles with the constraint below. This is because the current ObjectFIFO data movement paradigm needs a bit of tuning for remainder loops. The main goal for this platform over the next months will be to explore spatial mapping and fused tiling, so remainder tile support can come later.
https://github.com/Victor-Jung/Deeploy/blob/fab4893e78772b7866299aa025d826232a90c532/Deeploy/Targets/XDNA2/TileConstraints/DivisibilityHelper.py
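The linked DivisibilityHelper is not shown here, but the constraint it enforces amounts to only admitting tile sizes that divide the tensor evenly and stay vector-aligned. A hypothetical sketch of such a check (names and signature are mine; the actual helper in #186 may differ):

```python
def divisible_tilings(num_elements, max_tile, factor=16):
    """Return candidate tile sizes that divide num_elements, fit in max_tile,
    and are multiples of `factor`, so no remainder tiles are ever generated.
    Hypothetical helper; the actual DivisibilityHelper in #186 may differ."""
    return [t for t in range(factor, max_tile + 1, factor)
            if num_elements % t == 0]

print(divisible_tilings(1024, 64))  # → [16, 32, 64]
```

Restricting the tiler to such candidates makes the scalar remainder path in add.cc a dead branch in practice, while keeping it as a correctness safety net.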


Member

@Xeratec Xeratec left a comment

Hi Victor, thanks for this valuable contribution. I have some comments, but overall the PR is already in very good shape.


ENV MLIR_AIE_PYTHON=/usr/bin/python3

WORKDIR /app/Deeploy
Member

Nit, but can we align this to the other Dockerfile?

Start with

WORKDIR /app/build

and the end do

# Remove unused files and clean up to reduce image size
WORKDIR /app
RUN rm -rf /app/build

You don't have to remove individual files during the build steps, which could lead to forgetting some and polluting the work directory.

Member Author

Done in 7e8736e

But also, the XDNA Dockerfile does not build anything locally for now, so there is nothing to clean up. It may be useful in the future, though.

docker_image:
description: "XDNA2 Docker image (must be pre-built on the runner)"
required: false
default: "deeploy-xdna:local"
Member

Why use the local file and not ghcr.io/pulp-platform/deeploy-xdna:devel?

Member Author

Because right now the CI runs locally on one of my machines and I don't have collaborators. Once this platform becomes more mainstream, I think it will be a good idea to publish a Docker image, but until then it's overkill IMO, especially considering that the XDNA platform has few dependencies.

Comment thread .github/workflows/_runner-xdna2.yml Outdated
Comment on lines +14 to +17
docker-image:
required: false
type: string
default: "deeploy-xdna:local"
Member

Why use the local file and not ghcr.io/pulp-platform/deeploy-xdna:devel?

Also, if we align this file with the other _runner* files, it should drop the default tag and use required: true.


def __init__(self,
name: str = "XDNA2_AIE_Core",
Mapping = XDNA2Mapping,
Member

Why does this use XDNA2Mapping and not XDNA2TilingMapping?

Comment on lines +21 to +24
def _inferNumLevels(self, inputs: List[VariableBuffer],
operatorRepresentation: OperatorRepresentation) -> Optional[List[int]]:
# Float types do not have a meaningful nLevels — return 1 as a neutral value.
return [1]
Member

Do you really need a new checker for this? Can you reuse the generic AddChecker?

class AddChecker(SignPropTypeChecker):

Member Author

Not really, as this type checker is for bf16 (or float in general) and the generic AddChecker is for integers. I am renaming it from XDNA2AddChecker to XDNA2FloatAddChecker for clarity.

Comment on lines +171 to +173
# Propagate verbosity to the host binary (e.g. XDNA2 main.cpp uses -v)
if config.verbose >= 1:
cmd.append("-v")
Member

Might be worth addressing or at least checking.

Comment on lines +85 to +89
# Await output tasks, then free input tasks
for task in outputTasks:
aiex_d.dma_await_task(task)
for task in inputTasks:
aiex_d.dma_free_task(task)
Member

Please check, I am not familiar with AIE to judge this.

# 1. MLIRObjectFifoPass — creates FIFOs, declares kernel
# 2. MLIRComputeCorePass — opens core + loops, calls
# template.emit() with acquired FIFO elements in opRepr
self.ctxt, eb = node['codeTransformer'].applyDevicePasses(self.ctxt, eb, node['nodeName'])
Member

for node, eb in mlirBlocks:
eb.runtimeSequenceArgs = list(args)
log.info(f"[XDNA2] Runtime-sequence phase for '{node['nodeName']}'")
self.ctxt, eb = node['codeTransformer'].applyRuntimeSequencePasses(
Member

Comment on lines +187 to +196
# JUNGVI: TODO: Move this in ONNX4Deeploy
# Recompute golden outputs from the actual BF16 inputs the hardware will
# see. The original outputs.npz may have been computed in float32
# precision, which can differ by several BF16 ULPs.
bf16_inputs = [_float32_to_bf16_uint16(a.flatten()) for a in test_inputs_f32]
bf16_input_f32 = [_bf16_to_float32(b) for b in bf16_inputs]
golden_f32 = bf16_input_f32[0]
for inp in bf16_input_f32[1:]:
golden_f32 = golden_f32 + inp
test_outputs_bf16 = [golden_f32.reshape(arr.shape) for arr in test_outputs_f32]
Member

Note there is still a TODO.

Member Author

That's on purpose 😁

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (6)
Container/Dockerfile.deeploy-xdna (6)

43-45: ⚡ Quick win

Remove unnecessary /app/build directory creation and cleanup.

Lines 14–15 create /app/build as a WORKDIR, but no build operations actually occur there before it's removed at line 45. This cleanup is pointless and confusing. Either use /app/build for actual build operations (as suggested in past reviews), or skip creating it entirely.

♻️ Proposed fix to remove the unnecessary steps
-WORKDIR /app/build
-
 RUN apt-get update && apt-get install -y \
     software-properties-common \
     && add-apt-repository -y ppa:amd-team/xrt \
-# Remove unused files and clean up to reduce image size
-WORKDIR /app
-RUN rm -rf /app/build
-
 COPY pyproject.toml requirements-xdna.txt ./

And add WORKDIR /app before the COPY command:

+WORKDIR /app
+
 COPY pyproject.toml requirements-xdna.txt ./
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Container/Dockerfile.deeploy-xdna` around lines 43 - 45, Remove the pointless
creation and cleanup of /app/build: delete the WORKDIR /app/build and the RUN rm
-rf /app/build lines (they are the WORKDIR /app/build and RUN rm -rf /app/build
entries) and ensure a single WORKDIR /app is set before the COPY commands so
files are copied into /app directly; alternatively, if you intend to perform
builds in a separate context, convert the created WORKDIR /app/build into the
actual build step instead of removing it.

41-41: 💤 Low value

Append to LD_LIBRARY_PATH instead of replacing it.

The current assignment replaces any existing LD_LIBRARY_PATH value. For better composability with base images or multi-stage builds, append the XRT library path instead.

🔗 Proposed fix to append the path
-ENV LD_LIBRARY_PATH=${XILINX_XRT}/lib
+ENV LD_LIBRARY_PATH=${XILINX_XRT}/lib:${LD_LIBRARY_PATH}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Container/Dockerfile.deeploy-xdna` at line 41, The Dockerfile currently sets
LD_LIBRARY_PATH with ENV LD_LIBRARY_PATH=${XILINX_XRT}/lib which overwrites any
existing LD_LIBRARY_PATH; change it to append the XRT lib path instead so
existing entries are preserved. Update the ENV instruction that references
LD_LIBRARY_PATH and XILINX_XRT to export
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${XILINX_XRT}/lib (or
equivalent shell-safe append), ensuring it handles an empty original value
correctly and uses the existing LD_LIBRARY_PATH if present.

56-56: ⚖️ Poor tradeoff

Consider adding a non-root USER for security.

The container runs as root by default, which is flagged by static analysis (DS-0002) as a security concern. While root access may be necessary for NPU device access (/dev/accel/accel0) in the CI environment, consider whether a non-root user with appropriate device permissions would work.

If root is required, document why in a comment. Otherwise, add a USER directive:

🔒 Proposed fix to add non-root user
 ENV MLIR_AIE_PYTHON=/usr/bin/python3
 
+# Create non-root user for security
+RUN useradd -m -s /bin/bash deeploy && \
+    chown -R deeploy:deeploy /app
+USER deeploy
+
 WORKDIR /app/Deeploy

Note: If NPU device access requires root, this may not be feasible without additional device permission configuration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Container/Dockerfile.deeploy-xdna` at line 56, The Dockerfile currently sets
WORKDIR /app/Deeploy and leaves the container running as root; add a non-root
user (e.g., create and chown to a user) and add a USER directive after WORKDIR
so the app runs without root, or if NPU device access truly requires root, add
an explicit comment explaining why root is required and list the device
(/dev/accel/accel0) and CI constraints; update ownership of /app/Deeploy to that
user and ensure any startup scripts or entries in the Dockerfile (where WORKDIR
is used) run under that user.

48-52: ⚡ Quick win

Remove toml-to-requirements after use to reduce image size.

The toml-to-requirements package is installed to generate requirements.txt but is left in the image afterward. Since it's only needed at build time, uninstall it to reduce the final image size.

🗑️ Proposed fix to uninstall build-time dependency
 COPY pyproject.toml requirements-xdna.txt ./
 RUN pip install toml-to-requirements && \
     toml-to-req --toml-file pyproject.toml && \
     pip install -r requirements.txt && \
     pip install -r requirements-xdna.txt && \
+    pip uninstall -y toml-to-requirements && \
     rm -f requirements.txt pyproject.toml requirements-xdna.txt
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Container/Dockerfile.deeploy-xdna` around lines 48 - 52, The Dockerfile
currently installs the build-only package toml-to-requirements and leaves it in
the image; update the RUN sequence that calls pip install toml-to-requirements
and toml-to-req so that after generating requirements.txt and installing the
requirements you uninstall the build tool (e.g., add a pip uninstall -y
toml-to-requirements step after pip install -r requirements.txt && pip install
-r requirements-xdna.txt) and then remove the temporary files (requirements.txt,
pyproject.toml, requirements-xdna.txt) so the build-time dependency is not
retained in the final image.

16-37: ⚡ Quick win

Add --no-install-recommends to reduce image size.

Both apt-get install invocations in the RUN block starting at line 16 are missing the --no-install-recommends flag, which causes APT to install recommended packages and increases the image size unnecessarily.

📦 Proposed fix to add the flag
 RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
     software-properties-common \
     && add-apt-repository -y ppa:amd-team/xrt \
     && apt-get update && apt-get install -y \
+    --no-install-recommends \
     cmake \
     ninja-build \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Container/Dockerfile.deeploy-xdna` around lines 16 - 37, Update the two
apt-get install invocations in the RUN instruction so they include the
--no-install-recommends flag to avoid pulling recommended packages and reduce
image size; modify the RUN block that calls "apt-get install -y
software-properties-common" and the subsequent "apt-get install -y cmake
ninja-build g++ ..." to add --no-install-recommends to each apt-get install
command and keep the existing && rm -rf /var/lib/apt/lists/* cleanup unchanged.

12-12: 💤 Low value

Replace LLVM_INSTALL_DIR="nope" with a valid value or remove it.

The environment variable LLVM_INSTALL_DIR is used by the test suite to locate the toolchain installation directory. Setting it to "nope" is not a valid path and will cause test failures when the code attempts to access files in this directory. Either leave it unset (allowing tests to skip gracefully) or set it to the actual path where the LLVM toolchain is installed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Container/Dockerfile.deeploy-xdna` at line 12, The Dockerfile sets ENV
LLVM_INSTALL_DIR="nope", which is an invalid path and breaks tests; update the
ENV LLVM_INSTALL_DIR entry to either remove it entirely or set it to a valid
LLVM toolchain install path (replace the value for LLVM_INSTALL_DIR with the
actual installation directory used in your environment or omit the ENV line so
tests can skip), ensuring any references to LLVM_INSTALL_DIR in the build/test
scripts will resolve correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@Container/Dockerfile.deeploy-xdna`:
- Around line 43-45: Remove the pointless creation and cleanup of /app/build:
delete the WORKDIR /app/build and the RUN rm -rf /app/build lines (they are the
WORKDIR /app/build and RUN rm -rf /app/build entries) and ensure a single
WORKDIR /app is set before the COPY commands so files are copied into /app
directly; alternatively, if you intend to perform builds in a separate context,
convert the created WORKDIR /app/build into the actual build step instead of
removing it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 980abba0-cc9f-4991-a904-aa73c9587cd5

📥 Commits

Reviewing files that changed from the base of the PR and between a82fd52 and 7e8736e.

📒 Files selected for processing (5)
  • Container/Dockerfile.deeploy-xdna
  • Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRRuntimeSequencePass.py
  • DeeployTest/testUtils/core/execution.py
  • TargetLibraries/XDNA2/kernels/add.cc
  • requirements-xdna.txt
✅ Files skipped from review due to trivial changes (1)
  • requirements-xdna.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • DeeployTest/testUtils/core/execution.py
  • Deeploy/Targets/XDNA2/CodeTransformationPasses/MLIRRuntimeSequencePass.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_runner-xdna2.yml:
- Around line 31-32: Validate and safely quote workflow inputs before they are
used: reject or fail fast if inputs.docker-image or inputs.pytest-marker contain
unsafe characters (eg. leading '-' or shell metacharacters) by testing them
against a strict whitelist regex (alphanumeric, ./:_@- and optional tag), then
use quoted interpolations everywhere they are used (e.g. "${{
inputs.docker-image }}" in the docker run/exec command and "${{
inputs.pytest-marker }}" when calling pytest inside bash) and avoid passing
unvalidated values directly into shell or docker arguments (also ensure any
chown or other shell lines use quoted paths and &&/|| chaining safely instead of
relying on interpolation). Ensure the validation step is implemented early in
the job and fails the run with a clear error if inputs do not match the allowed
pattern.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 929e79f9-1673-41e0-abe3-b43f9e0231ed

📥 Commits

Reviewing files that changed from the base of the PR and between 7e8736e and acd5680.

📒 Files selected for processing (1)
  • .github/workflows/_runner-xdna2.yml

Comment on lines +31 to +32
${{ inputs.docker-image }} \
chown -R $(id -u):$(id -g) /workspace || true

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify all call sites and current constraints for reusable workflow inputs
set -euo pipefail

rg -n -C2 'uses:\s*\./\.github/workflows/_runner-xdna2\.yml|pytest-marker:|docker-image:' .github/workflows
rg -n -C2 'inputs:\s*$|pytest-marker:|docker-image:' .github/workflows/_runner-xdna2.yml

Repository: pulp-platform/Deeploy

Length of output: 20728


🏁 Script executed:

cat -n .github/workflows/_runner-xdna2.yml

Repository: pulp-platform/Deeploy

Length of output: 1914


Validate and quote workflow inputs before using in docker and bash commands.

inputs.docker-image (lines 31, 48) is unquoted and directly interpolated as a docker image argument, allowing injection of docker flags (e.g., -v /etc:/etc). inputs.pytest-marker (line 52) is passed unvalidated to pytest inside bash. On a self-hosted runner with --device /dev/accel/accel0 and volume mounts to /opt/xilinx, this enables command injection with privileged access.

Add input validation before use and quote all interpolations:

Suggested hardening
 jobs:
   test-runner-xdna2:
     runs-on: xdna2-npu
     steps:
+      - name: Validate workflow inputs
+        shell: bash
+        env:
+          DOCKER_IMAGE: ${{ inputs.docker-image }}
+          PYTEST_MARKER_EXPR: ${{ inputs.pytest-marker }}
+        run: |
+          set -euo pipefail
+          [[ "$DOCKER_IMAGE" =~ ^[a-zA-Z0-9._/:@-]+$ ]] || { echo "Invalid docker-image"; exit 1; }
+          [[ "$PYTEST_MARKER_EXPR" =~ ^[a-zA-Z0-9_[:space:]()!-]+$ ]] || { echo "Invalid pytest-marker"; exit 1; }
+
       - name: Fix workspace permissions
         shell: bash
         run: |
           docker run --rm \
             -v "${{ github.workspace }}":/workspace \
-            ${{ inputs.docker-image }} \
+            "${{ inputs.docker-image }}" \
             chown -R $(id -u):$(id -g) /workspace || true
       
       - name: Run Tests in Docker
         shell: bash
+        env:
+          PYTEST_MARKER_EXPR: ${{ inputs.pytest-marker }}
         run: |
           docker run --rm \
             --device /dev/accel/accel0 \
             --ulimit memlock=-1 \
             -v /opt/xilinx:/opt/xilinx \
             -v "${{ github.workspace }}":/app/Deeploy \
             -w /app/Deeploy \
-            ${{ inputs.docker-image }} \
+            "${{ inputs.docker-image }}" \
             bash -c "
               pip install -e . &&
               cd DeeployTest &&
-              pytest test_platforms.py -v -m 'xdna2 and ${{ inputs.pytest-marker }}'
+              pytest test_platforms.py -v -m \"xdna2 and ${PYTEST_MARKER_EXPR}\"
             "
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/_runner-xdna2.yml around lines 31 - 32, Validate and
safely quote workflow inputs before they are used: reject or fail fast if
inputs.docker-image or inputs.pytest-marker contain unsafe characters (eg.
leading '-' or shell metacharacters) by testing them against a strict whitelist
regex (alphanumeric, ./:_@- and optional tag), then use quoted interpolations
everywhere they are used (e.g. "${{ inputs.docker-image }}" in the docker
run/exec command and "${{ inputs.pytest-marker }}" when calling pytest inside
bash) and avoid passing unvalidated values directly into shell or docker
arguments (also ensure any chown or other shell lines use quoted paths and &&/||
chaining safely instead of relying on interpolation). Ensure the validation step
is implemented early in the job and fails the run with a clear error if inputs
do not match the allowed pattern.
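The whitelist approach from the suggested hardening can be prototyped in plain shell before wiring it into the workflow. A sketch with purely illustrative image names:

```shell
# Hypothetical whitelist check mirroring the validation step suggested above.
is_safe_image() {
  # Accept typical image references (registry/name:tag); reject spaces,
  # shell metacharacters, and a leading '-' that docker would parse as a flag.
  printf '%s' "$1" | grep -Eq '^[a-zA-Z0-9][a-zA-Z0-9._/:@-]*$'
}

is_safe_image "ghcr.io/org/deeploy-xdna:latest" && good=yes || good=no
is_safe_image "-v /etc:/etc" && bad=yes || bad=no
echo "$good $bad"  # yes no
```

Requiring an alphanumeric first character is what rules out flag injection; the rest of the character class matches ordinary registry/repository/tag syntax.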

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
TargetLibraries/XDNA2/CMakeLists.txt (1)

29-29: ⚡ Quick win

Add CONFIGURE_DEPENDS to file(GLOB ...) to avoid stale builds when new kernels are added.

Without CONFIGURE_DEPENDS, adding a new *.cc to kernels/ won't trigger a CMake reconfigure, silently omitting the new kernel from the build.

♻️ Proposed fix
-file(GLOB XDNA2_KERNEL_SOURCES "${CMAKE_CURRENT_LIST_DIR}/kernels/*.cc")
+file(GLOB XDNA2_KERNEL_SOURCES CONFIGURE_DEPENDS "${CMAKE_CURRENT_LIST_DIR}/kernels/*.cc")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/XDNA2/CMakeLists.txt` at line 29, The file(GLOB ...)
invocation that defines XDNA2_KERNEL_SOURCES should be updated to include the
CONFIGURE_DEPENDS option so CMake will re-run configuration when files under
kernels/ change; locate the file(GLOB XDNA2_KERNEL_SOURCES
"${CMAKE_CURRENT_LIST_DIR}/kernels/*.cc") line in CMakeLists.txt and add
CONFIGURE_DEPENDS to that file(GLOB) call to ensure new *.cc kernel files are
detected and the build is not silently stale.
Deeploy/Targets/XDNA2/Platform.py (1)

118-149: ⚡ Quick win

getTargetMemoryLevel is duplicated verbatim in both MemoryXDNA2Platform and MemoryXDNA2PlatformWrapper.

Extract the shared logic to a module-level helper. The hasattr(engine, 'preferredMemoryLevel') guard is also always True for XDNA2AIECoreEngine (it's set unconditionally in __init__), so it can be dropped.

♻️ Proposed refactor
+def _xdna2_target_memory_level(node: gs.Node, default: MemoryLevel) -> str:
+    """Return the AIE core's preferred memory level, or the platform default."""
+    if hasattr(node, '_engine_assignment'):
+        engine = node._engine_assignment
+        if isinstance(engine, XDNA2AIECoreEngine):
+            return engine.preferredMemoryLevel
+    return default.name
+

 class MemoryXDNA2Platform(MemoryPlatform):
     ...
     def getTargetMemoryLevel(self, node: gs.Node, tensorName: str, ctxt: NetworkContext) -> str:
-        if hasattr(node, '_engine_assignment'):
-            engine = node._engine_assignment
-            if isinstance(engine, XDNA2AIECoreEngine) and hasattr(engine, 'preferredMemoryLevel'):
-                return engine.preferredMemoryLevel
-        return self.defaultTargetMemoryLevel.name
+        return _xdna2_target_memory_level(node, self.defaultTargetMemoryLevel)


 class MemoryXDNA2PlatformWrapper(MemoryPlatformWrapper):
     ...
     def getTargetMemoryLevel(self, node: gs.Node, tensorName: str, ctxt: NetworkContext) -> str:
-        if hasattr(node, '_engine_assignment'):
-            engine = node._engine_assignment
-            if isinstance(engine, XDNA2AIECoreEngine) and hasattr(engine, 'preferredMemoryLevel'):
-                return engine.preferredMemoryLevel
-        return self.defaultTargetMemoryLevel.name
+        return _xdna2_target_memory_level(node, self.defaultTargetMemoryLevel)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/XDNA2/Platform.py` around lines 118 - 149, Both
MemoryXDNA2Platform.getTargetMemoryLevel and
MemoryXDNA2PlatformWrapper.getTargetMemoryLevel duplicate the same logic;
extract that shared logic into a module-level helper (e.g.,
resolve_target_memory_level(engine, default_level)) and have both methods call
it. In the helper use isinstance(engine, XDNA2AIECoreEngine) to return
engine.preferredMemoryLevel and otherwise return default_level.name; drop the
redundant hasattr(engine, 'preferredMemoryLevel') guard because
XDNA2AIECoreEngine always defines it. Update references in getTargetMemoryLevel
in both classes to call the new helper and remove the duplicated code.
DeeployTest/generateNetwork_xdna2.py (1)

58-111: ⚡ Quick win

_generate_xdna2_inputs_header and _generate_xdna2_outputs_header are near-identical.

The only differences are the "Input"/"Output" labels and the "input"/"output" macro suffixes. Merging them into one helper avoids duplicating the bug surface.

♻️ Proposed refactor
-def _generate_xdna2_inputs_header(input_arrays: list) -> str:
-    """Generate testinputs.h with raw uint16_t BF16 bit-pattern arrays."""
-    ...
-
-def _generate_xdna2_outputs_header(output_arrays: list) -> str:
-    """Generate testoutputs.h with raw uint16_t BF16 bit-pattern arrays."""
-    ...
+def _generate_xdna2_tensor_header(arrays: list, kind: str) -> str:
+    """Generate a BF16 uint16_t header for *kind* ('input' or 'output')."""
+    Kind = kind.capitalize()
+    lines = []
+    lines.append("// SPDX-FileCopyrightText: 2026 ETH Zurich and University of Bologna")
+    lines.append("// SPDX-License-Identifier: Apache-2.0")
+    lines.append("// Auto-generated by generateNetwork_xdna2.py — do not edit.")
+    lines.append("#pragma once")
+    lines.append("#include <stdint.h>")
+    lines.append("")
+    vec_names = []
+    for idx, arr in enumerate(arrays):
+        bf16 = _float32_to_bf16_uint16(arr.flatten())
+        n = len(bf16)
+        name = f"test{Kind}Vector{idx}"
+        vec_names.append(name)
+        hex_vals = ", ".join(f"0x{v:04x}u" for v in bf16)
+        lines.append(f"static const uint16_t {name}[{n}] = {{{hex_vals}}};")
+        lines.append(f"#define N_ELEMENTS_{kind.upper()}{idx} {n}u")
+        lines.append("")
+    lines.append(f"static const void *test{Kind}Vector[{len(vec_names)}] = {{")
+    lines.append("    " + ", ".join(f"(const void *){n}" for n in vec_names))
+    lines.append("};")
+    lines.append("")
+    return "\n".join(lines)

Call sites:

testInputStr  = _generate_xdna2_tensor_header(test_inputs_f32,  "input")
testOutputStr = _generate_xdna2_tensor_header(test_outputs_bf16, "output")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DeeployTest/generateNetwork_xdna2.py` around lines 58 - 111, Both
_generate_xdna2_inputs_header and _generate_xdna2_outputs_header are duplicated;
replace them with a single parameterized helper (e.g.
_generate_xdna2_tensor_header(arrays: list, kind: str)) that takes the arrays
and a kind token ("input" or "output") and emits the same content but using kind
to form names like testInputVector/testOutputVector,
N_ELEMENTS_INPUT#/N_ELEMENTS_OUTPUT#, and testInputVector/testOutputVector
table; update call sites that use _generate_xdna2_inputs_header and
_generate_xdna2_outputs_header to call the new _generate_xdna2_tensor_header
with the appropriate kind string, reusing the same internal logic (flatten ->
_float32_to_bf16_uint16, name generation, hex formatting, and final pointer
array construction).
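For context on the BF16 bit-pattern arrays both helpers emit, here is a minimal sketch of the float32-to-bfloat16 conversion — a hypothetical stand-in for the script's `_float32_to_bf16_uint16`, shown with plain truncation (the real helper may round to nearest even instead):

```python
import numpy as np

def float32_to_bf16_bits(values):
    """Return bfloat16 bit patterns as uint16 by truncating float32.

    Hypothetical stand-in for _float32_to_bf16_uint16; the actual helper
    may apply round-to-nearest-even rather than plain truncation.
    """
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)  # bf16 keeps the top 16 bits

vals = float32_to_bf16_bits([1.0, -2.0])
print([f"0x{v:04x}" for v in vals])  # ['0x3f80', '0xc000']
```

Because bfloat16 is exactly the upper half of the IEEE 754 float32 layout (same sign and 8-bit exponent, 7-bit mantissa), the shift alone produces valid bit patterns for the generated headers.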
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@DeeployTest/generateNetwork_xdna2.py`:
- Line 61: The autogenerated header strings in _generate_xdna2_inputs_header and
_generate_xdna2_outputs_header embed the wrong copyright year (2025); update the
hardcoded year in the string appended (the line that builds "//
SPDX-FileCopyrightText: 2025 ...") to 2026 in both functions so the generated
files show 2026 consistently.
- Line 129: Rename the unused loop variable name to _name and add strict=True to
the zip call to satisfy Ruff warnings; locate the loop using
enumerate(zip(inputs_npz.files, test_inputs_f32)) in generateNetwork_xdna2.py
(the loop that iterates over inputs_npz.files and test_inputs_f32) and change it
so the zip is called with strict=True and the tuple unpacks to (_name, values)
so the unused first element is prefixed with an underscore.

In `@DeeployTest/Platforms/XDNA2/CMakeLists.txt`:
- Around line 71-73: Replace the hardcoded "python" invocation in the CMake
COMMAND that runs AIECC (the COMMAND using ${CMAKE_COMMAND} -E env with
"PATH=${MLIR_AIE_INSTALL_DIR}/bin:$ENV{PATH}" "python" "${AIECC_PY}") with the
resolved Python 3 executable variable ${Python3_EXECUTABLE}; locate the COMMAND
in CMakeLists.txt that references AIECC_PY and swap the "python" token for
${Python3_EXECUTABLE} so the script runs with the intended Python 3 interpreter.

In `@README_XDNA.md`:
- Around line 6-8: Add language identifiers (bash) to all fenced code blocks in
README_XDNA.md: update the docker build command block ("docker build -f
Container/Dockerfile.deeploy-xdna -t deeploy-xdna:local ."), the docker run
block (the multi-line run with --device/--ulimit/-v/--name), the pip install +
python deeployRunner_xdna2.py block, the indented CI/example docker build
snippet, the actions-runner setup block (mkdir actions-runner, ./config.sh --url
... --labels xdna2-npu), and the svc.sh commands (sudo ./svc.sh install/start)
by changing each opening ``` to ```bash so markdownlint MD040 is satisfied.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ba7431d7-ddea-48c6-80e0-cebc4b8867f8

📥 Commits

Reviewing files that changed from the base of the PR and between acd5680 and cde0c5d.

📒 Files selected for processing (21)
  • .github/workflows/_runner-xdna2.yml
  • .github/workflows/ci-platform-xdna2.yml
  • Deeploy/Targets/XDNA2/Bindings.py
  • Deeploy/Targets/XDNA2/Deployer.py
  • Deeploy/Targets/XDNA2/Parsers.py
  • Deeploy/Targets/XDNA2/Platform.py
  • Deeploy/Targets/XDNA2/Templates/AddTemplate.py
  • Deeploy/Targets/XDNA2/Templates/__init__.py
  • Deeploy/Targets/XDNA2/Tiler.py
  • Deeploy/Targets/XDNA2/TypeCheckers.py
  • DeeployTest/Platforms/XDNA2/CMakeLists.txt
  • DeeployTest/Platforms/XDNA2/main.cpp
  • DeeployTest/deeployRunner_xdna2.py
  • DeeployTest/generateNetwork_xdna2.py
  • DeeployTest/test_xdna2_config.py
  • README_XDNA.md
  • TargetLibraries/XDNA2/CMakeLists.txt
  • TargetLibraries/XDNA2/kernels/add.cc
  • cmake/xdna2/llvm_aie.cmake
  • cmake/xdna2/mlir_aie.cmake
  • requirements-xdna.txt
✅ Files skipped from review due to trivial changes (4)
  • requirements-xdna.txt
  • Deeploy/Targets/XDNA2/Templates/__init__.py
  • cmake/xdna2/mlir_aie.cmake
  • Deeploy/Targets/XDNA2/Parsers.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • DeeployTest/test_xdna2_config.py
  • Deeploy/Targets/XDNA2/Tiler.py
  • .github/workflows/ci-platform-xdna2.yml
  • Deeploy/Targets/XDNA2/Bindings.py
  • Deeploy/Targets/XDNA2/Templates/AddTemplate.py
  • DeeployTest/deeployRunner_xdna2.py
  • Deeploy/Targets/XDNA2/Deployer.py


Labels

Feature Addition of new features

2 participants