NXP backend: Improve test result gathering by roman-janik-nxp · Pull Request #20024 · pytorch/executorch

roman-janik-nxp · 2026-06-04T13:06:03Z

Summary

Adds a feature that produces improved test results for reporting when the test log level is set to DEBUG. It adds IR models to .outputs/test_dir, result tensor differences, text variants of the input and result tensors, and a summary with test info.

Test plan

A test is provided. The change is also exercised by all tests that use NSYS.

cc @robert-kalmar

pytorch-bot · 2026-06-04T13:06:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20024

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 27 Pending

As of commit 0b19317 with merge base 06143cb:

NEW FAILURES - The following jobs have failed:

Cadence Build & Test / hifi-build / hifi4 (gh)
Input required and not supplied: aws-region
Cadence Build & Test / vision-build / vision (gh)
Input required and not supplied: aws-region
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh)
RuntimeError: Command docker exec -t 03e4e09cc052fd95e88838294f5f1de6b8050ea736f4ad8dab5980d3b373f18e /exec failed with exit code 92
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, operators) / linux-job (gh)
RuntimeError: Command docker exec -t 80f1a048134851918e63b93946096f92e3983b9844bfc40d2a0e8d08d06bbd61 /exec failed with exit code 92

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-06-04T13:06:14Z

❌ - login: @roman-janik-nxp / name: roman-janik-nxp. The commit (0b19317, 2f9a282) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

roman-janik-nxp · 2026-06-04T13:06:55Z

@novak-vaclav

MartinPavella · 2026-06-04T13:36:26Z

+                "tag1_neutron.et.tflite"
+            ), "Converted Neutron model not found in working directory, export in NeutronBackend failed."
+            shutil.copy("tag1_pure.et.tflite", test_dir)
+            shutil.copy("tag1_neutron.et.tflite", test_dir)


How does this work when there are more delegated partitions? The file names suggest it only works with 1 such partition.

I assumed a single partition, since multiple partitions mean there is a problem. But we also need to take these cases into account. As Robert pointed out, the copy solution is unfortunate, so I changed it to pass test_dir to the Neutron backend via CompileSpec and store all partitions directly there.
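
For illustration, a minimal sketch of how writing each delegated partition into a caller-supplied directory could avoid the single-file-name assumption raised above. This is not the code from the PR: `CompileSpec` is a local stand-in dataclass, and the key name `test_dir` as a spec key, the helper names, and the per-partition file naming are all hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class CompileSpec:
    """Local stand-in for a key/bytes compile spec entry (assumption)."""
    key: str
    value: bytes


DUMP_DIR_KEY = "test_dir"  # hypothetical key name


def dump_dir_from_specs(specs: List[CompileSpec]) -> Optional[Path]:
    """Return the directory encoded in the specs, or None if it is not set."""
    for spec in specs:
        if spec.key == DUMP_DIR_KEY:
            return Path(spec.value.decode("utf-8"))
    return None


def store_partition(specs: List[CompileSpec], partition_index: int, model_bytes: bytes) -> Path:
    """Write one partition to its own file, falling back to cwd if no dir is set."""
    dump_dir = dump_dir_from_specs(specs)
    if dump_dir is None:
        dump_dir = Path.cwd()
    dump_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: the partition index keeps files from overwriting each other.
    target = dump_dir / f"partition{partition_index}_neutron.et.tflite"
    target.write_bytes(model_bytes)
    return target


# Usage: two partitions end up in two distinct files in the same directory.
with tempfile.TemporaryDirectory() as tmp:
    specs = [CompileSpec(DUMP_DIR_KEY, tmp.encode("utf-8"))]
    stored = [store_partition(specs, i, blob) for i, blob in enumerate([b"first", b"second"])]
    print([path.name for path in stored])
```

Because the destination travels with the compile specs, the caller no longer needs to know the generated file names in order to copy them afterwards.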

robert-kalmar · 2026-06-05T07:35:35Z

            remove_quant_io_ops=remove_quant_io_ops,
        )
+
+        # Copy converter Neutron model and Neutron IR model to test_dir


This is quite an impractical workaround for the code in executorch/backends/nxp/nxp_backend.py::NeutronBackend::preprocess(), which stores the debug files in the current working directory.

Now you have:

logic in lower_run_compare, which knows where to properly store the test files (derived from the test ID from request)

logic in NeutronBackend::preprocess that stores them in the current working directory, which e.g. Arm handles by passing the intermediates folder in compile_spec, which is acceptable; see https://github.com/pytorch/executorch/blob/main/backends/arm/scripts/aot_arm_compiler.py#L511

logic in this function (_run_delegated_executorch_program), which stores the pytorch/executorch visualized graphs (.json files) in test_dir, which is kind of OK since it is called from lower_run_compare, but inflexible:

Inflexible because the folder name for intermediate artefacts is created in lower_run_compare based on the request fixture. lower_run_compare is quite a low-level function to know about request [ https://en.wikipedia.org/wiki/Principle_of_least_privilege ]. Why not move the test_dir name creation logic into the test itself and pass it into the utility functions? Such a change also opens the possibility of storing intermediate artefacts from tests that do not call lower_run_compare, like the assert_not_delegated tests.

And as Martin pointed out, you must rely on a specific tag name to avoid mistakenly copying garbage.

I agree the copy solution is unfortunate, so I changed it to pass test_dir to the Neutron backend via CompileSpec and store all partitions directly there. Now the storing logic is in one place. I made the test_dir param inside our pipeline optional, so the default store dir is still cwd.
I also wouldn't move the test_name extraction outside of lower_run_compare, as I consider it the one compact API for NSYS tests. Other types of tests typically don't generate such test results, and the test_name can still be extracted from request in those tests.
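
As a sketch of the alternative discussed in this thread, the directory name could be derived in the test from pytest's `request.node.name` and then passed down to utility functions. The helper name `artifact_dir_for`, the sanitizing rule, and the `test_dir=` keyword on `lower_run_compare` are assumptions for illustration only; `.outputs` is the root mentioned in the PR summary.

```python
import re
from pathlib import Path


def artifact_dir_for(node_name: str, root: Path) -> Path:
    """Map a pytest node name to a directory path safe for most filesystems."""
    # Parametrized test IDs contain characters such as '[' and ']'; replace
    # every run of characters outside a conservative allowlist with '_'.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", node_name).strip("_")
    return root / safe


# In a test, pytest supplies the `request` fixture:
#
# def test_something(request):
#     test_dir = artifact_dir_for(request.node.name, Path(".outputs"))
#     lower_run_compare(..., test_dir=test_dir)  # hypothetical keyword argument
print(artifact_dir_for("test_conv[1-3-8-8]", Path(".outputs")))
```

With this split, the low-level helper only receives a path and does not need access to the `request` fixture.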

novak-vaclav · 2026-06-09T11:47:45Z

Provided additional information to some of my comments. Otherwise LGTM

robert-kalmar · 2026-06-15T10:41:12Z

-    )
-    def test__basic_nsys_inference_qat(self, x_input_shape, mocker):
-        x_input_spec = ModelInputSpec(x_input_shape)
+    def test__basic_nsys_inference_qat(self, mocker, request):


Restore the variants for the test.

I removed them because they are no longer needed. I had added them and marked them as xfail because I wasn't sure whether there was an error in QAT. It turns out there isn't. The shapes are the same as for the PTQ test. QAT behaves the same as PTQ in terms of conversion and precision. To be in line with other op tests, I intentionally kept only one shape. I discussed it with Martin and he agreed.

robert-kalmar · 2026-06-15T13:36:39Z

-                marks=pytest.mark.xfail(reason="AIR-14602: incorrect results"),
-            ),
-        ],
-    )


Keep these tests

I removed them because they are no longer needed. I had added them and marked them as xfail because I wasn't sure whether there was an error in QAT. It turns out there isn't. The shapes are the same as for the PTQ test. QAT behaves the same as PTQ in terms of conversion and precision. To be in line with other op tests, I intentionally kept only one shape. I discussed it with Martin and he agreed.

robert-kalmar · 2026-06-15T13:50:48Z

        """Generate compile spec for Neutron NPU

        :param config: Neutron accelerator configuration, e.g. "imxrt700"
+        :param test_dir: Test directory to store test related files.


intermediates_dir ==> In the nxp_backend you are not storing test-related data but intermediates from the nxp_backend conversion flow.

The intermediate models are related to the test. They are stored in the test directory, where other test-related data are stored as well (results, datasets, etc.). I don't see a point in different naming.

"The intermediate models are related to the test."
==> Not related only to the test. Consider it from the user's perspective: they use the eIQ Neutron backend and want to keep the intermediate results. By using this item in the conversion config, they can do so. This can even be enabled in the aot_example. They are not testing anything.

"They are stored in the test directory, where other test-related data are stored as well (results, datasets, etc.)."
==> They are stored in the specified destination, which is context-dependent. The test case sets it to the test directory for obvious reasons. The aot_example can set it to some tmp folder, and a user can set it differently in their own conversion pipeline. From the perspective of the nxp_backend, there is no relation to tests.
Tests use this conversion config entry to collect all the artefacts in one test-specific directory.

robert-kalmar · 2026-06-15T13:53:18Z

@@ -230,14 +241,16 @@ def preprocess(  # noqa C901

            # Dump the tflite file if logging level is enabled
            if logging.root.isEnabledFor(logging.DEBUG):


Now that we introduce test_dir (or intermediates_dir), we can replace this logic with a check of whether this *_dir is set or None, instead of limiting the functionality to the DEBUG logging level only.

I don't understand. The functionality is (and was before) limited to the DEBUG logging level only. The check of whether test_dir is set is there because the backend can now be run without lower_run_compare(), e.g. from aot_run_example.py

It was limited to the DEBUG level only because there was no other means to enable or disable it. Now you have introduced a specific entry in the conversion_config, so you can use that and not limit the intermediates dump to the DEBUG log level.
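
The two gating strategies debated here can be contrasted in a short sketch. The function names are hypothetical; only `logging.root.isEnabledFor(logging.DEBUG)` is taken from the diff context above.

```python
import logging
from pathlib import Path
from typing import Optional


def dump_enabled_legacy() -> bool:
    """Behaviour shown in the diff context: dump only at DEBUG log level."""
    return logging.root.isEnabledFor(logging.DEBUG)


def dump_enabled(intermediates_dir: Optional[Path]) -> bool:
    """Suggested behaviour: dump whenever a destination directory is configured."""
    return intermediates_dir is not None


# The suggested check does not depend on the logging configuration at all.
print(dump_enabled(Path("intermediates")), dump_enabled(None))
```

Under the second strategy, a user who sets the directory gets the intermediates without having to raise the verbosity of every logger.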

roman-janik-nxp requested review from MartinPavella, StrycekSimon and jirioc June 4, 2026 13:06

roman-janik-nxp added the module: nxp and release notes: nxp labels Jun 4, 2026

meta-cla Bot added the CLA Signed label Jun 4, 2026

github-actions Bot added the ciflow/trunk and module: arm labels Jun 4, 2026

MartinPavella reviewed Jun 4, 2026

robert-kalmar reviewed Jun 5, 2026

novak-vaclav suggested changes Jun 8, 2026

roman-janik-nxp force-pushed the feature/nxg11066/EIEX-925-Improve-test-result-gathering branch from ceed080 to 9b79072 Compare June 9, 2026 11:30

roman-janik-nxp force-pushed the feature/nxg11066/EIEX-925-Improve-test-result-gathering branch 3 times, most recently from c4223bb to 1f7d775 Compare June 11, 2026 11:17

roman-janik-nxp marked this pull request as ready for review June 11, 2026 14:38

robert-kalmar reviewed Jun 15, 2026

roman-janik-nxp force-pushed the feature/nxg11066/EIEX-925-Improve-test-result-gathering branch from 1f7d775 to c8e76a9 Compare June 15, 2026 15:34

roman-janik-nxp had a problem deploying to cadence June 15, 2026 15:34 — with GitHub Actions Failure

roman-janik-nxp added 2 commits June 15, 2026 18:01

NXP backend: Improve test result gathering

2f9a282

NXP backend: Add pattern for NSYS tool generated file

0b19317

roman-janik-nxp force-pushed the feature/nxg11066/EIEX-925-Improve-test-result-gathering branch from c8e76a9 to 0b19317 Compare June 15, 2026 16:02

roman-janik-nxp had a problem deploying to cadence June 15, 2026 16:02 — with GitHub Actions Failure

4 participants
