Add event tracing and ETDumps to executor_runner#5027
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5027
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 662cb81 with merge base 7bc06d1 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label 'partner: arm' |
|
@pytorchbot label ciflow/trunk |
|
Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help. |
|
Hi @GregoryComer. Would it be possible to run the CI on your side to see if the issue from the previous PR is still occurring? I'm having a hard time understanding where this comes from. |
|
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
2 similar comments
|
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
- Enabled via EXECUTORCH_ENABLE_EVENT_TRACER - Add flag 'etdump_path' to specify the file path for the ETDump file - Add flag 'num_executions' for number of iterations to run - Create and pass event tracer 'ETDumpGen' - Save ETDump to disk - Update docs to reflect the changes Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com> Change-Id: I7e8e8b7f21453bb8d88fa2b9c2ef66c532f3ea46
3288eda to
b09d09e
Compare
|
Hi @dbort . Sorry for dragging you into this, but I saw your comment on |
|
I don't see a CI failure anymore |
@digantdesai To me |
|
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
Yeah I see, from the main CMakeList, and qnn does have |
|
Any update on this? |
|
Hi @digantdesai, I'm still hoping for some pointer from @dbort or you as I'm struggling to reproduce it locally and can't really make sense of the error. |
|
@digantdesai will you have a look at this one since it touches code outside arm delegate. Thx! |
|
The error shows up when running this script If you have a linux machine, can you follow https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html and see if the script fails? |
|
@cccclai I finally managed to reproduce the issue by running script |
- Raise a CMake error if event tracing is enabled without the devtools - Re-factoring of the changes in the portable executor_runner - Minor fix in docs Change-Id: Ia50fef8172f678f9cbe2b33e2178780ff983f335 Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com>
benkli01
left a comment
There was a problem hiding this comment.
Thanks for the review! All issues be fixed now.
Change-Id: I0ebb22636cdd64aea24bcee51cba05496ed78b1f
Change-Id: I7d72e4d8f46ec727a60c9553851d5b71da8e91d4 Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com>
dbort
left a comment
There was a problem hiding this comment.
Thanks for refactoring executor_runner, it looks great. Remaining issue is the flatccrt dep
Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com> Change-Id: I5e7d8ef5d66bc3d5de36ea451b31fb3bdcd42d09
|
@dbort has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
Thanks for updating the dependency, and again I apologize for how long this has taken me to review. I'm running internal tests now and should be able to merge this soon. |
|
Thanks @dbort ! There is a linker error remaining in the qnn tests that I could not reproduce locally and I don't understand where it is coming from. Maybe you have an idea... |
dbort
left a comment
There was a problem hiding this comment.
You're right, I see the failure
https://github.com/pytorch/executorch/actions/runs/12866353984/job/35868756676?pr=5027#step:14:34489
/usr/bin/ld: CMakeFiles/executor_runner.dir/examples/portable/executor_runner/executor_runner.cpp.o: in function `main':
executor_runner.cpp:(.text.main+0x5f9): undefined reference to `executorch::etdump::ETDumpGen::ETDumpGen(executorch::runtime::Span<unsigned char>)'
/usr/bin/ld: executor_runner.cpp:(.text.main+0x95d): undefined reference to `executorch::etdump::ETDumpGen::get_etdump_data()'
The build output says "EXECUTORCH_BUILD_DEVTOOLS : OFF", but the linker error implies that ET_EVENT_TRACER_ENABLED is defined.
It does seem like the block you added to the top-level CMakeLists.txt should have triggered a SEND_ERROR in this case.
For debugging, it would help to print EXECUTORCH_ENABLE_EVENT_TRACER in Utils.cmake.
And to reproduce, you might be able to use the cmake command from the log:
https://github.com/pytorch/executorch/actions/runs/12866353984/job/35868756676?pr=5027#step:14:33817
cmake -DCMAKE_INSTALL_PREFIX=cmake-out -DCMAKE_BUILD_TYPE=Release -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON -DEXECUTORCH_BUILD_KERNELS_CUSTOM=OFF -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON -DEXECUTORCH_BUILD_XNNPACK=OFF -DEXECUTORCH_BUILD_MPS=OFF -DEXECUTORCH_BUILD_COREML=OFF -DEXECUTORCH_BUILD_QNN=ON -DQNN_SDK_ROOT=/tmp/qnn/2.28.0.241029 -DPYTHON_EXECUTABLE=python -Bcmake-out .
cmake --build cmake-out -j9 --target install --config Release
rather than waiting for the full CI to run. Could try removing the -DQNN_SDK_ROOT=/tmp/qnn/2.28.0.241029 part since it doesn't seem like it would affect this failure.
- Remove explicit addition of `-DET_EVENT_TRACER_ENABLED` from backends/qualcomm/CMakeLists.txt as setting the definition without enabling cmake flag `EXECUTORCH_ENABLE_EVENT_TRACER` caused issues when building the executor_runner. - Replace deprecated namespace `torch::executor` with `executorch::etdump` in the executor_runner.cpp. Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com> Change-Id: Iadff38374e661f42e394dc69903548922ca08aea
|
Thanks @dbort ! I managed to find and fix the issue by following your pointers a little further. I think the remaining CI failures are not related to my change. The problem was that the QNN backend always set the definition The fix: I removed the definition in the QNN backend here so that the behavior should now be fully controlled by the CMake flags. |
| # | ||
| # add compile option | ||
| # | ||
| target_compile_options(executorch PUBLIC -DET_EVENT_TRACER_ENABLED) |
There was a problem hiding this comment.
@shewu-quic you added this line in https://github.com/pytorch/executorch/pull/2227/files#diff-0f6d37c62838592cf30db121c46cf34a9b9316df4b5a0d98f0ad7c7a98f7ff7eR261
It's currently causing problems because ET_EVENT_TRACER_ENABLED only works when EXECUTORCH_BUILD_DEVTOOLS is also enabled. Is it ok to remove this, or will other scripts/docs need to change?
There was a problem hiding this comment.
It should be automatically set by the main CMakeLists.txt as long as the CMake flag EXECUTORCH_ENABLE_EVENT_TRACER is enabled.
|
@benkli01 Thank you for tracking down the problem with the Qualcomm jobs! @shewu-quic @cccclai please let us know if the change to qualcomm/CMakeLists.txt is ok. |
|
@dbort has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
dbort
left a comment
There was a problem hiding this comment.
Since the CI jobs look good, I'm fine with merging this. Thanks so much for taking the time to figure all of this out!
Re-upload of #4502 to discuss with @GregoryComer.