[WIP] Fix: clean up "version" fields in L2 swimlane / dep_gen JSON#862
[WIP] Fix: clean up "version" fields in L2 swimlane / dep_gen JSON#862indigo1973 wants to merge 1 commit into
Conversation
deps.json and l2_perf_records.json both carried a "version" field that consumers were getting wrong: - deps.json bumped v2 → v3 in hw-native-sys#808 but swimlane_converter still guarded on `version != 2`, silently rejected every fresh capture, and fell back to L2PerfRecord::fanout[] — losing the race-window edges dep_gen replay exists to recover. - l2_perf_records.json's "version" was never a schema version — the producer writes L2PerfLevel (1..4) there. Misreading it caused swimlane_converter._print_verbose_data_info and sched_overhead_analysis to short-circuit on `version != 2` / `< 2`, while phase blocks only exist at level >= 3. deps.json — drop the "version" field Producer no longer emits it; deps_to_graph drops its version guard (same release as the producer; KeyError is a clearer failure than a synthetic guard); test_dep_gen + test_dep_gen_chain drop the version assertion; dep_gen_replay.{cpp,h}, docs/dfx/dep_gen.md, and tools/README.md drop the v2/v3 schema labels. dep_gen.md §4 example JSON, fields table, and §5 arg-row description are rewritten against the current strided-Tensor producer (buffer_numel replaces raw_shapes; start_offset + strides[] replace multi-dim offset[]). l2_perf_records.json — rename "version" → "l2_perf_level" Producer (a2a3 + a5) writes the new name; swimlane_converter. read_perf_data + verbose print + _swimlane_validate.py follow. Both misaligned short-circuits removed: load_deps_json's guard outright (only reads edges[].pred / .succ, stable across every schema), _print_verbose_data_info's `version != 2`, and parse_scheduler_from_json_phases's `version < 2` (the `if not phases_by_thread` check below was already correct). Doc / comment fallout — keep code, comments, and docs in sync per .claude/rules/doc-consistency.md: - swimlane_converter / sched_overhead_analysis / profiling_levels.md (a2a3 + a5): "(version 2)" / "v2 JSON" wording → explicit l2_perf_level >= N. - 6 scheduler comments (a2a3 + a5: scheduler_dispatch.cpp, scheduler_cold_path.cpp, scheduler_types.h) describing "v2 JSON dispatch records" now point at dispatch-phase records in aicpu_scheduler_phases[]. - tools/README.md Input File Format example is now a real two-task slice; the stale stub had wrong field names and missed ring_id / dispatch_time_us / finish_time_us. Troubleshooting "Unsupported version" entry renamed to "Unsupported l2_perf_level". - docs/dfx/l2-swimlane-profiling.md: level table, per-task table, phase-record table, and prose all corrected — start_time → _us, core_type 0/1 → "aic"/"aiv", phase_id SCHED_* → lowercase phase strings, plus ring_id / pop_hit / loop_iter vs submit_idx / core_to_thread[] additions.
There was a problem hiding this comment.
Code Review
This pull request updates the profiling and dependency graph schemas, replacing the legacy "version" fields in deps.json and l2_perf_records.json with a descriptive l2_perf_level and a strided-tensor representation. The changes update the documentation, downstream analysis tools, tests, and C++ collectors/replay logic to use buffer_numel, start_offset, and strides instead of raw shapes and offsets. One issue was identified in dep_gen_replay.cpp where strides are implemented and serialized as unsigned integers (uint32_t) despite being documented as signed int32 arrays, which could cause serialization issues for negative strides.
| // Strided tensor representation. tensors[].buffer_numel is the underlying | ||
| // storage element count; tasks[].args[] and edges[] carry per-slice | ||
| // geometry as (start_offset uint64, strides[] int32). |
There was a problem hiding this comment.
The comment states that strides[] is an int32 array, and the documentation in dep_gen.md also specifies int32 array for strides. However, the implementation in TaskArgEntry and EdgeAnnot defines strides as uint32_t arrays, and they are serialized using write_uint_array. If any negative strides are present (e.g., for reversed dimensions), they will be serialized as large unsigned integers (e.g., 4294967295), causing incorrect behavior in downstream consumers.
Consider changing the strides fields in TaskArgEntry and EdgeAnnot to int32_t and implementing/using a signed write_int_array helper to serialize them correctly.
deps.json and l2_perf_records.json both carried a "version" field that consumers were getting wrong:
deps.json bumped v2 → v3 in Refactor: a2a3 + a5 Tensor to strided (stride + start_offset) model #808 but swimlane_converter still guarded on
version != 2, silently rejected every fresh capture, and fell back to L2PerfRecord::fanout[] — losing the race-window edges dep_gen replay exists to recover.l2_perf_records.json's "version" was never a schema version — the producer writes L2PerfLevel (1..4) there. Misreading it caused swimlane_converter._print_verbose_data_info and sched_overhead_analysis to short-circuit on
version != 2/< 2, while phase blocks only exist at level >= 3.deps.json — drop the "version" field
Producer no longer emits it; deps_to_graph drops its version guard (same release as the producer; KeyError is a clearer failure than a synthetic guard); test_dep_gen + test_dep_gen_chain drop the version assertion; dep_gen_replay.{cpp,h}, docs/dfx/dep_gen.md, and tools/README.md drop the v2/v3 schema labels. dep_gen.md §4 example JSON, fields table, and §5 arg-row description are rewritten against the current strided-Tensor producer (buffer_numel replaces raw_shapes; start_offset + strides[] replace multi-dim offset[]).
l2_perf_records.json — rename "version" → "l2_perf_level"
Producer (a2a3 + a5) writes the new name; swimlane_converter.
read_perf_data + verbose print + _swimlane_validate.py follow.
Both misaligned short-circuits removed: load_deps_json's guard
outright (only reads edges[].pred / .succ, stable across every
schema), _print_verbose_data_info's
version != 2, andparse_scheduler_from_json_phases's
version < 2(theif not phases_by_threadcheck below was already correct).Doc / comment fallout — keep code, comments, and docs in sync per .claude/rules/doc-consistency.md: