Skip to content

enhance hstu export#443

Merged
tiankongdeguiji merged 16 commits intoalibaba:masterfrom
eric-gecheng:features/hstu_export_enhance
Apr 9, 2026
Merged

enhance hstu export#443
tiankongdeguiji merged 16 commits intoalibaba:masterfrom
eric-gecheng:features/hstu_export_enhance

Conversation

@eric-gecheng
Copy link
Copy Markdown
Collaborator

  1. When exporting HSTU model, user needs to add a command line argument to specify the target item_id column name, tzrec will save the column name into model_acc.json. Online processor will read it.
  2. Tzrec will detect the HSTU model kernel, pytorch or triton, and write a field into model_acc.json.
  3. Since PAI-EAS container will detect and extract any zipped file in the model directory, and .pt2 file is actually a zipped format, we have to put it into a sub-folder "aoti" so as to avoid being unzipped.
  4. For triton kernel, on-line inference processor needs to know what are the output key names, so we need to save them during export. A file called output_field_names.json is created under "aoti" folder.

@tiankongdeguiji tiankongdeguiji added the claude-review Let Claude Review label Mar 27, 2026
@github-actions github-actions bot removed the claude-review Let Claude Review label Mar 27, 2026
@eric-gecheng eric-gecheng added the claude-review Let Claude Review label Mar 27, 2026
@github-actions github-actions bot removed the claude-review Let Claude Review label Mar 27, 2026
@github-actions
Copy link
Copy Markdown

Code Review Summary

The changes are well-structured and the intent is clear — adding HSTU-specific export config, reorganizing AOTI output into a subfolder, and persisting output field names. A few items to address:

Issues

  1. Bug: typo in docsexperiments/dltm_hstu/export should be experiments/dlrm_hstu/export in the export example (dlrm_hstu.md:322).

  2. Missing torch.no_grad() on extra forward passaot_utils.py:90 runs dense_model(sparse_output) just to capture output keys, building autograd graphs needlessly. Consider wrapping in torch.no_grad() or extracting keys from the ExportedProgram to avoid the redundant pass entirely.

  3. hstu_kernel silently swallowed in RTP pathexport_model passes both hstu_item_id and hstu_kernel to impl(), but export_rtp_model absorbs hstu_kernel via **kwargs without using it. If the RTP path doesn't need it, add a comment; if it does, this is a bug.

Test Coverage Gaps

  1. The core new feature has no test coverage — no test passes --hstu_item_id, so the code path writing hstu_item_id/hstu_kernel into model_acc.json is never exercised. Similarly, output_field_names.json creation is never verified. Consider:
    • Adding hstu_item_id to the existing HSTU export test
    • Adding a unit test for export_acc_config() with the new parameters
    • Asserting output_field_names.json exists in AOTI export tests

Minor

  1. Kernel.Name() called for all model types — returns misleading default (TRITON) for non-HSTU models. Harmless but confusing; guard with if hstu_item_id.

  2. Docstring/implementation casing mismatch — docstring says default is "PYTORCH" but the code lowercases to "pytorch". Clarify the expected contract for downstream consumers.


🤖 Generated with Claude Code

@tiankongdeguiji tiankongdeguiji added the claude-review Let Claude Review label Mar 31, 2026
@github-actions github-actions bot removed the claude-review Let Claude Review label Mar 31, 2026
@github-actions
Copy link
Copy Markdown

Code Review Summary

Overall this PR is well-structured — the parameter threading through the normal export path is clean, the validation guard for hstu_item_id is a good UX improvement, and the tests verify the new model_acc.json fields properly. A few items worth addressing:

Bug

  • export_rtp_model silently discards hstu_item_id and hstu_kernel: The function accepts these parameters but never uses them — no model_acc.json is written in the RTP path. test_dlrm_hstu_rtp_train_export asserts these fields exist in model_acc.json, which will likely fail. See inline comment for details.

Test Coverage Gaps

  • ValueError for missing hstu_item_id is untested: The validation guard at export_util.py:93-101 is a user-facing error that deserves a unit test.
  • test_hstu_with_fg_train_eval_export is unconditionally skipped (@unittest.skip): The new model_acc.json assertions for match model towers (lines 461-474) are dead code. Consider @unittest.skipIf with a condition (e.g., GPU availability) instead.
  • output_field_names.json only checked for existence, not content. See inline comment.

Minor

  • hstu_kernel is computed for all model types, not just HSTU. See inline comment with suggestion.
  • The extra forward pass in aot_utils.py to capture output keys could be avoided by passing keys from the caller which already has them. See inline comment.

🤖 Generated with Claude Code

@eric-gecheng eric-gecheng added the claude-review Let Claude Review label Apr 7, 2026
@github-actions github-actions bot removed the claude-review Let Claude Review label Apr 7, 2026
cmd_str += f"--asset_files {asset_files}"
cmd_str += f"--asset_files {asset_files} "
if hstu_item_id:
cmd_str += f"--hstu_item_id {hstu_item_id}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shell injection surface: hstu_item_id is interpolated into a command string that is later executed with shell=True (via misc_util.run_cmd). While current callers use hardcoded literals, this extends a pre-existing injection-prone pattern. Consider using shlex.quote():

Suggested change
cmd_str += f"--hstu_item_id {hstu_item_id}"
cmd_str += f"--hstu_item_id {shlex.quote(hstu_item_id)}"

(Same concern applies to asset_files and other interpolated values in this function.)

Comment on lines +460 to +473
# Verify model_acc.json contains HSTU kernel for user tower
user_acc_path = os.path.join(self.test_dir, "export/user/model_acc.json")
self.assertTrue(os.path.exists(user_acc_path))
with open(user_acc_path) as f:
acc_cfg = json.load(f)
self.assertNotIn("hstu_item_id", acc_cfg)
self.assertEqual(acc_cfg["hstu_kernel"], "pytorch")
# Verify model_acc.json contains HSTU kernel for item tower
item_acc_path = os.path.join(self.test_dir, "export/item/model_acc.json")
self.assertTrue(os.path.exists(item_acc_path))
with open(item_acc_path) as f:
acc_cfg = json.load(f)
self.assertNotIn("hstu_item_id", acc_cfg)
self.assertEqual(acc_cfg["hstu_kernel"], "pytorch")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead test code: The parent test test_hstu_with_fg_train_eval_export is decorated with @unittest.skip("skip hstu match test"), so these new assertions verifying hstu_kernel and hstu_item_id absence in model_acc.json will never execute. The PR appears to add match-side validation but it's illusory coverage.

Is there a plan to re-enable this test? If so, a tracking comment or issue would help.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Code Review Summary

PR looks good overall — the HSTU export enhancements are well-structured and the rank integration tests provide solid verification of the new model_acc.json fields and output_field_names.json.

Key Findings

Performance

  • Redundant forward pass in aot_utils.py:90-91: An extra dense_model(sparse_output) call runs the full dense model on GPU just to capture output key names. The caller already has these keys from its own forward pass — pass them in as a parameter instead.

Code Quality

  • Unreachable fallback in utils.py:226: The else "pytorch" branch in the hstu_kernel ternary can never trigger given the caller's usage of Kernel.Name(...). Simplify to hstu_kernel.lower().

Test Coverage

  • Dead test assertions in match_integration_test.py:460-473: The parent test is @unittest.skip'd, so the new model_acc.json assertions for the match model path never run.
  • Missing negative test: No test verifies the ValueError raised when hstu_item_id is omitted for dlrm_hstu models. This is a simple unit test worth adding.

Security

  • Shell injection surface in tests/utils.py:1109: hstu_item_id is f-string interpolated into a shell=True command. Current callers use hardcoded strings, but shlex.quote() would be a low-effort hardening.

See inline comments for details and suggestions.


🤖 Generated with Claude Code

tiankongdeguiji
tiankongdeguiji previously approved these changes Apr 8, 2026
Two bugs caused test_rank_dlrm_hstu_cutlass_train_eval_export to fail:

1. tzrec/tests/rank_integration_test.py: the test called utils.test_export
   without hstu_item_id, but the DlrmHSTU export path now requires it
   (ValueError raised in export_util.py). Pass hstu_item_id
   ="cand_seq__video_id" mirroring the sibling triton test.

2. tzrec/acc/aot_utils.py: the new dry-run that captures output field
   names called dense_model(sparse_output) in raw fp32, before the
   DenseAutocastWrapper was applied. The CUTLASS HSTU attention kernel
   rejects fp32 inputs, which made the dry-run crash for any
   mixed_precision=BF16/FP16 + CUTLASS configuration. Reorder so
   dense_to_export (autocast-wrapped when mixed_precision is set) is
   constructed first, then dry-run through it to collect output keys.

Verified locally: both test_rank_dlrm_hstu_cutlass_train_eval_export
and test_rank_dlrm_hstu_train_eval_export pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tiankongdeguiji tiankongdeguiji merged commit 9d4a142 into alibaba:master Apr 9, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants