Conversation
eric-gecheng
commented
Mar 26, 2026
- When exporting HSTU model, user needs to add a command line argument to specify the target item_id column name, tzrec will save the column name into model_acc.json. Online processor will read it.
- Tzrec will detect the HSTU model kernel, pytorch or triton, and write a field into model_acc.json.
- Since PAI-EAS container will detect and extract any zipped file in the model directory, and .pt2 file is actually a zipped format, we have to put it into a sub-folder "aoti" so as to avoid being unzipped.
- For triton kernel, on-line inference processor needs to know what are the output key names, so we need to save them during export. A file called output_field_names.json is created under "aoti" folder.
Code Review SummaryThe changes are well-structured and the intent is clear — adding HSTU-specific export config, reorganizing AOTI output into a subfolder, and persisting output field names. A few items to address: Issues
Test Coverage Gaps
Minor
🤖 Generated with Claude Code |
Code Review SummaryOverall this PR is well-structured — the parameter threading through the normal export path is clean, the validation guard for Bug
Test Coverage Gaps
Minor
🤖 Generated with Claude Code |
| cmd_str += f"--asset_files {asset_files}" | ||
| cmd_str += f"--asset_files {asset_files} " | ||
| if hstu_item_id: | ||
| cmd_str += f"--hstu_item_id {hstu_item_id}" |
There was a problem hiding this comment.
Shell injection surface: hstu_item_id is interpolated into a command string that is later executed with shell=True (via misc_util.run_cmd). While current callers use hardcoded literals, this extends a pre-existing injection-prone pattern. Consider using shlex.quote():
| cmd_str += f"--hstu_item_id {hstu_item_id}" | |
| cmd_str += f"--hstu_item_id {shlex.quote(hstu_item_id)}" |
(Same concern applies to asset_files and other interpolated values in this function.)
| # Verify model_acc.json contains HSTU kernel for user tower | ||
| user_acc_path = os.path.join(self.test_dir, "export/user/model_acc.json") | ||
| self.assertTrue(os.path.exists(user_acc_path)) | ||
| with open(user_acc_path) as f: | ||
| acc_cfg = json.load(f) | ||
| self.assertNotIn("hstu_item_id", acc_cfg) | ||
| self.assertEqual(acc_cfg["hstu_kernel"], "pytorch") | ||
| # Verify model_acc.json contains HSTU kernel for item tower | ||
| item_acc_path = os.path.join(self.test_dir, "export/item/model_acc.json") | ||
| self.assertTrue(os.path.exists(item_acc_path)) | ||
| with open(item_acc_path) as f: | ||
| acc_cfg = json.load(f) | ||
| self.assertNotIn("hstu_item_id", acc_cfg) | ||
| self.assertEqual(acc_cfg["hstu_kernel"], "pytorch") |
There was a problem hiding this comment.
Dead test code: The parent test test_hstu_with_fg_train_eval_export is decorated with @unittest.skip("skip hstu match test"), so these new assertions verifying hstu_kernel and hstu_item_id absence in model_acc.json will never execute. The PR appears to add match-side validation but it's illusory coverage.
Is there a plan to re-enable this test? If so, a tracking comment or issue would help.
Code Review SummaryPR looks good overall — the HSTU export enhancements are well-structured and the rank integration tests provide solid verification of the new Key FindingsPerformance
Code Quality
Test Coverage
Security
See inline comments for details and suggestions. 🤖 Generated with Claude Code |
Two bugs caused test_rank_dlrm_hstu_cutlass_train_eval_export to fail: 1. tzrec/tests/rank_integration_test.py: the test called utils.test_export without hstu_item_id, but the DlrmHSTU export path now requires it (ValueError raised in export_util.py). Pass hstu_item_id ="cand_seq__video_id" mirroring the sibling triton test. 2. tzrec/acc/aot_utils.py: the new dry-run that captures output field names called dense_model(sparse_output) in raw fp32, before the DenseAutocastWrapper was applied. The CUTLASS HSTU attention kernel rejects fp32 inputs, which made the dry-run crash for any mixed_precision=BF16/FP16 + CUTLASS configuration. Reorder so dense_to_export (autocast-wrapped when mixed_precision is set) is constructed first, then dry-run through it to collect output keys. Verified locally: both test_rank_dlrm_hstu_cutlass_train_eval_export and test_rank_dlrm_hstu_train_eval_export pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>