diff --git a/PI05_EVO_RL_ACP_EXECUTION_PLAN.md b/PI05_EVO_RL_ACP_EXECUTION_PLAN.md
new file mode 100644
index 0000000..afb500d
--- /dev/null
+++ b/PI05_EVO_RL_ACP_EXECUTION_PLAN.md
@@ -0,0 +1,275 @@
+# PI05 Evo-RL ACP Execution Plan
+
+## Goal
+
+Use the ROCm setup guidance in `AMD_Hackathon` to:
+
+1. validate a ROCm-backed LeRobot runtime,
+2. attempt `pi05` inference on this machine,
+3. run the `pi05` Evo-RL ACP workflow end-to-end.
+
+## Current Machine Status
+
+- ROCm runtime detected via `rocminfo`
+- GPU detected: `AMD Instinct MI300X VF`
+- Python: `3.12.3`
+- Torch: `2.6.0+rocm7.0.2`
+- LeRobot: `0.4.4`
+- `lerobot-info`, `lerobot-train --help`, `lerobot-eval --help` all work
+
+Validation commands already executed:
+
+```bash
+python3 - <<'PY'
+import torch
+print(torch.__version__)
+print(torch.cuda.is_available())
+print(torch.cuda.get_device_name(0))
+PY
+
+cd /root/phi-media-lab/Evo-RL-Phi
+python3 -m lerobot.scripts.lerobot_info
+python3 -m lerobot.scripts.lerobot_train --help
+python3 -m lerobot.scripts.lerobot_eval --help
+```
+
+## Code Changes Applied
+
+These local changes were made in `Evo-RL-Phi` to unblock validation:
+
+1. `src/lerobot/policies/pi05/modeling_pi05.py`
+   Relaxed the brittle `transformers.models.siglip.check` import requirement. The check now runs only when that helper exists.
+2. `src/lerobot/policies/pi05/configuration_pi05.py`
+   Added `tokenizer_name` as a config field.
+3. `src/lerobot/policies/pi05/processor_pi05.py`
+   Switched tokenizer loading from a hard-coded value to `config.tokenizer_name`.
+4. `src/lerobot/policies/pi05/modeling_pi05.py`
+   Added a tied-weight fallback so `lerobot/pi05_base` can load when the checkpoint only exposes `lm_head.weight`.
+5. `src/lerobot/configs/value.py`
+   Added `dataset.video_backend` so `lerobot-value-infer` can avoid broken `torchcodec` default selection on this ROCm environment.
+6. `src/lerobot/scripts/lerobot_value_infer.py`
+   Passed `dataset.video_backend` through to `LeRobotDataset`.
+
+Editable install refreshed:
+
+```bash
+cd /root/phi-media-lab/Evo-RL-Phi
+python3 -m pip install -e .
+```
+
+## Validation Results
+
+### Passed
+
+- ROCm tensor allocation and compute
+- LeRobot CLI runtime
+- ACP prompt injection path into `pi05`
+- OpenPI-patched `transformers` environment for `pi05`
+- `pi05` GPU smoke test on ROCm with mocked tokenizer
+- real `pi05` pretrained inference with `lerobot/pi05_base`
+- `pistar06` value training smoke run on `maxbeau/XLeRobot`
+- `lerobot-value-infer` writing ACP labels back to the local dataset cache
+- `pi05` ACP policy-training smoke run on the labeled dataset
+
+Command:
+
+```bash
+cd /root/phi-media-lab/Evo-RL-Phi
+pytest -q tests/training/test_acp_pi05_prompt_pipeline.py -q
+```
+
+Result: passed on `DEVICE='cuda'`
+
+### OpenPI-Patched Environment Created
+
+Isolated environment:
+
+```text
+/root/phi-media-lab/.venvs/pi05-openpi-ssp
+```
+
+OpenPI source:
+
+```text
+/root/phi-media-lab/openpi
+```
+
+Environment preparation that was executed:
+
+```bash
+python3 -m venv --system-site-packages /root/phi-media-lab/.venvs/pi05-openpi-ssp
+git clone --depth=1 --recurse-submodules https://github.com/Physical-Intelligence/openpi.git /root/phi-media-lab/openpi
+/root/phi-media-lab/.venvs/pi05-openpi-ssp/bin/python -m pip install 'transformers==4.53.2'
+/root/phi-media-lab/.venvs/pi05-openpi-ssp/bin/python -m pip install -e /root/phi-media-lab/Evo-RL-Phi --no-deps
+cp -r /root/phi-media-lab/openpi/src/openpi/models_pytorch/transformers_replace/* \
+  /root/phi-media-lab/.venvs/pi05-openpi-ssp/lib/python3.12/site-packages/transformers/
+echo '/opt/venv/lib/python3.12/site-packages' > \
+  /root/phi-media-lab/.venvs/pi05-openpi-ssp/lib/python3.12/site-packages/_opt_venv.pth
+```
+
+Patch verification:
+
+```text
+transformers 4.53.2
+GemmaRMSNorm.forward(self, x, cond=None)
+siglip_check_result True
+```
+
+### `pi05` ROCm Smoke Test
+
+In the patched venv, a minimal GPU forward + `select_action` test passed when tokenizer access was mocked:
+
+```text
+smoke_ok
+loss 2.579069137573242
+loss_keys ['loss', 'loss_per_dim']
+action_shape (1, 7)
+action_device cpu
+```
+
+This confirms:
+
+- ROCm backend is usable for `pi05`
+- OpenPI transformer patches are sufficient for model execution
+- the remaining blocker is tokenizer access, not the model itself
+
+### Real `pi05` Inference Result
+
+After refreshing Hugging Face auth and using the OpenPI-patched venv, real tokenizer/model access worked.
+
+Successful path:
+
+```text
+google/paligemma-3b-pt-224
+lerobot/pi05_base
+```
+
+Observed result:
+
+```text
+real_infer_ok
+action_shape (1, 32)
+action_mean 0.02885555289685726
+action_std 0.09869649261236191
+```
+
+Important runtime note:
+
+- the saved `policy_preprocessor.json` in `lerobot/pi05_base` targets `cpu`, so the processed batch had to be moved to `cuda` before `select_action`
+
+### ACP Workflow Smoke Result
+
+Dataset used:
+
+```text
+maxbeau/XLeRobot
+```
+
+Dataset compatibility findings:
+
+- `episode_success` is absent, so value training/inference used `--dataset.default_success=failure`
+- default `torchcodec` decoding is not usable in this ROCm environment for DataLoader CPU-side video decode
+- `pyav` works and must be forced for both training and value inference
+- the dataset lacks state quantiles, so `STATE` normalization had to be switched from `QUANTILES` to `MEAN_STD`
+
+Successful smoke sequence:
+
+```bash
+lerobot-value-train \
+  --dataset.repo_id=maxbeau/XLeRobot \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=true \
+  --save_freq=1 \
+  --wandb.enable=false \
+  --output_dir=/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_ckpt_meanstd
+
+lerobot-value-infer \
+  --dataset.repo_id=maxbeau/XLeRobot \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path=/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_ckpt_meanstd \
+  --runtime.device=cuda \
+  --runtime.batch_size=1 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=5 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field=complementary_info.value_smoke \
+  --acp.advantage_field=complementary_info.advantage_smoke \
+  --acp.indicator_field=complementary_info.acp_indicator_smoke
+
+lerobot-train \
+  --dataset.repo_id=maxbeau/XLeRobot \
+  --dataset.root=/root/.cache/huggingface/lerobot/maxbeau/XLeRobot \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=false \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field=complementary_info.acp_indicator_smoke \
+  --acp.indicator_dropout_prob=0.3
+```
+
+Key observed outputs:
+
+```text
+ACP stats | n_step=5 positive_ratio_target=0.3000 positive_ratio_observed=0.3001
+Wrote value annotations to dataset root: /root/.cache/huggingface/lerobot/maxbeau/XLeRobot
+ACP indicator stats (hf_dataset_scan): field='complementary_info.acp_indicator_smoke' ratio=0.300135 positive=446 total=1486
+First policy prompt (task):
+Task: pick up the block Advantage: negative, State: ...
+End of training
+```
+
+## Smoke-Test Summary
+
+What has been proven:
+
+- ROCm backend is healthy
+- `pi05` processor + ACP prompt path is healthy
+- local `pi05` code instantiates further than before now that the brittle SigLIP helper check is optional
+- OpenPI-patched `transformers` makes real `pi05` model execution possible on ROCm
+- real `pi05` forward + `select_action` succeed with `lerobot/pi05_base`
+- `pistar06 -> value_infer -> ACP label writeback -> pi05 ACP train` has been smoke-tested end to end
+
+What is not yet proven:
+
+- stable multi-step or full-length `pi05` training on this dataset
+- production-quality ACP behavior with meaningful value targets, because the smoke run treated all episodes as `failure`
+- whether `maxbeau/XLeRobot` should be augmented with quantile stats instead of using `MEAN_STD`
+
+## Required Next Actions
+
+### Next Practical Steps
+
+1. Decide whether to keep using local cache path `/root/.cache/huggingface/lerobot/maxbeau/XLeRobot` for ACP-labeled experiments or push a cleaned dataset variant to a new HF repo.
+2. Either add quantile stats to the dataset or keep the explicit `MEAN_STD` overrides for `STATE`/`ACTION`.
+3. Increase `steps`, `batch_size`, and checkpointing from the smoke setup once you want a real training run.
+4. If value targets should reflect true success, add `episode_success` metadata instead of relying on `default_success=failure`.
+
+## Large Download Note
+
+The public `lerobot/pi05_base` checkpoint exists; its `model.safetensors` is about 14.5 GB (13.5 GiB):
+
+```text
+14,467,165,872 bytes
+```
+
+Download it only into an environment where the tokenizer access and patched-`transformers` prerequisites above are in place; without them the model will not run.
diff --git a/PI05_EVO_RL_ACP_EXPERIMENT_REPORT.md b/PI05_EVO_RL_ACP_EXPERIMENT_REPORT.md
new file mode 100644
index 0000000..4186aa7
--- /dev/null
+++ b/PI05_EVO_RL_ACP_EXPERIMENT_REPORT.md
@@ -0,0 +1,430 @@
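+# PI05 Evo-RL ACP Experiment Report
+
+## 1. Experiment Goals
+
+Following the AMD ROCm setup approach in `~/phi-media-lab/AMD_Hackathon`, this experiment set out to validate the following on the current machine:
+
+1. Configure and validate a ROCm-backed LeRobot / Evo-RL runtime
+2. Run real `pi05` inference
+3. Run the `pistar06 -> value-infer -> ACP indicator -> pi05 policy train` Evo-RL ACP workflow
+4. Scale the experiment step by step from a smoke test to a sustainable training size and verify checkpoint stability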
+
+## 2. Environment
+
+### 2.1 Hardware and Base Environment
+
+- GPU: `AMD Instinct MI300X VF`
+- ROCm: available
+- Python: `3.12.3`
+- Torch: `2.6.0+rocm7.0.2`
+- LeRobot: `0.4.4`
+
+### 2.2 Repositories
+
+- Documentation repo: `/root/phi-media-lab/AMD_Hackathon`
+- Main code repo: `/root/phi-media-lab/Evo-RL-Phi`
+- OpenPI source: `/root/phi-media-lab/openpi`
+
+### 2.3 Python Environment
+
+Isolated environment used for the actual experiments:
+
+```text
+/root/phi-media-lab/.venvs/pi05-openpi-ssp
+```
+
+Additional setup applied in this environment:
+
+- `transformers==4.53.2` installed
+- OpenPI `transformers_replace` patches copied over the installed package
+- `Evo-RL-Phi` installed in editable mode
+
+## 3. Key Compatibility Issues and Fixes
+
+### 3.1 `pi05` Depends on OpenPI-Patched transformers
+
+Problem:
+
+- `transformers 4.57.6` in the original environment does not satisfy the OpenPI variant of `pi05`
+- `GemmaRMSNorm.forward(cond=...)` does not exist there
+- the `transformers.models.siglip.check` dependency is brittle
+
+Fix:
+
+- Build a separate patched venv
+- Install `transformers==4.53.2`
+- Overlay OpenPI's `transformers_replace` files
+
+Result:
+
+- `pi05` forward and `select_action` run on ROCm
+
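+### 3.2 Gated Tokenizer Access
+
+Problem:
+
+- The account initially had no access to `google/paligemma-3b-pt-224`
+
+Fix:
+
+- Update the Hugging Face account permissions
+- Re-run `hf auth login` locally
+
+Result:
+
+- The real tokenizer loads
+- Real inference with `lerobot/pi05_base` succeeds
+
+### 3.3 `torchcodec` Is Incompatible with Data Decoding on This ROCm Setup
+
+Problem:
+
+- DataLoader-side video decoding defaults to `torchcodec`
+- The CPU worker path is unusable in this environment, so video frame decoding fails
+
+Fix:
+
+- Pass this flag explicitly on every dataset-related command: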
+
+```bash
+--dataset.video_backend=pyav
+```
+
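+Result:
+
+- `LeRobotDataset` reads image frames reliably
+- `value-train`, `value-infer`, and policy training all run
+
+### 3.4 Dataset Lacks Quantile Stats
+
+Problem:
+
+- `observation.state` in `maxbeau/XLeRobot` only has `min/max/mean/std/count`
+- `q01/q99` are missing
+- The default `QUANTILES` normalization raises an error
+
+Fix:
+
+- Override the normalization mapping in both value and policy training (policy-side mapping shown; the value scripts keep `ACTION` at `IDENTITY`):
+
+```bash
+--policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}'
+```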
+
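+Result:
+
+- The quantile dependency is bypassed
+- Every experiment stage so far uses this override
+
+### 3.5 Dataset Lacks `episode_success`
+
+Problem:
+
+- The episode metadata of `maxbeau/XLeRobot` has no `episode_success` field
+
+Fix:
+
+- Use the following in the value-related steps: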
+
+```bash
+--dataset.default_success=failure
+```
+
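+Result:
+
+- Value targets can be constructed
+- The value supervision is still semantically weak, however: this makes the pipeline run but does not give good label quality
+
+## 4. Local Code Changes
+
+To get the whole pipeline running, minimal compatibility changes were made to `Evo-RL-Phi`.
+
+### 4.1 `pi05` Changes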
+
+[configuration_pi05.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/policies/pi05/configuration_pi05.py)
+
+- Added `tokenizer_name`
+
+[processor_pi05.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/policies/pi05/processor_pi05.py)
+
+- Switched from a hard-coded tokenizer to `config.tokenizer_name`
+
+[modeling_pi05.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/policies/pi05/modeling_pi05.py)
+
+- Relaxed the dependency on `siglip.check`
+- Added a tied-weight fallback for the weight layout of `lerobot/pi05_base`
+
+### 4.2 Value Inference Changes
+
+[value.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/configs/value.py)
+
+- Added `video_backend` to `ValueInferenceDatasetConfig`
+
+[lerobot_value_infer.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/scripts/lerobot_value_infer.py)
+
+- Passed `dataset.video_backend` through to `LeRobotDataset`
+
+## 5. Dataset and Models
+
+### 5.1 Dataset
+
+Dataset used:
+
+```text
+maxbeau/XLeRobot
+```
+
+Local cache root:
+
+```text
+/root/.cache/huggingface/lerobot/maxbeau/XLeRobot
+```
+
+Confirmed properties:
+
+- Total frames: `1486`
+- Episodes: `5`
+- Task: `pick up the block`
+- Video keys present:
+  - `observation.images.front_cam`
+  - `observation.images.hand_cam`
+- Not present:
+  - `episode_success`
+  - state quantile stats
+
+### 5.2 Models
+
+value model:
+
+```text
+pistar06
+```
+
+policy model:
+
+```text
+lerobot/pi05_base
+```
+
+## 6. Experiment Stages and Results
+
+### 6.1 Smoke Stage
+
+Goals:
+
+- Validate real `pi05` inference
+- Validate 1-step value training, value inference, and policy training
+
+Results:
+
+- Real `pi05` inference succeeded
+- 1-step `pistar06` training succeeded
+- `value-infer` wrote back:
+  - `complementary_info.value_smoke`
+  - `complementary_info.advantage_smoke`
+  - `complementary_info.acp_indicator_smoke`
+- 1-step `pi05` ACP training succeeded
+
+### 6.2 Pilot Stage
+
+Script:
+
+[run_pi05_acp_pilot.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_pilot.sh)
+
+Configuration:
+
+- value train: `5` steps
+- value infer: full dataset
+- policy train: `5` steps
+
+Results:
+
+- The full pipeline succeeded
+- Value checkpoint saved
+- Policy checkpoint saved
+
+Output directories:
+
+- [value pilot](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_pilot)
+- [policy pilot](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_pilot)
+
+### 6.3 Stage1
+
+Script:
+
+[run_pi05_acp_stage1.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage1.sh)
+
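+Configuration:
+
+- value train: `20` steps
+- value infer: full dataset
+- policy train: `20` steps
+
+Results:
+
+- Value training succeeded, with checkpoints at steps `10` and `20`
+- Value inference wrote back the `*_stage1` fields
+- Policy training succeeded, with checkpoints at steps `10` and `20`
+
+Output directories: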
+
+- [value stage1](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_stage1)
+- [policy stage1](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_stage1)
+
+### 6.4 Stage2
+
+Script:
+
+[run_pi05_acp_stage2.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage2.sh)
+
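+Configuration:
+
+- value train: `100` steps
+- value infer: full dataset
+- policy train: `100` steps
+
+Results:
+
+- Value training succeeded, with checkpoints at steps `50` and `100`
+- Value inference wrote back the `*_stage2` fields
+- Policy training succeeded, with checkpoints at steps `50` and `100`
+
+Output directories: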
+
+- [value stage2](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_stage2)
+- [policy stage2](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_stage2)
+
+### 6.5 Stage3
+
+Script:
+
+[run_pi05_acp_stage3.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage3.sh)
+
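+Configuration:
+
+- value train: `500` steps
+- value infer: full dataset
+- policy train: `500` steps
+
+Results:
+
+- Value training succeeded, with checkpoints at steps `250` and `500`
+- Value inference wrote back the `*_stage3` fields
+- Policy training succeeded, with checkpoints at steps `250` and `500`
+- Loss trended downward during policy training
+
+Key log lines: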
+
+```text
+step:200 ... loss:0.350 ...
+step:400 ... loss:0.245 ...
+```
+
+Output directories:
+
+- [value stage3](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_stage3)
+- [policy stage3](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_stage3)
+
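+## 7. Key Observations
+
+### 7.1 Real `pi05` Inference Works
+
+The following path is confirmed working:
+
+- gated tokenizer loading
+- `lerobot/pi05_base` weight loading
+- real inference through `select_action`
+
+So `pi05` on this machine is not in a "trainable but cannot run inference" state.
+
+### 7.2 The ACP Workflow Runs End to End
+
+Completed successfully:
+
+1. `pistar06` training
+2. `value-infer`
+3. ACP label writeback
+4. `pi05` training with ACP labels
+
+So the `pi05` ACP pipeline in `Evo-RL-Phi` runs as a closed loop on this machine.
+
+### 7.3 Training Validated up to 500 Steps
+
+Largest training size validated so far:
+
+- value train: `500` steps
+- policy train: `500` steps
+
+Both the intermediate and the final checkpoints were written correctly.
+
+### 7.4 The Current Experiments Show the Pipeline Runs, Not That the Settings Are Optimal
+
+Reasons:
+
+- `MEAN_STD` overrides were used instead of quantiles native to the data
+- `episode_success` is missing, so the default success had to be set to `failure`
+- The dataset has only `5` episodes, which is very small
+
+A more accurate conclusion at this point is therefore:
+
+- The engineering pipeline is stable
+- The training process is stable
+- But the semantic quality of the data and the training scale are not yet suitable for evaluating final policy performance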
+
+## 8. Generated Scripts and Documents
+
+### 8.1 Scripts
+
+- [run_pi05_acp_smoke.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_smoke.sh)
+- [run_pi05_acp_pilot.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_pilot.sh)
+- [run_pi05_acp_stage1.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage1.sh)
+- [run_pi05_acp_stage2.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage2.sh)
+- [run_pi05_acp_stage3.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage3.sh)
+
+### 8.2 Documents
+
+- [PI05_EVO_RL_ACP_EXECUTION_PLAN.md](/root/phi-media-lab/AMD_Hackathon/PI05_EVO_RL_ACP_EXECUTION_PLAN.md)
+- [PI05_EVO_RL_ACP_RUNBOOK.md](/root/phi-media-lab/AMD_Hackathon/PI05_EVO_RL_ACP_RUNBOOK.md)
+
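+## 9. Risks and Limitations
+
+### 9.1 Label Quality
+
+Because `episode_success` is missing, the value targets carry weak semantics. The current approach suits engineering validation, not rigorous evaluation.
+
+### 9.2 Normalization Is Not the Optimal Configuration
+
+`MEAN_STD` is used for compatibility with the existing data. Once quantile stats are added, the setup should in principle return to `QUANTILES`.
+
+### 9.3 Dataset Is Too Small
+
+`5` episodes can only validate the training pipeline; they cannot support reliable conclusions about policy generalization.
+
+### 9.4 Real Rollout Evaluation Is Still Needed
+
+The experiments so far mainly validate the training and inference paths; no closed-loop rollout on a real robot has been evaluated.
+
+## 10. Conclusions
+
+This experiment has established the following:
+
+1. `ROCm + MI300X` on this machine runs `Evo-RL-Phi` stably
+2. `pi05` completes real inference in the OpenPI-patched transformers environment
+3. The full `pistar06 -> ACP label writeback -> pi05 ACP training` workflow runs
+4. Training has been scaled from a smoke test to `500-step + checkpoint`, which indicates the pipeline is stable at moderate training scale
+
+The current system state can therefore be summarized as: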
+
+```text
+The pi05 Evo-RL ACP workflow has been validated end to end on ROCm at the engineering level and is ready for larger-scale training
+```
+
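+## 11. Recommended Next Steps
+
+Suggested order of priority:
+
+1. Add `episode_success` to the dataset
+2. Add quantile stats to the dataset
+3. Start a long run of `1000+` steps on the higher-quality data
+4. Evaluate `stage3` or a later checkpoint with real-robot rollouts
+5. For long-running use, consolidate a formal training script and add a logging/monitoring strategy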
diff --git a/PI05_EVO_RL_ACP_RUNBOOK.md b/PI05_EVO_RL_ACP_RUNBOOK.md
new file mode 100644
index 0000000..ddf50eb
--- /dev/null
+++ b/PI05_EVO_RL_ACP_RUNBOOK.md
@@ -0,0 +1,58 @@
+# PI05 Evo-RL ACP Runbook
+
+## Fixed Constraints
+
+- Use the patched environment: `/root/phi-media-lab/.venvs/pi05-openpi-ssp`
+- Force dataset decoding with `--dataset.video_backend=pyav`
+- Use local labeled dataset cache for policy training: `/root/.cache/huggingface/lerobot/maxbeau/XLeRobot`
+- Use `MEAN_STD` for `STATE` and `ACTION` unless the dataset is augmented with quantile stats
+
+## Ready-To-Run Scripts
+
+Smoke validation:
+
+```bash
+bash /root/phi-media-lab/AMD_Hackathon/run_pi05_acp_smoke.sh
+```
+
+Short pilot run with checkpoints:
+
+```bash
+bash /root/phi-media-lab/AMD_Hackathon/run_pi05_acp_pilot.sh
+```
+
+## Pilot Output Paths
+
+Value checkpoint:
+
+```text
+/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_pilot/checkpoints/000005/pretrained_model
+```
+
+Policy checkpoint:
+
+```text
+/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_pilot/checkpoints/000005/pretrained_model
+```
+
+ACP-labeled dataset fields written into local cache:
+
+```text
+complementary_info.value_pilot
+complementary_info.advantage_pilot
+complementary_info.acp_indicator_pilot
+```
+
+## Current Data Assumptions
+
+- Dataset repo: `maxbeau/XLeRobot`
+- Dataset does not contain `episode_success`
+- Current runs therefore rely on `--dataset.default_success=failure`
+
+## When To Change The Defaults
+
+Switch away from the current overrides only if one of these is true:
+
+1. `XLeRobot` gains quantile stats; `STATE` and `ACTION` can then go back to `QUANTILES`.
+2. `episode_success` is written into the episode metadata; value targets can then use real success labels.
+3. ACP-labeled outputs are pushed to a dedicated Hub dataset; policy training can then point at that repo instead of the local cache root.
diff --git a/run_pi05_acp_pilot.sh b/run_pi05_acp_pilot.sh
new file mode 100755
index 0000000..6816686
--- /dev/null
+++ b/run_pi05_acp_pilot.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_pilot"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_pilot"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_pilot"
+
+VALUE_FIELD="complementary_info.value_pilot"
+ADV_FIELD="complementary_info.advantage_pilot"
+IND_FIELD="complementary_info.acp_indicator_pilot"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=5 \
+  --save_checkpoint=true \
+  --save_freq=5 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_pilot_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_pilot_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=5 \
+  --save_checkpoint=true \
+  --save_freq=5 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_pilot
diff --git a/run_pi05_acp_smoke.sh b/run_pi05_acp_smoke.sh
new file mode 100755
index 0000000..7c10942
--- /dev/null
+++ b/run_pi05_acp_smoke.sh
@@ -0,0 +1,76 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_ckpt_meanstd"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_smoke"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_smoke"
+
+VALUE_FIELD="complementary_info.value_smoke"
+ADV_FIELD="complementary_info.advantage_smoke"
+IND_FIELD="complementary_info.acp_indicator_smoke"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=true \
+  --save_freq=1 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_ckpt_meanstd
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=1 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=5 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_smoke_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=false \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_smoke
diff --git a/run_pi05_acp_stage1.sh b/run_pi05_acp_stage1.sh
new file mode 100755
index 0000000..4d955ef
--- /dev/null
+++ b/run_pi05_acp_stage1.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_stage1"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_stage1"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_stage1"
+
+VALUE_FIELD="complementary_info.value_stage1"
+ADV_FIELD="complementary_info.advantage_stage1"
+IND_FIELD="complementary_info.acp_indicator_stage1"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=20 \
+  --save_checkpoint=true \
+  --save_freq=10 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_stage1_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_stage1_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=20 \
+  --save_checkpoint=true \
+  --save_freq=10 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_stage1
diff --git a/run_pi05_acp_stage2.sh b/run_pi05_acp_stage2.sh
new file mode 100755
index 0000000..99bff3c
--- /dev/null
+++ b/run_pi05_acp_stage2.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_stage2"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_stage2"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_stage2"
+
+VALUE_FIELD="complementary_info.value_stage2"
+ADV_FIELD="complementary_info.advantage_stage2"
+IND_FIELD="complementary_info.acp_indicator_stage2"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=100 \
+  --save_checkpoint=true \
+  --save_freq=50 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_stage2_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_stage2_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=100 \
+  --save_checkpoint=true \
+  --save_freq=50 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_stage2
diff --git a/run_pi05_acp_stage3.sh b/run_pi05_acp_stage3.sh
new file mode 100755
index 0000000..bacc617
--- /dev/null
+++ b/run_pi05_acp_stage3.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_stage3"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_stage3"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_stage3"
+
+VALUE_FIELD="complementary_info.value_stage3"
+ADV_FIELD="complementary_info.advantage_stage3"
+IND_FIELD="complementary_info.acp_indicator_stage3"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=500 \
+  --save_checkpoint=true \
+  --save_freq=250 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_stage3_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_stage3_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=500 \
+  --save_checkpoint=true \
+  --save_freq=250 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_stage3