diff --git a/PI05_EVO_RL_ACP_EXECUTION_PLAN.md b/PI05_EVO_RL_ACP_EXECUTION_PLAN.md
new file mode 100644
index 0000000..afb500d
--- /dev/null
+++ b/PI05_EVO_RL_ACP_EXECUTION_PLAN.md
@@ -0,0 +1,275 @@
+# PI05 Evo-RL ACP Execution Plan
+
+## Goal
+
+Use the ROCm setup guidance in `AMD_Hackathon` to:
+
+1. validate a ROCm-backed LeRobot runtime,
+2. attempt `pi05` inference on this machine,
+3. run the `pi05` Evo-RL ACP workflow end-to-end.
+
+## Current Machine Status
+
+- ROCm runtime detected via `rocminfo`
+- GPU detected: `AMD Instinct MI300X VF`
+- Python: `3.12.3`
+- Torch: `2.6.0+rocm7.0.2`
+- LeRobot: `0.4.4`
+- `lerobot-info`, `lerobot-train --help`, `lerobot-eval --help` all work
+
+Validation commands already executed:
+
+```bash
+python3 - <<'PY'
+import torch
+print(torch.__version__)
+print(torch.cuda.is_available())
+print(torch.cuda.get_device_name(0))
+PY
+
+cd /root/phi-media-lab/Evo-RL-Phi
+python3 -m lerobot.scripts.lerobot_info
+python3 -m lerobot.scripts.lerobot_train --help
+python3 -m lerobot.scripts.lerobot_eval --help
+```
+
+## Code Changes Applied
+
+These local changes were made in `Evo-RL-Phi` to unblock validation:
+
+1. `src/lerobot/policies/pi05/modeling_pi05.py`
+   Relaxed the brittle `transformers.models.siglip.check` import requirement. The check now runs only when that helper exists.
+2. `src/lerobot/policies/pi05/configuration_pi05.py`
+   Added `tokenizer_name` as a config field.
+3. `src/lerobot/policies/pi05/processor_pi05.py`
+   Switched tokenizer loading from a hard-coded value to `config.tokenizer_name`.
+4. `src/lerobot/policies/pi05/modeling_pi05.py`
+   Added a tied-weight fallback so `lerobot/pi05_base` can load when the checkpoint only exposes `lm_head.weight`.
+5. `src/lerobot/configs/value.py`
+   Added `dataset.video_backend` so `lerobot-value-infer` can avoid the broken default `torchcodec` selection in this ROCm environment.
+6. `src/lerobot/scripts/lerobot_value_infer.py`
+   Passed `dataset.video_backend` through to `LeRobotDataset`.
+
+Editable install refreshed:
+
+```bash
+cd /root/phi-media-lab/Evo-RL-Phi
+python3 -m pip install -e .
+```
+
+## Validation Results
+
+### Passed
+
+- ROCm tensor allocation and compute
+- LeRobot CLI runtime
+- ACP prompt injection path into `pi05`
+- OpenPI-patched `transformers` environment for `pi05`
+- `pi05` GPU smoke test on ROCm with mocked tokenizer
+- real `pi05` pretrained inference with `lerobot/pi05_base`
+- `pistar06` value training smoke run on `maxbeau/XLeRobot`
+- `lerobot-value-infer` writing ACP labels back to the local dataset cache
+- `pi05` ACP policy-training smoke run on the labeled dataset
+
+Command:
+
+```bash
+cd /root/phi-media-lab/Evo-RL-Phi
+pytest -q tests/training/test_acp_pi05_prompt_pipeline.py -q
+```
+
+Result: passed on `DEVICE='cuda'`
+
+### OpenPI-Patched Environment Created
+
+Isolated environment:
+
+```bash
+/root/phi-media-lab/.venvs/pi05-openpi-ssp
+```
+
+OpenPI source:
+
+```bash
+/root/phi-media-lab/openpi
+```
+
+Environment preparation that was executed:
+
+```bash
+python3 -m venv --system-site-packages /root/phi-media-lab/.venvs/pi05-openpi-ssp
+git clone --depth=1 --recurse-submodules https://github.com/Physical-Intelligence/openpi.git /root/phi-media-lab/openpi
+/root/phi-media-lab/.venvs/pi05-openpi-ssp/bin/python -m pip install 'transformers==4.53.2'
+/root/phi-media-lab/.venvs/pi05-openpi-ssp/bin/python -m pip install -e /root/phi-media-lab/Evo-RL-Phi --no-deps
+cp -r /root/phi-media-lab/openpi/src/openpi/models_pytorch/transformers_replace/* \
+  /root/phi-media-lab/.venvs/pi05-openpi-ssp/lib/python3.12/site-packages/transformers/
+echo '/opt/venv/lib/python3.12/site-packages' > \
+  /root/phi-media-lab/.venvs/pi05-openpi-ssp/lib/python3.12/site-packages/_opt_venv.pth
+```
+
+Patch verification:
+
+```text
+transformers 4.53.2
+GemmaRMSNorm.forward(self, x, cond=None)
+siglip_check_result True
+```
+
+### `pi05` ROCm Smoke Test
+
+In the patched venv, a minimal GPU forward + `select_action` test passed when tokenizer access was mocked:
+
+```text
+smoke_ok
+loss 2.579069137573242
+loss_keys ['loss', 'loss_per_dim']
+action_shape (1, 7)
+action_device cpu
+```
+
+This confirms:
+
+- ROCm backend is usable for `pi05`
+- OpenPI transformer patches are sufficient for model execution
+- the remaining blocker is no longer model-side
+
+### Real `pi05` Inference Result
+
+After refreshing Hugging Face auth and using the OpenPI-patched venv, real tokenizer/model access worked.
+
+Successful path:
+
+```text
+google/paligemma-3b-pt-224
+lerobot/pi05_base
+```
+
+Observed result:
+
+```text
+real_infer_ok
+action_shape (1, 32)
+action_mean 0.02885555289685726
+action_std 0.09869649261236191
+```
+
+Important runtime note:
+
+- the saved `policy_preprocessor.json` in `lerobot/pi05_base` targets `cpu`, so the processed batch had to be moved to `cuda` before `select_action`
+
+### ACP Workflow Smoke Result
+
+Dataset used:
+
+```text
+maxbeau/XLeRobot
+```
+
+Dataset compatibility findings:
+
+- `episode_success` is absent, so value training/inference used `--dataset.default_success=failure`
+- default `torchcodec` decoding is not usable in this ROCm environment for DataLoader CPU-side video decode
+- `pyav` works and must be forced for both training and value inference
+- the dataset lacks state quantiles, so `STATE` normalization had to be switched from `QUANTILES` to `MEAN_STD`
+
+Successful smoke sequence:
+
+```bash
+lerobot-value-train \
+  --dataset.repo_id=maxbeau/XLeRobot \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=true \
+  --save_freq=1 \
+  --wandb.enable=false \
+  --output_dir=/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_ckpt_meanstd
+
+lerobot-value-infer \
+  --dataset.repo_id=maxbeau/XLeRobot \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path=/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_ckpt_meanstd \
+  --runtime.device=cuda \
+  --runtime.batch_size=1 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=5 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field=complementary_info.value_smoke \
+  --acp.advantage_field=complementary_info.advantage_smoke \
+  --acp.indicator_field=complementary_info.acp_indicator_smoke
+
+lerobot-train \
+  --dataset.repo_id=maxbeau/XLeRobot \
+  --dataset.root=/root/.cache/huggingface/lerobot/maxbeau/XLeRobot \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=false \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field=complementary_info.acp_indicator_smoke \
+  --acp.indicator_dropout_prob=0.3
+```
+
+Key observed outputs:
+
+```text
+ACP stats | n_step=5 positive_ratio_target=0.3000 positive_ratio_observed=0.3001
+Wrote value annotations to dataset root: /root/.cache/huggingface/lerobot/maxbeau/XLeRobot
+ACP indicator stats (hf_dataset_scan): field='complementary_info.acp_indicator_smoke' ratio=0.300135 positive=446 total=1486
+First policy prompt (task):
+Task: pick up the block Advantage: negative, State: ...
+End of training
+```
+
+## Smoke-Test Summary
+
+What has been proven:
+
+- ROCm backend is healthy
+- `pi05` processor + ACP prompt path is healthy
+- local `pi05` code instantiates past the point where it previously failed, now that the brittle SigLIP helper check is relaxed
+- OpenPI-patched `transformers` makes real `pi05` model execution possible on ROCm
+- real `pi05` forward + `select_action` succeed with `lerobot/pi05_base`
+- `pistar06 -> value_infer -> ACP label writeback -> pi05 ACP train` has been smoke-tested end to end
+
+What is not yet proven:
+
+- stable multi-step or full-length `pi05` training on this dataset
+- production-quality ACP behavior with meaningful value targets, because the smoke run treated all episodes as `failure`
+- whether `maxbeau/XLeRobot` should be augmented with quantile stats instead of using `MEAN_STD`
+
+## Required Next Actions
+
+### Next Practical Steps
+
+1. Decide whether to keep using local cache path `/root/.cache/huggingface/lerobot/maxbeau/XLeRobot` for ACP-labeled experiments or push a cleaned dataset variant to a new HF repo.
+2. Either add quantile stats to the dataset or keep explicit `MEAN_STD` overrides for `STATE`/`ACTION`.
+3. Increase `steps`, `batch_size`, and checkpointing from the smoke setup once you want a real training run.
+4. If value targets should reflect true success, add `episode_success` metadata instead of relying on `default_success=failure`.
+
+## Large Download Note
+
+Public `lerobot/pi05_base` exists and the `model.safetensors` size is approximately:
+
+```text
+14,467,165,872 bytes
+```
+
+On a fresh machine, do not download it until the tokenizer-access and patched-`transformers` issues are solved; otherwise the model still will not run.
diff --git a/PI05_EVO_RL_ACP_EXPERIMENT_REPORT.md b/PI05_EVO_RL_ACP_EXPERIMENT_REPORT.md
new file mode 100644
index 0000000..4186aa7
--- /dev/null
+++ b/PI05_EVO_RL_ACP_EXPERIMENT_REPORT.md
@@ -0,0 +1,430 @@
+# PI05 Evo-RL ACP Experiment Report
+
+## 1. Experiment Goals
+
+Following the AMD ROCm setup approach in `~/phi-media-lab/AMD_Hackathon`, this experiment aims to validate the following on the current machine:
+
+1. Set up and validate a ROCm-backed LeRobot / Evo-RL runtime
+2. Run real `pi05` inference
+3. Run the `pistar06 -> value-infer -> ACP indicator -> pi05 policy train` Evo-RL ACP workflow end to end
+4. Scale the experiment step by step from a smoke test to a sustainable training scale, and verify checkpoint stability
+
+## 2. Environment
+
+### 2.1 Hardware and Base Environment
+
+- GPU: `AMD Instinct MI300X VF`
+- ROCm: available
+- Python: `3.12.3`
+- Torch: `2.6.0+rocm7.0.2`
+- LeRobot: `0.4.4`
+
+### 2.2 Repositories
+
+- Docs repo: `/root/phi-media-lab/AMD_Hackathon`
+- Main code repo: `/root/phi-media-lab/Evo-RL-Phi`
+- OpenPI source: `/root/phi-media-lab/openpi`
+
+### 2.3 Python Environment
+
+Isolated environment used for the actual experiments:
+
+```text
+/root/phi-media-lab/.venvs/pi05-openpi-ssp
+```
+
+The following was additionally set up in this environment:
+
+- `transformers==4.53.2`
+- OpenPI `transformers_replace` patch overlay
+- editable install of `Evo-RL-Phi`
+
+## 3. Key Compatibility Issues and Fixes
+
+### 3.1 `pi05` Dependency on OpenPI-Patched transformers
+
+Problem:
+
+- `transformers 4.57.6` in the original environment does not satisfy the OpenPI variant of `pi05`
+- `GemmaRMSNorm.forward(cond=...)` does not exist
+- `transformers.models.siglip.check` compatibility is brittle
+
+Fix:
+
+- Build a separate patched venv
+- Install `transformers==4.53.2`
+- Overlay OpenPI's `transformers_replace`
+
+Result:
+
+- `pi05` forward and `select_action` run on ROCm
+
+### 3.2 Gated Tokenizer Access
+
+Problem:
+
+- `google/paligemma-3b-pt-224` was initially not accessible (no permission)
+
+Fix:
+
+- Update the Hugging Face account permissions
+- Re-run `hf auth login` locally
+
+Result:
+
+- The real tokenizer loads successfully
+- Real inference with `lerobot/pi05_base` succeeds
+
+### 3.3 `torchcodec` Incompatible with ROCm Data Decoding
+
+Problem:
+
+- DataLoader-side video decoding defaults to `torchcodec`
+- In this environment the CPU worker path is unusable, so video frame decoding fails
+
+Fix:
+
+- Explicitly pass the following in every dataset-related command:
+
+```bash
+--dataset.video_backend=pyav
+```
+
+Result:
+
+- `LeRobotDataset` reads image frames reliably
+- `value-train` / `value-infer` / `policy train` all run
+
+### 3.4 Dataset Lacks Quantile Stats
+
+Problem:
+
+- `observation.state` in `maxbeau/XLeRobot` only has `min/max/mean/std/count`
+- `q01/q99` are missing
+- The default `QUANTILES` normalization raises an error
+
+Fix:
+
+- Override the normalization mapping in both value and policy training (passed as `--value.normalization_mapping` / `--policy.normalization_mapping` in the run scripts; the value scripts keep `ACTION` as `IDENTITY`):
+
+```bash
+--normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}'
+```
+
+Result:
+
+- The quantile dependency is bypassed
+- All current experiment stages use this approach
+
+### 3.5 Dataset Lacks `episode_success`
+
+Problem:
+
+- The episode metadata of `maxbeau/XLeRobot` has no `episode_success`
+
+Fix:
+
+- All value-related steps use:
+
+```bash
+--dataset.default_success=failure
+```
+
+Result:
+
+- Value targets can be constructed
+- The value supervision is still semantically weak, however: the pipeline runs, but label quality is not optimal
+
+## 4. Local Code Changes
+
+To get the whole pipeline running, this experiment made minimal compatibility changes to `Evo-RL-Phi`.
+
+### 4.1 `pi05`
+
+[configuration_pi05.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/policies/pi05/configuration_pi05.py)
+
+- Added `tokenizer_name`
+
+[processor_pi05.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/policies/pi05/processor_pi05.py)
+
+- Switched from a hard-coded tokenizer to `config.tokenizer_name`
+
+[modeling_pi05.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/policies/pi05/modeling_pi05.py)
+
+- Relaxed the `siglip.check` dependency
+- Added a tied-weight fallback for compatibility with the `lerobot/pi05_base` weight layout
+
+### 4.2 Value Inference
+
+[value.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/configs/value.py)
+
+- Added `video_backend` to `ValueInferenceDatasetConfig`
+
+[lerobot_value_infer.py](/root/phi-media-lab/Evo-RL-Phi/src/lerobot/scripts/lerobot_value_infer.py)
+
+- Passed `dataset.video_backend` through to `LeRobotDataset`
+
+## 5. Dataset and Models
+
+### 5.1 Dataset
+
+Experiment dataset:
+
+```text
+maxbeau/XLeRobot
+```
+
+Local cache root:
+
+```text
+/root/.cache/huggingface/lerobot/maxbeau/XLeRobot
+```
+
+Confirmed properties:
+
+- Total frames: `1486`
+- Episodes: `5`
+- Task: `pick up the block`
+- Video keys present:
+  - `observation.images.front_cam`
+  - `observation.images.hand_cam`
+- Not present:
+  - `episode_success`
+  - state quantile stats
+
+### 5.2 Models
+
+value model:
+
+```text
+pistar06
+```
+
+policy model:
+
+```text
+lerobot/pi05_base
+```
+
+## 6. Experiment Stages and Results
+
+### 6.1 Smoke Stage
+
+Goal:
+
+- Validate real `pi05` inference
+- Validate 1-step value train / value infer / policy train
+
+Result:
+
+- Real `pi05` inference succeeded
+- `pistar06` 1-step training succeeded
+- `value-infer` wrote back:
+  - `complementary_info.value_smoke`
+  - `complementary_info.advantage_smoke`
+  - `complementary_info.acp_indicator_smoke`
+- `pi05` 1-step ACP training succeeded
+
+### 6.2 Pilot Stage
+
+Script:
+
+[run_pi05_acp_pilot.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_pilot.sh)
+
+Configuration:
+
+- value train: `5` steps
+- value infer: full dataset
+- policy train: `5` steps
+
+Result:
+
+- The full pipeline succeeded
+- The value checkpoint was saved
+- The policy checkpoint was saved
+
+Output directories:
+
+- [value pilot](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_pilot)
+- [policy pilot](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_pilot)
+
+### 6.3 Stage1
+
+Script:
+
+[run_pi05_acp_stage1.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage1.sh)
+
+Configuration:
+
+- value train: `20` steps
+- value infer: full dataset
+- policy train: `20` steps
+
+Result:
+
+- value training succeeded, with checkpoints at steps `10/20`
+- value infer wrote back the `*_stage1` fields
+- policy training succeeded, with checkpoints at steps `10/20`
+
+Output directories:
+
+- [value stage1](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_stage1)
+- [policy stage1](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_stage1)
+
+### 6.4 Stage2
+
+Script:
+
+[run_pi05_acp_stage2.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage2.sh)
+
+Configuration:
+
+- value train: `100` steps
+- value infer: full dataset
+- policy train: `100` steps
+
+Result:
+
+- value training succeeded, with checkpoints at steps `50/100`
+- value infer wrote back the `*_stage2` fields
+- policy training succeeded, with checkpoints at steps `50/100`
+
+Output directories:
+
+- [value stage2](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_stage2)
+- [policy stage2](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_stage2)
+
+### 6.5 Stage3
+
+Script:
+
+[run_pi05_acp_stage3.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage3.sh)
+
+Configuration:
+
+- value train: `500` steps
+- value infer: full dataset
+- policy train: `500` steps
+
+Result:
+
+- value training succeeded, with checkpoints at steps `250/500`
+- value infer wrote back the `*_stage3` fields
+- policy training succeeded, with checkpoints at steps `250/500`
+- policy loss trended downward during training
+
+Key log lines:
+
+```text
+step:200 ... loss:0.350 ...
+step:400 ... loss:0.245 ...
+```
+
+Output directories:
+
+- [value stage3](/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_stage3)
+- [policy stage3](/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_stage3)
+
+## 7. Key Observations
+
+### 7.1 Real `pi05` Inference Works
+
+The following path is confirmed working:
+
+- gated tokenizer loading
+- `lerobot/pi05_base` weight loading
+- real `select_action` inference
+
+So `pi05` on this machine is not in a "trainable but not inferable" state.
+
+### 7.2 The ACP Workflow Runs End to End
+
+Completed successfully:
+
+1. `pistar06` training
+2. `value-infer`
+3. ACP label writeback
+4. `pi05` training with ACP labels
+
+This shows that the `pi05` ACP pipeline in `Evo-RL-Phi` is executable as a closed loop on this machine.
+
+### 7.3 Training Scale Validated up to 500 Steps
+
+Largest training scale validated so far:
+
+- value train: `500` steps
+- policy train: `500` steps
+
+Both the intermediate and the final checkpoints were written correctly.
+
+### 7.4 The Current Experiments Show That the Pipeline Runs, Not That the Training Setup Is Optimal
+
+Reasons:
+
+- A `MEAN_STD` override was used instead of dataset-native quantiles
+- `episode_success` is missing, so the default success had to be set to `failure`
+- The dataset has only `5` episodes, which is very small
+
+The more accurate conclusion for now is therefore:
+
+- The engineering pipeline is stable
+- The training process is stable
+- Label semantics and training scale are not yet suitable for directly evaluating final policy performance
+
+## 8. Generated Scripts and Docs
+
+### 8.1 Scripts
+
+- [run_pi05_acp_smoke.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_smoke.sh)
+- [run_pi05_acp_pilot.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_pilot.sh)
+- [run_pi05_acp_stage1.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage1.sh)
+- [run_pi05_acp_stage2.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage2.sh)
+- [run_pi05_acp_stage3.sh](/root/phi-media-lab/AMD_Hackathon/run_pi05_acp_stage3.sh)
+
+### 8.2 Docs
+
+- [PI05_EVO_RL_ACP_EXECUTION_PLAN.md](/root/phi-media-lab/AMD_Hackathon/PI05_EVO_RL_ACP_EXECUTION_PLAN.md)
+- [PI05_EVO_RL_ACP_RUNBOOK.md](/root/phi-media-lab/AMD_Hackathon/PI05_EVO_RL_ACP_RUNBOOK.md)
+
+## 9. Risks and Limitations
+
+### 9.1 Label Quality
+
+The missing `episode_success` makes the value targets semantically weak. The current approach is adequate for engineering validation but not for serious evaluation.
+
+### 9.2 Normalization Is Not the Optimal Configuration
+
+`MEAN_STD` is used for compatibility with the older dataset. Once quantile stats are added, reverting to `QUANTILES` is in principle preferable.
+
+### 9.3 Dataset Is Too Small
+
+`5` episodes can only validate the training pipeline; they cannot support reliable conclusions about policy generalization.
+
+### 9.4 Real Rollout Evaluation Is Still Needed
+
+The experiments so far mainly validate the training and inference paths; no closed-loop rollout on a real robot has been completed.
+
+## 10. Conclusions
+
+This experiment has validated the following:
+
+1. `ROCm + MI300X` on this machine runs `Evo-RL-Phi` stably
+2. `pi05` completes real inference in the OpenPI-patched transformers environment
+3. The full `pistar06 -> ACP label writeback -> pi05 ACP training` workflow runs end to end
+4. Training has been scaled from a smoke test to `500-step + checkpoint`, which indicates the pipeline is stable at medium training scale
+
+The current system state can therefore be described as:
+
+```text
+The pi05 Evo-RL ACP workflow has been validated end to end on ROCm at the engineering level and is ready for further scaling of training
+```
+
+## 11. Recommended Next Steps
+
+Suggested priority order:
+
+1. Add `episode_success` to the dataset
+2. Add quantile stats to the dataset
+3. Start a `1000+`-step long run on higher-quality data
+4. Run a real-robot rollout evaluation on `stage3` or a later checkpoint
+5. For long-running use, put together a formal training script with a logging/monitoring strategy
diff --git a/PI05_EVO_RL_ACP_RUNBOOK.md b/PI05_EVO_RL_ACP_RUNBOOK.md
new file mode 100644
index 0000000..ddf50eb
--- /dev/null
+++ b/PI05_EVO_RL_ACP_RUNBOOK.md
@@ -0,0 +1,58 @@
+# PI05 Evo-RL ACP Runbook
+
+## Fixed Constraints
+
+- Use the patched environment: `/root/phi-media-lab/.venvs/pi05-openpi-ssp`
+- Force dataset decoding with `--dataset.video_backend=pyav`
+- Use local labeled dataset cache for policy training: `/root/.cache/huggingface/lerobot/maxbeau/XLeRobot`
+- Use `MEAN_STD` for `STATE` and `ACTION` unless the dataset is augmented with quantile stats
+
+## Ready-To-Run Scripts
+
+Smoke validation:
+
+```bash
+bash /root/phi-media-lab/AMD_Hackathon/run_pi05_acp_smoke.sh
+```
+
+Short pilot run with checkpoints:
+
+```bash
+bash /root/phi-media-lab/AMD_Hackathon/run_pi05_acp_pilot.sh
+```
+
+## Pilot Output Paths
+
+Value checkpoint:
+
+```text
+/root/phi-media-lab/Evo-RL-Phi/outputs/value_train/pi05_acp_pilot/checkpoints/000005/pretrained_model
+```
+
+Policy checkpoint:
+
+```text
+/root/phi-media-lab/Evo-RL-Phi/outputs/train/pi05_acp_policy_pilot/checkpoints/000005/pretrained_model
+```
+
+ACP-labeled dataset fields written into local cache:
+
+```text
+complementary_info.value_pilot
+complementary_info.advantage_pilot
+complementary_info.acp_indicator_pilot
+```
+
+## Current Data Assumptions
+
+- Dataset repo: `maxbeau/XLeRobot`
+- Dataset does not contain `episode_success`
+- Current runs therefore rely on `--dataset.default_success=failure`
+
+## When To Change The Defaults
+
+Switch away from the current overrides only if one of these is true:
+
+1. `XLeRobot` gets quantile stats; `STATE` and `ACTION` can then go back to `QUANTILES`
+2. `episode_success` gets written into episode metadata; value targets can then use real success labels
+3. ACP-labeled outputs are pushed to a dedicated Hub dataset; policy training can then point at that repo instead of the local cache root
diff --git a/run_pi05_acp_pilot.sh b/run_pi05_acp_pilot.sh
new file mode 100755
index 0000000..6816686
--- /dev/null
+++ b/run_pi05_acp_pilot.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_pilot"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_pilot"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_pilot"
+
+VALUE_FIELD="complementary_info.value_pilot"
+ADV_FIELD="complementary_info.advantage_pilot"
+IND_FIELD="complementary_info.acp_indicator_pilot"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=5 \
+  --save_checkpoint=true \
+  --save_freq=5 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_pilot_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_pilot_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=5 \
+  --save_checkpoint=true \
+  --save_freq=5 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_pilot
diff --git a/run_pi05_acp_smoke.sh b/run_pi05_acp_smoke.sh
new file mode 100755
index 0000000..7c10942
--- /dev/null
+++ b/run_pi05_acp_smoke.sh
@@ -0,0 +1,76 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_ckpt_meanstd"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_smoke"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_smoke"
+
+VALUE_FIELD="complementary_info.value_smoke"
+ADV_FIELD="complementary_info.advantage_smoke"
+IND_FIELD="complementary_info.acp_indicator_smoke"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=true \
+  --save_freq=1 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_ckpt_meanstd
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=1 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=5 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_smoke_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=1 \
+  --save_checkpoint=false \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_smoke
diff --git a/run_pi05_acp_stage1.sh b/run_pi05_acp_stage1.sh
new file mode 100755
index 0000000..4d955ef
--- /dev/null
+++ b/run_pi05_acp_stage1.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_stage1"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_stage1"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_stage1"
+
+VALUE_FIELD="complementary_info.value_stage1"
+ADV_FIELD="complementary_info.advantage_stage1"
+IND_FIELD="complementary_info.acp_indicator_stage1"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=20 \
+  --save_checkpoint=true \
+  --save_freq=10 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_stage1_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_stage1_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=20 \
+  --save_checkpoint=true \
+  --save_freq=10 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_stage1
diff --git a/run_pi05_acp_stage2.sh b/run_pi05_acp_stage2.sh
new file mode 100755
index 0000000..99bff3c
--- /dev/null
+++ b/run_pi05_acp_stage2.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_stage2"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_stage2"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_stage2"
+
+VALUE_FIELD="complementary_info.value_stage2"
+ADV_FIELD="complementary_info.advantage_stage2"
+IND_FIELD="complementary_info.acp_indicator_stage2"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=100 \
+  --save_checkpoint=true \
+  --save_freq=50 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_stage2_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_stage2_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=100 \
+  --save_checkpoint=true \
+  --save_freq=50 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_stage2
diff --git a/run_pi05_acp_stage3.sh b/run_pi05_acp_stage3.sh
new file mode 100755
index 0000000..bacc617
--- /dev/null
+++ b/run_pi05_acp_stage3.sh
@@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="/root/phi-media-lab"
+EVORL_ROOT="$ROOT/Evo-RL-Phi"
+VENV="$ROOT/.venvs/pi05-openpi-ssp/bin"
+DATASET_REPO="maxbeau/XLeRobot"
+DATASET_ROOT="/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"
+
+VALUE_RUN_DIR="$EVORL_ROOT/outputs/value_train/pi05_acp_stage3"
+VALUE_INFER_DIR="$EVORL_ROOT/outputs/value_infer/pi05_acp_stage3"
+POLICY_RUN_DIR="$EVORL_ROOT/outputs/train/pi05_acp_policy_stage3"
+
+VALUE_FIELD="complementary_info.value_stage3"
+ADV_FIELD="complementary_info.advantage_stage3"
+IND_FIELD="complementary_info.acp_indicator_stage3"
+
+export HF_HUB_ENABLE_HF_TRANSFER=1
+
+cd "$EVORL_ROOT"
+
+"$VENV/lerobot-value-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --value.type=pistar06 \
+  --value.device=cuda \
+  --value.dtype=float32 \
+  --value.normalization_mapping='{"ACTION":"IDENTITY","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=500 \
+  --save_checkpoint=true \
+  --save_freq=250 \
+  --wandb.enable=false \
+  --output_dir="$VALUE_RUN_DIR" \
+  --job_name=pi05_acp_stage3_value
+
+"$VENV/lerobot-value-infer" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.video_backend=pyav \
+  --dataset.default_success=failure \
+  --inference.checkpoint_path="$VALUE_RUN_DIR" \
+  --runtime.device=cuda \
+  --runtime.batch_size=4 \
+  --runtime.num_workers=0 \
+  --acp.enable=true \
+  --acp.n_step=10 \
+  --acp.positive_ratio=0.3 \
+  --acp.value_field="$VALUE_FIELD" \
+  --acp.advantage_field="$ADV_FIELD" \
+  --acp.indicator_field="$IND_FIELD" \
+  --output_dir="$VALUE_INFER_DIR" \
+  --job_name=pi05_acp_stage3_infer
+
+"$VENV/lerobot-train" \
+  --dataset.repo_id="$DATASET_REPO" \
+  --dataset.root="$DATASET_ROOT" \
+  --dataset.video_backend=pyav \
+  --policy.type=pi05 \
+  --policy.pretrained_path=lerobot/pi05_base \
+  --policy.device=cuda \
+  --policy.dtype=float32 \
+  --policy.push_to_hub=false \
+  --policy.train_expert_only=true \
+  --policy.freeze_vision_encoder=true \
+  --policy.normalization_mapping='{"ACTION":"MEAN_STD","STATE":"MEAN_STD","VISUAL":"IDENTITY"}' \
+  --batch_size=1 \
+  --num_workers=0 \
+  --steps=500 \
+  --save_checkpoint=true \
+  --save_freq=250 \
+  --wandb.enable=false \
+  --acp.enable=true \
+  --acp.indicator_field="$IND_FIELD" \
+  --acp.indicator_dropout_prob=0.3 \
+  --output_dir="$POLICY_RUN_DIR" \
+  --job_name=pi05_acp_policy_stage3
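Reviewer note: the `run_pi05_acp_pilot.sh` and `run_pi05_acp_stage*.sh` scripts in this patch differ only in the stage suffix, `--steps`, and `--save_freq` (the smoke script additionally uses different value-infer settings and skips the policy checkpoint). The following is a minimal sketch, not part of the patch, of how the three command lines could be generated from one parameter set. `Stage` and `build_stage_commands` are hypothetical names introduced here for illustration; every flag and path is copied from the scripts, and the code only builds argument lists without running anything.

```python
import json
import shlex
from dataclasses import dataclass

# Paths and dataset names copied from the run scripts in this patch.
ROOT = "/root/phi-media-lab"
EVORL_ROOT = f"{ROOT}/Evo-RL-Phi"
VENV = f"{ROOT}/.venvs/pi05-openpi-ssp/bin"
DATASET_REPO = "maxbeau/XLeRobot"
DATASET_ROOT = "/root/.cache/huggingface/lerobot/maxbeau/XLeRobot"


@dataclass(frozen=True)
class Stage:
    """Hypothetical parameter set; the scripts hard-code these values."""
    name: str       # suffix for output dirs, job names, and ACP field names
    steps: int      # used for both value and policy training
    save_freq: int  # checkpoint interval


def _mapping(action: str) -> str:
    # Compact JSON, matching the single-quoted literals in the shell scripts.
    return json.dumps(
        {"ACTION": action, "STATE": "MEAN_STD", "VISUAL": "IDENTITY"},
        separators=(",", ":"),
    )


def build_stage_commands(stage: Stage) -> list[list[str]]:
    """Return argv lists for value-train, value-infer, and policy train."""
    value_dir = f"{EVORL_ROOT}/outputs/value_train/pi05_acp_{stage.name}"
    infer_dir = f"{EVORL_ROOT}/outputs/value_infer/pi05_acp_{stage.name}"
    policy_dir = f"{EVORL_ROOT}/outputs/train/pi05_acp_policy_{stage.name}"
    ind_field = f"complementary_info.acp_indicator_{stage.name}"
    dataset = [f"--dataset.repo_id={DATASET_REPO}", "--dataset.video_backend=pyav"]
    train_loop = [
        "--batch_size=1", "--num_workers=0", f"--steps={stage.steps}",
        "--save_checkpoint=true", f"--save_freq={stage.save_freq}",
        "--wandb.enable=false",
    ]
    value_train = [
        f"{VENV}/lerobot-value-train", *dataset,
        "--value.type=pistar06", "--value.device=cuda", "--value.dtype=float32",
        f"--value.normalization_mapping={_mapping('IDENTITY')}",
        *train_loop,
        f"--output_dir={value_dir}", f"--job_name=pi05_acp_{stage.name}_value",
    ]
    value_infer = [
        f"{VENV}/lerobot-value-infer", *dataset,
        "--dataset.default_success=failure",
        f"--inference.checkpoint_path={value_dir}",
        "--runtime.device=cuda", "--runtime.batch_size=4", "--runtime.num_workers=0",
        "--acp.enable=true", "--acp.n_step=10", "--acp.positive_ratio=0.3",
        f"--acp.value_field=complementary_info.value_{stage.name}",
        f"--acp.advantage_field=complementary_info.advantage_{stage.name}",
        f"--acp.indicator_field={ind_field}",
        f"--output_dir={infer_dir}", f"--job_name=pi05_acp_{stage.name}_infer",
    ]
    policy_train = [
        f"{VENV}/lerobot-train", dataset[0],
        f"--dataset.root={DATASET_ROOT}", dataset[1],
        "--policy.type=pi05", "--policy.pretrained_path=lerobot/pi05_base",
        "--policy.device=cuda", "--policy.dtype=float32",
        "--policy.push_to_hub=false", "--policy.train_expert_only=true",
        "--policy.freeze_vision_encoder=true",
        f"--policy.normalization_mapping={_mapping('MEAN_STD')}",
        *train_loop,
        "--acp.enable=true", f"--acp.indicator_field={ind_field}",
        "--acp.indicator_dropout_prob=0.3",
        f"--output_dir={policy_dir}", f"--job_name=pi05_acp_policy_{stage.name}",
    ]
    return [value_train, value_infer, policy_train]


if __name__ == "__main__":
    # Print shell-quoted commands for inspection instead of executing them.
    for argv in build_stage_commands(Stage("stage3", steps=500, save_freq=250)):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; passing each list to `subprocess.run(argv, check=True, cwd=EVORL_ROOT)` would be one way to execute them, under the assumption that the patched venv exists at the path above.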