Port LeLab to LeRobot 0.6.0 (main) by nicolas-rabault · Pull Request #56 · huggingface/leLab

nicolas-rabault · 2026-06-30T10:54:56Z

Ports LeLab to the latest LeRobot main (the upcoming 0.6.0), bumping the pin from 82dffde to 5ac3b49.

Fixes #51.

Breaking-change fixes

record.py — LeRobot replaced DatasetRecordConfig.vcodec with rgb_encoder/depth_encoder config objects (vcodec is now nested, e.g. --dataset.rgb_encoder.vcodec). The LeRobotDataset.create()/.resume() calls now forward rgb_encoder=/depth_encoder= instead of vcodec=.
train.py — TrainPipelineConfig.eval_freq was renamed to env_eval_freq; the request field and CLI flag are updated to --env_eval_freq.
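To illustrate the two renames, saved CLI invocations could be migrated with a small rewrite table. This is a hypothetical helper: migrate_flags and RENAMED_FLAGS are not part of LeLab or LeRobot. It also assumes that the old flat --dataset.vcodec value should map onto the RGB encoder only.

```python
# Hypothetical migration table (not a LeLab/LeRobot API).
# Assumption: the old flat vcodec applied to RGB streams, so it maps to rgb_encoder.
RENAMED_FLAGS = {
    "--dataset.vcodec": "--dataset.rgb_encoder.vcodec",
    "--eval_freq": "--env_eval_freq",
}


def migrate_flags(args: list[str]) -> list[str]:
    """Return a copy of args with pre-0.6.0 flag names rewritten; values are kept."""
    migrated = []
    for arg in args:
        # Split "--name=value"; a bare "--name" yields an empty separator and value.
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_FLAGS.get(name, name) + sep + value)
    return migrated


print(migrate_flags(["--dataset.vcodec=h264", "--eval_freq=5000", "--steps=100"]))
```

Flags that were not renamed pass through untouched, so the same helper works for both lerobot-record and lerobot-train argument lists.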

Calibration, teleoperation, rollout, datasets, cameras, the LeRobotDataset core API, and record_loop/RecordConfig were verified unaffected.

HF Jobs: use LeRobot's native remote training

LeRobot now ships native HF-Jobs training (lerobot-train --job.target=<flavor>), which handles dataset push, config staging, in-pod checkpoint upload, log streaming, and resume. LeLab now relies on it instead of its own ~330-line reimplementation:

Cloud training is spawned as a local lerobot-train --job.target=<flavor> subprocess whose stdout LeLab tails — the same machinery as local runs. A shared SubprocessJobRunner base now backs both runners.
Deleted the in-pod sidecar uploader (WRAPPER_SOURCE), SSE log tailing, status polling, dataset-push, and wandb-secret handling — all now native.
The HF job id / page URL / model-repo are parsed from submit_to_hf's stdout markers and persisted by the registry watchdog. Remote cancellation goes through HfApi.cancel_job; reattach after a uvicorn --reload re-streams logs via HfApi.fetch_job_logs(follow=True) (no hf CLI dependency).
JobRegistry, the /jobs/* endpoints, and checkpoint discovery are kept — LeRobot has no equivalent.
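The following is a minimal sketch of the shared-runner idea. A base class tails a subprocess's stdout line by line, and a cloud subclass also scrapes submission markers from the same stream. The class names and the marker text matched here are hypothetical, because this description does not reproduce the exact strings that submit_to_hf prints.

```python
import re
import subprocess
import sys


class LineTailingRunner:
    """Hypothetical base: spawn a command and feed each stdout line to on_line()."""

    def __init__(self, cmd: list[str]):
        self.cmd = cmd
        self.log: list[str] = []

    def on_line(self, line: str) -> None:
        self.log.append(line)

    def run(self) -> int:
        # stderr is merged into stdout so one loop tails everything.
        with subprocess.Popen(
            self.cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        ) as proc:
            for raw in proc.stdout:
                self.on_line(raw.rstrip("\n"))
            return proc.wait()


# Hypothetical marker formats; the real submit_to_hf output may differ.
MARKERS = {
    "job_id": re.compile(r"^HF job id:\s*(\S+)"),
    "job_url": re.compile(r"^HF job page:\s*(\S+)"),
    "model_repo": re.compile(r"^Model repo:\s*(\S+)"),
}


class CloudMarkerRunner(LineTailingRunner):
    """Same tailing loop, plus extraction of cloud identifiers from marker lines."""

    def __init__(self, cmd: list[str]):
        super().__init__(cmd)
        self.cloud_ids: dict[str, str] = {}

    def on_line(self, line: str) -> None:
        super().on_line(line)
        for key, pattern in MARKERS.items():
            match = pattern.match(line)
            if match:
                self.cloud_ids[key] = match.group(1)


# Usage: a stand-in child process that prints two markers and one log line.
child = "print('HF job id: abc123'); print('Model repo: user/my-policy'); print('step 1')"
runner = CloudMarkerRunner([sys.executable, "-c", child])
exit_code = runner.run()
print(exit_code, runner.cloud_ids)
```

The cloud subclass overrides only the per-line hook. Local and cloud runs therefore share the spawn, tail, and exit-code logic, which is the point of extracting a common base.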

For cloud jobs the builder also passes --save_checkpoint_to_hub, so that per-step checkpoints reach the Hub (pod storage is ephemeral). It omits --output_dir, because an absolute host output_dir would otherwise be baked into the pod config and crash the job, and it omits --policy.push_to_hub/--policy.repo_id, because submit_to_hf owns the model repo. Cloud checkpoint listing falls back to the repo-root model when no checkpoints/<step>/ tree was pushed.
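The local vs cloud divergence in the CLI builder can be sketched as a single branch on the job target. build_train_args is a hypothetical stand-in for LeLab's builder. The flag names come from this description, but the `=true` value form and the a10g-small flavor in the usage lines are assumptions. --job.tags is left out because its format is not given here.

```python
from typing import Optional


def build_train_args(
    job_target: Optional[str],
    output_dir: str,
    policy_repo_id: str,
    env_eval_freq: int,
) -> list[str]:
    """Hypothetical builder: shared flags first, then local- or cloud-specific ones."""
    args = [f"--env_eval_freq={env_eval_freq}"]
    if job_target is None:
        # Local run: write to a host directory and let LeRobot push the policy.
        args += [
            f"--output_dir={output_dir}",
            "--policy.push_to_hub=true",
            f"--policy.repo_id={policy_repo_id}",
        ]
    else:
        # Cloud run: pod storage is ephemeral, so push each checkpoint to the Hub.
        # output_dir and the policy repo are omitted (submit_to_hf owns the repo).
        args += [f"--job.target={job_target}", "--save_checkpoint_to_hub=true"]
    return args


print(build_train_args(None, "/tmp/run", "user/my-policy", 5000))
print(build_train_args("a10g-small", "/tmp/run", "user/my-policy", 5000))
```

Because the divergence lives in one branch, both argument shapes can be asserted on directly in unit tests without spawning anything.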

Validation

pytest: 161 passed; ruff check and ruff format --check clean.
New tests cover the cloud CLI-builder divergence (local vs cloud-fresh vs cloud-resume) and the cloud runner's stdout-marker parsing, reattach log re-streaming, remote cancel, and inspect_job stage → liveness/returncode mapping.
Verified end-to-end on hardware: dataset recording, local + HF-Cloud training (incl. checkpoint access on the Hub), and policy rollout on an SO-101 follower.
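The following shows one plausible shape for the inspect_job stage → liveness/returncode mapping. It lets a reattached cloud job report status the same way a local Popen does. The stage strings and the exit code assigned to each terminal stage are assumptions for illustration, not values taken from LeLab.

```python
from typing import NamedTuple, Optional


class Liveness(NamedTuple):
    alive: bool
    returncode: Optional[int]  # None while alive, like Popen.poll()


# Assumed stage names and exit codes; CANCELED uses -15 to mimic a SIGTERM'd Popen.
TERMINAL_STAGES = {"COMPLETED": 0, "CANCELED": -15, "ERROR": 1, "DELETED": 1}


def stage_to_liveness(stage: str) -> Liveness:
    """Map a remote job stage to the (alive, returncode) pair a local runner reports."""
    code = TERMINAL_STAGES.get(stage.upper())
    if code is None:
        # Any non-terminal stage (e.g. RUNNING) counts as still alive.
        return Liveness(alive=True, returncode=None)
    return Liveness(alive=False, returncode=code)
```

The function checks `code is None` rather than truthiness, so COMPLETED (exit code 0) is correctly reported as finished.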

Net +530 / −591 lines.

Copilot

Pull request overview

This PR ports LeLab to the upcoming LeRobot 0.6.0 (main), bumping the dependency pin from 82dffde to 5ac3b49 and adapting to two breaking API renames, while replacing LeLab's ~330-line bespoke HF-Jobs cloud-training implementation with LeRobot's native remote-training feature (lerobot-train --job.target=<flavor>). It fits into the existing job/runner architecture by extracting a shared SubprocessJobRunner base that both the local and cloud runners now build on.

Changes:

API renames: record.py swaps vcodec= for rgb_encoder=/depth_encoder= on LeRobotDataset.create()/.resume(); train.py renames the eval_freq request field/flag to env_eval_freq.
Cloud runner rewrite: HfCloudJobRunner now spawns a local lerobot-train --job.target subprocess and tails its stdout (shared SubprocessJobRunner pipeline), parsing submission markers for the HF job id/page/model-repo, cancelling via HfApi.cancel_job, and reattaching after reload via HfApi.fetch_job_logs(follow=True) + inspect_job stage mapping.
CLI divergence + registry: cloud builds add --job.target/--job.tags/--save_checkpoint_to_hub and omit --output_dir/--policy.*; JobRegistry persists cloud ids asynchronously (_sync_cloud_ids) and routes cloud checkpoint listing through _list_imported_hub.
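The cloud checkpoint fallback can be expressed as a pure function over a repo file listing, for example the output of HfApi.list_repo_files. list_cloud_checkpoints is hypothetical. The assumption that a root-level model.safetensors marks a usable repo-root model is also not something this PR states.

```python
import re

# Assumptions: checkpoint dirs look like checkpoints/<digits>/..., and a root-level
# model.safetensors marks a model pushed to the repo root.
STEP_DIR = re.compile(r"^checkpoints/(\d+)/")
ROOT_MODEL_FILE = "model.safetensors"


def list_cloud_checkpoints(repo_files: list[str]) -> list[str]:
    """Return checkpoint dirs (oldest step first), else ["."] for a root model, else []."""
    steps = set()
    for path in repo_files:
        match = STEP_DIR.match(path)
        if match:
            steps.add(match.group(1))
    if steps:
        # Sort numerically so "10000" follows "5000" regardless of zero-padding.
        return [f"checkpoints/{step}" for step in sorted(steps, key=int)]
    # No checkpoints/<step>/ tree was pushed: fall back to the repo-root model.
    return ["."] if ROOT_MODEL_FILE in repo_files else []
```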

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.


File	Description
pyproject.toml	Bumps the LeRobot git pin to commit `5ac3b49` (target 0.6.0).
lelab/record.py	Replaces `vcodec` with `rgb_encoder`/`depth_encoder` in dataset create/resume calls.
lelab/train.py	Renames `eval_freq`→`env_eval_freq`; adds `job_target` param and HF-Cloud CLI divergence (job flags, omit output_dir/policy.* on cloud).
lelab/jobs.py	Extracts shared `SubprocessJobRunner`; adds `_sync_cloud_ids`, cloud-aware checkpoint listing, simplified finalize/error message.
lelab/runners/hf_cloud.py	Rewrites cloud runner as a local subprocess tailer with stdout marker parsing, remote cancel, and reattach via the Hub Python API.
tests/test_train.py	Adds tests for `--env_eval_freq` and cloud vs local CLI-builder divergence.
tests/test_runners_hf_cloud.py	Replaces wandb-key tests with marker parsing, reattach streaming, remote cancel, and stage→liveness mapping tests.


Bump the lerobot pin to main (5ac3b49). Two breaking changes:

- record.py: dataset video config moved from a single vcodec string to rgb_encoder/depth_encoder config objects; forward them to LeRobotDataset.create()/.resume().
- train.py: TrainPipelineConfig.eval_freq was renamed to env_eval_freq.

Adopt LeRobot's native HF Jobs feature: cloud training now runs through 'lerobot-train --job.target=<flavor>', spawned as a local subprocess whose stdout LeLab tails (same machinery as local runs). This removes the hand-rolled cloud submission (~330 lines): the in-pod checkpoint sidecar, SSE log tailing, status polling, dataset push, and wandb-secret handling. A shared SubprocessJobRunner base now backs both runners. JobRegistry, the /jobs/* endpoints, and checkpoint discovery are kept as-is.

Add cloud-path tests for the CLI builder and the stdout marker parser.


Copilot AI review requested due to automatic review settings June 30, 2026 10:54

Copilot started reviewing on behalf of nicolas-rabault June 30, 2026 10:55

Copilot AI reviewed Jun 30, 2026


nicolas-rabault force-pushed the port/lerobot-0.6.0 branch from 05dbf74 to 9fa1022 June 30, 2026 13:48

nicolas-rabault added 2 commits June 30, 2026 16:34

fix(training): skip policy-extra preflight for cloud runs

3c382e1

The preflight probes the local interpreter, but hf_cloud jobs run in their own environment, so a missing local package is irrelevant. Gate the check on the local runner so the install dialog never appears for cloud training.
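The gate amounts to returning early for non-local runners. The following is a sketch under several assumptions: the runner is identified by a string such as "local" or "hf_cloud", and the preflight checks importability with importlib.util.find_spec. missing_policy_extras is a hypothetical name.

```python
import importlib.util


def missing_policy_extras(runner: str, required_modules: list[str]) -> list[str]:
    """Hypothetical preflight: top-level modules missing locally, checked only for local runs."""
    if runner != "local":
        # Cloud jobs (e.g. hf_cloud) build their own environment; local packages don't matter.
        return []
    return [name for name in required_modules if importlib.util.find_spec(name) is None]
```

An empty result means the install dialog has nothing to offer. Cloud runs therefore never trigger it, whatever is installed locally.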

nicolas-rabault force-pushed the port/lerobot-0.6.0 branch from 9fa1022 to 3c382e1 June 30, 2026 14:34

nicolas-rabault wants to merge 2 commits into main from port/lerobot-0.6.0
