MCAP-native incident replay for ROS 2 robots. Capture the 60 seconds before a failure, then scrub it like a black box in Foxglove.
When a robot misbehaves, you want the 60 seconds before — what it was seeing, what it was commanding, what state it was in. MissionDebug runs alongside your ROS 2 stack, keeps a rolling buffer of the topics you care about, and writes a standard MCAP file the moment something goes wrong. Detectors fire automatically (stall, path deviation, low battery, topic dropout, or any rule you write in YAML). Open the web UI, click a session, scrub the timeline. Annotate the moment, share a deep-linked URL with a teammate.
Standards-native. Local-first. No cloud, no login, no proprietary format.
https://github.com/mukul-07/missiondebug/raw/main/docs/demo.mp4
| Tool | When to reach for it |
|---|---|
| MissionDebug | After a failure. "What happened in the 60 seconds before it broke?" Auto-trigger on rules, replay in the browser, annotate, share a timestamped link. |
| `rosbag2 --snapshot-mode` | The recording primitive MissionDebug wraps. Same in-memory rolling buffer, but no auto-trigger, no UI, no detection, no retention. Reach for it if you're building your own stack on top. |
| Live-ops tools (e.g. ros2_medkit) | During the incident. "What is the robot doing right now?" Live introspection, fault APIs, remote operations. Different time-shape — complementary, not a substitute. |
| Continuous bag recorders | Keep the deep archive of high-volume topics (point clouds, costmaps). MissionDebug owns the focused, replay-ready layer for the topics engineers actually scrub during incidents. |
MissionDebug is the post-incident layer. It's what you reach for after the alert fires.
Most ROS debugging tools assume you knew to start recording. MissionDebug always has the last 60 seconds of your robot in RAM and snapshots it when things go wrong — manually, or automatically when a detector fires. The agent runs entirely on the robot; nothing leaves the machine unless you copy it off. Useful in defense, hospital, industrial, and other environments where cloud-first observability isn't an option.
Sessions auto-save when a detector fires; the label tells you why. Click one to scrub the timeline.
No ROS install, no source checkout — just Docker. See missiondebug-demos: git clone → docker compose up → scrub the sample_drive fixture in your browser.
You need Ubuntu 22.04 (or 24.04), ROS 2 Humble (or Jazzy), Python 3.10+, Node 20+, pnpm 9+, and tmux.
```bash
git clone https://github.com/mukul-07/missiondebug.git
cd missiondebug
make install
source /opt/ros/humble/setup.bash
MD_FIXTURES=1 make dev
```

Open http://localhost:5173. The session list will already contain a `sample_drive` fixture — click it, scrub the timeline.
The fixture is 30 seconds long with a deliberate stall (8–14s) and a 0.8m path deviation (14–22s). Watch the velocity chart drop, the orange dot freeze, then drift off the green line.
Build the three .debs (Linux only):
```bash
sudo apt install fakeroot dpkg-dev python3-pip python3-venv nodejs
sudo npm install -g pnpm
make package
ls dist/
# missiondebug-agent_1.0.0_<arch>.deb   — captures sessions
# missiondebug-backend_1.0.0_<arch>.deb — API + session index + retention
# missiondebug-web_1.0.0_all.deb        — static UI (backend serves it)
```

Install on the target robot:

```bash
sudo dpkg -i dist/missiondebug-agent_1.0.0_<arch>.deb
sudo dpkg -i dist/missiondebug-backend_1.0.0_<arch>.deb
sudo dpkg -i dist/missiondebug-web_1.0.0_all.deb
```

That's it. All three services start automatically and run at boot:
| Service | Port | Purpose |
|---|---|---|
| `missiondebug-agent` | `127.0.0.1:7000` | Subscribes to ROS topics, writes MCAPs on anomaly |
| `missiondebug-backend` | `0.0.0.0:8000` | Indexes MCAPs, serves UI + API |
| (web — static files served by backend) | — | UI at `http://<robot>:8000` |
Browse to http://<robot>:8000 from any machine on the network. No nginx, no separate web service, no proxy.
The default config captures /cmd_vel, /tf, /plan, and a camera. To match your stack, edit:
```bash
sudo nano /etc/missiondebug/config.yaml
sudo systemctl restart missiondebug-agent
journalctl -u missiondebug-agent -n 30 --no-pager
```

You should see `Subscribed to <topic> [<type>]` for every topic in your config, plus `Loaded N config-driven rule(s)` if you added rules.
For ready-to-edit starting points see examples/:
- `ground-vehicle-config.yaml` — AMRs, delivery bots, indoor service robots
- `drone-config.yaml` — UAV via mavros
- `manipulator-config.yaml` — robot arm + MoveIt2
- `rule-patterns.yaml` — copy-paste cookbook of detector recipes
Whenever a built-in detector or one of your rules fires, the agent saves the previous 60 seconds as an MCAP. Browse to http://<robot>:8000 and your sessions show up at the top of the list, labeled with what triggered them (anomaly:stall, anomaly:my-rule-name, anomaly:dropout:/lidar, etc.).
Click a session → timeline + chart + pose track render. Drag the scrubber, hit space to play, use ←/→ for 100ms steps, Shift+← / Shift+→ for 1s steps. Add notes at the playhead with the + Add at playhead button. Copy a deep-linked ?t=23.4 URL with Copy link to share an exact frame with a teammate.
When the robot does something weird but no rule fired, capture the last 60s yourself:
```bash
curl -X POST http://<robot>:7000/sessions/save -H 'Content-Type: application/json' \
  -d '{"label":"weird-behavior-after-corner"}'
```

Refresh the session list — your snapshot is there with that label.
Edit /etc/missiondebug/config.yaml:
```yaml
anomaly:
  rules:
    - name: my-rule
      topic: /my_topic
      field: data           # dot-path; e.g. "data", "status.status", or "linear.x"
      equals: true          # exactly one of: equals / not_equals / lt / gt / lte / gte
      duration_seconds: 0   # how long the condition must hold (0 = instant)
      cooldown_seconds: 30  # minimum gap between fires
```

Restart with `sudo systemctl restart missiondebug-agent`. To verify the rule loads:
```bash
journalctl -u missiondebug-agent -n 20 --no-pager | grep "Loaded.*rule"
```

To trigger it manually for testing:

```bash
ros2 topic pub --once /my_topic std_msgs/Bool '{data: true}'
ls -lh /var/lib/missiondebug/sessions/   # a new MCAP appears
```

See `examples/rule-patterns.yaml` for recipes covering numeric thresholds, string equals, boolean flags, and actionlib aborts.
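The rule semantics — resolve a dot-path field, apply one comparison operator, require the condition to hold for a duration, then respect a cooldown — can be sketched in a few lines of Python. This is an illustration of what the YAML means, not the agent's actual implementation:

```python
import operator

OPS = {"equals": operator.eq, "not_equals": operator.ne,
       "lt": operator.lt, "gt": operator.gt,
       "lte": operator.le, "gte": operator.ge}

def get_field(msg: dict, dot_path: str):
    """Resolve a dot-path like 'status.status' or 'linear.x' in a message dict."""
    for part in dot_path.split("."):
        msg = msg[part]
    return msg

class Rule:
    def __init__(self, field, op, value, duration_s=0.0, cooldown_s=30.0):
        self.field, self.op, self.value = field, OPS[op], value
        self.duration_s, self.cooldown_s = duration_s, cooldown_s
        self.held_since = None   # when the condition started holding
        self.last_fired = None   # for the cooldown window

    def update(self, msg: dict, now: float) -> bool:
        """Return True when the rule should fire."""
        if not self.op(get_field(msg, self.field), self.value):
            self.held_since = None          # condition broke; reset the hold timer
            return False
        if self.held_since is None:
            self.held_since = now
        held_long_enough = now - self.held_since >= self.duration_s
        off_cooldown = self.last_fired is None or now - self.last_fired >= self.cooldown_s
        if held_long_enough and off_cooldown:
            self.last_fired = now
            return True
        return False

# A hypothetical rule like `field: battery.percentage, lt: 15, duration_seconds: 2`:
rule = Rule("battery.percentage", "lt", 15, duration_s=2.0)
print(rule.update({"battery": {"percentage": 12}}, now=0.0))  # False: hold just started
print(rule.update({"battery": {"percentage": 11}}, now=2.5))  # True: held >= 2 s
```

The hold timer resets whenever the condition stops being true, so `duration_seconds` filters out momentary blips; `cooldown_seconds` stops one sustained condition from spamming sessions.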
The backend deletes oldest sessions when total MCAP bytes exceed MD_MAX_DISK_MB (default: 2048 MB). To change:
```bash
sudo nano /etc/missiondebug/backend.env   # set MD_MAX_DISK_MB=<n>
sudo systemctl restart missiondebug-backend
```

Inspect or force a sweep:

```bash
curl -s http://localhost:8000/api/admin/disk
# {"used_bytes": ..., "used_mb": ..., "cap_mb": 2048, "cap_enabled": true, "session_count": 14}
curl -s -X POST http://localhost:8000/api/admin/sweep
# {"deleted_ids": [...], "bytes_freed": ..., "bytes_after": ..., "cap_bytes": ...}
```
```bash
# Health
sudo systemctl status missiondebug-agent missiondebug-backend
journalctl -u missiondebug-agent -f     # tail capture logs
journalctl -u missiondebug-backend -f   # tail backend/UI logs

# Inspect captures
ls -lh /var/lib/missiondebug/sessions/
curl -s http://localhost:8000/api/sessions | jq '.sessions[0:3]'

# Trigger a stall manually (publishes zero cmd_vel for 6s)
timeout 6 ros2 topic pub -r 10 /cmd_vel geometry_msgs/Twist '{linear: {x: 0.0}}'
```

- **Agent + your shell on different ROS graphs** — if `ros2 node list` from your shell can't see `/missiondebug_agent`, you have a DDS isolation issue (a different `RMW_IMPLEMENTATION` between the systemd service and your shell). Switch both to Cyclone DDS:

  ```bash
  sudo apt install -y ros-humble-rmw-cyclonedds-cpp
  sudo systemctl edit missiondebug-agent
  # Add:
  # [Service]
  # Environment=RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  echo 'export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp' >> ~/.bashrc
  sudo systemctl restart missiondebug-agent
  ```

- **No sessions appearing** — verify the topics in your config exist (`ros2 topic list`), the rule loaded (`journalctl -u missiondebug-agent | grep Loaded`), and the condition is actually being met. Try the manual stall trigger above first.
- **ROS 1 + ROS 2 env mixed** — if your shell shows `ROS_MASTER_URI` alongside `ROS_DISTRO=humble`, your `~/.bashrc` is sourcing both. Comment out the Noetic line.
Both services expose interactive Swagger UI at /docs and OpenAPI JSON at /openapi.json:
```bash
open http://<robot>:7000/docs   # agent (capture)
open http://<robot>:8000/docs   # backend (sessions, files, annotations, admin)
```

Trigger a session save from your existing monitoring with one POST:

```bash
curl -X POST http://<robot>:7000/sessions/save \
  -H 'Content-Type: application/json' \
  -d '{"label":"alertmanager:critical-latency"}'
```

Full endpoint reference + curl cookbook: `docs/API.md`.
MissionDebug captures session data when its built-in detectors fire — but your existing monitoring stack already detects plenty of things the built-in detectors don't. Point those external alerts at the agent's save endpoint and you get root-cause replay for free:
- **Generic webhook** — `curl -X POST .../sessions/save -d '{"label":"..."}'` from any monitoring tool
- **Prometheus Alertmanager** — webhook receiver plus a small shim that turns `alerts[]` into a label
- **ros2_medkit Triggers** — bridge script that subscribes to medkit's SSE event stream and forwards triggers
Full recipes + working scripts: docs/INTEGRATIONS.md.
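As one concrete shape for the Alertmanager shim, here's a hedged, stdlib-only sketch: a tiny webhook receiver that flattens the incoming `alerts[]` into a single label and forwards it to the agent's `/sessions/save` endpoint. The `alerts[].labels.alertname` fields follow Alertmanager's webhook payload format; the agent URL, port 9095, and the label scheme are assumptions — the maintained recipes live in `docs/INTEGRATIONS.md`:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

AGENT_URL = "http://127.0.0.1:7000/sessions/save"  # MissionDebug agent (assumed local)

def alerts_to_label(payload: dict) -> str:
    """Turn an Alertmanager webhook payload into a single session label."""
    names = [a.get("labels", {}).get("alertname", "unknown")
             for a in payload.get("alerts", [])]
    return "alertmanager:" + ",".join(names or ["unknown"])

def forward(label: str) -> None:
    """POST the label to the agent so it snapshots the last 60 s."""
    req = urllib.request.Request(
        AGENT_URL,
        data=json.dumps({"label": label}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

class Shim(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        forward(alerts_to_label(json.loads(body or b"{}")))
        self.send_response(200)
        self.end_headers()

# To run the shim (point Alertmanager's webhook_config at this port):
#   HTTPServer(("0.0.0.0", 9095), Shim).serve_forever()
```

One firing alert group then becomes one labeled session, so the replay shows up in the session list already named after whatever paged you.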
- **Agent** (Python, `agent/`) — rclpy subscribers → per-topic ring buffers in RAM (rate-limited & sized) → MCAP writer → control HTTP API on `:7000`. Built-in detectors (stall, path-deviation, battery_low, topic_dropout) plus a config-driven rule engine; every detector auto-saves and labels the resulting session.
- **Backend** (FastAPI + SQLite, `backend/`) — auto-rescans the sessions directory every 5 s, indexes MCAP metadata, and serves files with HTTP range support so the browser can stream them. A disk-retention sweeper runs every 30 s. Mounts the web UI's static dist at `/` when present.
- **Web** (React + Vite, `web/`) — a Web Worker decodes the MCAP using `@foxglove/rosmsg2-serialization`, rendering synchronized video / pose / scalar tracks (one chart per numeric topic, filterable) and a JSON inspector at the playhead. Annotations are stored server-side; URLs are deep-linkable with `?t=23.4`.
Specs:
- `SPEC.md` — v0 (record + replay loop, single robot, localhost)
- `v1-SPEC.md` — v1 (path-deviation, annotations, share links, `.deb`, fixture)
- `v1.5-SPEC.md` — v1.5 (config-driven rules, topic dropout, disk retention, full backend/web `.deb`s)
- `v2-SPEC.md` — v2 Fleet Edition (central hub, agent→hub sync, auth, S3 upload, fleet-ready topic rendering)
```bash
make test   # 87 tests across agent + backend, ~1s
```

MIT — see LICENSE.

