v1.109 — Monte-Carlo dispersion campaigns + named closed-loop safety gate by avrabe · Pull Request #248 · pulseengine/relay

avrabe · 2026-07-01T16:22:22Z

Simulation as a first-class, extensive verification layer

The formal proofs here are strong (142 Kani proofs plus the Lean Lyapunov chain), but the simulation layer was thin: ~25 short deterministic point-tests, each exercising a single condition. This PR turns the closed-loop safety tests into large randomised dispersion campaigns, following the NASA/POST2 discipline: seed each trial, sweep a dispersion vector, and reduce the results to a pass rate plus worst-case margins.
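The "reduce to a pass rate plus worst-case margins" step can be sketched as a fold over per-trial outcomes. This is a minimal illustration, not the code in campaign.rs: `TrialOutcome`, `CampaignSummary`, `reduce`, `check_regression` and `report_line` are hypothetical names, and the bounds passed to `check_regression` are whatever a campaign pins just above its measured worst case.

```rust
/// One dispersed trial's result (hypothetical shape, not campaign.rs's type).
#[derive(Debug, Clone, Copy)]
pub struct TrialOutcome {
    pub index: u64,      // trial index; with per-trial seeding this is enough to replay it
    pub peak_tilt: f64,  // rad, worst tilt seen during the trial
    pub final_tilt: f64, // rad, tilt at the end of the trial
    pub passed: bool,    // did the safety invariant hold throughout?
}

/// Campaign-level reduction: failures plus worst-case margins.
#[derive(Debug)]
pub struct CampaignSummary {
    pub trials: usize,
    pub failures: Vec<u64>, // indices of failing trials, kept for replay
    pub worst_peak_tilt: f64,
    pub worst_final_tilt: f64,
}

impl CampaignSummary {
    /// Fraction of trials that passed (0.0 for an empty campaign: no evidence).
    pub fn pass_rate(&self) -> f64 {
        if self.trials == 0 {
            return 0.0;
        }
        (self.trials - self.failures.len()) as f64 / self.trials as f64
    }
}

/// Fold per-trial outcomes into one summary; margins are maxima over all trials.
pub fn reduce(outcomes: &[TrialOutcome]) -> CampaignSummary {
    let mut s = CampaignSummary {
        trials: outcomes.len(),
        failures: Vec::new(),
        worst_peak_tilt: 0.0,
        worst_final_tilt: 0.0,
    };
    for o in outcomes {
        if !o.passed {
            s.failures.push(o.index);
        }
        s.worst_peak_tilt = s.worst_peak_tilt.max(o.peak_tilt);
        s.worst_final_tilt = s.worst_final_tilt.max(o.final_tilt);
    }
    s
}

/// Regression gate: any failing trial, or a worst case drifting above a bound
/// pinned just above the previously measured worst case, is an error.
pub fn check_regression(s: &CampaignSummary, peak_bound: f64, final_bound: f64) -> Result<(), String> {
    if let Some(first) = s.failures.first() {
        return Err(format!("{} failing trials, first index {}", s.failures.len(), first));
    }
    if s.worst_peak_tilt > peak_bound {
        return Err(format!("worst peak tilt {:.3} rad exceeds bound {:.3} rad", s.worst_peak_tilt, peak_bound));
    }
    if s.worst_final_tilt > final_bound {
        return Err(format!("worst final tilt {:.3} rad exceeds bound {:.3} rad", s.worst_final_tilt, final_bound));
    }
    Ok(())
}

/// One human-readable line for the CI log, for trend-watching.
pub fn report_line(name: &str, s: &CampaignSummary) -> String {
    format!(
        "{}: {}/{} passed, worst peak tilt {:.3} rad, worst final tilt {:.3} rad",
        name,
        s.trials - s.failures.len(),
        s.trials,
        s.worst_peak_tilt,
        s.worst_final_tilt
    )
}
```

Keeping failing indices, rather than only a count, is what makes a red CI run actionable: the first index plus the campaign seed identifies exactly which trial to rerun.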

Framing from the research note: PX4/ArduPilot don't run dispersed Monte-Carlo in CI; their "thousands of hours" come from field flights plus EKF log replay. Adopting the aerospace dispersion-deck discipline directly is a differentiator, not a copy.

Three campaigns — 6000 dispersed trials/run, all 0 failures

examples/falcon-sitl-gz/src/campaign.rs runs the same verified cascade (relay-geo GeoAtt + relay-mix-quad + relay-iekf FDI) across thousands of seeded-random trials:

| campaign | trials | worst peak tilt (rad) | worst final tilt (rad) | note |
|---|---|---|---|---|
| `motor_out_monte_carlo_campaign` | 2000 | 0.837 | 0.097 | random rotor fails at random time from random IC; FDI isolates + reconfigures; no tumble, settles upright; MIX-P08 holds; FDI detect 2 ms |
| `att_stab_monte_carlo_campaign` | 2000 | 0.554 | ~0 | recover to level from random tilt (≤0.5 rad) + rate (≤1 rad/s) |
| `hexa_monte_carlo_campaign` | 2000 | 0.523 | ~0 | same geometric controller on a 6-rotor airframe |
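To make "a trial" concrete: each one maps a single dispersed initial condition to the two numbers the table reports, peak tilt and final tilt. The sketch below deliberately swaps the relay-geo / relay-mix-quad / relay-iekf cascade for a toy single-axis plant (a unit-inertia double integrator under a critically damped PD law); the plant, the gains, the 1 ms step and the 5 s horizon are all assumptions, and `simulate_tilt_recovery` is a hypothetical name.

```rust
/// Result of one toy trial: the quantities the campaign table reports.
pub struct TiltTrial {
    pub peak_tilt: f64,  // rad, max |tilt| over the trial
    pub final_tilt: f64, // rad, |tilt| at the end of the horizon
}

/// Toy stand-in for one closed-loop trial (NOT the relay cascade):
/// single-axis tilt as a unit-inertia double integrator, PD-stabilised.
pub fn simulate_tilt_recovery(tilt0: f64, rate0: f64) -> TiltTrial {
    const KP: f64 = 25.0; // natural frequency sqrt(KP) = 5 rad/s (assumed)
    const KD: f64 = 10.0; // damping ratio KD / (2 * sqrt(KP)) = 1, critically damped
    const DT: f64 = 1e-3; // integration step, s
    const HORIZON: f64 = 5.0; // trial length, s
    let steps = (HORIZON / DT).round() as usize;

    let (mut theta, mut omega) = (tilt0, rate0);
    let mut peak = theta.abs();
    for _ in 0..steps {
        let torque = -KP * theta - KD * omega; // PD law; unit inertia, so torque = angular accel
        omega += torque * DT; // semi-implicit Euler: update the rate first...
        theta += omega * DT; // ...then integrate tilt with the new rate
        peak = peak.max(theta.abs());
    }
    TiltTrial { peak_tilt: peak, final_tilt: theta.abs() }
}
```

An initial rate pointing away from level pushes the peak above the initial tilt before the controller pulls it back, which is why a campaign tracks worst peak tilt separately from final tilt.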

Discipline:
- hierarchical SplitMix64 seeding (`trial_rng(seed, i)`): any failure is replayable from its index;
- recoverable-envelope rejection sampling, so a FAIL is a real controller bug, never "the IC was physically unrecoverable";
- tight regression bounds set just above the measured worst case, giving early warning long before the physical safety bound.
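The seeding and sampling discipline can be sketched as follows. SplitMix64 and the name `trial_rng(seed, i)` come from this PR, but how campaign.rs derives the per-trial seed is not shown here: the derivation below (trial `i` is seeded from the `i`-th output of a root SplitMix64 stream, computed in O(1)) is an assumption, as are `sample_initial_condition` and the elliptical envelope inside the ≤0.5 rad / ≤1 rad/s box quoted for att_stab.

```rust
/// SplitMix64 additive constant (golden-ratio increment).
const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// SplitMix64 output function: scrambles one 64-bit state value.
fn mix64(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Minimal SplitMix64 generator.
#[derive(Clone, Debug)]
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(GAMMA);
        mix64(self.state)
    }
    /// Uniform f64 in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
    /// Uniform f64 in [lo, hi).
    pub fn uniform(&mut self, lo: f64, hi: f64) -> f64 {
        lo + (hi - lo) * self.next_f64()
    }
}

/// Hierarchical seeding (assumed derivation): the i-th output of a root stream
/// started at `seed` is mix64(seed + (i + 1) * GAMMA), so trial i's generator
/// can be rebuilt directly from (seed, i) without running trials 0..i.
pub fn trial_rng(seed: u64, i: u64) -> SplitMix64 {
    SplitMix64::new(mix64(seed.wrapping_add(i.wrapping_add(1).wrapping_mul(GAMMA))))
}

/// Hypothetical recoverable envelope: an ellipse inscribed in the
/// |tilt| <= 0.5 rad, |rate| <= 1 rad/s box.
pub fn in_envelope(tilt: f64, rate: f64) -> bool {
    (tilt / 0.5).powi(2) + (rate / 1.0).powi(2) <= 1.0
}

/// Rejection sampling: propose uniformly from the bounding box and keep only
/// proposals inside the envelope (acceptance rate about pi/4 for an ellipse).
pub fn sample_initial_condition(rng: &mut SplitMix64) -> (f64, f64) {
    loop {
        let tilt = rng.uniform(-0.5, 0.5);
        let rate = rng.uniform(-1.0, 1.0);
        if in_envelope(tilt, rate) {
            return (tilt, rate);
        }
    }
}
```

Because the rejected proposals are drawn from the trial's own stream, replaying trial `i` reproduces the same rejections and therefore the same accepted initial condition; accepted points stay uniform over the envelope, so no part of it is under-sampled.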

Named gate — elevating the safety suite

New ci.yml job "Closed-loop simulation + Monte-Carlo campaigns" runs `cargo test -p falcon-sitl-gz -- --nocapture`. This makes the top-of-the-V integration tests (fault_tolerance / shield / mission / hexa) and the campaigns a first-class, visible required check. They already ran inside the generic `cargo test --workspace`, but a dedicated gate keeps the most safety-critical tests from being silently narrowed out of scope (the recurring orphaned-verification failure mode) and surfaces the worst-case-margin lines in the CI log for trend-watching.

Verification

All 3 campaigns: 2000 trials each, 0 failures. Full falcon-sitl-gz suite: 22/22. campaign.rs clippy-clean.
rivet validate PASS. FV-FALCON-SIMMC-001 verifies FAULT-P02 / ATT-P01 / MIXMULTI-P01.

Next slices (scoped in the artifact)

Add sensor-noise / wind / mass-inertia dispersion plus full 6-DOF Gazebo physics, and wind→RTL / runaway→terminate fail-safe campaigns (three-way envelope classification). Also planned: the recordable --scenario=rotor-out flight for video.

🤖 Generated with Claude Code

…oop safety gate

Turns single-condition closed-loop point-tests into large randomised DISPERSION campaigns (NASA/POST2 discipline: seed per trial, sweep, reduce to pass-rate + worst-case margins). examples/falcon-sitl-gz/src/campaign.rs runs the SAME verified cascade (relay-geo + relay-mix-quad + relay-iekf FDI) across thousands of seeded-random trials, asserting the safety invariant across all of them.

Three campaigns, 6000 dispersed trials/run, all 0 failures:
- motor_out_monte_carlo_campaign (2000): random rotor fails at random time from random tilt/rate → FDI isolates correct rotor, MIX-P08 holds, no tumble, settles upright. Worst peak 0.837 rad, final 0.097 rad, detect 2 ms.
- att_stab_monte_carlo_campaign (2000): recover to level from random tilt/rate.
- hexa_monte_carlo_campaign (2000): same geometric controller on 6 rotors.

Reproducibility via hierarchical SplitMix64 (trial_rng(seed, i): any failure replayable from its index); recoverable-envelope rejection sampling so a FAIL is a real bug, not "physics said no"; tight regression bounds just above measured worst case.

NAMED GATE: new ci.yml job "Closed-loop simulation + Monte-Carlo campaigns" runs cargo test -p falcon-sitl-gz -- --nocapture. It elevates the top-of-V integration tests (fault_tolerance/shield/mission/hexa) + the campaigns to a first-class visible required check, so the most safety-critical tests can't be silently narrowed out of scope.

FV-FALCON-SIMMC-001 (verifies FAULT-P02/ATT-P01/MIXMULTI-P01). Research note grounded the dispersion design (PX4/ArduPilot approximate MC via field flights + log replay; falcon adopts the aerospace dispersion-deck discipline directly).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

avrabe merged commit 597780c into main Jul 1, 2026
96 of 98 checks passed

avrabe deleted the feat/monte-carlo-sim-campaigns branch July 1, 2026 16:48

avrabe merged 1 commit into main from feat/monte-carlo-sim-campaigns