v1.109 — Monte-Carlo dispersion campaigns + named closed-loop safety gate #248

Merged
avrabe merged 1 commit into main from feat/monte-carlo-sim-campaigns
Jul 1, 2026
@avrabe avrabe commented Jul 1, 2026

Simulation as a first-class, extensive verification layer

The formal proofs here are strong (142 Kani + the Lean Lyapunov chain), but the simulation layer was thin: ~25 short deterministic point-tests, each a single condition. This PR turns the closed-loop safety tests into large randomised dispersion campaigns — the NASA/POST2 discipline (seed per trial, sweep a dispersion vector, reduce to a pass-rate + worst-case margins).

Framing from the research note: PX4/ArduPilot don't run dispersed Monte-Carlo in CI — their "thousands of hours" is field flights + EKF log-replay. Adopting the aerospace dispersion-deck discipline directly is a differentiator, not a copy.

Three campaigns — 6000 dispersed trials/run, all 0 failures

examples/falcon-sitl-gz/src/campaign.rs runs the same verified cascade (relay-geo GeoAtt + relay-mix-quad + relay-iekf FDI) across thousands of seeded-random trials:

| campaign | trials | worst peak tilt | worst final tilt | note |
|---|---|---|---|---|
| motor_out_monte_carlo_campaign | 2000 | 0.837 rad | 0.097 rad | random rotor fails at random time from random IC; FDI isolates + reconfigures; no tumble, settles upright; MIX-P08 holds; FDI detect 2 ms |
| att_stab_monte_carlo_campaign | 2000 | 0.554 rad | ~0 | recover to level from random tilt (≤0.5 rad) + rate (≤1 rad/s) |
| hexa_monte_carlo_campaign | 2000 | 0.523 rad | ~0 | same geometric controller on a 6-rotor airframe |
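
For readers who have not seen campaign.rs, here is a minimal sketch of how a campaign could reduce per-trial results to a failure count plus worst-case margins. Everything in it is an assumption for illustration: `TrialOutcome`, `CampaignSummary`, `reduce`, `toy_trial` and `run_grid_campaign` are hypothetical names, and the "plant" is a single-axis critically damped toy loop, not the relay-geo / relay-mix-quad cascade. A fixed grid stands in for the seeded random draws so the block stays self-contained.

```rust
/// Result of one closed-loop trial (hypothetical shape, illustration only).
#[derive(Debug, Clone, Copy)]
struct TrialOutcome {
    peak_tilt: f64,  // largest |tilt| seen during the trial, rad
    final_tilt: f64, // |tilt| at the end of the trial, rad
}

/// Campaign-level reduction: failure count plus worst-case margins.
#[derive(Debug, Default)]
struct CampaignSummary {
    trials: usize,
    failures: usize,
    worst_peak_tilt: f64,
    worst_final_tilt: f64,
    worst_trial: Option<usize>, // index of the trial with the worst peak
}

/// Fold trial outcomes into a summary. A trial fails if either margin
/// exceeds its bound; `!(x <= bound)` makes a NaN count as a failure too.
fn reduce(
    outcomes: impl IntoIterator<Item = TrialOutcome>,
    peak_bound: f64,
    final_bound: f64,
) -> CampaignSummary {
    let mut s = CampaignSummary::default();
    for (i, o) in outcomes.into_iter().enumerate() {
        s.trials += 1;
        if !(o.peak_tilt <= peak_bound) || !(o.final_tilt <= final_bound) {
            s.failures += 1;
        }
        if o.peak_tilt > s.worst_peak_tilt {
            s.worst_peak_tilt = o.peak_tilt;
            s.worst_trial = Some(i);
        }
        s.worst_final_tilt = s.worst_final_tilt.max(o.final_tilt);
    }
    s
}

/// Toy stand-in for the real cascade: one axis, PD control with
/// kd = 2*sqrt(kp) (critically damped), semi-implicit Euler, 5 s at 1 kHz.
fn toy_trial(tilt0: f64, rate0: f64) -> TrialOutcome {
    let (kp, kd, dt) = (16.0, 8.0, 0.001);
    let (mut tilt, mut rate) = (tilt0, rate0);
    let mut peak = tilt.abs();
    for _ in 0..5000 {
        rate += (-kp * tilt - kd * rate) * dt;
        tilt += rate * dt;
        peak = peak.max(tilt.abs());
    }
    TrialOutcome { peak_tilt: peak, final_tilt: tilt.abs() }
}

/// Sweep a small grid of initial conditions (tilt 0..0.5 rad,
/// rate -1..1 rad/s) and reduce the outcomes against the given bounds.
fn run_grid_campaign(peak_bound: f64, final_bound: f64) -> CampaignSummary {
    let mut outcomes = Vec::new();
    for i in 0..=10 {
        for j in -10..=10 {
            outcomes.push(toy_trial(i as f64 * 0.05, j as f64 * 0.1));
        }
    }
    reduce(outcomes, peak_bound, final_bound)
}

#[test]
fn grid_campaign_stays_inside_bounds() {
    // Bounds here are arbitrary toy values, not the PR's regression bounds.
    let s = run_grid_campaign(0.6, 0.01);
    println!(
        "trials={} failures={} worst_peak={:.3} rad (trial {:?}) worst_final={:.2e} rad",
        s.trials, s.failures, s.worst_peak_tilt, s.worst_trial, s.worst_final_tilt
    );
    assert_eq!(s.failures, 0);
}
```

Printing the summary line, as the test does, is what makes the worst-case margins visible in a `--nocapture` log. Recording the index of the worst trial is a design choice that pairs naturally with per-trial seeding, since that one trial can then be replayed in isolation.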

Discipline: hierarchical SplitMix64 seeding (trial_rng(seed, i) — any failure replayable from its index); recoverable-envelope rejection sampling so a FAIL is a real controller bug, never "the IC was physically unrecoverable"; tight regression bounds set just above the measured worst case (early-warning long before the physical safety bound).
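
The seeding and sampling discipline can be sketched as follows. The SplitMix64 step is the standard published algorithm. The rest is an assumption: this body of `trial_rng` is one plausible way to derive a per-trial stream from `(seed, i)` and is not taken from campaign.rs, while `InitialCondition`, `recoverable` and `sample_ic` are hypothetical names with a made-up envelope predicate.

```rust
/// Additive constant of the standard SplitMix64 generator.
const GOLDEN_GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// SplitMix64: a small 64-bit generator, commonly used for seeding.
#[derive(Debug, Clone)]
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// One standard SplitMix64 step: advance the state, then mix it.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(GOLDEN_GAMMA);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Uniform f64 in [lo, hi).
    fn uniform(&mut self, lo: f64, hi: f64) -> f64 {
        lo + (hi - lo) * self.next_f64()
    }
}

/// Hierarchical seeding (assumed scheme): mix the campaign seed with the
/// trial index, then use one output of that parent as the child's seed.
/// The same (seed, trial) pair always yields the same stream.
fn trial_rng(seed: u64, trial: u64) -> SplitMix64 {
    let mut parent = SplitMix64::new(seed ^ trial.wrapping_mul(GOLDEN_GAMMA));
    SplitMix64::new(parent.next_u64())
}

/// Dispersed initial condition for one trial (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct InitialCondition {
    tilt: f64, // rad
    rate: f64, // rad/s
}

/// Hypothetical recoverable-envelope predicate: a high tilt combined with
/// a high rate is treated as outside what the controller must recover.
fn recoverable(ic: &InitialCondition) -> bool {
    ic.tilt + 0.25 * ic.rate.abs() <= 0.6
}

/// Rejection sampling: redraw until the initial condition lies inside
/// the envelope, so every accepted trial is one that should pass.
fn sample_ic(rng: &mut SplitMix64) -> InitialCondition {
    loop {
        let ic = InitialCondition {
            tilt: rng.uniform(0.0, 0.5),
            rate: rng.uniform(-1.0, 1.0),
        };
        if recoverable(&ic) {
            return ic;
        }
    }
}

#[test]
fn trial_is_replayable_from_its_index() {
    // Replaying trial 7 of campaign seed 42 reproduces the same draw.
    let first = sample_ic(&mut trial_rng(42, 7));
    let replay = sample_ic(&mut trial_rng(42, 7));
    assert_eq!(first, replay);
}
```

Deriving each trial's generator from the index, rather than drawing all trials from one shared stream, means a failing trial can be rerun alone without replaying the trials before it. The rejection loop terminates quickly in this sketch because most of the sampling box satisfies the toy envelope; a tighter envelope would call for a capped retry count.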

Named gate — elevating the safety suite

New ci.yml job "Closed-loop simulation + Monte-Carlo campaigns" runs cargo test -p falcon-sitl-gz --nocapture. This makes the top-of-the-V integration tests (fault_tolerance / shield / mission / hexa) and the campaigns a first-class, visible required check — they already ran inside the generic cargo test --workspace, but a dedicated gate keeps the most safety-critical tests from being silently narrowed out of scope (the recurring orphaned-verification failure mode), and surfaces the worst-case-margin lines in the CI log for trend-watching.

Verification

  • All 3 campaigns: 2000 trials each, 0 failures. Full falcon-sitl-gz suite: 22/22. campaign.rs clippy-clean.
  • rivet validate PASS. FV-FALCON-SIMMC-001 verifies FAULT-P02 / ATT-P01 / MIXMULTI-P01.

Next slices (scoped in the artifact)

Add sensor-noise / wind / mass-inertia dispersion + full 6-DOF Gazebo physics, and wind→RTL / runaway→terminate fail-safe campaigns (three-way envelope classification). Plus the --scenario=rotor-out recordable flight for video.

🤖 Generated with Claude Code

…oop safety gate

Turns single-condition closed-loop point-tests into large randomised DISPERSION
campaigns (NASA/POST2 discipline: seed per trial, sweep, reduce to pass-rate +
worst-case margins). examples/falcon-sitl-gz/src/campaign.rs runs the SAME
verified cascade (relay-geo + relay-mix-quad + relay-iekf FDI) across thousands
of seeded-random trials, asserting the safety invariant across all of them.

Three campaigns, 6000 dispersed trials/run, all 0 failures:
- motor_out_monte_carlo_campaign (2000): random rotor fails at random time from
  random tilt/rate → FDI isolates correct rotor, MIX-P08 holds, no tumble,
  settles upright. Worst peak 0.837 rad, final 0.097, detect 2 ms.
- att_stab_monte_carlo_campaign (2000): recover to level from random tilt/rate.
- hexa_monte_carlo_campaign (2000): same geometric controller on 6 rotors.

Reproducibility via hierarchical SplitMix64 (trial_rng(seed, i) — any failure
replayable from its index); recoverable-envelope rejection-sampling so a FAIL is
a real bug not "physics said no"; tight regression bounds just above measured
worst case.

NAMED GATE: new ci.yml job "Closed-loop simulation + Monte-Carlo campaigns"
runs cargo test -p falcon-sitl-gz --nocapture — elevates the top-of-V
integration tests (fault_tolerance/shield/mission/hexa) + the campaigns to a
first-class visible required check, so the most safety-critical tests can't be
silently narrowed out of scope.

FV-FALCON-SIMMC-001 (verifies FAULT-P02/ATT-P01/MIXMULTI-P01). Research note
grounded the dispersion design (PX4/ArduPilot approximate MC via field flights
+ log replay; falcon adopts the aerospace dispersion-deck discipline directly).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit 597780c into main Jul 1, 2026
96 of 98 checks passed
@avrabe avrabe deleted the feat/monte-carlo-sim-campaigns branch July 1, 2026 16:48