v1.109: Monte-Carlo dispersion campaigns + named closed-loop safety gate #248
Merged
…oop safety gate
Turns single-condition closed-loop point-tests into large randomised DISPERSION campaigns (NASA/POST2 discipline: seed per trial, sweep, reduce to pass-rate + worst-case margins). examples/falcon-sitl-gz/src/campaign.rs runs the SAME verified cascade (relay-geo + relay-mix-quad + relay-iekf FDI) across thousands of seeded-random trials, asserting the safety invariant across all of them.
Three campaigns, 6000 dispersed trials/run, all 0 failures:
- motor_out_monte_carlo_campaign (2000): random rotor fails at random time from random tilt/rate → FDI isolates correct rotor, MIX-P08 holds, no tumble, settles upright. Worst peak 0.837 rad, final 0.097, detect 2 ms.
- att_stab_monte_carlo_campaign (2000): recover to level from random tilt/rate.
- hexa_monte_carlo_campaign (2000): same geometric controller on 6 rotors.
Reproducibility via hierarchical SplitMix64 (trial_rng(seed, i): any failure is replayable from its index); recoverable-envelope rejection sampling so a FAIL is a real bug, not "physics said no"; tight regression bounds just above the measured worst case.
NAMED GATE: new ci.yml job "Closed-loop simulation + Monte-Carlo campaigns" runs cargo test -p falcon-sitl-gz --nocapture. It elevates the top-of-V integration tests (fault_tolerance/shield/mission/hexa) and the campaigns to a first-class, visible required check, so the most safety-critical tests can't be silently narrowed out of scope.
FV-FALCON-SIMMC-001 (verifies FAULT-P02/ATT-P01/MIXMULTI-P01).
Research note grounded the dispersion design (PX4/ArduPilot approximate MC via field flights + log replay; falcon adopts the aerospace dispersion-deck discipline directly).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
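The seeding and sampling discipline named in the commit message can be illustrated with a minimal sketch. This is not the code in campaign.rs: only the name trial_rng(seed, i) comes from the message, while the mixing of seed and index, the InitialCondition type, the sampling ranges, and the is_recoverable predicate are hypothetical stand-ins chosen for illustration. The SplitMix64 step itself follows the standard public-domain algorithm.

```rust
/// SplitMix64 generator (standard public-domain algorithm).
#[derive(Clone, Debug)]
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state by the golden-ratio increment and mix it.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Uniform f64 in [lo, hi).
    pub fn uniform(&mut self, lo: f64, hi: f64) -> f64 {
        lo + (hi - lo) * self.next_f64()
    }
}

/// ASSUMPTION: one possible hierarchical derivation. A parent stream is keyed
/// by (campaign seed, trial index) and its first output seeds the trial's own
/// stream, so trial `i` can be replayed without running trials 0..i.
pub fn trial_rng(seed: u64, i: u64) -> SplitMix64 {
    let mut parent = SplitMix64::new(seed ^ i.wrapping_mul(0x9E37_79B9_7F4A_7C15));
    SplitMix64::new(parent.next_u64())
}

/// Hypothetical dispersed initial condition (illustrative fields only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InitialCondition {
    pub tilt_rad: f64,
    pub rate_rad_s: f64,
}

/// ASSUMPTION: placeholder envelope. The real recoverable envelope is not
/// stated in the source; this linear bound exists only to show the pattern.
pub fn is_recoverable(ic: &InitialCondition) -> bool {
    ic.tilt_rad.abs() + 0.1 * ic.rate_rad_s.abs() <= 0.6
}

/// Rejection sampling: redraw until the initial condition lies inside the
/// recoverable envelope, so a later FAIL cannot be blamed on the draw.
pub fn sample_ic(rng: &mut SplitMix64) -> InitialCondition {
    loop {
        let ic = InitialCondition {
            tilt_rad: rng.uniform(-0.8, 0.8),
            rate_rad_s: rng.uniform(-3.0, 3.0),
        };
        if is_recoverable(&ic) {
            return ic;
        }
    }
}
```

Under these assumptions, calling sample_ic(&mut trial_rng(seed, i)) twice with the same seed and index yields the same initial condition, which is the property that makes a failing trial replayable from its index alone.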
Simulation as a first-class, extensive verification layer
The formal proofs here are strong (142 Kani + the Lean Lyapunov chain), but the simulation layer was thin: ~25 short deterministic point-tests, each a single condition. This PR turns the closed-loop safety tests into large randomised dispersion campaigns — the NASA/POST2 discipline (seed per trial, sweep a dispersion vector, reduce to a pass-rate + worst-case margins).
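As a rough illustration of the "seed per trial, reduce to a pass-rate + worst-case margins" pattern, the sketch below runs a closure once per trial index and folds the outcomes into a summary. The types TrialOutcome and CampaignSummary, the run_campaign function, and the choice of peak tilt as the tracked margin are assumptions made for this example, not the structures used in the PR.

```rust
/// Hypothetical per-trial result: did the safety invariant hold, and how
/// close did the trial come to the limit (peak tilt magnitude, in radians).
#[derive(Clone, Copy, Debug)]
pub struct TrialOutcome {
    pub passed: bool,
    pub peak_tilt_rad: f64,
}

/// Hypothetical reduction of a whole campaign.
#[derive(Clone, Debug)]
pub struct CampaignSummary {
    pub trials: u64,
    /// Indices of failing trials, kept so each can be replayed on its own.
    pub failures: Vec<u64>,
    /// Largest peak tilt seen in any trial (0.0 if no trial exceeded zero).
    pub worst_peak_tilt_rad: f64,
    /// Index of the first trial that produced the worst peak.
    pub worst_index: Option<u64>,
}

impl CampaignSummary {
    /// Fraction of trials that passed (NaN for an empty campaign).
    pub fn pass_rate(&self) -> f64 {
        (self.trials - self.failures.len() as u64) as f64 / self.trials as f64
    }
}

/// Run `trials` independent trials. The closure receives (seed, index) and is
/// expected to derive all of its randomness from that pair.
pub fn run_campaign<F>(seed: u64, trials: u64, mut run_trial: F) -> CampaignSummary
where
    F: FnMut(u64, u64) -> TrialOutcome,
{
    let mut summary = CampaignSummary {
        trials,
        failures: Vec::new(),
        worst_peak_tilt_rad: 0.0,
        worst_index: None,
    };
    for i in 0..trials {
        let outcome = run_trial(seed, i);
        if !outcome.passed {
            summary.failures.push(i);
        }
        if outcome.peak_tilt_rad > summary.worst_peak_tilt_rad {
            summary.worst_peak_tilt_rad = outcome.peak_tilt_rad;
            summary.worst_index = Some(i);
        }
    }
    summary
}
```

A campaign test built on this shape would typically assert that failures is empty and that worst_peak_tilt_rad stays below a regression bound chosen just above the measured worst case, printing the worst index so the offending trial can be rerun in isolation.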
Three campaigns — 6000 dispersed trials/run, all 0 failures
examples/falcon-sitl-gz/src/campaign.rs runs the same verified cascade (relay-geo GeoAtt + relay-mix-quad + relay-iekf FDI) across thousands of seeded-random trials:
- motor_out_monte_carlo_campaign
- att_stab_monte_carlo_campaign
- hexa_monte_carlo_campaign
Discipline: hierarchical SplitMix64 seeding (trial_rng(seed, i): any failure is replayable from its index); recoverable-envelope rejection sampling, so a FAIL is a real controller bug, never "the IC was physically unrecoverable"; and tight regression bounds set just above the measured worst case (an early warning long before the physical safety bound).
Named gate: elevating the safety suite
New ci.yml job "Closed-loop simulation + Monte-Carlo campaigns" runs cargo test -p falcon-sitl-gz --nocapture. This makes the top-of-the-V integration tests (fault_tolerance/shield/mission/hexa) and the campaigns a first-class, visible required check. They already ran inside the generic cargo test --workspace, but a dedicated gate keeps the most safety-critical tests from being silently narrowed out of scope (the recurring orphaned-verification failure mode) and surfaces the worst-case-margin lines in the CI log for trend-watching.
Verification
falcon-sitl-gz suite: 22/22. campaign.rs clippy-clean. rivet validate PASS. FV-FALCON-SIMMC-001 verifies FAULT-P02 / ATT-P01 / MIXMULTI-P01.
Next slices (scoped in the artifact)
Add sensor-noise / wind / mass-inertia dispersion + full 6-DOF Gazebo physics, and wind→RTL / runaway→terminate fail-safe campaigns (three-way envelope classification). Plus the --scenario=rotor-out recordable flight for video.
🤖 Generated with Claude Code