Merged
26 commits
ac398ab
fp-stability: confirm, rank, and disambiguate dd_line hotspots
sbryngelson Jun 1, 2026
196aff5
fp-stability: per-instance disambiguation of fypp-expanded hotspots (…
sbryngelson Jun 2, 2026
bc7e516
fp-stability: distinguish precision-sensitivity from cancellation-ori…
sbryngelson Jun 2, 2026
1825dd9
fp-stability: rank cancellation by severity (bits lost), not count; r…
sbryngelson Jun 2, 2026
b9e790f
fp-stability: scale-free pass/fail via significant bits, replacing 6 …
sbryngelson Jun 2, 2026
84bec6d
fp-stability: lead with cancellation origins, report digits lost (not…
sbryngelson Jun 2, 2026
d45bc5b
fp-stability: de-duplicate helpers from the review additions (no beha…
sbryngelson Jun 2, 2026
9f868c7
fp-stability: split the 1876-line module into metrics/runners/report …
sbryngelson Jun 2, 2026
982ec89
fp-stability: remove Tier 2 per-instance disambiguation entirely
sbryngelson Jun 2, 2026
2276eb1
fp-stability: accept a user case.py (positional, like run), with a fe…
sbryngelson Jun 2, 2026
3b662db
fp-stability: refresh --help description for case.py usage + sig-bits…
sbryngelson Jun 2, 2026
c6637a0
fp-stability: address PR review — silent failures, 1-row crash, dead …
sbryngelson Jun 2, 2026
d3919d5
fp-stability: add opt-in Verrou bootstrap script + actionable SKIP me…
sbryngelson Jun 2, 2026
c27b6ae
ci(fp-stability): build Verrou via the shared bootstrap script (DRY)
sbryngelson Jun 2, 2026
a4cfa79
fp-stability: install Verrou from prebuilt artifact (verrou-dist), bu…
sbryngelson Jun 2, 2026
0613913
fp-stability: auto-install Verrou on first use (download prebuilt), h…
sbryngelson Jun 2, 2026
37b7a21
fp-stability: address PR review — atomic prebuilt install, verify VER…
sbryngelson Jun 2, 2026
1f10f31
fp-stability: drop dd line/sym bisection; keep cancellation + move fy…
sbryngelson Jun 2, 2026
eea0c8d
fp-stability: drop the MCA pass (redundant with the random-rounding s…
sbryngelson Jun 3, 2026
259d95c
Merge branch 'master' into fp-stability-dd-confirm-rank
sbryngelson Jun 3, 2026
c4d1ef0
fp-stability: address Copilot review — verify --verrou-binary executa…
sbryngelson Jun 3, 2026
39a1b0f
fp-stability: give _run_simulation_verrou sole ownership of run_dir c…
sbryngelson Jun 3, 2026
d809997
fp-stability: prune unit tests to the high-value contracts (33 -> 17)
sbryngelson Jun 3, 2026
a9dbb42
ci(fp-stability): derive Verrou cache key from verrou.sh content (no …
sbryngelson Jun 3, 2026
c58d44f
fp-stability: remove emoji from console + GitHub-summary output (ASCI…
sbryngelson Jun 3, 2026
0099674
fp-stability: ascii-only — convert em-dash/arrow/math glyphs in comme…
sbryngelson Jun 3, 2026
37 changes: 11 additions & 26 deletions .github/workflows/fp-stability.yml
@@ -24,8 +24,9 @@ name: FP Stability
# On FAIL: verrou_dd_sym runs to identify the responsible function symbols.
# Logs are uploaded as CI artifacts.
#
- # Verrou (Valgrind 3.26.0 + edf-hpc/verrou@a58d434) is built once and cached.
- # Build takes ~20 min uncached; cached runs restore in ~30 s.
+ # Verrou (the pinned Valgrind+Verrou pair; versions live in toolchain/bootstrap/verrou.sh)
+ # is installed by fp-stability on first use and cached. The prebuilt download is seconds;
+ # a cache miss with no prebuilt falls back to a ~20-min source build.

on:
push:
@@ -68,37 +69,21 @@ jobs:
uses: actions/cache@v4
with:
path: ~/.local/verrou
- key: verrou-a58d434-valgrind-3.26.0-${{ runner.os }}
+ # Key off the installer's content so any version bump (or other edit) in
+ # verrou.sh auto-busts the cache and forces a fresh install — no hand-synced
+ # version string to drift out of date.
+ key: verrou-${{ hashFiles('toolchain/bootstrap/verrou.sh') }}-${{ runner.os }}

- name: Install system dependencies
run: |
sudo apt-get update -y
sudo apt-get install -y \
build-essential automake python3 python3-numpy libc6-dbg \
- cmake gfortran
+ cmake gfortran zstd

- - name: Build Verrou
- if: steps.cache-verrou.outputs.cache-hit != 'true'
- run: |
- cd /tmp
- wget -q https://sourceware.org/pub/valgrind/valgrind-3.26.0.tar.bz2
- tar xf valgrind-3.26.0.tar.bz2
-
- git clone https://github.com/edf-hpc/verrou.git
- git -C verrou checkout a58d434
-
- # Merge Verrou into Valgrind source tree and patch
- cp -r verrou valgrind-3.26.0/verrou
- cd valgrind-3.26.0
- cat verrou/valgrind.*diff | patch -p1
-
- ./autogen.sh
- ./configure --enable-only64bit --prefix="$HOME/.local/verrou"
- make -j"$(nproc)"
- make install
-
- - name: Verify Verrou
- run: ~/.local/verrou/bin/valgrind --version
+ # Verrou is installed by `fp-stability` itself on first use (downloads the
+ # prebuilt artifact; aborts if that fails). The cache above restores it across
+ # runs so the download only happens on a cache miss.

- name: Build MFC (debug, serial)
# FFLAGS=-fno-inline prevents gfortran from inlining small functions into
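The workflow change above keys the cache on the content of toolchain/bootstrap/verrou.sh rather than on a hand-maintained version string. A minimal sketch of that idea in Python, assuming a plain SHA-256 over the file bytes; it does not claim to reproduce the exact algorithm behind GitHub's hashFiles, and the helper name cache_key is hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(installer: Path, runner_os: str) -> str:
    """Return a cache key that changes whenever the installer's bytes change.

    Hypothetical helper for illustration only; not part of the workflow.
    """
    digest = hashlib.sha256(installer.read_bytes()).hexdigest()
    return f"verrou-{digest}-{runner_os}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "verrou.sh"
        script.write_text('VALGRIND_VERSION="3.26.0"\n')
        before = cache_key(script, "Linux")
        # Any edit to the installer, such as a (made-up) version bump,
        # yields a different key and therefore a cache miss.
        script.write_text('VALGRIND_VERSION="3.27.0"\n')
        after = cache_key(script, "Linux")
        print("key changed:", before != after)
```

The design trade-off is that unrelated edits to the installer (a comment fix, for example) also invalidate the cache, which the workflow comment accepts in exchange for never having a stale key.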
168 changes: 168 additions & 0 deletions toolchain/bootstrap/verrou.sh
@@ -0,0 +1,168 @@
#!/bin/bash
#
# Opt-in installer for Verrou (the Valgrind FP-perturbation tool used by
# `./mfc.sh fp-stability`). Verrou is NOT a Python/pip package - it is a fork of
# Valgrind. By default this downloads a prebuilt, hash-verified artifact (seconds);
# if none is available for this tag/arch it falls back to a source build (~20 min).
# fp-stability auto-runs this on first use when Verrou is absent (printing what it
# does); it is also safe to run by hand. A failed install aborts, never a silent skip.
#
# bash toolchain/bootstrap/verrou.sh # install into $HOME/.local/verrou
# VERROU_HOME=/path bash toolchain/bootstrap/verrou.sh
# bash toolchain/bootstrap/verrou.sh --force # reinstall even if present
# VERROU_BUILD_FROM_SOURCE=1 bash toolchain/bootstrap/verrou.sh # skip the prebuilt
#
# Versions are pinned to match the fp-stability CI workflow.

set -euo pipefail

VALGRIND_VERSION="3.26.0"
VERROU_COMMIT="a58d434"
# Prebuilt artifacts (built once per arch) live in a small companion repo. The tag
# pins to the (valgrind, verrou) pair above - bump all three together.
VERROU_DIST_REPO="${VERROU_DIST_REPO:-sbryngelson/verrou-dist}"
VERROU_DIST_TAG="${VERROU_DIST_TAG:-v1}"
PREFIX="${VERROU_HOME:-$HOME/.local/verrou}"
FORCE="${1:-}"

echo "==> Verrou bootstrap (Valgrind ${VALGRIND_VERSION} + edf-hpc/verrou@${VERROU_COMMIT}) -> ${PREFIX}"

# Idempotent: skip if already installed and working. Source env.sh first if present
# (a prebuilt tree needs VALGRIND_LIB to run; a source build works either way).
if [ "$FORCE" != "--force" ] && [ -x "${PREFIX}/bin/valgrind" ] \
&& ( [ -f "${PREFIX}/env.sh" ] && . "${PREFIX}/env.sh"; "${PREFIX}/bin/valgrind" --tool=verrou --version >/dev/null 2>&1 ); then
echo "==> Verrou already installed at ${PREFIX} (use --force to rebuild). Nothing to do."
exit 0
fi

# Platform: Valgrind has no working modern-macOS support; Linux only.
if [ "$(uname -s)" != "Linux" ]; then
echo "ERROR: Verrou requires Linux (Valgrind does not support modern macOS, incl. Apple Silicon)." >&2
exit 1
fi
arch_tag=""
case "$(uname -m)" in
x86_64) arch_tag="x86_64" ;;
aarch64|arm64)
arch_tag="aarch64"
echo "WARNING: $(uname -m) detected. Valgrind builds here, but Verrou's FP backends are" >&2
echo " best-validated on x86_64 - treat results as experimental on this arch." >&2
;;
*)
echo "WARNING: unrecognised arch $(uname -m); the build may fail. Proceeding anyway." >&2
;;
esac

# Fast path: download a prebuilt, hash-verified artifact and source its relocatable
# env.sh, instead of building from source. Any failure (no asset for this arch/tag,
# missing zstd/sha256sum, checksum mismatch, won't run) falls through to the build.
try_prebuilt() {
[ -n "$arch_tag" ] || return 1
[ "${VERROU_BUILD_FROM_SOURCE:-}" = "1" ] && return 1
command -v sha256sum >/dev/null 2>&1 || return 1
tar --zstd --help >/dev/null 2>&1 || command -v zstd >/dev/null 2>&1 || return 1
command -v curl >/dev/null 2>&1 || command -v wget >/dev/null 2>&1 || return 1

local asset base dl
asset="verrou-${VERROU_COMMIT}-valgrind-${VALGRIND_VERSION}-linux-${arch_tag}.tar.zst"
base="https://github.com/${VERROU_DIST_REPO}/releases/download/${VERROU_DIST_TAG}/${asset}"
dl="$(mktemp -d)"

echo "==> Trying prebuilt ${VERROU_DIST_REPO}@${VERROU_DIST_TAG} (${asset})"
_fetch() { # url dest
if command -v curl >/dev/null 2>&1; then curl -fsSL -o "$2" "$1"; else wget -q -O "$2" "$1"; fi
}
if ! _fetch "$base" "$dl/$asset" || ! _fetch "$base.sha256" "$dl/$asset.sha256"; then
echo "==> No prebuilt for this tag/arch - building from source instead."
rm -rf "$dl"; return 1
fi
if ! ( cd "$dl" && sha256sum -c "$asset.sha256" >/dev/null 2>&1 ); then
echo "WARNING: prebuilt checksum mismatch - building from source instead." >&2
rm -rf "$dl"; return 1
fi

# Extract + verify in a staging dir, then swap into $PREFIX atomically. set -e
# is suppressed inside a function used as an `if` condition, so check each step
# explicitly - otherwise a failed extract would fall through and the source
# build would install on top of a half-written tree (or a stale one on --force).
local stage="$dl/stage"
mkdir -p "$stage"
if tar --zstd --help >/dev/null 2>&1; then
tar -C "$stage" --zstd -xf "$dl/$asset" || { echo "WARNING: prebuilt extract failed - building from source instead." >&2; rm -rf "$dl"; return 1; }
else
zstd -dc "$dl/$asset" | tar -C "$stage" -xf - || { echo "WARNING: prebuilt extract failed - building from source instead." >&2; rm -rf "$dl"; return 1; }
fi

# Valgrind bakes its build prefix into the binary; the artifact's env.sh sets
# VALGRIND_LIB relative to the tree so the relocated install works. Verify the
# staged tree runs before committing it.
if ! ( . "${stage}/env.sh" && "${stage}/bin/valgrind" --tool=verrou --version >/dev/null 2>&1 ); then
echo "WARNING: prebuilt did not run - building from source instead." >&2
rm -rf "$dl"; return 1
fi

# Commit only now: replace any existing $PREFIX atomically.
mkdir -p "$(dirname "$PREFIX")"
rm -rf "$PREFIX"
if ! mv "$stage" "$PREFIX"; then
echo "WARNING: could not install prebuilt to ${PREFIX} - building from source instead." >&2
rm -rf "$dl"; return 1
fi
rm -rf "$dl"
return 0
}

if try_prebuilt; then
echo "==> Verifying"
( . "${PREFIX}/env.sh" && "${PREFIX}/bin/valgrind" --tool=verrou --version )
echo "==> Done (prebuilt). Verrou installed at ${PREFIX}"
echo " Run: ./mfc.sh fp-stability (or set VERROU_HOME=${PREFIX} if you used a custom prefix)"
exit 0
fi

# Build dependencies.
missing=""
for tool in tar git make patch autoconf automake; do
command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
command -v cc >/dev/null 2>&1 || command -v gcc >/dev/null 2>&1 || missing="$missing gcc"
command -v wget >/dev/null 2>&1 || command -v curl >/dev/null 2>&1 || missing="$missing wget/curl"
if [ -n "$missing" ]; then
echo "ERROR: missing build dependencies:$missing" >&2
echo " Install them (e.g. apt: build-essential automake autoconf libtool; or load HPC modules) and retry." >&2
exit 1
fi

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

tarball="valgrind-${VALGRIND_VERSION}.tar.bz2"
url="https://sourceware.org/pub/valgrind/${tarball}"
echo "==> Downloading ${tarball}"
if command -v wget >/dev/null 2>&1; then
wget -q "$url"
else
curl -fsSL -o "$tarball" "$url"
fi
tar xf "$tarball"

echo "==> Cloning Verrou @ ${VERROU_COMMIT}"
git clone --quiet https://github.com/edf-hpc/verrou.git
git -C verrou checkout --quiet "$VERROU_COMMIT"

# Merge Verrou into the Valgrind tree and apply its patch.
cp -r verrou "valgrind-${VALGRIND_VERSION}/verrou"
cd "valgrind-${VALGRIND_VERSION}"
cat verrou/valgrind.*diff | patch -p1

echo "==> Building (this takes ~20 min)"
./autogen.sh
./configure --enable-only64bit --prefix="$PREFIX"
make -j"$(nproc)"
make install

echo "==> Verifying"
"${PREFIX}/bin/valgrind" --tool=verrou --version
echo "==> Done. Verrou installed at ${PREFIX}"
echo " Run: ./mfc.sh fp-stability (or set VERROU_HOME=${PREFIX} if you used a custom prefix)"
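The prebuilt path in verrou.sh follows a verify-then-commit pattern: check the download against its SHA-256, extract into a staging directory, confirm the staged tree is usable, and only then replace the install prefix. A rough Python sketch of the same pattern follows, under stated assumptions: verify_sha256 and commit_staged are hypothetical names, and the is_usable callback stands in for the script's `valgrind --tool=verrou --version` check.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def verify_sha256(artifact: Path, expected_hex: str) -> bool:
    """Return True only if the artifact's SHA-256 matches the expected digest."""
    actual = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return actual == expected_hex.strip().lower()


def commit_staged(stage: Path, prefix: Path, is_usable: Callable[[Path], bool]) -> bool:
    """Move a staged tree into place only after it passes the usability check.

    On a failed check the existing prefix is left untouched, so a fallback
    installer never builds on top of a half-written tree.
    """
    if not is_usable(stage):
        return False
    prefix.parent.mkdir(parents=True, exist_ok=True)
    if prefix.exists():
        shutil.rmtree(prefix)  # drop the old install only once the new one is verified
    shutil.move(str(stage), str(prefix))
    return True
```

A caller would typically run verify_sha256 on the downloaded archive first, extract it into a temporary stage directory, and pass a check such as `lambda tree: (tree / "bin" / "valgrind").exists()` to commit_staged. Note that removing the old prefix and moving the new tree are two separate steps, so a crash between them can leave no install at all; the sketch only guarantees that an unverified tree is never committed.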
74 changes: 30 additions & 44 deletions toolchain/mfc/cli/commands.py
@@ -898,27 +898,36 @@
name="fp-stability",
help="Run floating-point stability tests using Verrou.",
description=(
- "Runs each registered test case N times under Verrou's random IEEE-754 "
- "rounding mode and compares against a nearest-rounding reference run. "
- "Reports the max L∞ deviation and PASS/FAIL against per-case thresholds.\n\n"
- "Requires a Verrou-enabled Valgrind at $VERROU_HOME/bin/valgrind "
- "(defaults to $HOME/.local/verrou). The simulation and pre_process "
- "binaries must be serial (no-MPI, no-GPU) debug builds.\n\n"
- "Test cases:\n"
- " sod_standard 1-D standard Sod, p_L/p_R=10 (well-conditioned baseline)\n"
- " sod_strong 1-D Sod, p_L/p_R=100,000 — HLLC xi-factor cancellation\n"
- " water_stiffened 1-D water shock (pi_inf=4046) — pressure-recovery cancellation\n"
- " air_water_interface 1-D air/water contact (two-fluid) — mixed-cell cancellation\n\n"
- "Additional features (skip with --no-* flags):\n"
+ "Runs Verrou random-rounding stability analysis on a built-in suite of small "
+ "1-D cases, or - given a case .py (positional INPUT) - on your own case. Each "
+ "case is run N times under Verrou's random IEEE-754 rounding and compared "
+ "against a nearest-rounding reference. PASS/FAIL is scale-free: a case must "
+ "retain at least ~24 significant bits (single precision) under random rounding "
+ "(no per-case thresholds).\n\n"
+ "With a case .py, that case is run as a SINGLE serial CPU process under Verrou "
+ "(~30x slower, and run many times), so it must be a small, short proxy - large "
+ "grids or long runs are rejected with guidance; serial .dat I/O is forced. "
+ "Example: ./mfc.sh fp-stability my_case.py\n\n"
+ "Uses a Verrou-enabled Valgrind at $VERROU_HOME/bin/valgrind (defaults to "
+ "$HOME/.local/verrou); if absent it is installed automatically (a pinned, "
+ "hash-verified prebuilt is downloaded, with a source build as fallback) - "
+ "aborts if that install fails. The simulation and pre_process binaries must "
+ "be serial (no-MPI, no-GPU) debug builds.\n\n"
+ "Analysis passes (skip with --no-* flags):\n"
" float proxy One run with --rounding-mode=float (single-precision sensitivity)\n"
" vprec sweep Runs at mantissa bits [52, 23, 16, 10] (precision floor curve)\n"
- " dd_sym verrou_dd_sym bisection to responsible functions (on failure)\n"
- " dd_line verrou_dd_line bisection to responsible source lines (on failure)\n"
- " cancellation --check-cancellation detection of catastrophic cancellation sites\n"
- " mca-sigbits Monte Carlo Arithmetic (mcaquad) significant-bits lower bound\n"
- " float-max --check-max-float detection of double→float overflow sites\n"
+ " cancellation --check-cancellation origins, ranked by significant digits lost\n"
+ " float-max --check-max-float detection of double->float overflow sites\n"
),
include_common=["mfc_config", "verbose", "debug_log"],
+ positionals=[
+ Positional(
+ name="input",
+ help="Optional case .py to analyze instead of the built-in suite (run as a single serial CPU process under Verrou; must be small/short).",
+ nargs="?",
+ completion=Completion(type=CompletionType.FILES_PY),
+ ),
+ ],
arguments=[
Argument(
name="sim-binary",
@@ -960,34 +969,13 @@
default=False,
dest="no_vprec",
),
- Argument(
- name="no-dd-sym",
- help="Skip verrou_dd_sym function-level delta-debug on failure.",
- action=ArgAction.STORE_TRUE,
- default=False,
- dest="no_dd_sym",
- ),
- Argument(
- name="no-dd-line",
- help="Skip verrou_dd_line source-line delta-debug on failure.",
- action=ArgAction.STORE_TRUE,
- default=False,
- dest="no_dd_line",
- ),
Argument(
name="no-cancellation",
help="Skip --check-cancellation catastrophic-cancellation detection.",
action=ArgAction.STORE_TRUE,
default=False,
dest="no_cancellation",
),
- Argument(
- name="no-mca",
- help="Skip Monte Carlo Arithmetic (mcaquad) significant-bits estimate.",
- action=ArgAction.STORE_TRUE,
- default=False,
- dest="no_mca",
- ),
Argument(
name="no-float-max",
help="Skip --check-max-float float32 overflow detection.",
@@ -997,14 +985,15 @@
),
],
examples=[
- Example("./mfc.sh fp-stability", "Auto-discover binaries and run all cases"),
+ Example("./mfc.sh fp-stability", "Auto-discover binaries and run the built-in suite"),
+ Example("./mfc.sh fp-stability my_case.py", "Analyze your own case (small/short, serial, CPU)"),
Example(
"./mfc.sh fp-stability --sim-binary build/install/abc123/bin/simulation",
"Specify simulation binary explicitly",
),
Example("./mfc.sh fp-stability -N 10", "Run 10 random-rounding samples per case"),
- Example("./mfc.sh fp-stability --no-vprec --no-dd-line", "Skip VPREC sweep and line debug"),
- Example("./mfc.sh fp-stability --no-cancellation --no-mca --no-float-max", "Skip new analysis passes"),
+ Example("./mfc.sh fp-stability --no-vprec --no-cancellation", "Skip VPREC sweep and cancellation detection"),
+ Example("./mfc.sh fp-stability --no-cancellation --no-float-max", "Skip analysis passes"),
],
key_options=[
("--sim-binary PATH", "Serial simulation binary (debug, no-MPI)"),
@@ -1013,10 +1002,7 @@
("-N, --samples N", "Random-rounding samples per case (default: 5)"),
("--no-float-proxy", "Skip float-rounding proxy run"),
("--no-vprec", "Skip VPREC mantissa-bit sweep"),
- ("--no-dd-sym", "Skip verrou_dd_sym on failure"),
- ("--no-dd-line", "Skip verrou_dd_line on failure"),
("--no-cancellation", "Skip cancellation detection"),
- ("--no-mca", "Skip MCA significant-bits estimate"),
("--no-float-max", "Skip float32 overflow detection"),
],
)
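The new help text describes a scale-free PASS/FAIL: a case must retain roughly 24 significant bits under random rounding, with no per-case thresholds. The metric's implementation is not part of this diff, so the following is only an illustrative sketch of how such a check could work, assuming significant bits are estimated as -log2 of the worst deviation relative to the reference's largest magnitude. The names significant_bits and passes and the clamping to [0, 53] are assumptions, not the PR's code.

```python
import math
from typing import Sequence

DOUBLE_BITS = 53.0  # IEEE-754 binary64 significand width
SINGLE_BITS = 24.0  # IEEE-754 binary32 significand width (the ~24-bit bar)


def significant_bits(reference: Sequence[float], samples: Sequence[Sequence[float]]) -> float:
    """Estimate the bits retained across perturbed samples (assumed metric).

    Uses the largest absolute deviation from the reference, normalised by the
    reference's largest magnitude, so the result does not depend on units.
    """
    scale = max(abs(r) for r in reference)
    worst = max(abs(s - r) for sample in samples for s, r in zip(sample, reference))
    if worst == 0.0 or scale == 0.0:
        return DOUBLE_BITS  # no measurable deviation (or nothing to normalise by)
    return min(DOUBLE_BITS, max(0.0, -math.log2(worst / scale)))


def passes(reference: Sequence[float], samples: Sequence[Sequence[float]]) -> bool:
    """Scale-free verdict: keep at least single-precision worth of bits."""
    return significant_bits(reference, samples) >= SINGLE_BITS


if __name__ == "__main__":
    ref = [1.0, 2.0]
    perturbed = [[1.0, 2.0 + 2.0**-20]]
    print(significant_bits(ref, perturbed), passes(ref, perturbed))
```

Because the deviation is divided by the reference magnitude, rescaling the whole field (for example, switching pressure units) leaves the verdict unchanged, which is the property a fixed absolute threshold per case does not have.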