Changepoint detection for commodity and equity time series using a Gaussian Markov Switching Model (MSM). Built as a Master's thesis project.
The system fits a K-regime MSM offline via EM (Baum-Welch), then runs a causal online filter to score and alarm on regime shifts in real time. Three detector variants are provided: Hard Switch, Posterior Transition, and Surprise.
- Rust (stable) — install via rustup
For real-data experiments (optional, deferred): a free Alpha Vantage API key — alphavantage.co/support/#api-key. Synthetic experiments work with no API key or config file.
cargo run # interactive menu
cargo run -- e2e # run all registered synthetic experiments end-to-end
cargo run -r -- e2e # release build (recommended for EM training runs)
cargo run -- help # direct CLI help
Copy the example config and fill in your API key:
cp config.example.toml config.toml
Edit config.toml:
[alphavantage]
api_key = "your_api_key_here"
rate_limit_per_minute = 75 # default, can be omitted
[cache]
path = "data/commodities.duckdb" # default, can be omitted
[ingest]
series = [
{ commodity = "spy", interval = "15min" },
{ commodity = "qqq", interval = "15min" },
{ commodity = "wti", interval = "daily" },
{ commodity = "brent", interval = "daily" },
{ commodity = "natural_gas", interval = "daily" },
{ commodity = "gold", interval = "daily" },
{ commodity = "silver", interval = "daily" },
]
config.toml contains your API key, so never commit it.
cargo run launches a 9-category guided menu. Navigate with arrow keys; press Esc to go back at any prompt.
Main menu:
Data — ingest, inspect, and refresh market data
Features — feature families and observation pipeline
Calibration — synthetic-to-real scenario calibration
Models — Gaussian MSM fitting and inspection
Detection — detector variants and alarm configuration
Evaluation — synthetic and real-data evaluation
Experiments — run single or batch experiments
Reporting — plots, tables, and artifact export
Inspect Runs — browse and view saved run artifacts
Exit
See docs/interactive_cli.md for the full menu reference.
Pass a subcommand as the first argument to skip the interactive menu (useful for scripting):
cargo run -- e2e # run all registered synthetic experiments end-to-end
cargo run -- run-experiment --config experiment_config.json
cargo run -- run-batch --config a.json --config b.json [--save <dir>]
cargo run -- run-real --id <experiment_id> [--cache <path.duckdb>] [--save <dir>]
cargo run -- calibrate --id <experiment_id> [--out <dir>]
cargo run -- param-search --id <experiment_id> # grid search (DryRun)
cargo run -- optimize --id <experiment_id> [--cache <path>] [--save <dir>] [--top <n>]
cargo run -- inspect --dir ./runs/real/my_run/run_001
cargo run -- generate-report --dir ./runs/real/my_run/run_001 [--cache <path.duckdb>]
cargo run -- status [--config path/to/config.toml]
cargo run -- help
Experiment configs are JSON files. A template is printed by the Experiments > Show Config Template menu item.
The run-real subcommand runs a real-data experiment from the registry by ID, loading price data from the DuckDB cache:
cargo run -- run-real --id real_spy_daily_hard_switch --cache data/commodities.duckdb --save ./output
Artifacts (20 files, including plots) are written to runs/real/<id>/<run_id>/ and optionally copied to --save.
For intraday experiments, the pipeline automatically applies a Regular Trading Hours (RTH) filter that retains only bars in the 09:30–15:59 ET window, excluding pre-market and after-hours bars.
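As a rough illustration of the RTH window described above, the sketch below filters bars by time of day. It assumes timestamps are already expressed in US Eastern time and that each bar is labelled by its start time; the function names and the `(hour, minute, price)` tuple layout are hypothetical and not taken from the project's code.

```rust
/// Minutes since midnight for a wall-clock time (assumed to be in US Eastern time).
fn minutes_of_day(hour: u32, minute: u32) -> u32 {
    hour * 60 + minute
}

/// True when a bar's start time lies in the inclusive 09:30..=15:59 ET window.
fn is_rth_bar(hour: u32, minute: u32) -> bool {
    let m = minutes_of_day(hour, minute);
    (minutes_of_day(9, 30)..=minutes_of_day(15, 59)).contains(&m)
}

/// Keeps only RTH bars from a slice of hypothetical (hour, minute, price) tuples.
fn filter_rth(bars: &[(u32, u32, f64)]) -> Vec<(u32, u32, f64)> {
    bars.iter()
        .copied()
        .filter(|&(h, m, _)| is_rth_bar(h, m))
        .collect()
}
```

With 15-minute bars this keeps the 09:30 through 15:45 bars and drops anything stamped before the open or at 16:00 and later.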
The generate-report subcommand regenerates all plots and JSON artifacts for an existing run by replaying the experiment from its config.snapshot.json:
cargo run -- generate-report --dir ./runs/real/my_run/run_001
cargo run -- generate-report --dir ./runs/real/my_run/run_001 --cache data/commodities.duckdb
The command re-runs the full pipeline (including EM fitting) and writes a fresh artifact set with a new run_id. No files from the original run are overwritten.
The run-batch subcommand runs a list of JSON experiment configs in sequence. Each config is dispatched to the backend matching its mode field (Synthetic → SyntheticBackend, Real → RealBackend, SimToReal → SimToRealBackend), so the reported metrics come from an actual pipeline run in every mode:
cargo run -- run-batch --config a.json --config b.json --save ./batch_out
Writes batch_summary.json with per-run status and metrics.
The calibrate subcommand matches a synthetic experiment's model parameters to the empirical distribution of that experiment's feature family:
cargo run -- calibrate --id hard_switch --out ./output/calibration
Produces calibration_summary.json, synthetic_vs_empirical_summary.json, and calibrated_scenario.json.
The optimize subcommand runs a two-phase parameter search for real-data experiments:
- Phase 1 — Grid search (artifact writes disabled for speed): sweeps a grid over detector and optionally model parameters using real data and the full EM pipeline. Ranks all grid points by a combined coverage + precision score.
- Phase 2 — Full E2E run: re-runs with the best-scoring config with full artifact output (JSON, CSV, plots).
Two search modes:
- Detector-only (default): sweeps `threshold`, `persistence_required`, `cooldown`.
- Joint model + detector (`--model`): additionally sweeps `k_regimes` ∈ {2, 3} and five feature families.
# Detector-only
cargo run -- optimize --id real_spy_daily_hard_switch
cargo run -- optimize --id real_wti_daily_surprise --save ./runs/optimize/wti --top 15
# Joint model + detector
cargo run -- optimize --id real_spy_daily_hard_switch --model
cargo run -- optimize --id real_spy_intraday_hard_switch --model --top 20
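To make the three swept detector parameters concrete, here is a minimal sketch of how `threshold`, `persistence_required`, and `cooldown` could interact. The semantics are an assumption: an alarm fires once the score has been at or above the threshold for `persistence_required` consecutive steps, after which alarms are suppressed for `cooldown` steps. The `AlarmPolicy` type is hypothetical; the project's actual policy is described in docs/changepoint_detectors.md.

```rust
/// Hypothetical alarm policy: threshold + persistence + cooldown.
struct AlarmPolicy {
    threshold: f64,
    persistence_required: usize,
    cooldown: usize,
    streak: usize,        // consecutive steps with score >= threshold
    cooldown_left: usize, // steps during which alarms remain suppressed
}

impl AlarmPolicy {
    fn new(threshold: f64, persistence_required: usize, cooldown: usize) -> Self {
        Self {
            threshold,
            // Treat 0 as 1 so a single exceedance can still fire.
            persistence_required: persistence_required.max(1),
            cooldown,
            streak: 0,
            cooldown_left: 0,
        }
    }

    /// Feeds one detector score; returns true if an alarm fires at this step.
    fn step(&mut self, score: f64) -> bool {
        if self.cooldown_left > 0 {
            // Inside the cooldown window: ignore the score entirely.
            self.cooldown_left -= 1;
            self.streak = 0;
            return false;
        }
        if score >= self.threshold {
            self.streak += 1;
        } else {
            self.streak = 0;
        }
        if self.streak >= self.persistence_required {
            self.streak = 0;
            self.cooldown_left = self.cooldown;
            return true;
        }
        false
    }
}
```

Under these assumed semantics, larger `persistence_required` values delay alarms but filter out one-step spikes, while larger `cooldown` values cap how densely alarms can cluster.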
Default grids by detector type:
| Detector | Threshold range | Persistence | Cooldown | Detector pts | Joint pts (×10) |
|---|---|---|---|---|---|
| HardSwitch | 0.30 – 0.80 | 1, 2, 3, 5 | 0, 3, 5, 10 | 128 | 1 280 |
| Surprise | 1.0 – 6.0 | 1, 2, 3, 5 | 0, 5, 10, 20 | 128 | 1 280 |
| PosteriorTransition | 0.10 – 0.50 | 1, 2, 3 | 0, 3, 5, 10 | 84 | 840 |
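The point counts in the default grids can be checked with simple arithmetic: a detector grid is the product of the option counts per parameter, and the joint grid multiplies that by 2 values of `k_regimes` and 5 feature families (the ×10 factor). The table does not list individual threshold values, so the sketch below only infers how many there must be; the helper names are hypothetical.

```rust
/// Detector grid size: thresholds x persistence values x cooldown values.
fn detector_points(n_thresholds: usize, n_persistence: usize, n_cooldown: usize) -> usize {
    n_thresholds * n_persistence * n_cooldown
}

/// Joint grid size: detector grid x k_regimes choices x feature families.
fn joint_points(detector_pts: usize, n_k: usize, n_families: usize) -> usize {
    detector_pts * n_k * n_families
}

/// Infers how many threshold values a grid must contain, given its total size.
/// Returns None when the total is not divisible by the other two option counts.
fn implied_threshold_count(total: usize, n_persistence: usize, n_cooldown: usize) -> Option<usize> {
    let per_threshold = n_persistence * n_cooldown;
    if per_threshold > 0 && total % per_threshold == 0 {
        Some(total / per_threshold)
    } else {
        None
    }
}
```

For example, 128 HardSwitch points with 4 persistence and 4 cooldown values imply 8 threshold values, and 84 PosteriorTransition points with 3 persistence and 4 cooldown values imply 7.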
Artifacts written to --save (default ./runs/optimize/<id>/):
| File | Contents |
|---|---|
| `search_report.json` | Full ranked grid: all N points with scores |
| `search_summary.txt` | Human-readable top-N table + best params |
| `result.json` | Full ExperimentResult from the best-config run |
| `config.snapshot.json` | Exact ExperimentConfig used for the best run |
| `signal_alarms.png` | Alarm timeline plot |
| `detector_scores.png` | Detector score trace |
| `regime_posteriors.png` | Filtered posterior heatmap |
| `*.csv`, remaining `*.json` | Standard run artifact set |
Hidden state S_t in {1,...,K} with first-order Markov dynamics:
P(S_t = j | S_{t-1} = i) = A_{ij}
Observations are Gaussian given the regime:
y_t | S_t = j ~ N(mu_j, sigma_j^2)
Parameters theta = (pi, A, mu_{1:K}, sigma^2_{1:K}) are fitted offline via the EM algorithm (Baum-Welch), then frozen for online use.
See docs/gaussian_msm_simulator.md and docs/em_estimation.md.
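For orientation, the closed-form M-step of Baum-Welch for this model can be sketched as follows. It takes smoothed marginals `gamma[t][j]` and pairwise posteriors `xi[s][i][j]` (the pair of steps s and s+1, zero-indexed) and returns updated parameters. The `SketchParams` struct and `m_step` function are hypothetical names for illustration, not the project's `em.rs` API, and the sketch omits safeguards such as a variance floor.

```rust
/// Hypothetical container for theta = (pi, A, mu, sigma^2).
struct SketchParams {
    pi: Vec<f64>,
    a: Vec<Vec<f64>>,
    mu: Vec<f64>,
    sigma2: Vec<f64>,
}

/// One M-step. gamma has T rows of length K; xi has T-1 entries of K x K.
/// Assumes every regime carries some posterior mass (otherwise mu/sigma2 are NaN).
fn m_step(y: &[f64], gamma: &[Vec<f64>], xi: &[Vec<Vec<f64>>]) -> SketchParams {
    let k = gamma[0].len();

    // Initial distribution: smoothed marginal at the first step.
    let pi = gamma[0].clone();

    // Transition matrix: expected transition counts, normalised per row.
    let mut a = vec![vec![0.0; k]; k];
    for pair in xi {
        for i in 0..k {
            for j in 0..k {
                a[i][j] += pair[i][j];
            }
        }
    }
    for row in a.iter_mut() {
        let s: f64 = row.iter().sum();
        if s > 0.0 {
            for v in row.iter_mut() {
                *v /= s;
            }
        }
    }

    // Regime means and variances: gamma-weighted moments of y.
    let mut mu = vec![0.0; k];
    let mut sigma2 = vec![0.0; k];
    for j in 0..k {
        let w: f64 = gamma.iter().map(|g| g[j]).sum();
        mu[j] = gamma.iter().zip(y).map(|(g, &yt)| g[j] * yt).sum::<f64>() / w;
        sigma2[j] = gamma
            .iter()
            .zip(y)
            .map(|(g, &yt)| g[j] * (yt - mu[j]).powi(2))
            .sum::<f64>()
            / w;
    }

    SketchParams { pi, a, mu, sigma2 }
}
```

Alternating this step with the smoothing pass until the log-likelihood stops improving gives the usual EM loop.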
| Phase | Component | Doc |
|---|---|---|
| Emission density | N(y_t; mu_j, sigma_j^2) | emission_model.md |
| Forward filter | alpha_{t\|t}(j) = Pr(S_t=j \| y_{1:t}) | forward_filter.md |
| Log-likelihood | log p(y_{1:T}) from filter normalisation constants | log_likelihood.md |
| Backward smoother | gamma_t(j) = Pr(S_t=j \| y_{1:T}) | backward_smoother.md |
| Pairwise posteriors | xi_t(i,j) = Pr(S_{t-1}=i, S_t=j \| y_{1:T}) | pairwise_posteriors.md |
| EM estimation | Baum-Welch until convergence | em_estimation.md |
| Diagnostics | Validity checks on fitted parameters | diagnostics.md |
| Online inference | Causal streaming filter, no future data | online_inference.md |
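The forward filter and log-likelihood rows can be illustrated with a short predict/update recursion. This is a linear-space sketch for readability (the project's online filter is described as log-space); the function names are hypothetical. Each update multiplies the predicted regime probabilities by the Gaussian emission density, and the normalisation constant c_t of that product is the one-step predictive density whose logs sum to log p(y_{1:T}).

```rust
use std::f64::consts::PI;

/// Gaussian density N(y; mu, sigma2).
fn gaussian_pdf(y: f64, mu: f64, sigma2: f64) -> f64 {
    (-(y - mu).powi(2) / (2.0 * sigma2)).exp() / (2.0 * PI * sigma2).sqrt()
}

/// Update step: returns (alpha_{t|t}, c_t) from the predicted probabilities.
fn update(pred: &[f64], mu: &[f64], sigma2: &[f64], y: f64) -> (Vec<f64>, f64) {
    let mut post: Vec<f64> = pred
        .iter()
        .zip(mu.iter().zip(sigma2))
        .map(|(&p, (&m, &s2))| p * gaussian_pdf(y, m, s2))
        .collect();
    let c: f64 = post.iter().sum();
    for p in post.iter_mut() {
        *p /= c;
    }
    (post, c)
}

/// Predict step: alpha_{t+1|t}(j) = sum_i alpha_{t|t}(i) * A[i][j].
fn predict(post: &[f64], a: &[Vec<f64>]) -> Vec<f64> {
    let k = post.len();
    (0..k)
        .map(|j| (0..k).map(|i| post[i] * a[i][j]).sum::<f64>())
        .collect()
}

/// Runs the filter over ys; returns the filtered path and the log-likelihood.
fn run_filter(
    pi: &[f64],
    a: &[Vec<f64>],
    mu: &[f64],
    sigma2: &[f64],
    ys: &[f64],
) -> (Vec<Vec<f64>>, f64) {
    let mut pred = pi.to_vec(); // pi plays the role of the prediction for t = 1
    let mut loglik = 0.0;
    let mut path = Vec::with_capacity(ys.len());
    for &y in ys {
        let (post, c) = update(&pred, mu, sigma2, y);
        loglik += c.ln();
        pred = predict(&post, a);
        path.push(post);
    }
    (path, loglik)
}
```

Because each step uses only the previous filtered state and the current observation, the same recursion serves both the offline forward pass and the causal online filter.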
All detectors consume one-step causal filter output and apply a score + alarm policy (persistence + cooldown). See docs/changepoint_detectors.md.
| Detector | Score | Alarm trigger |
|---|---|---|
| HardSwitch | Indicator `1[argmax_j alpha_{t\|t}(j) ≠ argmax_j alpha_{t-1\|t-1}(j)]` | |
| PosteriorTransition | LeavePrevious: `1 − alpha_{t\|t}(r_{t-1})`; or TotalVariation: `½ Σ_j \|alpha_{t\|t}(j) − alpha_{t-1\|t-1}(j)\|` | |
| Surprise | −log c_t (optionally minus a lagged EMA baseline b_{t-1}) | Score exceeds threshold |
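The detector scores can be written as small pure functions of consecutive filtered posteriors and the filter's normalisation constant c_t. The sketch below makes two assumptions that the text does not spell out: r_{t-1} is taken to be the argmax of the previous filtered posterior, and the EMA baseline uses the common update b_t = (1 − α)·b_{t-1} + α·raw_t. All function names are hypothetical.

```rust
/// Index of the largest entry (first one wins on ties).
fn argmax(p: &[f64]) -> usize {
    let mut best = 0;
    for (i, &v) in p.iter().enumerate() {
        if v > p[best] {
            best = i;
        }
    }
    best
}

/// HardSwitch: 1.0 when the dominant regime changes between steps, else 0.0.
fn hard_switch_score(prev: &[f64], curr: &[f64]) -> f64 {
    if argmax(prev) != argmax(curr) { 1.0 } else { 0.0 }
}

/// LeavePrevious: 1 - alpha_{t|t}(r_{t-1}), with r_{t-1} assumed to be argmax(prev).
fn leave_previous_score(prev: &[f64], curr: &[f64]) -> f64 {
    1.0 - curr[argmax(prev)]
}

/// TotalVariation: half the L1 distance between consecutive posteriors.
fn total_variation_score(prev: &[f64], curr: &[f64]) -> f64 {
    0.5 * prev.iter().zip(curr).map(|(a, b)| (a - b).abs()).sum::<f64>()
}

/// Surprise: -ln c_t, optionally minus the lagged baseline b_{t-1}.
fn surprise_score(c_t: f64, lagged_baseline: Option<f64>) -> f64 {
    -c_t.ln() - lagged_baseline.unwrap_or(0.0)
}

/// Assumed EMA baseline update, applied after scoring so the baseline stays lagged.
fn ema_update(b_prev: f64, raw_surprise: f64, ema_alpha: f64) -> f64 {
    (1.0 - ema_alpha) * b_prev + ema_alpha * raw_surprise
}
```

HardSwitch and the two PosteriorTransition variants are bounded in [0, 1], whereas Surprise is unbounded above, which is consistent with the different threshold ranges used in the default search grids.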
The fixed-parameter policy (offline-fit, online-freeze) is described in docs/fixed_parameter_policy.md.
Raw prices are transformed into the observation sequence y_t before fitting or streaming:
| Family | Formula |
|---|---|
| `LogReturn` | log(P_t / P_{t-1}) |
| `AbsReturn` | absolute value of log return |
| `SquaredReturn` | (log return)^2 |
| `RollingVol` | Rolling std of log returns over window w |
| `StandardizedReturn` | log return / rolling std |
Scaling options: None, ZScore, RobustZScore. All transforms are strictly causal.
See docs/observation_design.md for the full pipeline and session-aware variants.
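The feature families can be sketched as causal transforms over a price slice. Two details are assumptions here: the rolling standard deviation is the population form over a trailing window that includes the current return, and steps without a full window yield `None`. The function names are hypothetical; docs/observation_design.md describes the actual pipeline.

```rust
/// LogReturn: log(P_t / P_{t-1}); the output is one element shorter than the input.
fn log_returns(prices: &[f64]) -> Vec<f64> {
    prices.windows(2).map(|w| (w[1] / w[0]).ln()).collect()
}

/// AbsReturn: absolute value of each log return.
fn abs_returns(r: &[f64]) -> Vec<f64> {
    r.iter().map(|x| x.abs()).collect()
}

/// SquaredReturn: square of each log return.
fn squared_returns(r: &[f64]) -> Vec<f64> {
    r.iter().map(|x| x * x).collect()
}

/// RollingVol: population std over the trailing window r[t-w+1..=t].
/// Uses no future data; None until a full window is available.
fn rolling_std(r: &[f64], w: usize) -> Vec<Option<f64>> {
    (0..r.len())
        .map(|t| {
            if w == 0 || t + 1 < w {
                return None;
            }
            let win = &r[t + 1 - w..=t];
            let mean = win.iter().sum::<f64>() / w as f64;
            let var = win.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / w as f64;
            Some(var.sqrt())
        })
        .collect()
}

/// StandardizedReturn: r_t / rolling std; None when the std is missing or zero.
fn standardized_returns(r: &[f64], w: usize) -> Vec<Option<f64>> {
    r.iter()
        .zip(rolling_std(r, w))
        .map(|(&x, s)| s.filter(|&v| v > 0.0).map(|v| x / v))
        .collect()
}
```

Every output at step t depends only on prices up to and including t, which is what keeps the transforms usable in the streaming filter.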
Synthetic scenarios are calibrated against real empirical data so that benchmark experiments reflect realistic dynamics. The workflow maps empirical statistics (mean, variance, jump contamination) to MSM parameters and then checks the remaining discrepancy between the synthetic and empirical summaries.
See docs/synthetic_to_real_calibration.md.
Synthetic evaluation runs on simulated data with known changepoints, using an event-window matching protocol. Metrics: coverage, precision-like score, mean detection delay.
See docs/benchmark_protocol.md.
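One plausible reading of event-window matching is sketched below; the exact rules live in docs/benchmark_protocol.md, so treat the details as assumptions. Here an alarm at time a matches a changepoint at τ when τ ≤ a ≤ τ + window, each alarm and each changepoint can be matched at most once, and alarms are assumed sorted in ascending order. The `MatchSummary` type and `evaluate` function are hypothetical names.

```rust
/// Hypothetical summary of an event-window evaluation.
struct MatchSummary {
    precision: f64,          // matched alarms / all alarms
    recall: f64,             // matched changepoints / all changepoints (coverage)
    mean_delay: Option<f64>, // mean of (alarm - changepoint) over matches
}

fn ratio(num: usize, den: usize) -> f64 {
    if den == 0 { 0.0 } else { num as f64 / den as f64 }
}

/// Matches each changepoint to the first unused alarm in [tau, tau + window].
fn evaluate(changepoints: &[usize], alarms: &[usize], window: usize) -> MatchSummary {
    let mut used = vec![false; alarms.len()];
    let mut delays: Vec<f64> = Vec::new();

    for &tau in changepoints {
        let hit = alarms
            .iter()
            .enumerate()
            .find(|&(i, &a)| !used[i] && a >= tau && a <= tau + window)
            .map(|(i, &a)| (i, a));
        if let Some((i, a)) = hit {
            used[i] = true;
            delays.push((a - tau) as f64);
        }
    }

    let hits = delays.len();
    let mean_delay = if hits == 0 {
        None
    } else {
        Some(delays.iter().sum::<f64>() / hits as f64)
    };
    MatchSummary {
        precision: ratio(hits, alarms.len()),
        recall: ratio(hits, changepoints.len()),
        mean_delay,
    }
}
```

For changepoints at 100, 200, 300, alarms at 105, 150, 310, 420, and a window of 20, two changepoints are matched with delays 5 and 10, giving precision 0.5, recall 2/3, and mean delay 7.5.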
For real data, no ground truth is available, so two evaluation routes are used:
- Route A — Proxy Event Alignment: alarm timing vs. known market events (earnings, macro announcements).
- Route B — Segmentation Self-Consistency: within-segment homogeneity and between-segment contrast.
See docs/real_data_evaluation.md.
Experiments are fully described by a JSON ExperimentConfig and run through ExperimentRunner. Each run produces a deterministic run ID (from config hash + seed), a structured artifact directory, and a serialised ExperimentResult.
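A deterministic run ID derived from a config hash and a seed could look like the sketch below. The hash function (FNV-1a, 64-bit), the byte layout, and the 16-character hex format are all assumptions chosen for illustration; the project may well use a different hash. A hand-rolled hash is used here because the output of `std::collections::hash_map::DefaultHasher` is not guaranteed to stay stable across Rust releases.

```rust
/// 64-bit FNV-1a over a byte slice (standard offset basis and prime).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical run ID: hash of the serialised config followed by the seed bytes.
/// The same (config, seed) pair always yields the same 16-character hex string.
fn run_id(config_json: &str, seed: u64) -> String {
    let mut bytes = config_json.as_bytes().to_vec();
    bytes.extend_from_slice(&seed.to_le_bytes());
    format!("{:016x}", fnv1a64(&bytes))
}
```

For the ID to be reproducible, the config has to be serialised canonically (stable key order and number formatting) before hashing.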
runs/
synthetic/
<run_label>/
<run_id>/
config.snapshot.json — ExperimentConfig used for this run
result.json — full ExperimentResult
summary.json — lightweight metrics summary
model_params.json — fitted ModelParams (K, pi, A, mu, sigma²)
fit_summary.json — human-readable EM fit metadata
loglikelihood_history.csv — LL at each EM iteration
feature_summary.json — feature pipeline metadata and stats
score_trace.csv — per-step detector score
alarms.csv — alarm timestamps and scores
changepoints.csv — ground-truth changepoints (synthetic)
regime_posteriors.csv — T×K filtered posterior probabilities
detector_config.json — detector type and threshold settings
signal_alarms.png — observation series with alarm markers
detector_scores.png — score trace with threshold line
regime_posteriors.png — posterior probability traces per regime
delay_distribution.png — detection delay histogram (synthetic)
real/
<run_label>/
<run_id>/
config.snapshot.json — ExperimentConfig used for this run
result.json — full ExperimentResult
summary.json — lightweight metrics summary
model_params.json — fitted ModelParams
fit_summary.json — human-readable EM fit metadata
loglikelihood_history.csv — LL at each EM iteration
feature_summary.json — feature pipeline metadata and stats
score_trace.csv — per-step detector score
alarms.csv — alarm timestamps and scores
regime_posteriors.csv — T×K filtered posterior probabilities
real_eval_summary.csv — Route A + Route B metric summary
route_a_result.json — proxy event alignment details
route_b_result.json — segmentation self-consistency details
split_summary.json — train/val/test split boundaries
data_quality.json — NaN/gap/out-of-range checks
detector_config.json — detector type and threshold settings
signal_alarms.png — observation series with alarm markers
detector_scores.png — score trace with threshold line
regime_posteriors.png — posterior probability traces per regime
segmentation.png — segment-coloured real-data plot
Eighteen experiments are registered in src/experiments/registry.rs:
| ID | Type | Description |
|---|---|---|
| `hard_switch` | Synthetic | HardSwitch, 2-regime, LogReturn/ZScore, horizon 2000 |
| `posterior_transition` | Synthetic | PosteriorTransition (LeavePrevious), 2-regime, LogReturn/ZScore, horizon 2000 |
| `surprise` | Synthetic | Surprise, 2-regime, LogReturn/ZScore, horizon 2000 |
| `posterior_transition_tv` | Synthetic | PosteriorTransition (TotalVariation), 2-regime, LogReturn/ZScore, horizon 2000 |
| `hard_switch_shock` | Synthetic | HardSwitch, shock-contaminated synthetic (jump noise path) |
| `hard_switch_frozen` | Synthetic | HardSwitch, loads pre-fitted model from data/frozen_models/hard_switch_frozen |
| `hard_switch_multi_start` | Synthetic | HardSwitch, multi-start EM (3 starts); produces multi_start_summary.json |
| `surprise_ema` | Synthetic | Surprise with EMA-baseline (ema_alpha=0.3) adjusted score |
| `squared_return_surprise` | Synthetic | Surprise detector on SquaredReturn feature family |
| `cusum_comparison` | Synthetic | One-sided variance-CUSUM (benchmark baseline), LogReturn/ZScore, horizon 2000 |
| `bocpd_comparison` | Synthetic | BOCPD with Inverse-Gamma conjugate model (benchmark baseline), LogReturn/ZScore, horizon 2000 |
| `real_spy_daily_hard_switch` | Real | SPY daily adj-close log-returns, HardSwitch, 2018–present |
| `real_wti_daily_surprise` | Real | WTI daily spot-price log-returns, Surprise, 2018–present |
| `real_spy_intraday_hard_switch` | Real | SPY 15-min RTH log-returns (session-aware), HardSwitch, 2022–2025 |
| `simreal_spy_daily_hard_switch` | Sim-to-real | EM trained on a synthetic stream calibrated to SPY (Quick-EM); online detector run on real SPY |
| `simreal_spy_daily_abs_return_k3` | Sim-to-real | SPY daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration |
| `simreal_wti_daily_abs_return_k3` | Sim-to-real | WTI daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration |
| `simreal_gold_daily_abs_return_k3` | Sim-to-real | GOLD daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration |
The 11 synthetic experiments (including the two comparison baselines) can all be run at once:
cargo run -- e2e
cargo run -r -- e2e # release build (faster EM training)
Comparison baseline results (synthetic, seed=42, horizon=2000):
| Detector | Alarms | Precision | Recall | FAR | Delay (mean) |
|---|---|---|---|---|---|
| HardSwitch (0.5) | 38 | 0.658 | 0.207 | 0.0065 | 10.8 |
| PosteriorTransition (0.3) | 86 | 0.767 | 0.545 | 0.0100 | 9.2 |
| Surprise (2.5) | 22 | 0.955 | 0.174 | 0.0005 | 10.0 |
| CUSUM (thr=8.0, slack=0.5) | 38 | 0.842 | 0.264 | 0.0030 | 10.1 |
| BOCPD (thr=0.5, h=0.02) | 1 | 1.000 | 0.008 | 0.0000 | 0.0 |
CUSUM raises the same number of alarms as HardSwitch (38) but with higher precision (0.842 vs. 0.658) and higher recall (0.264 vs. 0.207); its recall sits between HardSwitch and PosteriorTransition. BOCPD at threshold 0.5 (requiring ≥50% run-length posterior mass at r=0) is highly conservative given the low hazard rate (h=0.02) and fires only once; a lower threshold should trade some of that precision for recall.
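The table's columns are mutually consistent under a simple reading, sketched below: precision = TP / alarms, recall = TP / n_events, and FAR = (alarms − TP) / horizon. The FAR definition and the true-positive counts are inferred from the reported numbers (for example, 0.8421 × 38 = 32 for CUSUM) rather than stated in the text, and the `Counts` type is hypothetical.

```rust
/// Hypothetical raw counts behind one row of the comparison table.
struct Counts {
    alarms: usize,
    true_positives: usize, // inferred from precision x alarms
    n_events: usize,       // ground-truth changepoints (121 in the sample output)
    horizon: usize,        // number of time steps (2000)
}

impl Counts {
    fn precision(&self) -> f64 {
        self.true_positives as f64 / self.alarms as f64
    }

    fn recall(&self) -> f64 {
        self.true_positives as f64 / self.n_events as f64
    }

    /// Assumed definition: false alarms per time step.
    fn far(&self) -> f64 {
        (self.alarms - self.true_positives) as f64 / self.horizon as f64
    }
}

/// Rounds to a fixed number of decimal digits for comparison with printed values.
fn round_to(x: f64, digits: i32) -> f64 {
    let scale = 10f64.powi(digits);
    (x * scale).round() / scale
}
```

With 38 alarms and 32 inferred true positives, the CUSUM row reproduces precision 0.8421, recall 0.2645, and FAR 0.003; the HardSwitch row follows the same way from 25 true positives.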
Sample output:
[10/11] cusum_comparison
Pipeline:
[3/6] TrainOrLoadModel K=2 LL=-1948.67 iter=124 converged=true (132ms)
[4/6] RunOnline detector=Cusum thr=8.000 n_alarms=38
[5/6] Evaluate precision=0.8421 recall=0.2645 n_events=121
Metrics : prec=0.8421 recall=0.2645 n_alarms=38 FAR=0.003000 delay=10.1
[11/11] bocpd_comparison
Pipeline:
[4/6] RunOnline detector=Bocpd thr=0.500 n_alarms=1 (43ms)
[5/6] Evaluate precision=1.0000 recall=0.0083 n_events=121
Metrics : prec=1.0000 recall=0.0083 n_alarms=1 FAR=0.000000 delay=0.0
...
Completed: 11 Failed: 0
See docs/experiment_runner.md.
The reporting layer generates tables and plots from run artifacts. See docs/reporting_and_export.md.
| Output | Description |
|---|---|
| `result.json` | Full ExperimentResult with all pipeline outputs |
| `summary.json` | Lightweight metrics summary |
| `model_params.json` | Fitted ModelParams (reloadable via LoadFrozen) |
| `fit_summary.json` | Human-readable EM fit metadata (K, iters, LL, convergence) |
| `loglikelihood_history.csv` | Log-likelihood at each EM iteration |
| `feature_summary.json` | Feature pipeline stats (n_obs, mean, variance, train/val split) |
| `config.snapshot.json` | Exact ExperimentConfig snapshot |
| `detector_config.json` | Detector type and threshold settings |
| `score_trace.csv` | Per-step detector score |
| `alarms.csv` | Alarm timestamps and scores |
| `regime_posteriors.csv` | T×K filtered posterior probabilities |
| `split_summary.json` | Train/val/test split info (real mode) |
| `data_quality.json` | NaN/gap/out-of-range checks (real mode) |
| `real_eval_summary.csv` | Route A + Route B metric row (real mode) |
| `route_a_result.json` | Proxy event alignment detail (real Route A) |
| `route_b_result.json` | Segmentation self-consistency detail (real Route B) |
| `batch_summary.json` | Aggregate summary across all runs in a batch |
| `signal_alarms.png` | Observation series with alarm markers (requires font backend) |
| `detector_scores.png` | Score trace with threshold line (requires font backend) |
| `regime_posteriors.png` | Filtered posterior traces per regime (requires font backend) |
| `delay_distribution.png` | Detection delay histogram, synthetic only (requires font backend) |
| `segmentation.png` | Segment-coloured real-data plot, real only (requires font backend) |
Data is sourced from the Alpha Vantage API. Supported series:
Commodities (daily / weekly / monthly / quarterly / annual): WTI, Brent, Natural Gas, Copper, Aluminum, Wheat, Corn, Cotton, Sugar, Coffee, Gold, Silver, All Commodities Index.
Equities (SPY, QQQ): daily, weekly, monthly, and intraday (1min, 5min, 15min, 30min, 60min).
The HTTP client is rate-limited (default 75 req/min, token-bucket). See docs/alphavantage_client.md.
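A token-bucket limiter of the kind mentioned above can be sketched in a few lines. This version is synchronous and takes the current time as an argument so it can be tested without sleeping; the real client is async and documented in docs/alphavantage_client.md. The `TokenBucket` type and its starting-full behaviour are assumptions for illustration.

```rust
/// Hypothetical token bucket: holds up to `capacity` tokens, refilled continuously.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_secs: f64, // time of the previous call, in seconds
}

impl TokenBucket {
    /// Bucket allowing `rate` requests per minute; assumed to start full.
    fn per_minute(rate: u32) -> Self {
        let capacity = rate as f64;
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / 60.0,
            last_secs: 0.0,
        }
    }

    /// Tries to take one token at time `now_secs`; returns false when empty.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        // Refill in proportion to elapsed time, capped at capacity.
        let elapsed = (now_secs - self.last_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_secs = now_secs;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

At the default 75 requests per minute the bucket refills at 1.25 tokens per second, so a burst of 75 calls is allowed immediately and further calls are paced at that rate.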
All fetched data is persisted in a local DuckDB database (default: data/commodities.duckdb, created automatically). Each series is stored as (symbol, interval, date, value) rows. Re-ingest does a full replace.
See docs/duckdb_cache.md for schema details.
src/
main.rs — dual-mode dispatch (interactive / direct CLI)
config.rs — TOML config structs
alphavantage/
client.rs — async HTTP client with rate limiting
commodity.rs — endpoint/interval types + deserialisation
rate_limiter.rs — token-bucket rate limiter
cache/
mod.rs — DuckDB persistence layer (store/load/last_fetched/status)
data_service/
mod.rs — cache-first orchestration, bulk ingest
cli/
mod.rs — interactive menu + 9 direct subcommand handlers
features/
mod.rs — feature families, scaling, session-aware pipeline
model/
params.rs — ModelParams (K, pi, A, mu, sigma²)
simulate.rs — Gaussian MSM generative sampler
filter.rs — Hamilton forward filter
smoother.rs — backward smoother (RTS)
pairwise.rs — pairwise posterior pass
em.rs — Baum-Welch EM estimator
diagnostics.rs — fitted-model validity checks
online/
mod.rs — causal streaming filter (log-space, numerically stable)
detector/
hard_switch.rs — Hard Switch detector
posterior_transition.rs — Posterior Transition detector
surprise.rs — Surprise (-log predictive) detector
frozen.rs — FrozenModel + StreamingSession
calibration/
mod.rs — empirical summary + synthetic mapping
report.rs — CalibrationReport workflow
benchmark/
mod.rs — event-window evaluation protocol
real_eval/
route_a.rs — proxy event alignment
route_b.rs — segmentation self-consistency
report.rs — combined Route A + B report
experiments/
config.rs — ExperimentConfig (fully serialisable)
runner.rs — ExperimentRunner<B> + ExperimentBackend trait
synthetic_backend.rs — SyntheticBackend: EM + detection + evaluation
real_backend.rs — RealBackend: DuckDB load, 70/15/15 split, Route A+B eval
sim_to_real_backend.rs — SimToRealBackend: train EM on calibrated synthetic, test online on real
shared.rs — backend-shared model training and online streaming helpers
dry_run_backend.rs — DryRunBackend: config validation without EM
batch.rs — BatchConfig + run_batch + batch_summary.json
result.rs — ExperimentResult, RunStatus, EvaluationSummary
registry.rs — 18 registered experiment definitions (synthetic, real, sim-to-real)
search.rs — param-search grid + optimize() two-phase search driver
artifact.rs — run directory layout + snapshot helpers
reporting/
artifact.rs — ArtifactRootConfig, RunArtifactLayout
export/ — JSON / CSV export (schema, json, csv)
plot/ — plotters-based renderers (5 plot types)
table/ — MetricsTableBuilder, ComparisonTableBuilder
report.rs — RunReporter, AggregateReporter
| Doc | Topic |
|---|---|
| alphavantage_client.md | Alpha Vantage HTTP client and rate limiting |
| duckdb_cache.md | DuckDB schema and cache API |
| data_service.md | DataService orchestration layer |
| data_pipeline.md | Real financial data pipeline |
| interactive_cli.md | Interactive CLI full reference |
| observation_design.md | Feature families and observation pipeline |
| gaussian_msm_simulator.md | Generative MSM simulator |
| emission_model.md | Gaussian emission density |
| forward_filter.md | Hamilton forward filter |
| filter_validation.md | Filter validation on simulated data |
| log_likelihood.md | Observed-data log-likelihood |
| backward_smoother.md | RTS backward smoother |
| pairwise_posteriors.md | Pairwise posterior transition probabilities |
| em_estimation.md | Baum-Welch EM estimator |
| diagnostics.md | Fitted-model diagnostics and trust checks |
| online_inference.md | Online (streaming) causal inference |
| changepoint_detectors.md | Detector variants (HardSwitch, PosteriorTransition, Surprise) |
| fixed_parameter_policy.md | Offline-trained, online-frozen parameter policy |
| benchmark_protocol.md | Synthetic benchmark and event-window evaluation |
| synthetic_to_real_calibration.md | Synthetic-to-real calibration workflow |
| real_data_evaluation.md | Real-data evaluation (Route A + B) |
| experiment_runner.md | Experiment runner and reproducibility layer |
| reporting_and_export.md | Reporting, plots, tables, and artifact export |
cargo test
328 tests covering all core components: filter/smoother correctness, EM convergence, detector alarm logic, calibration mapping, benchmark matching, experiment runner orchestration, real-backend data pipeline, and artifact serialisation.