Proteus

Changepoint detection for commodity and equity time series using a Gaussian Markov Switching Model (MSM). Built as a Master's thesis project.

The system fits a K-regime MSM offline via EM (Baum-Welch), then runs a causal online filter to score and alarm on regime shifts in real time. Three detector variants are provided: Hard Switch, Posterior Transition, and Surprise.

Quick Start

Prerequisites

Rust (stable) — install via rustup

For real-data experiments (optional, deferred): a free Alpha Vantage API key — alphavantage.co/support/#api-key. Synthetic experiments work with no API key or config file.

Running (synthetic — no config needed)

cargo run              # interactive menu
cargo run -- e2e       # run all registered synthetic experiments end-to-end
cargo run -r -- e2e    # release build (recommended for EM training runs)
cargo run -- help      # direct CLI help

Configuration (real data only)

Copy the example config and fill in your API key:

cp config.example.toml config.toml

Edit config.toml:

[alphavantage]
api_key = "your_api_key_here"
rate_limit_per_minute = 75   # default, can be omitted

[cache]
path = "data/commodities.duckdb"   # default, can be omitted

[ingest]
series = [
    { commodity = "spy",         interval = "15min"   },
    { commodity = "qqq",         interval = "15min"   },
    { commodity = "wti",         interval = "daily"   },
    { commodity = "brent",       interval = "daily"   },
    { commodity = "natural_gas", interval = "daily"   },
    { commodity = "gold",        interval = "daily"   },
    { commodity = "silver",      interval = "daily"   },
]

config.toml contains your API key — never commit it.

Usage

Interactive Mode

cargo run launches a 9-category guided menu. Navigate with arrow keys; press Esc to go back at any prompt.

Main menu:
  Data          — ingest, inspect, and refresh market data
  Features      — feature families and observation pipeline
  Calibration   — synthetic-to-real scenario calibration
  Models        — Gaussian MSM fitting and inspection
  Detection     — detector variants and alarm configuration
  Evaluation    — synthetic and real-data evaluation
  Experiments   — run single or batch experiments
  Reporting     — plots, tables, and artifact export
  Inspect Runs  — browse and view saved run artifacts
  Exit

See docs/interactive_cli.md for the full menu reference.

Direct CLI Mode

Pass a subcommand as the first argument to skip the interactive menu (useful for scripting):

cargo run -- e2e                                          # run all registered synthetic experiments end-to-end
cargo run -- run-experiment  --config experiment_config.json
cargo run -- run-batch       --config a.json --config b.json [--save <dir>]
cargo run -- run-real        --id <experiment_id> [--cache <path.duckdb>] [--save <dir>]
cargo run -- calibrate       --id <experiment_id> [--out <dir>]
cargo run -- param-search    --id <experiment_id>         # grid search (DryRun)
cargo run -- optimize        --id <experiment_id> [--cache <path>] [--save <dir>] [--top <n>]
cargo run -- inspect         --dir ./runs/real/my_run/run_001
cargo run -- generate-report --dir ./runs/real/my_run/run_001 [--cache <path.duckdb>]
cargo run -- status          [--config path/to/config.toml]
cargo run -- help

Experiment configs are JSON files. A template is printed by the Experiments > Show Config Template menu item.

`run-real`

Runs a real-data experiment from the registry by ID, loading price data from the DuckDB cache:

cargo run -- run-real --id real_spy_daily_hard_switch --cache data/commodities.duckdb --save ./output

Artifacts (20 files, including plots) are written to runs/real/<id>/<run_id>/ and optionally copied to --save.

For intraday experiments, the pipeline automatically applies a Regular Trading Hours (RTH) filter that retains only bars in the 09:30–15:59 ET window, excluding pre-market and after-hours bars.
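A minimal sketch of the RTH window check. It assumes each bar already carries its ET wall-clock time as hour and minute; timezone conversion and DST handling are out of scope. `Bar`, `is_rth` and `filter_rth` are hypothetical names, not the crate's API.

```rust
/// Hypothetical intraday bar: ET wall-clock time plus close price.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bar {
    pub hour: u32,
    pub minute: u32,
    pub close: f64,
}

/// True if an ET timestamp lies in the 09:30-15:59 window (inclusive).
pub fn is_rth(hour: u32, minute: u32) -> bool {
    let m = hour * 60 + minute;
    // 09:30 is minute 570 after midnight; 15:59 is minute 959.
    (570..=959).contains(&m)
}

/// Keep only regular-trading-hours bars, preserving their order.
pub fn filter_rth(bars: &[Bar]) -> Vec<Bar> {
    bars.iter().copied().filter(|b| is_rth(b.hour, b.minute)).collect()
}
```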

`generate-report`

Regenerates all plots and JSON artifacts for an existing run by replaying the experiment from its config.snapshot.json:

cargo run -- generate-report --dir ./runs/real/my_run/run_001
cargo run -- generate-report --dir ./runs/real/my_run/run_001 --cache data/commodities.duckdb

The command re-runs the full pipeline (including EM fitting) and writes a fresh artifact set with a new run_id. No files from the original run are overwritten.

`run-batch`

Runs a list of JSON experiment configs in sequence. Each config is dispatched to the backend that matches its mode field (Synthetic → SyntheticBackend, Real → RealBackend, SimToReal → SimToRealBackend), so every run reports metrics computed by its own backend, whatever its mode:

cargo run -- run-batch --config a.json --config b.json --save ./batch_out

Writes batch_summary.json with per-run status and metrics.
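The per-mode dispatch described above boils down to a match on the mode. The sketch below is hypothetical: `Mode`, `Backend` and `backend_for` are illustrative stand-ins, and the unit structs only borrow the backend names. The real `ExperimentBackend` trait in runner.rs is not reproduced here.

```rust
/// Hypothetical mirror of the `mode` field in an experiment config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Synthetic,
    Real,
    SimToReal,
}

/// Hypothetical backend interface (not the crate's ExperimentBackend trait).
pub trait Backend {
    fn name(&self) -> &'static str;
}

// Stand-in unit structs named after the backends in the README.
struct SyntheticBackend;
struct RealBackend;
struct SimToRealBackend;

impl Backend for SyntheticBackend {
    fn name(&self) -> &'static str { "SyntheticBackend" }
}
impl Backend for RealBackend {
    fn name(&self) -> &'static str { "RealBackend" }
}
impl Backend for SimToRealBackend {
    fn name(&self) -> &'static str { "SimToRealBackend" }
}

/// Pick the backend for one config by matching on its mode.
pub fn backend_for(mode: Mode) -> Box<dyn Backend> {
    match mode {
        Mode::Synthetic => Box::new(SyntheticBackend),
        Mode::Real => Box::new(RealBackend),
        Mode::SimToReal => Box::new(SimToRealBackend),
    }
}
```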

`calibrate`

Calibrates a synthetic experiment's model parameters against the empirical distribution of that experiment's feature family:

cargo run -- calibrate --id hard_switch --out ./output/calibration

Produces calibration_summary.json, synthetic_vs_empirical_summary.json, and calibrated_scenario.json.

`optimize`

Two-phase parameter search for real-data experiments:

Phase 1 — Grid search (artifact writes disabled for speed): sweeps a grid over detector and optionally model parameters using real data and the full EM pipeline. Ranks all grid points by a combined coverage + precision score.
Phase 2 — Full E2E run: re-runs the best-scoring config with full artifact output (JSON, CSV, plots).

Two search modes:

Detector-only (default): sweeps threshold, persistence_required, cooldown.
Joint model + detector (--model): additionally sweeps k_regimes ∈ {2, 3} and five feature families.

# Detector-only
cargo run -- optimize --id real_spy_daily_hard_switch
cargo run -- optimize --id real_wti_daily_surprise --save ./runs/optimize/wti --top 15

# Joint model + detector
cargo run -- optimize --id real_spy_daily_hard_switch --model
cargo run -- optimize --id real_spy_intraday_hard_switch --model --top 20

Default grids by detector type:

Detector	Threshold range	Persistence	Cooldown	Detector pts	Joint pts (×10)
`HardSwitch`	0.30 – 0.80	1, 2, 3, 5	0, 3, 5, 10	128	1 280
`Surprise`	1.0 – 6.0	1, 2, 3, 5	0, 5, 10, 20	128	1 280
`PosteriorTransition`	0.10 – 0.50	1, 2, 3	0, 3, 5, 10	84	840
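The point counts in this table follow from the axis sizes. 128 = 8 × 4 × 4 implies eight threshold values for HardSwitch and Surprise, and 84 = 7 × 3 × 4 implies seven for PosteriorTransition. The ×10 joint factor is 2 values of k_regimes × 5 feature families. The sketch below reproduces those counts under one assumption: that the thresholds are evenly spaced. `GridPoint`, `linspace`, `detector_grid` and `joint_grid_size` are hypothetical names.

```rust
/// One detector grid point (hypothetical struct, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GridPoint {
    pub threshold: f64,
    pub persistence: u32,
    pub cooldown: u32,
}

/// `n` evenly spaced values from `lo` to `hi` inclusive (assumes n >= 2).
pub fn linspace(lo: f64, hi: f64, n: usize) -> Vec<f64> {
    let step = (hi - lo) / (n - 1) as f64;
    (0..n).map(|i| lo + step * i as f64).collect()
}

/// Cartesian product threshold × persistence × cooldown.
pub fn detector_grid(thresholds: &[f64], persistence: &[u32], cooldown: &[u32]) -> Vec<GridPoint> {
    let mut grid = Vec::with_capacity(thresholds.len() * persistence.len() * cooldown.len());
    for &threshold in thresholds {
        for &p in persistence {
            for &c in cooldown {
                grid.push(GridPoint { threshold, persistence: p, cooldown: c });
            }
        }
    }
    grid
}

/// Joint search multiplies the detector grid by k_regimes ∈ {2, 3} and five feature families.
pub fn joint_grid_size(detector_points: usize) -> usize {
    let k_regimes = [2usize, 3];
    let n_families = 5;
    detector_points * k_regimes.len() * n_families
}
```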

Artifacts written to --save (default ./runs/optimize/<id>/):

File	Contents
`search_report.json`	Full ranked grid — all N points with scores
`search_summary.txt`	Human-readable top-N table + best params
`result.json`	Full `ExperimentResult` from the best-config run
`config.snapshot.json`	Exact `ExperimentConfig` used for the best run
`signal_alarms.png`	Alarm timeline plot
`detector_scores.png`	Detector score trace
`regime_posteriors.png`	Filtered posterior heatmap
`.csv`, remaining `.json`	Standard run artifact set

Model

Gaussian Markov Switching Model

Hidden state S_t in {1,...,K} with first-order Markov dynamics:

P(S_t = j | S_{t-1} = i) = A_{ij}

Observations are Gaussian given the regime:

y_t | S_t = j  ~  N(mu_j, sigma_j^2)

Parameters theta = (pi, A, mu_{1:K}, sigma^2_{1:K}) are fitted offline via the EM algorithm (Baum-Welch), then frozen for online use.

See docs/gaussian_msm_simulator.md and docs/em_estimation.md.
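For reference, here are the textbook Baum-Welch M-step updates for a Gaussian MSM. They are written with the smoothed posteriors gamma_t(j) and pairwise posteriors xi_t(i,j) listed in the inference pipeline below. em.rs may add details that are not shown here, such as initialisation, variance floors or the stopping rule.

```latex
\begin{aligned}
\hat\pi_j &= \gamma_1(j) \\
\hat A_{ij} &= \frac{\sum_{t=2}^{T} \xi_t(i,j)}{\sum_{t=2}^{T} \gamma_{t-1}(i)} \\
\hat\mu_j &= \frac{\sum_{t=1}^{T} \gamma_t(j)\, y_t}{\sum_{t=1}^{T} \gamma_t(j)} \\
\hat\sigma_j^2 &= \frac{\sum_{t=1}^{T} \gamma_t(j)\,(y_t - \hat\mu_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}
\end{aligned}
```

The denominator of \hat A_{ij} works because \sum_j \xi_t(i,j) = \gamma_{t-1}(i), so each row of \hat A sums to one.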

Inference Pipeline

Phase	Component	Doc
Emission density	N(y_t; mu_j, sigma_j^2)	emission_model.md
Forward filter	alpha_{t|t}(j) = Pr(S_t=j | y_{1:t})	forward_filter.md
Log-likelihood	log p(y_{1:T}) from filter normalisation constants	log_likelihood.md
Backward smoother	gamma_t(j) = Pr(S_t=j | y_{1:T})	backward_smoother.md
Pairwise posteriors	xi_t(i,j) = Pr(S_{t-1}=i, S_t=j | y_{1:T})	pairwise_posteriors.md
EM estimation	Baum-Welch until convergence	em_estimation.md
Diagnostics	Validity checks on fitted parameters	diagnostics.md
Online inference	Causal streaming filter, no future data	online_inference.md
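A minimal probability-space sketch of the Hamilton forward filter and its log-likelihood, using the notation from the table above. The prediction step uses pi at t = 1 and A^T alpha_{t-1|t-1} afterwards. The update weights the prediction by the Gaussian emission and normalises by c_t, and log p(y_{1:T}) = Σ_t log c_t. Function names are hypothetical. The crate's streaming filter works in log space (per the architecture notes), whereas this plain version can underflow on extreme observations.

```rust
use std::f64::consts::PI;

/// Gaussian density N(y; mu, sigma2).
pub fn normal_pdf(y: f64, mu: f64, sigma2: f64) -> f64 {
    let d = y - mu;
    (-0.5 * d * d / sigma2).exp() / (2.0 * PI * sigma2).sqrt()
}

/// Prediction: Pr(S_t = j | y_{1:t-1}) = sum_i alpha[i] * A[i][j].
pub fn predict(alpha: &[f64], a: &[Vec<f64>]) -> Vec<f64> {
    let k = alpha.len();
    (0..k)
        .map(|j| (0..k).map(|i| alpha[i] * a[i][j]).sum::<f64>())
        .collect()
}

/// Update: weight by emissions and normalise. Returns (alpha_{t|t}, c_t).
pub fn update(pred: &[f64], mu: &[f64], sigma2: &[f64], y: f64) -> (Vec<f64>, f64) {
    let mut alpha: Vec<f64> = pred
        .iter()
        .enumerate()
        .map(|(j, p)| p * normal_pdf(y, mu[j], sigma2[j]))
        .collect();
    // c_t = p(y_t | y_{1:t-1}) is the normalisation constant.
    let c: f64 = alpha.iter().sum();
    for v in alpha.iter_mut() {
        *v /= c;
    }
    (alpha, c)
}

/// Run the filter over a series. Returns all filtered posteriors and log p(y_{1:T}).
pub fn forward_filter(
    pi: &[f64],
    a: &[Vec<f64>],
    mu: &[f64],
    sigma2: &[f64],
    ys: &[f64],
) -> (Vec<Vec<f64>>, f64) {
    let mut out = Vec::with_capacity(ys.len());
    let mut loglik = 0.0;
    let mut pred = pi.to_vec(); // the t = 1 prior is the initial distribution
    for &y in ys {
        let (alpha, c) = update(&pred, mu, sigma2, y);
        loglik += c.ln();
        pred = predict(&alpha, a);
        out.push(alpha);
    }
    (out, loglik)
}
```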

Detector Variants

All detectors consume one-step causal filter output and apply a score + alarm policy (persistence + cooldown). See docs/changepoint_detectors.md.
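The persistence + cooldown policy can be sketched as follows. The exact semantics here are assumptions: a step counts toward persistence when its score is strictly above the threshold, an alarm fires once `persistence` consecutive steps qualify, and the next `cooldown` steps are then suppressed. The function name `alarms` is hypothetical.

```rust
/// Hypothetical persistence + cooldown alarm policy over a score trace.
/// Returns the indices at which alarms fire.
pub fn alarms(scores: &[f64], threshold: f64, persistence: usize, cooldown: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut run = 0usize; // consecutive steps above threshold
    let mut cool_left = 0usize; // steps still suppressed after the last alarm
    for (t, &s) in scores.iter().enumerate() {
        if cool_left > 0 {
            cool_left -= 1;
            run = 0;
            continue;
        }
        run = if s > threshold { run + 1 } else { 0 };
        if run >= persistence.max(1) {
            out.push(t);
            run = 0;
            cool_left = cooldown;
        }
    }
    out
}
```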

Detector	Score	Alarm trigger
HardSwitch	Indicator `1[argmax_j alpha_{t|t}(j) ≠ argmax_j alpha_{t-1|t-1}(j)]`
PosteriorTransition	`LeavePrevious`: `1 − alpha_{t|t}(r_{t-1})`; or `TotalVariation`: `½ Σ_j |alpha_{t|t}(j) − alpha_{t-1|t-1}(j)|`
Surprise	`−log c_t` (optionally minus a lagged EMA baseline `b_{t-1}`)	Score exceeds threshold
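The score formulas in the table can be sketched as plain functions over consecutive filtered posteriors. Several details are assumptions:

- r_{t-1} is taken to be the MAP regime at t−1.
- TotalVariation is taken between alpha_{t|t} and alpha_{t-1|t-1}.
- The EMA baseline is seeded with the first raw score and updated as b_t = a·raw_t + (1 − a)·b_{t−1}.

All function names are hypothetical.

```rust
/// Index of the most probable regime (first one on ties).
pub fn argmax(p: &[f64]) -> usize {
    let mut best = 0;
    for (j, &v) in p.iter().enumerate() {
        if v > p[best] {
            best = j;
        }
    }
    best
}

/// HardSwitch: 1 if the MAP regime changed between consecutive filtered posteriors.
pub fn hard_switch(prev: &[f64], curr: &[f64]) -> f64 {
    if argmax(prev) != argmax(curr) { 1.0 } else { 0.0 }
}

/// PosteriorTransition (LeavePrevious): mass that has left the previous MAP regime.
pub fn leave_previous(prev: &[f64], curr: &[f64]) -> f64 {
    1.0 - curr[argmax(prev)]
}

/// PosteriorTransition (TotalVariation): ½ Σ_j |curr_j − prev_j|.
pub fn total_variation(prev: &[f64], curr: &[f64]) -> f64 {
    0.5 * prev.iter().zip(curr).map(|(p, c)| (c - p).abs()).sum::<f64>()
}

/// Surprise: −log c_t, optionally minus a lagged EMA baseline b_{t-1}.
pub fn surprise_scores(c: &[f64], ema_alpha: Option<f64>) -> Vec<f64> {
    let mut out = Vec::with_capacity(c.len());
    let mut baseline: Option<f64> = None;
    for &ct in c {
        let raw = -ct.ln();
        match ema_alpha {
            None => out.push(raw),
            Some(a) => {
                // Assumption: seed the baseline with the first raw score.
                let b_prev = baseline.unwrap_or(raw);
                out.push(raw - b_prev);
                baseline = Some(a * raw + (1.0 - a) * b_prev);
            }
        }
    }
    out
}
```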

The fixed-parameter policy (offline-fit, online-freeze) is described in docs/fixed_parameter_policy.md.

Observation Pipeline

Raw prices are transformed into the observation sequence y_t before fitting or streaming:

Family	Formula
`LogReturn`	log(P_t / P_{t-1})
`AbsReturn`	absolute value of log return
`SquaredReturn`	(log return)^2
`RollingVol`	Rolling std of log returns over window w
`StandardizedReturn`	log return / rolling std

Scaling options: None, ZScore, RobustZScore. All transforms are strictly causal.
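A minimal sketch of the feature families and a causal ZScore. The following are assumptions, and the function names are hypothetical:

- RollingVol uses the sample std over a trailing window that includes r_t.
- StandardizedReturn divides r_t by the volatility of the window ending at t−1.
- ZScore statistics are fitted on a training prefix only and then applied to every step.

Every output at step t depends only on data up to t.

```rust
/// LogReturn: r_t = ln(P_t / P_{t-1}); output is one element shorter than `prices`.
pub fn log_returns(prices: &[f64]) -> Vec<f64> {
    prices.windows(2).map(|w| (w[1] / w[0]).ln()).collect()
}

pub fn abs_returns(r: &[f64]) -> Vec<f64> {
    r.iter().map(|x| x.abs()).collect()
}

pub fn squared_returns(r: &[f64]) -> Vec<f64> {
    r.iter().map(|x| x * x).collect()
}

/// RollingVol: sample std of the trailing `w` returns ending at t; None until a full window exists.
pub fn rolling_std(r: &[f64], w: usize) -> Vec<Option<f64>> {
    (0..r.len())
        .map(|t| {
            if w < 2 || t + 1 < w {
                return None;
            }
            let win = &r[t + 1 - w..=t];
            let mean = win.iter().sum::<f64>() / w as f64;
            let var = win.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (w - 1) as f64;
            Some(var.sqrt())
        })
        .collect()
}

/// StandardizedReturn: r_t / sigma_{t-1} (lagged window is an assumption).
pub fn standardized(r: &[f64], w: usize) -> Vec<Option<f64>> {
    let sd = rolling_std(r, w);
    (0..r.len())
        .map(|t| {
            if t == 0 {
                return None;
            }
            match sd[t - 1] {
                Some(s) if s > 0.0 => Some(r[t] / s),
                _ => None,
            }
        })
        .collect()
}

/// ZScore with mean/std fitted on the first `n_train` points only (assumes n_train >= 1).
pub fn zscore_from_train(x: &[f64], n_train: usize) -> Vec<f64> {
    let train = &x[..n_train];
    let mean = train.iter().sum::<f64>() / n_train as f64;
    let var = train.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n_train as f64;
    let sd = var.sqrt();
    x.iter().map(|v| (v - mean) / sd).collect()
}
```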

See docs/observation_design.md for the full pipeline and session-aware variants.

Calibration

Synthetic scenarios are calibrated against real empirical data so that benchmark experiments use realistic return statistics. The workflow maps empirical statistics (mean, variance, jump contamination) to MSM parameters and then quantifies the remaining discrepancy between the synthetic and empirical summaries.

See docs/synthetic_to_real_calibration.md.
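One ingredient of any such mapping is the set of unconditional moments that a given parameter set implies under the stationary regime distribution. The registry's "stationary-π" calibration suggests stationarity plays a role, but the exact mapping lives in the doc above. This sketch covers K = 2 only, and its function names are hypothetical. It computes π from A, the implied mixture mean and variance, expected regime durations, and a simple relative discrepancy.

```rust
/// Stationary distribution of a 2-state chain: pi_1 = A_21 / (A_12 + A_21).
pub fn stationary_2(a: [[f64; 2]; 2]) -> [f64; 2] {
    let p12 = a[0][1];
    let p21 = a[1][0];
    let pi1 = p21 / (p12 + p21);
    [pi1, 1.0 - pi1]
}

/// Unconditional mean and variance of y_t under the stationary mixture.
pub fn implied_moments(pi: &[f64], mu: &[f64], sigma2: &[f64]) -> (f64, f64) {
    let mean: f64 = pi.iter().zip(mu).map(|(p, m)| p * m).sum();
    // E[y^2] = sum_j pi_j (sigma_j^2 + mu_j^2)
    let second: f64 = pi
        .iter()
        .zip(mu)
        .zip(sigma2)
        .map(|((p, m), s2)| p * (s2 + m * m))
        .sum();
    (mean, second - mean * mean)
}

/// Expected regime duration in steps: 1 / (1 - A_jj).
pub fn expected_duration(a_jj: f64) -> f64 {
    1.0 / (1.0 - a_jj)
}

/// Relative discrepancy between a model-implied and an empirical statistic.
pub fn rel_discrepancy(model: f64, empirical: f64) -> f64 {
    (model - empirical).abs() / empirical.abs().max(f64::EPSILON)
}
```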

Evaluation

Synthetic Benchmark

Evaluated on simulated data with known changepoints using an event-window matching protocol. Metrics: coverage, precision-like score, mean detection delay.

See docs/benchmark_protocol.md.
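A hypothetical sketch of event-window matching. Its matching rules are assumptions: an alarm at a matches the earliest unmatched changepoint c with c ≤ a ≤ c + window, and each alarm and changepoint is used at most once. FAR is computed as unmatched alarms divided by the horizon. That choice is consistent with the comparison table in Registered Experiments, where, for example, HardSwitch has 38 alarms at precision 0.658, i.e. 25 matched and 13 unmatched, and 13 / 2000 = 0.0065. `evaluate` and `EventMetrics` are illustrative names, not the crate's API.

```rust
#[derive(Debug, PartialEq)]
pub struct EventMetrics {
    pub precision: f64,
    pub coverage: f64,
    pub mean_delay: f64,
    pub far: f64,
}

/// Greedy event-window matching; `changepoints` and `alarms` must be sorted ascending.
pub fn evaluate(changepoints: &[usize], alarms: &[usize], window: usize, horizon: usize) -> EventMetrics {
    let mut used = vec![false; changepoints.len()];
    let mut matched = 0usize;
    let mut delay_sum = 0usize;
    for &a in alarms {
        // Earliest unmatched changepoint whose window [c, c + window] contains the alarm.
        let hit = (0..changepoints.len())
            .find(|&i| !used[i] && changepoints[i] <= a && a <= changepoints[i] + window);
        if let Some(i) = hit {
            used[i] = true;
            matched += 1;
            delay_sum += a - changepoints[i];
        }
    }
    let n_alarms = alarms.len();
    let ratio = |num: usize, den: usize| if den == 0 { 0.0 } else { num as f64 / den as f64 };
    EventMetrics {
        precision: ratio(matched, n_alarms),
        coverage: ratio(matched, changepoints.len()),
        mean_delay: ratio(delay_sum, matched),
        far: ratio(n_alarms - matched, horizon),
    }
}
```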

Real-Data Evaluation

No ground truth is available, so two routes are used:

Route A — Proxy Event Alignment: alarm timing vs. known market events (earnings, macro announcements).
Route B — Segmentation Self-Consistency: within-segment homogeneity and between-segment contrast.

See docs/real_data_evaluation.md.
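Route B's exact metrics are defined in the doc above. The sketch below is only one plausible way to express the two ideas, and all of it is hypothetical:

- The series is cut into segments at each alarm.
- Homogeneity is the pooled within-segment sum of squares divided by the total sum of squares (lower means more homogeneous).
- Contrast is the mean absolute difference between adjacent segment means.

```rust
/// Split `y` at alarm indices: each in-range alarm starts a new segment.
pub fn segments<'a>(y: &'a [f64], alarms: &[usize]) -> Vec<&'a [f64]> {
    let mut out = Vec::new();
    let mut start = 0;
    for &a in alarms {
        if a > start && a < y.len() {
            out.push(&y[start..a]);
            start = a;
        }
    }
    out.push(&y[start..]);
    out
}

fn mean(x: &[f64]) -> f64 {
    x.iter().sum::<f64>() / x.len() as f64
}

/// Homogeneity: within-segment SS / total SS (hypothetical metric).
pub fn within_ratio(y: &[f64], alarms: &[usize]) -> f64 {
    let total_mean = mean(y);
    let sst: f64 = y.iter().map(|v| (v - total_mean).powi(2)).sum();
    let ssw: f64 = segments(y, alarms)
        .iter()
        .map(|s| {
            let m = mean(s);
            s.iter().map(|v| (v - m).powi(2)).sum::<f64>()
        })
        .sum();
    ssw / sst
}

/// Contrast: mean absolute difference between adjacent segment means (hypothetical metric).
pub fn adjacent_contrast(y: &[f64], alarms: &[usize]) -> f64 {
    let means: Vec<f64> = segments(y, alarms).iter().map(|s| mean(s)).collect();
    if means.len() < 2 {
        return 0.0;
    }
    means.windows(2).map(|w| (w[1] - w[0]).abs()).sum::<f64>() / (means.len() - 1) as f64
}
```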

Experiments

Experiments are fully described by a JSON ExperimentConfig and run through ExperimentRunner. Each run produces a deterministic run ID (from config hash + seed), a structured artifact directory, and a serialised ExperimentResult.

runs/
  synthetic/
    <run_label>/
      <run_id>/
        config.snapshot.json      — ExperimentConfig used for this run
        result.json               — full ExperimentResult
        summary.json              — lightweight metrics summary
        model_params.json         — fitted ModelParams (K, pi, A, mu, sigma²)
        fit_summary.json          — human-readable EM fit metadata
        loglikelihood_history.csv — LL at each EM iteration
        feature_summary.json      — feature pipeline metadata and stats
        score_trace.csv           — per-step detector score
        alarms.csv                — alarm timestamps and scores
        changepoints.csv          — ground-truth changepoints (synthetic)
        regime_posteriors.csv     — T×K filtered posterior probabilities
        detector_config.json      — detector type and threshold settings
        signal_alarms.png         — observation series with alarm markers
        detector_scores.png       — score trace with threshold line
        regime_posteriors.png     — posterior probability traces per regime
        delay_distribution.png    — detection delay histogram (synthetic)
  real/
    <run_label>/
      <run_id>/
        config.snapshot.json      — ExperimentConfig used for this run
        result.json               — full ExperimentResult
        summary.json              — lightweight metrics summary
        model_params.json         — fitted ModelParams
        fit_summary.json          — human-readable EM fit metadata
        loglikelihood_history.csv — LL at each EM iteration
        feature_summary.json      — feature pipeline metadata and stats
        score_trace.csv           — per-step detector score
        alarms.csv                — alarm timestamps and scores
        regime_posteriors.csv     — T×K filtered posterior probabilities
        real_eval_summary.csv     — Route A + Route B metric summary
        route_a_result.json       — proxy event alignment details
        route_b_result.json       — segmentation self-consistency details
        split_summary.json        — train/val/test split boundaries
        data_quality.json         — NaN/gap/out-of-range checks
        detector_config.json      — detector type and threshold settings
        signal_alarms.png         — observation series with alarm markers
        detector_scores.png       — score trace with threshold line
        regime_posteriors.png     — posterior probability traces per regime
        segmentation.png          — segment-coloured real-data plot
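The README states that run IDs are derived deterministically from a config hash plus the seed, but it does not name the hash. This sketch uses 64-bit FNV-1a purely for illustration, because it is stable across builds and needs no dependencies. It also builds the runs/<mode>/<run_label>/<run_id>/ path from the layout above. `fnv1a64`, `run_id` and `run_dir` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// 64-bit FNV-1a hash (illustrative choice; the crate's hash is unspecified).
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Hypothetical run ID: hash of the serialised config followed by the seed bytes.
pub fn run_id(config_json: &str, seed: u64) -> String {
    let mut bytes = config_json.as_bytes().to_vec();
    bytes.extend_from_slice(&seed.to_le_bytes());
    format!("{:016x}", fnv1a64(&bytes))
}

/// Artifact directory following the runs/<mode>/<run_label>/<run_id>/ layout.
pub fn run_dir(mode: &str, run_label: &str, run_id: &str) -> PathBuf {
    Path::new("runs").join(mode).join(run_label).join(run_id)
}
```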

Registered Experiments

Eighteen experiments are registered in src/experiments/registry.rs:

ID	Type	Description
`hard_switch`	Synthetic	HardSwitch, 2-regime, LogReturn/ZScore, horizon 2000
`posterior_transition`	Synthetic	PosteriorTransition (LeavePrevious), 2-regime, LogReturn/ZScore, horizon 2000
`surprise`	Synthetic	Surprise, 2-regime, LogReturn/ZScore, horizon 2000
`posterior_transition_tv`	Synthetic	PosteriorTransition (TotalVariation), 2-regime, LogReturn/ZScore, horizon 2000
`hard_switch_shock`	Synthetic	HardSwitch, shock-contaminated synthetic (jump noise path)
`hard_switch_frozen`	Synthetic	HardSwitch, loads pre-fitted model from `data/frozen_models/hard_switch_frozen`
`hard_switch_multi_start`	Synthetic	HardSwitch, multi-start EM (3 starts) — produces `multi_start_summary.json`
`surprise_ema`	Synthetic	Surprise with EMA-baseline (`ema_alpha=0.3`) adjusted score
`squared_return_surprise`	Synthetic	Surprise detector on `SquaredReturn` feature family
`cusum_comparison`	Synthetic	One-sided variance-CUSUM (benchmark baseline), LogReturn/ZScore, horizon 2000
`bocpd_comparison`	Synthetic	BOCPD with Inverse-Gamma conjugate model (benchmark baseline), LogReturn/ZScore, horizon 2000
`real_spy_daily_hard_switch`	Real	SPY daily adj-close log-returns, HardSwitch, 2018–present
`real_wti_daily_surprise`	Real	WTI daily spot-price log-returns, Surprise, 2018–present
`real_spy_intraday_hard_switch`	Real	SPY 15-min RTH log-returns (session-aware), HardSwitch, 2022–2025
`simreal_spy_daily_hard_switch`	Sim-to-real	EM trained on a synthetic stream calibrated to SPY (Quick-EM); online detector run on real SPY
`simreal_spy_daily_abs_return_k3`	Sim-to-real	SPY daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration
`simreal_wti_daily_abs_return_k3`	Sim-to-real	WTI daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration
`simreal_gold_daily_abs_return_k3`	Sim-to-real	GOLD daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration

The 11 synthetic experiments (including the two comparison baselines) can all be run at once:

cargo run -- e2e
cargo run -r -- e2e    # release build (faster EM training)

Comparison baseline results (synthetic, seed=42, horizon=2000):

Detector	Alarms	Precision	Recall	FAR (false alarms/step)	Mean delay (steps)
HardSwitch (0.5)	38	0.658	0.207	0.0065	10.8
PosteriorTransition (0.3)	86	0.767	0.545	0.0100	9.2
Surprise (2.5)	22	0.955	0.174	0.0005	10.0
CUSUM (thr=8.0, slack=0.5)	38	0.842	0.264	0.0030	10.1
BOCPD (thr=0.5, h=0.02)	1	1.000	0.008	0.0000	0.0

At the same alarm count (38), CUSUM improves on HardSwitch in both recall (0.264 vs 0.207) and precision (0.842 vs 0.658). Against PosteriorTransition it trades recall (0.264 vs 0.545) for higher precision (0.842 vs 0.767), so neither of those two dominates the other. BOCPD at threshold 0.5 (requiring ≥50% run-length posterior mass at r=0) is highly conservative given the low hazard rate (h=0.02) and fires only once. A lower threshold is needed before its recall can be compared meaningfully with the other detectors.
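The README does not spell out the CUSUM statistic. A common one-sided form for detecting a variance increase in standardised data accumulates z_t² − 1 − k, with k the slack; under the null E[z_t²] = 1, so the drift is negative. It alarms when the sum exceeds the threshold h. The sketch below assumes that form plus a reset to zero after each alarm, so it may differ from the crate's baseline. `variance_cusum` is a hypothetical name.

```rust
/// One-sided upward variance CUSUM on standardised observations `z` (assumed form).
/// S_t = max(0, S_{t-1} + z_t^2 - 1 - slack); alarm and reset when S_t > threshold.
pub fn variance_cusum(z: &[f64], threshold: f64, slack: f64) -> Vec<usize> {
    let mut s = 0.0_f64;
    let mut alarms = Vec::new();
    for (t, &zt) in z.iter().enumerate() {
        s = (s + zt * zt - 1.0 - slack).max(0.0);
        if s > threshold {
            alarms.push(t);
            s = 0.0;
        }
    }
    alarms
}
```

For example, with thr=8.0 and slack=0.5, a run of z = 3 adds 7.5 per step, so the sum crosses the threshold on every second step.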

Sample output:

[10/11]  cusum_comparison
  Pipeline:
    [3/6] TrainOrLoadModel  K=2  LL=-1948.67  iter=124  converged=true  (132ms)
    [4/6] RunOnline         detector=Cusum  thr=8.000  n_alarms=38
    [5/6] Evaluate          precision=0.8421  recall=0.2645  n_events=121
  Metrics : prec=0.8421  recall=0.2645  n_alarms=38  FAR=0.003000  delay=10.1

[11/11]  bocpd_comparison
  Pipeline:
    [4/6] RunOnline         detector=Bocpd  thr=0.500  n_alarms=1  (43ms)
    [5/6] Evaluate          precision=1.0000  recall=0.0083  n_events=121
  Metrics : prec=1.0000  recall=0.0083  n_alarms=1  FAR=0.000000  delay=0.0
...
  Completed: 11  Failed: 0

See docs/experiment_runner.md.

Reporting

The reporting layer generates tables (and, in future work, plots) from run artifacts. See docs/reporting_and_export.md.

Output	Description
`result.json`	Full `ExperimentResult` with all pipeline outputs
`summary.json`	Lightweight metrics summary
`model_params.json`	Fitted ModelParams (reloadable via `LoadFrozen`)
`fit_summary.json`	Human-readable EM fit metadata (K, iters, LL, convergence)
`loglikelihood_history.csv`	Log-likelihood at each EM iteration
`feature_summary.json`	Feature pipeline stats (n_obs, mean, variance, train/val split)
`config.snapshot.json`	Exact `ExperimentConfig` snapshot
`detector_config.json`	Detector type and threshold settings
`score_trace.csv`	Per-step detector score
`alarms.csv`	Alarm timestamps and scores
`regime_posteriors.csv`	T×K filtered posterior probabilities
`split_summary.json`	Train/val/test split info (real mode)
`data_quality.json`	NaN/gap/out-of-range checks (real mode)
`real_eval_summary.csv`	Route A + Route B metric row (real mode)
`route_a_result.json`	Proxy event alignment detail (real Route A)
`route_b_result.json`	Segmentation self-consistency detail (real Route B)
`batch_summary.json`	Aggregate summary across all runs in a batch
`signal_alarms.png`	Observation series with alarm markers (requires font backend)
`detector_scores.png`	Score trace with threshold line (requires font backend)
`regime_posteriors.png`	Filtered posterior traces per regime (requires font backend)
`delay_distribution.png`	Detection delay histogram — synthetic only (requires font backend)
`segmentation.png`	Segment-coloured real-data plot — real only (requires font backend)

Data

Sources

Data is sourced from the Alpha Vantage API. Supported series:

Commodities (daily / weekly / monthly / quarterly / annual): WTI, Brent, Natural Gas, Copper, Aluminum, Wheat, Corn, Cotton, Sugar, Coffee, Gold, Silver, All Commodities Index.

Equities (SPY, QQQ): daily, weekly, monthly, and intraday (1min, 5min, 15min, 30min, 60min).

The HTTP client is rate-limited (default 75 req/min, token-bucket). See docs/alphavantage_client.md.
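A minimal token-bucket sketch at the default 75 requests/minute, which refills 1.25 tokens per second. It is driven by an explicit clock in seconds so it can be exercised without sleeping. The real client is async and would wait for a token rather than return false. Starting with a full bucket (burst = 75) is an assumption, and `TokenBucket` is a hypothetical name.

```rust
/// Token-bucket limiter driven by an explicit clock (seconds).
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: f64,
}

impl TokenBucket {
    /// `per_minute` requests per minute, starting with a full bucket (assumed burst size).
    pub fn per_minute(per_minute: u32, now: f64) -> Self {
        let cap = per_minute as f64;
        TokenBucket { capacity: cap, tokens: cap, refill_per_sec: cap / 60.0, last: now }
    }

    /// Refill for the elapsed time, then take one token if available.
    pub fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```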

Caching

All fetched data is persisted in a local DuckDB database (default: data/commodities.duckdb, created automatically). Each series is stored as (symbol, interval, date, value) rows. Re-ingest does a full replace.

See docs/duckdb_cache.md for schema details.

Architecture

src/
  main.rs                    — dual-mode dispatch (interactive / direct CLI)
  config.rs                  — TOML config structs
  alphavantage/
    client.rs                — async HTTP client with rate limiting
    commodity.rs             — endpoint/interval types + deserialisation
    rate_limiter.rs          — token-bucket rate limiter
  cache/
    mod.rs                   — DuckDB persistence layer (store/load/last_fetched/status)
  data_service/
    mod.rs                   — cache-first orchestration, bulk ingest
  cli/
    mod.rs                   — interactive menu + 9 direct subcommand handlers
  features/
    mod.rs                   — feature families, scaling, session-aware pipeline
  model/
    params.rs                — ModelParams (K, pi, A, mu, sigma²)
    simulate.rs              — Gaussian MSM generative sampler
    filter.rs                — Hamilton forward filter
    smoother.rs              — backward smoother (RTS)
    pairwise.rs              — pairwise posterior pass
    em.rs                    — Baum-Welch EM estimator
    diagnostics.rs           — fitted-model validity checks
  online/
    mod.rs                   — causal streaming filter (log-space, numerically stable)
  detector/
    hard_switch.rs           — Hard Switch detector
    posterior_transition.rs  — Posterior Transition detector
    surprise.rs              — Surprise (-log predictive) detector
    frozen.rs                — FrozenModel + StreamingSession
  calibration/
    mod.rs                   — empirical summary + synthetic mapping
    report.rs                — CalibrationReport workflow
  benchmark/
    mod.rs                   — event-window evaluation protocol
  real_eval/
    route_a.rs               — proxy event alignment
    route_b.rs               — segmentation self-consistency
    report.rs                — combined Route A + B report
  experiments/
    config.rs                — ExperimentConfig (fully serialisable)
    runner.rs                — ExperimentRunner<B> + ExperimentBackend trait
    synthetic_backend.rs     — SyntheticBackend: EM + detection + evaluation
    real_backend.rs          — RealBackend: DuckDB load, 70/15/15 split, Route A+B eval
    sim_to_real_backend.rs   — SimToRealBackend: train EM on calibrated synthetic, test online on real
    shared.rs                — backend-shared model training and online streaming helpers
    dry_run_backend.rs       — DryRunBackend: config validation without EM
    batch.rs                 — BatchConfig + run_batch + batch_summary.json
    result.rs                — ExperimentResult, RunStatus, EvaluationSummary
    registry.rs              — 18 registered experiment definitions (synthetic, real, sim-to-real)
    search.rs                — param-search grid + optimize() two-phase search driver
    artifact.rs              — run directory layout + snapshot helpers
  reporting/
    artifact.rs              — ArtifactRootConfig, RunArtifactLayout
    export/                  — JSON / CSV export (schema, json, csv)
    plot/                    — plotters-based renderers (5 plot types)
    table/                   — MetricsTableBuilder, ComparisonTableBuilder
    report.rs                — RunReporter, AggregateReporter
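The RealBackend entry above mentions a 70/15/15 split. For time series such a split is chronological, with no shuffling. This sketch computes the boundaries with integer arithmetic (floor), which is an assumption about rounding. `split_70_15_15` is a hypothetical name.

```rust
use std::ops::Range;

/// Chronological 70/15/15 split of `t` observations into (train, val, test) index ranges.
pub fn split_70_15_15(t: usize) -> (Range<usize>, Range<usize>, Range<usize>) {
    let train_end = t * 70 / 100;
    let val_end = t * 85 / 100;
    (0..train_end, train_end..val_end, val_end..t)
}
```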

Documentation Index

Doc	Topic
alphavantage_client.md	Alpha Vantage HTTP client and rate limiting
duckdb_cache.md	DuckDB schema and cache API
data_service.md	DataService orchestration layer
data_pipeline.md	Real financial data pipeline
interactive_cli.md	Interactive CLI full reference
observation_design.md	Feature families and observation pipeline
gaussian_msm_simulator.md	Generative MSM simulator
emission_model.md	Gaussian emission density
forward_filter.md	Hamilton forward filter
filter_validation.md	Filter validation on simulated data
log_likelihood.md	Observed-data log-likelihood
backward_smoother.md	RTS backward smoother
pairwise_posteriors.md	Pairwise posterior transition probabilities
em_estimation.md	Baum-Welch EM estimator
diagnostics.md	Fitted-model diagnostics and trust checks
online_inference.md	Online (streaming) causal inference
changepoint_detectors.md	Detector variants (HardSwitch, PosteriorTransition, Surprise)
fixed_parameter_policy.md	Offline-trained, online-frozen parameter policy
benchmark_protocol.md	Synthetic benchmark and event-window evaluation
synthetic_to_real_calibration.md	Synthetic-to-real calibration workflow
real_data_evaluation.md	Real-data evaluation (Route A + B)
experiment_runner.md	Experiment runner and reproducibility layer
reporting_and_export.md	Reporting, plots, tables, and artifact export

Tests

cargo test

328 tests covering all core components: filter/smoother correctness, EM convergence, detector alarm logic, calibration mapping, benchmark matching, experiment runner orchestration, real-backend data pipeline, and artifact serialisation.
