Spiryd/proteus


Proteus

Changepoint detection for commodity and equity time series using a Gaussian Markov Switching Model (MSM). Built as a Master's thesis project.

The system fits a K-regime MSM offline via EM (Baum-Welch), then runs a causal online filter to score and alarm on regime shifts in real time. Three detector variants are provided: Hard Switch, Posterior Transition, and Surprise.


Quick Start

Prerequisites

  • Rust (stable) — install via rustup

For real-data experiments (optional, deferred): a free Alpha Vantage API key — alphavantage.co/support/#api-key. Synthetic experiments work with no API key or config file.

Running (synthetic — no config needed)

cargo run              # interactive menu
cargo run -- e2e       # run all registered synthetic experiments end-to-end
cargo run -r -- e2e    # release build (recommended for EM training runs)
cargo run -- help      # direct CLI help

Configuration (real data only)

Copy the example config and fill in your API key:

cp config.example.toml config.toml

Edit config.toml:

[alphavantage]
api_key = "your_api_key_here"
rate_limit_per_minute = 75   # default, can be omitted

[cache]
path = "data/commodities.duckdb"   # default, can be omitted

[ingest]
series = [
    { commodity = "spy",         interval = "15min"   },
    { commodity = "qqq",         interval = "15min"   },
    { commodity = "wti",         interval = "daily"   },
    { commodity = "brent",       interval = "daily"   },
    { commodity = "natural_gas", interval = "daily"   },
    { commodity = "gold",        interval = "daily"   },
    { commodity = "silver",      interval = "daily"   },
]

config.toml contains your API key — never commit it.


Usage

Interactive Mode

cargo run launches a 9-category guided menu. Navigate with arrow keys; press Esc to go back at any prompt.

Main menu:
  Data          — ingest, inspect, and refresh market data
  Features      — feature families and observation pipeline
  Calibration   — synthetic-to-real scenario calibration
  Models        — Gaussian MSM fitting and inspection
  Detection     — detector variants and alarm configuration
  Evaluation    — synthetic and real-data evaluation
  Experiments   — run single or batch experiments
  Reporting     — plots, tables, and artifact export
  Inspect Runs  — browse and view saved run artifacts
  Exit

See docs/interactive_cli.md for the full menu reference.

Direct CLI Mode

Pass a subcommand as the first argument to skip the interactive menu (useful for scripting):

cargo run -- e2e                                          # run all registered synthetic experiments end-to-end
cargo run -- run-experiment  --config experiment_config.json
cargo run -- run-batch       --config a.json --config b.json [--save <dir>]
cargo run -- run-real        --id <experiment_id> [--cache <path.duckdb>] [--save <dir>]
cargo run -- calibrate       --id <experiment_id> [--out <dir>]
cargo run -- param-search    --id <experiment_id>         # grid search (DryRun)
cargo run -- optimize        --id <experiment_id> [--cache <path>] [--save <dir>] [--top <n>]
cargo run -- inspect         --dir ./runs/real/my_run/run_001
cargo run -- generate-report --dir ./runs/real/my_run/run_001 [--cache <path.duckdb>]
cargo run -- status          [--config path/to/config.toml]
cargo run -- help

Experiment configs are JSON files. A template is printed by the Experiments > Show Config Template menu item.

run-real

Runs a real-data experiment from the registry by ID, loading price data from the DuckDB cache:

cargo run -- run-real --id real_spy_daily_hard_switch --cache data/commodities.duckdb --save ./output

Artifacts (20 files, including plots) are written to runs/real/<id>/<run_id>/ and optionally copied to --save.

For intraday experiments, the pipeline automatically applies a Regular Trading Hours (RTH) filter that retains only bars in the 09:30–15:59 ET window, excluding pre-market and after-hours bars.
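The RTH window above can be sketched as a simple inclusive range check. This is only an illustration: the `Bar` struct, the `minute_of_day` field (minutes since midnight, already converted to US Eastern time), and the `filter_rth` function are hypothetical names, and the project's own filter may treat timestamps and time zones differently.

```rust
/// One intraday bar. `minute_of_day` is minutes since midnight in US Eastern
/// time (hypothetical representation, chosen for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Bar {
    minute_of_day: u32,
    close: f64,
}

const RTH_FIRST: u32 = 9 * 60 + 30; // 09:30 ET
const RTH_LAST: u32 = 15 * 60 + 59; // 15:59 ET, inclusive

/// Keep only bars whose timestamp lies inside the 09:30-15:59 ET window.
fn filter_rth(bars: &[Bar]) -> Vec<Bar> {
    bars.iter()
        .filter(|b| (RTH_FIRST..=RTH_LAST).contains(&b.minute_of_day))
        .cloned()
        .collect()
}
```

With 15-minute bars stamped at their opening time, the last retained bar would be the one opening at 15:45.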

generate-report

Regenerates all plots and JSON artifacts for an existing run by replaying the experiment from its config.snapshot.json:

cargo run -- generate-report --dir ./runs/real/my_run/run_001
cargo run -- generate-report --dir ./runs/real/my_run/run_001 --cache data/commodities.duckdb

The command re-runs the full pipeline (including EM fitting) and writes a fresh artifact set with a new run_id. No files from the original run are overwritten.

run-batch

Runs a list of JSON experiment configs in sequence. Each config is dispatched through the backend matching its mode field (Synthetic → SyntheticBackend, Real → RealBackend, SimToReal → SimToRealBackend), so the reported metrics come from an actual run in every mode:

cargo run -- run-batch --config a.json --config b.json --save ./batch_out

Writes batch_summary.json with per-run status and metrics.

calibrate

Calibrates a synthetic experiment's model parameters against the empirical distribution of that experiment's feature family:

cargo run -- calibrate --id hard_switch --out ./output/calibration

Produces calibration_summary.json, synthetic_vs_empirical_summary.json, and calibrated_scenario.json.

optimize

Two-phase parameter search for real-data experiments:

  1. Phase 1 — Grid search (artifact writes disabled for speed): sweeps a grid over detector and optionally model parameters using real data and the full EM pipeline. Ranks all grid points by a combined coverage + precision score.
  2. Phase 2 — Full E2E run: re-runs with the best-scoring config with full artifact output (JSON, CSV, plots).

Two search modes:

  • Detector-only (default): sweeps threshold, persistence_required, cooldown.
  • Joint model + detector (--model): additionally sweeps k_regimes ∈ {2, 3} and five feature families.

# Detector-only
cargo run -- optimize --id real_spy_daily_hard_switch
cargo run -- optimize --id real_wti_daily_surprise --save ./runs/optimize/wti --top 15

# Joint model + detector
cargo run -- optimize --id real_spy_daily_hard_switch --model
cargo run -- optimize --id real_spy_intraday_hard_switch --model --top 20

Default grids by detector type:

| Detector | Threshold range | Persistence | Cooldown | Detector pts | Joint pts (×10) |
|---|---|---|---|---|---|
| HardSwitch | 0.30 – 0.80 | 1, 2, 3, 5 | 0, 3, 5, 10 | 128 | 1280 |
| Surprise | 1.0 – 6.0 | 1, 2, 3, 5 | 0, 5, 10, 20 | 128 | 1280 |
| PosteriorTransition | 0.10 – 0.50 | 1, 2, 3 | 0, 3, 5, 10 | 84 | 840 |
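The point counts in the grid table follow from a Cartesian product of the three detector axes. The sketch below reproduces the arithmetic. The number of threshold values (8 for HardSwitch, 7 for PosteriorTransition) is inferred from the totals (128 / 16 and 84 / 12), the even spacing is an assumption, and `linspace`, `GridPoint`, and `detector_grid` are hypothetical names rather than the project's API.

```rust
/// Evenly spaced values from `lo` to `hi` inclusive; requires n >= 2.
fn linspace(lo: f64, hi: f64, n: usize) -> Vec<f64> {
    let step = (hi - lo) / (n as f64 - 1.0);
    (0..n).map(|i| lo + step * i as f64).collect()
}

/// One candidate detector setting.
#[derive(Debug, Clone, PartialEq)]
struct GridPoint {
    threshold: f64,
    persistence_required: usize,
    cooldown: usize,
}

/// Cartesian product of the threshold, persistence, and cooldown axes.
fn detector_grid(thresholds: &[f64], persistences: &[usize], cooldowns: &[usize]) -> Vec<GridPoint> {
    let mut grid = Vec::with_capacity(thresholds.len() * persistences.len() * cooldowns.len());
    for &threshold in thresholds {
        for &persistence_required in persistences {
            for &cooldown in cooldowns {
                grid.push(GridPoint { threshold, persistence_required, cooldown });
            }
        }
    }
    grid
}
```

The joint column multiplies each detector grid by 10, which matches 2 values of k_regimes times 5 feature families.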

Artifacts written to --save (default ./runs/optimize/<id>/):

| File | Contents |
|---|---|
| search_report.json | Full ranked grid: all N points with scores |
| search_summary.txt | Human-readable top-N table + best params |
| result.json | Full ExperimentResult from the best-config run |
| config.snapshot.json | Exact ExperimentConfig used for the best run |
| signal_alarms.png | Alarm timeline plot |
| detector_scores.png | Detector score trace |
| regime_posteriors.png | Filtered posterior heatmap |
| `*.csv`, remaining `*.json` | Standard run artifact set |

Model

Gaussian Markov Switching Model

Hidden state S_t in {1,...,K} with first-order Markov dynamics:

P(S_t = j | S_{t-1} = i) = A_{ij}

Observations are Gaussian given the regime:

y_t | S_t = j  ~  N(mu_j, sigma_j^2)

Parameters theta = (pi, A, mu_{1:K}, sigma^2_{1:K}) are fitted offline via the EM algorithm (Baum-Welch), then frozen for online use.

See docs/gaussian_msm_simulator.md and docs/em_estimation.md.
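For orientation, the textbook Baum-Welch M-step for this model re-estimates theta from the smoothed marginals gamma and the pairwise posteriors xi. The sketch below shows those standard updates only. `Params` and `m_step` are hypothetical names, and details such as a variance floor or the exact convergence test in em.rs are not covered here.

```rust
/// Parameter container for the sketch (hypothetical, not the project's ModelParams).
struct Params {
    pi: Vec<f64>,
    a: Vec<Vec<f64>>,
    mu: Vec<f64>,
    sigma2: Vec<f64>,
}

/// Standard M-step for a K-regime Gaussian MSM.
/// `gamma[t][j]` is the smoothed probability of regime j at step t.
/// `xi[t][i][j]` is the smoothed probability of moving from i to j across the
/// t-th consecutive pair of steps, so `xi.len() == gamma.len() - 1`.
fn m_step(y: &[f64], gamma: &[Vec<f64>], xi: &[Vec<Vec<f64>>]) -> Params {
    let k = gamma[0].len();
    // Initial distribution: smoothed marginal at the first step.
    let pi = gamma[0].clone();
    // Transition matrix: expected i -> j transitions over expected departures from i.
    let mut a = vec![vec![0.0; k]; k];
    for i in 0..k {
        let denom: f64 = xi.iter().map(|x| x[i].iter().sum::<f64>()).sum();
        for j in 0..k {
            let num: f64 = xi.iter().map(|x| x[i][j]).sum();
            a[i][j] = num / denom;
        }
    }
    // Regime means and variances: gamma-weighted moments of the observations.
    let mut mu = vec![0.0; k];
    let mut sigma2 = vec![0.0; k];
    for j in 0..k {
        let w: f64 = gamma.iter().map(|g| g[j]).sum();
        mu[j] = gamma.iter().zip(y).map(|(g, &yt)| g[j] * yt).sum::<f64>() / w;
        sigma2[j] = gamma
            .iter()
            .zip(y)
            .map(|(g, &yt)| g[j] * (yt - mu[j]).powi(2))
            .sum::<f64>()
            / w;
    }
    Params { pi, a, mu, sigma2 }
}
```

A regime that receives no posterior mass would produce a division by zero in this sketch, so a practical estimator needs a guard for that case.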

Inference Pipeline

| Phase | Component | Doc |
|---|---|---|
| Emission density | `N(y_t; mu_j, sigma_j^2)` | emission_model.md |
| Forward filter | `alpha_{t\|t}(j) = Pr(S_t=j \| y_{1:t})` | forward_filter.md |
| Log-likelihood | `log p(y_{1:T})` from filter normalisation constants | log_likelihood.md |
| Backward smoother | `gamma_t(j) = Pr(S_t=j \| y_{1:T})` | backward_smoother.md |
| Pairwise posteriors | `xi_t(i,j) = Pr(S_{t-1}=i, S_t=j \| y_{1:T})` | pairwise_posteriors.md |
| EM estimation | Baum-Welch until convergence | em_estimation.md |
| Diagnostics | Validity checks on fitted parameters | diagnostics.md |
| Online inference | Causal streaming filter, no future data | online_inference.md |
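The forward filter and the log-likelihood rows can be illustrated together: each step predicts with A, reweights by the Gaussian emission density, and normalises by c_t, and the log-likelihood is the sum of log c_t. This is a minimal linear-space sketch with hypothetical function names. The project's online filter is described as log-space, which this sketch does not attempt, and starting the recursion from pi without a transition is an assumption about the initial-step convention.

```rust
use std::f64::consts::PI;

/// Gaussian density N(y; mu, sigma2).
fn gaussian_pdf(y: f64, mu: f64, sigma2: f64) -> f64 {
    (-(y - mu).powi(2) / (2.0 * sigma2)).exp() / (2.0 * PI * sigma2).sqrt()
}

/// Predict step: alpha_{t+1|t}(j) = sum_i alpha_{t|t}(i) * A_{ij}.
fn predict(filtered: &[f64], a: &[Vec<f64>]) -> Vec<f64> {
    let k = filtered.len();
    (0..k)
        .map(|j| (0..k).map(|i| filtered[i] * a[i][j]).sum::<f64>())
        .collect()
}

/// Update step: reweight the prediction by the emission density and normalise.
/// Returns (alpha_{t|t}, c_t), where c_t is the one-step predictive density of y.
fn update(predicted: &[f64], mu: &[f64], sigma2: &[f64], y: f64) -> (Vec<f64>, f64) {
    let joint: Vec<f64> = predicted
        .iter()
        .enumerate()
        .map(|(j, &p)| p * gaussian_pdf(y, mu[j], sigma2[j]))
        .collect();
    let c: f64 = joint.iter().sum();
    (joint.iter().map(|v| v / c).collect(), c)
}

/// Run the filter over a series; returns all alpha_{t|t} and log p(y_{1:T}).
fn run_filter(
    pi: &[f64],
    a: &[Vec<f64>],
    mu: &[f64],
    sigma2: &[f64],
    ys: &[f64],
) -> (Vec<Vec<f64>>, f64) {
    let mut filtered = Vec::with_capacity(ys.len());
    let mut log_lik = 0.0;
    // Assumed convention: the first prediction is pi itself.
    let mut predicted = pi.to_vec();
    for &y in ys {
        let (alpha, c) = update(&predicted, mu, sigma2, y);
        log_lik += c.ln();
        predicted = predict(&alpha, a);
        filtered.push(alpha);
    }
    (filtered, log_lik)
}
```

Because each step uses only the previous filtered vector and the current observation, the same `update` and `predict` pair can be called one observation at a time in a streaming setting.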

Detector Variants

All detectors consume one-step causal filter output and apply a score + alarm policy (persistence + cooldown). See docs/changepoint_detectors.md.

| Detector | Score | Alarm trigger |
|---|---|---|
| HardSwitch | Indicator `1[argmax_j alpha_{t\|t}(j) ≠ argmax_j alpha_{t-1\|t-1}(j)]` | |
| PosteriorTransition | LeavePrevious: `1 − alpha_{t\|t}(r_{t-1})`; or TotalVariation: `½ Σ_j \|…\|` | |
| Surprise | `−log c_t` (optionally minus a lagged EMA baseline `b_{t-1}`) | Score exceeds threshold |
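As a rough illustration of the score + alarm policy split, the sketch below computes three of the scores from filter output and passes them through a persistence + cooldown gate. All names are hypothetical. The gate semantics are also assumed: an alarm fires after `persistence_required` consecutive scores above the threshold, and `cooldown` steps are then skipped. Reading r_{t-1} as the previous dominant regime is a further assumption, and docs/changepoint_detectors.md remains the authoritative description.

```rust
/// Index of the largest entry (first one wins on ties).
fn argmax(p: &[f64]) -> usize {
    let mut best = 0;
    for (j, &v) in p.iter().enumerate() {
        if v > p[best] {
            best = j;
        }
    }
    best
}

/// HardSwitch score: 1 if the dominant regime changed between consecutive steps.
fn hard_switch_score(prev: &[f64], curr: &[f64]) -> f64 {
    if argmax(prev) != argmax(curr) { 1.0 } else { 0.0 }
}

/// PosteriorTransition (LeavePrevious): 1 - alpha_{t|t}(r_{t-1}),
/// taking r_{t-1} to be the previous dominant regime (assumption).
fn leave_previous_score(prev: &[f64], curr: &[f64]) -> f64 {
    1.0 - curr[argmax(prev)]
}

/// Surprise score: negative log of the filter normalisation constant c_t.
fn surprise_score(c_t: f64) -> f64 {
    -c_t.ln()
}

/// Persistence + cooldown gate applied to any score stream.
struct AlarmPolicy {
    threshold: f64,
    persistence_required: usize,
    cooldown: usize,
    streak: usize,
    cooldown_left: usize,
}

impl AlarmPolicy {
    fn new(threshold: f64, persistence_required: usize, cooldown: usize) -> Self {
        Self { threshold, persistence_required, cooldown, streak: 0, cooldown_left: 0 }
    }

    /// Feed one score; returns true when an alarm fires at this step.
    fn step(&mut self, score: f64) -> bool {
        if self.cooldown_left > 0 {
            // Still cooling down after the last alarm: ignore this score.
            self.cooldown_left -= 1;
            self.streak = 0;
            return false;
        }
        if score > self.threshold {
            self.streak += 1;
        } else {
            self.streak = 0;
        }
        if self.streak >= self.persistence_required {
            self.streak = 0;
            self.cooldown_left = self.cooldown;
            return true;
        }
        false
    }
}
```

Keeping the gate separate from the score functions means the same policy can be reused across all detector variants.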

The fixed-parameter policy (offline-fit, online-freeze) is described in docs/fixed_parameter_policy.md.


Observation Pipeline

Raw prices are transformed into the observation sequence y_t before fitting or streaming:

| Family | Formula |
|---|---|
| LogReturn | log(P_t / P_{t-1}) |
| AbsReturn | absolute value of log return |
| SquaredReturn | (log return)^2 |
| RollingVol | Rolling std of log returns over window w |
| StandardizedReturn | log return / rolling std |

Scaling options: None, ZScore, RobustZScore. All transforms are strictly causal.
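A few of these transforms can be sketched with trailing windows only, so that no value at time t depends on later data. The function names are hypothetical. The use of the sample standard deviation (n - 1 denominator) and of a training prefix for the ZScore statistics are assumptions, not confirmed details of the project's pipeline.

```rust
/// LogReturn: log(P_t / P_{t-1}); the output is one element shorter than the input.
fn log_returns(prices: &[f64]) -> Vec<f64> {
    prices.windows(2).map(|w| (w[1] / w[0]).ln()).collect()
}

/// RollingVol: sample std over the trailing window of length `w` (w >= 2).
/// Element k covers x[k..k + w], so it only uses data up to its own end point.
fn rolling_std(x: &[f64], w: usize) -> Vec<f64> {
    x.windows(w)
        .map(|win| {
            let mean = win.iter().sum::<f64>() / w as f64;
            let var = win.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (w as f64 - 1.0);
            var.sqrt()
        })
        .collect()
}

/// ZScore using mean and std estimated on the first `n_train` points only
/// (n_train >= 2), so later values never influence the scaling.
fn zscore_from_train(x: &[f64], n_train: usize) -> Vec<f64> {
    let train = &x[..n_train];
    let mean = train.iter().sum::<f64>() / n_train as f64;
    let var = train.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n_train as f64 - 1.0);
    let std = var.sqrt();
    x.iter().map(|v| (v - mean) / std).collect()
}
```

AbsReturn and SquaredReturn follow by mapping `abs` or `powi(2)` over the output of `log_returns`.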

See docs/observation_design.md for the full pipeline and session-aware variants.


Calibration

Synthetic scenarios are calibrated against real empirical data so that benchmark experiments are grounded. The workflow maps empirical statistics (mean, variance, jump contamination) to MSM parameters and verifies the discrepancy.

See docs/synthetic_to_real_calibration.md.


Evaluation

Synthetic Benchmark

Evaluated on simulated data with known changepoints using an event-window matching protocol. Metrics: coverage, precision-like score, mean detection delay.
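One plausible reading of event-window matching is sketched below. A changepoint at tau counts as detected if an unused alarm falls in [tau, tau + window], each alarm can match at most one changepoint, and the delay is the alarm time minus tau. These matching rules, the struct, and the function name are assumptions; docs/benchmark_protocol.md defines the actual protocol.

```rust
/// Summary metrics for one run (hypothetical struct for illustration).
struct BenchmarkMetrics {
    coverage: f64,
    precision: f64,
    mean_delay: Option<f64>,
}

/// Match sorted alarm times to sorted changepoint times within a forward window.
fn evaluate(changepoints: &[usize], alarms: &[usize], window: usize) -> BenchmarkMetrics {
    let mut used = vec![false; alarms.len()];
    let mut delays: Vec<f64> = Vec::new();
    for &tau in changepoints {
        // First alarm not yet matched that falls inside [tau, tau + window].
        let hit = (0..alarms.len())
            .find(|&i| !used[i] && alarms[i] >= tau && alarms[i] <= tau + window);
        if let Some(i) = hit {
            used[i] = true;
            delays.push((alarms[i] - tau) as f64);
        }
    }
    let matched = delays.len() as f64;
    let coverage = if changepoints.is_empty() { 0.0 } else { matched / changepoints.len() as f64 };
    let precision = if alarms.is_empty() { 0.0 } else { matched / alarms.len() as f64 };
    let mean_delay = if delays.is_empty() {
        None
    } else {
        Some(delays.iter().sum::<f64>() / matched)
    };
    BenchmarkMetrics { coverage, precision, mean_delay }
}
```

For example, changepoints at 100, 300, and 500 with alarms at 105, 250, and 310 and a window of 20 give two matches, so coverage and precision are both 2/3 and the mean delay is 7.5 steps.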

See docs/benchmark_protocol.md.

Real-Data Evaluation

No ground truth is available, so two routes are used:

  • Route A — Proxy Event Alignment: alarm timing vs. known market events (earnings, macro announcements).
  • Route B — Segmentation Self-Consistency: within-segment homogeneity and between-segment contrast.

See docs/real_data_evaluation.md.


Experiments

Experiments are fully described by a JSON ExperimentConfig and run through ExperimentRunner. Each run produces a deterministic run ID (from config hash + seed), a structured artifact directory, and a serialised ExperimentResult.
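A deterministic run ID of this kind can be sketched by hashing the serialised config together with the seed. The example uses FNV-1a, chosen here only because it is simple and stable across platforms. The hash function, the ID format, and the names `fnv1a64` and `run_id` are all assumptions, not the scheme used by ExperimentRunner.

```rust
/// 64-bit FNV-1a hash over a byte slice.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Derive a 16-character hex run ID from the config JSON text and the seed.
/// The same (config, seed) pair always yields the same ID.
fn run_id(config_json: &str, seed: u64) -> String {
    let mut bytes = config_json.as_bytes().to_vec();
    bytes.extend_from_slice(&seed.to_le_bytes());
    format!("{:016x}", fnv1a64(&bytes))
}
```

For the ID to be reproducible, the config must be serialised in a canonical form (stable key order and number formatting) before hashing.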

runs/
  synthetic/
    <run_label>/
      <run_id>/
        config.snapshot.json      — ExperimentConfig used for this run
        result.json               — full ExperimentResult
        summary.json              — lightweight metrics summary
        model_params.json         — fitted ModelParams (K, pi, A, mu, sigma²)
        fit_summary.json          — human-readable EM fit metadata
        loglikelihood_history.csv — LL at each EM iteration
        feature_summary.json      — feature pipeline metadata and stats
        score_trace.csv           — per-step detector score
        alarms.csv                — alarm timestamps and scores
        changepoints.csv          — ground-truth changepoints (synthetic)
        regime_posteriors.csv     — T×K filtered posterior probabilities
        detector_config.json      — detector type and threshold settings
        signal_alarms.png         — observation series with alarm markers
        detector_scores.png       — score trace with threshold line
        regime_posteriors.png     — posterior probability traces per regime
        delay_distribution.png    — detection delay histogram (synthetic)
  real/
    <run_label>/
      <run_id>/
        config.snapshot.json      — ExperimentConfig used for this run
        result.json               — full ExperimentResult
        summary.json              — lightweight metrics summary
        model_params.json         — fitted ModelParams
        fit_summary.json          — human-readable EM fit metadata
        loglikelihood_history.csv — LL at each EM iteration
        feature_summary.json      — feature pipeline metadata and stats
        score_trace.csv           — per-step detector score
        alarms.csv                — alarm timestamps and scores
        regime_posteriors.csv     — T×K filtered posterior probabilities
        real_eval_summary.csv     — Route A + Route B metric summary
        route_a_result.json       — proxy event alignment details
        route_b_result.json       — segmentation self-consistency details
        split_summary.json        — train/val/test split boundaries
        data_quality.json         — NaN/gap/out-of-range checks
        detector_config.json      — detector type and threshold settings
        signal_alarms.png         — observation series with alarm markers
        detector_scores.png       — score trace with threshold line
        regime_posteriors.png     — posterior probability traces per regime
        segmentation.png          — segment-coloured real-data plot

Registered Experiments

Eighteen experiments are registered in src/experiments/registry.rs:

| ID | Type | Description |
|---|---|---|
| hard_switch | Synthetic | HardSwitch, 2-regime, LogReturn/ZScore, horizon 2000 |
| posterior_transition | Synthetic | PosteriorTransition (LeavePrevious), 2-regime, LogReturn/ZScore, horizon 2000 |
| surprise | Synthetic | Surprise, 2-regime, LogReturn/ZScore, horizon 2000 |
| posterior_transition_tv | Synthetic | PosteriorTransition (TotalVariation), 2-regime, LogReturn/ZScore, horizon 2000 |
| hard_switch_shock | Synthetic | HardSwitch, shock-contaminated synthetic (jump noise path) |
| hard_switch_frozen | Synthetic | HardSwitch, loads pre-fitted model from data/frozen_models/hard_switch_frozen |
| hard_switch_multi_start | Synthetic | HardSwitch, multi-start EM (3 starts); produces multi_start_summary.json |
| surprise_ema | Synthetic | Surprise with EMA-baseline (ema_alpha=0.3) adjusted score |
| squared_return_surprise | Synthetic | Surprise detector on SquaredReturn feature family |
| cusum_comparison | Synthetic | One-sided variance-CUSUM (benchmark baseline), LogReturn/ZScore, horizon 2000 |
| bocpd_comparison | Synthetic | BOCPD with Inverse-Gamma conjugate model (benchmark baseline), LogReturn/ZScore, horizon 2000 |
| real_spy_daily_hard_switch | Real | SPY daily adj-close log-returns, HardSwitch, 2018–present |
| real_wti_daily_surprise | Real | WTI daily spot-price log-returns, Surprise, 2018–present |
| real_spy_intraday_hard_switch | Real | SPY 15-min RTH log-returns (session-aware), HardSwitch, 2022–2025 |
| simreal_spy_daily_hard_switch | Sim-to-real | EM trained on a synthetic stream calibrated to SPY (Quick-EM); online detector run on real SPY |
| simreal_spy_daily_abs_return_k3 | Sim-to-real | SPY daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration |
| simreal_wti_daily_abs_return_k3 | Sim-to-real | WTI daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration |
| simreal_gold_daily_abs_return_k3 | Sim-to-real | GOLD daily AbsReturn / K=3 / HardSwitch joint-optimum; stationary-π Quick-EM calibration |

The 11 synthetic experiments (including the two comparison baselines) can all be run at once:

cargo run -- e2e
cargo run -r -- e2e    # release build (faster EM training)

Comparison baseline results (synthetic, seed=42, horizon=2000):

| Detector | Alarms | Precision | Recall | FAR | Delay (mean) |
|---|---|---|---|---|---|
| HardSwitch (0.5) | 38 | 0.658 | 0.207 | 0.0065 | 10.8 |
| PosteriorTransition (0.3) | 86 | 0.767 | 0.545 | 0.0100 | 9.2 |
| Surprise (2.5) | 22 | 0.955 | 0.174 | 0.0005 | 10.0 |
| CUSUM (thr=8.0, slack=0.5) | 38 | 0.842 | 0.264 | 0.0030 | 10.1 |
| BOCPD (thr=0.5, h=0.02) | 1 | 1.000 | 0.008 | 0.0000 | 0.0 |

At the same alarm count as HardSwitch (38), CUSUM achieves both higher precision (0.842 vs 0.658) and higher recall (0.264 vs 0.207). Its recall falls between HardSwitch and PosteriorTransition, while its precision exceeds both. BOCPD at threshold 0.5 (requiring ≥50% run-length posterior mass at r=0) is highly conservative given the low hazard rate (h=0.02) and fires only once; lowering the threshold reveals its detection capability.

Sample output:

[10/11]  cusum_comparison
  Pipeline:
    [3/6] TrainOrLoadModel  K=2  LL=-1948.67  iter=124  converged=true  (132ms)
    [4/6] RunOnline         detector=Cusum  thr=8.000  n_alarms=38
    [5/6] Evaluate          precision=0.8421  recall=0.2645  n_events=121
  Metrics : prec=0.8421  recall=0.2645  n_alarms=38  FAR=0.003000  delay=10.1

[11/11]  bocpd_comparison
  Pipeline:
    [4/6] RunOnline         detector=Bocpd  thr=0.500  n_alarms=1  (43ms)
    [5/6] Evaluate          precision=1.0000  recall=0.0083  n_events=121
  Metrics : prec=1.0000  recall=0.0083  n_alarms=1  FAR=0.000000  delay=0.0
...
  Completed: 11  Failed: 0

See docs/experiment_runner.md.


Reporting

The reporting layer generates tables and plots from run artifacts. See docs/reporting_and_export.md.

| Output | Description |
|---|---|
| result.json | Full ExperimentResult with all pipeline outputs |
| summary.json | Lightweight metrics summary |
| model_params.json | Fitted ModelParams (reloadable via LoadFrozen) |
| fit_summary.json | Human-readable EM fit metadata (K, iters, LL, convergence) |
| loglikelihood_history.csv | Log-likelihood at each EM iteration |
| feature_summary.json | Feature pipeline stats (n_obs, mean, variance, train/val split) |
| config.snapshot.json | Exact ExperimentConfig snapshot |
| detector_config.json | Detector type and threshold settings |
| score_trace.csv | Per-step detector score |
| alarms.csv | Alarm timestamps and scores |
| regime_posteriors.csv | T×K filtered posterior probabilities |
| split_summary.json | Train/val/test split info (real mode) |
| data_quality.json | NaN/gap/out-of-range checks (real mode) |
| real_eval_summary.csv | Route A + Route B metric row (real mode) |
| route_a_result.json | Proxy event alignment detail (real Route A) |
| route_b_result.json | Segmentation self-consistency detail (real Route B) |
| batch_summary.json | Aggregate summary across all runs in a batch |
| signal_alarms.png | Observation series with alarm markers (requires font backend) |
| detector_scores.png | Score trace with threshold line (requires font backend) |
| regime_posteriors.png | Filtered posterior traces per regime (requires font backend) |
| delay_distribution.png | Detection delay histogram, synthetic only (requires font backend) |
| segmentation.png | Segment-coloured real-data plot, real only (requires font backend) |

Data

Sources

Data is sourced from the Alpha Vantage API. Supported series:

Commodities (daily / weekly / monthly / quarterly / annual): WTI, Brent, Natural Gas, Copper, Aluminum, Wheat, Corn, Cotton, Sugar, Coffee, Gold, Silver, All Commodities Index.

Equities (SPY, QQQ): daily, weekly, monthly, and intraday (1min, 5min, 15min, 30min, 60min).

The HTTP client is rate-limited (default 75 req/min, token-bucket). See docs/alphavantage_client.md.
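A token bucket for a per-minute limit can be sketched as follows. The struct and method names are hypothetical, and the clock reading is passed in by the caller so the logic is easy to test. The real rate_limiter.rs sits behind an async client and will differ in how it waits for tokens.

```rust
/// Minimal token bucket: holds up to `capacity` tokens and refills continuously.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Bucket sized for `limit` requests per minute; starts full at time 0.
    fn per_minute(limit: u32) -> Self {
        Self {
            capacity: limit as f64,
            tokens: limit as f64,
            refill_per_sec: limit as f64 / 60.0,
            last_refill_secs: 0.0,
        }
    }

    /// Try to take one token at `now_secs` (a monotonic clock reading in seconds).
    /// Returns true if the request may proceed immediately.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        // Add the tokens earned since the last call, capped at capacity.
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

At the default of 75 requests per minute, the bucket earns 1.25 tokens per second, so a caller that has drained the burst allowance can send roughly one request every 0.8 seconds.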

Caching

All fetched data is persisted in a local DuckDB database (default: data/commodities.duckdb, created automatically). Each series is stored as (symbol, interval, date, value) rows. Re-ingest does a full replace.

See docs/duckdb_cache.md for schema details.


Architecture

src/
  main.rs                    — dual-mode dispatch (interactive / direct CLI)
  config.rs                  — TOML config structs
  alphavantage/
    client.rs                — async HTTP client with rate limiting
    commodity.rs             — endpoint/interval types + deserialisation
    rate_limiter.rs          — token-bucket rate limiter
  cache/
    mod.rs                   — DuckDB persistence layer (store/load/last_fetched/status)
  data_service/
    mod.rs                   — cache-first orchestration, bulk ingest
  cli/
    mod.rs                   — interactive menu + 9 direct subcommand handlers
  features/
    mod.rs                   — feature families, scaling, session-aware pipeline
  model/
    params.rs                — ModelParams (K, pi, A, mu, sigma²)
    simulate.rs              — Gaussian MSM generative sampler
    filter.rs                — Hamilton forward filter
    smoother.rs              — backward smoother (RTS)
    pairwise.rs              — pairwise posterior pass
    em.rs                    — Baum-Welch EM estimator
    diagnostics.rs           — fitted-model validity checks
  online/
    mod.rs                   — causal streaming filter (log-space, numerically stable)
  detector/
    hard_switch.rs           — Hard Switch detector
    posterior_transition.rs  — Posterior Transition detector
    surprise.rs              — Surprise (-log predictive) detector
    frozen.rs                — FrozenModel + StreamingSession
  calibration/
    mod.rs                   — empirical summary + synthetic mapping
    report.rs                — CalibrationReport workflow
  benchmark/
    mod.rs                   — event-window evaluation protocol
  real_eval/
    route_a.rs               — proxy event alignment
    route_b.rs               — segmentation self-consistency
    report.rs                — combined Route A + B report
  experiments/
    config.rs                — ExperimentConfig (fully serialisable)
    runner.rs                — ExperimentRunner<B> + ExperimentBackend trait
    synthetic_backend.rs     — SyntheticBackend: EM + detection + evaluation
    real_backend.rs          — RealBackend: DuckDB load, 70/15/15 split, Route A+B eval
    sim_to_real_backend.rs   — SimToRealBackend: train EM on calibrated synthetic, test online on real
    shared.rs                — backend-shared model training and online streaming helpers
    dry_run_backend.rs       — DryRunBackend: config validation without EM
    batch.rs                 — BatchConfig + run_batch + batch_summary.json
    result.rs                — ExperimentResult, RunStatus, EvaluationSummary
    registry.rs              — 18 registered experiment definitions (synthetic, real, sim-to-real)
    search.rs                — param-search grid + optimize() two-phase search driver
    artifact.rs              — run directory layout + snapshot helpers
  reporting/
    artifact.rs              — ArtifactRootConfig, RunArtifactLayout
    export/                  — JSON / CSV export (schema, json, csv)
    plot/                    — plotters-based renderers (5 plot types)
    table/                   — MetricsTableBuilder, ComparisonTableBuilder
    report.rs                — RunReporter, AggregateReporter
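The 70/15/15 split noted for real_backend.rs can be sketched as a chronological cut with no shuffling, which keeps the validation and test segments strictly after the training data. The function name and the rounding rule (integer division, rounding down) are assumptions.

```rust
/// Chronological 70/15/15 split of `n` observations.
/// Returns (train_end, val_end): train = [0, train_end),
/// val = [train_end, val_end), test = [val_end, n).
fn split_70_15_15(n: usize) -> (usize, usize) {
    let train_end = n * 70 / 100;
    let val_end = n * 85 / 100;
    (train_end, val_end)
}
```

For 1000 observations this gives boundaries at 700 and 850.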

Documentation Index

| Doc | Topic |
|---|---|
| alphavantage_client.md | Alpha Vantage HTTP client and rate limiting |
| duckdb_cache.md | DuckDB schema and cache API |
| data_service.md | DataService orchestration layer |
| data_pipeline.md | Real financial data pipeline |
| interactive_cli.md | Interactive CLI full reference |
| observation_design.md | Feature families and observation pipeline |
| gaussian_msm_simulator.md | Generative MSM simulator |
| emission_model.md | Gaussian emission density |
| forward_filter.md | Hamilton forward filter |
| filter_validation.md | Filter validation on simulated data |
| log_likelihood.md | Observed-data log-likelihood |
| backward_smoother.md | RTS backward smoother |
| pairwise_posteriors.md | Pairwise posterior transition probabilities |
| em_estimation.md | Baum-Welch EM estimator |
| diagnostics.md | Fitted-model diagnostics and trust checks |
| online_inference.md | Online (streaming) causal inference |
| changepoint_detectors.md | Detector variants (HardSwitch, PosteriorTransition, Surprise) |
| fixed_parameter_policy.md | Offline-trained, online-frozen parameter policy |
| benchmark_protocol.md | Synthetic benchmark and event-window evaluation |
| synthetic_to_real_calibration.md | Synthetic-to-real calibration workflow |
| real_data_evaluation.md | Real-data evaluation (Route A + B) |
| experiment_runner.md | Experiment runner and reproducibility layer |
| reporting_and_export.md | Reporting, plots, tables, and artifact export |

Tests

cargo test

328 tests covering all core components: filter/smoother correctness, EM convergence, detector alarm logic, calibration mapping, benchmark matching, experiment runner orchestration, real-backend data pipeline, and artifact serialisation.

About

Changepoint detection algorithm based on a Markov Switching Model for commodities, made for my Master's thesis.
