7 changes: 7 additions & 0 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-21) — Chaos Experiments](#whats-new-2026-06-21--chaos-experiments)
- [What's new (2026-06-21) — JSON Contract & Snapshot Matching](#whats-new-2026-06-21--json-contract--snapshot-matching)
- [What's new (2026-06-21) — SLSA Build Provenance](#whats-new-2026-06-21--slsa-build-provenance)
- [What's new (2026-06-21) — Feature Flags](#whats-new-2026-06-21--feature-flags)
@@ -123,6 +124,12 @@

---

## What's new (2026-06-21) — Chaos Experiments

Inject faults, verify the system holds. Full reference: [`docs/source/Eng/doc/new_features/v71_features_doc.rst`](docs/source/Eng/doc/new_features/v71_features_doc.rst).

- **`ChaosExperiment` / `run_experiment` / `Probe` / `latency_fault` / `exception_fault`** (`AC_run_chaos`): `resilience` *recovers* from failures; this *causes* them and checks a steady-state hypothesis still holds (Chaos Toolkit lifecycle — verify before, inject faults, verify after, roll back LIFO). Probes/faults/rollbacks are callables; the clock/RNG/sleep are injectable so experiments run **deterministically** in tests with no real failures or sleeping. `AC_run_chaos` drives an action-list spec. Pure-stdlib.

## What's new (2026-06-21) — JSON Contract & Snapshot Matching

Match, diff and snapshot JSON payloads. Full reference: [`docs/source/Eng/doc/new_features/v70_features_doc.rst`](docs/source/Eng/doc/new_features/v70_features_doc.rst).
7 changes: 7 additions & 0 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-06-21): Chaos Experiments](#本次更新-2026-06-21--混沌实验)
- [What's new (2026-06-21): JSON Contract & Snapshot Matching](#本次更新-2026-06-21--json-合约与快照比对)
- [What's new (2026-06-21): SLSA Build Provenance](#本次更新-2026-06-21--slsa-构建来源证明)
- [What's new (2026-06-21): Feature Flags](#本次更新-2026-06-21--功能旗标)
@@ -122,6 +123,12 @@

---

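## What's new (2026-06-21): Chaos Experiments

Inject faults, verify the system holds. Full reference: [`docs/source/Zh/doc/new_features/v71_features_doc.rst`](../docs/source/Zh/doc/new_features/v71_features_doc.rst).

- **`ChaosExperiment` / `run_experiment` / `Probe` / `latency_fault` / `exception_fault`** (`AC_run_chaos`): `resilience` *recovers* from failures; this *causes* them and checks that the steady-state hypothesis still holds (Chaos Toolkit lifecycle: verify before, inject faults, verify after, roll back LIFO). Probes, faults and rollbacks are all callables; the clock, RNG and sleep are injectable, so experiments run **deterministically** in tests, with no real failures or sleeping. `AC_run_chaos` is driven by an action-list spec. Pure standard library.

## What's new (2026-06-21): JSON Contract & Snapshot Matching

Match, diff and snapshot JSON payloads. Full reference: [`docs/source/Zh/doc/new_features/v70_features_doc.rst`](../docs/source/Zh/doc/new_features/v70_features_doc.rst).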
7 changes: 7 additions & 0 deletions README/README_zh-TW.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-06-21): Chaos Experiments](#本次更新-2026-06-21--混沌實驗)
- [What's new (2026-06-21): JSON Contract & Snapshot Matching](#本次更新-2026-06-21--json-合約與快照比對)
- [What's new (2026-06-21): SLSA Build Provenance](#本次更新-2026-06-21--slsa-建置來源證明)
- [What's new (2026-06-21): Feature Flags](#本次更新-2026-06-21--功能旗標)
@@ -122,6 +123,12 @@

---

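## What's new (2026-06-21): Chaos Experiments

Inject faults, verify the system holds. Full reference: [`docs/source/Zh/doc/new_features/v71_features_doc.rst`](../docs/source/Zh/doc/new_features/v71_features_doc.rst).

- **`ChaosExperiment` / `run_experiment` / `Probe` / `latency_fault` / `exception_fault`** (`AC_run_chaos`): `resilience` *recovers* from failures; this *causes* them and checks that the steady-state hypothesis still holds (Chaos Toolkit lifecycle: verify before, inject faults, verify after, roll back LIFO). Probes, faults and rollbacks are all callables; the clock, RNG and sleep are injectable, so experiments run **deterministically** in tests, with no real failures or sleeping. `AC_run_chaos` is driven by an action-list spec. Pure standard library.

## What's new (2026-06-21): JSON Contract & Snapshot Matching

Match, diff and snapshot JSON payloads. Full reference: [`docs/source/Zh/doc/new_features/v70_features_doc.rst`](../docs/source/Zh/doc/new_features/v70_features_doc.rst).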
51 changes: 51 additions & 0 deletions docs/source/Eng/doc/new_features/v71_features_doc.rst
@@ -0,0 +1,51 @@
Chaos Experiments
=================

``resilience`` *recovers* from failures (retry, circuit breaker); this is the
inverse — it *causes* controlled failures and checks that a steady-state
hypothesis still holds. Modelled on the Chaos Toolkit lifecycle: verify steady
state **before**, run the **method** (fault activities), verify steady state
**after**, then always run **rollbacks** (LIFO). It returns a journal.

Probes, faults and rollbacks are caller-supplied callables, and the clock / RNG
/ sleep are injectable, so an experiment runs deterministically in tests with
fakes — no real failures, no real sleeping. Pure standard library (``random`` +
``time``); imports no ``PySide6``.
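
The lifecycle described above can be sketched without the package. The following is a condensed stand-in, not the library's ``run_experiment``: the name ``run_lifecycle``, its single boolean probe and the fake tick clock are assumptions made for illustration, and unlike the real runner it does not catch errors raised by faults or rollbacks.

```python
from typing import Any, Callable, Dict, List


def run_lifecycle(probe: Callable[[], bool],
                  faults: List[Callable[[], Any]],
                  rollbacks: List[Callable[[], Any]],
                  clock: Callable[[], float]) -> Dict[str, Any]:
    """Hypothetical, condensed lifecycle: before, method, after, rollbacks."""
    start = clock()
    journal: Dict[str, Any] = {"status": "completed", "deviated": False}
    if not probe():                      # steady state must hold before the method
        journal["status"] = "failed-before-method"
        journal["duration"] = clock() - start
        return journal
    try:
        for fault in faults:             # the method: inject each fault in order
            fault()
        if not probe():                  # steady state re-checked after the faults
            journal["deviated"] = True
            journal["status"] = "deviated"
    finally:
        for rollback in reversed(rollbacks):   # LIFO: last registered runs first
            rollback()
        journal["duration"] = clock() - start
    return journal


# Fakes: a clock that advances one tick per call, and a system a fault can break.
ticks = iter(range(100))
state = {"healthy": True}
order: List[str] = []

journal = run_lifecycle(
    probe=lambda: state["healthy"],
    faults=[lambda: state.update(healthy=False)],
    rollbacks=[lambda: order.append("first"), lambda: order.append("second")],
    clock=lambda: float(next(ticks)),
)
print(journal["status"], order)
```

Because the clock is a fake, the recorded duration depends only on how many times the clock is read, which is what makes the run repeatable in a test.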

Headless API
------------

.. code-block:: python

    from je_auto_control import (
        ChaosExperiment, Probe, run_experiment, latency_fault)

    # check_health, measure_p95 and restore_network are caller-supplied callables.
    experiment = ChaosExperiment(
        title="checkout survives slow payments",
        probes=[Probe("service_up", check_health, tolerance=True),
                Probe("p95_latency", measure_p95, tolerance=[0, 500])],
        method=[latency_fault("payment_delay", delay_s=2.0, rate=0.5)],
        rollbacks=[restore_network])

    journal = run_experiment(experiment)
    if journal["deviated"]:
        print("hypothesis broke under fault:", journal["status"])

A ``Probe`` returns a value that is checked against its ``tolerance``: a literal
(compared with ``==``), a ``[low, high]`` range (inclusive at both ends), or a
predicate callable. Any two-element list or tuple is read as a range.
``run_experiment`` verifies the hypothesis first. If that check fails, the
status is ``failed-before-method``, the method never runs, and no rollbacks run
either. Otherwise it applies each fault, re-verifies (setting ``deviated`` to
``True`` and the status to ``deviated`` if the hypothesis no longer holds), and
runs the rollbacks LIFO in a ``finally`` block. Exceptions raised by a probe,
fault or rollback are caught and recorded in the journal rather than crashing
the run. ``latency_fault`` and ``exception_fault`` are ready-made fault
factories: both take a ``rate`` and an injectable RNG, and ``latency_fault``
also takes an injectable sleep.
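
The three tolerance forms can be tried in isolation. The function below is an illustrative copy of the private ``_check_tolerance`` helper from this change, renamed ``check_tolerance`` so that it runs on its own; the sample values are made up.

```python
from typing import Any


def check_tolerance(value: Any, tolerance: Any) -> bool:
    """Illustrative copy of the tolerance check: predicate, range, or literal."""
    if callable(tolerance):                       # predicate form
        return bool(tolerance(value))
    if isinstance(tolerance, (list, tuple)) and len(tolerance) == 2:
        return tolerance[0] <= value <= tolerance[1]   # inclusive [low, high]
    return value == tolerance                     # literal form


literal_ok = check_tolerance(True, True)
range_ok = check_tolerance(500, [0, 500])         # upper bound is included
range_miss = check_tolerance(501, [0, 500])
predicate_ok = check_tolerance("ok", lambda v: v in {"ok", "degraded"})
print(literal_ok, range_ok, range_miss, predicate_ok)
```

One consequence of this ordering is that a probe whose expected value is itself a two-element list cannot use the literal form; a predicate such as ``lambda v: v == [1, 2]`` would be needed instead.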

Executor command
----------------

``AC_run_chaos`` takes a ``spec`` (object or JSON string) whose probes, method
and rollbacks are **action lists** — ``{title, probes:[{name, action:[AC...]}],
method:[{name, action:[AC...]}], rollbacks:[[AC...]]}`` — and returns the
journal. A probe's steady state holds when its actions run without error. The
same operation is exposed as the MCP tool ``ac_run_chaos`` and as a Script
Builder command under **Flow**.
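
A spec of the shape described above might be assembled as follows. This is only a sketch: the ``["AC_name", {params}]`` form of each action is an assumption, and ``AC_example_check``, ``AC_example_delay`` and ``AC_example_restore`` are placeholder names rather than real executor commands.

```python
import json

# Assumption: each action is written as ["AC_name", {params}]. The command
# names below are placeholders, not real executor commands.
spec = {
    "title": "checkout survives slow payments",
    "probes": [
        {"name": "service_up", "action": [["AC_example_check", {}]]},
    ],
    "method": [
        {"name": "payment_delay",
         "action": [["AC_example_delay", {"seconds": 2}]]},
    ],
    # Each rollback is itself an action list, hence the extra nesting level.
    "rollbacks": [
        [["AC_example_restore", {}]],
    ],
}

spec_json = json.dumps(spec)          # the command accepts an object or a JSON string
round_tripped = json.loads(spec_json)

# Quick shape check before handing the spec to the executor.
shape_ok = (
    all({"name", "action"} <= set(p) for p in round_tripped["probes"])
    and all({"name", "action"} <= set(f) for f in round_tripped["method"])
    and all(isinstance(r, list) for r in round_tripped["rollbacks"])
)
print(shape_ok)
```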
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -93,6 +93,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v68_features_doc
doc/new_features/v69_features_doc
doc/new_features/v70_features_doc
doc/new_features/v71_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
43 changes: 43 additions & 0 deletions docs/source/Zh/doc/new_features/v71_features_doc.rst
@@ -0,0 +1,43 @@
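Chaos Experiments
=================

``resilience`` *recovers* from failures (retry, circuit breaker); this is the
other side: it *causes* controlled failures and checks whether the steady-state
hypothesis still holds. Modelled on the Chaos Toolkit lifecycle: verify steady
state **before**, run the **method** (fault activities), verify steady state
again **after**, then always run the **rollbacks** (LIFO). It returns a journal.

Probes, faults and rollbacks are all caller-supplied callables, and the clock /
RNG / sleep are injectable, so an experiment runs deterministically in tests
with fakes: no real failures, no real sleeping. Pure standard library
(``random`` + ``time``); does not import ``PySide6``.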

Headless API
------------

.. code-block:: python

    from je_auto_control import (
        ChaosExperiment, Probe, run_experiment, latency_fault)

    # check_health, measure_p95 and restore_network are caller-supplied callables.
    experiment = ChaosExperiment(
        title="checkout survives slow payments",
        probes=[Probe("service_up", check_health, tolerance=True),
                Probe("p95_latency", measure_p95, tolerance=[0, 500])],
        method=[latency_fault("payment_delay", delay_s=2.0, rate=0.5)],
        rollbacks=[restore_network])

    journal = run_experiment(experiment)
    if journal["deviated"]:
        print("hypothesis broke under fault:", journal["status"])

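A ``Probe`` returns a value, which is checked against its ``tolerance`` (a
literal, a ``[low, high]`` range, or a predicate callable). ``run_experiment``
verifies the hypothesis first; if it fails, the status is
``failed-before-method`` and the method does not run. Otherwise it applies each
fault, verifies again (setting ``deviated`` and the status ``deviated`` if the
hypothesis no longer holds), and runs the rollbacks LIFO in a ``finally``.
Errors from probes, faults and rollbacks are caught and recorded in the journal
instead of crashing the run. ``latency_fault`` and ``exception_fault`` are
ready-made fault factories with an injectable RNG (rate); ``latency_fault``
also takes an injectable sleep.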

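Executor command
----------------

``AC_run_chaos`` takes a ``spec`` (object or JSON string) whose probes, method
and rollbacks are **action lists**:
``{title, probes:[{name, action:[AC...]}], method:[{name, action:[AC...]}], rollbacks:[[AC...]]}``.
It returns the journal. A probe's steady state holds when its actions run
without error. The same operation is also exposed as the MCP tool
``ac_run_chaos`` and as a command under the **Flow** category in Script Builder.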
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -93,6 +93,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v68_features_doc
doc/new_features/v69_features_doc
doc/new_features/v70_features_doc
doc/new_features/v71_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
7 changes: 7 additions & 0 deletions je_auto_control/__init__.py
@@ -360,6 +360,11 @@
from je_auto_control.utils.json_contract import (
    MatchReport, diff_json, match_json, normalize_json, snapshot_json,
)
# Deterministic chaos experiments (steady-state hypothesis + fault injection)
from je_auto_control.utils.chaos import (
    ChaosExperiment, Fault, Probe, exception_fault, latency_fault,
    run_experiment,
)
# Background popup/interrupt watchdog (unattended automation)
from je_auto_control.utils.watchdog import (
    PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -853,6 +858,8 @@ def start_autocontrol_gui(*args, **kwargs):
"build_provenance", "subject_for", "subject_for_bytes",
"verify_provenance", "write_provenance",
"MatchReport", "diff_json", "match_json", "normalize_json", "snapshot_json",
"ChaosExperiment", "Fault", "Probe", "exception_fault", "latency_fault",
"run_experiment",
# MCP server
"AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
"MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
10 changes: 10 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -1388,6 +1388,16 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
        ),
        description="Verify a JWT (alg allowlist + exp/nbf/aud); returns {ok, claims}.",
    ))
    specs.append(CommandSpec(
        "AC_run_chaos", "Flow", "Run Chaos Experiment",
        fields=(
            FieldSpec("spec", FieldType.STRING,
                      placeholder='{"title": "...", "probes": [{"name": "p", '
                                  '"action": [...]}], "method": [{"name": "f", '
                                  '"action": [...]}], "rollbacks": [[...]]}'),
        ),
        description="Verify steady state, inject faults, re-verify, roll back.",
    ))
    specs.append(CommandSpec(
        "AC_run_saga", "Flow", "Run Saga (Compensating Rollback)",
        fields=(
10 changes: 10 additions & 0 deletions je_auto_control/utils/chaos/__init__.py
@@ -0,0 +1,10 @@
"""Deterministic chaos experiments (steady-state hypothesis + fault injection)."""
from je_auto_control.utils.chaos.chaos import (
    ChaosExperiment, Fault, Probe, exception_fault, latency_fault,
    run_experiment,
)

__all__ = [
    "ChaosExperiment", "Fault", "Probe", "exception_fault", "latency_fault",
    "run_experiment",
]
143 changes: 143 additions & 0 deletions je_auto_control/utils/chaos/chaos.py
@@ -0,0 +1,143 @@
"""Deterministic chaos experiments: steady-state hypothesis + fault injection.

``resilience`` *recovers* from failures (retry, circuit breaker); this is the
inverse — it *causes* controlled failures and checks a steady-state hypothesis
still holds. Modelled on the Chaos Toolkit lifecycle: verify steady state
before, run the method (fault activities), verify steady state after, then
always run rollbacks (LIFO). Returns a journal.

Probes, faults and rollbacks are caller-supplied callables, and the clock /
RNG / sleep are injectable, so an experiment runs deterministically in tests
with fakes — no real failures, no real sleeping. Pure standard library
(``random`` + ``time``); imports no ``PySide6``.
"""
import random
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class Probe:
    """A steady-state probe: ``call`` returns a value checked against ``tolerance``."""

    name: str
    call: Callable[[], Any]
    tolerance: Any = True


@dataclass
class Fault:
    """A fault-injection activity run during the experiment method."""

    name: str
    apply: Callable[[], Any]


@dataclass
class ChaosExperiment:
    """An experiment: a steady-state hypothesis, a method, and rollbacks."""

    title: str
    probes: Sequence[Probe] = ()
    method: Sequence[Fault] = ()
    rollbacks: Sequence[Callable[[], Any]] = field(default_factory=tuple)


def _check_tolerance(value: Any, tolerance: Any) -> bool:
    if callable(tolerance):
        return bool(tolerance(value))
    if isinstance(tolerance, (list, tuple)) and len(tolerance) == 2:
        return tolerance[0] <= value <= tolerance[1]
    return value == tolerance


def _verify_probes(probes: Sequence[Probe]) -> Dict[str, Any]:
    results: List[Dict[str, Any]] = []
    ok = True
    for probe in probes:
        try:
            value = probe.call()
            met = _check_tolerance(value, probe.tolerance)
            results.append({"name": probe.name, "ok": met, "value": value})
        except Exception as exc:  # pylint: disable=broad-exception-caught
            met = False
            results.append({"name": probe.name, "ok": False,
                            "error": str(exc)})
        ok = ok and met
    return {"ok": ok, "probes": results}


def _apply_fault(fault: Fault) -> Dict[str, Any]:
    try:
        return {"name": fault.name, "ok": True, "result": fault.apply()}
    except Exception as exc:  # pylint: disable=broad-exception-caught
        return {"name": fault.name, "ok": False, "error": str(exc)}


def _run_rollbacks(rollbacks: Sequence[Callable[[], Any]]) -> List[Dict[str, Any]]:
    results: List[Dict[str, Any]] = []
    for rollback in reversed(list(rollbacks)):
        try:
            rollback()
            results.append({"ok": True})
        except Exception as exc:  # pylint: disable=broad-exception-caught
            results.append({"ok": False, "error": str(exc)})
    return results


def run_experiment(experiment: ChaosExperiment, *,
                   clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Run ``experiment`` and return a journal dict (Chaos-Toolkit shape)."""
    start = clock()
    before = _verify_probes(experiment.probes)
    journal: Dict[str, Any] = {
        "title": experiment.title,
        "steady_states": {"before": before, "after": None},
        "run": [], "rollbacks": [], "deviated": False, "status": "completed",
    }
    if not before["ok"]:
        journal["status"] = "failed-before-method"
        journal["duration"] = clock() - start
        return journal
    try:
        for fault in experiment.method:
            journal["run"].append(_apply_fault(fault))
        after = _verify_probes(experiment.probes)
        journal["steady_states"]["after"] = after
        journal["deviated"] = not after["ok"]
        if not after["ok"]:
            journal["status"] = "deviated"
    finally:
        journal["rollbacks"] = _run_rollbacks(experiment.rollbacks)
        journal["duration"] = clock() - start
    return journal


def latency_fault(name: str, *, delay_s: float, rate: float = 1.0,
                  rng: Optional[random.Random] = None,
                  sleep: Callable[[float], None] = time.sleep) -> Fault:
    """A fault that sleeps ``delay_s`` with probability ``rate``."""
    generator = rng or random.Random()  # nosec B311  # reason: non-crypto chaos rate sampling

    def apply() -> Dict[str, Any]:
        if generator.random() < rate:
            sleep(delay_s)
            return {"injected": "latency", "delay_s": delay_s}
        return {"injected": None}

    return Fault(name=name, apply=apply)


def exception_fault(name: str, *, exc: type = RuntimeError,
                    message: str = "chaos", rate: float = 1.0,
                    rng: Optional[random.Random] = None) -> Fault:
    """A fault that raises ``exc`` with probability ``rate``."""
    generator = rng or random.Random()  # nosec B311  # reason: non-crypto chaos rate sampling

    def apply() -> Dict[str, Any]:
        if generator.random() < rate:
            raise exc(message)
        return {"injected": None}

    return Fault(name=name, apply=apply)
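
The rate sampling used by both fault factories can be exercised deterministically with fakes. The sketch below restates the decision rule from `latency_fault` as a standalone function so it runs without the package; `FixedRng` and `latency_apply` are hypothetical names introduced here, and `FixedRng` only duck-types the `random()` method of `random.Random`.

```python
from typing import Callable, Dict, List, Optional


class FixedRng:
    """Hypothetical test double: duck-types the random() method of random.Random."""

    def __init__(self, value: float) -> None:
        self._value = value

    def random(self) -> float:
        return self._value


def latency_apply(rng: FixedRng, rate: float, delay_s: float,
                  sleep: Callable[[float], None]) -> Dict[str, Optional[object]]:
    """Same decision rule as the apply() closure in latency_fault above."""
    if rng.random() < rate:        # inject only when the draw falls below rate
        sleep(delay_s)
        return {"injected": "latency", "delay_s": delay_s}
    return {"injected": None}


slept: List[float] = []            # fake sleep: record the delay, never block

hit = latency_apply(FixedRng(0.10), rate=0.5, delay_s=2.0, sleep=slept.append)
miss = latency_apply(FixedRng(0.90), rate=0.5, delay_s=2.0, sleep=slept.append)
never = latency_apply(FixedRng(0.0), rate=0.0, delay_s=2.0, sleep=slept.append)

print(hit, miss, never, slept)
```

Since the comparison is a strict `<` and `random()` draws from `[0, 1)`, `rate=1.0` always injects and `rate=0.0` never does.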