Skip to content

refactor: split flow models into a package#484

Draft
dgenio wants to merge 4 commits into
mainfrom
refactor/396-split-flow-package
Draft

refactor: split flow models into a package#484
dgenio wants to merge 4 commits into
mainfrom
refactor/396-split-flow-package

Conversation

@dgenio

@dgenio dgenio commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Summary

Split the 1,289-line chainweaver/flow.py from PR #474 into focused modules behind an unchanged chainweaver.flow facade. Runtime behavior, public API snapshots, serialized references, and legacy imports remain unchanged.

This is a stacked PR targeting #474's branch. After #474 merges, retarget or rebase this PR onto main.

Changes

  • Replace chainweaver/flow.py with focused definitions, steps, DAG, governance, drift, and reference-resolution modules.
  • Preserve chainweaver.flow imports and __module__ metadata through the package facade.
  • Add regression coverage for facade identity, legacy module identity, and pickle resolution.
  • Update the repository map, architecture ownership table, common-task paths, and changelog.

Why

The former module mixed independently changing concerns and had grown beyond the issue's size target. The extra refs.py and steps.py modules keep every implementation file below 450 lines while incorporating #474's schema-reference policy cleanly.

Tradeoffs / risks

  • The diff is mechanically large because definitions moved; an AST comparison confirmed all 20 moved definitions are unchanged.
  • The facade adds minor import indirection but preserves the established public and serialization surface.
  • No dependencies or migrations are introduced.

Scope notes (Mode B)

Testing

  • Linting passes (ruff check chainweaver/ tests/ examples/) — All checks passed!
  • Formatting check passes (ruff format --check chainweaver/ tests/ examples/) — 268 files already formatted
  • Type checking passes (python -m mypy chainweaver/ tests/) — no issues in 230 source files
  • All existing tests pass (python -m pytest tests/ -v) — 2064 passed, 1 skipped; 92.96% coverage
  • New tests added for new functionality — tests/test_flow_import_surface.py
  • Targeted compatibility tests — 81 passed
  • python -m mkdocs build --strict — documentation built successfully
  • python -m build and twine check dist/* — wheel and sdist built; all artifacts passed
  • Wheel inspection — all seven chainweaver/flow/ modules present; legacy flow.py absent

Issues closed by this PR

Closes #396

Related Issues

Checklist

  • Code follows project conventions (see AGENTS.md and docs/agent-context/)
  • Public API changes are documented (the stable surface is retained; the internal package layout is documented)
  • No secrets or credentials included

claude and others added 4 commits June 25, 2026 05:43
Bundles four tightly-coupled hardening issues that share the flow
deserialization path (the issues explicitly cross-reference one another):

#416 — parse-size & structural guardrails: add FlowParseLimits
  (max_bytes/nodes/depth/string_length/steps) applied by default in
  flow_from_json/yaml/dict; a byte-size precheck plus a bounded structural
  walk that also caps traversal of YAML alias/anchor expansion. Overridable
  via limits=, with FlowParseLimits.unlimited() to opt out.

#345 — schema-ref module-resolution allowlist: a ContextVar-backed policy
  (set_schema_ref_policy / schema_ref_policy / SchemaRefAllowlist) consulted
  in resolve_class_ref BEFORE importlib.import_module, so a denied module's
  top-level code never runs. New SchemaRefPolicyError (CW-E051). Surfaced on
  the CLI as `--schema-ref-allow PREFIX` for `run` and `serve` (the commands
  that actually resolve refs).

#400 — adversarial corpus: tests/corpus/flow_files with 29 malformed files +
  a manifest, driven through the library loaders and `chainweaver validate`
  (table + JSON), plus generated resource-shaped cases for the #416 limits.

#340 — scheduled fuzz workflow: .github/workflows/fuzz.yml runs the existing
  fuzz harness weekly + on dispatch over a new schema-typed fixture
  (examples/fuzzable_linear.flow.yaml) against a graceful-handling invariant
  (examples/fuzz_properties.py); seed echoed for repro, counterexample
  uploaded on failure.

Docs: security.md (untrusted flow-file loading), workflows.md (fuzz job +
corpus), error-table.md (CW-E051). Public API + snapshot updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01A2nffdLkZjypkTDW6rGLeo
The lint job's actionlint/shellcheck flagged SC2153 ("SEED/RUNS may not be
assigned; did you mean seed/runs?") because the run script reassigned the
env-provided vars to lowercase locals. Drop the locals and reference the
all-caps env vars directly — shellcheck treats all-caps names as environment
variables, so neither SC2153 nor SC2154 fires. Behaviour is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01A2nffdLkZjypkTDW6rGLeo
- test_flow_corpus: assert the real >= 25 corpus cases (29 committed satisfy
  it) instead of >= 20, matching issue #400's acceptance criterion.
- cli: fix the --schema-ref-allow help example so the sentence-final period
  isn't copy-pasted as a trailing dot in the module prefix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01A2nffdLkZjypkTDW6rGLeo
@dgenio dgenio changed the base branch from claude/issue-triage-grouping-hy3zh2 to main July 1, 2026 10:43
@dgenio dgenio requested a review from Copilot July 1, 2026 10:44

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the flow model layer by replacing the monolithic chainweaver/flow.py with a chainweaver/flow/ package behind a stable chainweaver.flow facade, while also extending flow-file hardening (parse limits + schema-ref allowlist policy), adding regression tests, and updating documentation/CI to reflect the new structure and security posture.

Changes:

  • Split chainweaver/flow.py into focused modules (definitions, steps, dag, governance, drift, refs) and preserve legacy import + pickle/module-qualified identity via chainweaver.flow facade.
  • Add/extend untrusted flow-file defenses: FlowParseLimits/DEFAULT_PARSE_LIMITS in serialization.py and a schema-ref module allowlist policy (plus CLI wiring).
  • Add regression tests, an adversarial corpus, and a scheduled fuzz workflow; update docs (security, architecture, workflows) and changelog.

Reviewed changes

Copilot reviewed 59 out of 60 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/test_schema_ref_policy.py Exercises schema-ref allowlist behavior, scoping, pre-import rejection, and CLI flag behavior.
tests/test_flow_import_surface.py Verifies facade identity (is equality), stable __module__, and pickle round-trips for moved symbols.
tests/test_flow_corpus.py Validates adversarial corpus failures and generated resource-limit cases for deserialization guardrails.
tests/test_cli_serve.py Updates serve test to pass the new schema_ref_allow parameter.
tests/schema_ref_sentinel.py Sentinel module used to prove rejected refs fail before importing.
tests/fixtures/public_api.json Updates public API snapshot for new exported symbols/signature changes.
tests/corpus/flow_files/README.md Documents the adversarial flow-file corpus layout and how to add cases.
tests/corpus/flow_files/manifest.json Enumerates corpus cases and expected FlowSerializationError.detail substrings.
tests/corpus/flow_files/invalid/version_wrong_type.flow.json Corpus fixture: invalid version type.
tests/corpus/flow_files/invalid/unsafe_python_tag.flow.yaml Corpus fixture: unsafe YAML tag.
tests/corpus/flow_files/invalid/unknown_type.flow.json Corpus fixture: unknown type discriminator.
tests/corpus/flow_files/invalid/unclosed_sequence.flow.yaml Corpus fixture: malformed YAML sequence.
tests/corpus/flow_files/invalid/truncated.flow.json Corpus fixture: truncated JSON.
tests/corpus/flow_files/invalid/trailing_comma.flow.json Corpus fixture: invalid JSON trailing comma.
tests/corpus/flow_files/invalid/top_level_string.flow.json Corpus fixture: wrong top-level type (string).
tests/corpus/flow_files/invalid/top_level_scalar.flow.yaml Corpus fixture: wrong top-level type (scalar).
tests/corpus/flow_files/invalid/top_level_number.flow.json Corpus fixture: wrong top-level type (number).
tests/corpus/flow_files/invalid/top_level_null.flow.json Corpus fixture: wrong top-level type (null).
tests/corpus/flow_files/invalid/top_level_list.flow.yaml Corpus fixture: wrong top-level type (list).
tests/corpus/flow_files/invalid/top_level_list.flow.json Corpus fixture: wrong top-level type (list).
tests/corpus/flow_files/invalid/tab_indentation.flow.yaml Corpus fixture: invalid YAML indentation (tab).
tests/corpus/flow_files/invalid/steps_scalar.flow.yaml Corpus fixture: steps wrong type (scalar).
tests/corpus/flow_files/invalid/steps_null.flow.json Corpus fixture: steps wrong type (null).
tests/corpus/flow_files/invalid/steps_not_a_list.flow.json Corpus fixture: steps wrong type (string).
tests/corpus/flow_files/invalid/step_not_an_object.flow.json Corpus fixture: step list contains non-object.
tests/corpus/flow_files/invalid/step_missing_tool_name.flow.json Corpus fixture: step missing tool_name.
tests/corpus/flow_files/invalid/step_input_mapping_scalar.flow.json Corpus fixture: input_mapping wrong type.
tests/corpus/flow_files/invalid/name_wrong_type.flow.json Corpus fixture: invalid name type.
tests/corpus/flow_files/invalid/missing_type.flow.yaml Corpus fixture: missing type discriminator (YAML).
tests/corpus/flow_files/invalid/missing_type.flow.json Corpus fixture: missing type discriminator (JSON).
tests/corpus/flow_files/invalid/missing_required_fields.flow.yaml Corpus fixture: missing required fields (YAML).
tests/corpus/flow_files/invalid/missing_required_fields.flow.json Corpus fixture: missing required fields (JSON).
tests/corpus/flow_files/invalid/future_format_version.flow.json Corpus fixture: unsupported format_version.
tests/corpus/flow_files/invalid/empty.flow.yaml Corpus fixture: empty YAML payload.
tests/corpus/flow_files/invalid/dag_step_missing_id.flow.json Corpus fixture: DAG step missing step_id.
tests/corpus/flow_files/invalid/dag_missing_version.flow.json Corpus fixture: DAG missing required version.
tests/corpus/flow_files/invalid/bom_prefixed.flow.json Corpus fixture: BOM-prefixed JSON.
examples/README.md Documents fuzzing example scripts/fixtures.
examples/fuzzable_linear.flow.yaml Adds a schema-typed fuzz fixture flow for scheduled fuzzing.
examples/fuzz_properties.py Adds a custom fuzz property (gracefully_handles_input) for CI fuzz gating.
docs/security.md Documents untrusted flow-file guardrails and schema-ref allowlist policy.
docs/reference/error-table.md Adds CW-E051 row for SchemaRefPolicyError.
docs/agent-context/workflows.md Documents the fuzz workflow and adversarial corpus conventions.
docs/agent-context/architecture.md Updates module map to reflect flow/ package split and refs.py policy hook.
CHANGELOG.md Notes the flow model package split.
chainweaver/serialization.py Introduces FlowParseLimits + default structural/size guardrails on all deserialization entry points.
chainweaver/flow/steps.py Moves step/retry models into dedicated module.
chainweaver/flow/refs.py Adds schema-ref module-resolution policy + class-ref resolution helper.
chainweaver/flow/governance.py Moves governance lifecycle models into dedicated module.
chainweaver/flow/drift.py Moves drift record model into dedicated module.
chainweaver/flow/definitions.py Moves core linear flow definitions and shared types into dedicated module.
chainweaver/flow/dag.py Moves DAG models/topology validation into dedicated module.
chainweaver/flow/init.py Provides stable chainweaver.flow facade + preserves __module__ for moved symbols.
chainweaver/flow.py Removes the former monolithic flow module.
chainweaver/exceptions.py Adds SchemaRefPolicyError and registers stable diagnostic code CW-E051.
chainweaver/cli/run.py Wires --schema-ref-allow into run and serve commands.
chainweaver/cli/_shared.py Adds shared CLI option + helper for installing schema-ref allowlist policy.
chainweaver/init.py Exposes new flow-policy and parse-limit symbols on the public API surface.
AGENTS.md Updates repo map and “common tasks” pointers to new flow/ module locations.
.github/workflows/fuzz.yml Adds scheduled/on-demand fuzz workflow (non-PR-blocking) with counterexample upload.

Comment thread chainweaver/exceptions.py
self.detail = detail or (
f"Module '{module_path}' is not permitted by the active schema-ref policy"
)
super().__init__(f"Schema ref '{ref}' rejected: {self.detail}.")
Comment on lines +414 to +415
if prefixes:
set_schema_ref_policy(SchemaRefAllowlist(prefixes))
Comment thread chainweaver/flow/steps.py
Comment on lines +69 to +71
Raises:
FlowSerializationError: When any ref cannot be resolved or does
not point to a :class:`BaseException` subclass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants