Restore machine_maintenance template (real MANUFACTURING.PUBLIC data + eval-aligned runbook)#90

Open: cafzal wants to merge 4 commits into main from restore-machine-maintenance
cafzal (Collaborator) commented Jun 22, 2026

What this does

Restores the v1/machine_maintenance multi-reasoner template (removed in #67), rebuilt on the real MANUFACTURING.PUBLIC dataset: 50 machines, 20 technicians, 8 products, and 12 periods across 3 plants (15 CSVs in data/). The script threads four reasoners through one ontology (querying → graph → rules → prescriptive), and the runbook walks through the 13 manufacturing reasoner-workflow eval questions, giving the real figure each stage produces.

Verification

The full machine_maintenance.py runs end-to-end against the live engine (all stages OPTIMAL, no errors). Every runbook figure comes from a real run — no predicted numbers.

  • Real data dumped from MANUFACTURING.PUBLIC to data/*.csv (15 tables, validated row counts)
  • Querying (Q1–Q5, Q7) reproduces the eval's expected answers exactly — OEE 78.3/68.0/63.3, downtime by fault & plant, failure ranking @ P12, waste rates, technician coverage
  • Rules (Q9) exact — risk tiers 3 Critical / 6 Elevated / 41 Standard (M001/M006/M011)
  • Graph (Q8) — bipartite betweenness ranks Pumps/Motors as top bottlenecks, corroborating the eval
  • Prescriptive (Q10–Q12) — baseline schedules all 50 (periods 1–10, 5/period); the T001 what-if drops exactly the 4 Plant_A Turbines (M001/M004/M006/M009), 46/50 scheduled — matches the eval's structural answer exactly
  • Predictive (Q6, Q13) — bundled pre-computed failure predictions; a live GNN is noted as an extension
  • v1 standards — relationalai==1.15.0, README front-matter + sections + sample-data table, .gitignore
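The row-count validation mentioned in the first bullet could look roughly like the sketch below. It is illustrative only and not the check used in this PR: the file names are hypothetical, and only the three counts quoted in the description (50 machines, 20 technicians, 8 products) are taken from the source.

```python
import csv
from pathlib import Path

# Hypothetical file names; the counts are the ones quoted in this PR description.
EXPECTED_ROWS = {
    "machines.csv": 50,
    "technicians.csv": 20,
    "products.csv": 8,
}


def count_rows(path: Path) -> int:
    """Count non-empty data rows in a CSV file, excluding the header line."""
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for row in reader if row)


def validate_row_counts(data_dir: Path, expected: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {file name: (expected, actual)} for every file whose count differs."""
    mismatches = {}
    for name, want in expected.items():
        got = count_rows(data_dir / name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches
```

An empty result from `validate_row_counts(Path("data"), EXPECTED_ROWS)` would mean every listed table has the expected number of rows.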

Note: the graph and prescriptive objective values are the template's own sound formulations (the eval's exact cost coefficients aren't in any repo), so they corroborate the eval's structural findings rather than bit-matching its objective numbers. The seven deterministic querying/rules answers match to the digit.
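To make the distinction between a structural match and an objective-value match concrete, here is a minimal sketch that compares two scenarios by which machines fall out of the schedule rather than by their objective numbers. The helper names are hypothetical, and the assumption that machine IDs run M001 through M050 is inferred from the ID format quoted above, not confirmed by the script.

```python
# Assumption: machine IDs run M001..M050, matching the ID format quoted in this PR.
all_machines = {f"M{i:03d}" for i in range(1, 51)}

# Figures quoted in this PR: the baseline schedules all 50 machines, and the
# T001 what-if leaves these four unscheduled.
baseline_scheduled = set(all_machines)
whatif_scheduled = all_machines - {"M001", "M004", "M006", "M009"}


def dropped_machines(baseline: set[str], whatif: set[str]) -> list[str]:
    """Machines scheduled in the baseline but not in the what-if, sorted by ID."""
    return sorted(baseline - whatif)


def structurally_equal(expected_dropped: set[str], baseline: set[str], whatif: set[str]) -> bool:
    """Compare scenarios by which machines drop out, ignoring objective values."""
    return set(dropped_machines(baseline, whatif)) == expected_dropped
```

A check of this shape stays valid even when two formulations use different cost coefficients, which is the situation the note describes.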

Runbook: v1/machine_maintenance/runbook.md

Restores v1/machine_maintenance (removed in #67), now backed by the real
50-machine MANUFACTURING.PUBLIC dataset (50 machines, 20 technicians, 8
products, 12 periods, 3 plants). Runbook reframed around the 13 reasoner-
workflow eval questions; querying (Q1-5, Q7) and rules (Q9) verified against
the real data and reproduce the eval's expected answers exactly. Graph,
predictive, and prescriptive stages plus the script rebuild are in progress.
cafzal added 2 commits June 22, 2026 13:32
Rebuild machine_maintenance.py as a four-stage multi-reasoner pipeline
(querying, graph, rules, prescriptive) over the 50-machine MANUFACTURING.PUBLIC
data; the full script runs OPTIMAL end-to-end. Querying and rules reproduce the
eval's expected answers exactly (OEE 78.3/68.0/63.3, downtime drivers, risk
tiers 3/6/41); graph and prescriptive use the template's own sound formulations
and corroborate the eval's structural findings (Pumps/Motors bottlenecks; the
T001 what-if drops the four Plant_A Turbines, 46/50 scheduled).

Finalize runbook with the real figures, rewrite README for the 50-machine
dataset, pin relationalai==1.15.0, add .gitignore.
README: drop invalid 'Querying' reasoning_type (docs CI enum), remove the H1
title, rewrite 'What this template is for' as a business problem statement,
reorder How-it-works to match the script (rules before graph) with verbatim
code snippets, move thresholds out of prose into those snippets, add
assumed-knowledge and a real expected-output snippet.

Script: add a Stage 0 ontology banner, fix Stage casing, and persist a
MaintenancePlan headline concept after the solve so the plan stays queryable.

Runbook: make the graph and prescriptive prompts question-shaped and reconcile
the chain diagram to the script's four stages.

No behavior change: full script still runs OPTIMAL with identical numbers
(baseline 199.032 / 50-of-50; what-if 169.971 / 46-of-50); py_compile + ruff clean.
Re-verified both reworded runbook prompts by paste-testing them in fresh agents
(no access to the script) against the live engine:
- Prescriptive prompt reproduces exactly: OPTIMAL, 50/50 scheduled across P1-10,
  and the T001 what-if drops exactly M001/M004/M006/M009 (the four Houston
  Turbines) -- a fresh agent reached the same structural answer from its own
  formulation.
- Graph prompt reproduces the same conclusion (the 20 three-product Pumps and
  Motors are the bottlenecks) but the betweenness score is construction-dependent
  and the top is a 20-way tie. Tightened the prompt to specify the bipartite
  construction and rewrote the response to state the tie honestly instead of an
  arbitrary top-8 ranking.
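
One way to report a tied top tier instead of an arbitrary top-k cut is sketched below. This is an illustration, not the runbook's code: the function names, the tolerance, and the wording of the summary are all assumptions.

```python
import math


def top_tier(scores: dict[str, float], rel_tol: float = 1e-9) -> list[str]:
    """Return every node whose score ties the maximum, sorted by name."""
    if not scores:
        return []
    best = max(scores.values())
    return sorted(
        node for node, score in scores.items()
        if math.isclose(score, best, rel_tol=rel_tol)
    )


def describe_top(scores: dict[str, float]) -> str:
    """Summarize the top of a ranking, stating a tie explicitly when there is one."""
    tied = top_tier(scores)
    if not tied:
        return "no scores"
    if len(tied) == 1:
        return f"top node: {tied[0]}"
    return f"{len(tied)}-way tie at the top: {', '.join(tied)}"
```

Comparing with a tolerance rather than `==` avoids splitting a genuine tie over floating-point noise in the centrality computation.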
@cafzal cafzal marked this pull request as ready for review June 22, 2026 22:39
