Restore machine_maintenance template (real MANUFACTURING.PUBLIC data + eval-aligned runbook) by cafzal · Pull Request #90 · RelationalAI/templates

cafzal · 2026-06-22T17:26:43Z

What this does

Restores the v1/machine_maintenance multi-reasoner template (removed in #67), rebased on the real MANUFACTURING.PUBLIC dataset: 50 machines, 20 technicians, 8 products, 12 periods across 3 plants (15 CSVs in data/). The script threads four reasoners through one ontology (querying → rules → graph → prescriptive), and the runbook walks the 13 manufacturing reasoner-workflow eval questions with the real figure each stage produces.

Verification

The full machine_maintenance.py runs end-to-end against the live engine (all stages OPTIMAL, no errors). Every runbook figure comes from a real run — no predicted numbers.

Real data dumped from MANUFACTURING.PUBLIC to data/*.csv (15 tables, validated row counts)
Querying (Q1–Q5, Q7) reproduces the eval's expected answers exactly — OEE 78.3/68.0/63.3, downtime by fault & plant, failure ranking @ P12, waste rates, technician coverage
Rules (Q9) exact — risk tiers 3 Critical / 6 Elevated / 41 Standard (M001/M006/M011)
Graph (Q8) — bipartite betweenness ranks Pumps/Motors as top bottlenecks, corroborating the eval
Prescriptive (Q10–Q12) — baseline schedules all 50 (periods 1–10, 5/period); the T001 what-if drops exactly the 4 Plant_A Turbines (M001/M004/M006/M009), 46/50 scheduled — matches the eval's structural answer exactly
Predictive (Q6, Q13) — bundled pre-computed failure predictions; a live GNN is noted as an extension
v1 standards — relationalai==1.15.0, README front-matter + sections + sample-data table, .gitignore
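The headline counts in the list above are internally consistent, which a few lines of arithmetic can confirm. This is a minimal sketch using only figures quoted in this description; the variable names are illustrative and are not taken from machine_maintenance.py.

```python
# Figures copied from the verification list above; names are illustrative.
TOTAL_MACHINES = 50

risk_tiers = {"Critical": 3, "Elevated": 6, "Standard": 41}
baseline_periods = 10      # baseline schedule uses periods 1-10
machines_per_period = 5    # 5 machines per period
dropped_in_what_if = ["M001", "M004", "M006", "M009"]

# The three tier counts add up to the machine count.
assert sum(risk_tiers.values()) == TOTAL_MACHINES

# The baseline schedules all machines: 10 periods x 5 per period.
assert baseline_periods * machines_per_period == TOTAL_MACHINES

# The T001 what-if schedules everything except the four dropped Turbines.
scheduled_in_what_if = TOTAL_MACHINES - len(dropped_in_what_if)
print(scheduled_in_what_if)  # 46
```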

Note: the graph and prescriptive objective values are the template's own sound formulations (the eval's exact cost coefficients aren't in any repo), so they corroborate the eval's structural findings rather than bit-matching its objective numbers. The seven deterministic querying/rules answers match to the digit.
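One way to see how a structural answer can match while objective values differ is to separate feasibility from cost. The sketch below is an assumption-laden toy, not the template's prescriptive model: it supposes that a machine can be scheduled only if at least one qualified technician is available, and the qualification table is hypothetical (only the IDs mentioned in this description are reused; the real tables in data/*.csv are not reproduced).

```python
# Hypothetical qualification table: machine -> technicians able to service it.
# Assumed for illustration only; the real data lives in data/*.csv.
QUALIFIED = {
    "M001": {"T001"},
    "M004": {"T001"},
    "M006": {"T001"},
    "M009": {"T001"},
    "M002": {"T001", "T002"},   # assumed: has a backup technician
    "M003": {"T002"},           # assumed: never depended on T001
}


def schedulable(qualified, available):
    """Return machines that still have at least one available technician."""
    return sorted(m for m, techs in qualified.items() if techs & available)


all_techs = {"T001", "T002"}
baseline = schedulable(QUALIFIED, all_techs)
what_if = schedulable(QUALIFIED, all_techs - {"T001"})

# Machines that were schedulable in the baseline but not in the what-if.
dropped = sorted(set(baseline) - set(what_if))
print(dropped)  # ['M001', 'M004', 'M006', 'M009']
```

Because `schedulable` takes no cost inputs, the dropped set in this toy cannot change when objective coefficients do. Whether the real model drops the four Turbines for this reason is not stated in the source.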

Runbook: v1/machine_maintenance/runbook.md

Restores v1/machine_maintenance (removed in #67), now backed by the real 50-machine MANUFACTURING.PUBLIC dataset (50 machines, 20 technicians, 8 products, 12 periods, 3 plants). Runbook reframed around the 13 reasoner-workflow eval questions; querying (Q1-5, Q7) and rules (Q9) verified against the real data and reproduce the eval's expected answers exactly. Graph, predictive, and prescriptive stages plus the script rebuild are in progress.

Rebuild machine_maintenance.py as a four-stage multi-reasoner pipeline (querying, graph, rules, prescriptive) over the 50-machine MANUFACTURING.PUBLIC data; the full script runs OPTIMAL end-to-end. Querying and rules reproduce the eval's expected answers exactly (OEE 78.3/68.0/63.3, downtime drivers, risk tiers 3/6/41); graph and prescriptive use the template's own sound formulations and corroborate the eval's structural findings (Pumps/Motors bottlenecks; the T001 what-if drops the four Plant_A Turbines, 46/50 scheduled). Finalize runbook with the real figures, rewrite README for the 50-machine dataset, pin relationalai==1.15.0, add .gitignore.

README: drop invalid 'Querying' reasoning_type (docs CI enum), remove the H1 title, rewrite 'What this template is for' as a business problem statement, reorder How-it-works to match the script (rules before graph) with verbatim code snippets, move thresholds out of prose into those snippets, add assumed-knowledge and a real expected-output snippet.
Script: add a Stage 0 ontology banner, fix Stage casing, and persist a MaintenancePlan headline concept after the solve so the plan stays queryable.
Runbook: make the graph and prescriptive prompts question-shaped and reconcile the chain diagram to the script's four stages.
No behavior change: full script still runs OPTIMAL with identical numbers (baseline 199.032 / 50-of-50; what-if 169.971 / 46-of-50); py_compile + ruff clean.

Re-verified both reworded runbook prompts by paste-testing them in fresh agents (no access to the script) against the live engine:
- Prescriptive prompt reproduces exactly: OPTIMAL, 50/50 scheduled across P1-10, and the T001 what-if drops exactly M001/M004/M006/M009 (the four Houston Turbines); a fresh agent reached the same structural answer from its own formulation.
- Graph prompt reproduces the same conclusion (the 20 three-product Pumps and Motors are the bottlenecks), but the betweenness score is construction-dependent and the top is a 20-way tie. Tightened the prompt to specify the bipartite construction and rewrote the response to state the tie honestly instead of an arbitrary top-8 ranking.
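The tie described above is what betweenness produces when several machines have the same neighbourhood in a machine-product graph. The following is a minimal sketch on a hypothetical toy graph: the names multi_1, multi_2, single_1, and P1-P3 are made up, and the computation is a plain-Python Brandes-style betweenness, not the template's graph reasoner, whose construction is not shown in this description.

```python
import math
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality on an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical bipartite edges: (machine, product it makes).
edges = [
    ("multi_1", "P1"), ("multi_1", "P2"), ("multi_1", "P3"),
    ("multi_2", "P1"), ("multi_2", "P2"), ("multi_2", "P3"),
    ("single_1", "P1"),
]
adj = {}
for machine, product in edges:
    adj.setdefault(machine, set()).add(product)
    adj.setdefault(product, set()).add(machine)

scores = betweenness(adj)
tie = math.isclose(scores["multi_1"], scores["multi_2"])
print(tie)  # True: the two structurally identical machines tie
```

In this toy, the two three-product machines score the same and both outrank the single-product machine, so any ranking among the tied machines would be arbitrary. A different construction (for example, projecting onto machines only) would generally give different scores, which is one sense in which the score is construction-dependent.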

github-actions Bot had a problem deploying to Preview June 22, 2026 17:27 Failure

cafzal added 2 commits June 22, 2026 13:32

github-actions Bot had a problem deploying to Preview June 22, 2026 22:18 Failure

cafzal marked this pull request as ready for review June 22, 2026 22:39

cafzal requested review from jablonskidev and somacdivad as code owners June 22, 2026 22:39

github-actions Bot had a problem deploying to Preview June 22, 2026 22:40 Failure

cafzal wants to merge 4 commits into main from restore-machine-maintenance
