Restore machine_maintenance template (real MANUFACTURING.PUBLIC data + eval-aligned runbook) #90
Open
cafzal wants to merge 4 commits into
Restores v1/machine_maintenance (removed in #67), now backed by the real MANUFACTURING.PUBLIC dataset (50 machines, 20 technicians, 8 products, 12 periods, 3 plants). Runbook reframed around the 13 reasoner-workflow eval questions; querying (Q1-5, Q7) and rules (Q9) are verified against the real data and reproduce the eval's expected answers exactly. Graph, predictive, and prescriptive stages plus the script rebuild are in progress.
Rebuild machine_maintenance.py as a four-stage multi-reasoner pipeline (querying, graph, rules, prescriptive) over the 50-machine MANUFACTURING.PUBLIC data; the full script runs OPTIMAL end-to-end. Querying and rules reproduce the eval's expected answers exactly (OEE 78.3/68.0/63.3, downtime drivers, risk tiers 3/6/41); graph and prescriptive use the template's own sound formulations and corroborate the eval's structural findings (Pumps/Motors bottlenecks; the T001 what-if drops the four Plant_A Turbines, 46/50 scheduled). Finalize runbook with the real figures, rewrite README for the 50-machine dataset, pin relationalai==1.15.0, add .gitignore.
README: drop invalid 'Querying' reasoning_type (docs CI enum), remove the H1 title, rewrite 'What this template is for' as a business problem statement, reorder How-it-works to match the script (rules before graph) with verbatim code snippets, move thresholds out of prose into those snippets, add assumed-knowledge and a real expected-output snippet. Script: add a Stage 0 ontology banner, fix Stage casing, and persist a MaintenancePlan headline concept after the solve so the plan stays queryable. Runbook: make the graph and prescriptive prompts question-shaped and reconcile the chain diagram to the script's four stages. No behavior change: full script still runs OPTIMAL with identical numbers (baseline 199.032 / 50-of-50; what-if 169.971 / 46-of-50); py_compile + ruff clean.
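The "no behavior change" claim above rests on the headline numbers staying identical. A minimal sketch of how such a check could be automated is shown below, assuming the script's results can be collected into a plain dict; the `actual` shape and the `check_headlines` helper are hypothetical and not part of the template. Only the expected figures come from the commit message.

```python
import math

# Headline figures quoted in the commit message (objective / machines scheduled).
EXPECTED = {
    "baseline": {"objective": 199.032, "scheduled": 50},
    "what_if": {"objective": 169.971, "scheduled": 46},
}


def check_headlines(actual, expected=EXPECTED, abs_tol=5e-4):
    """Return a list of human-readable mismatches; an empty list means no drift.

    `actual` is assumed (hypothetically) to map scenario name to a dict with
    "objective" and "scheduled" keys, mirroring EXPECTED.
    """
    problems = []
    for scenario, want in expected.items():
        got = actual.get(scenario)
        if got is None:
            problems.append(f"{scenario}: missing from results")
            continue
        # Objectives are floats, so compare with an absolute tolerance.
        if not math.isclose(got["objective"], want["objective"],
                            rel_tol=0.0, abs_tol=abs_tol):
            problems.append(
                f"{scenario}: objective {got['objective']} != {want['objective']}"
            )
        # Scheduled counts are integers, so compare exactly.
        if got["scheduled"] != want["scheduled"]:
            problems.append(
                f"{scenario}: scheduled {got['scheduled']} != {want['scheduled']}"
            )
    return problems
```

Calling `check_headlines(results)` after a run and failing on a non-empty list would turn the "identical numbers" statement into a repeatable check.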
Re-verified both reworded runbook prompts by paste-testing them in fresh agents (no access to the script) against the live engine:
- Prescriptive prompt reproduces exactly: OPTIMAL, 50/50 scheduled across P1-10, and the T001 what-if drops exactly M001/M004/M006/M009 (the four Houston Turbines); a fresh agent reached the same structural answer from its own formulation.
- Graph prompt reproduces the same conclusion (the 20 three-product Pumps and Motors are the bottlenecks), but the betweenness score is construction-dependent and the top is a 20-way tie. Tightened the prompt to specify the bipartite construction and rewrote the response to state the tie honestly instead of an arbitrary top-8 ranking.
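The point about stating the tie honestly can be illustrated with a small sketch: instead of slicing a fixed top-N from the sorted scores, report every node that shares the best score. The helper name `top_tie_group` and the toy scores are hypothetical and are not taken from the real run.

```python
def top_tie_group(scores, tol=1e-9):
    """Return (best_score, sorted names whose score is within tol of the best)."""
    if not scores:
        return None, []
    best = max(scores.values())
    # Keep every name tied with the best score, not an arbitrary top-N slice.
    tied = sorted(name for name, s in scores.items() if best - s <= tol)
    return best, tied


# Toy centrality scores for illustration only.
toy_scores = {"pump_a": 0.25, "motor_b": 0.25, "pump_c": 0.25, "turbine_d": 0.10}
best, tied = top_tie_group(toy_scores)
print(f"{len(tied)}-way tie at {best}: {', '.join(tied)}")
```

A fixed top-8 cut over a 20-way tie would depend on sort order alone, which is why reporting the whole tie group is the more defensible answer.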
What this does
Restores the v1/machine_maintenance multi-reasoner template (removed in #67), rebased on the real MANUFACTURING.PUBLIC dataset: 50 machines, 20 technicians, 8 products, 12 periods across 3 plants (15 CSVs in data/). The script threads four reasoners through one ontology (querying → graph → rules → prescriptive), and the runbook walks the 13 manufacturing reasoner-workflow eval questions with the real figure each stage produces.
Verification
The full machine_maintenance.py runs end-to-end against the live engine (all stages OPTIMAL, no errors). Every runbook figure comes from a real run, with no predicted numbers.
- MANUFACTURING.PUBLIC to data/*.csv (15 tables, validated row counts)
- relationalai==1.15.0, README front-matter + sections + sample-data table, .gitignore
Note: the graph and prescriptive objective values are the template's own sound formulations (the eval's exact cost coefficients aren't in any repo), so they corroborate the eval's structural findings rather than bit-matching its objective numbers. The seven deterministic querying/rules answers match to the digit.
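For readers who want to repeat the row-count validation on the exported CSVs, a minimal sketch using only the standard library follows. The helper names and the file names in the usage comment are assumptions for illustration; this PR does not list the actual 15 file names. The counts 50 and 20 are the machine and technician totals stated in the description.

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Count data rows in a CSV file, excluding the header line."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row if present
        return sum(1 for _ in reader)


def validate_row_counts(data_dir, expected):
    """Return {filename: (expected, actual)} for every file that does not match.

    A missing file is reported with an actual count of None.
    """
    mismatches = {}
    for filename, want in expected.items():
        path = Path(data_dir) / filename
        got = count_rows(path) if path.exists() else None
        if got != want:
            mismatches[filename] = (want, got)
    return mismatches


# Hypothetical usage; the file names below are assumed, not confirmed:
# validate_row_counts("data", {"machines.csv": 50, "technicians.csv": 20})
```

An empty dict from `validate_row_counts` means every listed table has the expected number of rows.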
Runbook: v1/machine_maintenance/runbook.md