From 36043a8c9184587579d97489ad3ad168125d7a37 Mon Sep 17 00:00:00 2001 From: TouqeerHamdani Date: Tue, 12 May 2026 21:03:23 +0530 Subject: [PATCH 1/2] level-5: Touqeer Hamdani --- submissions/Touqeer-Hamdani/level5/answers.md | 322 ++++++++++++++++++ submissions/Touqeer-Hamdani/level5/schema.md | 79 +++++ 2 files changed, 401 insertions(+) create mode 100644 submissions/Touqeer-Hamdani/level5/answers.md create mode 100644 submissions/Touqeer-Hamdani/level5/schema.md diff --git a/submissions/Touqeer-Hamdani/level5/answers.md b/submissions/Touqeer-Hamdani/level5/answers.md new file mode 100644 index 000000000..5ee08a4fc --- /dev/null +++ b/submissions/Touqeer-Hamdani/level5/answers.md @@ -0,0 +1,322 @@ +# Level 5 — Graph Thinking + +**Author:** Touqeer Hamdani +**Date:** May 2026 + +--- + +## Q1. Model It (20 pts) + +### Graph Schema + +> Full diagram: [`schema.md`](./schema.md) + +The graph schema is designed around the 3 factory CSVs and captures the full production planning domain — projects, what they produce, where they're built, who builds them, and when. 
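As a rough illustration of how one row of the production CSV fans out into this schema, the sketch below splits a single row into node keys and one `SCHEDULED_AT` edge. It is a minimal sketch under stated assumptions, not the actual seed script: the column names are the ones cited in this answer, while the sample project name, the exact CSV layout, and the helper name `row_to_graph` are hypothetical.

```python
import csv
import io

# Illustrative row only: column names follow the ones cited in this answer;
# the project name and the exact CSV layout are assumptions.
SAMPLE_CSV = """project_id,project_name,product_type,station_code,station_name,week,etapp,bop,planned_hours,actual_hours
P01,Example project,IQB,011,FS IQB,w1,ET1,BOP1,48.0,45.2
"""


def row_to_graph(row):
    """Split one production row into node keys and one SCHEDULED_AT edge."""
    planned = float(row["planned_hours"])
    actual = float(row["actual_hours"])
    # One key per node label that this row touches.
    nodes = {
        "Project": row["project_id"],
        "Product": row["product_type"],
        "Station": row["station_code"],
        "Week": row["week"],
        "Etapp": row["etapp"],
    }
    # The edge carries the operational data; variance is left empty
    # when planned hours are zero to avoid dividing by zero.
    edge = {
        "type": "SCHEDULED_AT",
        "from": row["project_id"],
        "to": row["station_code"],
        "week": row["week"],
        "etapp": row["etapp"],
        "bop": row["bop"],
        "planned_hours": planned,
        "actual_hours": actual,
        "variance_pct": round((actual - planned) / planned * 100, 1) if planned else None,
    }
    return nodes, edge


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nodes, edge = row_to_graph(rows[0])
print(nodes)
print(edge)
```

Keeping the mapping in a plain function like this makes it easy to check by hand before any Cypher `MERGE` statements are involved.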
+ +### Node Labels (8) + +| Label | Source | Key Properties | Count | +|-------|--------|----------------|-------| +| **Project** | production.csv → `project_id`, `project_number`, `project_name` | project_id, project_number, project_name | 8 | +| **Product** | production.csv → `product_type`, `unit` | product_type, unit | 7 | +| **Station** | production.csv → `station_code`, `station_name` | station_code, station_name | 10 | +| **Worker** | workers.csv → `worker_id`, `name` | worker_id, name, role, hours_per_week, type | 14 | +| **Week** | capacity.csv → `week` | week_id | 8 | +| **Factory** | Implicit overall plant | factory_name | 1 | +| **Certification** | workers.csv → `certifications` (split by comma) | cert_name | 23 unique | +| **Etapp** | production.csv → `etapp` | etapp_name | 2 (ET1, ET2) | + +### Relationship Types (9) + +| Relationship | Direction | Properties | +|-------------|-----------|------------| +| **PRODUCES** | `(Project)→(Product)` | `quantity`, `unit_factor`, `unit` | +| **SCHEDULED_AT** | `(Project)→(Station)` | `planned_hours`, `actual_hours`, `completed_units`, `week`, `etapp`, `bop`, `variance_pct` | +| **ACTIVE_IN** | `(Project)→(Week)` | — | +| **IN_PHASE** | `(Project)→(Etapp)` | — | +| **WORKS_AT** | `(Worker)→(Station)` | — (primary station assignment) | +| **CAN_COVER** | `(Worker)→(Station)` | — (cross-trained coverage) | +| **HOLDS** | `(Worker)→(Certification)` | — | +| **LOADED_IN** | `(Station)→(Week)` | `total_planned`, `total_actual` | +| **HAS_CAPACITY** | `(Week)→(Factory)` | `own_hours`, `hired_hours`, `overtime_hours`, `total_planned`, `deficit` | + +### Data-Carrying Relationships (4) + +1. **PRODUCES** — Each project-to-product edge carries `{quantity: 600, unit_factor: 1.77, unit: "meter"}`, capturing the production spec. This lets you query things like "which projects produce more than 500 meters of IQB?" directly from the relationship. + +2. **SCHEDULED_AT** — The core operational edge. 
Each project-station-week combination carries `{planned_hours: 48.0, actual_hours: 45.2, completed_units: 28, week: "w1", etapp: "ET1", bop: "BOP1"}`. This is the richest relationship in the graph — it's where all the variance analysis lives, and where we track the phase (`etapp`, `bop`) of the work. + +3. **LOADED_IN** — Aggregated station load per week: `{total_planned: 393, total_actual: 410}`. Enables capacity-vs-demand queries at the station level without re-aggregating from SCHEDULED_AT every time. *(Note: these properties are calculated by aggregating `SCHEDULED_AT` edges during graph construction, as `factory_capacity.csv` only provides factory-wide totals).* + +4. **HAS_CAPACITY** — Links each week to the global factory, carrying the `{own_hours, hired_hours, overtime_hours, total_planned, deficit}` workforce metrics straight out of `factory_capacity.csv`. This perfectly mirrors the exact relationship pattern requested in the L6 instructions. + +### Design Decisions + +- **Certification as a node** (not a Worker property): Workers share certifications (e.g., multiple workers hold MIG/MAG). Modeling it as a node enables queries like "find all workers certified for TIG welding" with a single hop instead of string parsing. +- **Etapp and BOP as properties on SCHEDULED_AT**: Since a single project can move through different BOPs (phases) across different stations and weeks, treating `bop` and `etapp` as edge properties accurately models *when and where* that phase occurs, rather than applying a blanket phase to the entire project. +- **SCHEDULED_AT carries `week` as a property** rather than routing through Week nodes: This keeps the most queried relationship (planned vs actual hours) as a direct Project→Station edge. The separate ACTIVE_IN relationship to Week handles the temporal dimension when needed. +- **Etapp as a node** (for L6 compliance): The L6 spec explicitly requires `Etapp` as a node label. 
From a pure design perspective, etapp works better as an edge property on SCHEDULED_AT (only 2 values, no properties of its own), and we keep it there for direct querying. The `Etapp` node + `IN_PHASE` relationship is included to meet the L6 minimum graph requirements. + +### Implementation Notes for L6 + +- **SCHEDULED_AT creates parallel edges**: A single (Project, Station) pair can have multiple SCHEDULED_AT edges — one per week/etapp/bop/product combination. For example, P01→Station 011 appears in both w1 and w2. Additionally, P05→Station 018 has two rows in the same week (w1) with the same etapp/bop (ET2/BOP3) but different product types (SB and SD). In `seed_graph.py`, use `MERGE` with a composite key including `week`, `etapp`, `bop`, **and** `product_type` to ensure idempotency without data loss: + ```cypher + MERGE (p:Project {project_id: row.project_id}) + MERGE (s:Station {station_code: row.station_code}) + MERGE (p)-[r:SCHEDULED_AT {week: row.week, etapp: row.etapp, bop: row.bop, product_type: row.product_type}]->(s) + SET r.planned_hours = toFloat(row.planned_hours), + r.actual_hours = toFloat(row.actual_hours), + r.completed_units = toInteger(row.completed_units) + ``` +- **PRODUCES needs deduplication**: The same (Project, Product) pair appears across many CSV rows (different weeks/stations), but the production spec (`quantity`, `unit_factor`, `unit`) is constant per pair. Create **one** PRODUCES edge per unique `(project_id, product_type)` — either deduplicate in Python before loading, or use `MERGE` on the pair: + ```cypher + MERGE (p:Project {project_id: row.project_id}) + MERGE (prod:Product {product_type: row.product_type}) + MERGE (p)-[r:PRODUCES]->(prod) + SET r.quantity = toInteger(row.quantity), + r.unit_factor = toFloat(row.unit_factor), + r.unit = row.unit + ``` + +--- + +## Q2. Why Not Just SQL? 
(20 pts) + +*Prompt: "Which workers are certified to cover Station 016 (Gjutning) when Per Gustafsson is on vacation, and which projects would be affected?"* + +> **Data Reality Check:** In `factory_workers.csv`, the worker at Station 016 is actually named **Per Hansen** (W07), not Per Gustafsson. The queries below reflect the actual data. + +### 1. The SQL Query +Assuming a standard relational schema with normalized tables (`Workers`, `Worker_Coverage`, `Stations`, `Project_Schedules`, `Projects`), the query joins four of them to traverse the relationships (`Stations` is skipped because both junction tables carry `station_code`; `GROUP_CONCAT` is MySQL/SQLite syntax): + +```sql +SELECT + w.name AS CoveringWorker, + GROUP_CONCAT(DISTINCT p.project_name) AS AffectedProjects +FROM Workers w +JOIN Worker_Coverage wc ON w.worker_id = wc.worker_id +JOIN Project_Schedules ps ON wc.station_code = ps.station_code +JOIN Projects p ON ps.project_id = p.project_id +WHERE wc.station_code = '016' + AND w.name != 'Per Hansen' +GROUP BY w.name; +``` + +### 2. The Cypher Query +Using our graph schema, the query reads as the path it traverses, `Worker → Station ← Project`: + +```cypher +MATCH (w:Worker)-[:CAN_COVER]->(s:Station {station_code: "016"})<-[:SCHEDULED_AT]-(p:Project) +WHERE w.name <> "Per Hansen" +RETURN w.name AS CoveringWorker, collect(DISTINCT p.project_name) AS AffectedProjects +``` + +### 3. What the graph makes obvious that SQL hides +SQL forces you to think about database mechanics, specifically resolving foreign keys across intermediate junction tables just to traverse a simple real-world relationship. The graph version (Cypher) hides those storage mechanics and exposes the network topology directly, mirroring how a person pictures the factory floor: "Find workers who point to this station, and find projects that point to this station." + +--- + +## Q3. Spot the Bottleneck (20 pts) + +### 1.
Identifying the Overload + +From `factory_capacity.csv`, five out of eight weeks show capacity deficits: + +| Week | Total Capacity | Total Planned | Deficit | +|------|---------------|---------------|---------| +| w1 | 480 | 612 | **-132** | +| w2 | 520 | 645 | **-125** | +| w4 | 500 | 550 | **-50** | +| w6 | 440 | 520 | **-80** | +| w7 | 520 | 600 | **-80** | + +Using `factory_production.csv` to drill into the two worst weeks (w1 and w2): + +**Volume Bottleneck (Station 011):** Station 011 (FS IQB) is the primary structural bottleneck. In w1, it is scheduled to handle work from 7 projects simultaneously (P01, P02, P03, P04, P05, P07, P08). As the entry point of the manufacturing pipeline, it creates a massive initial capacity constraint. + +**Volume Driver (Project P05):** Project P05 (Sjukhus Linköping ET2) is the largest individual contributor (1200 meters of IQB). It heavily loads the early-stage stations in w1. + +**Efficiency Overruns (Station 016):** While 011 causes deficits via sheer scheduled volume, Station 016 (Gjutning / Casting) causes deficits through poor execution efficiency. Looking at the worst overruns by percentage (actual vs planned hours): + +| Project | Station | Week | Planned | Actual | Variance | +|---------|---------|------|---------|--------|----------| +| P03 | 016 Gjutning | w2 | 28.0 | 35.0 | **+25%** | +| P04 | 018 SB B/F-hall | w1 | 19.0 | 22.0 | **+16%** | +| P05 | 016 Gjutning | w2 | 35.0 | 40.0 | **+14%** | +| P03 | 014 Svets o montage | w1 | 42.0 | 48.0 | **+14%** | +| P08 | 016 Gjutning | w3 | 22.0 | 25.0 | **+14%** | + +Station 016 appears repeatedly in the worst overruns. Therefore, the factory capacity deficit is a dual problem: structural schedule overload at the start of the pipeline (011), and severe execution overruns at the finishing stages (016). + +### 2. 
Cypher Query + +```cypher +MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) +WHERE r.actual_hours > r.planned_hours * 1.1 +RETURN s.station_name AS Station, + collect({ + project: p.project_name, + variance_pct: round((r.actual_hours - r.planned_hours) / r.planned_hours * 100, 1) + }) AS Overruns +``` + +### 3. Modeling the Alert as a Graph Pattern + +**Approach: Store `variance_pct` as a numeric property on SCHEDULED_AT.** + +During graph seeding, compute and store the variance percentage on each scheduling edge: + +```cypher +SET r.variance_pct = round((r.actual_hours - r.planned_hours) / r.planned_hours * 100, 1) +``` + +This means the threshold is applied at **query time**, not at seed time, so it can change without re-seeding: + +```cypher +// 10% threshold for alerts +MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) +WHERE r.variance_pct > 10 +RETURN s.station_name, p.project_name, r.variance_pct ORDER BY r.variance_pct DESC; + +// 5% threshold for Q4's hybrid query (finding well-executed projects) +MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) +WHERE r.variance_pct < 5 +RETURN p.project_name, avg(r.variance_pct) +``` + +**Why this over a `(:Bottleneck)` node or a boolean flag:** +- A dedicated `(:Bottleneck)` node adds schema complexity (extra nodes + relationships) for what is essentially a simple numeric comparison on existing data. +- A boolean `overrun: true/false` flag loses the magnitude: an 11% overrun and a 50% overrun both say `true`, and changing the threshold requires re-seeding. +- A numeric `variance_pct` preserves full fidelity, keeps the data where it naturally belongs (on the scheduling edge), and lets dashboards apply any threshold on the fly. + +--- + +## Q4. Vector + Graph Hybrid (20 pts) + +*Prompt text:* "450 meters of IQB beams for a hospital extension in Linköping, similar scope to previous hospital projects, tight timeline" + +### 1.
What to Embed +There are two ways to handle this, ranging from a simple baseline to a robust production system: + +**Approach A: Composite Description (Baseline & Simplicity)** +The simplest method is to create a single composite text block for each project combining its name, location, building type, and product scope (e.g., `"Sjukhus Linköping ET2, hospital, Linköping, 1200m IQB..."`). We embed this entire paragraph. This captures the overall semantic context ("vibe") perfectly for basic similarity searches. + +**Approach B: Metadata Extraction & Filtering (Robust Precision)** +Relying entirely on a single embedding can sometimes be risky (e.g., the model might heavily weight "tight timeline" and return a project from the wrong city). A more precise, production-grade approach is to use an LLM to extract structured metadata from the free-text query (e.g., `location: "Linköping"`, `material: "IQB beams"`). We then use those extracted properties to perform exact comparisons and use them as **hard graph filters**, relying on the vector embedding purely for the fuzzy semantic matching of the remaining context. + +*For the L5/L6 scope, Approach A is the standard expected baseline, but Approach B represents a more advanced architecture.* + +### 2. The Hybrid Query +This query performs a two-stage pipeline: it uses Neo4j's vector index to find semantically similar projects, and then traverses the graph to filter out projects that were executed poorly. 
+ +```cypher +// Stage 1: Vector Search for top 5 semantic matches +CALL db.index.vector.queryNodes('project_embeddings', 5, $queryEmbedding) +YIELD node AS candidate, score + +// Stage 2: Graph Traversal for operational quality +MATCH (candidate)-[r:SCHEDULED_AT]->(s:Station) +WHERE s.station_code IN ["011", "012", "013", "014", "016", "017"] // IQB pipeline stations +WITH candidate, score, max(r.variance_pct) AS worstVariance, collect(DISTINCT s.station_name) AS stationsUsed +WHERE worstVariance < 5 // Project-level check: every pipeline station stayed within 5% of plan +RETURN candidate.project_name AS ReferenceProject, + score AS SimilarityScore, + stationsUsed AS StationsUsed +ORDER BY SimilarityScore DESC +``` + +### 3. Why this is better than just filtering by product type +If we only filtered the database by `product_type = 'IQB'`, we would return almost every project in the factory's history (P01–P06, P08). That is too broad to serve as a planning reference. + +The hybrid approach provides two complementary layers: +1. **The Vector Layer** captures human context. A "hospital extension in Linköping" is semantically similar to past project P05 ("Sjukhus Linköping ET2") due to building type and location, whereas a standard filter would treat it exactly the same as a parking garage in Helsingborg (P04). +2. **The Graph Layer** checks operational reliability. By traversing the `SCHEDULED_AT` edges and requiring the worst `variance_pct` (our property from Q3) across the pipeline stations to stay under 5%, we keep only semantically matched projects that were actually executed well on the factory floor, making them a trustworthy baseline for scheduling the new request. + +--- + +## Q5. Your L6 Plan (20 pts) + +### 1.
Node Labels → CSV Column Mappings + +| Node Label | Source | CSV Columns | Key | Count | +|------------|--------|-------------|-----|-------| +| **Project** | production.csv | `project_id`, `project_number`, `project_name` | `project_id` | 8 | +| **Product** | production.csv | `product_type`, `unit` | `product_type` | 7 | +| **Station** | production.csv | `station_code`, `station_name` | `station_code` | 10 | +| **Worker** | workers.csv | `worker_id`, `name`, `role`, `hours_per_week`, `type` | `worker_id` | 14 | +| **Week** | capacity.csv | `week` | `week` | 8 | +| **Factory** | Implicit | — (single node) | — | 1 | +| **Certification** | workers.csv | `certifications` (comma-split) | `cert_name` | 23 | +| **Etapp** | production.csv | `etapp` | `etapp_name` | 2 | + +### 2. Relationship Types → What Creates Them + +| Relationship | Created By | Properties | +|---|---|---| +| **PRODUCES** | `MERGE` on unique `(project_id, product_type)` pairs from production.csv | `quantity`, `unit_factor`, `unit` | +| **SCHEDULED_AT** | Each row of production.csv → `MERGE` with composite key `{week, etapp, bop, product_type}` | `planned_hours`, `actual_hours`, `completed_units`, `week`, `etapp`, `bop`, `product_type`, `variance_pct` | +| **ACTIVE_IN** | Distinct `(project_id, week)` pairs from production.csv | — | +| **IN_PHASE** | Distinct `(project_id, etapp)` pairs from production.csv | — | +| **WORKS_AT** | workers.csv → `primary_station` column | — | +| **CAN_COVER** | workers.csv → `can_cover_stations` (comma-split, one edge per station) | — | +| **HOLDS** | workers.csv → `certifications` (comma-split, one edge per cert) | — | +| **LOADED_IN** | Aggregated from SCHEDULED_AT per (station, week) during seeding | `total_planned`, `total_actual` | +| **HAS_CAPACITY** | Each row of capacity.csv | `own_hours`, `hired_hours`, `overtime_hours`, `total_planned`, `deficit` | + +#### Seed Script Constraints & Idiosyncrasies +- **Uniqueness Constraints:** To ensure idempotency during the 
`MERGE` process, the script must create constraints beforehand: + `CREATE CONSTRAINT IF NOT EXISTS FOR (p:Project) REQUIRE p.project_id IS UNIQUE` (and similarly for Station, Worker, Week, Product). +- **Foreman Assignment:** Worker W11 (Victor Elm) is listed as a Foreman with `primary_station = "all"`. The seed script must handle `"all"` correctly (either by skipping the `WORKS_AT` edge and relying solely on his `can_cover_stations` list, or by explicitly creating edges to all 10 stations) to avoid creating a junk station node named "all". + +### 3. Streamlit Dashboard Panels (4 + Self-Test) + +#### Page 1: Project Overview (10 pts) +A summary table showing all 8 projects with total planned hours, total actual hours, variance %, and products involved. + +```cypher +MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) +WITH p, sum(r.planned_hours) AS planned, sum(r.actual_hours) AS actual +MATCH (p)-[:PRODUCES]->(prod:Product) +RETURN p.project_name AS Project, planned AS PlannedHours, actual AS ActualHours, + round((actual - planned) / planned * 100, 1) AS VariancePct, + collect(DISTINCT prod.product_type) AS Products +ORDER BY p.project_id +``` + +#### Page 2: Station Load (10 pts) +Interactive Plotly bar chart showing hours per station across weeks. Stations where actual > planned are highlighted in red. + +```cypher +MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) +RETURN s.station_code AS StationCode, s.station_name AS Station, r.week AS Week, + sum(r.planned_hours) AS Planned, sum(r.actual_hours) AS Actual +ORDER BY StationCode, Week +``` + +#### Page 3: Capacity Tracker (10 pts) +Weekly capacity (own + hired + overtime) vs total planned demand. Deficit weeks are color-coded red. 
+ +```cypher +MATCH (w:Week)-[r:HAS_CAPACITY]->(f:Factory) +RETURN w.week_id AS Week, + r.own_hours + r.hired_hours + r.overtime_hours AS TotalCapacity, + r.total_planned AS PlannedDemand, + r.deficit AS Deficit +ORDER BY w.week_id +``` + +#### Page 4: Worker Coverage (10 pts) +Matrix showing which workers can cover which stations. Single-point-of-failure stations (only 1 worker) are flagged. + +```cypher +MATCH (w:Worker)-[:CAN_COVER]->(s:Station) +RETURN s.station_name AS Station, collect(w.name) AS Workers, count(w) AS WorkerCount +ORDER BY WorkerCount ASC +``` + +#### Navigation (5 pts) +A sidebar will be implemented to allow users to switch seamlessly between the 4 dashboard pages and the Self-Test page without reloading the app. + +#### Page 5: Self-Test (20 pts) +Automated green/red checklist verifying: Neo4j connection, node count ≥ 50, relationship count ≥ 100, 6+ node labels, 8+ relationship types, and variance query returns results. + diff --git a/submissions/Touqeer-Hamdani/level5/schema.md b/submissions/Touqeer-Hamdani/level5/schema.md new file mode 100644 index 000000000..5a4e1dde7 --- /dev/null +++ b/submissions/Touqeer-Hamdani/level5/schema.md @@ -0,0 +1,79 @@ +# Factory Knowledge Graph — Schema + +```mermaid +graph LR + %% ── Node definitions ── + Project["🏗️ Project
project_id · project_number
project_name"] + Product["📦 Product
product_type · unit"] + Station["🏭 Station
station_code · station_name"] + Worker["👷 Worker
worker_id · name
role · hours_per_week · type"] + Week["📅 Week
week_id"] + Factory["🏭 Factory
factory_name"] + Certification["🎓 Certification
cert_name"] + Etapp["🔄 Etapp
etapp_name"] + + %% ── Relationships ── + Project -->|"PRODUCES
{quantity, unit_factor, unit}"| Product + Project -->|"SCHEDULED_AT
{planned_hours, actual_hours, week,
completed_units, etapp, bop, variance_pct}"| Station + Project -->|"ACTIVE_IN"| Week + Project -->|"IN_PHASE"| Etapp + + Worker -->|"WORKS_AT"| Station + Worker -->|"CAN_COVER"| Station + Worker -->|"HOLDS"| Certification + + Station -->|"LOADED_IN
{total_planned,
total_actual}"| Week + Week -->|"HAS_CAPACITY
{own_hours, hired_hours, overtime_hours, total_planned, deficit}"| Factory + + %% ── Styling ── + classDef proj fill:#4F46E5,stroke:#3730A3,color:#fff,rx:12 + classDef prod fill:#059669,stroke:#047857,color:#fff,rx:12 + classDef stat fill:#D97706,stroke:#B45309,color:#fff,rx:12 + classDef work fill:#DC2626,stroke:#B91C1C,color:#fff,rx:12 + classDef week fill:#7C3AED,stroke:#6D28D9,color:#fff,rx:12 + classDef meta fill:#6B7280,stroke:#4B5563,color:#fff,rx:12 + classDef cert fill:#0891B2,stroke:#0E7490,color:#fff,rx:12 + classDef etap fill:#E11D48,stroke:#BE123C,color:#fff,rx:12 + + class Project proj + class Product prod + class Station stat + class Worker work + class Week week + class Factory meta + class Certification cert + class Etapp etap +``` + +## Node Labels (8) + +| # | Label | Source CSV | Key Properties | Count | +|---|-------|-----------|----------------|-------| +| 1 | **Project** | production.csv | project_id, project_number, project_name | 8 | +| 2 | **Product** | production.csv | product_type, unit | 7 (IQB, IQP, SB, SD, SP, SR, HSQ) | +| 3 | **Station** | production.csv | station_code, station_name | 10 (011–019, 021) | +| 4 | **Worker** | workers.csv | worker_id, name, role, hours_per_week, type | 14 | +| 5 | **Week** | capacity.csv | week_id | 8 (w1–w8) | +| 6 | **Factory** | Implicit | factory_name | 1 | +| 7 | **Certification** | workers.csv | cert_name | 23 unique certs | +| 8 | **Etapp** | production.csv | etapp_name | 2 (ET1, ET2) | + +## Relationship Types (9) + +| # | Relationship | From → To | Properties (data-carrying?) 
| +|---|-------------|-----------|----------------------------| +| 1 | **PRODUCES** | Project → Product | ✅ `{quantity, unit_factor, unit}` | +| 2 | **SCHEDULED_AT** | Project → Station | ✅ `{planned_hours, actual_hours, completed_units, week, etapp, bop, variance_pct}` | +| 3 | **ACTIVE_IN** | Project → Week | — | +| 4 | **IN_PHASE** | Project → Etapp | — | +| 5 | **WORKS_AT** | Worker → Station | — (primary station) | +| 6 | **CAN_COVER** | Worker → Station | — (coverage capability) | +| 7 | **HOLDS** | Worker → Certification | — | +| 8 | **LOADED_IN** | Station → Week | ✅ `{total_planned, total_actual}`* | +| 9 | **HAS_CAPACITY**| Week → Factory | ✅ `{own_hours, hired_hours, overtime_hours, total_planned, deficit}` | + +> 4 relationships carry data properties (**PRODUCES**, **SCHEDULED_AT**, **LOADED_IN**, **HAS_CAPACITY**), exceeding the minimum of 2. +> +> *\*Note: `LOADED_IN` properties are calculated by aggregating the `SCHEDULED_AT` edges for each station/week.* +> +> *\*Note: `etapp` is also kept as a property on `SCHEDULED_AT` for direct querying. 
The `Etapp` node is included for L6 compliance, but from a pure design perspective, etapp works better as an edge property since it only has 2 values and carries no properties of its own.* From d1386a6433f3b3f8ef3a7be02caa94a8c8f7f9a8 Mon Sep 17 00:00:00 2001 From: TouqeerHamdani Date: Tue, 12 May 2026 21:41:01 +0530 Subject: [PATCH 2/2] level-6: Touqeer Hamdani --- .../Touqeer-Hamdani/level6/.env.example | 3 + .../Touqeer-Hamdani/level6/DASHBOARD_URL.txt | 1 + submissions/Touqeer-Hamdani/level6/README.md | 69 ++ submissions/Touqeer-Hamdani/level6/app.py | 824 ++++++++++++++++++ .../Touqeer-Hamdani/level6/requirements.txt | 6 + .../Touqeer-Hamdani/level6/seed_graph.py | 297 +++++++ 6 files changed, 1200 insertions(+) create mode 100644 submissions/Touqeer-Hamdani/level6/.env.example create mode 100644 submissions/Touqeer-Hamdani/level6/DASHBOARD_URL.txt create mode 100644 submissions/Touqeer-Hamdani/level6/README.md create mode 100644 submissions/Touqeer-Hamdani/level6/app.py create mode 100644 submissions/Touqeer-Hamdani/level6/requirements.txt create mode 100644 submissions/Touqeer-Hamdani/level6/seed_graph.py diff --git a/submissions/Touqeer-Hamdani/level6/.env.example b/submissions/Touqeer-Hamdani/level6/.env.example new file mode 100644 index 000000000..bdd17bb95 --- /dev/null +++ b/submissions/Touqeer-Hamdani/level6/.env.example @@ -0,0 +1,3 @@ +NEO4J_URI = "neo4j+s://xxxxx.databases.neo4j.io" +NEO4J_USER = "neo4j" +NEO4J_PASSWORD = "your-password" \ No newline at end of file diff --git a/submissions/Touqeer-Hamdani/level6/DASHBOARD_URL.txt b/submissions/Touqeer-Hamdani/level6/DASHBOARD_URL.txt new file mode 100644 index 000000000..6d1f8d412 --- /dev/null +++ b/submissions/Touqeer-Hamdani/level6/DASHBOARD_URL.txt @@ -0,0 +1 @@ +https://l6-factory-dashboard-touqeerhamdani.streamlit.app diff --git a/submissions/Touqeer-Hamdani/level6/README.md b/submissions/Touqeer-Hamdani/level6/README.md new file mode 100644 index 000000000..13313e1b5 --- /dev/null +++ 
b/submissions/Touqeer-Hamdani/level6/README.md @@ -0,0 +1,69 @@ +# Factory Knowledge Graph Dashboard — Level 6 + +A **Neo4j knowledge graph** + **Streamlit dashboard** for a Swedish steel fabrication company managing 8 construction projects across 10 production stations. + +## Quick Start + +### 1. Prerequisites +- Python 3.10+ +- A Neo4j instance (recommended: [Neo4j Aura Free](https://neo4j.io/aura)) + +### 2. Setup +```bash +python -m venv venv +venv\Scripts\activate # Windows +# source venv/bin/activate # macOS/Linux +pip install -r requirements.txt +``` + +### 3. Configure credentials +Copy `.env.example` → `.env` and fill in your Neo4j credentials: +``` +NEO4J_URI=neo4j+s://xxxxx.databases.neo4j.io +NEO4J_USER=neo4j +NEO4J_PASSWORD=your-password +``` + +### 4. Seed the graph (run once) +```bash +python seed_graph.py +``` + +### 5. Launch the dashboard +```bash +streamlit run app.py +``` + +## Dashboard Pages + +| Page | Description | +|------|-------------| +| **Project Overview** | All 8 projects with planned/actual hours, variance %, and products | +| **Station Load** | Interactive bar chart — hours per station per week, overloads in red | +| **Capacity Tracker** | Stacked capacity bars + demand line, deficit weeks highlighted | +| **Worker Coverage** | Coverage matrix + SPOF (single-point-of-failure) station detection | +| **Self-Test** | Automated 6-check verification (20 pts) | + +## Project Structure + +``` +l6-factory-dashboard/ +├── seed_graph.py # CSV → Neo4j (idempotent, uses MERGE) +├── app.py # Streamlit dashboard (5 pages) +├── requirements.txt +├── .env.example +├── README.md +├── DASHBOARD_URL.txt +└── data/ + ├── factory_production.csv + ├── factory_workers.csv + └── factory_capacity.csv +``` + +## Deployed URL + +See `DASHBOARD_URL.txt`. + +## Author + +**Touqeer Hamdani** — Level 6 submission, May 2026. 
diff --git a/submissions/Touqeer-Hamdani/level6/app.py b/submissions/Touqeer-Hamdani/level6/app.py new file mode 100644 index 000000000..bda45bf8c --- /dev/null +++ b/submissions/Touqeer-Hamdani/level6/app.py @@ -0,0 +1,824 @@ +""" +app.py — Factory Knowledge Graph Dashboard +Streamlit application with 6 pages powered by Neo4j. +""" + +import streamlit as st +from neo4j import GraphDatabase +import pandas as pd +import plotly.express as px +import plotly.graph_objects as go +from plotly.subplots import make_subplots +import os +from dotenv import load_dotenv +import statsmodels.api as sm + +# ── Page config ────────────────────────────────────────────────────────────── + +st.set_page_config( + page_title="Factory Dashboard", + page_icon=None, + layout="wide", + initial_sidebar_state="expanded", +) + +# ── Custom CSS ─────────────────────────────────────────────────────────────── + +st.markdown(""" + +""", unsafe_allow_html=True) + + +# ── Neo4j connection ───────────────────────────────────────────────────────── + +@st.cache_resource +def get_driver(): + """Connect to Neo4j — supports both st.secrets (Cloud) and .env (local).""" + try: + uri = st.secrets["NEO4J_URI"] + user = st.secrets["NEO4J_USER"] + password = st.secrets["NEO4J_PASSWORD"] + except Exception: + load_dotenv() + uri = os.getenv("NEO4J_URI") + user = os.getenv("NEO4J_USER") + password = os.getenv("NEO4J_PASSWORD") + return GraphDatabase.driver(uri, auth=(user, password)) + + +def query_to_df(cypher: str) -> pd.DataFrame: + """Run a Cypher query and return the results as a DataFrame.""" + driver = get_driver() + with driver.session() as session: + result = session.run(cypher) + return pd.DataFrame([dict(r) for r in result]) + + +# ── Sidebar navigation ────────────────────────────────────────────────────── + +st.sidebar.markdown("## Factory Dashboard") +st.sidebar.markdown("---") +page = st.sidebar.radio("Navigate", [ + "Project Overview", + "Station Load", + "Capacity Tracker", + "Worker Coverage", 
+ "Load Forecast", + "Self-Test", +]) +st.sidebar.markdown("---") +st.sidebar.caption("Level 6 · Touqeer Hamdani") + + +# ── Helper: render a KPI card ──────────────────────────────────────────────── + +def kpi(label, value, sub="", color="blue"): + st.markdown(f""" +
+

{label}

+
{value}
+
{sub}
+
+ """, unsafe_allow_html=True) + + +# ══════════════════════════════════════════════════════════════════════════════ +# PAGE 1 — Project Overview +# ══════════════════════════════════════════════════════════════════════════════ + +def page_project_overview(): + st.header("Project Overview") + st.caption("All 8 factory projects with planned vs actual hours and variance analysis.") + + df = query_to_df(""" + MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) + WITH p, + sum(r.planned_hours) AS planned, + sum(r.actual_hours) AS actual + OPTIONAL MATCH (p)-[:PRODUCES]->(prod:Product) + RETURN p.project_id AS ID, + p.project_name AS Project, + planned AS PlannedHours, + actual AS ActualHours, + CASE + WHEN planned = 0 THEN 0.0 + ELSE round((actual - planned) / planned * 100, 1) + END AS VariancePct, + collect(DISTINCT prod.product_type) AS Products + ORDER BY p.project_id + """) + + if df.empty: + st.warning("No data found. Has `seed_graph.py` been run?") + return + + # KPI cards + total_planned = df["PlannedHours"].sum() + total_actual = df["ActualHours"].sum() + avg_var = round((total_actual - total_planned) / total_planned * 100, 1) + overrun_count = len(df[df["VariancePct"] > 0]) + + c1, c2, c3, c4 = st.columns(4) + with c1: kpi("Projects", len(df), "active in schedule", "blue") + with c2: kpi("Total Planned Hours", f"{total_planned:,.0f} h", "across all stations", "green") + with c3: kpi("Total Actual Hours", f"{total_actual:,.0f} h", f"{'+' if total_actual > total_planned else '-'} vs plan", "amber") + with c4: kpi("Average Plan Variance", f"{avg_var:+.1f}%", f"{overrun_count} projects over plan", "red" if avg_var > 0 else "green") + + st.markdown("") + + # Format products column for display + display_df = df.copy() + display_df["Products"] = display_df["Products"].apply(lambda x: ", ".join(x) if isinstance(x, list) else x) + display_df["VariancePct"] = display_df["VariancePct"].apply(lambda v: f"{v:+.1f}%") + + st.dataframe( + display_df, + use_container_width=True, 
+ hide_index=True, + column_config={ + "ID": st.column_config.TextColumn("ID", width="small"), + "Project": st.column_config.TextColumn("Project"), + "PlannedHours": st.column_config.NumberColumn("Planned (h)", format="%.1f"), + "ActualHours": st.column_config.NumberColumn("Actual (h)", format="%.1f"), + "VariancePct": st.column_config.TextColumn("Variance"), + "Products": st.column_config.TextColumn("Products"), + }, + ) + + # Bar chart: planned vs actual per project + fig = go.Figure() + fig.add_trace(go.Bar(name="Planned", x=df["Project"], y=df["PlannedHours"], + marker_color="#3b82f6")) + fig.add_trace(go.Bar(name="Actual", x=df["Project"], y=df["ActualHours"], + marker_color=["#22c55e" if a <= p else "#ef4444" + for a, p in zip(df["ActualHours"], df["PlannedHours"])])) + fig.update_layout( + barmode="group", template="plotly_dark", + title="Planned vs Actual Hours by Project", + xaxis_title="Project", yaxis_title="Hours", + height=420, margin=dict(t=50, b=40), + legend_title="Metric", showlegend=True + ) + st.plotly_chart(fig, use_container_width=True) + + +# ══════════════════════════════════════════════════════════════════════════════ +# PAGE 2 — Station Load +# ══════════════════════════════════════════════════════════════════════════════ + +def page_station_load(): + st.header("Station Load") + st.caption("Hours per station across weeks. 
Red = actual exceeds planned.") + + df = query_to_df(""" + MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) + RETURN s.station_code AS StationCode, + s.station_name AS Station, + r.week AS Week, + sum(r.planned_hours) AS Planned, + sum(r.actual_hours) AS Actual + ORDER BY s.station_code, r.week + """) + + if df.empty: + st.warning("No data found.") + return + + df["Overloaded"] = df["Actual"] > df["Planned"] + df["Label"] = df["StationCode"] + " " + df["Station"] + + # Week filter + weeks = sorted(df["Week"].unique()) + selected_weeks = st.multiselect("Filter by week", weeks, default=weeks) + filtered = df[df["Week"].isin(selected_weeks)] + + # Grouped bar chart + filtered = filtered.copy() + filtered["Label_Week"] = filtered["Label"] + " - " + filtered["Week"].astype(str) + + tick_text = [ + f'{row["Label_Week"]}' + for _, row in filtered.iterrows() + ] + + fig = go.Figure() + fig.add_trace(go.Bar( + name="Planned", x=filtered["Label_Week"], + y=filtered["Planned"], marker_color="#3b82f6", + )) + fig.add_trace(go.Bar( + name="Actual", x=filtered["Label_Week"], + y=filtered["Actual"], + marker_color=["#ef4444" if o else "#22c55e" for o in filtered["Overloaded"]], + )) + fig.update_layout( + barmode="group", template="plotly_dark", + title="Station Load: Planned vs Actual", + yaxis_title="Hours", + height=500, margin=dict(t=50, b=80), + legend_title="Metric", showlegend=True, + xaxis=dict( + title="Station - Week", + tickmode="array", + tickvals=filtered["Label_Week"], + ticktext=tick_text + ) + ) + st.plotly_chart(fig, use_container_width=True) + + # Summary table + with st.expander("Detailed data"): + st.dataframe(filtered[["StationCode", "Station", "Week", "Planned", "Actual", "Overloaded"]], + use_container_width=True, hide_index=True) + + +# ══════════════════════════════════════════════════════════════════════════════ +# PAGE 3 — Capacity Tracker +# ══════════════════════════════════════════════════════════════════════════════ + +def page_capacity_tracker(): + 
st.header("Capacity Tracker") + st.caption("Weekly capacity (own + hired + overtime) vs planned demand. Deficit weeks in red.") + + df = query_to_df(""" + MATCH (w:Week)-[r:HAS_CAPACITY]->(f:Factory) + RETURN w.week_id AS Week, + r.own_hours AS Own, + r.hired_hours AS Hired, + r.overtime_hours AS Overtime, + r.own_hours + r.hired_hours + r.overtime_hours AS TotalCapacity, + r.total_planned AS PlannedDemand, + r.deficit AS Deficit + ORDER BY w.week_id + """) + + if df.empty: + st.warning("No data found.") + return + + deficit_weeks = len(df[df["Deficit"] < 0]) + total_deficit = df[df["Deficit"] < 0]["Deficit"].sum() + + c1, c2, c3 = st.columns(3) + with c1: kpi("Deficit Weeks", f"{deficit_weeks} / {len(df)}", "weeks over capacity", "red") + with c2: kpi("Cumulative Capacity Deficit", f"{total_deficit:+,.0f} h", "cumulative shortfall", "red") + with c3: kpi("Maximum Weekly Deficit", f"{df['Deficit'].min():+,.0f} h", f"in {df.loc[df['Deficit'].idxmin(), 'Week']}", "amber") + + st.markdown("") + + # Color x-axis labels red for deficit weeks (used on the bottom chart) + tick_text = [ + f'{row["Week"]}' + for _, row in df.iterrows() + ] + + fig = make_subplots( + rows=2, cols=1, + shared_xaxes=True, + vertical_spacing=0.1, + row_heights=[0.75, 0.25] + ) + + # Top Chart: Stacked bar (capacity components) + line (demand) + fig.add_trace(go.Bar(name="Own Staff", x=df["Week"], y=df["Own"], marker_color="#3b82f6"), row=1, col=1) + fig.add_trace(go.Bar(name="Hired", x=df["Week"], y=df["Hired"], marker_color="#8b5cf6"), row=1, col=1) + fig.add_trace(go.Bar(name="Overtime", x=df["Week"], y=df["Overtime"], marker_color="#f59e0b"), row=1, col=1) + + fig.add_trace(go.Scatter( + name="Planned Demand", x=df["Week"], y=df["PlannedDemand"], + mode="lines+markers", line=dict(color="#ef4444", width=3, dash="dot"), + marker=dict(size=8), + ), row=1, col=1) + + # Bottom Chart: Surplus/Deficit Bar + fig.add_trace(go.Bar( + name="Surplus / Deficit", x=df["Week"], y=df["Deficit"], + 
marker_color=["#ef4444" if d < 0 else "#22c55e" for d in df["Deficit"]], + text=[f"{d:+.0f}" for d in df["Deficit"]], + textposition="outside", + showlegend=False, + hovertemplate="Variance: %{y:+.0f}h" + ), row=2, col=1) + + fig.update_layout( + barmode="stack", template="plotly_dark", + title="Weekly Capacity vs Demand", + height=600, margin=dict(t=50, b=40), + legend=dict(title="Capacity Type", orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1), + ) + fig.update_yaxes(title_text="Hours", row=1, col=1) + fig.update_yaxes(title_text="Variance", row=2, col=1) + fig.update_xaxes( + title="Week", tickmode="array", + tickvals=df["Week"], ticktext=tick_text, + row=2, col=1 + ) + st.plotly_chart(fig, use_container_width=True) + + # Data table + with st.expander("Detailed data"): + st.dataframe(df, use_container_width=True, hide_index=True) + + +# ══════════════════════════════════════════════════════════════════════════════ +# PAGE 4 — Worker Coverage +# ══════════════════════════════════════════════════════════════════════════════ + +def page_worker_coverage(): + st.header("Worker Coverage Matrix") + st.caption("Which workers can cover which stations. 
SPOF stations (≤ 1 unique worker) flagged in red.") + + # Coverage data (start from every Station so stations with no coverage are listed too) + df = query_to_df(""" + MATCH (s:Station) + OPTIONAL MATCH (w:Worker)-[:CAN_COVER]->(s) + RETURN s.station_code AS StationCode, + s.station_name AS Station, + collect(w.name) AS Workers, + count(DISTINCT w) AS WorkerCount + ORDER BY WorkerCount ASC + """) + + if df.empty: + st.warning("No data found.") + return + + spof = df[df["WorkerCount"] <= 1] + c1, c2 = st.columns(2) + with c1: kpi("Total Stations", len(df), "in the graph", "blue") + with c2: kpi("SPOF Stations", len(spof), + ", ".join(spof["Station"].tolist()) if len(spof) > 0 else "None", + "red" if len(spof) > 0 else "green") + + st.markdown("") + + # ------------------------------------------------------------------------- + # Heatmap Matrix + # ------------------------------------------------------------------------- + st.markdown('
Worker Certification Matrix
', unsafe_allow_html=True) + st.caption("A visual overview of certifications. Blue indicates a worker is certified for that station.") + + matrix_df = query_to_df(""" + MATCH (w:Worker)-[:CAN_COVER]->(s:Station) + RETURN w.name AS Worker, s.station_code AS StationCode, s.station_name AS Station + """) + + if not matrix_df.empty: + pivot = matrix_df.pivot_table( + index=["StationCode", "Station"], + columns="Worker", + aggfunc="size", + fill_value=0, + ) + pivot = pivot.clip(upper=1) + pivot = pivot.sort_index() + pivot = pivot[sorted(pivot.columns)] + + y_labels = [f"{idx[0]} - {idx[1]}" for idx in pivot.index] + + # Hover text matrix + hover_text = [] + for s, row in zip(y_labels, pivot.values): + hover_row = [] + for w, val in zip(pivot.columns, row): + status = "Certified" if val == 1 else "Uncertified" + hover_row.append(f"Worker: {w}<br>Station: {s}<br>Status: {status}") + hover_text.append(hover_row) + + fig = go.Figure(data=go.Heatmap( + z=pivot.values, + x=pivot.columns, + y=y_labels, + colorscale=[[0, "rgba(255, 255, 255, 0.05)"], [1, "#3b82f6"]], + showscale=False, + xgap=2, ygap=2, + hoverinfo="text", + text=hover_text + )) + + fig.update_layout( + template="plotly_dark", + height=400, + margin=dict(t=10, b=80, l=180, r=20), + xaxis=dict(tickangle=-45, side="bottom"), + yaxis=dict(autorange="reversed") + ) + st.plotly_chart(fig, use_container_width=True) + + # Detailed Coverage Table (satisfies the "table" requirement cleanly without horizontal scroll issues) + st.markdown('
Station Coverage Details
', unsafe_allow_html=True) + display_df = df.copy() + display_df["Workers"] = display_df["Workers"].apply( + lambda x: ", ".join(x) if isinstance(x, list) else x + ) + + def highlight_spof(row): + if row["WorkerCount"] <= 1: + return ["background-color: rgba(239,68,68,0.15)"] * len(row) + return [""] * len(row) + + st.dataframe( + display_df.style.apply(highlight_spof, axis=1), + use_container_width=True, hide_index=True, + ) + + +# ══════════════════════════════════════════════════════════════════════════════ +# PAGE 5 — Load Forecast +# ══════════════════════════════════════════════════════════════════════════════ + +def page_load_forecast(): + st.header("Load Forecast (Week 9)") + st.caption("Predictive analysis identifying where production load will exceed station capacity in the coming week.") + + # 1. Get Historical Load + Variance + load_df = query_to_df(""" + MATCH (s:Station)-[l:LOADED_IN]->(w:Week) + RETURN s.station_code AS StationCode, + s.station_name AS Station, + toInteger(substring(w.week_id, 1)) AS WeekNum, + l.total_actual AS ActualLoad, + l.total_planned AS PlannedLoad, + CASE WHEN l.total_planned > 0 + THEN round((l.total_actual - l.total_planned) / l.total_planned * 100, 1) + ELSE 0.0 END AS VariancePct + ORDER BY StationCode, WeekNum + """) + + # 2. Get Graph-Aware Capacity + cap_df = query_to_df(""" + MATCH (w:Worker)-[:CAN_COVER]->(s:Station) + WITH w, s + MATCH (w)-[:CAN_COVER]->(all_s:Station) + WITH w, s, count(all_s) AS total_coverage + RETURN s.station_code AS StationCode, + s.station_name AS Station, + sum(toFloat(w.hours_per_week) / total_coverage) AS Capacity + ORDER BY StationCode + """) + + if load_df.empty or cap_df.empty: + st.warning("No data found. Ensure the graph is seeded.") + return + + load_df["Load"] = load_df["ActualLoad"].fillna(load_df["PlannedLoad"]) + + # 3. 
Process Forecasts + forecasts = [] + + # Iterate over all known stations from both load and capacity queries + all_stations = sorted(list(set(load_df["StationCode"]).union(set(cap_df["StationCode"])))) + + for station_code in all_stations: + station_data = load_df[load_df["StationCode"] == station_code] + cap_series = cap_df[cap_df["StationCode"] == station_code] + + station_name = station_data["Station"].iloc[0] if not station_data.empty else cap_series["Station"].iloc[0] + cap = cap_series["Capacity"].iloc[0] if not cap_series.empty else 0.0 + + if len(station_data) > 1: + X = sm.add_constant(station_data["WeekNum"]) + model = sm.OLS(station_data["Load"], X).fit() + pred_9 = model.predict([1, 9])[0] + elif len(station_data) == 1: + pred_9 = station_data["Load"].iloc[0] + else: + pred_9 = 0 + + pred_9 = max(0, pred_9) + + util_pct = (pred_9 / cap * 100) if cap > 0 else (float('inf') if pred_9 > 0 else 0) + + forecasts.append({ + "StationCode": station_code, + "Station": station_name, + "Week9_Forecast": pred_9, + "Capacity": cap, + "UtilPct": util_pct + }) + + forecast_df = pd.DataFrame(forecasts) + forecast_df["Status"] = forecast_df.apply(lambda x: "OVERLOAD" if x["Week9_Forecast"] > x["Capacity"] else "SAFE", axis=1) + + # KPIs + overloaded_count = len(forecast_df[forecast_df["Status"] == "OVERLOAD"]) + avg_util = forecast_df["UtilPct"].mean() + + c1, c2, c3 = st.columns(3) + with c1: kpi("Average Factory Utilization", f"{avg_util:.1f}%", "projected for week 9", "blue") + with c2: kpi("Critical Stations", overloaded_count, "over capacity in week 9", "red" if overloaded_count > 0 else "green") + with c3: kpi("Highest Load", f"{forecast_df['UtilPct'].max():.0f}%", f"at {forecast_df.loc[forecast_df['UtilPct'].idxmax(), 'StationCode']}", "amber") + + st.markdown('
Global Forecast: Load vs. Capacity (Week 9)
', unsafe_allow_html=True) + + # Global comparison chart + fig_global = go.Figure() + fig_global.add_trace(go.Bar( + name="Projected Load", x=forecast_df["StationCode"], y=forecast_df["Week9_Forecast"], + marker_color=["#ef4444" if s == "OVERLOAD" else "#3b82f6" for s in forecast_df["Status"]], + hovertemplate="Load: %{y:.1f}h" + )) + fig_global.add_trace(go.Scatter( + name="Station Capacity", x=forecast_df["StationCode"], y=forecast_df["Capacity"], + mode="markers", marker=dict(color="#ffffff", size=12, symbol="line-ew-open", line=dict(width=3)), + hovertemplate="Capacity: %{y:.1f}h" + )) + fig_global.update_layout( + template="plotly_dark", height=350, margin=dict(t=20, b=40, l=10, r=10), + xaxis_title="Station Code", yaxis_title="Hours", barmode="group", + legend=dict(title="Metric", orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1) + ) + st.plotly_chart(fig_global, use_container_width=True) + + # 4. Station Deep-Dive + st.markdown('
Station Deep-Dive
', unsafe_allow_html=True) + + selected_st = st.selectbox("Select a station to see trend details", + options=forecast_df["StationCode"].tolist(), + format_func=lambda x: f"{x} - {forecast_df[forecast_df['StationCode']==x]['Station'].iloc[0]}") + + sd = forecast_df[forecast_df["StationCode"] == selected_st].iloc[0] + hist = load_df[load_df["StationCode"] == selected_st] + + c1, c2 = st.columns([2, 1]) + + with c1: + fig_detail = go.Figure() + + # OLS logic + if len(hist) > 1: + X = sm.add_constant(hist["WeekNum"]) + model = sm.OLS(hist["Load"], X).fit() + x_range = list(range(1, 10)) + y_range = model.predict(sm.add_constant(x_range)) + preds = model.get_prediction(sm.add_constant(x_range)) + ci = preds.conf_int(alpha=0.1) # 90% confidence + y_lower, y_upper = ci[:, 0], ci[:, 1] + + # Confidence Band + fig_detail.add_trace(go.Scatter( + x=x_range + x_range[::-1], y=list(y_upper) + list(y_lower)[::-1], + fill='toself', fillcolor='rgba(59, 130, 246, 0.1)', + line=dict(color='rgba(255,255,255,0)'), name="90% Confidence Interval" + )) + # Trend + fig_detail.add_trace(go.Scatter(x=x_range, y=y_range, mode="lines", + line=dict(color="#3b82f6", dash="dash"), name="Trendline")) + + # Capacity line + fig_detail.add_hline(y=sd["Capacity"], line_dash="dot", line_color="#ef4444", + annotation_text="CAPACITY LIMIT", annotation_position="top left") + + # Historical + fig_detail.add_trace(go.Scatter(x=hist["WeekNum"], y=hist["Load"], mode="lines+markers", + marker=dict(color="#ffffff", size=10), name="Historical Load")) + + # Week 9 Target Point + fig_detail.add_trace(go.Scatter(x=[9], y=[sd["Week9_Forecast"]], mode="markers", + marker=dict(color="#ef4444" if sd["Status"]=="OVERLOAD" else "#22c55e", size=14, symbol="star"), + name="W9 Projection")) + + fig_detail.update_layout( + template="plotly_dark", height=400, title=f"Load Trend Analysis: {selected_st}", + xaxis=dict(title="Week", tickmode="linear", range=[0.5, 9.5]), + yaxis=dict(title="Hours"), showlegend=True, + 
legend=dict(title="Metric", orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1) + ) + st.plotly_chart(fig_detail, use_container_width=True) + st.caption("💡 **Note**: The shaded area represents the 90% confidence interval of the OLS prediction, indicating the expected range of variance based on historical data.") + + # Variance Trend Chart + if not hist.empty and len(hist) > 1: + fig_var = px.line(hist, x="WeekNum", y="VariancePct", markers=True, + title=f"Historical Plan Variance (%): {selected_st}", + labels={"VariancePct": "Variance %", "WeekNum": "Week"}, + template="plotly_dark", height=200) + fig_var.add_hline(y=0, line_dash="dash", line_color="#94a3b8") + fig_var.update_traces(line_color="#f59e0b") + fig_var.update_layout(margin=dict(t=30, b=20, l=10, r=10)) + st.plotly_chart(fig_var, use_container_width=True) + else: + st.info("Insufficient data to show variance trend.") + + with c2: + st.markdown(f"### {sd['Status']}") + st.write(f"Station **{selected_st}** is projected to reach **{sd['Week9_Forecast']:.1f} hours** in Week 9.") + st.write(f"Current capacity limit is **{sd['Capacity']:.1f} hours**.") + + avg_v = hist["VariancePct"].mean() if not hist.empty else 0.0 + st.metric("Avg Historical Variance", f"{avg_v:+.1f}%", + help="Positive means actual hours consistently exceed planned hours.") + + util_color = "red" if sd["UtilPct"] > 100 else "green" + util_display = f"{sd['UtilPct']:.1f}%" if sd["UtilPct"] != float('inf') else "∞%" + delta_display = f"{sd['UtilPct']-100:.1f}%" if sd["UtilPct"] > 100 and sd["UtilPct"] != float('inf') else None + + st.metric("Projected Utilization", util_display, + delta=delta_display, + delta_color="inverse") + + if sd["Status"] == "OVERLOAD": + st.error(f"Action Required: Station will exceed capacity by {sd['Week9_Forecast'] - sd['Capacity']:.1f} hours.") + else: + st.success("No immediate capacity action required.") + + # 5. 
Global Action Recommendations + overloads = forecast_df[forecast_df["Status"] == "OVERLOAD"] + healthy = forecast_df[forecast_df["Status"] == "SAFE"].copy() + healthy["Surplus"] = healthy["Capacity"] - healthy["Week9_Forecast"] + healthy = healthy[healthy["Surplus"] >= 5].sort_values("Surplus", ascending=False) + + if not overloads.empty: + st.markdown('
⚠️ Action Recommendations
', unsafe_allow_html=True) + st.error(f"**{len(overloads)} Stations are projected to exceed capacity in Week 9.** Immediate action is required.") + + # Get worker coverage map to make smart, graph-aware recommendations + coverage_df = query_to_df(""" + MATCH (w:Worker)-[:CAN_COVER]->(s:Station) + WHERE w.role <> 'Foreman' + RETURN w.name AS Worker, s.station_code AS StationCode + """) + + for _, row in overloads.iterrows(): + target_station = row["StationCode"] + deficit = row["Week9_Forecast"] - row["Capacity"] + suggestion = f"- **{row['Station']} ({target_station}):** Short by **{deficit:.1f} hours**." + + # 1. Find workers certified for this overloaded station + capable_workers = coverage_df[coverage_df["StationCode"] == target_station]["Worker"].unique() + + # 2. Find which of these workers are currently at surplus stations + reassignment_options = [] + for worker in capable_workers: + # Other stations this worker covers + other_stations = coverage_df[ + (coverage_df["Worker"] == worker) & + (coverage_df["StationCode"] != target_station) + ]["StationCode"].tolist() + + # Check if any of these 'other' stations have a surplus + worker_surplus_stations = healthy[healthy["StationCode"].isin(other_stations)] + if not worker_surplus_stations.empty: + # Sort by highest surplus and take the best one for this worker + best_station = worker_surplus_stations.sort_values("Surplus", ascending=False).iloc[0] + reassignment_options.append(f"**{worker}** (from {best_station['Station']})") + + # 3. Format and display + if reassignment_options: + # Show top 3 candidates to keep the UI clean + suggestion += f" *Suggestion: Reassign {', '.join(reassignment_options[:3])}*" + else: + suggestion += f" *No cross-trained line workers available at stations with surplus.*" + + st.markdown(suggestion) + else: + st.markdown('
✅ System Healthy
', unsafe_allow_html=True) + st.success("All stations are projected to be safely within capacity limits for Week 9.") + + with st.expander("Full Forecast Data Table"): + st.dataframe(forecast_df.sort_values("UtilPct", ascending=False), use_container_width=True, hide_index=True) + + +# ══════════════════════════════════════════════════════════════════════════════ +# PAGE 6 — Self-Test +# ══════════════════════════════════════════════════════════════════════════════ + +def run_self_test(): + """Run the 6 automated checks and return a list of (description, passed, points).""" + driver = get_driver() + checks = [] + + # Check 1: Connection + try: + with driver.session() as s: + s.run("RETURN 1").single() + checks.append(("Neo4j connected", True, 3)) + except Exception: + checks.append(("Neo4j connected", False, 3)) + return checks # can't continue + + with driver.session() as s: + # Check 2: Node count >= 50 + count = s.run("MATCH (n) RETURN count(n) AS c").single()["c"] + checks.append((f"{count} nodes (min: 50)", count >= 50, 3)) + + # Check 3: Relationship count >= 100 + count = s.run("MATCH ()-[r]->() RETURN count(r) AS c").single()["c"] + checks.append((f"{count} relationships (min: 100)", count >= 100, 3)) + + # Check 4: 6+ distinct node labels + count = s.run("CALL db.labels() YIELD label RETURN count(label) AS c").single()["c"] + checks.append((f"{count} node labels (min: 6)", count >= 6, 3)) + + # Check 5: 8+ distinct relationship types + count = s.run( + "CALL db.relationshipTypes() YIELD relationshipType RETURN count(relationshipType) AS c" + ).single()["c"] + checks.append((f"{count} relationship types (min: 8)", count >= 8, 3)) + + # Check 6: Variance query returns results + result = s.run(""" + MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) + WHERE r.actual_hours > r.planned_hours * 1.1 + RETURN p.project_name AS project, s.station_name AS station, + r.planned_hours AS planned, r.actual_hours AS actual + LIMIT 10 + """) + rows = [dict(r) for r in result] 
+ checks.append((f"Variance query: {len(rows)} results", len(rows) > 0, 5)) + + return checks + + +def page_self_test(): + st.header("Self-Test") + st.caption("Automated verification of graph requirements.") + + if st.button("Run Self-Test", type="primary"): + with st.spinner("Running checks…"): + checks = run_self_test() + + total = 0 + max_total = 0 + + for desc, passed, pts in checks: + max_total += pts + earned = pts if passed else 0 + total += earned + icon = "PASS" if passed else "FAIL" + css = "check-pass" if passed else "check-fail" + st.markdown( + f'{icon} {desc} — **{earned}/{pts}**', + unsafe_allow_html=True, + ) + + st.markdown("---") + color = "check-pass" if total == max_total else "check-fail" + st.markdown( + f'

SELF-TEST SCORE: {total}/{max_total}

', + unsafe_allow_html=True, + ) + else: + st.info("Click the button above to run the self-test checks.") + + +# ══════════════════════════════════════════════════════════════════════════════ +# Router +# ══════════════════════════════════════════════════════════════════════════════ + +if page == "Project Overview": + page_project_overview() +elif page == "Station Load": + page_station_load() +elif page == "Capacity Tracker": + page_capacity_tracker() +elif page == "Worker Coverage": + page_worker_coverage() +elif page == "Load Forecast": + page_load_forecast() +elif page == "Self-Test": + page_self_test() diff --git a/submissions/Touqeer-Hamdani/level6/requirements.txt b/submissions/Touqeer-Hamdani/level6/requirements.txt new file mode 100644 index 000000000..5a418921c --- /dev/null +++ b/submissions/Touqeer-Hamdani/level6/requirements.txt @@ -0,0 +1,6 @@ +streamlit +neo4j +python-dotenv +pandas +plotly +statsmodels diff --git a/submissions/Touqeer-Hamdani/level6/seed_graph.py b/submissions/Touqeer-Hamdani/level6/seed_graph.py new file mode 100644 index 000000000..ec9ec8597 --- /dev/null +++ b/submissions/Touqeer-Hamdani/level6/seed_graph.py @@ -0,0 +1,297 @@ +""" +seed_graph.py — Populate Neo4j with factory production data. + +Run once: python seed_graph.py +Idempotent: safe to re-run (clears graph, then uses MERGE). 
+""" + +import os +import csv +from neo4j import GraphDatabase +from dotenv import load_dotenv + +load_dotenv() + +NEO4J_URI = os.getenv("NEO4J_URI") +NEO4J_USER = os.getenv("NEO4J_USER") +NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD") + +DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data") + + +# ── Helpers ────────────────────────────────────────────────────────────────── + +def read_csv(filename): + """Read a CSV file from the data/ directory and return a list of dicts.""" + filepath = os.path.join(DATA_DIR, filename) + with open(filepath, newline="", encoding="utf-8-sig") as f: + return list(csv.DictReader(f)) + + +def run(session, query, **kwargs): + """Run a Cypher query (keyword arguments become query parameters) and return the driver's Result.""" + return session.run(query, **kwargs) + + +# ── Seeding phases ─────────────────────────────────────────────────────────── + +def create_constraints(session): + """Phase 1: Uniqueness constraints for idempotent MERGE.""" + constraints = [ + "CREATE CONSTRAINT IF NOT EXISTS FOR (p:Project) REQUIRE p.project_id IS UNIQUE", + "CREATE CONSTRAINT IF NOT EXISTS FOR (s:Station) REQUIRE s.station_code IS UNIQUE", + "CREATE CONSTRAINT IF NOT EXISTS FOR (w:Worker) REQUIRE w.worker_id IS UNIQUE", + "CREATE CONSTRAINT IF NOT EXISTS FOR (wk:Week) REQUIRE wk.week_id IS UNIQUE", + "CREATE CONSTRAINT IF NOT EXISTS FOR (prod:Product) REQUIRE prod.product_type IS UNIQUE", + "CREATE CONSTRAINT IF NOT EXISTS FOR (c:Certification) REQUIRE c.cert_name IS UNIQUE", + "CREATE CONSTRAINT IF NOT EXISTS FOR (e:Etapp) REQUIRE e.etapp_name IS UNIQUE", + ] + for c in constraints: + run(session, c) + print(" Created 7 uniqueness constraints") + + +def seed_production(session, rows): + """Phase 3: Nodes and relationships from factory_production.csv.""" + + # ── Nodes ── + run(session, """ + UNWIND $rows AS row + MERGE (p:Project {project_id: row.project_id}) + SET p.project_number = row.project_number, + p.project_name = row.project_name + """, rows=rows) + +
run(session, """ + UNWIND $rows AS row + MERGE (prod:Product {product_type: row.product_type}) + SET prod.unit = row.unit + """, rows=rows) + + run(session, """ + UNWIND $rows AS row + MERGE (s:Station {station_code: row.station_code}) + SET s.station_name = row.station_name + """, rows=rows) + + run(session, """ + UNWIND $rows AS row + MERGE (:Week {week_id: row.week}) + """, rows=rows) + + run(session, """ + UNWIND $rows AS row + MERGE (:Etapp {etapp_name: row.etapp}) + """, rows=rows) + + # ── Relationships ── + # PRODUCES — one per unique (project_id, product_type) + run(session, """ + UNWIND $rows AS row + MATCH (p:Project {project_id: row.project_id}) + MATCH (prod:Product {product_type: row.product_type}) + MERGE (p)-[r:PRODUCES]->(prod) + SET r.quantity = toInteger(row.quantity), + r.unit_factor = toFloat(row.unit_factor), + r.unit = row.unit + """, rows=rows) + + # SCHEDULED_AT — composite key includes product_type to avoid P05/018 collision + run(session, """ + UNWIND $rows AS row + MATCH (p:Project {project_id: row.project_id}) + MATCH (s:Station {station_code: row.station_code}) + MERGE (p)-[r:SCHEDULED_AT { + week: row.week, + etapp: row.etapp, + bop: row.bop, + product_type: row.product_type + }]->(s) + SET r.planned_hours = toFloat(row.planned_hours), + r.actual_hours = toFloat(row.actual_hours), + r.completed_units = toInteger(row.completed_units), + r.variance_pct = CASE + WHEN toFloat(row.planned_hours) > 0 + THEN round((toFloat(row.actual_hours) - toFloat(row.planned_hours)) + / toFloat(row.planned_hours) * 100, 1) + ELSE 0.0 + END + """, rows=rows) + + # ACTIVE_IN — project ↔ week + run(session, """ + UNWIND $rows AS row + MATCH (p:Project {project_id: row.project_id}) + MATCH (wk:Week {week_id: row.week}) + MERGE (p)-[:ACTIVE_IN]->(wk) + """, rows=rows) + + # IN_PHASE — project ↔ etapp + run(session, """ + UNWIND $rows AS row + MATCH (p:Project {project_id: row.project_id}) + MATCH (e:Etapp {etapp_name: row.etapp}) + MERGE (p)-[:IN_PHASE]->(e) + """,
rows=rows) + + print(f" Loaded {len(rows)} production rows → nodes + relationships") + + +def seed_workers(session, rows): + """Phase 4: Nodes and relationships from factory_workers.csv.""" + + # Pre-process: split comma-separated fields in Python, dropping blank entries + worker_data = [] + for w in rows: + certs = [c.strip() for c in w["certifications"].split(",") if c.strip()] + cover = [s.strip() for s in w["can_cover_stations"].split(",") if s.strip()] + worker_data.append({ + "worker_id": w["worker_id"], + "name": w["name"], + "role": w["role"], + "primary_station": w["primary_station"], + "hours_per_week": int(w["hours_per_week"]), + "type": w["type"], + "certifications": certs, + "can_cover": cover, + }) + + # Worker nodes + run(session, """ + UNWIND $rows AS row + MERGE (w:Worker {worker_id: row.worker_id}) + SET w.name = row.name, + w.role = row.role, + w.hours_per_week = row.hours_per_week, + w.type = row.type + """, rows=worker_data) + + # Certification nodes + HOLDS + run(session, """ + UNWIND $rows AS row + MATCH (w:Worker {worker_id: row.worker_id}) + UNWIND row.certifications AS cert + MERGE (c:Certification {cert_name: cert}) + MERGE (w)-[:HOLDS]->(c) + """, rows=worker_data) + + # WORKS_AT — skip W11 (primary_station = "all") + run(session, """ + UNWIND $rows AS row + WITH row WHERE row.primary_station <> 'all' + MATCH (w:Worker {worker_id: row.worker_id}) + MATCH (s:Station {station_code: row.primary_station}) + MERGE (w)-[:WORKS_AT]->(s) + """, rows=worker_data) + + # CAN_COVER + run(session, """ + UNWIND $rows AS row + MATCH (w:Worker {worker_id: row.worker_id}) + UNWIND row.can_cover AS sc + MATCH (s:Station {station_code: sc}) + MERGE (w)-[:CAN_COVER]->(s) + """, rows=worker_data) + + print(f" Loaded {len(rows)} workers → Workers, Certifications + relationships") + + +def seed_capacity(session, rows): + """Phase 5: HAS_CAPACITY relationships from factory_capacity.csv.""" + + cap_data = [] + for c in rows: + cap_data.append({ + "week": c["week"], + "own_hours": int(c["own_hours"]), +
"hired_hours": int(c["hired_hours"]), + "overtime_hours": int(c["overtime_hours"]), + "total_planned": int(c["total_planned"]), + "deficit": int(c["deficit"]), + }) + + # MATCH the Factory before MERGE: some Neo4j versions reject a MATCH that directly follows MERGE without WITH + run(session, """ + UNWIND $rows AS row + MATCH (f:Factory) + MERGE (wk:Week {week_id: row.week}) + MERGE (wk)-[r:HAS_CAPACITY]->(f) + SET r.own_hours = row.own_hours, + r.hired_hours = row.hired_hours, + r.overtime_hours = row.overtime_hours, + r.total_planned = row.total_planned, + r.deficit = row.deficit + """, rows=cap_data) + + print(f" Loaded {len(rows)} capacity rows → HAS_CAPACITY") + + +def compute_loaded_in(session): + """Phase 6: Aggregate SCHEDULED_AT into LOADED_IN per (station, week).""" + run(session, """ + MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) + WITH s, r.week AS week, + sum(r.planned_hours) AS tp, + sum(r.actual_hours) AS ta + MATCH (wk:Week {week_id: week}) + MERGE (s)-[l:LOADED_IN]->(wk) + SET l.total_planned = tp, + l.total_actual = ta + """) + print(" Computed LOADED_IN aggregations") + + +# ── Main ───────────────────────────────────────────────────────────────────── + +def main(): + print("=" * 55) + print(" Factory Knowledge Graph — Seeder") + print("=" * 55) + + driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD)) + driver.verify_connectivity() + print("Connected to Neo4j\n") + + # Read CSVs + production = read_csv("factory_production.csv") + workers = read_csv("factory_workers.csv") + capacity = read_csv("factory_capacity.csv") + + with driver.session() as session: + # Phase 0: Clear + run(session, "MATCH (n) DETACH DELETE n") + print("Cleared existing graph\n") + + # Phase 1: Constraints + create_constraints(session) + + # Phase 2: Factory singleton + run(session, 'MERGE (:Factory {factory_name: "VSAB Stålbyggnad"})') + print(" Created Factory node") + + # Phase 3–6 + seed_production(session, production) + seed_workers(session, workers) + seed_capacity(session, capacity) + compute_loaded_in(session) + + # ── Summary ── + with
driver.session() as session: + nodes = session.run("MATCH (n) RETURN count(n) AS c").single()["c"] + rels = session.run("MATCH ()-[r]->() RETURN count(r) AS c").single()["c"] + labels = session.run("CALL db.labels() YIELD label RETURN collect(label) AS l").single()["l"] + rel_types = session.run( + "CALL db.relationshipTypes() YIELD relationshipType RETURN collect(relationshipType) AS t" + ).single()["t"] + + print(f"\n{'=' * 55}") + print(f" Seeding complete!") + print(f" Nodes: {nodes}") + print(f" Relationships: {rels}") + print(f" Labels ({len(labels)}): {labels}") + print(f" Rel types ({len(rel_types)}): {rel_types}") + print(f"{'=' * 55}") + + driver.close() + + +if __name__ == "__main__": + main()
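
Review note on `compute_loaded_in()`: the `LOADED_IN` totals it writes can be cross-checked offline against the production CSV. The sketch below is a minimal illustration, not part of the submission. It assumes the column names the seeder reads (`station_code`, `week`, `planned_hours`, `actual_hours`), assumes every CSV row maps to its own `SCHEDULED_AT` relationship (no two rows share the same MERGE key), and assumes blank cells become null in the graph so that Cypher's `sum()` skips them. The names `to_float`, `expected_loaded_in` and `sample_rows` are hypothetical, and the sample values are invented.

```python
from collections import defaultdict


def to_float(value):
    """Return a float, or None for a blank or missing CSV cell."""
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def expected_loaded_in(rows):
    """Sum planned and actual hours per (station_code, week) from production rows."""
    totals = defaultdict(lambda: {"total_planned": 0.0, "total_actual": 0.0})
    for row in rows:
        bucket = totals[(row["station_code"], row["week"])]
        planned = to_float(row.get("planned_hours"))
        actual = to_float(row.get("actual_hours"))
        # Assumption: blank cells are null in the graph, and sum() ignores nulls,
        # so they contribute nothing to either total here.
        if planned is not None:
            bucket["total_planned"] += planned
        if actual is not None:
            bucket["total_actual"] += actual
    return dict(totals)


# Hypothetical sample rows for illustration; not taken from factory_production.csv.
sample_rows = [
    {"station_code": "011", "week": "w1", "planned_hours": "40", "actual_hours": "44.5"},
    {"station_code": "011", "week": "w1", "planned_hours": "10", "actual_hours": ""},
    {"station_code": "012", "week": "w2", "planned_hours": "20", "actual_hours": "18"},
]
expected = expected_loaded_in(sample_rows)
for (station, week), hours in sorted(expected.items()):
    print(station, week, hours["total_planned"], hours["total_actual"])
```

In practice the same function could be fed `read_csv("factory_production.csv")` and its output compared with the `total_planned` and `total_actual` values returned by a `MATCH (s:Station)-[l:LOADED_IN]->(w:Week)` query. A mismatch would suggest either duplicate MERGE keys or unexpected blank cells in the source data.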