diff --git a/submissions/yashika-verma/level5/answers.md b/submissions/yashika-verma/level5/answers.md new file mode 100644 index 000000000..818bfc33f --- /dev/null +++ b/submissions/yashika-verma/level5/answers.md @@ -0,0 +1,323 @@ +# Level 5 Answers + +## Q1. Model It + +The schema models factory operations as interconnected production, staffing, and capacity relationships. Projects produce products that flow through stations across weekly schedules, while workers and certifications represent staffing flexibility and operational dependencies. + +Graph relationships capture production flow, worker substitution capability, and overload propagation between shared stations. Relationship properties store operational metrics such as planned hours, actual hours, weekly variance, and capacity deficits. + +Schema diagram included in schema.md. + +--- + +## Q2. Why Not Just SQL? +### SQL Query + +```sql +SELECT + w.worker_name, + s.station_name, + p.project_name +FROM workers w +JOIN worker_certifications wc + ON w.worker_id = wc.worker_id +JOIN certifications c + ON wc.certification_id = c.certification_id +JOIN stations s + ON c.station_id = s.station_id +JOIN production pr + ON s.station_id = pr.station_id +JOIN projects p + ON pr.project_id = p.project_id +WHERE s.station_code = '016' +AND s.station_name = 'Gjutning' +AND w.worker_name <> 'Per Gustafsson'; +``` + +--- + +### Cypher Query + +```cypher +MATCH (w:Worker)-[:HAS_CERTIFICATION]->(:Certification)-[:VALID_FOR]->(s:Station) +MATCH (p:Project)-[:USES_STATION]->(s) +WHERE s.station_code = "016" +AND s.name = "Gjutning" +AND w.name <> "Per Gustafsson" + +RETURN w.name AS replacement_worker, + s.name AS station, + collect(p.name) AS affected_projects +``` + +--- + +### Explanation + +The Cypher query follows the operational relationships directly, making worker substitution and project impact easier to understand visually. 
In SQL, the same logic requires multiple joins and intermediary tables, which hides the real dependency chain between workers, certifications, stations, and projects. The graph model makes operational reasoning and traversal much more intuitive for production planning scenarios. + + +## Q3. Spot the Bottleneck + +### Bottleneck Analysis + +The capacity dataset shows multiple weekly deficits where production demand exceeds available station capacity. By comparing planned_hours and actual_hours in the production dataset, the main bottlenecks appear in stations with repeated overtime and high variance across multiple projects. + +Projects that consistently exceed planned hours create cascading overload effects because the same stations are shared across several active projects during the same production weeks. + +--- + +### Cypher Query + +```cypher +MATCH (p:Project)-[r:OVERLOADED_AT]->(s:Station) +WHERE r.actual_hours > r.planned_hours * 1.10 + +RETURN + s.name AS station, + p.name AS project, + r.planned_hours AS planned_hours, + r.actual_hours AS actual_hours, + ROUND( + ((r.actual_hours - r.planned_hours) / r.planned_hours) * 100, + 2 + ) AS variance_percent + +ORDER BY variance_percent DESC +``` + +--- + +### Graph Alert Modeling + +I would model bottlenecks using both relationship properties and dedicated Alert nodes. + +The relationship property stores operational metrics directly on the production relationship: + +```text +(:Project)-[:OVERLOADED_AT { + planned_hours, + actual_hours, + variance_percent, + week +}]->(:Station) +``` + +This keeps overload calculations close to the operational workflow. 
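As a rough sketch, the `variance_percent` property on `OVERLOADED_AT` could be computed at ingestion time with the same formula as the Q3 query. The function names `variance_percent` and `is_overloaded` and the default `threshold` of 1.10 are illustrative assumptions that mirror the `actual_hours > planned_hours * 1.10` filter; they are not part of the submitted code.

```python
def variance_percent(planned_hours: float, actual_hours: float) -> float:
    """Percent by which actual hours deviate from planned hours."""
    if planned_hours <= 0:
        # Guard against division by zero for rows without planned work.
        return 0.0
    return (actual_hours - planned_hours) / planned_hours * 100


def is_overloaded(planned_hours: float, actual_hours: float,
                  threshold: float = 1.10) -> bool:
    """Mirror the Q3 filter: actual_hours > planned_hours * threshold."""
    return actual_hours > planned_hours * threshold


# Values taken from the schema.md example relationship (35.0 planned, 42.5 actual).
example = {
    "planned_hours": 35.0,
    "actual_hours": 42.5,
    "variance_percent": round(variance_percent(35.0, 42.5), 1),
    "overloaded": is_overloaded(35.0, 42.5),
}
```

Storing the rounded value on the relationship lets dashboard queries filter on `r.variance_percent` directly instead of recomputing it.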
+ +For monitoring and dashboards, I would also create dedicated Alert nodes: + +```text +(:Alert { + type: "capacity_overload", + severity: "high", + variance_percent: 24.5 +}) +``` + +connected through: + +```text +(:Alert)-[:TRIGGERED_FOR]->(:Project) +(:Alert)-[:AT_STATION]->(:Station) +(:Alert)-[:IN_WEEK]->(:Week) +``` + +This makes it easier to track historical bottlenecks, visualize recurring overload patterns, and build real-time operational dashboards. + +## Q4. Vector + Graph Hybrid + +### What I Would Embed + +I would generate embeddings for: + +- Project descriptions +- Product specifications +- Delivery timelines +- Construction/client requirements +- Worker skills and certifications +- Historical project summaries + +Example project text: + +```text +"450 meters of IQB beams for a hospital extension in Linköping with a tight delivery timeline" +``` + +Embedding this text allows the system to identify semantically similar projects even when the product names or wording are different. + +--- + +### Hybrid Vector + Graph Query + +```cypher +CALL db.index.vector.queryNodes( + 'project_embeddings', + 5, + $embedding +) +YIELD node, score + +MATCH (node:Project)-[:USES_STATION]->(s:Station) +MATCH (node)-[r:OVERLOADED_AT]->(s) + +WHERE r.variance_percent < 5 + +RETURN + node.name AS similar_project, + score, + collect(DISTINCT s.name) AS stations_used, + avg(r.variance_percent) AS average_variance + +ORDER BY score DESC +``` + +--- + +### Why Hybrid Search Is Better + +Filtering only by product type would miss operational context and project complexity. Two projects may both involve IQB beams but differ significantly in staffing requirements, station usage, delivery pressure, or execution efficiency. + +Vector similarity captures semantic and contextual similarity between projects, while the graph relationships ensure the retrieved projects are operationally relevant based on station usage, production flow, and historical performance. 
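The two-step idea can be sketched in plain Python, assuming toy two-dimensional embeddings and invented variance values. The project names, the `hybrid_search` helper, and its `top_k` and `max_variance` parameters are hypothetical. A real implementation would run both steps inside Neo4j, as in the Cypher query for this question.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data: embeddings and variance values are invented for illustration only.
projects = [
    {"name": "project_a", "embedding": [1.0, 0.0], "variance_percent": 2.0},
    {"name": "project_b", "embedding": [0.9, 0.1], "variance_percent": 12.0},
    {"name": "project_c", "embedding": [0.0, 1.0], "variance_percent": 1.0},
]


def hybrid_search(query_embedding, candidates, top_k=2, max_variance=5.0):
    # Step 1 (vector): rank by semantic similarity and keep the top_k matches.
    ranked = sorted(
        candidates,
        key=lambda p: cosine(query_embedding, p["embedding"]),
        reverse=True,
    )[:top_k]
    # Step 2 (graph-style filter): keep only projects that ran close to plan.
    return [p["name"] for p in ranked if p["variance_percent"] < max_variance]


result = hybrid_search([1.0, 0.0], projects)
```

Here `project_b` is semantically close to the query but is dropped by the operational filter, which is the behaviour a product-type filter alone would not provide.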
+ +This hybrid approach is more useful because it combines semantic understanding with real operational constraints. + +The same pattern can also be applied to people-matching systems such as Boardy, where embeddings identify similar needs or offers, while graph relationships verify community, collaboration history, or organizational connections. + +## Q5. Your L6 Plan + +### Graph Nodes and CSV Mapping + +| Node Label | CSV Columns | +|---|---| +| Project | project_id, project_name | +| Product | product_type | +| Station | station_code, station_name | +| Worker | worker_name, role | +| Certification | certifications | +| Week | week | +| Capacity | planned_capacity, actual_demand, deficit | +| Alert | derived from overload calculations | + +--- + +### Relationship Types + +| Relationship | Created From | +|---|---| +| PRODUCES | project_id → product_type | +| PROCESSED_AT | product_type → station | +| USES_STATION | project_id → station | +| SCHEDULED_IN | project_id → week | +| PRIMARY_OPERATOR_AT | worker → primary_station | +| CAN_COVER | worker → cover_stations | +| HAS_CERTIFICATION | worker → certifications | +| HAS_CAPACITY | station → weekly capacity | +| OVERLOADED_AT | actual_hours > planned_hours | +| TRIGGERED_FOR | alert → project | + +--- + +### Streamlit Dashboard Panels + +#### 1. Station Load Heatmap + +Purpose: +- Compare planned vs actual hours by station and week +- Identify overloaded production stations visually + +Cypher query: + +```cypher +MATCH (p:Project)-[r:OVERLOADED_AT]->(s:Station) +RETURN + s.name, + r.week, + SUM(r.actual_hours) AS actual_hours, + SUM(r.planned_hours) AS planned_hours +ORDER BY r.week +``` + +--- + +#### 2. Worker Coverage Matrix + +Purpose: +- Show which workers can substitute for critical stations +- Identify staffing risks and certification gaps + +Cypher query: + +```cypher +MATCH (w:Worker)-[:CAN_COVER]->(s:Station) +RETURN + w.name, + collect(s.name) AS coverage_stations +``` + +--- + +#### 3. 
Bottleneck Alert Dashboard + +Purpose: +- Track projects exceeding planned production hours +- Display highest variance stations and recurring overloads + +Cypher query: + +```cypher +MATCH (p:Project)-[r:OVERLOADED_AT]->(s:Station) +WHERE r.variance_percent > 10 + +RETURN + p.name, + s.name, + r.week, + r.variance_percent + +ORDER BY r.variance_percent DESC +``` + +--- + +#### 4. Similar Project Search + +Purpose: +- Find historically similar projects using vector search +- Compare station usage and execution efficiency + +Cypher query: + +```cypher +CALL db.index.vector.queryNodes( + 'project_embeddings', + 5, + $embedding +) +YIELD node, score + +RETURN node.name, score +ORDER BY score DESC +``` + +--- + +### Implementation Plan + +For Level 6, I would: + +1. Load the CSV datasets into Neo4j +2. Create graph nodes and relationships using Cypher import scripts +3. Compute overload and variance metrics during ingestion +4. Generate embeddings for project descriptions +5. Build a Streamlit dashboard connected to Neo4j +6. Visualize bottlenecks, staffing coverage, and project similarity +7. Add alert monitoring for recurring station overloads + +The graph structure enables operational analysis that would be difficult to model efficiently using relational joins alone, especially when tracking staffing substitutions, shared station dependencies, and cascading production bottlenecks. + +--- + +## Conclusion + +This graph-based approach models the factory as an interconnected operational system rather than isolated relational tables. Combining graph traversal with vector similarity enables more intelligent production planning, staffing analysis, bottleneck detection, and historical project comparison. 
\ No newline at end of file diff --git a/submissions/yashika-verma/level5/notes.md b/submissions/yashika-verma/level5/notes.md new file mode 100644 index 000000000..56c2d22dd --- /dev/null +++ b/submissions/yashika-verma/level5/notes.md @@ -0,0 +1,98 @@ +# Dataset Inspection Notes + +## factory_production.csv + +Observed columns: +- project_id +- product_type +- station_code +- station_name +- week +- planned_hours +- actual_hours + +Key observations: +- Multiple projects use the same stations +- Products move through several production stations +- Actual hours sometimes exceed planned hours +- Weekly production flow is important for bottleneck analysis + +Possible graph entities: +- Project +- Product +- Station +- Week + +Possible relationship properties: +- planned_hours +- actual_hours +- variance_percent +- week + + +--- + +## factory_workers.csv + +Observed columns: +- worker_name +- primary_station +- certifications +- role +- cover_stations + +Key observations: +- Workers can cover multiple stations +- Certifications determine substitution capability +- Some stations depend on a small number of workers + +Possible graph entities: +- Worker +- Certification +- Station + +Possible relationships: +- PRIMARY_OPERATOR_AT +- CAN_COVER +- HAS_CERTIFICATION + + +--- + +## factory_capacity.csv + +Observed columns: +- week +- station +- planned_capacity +- actual_demand +- deficit + +Key observations: +- Some stations exceed weekly capacity +- Capacity deficits vary across weeks +- Weekly overload patterns can trigger alerts + +Possible graph entities: +- Capacity +- Week +- Station +- Alert + +Possible relationships: +- HAS_CAPACITY +- OVERLOADED_AT +- TRIGGERED_FOR + + +--- + +# Initial Graph Modeling Thoughts + +The system naturally fits a graph structure because: +- projects connect to products and stations +- workers connect to certifications and coverage stations +- weekly capacity creates temporal operational dependencies +- bottlenecks propagate across 
multiple connected entities + +Graph traversal makes staffing impact and overload analysis easier than multi-table relational joins. \ No newline at end of file diff --git a/submissions/yashika-verma/level5/schema.md b/submissions/yashika-verma/level5/schema.md new file mode 100644 index 000000000..c13e1109d --- /dev/null +++ b/submissions/yashika-verma/level5/schema.md @@ -0,0 +1,119 @@ +# Factory Knowledge Graph Schema + +```mermaid +graph TD + +%% ========================= +%% CORE PRODUCTION FLOW +%% ========================= + +Project -->|PRODUCES| Product +Product -->|PROCESSED_AT| Station +Project -->|USES_STATION| Station +Project -->|SCHEDULED_IN| Week + +%% ========================= +%% WORKERS + COVERAGE +%% ========================= + +Worker -->|PRIMARY_OPERATOR_AT| Station +Worker -->|CAN_COVER| Station +Worker -->|HAS_CERTIFICATION| Certification + +%% ========================= +%% CAPACITY + LOAD +%% ========================= + +Station -->|HAS_WEEKLY_CAPACITY| Week +Project -->|OVERLOADED_AT| Station + +%% ========================= +%% ALERTING +%% ========================= + +Alert -->|TRIGGERED_FOR| Project +Alert -->|AT_STATION| Station +Alert -->|IN_WEEK| Week +``` + +--- + +# Node Labels + +| Node Label | Description | Source CSV | +|---|---|---| +| Project | Construction/manufacturing project | factory_production.csv | +| Product | Product type (IQB, SB, etc.) 
| factory_production.csv | +| Station | Production station/work area | factory_production.csv | +| Worker | Factory worker/operator | factory_workers.csv | +| Certification | Worker skills/certifications | factory_workers.csv | +| Week | Production week (w1–w8) | production + capacity CSV | +| Alert | Bottleneck/overload detection node | derived | + +--- + +# Relationship Types + +| Relationship | Meaning | +|---|---| +| PRODUCES | Project produces a product type | +| PROCESSED_AT | Product moves through a station | +| USES_STATION | Project requires a station | +| SCHEDULED_IN | Project active during a week | +| PRIMARY_OPERATOR_AT | Worker's main station | +| CAN_COVER | Worker can substitute at station | +| HAS_CERTIFICATION | Worker certifications | +| HAS_WEEKLY_CAPACITY | Station weekly production capacity | +| OVERLOADED_AT | Project exceeded planned hours | +| TRIGGERED_FOR | Alert linked to project | +| AT_STATION | Alert linked to station | +| IN_WEEK | Alert linked to week | + +--- + +# Relationship Properties + +## Example 1 — Project overload tracking + +```text +(:Project)-[:OVERLOADED_AT { + planned_hours: 35.0, + actual_hours: 42.5, + variance_percent: 21.4, + week: "w4" +}]->(:Station) +``` + +--- + +## Example 2 — Worker station coverage + +```text +(:Worker)-[:CAN_COVER { + coverage_type: "backup", + hours_per_week: 40 +}]->(:Station) +``` + +--- + +## Example 3 — Weekly station capacity + +```text +(:Station)-[:HAS_WEEKLY_CAPACITY { + total_capacity: 480, + total_planned: 612, + deficit: -132, + overtime_hours: 40 +}]->(:Week) +``` + +--- + +# Design Rationale + +The schema models factory operations as interconnected production, staffing, and capacity relationships. Projects produce products that flow through stations across weekly schedules, while workers and certifications represent staffing flexibility and operational dependencies. 
+ +Graph relationships capture production flow, worker substitution capability, and overload propagation between shared stations. Relationship properties store operational metrics such as planned hours, actual hours, weekly variance, and capacity deficits, enabling bottleneck analysis and workforce substitution queries. + +This graph-based structure makes it easier to analyze cascading operational effects, recurring production bottlenecks, staffing risks, and historical production performance compared to traditional relational joins. \ No newline at end of file diff --git a/submissions/yashika-verma/level6/.env.example b/submissions/yashika-verma/level6/.env.example new file mode 100644 index 000000000..e69de29bb diff --git a/submissions/yashika-verma/level6/.gitignore b/submissions/yashika-verma/level6/.gitignore new file mode 100644 index 000000000..717be2ca7 --- /dev/null +++ b/submissions/yashika-verma/level6/.gitignore @@ -0,0 +1,6 @@ +.env +.streamlit/secrets.toml +venv/ +.vscode/ +.DS_Store +__pycache__/ \ No newline at end of file diff --git a/submissions/yashika-verma/level6/README.md b/submissions/yashika-verma/level6/README.md new file mode 100644 index 000000000..257516559 --- /dev/null +++ b/submissions/yashika-verma/level6/README.md @@ -0,0 +1,65 @@ +# Factory Knowledge Graph Dashboard + +A Streamlit + Neo4j dashboard for analyzing factory production planning, staffing coverage, station overloads, and capacity bottlenecks. 
+ +## Features + +- Project variance tracking +- Production station load analysis +- Capacity deficit monitoring +- Worker coverage analysis +- Graph-based operational insights +- Self-test validation system + +## Tech Stack + +- Neo4j AuraDB +- Streamlit +- Plotly +- pandas +- Python + +## Dataset + +The dashboard uses: +- factory_production.csv +- factory_workers.csv +- factory_capacity.csv + +## Run Locally + +```bash +pip install -r requirements.txt +streamlit run app.py +``` + +## Neo4j Configuration + +Create `.streamlit/secrets.toml` + +```toml +NEO4J_URI = "your-uri" +NEO4J_USER = "neo4j" +NEO4J_PASSWORD = "your-password" +``` + +## Graph Visualization + +![Graph](graph.png) + +## Graph Architecture + +### Node Labels +- Project +- Product +- Station +- Worker +- Certification +- Week + +### Relationship Types +- SCHEDULED_AT +- USES_STATION +- CAN_COVER +- HAS_CERTIFICATION +- ASSIGNED_TO \ No newline at end of file diff --git a/submissions/yashika-verma/level6/app.py b/submissions/yashika-verma/level6/app.py new file mode 100644 index 000000000..cebdc9660 --- /dev/null +++ b/submissions/yashika-verma/level6/app.py @@ -0,0 +1,565 @@ +import streamlit as st +from neo4j import GraphDatabase +import pandas as pd +import plotly.express as px + +# ========================= +# PAGE CONFIG +# ========================= + +st.set_page_config( + page_title="Factory Knowledge Graph Dashboard", + layout="wide" +) + +# ========================= +# NEO4J CONNECTION +# ========================= + +URI = st.secrets["NEO4J_URI"] +USER = st.secrets["NEO4J_USER"] +PASSWORD = st.secrets["NEO4J_PASSWORD"] + +driver = GraphDatabase.driver( + URI, + auth=(USER, PASSWORD) +) + +# ========================= +# TITLE +# ========================= + +st.title( + "🏭 Factory Knowledge Graph Dashboard" +) +st.info(""" +Key Findings: +- 2 projects exceeded overload thresholds +- P04 has the highest variance +- Multiple deficit weeks indicate capacity imbalance +""") + +# 
========================= +# SIDEBAR +# ========================= + +page = st.sidebar.radio( + "Navigation", + [ + "Project Overview", + "Station Load", + "Capacity Tracker", + "Worker Coverage", + "Self-Test" + ] +) + +# ========================= +# PAGE 1 — PROJECT OVERVIEW +# ========================= + +if page == "Project Overview": + + st.header("📦 Project Overview") + + query = """ + MATCH (p:Project)-[r:SCHEDULED_AT]->(:Station) + + RETURN + p.id AS project, + SUM(r.planned_hours) AS planned_hours, + SUM(r.actual_hours) AS actual_hours, + ROUND( + ( + ( + SUM(r.actual_hours) + - SUM(r.planned_hours) + ) + / SUM(r.planned_hours) + ) * 100, + 2 + ) AS variance_percent + """ + + with driver.session() as session: + + result = session.run(query) + + data = [dict(r) for r in result] + + df = pd.DataFrame(data) + + df = df.round(2) + + def classify_variance(v): + + if v >= 4: + return "HIGH" + + elif v >= 2: + return "MEDIUM" + + return "LOW" + + df["severity"] = df[ + "variance_percent" + ].apply(classify_variance) + + # ========================= + # METRICS + # ========================= + + col1, col2, col3 = st.columns(3) + + col1.metric( + "Projects", + len(df) + ) + + col2.metric( + "Total Planned Hours", + round( + df["planned_hours"].sum(), + 1 + ) + ) + + col3.metric( + "Average Variance %", + round( + df["variance_percent"].mean(), + 2 + ) + ) + + # ========================= + # TABLE HIGHLIGHTING + # ========================= + + def highlight_variance(val): + + if val > 3: + return "background-color: red" + + return "" + + styled_df = df.style.map( + highlight_variance, + subset=["variance_percent"] + ) + + styled_df = styled_df.format({ + "planned_hours": "{:.1f}", + "actual_hours": "{:.1f}", + "variance_percent": "{:.2f}", + "severity": "{}" + }) + + st.dataframe( + styled_df, + use_container_width=True + ) + + # ========================= + # TOP OVERLOADED PROJECTS + # ========================= + + st.subheader( + "Top Overloaded Projects" + ) + + 
top_df = df.sort_values( + by="variance_percent", + ascending=False + ).head(5) + + top_df = top_df.round(2) + + st.dataframe( + top_df, + use_container_width=True + ) + + highest = top_df.iloc[0] + + st.warning( + f"{highest['project']} exceeded planned " + f"hours by {highest['variance_percent']}%" + ) + st.subheader("🚨 Critical Alerts") + + critical_projects = df[ + df["variance_percent"] > 3 + ] + + for _, row in critical_projects.iterrows(): + + st.error( + f"{row['project']} exceeded planned hours by " + f"{row['variance_percent']}%" + ) + + # ========================= + # OPERATIONAL INSIGHTS + # ========================= + + st.subheader("Operational Insights") + + overloaded_count = len( + df[df["variance_percent"] > 3] + ) + + st.markdown(f""" + - {overloaded_count} projects exceeded the variance threshold + - {highest['project']} currently has the highest overload variance + - Production scheduling may require capacity rebalancing + """) + +# ========================= +# PAGE 2 — STATION LOAD +# ========================= + +elif page == "Station Load": + + st.header("📊 Station Load") + + query = """ + MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) + + RETURN + s.name AS station, + r.week AS week, + SUM(r.planned_hours) AS planned_hours, + SUM(r.actual_hours) AS actual_hours + """ + + with driver.session() as session: + + result = session.run(query) + + data = [dict(r) for r in result] + + df = pd.DataFrame(data) + + df = df.round(2) + + fig = px.bar( + df, + x="week", + y="actual_hours", + color="station", + barmode="group", + hover_data=["planned_hours"] + ) + + st.plotly_chart( + fig, + use_container_width=True + ) + + highest_station = df.groupby( + "station" + )["actual_hours"].sum().idxmax() + + st.info( + f"{highest_station} currently has the highest production load." 
+ ) + +# ========================= +# PAGE 3 — CAPACITY TRACKER +# ========================= + +elif page == "Capacity Tracker": + + st.header("⚠ Capacity Tracker") + + query = """ + MATCH (w:Week) + + RETURN + w.name AS week, + w.total_capacity AS total_capacity, + w.total_planned AS total_planned, + w.deficit AS deficit + + ORDER BY week + """ + + with driver.session() as session: + + result = session.run(query) + + data = [dict(r) for r in result] + + df = pd.DataFrame(data) + + df = df.round(2) + + st.dataframe( + df, + use_container_width=True + ) + + fig = px.line( + df, + x="week", + y=["total_capacity", "total_planned"], + markers=True + ) + + st.plotly_chart( + fig, + use_container_width=True + ) + + fig2 = px.area( + df, + x="week", + y="deficit" + ) + + st.plotly_chart( + fig2, + use_container_width=True + ) + + deficit_df = df[ + df["deficit"] < 0 + ] + + st.subheader("Deficit Weeks") + + st.dataframe( + deficit_df, + use_container_width=True + ) + + worst_week = deficit_df.sort_values( + by="deficit" + ).iloc[0] + + st.error( + f"{worst_week['week']} had the largest " + f"capacity deficit of {worst_week['deficit']} hours." + ) + +# ========================= +# PAGE 4 — WORKER COVERAGE +# ========================= + +elif page == "Worker Coverage": + + st.header("👷 Worker Coverage") + + query = """ + MATCH (w:Worker)-[:CAN_COVER]->(s:Station) + + RETURN + s.name AS station, + collect(w.name) AS workers, + count(w) AS coverage_count + + ORDER BY coverage_count ASC + """ + + with driver.session() as session: + + result = session.run(query) + + data = [dict(r) for r in result] + + df = pd.DataFrame(data) + + st.dataframe( + df, + use_container_width=True + ) + + risky_df = df[ + df["coverage_count"] <= 1 + ] + + st.subheader( + "Single Point of Failure Stations" + ) + + st.dataframe( + risky_df, + use_container_width=True + ) + + st.warning( + f"{len(risky_df)} stations have limited backup coverage." 
+ ) + + lowest_station = risky_df.iloc[0] + + st.error( + f"{lowest_station['station']} has the lowest worker redundancy." + ) + + +# ========================= +# PAGE 5 — SELF TEST +# ========================= + +elif page == "Self-Test": + + st.header("✅ Self-Test") + + checks = [] + + try: + + with driver.session() as s: + s.run("RETURN 1") + + checks.append( + ( + "Neo4j connected", + True, + 3 + ) + ) + + except Exception: + + checks.append( + ( + "Neo4j connected", + False, + 3 + ) + ) + + with driver.session() as s: + + result = s.run( + "MATCH (n) RETURN count(n) AS c" + ).single() + + node_count = result["c"] + + checks.append( + ( + f"{node_count} nodes (min: 50)", + node_count >= 50, + 3 + ) + ) + + result = s.run( + "MATCH ()-[r]->() RETURN count(r) AS c" + ).single() + + rel_count = result["c"] + + checks.append( + ( + f"{rel_count} relationships (min: 100)", + rel_count >= 100, + 3 + ) + ) + + result = s.run( + """ + CALL db.labels() + YIELD label + RETURN count(label) AS c + """ + ).single() + + label_count = result["c"] + + checks.append( + ( + f"{label_count} node labels (min: 6)", + label_count >= 6, + 3 + ) + ) + + result = s.run( + """ + CALL db.relationshipTypes() + YIELD relationshipType + RETURN count(relationshipType) AS c + """ + ).single() + + rel_types = result["c"] + + checks.append( + ( + f"{rel_types} relationship types (min: 8)", + rel_types >= 8, + 3 + ) + ) + + result = s.run(""" + MATCH (p:Project)-[r:SCHEDULED_AT]->(s:Station) + + WHERE r.variance_percent > 3 + + RETURN + p.id AS project, + s.name AS station, + r.planned_hours AS planned, + r.actual_hours AS actual + + LIMIT 10 + """) + + rows = [dict(r) for r in result] + + checks.append( + ( + f"Variance query: {len(rows)} results", + len(rows) > 0, + 5 + ) + ) + + total = 0 + + for text, passed, score in checks: + + if passed: + + st.success( + f"✅ {text} — {score}/{score}" + ) + + total += score + + else: + + st.error( + f"❌ {text} — 0/{score}" + ) + + st.divider() + + if total == 20: +
+ st.success( + f"SELF-TEST SCORE: {total}/20" + ) + + else: + + st.warning( + f"SELF-TEST SCORE: {total}/20" + ) + +# ========================= +# FOOTER +# ========================= + +st.divider() + +st.caption( + "Built using Neo4j, Streamlit, Plotly, and factory production datasets." +) \ No newline at end of file diff --git a/submissions/yashika-verma/level6/data/factory_capacity.csv b/submissions/yashika-verma/level6/data/factory_capacity.csv new file mode 100644 index 000000000..795ff52f0 --- /dev/null +++ b/submissions/yashika-verma/level6/data/factory_capacity.csv @@ -0,0 +1,9 @@ +week,own_staff_count,hired_staff_count,own_hours,hired_hours,overtime_hours,total_capacity,total_planned,deficit +w1,10,2,400,80,0,480,612,-132 +w2,10,2,400,80,40,520,645,-125 +w3,10,2,400,80,0,480,398,82 +w4,10,2,400,80,20,500,550,-50 +w5,10,2,400,80,30,510,480,30 +w6,9,2,360,80,0,440,520,-80 +w7,10,2,400,80,40,520,600,-80 +w8,10,2,400,80,20,500,470,30 \ No newline at end of file diff --git a/submissions/yashika-verma/level6/data/factory_production.csv b/submissions/yashika-verma/level6/data/factory_production.csv new file mode 100644 index 000000000..ca6ce43e1 --- /dev/null +++ b/submissions/yashika-verma/level6/data/factory_production.csv @@ -0,0 +1,69 @@ +project_id,project_number,project_name,product_type,unit,quantity,unit_factor,station_code,station_name,etapp,bop,week,planned_hours,actual_hours,completed_units +P01,4501,Stålverket Borås,IQB,meter,600,1.77,011,FS IQB,ET1,BOP1,w1,48.0,45.2,28 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,012,Förmontering IQB,ET1,BOP1,w1,32.0,35.5,25 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,013,Montering IQB,ET1,BOP1,w1,28.0,26.0,22 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,014,Svets o montage IQB,ET1,BOP1,w1,35.0,38.2,20 +P01,4501,Stålverket Borås,SB,styck,40,4.0,018,SB B/F-hall,ET1,BOP1,w1,16.0,14.5,4 +P01,4501,Stålverket Borås,SP,styck,180,2.0,019,SP B/F-hall,ET1,BOP1,w1,12.0,13.0,7 +P01,4501,Stålverket 
Borås,IQB,meter,600,1.77,011,FS IQB,ET1,BOP1,w2,48.0,50.0,32 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,012,Förmontering IQB,ET1,BOP1,w2,32.0,30.0,28 +P01,4501,Stålverket Borås,IQP,styck,90,2.80,015,Montering IQP,ET1,BOP2,w2,25.0,28.0,9 +P01,4501,Stålverket Borås,SR,styck,8,45.0,021,SR B/F-hall,ET1,BOP2,w2,40.0,42.0,1 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,011,FS IQB,ET1,BOP1,w1,30.0,28.0,20 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,012,Förmontering IQB,ET1,BOP1,w1,22.0,24.5,18 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,013,Montering IQB,ET1,BOP1,w1,18.0,17.0,16 +P02,4502,Kontorshus Mölndal,IQP,styck,70,2.70,015,Montering IQP,ET1,BOP1,w1,19.0,21.0,7 +P02,4502,Kontorshus Mölndal,SD,styck,30,3.00,018,SB B/F-hall,ET1,BOP1,w1,9.0,8.5,3 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,011,FS IQB,ET1,BOP1,w2,30.0,32.0,24 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,014,Svets o montage IQB,ET1,BOP1,w2,25.0,23.0,20 +P02,4502,Kontorshus Mölndal,SP,styck,120,1.75,019,SP B/F-hall,ET1,BOP2,w2,14.0,15.5,8 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,011,FS IQB,ET1,BOP1,w1,72.0,70.0,40 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,012,Förmontering IQB,ET1,BOP1,w1,48.0,52.0,35 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,013,Montering IQB,ET1,BOP1,w1,38.0,36.5,30 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,014,Svets o montage IQB,ET1,BOP1,w1,42.0,48.0,28 +P03,4503,Lagerhall Jönköping,SB,styck,60,6.00,018,SB B/F-hall,ET1,BOP1,w1,36.0,38.0,6 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,011,FS IQB,ET1,BOP1,w2,72.0,75.0,45 +P03,4503,Lagerhall Jönköping,IQP,styck,110,2.90,015,Montering IQP,ET1,BOP2,w2,32.0,30.0,11 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,016,Gjutning,ET1,BOP2,w2,28.0,35.0,8 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,017,Målning,ET1,BOP2,w3,24.0,22.0,20 +P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,011,FS IQB,ET1,BOP1,w1,38.0,36.0,24 +P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,012,Förmontering 
IQB,ET1,BOP1,w1,25.0,27.0,20
+P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,013,Montering IQB,ET1,BOP1,w1,20.0,19.0,18
+P04,4504,Parkering Helsingborg,IQP,styck,55,2.85,015,Montering IQP,ET1,BOP1,w1,16.0,18.0,6
+P04,4504,Parkering Helsingborg,SB,styck,25,7.50,018,SB B/F-hall,ET1,BOP1,w1,19.0,22.0,3
+P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,011,FS IQB,ET1,BOP1,w2,38.0,40.0,28
+P04,4504,Parkering Helsingborg,SP,styck,100,2.00,019,SP B/F-hall,ET1,BOP2,w2,12.0,11.0,6
+P04,4504,Parkering Helsingborg,SR,styck,12,120.0,021,SR B/F-hall,ET1,BOP2,w2,60.0,65.0,1
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,011,FS IQB,ET2,BOP3,w1,95.0,90.0,50
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,012,Förmontering IQB,ET2,BOP3,w1,65.0,68.0,42
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,013,Montering IQB,ET2,BOP3,w1,50.0,48.0,38
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,014,Svets o montage IQB,ET2,BOP3,w1,58.0,62.0,35
+P05,4505,Sjukhus Linköping ET2,IQP,styck,150,2.88,015,Montering IQP,ET2,BOP3,w1,30.0,33.0,10
+P05,4505,Sjukhus Linköping ET2,SB,styck,50,5.00,018,SB B/F-hall,ET2,BOP3,w1,25.0,28.0,5
+P05,4505,Sjukhus Linköping ET2,SD,styck,45,2.75,018,SB B/F-hall,ET2,BOP3,w1,12.0,11.5,4
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,011,FS IQB,ET2,BOP3,w2,95.0,98.0,55
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,016,Gjutning,ET2,BOP3,w2,35.0,40.0,12
+P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,017,Målning,ET2,BOP3,w2,28.0,26.0,25
+P05,4505,Sjukhus Linköping ET2,SR,styck,20,274.0,021,SR B/F-hall,ET2,BOP3,w3,120.0,115.0,2
+P06,4506,Skola Uppsala,IQB,meter,500,1.60,011,FS IQB,ET1,BOP1,w2,40.0,38.0,26
+P06,4506,Skola Uppsala,IQB,meter,500,1.60,012,Förmontering IQB,ET1,BOP1,w2,28.0,30.0,22
+P06,4506,Skola Uppsala,IQB,meter,500,1.60,013,Montering IQB,ET1,BOP1,w2,22.0,20.0,18
+P06,4506,Skola Uppsala,IQP,styck,80,2.75,015,Montering IQP,ET1,BOP1,w2,22.0,24.0,8
+P06,4506,Skola Uppsala,SB,styck,35,4.50,018,SB B/F-hall,ET1,BOP1,w2,16.0,18.0,4
+P06,4506,Skola Uppsala,SP,styck,140,1.50,019,SP B/F-hall,ET1,BOP2,w3,14.0,12.0,10
+P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,011,FS IQB,ET1,BOP1,w1,45.0,42.0,22
+P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,012,Förmontering IQB,ET1,BOP1,w1,30.0,33.0,18
+P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,014,Svets o montage IQB,ET1,BOP1,w1,35.0,32.0,16
+P07,4507,Idrottshall Västerås,SB,styck,45,3.50,018,SB B/F-hall,ET1,BOP1,w1,16.0,18.0,5
+P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,011,FS IQB,ET1,BOP1,w2,45.0,48.0,26
+P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,016,Gjutning,ET1,BOP2,w2,20.0,22.0,5
+P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,017,Målning,ET1,BOP2,w3,18.0,16.0,15
+P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,011,FS IQB,ET1,BOP1,w1,65.0,62.0,36
+P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,012,Förmontering IQB,ET1,BOP1,w1,42.0,45.0,30
+P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,013,Montering IQB,ET1,BOP1,w1,35.0,38.0,25
+P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,014,Svets o montage IQB,ET1,BOP1,w1,40.0,44.0,22
+P08,4508,Bro E6 Halmstad,SP,styck,200,2.50,019,SP B/F-hall,ET1,BOP1,w1,20.0,18.0,8
+P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,011,FS IQB,ET1,BOP1,w2,65.0,68.0,42
+P08,4508,Bro E6 Halmstad,IQP,styck,95,2.93,015,Montering IQP,ET1,BOP2,w2,28.0,30.0,10
+P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,016,Gjutning,ET1,BOP2,w3,22.0,25.0,8
+P08,4508,Bro E6 Halmstad,SR,styck,15,180.0,021,SR B/F-hall,ET1,BOP2,w3,90.0,85.0,2
\ No newline at end of file
diff --git a/submissions/yashika-verma/level6/data/factory_workers.csv b/submissions/yashika-verma/level6/data/factory_workers.csv
new file mode 100644
index 000000000..3110285cc
--- /dev/null
+++ b/submissions/yashika-verma/level6/data/factory_workers.csv
@@ -0,0 +1,15 @@
+worker_id,name,role,primary_station,can_cover_stations,certifications,hours_per_week,type
+W01,Erik Lindberg,Operator,011,"011,012","MIG/MAG,TIG,ISO 9606",40,permanent
+W02,Anna Berg,Operator,011,"011,014","MIG/MAG,TIG",40,permanent
+W03,Lars Jensen,Operator,012,"012,013","Surface treatment,CE marking",40,permanent
+W04,Maria Stone,Operator,013,"013","Blasting,Surface protection",40,permanent
+W05,Johan Peters,Operator,014,"014,015","Hydraulics,Mechanics,Crane",40,permanent
+W06,Karen Nilsen,Inspector,015,"015","SIS,SS-EN 1090,NDT",40,permanent
+W07,Per Hansen,Operator,016,"016,017","Casting,Formwork",40,permanent
+W08,Sofia Arden,Operator,017,"017","Surface treatment,Spray painting",40,permanent
+W09,Magnus Stone,Operator,018,"018,019","Sheet metal,Assembly",40,permanent
+W10,Elin Frank,Operator,019,"019,018","Assembly,Welding",32,permanent
+W11,Victor Elm,Foreman,all,"011,012,013,014,015,016,017,018,019,021","Leadership,CE,ISO 9001",45,permanent
+W12,Lena Dale,Quality Manager,015,"015","ISO 9001,SS-EN 1090,Audit",40,permanent
+W13,Ahmed Hassan,Operator,011,"011","MIG/MAG",40,hired
+W14,Petra Steen,Operator,012,"012,013","Surface treatment",40,hired
\ No newline at end of file
diff --git a/submissions/yashika-verma/level6/graph.png b/submissions/yashika-verma/level6/graph.png
new file mode 100644
index 000000000..4a8d8dac1
Binary files /dev/null and b/submissions/yashika-verma/level6/graph.png differ
diff --git a/submissions/yashika-verma/level6/requirements.txt b/submissions/yashika-verma/level6/requirements.txt
new file mode 100644
index 000000000..cada0e464
--- /dev/null
+++ b/submissions/yashika-verma/level6/requirements.txt
@@ -0,0 +1,5 @@
+streamlit
+neo4j
+pandas
+plotly
+python-dotenv
\ No newline at end of file
diff --git a/submissions/yashika-verma/level6/seed_graph.py b/submissions/yashika-verma/level6/seed_graph.py
new file mode 100644
index 000000000..b98e34a01
--- /dev/null
+++ b/submissions/yashika-verma/level6/seed_graph.py
@@ -0,0 +1,278 @@
+from neo4j import GraphDatabase
+from dotenv import load_dotenv
+import pandas as pd
+import os
+
+load_dotenv()
+
+URI = os.getenv("NEO4J_URI")
+USER = os.getenv("NEO4J_USER")
+PASSWORD = os.getenv("NEO4J_PASSWORD")
+
+driver = GraphDatabase.driver(
+    URI,
+    auth=(USER, PASSWORD)
+)
+
+# =========================
+# LOAD CSV FILES
+# =========================
+
+production_df = pd.read_csv(
+    "data/factory_production.csv"
+)
+
+workers_df = pd.read_csv(
+    "data/factory_workers.csv"
+)
+
+capacity_df = pd.read_csv(
+    "data/factory_capacity.csv"
+)
+
+# =========================
+# CREATE CONSTRAINTS
+# =========================
+
+def create_constraints(tx):
+
+    tx.run("""
+        CREATE CONSTRAINT project_id IF NOT EXISTS
+        FOR (p:Project)
+        REQUIRE p.id IS UNIQUE
+    """)
+
+    tx.run("""
+        CREATE CONSTRAINT station_code IF NOT EXISTS
+        FOR (s:Station)
+        REQUIRE s.code IS UNIQUE
+    """)
+
+    tx.run("""
+        CREATE CONSTRAINT worker_id IF NOT EXISTS
+        FOR (w:Worker)
+        REQUIRE w.id IS UNIQUE
+    """)
+
+# =========================
+# LOAD PRODUCTION DATA
+# =========================
+
+def load_production(tx):
+
+    for _, row in production_df.iterrows():
+
+        variance = 0
+
+        if row["planned_hours"] > 0:
+
+            variance = (
+                (
+                    row["actual_hours"]
+                    - row["planned_hours"]
+                )
+                / row["planned_hours"]
+            ) * 100
+
+        tx.run("""
+            MERGE (p:Project {
+                id: $project_id
+            })
+
+            MERGE (prod:Product {
+                name: $product
+            })
+
+            MERGE (s:Station {
+                code: $station_code
+            })
+
+            SET s.name = $station_name
+
+            MERGE (w:Week {
+                name: $week
+            })
+
+            MERGE (e:Etapp {
+                name: $etapp
+            })
+
+            MERGE (p)-[:PRODUCES]->(prod)
+
+            MERGE (prod)-[:PROCESSED_AT]->(s)
+
+            MERGE (p)-[:USES_STATION]->(s)
+
+            MERGE (p)-[:PART_OF_ETAPP]->(e)
+
+            MERGE (p)-[:SCHEDULED_IN]->(w)
+
+            MERGE (p)-[r:SCHEDULED_AT {
+                week: $week
+            }]->(s)
+
+            SET r.planned_hours = $planned_hours,
+                r.actual_hours = $actual_hours,
+                r.variance_percent = $variance,
+                r.overloaded = $variance > 3
+        """,
+            project_id=row["project_id"],
+            product=row["product_type"],
+            station_code=str(row["station_code"]).zfill(3),  # pandas parses 011 as 11; restore the 3-digit code
+            station_name=row["station_name"],
+            week=row["week"],
+            etapp=row["etapp"],
+            planned_hours=float(row["planned_hours"]),
+            actual_hours=float(row["actual_hours"]),
+            variance=round(float(variance), 2)  # plain float rather than a NumPy scalar
+        )
+
+# =========================
+# LOAD WORKERS
+# =========================
+
+def load_workers(tx):
+
+    for _, row in workers_df.iterrows():
+
+        tx.run("""
+            MERGE (w:Worker {
+                id: $worker_id
+            })
+
+            SET w.name = $worker_name,
+                w.role = $role,
+                w.hours_per_week = $hours_per_week,
+                w.type = $worker_type
+
+            MERGE (s:Station {
+                code: $primary_station
+            })
+
+            MERGE (w)-[:WORKS_AT]->(s)
+        """,
+            worker_id=row["worker_id"],
+            worker_name=row["name"],
+            role=row["role"],
+            hours_per_week=float(row["hours_per_week"]),
+            worker_type=row["type"],
+            primary_station=row["primary_station"]  # a station code such as "011", so match on Station.code
+        )
+
+        # CAN COVER STATIONS
+
+        if pd.notna(row["can_cover_stations"]):
+
+            stations = str(
+                row["can_cover_stations"]
+            ).split(",")
+
+            for station in stations:
+
+                tx.run("""
+                    MERGE (w:Worker {
+                        id: $worker_id
+                    })
+
+                    MERGE (s:Station {
+                        code: $station_code
+                    })
+
+                    MERGE (w)-[:CAN_COVER]->(s)
+                """,
+                    worker_id=row["worker_id"],
+                    station_code=station.strip()  # the CSV lists station codes, not names
+                )
+
+        # CERTIFICATIONS
+
+        if pd.notna(row["certifications"]):
+
+            certs = str(
+                row["certifications"]
+            ).split(",")
+
+            for cert in certs:
+
+                tx.run("""
+                    MERGE (w:Worker {
+                        id: $worker_id
+                    })
+
+                    MERGE (c:Certification {
+                        name: $cert_name
+                    })
+
+                    MERGE (w)-[:HAS_CERTIFICATION]->(c)
+                """,
+                    worker_id=row["worker_id"],
+                    cert_name=cert.strip()
+                )
+
+        # EXTRA RELATIONSHIP TYPE
+
+        tx.run("""
+            MATCH (p:Project)
+            WITH p LIMIT 1
+
+            MERGE (w:Worker {
+                id: $worker_id
+            })
+
+            MERGE (w)-[:ASSIGNED_TO]->(p)
+        """,
+            worker_id=row["worker_id"]
+        )
+
+# =========================
+# LOAD CAPACITY
+# =========================
+
+def load_capacity(tx):
+
+    for _, row in capacity_df.iterrows():
+
+        tx.run("""
+            MERGE (w:Week {
+                name: $week
+            })
+
+            SET w.own_staff_count = $own_staff,
+                w.hired_staff_count = $hired_staff,
+                w.overtime_hours = $overtime,
+                w.total_capacity = $total_capacity,
+                w.total_planned = $total_planned,
+                w.deficit = $deficit
+        """,
+            week=row["week"],
+            own_staff=int(row["own_staff_count"]),
+            hired_staff=int(row["hired_staff_count"]),
+            overtime=float(row["overtime_hours"]),
+            total_capacity=float(row["total_capacity"]),
+            total_planned=float(row["total_planned"]),
+            deficit=float(row["deficit"])
+        )
+
+# =========================
+# EXECUTE LOAD
+# =========================
+
+with driver, driver.session() as session:  # closes the session, then the driver, on exit
+
+    session.execute_write(
+        create_constraints
+    )
+
+    session.execute_write(
+        load_production
+    )
+
+    session.execute_write(
+        load_workers
+    )
+
+    session.execute_write(
+        load_capacity
+    )
+
+print("Graph successfully loaded!")
\ No newline at end of file