KingsGambitLab/business-tech-labs

From Business Problem to Tech Spec

A field lab that teaches the actual skill of going from a vague founder DM ("approvals are killing us") to a buildable tech spec — with a real tools catalog, real tradeoffs, and an ADR for the biggest decision.

The lab focuses on pattern recognition: given a problem, which architectural shape, which patterns, which tools, and what skills do you need? It's the senior-engineer judgment call, not "fill out this template."

What's here

source/
  guide.md                                  Source guide (markdown)
  Problem_to_Tech_Spec_Teaching_Guide.docx  Generated from guide.md
app/
  index.html        Single-page lab (no backend, no build step)
  styles.css        All styling
  content.js        8 stages, tech catalog, Stride case, per-stage evaluators
  app.js            Renderer, evaluator-driven feedback loop, modes, persistence

Run

cd app
python3 -m http.server 5187
open http://127.0.0.1:5187

No npm, no build, no dependencies. Just a static page.

Two modes

Lab mode (coached). After each submission, an evaluator runs and produces hints if your answer isn't yet at the maturity bar; you retry until it is. The model answer and sensitivity scenarios are then revealed.

Assessment mode (graded). No hints. Submit each stage once and move on. At the end you get a graded report against the rubric.

Both modes share the same 8 stages.

The 8 stages

| # | Stage | What you produce | New skill taught |
|---|-------|------------------|------------------|
| 0 | Receive the brief | 5 clarifying questions | Resist designing; ask outcome/users/non-goals first |
| 1 | Problem brief | Measurable problem statement | Translate "feels slow" into numbers |
| 2 | Recognize the shape | Workflow / RAG / CRUD / search / realtime / batch / analytics / agent | Pattern recognition (senior-engineer skill) |
| 3 | Constraints → architectural responses | Map 8 constraints to 8 patterns | Constraint-driven design |
| 4 | Pick the stack, layer by layer | 4 layers × real tools (Trigger.dev / Postgres / Slack+email / Sentry) with rationale citing requirements | Real-world tool selection |
| 5 | Skills gap + spikes | Team skills inventory; 2 time-boxed spikes for highest-risk gaps | Honest self-assessment, de-risking |
| 6 | NFRs as measurable scenarios | Stimulus / environment / response / measure | Load-testable contracts |
| 7 | Tech spec + ADR | TL;DR + 5-field ADR for biggest decision | Capturing decisions for future-you |

Two playable cases — same 8-stage ladder

Stride (workflow shape) — 4 stack layers, 13 tools

Series-A B2B expense-management startup. The CEO DMs: "Approvals are killing us..."

You walk it up to: shape (workflow), constraints (audit / scale / cost / human-in-the-loop, HITL), stack (Trigger.dev + Postgres events + Slack+email + Sentry), spikes (2 days for HITL signal, 1 day for delivery SLO), NFR (30s p95 delivery at 100/10min), TL;DR + ADR for the workflow-engine choice.
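
To make "NFRs as measurable scenarios" concrete, here is a minimal sketch of how the Stride delivery NFR might be written as a stimulus / environment / response / measure record and checked against measured latencies. The field names, the scenario wording (including reading "100/10min" as 100 approval requests in 10 minutes), and the nearest-rank percentile method are assumptions made for illustration; none of them is taken from the lab's content.js.

```javascript
// Hypothetical encoding of the Stride NFR as a four-part scenario.
// Only the 30 s p95 target and the "100/10min" load figure come from the case;
// the wording of each field is assumed.
const strideDeliveryNfr = {
  stimulus: "100 approval requests arrive within 10 minutes",
  environment: "normal production operation",
  response: "each approver receives a notification",
  measure: { percentile: 95, maxSeconds: 30 },
};

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Compare the observed percentile with the scenario's measure.
function meetsNfr(nfr, latenciesSeconds) {
  const observed = percentile(latenciesSeconds, nfr.measure.percentile);
  return { observed, pass: observed <= nfr.measure.maxSeconds };
}
```

Feeding `meetsNfr` the delivery latencies from a load test turns the NFR into a pass/fail check, which is what makes it a load-testable contract rather than an aspiration.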

Ember (RAG shape) — 6 stack layers, 23 tools

Seed-stage AI customer-support startup. The CEO DMs: "Customers complain that the bot gives wrong pricing answers..."

You walk the same ladder, but the answers are entirely different: shape (RAG), constraints (hallucination / freshness / latency / cost / citations / PII), stack (pgvector + OpenAI 3-large + Claude Sonnet + structural+hybrid+rerank chunking + LiteLLM + Langfuse), spikes (3 days for chunking lift, 2 days for nightly eval pipeline), NFR (≥98% on golden set + <3s p95 + $0.05/answer), TL;DR + ADR for the chunking/retrieval choice.
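
The three-part Ember NFR can be read as a release gate. The sketch below is one possible way to express it, assuming a nightly eval produces a summary object; the `metrics` shape and function name are hypothetical, and treating $0.05/answer as an upper bound is an interpretation of the case, not something the lab states.

```javascript
// Thresholds from the Ember NFR. Reading $0.05/answer as a ceiling is an assumption.
const emberThresholds = {
  minGoldenAccuracy: 0.98,
  maxP95Seconds: 3,
  maxCostPerAnswerUsd: 0.05,
};

// `metrics` is a hypothetical nightly-eval summary:
// { goldenCorrect, goldenTotal, p95Seconds, costPerAnswerUsd }
function emberGate(metrics, t = emberThresholds) {
  const failures = [];
  const accuracy =
    metrics.goldenTotal > 0 ? metrics.goldenCorrect / metrics.goldenTotal : 0;
  if (accuracy < t.minGoldenAccuracy) failures.push("golden-set accuracy");
  // The NFR says "<3s", so exactly 3 s counts as a failure.
  if (!(metrics.p95Seconds < t.maxP95Seconds)) failures.push("p95 latency");
  if (metrics.costPerAnswerUsd > t.maxCostPerAnswerUsd) failures.push("cost per answer");
  return { pass: failures.length === 0, failures };
}
```

Returning the list of failed criteria, rather than a bare boolean, keeps the gate useful for diagnosis: a retrieval regression and a cost overrun call for different fixes.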

The AI-specific coaching catches anti-patterns that bite real teams: jumping to "let's swap LLMs" before diagnosing the failure mode (80% of RAG failures are retrieval, not generation), shipping naive 500-token chunks, skipping evals, picking Voyage embeddings for general-domain content where OpenAI 3-large is cheaper at parity.

Tools catalog (real 2026 options with fit / fail / cost / risk)

Workflow shape (Stride): Trigger.dev · Inngest · Temporal · DIY Postgres+cron; Postgres-events · DynamoDB+Streams · Kafka; email-only · Slack+email pluggable · all-channels; logs-only · Sentry+Grafana/Datadog · Honeycomb/OTel.

RAG shape (Ember):

  • Retrieval: pgvector · Pinecone · Qdrant · Weaviate
  • Embedding: OpenAI text-embedding-3-large · Voyage voyage-3-large · Google text-embedding-005 · BGE-M3 self-hosted
  • LLM: Claude Sonnet 4.6 · GPT-5.x · tiered routing · open-weight via vLLM
  • Chunking: naive 500-token · structural · structural+hybrid+rerank
  • Gateway: direct SDKs · LiteLLM · Portkey · OpenRouter
  • Evals: eyeball · Langfuse · Braintrust · Arize Phoenix

Each option carries "fits when / fails when / cost / risk" — so picking is a thinking exercise, not a memory test.
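
As an illustration of the four-field structure, a catalog entry might look like the sketch below. The README does not show the actual schema in content.js, so the property names are assumptions, and the field values are left as placeholders rather than guessed.

```javascript
// Hypothetical catalog entry shape; the real schema in content.js may differ.
const REQUIRED_FIELDS = ["fitsWhen", "failsWhen", "cost", "risk"];

// Returns the names of any tradeoff fields that are missing or blank.
function missingFields(option) {
  return REQUIRED_FIELDS.filter(
    (field) => typeof option[field] !== "string" || option[field].trim() === ""
  );
}

// Placeholder values only: the real text lives in the lab's catalog.
const exampleEntry = {
  layer: "Retrieval",
  name: "pgvector",
  fitsWhen: "(see catalog)",
  failsWhen: "(see catalog)",
  cost: "(see catalog)",
  risk: "", // deliberately blank to show what the check reports
};

console.log(missingFields(exampleEntry)); // [ 'risk' ]
```

A check like this is one way to keep every option honest: an entry with no stated failure mode or risk cannot be compared fairly against its alternatives.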

Feedback design

The lab puts the onus on the learner. It scaffolds the structure of an answer (fields, dropdowns, drag-and-drop) but you bring the judgment. Empty text boxes are drop-off traps; per-layer pickers and bucketed prompts keep you moving while making you think.

On submit, the evaluator returns hints at two severity levels:

  • MUST — blocks pass. You retry.
  • TIP — soft suggestion. You can still pass but you'll know what to improve.
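
A minimal sketch of how the two severity levels could drive the pass decision, using stage 0 (five clarifying questions) as the example. The function names, hint messages, and the specific rules are illustrative assumptions; the lab's real per-stage evaluators live in content.js and are not reproduced here.

```javascript
// Hypothetical stage-0 evaluator: returns a list of { severity, message } hints.
function evaluateClarifyingQuestions(questions) {
  const hints = [];
  const asked = questions.map((q) => q.trim()).filter((q) => q.length > 0);

  // Assumed MUST rule: the stage asks for 5 clarifying questions.
  if (asked.length < 5) {
    hints.push({
      severity: "MUST",
      message: `Ask 5 clarifying questions (got ${asked.length}).`,
    });
  }

  // Assumed TIP rule: nudge toward non-goals without blocking the pass.
  if (!asked.some((q) => /non-goal|out of scope/i.test(q))) {
    hints.push({ severity: "TIP", message: "Consider asking about non-goals." });
  }
  return hints;
}

// A submission passes when no MUST hint remains; TIP hints never block.
function passes(hints) {
  return !hints.some((hint) => hint.severity === "MUST");
}
```

Keeping the pass rule separate from the evaluators means each stage only has to produce hints, while the blocking behaviour is defined in one place.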

The reveal panels (Lab mode only) show three things:

  1. Model answer vs. your final attempt
  2. Sensitivity: how the model picks would change if a constraint changed
  3. Wrong-but-common alternatives with the failure mode named

Persistence is in localStorage; reset wipes the session.
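
For readers unfamiliar with the pattern, session persistence of this kind can be sketched as below. The storage key and function names are hypothetical, not taken from app.js. The helpers accept any object with `getItem`, `setItem`, and `removeItem`, so in a browser you would pass `window.localStorage`, while the in-memory stand-in lets the sketch run elsewhere.

```javascript
// Hypothetical storage key; the key actually used by app.js is not shown in the README.
const SESSION_KEY = "lab-session";

function saveSession(storage, session) {
  storage.setItem(SESSION_KEY, JSON.stringify(session));
}

// Returns null when nothing is stored or the stored value is not valid JSON.
function loadSession(storage) {
  const raw = storage.getItem(SESSION_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw);
  } catch {
    return null;
  }
}

// Reset wipes the stored session.
function resetSession(storage) {
  storage.removeItem(SESSION_KEY);
}

// In-memory stand-in exposing the same three methods as localStorage.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
    removeItem: (key) => { data.delete(key); },
  };
}
```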

About

AI Solution Architect Lab — teaching FDEs to translate business problems into defensible AI architectures via stakeholder chat + canvas + adversarial defense, all graded by Claude judges. Live at labs.scaler.com/business-tech-labs.
