A config-driven workflow engine for automating startup operations. Define multi-step pipelines in JSON — swap prompts, tools, guardrails, and models without code changes.
# 1. Install dependencies
uv sync
# 2. Set up environment
cp .env.example .env
# Add your OPENAI_API_KEY to .env
# 3. Run the server
uv run python main.py
Server starts at http://localhost:8000.
To run with Docker instead:
cp .env.example .env
# Add your OPENAI_API_KEY to .env
docker compose up --build
Submit a workflow job with POST /submit. The call returns immediately with a job ID.
curl -X POST http://localhost:8000/submit \
-H "Content-Type: application/json" \
-d '{
"input": "Where is my postcard? Order ORD-123",
"workflow": "support-routing",
"metadata": {"email": "customer@example.com"}
}'
Response:
{"job_id": "abc-123", "status": "pending"}
Poll for results. The status response includes the classification, the agent result, and the full audit log.
curl http://localhost:8000/status/abc-123
Health check:
curl http://localhost:8000/health
Request lifecycle:
POST /submit
|
v
Load config (JSON)
|
v
Background thread runs steps sequentially:
|
+-- [validation] -- deterministic input checks
+-- [pii_detection] -- regex-based PII redaction
+-- [llm] -- classifier (structured JSON output)
+-- [guardrail] -- rule-based gate (can escalate/fail early)
+-- [agent] -- LLM agent with tool calling
+-- [action] -- direct tool execution
|
v
Result saved to SQLite + audit log
|
v
GET /status/{job_id} --> client polls for result
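The submit-then-poll flow above can be scripted with only the standard library. This is a minimal sketch that assumes the JSON shapes shown in the curl examples. submit, get_status, and poll are hypothetical helpers, not part of the project. Only "pending" is documented as an in-progress status, so the extra "running" value is an assumption; any other status is treated as final.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8000"  # default address from the quick start

# Assumption: only "pending" appears in this README; "running" is hypothetical.
IN_PROGRESS = {"pending", "running"}


def submit(text: str, workflow: str, metadata: Optional[dict] = None) -> str:
    """POST /submit and return the job ID from the response."""
    payload = {"input": text, "workflow": workflow}
    if metadata is not None:
        payload["metadata"] = metadata
    req = urllib.request.Request(
        f"{BASE_URL}/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]


def get_status(job_id: str) -> dict:
    """GET /status/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/status/{job_id}") as resp:
        return json.load(resp)


def poll(
    job_id: str,
    fetch: Callable[[str], dict] = get_status,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Call fetch until the job leaves an in-progress status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") not in IN_PROGRESS:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)
```

With the server running, poll(submit("Where is my postcard? Order ORD-123", "support-routing")) returns the final status payload. Passing fetch as a parameter lets the loop be exercised without a live server.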
| Type | Purpose | Uses LLM? |
|---|---|---|
| validation | Input length/empty checks | No |
| pii_detection | Regex-based PII redaction (email, phone, SSN) | No |
| llm | Classification, extraction, scoring | Yes |
| guardrail | Rule-based gate on prior step output | No |
| agent | Tool-calling agent (retries + fallback) | Yes |
| action | Direct tool call with state-driven args | No |
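The step types above map naturally onto a registry keyed by type. The project layout describes runner.py as a "step-based workflow engine with handler registry". The sketch below shows one way three of the deterministic types could work: validation, pii_detection, and guardrail. HANDLERS, register, StopWorkflow, run_steps, and the state shape are hypothetical. The PII regexes are simplified US-style patterns, not the project's actual rules.

```python
import operator
import re
from typing import Callable

# Hypothetical registry: step type -> handler(state, config) -> output dict
HANDLERS: dict[str, Callable[[dict, dict], dict]] = {}


def register(step_type: str):
    """Decorator that adds a handler to HANDLERS under the given step type."""
    def decorator(fn):
        HANDLERS[step_type] = fn
        return fn
    return decorator


class StopWorkflow(Exception):
    """Raised by a step to end the run early with a final status."""
    def __init__(self, status: str, reason: str):
        super().__init__(reason)
        self.status = status
        self.reason = reason


@register("validation")
def validation(state: dict, config: dict) -> dict:
    text = state["input"]
    if not text.strip():
        raise StopWorkflow("failed", "empty input")
    if len(text) > config.get("max_input_length", 5000):
        raise StopWorkflow("failed", "input too long")
    return {"valid": True}


# Simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@register("pii_detection")
def pii_detection(state: dict, config: dict) -> dict:
    text, found = state["input"], []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label.upper()}]", text)
        if count:
            found.append(label)
    state["input"] = text  # later steps only see the redacted text
    return {"redacted": found}


OPERATORS = {"eq": operator.eq, "ne": operator.ne}  # "ne" is an assumption
ACTION_STATUS = {"escalate": "escalated", "fail": "failed"}  # "fail" is an assumption


@register("guardrail")
def guardrail(state: dict, config: dict) -> dict:
    # Compare one field of the previous step's output against a fixed value.
    actual = state["last_output"].get(config["field"])
    if OPERATORS[config["operator"]](actual, config["value"]):
        status = ACTION_STATUS[config.get("action", "escalate")]
        raise StopWorkflow(status, f"{config['field']} {config['operator']} {config['value']!r}")
    return {"passed": True}


def run_steps(steps: list[dict], text: str) -> tuple[str, list[dict]]:
    """Run steps in order; a StopWorkflow ends the run with that status."""
    state = {"input": text, "last_output": {}}
    audit = []
    for step in steps:
        handler = HANDLERS[step["type"]]
        try:
            output = handler(state, step.get("config", {}))
        except StopWorkflow as stop:
            audit.append({"step": step["id"], "status": stop.status, "reason": stop.reason})
            return stop.status, audit
        state["last_output"] = output
        audit.append({"step": step["id"], "status": "ok", "output": output})
    return "completed", audit
```

Handlers for llm, agent, and action would register the same way. They are omitted here because they need model or tool access.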
Workflows are defined in pipeline/configs/. Each config specifies an ordered list of steps:
{
"id": "support-routing",
"name": "Customer Support Router",
"steps": [
{"id": "validate", "type": "validation", "config": {"max_input_length": 5000}},
{"id": "classify", "type": "llm", "config": {"prompt": "...", "model": "gpt-4o-mini"}},
{"id": "check_critical", "type": "guardrail", "config": {"field": "priority", "operator": "eq", "value": "critical", "action": "escalate"}},
{"id": "handle", "type": "agent", "config": {"system_prompt": "...", "model": "gpt-4o", "tools": ["search_faq", "save_ticket"]}}
],
"guardrails": {"max_retries": 3, "fallback_model": "gpt-4o-mini"}
}
Two configs are included:
- support-routing.json: customer support ticket routing (Operations pillar)
- lead-qualify.json: inbound lead scoring and routing
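Because workflows are plain JSON, a loader can reject malformed configs before any step runs. This is a minimal sketch, assuming configs live at pipeline/configs/<workflow>.json, which matches the support-routing.json / "support-routing" pairing above. It also assumes the required keys are the ones in the example config. validate_config and load_config are hypothetical helpers, not the project's loader.

```python
import json
from pathlib import Path

# Step types from the table above.
KNOWN_STEP_TYPES = {"validation", "pii_detection", "llm", "guardrail", "agent", "action"}


def validate_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    errors = []
    for key in ("id", "name", "steps"):
        if key not in cfg:
            errors.append(f"missing top-level key {key!r}")
    seen_ids = set()
    for i, step in enumerate(cfg.get("steps", [])):
        step_id = step.get("id")
        if not step_id:
            errors.append(f"step {i}: missing 'id'")
        elif step_id in seen_ids:
            errors.append(f"step {i}: duplicate id {step_id!r}")
        else:
            seen_ids.add(step_id)
        if step.get("type") not in KNOWN_STEP_TYPES:
            errors.append(f"step {i}: unknown type {step.get('type')!r}")
    return errors


def load_config(workflow: str, config_dir: Path = Path("pipeline/configs")) -> dict:
    """Read <config_dir>/<workflow>.json and raise ValueError if it fails validation."""
    cfg = json.loads((config_dir / f"{workflow}.json").read_text())
    errors = validate_config(cfg)
    if errors:
        raise ValueError(f"invalid config {workflow!r}: " + "; ".join(errors))
    return cfg
```

Failing at load time means a typo in a step type surfaces as a clear error, rather than as a KeyError partway through a background job.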
Run the golden-test evals:
uv run python -m pipeline.eval
This runs 5 golden tests across both workflows, checking keyword presence, expected tool usage, and job completion.
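The three checks the eval performs can be expressed as a pure function over a finished job. This is a sketch with hypothetical names throughout. GoldenCase, check_case, the "completed" status, and the job fields result and audit_log (with a tool key on tool-call entries) are assumptions, not the project's actual fixture or response schema.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenCase:
    """One hypothetical golden test: an input plus what a correct run must show."""
    name: str
    workflow: str
    input: str
    expected_keywords: list[str] = field(default_factory=list)
    expected_tools: list[str] = field(default_factory=list)


def check_case(case: GoldenCase, job: dict) -> list[str]:
    """Return a list of failure messages; an empty list means the case passed."""
    failures = []
    # 1. Job completion
    if job.get("status") != "completed":
        failures.append(f"status is {job.get('status')!r}, expected 'completed'")
    # 2. Keyword presence (case-insensitive) in the final result
    result_text = str(job.get("result", "")).lower()
    for keyword in case.expected_keywords:
        if keyword.lower() not in result_text:
            failures.append(f"missing keyword {keyword!r}")
    # 3. Expected tool usage, read from audit entries that record a tool name
    used_tools = {entry["tool"] for entry in job.get("audit_log", []) if "tool" in entry}
    for tool in case.expected_tools:
        if tool not in used_tools:
            failures.append(f"expected tool {tool!r} was not called")
    return failures
```

Returning every failure, instead of stopping at the first one, makes a single eval run report all regressions in a case at once.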
pipeline/
main.py # FastAPI server (POST /submit, GET /status, GET /health)
runner.py # Step-based workflow engine with handler registry
tools.py # Tool implementations + TOOL_REGISTRY
db.py # SQLite job storage with JSON audit log
schemas.py # Pydantic models
eval.py # Golden test runner
configs/ # Workflow JSON configs
evals/ # Test fixtures
tests/ # Unit tests (31 tests, no LLM calls)
With the server running, execute the demo script to see all scenarios:
uv run python demo.py
Runs 4 scenarios: order tracking, PII redaction + refund, validation failure, and lead qualification. Shows classifications, results, and audit logs with color-coded output.
Run the unit tests:
uv run pytest pipeline/tests/test_core.py -v
The tests cover DB operations, tools, validation, PII detection, guardrails, the step handler registry, and config loading. All tests are deterministic (no API keys needed).
Set these in .env for LangSmith tracing (optional):
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=ls-...
All steps are also logged to the SQLite audit log with input, output, status, and latency.
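Per-step audit entries with input, output, status, and latency can be kept in a JSON column on the job row. The project layout describes db.py as "SQLite job storage with JSON audit log". This is a minimal sketch using the standard-library sqlite3 module. The jobs table schema, init_db, append_audit, and timed_step are assumptions, not the project's actual code.

```python
import json
import sqlite3
import time


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical jobs table whose audit_log column holds a JSON array."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, audit_log TEXT NOT NULL DEFAULT '[]')"
    )


def append_audit(conn: sqlite3.Connection, job_id: str, entry: dict) -> None:
    """Read the job's JSON log, append one entry, and write it back."""
    row = conn.execute("SELECT audit_log FROM jobs WHERE id = ?", (job_id,)).fetchone()
    log = json.loads(row[0])
    log.append(entry)
    conn.execute("UPDATE jobs SET audit_log = ? WHERE id = ?", (json.dumps(log), job_id))
    conn.commit()


def timed_step(conn, job_id, step_id, fn, step_input):
    """Run fn(step_input), record input/output/status/latency, and return the output.

    Exceptions are caught and recorded as status "error" instead of propagating.
    """
    start = time.perf_counter()
    try:
        output, status = fn(step_input), "ok"
    except Exception as exc:
        output, status = {"error": str(exc)}, "error"
    latency_ms = round((time.perf_counter() - start) * 1000, 2)
    append_audit(conn, job_id, {
        "step": step_id, "input": step_input, "output": output,
        "status": status, "latency_ms": latency_ms,
    })
    return output
```

With steps running in a background thread, each thread should open its own connection, because sqlite3 connections by default refuse use from a thread other than the one that created them.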