diff --git a/ai-development/ai-orchestration-stack.mdx b/ai-development/ai-orchestration-stack.mdx
new file mode 100644
index 0000000..3f2b4d3
--- /dev/null
+++ b/ai-development/ai-orchestration-stack.mdx
@@ -0,0 +1,111 @@
+---
+title: "AI orchestration stack"
+description: "Five self-hosted tools for building and running LLM workflows — n8n, Dify, LangFlow, CrewAI, and LangChain — and the blunt rule for which one to reach for."
+tier: 2
+---
+
+> Three of these draw boxes and arrows. Two are Python libraries. Knowing which
+> is which is most of the decision.
+
+The homelab runs a self-hosted layer for building LLM workflows and agents on top
+of its [private model serving](/infrastructure/local-llm). Five tools cover the
+range from "connect an LLM to 400 other apps" to "write a multi-agent crew in
+Python." They overlap on purpose at the edges; the trick is not deploying all
+five and using none of them.
+
+## The five, at a glance
+
+| Tool | What it is | Shape | Reach for it when |
+| --- | --- | --- | --- |
+| **n8n** | General workflow automation, 400+ integrations | Visual builder (service) | You need to wire an LLM into other systems — email, calendars, webhooks, databases, SaaS APIs |
+| **Dify** | Full LLMOps platform — RAG, prompts, evals, agents, model routing | Visual builder (service) | You want a production AI app with retrieval, prompt versioning, and evaluation, low-code |
+| **LangFlow** | Visual node editor for LangChain graphs | Visual builder (service) | You want to prototype a chain by dragging nodes, then export it to Python |
+| **LangChain** | Library of composable LLM building blocks | Python library | You're writing code and want chains, tools, memory, and retrievers as primitives |
+| **CrewAI** | Framework for role-playing multi-agent "crews" | Python library | You're writing code and want agents with roles collaborating on a task |
+
+## Services vs libraries
+
+The single most useful distinction:
+
+- **n8n, Dify, and LangFlow are services.** They run as containers, expose a web
+  UI, and you build inside them. They are deployed and have an HTTPS front door.
+- **LangChain and CrewAI are libraries.** You do not "deploy" them — you `pip
+  install` them into application code. In this homelab they live together in one
+  Python execution box that runs the agent code people write against them.
+
+A common misread is treating LangChain/CrewAI as servers to stand up, or treating
+the three visual builders as interchangeable. They are not.
+
+## How they fit together
+
+{/* Shape: layered. Builders + libs sit on serving, all trace to observability. */}
+
+```mermaid
+%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
+flowchart TB
+  subgraph build [Build & run]
+    N8N([n8n])
+    DIFY([Dify])
+    LF([LangFlow])
+    AX([agent code<br/>LangChain · CrewAI])
+  end
+  MODELS([Model providers<br/>local Ollama · external APIs])
+  OBS([Observability<br/>OTEL traces])
+
+  N8N --> MODELS
+  DIFY --> MODELS
+  LF --> MODELS
+  AX --> MODELS
+  N8N -.-> OBS
+  DIFY -.-> OBS
+  LF -.-> OBS
+  AX -.-> OBS
+
+  classDef svc fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
+  classDef core fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
+  classDef obs fill:#102937,stroke:#F4EFE6,stroke-width:1.5px,color:#F4EFE6;
+
+  class N8N,DIFY,LF,AX svc
+  class MODELS core
+  class OBS obs
+
+  linkStyle 4,5,6,7 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
+```
+
+Solid edges are model calls; coral dashed edges are telemetry. Every tool is
+configured to its **own** model provider per its standard install — the local
+OpenAI-compatible endpoint, an external API, or both — never forced onto a shared
+backend that belongs to another stack. Each tool's call is instrumented, and the
+traces fan out to the [LLM observability](/observability/llm-observability)
+pipeline.
+
+## Picking one — the blunt version
+
+- One tool only, and it has to be one → **Dify**. It covers RAG, prompts, evals,
+  and agents without a separate automation tool.
+- Already automating with workflows → keep **n8n** and add **Dify** as the AI
+  layer beside it.
+- Want to sketch a chain visually and walk away with Python → **LangFlow**.
+- Writing application code → **LangChain** for primitives, **CrewAI** for
+  role-based agent teams. Both are imports, not installs-as-a-service.
+
+LangFlow overlaps Dify's visual builder; it earns its place only for
+lightweight, Python-export prototyping. If that workflow isn't yours, you can run
+the other four and never miss it.
+
+## Where to go next
+
+
+
+    The private GPU model serving these tools call.
+
+
+    How every LLM call gets traced, costed, and evaluated.
+
+
+    Why the compose-based tools run as Docker-in-LXC.
+
+
+    The configuration tier that deploys these app payloads.
+
+
diff --git a/docs.json b/docs.json
index 7814ddf..0f85ffa 100644
--- a/docs.json
+++ b/docs.json
@@ -171,6 +171,7 @@
 "group": "AI Development",
 "pages": [
   "ai-development/overview",
+  "ai-development/ai-orchestration-stack",
   "ai-development/repo-boundaries",
   "ai-development/ai-assistant-instructions",
   "ai-development/claude-code-plugins",
@@ -232,6 +233,7 @@
 "group": "Observability",
 "pages": [
   "observability/overview",
+  "observability/llm-observability",
   {
     "group": "Repos",
     "pages": [
diff --git a/observability/llm-observability.mdx b/observability/llm-observability.mdx
new file mode 100644
index 0000000..afa5ecb
--- /dev/null
+++ b/observability/llm-observability.mdx
@@ -0,0 +1,105 @@
+---
+title: "LLM observability"
+description: "Every LLM call from the orchestration stack emits OpenTelemetry, routes through Cribl, and lands in both Langfuse (trace UX) and Splunk (archival + SIEM)."
+tier: 1
+---
+
+> If a model was called, there's a trace — and you can see what it cost.
+
+The [AI-coding-tool pipeline](/observability/overview) traces the IDEs. This is
+its sibling for the [AI orchestration stack](/ai-development/ai-orchestration-stack):
+n8n, Dify, LangFlow, and the agent code emit OpenTelemetry for every LLM call,
+and the same Cribl tier routes it — this time to two sinks.
+
+## Emitting traces — OpenLLMetry + OTEL GenAI
+
+Apps are instrumented with [OpenLLMetry](https://github.com/traceloop/openllmetry)
+(the Traceloop SDK), which wraps LLM providers, vector stores, and frameworks
+(LangChain, CrewAI) and emits spans following OpenTelemetry's
+[GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+Those conventions matured in 2026, so framework-native spans and SDK-emitted
+spans now line up on the same schema — prompt, completion, model, token counts,
+latency, cost.
+
+The spans leave the app over OTLP (gRPC `4317` / HTTP `4318`) pointed at the
+collector, **not** at any one backend. Keeping the emit target on the pipeline —
+not the trace store — is what lets the same telemetry reach more than one place.
+
+## Cribl is the hub
+
+A single collector tier owns ingest and fan-out. Cribl Edge runs **native
+OpenTelemetry sources**, one per signal type on its own port, so it can route by
+type without parsing payloads. From there it forks:
+
+{/* Shape: fan-out. Apps -> Cribl -> two sinks. */}
+
+```mermaid
+%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
+flowchart LR
+  Apps([Orchestration stack<br/>OpenLLMetry])
+  Cribl([Cribl Edge<br/>native OTEL sources])
+  LF([Langfuse<br/>trace · cost · eval])
+  SP[(Splunk<br/>archival · SIEM)]
+
+  Apps -->|OTLP per type| Cribl
+  Cribl -->|traces| LF
+  Cribl -->|all signals| SP
+
+  classDef app fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
+  classDef hub fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
+  classDef sink fill:#102937,stroke:#F4EFE6,stroke-width:1.5px,color:#F4EFE6;
+
+  class Apps app
+  class Cribl hub
+  class LF,SP sink
+
+  linkStyle 0 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
+  linkStyle 1,2 stroke:#4FB3A9,stroke-width:2px;
+```
+
+- **Langfuse** gets the traces. It is the LLM-native view: trace waterfalls per
+  request, token cost, prompt and completion inspection, plus datasets, evals,
+  and prompt versioning.
+- **Splunk** gets everything, for archival and correlation with the rest of the
+  homelab's telemetry — the same indexer the AI-coding pipeline already feeds.
+
+Apps never talk to a trace store directly, and they never reach across into the
+monitoring tier — they emit to the collector, and the collector decides where it
+goes. One ingest point, two sinks, no second collector to run.
+
+## Why Langfuse
+
+| Criterion | Langfuse |
+| --- | --- |
+| License | MIT — self-host with no feature gates |
+| Ingestion | Native OTLP, GenAI-convention aware |
+| Built for | LLM apps — traces, cost, evals, prompt management |
+| Footprint | Web + worker + Postgres + ClickHouse + Redis + object storage |
+
+[Laminar](https://laminar.sh/) (Apache-2.0) is the runner-up — lighter, tilted
+toward long-running agent debugging. Arize Phoenix is capable but ships under the
+Elastic License, which gates self-host use.
+
+
+Langfuse keeps its trace-of-record (relational + analytical) on durable local
+storage; its blob store points at the homelab object store. Backend choices like
+the vector store and model provider are made **per tool, per that tool's own
+standard** — never by forcing a shared backend across unrelated stacks.
+
+
+## Where to go next
+
+
+
+    The tools whose calls this pipeline traces.
+
+
+    The AI-coding-tool side of the same Cribl → Splunk spine.
+
+
+    Deploys Langfuse and the Cribl OTEL sources.
+
+
+    The models being traced.
+