From dad7dffa408eda6dcdb2f76020ad81c8ae6a4072 Mon Sep 17 00:00:00 2001
From: JacobPEvans <20714140+JacobPEvans-personal@users.noreply.github.com>
Date: Fri, 26 Jun 2026 12:09:38 -0400
Subject: [PATCH] docs(ai): add AI orchestration stack + LLM observability
pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Document the self-hosted AI orchestration layer (n8n, Dify, LangFlow,
CrewAI, LangChain) and its LLM observability pipeline, generically and
ahead of implementation.
- ai-development/ai-orchestration-stack: the five tools, the services-vs-
libraries distinction, and a blunt "which to reach for" guide.
- observability/llm-observability: OpenLLMetry + OTEL GenAI emission, Cribl
as the single ingest hub fanning out to Langfuse (trace/cost/eval) and
Splunk (archival/SIEM); why Langfuse.
- Wire both into docs.json nav.
No homelab topology, VLANs, or addresses — concept and tool guidance only.
Assisted-by: Claude:claude-opus-4-8
Claude-Session: https://claude.ai/code/session_013KC8izFrMx32DVFduQp2tU
---
ai-development/ai-orchestration-stack.mdx | 111 ++++++++++++++++++++++
docs.json | 2 +
observability/llm-observability.mdx | 105 ++++++++++++++++++++
3 files changed, 218 insertions(+)
create mode 100644 ai-development/ai-orchestration-stack.mdx
create mode 100644 observability/llm-observability.mdx
diff --git a/ai-development/ai-orchestration-stack.mdx b/ai-development/ai-orchestration-stack.mdx
new file mode 100644
index 0000000..3f2b4d3
--- /dev/null
+++ b/ai-development/ai-orchestration-stack.mdx
@@ -0,0 +1,111 @@
+---
+title: "AI orchestration stack"
+description: "Five self-hosted tools for building and running LLM workflows — n8n, Dify, LangFlow, CrewAI, and LangChain — and the blunt rule for which one to reach for."
+tier: 2
+---
+
+> Three of these draw boxes and arrows. Two are Python libraries. Knowing which
+> is which is most of the decision.
+
+The homelab runs a self-hosted layer for building LLM workflows and agents on top
+of its [private model serving](/infrastructure/local-llm). Five tools cover the
+range from "connect an LLM to 400 other apps" to "write a multi-agent crew in
+Python." They overlap on purpose at the edges; the trick is to avoid deploying
+all five and then using none of them.
+
+## The five, at a glance
+
+| Tool | What it is | Shape | Reach for it when |
+| --- | --- | --- | --- |
+| **n8n** | General workflow automation, 400+ integrations | Visual builder (service) | You need to wire an LLM into other systems — email, calendars, webhooks, databases, SaaS APIs |
+| **Dify** | Full LLMOps platform — RAG, prompts, evals, agents, model routing | Visual builder (service) | You want a production AI app with retrieval, prompt versioning, and evaluation, low-code |
+| **LangFlow** | Visual node editor for LangChain graphs | Visual builder (service) | You want to prototype a chain by dragging nodes, then export it to Python |
+| **LangChain** | Library of composable LLM building blocks | Python library | You're writing code and want chains, tools, memory, and retrievers as primitives |
+| **CrewAI** | Framework for role-playing multi-agent "crews" | Python library | You're writing code and want agents with roles collaborating on a task |
+
+## Services vs libraries
+
+The single most useful distinction:
+
+- **n8n, Dify, and LangFlow are services.** They run as containers, expose a web
+ UI, and you build inside them. They are deployed and have an HTTPS front door.
+- **LangChain and CrewAI are libraries.** You do not "deploy" them — you `pip
+ install` them into application code. In this homelab they live together in one
+ Python execution box that runs the agent code people write against them.
+
+A common misread is treating LangChain/CrewAI as servers to stand up, or treating
+the three visual builders as interchangeable. They are not.
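
The distinction can be sketched in a few lines of standard-library Python. This is only an illustration, not how the homelab wires anything: the webhook URL is a hypothetical placeholder, and the request is built but never sent. A library is something the running process can import; a service is something it reaches over the network.

```python
import importlib.util
import urllib.request

# Hypothetical URL for a deployed visual builder; not taken from the source.
SERVICE_URL = "https://n8n.example.internal/webhook/summarize"


def library_available(module_name: str) -> bool:
    """A library is 'present' when this process can import it."""
    return importlib.util.find_spec(module_name) is not None


def build_service_call(url: str, payload: bytes) -> urllib.request.Request:
    """A service is reached over HTTP. This only builds the request; it does not send it."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_service_call(SERVICE_URL, b'{"text": "hello"}')
print(request.get_method(), request.full_url)
print("langchain importable:", library_available("langchain"))
```

Whether `langchain` shows up as importable depends on the environment the snippet runs in, which is the point: it is a property of the Python environment, not of anything deployed.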
+
+## How they fit together
+
+{/* Shape: layered. Builders + libs sit on serving, all trace to observability. */}
+
+```mermaid
+%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
+flowchart TB
+ subgraph build [Build & run]
+ N8N([n8n])
+ DIFY([Dify])
+ LF([LangFlow])
+ AX([agent code<br/>LangChain · CrewAI])
+ end
+ MODELS([Model providers<br/>local Ollama · external APIs])
+ OBS([Observability<br/>OTEL traces])
+
+ N8N --> MODELS
+ DIFY --> MODELS
+ LF --> MODELS
+ AX --> MODELS
+ N8N -.-> OBS
+ DIFY -.-> OBS
+ LF -.-> OBS
+ AX -.-> OBS
+
+ classDef svc fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
+ classDef core fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
+ classDef obs fill:#102937,stroke:#F4EFE6,stroke-width:1.5px,color:#F4EFE6;
+
+ class N8N,DIFY,LF,AX svc
+ class MODELS core
+ class OBS obs
+
+ linkStyle 4,5,6,7 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
+```
+
+Solid edges are model calls; coral dashed edges are telemetry. Every tool is
+configured with its **own** model provider, per its standard install (the local
+OpenAI-compatible endpoint, an external API, or both), and is never forced onto a
+shared backend that belongs to another stack. Each tool's calls are instrumented,
+and the traces flow to the [LLM observability](/observability/llm-observability)
+pipeline.
+
+## Picking one — the blunt version
+
+- One tool only, and it has to be one → **Dify**. It covers RAG, prompts, evals,
+ and agents without a separate automation tool.
+- Already automating with workflows → keep **n8n** and add **Dify** as the AI
+ layer beside it.
+- Want to sketch a chain visually and walk away with Python → **LangFlow**.
+- Writing application code → **LangChain** for primitives, **CrewAI** for
+ role-based agent teams. Both are imports, not installs-as-a-service.
+
+LangFlow overlaps Dify's visual builder; it earns its place only for
+lightweight, Python-export prototyping. If that workflow isn't yours, you can run
+the other four and never miss it.
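
The rule above can be written down as a small function. This is a sketch under assumptions: the parameter names and the order in which the questions are asked are illustrative choices, not something the guide prescribes.

```python
def pick_tool(
    *,
    writing_code: bool = False,
    role_based_agents: bool = False,
    want_python_export: bool = False,
    already_on_n8n: bool = False,
) -> list[str]:
    """Return the tool(s) the blunt rule points at for one situation."""
    if writing_code:
        # Libraries: CrewAI for role-based agent teams, LangChain for primitives.
        return ["CrewAI"] if role_based_agents else ["LangChain"]
    if want_python_export:
        # Sketch a chain visually, then leave with Python.
        return ["LangFlow"]
    if already_on_n8n:
        # Keep the automation tool and add Dify as the AI layer beside it.
        return ["n8n", "Dify"]
    # One tool only: Dify.
    return ["Dify"]


print(pick_tool())
print(pick_tool(already_on_n8n=True))
print(pick_tool(writing_code=True, role_based_agents=True))
```

The default, with every flag left `False`, falls through to Dify, matching the "one tool only" rule.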
+
+## Where to go next
+
+
+
+ The private GPU model serving these tools call.
+
+
+ How every LLM call gets traced, costed, and evaluated.
+
+
+ Why the compose-based tools run as Docker-in-LXC.
+
+
+ The configuration tier that deploys these app payloads.
+
+
diff --git a/docs.json b/docs.json
index 7814ddf..0f85ffa 100644
--- a/docs.json
+++ b/docs.json
@@ -171,6 +171,7 @@
"group": "AI Development",
"pages": [
"ai-development/overview",
+ "ai-development/ai-orchestration-stack",
"ai-development/repo-boundaries",
"ai-development/ai-assistant-instructions",
"ai-development/claude-code-plugins",
@@ -232,6 +233,7 @@
"group": "Observability",
"pages": [
"observability/overview",
+ "observability/llm-observability",
{
"group": "Repos",
"pages": [
diff --git a/observability/llm-observability.mdx b/observability/llm-observability.mdx
new file mode 100644
index 0000000..afa5ecb
--- /dev/null
+++ b/observability/llm-observability.mdx
@@ -0,0 +1,105 @@
+---
+title: "LLM observability"
+description: "Every LLM call from the orchestration stack emits OpenTelemetry, routes through Cribl, and lands in both Langfuse (trace UX) and Splunk (archival + SIEM)."
+tier: 1
+---
+
+> If a model was called, there's a trace — and you can see what it cost.
+
+The [AI-coding-tool pipeline](/observability/overview) traces the IDEs. This is
+its sibling for the [AI orchestration stack](/ai-development/ai-orchestration-stack):
+n8n, Dify, LangFlow, and the agent code emit OpenTelemetry for every LLM call,
+and the same Cribl tier routes it — this time to two sinks.
+
+## Emitting traces — OpenLLMetry + OTEL GenAI
+
+Apps are instrumented with [OpenLLMetry](https://github.com/traceloop/openllmetry)
+(the Traceloop SDK), which wraps LLM providers, vector stores, and frameworks
+(LangChain, CrewAI) and emits spans following OpenTelemetry's
+[GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+Those conventions matured in 2026, so framework-native spans and SDK-emitted
+spans now line up on the same schema: prompt, completion, model, and token
+counts, with latency read from span timing and cost derived downstream from the
+model and token counts.
+
+The spans leave the app over OTLP (gRPC `4317` / HTTP `4318`) pointed at the
+collector, **not** at any one backend. Keeping the emit target on the pipeline —
+not the trace store — is what lets the same telemetry reach more than one place.
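
As a rough sketch of "point at the collector, not a backend", the standard OpenTelemetry exporter environment variables can be derived from a collector host and a protocol. The host and service names below are hypothetical, plaintext `http://` is assumed, and whether a given SDK or tool honors these variables is something to confirm per tool. The attribute names in the sample span come from the GenAI semantic conventions; the values are made up.

```python
# Default OTLP ports named in the text above.
OTLP_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def otlp_env(collector_host: str, service_name: str, protocol: str = "grpc") -> dict[str, str]:
    """Build exporter settings that target the collector rather than a trace store."""
    if protocol not in OTLP_PORTS:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{collector_host}:{OTLP_PORTS[protocol]}",
    }


# Illustrative span attributes using GenAI convention names; values are invented.
sample_span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "example-model",
    "gen_ai.usage.input_tokens": 120,
    "gen_ai.usage.output_tokens": 48,
}

env = otlp_env("otel-collector.example.internal", "agent-code", "http/protobuf")
for key, value in env.items():
    print(f"{key}={value}")
```

Because the endpoint names the collector, adding or swapping a sink later does not touch application configuration.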
+
+## Cribl is the hub
+
+A single collector tier owns ingest and fan-out. Cribl Edge runs **native
+OpenTelemetry sources**, one per signal type on its own port, so it can route by
+type without parsing payloads. From there it forks:
+
+{/* Shape: fan-out. Apps -> Cribl -> two sinks. */}
+
+```mermaid
+%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
+flowchart LR
+ Apps([Orchestration stack<br/>OpenLLMetry])
+ Cribl([Cribl Edge<br/>native OTEL sources])
+ LF([Langfuse<br/>trace · cost · eval])
+ SP[(Splunk<br/>archival · SIEM)]
+
+ Apps -->|OTLP per type| Cribl
+ Cribl -->|traces| LF
+ Cribl -->|all signals| SP
+
+ classDef app fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
+ classDef hub fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
+ classDef sink fill:#102937,stroke:#F4EFE6,stroke-width:1.5px,color:#F4EFE6;
+
+ class Apps app
+ class Cribl hub
+ class LF,SP sink
+
+ linkStyle 0 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
+ linkStyle 1,2 stroke:#4FB3A9,stroke-width:2px;
+```
+
+- **Langfuse** gets the traces. It is the LLM-native view: trace waterfalls per
+ request, token cost, prompt and completion inspection, plus datasets, evals,
+ and prompt versioning.
+- **Splunk** gets everything, for archival and correlation with the rest of the
+ homelab's telemetry — the same indexer the AI-coding pipeline already feeds.
+
+Apps never talk to a trace store directly, and they never reach across into the
+monitoring tier — they emit to the collector, and the collector decides where it
+goes. One ingest point, two sinks, no second collector to run.
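
The fan-out rule is small enough to model as a lookup table. This is a plain Python model of the routing decision, not Cribl configuration; the sink names and the three signal types are assumptions made for illustration.

```python
# Routing table: traces go to both sinks, every other signal goes to Splunk only.
ROUTES: dict[str, tuple[str, ...]] = {
    "traces": ("langfuse", "splunk"),
    "metrics": ("splunk",),
    "logs": ("splunk",),
}


def sinks_for(signal_type: str) -> tuple[str, ...]:
    """Return the sinks a signal type is forwarded to by the single collector tier."""
    try:
        return ROUTES[signal_type]
    except KeyError:
        raise ValueError(f"unknown signal type: {signal_type!r}") from None


for signal in ("traces", "metrics", "logs"):
    print(signal, "->", ", ".join(sinks_for(signal)))
```

Routing by signal type alone is what lets the collector fork without inspecting payloads: the decision depends only on which source the data arrived on.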
+
+## Why Langfuse
+
+| Criterion | Langfuse |
+| --- | --- |
+| License | MIT — self-host with no feature gates |
+| Ingestion | Native OTLP, GenAI-convention aware |
+| Built for | LLM apps — traces, cost, evals, prompt management |
+| Footprint | Web + worker + Postgres + ClickHouse + Redis + object storage |
+
+[Laminar](https://laminar.sh/) (Apache-2.0) is the runner-up: lighter, tilted
+toward long-running agent debugging. Arize Phoenix is capable but ships under the
+Elastic License 2.0, which is source-available rather than open source and
+restricts offering the software as a hosted service.
+
+
+Langfuse keeps its trace-of-record (relational + analytical) on durable local
+storage; its blob store points at the homelab object store. Backend choices like
+the vector store and model provider are made **per tool, per that tool's own
+standard** — never by forcing a shared backend across unrelated stacks.
+
+
+## Where to go next
+
+
+
+ The tools whose calls this pipeline traces.
+
+
+ The AI-coding-tool side of the same Cribl → Splunk spine.
+
+
+ Deploys Langfuse and the Cribl OTEL sources.
+
+
+ The models being traced.
+
+