Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 111 additions & 0 deletions ai-development/ai-orchestration-stack.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
---
title: "AI orchestration stack"
description: "Five self-hosted tools for building and running LLM workflows — n8n, Dify, LangFlow, CrewAI, and LangChain — and the blunt rule for which one to reach for."
tier: 2
---

> Three of these draw boxes and arrows. Two are Python libraries. Knowing which
> is which is most of the decision.

The homelab runs a self-hosted layer for building LLM workflows and agents on top
of its [private model serving](/infrastructure/local-llm). Five tools cover the
range from "connect an LLM to 400 other apps" to "write a multi-agent crew in
Python." They overlap on purpose at the edges; the trick is not deploying all
five and using none of them.

## The five, at a glance

| Tool | What it is | Shape | Reach for it when |
| --- | --- | --- | --- |
| **n8n** | General workflow automation, 400+ integrations | Visual builder (service) | You need to wire an LLM into other systems — email, calendars, webhooks, databases, SaaS APIs |
| **Dify** | Full LLMOps platform — RAG, prompts, evals, agents, model routing | Visual builder (service) | You want a production AI app with retrieval, prompt versioning, and evaluation, low-code |
| **LangFlow** | Visual node editor for LangChain graphs | Visual builder (service) | You want to prototype a chain by dragging nodes, then export it to Python |
| **LangChain** | Library of composable LLM building blocks | Python library | You're writing code and want chains, tools, memory, and retrievers as primitives |
| **CrewAI** | Framework for role-playing multi-agent "crews" | Python library | You're writing code and want agents with roles collaborating on a task |

## Services vs libraries

The single most useful distinction:

- **n8n, Dify, and LangFlow are services.** They run as containers, expose a web
UI, and you build inside them. They are deployed and have an HTTPS front door.
- **LangChain and CrewAI are libraries.** You do not "deploy" them — you `pip
install` them into application code. In this homelab they live together in one
Python execution box that runs the agent code people write against them.

A common misread is treating LangChain/CrewAI as servers to stand up, or treating
the three visual builders as interchangeable. They are not.

## How they fit together

{/* Shape: layered. Builders + libs sit on serving, all trace to observability. */}

```mermaid
%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
flowchart TB
subgraph build [Build & run]
N8N([n8n])
DIFY([Dify])
LF([LangFlow])
AX([agent code<br/>LangChain · CrewAI])
end
MODELS([Model providers<br/>local Ollama · external APIs])
OBS([Observability<br/>OTEL traces])

N8N --> MODELS
DIFY --> MODELS
LF --> MODELS
AX --> MODELS
N8N -.-> OBS
DIFY -.-> OBS
LF -.-> OBS
AX -.-> OBS

classDef svc fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
classDef core fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
classDef obs fill:#102937,stroke:#F4EFE6,stroke-width:1.5px,color:#F4EFE6;

class N8N,DIFY,LF,AX svc
class MODELS core
class OBS obs

linkStyle 4,5,6,7 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
```

Solid edges are model calls; coral dashed edges are telemetry. Every tool is
configured to its **own** model provider per its standard install — the local
OpenAI-compatible endpoint, an external API, or both — never forced onto a shared
backend that belongs to another stack. Each tool's call is instrumented, and the
traces fan out to the [LLM observability](/observability/llm-observability)
pipeline.

## Picking one — the blunt version

- One tool only, and it has to be one → **Dify**. It covers RAG, prompts, evals,
and agents without a separate automation tool.
- Already automating with workflows → keep **n8n** and add **Dify** as the AI
layer beside it.
- Want to sketch a chain visually and walk away with Python → **LangFlow**.
- Writing application code → **LangChain** for primitives, **CrewAI** for
role-based agent teams. Both are imports, not installs-as-a-service.

LangFlow overlaps Dify's visual builder; it earns its place only for
lightweight, Python-export prototyping. If that workflow isn't yours, you can run
the other four and never miss it.

## Where to go next

<CardGroup cols={2}>
<Card title="Local LLM" icon="microchip" href="/infrastructure/local-llm">
The private GPU model serving these tools call.
</Card>
<Card title="LLM observability" icon="chart-line" href="/observability/llm-observability">
How every LLM call gets traced, costed, and evaluated.
</Card>
<Card title="LXC vs Docker" icon="boxes-stacked" href="/infrastructure/lxc-vs-docker">
Why the compose-based tools run as Docker-in-LXC.
</Card>
<Card title="ansible-proxmox-apps" icon="screwdriver-wrench" href="/infrastructure/repos/ansible-proxmox-apps">
The configuration tier that deploys these app payloads.
</Card>
</CardGroup>
2 changes: 2 additions & 0 deletions docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,7 @@
"group": "AI Development",
"pages": [
"ai-development/overview",
"ai-development/ai-orchestration-stack",
"ai-development/repo-boundaries",
"ai-development/ai-assistant-instructions",
"ai-development/claude-code-plugins",
Expand Down Expand Up @@ -232,6 +233,7 @@
"group": "Observability",
"pages": [
"observability/overview",
"observability/llm-observability",
{
"group": "Repos",
"pages": [
Expand Down
105 changes: 105 additions & 0 deletions observability/llm-observability.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
---
title: "LLM observability"
description: "Every LLM call from the orchestration stack emits OpenTelemetry, routes through Cribl, and lands in both Langfuse (trace UX) and Splunk (archival + SIEM)."
tier: 1
---

> If a model was called, there's a trace — and you can see what it cost.

The [AI-coding-tool pipeline](/observability/overview) traces the IDEs. This is
its sibling for the [AI orchestration stack](/ai-development/ai-orchestration-stack):
n8n, Dify, LangFlow, and the agent code emit OpenTelemetry for every LLM call,
and the same Cribl tier routes it — this time to two sinks.

## Emitting traces — OpenLLMetry + OTEL GenAI

Apps are instrumented with [OpenLLMetry](https://github.com/traceloop/openllmetry)
(the Traceloop SDK), which wraps LLM providers, vector stores, and frameworks
(LangChain, CrewAI) and emits spans following OpenTelemetry's
[GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Those conventions matured in 2026, so framework-native spans and SDK-emitted
spans now line up on the same schema — prompt, completion, model, token counts,
latency, cost.

The spans leave the app over OTLP (gRPC `4317` / HTTP `4318`) pointed at the
collector, **not** at any one backend. Keeping the emit target on the pipeline —
not the trace store — is what lets the same telemetry reach more than one place.

## Cribl is the hub

A single collector tier owns ingest and fan-out. Cribl Edge runs **native
OpenTelemetry sources**, one per signal type on its own port, so it can route by
type without parsing payloads. From there it forks:

{/* Shape: fan-out. Apps -> Cribl -> two sinks. */}

```mermaid
%%{init: {'theme':'base','look':'handDrawn','themeVariables':{'fontFamily':'Geist','fontSize':'14px','primaryColor':'#102937','primaryTextColor':'#F4EFE6','primaryBorderColor':'#4FB3A9','lineColor':'#4FB3A9','secondaryColor':'#0B1D2A','tertiaryColor':'#1A2A38','clusterBkg':'rgba(79,179,169,0.08)','clusterBorder':'#4FB3A9'}}}%%
flowchart LR
Apps([Orchestration stack<br/>OpenLLMetry])
Cribl([Cribl Edge<br/>native OTEL sources])
LF([Langfuse<br/>trace · cost · eval])
SP[(Splunk<br/>archival · SIEM)]

Apps -->|OTLP per type| Cribl
Cribl -->|traces| LF
Cribl -->|all signals| SP

classDef app fill:#102937,stroke:#E06B4A,stroke-width:2px,color:#F4EFE6;
classDef hub fill:#102937,stroke:#4FB3A9,stroke-width:2px,color:#F4EFE6;
classDef sink fill:#102937,stroke:#F4EFE6,stroke-width:1.5px,color:#F4EFE6;

class Apps app
class Cribl hub
class LF,SP sink

linkStyle 0 stroke:#E06B4A,stroke-width:2px,stroke-dasharray:4 3;
linkStyle 1,2 stroke:#4FB3A9,stroke-width:2px;
```

- **Langfuse** gets the traces. It is the LLM-native view: trace waterfalls per
request, token cost, prompt and completion inspection, plus datasets, evals,
and prompt versioning.
- **Splunk** gets everything, for archival and correlation with the rest of the
homelab's telemetry — the same indexer the AI-coding pipeline already feeds.

Apps never talk to a trace store directly, and they never reach across into the
monitoring tier — they emit to the collector, and the collector decides where it
goes. One ingest point, two sinks, no second collector to run.

## Why Langfuse

| Criterion | Langfuse |
| --- | --- |
| License | MIT — self-host with no feature gates |
| Ingestion | Native OTLP, GenAI-convention aware |
| Built for | LLM apps — traces, cost, evals, prompt management |
| Footprint | Web + worker + Postgres + ClickHouse + Redis + object storage |
Comment thread
JacobPEvans-personal marked this conversation as resolved.

[Laminar](https://laminar.sh/) (Apache-2.0) is the runner-up — lighter, tilted
toward long-running agent debugging. Arize Phoenix is capable but ships under the
Elastic License, which gates self-host use.
Comment thread
JacobPEvans-personal marked this conversation as resolved.

<Note>
Langfuse keeps its trace-of-record (relational + analytical) on durable local
storage; its blob store points at the homelab object store. Backend choices like
the vector store and model provider are made **per tool, per that tool's own
standard** — never by forcing a shared backend across unrelated stacks.
</Note>

## Where to go next

<CardGroup cols={2}>
<Card title="AI orchestration stack" icon="diagram-project" href="/ai-development/ai-orchestration-stack">
The tools whose calls this pipeline traces.
</Card>
<Card title="Observability overview" icon="chart-line" href="/observability/overview">
The AI-coding-tool side of the same Cribl → Splunk spine.
</Card>
<Card title="ansible-proxmox-apps" icon="screwdriver-wrench" href="/infrastructure/repos/ansible-proxmox-apps">
Deploys Langfuse and the Cribl OTEL sources.
</Card>
<Card title="Local LLM" icon="microchip" href="/infrastructure/local-llm">
The models being traced.
</Card>
</CardGroup>
Loading