diff --git a/docs/orchestration/overview.md b/docs/orchestration/overview.md
index d99ab04..8888df3 100644
--- a/docs/orchestration/overview.md
+++ b/docs/orchestration/overview.md
@@ -2,12 +2,12 @@
 sidebar_position: 1
 sidebar_label: "Orchestration"
 slug: /orchestration/overview
-description: "kubeswarm orchestration modes - pipeline DAG, dynamic delegation and LLM-routed dispatch for agent teams on Kubernetes."
+description: "kubeswarm orchestration modes - pipeline DAG, dynamic delegation, LLM-routed dispatch and tree search for agent teams on Kubernetes."
 ---
 
 # Orchestration
 
-A kubeswarm SwarmTeam composes multiple AI agents into a workflow on Kubernetes. Three execution modes let you choose the right orchestration pattern for your use case.
+A kubeswarm SwarmTeam composes multiple AI agents into a workflow on Kubernetes. Four execution modes let you choose the right orchestration pattern for your use case.
 
 ## Pipeline Mode {#pipeline}
 
@@ -77,6 +77,29 @@ spec:
 
 No pipeline, no roles - the team adapts to what each request needs.
 
+## Search Mode {#search}
+
+A tree search in which a planner agent explores multiple approaches, each result is scored (by an optional evaluator agent or by the planner itself), and weak branches are pruned. Supports BFS and BeamSearch strategies with declarative convergence criteria.
+
+```yaml
+spec:
+  roles:
+    - name: planner
+      model: claude-sonnet-4-6
+    - name: worker
+      model: claude-sonnet-4-6
+
+  search:
+    strategy: BFS
+    plannerRole: planner
+    executorRole: worker
+    initialPrompt: "{{ .input.problem }}"
+    minScorePercent: 85
+    maxDepth: 3
+```
+
+The planner sees the current tree state and outputs structured actions: expand (create branches), prune (kill dead ends) and converge (declare a winner). See [Search Mode](/orchestration/search) for details.
+
 ## Execution Records
 
 Every run creates a `SwarmRun` - an immutable execution record with inputs, outputs, token counts and phase transitions. SwarmRun is to SwarmTeam what Job is to CronJob.
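The planner protocol described in the Search Mode section above (a JSON array of `expand`, `prune` and `converge` actions) can be made concrete with a small Go sketch. It is illustrative only: the `PlannerAction` type and `parseActions` function are hypothetical, the field names are taken from the planner prompt in the search docs, and the validation rules (known verbs, existing node IDs, scores within 0-1000, a task for every `expand`) are assumptions about what "failed JSON validation" checks, not kubeswarm's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PlannerAction is a hypothetical Go mirror of one element of the planner's
// JSON response. Field names follow the prompt format in the search docs;
// they are not taken from kubeswarm's source code.
type PlannerAction struct {
	Action      string `json:"action"`                // "expand", "prune" or "converge"
	ParentNode  int    `json:"parentNode"`            // node the action refers to
	Task        string `json:"task,omitempty"`        // task for the new child (expand only)
	ScoreMillis *int   `json:"scoreMillis,omitempty"` // 0-1000; nil when the planner omits it
	Reason      string `json:"reason,omitempty"`      // free-text justification
}

// parseActions decodes a planner response and rejects unknown verbs, expand
// actions without a task, references to nodes that do not exist (valid IDs
// are 0..nodeCount-1) and scores outside the 0-1000 range.
func parseActions(raw []byte, nodeCount int) ([]PlannerAction, error) {
	var actions []PlannerAction
	if err := json.Unmarshal(raw, &actions); err != nil {
		return nil, fmt.Errorf("planner output is not a JSON action array: %w", err)
	}
	for i, a := range actions {
		switch a.Action {
		case "expand":
			if a.Task == "" {
				return nil, fmt.Errorf("action %d: expand requires a task", i)
			}
		case "prune", "converge":
			// No extra fields are required for these verbs.
		default:
			return nil, fmt.Errorf("action %d: unknown action %q", i, a.Action)
		}
		if a.ParentNode < 0 || a.ParentNode >= nodeCount {
			return nil, fmt.Errorf("action %d: node %d does not exist", i, a.ParentNode)
		}
		if a.ScoreMillis != nil && (*a.ScoreMillis < 0 || *a.ScoreMillis > 1000) {
			return nil, fmt.Errorf("action %d: scoreMillis %d is outside 0-1000", i, *a.ScoreMillis)
		}
	}
	return actions, nil
}

func main() {
	// A response for a tree that currently holds nodes 0, 1 and 2.
	raw := []byte(`[
		{"action": "expand", "parentNode": 0, "task": "Try a greedy heuristic", "reason": "cheap baseline"},
		{"action": "prune", "parentNode": 2, "reason": "contradicted by the logs"}
	]`)
	actions, err := parseActions(raw, 3)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	for _, a := range actions {
		fmt.Println(a.Action, a.ParentNode)
	}
}
```

Running it prints `expand 0` and `prune 2`. Per the search docs, kubeswarm retries an invalid planner response up to `maxPlannerRetries` times and emits `PlannerValidationFailed`; this sketch only returns the error.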
diff --git a/docs/orchestration/runs.md b/docs/orchestration/runs.md
index 7e4eabd..ddaa557 100644
--- a/docs/orchestration/runs.md
+++ b/docs/orchestration/runs.md
@@ -86,3 +86,18 @@ kubectl get swrun -w
 kubectl get swrun my-run -o jsonpath='{.status.output}'
 kubectl describe swrun my-run
 ```
+
+### Search runs
+
+Search-mode runs store the full tree in `status.searchTree`:
+
+```bash
+# View tree nodes with scores and phases
+kubectl get swrun my-search -o jsonpath='{.status.searchTree.nodes}' | \
+  jq '.[] | {id, depth, phase, scoreMillis, task}'
+
+# Check why the search terminated
+kubectl get swrun my-search -o jsonpath='{.status.searchTree.terminationReason}'
+```
+
+See [Search Mode](/orchestration/search) for the full configuration reference.
diff --git a/docs/orchestration/search.md b/docs/orchestration/search.md
new file mode 100644
index 0000000..e9d2e97
--- /dev/null
+++ b/docs/orchestration/search.md
@@ -0,0 +1,204 @@
+---
+sidebar_position: 6
+sidebar_label: "Search Mode"
+description: "Tree-based search orchestration for kubeswarm. Explore multiple approaches, score results, prune dead ends and converge on the best solution using BFS or BeamSearch."
+---
+
+# Search Mode
+
+Search mode lets a SwarmTeam explore a solution space using tree search strategies. Instead of following a fixed pipeline DAG, each run grows a search tree that expands dynamically based on agent outputs. A planner agent decides branching and pruning, executor agents do the work, and an optional evaluator scores results.
+
+## When to use search
+
+Use search mode when a problem has multiple valid approaches and you need to find the best one systematically:
+
+- **Multi-hypothesis research** - explore several theories, prune disproven ones, go deeper on promising leads
+- **Code generation with backtracking** - try different algorithms, backtrack from failures
+- **Adversarial testing** - generate attack variants, score each, go deeper on the most effective
+- **Optimization** - explore prompt variants, score each on a test set, converge on the winner
+
+Use [pipeline mode](/orchestration/pipelines) when the steps are predetermined. Use [dynamic mode](/orchestration/overview#dynamic) when agents should self-organize without scoring.
+
+## Three roles
+
+| Role | Required | Purpose |
+|------|----------|---------|
+| **Planner** | Yes | Sees the current tree state and decides what to explore, prune, or converge on |
+| **Executor** | Yes | Executes each node's task and produces output |
+| **Evaluator** | Only for BeamSearch | Scores executor output on a 0-1000 scale |
+
+When no evaluator is set, the planner self-scores by including `scoreMillis` in its `expand` and `converge` actions.
+
+## Strategies
+
+### BFS (Breadth-First Search)
+
+Expand all nodes at the current depth before going deeper. Best for exhaustive exploration when depth is bounded.
+
+```yaml
+spec:
+  roles:
+    - name: planner
+      model: claude-sonnet-4-6
+      prompt:
+        inline: |
+          You receive the current search tree as JSON. Decide what to explore next.
+          Respond ONLY with a JSON array of actions:
+          [{"action": "expand"|"prune"|"converge", "parentNode": <node id>, "task": "<task description>", "scoreMillis": <0-1000>, "reason": "<short justification>"}]
+    - name: worker
+      model: claude-sonnet-4-6
+      prompt:
+        inline: "Execute the given task thoroughly."
+
+  search:
+    strategy: BFS
+    plannerRole: planner
+    executorRole: worker
+    initialPrompt: "{{ .input.problem }}"
+    maxDepth: 3
+    maxNodes: 15
+    minScorePercent: 85
+```
+
+### BeamSearch
+
+Keep only the top K nodes (`beamWidth`) at each depth level and prune the rest. Requires an evaluator role because beam pruning depends on score ordering.
+
+```yaml
+spec:
+  roles:
+    - name: investigator
+      model: claude-sonnet-4-6
+    - name: tester
+      model: claude-sonnet-4-6
+    - name: judge
+      model: claude-haiku-4-5-20251001
+
+  search:
+    strategy: BeamSearch
+    plannerRole: investigator
+    executorRole: tester
+    evaluatorRole: judge
+    initialPrompt: "{{ .input.incident }}"
+    beamWidth: 3
+    minScorePercent: 85
+    maxDepth: 5
+    maxNodes: 30
+    maxParallel: 3
+```
+
+The evaluator returns structured JSON scores:
+
+```json
+{
+  "scoreMillis": 720,
+  "reasoning": "Correct approach but missing edge case handling",
+  "shouldPrune": false,
+  "metadata": {"evidence_strength": "moderate"}
+}
+```
+
+`scoreMillis` is an integer from 0 to 1000 (thousandths of a perfect score, so 720 means 72%). `shouldPrune` lets the evaluator flag dead ends without waiting for the planner. `metadata` carries domain-specific signals.
+
+## Convergence
+
+Every search is bounded: even with `maxDepth` and `maxIterations` set to 0 (no limit), `maxNodes` is capped at 200. The search stops at the first of these conditions to be met:
+
+| Criterion | Field | Description |
+|-----------|-------|-------------|
+| Score threshold | `minScorePercent` | A node scores above this percentage (0-100) |
+| Depth limit | `maxDepth` | Tree reaches this depth |
+| Node limit | `maxNodes` | Total nodes created (max 200) |
+| Iteration limit | `maxIterations` | Planner invoked this many times |
+| Budget exhaustion | via `budgetRef` | SwarmBudget hard stop |
+| Planner decision | `converge` action | Planner explicitly declares a solution |
+
+When the search terminates, the highest-scoring node's output becomes the SwarmRun output. If no node meets `minScorePercent`, the run reports a `SearchExhausted` event.
+
+## Configuration reference
+
+```yaml
+search:
+  strategy: BFS                  # BFS or BeamSearch
+  plannerRole: planner           # must match a role in spec.roles
+  executorRole: executor         # must match a role in spec.roles
+  evaluatorRole: judge           # optional (required for BeamSearch)
+  initialPrompt: "{{ .input.problem }}"
+
+  # Convergence limits
+  minScorePercent: 85            # 0-100, triggers convergence when exceeded
+  maxDepth: 10                   # default 10, 0 = no limit
+  maxNodes: 50                   # default 50, max 200
+  maxIterations: 20              # default 20, 0 = no limit
+
+  # Execution
+  maxParallel: 3                 # concurrent executor tasks per level (default 3)
+  maxOutputBytes: 4096           # truncates node output and task (default 4096, min 256, max 16384)
+
+  # BeamSearch only
+  beamWidth: 3                   # nodes kept per depth level (defaults to 3 at runtime)
+
+  # Retry and resilience
+  maxPlannerRetries: 2           # retries on invalid planner JSON
+  maxEvaluatorRetries: 2         # retries on unparseable evaluator output, then the node is EvalFailed
+  stagnationThreshold: 5         # iterations without best-score improvement before a StagnationDetected event
+  plannerTimeoutSeconds: 120     # re-invoke the planner when lastPlannerIteration is older than this
+```
+
+## Observability
+
+### Tree state
+
+The full search tree is stored in `SwarmRun.status.searchTree`:
+
+```bash
+# View tree nodes with scores
+kubectl get swrun my-search -o jsonpath='{.status.searchTree.nodes}' | \
+  jq '.[] | {id, depth, phase, scoreMillis, task}'
+
+# Check termination reason
+kubectl get swrun my-search -o jsonpath='{.status.searchTree.terminationReason}'
+
+# Planner iteration count
+kubectl get swrun my-search -o jsonpath='{.status.searchTree.iterations}'
+```
+
+### Kubernetes events
+
+| Event | When |
+|-------|------|
+| `SearchExpanded` | New nodes created |
+| `SearchPruned` | Dead ends pruned |
+| `SearchConverged` | Solution found |
+| `SearchExhausted` | No solution within limits |
+| `EvaluatorParseFailed` | Evaluator output could not be parsed |
+| `PlannerValidationFailed` | Planner output failed JSON validation |
+| `StagnationDetected` | Best score unchanged for `stagnationThreshold` consecutive iterations |
+
+### OTel metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `kubeswarm.search.nodes.created` | Counter | Nodes created |
+| `kubeswarm.search.nodes.pruned` | Counter | Nodes pruned |
+| `kubeswarm.search.node.score` | Histogram | Score distribution (0-1000) |
+| `kubeswarm.search.iterations` | Counter | Planner invocations |
+| `kubeswarm.search.best_score` | Gauge | Current best score |
+| `kubeswarm.search.evaluator.parse_failures` | Counter | Evaluator parse errors |
+| `kubeswarm.search.planner.validation_failures` | Counter | Planner validation errors |
+| `kubeswarm.search.stagnation_iterations` | Gauge | Iterations without improvement |
+
+## Cost optimization
+
+Search makes several LLM calls per iteration, so it pays to use a different model tier for each role:
+
+- **Planner**: strong (more expensive) reasoning model (runs once per iteration)
+- **Executor**: mid-tier model (runs once per node, up to `maxParallel` at a time)
+- **Evaluator**: cheap, fast model (runs once per node; scoring is simple)
+
+A BeamSearch with `beamWidth=3`, `maxDepth=5` and `maxParallel=3` makes roughly 35 LLM calls in total; for example, if 3 nodes are scored at each of 5 depth levels, that is 15 executor calls, 15 evaluator calls and about 5 planner calls. Most of the cost is in the executor.
+
+## Examples
+
+- [Cookbook recipe 18 - Search Brainstorm](https://github.com/kubeswarm/kubeswarm-cookbook/tree/main/recipes/18-search-brainstorm) (BFS, planner self-scores)
+- [Cookbook recipe 19 - Root Cause Analyzer](https://github.com/kubeswarm/kubeswarm-cookbook/tree/main/recipes/19-search-root-cause) (BeamSearch with evaluator)
diff --git a/docs/reference/api.md b/docs/reference/api.md
index 25f4a18..2620f10 100644
--- a/docs/reference/api.md
+++ b/docs/reference/api.md
@@ -1640,6 +1640,121 @@ _Appears in:_
 | `cluster-wide` | RegistryScopeCluster indexes all SwarmAgents cluster-wide. Requires a ClusterRole<br />that grants cross-namespace SwarmAgent reads.<br /> |
 
 
+
+
+#### SearchNodePhase
+
+_Underlying type:_ _string_
+
+SearchNodePhase tracks the lifecycle of a single search tree node.
+
+_Validation:_
+- Enum: [Pending Running Scored Pruned EvalFailed Solution]
+
+_Appears in:_
+- [SearchNodeStatus](#searchnodestatus)
+
+| Field | Description |
+| --- | --- |
+| `Pending` |  |
+| `Running` |  |
+| `Scored` |  |
+| `Pruned` |  |
+| `EvalFailed` |  |
+| `Solution` |  |
+
+
+
+
+#### SearchNodeStatus
+
+
+
+SearchNodeStatus records the state of a single search tree node.
+Stored in SwarmRun status (Phase 2).
+
+
+
+_Appears in:_
+- [SearchTreeStatus](#searchtreestatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `id` _integer_ | ID is the zero-based index of this node in the tree. |  |  |
+| `parentID` _integer_ | ParentID is the ID of the parent node. Nil for the root node. |  |  |
+| `depth` _integer_ | Depth is the tree depth of this node (root = 0). |  |  |
+| `task` _string_ | Task is the task description assigned to this node. |  |  |
+| `output` _string_ | Output is the executor's response for this node. |  |  |
+| `scoreMillis` _integer_ | ScoreMillis is the quality score (0-1000) assigned by the evaluator or planner. |  | Maximum: 1000 <br />Minimum: 0 <br /> |
+| `scoreReasoning` _string_ | ScoreReasoning is the evaluator's explanation for the score. |  |  |
+| `phase` _[SearchNodePhase](#searchnodephase)_ | Phase tracks this node's lifecycle. |  | Enum: [Pending Running Scored Pruned EvalFailed Solution] <br /> |
+| `taskID` _string_ | TaskID is the queue task ID for the in-flight executor or evaluator call.<br />Empty when the node is not waiting for a result. |  | Optional: true <br /> |
+| `tokenUsage` _[TokenUsage](#tokenusage)_ | TokenUsage records tokens consumed by the executor for this node. |  |  |
+
+
+#### SearchStrategy
+
+_Underlying type:_ _string_
+
+SearchStrategy selects the tree exploration algorithm.
+
+_Validation:_
+- Enum: [BFS BeamSearch]
+
+_Appears in:_
+- [SwarmTeamSearchSpec](#swarmteamsearchspec)
+
+| Field | Description |
+| --- | --- |
+| `BFS` |  |
+| `BeamSearch` |  |
+
+
+#### SearchTerminationReason
+
+_Underlying type:_ _string_
+
+SearchTerminationReason explains why the search stopped.
+
+_Validation:_
+- Enum: [MinScoreReached MaxDepthReached MaxNodesReached MaxIterationsReached BudgetExhausted PlannerConverged PlannerFailure SearchCancelled]
+
+_Appears in:_
+- [SearchTreeStatus](#searchtreestatus)
+
+| Field | Description |
+| --- | --- |
+| `MinScoreReached` |  |
+| `MaxDepthReached` |  |
+| `MaxNodesReached` |  |
+| `MaxIterationsReached` |  |
+| `BudgetExhausted` |  |
+| `PlannerConverged` |  |
+| `PlannerFailure` |  |
+| `SearchCancelled` |  |
+
+
+#### SearchTreeStatus
+
+
+
+SearchTreeStatus captures the full state of a search tree.
+Stored in SwarmRun status (Phase 2).
+
+
+
+_Appears in:_
+- [SwarmRunStatus](#swarmrunstatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `nodes` _[SearchNodeStatus](#searchnodestatus) array_ | Nodes is the ordered list of all tree nodes. |  |  |
+| `iterations` _integer_ | Iterations is the number of planner invocations completed so far. |  |  |
+| `solutionNodeID` _integer_ | SolutionNodeID is the ID of the node selected as the final answer. |  |  |
+| `terminationReason` _[SearchTerminationReason](#searchterminationreason)_ | TerminationReason explains why the search stopped. |  | Enum: [MinScoreReached MaxDepthReached MaxNodesReached MaxIterationsReached BudgetExhausted PlannerConverged PlannerFailure SearchCancelled] <br /> |
+| `lastPlannerIteration` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#time-v1-meta)_ | LastPlannerIteration is the timestamp of the most recent planner invocation. |  |  |
+
+
 #### SlackChannelSpec
 
 
@@ -2455,6 +2570,7 @@ _Appears in:_
 | `roles` _[SwarmTeamRole](#swarmteamrole) array_ | Roles is a snapshot of the SwarmTeam role definitions at trigger time.<br />Empty for routed-mode runs. |  | MaxItems: 50 <br />Optional: true <br /> |
 | `output` _string_ | Output is a Go template expression that selects the final run result.<br />Example: "\{\{ .steps.summarize.output \}\}"<br />For routed-mode runs this defaults to "\{\{ .steps.route.output \}\}" at trigger time. |  | Optional: true <br /> |
 | `routing` _[SwarmTeamRoutingSpec](#swarmteamroutingspec)_ | Routing is a snapshot of the SwarmTeam routing config at trigger time.<br />Set when the team operates in routed mode. Mutually exclusive with Pipeline. |  | Optional: true <br /> |
+| `search` _[SwarmTeamSearchSpec](#swarmteamsearchspec)_ | Search is a snapshot of the SwarmTeam search config at trigger time.<br />Set when the team operates in search mode. Mutually exclusive with Pipeline and Routing. |  | Optional: true <br /> |
 | `timeoutSeconds` _integer_ | TimeoutSeconds is the maximum wall-clock seconds this run may take.<br />Zero means no timeout. |  | Minimum: 1 <br />Optional: true <br /> |
 | `maxTokens` _integer_ | MaxTokens is the total token budget for this run across all steps.<br />Zero means no limit. |  | Minimum: 1 <br />Optional: true <br /> |
 
@@ -2479,6 +2595,7 @@ _Appears in:_
 | `completionTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#time-v1-meta)_ | CompletionTime is when this run reached a terminal phase (Succeeded or Failed). |  |  |
 | `totalTokenUsage` _[TokenUsage](#tokenusage)_ | TotalTokenUsage is the sum of token usage across all steps in this run. |  |  |
 | `totalCostUSD` _string_ | TotalCostUSD is the estimated total dollar cost of this run, summed across<br />all steps using the operator's configured CostProvider. Decimal string. |  |  |
+| `searchTree` _[SearchTreeStatus](#searchtreestatus)_ | SearchTree holds the search tree state for search-mode runs.<br />Only populated when spec.search is set. |  | Optional: true <br /> |
 | `observedGeneration` _integer_ | ObservedGeneration is the .metadata.generation this status reflects. |  |  |
 | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#condition-v1-meta) array_ | Conditions reflect the current state of the SwarmRun. |  |  |
 
@@ -2798,11 +2915,46 @@ _Appears in:_
 | `afterSeconds` _integer_ | AfterSeconds is how long a role must be idle (no active steps) before it is scaled to zero.<br />Minimum 30. Defaults to 300 (5 minutes). | 300 | Minimum: 30 <br />Optional: true <br /> |
 
 
+#### SwarmTeamSearchSpec
+
+
+
+SwarmTeamSearchSpec configures search tree orchestration mode.
+
+
+
+_Appears in:_
+- [SwarmRunSpec](#swarmrunspec)
+- [SwarmTeamSpec](#swarmteamspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `strategy` _[SearchStrategy](#searchstrategy)_ | Strategy selects the tree exploration algorithm. |  | Enum: [BFS BeamSearch] <br />Required: true <br /> |
+| `plannerRole` _string_ | PlannerRole is the role name of the agent that decides branching/pruning. |  | MinLength: 1 <br />Required: true <br /> |
+| `executorRole` _string_ | ExecutorRole is the role name of the agent that executes each node task. |  | MinLength: 1 <br />Required: true <br /> |
+| `evaluatorRole` _string_ | EvaluatorRole is the role name of the agent that scores node outputs.<br />When omitted, the planner must include scoreMillis in expand/converge actions.<br />Required when strategy is BeamSearch. |  | Optional: true <br /> |
+| `initialPrompt` _string_ | InitialPrompt is the root task description. Go template with .input access. |  | Required: true <br /> |
+| `minScorePercent` _integer_ | MinScorePercent is the quality threshold (0-100). A node scoring above<br />this percent (mapped to millis internally) triggers convergence. |  | Maximum: 100 <br />Minimum: 0 <br />Optional: true <br /> |
+| `maxDepth` _integer_ | MaxDepth limits the tree depth. 0 means no limit. | 10 | Minimum: 0 <br />Optional: true <br /> |
+| `maxNodes` _integer_ | MaxNodes limits the total number of tree nodes. Budget protection.<br />Hard-capped at 200 to stay within etcd value size limits. | 50 | Maximum: 200 <br />Minimum: 1 <br />Optional: true <br /> |
+| `maxOutputBytes` _integer_ | MaxOutputBytes limits the size of each node's output and task fields.<br />Default 4096. With MaxNodes=200, worst case is 200*4KB = 800KB (safe for etcd). | 4096 | Maximum: 16384 <br />Minimum: 256 <br />Optional: true <br /> |
+| `maxIterations` _integer_ | MaxIterations limits planner invocations. 0 means no limit. | 20 | Minimum: 0 <br />Optional: true <br /> |
+| `maxParallel` _integer_ | MaxParallel is the maximum concurrent executor tasks per level. | 3 | Minimum: 1 <br />Optional: true <br /> |
+| `beamWidth` _integer_ | BeamWidth is the number of top-scoring nodes kept per depth level.<br />Only used when strategy is BeamSearch. Defaults to 3 at runtime when<br />strategy is BeamSearch and beamWidth is not set. No CRD default so<br />the CEL mutual exclusion rule can distinguish "not set" from "set". |  | Minimum: 1 <br />Optional: true <br /> |
+| `maxPlannerRetries` _integer_ | MaxPlannerRetries is the number of retry attempts when the planner<br />produces invalid JSON output. | 2 | Minimum: 0 <br />Optional: true <br /> |
+| `maxEvaluatorRetries` _integer_ | MaxEvaluatorRetries is the number of retry attempts when the evaluator<br />produces unparseable output. After exhaustion, the node is marked EvalFailed. | 2 | Minimum: 0 <br />Optional: true <br /> |
+| `stagnationThreshold` _integer_ | StagnationThreshold is the number of consecutive planner iterations without<br />improvement to the best score before a StagnationDetected event is emitted. | 5 | Minimum: 1 <br />Optional: true <br /> |
+| `plannerTimeoutSeconds` _integer_ | PlannerTimeoutSeconds is the maximum age (in seconds) of lastPlannerIteration<br />before the reconciler considers the planner stale and re-invokes it. | 120 | Minimum: 10 <br />Optional: true <br /> |
+
+
 #### SwarmTeamSpec
 
 
 
 SwarmTeamSpec defines the desired state of SwarmTeam.
+NOTE: plannerRole, executorRole, and evaluatorRole existence checks are
+performed in the admission webhook (ValidateSearchConfig) instead of CEL
+because roles.exists() traversal exceeds the CRD CEL cost budget.
 
 
 
@@ -2830,6 +2982,7 @@ _Appears in:_
 | `artifactStore` _[ArtifactStoreSpec](#artifactstorespec)_ | ArtifactStore configures where pipeline file artifacts are stored.<br />When unset, file artifact support is disabled and any OutputArtifacts<br />declarations on pipeline steps are ignored. |  | Optional: true <br /> |
 | `autoscaling` _[SwarmTeamAutoscaling](#swarmteamautoscaling)_ | Autoscaling configures demand-driven replica scaling for this team's inline agents.<br />When enabled, the operator scales each role's managed SwarmAgent between 0 and its<br />configured replica count based on the number of active pipeline steps for that role.<br />Only applies to inline roles (those with model+systemPrompt); external SwarmAgent references<br />are not scaled by the team controller. |  | Optional: true <br /> |
 | `routing` _[SwarmTeamRoutingSpec](#swarmteamroutingspec)_ | Routing configures routed mode. When set, the team operates in routed mode:<br />tasks are dispatched automatically via an LLM router call against SwarmRegistry.<br />Mutually exclusive with spec.pipeline and spec.roles. |  | Optional: true <br /> |
+| `search` _[SwarmTeamSearchSpec](#swarmteamsearchspec)_ | Search configures search tree orchestration mode.<br />Mutually exclusive with Pipeline and Routing. |  | Optional: true <br /> |
 
 
 #### SwarmTeamStatus
@@ -2901,6 +3054,7 @@ TokenUsage records the number of tokens consumed by a single step or the whole p
 
 _Appears in:_
 - [PipelineStepStatus](#pipelinestepstatus)
+- [SearchNodeStatus](#searchnodestatus)
 - [SwarmAgentStatus](#swarmagentstatus)
 - [SwarmRunStatus](#swarmrunstatus)
 
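BeamSearch pruning and the `minScorePercent` check described in the search docs can be sketched in Go. This is a hypothetical illustration, not kubeswarm's reconciler: the `Node` type and the `thresholdMillis`, `beamPrune` and `bestAboveThreshold` helpers are assumptions, the percent-to-millis mapping (`percent * 10`) is inferred from the 0-100 and 0-1000 ranges in the API reference above, and the strict `>` comparison follows the wording "a node scores above this percentage".

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, trimmed-down view of a search tree node. The field
// names mirror SearchNodeStatus, but this type is illustrative only.
type Node struct {
	ID          int
	Depth       int
	ScoreMillis int
	Phase       string // e.g. "Scored" or "Pruned"
}

// thresholdMillis maps minScorePercent (0-100) onto the 0-1000 score scale.
func thresholdMillis(minScorePercent int) int {
	return minScorePercent * 10
}

// beamPrune keeps the beamWidth highest-scoring Scored nodes at one depth and
// marks the remaining Scored nodes at that depth as Pruned. The sort is stable,
// so among equal scores the node that appears first in the slice is kept.
func beamPrune(nodes []Node, depth, beamWidth int) {
	var level []int // indexes into nodes
	for i, n := range nodes {
		if n.Depth == depth && n.Phase == "Scored" {
			level = append(level, i)
		}
	}
	sort.SliceStable(level, func(a, b int) bool {
		return nodes[level[a]].ScoreMillis > nodes[level[b]].ScoreMillis
	})
	for rank, idx := range level {
		if rank >= beamWidth {
			nodes[idx].Phase = "Pruned"
		}
	}
}

// bestAboveThreshold returns the highest-scoring Scored node whose score is
// strictly above the threshold derived from minScorePercent.
func bestAboveThreshold(nodes []Node, minScorePercent int) (int, bool) {
	bestID, found := -1, false
	best := thresholdMillis(minScorePercent)
	for _, n := range nodes {
		if n.Phase == "Scored" && n.ScoreMillis > best {
			bestID, best, found = n.ID, n.ScoreMillis, true
		}
	}
	return bestID, found
}

func main() {
	nodes := []Node{
		{ID: 0, Depth: 0, ScoreMillis: 400, Phase: "Scored"},
		{ID: 1, Depth: 1, ScoreMillis: 720, Phase: "Scored"},
		{ID: 2, Depth: 1, ScoreMillis: 310, Phase: "Scored"},
		{ID: 3, Depth: 1, ScoreMillis: 860, Phase: "Scored"},
		{ID: 4, Depth: 1, ScoreMillis: 550, Phase: "Scored"},
	}
	beamPrune(nodes, 1, 3) // beamWidth: 3
	for _, n := range nodes[1:] {
		fmt.Println(n.ID, n.ScoreMillis, n.Phase)
	}
	if id, ok := bestAboveThreshold(nodes, 85); ok { // minScorePercent: 85
		fmt.Println("converge on node", id)
	}
}
```

With `beamWidth: 3`, node 2 (score 310) is the only depth-1 node pruned, and node 3 (860, above the 850 threshold) satisfies `minScorePercent: 85`. A real controller would also have to honor the evaluator's `shouldPrune` flag and the other limits in the convergence table.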