
Budget Controls

skobeltsyn edited this page Mar 28, 2026 · 1 revision


Prevent runaway LLM loops with turn-based budgets.


Why Budgets Exist

The agentic loop works like this: the LLM reasons, calls tools, receives results, reasons again, calls more tools, and so on until it produces a final text answer. But what if it never produces that final answer?

Without a budget, an agent can:

  • Loop indefinitely, calling the same tool with slightly different arguments.
  • Burn through API credits or local compute on a single request.
  • Block a thread forever in a synchronous execution model.
  • Amplify errors: each failed tool call leads to another attempt, which fails again.

Budgets are the guardrail. They set a hard upper bound on how many times the LLM can be called within a single agent invocation.
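
Conceptually, the guardrail is just a counter that every LLM call must pass through. The sketch below illustrates the idea in plain Kotlin; the names (TurnBudget, consume) are hypothetical stand-ins, not the framework's actual implementation:

```kotlin
// Illustrative sketch of a turn budget: a counter checked before each LLM call.
// TurnBudget and consume() are hypothetical names, not the framework's API.
class TurnBudget(private val maxTurns: Int) {
    private var used = 0

    // Call once before every LLM request; throws once the budget is exhausted.
    fun consume() {
        if (used >= maxTurns) {
            throw IllegalStateException("Budget of $maxTurns turns exceeded")
        }
        used++
    }

    val turnsUsed: Int get() = used
}

fun main() {
    val budget = TurnBudget(maxTurns = 3)
    repeat(3) { budget.consume() }  // three turns succeed
    println(budget.turnsUsed)       // 3; a fourth consume() would throw
}
```

Because the check runs before the counter is incremented, the overrun is caught before the next LLM request is issued, which is exactly the behavior described for BudgetExceededException below.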


Configuration

Set a budget with the budget {} DSL block inside your agent definition:

val agent = agent<String, String>("researcher") {
    model { ollama("qwen2.5:7b") }

    budget {
        maxTurns = 10
    }

    skills {
        skill<String, String>("research", "Research a topic using tools") {
            tools("search", "summarize")
            // ... tool definitions
        }
    }
}

BudgetConfig

The DSL produces a BudgetConfig data class:

data class BudgetConfig(
    val maxTurns: Int = Int.MAX_VALUE
)

The default is Int.MAX_VALUE -- effectively unlimited. In production, you should always set an explicit limit.


BudgetExceededException

When the agent reaches its turn limit, the framework throws BudgetExceededException:

import agents_engine.core.BudgetExceededException

try {
    val result = agent("Analyze all 10,000 files in the repository")
} catch (e: BudgetExceededException) {
    println("Agent ran out of turns: ${e.message}")
    // Handle gracefully: return partial result, notify user, etc.
}

The exception is thrown before the next LLM call would happen. This means:

  • All previous tool calls have completed.
  • All previous LLM responses are intact.
  • The agent's message history is available up to the point of termination.

Catching in Pipelines

In a then pipeline, BudgetExceededException propagates like any other exception:

val pipeline = parse then analyze then summarize

try {
    pipeline(input)
} catch (e: BudgetExceededException) {
    // Which agent exceeded its budget? Check the message.
    println(e.message)  // "Agent 'analyze' exceeded budget of 10 turns"
}
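
If you want the pipeline to continue past a stage that overruns its budget, one possible pattern is to wrap individual stages with a fallback. The sketch below treats agents as plain `(String) -> String` functions and defines a local stand-in for BudgetExceededException; it is an illustration, not framework functionality:

```kotlin
// Local stand-in for the framework's exception type, for a self-contained sketch.
class BudgetExceededException(message: String) : RuntimeException(message)

// Wraps a pipeline stage so a budget overrun yields a fallback value
// instead of aborting the whole pipeline.
fun withFallback(
    fallback: String,
    stage: (String) -> String,
): (String) -> String = { input ->
    try {
        stage(input)
    } catch (e: BudgetExceededException) {
        fallback
    }
}

fun main() {
    val analyze: (String) -> String = {
        throw BudgetExceededException("Agent 'analyze' exceeded budget of 10 turns")
    }
    val safeAnalyze = withFallback("<analysis unavailable>", analyze)
    println(safeAnalyze("input"))  // <analysis unavailable>
}
```

Whether a fallback value is acceptable depends on the stage: a summarizer can often degrade gracefully, while a parser probably should fail loudly.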

Counting Turns

A turn is one LLM request-response cycle. Here is how turns map to the agentic loop:

Turn 1: LLM receives [system, user] -> returns ToolCalls([search("kotlin agents")])
         Framework executes search, appends tool result

Turn 2: LLM receives [system, user, assistant(toolcalls), tool(result)] -> returns ToolCalls([summarize(...)])
         Framework executes summarize, appends tool result

Turn 3: LLM receives [system, user, assistant, tool, assistant, tool] -> returns Text("Here is the summary...")
         Done. 3 turns used.

Key points:

  • Each call to ModelClient.chat() is one turn.
  • Multiple tool calls in a single LLM response count as one turn (the LLM made one request that happened to include multiple tool calls).
  • The final Text response also counts as a turn.
  • Tool execution itself does not count -- only the LLM call does.
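
The accounting rules above can be simulated in a few lines. In this sketch the LLM is faked as a scripted list of responses; each response costs exactly one turn regardless of how many tool calls it contains, and tool execution is free. The types here are stand-ins, not the framework's API:

```kotlin
// Minimal simulation of turn accounting. One LLM request-response cycle = one
// turn; a response carrying several tool calls still costs a single turn.
sealed interface LlmResponse {
    data class ToolCalls(val calls: List<String>) : LlmResponse
    data class Text(val text: String) : LlmResponse
}

fun runLoop(script: List<LlmResponse>, maxTurns: Int): Pair<String, Int> {
    var turns = 0
    for (response in script) {
        // Budget check happens BEFORE the next LLM call would be made.
        if (turns >= maxTurns) error("exceeded budget of $maxTurns turns")
        turns++
        when (response) {
            is LlmResponse.ToolCalls -> { /* execute tools: not counted */ }
            is LlmResponse.Text -> return response.text to turns
        }
    }
    error("script ended without a Text response")
}

fun main() {
    val script = listOf(
        LlmResponse.ToolCalls(listOf("search", "summarize")), // 2 tool calls, 1 turn
        LlmResponse.ToolCalls(listOf("search")),              // 1 turn
        LlmResponse.Text("Here is the summary..."),           // final answer, 1 turn
    )
    val (answer, turns) = runLoop(script, maxTurns = 10)
    println("$answer ($turns turns)")
}
```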

Example: Turn Counting

val agent = agent<String, String>("counter-demo") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 3 }

    skills {
        skill<String, String>("work", "Do work") {
            tools("step_a", "step_b")
            tool("step_a", "First step") { args -> "result_a" }
            tool("step_b", "Second step") { args -> "result_b" }
        }
    }
}

If the LLM's behavior is:

  • Turn 1: calls step_a and step_b together -> 1 turn
  • Turn 2: calls step_a again -> 1 turn
  • Turn 3: returns text "Done" -> 1 turn

Total: 3 turns, exactly at the limit. If the LLM attempted a 4th turn, the framework would throw BudgetExceededException before making the call.


Best Practices

1. Always Set a Budget in Production

// Don't do this in production
val agent = agent<String, String>("risky") {
    model { ollama("qwen2.5:7b") }
    // No budget -- defaults to Int.MAX_VALUE
    // ...
}

// Do this instead
val agent = agent<String, String>("safe") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 15 }
    // ...
}

2. Budget by Task Complexity

Match your budget to the expected number of tool calls:

Task Type                                   Typical Turns   Suggested Budget
Single tool call + answer                   2               3-5
Multi-step analysis (3-5 tools)             4-6             8-10
Complex research (many tools, iteration)    8-15            15-20
Open-ended exploration                      10-30           25-30

Leave headroom above the expected turns. The LLM might need an extra turn to correct a mistake or rephrase its answer.
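
Reading the table as a rule of thumb, you could sketch a helper that adds roughly 50% headroom on top of the expected turns. This is a hypothetical heuristic for illustration, not a framework feature:

```kotlin
// Rough heuristic (not part of the framework): expected tool-call turns, plus
// one turn for the final text answer, plus ~50% headroom for retries, rounded up.
fun suggestedBudget(expectedToolTurns: Int): Int {
    val expectedTotal = expectedToolTurns + 1       // the final Text response is a turn too
    return expectedTotal + (expectedTotal + 1) / 2  // ~50% headroom
}

fun main() {
    println(suggestedBudget(1))   // 3: single tool call + answer
    println(suggestedBudget(5))   // 9: multi-step analysis
    println(suggestedBudget(12))  // 20: complex research
}
```

The outputs land roughly within the suggested ranges in the table above.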

3. Separate Budgets for Nested Agents

When agents are composed, whether nested via structure {} or chained with then, each agent has its own budget. A parent agent with maxTurns = 10 does not share that budget with its children:

val researcher = agent<String, String>("researcher") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 20 }   // generous budget for deep research
    // ...
}

val summarizer = agent<String, String>("summarizer") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 3 }    // tight budget: should be quick
    // ...
}

val pipeline = researcher then summarizer
// researcher gets 20 turns, summarizer gets 3 -- independent

4. Use Low Budgets for Repair Agents

Repair agents (see Tool Error Recovery) should have tight budgets. A repair agent that loops is worse than the original error:

val jsonFixer = agent<String, String>("json-fixer") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 1 }    // single-shot: one LLM call, no tools
    // ...
}

5. Test Budget Boundaries

Write tests that verify your agent completes within its budget:

@Test
fun `agent completes within budget`() {
    var turnCount = 0
    val mockClient = ModelClient { messages ->
        turnCount++
        if (turnCount < 3) {
            LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
        } else {
            LlmResponse.Text("done")
        }
    }

    val agent = agent<String, String>("test") {
        model { ollama("unused"); client = mockClient }
        budget { maxTurns = 5 }
        skills {
            skill<String, String>("work", "Work") {
                tools("step")
                tool("step", "A step") { "ok" }
            }
        }
    }

    val result = agent("go")
    assertEquals("done", result)
    assertEquals(3, turnCount)  // completed in 3 turns, well within budget of 5
}

@Test
fun `agent throws when budget exceeded`() {
    val mockClient = ModelClient { _ ->
        // Never returns Text -- always calls tools
        LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
    }

    val agent = agent<String, String>("test") {
        model { ollama("unused"); client = mockClient }
        budget { maxTurns = 3 }
        skills {
            skill<String, String>("work", "Work") {
                tools("step")
                tool("step", "A step") { "ok" }
            }
        }
    }

    assertThrows<BudgetExceededException> {
        agent("go")
    }
}

6. Log Budget Usage

Combine budgets with Observability Hooks to track how many turns agents actually use:

var turns = 0

val agent = agent<String, String>("monitored") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 10 }

    onToolUse { name, args, result ->
        turns++
        println("Turn $turns: $name")
    }

    // ...
}

agent("input")
println("Total turns used: $turns")

This data helps you tune budgets over time. If an agent consistently uses 3 turns, a budget of 20 is wasteful -- tighten it to 5 to catch regressions early.
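
One simple way to turn logged counts into a tightened budget is to take the worst observed run and add a small safety margin. Again a sketch, not framework functionality:

```kotlin
// Sketch: derive a tightened budget from per-run turn counts logged via the
// observability hooks. Not a framework feature -- max observed plus a margin.
fun tunedBudget(observedTurns: List<Int>, margin: Int = 2): Int {
    require(observedTurns.isNotEmpty()) { "need at least one observed run" }
    return observedTurns.max() + margin
}

fun main() {
    val observed = listOf(3, 2, 3, 4, 3)  // turns used across recent runs
    println(tunedBudget(observed))        // 6: tight enough to catch regressions
}
```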

