# Budget Controls
Prevent runaway LLM loops with turn-based budgets.
The agentic loop works like this: the LLM reasons, calls tools, receives results, reasons again, calls more tools, and so on until it produces a final text answer. But what if it never produces that final answer?
Without a budget, an agent can:
- Loop indefinitely, calling the same tool with slightly different arguments.
- Burn through API credits or local compute on a single request.
- Block a thread forever in a synchronous execution model.
- Amplify errors: each failed tool call leads to another attempt, which fails again.
Budgets are the guardrail. They set a hard upper bound on how many times the LLM can be called within a single agent invocation.
Set a budget with the `budget {}` DSL block inside your agent definition:
```kotlin
val agent = agent<String, String>("researcher") {
    model { ollama("qwen2.5:7b") }
    budget {
        maxTurns = 10
    }
    skills {
        skill<String, String>("research", "Research a topic using tools") {
            tools("search", "summarize")
            // ... tool definitions
        }
    }
}
```

The DSL produces a `BudgetConfig` data class:
```kotlin
data class BudgetConfig(
    val maxTurns: Int = Int.MAX_VALUE
)
```

The default is `Int.MAX_VALUE` -- effectively unlimited. In production, you should always set an explicit limit.
When the agent reaches its turn limit, the framework throws `BudgetExceededException`:
```kotlin
import agents_engine.core.BudgetExceededException

try {
    val result = agent("Analyze all 10,000 files in the repository")
} catch (e: BudgetExceededException) {
    println("Agent ran out of turns: ${e.message}")
    // Handle gracefully: return partial result, notify user, etc.
}
```

The exception is thrown before the next LLM call would happen. This means:
- All previous tool calls have completed.
- All previous LLM responses are intact.
- The agent's message history is available up to the point of termination.
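These guarantees fall out of where the check sits in the loop. Here is a minimal sketch of a budget-checked agentic loop -- the types and names below are simplified stand-ins for illustration, not the framework's actual internals:

```kotlin
// Illustrative sketch of a budget-checked agentic loop.
// LlmResponse, runLoop, etc. are simplified stand-ins, not the framework's real API.

sealed class LlmResponse {
    data class ToolCalls(val names: List<String>) : LlmResponse()
    data class Text(val content: String) : LlmResponse()
}

class BudgetExceededException(message: String) : Exception(message)

fun runLoop(
    maxTurns: Int,
    llm: (history: List<String>) -> LlmResponse,
    executeTool: (String) -> String,
): String {
    val history = mutableListOf("system", "user")
    var turns = 0
    while (true) {
        // The budget is checked *before* each LLM call, so when the exception
        // is thrown, every prior tool result is already appended to history.
        if (turns >= maxTurns) {
            throw BudgetExceededException("exceeded budget of $maxTurns turns")
        }
        turns++
        when (val response = llm(history)) {
            is LlmResponse.Text -> return response.content // final answer ends the loop
            is LlmResponse.ToolCalls -> {
                history += "assistant(toolcalls)"
                response.names.forEach { history += "tool(${executeTool(it)})" }
            }
        }
    }
}

fun main() {
    var calls = 0
    val answer = runLoop(
        maxTurns = 5,
        llm = { if (++calls < 3) LlmResponse.ToolCalls(listOf("search")) else LlmResponse.Text("done") },
        executeTool = { "result" },
    )
    println(answer) // prints "done" after 3 turns
}
```

Because tool execution happens after the LLM call within a turn, the history handed to an exception handler always ends on a completed tool result, never a half-finished one.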
In a `then` pipeline, `BudgetExceededException` propagates like any other exception:

```kotlin
val pipeline = parse then analyze then summarize

try {
    pipeline(input)
} catch (e: BudgetExceededException) {
    // Which agent exceeded its budget? Check the message.
    println(e.message) // "Agent 'analyze' exceeded budget of 10 turns"
}
```

A turn is one LLM request-response cycle. Here is how turns map to the agentic loop:
```text
Turn 1: LLM receives [system, user] -> returns ToolCalls([search("kotlin agents")])
        Framework executes search, appends tool result
Turn 2: LLM receives [system, user, assistant(toolcalls), tool(result)] -> returns ToolCalls([summarize(...)])
        Framework executes summarize, appends tool result
Turn 3: LLM receives [system, user, assistant, tool, assistant, tool] -> returns Text("Here is the summary...")
Done. 3 turns used.
```
Key points:
- Each call to `ModelClient.chat()` is one turn.
- Multiple tool calls in a single LLM response count as one turn (the LLM made one request that happened to include multiple tool calls).
- The final `Text` response also counts as a turn.
- Tool execution itself does not count -- only the LLM call does.
```kotlin
val agent = agent<String, String>("counter-demo") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 3 }
    skills {
        skill<String, String>("work", "Do work") {
            tools("step_a", "step_b")
            tool("step_a", "First step") { args -> "result_a" }
            tool("step_b", "Second step") { args -> "result_b" }
        }
    }
}
```

If the LLM's behavior is:
- Turn 1: calls `step_a` and `step_b` together -> 1 turn
- Turn 2: calls `step_a` again -> 1 turn
- Turn 3: returns text "Done" -> 1 turn

Total: 3 turns. Exactly at the limit. If the LLM tried a 4th call, it would throw.
```kotlin
// Don't do this in production
val agent = agent<String, String>("risky") {
    model { ollama("qwen2.5:7b") }
    // No budget -- defaults to Int.MAX_VALUE
    // ...
}

// Do this instead
val agent = agent<String, String>("safe") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 15 }
    // ...
}
```

Match your budget to the expected number of tool calls:
| Task Type | Typical Turns | Suggested Budget |
|---|---|---|
| Single tool call + answer | 2 | 3-5 |
| Multi-step analysis (3-5 tools) | 4-6 | 8-10 |
| Complex research (many tools, iteration) | 8-15 | 15-20 |
| Open-ended exploration | 10-30 | 25-30 |
Leave headroom above the expected turns. The LLM might need an extra turn to correct a mistake or rephrase its answer.
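One way to express that headroom rule in code -- this is an illustrative heuristic of this sketch (flat +2, at least 25% extra), not a framework default:

```kotlin
// Illustrative heuristic: expected turns plus headroom for a correction or rephrase.
// The "+2, at least 25% extra" rule is an assumption of this sketch, not framework API.
fun suggestedBudget(expectedTurns: Int): Int =
    expectedTurns + maxOf(2, expectedTurns / 4)

fun main() {
    println(suggestedBudget(4))  // prints 6: small tasks get a flat +2
    println(suggestedBudget(20)) // prints 25: larger tasks get proportional headroom
}
```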
When agents are composed, whether via `structure {}` or `then`, each has its own budget. A parent agent with `maxTurns = 10` does not share that budget with its children:
```kotlin
val researcher = agent<String, String>("researcher") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 20 } // generous budget for deep research
    // ...
}

val summarizer = agent<String, String>("summarizer") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 3 } // tight budget: should be quick
    // ...
}

val pipeline = researcher then summarizer
// researcher gets 20 turns, summarizer gets 3 -- independent
```

Tool Error Recovery repair agents should have tight budgets. A repair agent that loops is worse than the original error:
```kotlin
val jsonFixer = agent<String, String>("json-fixer") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 1 } // single-shot: one LLM call, no tools
    // ...
}
```

Write tests that verify your agent completes within its budget:
```kotlin
@Test
fun `agent completes within budget`() {
    var turnCount = 0
    val mockClient = ModelClient { messages ->
        turnCount++
        if (turnCount < 3) {
            LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
        } else {
            LlmResponse.Text("done")
        }
    }
    val agent = agent<String, String>("test") {
        model { ollama("unused"); client = mockClient }
        budget { maxTurns = 5 }
        skills {
            skill<String, String>("work", "Work") {
                tools("step")
                tool("step", "A step") { "ok" }
            }
        }
    }
    val result = agent("go")
    assertEquals("done", result)
    assertEquals(3, turnCount) // completed in 3 turns, well within budget of 5
}
```
```kotlin
@Test
fun `agent throws when budget exceeded`() {
    val mockClient = ModelClient { _ ->
        // Never returns Text -- always calls tools
        LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
    }
    val agent = agent<String, String>("test") {
        model { ollama("unused"); client = mockClient }
        budget { maxTurns = 3 }
        skills {
            skill<String, String>("work", "Work") {
                tools("step")
                tool("step", "A step") { "ok" }
            }
        }
    }
    assertThrows<BudgetExceededException> {
        agent("go")
    }
}
```

Combine budgets with Observability Hooks to track how many turns agents actually use:
```kotlin
var turns = 0

val agent = agent<String, String>("monitored") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 10 }
    onToolUse { name, args, result ->
        // Counts tool executions -- an approximation of turns, since a turn with
        // multiple tool calls fires this hook multiple times and the final text turn fires none.
        turns++
        println("Turn $turns: $name")
    }
    // ...
}

agent("input")
println("Total turns used: $turns")
```

This data helps you tune budgets over time. If an agent consistently uses 3 turns, a budget of 20 is wasteful -- tighten it to 5 to catch regressions early.
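As a rough illustration of acting on that data (plain Kotlin, nothing here is framework API), you might compare a configured budget against the peak observed usage and flag budgets with excessive slack:

```kotlin
// Illustrative budget-tuning check: compare a configured budget against observed usage.
// The slack threshold of 5 is an assumption of this sketch, not a framework constant.
fun budgetIsWasteful(configured: Int, observed: List<Int>, slack: Int = 5): Boolean {
    val peak = observed.maxOrNull() ?: return false // no data: can't judge
    return configured > peak + slack
}

fun main() {
    val observed = listOf(3, 2, 3, 3)       // turns used in recent invocations
    println(budgetIsWasteful(20, observed)) // prints true: 20 is far above a peak of 3
    println(budgetIsWasteful(5, observed))  // prints false: 5 leaves modest headroom
}
```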
- Model & Tool Calling -- understand the loop that budgets constrain
- Tool Error Recovery -- error recovery interacts with budgets (retries consume turns)
- Observability Hooks -- monitor budget usage in real time