📄 Paper: https://opeyemibami.github.io/assets/notes_pdfs/agent-learning.pdf
Large language model (LLM) agents increasingly rely on external tools and structured workflows to accomplish complex tasks. While recent work has emphasized improving reasoning quality and prompting strategies, the orchestration layer responsible for tool selection, execution sequencing, escalation, and constraint handling remains largely heuristic.
This work introduces Agent Learning, a framework that formalizes orchestration in tool-augmented LLM systems as an execution-aware policy optimization problem. An agent is modeled as a triplet consisting of a fixed stochastic reasoning module, a set of external tools, and an orchestration policy mapping system states to actions.
Empirical evaluation in a controlled synthetic tool environment demonstrates that learned policies consistently outperform heuristic baselines in minimizing execution cost and improving constraint satisfaction across multiple random seeds.
Keywords: LLM agents, tool systems, orchestration, policy optimization, reinforcement learning, AI systems
This repository contains the reference implementation accompanying the paper:
Agent Learning formalizes orchestration in tool-augmented large language model (LLM) systems as a policy optimization problem under an explicit execution-aware cost functional.
Tool-augmented LLM agents often rely on heuristic orchestration logic for tool selection, sequencing, and escalation. This work models orchestration as a learnable policy operating over system states while keeping the reasoning module and tool environment fixed.
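The triplet formulation above can be sketched as a minimal interface. All class, field, and method names here are illustrative assumptions, not the repository's actual API; the paper's formalism only fixes the three roles: a frozen reasoning module, a fixed tool set, and a learnable orchestration policy over system states.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Agent:
    """Hypothetical agent triplet: (reasoning module, tools, orchestration policy)."""
    reason: Callable[[str], str]            # fixed stochastic reasoning module
    tools: Dict[str, Callable[[str], str]]  # fixed external tool set
    policy: Callable[[dict], str]           # orchestration policy: state -> action

def step(agent: Agent, state: dict) -> dict:
    """One orchestration step: the policy selects an action over the system state."""
    action = agent.policy(state)
    if action in agent.tools:
        # Tool invocation carries an execution cost (execution-aware cost modeling)
        state["last_output"] = agent.tools[action](state.get("query", ""))
        state["cost"] = state.get("cost", 0.0) + 1.0
    elif action == "reason":
        # Reasoning-module calls leave the module itself untouched (it is fixed)
        state["last_output"] = agent.reason(state.get("query", ""))
    return state
```

Only the policy is updated during learning; the reasoning module and tool environment stay fixed, which is what makes orchestration an isolated optimization target.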
The implementation includes:
- A synthetic tool environment
- Execution-aware cost modeling
- Heuristic orchestration baselines
- Policy-gradient training with variance reduction
- Multi-seed evaluation
- Component-wise cost analysis
This repository provides controlled validation of orchestration learning in a finite-horizon setting.
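The policy-gradient-with-variance-reduction component can be illustrated with a minimal REINFORCE sketch on a toy action-selection problem. This is not the repository's training loop: the action set, cost values, and moving-average baseline below are stand-ins chosen for a self-contained example, and the repository uses torch rather than the plain-numpy gradient shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                    # toy orchestration actions
true_cost = np.array([1.0, 0.3, 0.8])    # hypothetical per-action execution cost
theta = np.zeros(K)                      # softmax policy parameters
baseline, lr, beta = 0.0, 0.1, 0.9

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

for t in range(2000):
    probs = softmax(theta)
    a = rng.choice(K, p=probs)
    # Reward is negative noisy execution cost: minimizing cost = maximizing reward
    r = -(true_cost[a] + 0.1 * rng.standard_normal())
    # Moving-average baseline reduces gradient variance without changing its mean
    baseline = beta * baseline + (1 - beta) * r
    grad_logp = -probs
    grad_logp[a] += 1.0                  # grad of log softmax(a | theta)
    theta += lr * (r - baseline) * grad_logp

best = int(np.argmax(softmax(theta)))    # should select the cheapest action
```

The baseline subtraction `(r - baseline)` is the variance-reduction device referenced above: it leaves the expected gradient unchanged while shrinking per-sample noise, which matters when rewards are dominated by execution cost scale.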
agent-learning/
├── config.py
├── environment.py
├── policy.py
├── evaluate.py
├── train.py
└── README.md
- `environment.py` — Synthetic tool environment
- `policy.py` — Parameterized orchestration policy
- `evaluate.py` — Baseline and learned policy evaluation
- `train.py` — Policy training and multi-seed aggregation
- `config.py` — Experimental configuration
Create a virtual environment and install dependencies:
pip install torch numpy
No additional frameworks are required.
To evaluate the heuristic baselines:

python train.py --mode baseline

To train the learned policy:

python train.py
The script runs multi-seed evaluation and reports:
- Total execution cost (mean ± std)
- Success rate (mean ± std)
- Component-wise cost breakdown
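The multi-seed reporting above can be sketched as follows. The metric names and per-seed values here are hypothetical placeholders, not results from the paper; in the repository these numbers would come from running evaluation under several random seeds.

```python
import numpy as np

# Hypothetical per-seed evaluation results (placeholder values, not paper results)
results = {
    "total_cost":   [12.4, 11.9, 13.1, 12.2, 12.8],
    "success_rate": [0.82, 0.85, 0.80, 0.84, 0.83],
}
components = {  # illustrative component-wise cost breakdown per seed
    "tool_calls":         [7.1, 6.8, 7.5, 7.0, 7.3],
    "escalation":         [3.0, 2.9, 3.2, 3.0, 3.1],
    "constraint_penalty": [2.3, 2.2, 2.4, 2.2, 2.4],
}

def mean_std(xs):
    """Aggregate a metric across seeds: mean and sample standard deviation."""
    a = np.asarray(xs, dtype=float)
    return a.mean(), a.std(ddof=1)

for name, xs in {**results, **components}.items():
    m, s = mean_std(xs)
    print(f"{name}: {m:.3f} \u00b1 {s:.3f}")
```

Reporting the sample standard deviation (`ddof=1`) across seeds, rather than the population one, is the usual choice when seeds are treated as independent draws.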
This implementation validates orchestration learning in a controlled synthetic environment.
It is:
- A research artifact
- A systems-level validation
- Not a production agent framework
- Not a deployment-ready orchestration engine
If this work is useful, please cite:
@misc{bamigbade2026agentlearning,
author = {Bamigbade, Opeyemi and Oni, Stephen},
title = {Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems},
year = {2026},
month = {March},
publisher = {Zenodo},
doi = {10.5281/zenodo.19339516},
url = {https://doi.org/10.5281/zenodo.19339516}
}

MIT License.