📄 Paper: https://opeyemibami.github.io/assets/notes_pdfs/agent-learning.pdf
Large language model (LLM) agents increasingly rely on external tools and structured workflows to accomplish complex tasks. While recent work has emphasized improving reasoning quality and prompting strategies, the orchestration layer responsible for tool selection, execution sequencing, escalation, and constraint handling remains largely heuristic.
This work introduces Agent Learning, a framework that formalizes orchestration in tool-augmented LLM systems as an execution-aware policy optimization problem. An agent is modeled as a triplet consisting of a fixed stochastic reasoning module, a set of external tools, and an orchestration policy mapping system states to actions.
Empirical evaluation in a controlled synthetic tool environment demonstrates that learned policies consistently outperform heuristic baselines in minimizing execution cost and improving constraint satisfaction across multiple random seeds.
Keywords: LLM agents, tool systems, orchestration, policy optimization, reinforcement learning, AI systems
This repository contains the reference implementation accompanying the paper:
Agent Learning formalizes orchestration in tool-augmented large language model (LLM) systems as a policy optimization problem under an explicit execution-aware cost functional.
Tool-augmented LLM agents often rely on heuristic orchestration logic for tool selection, sequencing, and escalation. This work models orchestration as a learnable policy operating over system states while keeping the reasoning module and tool environment fixed.
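The triplet formulation above can be sketched as a minimal interface. All class, field, and method names here are illustrative assumptions, not the repository's actual API; the paper's formalism only fixes the three roles: a frozen reasoning module, a fixed tool set, and a learnable orchestration policy over system states.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Agent:
    """Hypothetical agent triplet: (reasoning module, tools, orchestration policy)."""
    reason: Callable[[str], str]            # fixed stochastic reasoning module
    tools: Dict[str, Callable[[str], str]]  # fixed external tool set
    policy: Callable[[dict], str]           # orchestration policy: state -> action

def step(agent: Agent, state: dict) -> dict:
    """One orchestration step: the policy selects an action over the system state."""
    action = agent.policy(state)
    if action in agent.tools:
        # Tool invocation carries an execution cost (execution-aware cost modeling)
        state["last_output"] = agent.tools[action](state.get("query", ""))
        state["cost"] = state.get("cost", 0.0) + 1.0
    elif action == "reason":
        # Reasoning-module calls leave the module itself untouched (it is fixed)
        state["last_output"] = agent.reason(state.get("query", ""))
    return state
```

Only the policy is updated during learning; the reasoning module and tool environment stay fixed, which is what makes orchestration an isolated optimization target.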
The implementation includes:
- A synthetic tool environment
- Execution-aware cost modeling
- Heuristic orchestration baselines
- Policy-gradient training with variance reduction
- Multi-seed evaluation
- Component-wise cost analysis
This repository provides controlled validation of orchestration learning in a finite-horizon setting.
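The policy-gradient-with-variance-reduction component can be illustrated with a minimal REINFORCE sketch on a toy action-selection problem. This is not the repository's training loop: the action set, cost values, and moving-average baseline below are stand-ins chosen for a self-contained example, and the repository uses torch rather than the plain-numpy gradient shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                    # toy orchestration actions
true_cost = np.array([1.0, 0.3, 0.8])    # hypothetical per-action execution cost
theta = np.zeros(K)                      # softmax policy parameters
baseline, lr, beta = 0.0, 0.1, 0.9

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

for t in range(2000):
    probs = softmax(theta)
    a = rng.choice(K, p=probs)
    # Reward is negative noisy execution cost: minimizing cost = maximizing reward
    r = -(true_cost[a] + 0.1 * rng.standard_normal())
    # Moving-average baseline reduces gradient variance without changing its mean
    baseline = beta * baseline + (1 - beta) * r
    grad_logp = -probs
    grad_logp[a] += 1.0                  # grad of log softmax(a | theta)
    theta += lr * (r - baseline) * grad_logp

best = int(np.argmax(softmax(theta)))    # should select the cheapest action
```

The baseline subtraction `(r - baseline)` is the variance-reduction device referenced above: it leaves the expected gradient unchanged while shrinking per-sample noise, which matters when rewards are dominated by execution cost scale.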
agent-learning/
├── config.py
├── environment.py
├── policy.py
├── evaluate.py
├── train.py
└── README.md
- `environment.py` — Synthetic tool environment
- `policy.py` — Parameterized orchestration policy
- `evaluate.py` — Baseline and learned policy evaluation
- `train.py` — Policy training and multi-seed aggregation
- `config.py` — Experimental configuration
Create a virtual environment and install dependencies:
pip install torch numpy
No additional frameworks are required.
To evaluate the heuristic baselines:

python train.py --mode baseline

To train the learned policy:

python train.py
The script runs multi-seed evaluation and reports:
- Total execution cost (mean ± std)
- Success rate (mean ± std)
- Component-wise cost breakdown
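The multi-seed reporting above can be sketched as follows. The metric names and per-seed values here are hypothetical placeholders, not results from the paper; in the repository these numbers would come from running evaluation under several random seeds.

```python
import numpy as np

# Hypothetical per-seed evaluation results (placeholder values, not paper results)
results = {
    "total_cost":   [12.4, 11.9, 13.1, 12.2, 12.8],
    "success_rate": [0.82, 0.85, 0.80, 0.84, 0.83],
}
components = {  # illustrative component-wise cost breakdown per seed
    "tool_calls":         [7.1, 6.8, 7.5, 7.0, 7.3],
    "escalation":         [3.0, 2.9, 3.2, 3.0, 3.1],
    "constraint_penalty": [2.3, 2.2, 2.4, 2.2, 2.4],
}

def mean_std(xs):
    """Aggregate a metric across seeds: mean and sample standard deviation."""
    a = np.asarray(xs, dtype=float)
    return a.mean(), a.std(ddof=1)

for name, xs in {**results, **components}.items():
    m, s = mean_std(xs)
    print(f"{name}: {m:.3f} \u00b1 {s:.3f}")
```

Reporting the sample standard deviation (`ddof=1`) across seeds, rather than the population one, is the usual choice when seeds are treated as independent draws.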
This implementation validates orchestration learning in a controlled synthetic environment.
It is:
- A research artifact
- A systems-level validation
- Not a production agent framework
- Not a deployment-ready orchestration engine
If this work is useful, please cite:
@misc{bamigbade2026agentlearning,
author = {Bamigbade, Opeyemi and Oni, Stephen},
title = {Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems},
year = {2026},
month = {March},
publisher = {Zenodo},
doi = {10.5281/zenodo.19339516},
url = {https://doi.org/10.5281/zenodo.19339516}
}

MIT License.