Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems

📄 Paper: https://opeyemibami.github.io/assets/notes_pdfs/agent-learning.pdf

🔗 DOI: https://doi.org/10.5281/zenodo.19339516


Abstract

Large language model (LLM) agents increasingly rely on external tools and structured workflows to accomplish complex tasks. While recent work has emphasized improving reasoning quality and prompting strategies, the orchestration layer responsible for tool selection, execution sequencing, escalation, and constraint handling remains largely heuristic.

This work introduces Agent Learning, a framework that formalizes orchestration in tool-augmented LLM systems as an execution-aware policy optimization problem. An agent is modeled as a triplet consisting of a fixed stochastic reasoning module, a set of external tools, and an orchestration policy mapping system states to actions.

Empirical evaluation in a controlled synthetic tool environment demonstrates that learned policies consistently outperform heuristic baselines in minimizing execution cost and improving constraint satisfaction across multiple random seeds.

Keywords: LLM agents, tool systems, orchestration, policy optimization, reinforcement learning, AI systems


This repository contains the reference implementation accompanying the paper:

Agent Learning formalizes orchestration in tool-augmented large language model (LLM) systems as a policy optimization problem under an explicit execution-aware cost functional.


Overview

Tool-augmented LLM agents often rely on heuristic orchestration logic for tool selection, sequencing, and escalation. This work models orchestration as a learnable policy operating over system states while keeping the reasoning module and tool environment fixed.

The implementation includes:

  • A synthetic tool environment
  • Execution-aware cost modeling
  • Heuristic orchestration baselines
  • Policy-gradient training with variance reduction
  • Multi-seed evaluation
  • Component-wise cost analysis

This repository provides controlled validation of orchestration learning in a finite-horizon setting.
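As a rough illustration of what "policy-gradient training with variance reduction" means here, the sketch below runs REINFORCE with a running-mean baseline on a toy discrete policy. It is a minimal stand-in, not the repository's actual `policy.py`/`train.py` code: the state/action spaces, cost function, and all names are hypothetical.

```python
import torch

torch.manual_seed(0)

n_states, n_actions = 4, 3           # toy state/action spaces (illustrative)
logits = torch.zeros(n_states, n_actions, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
baseline = 0.0                        # running-mean baseline for variance reduction

def rollout():
    """One finite-horizon episode in a stand-in environment with random states."""
    log_probs, cost = [], 0.0
    for _ in range(5):                # fixed horizon
        s = torch.randint(n_states, (1,)).item()
        dist = torch.distributions.Categorical(logits=logits[s])
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        cost += float(a) + 0.1        # stand-in execution cost of the chosen action
    return torch.stack(log_probs).sum(), cost

for step in range(50):
    logp, cost = rollout()
    advantage = cost - baseline       # subtracting the baseline reduces gradient variance
    loss = logp * advantage           # REINFORCE objective: minimize expected cost
    opt.zero_grad(); loss.backward(); opt.step()
    baseline = 0.9 * baseline + 0.1 * cost
```

The real implementation operates over richer system states (tool availability, escalation status, constraints), but the gradient estimator has this general shape.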


Repository Structure

agent-learning/
├── config.py
├── environment.py
├── policy.py
├── evaluate.py
├── train.py
└── README.md
  • environment.py — Synthetic tool environment
  • policy.py — Parameterized orchestration policy
  • evaluate.py — Baseline and learned policy evaluation
  • train.py — Policy training and multi-seed aggregation
  • config.py — Experimental configuration
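The README does not list the fields of `config.py`; purely as an illustration of how such an experimental configuration is commonly structured, a dataclass sketch with hypothetical field names might look like:

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    # All fields below are illustrative guesses, not the repository's actual config.
    horizon: int = 5        # finite episode length
    n_seeds: int = 5        # number of seeds for multi-seed evaluation
    lr: float = 1e-2        # policy-gradient learning rate
    n_episodes: int = 500   # training episodes per seed

cfg = ExperimentConfig()
print(cfg)
```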

Installation

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install torch numpy

No additional frameworks are required.


Reproducing Results

Baseline Evaluation

python train.py --mode baseline

Train Learned Policy

python train.py

The script runs multi-seed evaluation and reports:

  • Total execution cost (mean ± std)
  • Success rate (mean ± std)
  • Component-wise cost breakdown
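The mean ± std aggregation across seeds can be sketched as follows; the per-seed cost values and the `aggregate` helper are illustrative, not the script's actual interface.

```python
import statistics

def aggregate(per_seed_values):
    """Mean and standard deviation across seeds, as reported for cost and success rate."""
    mean = statistics.mean(per_seed_values)
    std = statistics.stdev(per_seed_values) if len(per_seed_values) > 1 else 0.0
    return mean, std

costs = [12.3, 11.8, 12.9, 12.1, 12.5]   # hypothetical total execution cost per seed
mean, std = aggregate(costs)
print(f"total cost: {mean:.2f} ± {std:.2f}")
```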

Scope

This implementation validates orchestration learning in a controlled synthetic environment.

It is:

  • A research artifact
  • A systems-level validation

It is not:

  • A production agent framework
  • A deployment-ready orchestration engine

Citation

If this work is useful, please cite:

@misc{bamigbade2026agentlearning,
  author       = {Bamigbade, Opeyemi and Oni, Stephen},
  title        = {Agent Learning: Execution-Aware Policy Optimization for LLM Tool Systems},
  year         = {2026},
  month        = {March},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19339516},
  url          = {https://doi.org/10.5281/zenodo.19339516}
}

License

MIT License.
