Governance-first, local-first optimization for JVM microservices. ACO combines metric-driven diagnosis, bounded recommendations, audit artifacts, confidence gates, rollback modeling, and benchmark scenarios in one Java codebase.
ACO is an open-source Java Autonomic Reliability Governance (ARG) reference implementation for experimenting with governed optimization of JVM-backed services. ARG is the discipline of designing bounded, auditable, and reversible automation under explicit constraints — particularly where agents plan and act faster than human-in-the-loop validation can operate.
ACO watches runtime signals such as latency, heap, GC, CPU, and thread behavior; produces structured optimization plans; evaluates those plans against policy and actuation budgets; and validates whether changes helped or hurt.
This is not "YOLO let the LLM touch prod." That would be extremely innovative in the worst possible way.
ACO currently supports:
- Local LLM analysis through Ollama
- Deterministic fallback analysis through SimpleAgent
- OptimizationPlan artifacts for auditability
- Policy evaluation before actuation
- Actuation budgets — rate-limiting change magnitude and frequency, not just traffic
- Blast radius constraints through actuation scoping
- Confidence gates and progressive autonomy modes (Advisory / Auto-Governed / Denied)
- Eligibility tiers — action-based permission model separating observational, low-risk, and high-risk actions
- Validation and rollback modeling
- Deterministic benchmark scenarios for amplification testing
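The autonomy modes and eligibility tiers above can be combined into a single permission check. The sketch below is illustrative only — the enum values mirror the README's terms, but the class, method, and rule details are hypothetical, not ACO's actual API:

```java
// Hypothetical sketch of combining eligibility tiers with autonomy modes.
// Names and rules are invented for illustration; ACO's real model lives
// in its governance layer and may differ.
public class EligibilityTiersSketch {
    enum Tier { OBSERVATIONAL, LOW_RISK, HIGH_RISK }
    enum Mode { ADVISORY, AUTO_GOVERNED, DENIED }

    // An action may only be auto-applied when its tier is low enough
    // for the current autonomy mode; everything else stays advisory.
    static boolean mayAutoApply(Tier tier, Mode mode) {
        if (mode == Mode.DENIED) return false;
        if (mode == Mode.ADVISORY) return tier == Tier.OBSERVATIONAL;
        // AUTO_GOVERNED: high-risk actions still require a human.
        return tier != Tier.HIGH_RISK;
    }

    public static void main(String[] args) {
        System.out.println(mayAutoApply(Tier.LOW_RISK, Mode.AUTO_GOVERNED));  // true
        System.out.println(mayAutoApply(Tier.HIGH_RISK, Mode.AUTO_GOVERNED)); // false
    }
}
```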
JVM tuning is still too often a guessing game:
- thread pools get bumped because latency is bad
- heap gets inflated "just to be safe"
- GC settings drift without a recorded reason
- optimization decisions are made in war rooms and forgotten a week later
ACO turns that into a more disciplined loop:
- Observe runtime behavior
- Reason about likely bottlenecks
- Plan a structured optimization with evidence and rollback path
- Govern that plan with policy, budgets, and autonomy checks
- Validate the outcome
- Audit — preserve every decision for review and rollback
The point is not raw automation. The point is bounded, explainable optimization.
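One way to picture the disciplined loop is as a fixed sequence of stages, each writing its decision to an audit trail. This is a minimal, hypothetical sketch of that shape — not ACO's actual control loop or class names:

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the observe -> reason -> plan -> govern -> validate -> audit
// loop. Every stage records itself, so nothing silently disappears.
public class GovernanceLoopSketch {
    static List<String> runCycle(double p99LatencyMs, double sloMs) {
        List<String> audit = new ArrayList<>();
        audit.add("observe: p99=" + p99LatencyMs + "ms");
        boolean breached = p99LatencyMs > sloMs;  // SLO detection
        audit.add("reason: " + (breached ? "bottleneck suspected" : "healthy"));
        if (breached) {
            audit.add("plan: proposed change with rollback path");
            audit.add("govern: policy + budget + autonomy checks");
            audit.add("validate: compare before/after metrics");
        }
        audit.add("audit: cycle recorded");
        return audit;
    }

    public static void main(String[] args) {
        runCycle(480, 200).forEach(System.out::println);
    }
}
```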
Every optimization cycle flows through a fixed governance pipeline. No step can be skipped.
```mermaid
flowchart TD
    WS[Workload Simulator] --> MC[Metrics Collection]
    MC --> SLO[SLO Detector]
    SLO --> AR[Agent Reasoning]
    AR --> PA[Plan Assembly]
    PA --> PE[Policy Evaluation]
    PE --> AB[Actuation Budget]
    AB --> AG{Autonomy Gate}
    AG -->|Advisory| ADV[Advisory Report]
    AG -->|Auto-Governed| VE[Validation]
    AG -->|Denied| RE[Rollback]
    VE --> AUD[Audit & Report]
    RE --> AUD
    ADV --> AUD
```
ACO is organized into six layers. Each layer has a single responsibility and depends only on the layers below it.
```mermaid
flowchart TD
    A["Infrastructure — Ollama · Load Runner · Workload Simulator"]
    B["Observation — Metrics Collection · SLO Detection"]
    C["Reasoning — LLM Agent · Simple Agent"]
    D["Planning — Plan Assembler · Optimization Plan · Rollback Recipe"]
    E["Governance — Policy Engine · Budget Ledger · Autonomy Gate"]
    F["Validation & Audit — Validation · Rollback · Report"]
    A --> B --> C --> D --> E --> F
```
```bash
git clone https://github.com/sibasispadhi/agentic-cloud-optimizer.git
cd agentic-cloud-optimizer
docker compose up --build
```

Open:
- Live dashboard: http://localhost:8081/live-dashboard.html
- Results page: http://localhost:8081/results.html
First run downloads the Ollama model, so yeah, give it a minute.
```bash
git clone https://github.com/sibasispadhi/agentic-cloud-optimizer.git
cd agentic-cloud-optimizer
docker compose -f docker-compose.simple.yml up --build
```

Open:
- Live dashboard: http://localhost:8081/live-dashboard.html
- Results page: http://localhost:8081/results.html
```bash
# Terminal 1
ollama serve
ollama pull llama3.2:3b

# Terminal 2
mvn spring-boot:run
```

Then open:
- Live dashboard: http://localhost:8081/live-dashboard.html
- Results page: http://localhost:8081/results.html
For a more guided local setup, use docs/START_HERE.md.
The Architecture diagram above shows the full pipeline. Each stage maps to a layer:
| Pipeline stage | Layer | Detail |
|---|---|---|
| Workload Simulator | Infrastructure | executes load; drives metric signals |
| Metrics Collection | Observation | latency, throughput, heap, CPU, GC, threads |
| SLO Detector | Observation | triggers optimization when Service Level Objective (SLO) thresholds are breached |
| Agent Reasoning | Reasoning | SpringAiLlmAgent or SimpleAgent |
| Plan Assembly | Planning | OptimizationPlan with evidence and rollback path |
| Policy Evaluation | Governance | PolicyEngine approves, warns, or denies |
| Actuation Budget | Governance | ActuationBudgetLedger enforces change limits |
| Autonomy Gate | Governance | Advisory / Auto-Governed / Denied |
| Validation | Validation & Audit | confirms improvement; triggers rollback if not |
| Audit & Report | Validation & Audit | persists every decision for review |
- local LLM-backed reasoning with Ollama
- deterministic fallback agent
- externalized thresholds in configuration
- Docker startup flows
Implemented in src/main/java/com/cloudoptimizer/agent/artifact/
Key outputs:
- `OptimizationPlan`
- `PlanChange`
- `PlanEvidence`
- `ValidationRecipe`
- `RollbackRecipe`
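As a rough illustration of how these artifacts fit together, here is a hypothetical sketch using Java records. Field names and shapes are invented for the example and will differ from ACO's actual classes in the artifact package:

```java
import java.util.List;

// Invented shapes for the plan-artifact family; ACO's real classes differ.
public class PlanArtifactSketch {
    record PlanChange(String parameter, String from, String to) {}
    record PlanEvidence(String metric, double observed, double threshold) {}
    record RollbackRecipe(List<PlanChange> inverse) {}
    record OptimizationPlan(String id, List<PlanChange> changes,
                            List<PlanEvidence> evidence, RollbackRecipe rollback) {}

    // A rollback recipe simply inverts each change, preserving reversibility.
    static RollbackRecipe invert(List<PlanChange> changes) {
        return new RollbackRecipe(changes.stream()
                .map(c -> new PlanChange(c.parameter(), c.to(), c.from()))
                .toList());
    }

    public static void main(String[] args) {
        var change = new PlanChange("thread-pool.size", "32", "48");
        var rollback = invert(List.of(change));
        System.out.println(rollback.inverse().get(0)); // inverted: 48 -> 32
    }
}
```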
Implemented in src/main/java/com/cloudoptimizer/agent/policy/
Key outputs:
- `PolicyEngine`
- `DefaultPolicyEngine`
- `ActuationPolicy`
- `PolicyDecision`
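The essence of policy evaluation is checking a proposed change against bounds before anything actuates. A minimal sketch, with invented thresholds and names (the real contract is ACO's PolicyEngine):

```java
// Toy policy evaluation: deny changes beyond a hard bound, warn near it,
// approve otherwise. Thresholds and names are invented for illustration.
public class PolicySketch {
    enum Decision { APPROVE, WARN, DENY }

    static Decision evaluate(double proposed, double current, double maxRelativeChange) {
        double delta = Math.abs(proposed - current) / current;
        if (delta > maxRelativeChange) return Decision.DENY;
        if (delta > 0.8 * maxRelativeChange) return Decision.WARN;
        return Decision.APPROVE;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(48, 32, 0.25)); // 50% bump vs 25% bound -> DENY
        System.out.println(evaluate(36, 32, 0.25)); // 12.5% bump -> APPROVE
    }
}
```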
Implemented in src/main/java/com/cloudoptimizer/agent/budget/
Key outputs:
- `ActuationBudget`
- `ActuationBudgetLedger`
- `BudgetConsumption`
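An actuation budget rate-limits both how often and how much the agent may change. The following is a hypothetical ledger sketch — the field semantics and API are assumptions, not ACO's ActuationBudgetLedger:

```java
// Toy actuation budget: a window allows N actions and a total change
// magnitude; a change is admitted only if both limits still hold.
public class BudgetLedgerSketch {
    private int remainingActions;
    private double remainingMagnitude; // e.g. sum of relative changes allowed

    BudgetLedgerSketch(int actions, double magnitude) {
        this.remainingActions = actions;
        this.remainingMagnitude = magnitude;
    }

    // Consumes budget and returns true only when the change fits.
    synchronized boolean tryConsume(double magnitude) {
        if (remainingActions <= 0 || magnitude > remainingMagnitude) return false;
        remainingActions--;
        remainingMagnitude -= magnitude;
        return true;
    }

    public static void main(String[] args) {
        var ledger = new BudgetLedgerSketch(2, 0.30);
        System.out.println(ledger.tryConsume(0.20)); // true
        System.out.println(ledger.tryConsume(0.20)); // false: magnitude exhausted
    }
}
```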
Implemented in src/main/java/com/cloudoptimizer/agent/autonomy/
Key outputs:
- `AutonomyGate`
- `AutonomyMode`
- `AutonomyGateResult`
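Conceptually, the gate turns a plan's confidence and its upstream policy verdict into one of the three modes. The threshold and logic below are assumptions for illustration; ACO's AutonomyGate defines the real rules:

```java
// Toy confidence gate: denied by policy -> DENIED; otherwise high
// confidence earns auto-governed execution, low confidence stays advisory.
public class AutonomyGateSketch {
    enum Mode { ADVISORY, AUTO_GOVERNED, DENIED }

    static Mode decide(double confidence, boolean policyApproved) {
        if (!policyApproved) return Mode.DENIED;
        return confidence >= 0.8 ? Mode.AUTO_GOVERNED : Mode.ADVISORY; // 0.8 is invented
    }

    public static void main(String[] args) {
        System.out.println(decide(0.9, true));  // AUTO_GOVERNED
        System.out.println(decide(0.6, true));  // ADVISORY
        System.out.println(decide(0.9, false)); // DENIED
    }
}
```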
Implemented in the artifact and service layers
Key outputs:
- `ValidationExecutor`
- `RollbackExecutor`
- `ValidationResult`
- `RollbackResult`
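The validation decision boils down to: did the change improve the metric by a meaningful margin, or should the rollback recipe fire? A minimal sketch with an invented improvement threshold (ValidationExecutor and RollbackExecutor are the real entry points):

```java
// Toy validation check: keep a change only if p99 latency improves by at
// least minGain (e.g. 0.05 = 5%); otherwise trigger rollback.
public class ValidationSketch {
    static boolean improved(double beforeP99Ms, double afterP99Ms, double minGain) {
        return afterP99Ms <= beforeP99Ms * (1.0 - minGain);
    }

    public static void main(String[] args) {
        System.out.println(improved(480, 120, 0.05) ? "keep change" : "trigger rollback");
        System.out.println(improved(480, 470, 0.05) ? "keep change" : "trigger rollback");
    }
}
```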
Implemented in src/main/java/com/cloudoptimizer/agent/benchmark/
Included scenarios:
- retry storm
- thread saturation
- CPU throttling
- heap overprovisioning
- burst traffic
ACO includes deterministic benchmark scenarios so governance claims can be tested without needing production data.
Example outcomes from the benchmark layer:
- Retry storm: 3× amplification factor
- Naive latency-reactive agent: p99 worsened by 58%
- Governed agent: p99 recovered by 75% (480 ms → 120 ms); throughput restored from 40 RPS to 78 RPS
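The 3× amplification factor is easy to sanity-check with a geometric model: if every failure triggers a retry up to a cap, expected attempts per request form a geometric series. The parameters below (a 2-retry policy, persistent failure) are my assumptions, since the benchmark's exact configuration isn't stated here:

```java
// Back-of-envelope retry amplification: expected attempts per request
// with up to maxRetries retries and per-attempt failure probability p
// is the geometric series 1 + p + p^2 + ... + p^maxRetries.
public class RetryAmplificationSketch {
    static double amplification(int maxRetries, double failureRate) {
        double attempts = 0;
        double pPow = 1.0;
        for (int i = 0; i <= maxRetries; i++) {
            attempts += pPow;
            pPow *= failureRate;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // 2 retries under total failure: every request costs 3 attempts (3x).
        System.out.println(amplification(2, 1.0)); // 3.0
        System.out.println(amplification(2, 0.5)); // 1.75
    }
}
```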
That is the whole point of the project: useful automation should dampen instability, not cosplay as a chaos monkey.
ACO writes artifacts and reports under artifacts/.
Typical outputs include:
- `baseline.json`
- `after.json`
- `report.json`
- optimization-plan artifacts
- reasoning traces
- validation and rollback records
You can also inspect example outputs in examples/README.md.
```bash
mvn clean package -DskipTests
./scripts/run-agent.sh
# Windows: scripts\run-agent.bat
./scripts/run-web-ui.sh
# Windows: scripts\run-web-ui.bat
./scripts/verify-ollama.sh
```

- Java 21
- Spring Boot 3.2
- Spring AI
- Ollama for local LLM inference
- Jackson for artifact serialization
- JUnit for test coverage
- Docker / Docker Compose for local execution
- docs/START_HERE.md — setup and first run
- docs/INDEX.md — current documentation map
- docs/WHAT_THIS_IS.md — scope and boundaries
- docs/OLLAMA_SETUP.md — Ollama install help
- docs/WINDOWS_SETUP.md — Windows notes
- docs/ARCHITECTURE_PATTERNS.md — design patterns
- docs/STARTUP_READY_PLAN.md — rollout/readiness plan
These are roadmap items, not shipped capabilities:
- GitOps PR generation
- OpenSLO ingestion
- external telemetry adapters beyond the current local flow
- broader multi-service / multi-region rollout controls
If it is not in code and tested, it does not get to sit in the README pretending it exists.
MIT — see LICENSE.