Sim-Arena

Canonical README for the sim-arena codebase — use this file for setup, training, benchmarks, and distributed operations.

TL;DR: A reinforcement learning gym where AI agents (DQN, Epsilon-Greedy, or hand-coded policies) learn to fix Kubernetes resource problems by running simulations, observing pod states, taking actions (like increasing CPU), and getting rewards when pods become healthy. Now also supports LLM benchmarking — Gemini and Claude models can be evaluated on the same scenarios using live Kubernetes tool access via MCP. For distributed multi-worker training, see the protocol/ directory and the runbooks in docs/.

What This System Does

Problem: Kubernetes pods fail when they request too much or too little CPU/memory. Figuring out the right resource requests is hard.

Solution: Sim-Arena creates a "gym" where an AI agent can:

1. Start a simulation of a failing Kubernetes workload (using SimKube)
2. Observe what's wrong (e.g., "3 pods are pending")
3. Take an action (e.g., "increase CPU requests")
4. Get a reward (shaped or binary based on pod health)
5. Learn over time which actions fix which problems

RL Training: Fully working training loop with DQN and Epsilon-Greedy agents, plus hand-coded fallback policies. Checkpointing, visualization, and learning curve tracking are all supported.

LLM Benchmarking: LLM agents (Gemini, Claude) can be evaluated on the same scenarios. They use MCP tools to query the live cluster — pod states, events, logs, deployment config — before deciding on an action. Both RL and LLM agents share the same action space and reward functions.

Distributed / Federated Training: Multiple EC2 workers can run jobs in parallel using an S3-based communication protocol. A sync_server.py process coordinates federated averaging (FedAvg) of DQN weights between episodes across workers.
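The averaging step can be sketched in plain Python. This is an illustration only, not the code in protocol/federated_avg.py: it assumes each worker's weights arrive as a dict mapping a parameter name to a flat list of floats (the real checkpoints are PyTorch tensors), and the names `Weights` and `fed_avg` are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a DQN state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fed_avg(worker_weights: List[Weights]) -> Weights:
    """Average every parameter element-wise across workers (unweighted FedAvg)."""
    if not worker_weights:
        raise ValueError("need at least one worker's weights")
    keys = worker_weights[0].keys()
    for weights in worker_weights:
        if weights.keys() != keys:
            raise ValueError("workers disagree on parameter names")
    n_workers = len(worker_weights)
    averaged: Weights = {}
    for key in keys:
        # zip(*...) groups the i-th element of this parameter from every worker.
        columns = zip(*(weights[key] for weights in worker_weights))
        averaged[key] = [sum(column) / n_workers for column in columns]
    return averaged


worker_1 = {"q_net.bias": [0.0, 2.0]}
worker_2 = {"q_net.bias": [1.0, 4.0]}
global_weights = fed_avg([worker_1, worker_2])
print(global_weights)
```

Each worker would then load the averaged weights before starting its next episode, which is why all workers have to reach the same episode boundary first.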

The Big Picture

┌─────────────────────────────────────────────────────────────┐
│                     TRAINING LOOP (RL)                      │
└─────────────────────────────────────────────────────────────┘

for each episode:
  Input: Trace file (broken workload) + Agent (DQN or Epsilon-Greedy)
     ↓
  1. Create Simulation (SimKube starts fake cluster)
     ↓
  2. Wait (duration seconds for pods to fail)
     ↓
  3. Observe (count ready/pending pods)
     ↓
  4. Agent Decision (neural net or epsilon-greedy chooses action)
     ↓
  5. Apply Action (modify trace file)
     ↓
  6. Compute Reward (shaped, base, cost_aware_v2, or max_punish)
     ↓
  7. Agent Learn (update Q-network / value table)
     ↓
  8. Checkpoint & Visualize
     ↓
  Output: Updated agent weights + logs + plots

┌─────────────────────────────────────────────────────────────┐
│                  BENCHMARK LOOP (LLM)                       │
└─────────────────────────────────────────────────────────────┘

for each scenario:
  Input: Trace file + LLM provider (Gemini / Claude)
     ↓
  1. Create Simulation (same SimKube infrastructure)
     ↓
  2. Observe base pod state
     ↓
  3. LLM queries MCP tools (get_pods, get_events, describe_deployment, get_pod_logs)
     ↓
  4. LLM decides action → same ACTION_SPACE as RL agents
     ↓
  5. Apply Action + Compute Reward (same functions)
     ↓
  6. Record metrics (steps, reward, tool calls, latency, solved)
     ↓
  Output: report.json + report.md in benchmark/results/

┌─────────────────────────────────────────────────────────────┐
│           DISTRIBUTED / FEDERATED TRAINING (EC2)            │
└─────────────────────────────────────────────────────────────┘

  Mac/laptop: dispatch.py submit → writes manifest.json to S3
              sync_server.py     → runs locally, polls S3

  EC2 Worker 1:  worker.py → claims job → train.py ep 1
                             → upload checkpoint to S3
                             → wait for global_weights.pt (FedAvg barrier)
                             → train.py ep 2 with averaged weights

  EC2 Worker 2:  worker.py → claims job → train.py ep 1
                             → upload checkpoint to S3
                             → wait for global_weights.pt (same barrier)
                             → train.py ep 2 with averaged weights

  sync_server.py: sees both workers done → FedAvg → writes global_weights.pt

Directory Structure

sim-arena/
│
├── runner/                    # Orchestration
│   ├── train.py              # ★ Main RL training loop (multi-episode, checkpointing)
│   ├── one_step.py           # Run ONE agent step (RL or LLM)
│   ├── multi_step.py         # Run ONE episode (many steps)
│   ├── policies.py           # Hand-coded fallback policies
│   └── safeguards.py         # Resource limit validation
│
├── agent/                     # ★ All agent implementations
│   ├── agent.py              # Agent factory (AgentType enum + unified Agent class)
│   ├── dqn.py                # Deep Q-Network implementation
│   ├── eps_greedy.py         # Epsilon-Greedy tabular agent
│   ├── llm_agent.py          # ★ LLM agent (provider-agnostic, uses MCP tools)
│   ├── prompt_builder.py     # Builds system + user prompts from obs dict
│   ├── action_parser.py      # Parses LLM JSON response → action index
│   └── providers/            # ★ LLM provider implementations
│       ├── base.py           # LLMProvider abstract base class
│       ├── gemini_provider.py  # Google Gemini (google-genai SDK)
│       └── anthropic_provider.py  # Anthropic Claude
│
├── protocol/                  # ★ Distributed training communication layer
│   ├── dispatch.py           # CLI: submit jobs to S3 + list status
│   ├── worker.py             # EC2 worker: poll S3, claim jobs, run train.py
│   ├── sync_server.py        # ★ Per-episode barrier + FedAvg aggregation
│   ├── federated_avg.py      # FedAvg: average DQN q_net / target_net weights
│   ├── schemas.py            # JobManifest + JobResult dataclasses
│   ├── s3_helpers.py         # boto3 upload/download/list/JSON helpers
│   └── sync_paths.py         # Canonical S3 keys for per-episode sync
│
├── sim_mcp/                   # ★ MCP server: Kubernetes observability tools
│   ├── server.py             # FastMCP server (runs as stdio subprocess)
│   ├── client.py             # MCPClientSync wrapper used by LLMAgent
│   └── tools/
│       ├── _k8s.py           # Shared Kubernetes client loader
│       ├── pods.py           # get_pods(namespace)
│       ├── deployments.py    # describe_deployment(namespace, deploy)
│       ├── events.py         # get_events(namespace, deploy, last_n)
│       └── logs.py           # get_pod_logs(namespace, pod_name, tail_lines)
│
├── benchmark/                 # ★ LLM benchmark harness
│   ├── run.py                # Entry point: python benchmark/run.py --provider gemini
│   ├── metrics.py            # Per-step + per-episode metric collection & reports
│   └── scenarios/
│       ├── index.json        # Scenario registry (name, trace, target, problem_type)
│       └── __init__.py       # load_scenarios() helper
│
├── env/                       # Environment (simulation wrapper)
│   ├── sim_env.py            # Create/delete SimKube simulations
│   ├── simkube_gymenv.py     # Gymnasium-compatible SimKubeEnv wrapper
│   ├── __init__.py
│   └── actions/              # Trace mutation operations
│       ├── ops.py            # bump_cpu, bump_mem, scale_replicas
│       └── trace_io.py       # Load/save MessagePack files
│
├── observe/                   # Observation & reward
│   ├── reader.py             # Extract pod states from cluster
│   ├── reward.py             # Compute reward (base / shaped / cost_aware_v2 / max_punish)
│   └── print_obs.py          # Debug helper
│
├── ops/                       # Infrastructure/lifecycle
│   ├── hooks.py              # Pre-start/post-end hooks
│   ├── preflight.py          # Cluster health checks
│   └── ec2_workers.py        # ★ Launch / terminate EC2 worker fleets (boto3)
│
├── demo/                      # Demo traces & scripts
│   ├── traces/               # 100 generated trace files (.msgpack + .json)
│   ├── generate_traces.py    # Script to make more traces
│   └── *.py                  # Conversion helpers (json2msg, normalize, etc.)
│
├── docs/                      # Runbooks and protocol docs
│   ├── WORKER_PROTOCOL.md    # S3 job protocol: manifests, results, FedAvg
│   ├── EC2_MULTI_WORKER_RUNBOOK.md  # Launch N workers from AMI
│   ├── EC2_SETUP_FROM_SCRATCH.md   # Single-instance setup guide
│   ├── WORKER_SETUP.md       # Worker-specific cheat sheet
│   └── NEXT_TASKS.md         # Project task history (all tasks completed)
│
├── checkpoints/               # ★ Auto-saved RL agent checkpoints
├── tests/                     # Unit & integration tests
├── runs/                      # Per-step output logs + EC2 worker inventory JSON
├── .env.example               # API key template — copy to .env and fill in
└── docs/archive/              # Archived design docs

Namespaces: `--ns` vs `virtual-default`

SimKube creates pods in a virtual namespace derived from the trace: virtual-<trace-namespace>. Demo traces use namespace "default", so pods appear in virtual-default.

--ns (e.g. virtual-default) is where the Simulation CR lives and where preflight checks run.
Pods appear in virtual-default (not necessarily in --ns).
To view pods: kubectl get pods -n virtual-default
make clean-ns cleans virtual-default.
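The naming rule above is simple enough to express as a helper. This is a small sketch for illustration; `virtual_namespace` and `pods_command` are hypothetical names, not functions from this repo.

```python
def virtual_namespace(trace_namespace: str) -> str:
    """Namespace where SimKube places pods for a trace recorded in `trace_namespace`."""
    if not trace_namespace:
        raise ValueError("trace namespace must be non-empty")
    return f"virtual-{trace_namespace}"


def pods_command(trace_namespace: str) -> list:
    """Build the kubectl invocation that lists the simulated pods."""
    return ["kubectl", "get", "pods", "-n", virtual_namespace(trace_namespace)]


print(virtual_namespace("default"))
print(" ".join(pods_command("default")))
```

For the demo traces, which use namespace "default", this yields the same `kubectl get pods -n virtual-default` command shown above.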

How Everything Fits Together

The RL Training Flow (Step by Step)

USER RUNS:
  python runner/train.py --trace demo/traces/trace-0001.msgpack --ns virtual-default --target 3 --agent dqn

train.py: SETUP
  - Parse args, resolve seed, create checkpoint folder
  - Redirect stdout+stderr → checkpoints/<run>/train.log
  - Initialize Agent (DQN or Epsilon-Greedy)

train.py: for each episode
  → runner/multi_step.py: run_episode()
     → runner/one_step.py: one_step() × max_steps

one_step.py:
  1. PREFLIGHT      — cluster accessible, SimKube CRDs exist, namespace clean
  2. CREATE SIM     — SimKube Simulation CR created, pods start appearing
  3. WAIT           — duration seconds for pod state to stabilise
  4. OBSERVE        — {"ready": 0, "pending": 3, "total": 3}
  5. AGENT DECISION — DQN forward pass or epsilon-greedy lookup → action index
  6. APPLY ACTION   — load trace, mutate resources, save trace
  7. COMPUTE REWARD — shaped / base / cost_aware_v2 / max_punish
  8. LOG & CLEANUP  — write step.jsonl, delete Simulation CR

train.py: AGENT LEARN + CHECKPOINT
  - Update Q-network / replay buffer
  - Save checkpoint_latest + learning curve plots
  - Save checkpoint_epN every N episodes
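The checkpoint cadence in the last step can be sketched as follows. This is an assumption-laden illustration: it treats episodes as 1-based, uses the `.pt` extension seen in the DQN examples, and `checkpoints_to_write` is a hypothetical helper rather than a function in runner/train.py.

```python
from typing import List


def checkpoints_to_write(episode: int, interval: int = 10, ext: str = ".pt") -> List[str]:
    """Return the checkpoint file names to save after a (1-based) episode."""
    # The rolling "latest" checkpoint is refreshed after every episode.
    names = [f"checkpoint_latest{ext}"]
    # A numbered snapshot is kept only on multiples of the interval.
    if interval > 0 and episode % interval == 0:
        names.append(f"checkpoint_ep{episode}{ext}")
    return names


for episode in (19, 20):
    print(episode, checkpoints_to_write(episode))
```

With the default `--checkpoint-interval` of 10, episode 20 would produce a numbered snapshot such as the `checkpoint_ep20.pt` file used in the resume example, while episode 19 only refreshes the latest checkpoint.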

The LLM Benchmark Flow

USER RUNS:
  python benchmark/run.py --provider gemini --ns virtual-default

benchmark/run.py:
  1. Load scenarios from benchmark/scenarios/index.json
  2. Start MCP server subprocess (sim_mcp/server.py via MCPClientSync)
  3. For each scenario:
       → run_episode() from runner/multi_step.py  [unchanged from RL]
          → one_step() with agent_name="llm"
             → LLMAgent.act(obs, namespace, deploy, ...)
                1. Build prompt from obs + scenario context
                2. Call LLM API with 4 MCP tools attached
                3. LLM calls tools 0–N times autonomously:
                     get_pods / describe_deployment / get_events / get_pod_logs
                4. LLM returns JSON → action index (same ACTION_SPACE)
  4. Record: steps_to_solve, reward, tool_calls, latency, solved
  5. Write benchmark/results/<timestamp>_<provider>_<model>/report.json + report.md

The Distributed / Federated Training Flow

OPERATOR (from laptop):
  export JOBS_BUCKET=diya-simarena-jobs-664926621123-us-east-2-an
  GROUP="fedrun-$(date +%Y%m%d-%H%M)"
  python protocol/dispatch.py submit --bucket "$JOBS_BUCKET" \
    --trace s3://diya-simarena-traces/demo/trace-mem-slight.msgpack \
    --agent dqn --episodes 2 --steps 3 --duration 40 --timeout 7200 \
    --federation-group "$GROUP" --federation-size 2
  # (run the above twice — one manifest per worker)

  python protocol/sync_server.py --bucket "$JOBS_BUCKET" --poll-interval 10
  # keep this running; it does FedAvg when both workers finish an episode

ON EACH EC2 WORKER (ssh in, then):
  # 0. Health check FIRST — do this every session
  kubectl get pods -A | grep kwok          # must be Running, not CrashLoopBackOff
  kubectl get nodes                         # all needed nodes must be Ready
  kubectl delete simulations.simkube.io --all -n default 2>/dev/null || true
  pkill -f "train.py" 2>/dev/null || true

  # 1. Pull latest code
  cd ~/work/sim-arena && git pull --rebase origin main
  source .venv/bin/activate

  # 2. Export credentials + env
  export AWS_ACCESS_KEY_ID=<your_key>
  export AWS_SECRET_ACCESS_KEY=<your_secret>
  export AWS_DEFAULT_REGION=us-east-2
  export SIM_ARENA_DRIVER_TIMEOUT=150
  export SIM_ARENA_DEPLOY_TIMEOUT=90
  export SIM_ARENA_NODE_DATA_DIR=/var/kind/cluster
  export PYTHONPATH=/home/ubuntu/work/sim-arena
  export JOBS_BUCKET=diya-simarena-jobs-664926621123-us-east-2-an

  # 3. Refresh K8s secret
  kubectl create secret generic simkube -n simkube \
    --from-literal=AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    --from-literal=AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
    --from-literal=AWS_DEFAULT_REGION=us-east-2 \
    --dry-run=client -o yaml | kubectl apply -f -

  # 4. Run worker
  python protocol/worker.py --bucket "$JOBS_BUCKET" --run-once --log-level INFO

Detailed Component Breakdown

RL Components

`runner/train.py` — Main RL Entry Point

Orchestrates the full training run across multiple episodes.

Flag	Default	Description
`--trace`	required	Initial trace file or directory
`--ns`	required	Kubernetes namespace
`--target`	required	Target pod count
`--agent`	`greedy`	`greedy`, `dqn`, or `random`
`--episodes`	200	Total training episodes
`--steps`	200	Max steps per episode
`--duration`	40	Seconds per sim step
`--reward`	`shaped`	`base`, `shaped`, `cost_aware_v2`, or `max_punish`
`--Naction`	4	Action space size
`--checkpoint-interval`	10	Save every N episodes
`--load`	None	Resume from checkpoint
`--save`	None	Extra final save path
`--seed`	random	Base random seed
`--lr`	0.001	DQN learning rate
`--gamma`	0.97	DQN discount factor
`--step-penalty`	0	Per-step penalty for `cost_aware_v2`

`agent/agent.py` — Agent Factory

Wraps all agent types behind a single Agent interface.

agent = Agent(AgentType.DQN, state_dim=5, n_actions=7, ...)
action_idx = agent.act(obs_vector)
agent.update(state, action, reward, next_state, done)
agent.save("checkpoint.pt")
agent.load("checkpoint.pt")

AgentType enum values: DQN, EPSILON_GREEDY, RANDOM, LLM

`observe/reward.py` — Reward Functions

base — Binary (1 if ready==target and pending==0, else 0)
shaped — Continuous (−1 to 1) with distance-based penalties
cost_aware_v2 — All negative except 0 at goal; health + cost shaping; penalises blocked actions
max_punish — Base + penalties for exceeding CPU/memory/replica limits
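To make the difference between the binary and continuous variants concrete, here is a small sketch. `base_reward` follows the definition above directly. `shaped_reward_sketch` is a hypothetical shaping formula chosen only to stay within the stated range of −1 to 1; the actual formula in observe/reward.py may differ.

```python
def base_reward(ready: int, pending: int, target: int) -> float:
    """Binary reward: 1 only when every target pod is ready and none are pending."""
    return 1.0 if ready == target and pending == 0 else 0.0


def shaped_reward_sketch(ready: int, pending: int, target: int) -> float:
    """Hypothetical distance-based shaping, clamped to [-1, 1]."""
    if target <= 0:
        raise ValueError("target must be positive")
    if ready == target and pending == 0:
        return 1.0
    # Distance grows with missing or surplus ready pods and with pending pods.
    distance = abs(target - ready) + pending
    return max(-1.0, -distance / target)


for ready, pending in [(0, 3), (2, 1), (3, 0)]:
    print(ready, pending, base_reward(ready, pending, 3), shaped_reward_sketch(ready, pending, 3))
```

The binary reward gives the agent no signal until the goal is reached, whereas a shaped reward distinguishes "two of three pods ready" from "none ready", which usually makes early learning easier.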

`runner/safeguards.py` — Resource Limits

Blocks actions that would exceed safe limits: CPU max 16000m, memory max 32Gi, replicas max 100. Prevents the agent from allocating absurd resources.
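A limit check of this kind might look like the sketch below. The three maxima come from the paragraph above; the parsing helpers and the function name `within_limits` are assumptions made for illustration, and only the `m`, `Mi`, and `Gi` suffixes are handled.

```python
MAX_CPU_MILLICORES = 16_000   # 16000m
MAX_MEMORY_MIB = 32 * 1024    # 32Gi expressed in Mi
MAX_REPLICAS = 100


def parse_cpu_millicores(value: str) -> int:
    """Convert '500m' or '2' (whole cores) to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_mib(value: str) -> int:
    """Convert '256Mi' or '2Gi' to MiB; other suffixes are out of scope here."""
    if value.endswith("Gi"):
        return int(float(value[:-2]) * 1024)
    if value.endswith("Mi"):
        return int(value[:-2])
    raise ValueError(f"unsupported memory quantity: {value}")


def within_limits(cpu: str, memory: str, replicas: int) -> bool:
    """True when the proposed resources stay at or below every safeguard."""
    return (
        parse_cpu_millicores(cpu) <= MAX_CPU_MILLICORES
        and parse_memory_mib(memory) <= MAX_MEMORY_MIB
        and replicas <= MAX_REPLICAS
    )


print(within_limits("16000m", "32Gi", 100))
print(within_limits("16500m", "1Gi", 3))
```

An action whose result fails this check would be blocked before the trace is mutated.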

LLM Benchmark Components

`sim_mcp/server.py` — MCP Server

FastMCP server exposing four Kubernetes observability tools. Runs as a stdio subprocess started automatically by MCPClientSync.

`sim_mcp/tools/` — K8s Tools

Tool	What it returns	When to use
`get_pods(namespace)`	Pod phase, container states, restart counts	First call — understand why pods are stuck
`describe_deployment(namespace, deploy)`	CPU/mem requests, desired vs ready replicas	Before deciding bump/reduce/scale
`get_events(namespace, deploy, last_n)`	Warning + Normal events (OOMKilled, FailedScheduling)	Diagnose root cause
`get_pod_logs(namespace, pod_name, tail_lines)`	Last N container log lines	When events don't explain the failure

`agent/llm_agent.py` — LLM Agent

Provider-agnostic. Builds prompts, dispatches to the provider, records per-step metrics. Compatible with the same Agent interface as DQN/EpsGreedy.

`agent/providers/` — LLM Providers

Provider	Class	Default model	API key env var
`gemini`	`GeminiProvider`	`gemini-2.5-flash-lite`	`GEMINI_API_KEY`
`anthropic`	`AnthropicProvider`	`claude-sonnet-4-20250514`	`ANTHROPIC_API_KEY`

`benchmark/run.py` — Benchmark Entry Point

Flag	Default	Description
`--provider`	`gemini`	`gemini` or `anthropic`
`--model`	provider default	Override model string
`--ns`	`virtual-default`	Kubernetes namespace
`--steps`	10	Max steps per episode
`--duration`	60	Seconds per sim step
`--filter-type`	None	Filter by problem_type
`--scenario`	None	Run a single named scenario
`--list-scenarios`	—	Print scenarios and exit
`--max-tool-rounds`	8	Max MCP tool calls per step
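The effect of `--max-tool-rounds` can be pictured as a bounded loop around the model. The sketch below is not the provider code in agent/providers/: the reply format (`{"tool": ...}` or `{"final": ...}`), the `run_tool_loop` name, and the choice to fall back to `noop` when the budget runs out are all assumptions, and the model and tool are fakes so the example runs offline.

```python
import json
from typing import Any, Callable, Dict, List

# Assumed fallback when the tool budget is exhausted: action 0 (noop).
NOOP_JSON = json.dumps({"action_index": 0, "reasoning": "tool budget exhausted"})


def run_tool_loop(
    ask_model: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    max_tool_rounds: int = 8,
) -> str:
    """Let the model call tools until it returns a final answer or the budget ends."""
    transcript: List[Dict[str, Any]] = []
    for _ in range(max_tool_rounds):
        reply = ask_model(transcript)
        if "final" in reply:
            return reply["final"]
        tool = tools[reply["tool"]]
        result = tool(**reply.get("args", {}))
        transcript.append({"tool": reply["tool"], "result": result})
    return NOOP_JSON


def fake_get_pods(namespace: str) -> List[Dict[str, str]]:
    """Offline stand-in for the get_pods MCP tool."""
    return [{"name": "web-0", "phase": "Pending"}]


def fake_model(transcript: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Ask for pod state once, then commit to bump_cpu_small (index 1)."""
    if not transcript:
        return {"tool": "get_pods", "args": {"namespace": "virtual-default"}}
    return {"final": json.dumps({"action_index": 1, "reasoning": "pods pending"})}


answer = run_tool_loop(fake_model, {"get_pods": fake_get_pods})
print(answer)
```

Capping the rounds bounds both latency and API cost per step, at the price of occasionally forcing a decision before the model has finished investigating.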

The Agent Loop Flow

Single Step (shared by RL and LLM)

def one_step(trace_path, namespace, deploy, target, duration, agent, reward_name, seed):
    run_hooks("pre_start", namespace)
    sim_uid = create_simulation(name, trace_path, duration, namespace)
    wait_fixed(duration)
    obs = observe(namespace, deploy)          # {"ready": 0, "pending": 3, ...}

    # RL agents:
    action_idx = agent.act(dqn_state_vector)

    # LLM agents:
    action_idx = agent.act(obs, namespace, deploy, step_idx, max_steps)

    out_trace, info = apply_action(trace_path, action_idx, deploy, output_path)
    reward = compute_reward(obs, target, reward_name)
    agent.update(...)                         # no-op for LLM
    write_step_record({...})
    delete_simulation(name, namespace)
    return {"status": 0, "record": {...}}

Key Concepts

Action Space

Index	Action	Effect
0	`noop`	Do nothing
1	`bump_cpu_small`	+500m CPU request
2	`bump_mem_small`	+256Mi memory request
3	`scale_up_replicas`	+1 replica
4	`reduce_cpu_small`	−500m CPU request
5	`reduce_mem_small`	−256Mi memory request
6	`scale_down_replicas`	−1 replica
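The table maps naturally onto a lookup structure. The sketch below applies an action to an in-memory resource dict to show the arithmetic; the real implementation mutates a trace file via env/actions/ops.py, and the names `ACTIONS` and `apply_to_resources` as well as the dict keys are hypothetical.

```python
from typing import Dict, Optional, Tuple

# index -> (action name, affected field or None, delta), mirroring the table above.
ACTIONS: Dict[int, Tuple[str, Optional[str], int]] = {
    0: ("noop", None, 0),
    1: ("bump_cpu_small", "cpu_m", 500),
    2: ("bump_mem_small", "mem_mi", 256),
    3: ("scale_up_replicas", "replicas", 1),
    4: ("reduce_cpu_small", "cpu_m", -500),
    5: ("reduce_mem_small", "mem_mi", -256),
    6: ("scale_down_replicas", "replicas", -1),
}


def apply_to_resources(resources: Dict[str, int], action_idx: int) -> Dict[str, int]:
    """Return a copy of `resources` with the chosen action's delta applied."""
    _name, field, delta = ACTIONS[action_idx]
    updated = dict(resources)
    if field is not None:
        updated[field] += delta
    return updated


start = {"cpu_m": 500, "mem_mi": 256, "replicas": 2}
print(apply_to_resources(start, 1))
print(apply_to_resources(start, 6))
```

Because RL and LLM agents both emit an index into the same table, their results can be compared directly.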

Traces

MessagePack files containing recorded Kubernetes events. demo/traces/ contains 100 pre-generated traces. demo/generate_traces.py creates more. Remote traces live in s3://diya-simarena-traces/.

Agents

Agent	Type	Checkpoint	Best for
`dqn`	Deep Q-Network	`.pt`	Full RL training
`greedy`	Epsilon-Greedy	`.json`	Fast prototyping
`random`	Random policy	`.json`	Baseline
`llm`	LLM + MCP tools	`.json` (metadata)	Benchmark evaluation
`bump_cpu` etc.	Hand-coded	none	Debugging

How to Use

Setup

# Install dependencies
pip install -r requirements.txt

# Set up API keys (for LLM benchmarking)
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY and/or ANTHROPIC_API_KEY

Train a DQN Agent

# Clean up any ghost simulations first (do this every session)
pkill -f "train.py.*--ns virtual-default"
kubectl delete simulations.simkube.io --all -n virtual-default

# Start training
nohup python runner/train.py \
  --trace demo/traces/trace-0001.msgpack \
  --ns virtual-default \
  --deploy web \
  --target 3 \
  --agent dqn \
  --episodes 50 &

# Monitor logs
tail -f checkpoints/dqn_<timestamp>/train.log

Resume from a Checkpoint

nohup python runner/train.py \
  --trace demo/traces/trace-0001.msgpack \
  --ns virtual-default \
  --target 3 \
  --agent dqn \
  --load checkpoints/dqn_20260218_22/checkpoint_ep20.pt \
  --episodes 50 &

Run a Single Step (Debug)

python runner/one_step.py \
  --trace demo/traces/trace-0001.msgpack \
  --ns virtual-default \
  --deploy web \
  --target 3 \
  --duration 60 \
  --policy bump_cpu

cat runs/step.jsonl

LLM Benchmarking

Quick Start

# List available scenarios
python benchmark/run.py --list-scenarios

# Run all scenarios with Gemini (default: gemini-2.5-flash-lite)
python benchmark/run.py --provider gemini --ns virtual-default

# Run with a specific model
python benchmark/run.py --provider gemini --model gemini-2.5-flash --ns virtual-default

# Run with Anthropic Claude
python benchmark/run.py --provider anthropic --ns virtual-default

# Single scenario for quick testing
python benchmark/run.py --provider gemini --scenario cpu-insufficient-small --steps 5

Recommended Models

Model	RPM (requests/min)	RPD (requests/day)	Use case
`gemini-2.5-flash-lite`	4K	Unlimited	Default — fast testing, no quota issues
`gemini-2.5-flash`	1K	10K	Higher reasoning quality
`claude-sonnet-4-20250514`	—	—	Anthropic benchmark

Benchmark Results

Results are written to benchmark/results/<timestamp>_<provider>_<model>/:

report.json — full per-step and per-episode data
report.md — human-readable summary table
command.txt — exact invocation for reproducibility
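As a rough picture of how per-episode records could roll up into a summary, consider the sketch below. The field names (`steps_to_solve`, `tool_calls`, `solved`) are taken from the metrics listed in the benchmark flow, but the record layout and the `summarize` function are assumptions, not the schema produced by benchmark/metrics.py.

```python
from typing import Any, Dict, List


def summarize(episodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-episode records into a few headline numbers."""
    solved = [episode for episode in episodes if episode["solved"]]
    return {
        "episodes": len(episodes),
        "solve_rate": len(solved) / len(episodes) if episodes else 0.0,
        # Mean is taken over solved episodes only; None when nothing was solved.
        "mean_steps_to_solve": (
            sum(episode["steps_to_solve"] for episode in solved) / len(solved)
            if solved
            else None
        ),
        "total_tool_calls": sum(episode["tool_calls"] for episode in episodes),
    }


episodes = [
    {"solved": True, "steps_to_solve": 2, "tool_calls": 3},
    {"solved": False, "steps_to_solve": None, "tool_calls": 5},
]
summary = summarize(episodes)
print(summary)
```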

Scenario Problem Types

Type	Description
`cpu_insufficient`	CPU requests too low; pods stuck Pending
`mem_insufficient`	Memory requests too low; pods OOMKilled
`replica_deficit`	Fewer replicas than target
`combined`	Both resource and replica problems
`over_allocation`	Resources far exceed actual usage

How the LLM Agent Works

The LLM receives a prompt with the current observation (ready/pending/total pods, target) and has access to four MCP tools it can call in any order before committing to an action. It must respond with a JSON object:

{"action_index": 3, "reasoning": "Only 2 of 5 target replicas are running; scale up."}

The response is parsed by agent/action_parser.py with three fallback strategies (full JSON → extract JSON block → bare integer) so a malformed response never crashes the loop.
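The three-stage fallback can be sketched as below. This is an illustrative reimplementation, not the code in agent/action_parser.py: the function name `parse_action`, the range check against seven actions, and the choice of `noop` (index 0) as the final default are assumptions.

```python
import json
import re
from typing import Any, List


def parse_action(text: str, n_actions: int = 7, default: int = 0) -> int:
    """Extract an action index from an LLM reply without ever raising."""
    candidates: List[Any] = []

    # Strategy 1: the whole reply is valid JSON.
    try:
        candidates.append(json.loads(text))
    except ValueError:
        pass

    # Strategy 2: a JSON object embedded in surrounding prose.
    block = re.search(r"\{.*?\}", text, re.DOTALL)
    if block:
        try:
            candidates.append(json.loads(block.group(0)))
        except ValueError:
            pass

    for candidate in candidates:
        if isinstance(candidate, dict):
            index = candidate.get("action_index")
            if isinstance(index, int) and not isinstance(index, bool) and 0 <= index < n_actions:
                return index

    # Strategy 3: the first bare integer anywhere in the reply.
    number = re.search(r"-?\d+", text)
    if number and 0 <= int(number.group(0)) < n_actions:
        return int(number.group(0))

    return default


print(parse_action('{"action_index": 3, "reasoning": "scale up"}'))
print(parse_action('Sure! {"action_index": 1, "reasoning": "cpu"} Hope that helps.'))
print(parse_action("I would pick 2"))
print(parse_action("no idea"))
```

Falling back to a harmless default keeps a long benchmark run alive when one reply is malformed; the malformed reply still shows up in the metrics as a wasted step.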

Distributed Training (S3 workers and EC2)

For scaling beyond a single machine, jobs are defined as manifests in S3; EC2 workers poll the bucket, run runner/train.py, and upload checkpoints, logs, and result.json. A sync_server.py process handles per-episode weight coordination and optional FedAvg across workers.
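The per-episode barrier amounts to checking whether every worker in a federation group has uploaded its checkpoint for the current episode. The sketch below works on a plain list of object keys so it runs without AWS access; the key layout (`sync/<group>/ep<NNNN>/<worker>.pt`) and both function names are hypothetical, and the canonical keys are defined in protocol/sync_paths.py.

```python
from typing import Iterable


def episode_prefix(group: str, episode: int) -> str:
    """Hypothetical S3 prefix holding one checkpoint per worker for an episode."""
    return f"sync/{group}/ep{episode:04d}/"


def barrier_ready(existing_keys: Iterable[str], group: str, episode: int, federation_size: int) -> bool:
    """True once `federation_size` distinct workers have uploaded for this episode."""
    prefix = episode_prefix(group, episode)
    workers = {
        key[len(prefix):]
        for key in existing_keys
        if key.startswith(prefix) and key.endswith(".pt")
    }
    return len(workers) >= federation_size


keys = [
    "sync/fedrun-a/ep0001/worker-1.pt",
    "sync/fedrun-a/ep0001/worker-2.pt",
    "sync/fedrun-a/ep0002/worker-1.pt",
]
print(barrier_ready(keys, "fedrun-a", 1, federation_size=2))
print(barrier_ready(keys, "fedrun-a", 2, federation_size=2))
```

A polling process would evaluate a check like this every `--poll-interval` seconds and run the averaging step only when it returns true.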

Topic	Location
Protocol (manifests, results, federation)	`docs/WORKER_PROTOCOL.md`
Launch N EC2 workers + inventory JSON	`docs/EC2_MULTI_WORKER_RUNBOOK.md`
Single-instance AMI / S3 secret setup	`docs/EC2_SETUP_FROM_SCRATCH.md`
Worker cheat sheet (env vars, traces)	`docs/WORKER_SETUP.md`
Task history (all tasks completed)	`docs/NEXT_TASKS.md`

Note: TRAINING_SERVER_README.md describes a Flask "central server" design that is not yet implemented as training_server.py in this repo — treat it as a design spec. runner/dist_run.py is a stub; job_type=experience_collection is not end-to-end until that runner is implemented.

Before every worker session, always run the cluster health check first to avoid timeouts from ghost simulations or KWOK crashes:

kubectl get pods -A | grep kwok                                      # must be Running
kubectl get nodes                                                      # all needed nodes Ready
kubectl delete simulations.simkube.io --all -n default 2>/dev/null || true
pkill -f "train.py" 2>/dev/null || true

For Future Development

Adding a New RL Agent

1. Implement your agent class in agent/
2. Add a new AgentType enum value in agent/agent.py
3. Wire up initialization in runner/train.py's argument parsing block

Adding a New LLM Provider

1. Subclass LLMProvider in agent/providers/
2. Implement run_step() and model_name
3. Register in agent/providers/__init__.py's _PROVIDER_DEFAULTS dict
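A provider skeleton might look like the sketch below. The abstract base here is a local stand-in so the example is self-contained. The real `LLMProvider` lives in agent/providers/base.py, and its `run_step()` signature is not shown in this README, so the `(system_prompt, user_prompt)` parameters are an assumption. `EchoProvider` is a hypothetical stub that always answers `noop`.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Local stand-in for agent/providers/base.py (signatures are assumed)."""

    @property
    @abstractmethod
    def model_name(self) -> str:
        """Model identifier recorded in benchmark reports."""

    @abstractmethod
    def run_step(self, system_prompt: str, user_prompt: str) -> str:
        """Return the model's raw reply for one decision step."""


class EchoProvider(LLMProvider):
    """Hypothetical offline provider, handy for testing the harness without API keys."""

    def __init__(self, model: str = "echo-1") -> None:
        self._model = model

    @property
    def model_name(self) -> str:
        return self._model

    def run_step(self, system_prompt: str, user_prompt: str) -> str:
        return '{"action_index": 0, "reasoning": "stub provider always picks noop"}'


provider = EchoProvider()
print(provider.model_name)
print(provider.run_step("You fix Kubernetes workloads.", "ready=0 pending=3 target=3"))
```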

Adding a New MCP Tool

1. Add a function in sim_mcp/tools/
2. Register it with @mcp.tool() in sim_mcp/server.py
3. The tool is automatically available to all LLM providers via MCPClientSync.anthropic_tools

Enhancing Observations (RL)

Add features in observe/reader.py and update state_dim in runner/train.py and the DQN network accordingly.

Adding Benchmark Scenarios

Add entries to benchmark/scenarios/index.json pointing at any trace file in demo/traces/.
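A registry entry and the two selection flags can be sketched as follows. The four fields (name, trace, target, problem_type) are the ones the directory listing attributes to index.json; treating the file as a JSON list, the second entry, and the `select_scenarios` helper are assumptions for illustration.

```python
import json
from typing import Any, Dict, List, Optional

# Illustrative registry content; the second entry is made up for the example.
INDEX_JSON = """
[
  {"name": "cpu-insufficient-small", "trace": "demo/traces/trace-0001.msgpack",
   "target": 3, "problem_type": "cpu_insufficient"},
  {"name": "replica-deficit-example", "trace": "demo/traces/trace-0002.msgpack",
   "target": 5, "problem_type": "replica_deficit"}
]
"""


def select_scenarios(
    scenarios: List[Dict[str, Any]],
    problem_type: Optional[str] = None,
    name: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Mimic --filter-type and --scenario: keep entries matching every given filter."""
    selected = scenarios
    if problem_type is not None:
        selected = [s for s in selected if s["problem_type"] == problem_type]
    if name is not None:
        selected = [s for s in selected if s["name"] == name]
    return selected


scenarios = json.loads(INDEX_JSON)
print([s["name"] for s in select_scenarios(scenarios, problem_type="cpu_insufficient")])
```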

Implementing `runner/dist_run.py` (Experience Collection)

The stub currently exits immediately. Implement run_experience_collection(manifest) to run episodes with job_type=experience_collection, save transitions to S3, and write result.json with transitions_s3_uri.

Quick Reference

"Where is X happening?"

What	Where
RL training loop	`runner/train.py`
Episode runner	`runner/multi_step.py`
Single step (RL + LLM)	`runner/one_step.py`
DQN agent	`agent/dqn.py`
Epsilon-Greedy agent	`agent/eps_greedy.py`
LLM agent	`agent/llm_agent.py`
Gemini provider	`agent/providers/gemini_provider.py`
Anthropic provider	`agent/providers/anthropic_provider.py`
MCP server	`sim_mcp/server.py`
MCP client	`sim_mcp/client.py`
K8s tools	`sim_mcp/tools/`
Benchmark entry point	`benchmark/run.py`
Benchmark scenarios	`benchmark/scenarios/index.json`
Benchmark metrics	`benchmark/metrics.py`
Trace modification	`env/actions/ops.py`
Observation extraction	`observe/reader.py`
Reward calculation	`observe/reward.py`
Simulation management	`env/sim_env.py`
Resource safeguards	`runner/safeguards.py`
Cluster health checks	`ops/preflight.py`
S3 job dispatch (submit/list)	`protocol/dispatch.py`
EC2 worker polling loop	`protocol/worker.py`
Per-episode S3 barrier + FedAvg	`protocol/sync_server.py`, `protocol/federated_avg.py`
EC2 fleet launch / terminate	`ops/ec2_workers.py`

Data Flow

                      RL Agent
                     ↗
Trace → Simulation → Cluster → Observation → Agent → Action → Modified Trace
                                                ↘         ↑
                                             LLM Agent     |
                                            (MCP tools) ───┘
                                                  ↓
                                            Reward + Metrics

Distributed:
  S3 manifest → worker.py → train.py → S3 checkpoint
                                 ↑            ↓
                          sync_server.py ← FedAvg ← all workers' checkpoints
