
Add PPO RL controller, scenario library, and evaluation pipeline #12

Merged
jaywonchung merged 14 commits into master from feat/ppo-controller on May 11, 2026

Conversation

@ZhiruiLiang (Collaborator)

Library code (openg2g/)

  • New module openg2g/controller/ppo.py — PPOBatchSizeController (single-site) and SharedPPOBatchSizeController (multi-site) that wrap a trained stable-baselines3 PPO policy as a Controller. Sits alongside the existing RuleBasedBatchSizeController and OFOBatchSizeController.
  • New subpackage openg2g/rl/ with env.py — a Gymnasium environment exposing the simulator as an RL training target. Provides structured observations (per-zone / per-bus / system-summary modes), composable rewards (voltage / throughput / latency / switching), and scenario sampling from a pre-built library. This is what train_ppo.py learns against.
  • Modified openg2g/controller/rule_based.py — tightened the default deadband for finer voltage tracking and added a zone_buses argument for zone-local observation (used in multi-DC ieee123 to give each site credit only for its own zone).
  • Modified openg2g/grid/opendss.py — single tiny change: downgrade a multi-bank RegControl message from info to debug to avoid log spam on ieee34.
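The composable-reward idea in openg2g/rl/env.py can be illustrated with a minimal sketch: each reward term maps a per-tick state to a scalar, and the environment sums the weighted terms. All names here (TickState, compose_reward, the term functions) and the 0.95 to 1.05 pu voltage band are hypothetical assumptions for illustration, not the module's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Sequence, Tuple

# Hypothetical per-tick snapshot; the real BatchSizeEnv state layout may differ.
@dataclass
class TickState:
    bus_voltages_pu: Sequence[float]  # per-unit voltages at the observed buses
    throughput: float                 # normalized throughput served this tick
    latency_excess: float             # normalized latency above target (>= 0)
    batch_changes: int                # number of batch-size switches this tick

# Assumed voltage band in per unit (ANSI C84.1 Range A style limits).
V_MIN, V_MAX = 0.95, 1.05

def voltage_term(s: TickState) -> float:
    # Penalize how far each bus lies outside [V_MIN, V_MAX]; zero inside the band.
    return -sum(max(0.0, V_MIN - v) + max(0.0, v - V_MAX) for v in s.bus_voltages_pu)

def throughput_term(s: TickState) -> float:
    return s.throughput

def latency_term(s: TickState) -> float:
    return -s.latency_excess

def switching_term(s: TickState) -> float:
    # Discourage rapid batch-size toggling.
    return -float(s.batch_changes)

TERMS: Dict[str, Callable[[TickState], float]] = {
    "voltage": voltage_term,
    "throughput": throughput_term,
    "latency": latency_term,
    "switching": switching_term,
}

def compose_reward(
    weights: Mapping[str, float],
) -> Callable[[TickState], Tuple[float, Dict[str, float]]]:
    """Build a reward function returning the weighted total and a per-term breakdown."""
    unknown = set(weights) - set(TERMS)
    if unknown:
        raise ValueError(f"unknown reward terms: {sorted(unknown)}")

    def reward(s: TickState) -> Tuple[float, Dict[str, float]]:
        parts = {name: w * TERMS[name](s) for name, w in weights.items()}
        return sum(parts.values()), parts

    return reward
```

Returning the breakdown alongside the total makes it easy to log each term separately (for example to TensorBoard) and to switch a term off by omitting its weight.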

Examples (examples/offline/)

  • New examples/offline/train_ppo.py, the PPO training entrypoint. It wraps BatchSizeEnv in VecNormalize, runs stable-baselines3 PPO, and saves the model, VecNormalize stats, per-checkpoint snapshots, and TensorBoard logs.
  • New examples/offline/build_scenario_library.py — generates randomized PV / TVL / inference-ramp scenarios, screens them by running baseline + OFO and rejecting cases with no learning signal, writes a library.pkl for the trainer.
  • New examples/offline/evaluate_controllers.py — held-out scenario eval that runs baseline / OFO / rule-based / PPO on the same scenarios and produces side-by-side voltage and throughput metrics (CSV + plots).
  • Modified examples/offline/systems.py to add the PPO-side infrastructure on top of master's feeder constants: a DCSite dataclass that bundles deployments with ReplicaSchedules, a hardcoded model spec list (ALL_MODEL_SPECS), randomize_scenario / materialize_scenario helpers, ScenarioOpenDSSGrid for randomized PV/TVL, and a with_ramp convenience helper for experiment factories.
  • Modified examples/offline/sweep_dc_locations.py — extends the existing 1-D and 2-D bus sweeps with a zone-constrained 3-phase sweep for ieee123 (Phase 1 screening per zone, Phase 2 combination, optional Phase 3 refinement). Also migrated to master's new grid.attach_dc(...) and Coordinator(datacenters=[...]) APIs.
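The screening step in build_scenario_library.py can be sketched as follows: sample randomized scenarios, run a baseline and OFO on each, and keep only those where control makes a measurable difference. The Scenario fields, the scalar violation metric returned by each runner, and the min_gap rejection rule are assumptions for illustration; the script's actual criterion may differ.

```python
import pickle
import random
from dataclasses import asdict, dataclass
from typing import Callable, List

@dataclass
class Scenario:
    # Hypothetical stand-in for one randomized PV / TVL / inference-ramp scenario.
    seed: int
    pv_scale: float
    tvl_scale: float
    ramp_peak: float

def sample_scenario(rng: random.Random, seed: int) -> Scenario:
    return Scenario(
        seed=seed,
        pv_scale=rng.uniform(0.0, 1.5),
        tvl_scale=rng.uniform(0.0, 1.5),
        ramp_peak=rng.uniform(0.5, 1.0),
    )

def has_learning_signal(baseline_viol: float, ofo_viol: float, min_gap: float = 1e-3) -> bool:
    # Assumed rule: the uncontrolled baseline must violate voltage limits and OFO
    # must improve on it by at least min_gap; otherwise the scenario teaches nothing.
    return baseline_viol > 0.0 and (baseline_viol - ofo_viol) >= min_gap

def build_library(
    n: int,
    run_baseline: Callable[[Scenario], float],
    run_ofo: Callable[[Scenario], float],
    seed: int = 0,
) -> List[Scenario]:
    rng = random.Random(seed)
    kept: List[Scenario] = []
    for i in range(n):
        sc = sample_scenario(rng, seed=i)
        # Each runner returns a scalar voltage-violation metric for the scenario.
        if has_learning_signal(run_baseline(sc), run_ofo(sc)):
            kept.append(sc)
    return kept

def serialize_library(scenarios: List[Scenario]) -> bytes:
    # Store plain dicts so loading the pickle does not depend on this module's import path.
    return pickle.dumps([asdict(s) for s in scenarios])

def write_library(path: str, scenarios: List[Scenario]) -> None:
    with open(path, "wb") as f:
        f.write(serialize_library(scenarios))
```

Passing the runners in as callables keeps the screening logic independent of the simulator, so the same filter can be reused with any controller pair.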

Documentation (docs/)

  • New docs/examples/rl-controller.md — end-to-end walkthrough of the 3-stage PPO workflow: build scenario library → train PPO → evaluate.
  • Modified docs/examples/voltage-regulation-strategies.md — adds PPO as a fourth control strategy alongside baseline / rule-based / OFO, with a cross-link to the new RL doc.
  • Modified _zensical.toml — nav entry for the new RL example doc.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive Reinforcement Learning (RL) framework for voltage regulation using Proximal Policy Optimization (PPO), including training environments, controllers, and evaluation scripts. Key additions include a Gymnasium-compatible environment, PPO-based controllers, and utilities for generating scenario libraries and benchmarking against model-based and rule-based strategies. Feedback focuses on significant code duplication between the new scripts and core modules, particularly regarding profile generation and grid definitions. Reviewers also highlighted design concerns such as fragile file-searching logic for normalization stats, cross-script dependencies, and a functional limitation in the shared multi-site controller regarding command routing. Refactoring is recommended to centralize shared logic and improve the modularity of the scenario handling and simulation setup.
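To make the command-routing concern concrete: a shared multi-site policy emits one flat action vector, and each slice has to reach the datacenter it was meant for rather than being broadcast to every site. The sketch below shows explicit routing by site ID; route_actions and the per-site model layout are hypothetical and do not reflect how the PR resolved the comment.

```python
from typing import Dict, List, Mapping, Sequence

def route_actions(
    flat_action: Sequence[int],
    site_order: Sequence[str],
    models_per_site: Mapping[str, List[str]],
) -> Dict[str, Dict[str, int]]:
    """Split a flat policy action into per-site, per-model batch-size commands.

    The action vector is assumed to be laid out site by site in `site_order`,
    with one entry per model served at that site.
    """
    expected = sum(len(models_per_site[site]) for site in site_order)
    if len(flat_action) != expected:
        raise ValueError(f"expected {expected} action entries, got {len(flat_action)}")

    commands: Dict[str, Dict[str, int]] = {}
    idx = 0
    for site in site_order:
        # Each site receives only its own slice of the action vector.
        commands[site] = {}
        for model in models_per_site[site]:
            commands[site][model] = int(flat_action[idx])
            idx += 1
    return commands
```

Fixing the layout in one place (here, site_order) keeps the policy's output indices and the controller's dispatch in agreement even when sites serve different model sets.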

Comment threads:
  • examples/offline/sweep_dc_locations.py (outdated)
  • openg2g/controller/ppo.py (outdated)
  • examples/rl_controller/build_library.py (3 threads)
  • examples/offline/evaluate_controllers.py (outdated)
  • openg2g/rl/env.py (outdated)
@jaywonchung (Member)

@gemini review

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive reinforcement learning (RL) workflow for voltage regulation using Proximal Policy Optimization (PPO). Key additions include a Gymnasium environment, training and evaluation scripts, scenario library generation, and detailed documentation. To support this, the Coordinator was refactored to allow per-tick stepping and external command dispatch, and the rule-based controller was enhanced with zone-local observation capabilities. Feedback is provided regarding the .gitignore configuration, specifically suggesting the removal of leading slashes from Slurm and launch script patterns to ensure they are ignored across all subdirectories.
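Per-tick stepping with external command dispatch is what lets a Gymnasium step() call drive the simulation one tick at a time. A toy sketch of the pattern follows; SteppableCoordinator, SetBatchSize, and their methods are hypothetical and are not the openg2g Coordinator API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SetBatchSize:
    # Hypothetical command an external agent can dispatch.
    site: str
    model: str
    batch_size: int

@dataclass
class SteppableCoordinator:
    """Toy coordinator that can be advanced one tick at a time (hypothetical API)."""
    dt_s: float
    n_ticks: int
    tick: int = 0
    batch_sizes: Dict[Tuple[str, str], int] = field(default_factory=dict)
    _pending: List[SetBatchSize] = field(default_factory=list)

    def dispatch(self, cmd: SetBatchSize) -> None:
        # Commands from an external agent are queued and applied at the next tick.
        self._pending.append(cmd)

    def step(self) -> Dict[str, float]:
        # Apply queued commands, then advance simulated time by one tick.
        for cmd in self._pending:
            self.batch_sizes[(cmd.site, cmd.model)] = cmd.batch_size
        self._pending.clear()
        self.tick += 1
        return {"time_s": self.tick * self.dt_s}

    @property
    def done(self) -> bool:
        return self.tick >= self.n_ticks

    def run(self) -> None:
        # A closed-loop run is just repeated stepping until the horizon ends.
        while not self.done:
            self.step()
```

In this shape the run() loop becomes a special case of repeated step() calls, so offline controllers and an RL environment can share one simulation path.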

Comment thread: .gitignore (outdated)
@jaywonchung merged commit 2008715 into master on May 11, 2026
5 checks passed
@jaywonchung deleted the feat/ppo-controller branch on May 11, 2026 at 02:43

Labels

None yet

Projects

None yet


2 participants