[New Env] Cloud SRE & FinOps Environment #506
Hi @naveenkumar982! Thank you for your pull request and welcome to our community.

Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Greptile Summary
This PR adds a new Cloud SRE & FinOps environment with three difficulty-tiered tasks (phantom volume cleanup, latency spike remediation, noisy neighbor incident), seeded procedural generation, chaos injection, and deterministic grading, all following the expected OpenEnv container+client layout.
Confidence Score: 4/5
Safe to merge after fixing the _handle_scale reward bug. The alignment question on dataclasses vs Pydantic should be confirmed but is unlikely to break the current wire format.
One genuine P1 logic defect (a dead-code reward branch) prevents a score of 5. All other findings are P2 style or alignment questions that do not block correctness of the primary task flows.
Files needing attention: envs/cloud_sre_env/server/cloud_sre_environment.py (lines 720-732) and envs/cloud_sre_env/models.py (dataclass vs Pydantic alignment).
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Agent sends action] --> B["step()"]
B --> C{cmd?}
C -- terminate --> D[_handle_terminate]
C -- scale --> E[_handle_scale]
C -- reboot --> F[_handle_reboot]
C -- inspect --> G[_handle_inspect]
C -- wait --> H[step_reward = -0.01]
D & E & F & G & H --> I[_action_history.append]
I --> J{chaos_enabled?}
J -- yes --> K[_maybe_inject_chaos]
J -- no --> L[_recalculate_state]
K --> L
L --> M{current_step >= MAX_STEPS?}
M -- yes --> N[done = True]
M -- no --> O[_build_observation]
N --> O
O --> P[Return SREObservation]
P --> Q["grade(), called by orchestrator"]
Q --> R{task?}
R -- Task1 --> S[PhantomVolumeCleanup grader]
R -- Task2 --> T[LatencySpikeRemediation grader]
R -- Task3 --> U[NoisyNeighborIncident grader]
S & T & U --> V[Return score, breakdown]
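
The step path in the flowchart can be read as the following minimal Python sketch. It is not the PR's implementation: the method names mirror the flowchart nodes, but the handler bodies, the reward values other than the -0.01 for wait, the MAX_STEPS value, the SREObservation fields, the action format, and the point at which current_step is incremented are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

MAX_STEPS = 5  # assumed value; the real limit is not shown in this thread


@dataclass
class SREObservation:
    # Hypothetical minimal fields; the PR's observation likely carries more.
    reward: float
    done: bool
    step: int


class SketchEnv:
    """Illustrative stand-in for the environment's step() flow."""

    def __init__(self, chaos_enabled: bool = False, seed: int = 0):
        self.chaos_enabled = chaos_enabled
        self.current_step = 0
        self.chaos_events = 0
        self._action_history = []
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    # Placeholder handlers: each returns a step reward (values invented).
    def _handle_terminate(self, action):
        return 0.0

    def _handle_scale(self, action):
        return 0.0

    def _handle_reboot(self, action):
        return 0.0

    def _handle_inspect(self, action):
        return 0.0

    def _maybe_inject_chaos(self):
        # Assumed behaviour: inject an event with some fixed probability.
        if self._rng.random() < 0.5:
            self.chaos_events += 1

    def _recalculate_state(self):
        pass  # a real environment would refresh derived metrics here

    def _build_observation(self, reward, done):
        return SREObservation(reward=reward, done=done, step=self.current_step)

    def step(self, action: dict) -> SREObservation:
        handlers = {
            "terminate": self._handle_terminate,
            "scale": self._handle_scale,
            "reboot": self._handle_reboot,
            "inspect": self._handle_inspect,
        }
        cmd = action.get("cmd")
        if cmd == "wait":
            step_reward = -0.01  # small penalty for waiting, per the flowchart
        elif cmd in handlers:
            step_reward = handlers[cmd](action)
        else:
            raise ValueError(f"unknown command: {cmd!r}")

        self._action_history.append(action)
        if self.chaos_enabled:
            self._maybe_inject_chaos()
        self._recalculate_state()

        self.current_step += 1  # assumption: counter advances once per step
        done = self.current_step >= MAX_STEPS
        return self._build_observation(step_reward, done)
```

Under these assumptions, calling step({"cmd": "wait"}) repeatedly yields a reward of -0.01 each time and sets done on the step where the counter reaches MAX_STEPS. Grading is left out because the flowchart shows grade() being called by the orchestrator rather than from inside step().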
naveenkumar982 left a comment
Everything is updated now.
Implements a new Cloud SRE & FinOps environment for OpenEnv. This environment features 3 difficulty-tiered tasks (Phantom Volume Cleanup, Latency Spike Remediation, and Noisy Neighbor Incident), testing an agent's ability to diagnose outages, optimize costs, and perform multi-step mitigations without causing collateral damage to production workloads.
Features: