fix(optimization): Handle failed inference cases gracefully in GEPA pipeline #6005
thoang3 wants to merge 1 commit
…ipeline

Fixes three cascading crashes when eval cases fail during GEPA optimization:

1. Initialize inferences to empty list on inference failure
   - Prevents TypeError when iterating over None
   - Maintains consistent type contract (inferences always List)
   - File: local_eval_service.py
2. Use .get() for score lookup in GEPA adapter
   - Gracefully handles missing scores from failed eval cases
   - Defaults to 0.0 (conservative default penalizing failing prompts)
   - Allows optimization to continue with successful cases
   - File: gepa_root_agent_prompt_optimizer.py
3. Add None check before rounding scores
   - Handles failed cases with None scores before type conversion
   - Python 3.14+ compatible (TypeError on round(None))
   - File: local_eval_sampler.py

These minimal changes enable graceful degradation when transient failures occur (rate limits, timeouts, API errors), allowing GEPA optimization to complete successfully with successful eval cases.

Fixes google#6004
Related: google#5876, google#5115, google#5403, PR google#5878
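The three guards described above can be sketched in isolation as follows. This is a minimal illustration, not the ADK code: `InferenceResult`, `run_inference`, `collect_scores`, and `round_score` are hypothetical stand-ins, and returning `None` for an unscored case in `round_score` is an assumption. Only the patterns themselves (empty-list default, `.get()` with a 0.0 default, `None` check before `round`) come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InferenceResult:
    """Hypothetical stand-in for an eval-case inference result."""
    case_id: str
    inferences: list = field(default_factory=list)
    error: Optional[str] = None


def run_inference(case_id: str, fail: bool) -> InferenceResult:
    """Simulates inference; on failure, inferences is an empty list (fix 1)."""
    result = InferenceResult(case_id=case_id)
    try:
        if fail:
            raise TimeoutError("simulated transient failure")
        result.inferences = [f"response for {case_id}"]
    except TimeoutError as exc:
        result.inferences = []  # never None, so callers can always iterate
        result.error = str(exc)
    return result


def collect_scores(example_ids, scores):
    """Looks up each score, defaulting to 0.0 for missing cases (fix 2)."""
    return [scores.get(example_id, 0.0) for example_id in example_ids]


def round_score(score):
    """Rounds a score only when one exists (fix 3); None passes through."""
    return round(score, 2) if score is not None else None


failed = run_inference("case_b", fail=True)
for inference in failed.inferences:  # iterates zero times instead of raising
    print(inference)

print(collect_scores(["case_a", "case_b"], {"case_a": 0.876}))  # [0.876, 0.0]
print(round_score(0.876))  # 0.88
print(round_score(None))   # None
```

Each guard is local to one call site, which matches the commit's claim that the fixes are one line each.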
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Response from ADK Triaging Agent
Hello @thoang3, thank you for creating this PR! We really appreciate your contribution to fixing these GEPA pipeline crashes. Before we can review and accept your pull request, please complete the following step from our Contribution Guidelines:
This action is required for us to legally accept and merge your code. Thank you so much for your understanding and cooperation!
Link to Issue
Fixes #6004
Problem
When using GEPA optimization with evaluation sets containing failed cases (e.g., due to inference failures, user simulator errors, or API timeouts), three cascading crashes occur:
1. TypeError in local_eval_service.py:283
   - inference_result.inferences = None
   - TypeError: 'NoneType' object is not iterable
2. KeyError in gepa_root_agent_prompt_optimizer.py:150
   - result.scores
   - KeyError: '<example_id>'
3. TypeError in local_eval_sampler.py:292 (Python 3.14+)
   - score = None in metric results
   - round(None, 2) raises: TypeError: type NoneType doesn't define __round__ method

Solution
Three minimal, defensive fixes:
1. Initialize inferences to empty list on failure (1 line)
   - google/adk/evaluation/local_eval_service.py
2. Use .get(example_id, 0.0) for score lookup (1 line)
   - google/adk/optimization/gepa_root_agent_prompt_optimizer.py
3. Add None check before rounding scores (1 line)
   - google/adk/optimization/local_eval_sampler.py

Testing Plan
Unit Tests:
Manual Testing:
Verify GEPA optimization completes with:
Impact
✅ GEPA optimization completes even when some eval cases fail
✅ Failed cases contribute 0.0 score (conservative, semantically correct)
✅ No change to behavior when all cases succeed
✅ Better production robustness for transient failures (rate limits, timeouts, API errors)
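To make the effect of the 0.0 default concrete, the sketch below averages per-example scores over a small batch. `mean_score` is a hypothetical helper, and averaging is an assumption about how scores are aggregated, used here only for illustration; the PR itself only states that missing scores are looked up with a 0.0 default.

```python
def mean_score(example_ids, scores):
    """Averages per-example scores; examples missing from `scores` count as 0.0."""
    if not example_ids:
        return 0.0
    return sum(scores.get(eid, 0.0) for eid in example_ids) / len(example_ids)


ids = ["a", "b", "c", "d"]
all_ok = {"a": 1.0, "b": 0.5, "c": 1.0, "d": 0.5}
one_failed = {"a": 1.0, "b": 0.5, "c": 1.0}  # "d" failed, so no score was recorded

print(mean_score(ids, all_ok))      # 0.75
print(mean_score(ids, one_failed))  # 0.625
```

When every case has a score, the default is never used and the result is unchanged. When a case is missing, the run still produces a number, but the candidate prompt is penalized as if that case had scored zero, so a transient failure and a genuinely bad prompt are indistinguishable in the aggregate.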
Checklist