
fix(optimization): Handle failed inference cases gracefully in GEPA pipeline #6005

Open
thoang3 wants to merge 1 commit into google:main from thoang3:fix/gepa-failed-eval-cases-error-handling

thoang3 commented Jun 7, 2026

Link to Issue

Fixes #6004

Problem

When using GEPA optimization with evaluation sets containing failed cases (e.g., due to inference failures, user simulator errors, or API timeouts), three cascading crashes occur:

  1. TypeError in local_eval_service.py:283

    • Failed inferences leave inference_result.inferences = None
    • Evaluation code tries to iterate over None: TypeError: 'NoneType' object is not iterable
  2. KeyError in gepa_root_agent_prompt_optimizer.py:150

    • Failed cases don't populate score entries in result.scores
    • GEPA adapter crashes when aggregating scores: KeyError: '<example_id>'
  3. TypeError in local_eval_sampler.py:292 (reported on Python 3.14+, although round(None) raises this TypeError on every Python 3 version)

    • Failed cases may have score = None in metric results
    • Attempting round(None, 2) raises: TypeError: type NoneType doesn't define __round__ method
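The three failure modes can be reproduced outside ADK with a minimal sketch. `InferenceResult` and the `crash_*` functions below are hypothetical stand-ins, not ADK's actual classes; they model only the field and values named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceResult:
    # Hypothetical stand-in for ADK's inference result; only the
    # `inferences` field named in this PR is modeled.
    inferences: Optional[list] = None


def crash_1_iterate_none() -> None:
    # Failure 1: a failed inference leaves `inferences` as None.
    result = InferenceResult()
    for _ in result.inferences:  # TypeError: 'NoneType' object is not iterable
        pass


def crash_2_missing_score() -> float:
    # Failure 2: a failed case never gets an entry in the scores mapping.
    scores = {"case_ok": 0.8}
    return scores["case_failed"]  # KeyError: 'case_failed'


def crash_3_round_none() -> float:
    # Failure 3: a failed case can carry score=None.
    score: Optional[float] = None
    return round(score, 2)  # TypeError: NoneType doesn't define __round__


for fn in (crash_1_iterate_none, crash_2_missing_score, crash_3_round_none):
    try:
        fn()
    except (TypeError, KeyError) as exc:
        print(f"{fn.__name__}: {type(exc).__name__}: {exc}")
```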

Solution

Three minimal, defensive fixes:

  1. Initialize inferences to empty list on failure (1 line)

    • File: google/adk/evaluation/local_eval_service.py
    • Keeps the type consistent with the success path, where inferences is always a list
  2. Use .get(example_id, 0.0) for score lookup (1 line)

    • File: google/adk/optimization/gepa_root_agent_prompt_optimizer.py
    • Graceful degradation with conservative default (0.0 penalizes failures)
  3. Add None check before rounding scores (1 line)

    • File: google/adk/optimization/local_eval_sampler.py
    • Defensive guard against None scores (round(None) raises TypeError on every Python 3 version, not only 3.14+)
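A minimal sketch of the three guards, written as standalone helpers. The helper names (`normalize_inferences`, `score_for`, `rounded_score`) are hypothetical; the patch applies the equivalent one-line changes inline in the files listed above. Keeping `None` after the guard in fix 3 is an assumption about how the sampler treats failed scores.

```python
from typing import Mapping, Optional


def normalize_inferences(inferences: Optional[list]) -> list:
    # Fix 1 (sketch): treat a failed inference as "no inferences", so
    # downstream loops see an empty list, matching the success-path type.
    return inferences if inferences is not None else []


def score_for(scores: Mapping[str, float], example_id: str) -> float:
    # Fix 2 (sketch): a failed case has no entry, so fall back to 0.0,
    # which penalizes the candidate prompt instead of raising KeyError.
    return scores.get(example_id, 0.0)


def rounded_score(score: Optional[float]) -> Optional[float]:
    # Fix 3 (sketch): round only real numbers; leave None for failed cases.
    return round(score, 2) if score is not None else None


print(normalize_inferences(None))       # []
print(score_for({"ok": 0.8}, "failed"))  # 0.0
print(rounded_score(None))              # None
print(rounded_score(0.876))             # 0.88
```

The guards use `is not None` rather than a truthiness test: `if score` would misclassify a legitimate score of 0.0 as a failure. On the success path each helper returns its input unchanged (apart from rounding), which is consistent with the claim that behavior is identical when all cases succeed.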

Testing Plan

Unit Tests:

  • All existing tests pass (35 tests)
  • test_local_eval_service.py: 18 PASSED
  • gepa_root_agent_prompt_optimizer_test.py: 6 PASSED
  • local_eval_sampler_test.py: 11 PASSED
  • No regressions: behavior is identical when all cases succeed

Manual Testing:
Verify that GEPA optimization runs to completion and reports

    Evaluation summary: X PASSED, Y FAILED

rather than crashing mid-optimization.

Impact

✅ GEPA optimization completes even when some eval cases fail
✅ Failed cases contribute 0.0 score (conservative, semantically correct)
✅ No change to behavior when all cases succeed
✅ Better production robustness for transient failures (rate limits, timeouts, API errors)
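To see why a 0.0 default penalizes prompts whose cases fail, consider a worked example. It assumes, purely for illustration, that the adapter combines per-example scores with an arithmetic mean; `aggregate` is a hypothetical helper, not the GEPA adapter's actual code.

```python
from statistics import mean
from typing import Mapping, Sequence


def aggregate(scores: Mapping[str, float], example_ids: Sequence[str]) -> float:
    # Hypothetical aggregation: arithmetic mean over every requested example.
    # Missing (failed) examples count as 0.0, mirroring the .get() default.
    return mean(scores.get(example_id, 0.0) for example_id in example_ids)


ids = ["a", "b", "c"]
all_succeed = {"a": 1.0, "b": 0.5, "c": 0.75}
one_failed = {"a": 1.0, "b": 0.5}  # "c" failed, so it has no score entry

print(aggregate(all_succeed, ids))  # 0.75
print(aggregate(one_failed, ids))   # 0.5
```

Skipping failed cases instead would average only "a" and "b" and report 0.75, the same as the all-succeed run, so a prompt that triggers failures would not be penalized at all.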

Checklist

  • I have read the CONTRIBUTING.md document
  • I have performed a self-review of my own code
  • All unit tests pass locally
  • No regressions in existing functionality
  • Changes are minimal and focused
  • Commit message follows Conventional Commits format

…ipeline

Fixes three cascading crashes when eval cases fail during GEPA optimization:

1. Initialize inferences to empty list on inference failure
   - Prevents TypeError when iterating over None
   - Maintains consistent type contract (inferences always List)
   - File: local_eval_service.py

2. Use .get() for score lookup in GEPA adapter
   - Gracefully handles missing scores from failed eval cases
   - Defaults to 0.0 (conservative default penalizing failing prompts)
   - Allows optimization to continue with successful cases
   - File: gepa_root_agent_prompt_optimizer.py

3. Add None check before rounding scores
   - Handles failed cases with None scores before type conversion
   - Avoids the TypeError from round(None), which affects every Python 3 version, not only 3.14+
   - File: local_eval_sampler.py

These minimal changes enable graceful degradation when transient failures
occur (rate limits, timeouts, API errors), allowing GEPA optimization to
complete using the eval cases that succeed.

Fixes google#6004
Related: google#5876, google#5115, google#5403, PR google#5878

google-cla Bot commented Jun 7, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

adk-bot added the eval [Component] This issue is related to evaluation label Jun 7, 2026

adk-bot (Collaborator) commented Jun 7, 2026

Response from ADK Triaging Agent

Hello @thoang3, thank you for creating this PR! We really appreciate your contribution to fixing these GEPA pipeline crashes.

Before we can review and accept your pull request, please complete the following step from our Contribution Guidelines:

  • Sign our Contributor License Agreement (CLA): It looks like the Google CLA check has failed. Please visit https://cla.developers.google.com/ to sign or verify your agreement.

This action is required for us to legally accept and merge your code. Thank you so much for your understanding and cooperation!


Labels

eval [Component] This issue is related to evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GEPA optimizer crashes with KeyError on failed eval cases

2 participants