
fix(llm_config): disable reasoning_effort for Opus 4.7#670

Closed
juanmichelini wants to merge 1 commit into main from fix/opus-4-7-reasoning-effort

Conversation

@juanmichelini
Collaborator

Summary

Fixes an LLM failure specific to Opus 4.7; Opus 4.6 works with the same setup.

Root Cause

LiteLLM handles reasoning_effort differently for Opus 4.6 vs 4.7:

  • Opus 4.6: reasoning_effort="high" maps to type="adaptive" (model decides thinking budget)
  • Opus 4.7: reasoning_effort="high" maps to type="enabled" with budget_tokens=4096 (fixed)

This causes unexpected behavior for 4.7 (excessive thinking, token limit issues) while 4.6 works correctly.

The SDK sets reasoning_effort="high" by default for all models. When this is passed to LiteLLM, 4.7 gets the problematic fixed budget mapping while 4.6 gets the adaptive type which works fine.

Evidence

From LiteLLM's code (llms/anthropic/chat/transformation.py):

def _map_reasoning_effort(...):
    if AnthropicConfig._is_claude_4_6_model(model):
        return AnthropicThinkingParam(type="adaptive")  # 4.6 - works
    elif reasoning_effort == "high":
        return AnthropicThinkingParam(
            type="enabled",
            budget_tokens=DEFAULT_REASONING_EFFORT_HIGH_THINKING_BUDGET,  # 4096 for 4.7 - breaks
        )

Fix

In benchmarks/utils/llm_config.py, disable reasoning_effort for Opus 4.7 models by setting it to None. This allows them to use default behavior without the problematic fixed budget mapping.

Verification

Opus 4.7 after fix:
  reasoning_effort: None
  extended_thinking_budget: 200000

Opus 4.6 (unchanged):
  reasoning_effort: high
  extended_thinking_budget: 200000

The fix is minimal and targeted: it only affects Opus 4.7 and doesn't impact other models.


LiteLLM handles reasoning_effort='high' differently for Opus 4.6 vs 4.7:
- Opus 4.6: maps to type='adaptive' (model decides thinking budget)
- Opus 4.7: maps to type='enabled' with fixed budget_tokens=4096

This causes unexpected behavior for 4.7 (excessive thinking, token limit
issues) while 4.6 works correctly.

The fix disables reasoning_effort for Opus 4.7 models, allowing them to
use the default behavior without the problematic fixed budget mapping.

Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator

@all-hands-bot left a comment


🟡 Acceptable - Targeted workaround for a real LiteLLM bug. The fix works but could be simplified. Missing concrete evidence of before/after behavior.

Comment on lines +32 to +37
if model_matches(llm.model, OPUS_4_7_MODELS) and llm.reasoning_effort is not None:
    llm = LLM(
        **{
            **llm.model_dump(),
            "reasoning_effort": None,
        }
Collaborator


🟡 Suggestion: Pydantic models support model_copy(update={...}), which is cleaner than manual dict unpacking:

Suggested change

if model_matches(llm.model, OPUS_4_7_MODELS) and llm.reasoning_effort is not None:
    llm = LLM(
        **{
            **llm.model_dump(),
            "reasoning_effort": None,
        }

if model_matches(llm.model, OPUS_4_7_MODELS) and llm.reasoning_effort is not None:
    llm = llm.model_copy(update={"reasoning_effort": None})

This avoids the nested dict unpacking and is more idiomatic for Pydantic models.

Comment on lines +6 to +11
from openhands.sdk.llm.utils.model_features import model_matches


# Models where LiteLLM handles reasoning_effort incorrectly.
# LiteLLM maps reasoning_effort="high" to type="adaptive" for 4.6 but to
# type="enabled" with fixed budget_tokens=4096 for 4.7, causing issues.
Collaborator


🟡 Suggestion: Add a comment linking to the upstream LiteLLM issue (if one exists) so we know when this workaround can be removed:

# Models where LiteLLM handles reasoning_effort incorrectly.
# TODO: Remove this workaround once LiteLLM fixes the mapping.
# See: https://github.com/BerriAI/litellm/issues/XXXXX
# LiteLLM maps reasoning_effort="high" to type="adaptive" for 4.6 but to
# type="enabled" with fixed budget_tokens=4096 for 4.7, causing issues.
OPUS_4_7_MODELS = [
    "claude-opus-4-7",
]

If no issue exists, consider filing one to track this upstream.

@juanmichelini
Collaborator Author

No longer needed.
