fix(llm_config): disable reasoning_effort for Opus 4.7#670
Closed

juanmichelini wants to merge 1 commit into `main`
Conversation
LiteLLM handles `reasoning_effort='high'` differently for Opus 4.6 vs 4.7:

- Opus 4.6: maps to `type='adaptive'` (model decides thinking budget)
- Opus 4.7: maps to `type='enabled'` with fixed `budget_tokens=4096`

This causes unexpected behavior for 4.7 (excessive thinking, token limit issues) while 4.6 works correctly. The fix disables `reasoning_effort` for Opus 4.7 models, allowing them to use the default behavior without the problematic fixed budget mapping.

Co-authored-by: openhands <openhands@all-hands.dev>
**all-hands-bot** (Collaborator) left a comment
🟡 Acceptable - Targeted workaround for a real LiteLLM bug. The fix works but could be simplified. Missing concrete evidence of before/after behavior.
Comment on lines +32 to +37
```python
if model_matches(llm.model, OPUS_4_7_MODELS) and llm.reasoning_effort is not None:
    llm = LLM(
        **{
            **llm.model_dump(),
            "reasoning_effort": None,
        }
    )
```
Collaborator
🟡 Suggestion: Pydantic models support `model_copy(update={...})`, which is cleaner than manual dict unpacking:
Suggested change

```diff
-if model_matches(llm.model, OPUS_4_7_MODELS) and llm.reasoning_effort is not None:
-    llm = LLM(
-        **{
-            **llm.model_dump(),
-            "reasoning_effort": None,
-        }
+if model_matches(llm.model, OPUS_4_7_MODELS) and llm.reasoning_effort is not None:
+    llm = llm.model_copy(update={"reasoning_effort": None})
```
This avoids the nested dict unpacking and is more idiomatic for Pydantic models.
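To illustrate the two approaches side by side, here is a minimal, self-contained sketch. The `LLM` class below is a simplified stand-in for the SDK's actual config model (which is not shown in this PR); only the two copying idioms are the point.

```python
from typing import Optional

from pydantic import BaseModel


class LLM(BaseModel):
    """Hypothetical stand-in for the SDK's LLM config model."""

    model: str
    reasoning_effort: Optional[str] = "high"


llm = LLM(model="claude-opus-4-7")

# Manual dict unpacking, as written in the PR diff:
via_dump = LLM(**{**llm.model_dump(), "reasoning_effort": None})

# Pydantic v2's model_copy, as suggested in the review:
via_copy = llm.model_copy(update={"reasoning_effort": None})

assert via_dump == via_copy  # both produce the same updated instance
```

One caveat worth noting: `model_copy(update=...)` does not re-validate the updated fields, whereas reconstructing via `model_dump()` runs full validation. For setting a field to `None` here, both yield the same result.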
Comment on lines +6 to +11
```python
from openhands.sdk.llm.utils.model_features import model_matches


# Models where LiteLLM handles reasoning_effort incorrectly.
# LiteLLM maps reasoning_effort="high" to type="adaptive" for 4.6 but to
# type="enabled" with fixed budget_tokens=4096 for 4.7, causing issues.
```
Collaborator
🟡 Suggestion: Add a comment linking to the upstream LiteLLM issue (if one exists) so we know when this workaround can be removed:
```python
# Models where LiteLLM handles reasoning_effort incorrectly.
# TODO: Remove this workaround once LiteLLM fixes the mapping.
# See: https://github.com/BerriAI/litellm/issues/XXXXX
# LiteLLM maps reasoning_effort="high" to type="adaptive" for 4.6 but to
# type="enabled" with fixed budget_tokens=4096 for 4.7, causing issues.
OPUS_4_7_MODELS = [
    "claude-opus-4-7",
]
```

If no issue exists, consider filing one to track this upstream.
**juanmichelini** (Author)

No longer needed.
Summary
Fixes the Opus 4.7 LLM failure where Opus 4.6 works with the same setup.
Root Cause
LiteLLM handles `reasoning_effort` differently for Opus 4.6 vs 4.7:

- Opus 4.6: `reasoning_effort="high"` maps to `type="adaptive"` (model decides thinking budget)
- Opus 4.7: `reasoning_effort="high"` maps to `type="enabled"` with `budget_tokens=4096` (fixed)

This causes unexpected behavior for 4.7 (excessive thinking, token limit issues) while 4.6 works correctly.

The SDK sets `reasoning_effort="high"` by default for all models. When this is passed to LiteLLM, 4.7 gets the problematic fixed budget mapping while 4.6 gets the adaptive type, which works fine.

Evidence
From LiteLLM's code (`llms/anthropic/chat/transformation.py`):

Fix
In `benchmarks/utils/llm_config.py`, disable `reasoning_effort` for Opus 4.7 models by setting it to `None`. This allows them to use default behavior without the problematic fixed budget mapping.

Verification
The fix is minimal and targeted - it only affects Opus 4.7 and doesn't impact other models.
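The guard hinges on `model_matches` recognizing the Opus 4.7 model name regardless of provider prefix. The SDK's real implementation in `openhands.sdk.llm.utils.model_features` is not shown in this PR; the sketch below is a hypothetical illustration of how such a check might work, assuming glob-style patterns and a provider prefix like `anthropic/`.

```python
from fnmatch import fnmatch

# Hypothetical sketch; the real model_matches lives in
# openhands.sdk.llm.utils.model_features and may differ.
OPUS_4_7_MODELS = ["claude-opus-4-7"]


def model_matches(model, patterns):
    # Strip any provider prefix (e.g. "anthropic/") before matching.
    name = model.split("/")[-1].lower()
    return any(fnmatch(name, pattern) for pattern in patterns)


print(model_matches("anthropic/claude-opus-4-7", OPUS_4_7_MODELS))  # True
print(model_matches("claude-opus-4-6", OPUS_4_7_MODELS))            # False
```

With a check of this shape, only 4.7 variants are caught, which is consistent with the claim that the fix doesn't impact other models.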