Commit 3e7c7b7
🤖 Reduce flakiness in OpenAI integration tests by defaulting to low reasoning (#269)
## Problem
Multiple tests in CI were timing out waiting for stream-end events when
using OpenAI's reasoning models (gpt-5-codex). The issue stems from
reasoning models taking longer to complete in CI environments.
## Solution
Modified the `sendMessageWithModel()` helper to automatically apply a low
reasoning level to all OpenAI tests unless explicitly overridden. This:
- Reduces model execution time and improves reliability in CI
- Still validates all functionality (events, tokens, timestamps, etc.)
- Preserves ability to override for specific tests (e.g. web_search
tests that need high reasoning)
- Applies consistently to all provider-parametrized tests
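The behavior described above can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual code: the `SendOptions` type, the `reasoningEffort` field, and the `resolveOptions` helper name are all assumptions about how such a default might be wired into `sendMessageWithModel()`.

```typescript
// Hypothetical sketch of the defaulting logic described above.
// `SendOptions` and `reasoningEffort` are illustrative names, not the
// repo's real API.
type ReasoningLevel = "low" | "medium" | "high";

interface SendOptions {
  reasoningEffort?: ReasoningLevel;
}

function resolveOptions(model: string, opts: SendOptions = {}): SendOptions {
  // Default OpenAI models (e.g. "openai:gpt-5-codex") to low reasoning
  // unless the caller explicitly set a level; other providers are untouched.
  if (model.startsWith("openai:") && opts.reasoningEffort === undefined) {
    return { ...opts, reasoningEffort: "low" };
  }
  return opts;
}
```

A web_search-style test that needs deeper reasoning would simply pass `{ reasoningEffort: "high" }`, which the helper leaves intact.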
## Affected Tests
All integration tests using `sendMessageWithModel()` with OpenAI
provider will now default to low reasoning level, making them faster and
more reliable in CI environments.
## Testing
- Tested locally: Both openai:gpt-5-codex and
anthropic:claude-sonnet-4-5 variants pass
- The 'should include tokens and timestamp in delta events' test now
completes in ~15s instead of timing out
Generated with `cmux`

Parent: 8600660
1 file changed (+8, −3 lines)