Debugging Async Slow Runs by xzrderek · Pull Request #41 · eval-protocol/python-sdk

xzrderek · 2025-08-08T18:52:31Z

everything should be parallel now :)

tau bench takes ~3 min now instead of 15
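For readers skimming the thread, here is a minimal sketch of why running rollouts concurrently shortens wall-clock time. It is an illustration only, not the code from this PR: `run_rollout` and `run_all` are hypothetical names, and the sleep stands in for I/O-bound work such as waiting on an environment server.

```python
import asyncio
import time


async def run_rollout(task_id: int, delay: float = 0.1) -> int:
    """Hypothetical stand-in for one rollout; the sleep simulates I/O wait."""
    await asyncio.sleep(delay)
    return task_id


async def run_all(n: int) -> list:
    # gather starts every coroutine before awaiting any of them and
    # returns results in the same order as its arguments.
    return await asyncio.gather(*(run_rollout(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(run_all(4))
elapsed = time.perf_counter() - start
# Four 0.1 s rollouts overlap, so elapsed is close to 0.1 s rather than 0.4 s.
```

The speedup only holds while the work is actually waiting on I/O; CPU-bound steps still run one at a time.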

xzrderek · 2025-08-10T07:30:23Z

eval_protocol/mcp/client/connection.py

        body = {"seed": session.seed}

-        timeout = httpx.Timeout(3.0)
+        timeout = httpx.Timeout(15.0)


cc @mayinghan we should come up with a better solution to this timeout. complex environments like tau can definitely take a long time, e.g. it takes ~12 seconds to reset all the environments for airline (it loads a large JSON, and we're doing it on a thread pool, so it's not truly concurrent)
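One possible direction, sketched under assumptions: make the reset timeout configurable instead of hard-coding it. The environment variable name `EP_RESET_TIMEOUT_S` and the helper `reset_timeout_seconds` are hypothetical and do not exist in the SDK; only the 15-second default comes from the diff above.

```python
import os
from typing import Mapping, Optional

DEFAULT_RESET_TIMEOUT_S = 15.0  # the value this PR sets in connection.py


def reset_timeout_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the reset timeout in seconds.

    Reads the hypothetical EP_RESET_TIMEOUT_S variable and falls back to the
    15 s default when it is unset.
    """
    env = os.environ if env is None else env
    raw = env.get("EP_RESET_TIMEOUT_S")
    if raw is None:
        return DEFAULT_RESET_TIMEOUT_S
    value = float(raw)
    if value <= 0:
        raise ValueError("EP_RESET_TIMEOUT_S must be a positive number of seconds")
    return value


# The result could then replace the literal in the diff, e.g.
# timeout = httpx.Timeout(reset_timeout_seconds())
```

This keeps the default behaviour unchanged while letting slow environments such as airline raise the limit without a code change.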

what is this sleep for? I thought we fixed it with health check?

it’s not a sleep but a timeout, so if the env reset takes more than 15s, it’ll time out. this reset is called on cleanup when the rollouts end.
can we just remove the timeout, since it's possible for env reset to take more than 15s, or is that dangerous?
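The sleep-versus-timeout distinction can be shown with a small standard-library sketch. It uses `asyncio.wait_for` rather than the `httpx` client from the diff, and `fake_reset` and `reset_with_timeout` are hypothetical stand-ins for the real reset call.

```python
import asyncio


async def fake_reset(duration: float) -> str:
    """Hypothetical stand-in for an environment reset that takes `duration` seconds."""
    await asyncio.sleep(duration)
    return "reset ok"


async def reset_with_timeout(duration: float, timeout: float) -> str:
    try:
        # A timeout is an upper bound on waiting, not a fixed delay:
        # a fast reset returns as soon as it finishes.
        return await asyncio.wait_for(fake_reset(duration), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


fast = asyncio.run(reset_with_timeout(0.01, timeout=0.5))  # finishes early
slow = asyncio.run(reset_with_timeout(0.5, timeout=0.05))  # exceeds the bound
```

Passing `timeout=None` to `asyncio.wait_for` waits indefinitely, which is the risk with removing the limit: a reset that hangs would block cleanup forever.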

maybe we can also consider deleting that session completely from the mcp server? but then the server will never be able to persist any state after one single run

but then the server will never be able to persist any state after one single run

i don't quite get what this means. i believe you added reset_session recently, and it's triggered at the end of the rollout. so aren't we already not persisting state after a run?

regardless, i'm gonna merge in first and we can talk more later. i'm just calling out that the 15s timeout is likely not a viable long term solution, but it's fine for now.

xzrderek and others added 4 commits August 7, 2025 21:35

test

4fc848c

add error msg

f2de326

current

8f4557b

MINIMAL REPRO

1165ff1

xzrderek changed the title ~~WIP~~ Debugging Async Slow Runs Aug 10, 2025

xzrderek added 4 commits August 10, 2025 04:02

run on local to double check

073d99a

merge

58eba6e

debug

13a8506

cleanup

28932e4

xzrderek commented Aug 10, 2025

small fix

9af5b0c

xzrderek requested review from benjibc and mayinghan August 10, 2025 07:36

xzrderek merged commit 16149d2 into main Aug 11, 2025
7 checks passed

xzrderek deleted the derekx/test-long-run branch August 11, 2025 01:33

xzrderek merged 9 commits into main from derekx/test-long-run
