body = {"seed": session.seed}
- timeout = httpx.Timeout(3.0)
+ timeout = httpx.Timeout(15.0)
cc @mayinghan we should come up with a better solution to this timeout. for complex environments like tau, reset can definitely take a long time. e.g. it takes ~12 seconds to reset all the environments for airline (it loads a large json, and we're doing it on a thread pool, so it's not truly concurrent)
what is this sleep for? I thought we fixed it with health check?
it’s not a sleep but a timeout, so if the env reset takes more than 15s, it’ll time out. this reset is called on cleanup when the rollouts end.
can we just remove the timeout entirely, since it’s possible for env reset to take more than 15s, or is that dangerous?
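for reference, a minimal sketch of the trade-off in question, using only the standard library rather than the actual httpx call. `reset_env` and `reset_with_timeout` are hypothetical names, and the sleep is just a stand-in for the real reset request. passing `timeout=None` waits indefinitely, which is what removing the timeout would amount to (httpx likewise treats `httpx.Timeout(None)` as "no timeout"), so a hung env server would block cleanup forever.

```python
import asyncio
from typing import Optional


async def reset_env(delay: float) -> str:
    """Hypothetical stand-in for the reset request; sleeps to mimic a slow reset."""
    await asyncio.sleep(delay)
    return "reset"


async def reset_with_timeout(delay: float, timeout: Optional[float]) -> str:
    """Run the reset, giving up after `timeout` seconds (None waits indefinitely)."""
    try:
        return await asyncio.wait_for(reset_env(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Assumption: cleanup should report the failure instead of crashing the rollout.
        return "timed out"


# A reset that finishes well inside its budget.
fast = asyncio.run(reset_with_timeout(0.01, timeout=1.0))
# A reset that is slower than its budget and gets cancelled.
slow = asyncio.run(reset_with_timeout(0.5, timeout=0.05))
# No timeout at all: returns only because the fake reset eventually finishes.
unbounded = asyncio.run(reset_with_timeout(0.05, timeout=None))
```

a middle ground would be to keep a bound but make it configurable per environment, so something heavy like airline can get a larger budget without every env waiting that long on a hang.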
maybe we can also consider deleting that session completely from the mcp server? but then the server will never be able to persist any state after one single run
> but then the server will never be able to persist any state after one single run

i don't quite get what this means. i believe you added reset_session recently, and it's triggered at the end of the rollout. so aren't we already not persisting state after a run?
regardless, i'm gonna merge this in first and we can talk more later. i'm just calling out that the 15s timeout is likely not a viable long-term solution, but it's fine for now.
everything should be parallel now :)
tau bench takes ~3 min now instead of 15