feat: multi-gpu inference, trajectory analyzer #2
base: master
Conversation
BjarniHaukur left a comment:
Looks good.
I had to make some ad-hoc changes to get the infer script running (and to run `vllm serve` myself). Probably an artifact I can fix on my end, possibly something with the pyproject.
I also verified whether the performance degradation I had observed with increased parallelism was still present. Excitingly, it does not look like it is: the 8-concurrent and 32-concurrent runs got the same score.
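As a minimal illustration of the knob being compared (client-side concurrency, not the repo's actual infer script), `xargs -P` runs a fixed pool of workers over a larger job list:

```shell
# Illustrative only: 32 "requests", at most 8 in flight at once,
# analogous to the 8-concurrent vs 32-concurrent comparison above.
N=$(seq 1 32 | xargs -P 8 -I{} echo "request {}" | wc -l)
N=$((N))   # normalize wc output (may be space-padded)
echo "completed $N requests"
```

All 32 jobs complete regardless of the pool size; only throughput should differ, which is why equal scores at both settings suggest the degradation is gone.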
🏆 Current Leaderboard
Performance on SWE-bench Lite subset, ranked by code similarity
| # | Ver | Model | Code Sim | Test Sim | Tokens | Tools |
|---|---|---|---|---|---|---|
| 11 | v5.0.1 | qwen3-32b | 0.276 | 0.000 | 4,409 / 16,384 | 23.2 / 100 |
| 12 | v5.0.1 | qwen3-32b | 0.273 | 0.005 | 5,514 / 16,384 | 32.1 / 100 |
| 13 | v3.2.0 | qwen-2.5-72b-instruct | 0.272 | 0.000 | 5,873 / 16,384 | 35.1 / 100 |
| 14 | v3.2.0 | qwen3-32b | 0.255 | 0.000 | 5,281 / 16,384 | 28.3 / 100 |
| 15 | v3.2.0 | llama-4-maverick | 0.255 | 0.000 | 4,647 / 16,384 | 10.4 / 100 |
```shell
#SBATCH --array=0

set -euo pipefail
```
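For context, a hypothetical sketch of how `--array` and the strict-mode line pair up in a batch script (the real script is `benchmarks/swe_bench/run_harness_eval.sh`; the array range and GPU mapping below are illustrative, not the repo's):

```shell
#!/bin/bash
#SBATCH --array=0-7                    # illustrative: one array task per shard
set -euo pipefail                      # fail fast on errors and unset variables
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"    # default lets the script run outside SLURM
GPU=$((TASK_ID % 8))                   # map array task to a GPU index
echo "array task ${TASK_ID} -> GPU ${GPU}"
```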
```shell
# === NSC Cluster Setup ===
# Load GCC build environment for Triton JIT compilation
module load buildenv-gcccuda/12.1.1-gcc12.3.0
# Add Python 3.11 headers (required for Triton to compile cuda_utils)
# These headers were extracted from Python source since python3.11-devel is not installed
export CPATH="$HOME/.local/include/python3.11:${CPATH:-}"
```
I had to add something like this to get it working. Obviously a hack.
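One detail of the hack worth noting: the `${CPATH:-}` default is what keeps the export safe under the script's `set -u` (a bare `$CPATH` would abort when the variable starts out unset). A small sketch, using the same assumed header path:

```shell
# Under set -u, expanding an unset variable is an error; ${CPATH:-} expands
# to empty instead, so prepending works whether or not CPATH was already set.
set -u
unset CPATH || true
export CPATH="$HOME/.local/include/python3.11:${CPATH:-}"
echo "$CPATH"
```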
Most of the PR is the trajectory analyzer, but it also includes the dependency changes that made it possible to run inference outside of Apptainer, on the full 8-GPU node, with no issues from vLLM coming up.
@BjarniHaukur you should be able to just `uv sync` and run it. See `benchmarks/swe_bench/run_harness_eval.sh` for the batch script.
I created the pyproject from scratch since I was facing some version conflicts, but I haven't added the training dependencies back to it. Let me know if adding them to the new setup is not enough; if so, we can debug. After that we can make a run to see if the parallelism also works for training.
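The steps above as a command sketch (assumes a SLURM cluster; `sbatch` submission of the referenced script is my reading of the setup, not spelled out in the PR):

```shell
uv sync                                           # install deps from the new pyproject
sbatch benchmarks/swe_bench/run_harness_eval.sh   # submit the eval batch script
```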