Conversation

@andre15silva (Member) commented:

Most of the PR is the trajectory analyzer, but it also includes the dependency changes that make it possible to run inference outside of Apptainer, on the full 8-GPU node, without vLLM issues coming up.

@BjarniHaukur you should be able to just `uv sync` and run it. See `benchmarks/swe_bench/run_harness_eval.sh` for the batch script.
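
Roughly, the workflow should just be the following (a sketch, assuming our usual SLURM setup; the exact flags live in the script itself):

```bash
# Sketch of the intended workflow; assumes a SLURM cluster and that
# run_harness_eval.sh carries its own #SBATCH directives.
uv sync                                          # install the new pyproject's dependencies
sbatch benchmarks/swe_bench/run_harness_eval.sh  # submit the harness eval as a batch job
```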

I created the pyproject from scratch since I was facing some version conflicts, but I haven't added the training dependencies back yet. Try adding them to the new setup and let me know if that isn't enough; we can debug it if that's the case. After that we can make a run to see whether the parallelism also works for training.

@BjarniHaukur (Collaborator) left a comment:


Looks good.

I had to make some ad-hoc changes to get the infer script running (and to run `vllm serve` myself).

Probably an artifact I can fix on my end, possibly something with the pyproject.

I checked whether the performance degradation I previously observed with increased parallelism was still present, and excitingly it does not look like it is: 8 concurrent and 32 concurrent runs got the same score.
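
For reference, this is roughly how I launched the server manually (a sketch, not the exact command; the model id, port, and context length are assumptions on my part):

```bash
# Minimal sketch of the manual launch on the 8-GPU node.
# Model id, port, and --max-model-len are assumptions, not values from the repo scripts.
vllm serve Qwen/Qwen3-32B \
  --tensor-parallel-size 8 \
  --max-model-len 16384 \
  --port 8000
```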

🏆 Current Leaderboard

Performance on SWE-bench Lite subset, ranked by code similarity

| # | Ver | Model | Code Sim | Test Sim | Tokens | Tools |
|---|-----|-------|----------|----------|--------|-------|
| 11 | v5.0.1 | qwen3-32b | 0.276 | 0.000 | 4,409 / 16,384 | 23.2 / 100 |
| 12 | v5.0.1 | qwen3-32b | 0.273 | 0.005 | 5,514 / 16,384 | 32.1 / 100 |
| 13 | v3.2.0 | qwen-2.5-72b-instruct | 0.272 | 0.000 | 5,873 / 16,384 | 35.1 / 100 |
| 14 | v3.2.0 | qwen3-32b | 0.255 | 0.000 | 5,281 / 16,384 | 28.3 / 100 |
| 15 | v3.2.0 | llama-4-maverick | 0.255 | 0.000 | 4,647 / 16,384 | 10.4 / 100 |

Review comment on `benchmarks/swe_bench/run_harness_eval.sh`:

```bash
#SBATCH --array=0

set -euo pipefail
```


Suggested change
```bash
# === NSC Cluster Setup ===
# Load GCC build environment for Triton JIT compilation
module load buildenv-gcccuda/12.1.1-gcc12.3.0
# Add Python 3.11 headers (required for Triton to compile cuda_utils)
# These headers were extracted from Python source since python3.11-devel is not installed
export CPATH="$HOME/.local/include/python3.11:${CPATH:-}"
```


I had to add something like this to get it working. Obviously a hack.
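
A quick way to sanity-check the hack before submitting a full job (a sketch; it just reuses the paths from the suggestion above):

```bash
# Sketch: confirm the GCC toolchain and the extracted Python 3.11 headers are
# visible before launching a job, so any Triton JIT failure surfaces early.
module load buildenv-gcccuda/12.1.1-gcc12.3.0
export CPATH="$HOME/.local/include/python3.11:${CPATH:-}"
test -f "$HOME/.local/include/python3.11/Python.h" \
  && echo "Python.h found" \
  || echo "Python.h not found at the expected path"
```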

