```dockerfile
RUN uv pip install --system --no-cache \
    accelerate \
    boto3 \
    bitsandbytes \
    datasets \
    evaluate \
    lm-eval \
    openai \
    pandas \
    scikit-learn \
    shortuuid \
    tokenizers \
    transformers \
    trl \
    peft \
    tiktoken \
    inspect-ai \
    matplotlib \
    certifi

# Note: flash_attn requires GPU to compile - install at runtime if needed:
```
Pin versions, like the current images do.
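As a possible guard for this, a minimal sketch of a check that flags package entries in the `RUN uv pip install` list that are not pinned with `==`. The function name `find_unpinned` and the version numbers in the demo are hypothetical and illustrative only, not recommendations:

```python
import re

# Matches an exact pin such as "pandas==2.2.3", optionally with extras like "pkg[extra]==1.0".
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^=\s]+$")


def find_unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement entries that are not pinned with '=='."""
    unpinned = []
    for raw in requirements:
        # Drop surrounding whitespace and a trailing Dockerfile line continuation.
        req = raw.strip().rstrip("\\").strip()
        if not req or req.startswith("#"):
            continue  # skip blank lines and comments
        if not _PINNED.match(req):
            unpinned.append(req)
    return unpinned


if __name__ == "__main__":
    # Illustrative entries only; the versions are placeholders.
    print(find_unpinned(["accelerate", "transformers==4.46.3", "trl>=0.12", "certifi"]))
```

Range specifiers such as `>=` are reported as unpinned on purpose, since only an exact `==` pin makes the image reproducible.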
Things that remain to reach full parity with the original PTB implementation:
After discussing with Alex from Harbor/tbench:
Added Modal storage for the HF cache (hf-cache) for Harbor in the branch. The branch also contains some other changes, so you can probably clone the repo with this branch into another directory and ask your agent:
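A persistent volume only helps if the Hugging Face libraries actually write into it. A simplified sketch of how `huggingface_hub` picks its hub cache directory from environment variables (it ignores the legacy `HUGGINGFACE_HUB_CACHE` variable). The mount path `/hf-cache` is a hypothetical example, not necessarily what the branch uses:

```python
import os
from typing import Mapping


def resolve_hf_hub_cache(env: Mapping[str, str] = os.environ) -> str:
    """Simplified mirror of huggingface_hub's hub-cache resolution."""
    # An explicit HF_HUB_CACHE wins outright.
    if "HF_HUB_CACHE" in env:
        return os.path.expanduser(env["HF_HUB_CACHE"])
    # Otherwise the cache lives under $HF_HOME/hub, where HF_HOME defaults to
    # $XDG_CACHE_HOME/huggingface, and XDG_CACHE_HOME defaults to ~/.cache.
    xdg_cache = env.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    hf_home = env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    return os.path.join(os.path.expanduser(hf_home), "hub")


if __name__ == "__main__":
    # If a volume is mounted at the hypothetical /hf-cache, point HF_HOME at it.
    print(resolve_hf_hub_cache({"HF_HOME": "/hf-cache"}))
```

In practice this means setting `HF_HOME` (or `HF_HUB_CACHE`) in the sandbox environment to the volume's mount path; otherwise downloads land in `~/.cache/huggingface/hub` and are lost with the sandbox.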
There is also an upcoming change to the judge that will need to be integrated. I will post it here.
We need to hardcode the baseline values in a JSON file instead of fetching them from the …; this is needed for the Harbor integration, because Harbor should output the baseline value in case the judge flags the run.
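A minimal sketch of that idea, assuming a hypothetical `baselines.json` shaped as `{"<benchmark>": {"<model>": <score>}}`. The file name, schema, and function names are all assumptions for illustration, not the existing PTB code:

```python
import json
from pathlib import Path

# Hypothetical schema: {"<benchmark>": {"<model>": <baseline score>}}
Baselines = dict[str, dict[str, float]]


def parse_baselines(text: str) -> Baselines:
    """Parse the hardcoded baseline table from JSON text."""
    return json.loads(text)


def load_baselines(path: Path = Path("baselines.json")) -> Baselines:
    """Load baselines from a file shipped with the task (hypothetical file name)."""
    return parse_baselines(path.read_text())


def reported_score(score: float, benchmark: str, model: str,
                   judge_flagged: bool, baselines: Baselines) -> float:
    """Report the stored baseline instead of the run's own score if the judge flags the run."""
    if judge_flagged:
        return baselines[benchmark][model]
    return score
```

Because the table is read from a local file, the Harbor task can emit the baseline without network access to wherever the values are currently fetched from.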
Merged main into the Harbor branch. @rank-and-file, maybe we should push the new judge to main soon so we can pull it in here. The new judge will require some major changes on the Harbor side.
Adds Harbor framework support to PostTrainBench, enabling anyone to run our benchmark on cloud GPUs (Modal, Daytona) without needing access to our internal HTCondor cluster.
At the moment:
Tested:
Usage
See `src/harbor_adapter/README.md` for detailed parity tracking. Key points:

- `result.json`
- `timer.sh`: minor difference (created at task generation vs. job start)

Note: For now I have skipped installing flash-attn in the container, because building it requires a CUDA runtime. On Modal, the GPU is attached to the sandbox only after the container is built, so the installation can't happen at build time.
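Following the Dockerfile comment about installing at runtime, a minimal sketch of a guarded install that runs inside the sandbox once the GPU is attached. Using `nvidia-smi` on `PATH` as the GPU signal is an assumed heuristic: it only shows a driver is present, and building flash-attn from source additionally needs the CUDA toolkit (`nvcc`) and an installed `torch`. The function names are hypothetical:

```python
import importlib.util
import shutil
import subprocess
from typing import Optional


def flash_attn_install_cmd() -> list[str]:
    # flash-attn's install instructions recommend --no-build-isolation so the
    # build can see the already-installed torch.
    return ["uv", "pip", "install", "--system", "--no-build-isolation", "flash-attn"]


def maybe_install_flash_attn(gpu_present: Optional[bool] = None, dry_run: bool = False) -> str:
    """Install flash-attn only if it is missing and a GPU appears to be attached."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "already-installed"
    if gpu_present is None:
        # Heuristic: nvidia-smi on PATH suggests a GPU driver is available.
        gpu_present = shutil.which("nvidia-smi") is not None
    if not gpu_present:
        return "skipped-no-gpu"
    if dry_run:
        return "would-install"
    subprocess.run(flash_attn_install_cmd(), check=True)
    return "installed"
```

Running this at job start rather than at image build time keeps the image buildable on CPU-only builders, at the cost of a potentially long compile on every fresh sandbox.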
Note: I have added a uv environment for PTB. It is used to run Modal and Harbor, and is useful for reproducibility in general.
Todos: