This project fine-tunes the GPT-2 (124M) language model with Proximal Policy Optimization (PPO), aiming to beat the official GPT-2 score reported in the HellaSwag paper (0.2955, completion style) on the HellaSwag commonsense reasoning benchmark. EleutherAI/gpt-neo-1.3B currently serves as the reward model.
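This README does not spell out how the scalar reward is computed, so below is a minimal, hedged sketch of one plausible setup: the reward is the reward model's length-normalized log-likelihood of the sampled completion given its context, mirroring the completion-style HellaSwag scoring in hellaswag.py. The function name `completion_reward` is illustrative, not taken from this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
rm_tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
rm = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B").to(device).eval()

@torch.no_grad()
def completion_reward(context: str, completion: str) -> float:
    """Mean log-likelihood of `completion` given `context` under the reward model."""
    # Score at the text level and re-tokenize with the reward model's own
    # tokenizer, so the policy and the reward model need not share a vocabulary.
    ctx = rm_tok(context, return_tensors="pt").input_ids.to(device)
    full = rm_tok(context + completion, return_tensors="pt").input_ids.to(device)
    logits = rm(full).logits                      # (1, T, vocab)
    logp = torch.log_softmax(logits[:, :-1], -1)  # position t predicts token t+1
    tok_logp = logp.gather(-1, full[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the completion tokens (BPE may merge a token at the context
    # boundary; acceptable for a sketch) and length-normalize the score.
    return tok_logp[:, ctx.size(1) - 1:].mean().item()
```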
- Fine-tune GPT-2 (124M) using PPO.
- Surpass the baseline HellaSwag score of 0.2955.
- Implement additional features from the PPO paper (a sketch of its core clipped objective appears after this list).
- Future goal: reach a HellaSwag score of around 33.7 (the GPT-3 Small result) that Andrej Karpathy cites in his "Let's reproduce GPT-2 (124M)" lecture; this is the ultimate target for this project.
- Integrating more recent and powerful models (e.g., Llama-3.2-3B) as the reward model has proven challenging because their tokenizers differ from GPT-2's; this is a potential area for future improvement.
- I was unable to fit a larger model on my single RTX 4060 or on any of the Kaggle GPUs, so I would need a cloud GPU, but I want to keep costs low.
- Pretraining on a refined dataset would likely yield better results than using a larger reward model for PPO training, since it would be more cost-effective at the same budget.
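For reference, the core objective from the PPO paper (Schulman et al., 2017) is the clipped surrogate loss. Here is a minimal PyTorch sketch; the tensor names (`logp`, `old_logp`, `advantages`) are illustrative and not taken from this repo's code:

```python
import torch

def ppo_clip_loss(logp: torch.Tensor,        # log pi_theta(a|s) under the current policy
                  old_logp: torch.Tensor,    # log pi_theta_old(a|s), frozen at rollout time
                  advantages: torch.Tensor,  # advantage estimates (e.g., from GAE)
                  clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old.
    ratio = torch.exp(logp - old_logp)
    # L^CLIP takes the pessimistic minimum of the unclipped and clipped terms,
    # removing the incentive to push the ratio outside [1 - eps, 1 + eps].
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated because we minimize
```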
This project currently adapts code from Andrej Karpathy's karpathy/build-nanogpt, specifically the train_gpt2.py and hellaswag.py scripts (I have modified parts of the code to fit this project). Thank you, Andrej Karpathy, for making your work publicly available.