This repository provides a complete pipeline for preprocessing the CodeQA dataset and fine-tuning code language models with either full fine-tuning or DoRA parameter-efficient fine-tuning (PEFT). The goal of the project is to evaluate how well PEFT works on CodeQA, a setting in which it has so far been underexplored.
- Efficient Fine-Tuning: DoRA (Weight-Decomposed Low-Rank Adaptation) decomposes each weight into a magnitude and a direction and trains only small low-rank update matrices plus the magnitude vectors, instead of updating every model weight (a configuration sketch follows this list).
- Effective Preprocessing: The pipeline parses the raw CodeQA dataset and grammar-corrects its natural-language questions and answers.
- Reproducibility: All scripts expose their arguments with sensible defaults so runs can be reproduced.
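
A minimal sketch of how DoRA can be enabled with the Hugging Face PEFT library. The checkpoint name, rank, scaling factor, and target modules below are illustrative assumptions, not the repository's actual settings:

```python
# Minimal DoRA sketch using Hugging Face PEFT (illustrative settings, not the
# repository's actual configuration).
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5p-220m")  # assumed checkpoint

dora_config = LoraConfig(
    r=16,                       # rank of the low-rank update matrices (assumed)
    lora_alpha=32,              # scaling factor (assumed)
    target_modules=["q", "v"],  # attention projections to adapt (assumed)
    use_dora=True,              # weight-decomposed LoRA (DoRA)
    task_type="SEQ_2_SEQ_LM",
)

model = get_peft_model(base_model, dora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```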
```
.
├── preprocessing/
│   ├── preprocess.py
│   ├── grammar_correction.py
│   └── data_formatting.py
├── scripts/
│   └── run_preprocessing.sh
├── finetuning/
│   ├── full_ft.py
│   └── lora_ft.py
├── evaluation/
│   └── eval.py
└── README.md
```
- `preprocessing/preprocess.py`: Parses and formats the raw CodeQA dataset.
- `preprocessing/grammar_correction.py`: Applies grammar correction to the natural-language questions and answers in the dataset.
- `preprocessing/data_formatting.py`: Formats the dataset into a structure suitable for training and evaluation (see the record sketch after this list).
- `scripts/run_preprocessing.sh`: Shell script that runs the preprocessing pipeline in the correct order with appropriate arguments.
- `finetuning/full_ft.py`: Script for full-model fine-tuning of CodeT5+ or CodeBERT on the preprocessed CodeQA dataset (see the training sketch after this list).
- `finetuning/lora_ft.py`: Script for DoRA parameter-efficient fine-tuning on the same dataset.
- `evaluation/eval.py`: Evaluates model checkpoints on the CodeQA dataset and reports the relevant success metrics (see the metrics sketch after this list).
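
The exact record layout produced by the preprocessing step is not fixed here; the snippet below is a hypothetical sketch of how `data_formatting.py` might turn a raw CodeQA entry into a prompt/target pair for seq2seq training (field names and prompt template are assumptions):

```python
# Hypothetical formatting sketch: field names and prompt template are assumptions,
# not the repository's actual schema.
import json

def format_example(raw: dict) -> dict:
    """Turn one raw CodeQA entry into a prompt/target pair for seq2seq training."""
    prompt = f"question: {raw['question']} code: {raw['code']}"
    return {"input": prompt, "target": raw["answer"]}

if __name__ == "__main__":
    raw = {
        "question": "What does this function return?",
        "code": "def add(a, b): return a + b",
        "answer": "The sum of a and b.",
    }
    print(json.dumps(format_example(raw), indent=2))
```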
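For orientation, here is a minimal full fine-tuning sketch using the Hugging Face `Seq2SeqTrainer`; the checkpoint name, data paths, field names, and hyperparameters are assumptions, and `full_ft.py` may use different settings:

```python
# Minimal full fine-tuning sketch (assumed checkpoint, paths, and hyperparameters;
# full_ft.py may differ).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

checkpoint = "Salesforce/codet5p-220m"  # assumed model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Assumed location and field names of the preprocessed data.
data = load_dataset("json", data_files={"train": "data/train.jsonl"})

def tokenize(batch):
    return tokenizer(batch["input"], text_target=batch["target"],
                     truncation=True, max_length=512)

tokenized = data.map(tokenize, batched=True,
                     remove_columns=data["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="checkpoints/full_ft",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3,
                                  learning_rate=5e-5),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```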
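Likewise, a hedged sketch of the kind of metric computation `eval.py` might perform, using the `evaluate` library; the metrics actually reported by the script may differ:

```python
# Hedged sketch of metric computation; the metrics eval.py actually reports may differ.
import evaluate

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

predictions = ["The sum of a and b."]               # model outputs (example values)
references = [["It returns the sum of a and b."]]  # gold answers (example values)

print(bleu.compute(predictions=predictions, references=references))
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))
```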