Taglish Microaggression Classifier

This repository contains a full pipeline for detecting and classifying Taglish (Tagalog-English) microaggressions. It includes data generation scripts, preprocessing tools, and a transformer-based classification model.

Getting Started

1. Clone the Repository

git clone https://github.com/kndlcero/nlp.git
cd nlp

2. Set Up Virtual Environment

Use a virtual environment to keep this project's Python dependencies separate from your system installation.

Create the environment:

python -m venv venv

Activate the environment:

Windows (PowerShell): .\venv\Scripts\Activate.ps1
Windows (CMD): .\venv\Scripts\activate
Mac/Linux: source venv/bin/activate
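
To confirm that activation worked, a small check like the one below can help. It is only a sketch and not part of the repository; the function name in_virtualenv is hypothetical. It relies on the standard-library fact that sys.prefix differs from sys.base_prefix inside an environment created with python -m venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line reports False, the interpreter on your PATH is not the one inside ./venv, and pip install would target the wrong installation.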

VS Code Setup:

Open the Command Palette (Ctrl+Shift+P)
Type "Python: Select Interpreter"
Choose the one pointing to your local ./venv

3. Install Requirements

pip install -r requirements.txt

Usage Guide

Option A: Running the Full Pipeline (Training)

Run the pipeline scripts in numeric order, from 1 through 6.

Local Data Preparation:

Run scripts 1 through 5 locally to generate and clean the dataset:

python pipeline/1_synthetic_generator.py
python pipeline/2_real_world_loader.py
python pipeline/3_manual_loader.py
python pipeline/4_concatenator.py
python pipeline/5_enhancement.py
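
The five commands above can also be chained from a single Python runner that stops at the first failing step. This is a minimal sketch, not a file shipped with the repository: the names STEPS, build_commands, and run_all are hypothetical, and it assumes you start it from the repository root so that the pipeline/ folder resolves.

```python
import subprocess
import sys
from pathlib import Path

# Script names come from the list above. PIPELINE_DIR is assumed to be
# relative to the repository root.
PIPELINE_DIR = Path("pipeline")
STEPS = [
    "1_synthetic_generator.py",
    "2_real_world_loader.py",
    "3_manual_loader.py",
    "4_concatenator.py",
    "5_enhancement.py",
]


def build_commands(python=sys.executable, pipeline_dir=PIPELINE_DIR):
    """Return one command (as an argument list) per step, in numeric order."""
    return [[python, str(Path(pipeline_dir) / name)] for name in STEPS]


def run_all():
    """Run every step in order; abort on the first non-zero exit code."""
    for cmd in build_commands():
        print("running:", " ".join(cmd))
        # check=True raises subprocess.CalledProcessError if the step fails,
        # so later steps never run on incomplete data.
        subprocess.run(cmd, check=True)


# Call run_all() from the repository root to execute steps 1 through 5.
```

Using sys.executable keeps every step on the interpreter of the active virtual environment rather than whichever python is first on PATH.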

Cloud Training (Recommended):

Because local machines often lack a suitable CUDA GPU, running 6_training_pipeline.py on Google Colab is advised.

Upload the generated taglish_microaggression_enhanced_v2.csv to Colab
Execute the training cell
Expected metric: F1 score between 60% and 70%
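
For readers who want to see how an F1 score is computed, here is a small pure-Python sketch. This README does not say which averaging the training script reports, so the macro average used here is an assumption, and the labels "a" and "b" are placeholders rather than the project's real classes.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Placeholder labels, not the project's label set.
    truth = ["a", "a", "b", "b"]
    preds = ["a", "b", "b", "b"]
    print(f1_per_class(truth, preds))
    print(round(macro_f1(truth, preds), 4))
```

In this toy example class "a" scores 2/3 and class "b" scores 0.8, so the macro average is about 0.73.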

Option B: Running the Pretrained Model (Inference)

To use the model without training it yourself:

Download Assets: Download the model files from this Google Drive Folder
Place Assets: Ensure your folder structure matches the diagram below
Run Inference:

python microaggression_classifier.py

Project Structure

For the classifier to run correctly, ensure your local directory is organized as follows:

Microaggression/
├── pipeline/
│   ├── 1_synthetic_generator.py
│   ├── 2_real_world_loader.py
│   ├── 3_manual_loader.py
│   ├── 4_concatenator.py
│   ├── 5_enhancement.py
│   ├── 6_training_pipeline.py
│   ├── all_samples.txt
│   ├── other .csv files
│   └── taglish_microaggression_enhanced_v2.csv
├── taglish_tokenizer/
│   ├── sentencepiece.bpe.model
│   ├── special_tokens_map.json
│   └── tokenizer_config.json
├── venv/
├── .gitignore
├── best_microaggression_model.pt
├── label_mappings.json
├── microaggression_classifier.py
├── model_config.json
└── requirements.txt
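
Before running inference, it can help to confirm that the downloaded assets landed in the right place. The script below is a sketch and not part of the repository: the names REQUIRED and missing_assets are hypothetical, the file list is copied from the tree above, and it assumes all of those files are needed by microaggression_classifier.py.

```python
from pathlib import Path

# Paths are relative to the project root and copied from the tree above.
REQUIRED = [
    "best_microaggression_model.pt",
    "label_mappings.json",
    "model_config.json",
    "microaggression_classifier.py",
    "taglish_tokenizer/sentencepiece.bpe.model",
    "taglish_tokenizer/special_tokens_map.json",
    "taglish_tokenizer/tokenizer_config.json",
]


def missing_assets(root="."):
    """Return the required paths that do not exist as files under root."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All required files are in place.")
```

Run it from the project root; an empty result means the layout matches the diagram.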
