GitHub - godwin2707/Resume-Classification-Project: The Resume Classification Project is a machine learning application designed to automatically classify resumes into specific job roles based on the content of the resume

The Resume Classification Project is a machine learning application designed to automatically classify resumes into specific job roles (e.g., Data Scientist, Web Developer, etc.) based on the content of the resume. Here's a detailed explanation of how the project works and what it includes:

🔍 Project Overview

The goal is to build a web-based app that accepts a resume (in .txt or .pdf format), processes its content, and predicts the most suitable job role using a trained machine learning model.

What problems does it solve? It reduces the time and manual effort involved in screening resumes, helping recruiters and hiring platforms identify the right candidate fit more efficiently.

🧠 Key Components

Frontend/UI: Built with Streamlit, a Python library for building web apps for ML and data science. Users upload a resume in .txt or .pdf format; the app shows the extracted resume text and displays the predicted job role.

Text Extraction:
- For .pdf files: PyPDF2 extracts text from standard PDFs, and pdf2image + pytesseract provide OCR-based extraction for graphically designed PDFs (containing images or fancy fonts).
- For .txt files: the text is read directly.
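The extraction step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `extract_resume_text` and `extract_pdf_text`, the `min_chars` threshold, and the rule "fall back to OCR when PyPDF2 returns very little text" are all assumptions made for the example. The OCR path also assumes the Poppler and Tesseract binaries are installed, since pdf2image and pytesseract depend on them.

```python
from pathlib import Path


def extract_pdf_text(path: str, min_chars: int = 50) -> str:
    """Extract text from a PDF, trying PyPDF2 first and OCR second.

    min_chars is a hypothetical threshold: if PyPDF2 yields fewer
    characters than this, the PDF is assumed to be image-based.
    """
    # Third-party imports stay local so the .txt path works without them.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty result for image-only pages.
    text = "\n".join((page.extract_text() or "") for page in reader.pages)
    if len(text.strip()) >= min_chars:
        return text

    # Assumed fallback: render each page to an image and run Tesseract OCR.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(img) for img in images)


def extract_resume_text(path: str) -> str:
    """Dispatch on file extension: .txt is read directly, .pdf is parsed."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    if suffix == ".pdf":
        return extract_pdf_text(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Trying the cheap text-layer extraction first and only then falling back to OCR keeps the common case fast, since rendering and recognizing page images is much slower than reading embedded text.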

Preprocessing & Vectorization: Text is converted to lowercase and then transformed with a previously fitted TF-IDF vectorizer (TfidfVectorizer.pkl), which converts the text into a numerical format suitable for ML models.
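As a rough illustration of this step, the sketch below fits scikit-learn's `TfidfVectorizer` on a tiny invented corpus. The two sample strings and the `preprocess` helper are assumptions for the example; the project itself loads an already fitted vectorizer from TfidfVectorizer.pkl rather than fitting one at prediction time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for resume text (invented for illustration).
resumes = [
    "Python, pandas and machine learning models",
    "HTML, CSS and JavaScript for responsive web pages",
]


def preprocess(text: str) -> str:
    """Lowercase the text, mirroring the preprocessing described above."""
    return text.lower()


vectorizer = TfidfVectorizer()
# fit_transform learns the vocabulary and IDF weights, then returns a
# sparse matrix with one row per document and one column per term.
features = vectorizer.fit_transform([preprocess(t) for t in resumes])

print(features.shape)
print(sorted(vectorizer.vocabulary_)[:5])
```

Note that `TfidfVectorizer` lowercases input by default, so the explicit `preprocess` call is redundant here; it is kept only to make the described step visible.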

Machine Learning Model: Uses a K-Nearest Neighbors (KNN) model (KNN.pkl) trained on labeled resume data.

The model predicts a numerical label, which is decoded to a job title using a LabelEncoder (label_encoder.pkl).
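The prediction path (vectorize, classify with KNN, decode the label) might look like the sketch below. The training texts, the role names attached to them, `n_neighbors=1`, and the helper names `save_artifacts`, `load_artifacts`, and `predict_role` are all invented for illustration. Only the three .pkl file names come from the project description, and the toy model is trained here solely so the example can run without the real files.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Invented toy data; the real project trains on its own labeled resumes.
texts = [
    "python pandas statistics machine learning",
    "deep learning python model evaluation",
    "html css javascript react frontend",
    "javascript node web api frontend",
]
roles = ["Data Scientist", "Data Scientist", "Web Developer", "Web Developer"]

ARTIFACT_NAMES = ("TfidfVectorizer.pkl", "KNN.pkl", "label_encoder.pkl")


def save_artifacts(folder: Path) -> None:
    """Fit a toy vectorizer, KNN model, and label encoder, then pickle them."""
    vectorizer = TfidfVectorizer()
    encoder = LabelEncoder()
    model = KNeighborsClassifier(n_neighbors=1)  # assumed value of k
    model.fit(vectorizer.fit_transform(texts), encoder.fit_transform(roles))
    for name, obj in zip(ARTIFACT_NAMES, (vectorizer, model, encoder)):
        with open(folder / name, "wb") as f:
            pickle.dump(obj, f)


def load_artifacts(folder: Path):
    """Load the vectorizer, model, and encoder, in that order."""
    loaded = []
    for name in ARTIFACT_NAMES:
        with open(folder / name, "rb") as f:
            loaded.append(pickle.load(f))
    return tuple(loaded)


def predict_role(text: str, vectorizer, model, encoder) -> str:
    """Lowercase and vectorize the text, predict a label, decode it."""
    features = vectorizer.transform([text.lower()])
    label = model.predict(features)  # numeric class label
    return str(encoder.inverse_transform(label)[0])


with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(Path(tmp))
    vectorizer, model, encoder = load_artifacts(Path(tmp))

print(predict_role("Python and machine learning", vectorizer, model, encoder))
```

Pickle files should only be loaded from a trusted source, because unpickling can execute arbitrary code. They are also best loaded with the same scikit-learn version that produced them.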

🛠️ Tech Stack

- Python
- Streamlit – Web interface
- scikit-learn – Model training and prediction
- PyPDF2, pdf2image, pytesseract – PDF text extraction
- pickle – Loading trained models and vectorizers

Future Work: I plan to enhance the Resume Classification project by incorporating more diverse datasets and improving model accuracy with advanced NLP techniques.

