🗂️ FileSense - File Sorter

🔍 Overview

FileSense is an intelligent, local file organizer that sorts documents by meaning, not just by name or extension.

Unlike standard organizers that rely on hardcoded rules, FileSense uses SentenceTransformers and FAISS to understand the semantic context of your files.

✨ New: FileSense is now self-organizing. If it encounters a document that doesn't fit any existing folder, it uses Google Gemini (GenAI) to analyze the content, generate a new, specific category, create the folder, and update its own sorting logic automatically.

📺 Overview Video: FileSense Demo

🎥 Webpage: ahhyoushh.github.io/FileSense

⚙️ Core Features

Feature	Description
🧠 Semantic Sorting	Sorts by meaning (e.g., "Newton's Laws" → "Physics"), not just keywords.
🤖 Generative Labeling	(New) Uses Google Gemini to auto-generate new categories/folders for documents that don't match any existing folder.
⚡ FAISS Indexing	Uses a FAISS vector index for fast similarity search.
🔄 Self-Updating	When a new label is generated, FileSense creates the folder and rebuilds the index automatically.
👀 OCR Support	Extracts text from PDFs with `pdfplumber`, and from scanned PDFs and images via OCR with `pytesseract`.
🧩 Keyword Boosting	Hybrid search: vector similarity plus keyword weighting for higher accuracy.
🖥️ GUI Launcher	Desktop interface with real-time logs, system tray support, and process management.
🧵 Multithreading	Sorts large directories in parallel using multiple worker threads.
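To make the Keyword Boosting row concrete, here is a minimal sketch of one way a hybrid score could work: cosine similarity plus a small bonus proportional to how many of a folder's keywords appear in the text. The names `keyword_boost`, `hybrid_score`, and the weight `KEYWORD_WEIGHT` are hypothetical; the real formula in classify_process_file.py may differ.

```python
import re
from typing import List

# Hypothetical weight for the keyword bonus; FileSense's actual weighting may differ.
KEYWORD_WEIGHT = 0.1


def keyword_boost(text: str, keywords: List[str]) -> float:
    """Fraction of a folder's keywords found as whole words/phrases in the text."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
    )
    return hits / len(keywords)


def hybrid_score(cosine_sim: float, text: str, keywords: List[str]) -> float:
    """Vector similarity plus a weighted keyword bonus."""
    return cosine_sim + KEYWORD_WEIGHT * keyword_boost(text, keywords)
```

With this weight, a file scoring 0.45 on similarity that mentions half of a folder's keywords is nudged to about 0.50, just enough to clear the 0.5 confidence cutoff described under Semantic Search. Keeping the bonus small lets keywords break near-ties without overriding a clearly better semantic match.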

📁 Folder Structure

FileSense/
│
├── scripts/
│   ├── RL/                       # Reinforcement Learning & SFT
│   │   ├── rl_policy.py          # Epsilon-Greedy Agent
│   │   ├── rl_feedback.py        # Feedback & Rewards
│   │   ├── rl_config.py          # Hyperparameters
│   │   ├── rl_supabase.py        # Cloud Logging
│   │   └── rl_audit_safe.py      # Safety Audits
│   ├── logger/                   # Logging System
│   │   ├── logger.py             # Main Logger
│   │   └── rl_logger.py          # RL-Specific Logger
│   ├── classify_process_file.py  # Core Logic: Embedding & Classification
│   ├── generate_label.py         # GenAI Interface (Gemini)
│   ├── create_index.py           # FAISS Index Manager
│   ├── extract_text.py           # OCR & Text Extraction
│   ├── multhread.py              # Multithreading Manager
│   ├── launcher.py               # System Tray GUI
│   ├── script.py                 # CLI Entry Point
│   └── watcher_script.py         # Real-time Monitor
│
├── folder_labels.json            # Semantic Knowledge Base
├── folder_embeddings.faiss       # Vector Index
├── evaluation/                   # Metrics & Logs
└── files/                        # Default Input Directory

🔬 How It Works

1️⃣ Text Extraction

FileSense reads the file. If it's a text-based PDF or DOCX, it extracts the raw text directly; if it's a scanned document or an image, it runs OCR to read the content.
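A sketch of this extraction step with the libraries from the dependency list (pdfplumber, python-docx, pytesseract, Pillow). The extension table and the `MIN_TEXT_CHARS` heuristic for spotting scanned PDFs are assumptions for illustration, not necessarily what extract_text.py does.

```python
from pathlib import Path

# Hypothetical cutoff: a PDF yielding less text than this is treated as scanned.
MIN_TEXT_CHARS = 50


def needs_ocr(text: str) -> bool:
    """Heuristic: almost no extractable text usually means an image-only PDF."""
    return len(text.strip()) < MIN_TEXT_CHARS


def _pdf_text(path: Path) -> str:
    import pdfplumber  # imported lazily so the module loads without optional deps

    with pdfplumber.open(str(path)) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
        if not needs_ocr(text):
            return text
        import pytesseract

        # Scanned PDF: render each page to a PIL image and run Tesseract on it.
        return "\n".join(
            pytesseract.image_to_string(page.to_image(resolution=300).original)
            for page in pdf.pages
        )


def _docx_text(path: Path) -> str:
    import docx  # provided by the python-docx package

    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)


def _image_text(path: Path) -> str:
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


EXTRACTORS = {
    ".pdf": _pdf_text,
    ".docx": _docx_text,
    ".png": _image_text,
    ".jpg": _image_text,
    ".jpeg": _image_text,
    ".txt": lambda p: p.read_text(encoding="utf-8", errors="ignore"),
}


def extract_text(path) -> str:
    """Dispatch on file extension; unsupported types return an empty string."""
    path = Path(path)
    extractor = EXTRACTORS.get(path.suffix.lower())
    return extractor(path) if extractor else ""
```

Imports are deferred into each helper so the module still loads when an optional OCR dependency is missing.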

2️⃣ Semantic Search

It converts the document text into a vector embedding and searches the local folder_embeddings.faiss index.

High Confidence (≥ 0.5): The file is moved to the matching folder.
Low Confidence (< 0.5): The system assumes no suitable folder exists.
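Assuming the score is cosine similarity (which is what a FAISS `IndexFlatIP` returns when the embeddings are L2-normalized), the decision can be sketched without any dependencies. `best_folder` and `normalize` are hypothetical names; in FileSense the vectors come from SentenceTransformers and the search runs in FAISS rather than in this brute-force loop.

```python
import math
from typing import Dict, List, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.5  # cutoff from the step above


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def best_folder(
    doc_vec: List[float], folder_vecs: Dict[str, List[float]]
) -> Tuple[Optional[str], float]:
    """Return (folder, score) above the threshold, or (None, best_score) below it."""
    query = normalize(doc_vec)
    best_name, best_score = None, -1.0
    for name, vec in folder_vecs.items():
        score = sum(a * b for a, b in zip(query, normalize(vec)))  # cosine similarity
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= CONFIDENCE_THRESHOLD:
        return best_name, best_score  # high confidence: move the file here
    return None, best_score  # low confidence: no suitable folder
```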

3️⃣ Generative Classification (The "AI" Step)

If confidence is low:

(Optional) The text is sent to Google Gemini.
Gemini analyzes the content and determines a broad category (e.g., "Quantum Mechanics") and specific keywords.
It updates folder_labels.json (merging with existing data if needed).
FileSense rebuilds the FAISS index on the fly and classifies the file again with the new knowledge.
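The fallback loop can be sketched with the search, the Gemini call, and the index rebuild passed in as functions, so the control flow is visible on its own. The `{folder: [keywords]}` shape of folder_labels.json, the JSON reply format, and every function name here are assumptions; the commented SDK call uses the google-genai package from the install list, with a placeholder model name.

```python
import json
import re
from typing import Callable, Dict, List, Optional, Tuple

Labels = Dict[str, List[str]]  # hypothetical schema: folder name -> keywords


def merge_label(labels: Labels, folder: str, keywords: List[str]) -> None:
    """Add a new folder, or merge keywords into an existing one without duplicates."""
    existing = labels.setdefault(folder, [])
    for kw in keywords:
        if kw not in existing:
            existing.append(kw)


def parse_label_reply(reply: str) -> Tuple[str, List[str]]:
    """Extract {"folder": ..., "keywords": [...]} even if the model adds extra text."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model reply")
    data = json.loads(match.group(0))
    return data["folder"], list(data.get("keywords", []))


# A real ask_llm might wrap the google-genai SDK (model name is an assumption):
#   from google import genai
#   client = genai.Client(api_key=os.environ["API_KEY"])
#   reply = client.models.generate_content(model="gemini-2.0-flash", contents=prompt).text
def sort_text(
    text: str,
    labels: Labels,
    search: Callable[[str], Tuple[Optional[str], float]],
    ask_llm: Optional[Callable[[str], str]],
    rebuild_index: Callable[[Labels], None],
) -> Optional[str]:
    folder, _ = search(text)
    if folder is not None or ask_llm is None:
        return folder  # confident match, or the optional Gemini step is disabled
    name, keywords = parse_label_reply(ask_llm(text))
    merge_label(labels, name, keywords)  # would be persisted to folder_labels.json
    rebuild_index(labels)  # re-embed so the new folder is searchable
    folder, _ = search(text)  # second pass with the updated index
    return folder
```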

🛠️ Installation & Setup

1. Prerequisites

Python 3.8+
A Google Gemini API key (only needed for the optional generative labeling step)

2. Install Dependencies

pip install sentence-transformers faiss-cpu numpy pdfplumber pytesseract pillow python-docx watchdog pystray google-genai python-dotenv

Linux Users

Install the Tesseract OCR engine, which pytesseract requires (Debian/Ubuntu):

sudo apt install tesseract-ocr

3. Environment Setup

Create a .env file in the root directory and add your Google API key:

API_KEY=your_google_gemini_api_key_here
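A script can then read the key with python-dotenv, which is already in the dependency list. The helper name `get_api_key` is hypothetical; only the variable name `API_KEY` comes from the line above.

```python
import os

try:
    from dotenv import load_dotenv  # from the python-dotenv package
except ImportError:  # fall back to plain environment variables
    load_dotenv = None


def get_api_key(var: str = "API_KEY") -> str:
    """Read the Gemini key from .env (if python-dotenv is installed) or the environment."""
    if load_dotenv is not None:
        load_dotenv()  # does not override variables already set in the environment
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set; add it to .env or export it")
    return key
```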

4. Initialization

Create the initial index (even if empty):

python scripts/create_index.py
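Roughly, building the index means embedding one text per folder label and storing the vectors in a FAISS index. This sketch assumes a `{folder: [keywords]}` schema for folder_labels.json and the `all-MiniLM-L6-v2` model; both are illustrative choices, not confirmed details of create_index.py.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def label_texts(labels: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Turn {folder: keywords} into parallel lists of folder names and texts to embed."""
    names = sorted(labels)
    texts = [f"{name}: {', '.join(labels[name])}" for name in names]
    return names, texts


def build_index(
    labels_path: str = "folder_labels.json",
    index_path: str = "folder_embeddings.faiss",
    model_name: str = "all-MiniLM-L6-v2",  # assumed model
) -> List[str]:
    import faiss
    from sentence_transformers import SentenceTransformer

    path = Path(labels_path)
    labels = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    names, texts = label_texts(labels)
    model = SentenceTransformer(model_name)
    # Inner product equals cosine similarity on L2-normalized vectors.
    index = faiss.IndexFlatIP(model.get_sentence_embedding_dimension())
    if texts:
        vecs = model.encode(texts, normalize_embeddings=True, convert_to_numpy=True)
        index.add(vecs.astype("float32"))
    faiss.write_index(index, index_path)
    return names  # row i of the index corresponds to names[i]
```

An empty label file still produces a valid, empty index, consistent with the "even if empty" note above.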

🚀 Usage

Option A: GUI Launcher (Recommended)

Run the desktop app to manage everything visually.

python scripts/launcher.py

Option B: Real-Time Watcher

Keep it running in the background to sort files as you download them.

python scripts/watcher_script.py --dir ./Downloads
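A watcher like this is typically built on watchdog's `Observer` and `FileSystemEventHandler`. The sketch below is an assumption about how watcher_script.py might work; `handle_file` stands in for FileSense's classify-and-move step, and the partial-download suffixes are illustrative.

```python
import time
from pathlib import Path

# Hypothetical: suffixes browsers use for downloads that are still in progress.
PARTIAL_SUFFIXES = {".crdownload", ".part", ".tmp"}


def should_process(path: str) -> bool:
    """Skip in-progress downloads and hidden files."""
    p = Path(path)
    return p.suffix.lower() not in PARTIAL_SUFFIXES and not p.name.startswith(".")


def watch(directory: str, handle_file) -> None:
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class NewFileHandler(FileSystemEventHandler):
        def on_created(self, event):
            if not event.is_directory and should_process(event.src_path):
                handle_file(event.src_path)

        def on_moved(self, event):
            # e.g. "report.pdf.crdownload" renamed to "report.pdf" when complete
            if not event.is_directory and should_process(event.dest_path):
                handle_file(event.dest_path)

    observer = Observer()
    observer.schedule(NewFileHandler(), directory, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Handling `on_moved` matters for the Downloads use case: browsers usually write to a temporary name and rename the file when the download finishes.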

Option C: Bulk Sort

Sort an existing mess of files once.

python scripts/script.py --dir ./Downloads --threads 8
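The `--threads` option suggests a worker pool; a minimal sketch with the standard library's `ThreadPoolExecutor` follows. Mapping `--threads` to `max_workers` and the per-file error handling are assumptions about multhread.py, not its confirmed behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def _classify_safely(classify: Callable[[str], Optional[str]], path: str) -> Optional[str]:
    """Run one classification; a failing file is logged and skipped, not fatal."""
    try:
        return classify(path)
    except Exception as exc:
        print(f"skipping {path}: {exc}")
        return None


def sort_all(
    paths: Iterable[str],
    classify: Callable[[str], Optional[str]],
    threads: int = 8,
) -> Dict[str, Optional[str]]:
    """Classify files concurrently; returns {path: folder or None}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda p: _classify_safely(classify, p), paths)
        return dict(zip(paths, results))  # map() preserves input order
```

Threads are a reasonable fit here because OCR runs in a separate Tesseract process and much of the remaining time is spent on file I/O.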

🧾 License

IDEAS TO IMPLEMENT

Use the dataset with category labels to write a script that generates folder labels until the similarity crosses a certain threshold for all files in the training dataset. This way the descriptions and folder_labels.json would be as optimized as possible.
After the last Gemini update, make the model return a revised prompt and use that revised prompt next time, so the prompt optimizes itself.
Set up RL: let users upload logs that include the text from the file and the folder label it was given.
Explain why I used SentenceTransformers rather than just a text classifier.

