A machine-learning-powered web application that predicts whether a news article is Real or Fake, with prediction analytics, LIME-based explainability, batch processing, and export tools.
- AI Explainability: See which words influenced the prediction with LIME integration
- Batch Processing: Analyze hundreds of articles at once
- Interactive Dashboards: History tracking, statistics, and comparison tools
- Professional UI: Dark theme with mobile-responsive design and granular loading states
- Export Options: PDF reports and CSV downloads
- Machine Learning: Passive Aggressive Classifier with TF-IDF Vectorization
- Confidence Score: Probability percentage with interactive tooltips
- Multiple Input Methods:
  - Full article text
  - Headlines only
  - URL (with robust validation and automatic text extraction)
- AI Explainability (LIME): Visual word-level influence analysis
- Source Credibility: Domain reputation scoring for URLs with security checks
- Prediction History: Searchable SQLite database of all past predictions
- Batch Analysis: CSV upload for bulk article processing with progress tracking
- Comparison Mode: Side-by-side analysis of multiple articles
- Interactive Charts: Plotly visualizations for insights
- Export Tools: PDF reports and CSV downloads
- User Feedback: Rating system with analytics
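
The searchable prediction history listed above could be stored roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the table name, column names, and helper functions are assumptions made for illustration and may not match what `database.py` actually does.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) a history database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP,
               text TEXT NOT NULL,
               label TEXT NOT NULL,
               confidence REAL NOT NULL
           )"""
    )
    return conn


def save_prediction(conn, text, label, confidence):
    """Insert one prediction; the context manager commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO predictions (text, label, confidence) VALUES (?, ?, ?)",
            (text, label, confidence),
        )


def search_history(conn, keyword):
    """Return predictions whose text contains the keyword, newest first."""
    cur = conn.execute(
        "SELECT text, label, confidence FROM predictions "
        "WHERE text LIKE ? ORDER BY id DESC",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


conn = open_history()  # in-memory database, for demonstration only
save_prediction(conn, "Senate passes budget bill", "Real", 0.91)
save_prediction(conn, "Shocking miracle cure", "Fake", 0.88)
matches = search_history(conn, "budget")
```

Parameterized queries (the `?` placeholders) keep user-supplied search terms from being interpreted as SQL.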
FakeNewsDetection/
│
├── 🎯 Core Application
│ ├── app.py # Main Streamlit application
│ ├── train_model.py # Model training script
│ └── utils.py # Text processing & URL validation utilities
│
├── 🔧 Feature Modules
│ ├── database.py # SQLite database operations
│ ├── explainer.py # LIME AI explainability
│ ├── credibility.py # Source credibility checker
│ └── export_utils.py # PDF/CSV export tools
│
├── 📊 Data & Models
│ ├── dataset/
│ │ ├── news.csv # Training dataset
│ │ └── sample_data.csv # Demo dataset
│ ├── model/
│ │ ├── fake_news_model.pkl # Trained model
│ │ └── tfidf_vectorizer.pkl # TF-IDF vectorizer
│ └── data/
│   └── predictions.db # SQLite database (auto-created)
│
├── 📖 Documentation
│ ├── README.md # Main documentation
│ ├── QUICK_START.md # Quick start guide
│ ├── NEW_FEATURES.md # Detailed feature documentation
│ ├── UI_ENHANCEMENTS.md # UI design documentation
│ └── IMPLEMENTATION_SUMMARY.md # Development summary
│
└── ⚙️ Configuration
  ├── requirements.txt # Python dependencies
  ├── .gitignore # Git ignore rules
  └── download_data.py # Dataset downloader script

    git clone https://github.com/YOUR_USERNAME/FakeNewsDetection.git
    cd FakeNewsDetection
    pip install -r requirements.txt

The project includes sample_data.csv for testing. To train on a full dataset:
- Run python download_data.py to fetch Politifact data.
- OR place your own news.csv in the dataset/ folder.

Train the model; this creates the model and vectorizer in the model/ directory:

    python train_model.py

Start the app, then open your browser to http://localhost:8501:

    streamlit run app.py

- 🔍 Single Prediction
  - Analyze individual articles with granular loading states
  - AI explanations with LIME highlighting
  - URL validation and content-type checking
  - Export to PDF/CSV
- 📦 Batch Processing
  - Upload CSV with multiple articles
  - Bulk analysis with progress tracking
  - Summary statistics and charts
- ⚖️ Comparison Mode
  - Compare multiple articles side-by-side
  - Visual probability distribution charts
- 📜 History
  - View all past predictions with search and filter
  - Export history as CSV
- 📊 Statistics
  - Total predictions and class distribution
  - User feedback analytics
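
The batch workflow described above (CSV in, per-article predictions and summary counts out) can be sketched as below. The `text` column name, the `analyze_batch` helper, and the `toy_predict` stand-in are all hypothetical; in the real app the trained model would take the place of `toy_predict`, and a Streamlit progress bar such as `st.progress` could be updated inside the loop.

```python
import csv
import io
from collections import Counter


def analyze_batch(csv_text, predict, text_column="text"):
    """Run `predict` on every non-empty row and summarize the labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = []
    for row in reader:
        article = (row.get(text_column) or "").strip()
        if not article:
            continue  # skip rows without article text
        label, confidence = predict(article)
        results.append({"text": article, "label": label, "confidence": confidence})
    summary = Counter(r["label"] for r in results)
    return results, dict(summary)


def toy_predict(text):
    """Hypothetical stand-in for the trained model."""
    label = "Fake" if "miracle" in text.lower() else "Real"
    return label, 0.9


sample_csv = "text\nShocking miracle cure\nSenate passes budget bill\n"
results, summary = analyze_batch(sample_csv, toy_predict)
```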
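The system uses a Passive Aggressive Classifier, an online linear model that updates its weights only when an example is misclassified or falls inside the margin, which makes it well suited to large-scale text classification. Because the classifier does not produce probabilities on its own, its outputs are calibrated to yield the confidence score.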
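
To make the "passive" and "aggressive" behavior concrete, here is a minimal pure-Python sketch of the binary PA-I update rule. It is not the project's training code (which presumably relies on scikit-learn); the function names, the toy sentences, and the simple bag-of-words features are assumptions for illustration, and the calibration step is omitted.

```python
from collections import defaultdict


def bag(text):
    """Hypothetical featurizer: raw token counts."""
    counts = defaultdict(float)
    for tok in text.lower().split():
        counts[tok] += 1.0
    return dict(counts)


def pa_fit(samples, C=1.0, epochs=5):
    """Train a binary PA-I classifier; labels are +1 (real) or -1 (fake)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y in samples:
            score = sum(w[t] * v for t, v in x.items())
            loss = max(0.0, 1.0 - y * score)  # hinge loss
            norm_sq = sum(v * v for v in x.values())
            if loss == 0.0 or norm_sq == 0.0:
                continue  # passive: margin already satisfied
            tau = min(C, loss / norm_sq)  # aggressive step, capped by C
            for t, v in x.items():
                w[t] += tau * y * v
    return w


def decision_score(w, x):
    """Signed distance-like score: positive leans real, negative leans fake."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


train = [
    (bag("officials confirm budget report"), +1),
    (bag("shocking miracle cure doctors hate"), -1),
    (bag("senate passes budget bill"), +1),
    (bag("aliens endorse miracle diet"), -1),
]
weights = pa_fit(train)
real_score = decision_score(weights, bag("budget report"))
fake_score = decision_score(weights, bag("miracle cure"))
```

The raw score is unbounded, which is why a calibration step (for example scikit-learn's `CalibratedClassifierCV`) is needed before reporting a probability.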
Text is converted into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency) over unigrams and bigrams (n-gram range (1, 2)), capturing both individual words and common two-word phrases.
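
The following sketch shows what TF-IDF over unigrams and bigrams computes, written in plain Python so each step is visible. It uses the smoothed IDF and L2 normalization that scikit-learn's `TfidfVectorizer` applies by default; whitespace tokenization and the function names are simplifying assumptions, and the project itself most likely calls something like `TfidfVectorizer(ngram_range=(1, 2))` instead.

```python
import math
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Return all n-grams of length n_min..n_max from whitespace tokens."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Compute L2-normalized TF-IDF vectors (one dict per document)."""
    doc_grams = [Counter(ngrams(d)) for d in docs]
    df = Counter()
    for counts in doc_grams:
        df.update(counts.keys())  # document frequency of each n-gram
    n_docs = len(docs)
    vectors = []
    for counts in doc_grams:
        vec = {
            g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)  # smoothed IDF
            for g, tf in counts.items()
        }
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({g: v / norm for g, v in vec.items()})
    return vectors


vectors = tfidf(["breaking news today", "breaking story"])
```

In this toy corpus "breaking" appears in both documents, so it receives a lower weight than "news", which appears in only one.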
Using LIME, the system perturbs the input text to see which words most influence the model's decision, highlighting them in the UI (Red for Fake indicators, Green for Real).
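
The perturbation idea can be illustrated with a much simpler technique than LIME itself: leave-one-word-out occlusion. LIME samples many random perturbations and fits a local linear surrogate model, whereas the sketch below only removes one word at a time and records how the predicted probability of "Real" changes. The toy weights and the `toy_predict_real_proba` function are hypothetical stand-ins for the trained model.

```python
import math

# Hypothetical word weights standing in for a trained model.
TOY_WEIGHTS = {"miracle": -1.5, "shocking": -1.0, "officials": 1.2, "report": 0.8}


def toy_predict_real_proba(text):
    """Return a made-up probability that the text is real."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def word_influence(text, predict_proba):
    """Score each word by how much removing it changes the prediction.

    Positive values push toward Real (green), negative toward Fake (red).
    """
    tokens = text.split()
    base = predict_proba(text)
    influence = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        influence.append((tok, base - predict_proba(reduced)))
    return sorted(influence, key=lambda pair: abs(pair[1]), reverse=True)


ranked = word_influence("Shocking miracle cure officials report", toy_predict_real_proba)
```

The sign of each score maps directly onto the red and green highlighting described above.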
Before fetching a URL, the system sends a HEAD request to check the Content-Type, so that non-HTML resources are not downloaded, and scores the domain's reputation against a curated list of credible and unreliable sources.
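
A rough sketch of these two checks, using only the standard library, might look like this. The domain lists contain placeholder names, and the function names and the five-second timeout are assumptions; the actual logic in `utils.py` and `credibility.py` may differ.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Placeholder lists; the project's curated lists are not reproduced here.
CREDIBLE = {"trusted-news.example"}
UNRELIABLE = {"hoax-site.example"}


def domain_reputation(url):
    """Classify a URL's domain as credible, unreliable, or unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in CREDIBLE:
        return "credible"
    if host in UNRELIABLE:
        return "unreliable"
    return "unknown"


def is_html_content_type(header_value):
    """Check a Content-Type header value, ignoring parameters like charset."""
    media_type = (header_value or "").split(";")[0].strip().lower()
    return media_type in {"text/html", "application/xhtml+xml"}


def head_is_html(url, timeout=5):
    """Send a HEAD request and report whether the resource is HTML."""
    if urlparse(url).scheme not in {"http", "https"}:
        raise ValueError("only http and https URLs are allowed")
    request = Request(url, method="HEAD")  # headers only, no body download
    with urlopen(request, timeout=timeout) as response:
        return is_html_content_type(response.headers.get("Content-Type"))
```

Rejecting non-HTTP schemes before calling `urlopen` avoids accidentally opening local `file://` paths.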
- Python 3.x
- Streamlit (Web UI)
- Scikit-learn (Machine Learning)
- LIME (AI Explainability)
- Plotly (Visualizations)
- SQLite (Data Persistence)
- FPDF2 (PDF Generation)
- BeautifulSoup4 (Web Scraping)
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a Pull Request
⭐ If you find this project useful, please give it a star!
