
Self-Diagnosing Neural Models: Uncertainty Quantification & Unsupervised Confidence Estimation

License: MIT

Author: Sourav Roy
Email: royxlead@proton.me
Date: October 2025


Project Overview

This repository contains a complete implementation and evaluation suite for "Self-Diagnosing Neural Models": uncertainty quantification (UQ) methods and a novel unsupervised confidence metric that estimates model confidence without using labels. The project includes:

  • Implementations of multiple UQ methods: Baseline (MSP), Monte Carlo Dropout (MC Dropout), Evidential Deep Learning (EDL), and Deep Ensembles.
  • A novel unsupervised confidence metric combining prediction consistency across augmentations, entropy, feature-space dispersion, and softmax temperature analysis.
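To make the idea concrete, here is a minimal sketch of how signals like entropy and augmentation consistency can be combined into a single label-free score. This is an illustration only, not the notebook's `UnsupervisedConfidenceMetric` implementation: the function name, the two-term combination, and the weights are all hypothetical, and the real metric also uses feature-space dispersion and temperature analysis.

```python
import numpy as np

def unsupervised_confidence(aug_probs, w_entropy=0.5, w_consistency=0.5):
    """Toy label-free confidence score from K augmented softmax outputs.

    aug_probs: array of shape (K, C), one softmax row per augmentation.
    Returns a scalar in [0, 1]; higher means more confident.
    """
    mean_p = aug_probs.mean(axis=0)                      # average prediction
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum()   # predictive entropy
    entropy_conf = 1.0 - entropy / np.log(aug_probs.shape[1])  # 1 = peaked, 0 = uniform
    # consistency: fraction of augmentations agreeing with the majority class
    preds = aug_probs.argmax(axis=1)
    consistency = (preds == np.bincount(preds).argmax()).mean()
    return w_entropy * entropy_conf + w_consistency * consistency

# A peaked, augmentation-stable prediction scores higher than a uniform one
peaked = np.tile([0.97, 0.01, 0.01, 0.01], (4, 1))
print(unsupervised_confidence(peaked))
```

A prediction that is both low-entropy and stable under augmentation scores near 1; a diffuse or unstable one scores lower.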

Quick highlights (numbers taken from final_report.txt)

  • Dataset: CIFAR-10 (ID) vs CIFAR-100 (OOD)
  • Training epochs reported: 100
  • Best single-model accuracy: Evidential (≈91.7%)
  • Best calibration (lowest ECE): MC Dropout (≈0.0097)
  • Best OOD detection AUROC (≈0.855): Baseline / Ensemble
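For readers unfamiliar with the ECE figure quoted above, here is a short self-contained sketch of the standard equal-width-bin Expected Calibration Error; the evaluator in the notebook may differ in binning details (number of bins, bin edges), so treat this as the textbook definition rather than the exact code used for the reported numbers.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: weighted average |accuracy - confidence| over equal-width bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight = fraction of samples in bin
    return ece
```

A well-calibrated model (confidence ≈ accuracy in every bin) has ECE near 0; the ≈0.0097 reported for MC Dropout means its stated confidences track its empirical accuracy closely.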

Installation (minimal)

Create and activate a virtual environment and install the core packages. Adjust the torch wheel for your CUDA version.

# optional: create a venv and activate it
python -m venv .venv
.\.venv\Scripts\Activate.ps1    # Windows PowerShell
# source .venv/bin/activate     # macOS / Linux

# common packages - pin or change versions if you need exact reproduction
pip install numpy scipy scikit-learn matplotlib seaborn tqdm tensorboard
# Install torch + torchvision following instructions at https://pytorch.org (pick the right CUDA)
pip install torch torchvision

Quick usage

  • Open self_diagnosing_neural_models_python.ipynb in Jupyter or VS Code and run cells from the top.
  • The notebook exposes a main_pipeline(...) orchestration function to train, evaluate, and export results. When running experiments you can set train_models=False to load existing checkpoints instead of retraining.

Example (run inside the notebook kernel, or from a script after converting the notebook to .py):

models, results, unsupervised_results, evaluator = main_pipeline(
    train_models=False,    # load checkpoints instead of training from scratch
    num_epochs=100,
    run_ablations=True,
    id_dataset='cifar10',
    ood_dataset='cifar100',
    batch_size=128
)

Checkpoints

  • Checkpoints for the Baseline, MC Dropout, and Evidential models are available in checkpoints/.
  • Ensemble member weights (if present) live in ensemble_model/ensemble_model_*.pth.
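A small sketch of how ensemble member checkpoints can be discovered for loading; the helper name is hypothetical (it is not part of the notebook's API), and only the `ensemble_model/ensemble_model_*.pth` naming comes from the repository.

```python
import glob
import os

def find_ensemble_checkpoints(root="ensemble_model"):
    """Return ensemble member weight files in a stable, sorted order."""
    paths = sorted(glob.glob(os.path.join(root, "ensemble_model_*.pth")))
    # Each file can then be loaded in the usual PyTorch way, e.g.:
    #   state = torch.load(path, map_location="cpu")
    #   model.load_state_dict(state)
    return paths
```

Sorting the paths keeps member ordering reproducible across runs and filesystems.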

Reproducing results and smoke tests

  1. Install dependencies and set the appropriate PyTorch wheel for your GPU/CPU.
  2. Open and run the notebook cells in order — the notebook sets seeds for deterministic behavior where possible.
  3. For quick smoke tests, use the notebook's CLI flags (--smoke-test or --fast-debug) or set the FAST_DEBUG_SUBSET env var.
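Step 2's seeding typically looks like the sketch below. This is a generic pattern, not a copy of the notebook's cell: the function name is hypothetical, and the torch-specific lines are shown as comments since they only apply inside the PyTorch environment.

```python
import os
import random

import numpy as np

def set_seed(seed=42):
    """Seed Python and NumPy RNGs (and, in the notebook, PyTorch) for repeatability."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # In the notebook, additionally:
    #   torch.manual_seed(seed)
    #   torch.cuda.manual_seed_all(seed)
    #   torch.backends.cudnn.deterministic = True
```

Note that even with all seeds set, some CUDA kernels remain nondeterministic, which is why the notebook promises determinism only "where possible".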

Visualizations

The images/ folder contains the main plotted outputs. A few representative figures are embedded below — click the images to open the full-size PNGs in the repository.

Comparison across methods (accuracy / ECE / AUROC):

Comparison metrics

Figure: Side-by-side comparison of key metrics across Baseline, MC Dropout, Evidential, and Ensemble models.

Confidence distributions and unsupervised metric behavior:

Confidence distributions

Figure: Predicted confidence histograms and the proposed unsupervised confidence score behavior across datasets.

OOD detection ROC curves (CIFAR-10 ID vs CIFAR-100 OOD):

OOD ROC curves

Figure: ROC curves for OOD detection using CIFAR-10 as ID and CIFAR-100 as OOD; higher AUROC indicates better separability.
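The AUROC values behind these curves can be read as "the probability that a random ID sample receives a higher confidence score than a random OOD sample". A minimal rank-based sketch (Mann-Whitney U, assuming distinct scores, i.e. no tie handling) of that computation:

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC = P(score of random ID sample > score of random OOD sample)."""
    scores = np.concatenate([id_scores, ood_scores])
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # rank 1 = lowest score
    n_pos, n_neg = len(id_scores), len(ood_scores)
    # Mann-Whitney U statistic for the positive (ID) class, normalized
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Perfect separation of ID from OOD scores gives 1.0, chance-level separation gives 0.5; the ≈0.855 reported above sits well clear of chance.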

More visuals (in images/): training curves for each method, reliability diagrams (*_reliability_diagram.png), per-method unsupervised analyses (*_unsupervised_analysis.png), and ablation plots (ablation_*.png).

Gallery

Training curves

Figures: training curves for the Baseline, MC Dropout, and Evidential models.

Reliability diagrams

Figures: reliability diagrams for the Baseline, MC Dropout, Evidential, and Ensemble models.

Unsupervised analyses

Figures: unsupervised confidence analyses for the Baseline, MC Dropout, Evidential, and Ensemble models.

Ablation studies

Figures: ablation studies over dropout rate, ensemble size, and unsupervised metric weights.

Notes about the codebase

  • The notebook contains well-commented components: DatasetManager, BaselineModel, MCDropoutModel, EvidentialModel, UnsupervisedConfidenceMetric, Trainer, DeepEnsemble, ComprehensiveEvaluator, Visualizer, and AblationStudies.
  • The main pipeline function main_pipeline(...) orchestrates dataset loading, training (or loading), evaluation, and plotting.

License

This project is licensed under the MIT License — see the LICENSE file.

About

Self-Diagnosing Neural Networks: models that quantify their own uncertainty and assess prediction reliability without labeled validation data. Includes evidential deep learning, uncertainty-aware loss functions, and confidence-calibrated inference.
