Author: Sourav Roy
Email: royxlead@proton.me
Date: October 2025
This repository contains a complete implementation and evaluation suite for "Self-Diagnosing Neural Models": uncertainty quantification (UQ) methods and a novel unsupervised confidence metric that estimates model confidence without using labels. The project includes:
- Implementations of multiple UQ methods: Baseline (MSP), Monte Carlo Dropout (MC Dropout), Evidential Deep Learning (EDL), and Deep Ensembles.
- A novel unsupervised confidence metric combining prediction consistency across augmentations, entropy, feature-space dispersion, and softmax temperature analysis.
- Dataset: CIFAR-10 (ID) vs CIFAR-100 (OOD)
- Training epochs reported: 100
- Best single-model accuracy: Evidential (≈91.7%)
- Best calibration (lowest ECE): MC Dropout (≈0.0097)
- Best OOD detection AUROC (≈0.855): Baseline / Ensemble
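As a rough illustration of how the unsupervised metric could combine its signals, here is a NumPy sketch using only two of the four ingredients (entropy and augmentation consistency); the weighting and exact terms are illustrative placeholders, not the notebook's actual `UnsupervisedConfidenceMetric` implementation:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unsupervised_confidence(aug_logits, w_entropy=0.5, w_consistency=0.5):
    """Toy label-free confidence score from K augmented views of one input.

    aug_logits: array of shape (K, num_classes) -- logits for K augmentations.
    The real metric also uses feature-space dispersion and temperature
    analysis; this sketch keeps only two of the four ingredients, and the
    weights are arbitrary.
    """
    probs = softmax(aug_logits)                      # (K, C)
    mean_probs = probs.mean(axis=0)                  # average prediction
    # Low entropy of the averaged prediction -> high confidence.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum()
    entropy_conf = 1.0 - entropy / np.log(len(mean_probs))
    # Agreement of per-augmentation argmax predictions -> consistency term.
    preds = probs.argmax(axis=1)
    consistency = (preds == preds[0]).mean()
    return w_entropy * entropy_conf + w_consistency * consistency

# A sharply and consistently predicted input scores higher than a diffuse one.
peaked = np.tile(np.array([8.0, 0.0, 0.0]), (4, 1))
flat = np.random.default_rng(0).normal(size=(4, 3))
print(unsupervised_confidence(peaked) > unsupervised_confidence(flat))  # True
```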
Create and activate a virtual environment and install the core packages. Adjust the torch wheel for your CUDA version.
# optional: create a venv and activate it
python -m venv .venv; .\.venv\Scripts\Activate.ps1
# common packages - pin or change versions if you need exact reproduction
pip install numpy scipy scikit-learn matplotlib seaborn tqdm tensorboard
# Install torch + torchvision following instructions at https://pytorch.org (pick the right CUDA)
pip install torch torchvision

- Open `self_diagnosing_neural_models_python.ipynb` in Jupyter or VS Code and run the cells from the top.
- The notebook exposes a `main_pipeline(...)` orchestration function to train, evaluate, and export results. When running experiments, you can set `train_models=False` to load existing checkpoints instead of retraining.
Example (from inside the notebook after converting to .py or using the notebook kernel):
models, results, unsupervised_results, evaluator = main_pipeline(
train_models=False, # load checkpoints instead of training from scratch
num_epochs=100,
run_ablations=True,
id_dataset='cifar10',
ood_dataset='cifar100',
batch_size=128
)

- Checkpoints for the Baseline, MC Dropout, and Evidential models are available in `checkpoints/`.
- Ensemble member weights (if present) live in `ensemble_model/ensemble_model_*.pth`.
- Install dependencies and set the appropriate PyTorch wheel for your GPU/CPU.
- Open and run the notebook cells in order — the notebook sets seeds for deterministic behavior where possible.
- For quick smoke tests, use the notebook's CLI flags (`--smoke-test` or `--fast-debug`) or set the `FAST_DEBUG_SUBSET` environment variable.
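If you drive the notebook programmatically, the environment variable must be set before the relevant cells run. A minimal sketch — the variable name comes from this repo, but the `"1"` value is an assumption; check the notebook for the format it expects:

```python
import os

# FAST_DEBUG_SUBSET is read by the notebook to shrink the dataset for
# debugging; the "1" value here is an assumption about its format.
os.environ["FAST_DEBUG_SUBSET"] = "1"
print(os.environ["FAST_DEBUG_SUBSET"])  # 1
```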
The images/ folder contains the main plotted outputs. A few representative figures are embedded below — click the images to open the full-size PNGs in the repository.
Comparison across methods (accuracy / ECE / AUROC):
Figure: Side-by-side comparison of key metrics across Baseline, MC Dropout, Evidential, and Ensemble models.
Confidence distributions and unsupervised metric behavior:
Figure: Predicted confidence histograms and the proposed unsupervised confidence score behavior across datasets.
OOD detection ROC curves (CIFAR-10 ID vs CIFAR-100 OOD):
Figure: ROC curves for OOD detection using CIFAR-10 as ID and CIFAR-100 as OOD; higher AUROC indicates better separability.
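For context on these AUROC numbers: MSP-based OOD detection reduces to scoring each sample by its maximum softmax probability and computing AUROC with ID samples as the positive class. A sketch using scikit-learn (which is in the install list); the function name and toy data are illustrative, not from the notebook:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def msp_ood_auroc(id_probs, ood_probs):
    """AUROC for MSP-based OOD detection.

    id_probs / ood_probs: (N, C) softmax outputs on ID and OOD data.
    ID samples are the positive class; a well-separated model assigns
    higher max-softmax scores to ID inputs.
    """
    scores = np.concatenate([id_probs.max(axis=1), ood_probs.max(axis=1)])
    labels = np.concatenate([np.ones(len(id_probs)), np.zeros(len(ood_probs))])
    return roc_auc_score(labels, scores)

# Toy check: confident ID predictions vs. diffuse OOD predictions.
id_probs = np.array([[0.90, 0.05, 0.05], [0.85, 0.10, 0.05]])
ood_probs = np.array([[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]])
print(msp_ood_auroc(id_probs, ood_probs))  # 1.0
```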
More visuals (in `images/`): training curves for each method, reliability diagrams (`*_reliability_diagram.png`), per-method unsupervised analyses (`*_unsupervised_analysis.png`), and ablation plots (`ablation_*.png`).
Training curves
Reliability diagrams
Unsupervised analyses
Ablation studies
- The notebook contains well-commented components: `DatasetManager`, `BaselineModel`, `MCDropoutModel`, `EvidentialModel`, `UnsupervisedConfidenceMetric`, `Trainer`, `DeepEnsemble`, `ComprehensiveEvaluator`, `Visualizer`, and `AblationStudies`.
- The main pipeline function `main_pipeline(...)` orchestrates dataset loading, training (or checkpoint loading), evaluation, and plotting.
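For reference, the expected calibration error (ECE) reported in the results above is the standard equal-width-binned metric. A self-contained NumPy sketch — independent of the notebook's `ComprehensiveEvaluator`, which may use a different bin count or binning scheme:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-binned ECE: weighted mean |accuracy - confidence| per bin.

    confidences: max softmax probability per sample, in [0, 1].
    correct: boolean array, True where the prediction was right.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # bin weight * calibration gap
    return ece

# Perfectly calibrated toy case: 80% confidence, 80% accuracy -> ECE of 0.
conf = np.full(10, 0.8)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
print(round(expected_calibration_error(conf, corr), 4))  # 0.0
```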
This project is licensed under the MIT License — see the LICENSE file.