This project implements a lightweight image capture, annotation, training, and inference toolkit for object-detection workflows. The system is designed to operate in a controlled, offline imaging setup and to integrate directly with OpenCV-based workflows and modern deep-learning pipelines (e.g. YOLOv8).
The tool combines:
- Direct camera capture
- Interactive bounding-box annotation
- Structured dataset export
- Basic dataset validation utilities
- A training companion for YOLOv8
- A live inference viewer for quick model checks
The emphasis is on simplicity, transparency, and reproducibility, rather than real-time performance or industrial deployment.
The system is structured as a small number of loosely coupled modules:
- Capture Module
- Annotation UI
- Dataset Storage Layer
- Dataset Audit Utilities
- Training Companion App
- Inference Viewer App
Each module is designed to be camera-agnostic and file-system-based to minimise external dependencies.
- Python 3.12 (prebuilt wheels for NumPy and OpenCV are recommended)
- pip
- OpenCV Python bindings (`opencv-python`, which installs `numpy`)
- PyQt5 (GUI toolkit for the capture, training, and inference apps)
- Ultralytics (`ultralytics`) for YOLOv8 training and inference
- PyTorch (`torch`) for model loading and inference (required by Ultralytics)
- psutil (system telemetry in the inference viewer)
- Optional: `pynvml` for NVIDIA GPU telemetry (inference viewer)
- Optional: `pyinstaller` for packaging into a Windows executable
- Create and use a dedicated venv named `.venv312` to avoid interpreter mismatches: `py -3.12 -m venv .venv312`, then `.\.venv312\Scripts\activate`.
- Install packages inside it: `pip install --upgrade pip`, then `pip install --only-binary=:all: "numpy<2.3.0" "opencv-python<4.13" pyqt5 ultralytics`.
- VS Code users: set the interpreter to `.venv312/Scripts/python.exe` (a workspace setting is already included in `.vscode/settings.json`). Terminals still need `.\.venv312\Scripts\activate` before running `python main.py`.
- Download the official Windows installer for Python 3.12.
- During setup, tick "Add Python to PATH" and keep the Windows launcher (enables `py`).
```powershell
# From the repository root
py -3.12 -m venv .venv312
.\.venv312\Scripts\activate

# Upgrade pip and install wheel-only builds to avoid compiling
pip install --upgrade pip
pip install --only-binary=:all: "numpy<2.3.0" "opencv-python<4.13" pyqt5 ultralytics torch psutil pynvml
```

- Acquires frames from a connected camera using OpenCV (`cv2.VideoCapture`)
- Displays a live preview window for framing and focus checks
- Captures single frames on user input
- Writes image files directly to disk
- No assumptions are made about camera type (laptop webcam or external USB camera)
- Capture settings (resolution, camera index) are configurable
- Capture is explicitly user-triggered to avoid near-duplicate frames
- Images are saved using a deterministic naming scheme encoding:
- capture date/time
- tool or part identifier
- sequential index
- Basic capture metadata is recorded separately for traceability (see the sketch below)
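The capture flow is conventional OpenCV; a minimal sketch of the pattern described above (the `tool_id` parameter and exact filename layout are illustrative, not the app's verbatim implementation):

```python
import datetime
from pathlib import Path

import cv2

def capture_frame(camera_index: int = 0, tool_id: str = "toolA", seq: int = 1) -> str:
    """Grab one frame and save it under a deterministic name."""
    Path("captures").mkdir(exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    try:
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("camera returned no frame")
        stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        # Deterministic name encoding date/time, tool id, and sequence index
        # (illustrative; the app writes capture_YYYYMMDD_HHMMSS_xxx.jpg).
        path = f"captures/capture_{stamp}_{tool_id}_{seq:03d}.jpg"
        cv2.imwrite(path, frame)
        return path
    finally:
        cap.release()
```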
- Object detection using axis-aligned bounding boxes
- Each bounding box is associated with a single defect class
- Multiple defects per image are supported
- Images with no defects are explicitly allowed
- Implemented using OpenCV window callbacks
- Mouse input:
- click-and-drag to draw bounding boxes
- Keyboard input:
- numeric keys for class assignment
- undo last annotation
- save and advance to next image
The UI is intentionally minimal to reduce annotation time and cognitive load.
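For reference, the click-and-drag mechanics rest on OpenCV's standard mouse-callback API; a minimal sketch (window name and state handling are illustrative):

```python
import cv2

boxes = []          # completed boxes as (x1, y1, x2, y2)
drag_start = None   # corner where the current drag began

def on_mouse(event, x, y, flags, param):
    """Record a box from a left-button click-and-drag."""
    global drag_start
    if event == cv2.EVENT_LBUTTONDOWN:
        drag_start = (x, y)
    elif event == cv2.EVENT_LBUTTONUP and drag_start is not None:
        boxes.append((*drag_start, x, y))
        drag_start = None

cv2.namedWindow("labeler")
cv2.setMouseCallback("labeler", on_mouse)
```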
Annotations are exported in YOLO object-detection format, with one text file per image:

```
<class_id> <x_center> <y_center> <width> <height>
```
Where:

- All coordinates are normalised to image dimensions
- `class_id` corresponds to a fixed class list defined in `classes.txt`
This format was selected for compatibility with common training frameworks and ease of validation.
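To make the normalisation concrete, here is a minimal sketch converting a pixel-space box to a YOLO label line (the image and box values are made up for illustration):

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    # Corner coordinates (pixels) -> normalised centre/size in [0, 1].
    x_c = (x1 + x2) / 2 / img_w
    y_c = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# A 100x50 px box centred at (320, 240) in a 640x480 image, class 0:
print(to_yolo_line(0, 270, 215, 370, 265, 640, 480))
# -> 0 0.500000 0.500000 0.156250 0.104167
```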
The dataset is stored entirely on disk using a transparent directory structure:
```
captures/
    capture_YYYYMMDD_HHMMSS_xxx.jpg
    null/               # null/removed captures
classes.txt             # class names (one per line)
class_colors.json       # optional RGB palette
config.json             # optional app config (timers, labeler loop)
```
- Images are not stored in a database to avoid unnecessary I/O overhead
- Labels and metadata are human-readable
- The dataset can be inspected, versioned, or transferred without specialised tooling (see the loading sketch below)
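Because everything is plain files, other scripts can consume the dataset directly; a minimal loading sketch (it assumes `classes.txt` and the optional `class_colors.json` sit at the dataset root as shown above, and the palette's JSON structure is an assumption):

```python
import json
from pathlib import Path

root = Path(".")  # dataset root; adjust as needed

# One class name per line; the line index doubles as the YOLO class_id.
class_names = (root / "classes.txt").read_text().splitlines()

# Optional palette; tolerate its absence. (Keying assumed, not verified.)
colors_path = root / "class_colors.json"
class_colors = json.loads(colors_path.read_text()) if colors_path.exists() else {}
```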
A dataset audit step is included to ensure consistency before training.
Validation checks include:
- One-to-one correspondence between images and label files
- Bounding box values within valid ranges
- Detection of empty or malformed label files
- Summary statistics:
- images per class
- bounding boxes per class
- distribution across tools or parts
This step is intended to catch annotation errors early and prevent silent failures during training.
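A minimal sketch of the first two checks (it assumes label `.txt` files sit beside their images with matching stems; the real audit utilities may organise this differently):

```python
from pathlib import Path

def audit(dataset: Path) -> list[str]:
    """Report unpaired files and malformed or out-of-range label lines."""
    problems = []
    images = {p.stem for p in dataset.glob("*.jpg")}
    labels = {p.stem: p for p in dataset.glob("*.txt") if p.name != "classes.txt"}
    for stem in images ^ labels.keys():
        problems.append(f"unpaired file: {stem}")
    for label in labels.values():
        for i, line in enumerate(label.read_text().splitlines(), start=1):
            parts = line.split()
            if len(parts) != 5:
                problems.append(f"{label.name}:{i}: malformed line")
                continue
            _, *coords = parts
            if not all(0.0 <= float(v) <= 1.0 for v in coords):
                problems.append(f"{label.name}:{i}: coordinate outside [0, 1]")
    return problems
```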
The generated dataset is intended to be consumed directly by object-detection training frameworks.
Key integration features:
- YOLO-compatible label format
- Deterministic filenames for reproducible splits
- Support for dataset partitioning based on tool, part geometry, or capture session
The capture and annotation codebase is deliberately decoupled from any specific training implementation, while the companion tools provide a short path into YOLOv8.
The training UI (`train_model.py`) provides a guided path to start YOLOv8 training without writing scripts:

- Select a dataset folder and `classes.txt`
- Auto-build a temporary YOLO dataset structure and YAML
- 80/20 train/validation split with deterministic shuffling
- Launch the Ultralytics CLI with configurable model size, image size, epochs, and device
- Live logs with basic ETA parsing
The prepared dataset lives under `.yolo_training_cache/` inside the selected dataset folder.
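Deterministic shuffling is what makes the split reproducible; one way to implement it (the app's actual seed and ordering are not shown here):

```python
import random
from pathlib import Path

def split_80_20(images: list[Path], seed: int = 0) -> tuple[list[Path], list[Path]]:
    """Same file list + same seed -> identical train/val partition."""
    ordered = sorted(images)      # fix the starting order
    rng = random.Random(seed)     # seeded RNG, independent of global state
    rng.shuffle(ordered)
    cut = int(len(ordered) * 0.8)
    return ordered[:cut], ordered[cut:]
```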
The inference UI (`run_inference.py`) provides a live camera preview with YOLOv8 detections:
- Select a camera and model file
- Run inference on live frames
- Render bounding boxes, class labels, and confidences
- Switch CPU/GPU (if available) from the UI
This tool is intended for quick sanity checks and demoing models, not for production deployment.
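The core loop is a thin wrapper over the public Ultralytics API; a minimal sketch (the model path and camera index are illustrative):

```python
import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # path is illustrative
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    annotated = results[0].plot()  # draws boxes, class labels, confidences
    cv2.imshow("inference", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```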
- Annotation is fully manual
- Inference is for visualization and validation only (not optimized for production)
- No automated defect proposal or pre-labelling
- Image quality and defect visibility depend on external hardware and lighting
These constraints are intentional to keep the system lightweight and maintainable.
This tool is intended for:
- Rapid dataset generation for defect detection experiments
- Prototyping object-detection pipelines
- Controlled data collection in laboratory or bench-top environments
It is not intended for production inspection systems or safety-critical use.
Capture (default settings):

```powershell
.\.venv312\Scripts\activate
python main.py --camera 0
```

Capture with an explicit resolution:

```powershell
.\.venv312\Scripts\activate
python main.py --camera 0 --width 1280 --height 720
```

Training UI:

```powershell
.\.venv312\Scripts\activate
python train_model.py
```

Inference viewer:

```powershell
.\.venv312\Scripts\activate
python run_inference.py
```

Packaging into a Windows executable:

```powershell
.\.venv312\Scripts\activate
pyinstaller --name OpenCVCapture --noconfirm --onedir -w main.py `
--icon "programLogo.ico" `
--hidden-import sip `
--add-data "classes.txt;." `
--add-data "class_colors.json;." `
--add-data "captures;captures" `
--add-data "programLogo.ico;." `
--add-data "$Env:VIRTUAL_ENV\Lib\site-packages\PyQt5\Qt5\plugins;PyQt5\Qt5\plugins"Output lives in dist/OpenCVCapture/. Run from there: .\OpenCVCapture.exe --camera 0.
To clean build artifacts:

```powershell
Remove-Item -Recurse -Force build, dist, OpenCVCapture.spec
```

Main window

| Key | Action |
| --- | --- |
| C | Capture frame |
| Q | Quit |
| LEFT/RIGHT | Prev/next (inspect mode) |
| DELETE | Delete current file (inspect mode) |
| E | Edit current file (inspect mode) |
| ESC | Exit inspect mode |
Labeler window

| Key | Action |
| --- | --- |
| Drag (LMB) | Draw box |
| 0-9 | Choose class |
| Enter | Apply class to selected box |
| Left/Right | Select box |
| Up/Down | Cycle class for selected box |
| U / Z | Undo last box |
| S | Save labels |
| N | Mark null |
| Q / ESC | Cancel labeling |
| Scroll | Zoom |
| Middle drag | Pan |
Inference window

| Key | Action |
| --- | --- |
| C | Capture screenshot |
| H | Toggle telemetry overlay |
| Q | Quit |
Training window

| Key | Action |
| --- | --- |
| Ctrl+Q | Quit |