Quality: GPU configuration crash on CPU-only systems #629
bachdev wants to merge 2 commits into microsoft:main
The code attempts to process the `gpus` string by splitting it, but `gpus` can become `None` if `torch.cuda.is_available()` returns `False`. This will lead to an `AttributeError` when `gpus.split(',')` is called, causing the application to crash on systems without a GPU or where CUDA is not detected. This makes the application unusable in CPU-only environments.
Affected files: main.py
Signed-off-by: BachDEV <1437214+bachdev@users.noreply.github.com>
Pull request overview
Fixes a crash in the classification training entrypoint when running on CPU-only systems by avoiding calling .split() on a None GPU string.
Changes:
- Adjust GPU parsing logic to only split/cast GPU IDs when CUDA is available.
- Explicitly set `gpus = None` when CUDA is unavailable.

    if torch.cuda.is_available():
        gpus = [int(i) for i in gpus.split(',')]
    else:
        # If no CUDA devices are available, set gpus to None to indicate CPU usage
        # PyTorch Lightning Trainer will default to CPU if devices is None
        gpus = None

On CPU-only systems this still looks likely to crash because the Trainer is configured with `accelerator='gpu'` unconditionally (line 162). Setting `gpus=None` will not make Lightning fall back to CPU when the accelerator is explicitly GPU. Consider deriving both `accelerator` and `devices` from `torch.cuda.is_available()` (e.g., `accelerator='cpu'`, `devices=1` when CUDA is unavailable, or use `accelerator='auto'`/`devices='auto'`). Also, the inline comment about defaulting to CPU when `devices` is `None` is misleading in the current configuration.
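A minimal sketch of the suggestion above, deriving `accelerator` and `devices` together. The helper name `resolve_trainer_kwargs` and its `cuda_available` parameter are hypothetical and do not come from `main.py`. Falling back to CPU when the GPU string is empty is also an assumption; `'auto'` would be another reasonable choice.

```python
from typing import Dict, Optional


def resolve_trainer_kwargs(gpus: Optional[str], cuda_available: bool) -> Dict[str, object]:
    """Hypothetical helper: choose accelerator and devices as a pair.

    `gpus` is a comma-separated string of GPU ids such as "0,1", or None.
    `cuda_available` is expected to be the result of torch.cuda.is_available().
    """
    if cuda_available and gpus:
        # Skip empty fragments so a trailing comma does not raise ValueError.
        device_ids = [int(i) for i in gpus.split(",") if i.strip()]
        if device_ids:
            return {"accelerator": "gpu", "devices": device_ids}
    # Assumption: with no CUDA or no usable ids, run on a single CPU device.
    return {"accelerator": "cpu", "devices": 1}
```

The returned dictionary could then be unpacked into the Trainer call, for example `Trainer(**resolve_trainer_kwargs(gpus, torch.cuda.is_available()), ...)`, so the accelerator is never fixed to GPU independently of the device list.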
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Problem
The code attempts to process the `gpus` string by splitting it, but `gpus` can become `None` if `torch.cuda.is_available()` returns `False`. This will lead to an `AttributeError` when `gpus.split(',')` is called, causing the application to crash on systems without a GPU or where CUDA is not detected. This makes the application unusable in CPU-only environments.
Severity: critical
File: PW_FT_classification/main.py
Solution
Explicitly handle the case where `gpus` is `None` by assigning an empty list or a CPU-specific device identifier, or by performing the split only if `gpus` is a string.
Changes
PW_FT_classification/main.py (modified)
Testing
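A minimal sketch of how the guarded parsing described under Solution could be checked on a machine without a GPU. `parse_gpus` is a hypothetical helper, not a function from `main.py`, and returning an empty list for the CPU case is one of the options named above rather than the behavior of the merged change.

```python
from typing import List, Optional


def parse_gpus(gpus: Optional[str]) -> List[int]:
    """Hypothetical helper: split only when `gpus` is a non-empty string."""
    if not isinstance(gpus, str) or not gpus.strip():
        # None or an empty string means no GPUs were requested or detected.
        return []
    return [int(part) for part in gpus.split(",") if part.strip()]


# Checks that need neither CUDA nor PyTorch to run.
assert parse_gpus(None) == []        # previously raised AttributeError
assert parse_gpus("") == []
assert parse_gpus("0,1") == [0, 1]
assert parse_gpus("0, 2,") == [0, 2]
```

Because the helper takes the string as an argument and does not call `torch.cuda.is_available()` itself, the CPU-only path can be exercised in any environment.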