docs: overhaul installation instructions around uv + bring-your-own Python/PyTorch/CUDA #15769

Open
pzelasko wants to merge 4 commits into main from docs/install-uv-byo

@pzelasko pzelasko commented Jun 8, 2026

Collaborator

What & why

Harmonizes and corrects the NeMo Speech installation documentation now that the repo standardizes on uv (committed uv.lock) with a fresh Dockerfile. The previous docs were inconsistent and in places broken (wrong Python/PyTorch versions, a GPU pip command that can't resolve, .[test] which isn't an extra, stale clone URLs).

Two framing principles, per maintainer guidance:

  • uv + cu13 is the lead recommendation; pip is a documented fallback.
  • Bring-your-own Python/PyTorch/CUDA is fully supported: nemo-toolkit only requires torch>=2.6, so a pre-installed PyTorch is kept, not replaced. The uv.lock/container combo (Python 3.13, PyTorch 2.12, CUDA 12.6/13.2) is the actively supported baseline, not a hard requirement.
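
To illustrate why a pre-installed PyTorch is kept, here is a minimal sketch of the version-floor check that a resolver effectively performs against torch>=2.6. The helper names (`parse_major_minor`, `satisfies_floor`) are hypothetical, and the comparison is a simplified stand-in for full PEP 440 handling: it looks only at the major and minor components and ignores pre-release markers.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.12.0+cu132'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def satisfies_floor(installed: str, floor: str = "2.6") -> bool:
    """Return True if the installed version is at or above the floor.

    Simplified assumption: only major.minor are compared, so local tags
    like '+cu132' or '+cpu' do not affect the result.
    """
    return parse_major_minor(installed) >= parse_major_minor(floor)
```

Comparing integer tuples rather than strings matters here: as strings, "2.12" sorts before "2.6", which would wrongly reject the baseline PyTorch 2.12.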

Key fixes

  • Python floor corrected to 3.10 (docs said 3.12; pyproject.toml and uv.lock enforce 3.10); PyTorch version corrected to 2.12 (cu126/cu132).
  • GPU pip install now shows the required --extra-index-url https://download.pytorch.org/whl/cu{126,132} (pip doesn't read [tool.uv] indexes).
  • test and docs are PEP 735 dependency groups (--group), not extras (.[test] removed).
  • uv sync --locked (exact baseline) vs uv pip/pip (BYO) distinction, with a warning not to use uv sync --locked for BYO.
  • Clone URL → canonical NVIDIA-NeMo/NeMo (docs, package metadata, error messages); fixed stale [project.urls] and package_info.py.
  • compiled / compiled-a100 extras documented as the optional SpeechLM2/Automodel accelerated-backend kernels (Automodel runs fine without them); H100+ vs A100 split; built via the Dockerfile.
  • A100 works with both CUDA 12 and CUDA 13 — CUDA 13 recommended, CUDA 12 as a convenience.
  • Routed scattered pages (g2p, magpietts-finetuning, nemo_forced_aligner, index, speechlm2/intro) to the canonical guide via :ref:`installation`; aligned the docs build with CI (uv sync --locked); normalized the model-card template + NFA tool refs.
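
As a rough sketch of the GPU pip fix above, the snippet below assembles the install command with the explicit PyTorch wheel index. The function name `gpu_pip_command` is hypothetical, the pairing of the cu12 extra with cu126 wheels is an assumption inferred from the versions listed in this PR, and the `nemo-toolkit[asr,cu13]` spelling follows the command quoted under Validation. The install page remains the source of truth.

```python
import shlex

PYTORCH_WHEEL_INDEX = "https://download.pytorch.org/whl"

# Assumed pairing of NeMo CUDA extras with PyTorch wheel tags (cu126/cu132).
CUDA_EXTRA_TO_WHEEL_TAG = {"cu12": "cu126", "cu13": "cu132"}


def gpu_pip_command(cuda_extra: str = "cu13", collection: str = "asr") -> list[str]:
    """Build the argv for a GPU pip install with the matching wheel index."""
    wheel_tag = CUDA_EXTRA_TO_WHEEL_TAG[cuda_extra]
    return [
        "pip",
        "install",
        f"nemo-toolkit[{collection},{cuda_extra}]",
        # pip does not read [tool.uv] indexes, so the index is passed explicitly.
        "--extra-index-url",
        f"{PYTORCH_WHEEL_INDEX}/{wheel_tag}",
    ]


# Render a copy-pasteable shell line; shlex.join quotes the bracketed spec.
print(shlex.join(gpu_pip_command()))
```

Building the command as an argv list and quoting it with `shlex.join` keeps the bracketed extras from being treated as a glob pattern by shells such as zsh.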

Validation

Installability verified in resource-capped Docker builds (--cpus=6 --memory=10g):

  • py3.10 + preinstalled torch 2.6 (cpu), py3.12 + torch 2.8 (cpu), official pytorch:2.6-cuda12.4 image — pre-installed torch kept in every case, import nemo.collections.asr OK.
  • py3.10 default path (pulls torch 2.12) and the documented [asr,cu13] --extra-index-url …/cu132 GPU command (resolves torch 2.12.0+cu132) — both import OK.

black/isort pass on changed Python files; RST validated via docutils parse.
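
The import check used in the validation runs can be approximated with a small script like the one below. This is only a sketch: `can_import` and `installed_version` are hypothetical helper names, and only `ImportError` is caught, so other failures (for example, a broken CUDA library load) would still surface as exceptions.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

For the bring-your-own case, one could record `installed_version("torch")` before and after installing NeMo and confirm the two values match, then call `can_import("nemo.collections.asr")`.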

Pending

The NGC container pull command is a clearly-marked Coming soon placeholder in README and the install page, to be filled in when the image is published.

🤖 Generated with Claude Code

pzelasko and others added 2 commits June 8, 2026 15:02
…ersions

Harmonize and correct installation docs across README, CLAUDE.md, and the
Sphinx install page, and fix stale package-metadata URLs.

- Lead with uv + cu13 as the recommended install; pip is a documented fallback.
- Emphasize bring-your-own Python (>=3.10) / PyTorch (>=2.6) / CUDA: nemo-toolkit
  only pins torch>=2.6, so a pre-installed PyTorch is kept, not replaced.
- Frame the uv.lock/container combo (Python 3.13, PyTorch 2.12, CUDA 12.6/13.2)
  as the actively-supported stack, not a hard requirement.
- Document the compiled / compiled-a100 extras (source-built GPU kernels for
  SpeechLM2 / Automodel: Transformer Engine, FlashAttention, Mamba, grouped-GEMM,
  DeepEP), including the H100+ vs A100 split and that they build via the Dockerfile.
- Fix broken commands: GPU pip install now shows the required --extra-index-url;
  test/docs are PEP 735 groups (--group), not extras.
- Correct the Python floor (3.10), torch version (2.12), and clone URL
  (NVIDIA-NeMo/NeMo); add an NGC container placeholder pending the image.
- Update stale repo URLs to NVIDIA-NeMo/NeMo in pyproject.toml and package_info.py.

Validated installability in Docker (py3.10/3.11/3.12; preinstalled torch
2.6/2.8/official cu124 kept; default + cu13 GPU paths resolve and import).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Incorporate the useful parts of a parallel install-docs review and apply a
broader consistency pass:

- Distinguish uv sync --locked (exact supported baseline; add --python 3.13)
  from uv pip / pip (bring-your-own), with a warning not to use uv sync --locked
  for BYO. Offer uv pip alongside pip for the fallback path.
- Clarify A100: works with BOTH CUDA 12 and CUDA 13 — CUDA 13 (default base
  image) recommended, CUDA 12 base offered only as a convenience.
- Broaden PyTorch targets to CPU/CUDA/ROCm/Apple Silicon; note cu12/cu13 also
  add the matching CUDA Python deps (cuda-python, numba-cuda).
- Route scattered pages to the canonical install guide via :ref:`installation`
  (g2p, magpietts-finetuning, nemo_forced_aligner) and modernize index.rst /
  speechlm2/intro.rst snippets; add a docker run example and a lighter
  import-only verify step.
- Align docs build with CI (uv sync --locked --group docs; uv run make linkcheck);
  prune the now-fixed nemo_forced_aligner entry from the broken-links list.
- Normalize stale install references in the model-card template, NFA tool docs,
  and runtime error messages (nemo-toolkit name; NVIDIA-NeMo/NeMo clone URL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
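
The uv sync --locked versus uv pip distinction described in this commit can be summarized with a small decision helper. This is an illustrative sketch only: `install_command` is a hypothetical name, and the `nemo-toolkit[asr]` spec is a placeholder for whichever extras the install guide recommends.

```python
def install_command(bring_your_own: bool, python: str = "3.13") -> list[str]:
    """Pick an install command for the two paths described above.

    bring_your_own=True  -> install into the existing environment with
                            uv pip, so a pre-installed torch is kept.
    bring_your_own=False -> reproduce the locked baseline with uv sync.
    """
    if bring_your_own:
        # Placeholder package spec; adjust extras to the collections you need.
        return ["uv", "pip", "install", "nemo-toolkit[asr]"]
    # Syncing to uv.lock installs the locked versions, which is why this
    # path is not suitable for a bring-your-own PyTorch environment.
    return ["uv", "sync", "--locked", "--python", python]
```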
@pzelasko pzelasko requested a review from a team as a code owner June 8, 2026 22:22
@copy-pr-bot

copy-pr-bot Bot commented Jun 8, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the core (Changes to NeMo Core) and TTS labels Jun 8, 2026
…drop torchvision mention

- PyTorch target wording: "CPU, CUDA, etc." (drop explicit ROCm / Apple Silicon).
- compiled-a100: note the patched A100 DeepEP is auto-built/installed by the
  Dockerfile when the CUDA 12 base image is selected.
- Remove the stray torchvision mention from the conda tip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ssofja previously approved these changes Jun 9, 2026

@Ssofja Ssofja left a comment

Collaborator

LGTM

pzelasko commented Jun 9, 2026

Collaborator Author

/ok to test e4ed7e7

@nithinraok nithinraok left a comment

Member

Waiting for the doc build for a final view.

Comment thread docs/source/starthere/install.rst Outdated
Comment thread docs/source/starthere/install.rst Outdated
pzelasko commented Jun 9, 2026

Collaborator Author

/ok to test e62b47f

@pzelasko pzelasko enabled auto-merge (squash) June 9, 2026 20:31
Labels

core (Changes to NeMo Core), TTS

4 participants