NVIDIA-NeMo · pzelasko · Jun 9, 2026 · Jun 8, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -8,13 +8,9 @@ NeMo Speech — toolkit for training/deploying speech models (ASR, TTS, Speech L
 
 ## Build & Install
 
-```bash
-pip install -e '.[all]'       # Full dev install
-pip install -e '.[asr]'       # ASR only
-pip install -e '.[test]'      # With test deps
-```
+See the canonical installation guide — [`docs/source/starthere/install.rst`](docs/source/starthere/install.rst) (published at https://docs.nvidia.com/nemo/speech/nightly/) — for the uv, pip (bring-your-own Python/PyTorch/CUDA), Docker, and optional `compiled` (SpeechLM2/Automodel) install paths.
 
-Requires Python 3.10+, PyTorch 2.6+.
+Dev quickstart: `uv sync --extra all --extra cu13` (Python 3.12+, PyTorch 2.7+; `test`/`docs` are `--group`s, not extras).
 
 ## Code Style
 
@@ -49,7 +45,7 @@ Markers: `unit`, `integration`, `system`, `pleasefixme` (broken — skip), `skip
 Sphinx-based docs live in `docs/source/`. Build with:
 
 ```bash
-uv sync --group docs                                 # one-time setup
+uv sync --locked --group docs                        # one-time setup (matches CI)
 uv run make -C docs clean html                       # full rebuild
 uv run make -C docs html                             # incremental rebuild
 ```

diff --git a/README.md b/README.md
@@ -49,9 +49,13 @@ For technical documentation, please see the
 
 ## Requirements
 
+NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
+
 - Python 3.12 or above
-- Pytorch 2.6 or above
-- NVIDIA GPU (if you intend to do model training)
+- PyTorch 2.7 or above (CPU, CUDA, etc. — your choice)
+- NVIDIA GPU + CUDA (required for training; recommended for inference)
+
+If you already have a Python/PyTorch/CUDA stack that satisfies those minimums, NeMo Speech installs on top of it **without replacing it**, so your existing PyTorch build is kept (see the install options below). The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
 
 As of [Pytorch 2.6](https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true),
 `torch.load` defaults to using `weights_only=True`. Some model checkpoints may require using `weights_only=False`.
@@ -68,9 +72,51 @@ can have the risk of arbitrary code execution.
 
 ## Install NeMo Speech
 
-NeMo Speech is installable via pip: `pip install 'nemo-toolkit[all]'`
-To install with extra dependencies for CUDA 12.x or 13.x, use `pip install 'nemo-toolkit[all,cu12]'`
-or `pip install 'nemo-toolkit[all,cu13]'` respectively.
+The recommended way to install NeMo Speech is from source with [uv](https://docs.astral.sh/uv/), which reproduces our actively tested stack from the committed `uv.lock`. If you need different Python/PyTorch/CUDA versions, NeMo also installs over your existing environment via pip; see the [pip fallback](#from-pypi-with-pip-fallback--bring-your-own-versions) below.
+
+### From source with uv (recommended)
+
+```bash
+git clone https://github.com/NVIDIA-NeMo/NeMo.git
+cd NeMo
+uv sync --extra all --extra cu13     # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
+```
+
+This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default). For the **exact** container baseline, add `--locked --python 3.13` (the path the Dockerfile and CI use).
+
+> **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies. It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details.
+
+### Docker (turnkey, our supported stack)
+
+> **NGC container:** _Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here._
+
+To build the container from source (CUDA 13 / H100+ by default):
+
+```bash
+git clone https://github.com/NVIDIA-NeMo/NeMo.git
+cd NeMo
+docker buildx build -f docker/Dockerfile -t nemo-speech .          # CUDA 13 / H100+ (default)
+docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
+```
+
+For A100, set `GPU_TARGET=a100` — A100 works with **both CUDA 12 and CUDA 13** (CUDA 13, the default base image, is recommended; the CUDA 12 base is a convenience). See the header of [`docker/Dockerfile`](docker/Dockerfile) for all build arguments (`BASE_IMAGE`, `GPU_TARGET`).
+
+### From PyPI with pip (fallback — bring your own versions)
+
+Prefer your own Python/PyTorch/CUDA? Install your PyTorch first (any version ≥ 2.7 for your CPU/CUDA/etc. target — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**. `uv pip` (uv's fast, pip-compatible installer) works like `pip`:
+
+```bash
+uv pip install 'nemo-toolkit[asr,tts]'   # or plain: pip install 'nemo-toolkit[asr,tts]'
+```
+
+> ⚠️ Do **not** use `uv sync --locked` for a bring-your-own stack — it applies `uv.lock` and replaces your Python/PyTorch/CUDA with the supported baseline. Use `uv pip`/`pip` here; reserve `uv sync --locked` for reproducing our stack.
+
+To instead pull *our* pinned PyTorch build, add the CUDA extra and the matching wheel index (pip/uv pip do not read uv's project index config, so `--extra-index-url` is required):
+
+```bash
+pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132   # CUDA 13.x
+pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126   # CUDA 12.x
+```
 
 ## Contribute to NeMo
 

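The README hunk above sets the minimums for a bring-your-own stack at Python 3.12 and PyTorch 2.7. A minimal sketch of a pre-install check against those minimums might look like the following. The names `MIN_PYTHON`, `MIN_TORCH`, `parse_version`, `meets_minimum`, and `check_stack` are hypothetical helpers for illustration, not part of NeMo or of this PR. Versions are compared as integer tuples because a plain string comparison would rank `2.12` below `2.7`.

```python
import platform
import re
from importlib import metadata

# Minimums taken from the README requirements list in this diff.
MIN_PYTHON = (3, 12)
MIN_TORCH = (2, 7)


def parse_version(text):
    """Return the leading (major, minor) pair of a version string as integers."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum):
    """True if `version` (e.g. '2.7.0+cu126') is at least `minimum` (e.g. (2, 7))."""
    return parse_version(version) >= minimum


def check_stack():
    """Report whether the running Python and the installed torch meet the minimums."""
    report = {"python": meets_minimum(platform.python_version(), MIN_PYTHON)}
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        report["torch"] = None  # PyTorch is not installed in this environment
    else:
        report["torch"] = meets_minimum(torch_version, MIN_TORCH)
    return report


if __name__ == "__main__":
    print(check_stack())
```

Reading the installed version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and also works when PyTorch is absent (the `torch` entry is then `None`).
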
diff --git a/docs/README.md b/docs/README.md
@@ -2,12 +2,10 @@
 
 ## Building the Documentation
 
-1. Create and activate a virtual environment.
-
-1. Install the documentation dependencies:
+1. Install the documentation dependencies into the locked `uv` environment:
 
    ```console
-   $ uv sync --group docs
+   $ uv sync --locked --group docs
    ```
 
 1. Build the documentation:
@@ -21,7 +19,7 @@
 1. Build the documentation, as described in the preceding section, but use the following command:
 
    ```shell
-   make -C docs clean linkcheck
+   uv run make -C docs clean linkcheck
    ```
 
 1. Run the link-checking script:

diff --git a/docs/source/broken_links_needing_review..json b/docs/source/broken_links_needing_review..json
@@ -6,14 +6,6 @@
   "uri": "https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/optimizer.py#L793",
   "info": "Anchor 'L793' not found"
 }
-{
-  "filename": "tools/nemo_forced_aligner.rst",
-  "lineno": 22,
-  "status": "broken",
-  "code": 0,
-  "uri": "https://github.com/NVIDIA/NeMo#installation",
-  "info": "Anchor 'installation' not found"
-}
 {
   "filename": "checkpoints/intro.rst",
   "lineno": 28,

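Judging from the hunk above, the broken-links file appears to hold pretty-printed JSON objects written back to back rather than a single JSON array, so `json.loads` on the whole file would fail. A minimal sketch of reading such a file, assuming that layout holds for the whole file, uses `json.JSONDecoder.raw_decode` to consume one object at a time. The function name `iter_json_objects` is hypothetical and not part of the repository's tooling.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so advance past it manually.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def entries_for_file(text, filename):
    """Return the entries whose 'filename' field equals `filename`."""
    return [
        entry
        for entry in iter_json_objects(text)
        if isinstance(entry, dict) and entry.get("filename") == filename
    ]
```

A usage example would be `entries_for_file(path.read_text(), "checkpoints/intro.rst")`, where `path` is a `pathlib.Path` pointing at the JSON file.
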
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -57,11 +57,11 @@ What is NeMo?
 - **Scalable training** — multi-GPU/multi-node via PyTorch Lightning with mixed-precision support
 - **Simple configuration** — YAML-based experiment configs with `Hydra <https://hydra.cc/>`__
 
-Get started in 30 seconds:
+Get started (install the PyTorch build for your platform first):
 
 .. code-block:: bash
 
-   pip install nemo_toolkit[asr,tts]
+   uv pip install 'nemo-toolkit[asr,tts]'
 
 .. code-block:: python
 

diff --git a/docs/source/speechlm2/intro.rst b/docs/source/speechlm2/intro.rst
@@ -5,7 +5,9 @@ SpeechLM2
    The SpeechLM2 collection is still in active development and the code is likely to keep changing.
 
 .. note::
-   Install with ``pip install nemo-toolkit[speechlm2]`` to get all required dependencies including NeMo Automodel.
+   Install your chosen compatible PyTorch stack first, then install SpeechLM2 with
+   ``uv pip install 'nemo-toolkit[speechlm2]'`` (or, from a source checkout, ``uv pip install -e '.[speechlm2]'``)
+   to get all required dependencies including NeMo Automodel. See :ref:`installation` for details.
 
 SpeechLM2 refers to a collection that augments pre-trained Large Language Models (LLMs) with speech understanding and generation capabilities.