diff --git a/CLAUDE.md b/CLAUDE.md
index 037115ed3aa1..9a355c624155 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,13 +8,9 @@ NeMo Speech — toolkit for training/deploying speech models (ASR, TTS, Speech L
 
 ## Build & Install
 
-```bash
-pip install -e '.[all]'       # Full dev install
-pip install -e '.[asr]'       # ASR only
-pip install -e '.[test]'      # With test deps
-```
+See the canonical installation guide — [`docs/source/starthere/install.rst`](docs/source/starthere/install.rst) (published at https://docs.nvidia.com/nemo/speech/nightly/) — for the uv, pip (bring-your-own Python/PyTorch/CUDA), Docker, and optional `compiled` (SpeechLM2/Automodel) install paths.
 
-Requires Python 3.10+, PyTorch 2.6+.
+Dev quickstart: `uv sync --extra all --extra cu13` (Python 3.12+, PyTorch 2.7+; `test`/`docs` are `--group`s, not extras).
 
 ## Code Style
 
@@ -49,7 +45,7 @@ Markers: `unit`, `integration`, `system`, `pleasefixme` (broken — skip), `skip
 Sphinx-based docs live in `docs/source/`. Build with:
 
 ```bash
-uv sync --group docs                                 # one-time setup
+uv sync --locked --group docs                        # one-time setup (matches CI)
 uv run make -C docs clean html                       # full rebuild
 uv run make -C docs html                             # incremental rebuild
 ```
diff --git a/README.md b/README.md
index 76cb0ede09ab..a7506cab6839 100644
--- a/README.md
+++ b/README.md
@@ -49,9 +49,13 @@ For technical documentation, please see the
 
 ## Requirements
 
+NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
+
 - Python 3.12 or above
-- Pytorch 2.6 or above
-- NVIDIA GPU (if you intend to do model training)
+- PyTorch 2.7 or above (CPU, CUDA, etc. — your choice)
+- NVIDIA GPU + CUDA (required for training; recommended for inference)
+
+If you already have a Python/PyTorch/CUDA stack that satisfies those minimums, NeMo Speech installs on top of it **without replacing it**, so your existing PyTorch build is kept (see the install options below). The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
 
 As of [Pytorch 2.6](https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true),
 `torch.load` defaults to using `weights_only=True`. Some model checkpoints may require using `weights_only=False`.
@@ -68,9 +72,51 @@ can have the risk of arbitrary code execution.
 
 ## Install NeMo Speech
 
-NeMo Speech is installable via pip: `pip install 'nemo-toolkit[all]'`
-To install with extra dependencies for CUDA 12.x or 13.x, use `pip install 'nemo-toolkit[all,cu12]'`
-or `pip install 'nemo-toolkit[all,cu13]'` respectively.
+The recommended way to install NeMo Speech is from source with [uv](https://docs.astral.sh/uv/), which reproduces our actively-tested stack from the committed `uv.lock`. If you need different Python/PyTorch/CUDA versions, NeMo also installs over your existing environment via pip — see the [pip fallback](#from-pypi-with-pip-fallback--bring-your-own-versions) below.
+
+### From source with uv (recommended)
+
+```bash
+git clone https://github.com/NVIDIA-NeMo/NeMo.git
+cd NeMo
+uv sync --extra all --extra cu13     # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
+```
+
+This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo installed in editable mode. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive: pass exactly one (`cu13` is recommended; omitting both installs the generic PyPI PyTorch wheel rather than the pinned CUDA build). For the **exact** container baseline, add `--locked --python 3.13` (the path the Dockerfile and CI use).
+
+> **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies. It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details.
+
+### Docker (turnkey, our supported stack)
+
+> **NGC container:** _Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here._
+
+To build the container from source (CUDA 13 / H100+ by default):
+
+```bash
+git clone https://github.com/NVIDIA-NeMo/NeMo.git
+cd NeMo
+docker buildx build -f docker/Dockerfile -t nemo-speech .          # CUDA 13 / H100+ (default)
+docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
+```
+
+For A100, set `GPU_TARGET=a100` — A100 works with **both CUDA 12 and CUDA 13** (CUDA 13, the default base image, is recommended; the CUDA 12 base is a convenience). See the header of [`docker/Dockerfile`](docker/Dockerfile) for all build arguments (`BASE_IMAGE`, `GPU_TARGET`).
+
+### From PyPI with pip (fallback — bring your own versions)
+
+Prefer your own Python/PyTorch/CUDA? Install your PyTorch first (any version ≥ 2.7 for your CPU/CUDA/etc. target — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**. `uv pip` (uv's fast, pip-compatible installer) works like `pip`:
+
+```bash
+uv pip install 'nemo-toolkit[asr,tts]'   # or plain: pip install 'nemo-toolkit[asr,tts]'
+```
+
+> ⚠️ Do **not** use `uv sync --locked` for a bring-your-own stack — it applies `uv.lock` and replaces your Python/PyTorch/CUDA with the supported baseline. Use `uv pip`/`pip` here; reserve `uv sync --locked` for reproducing our stack.
+
+To instead pull *our* pinned PyTorch build, add the CUDA extra and the matching wheel index (pip/uv pip do not read uv's project index config, so `--extra-index-url` is required):
+
+```bash
+pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132   # CUDA 13.x
+pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126   # CUDA 12.x
+```
 
 ## Contribute to NeMo
 
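The README text above ties the `compiled` extra to Hopper/Blackwell and `compiled-a100` to A100, and the install guide in this change lists SM90/SM100/SM120 and SM80 for those targets. A minimal sketch of that selection follows. The helper name `pick_compiled_extra` is hypothetical and not part of NeMo, and returning `None` for GPUs the docs do not mention is an assumption.

```python
from typing import Optional


def pick_compiled_extra(major: int, minor: int) -> Optional[str]:
    """Map a CUDA compute capability to the optional extra named in the docs.

    Hypothetical helper for illustration only; NeMo does not ship it.
    """
    if (major, minor) == (8, 0):
        return "compiled-a100"  # Ampere A100 (SM80)
    if major >= 9:
        return "compiled"  # Hopper/Blackwell and newer (SM90/SM100/SM120)
    return None  # assumption: the docs name no extra for other GPUs
```

With PyTorch and a visible GPU, the two arguments could come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple. Since the extras are optional, a `None` result just means running Automodel without the accelerated kernels.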
diff --git a/docs/README.md b/docs/README.md
index cdd2a6870ca7..c125027996c6 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,12 +2,10 @@
 
 ## Building the Documentation
 
-1. Create and activate a virtual environment.
-
-1. Install the documentation dependencies:
+1. Install the documentation dependencies into the locked `uv` environment:
 
    ```console
-   $ uv sync --group docs
+   $ uv sync --locked --group docs
    ```
 
 1. Build the documentation:
@@ -21,7 +19,7 @@
 1. Build the documentation, as described in the preceding section, but use the following command:
 
    ```shell
-   make -C docs clean linkcheck
+   uv run make -C docs clean linkcheck
    ```
 
 1. Run the link-checking script:
diff --git a/docs/source/broken_links_needing_review..json b/docs/source/broken_links_needing_review..json
index a7cbd050b36f..6af065c7f236 100644
--- a/docs/source/broken_links_needing_review..json
+++ b/docs/source/broken_links_needing_review..json
@@ -6,14 +6,6 @@
   "uri": "https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/optimizer.py#L793",
   "info": "Anchor 'L793' not found"
 }
-{
-  "filename": "tools/nemo_forced_aligner.rst",
-  "lineno": 22,
-  "status": "broken",
-  "code": 0,
-  "uri": "https://github.com/NVIDIA/NeMo#installation",
-  "info": "Anchor 'installation' not found"
-}
 {
   "filename": "checkpoints/intro.rst",
   "lineno": 28,
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 4b10b42a97bf..0dc70f7cc3d1 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -57,11 +57,11 @@ What is NeMo?
 - **Scalable training** — multi-GPU/multi-node via PyTorch Lightning with mixed-precision support
 - **Simple configuration** — YAML-based experiment configs with `Hydra <https://hydra.cc/>`__
 
-Get started in 30 seconds:
+Get started (install the PyTorch build for your platform first):
 
 .. code-block:: bash
 
-   pip install nemo_toolkit[asr,tts]
+   uv pip install 'nemo-toolkit[asr,tts]'
 
 .. code-block:: python
 
diff --git a/docs/source/speechlm2/intro.rst b/docs/source/speechlm2/intro.rst
index 791ecad968d9..97ee334d68af 100644
--- a/docs/source/speechlm2/intro.rst
+++ b/docs/source/speechlm2/intro.rst
@@ -5,7 +5,9 @@ SpeechLM2
    The SpeechLM2 collection is still in active development and the code is likely to keep changing.
 
 .. note::
-   Install with ``pip install nemo-toolkit[speechlm2]`` to get all required dependencies including NeMo Automodel.
+   Install your chosen compatible PyTorch stack first, then install SpeechLM2 with
+   ``uv pip install 'nemo-toolkit[speechlm2]'`` (or, from a source checkout, ``uv pip install -e '.[speechlm2]'``)
+   to get all required dependencies including NeMo Automodel. See :ref:`installation` for details.
 
 SpeechLM2 refers to a collection that augments pre-trained Large Language Models (LLMs) with speech understanding and generation capabilities.
 
diff --git a/docs/source/starthere/install.rst b/docs/source/starthere/install.rst
index f43184d0d807..5745b25f7980 100644
--- a/docs/source/starthere/install.rst
+++ b/docs/source/starthere/install.rst
@@ -8,60 +8,52 @@ This page covers how to install NVIDIA NeMo for speech AI tasks (ASR, TTS, speak
 Prerequisites
 -------------
 
-Before installing NeMo, ensure you have:
+NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
 
 #. **Python** 3.12 or above
-#. **PyTorch** 2.7+ (install **before** NeMo so CUDA wheels match your GPU driver)
-#. **NVIDIA GPU** (required for training; CPU-only inference is possible but slow)
+#. **PyTorch** 2.7 or above, for your chosen target (CPU, CUDA, etc.)
+#. **NVIDIA GPU + CUDA** (required for training; CPU-only inference is possible but slow)
+#. **uv** for the fastest source/PyPI workflow (``pip`` also works in a prepared environment)
 
-Recommended installation order
-------------------------------
+.. admonition:: Bring your own Python / PyTorch / CUDA
+   :class: important
 
-Install dependencies in this order when setting up a **local GPU** environment:
+   The recommended install path is uv (below), which gives you our actively-tested stack. But NeMo Speech can also install *on top of* an existing environment: if you already have a Python, PyTorch, and CUDA stack that satisfies the minimums above, your pre-installed PyTorch is **kept, not replaced** (see :ref:`the pip fallback <install-from-pypi>`).
 
-#. Create and activate a Python environment.
-#. Install a **CUDA toolkit** (or rely on a driver + PyTorch bundle that matches your CUDA major version).
-#. Install **PyTorch** (and torchvision if you need it) from the index that matches your CUDA build.
-#. Install **NeMo** (from PyPI or editable source) **with the extras** for the collections you need (``asr``, ``tts``, etc.).
+   The versions pinned in ``uv.lock`` and shipped in the official container — **Python 3.13, PyTorch 2.12, CUDA 12.6/13.2** — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
 
-Putting PyTorch in place first avoids mismatched CUDA runtimes and makes NeMo’s optional GPU-dependent packages resolve correctly.
+.. note::
 
-**Example (conda + pip, CUDA 13.0 PyTorch wheels):**
+   As of `PyTorch 2.6 <https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true>`_, ``torch.load`` defaults to ``weights_only=True``. Some checkpoints require ``weights_only=False``; in that case set ``TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1`` before loading, and only with trusted files (loading untrusted files with full pickle support risks arbitrary code execution).
 
-.. code-block:: bash
-
-   # 1) New environment (adjust Python version if your platform requires it)
-   conda create -n nemo python=3.12 -y
-   conda activate nemo
-
-   # 2) CUDA toolkit from conda (optional if you already have a compatible toolkit via the driver)
-   conda install nvidia::cuda-toolkit
+.. _install-from-source:
 
-   # 3) PyTorch built for CUDA 13.x — change cu130 / URL if you use cu124 or CPU-only
-   pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
+Install from Source with uv (recommended)
+------------------------------------------
 
-   # 4) NeMo: use extras for ASR/TTS/etc. For a clone of the repo, use editable install (see below)
-   pip install nemo_toolkit[asr,tts]
+The recommended way to install NeMo Speech is from source with `uv <https://docs.astral.sh/uv/>`_, which reproduces our actively-tested stack from the committed ``uv.lock``:
 
-Adjust the PyTorch ``--index-url`` (e.g. ``cu124``, ``cu121``, or CPU) to match `PyTorch’s install matrix <https://pytorch.org/get-started/locally/>`_ and your NVIDIA driver.
+.. code-block:: bash
 
-Install from PyPI
------------------
+   git clone https://github.com/NVIDIA-NeMo/NeMo.git
+   cd NeMo
 
-The quickest way to install NeMo is via pip. Install only the collections you need:
+   # CUDA 13.x (recommended). Use --extra cu12 for CUDA 12.x. uv resolves the
+   # matching PyTorch CUDA wheel automatically from the pinned indexes.
+   uv sync --extra all --extra cu13
 
-.. code-block:: bash
+   # Optional: add the test suite tooling, or the docs build dependencies
+   # uv sync --extra all --extra cu13 --group test
+   # uv sync --group docs
 
-   # Install ASR and TTS (most common)
-   pip install nemo_toolkit[asr,tts]
+``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run <cmd>`` or activate the environment with ``source .venv/bin/activate``. For the **exact** container baseline, add ``--locked --python 3.13`` (i.e. ``uv sync --locked --python 3.13 --extra all --extra cu13``) — this is the path the Dockerfile and CI use.
 
-   # Install everything speech-related
-   pip install nemo_toolkit[asr,tts,audio]
+On Linux, pass exactly one of ``--extra cu13`` (recommended) or ``--extra cu12`` — they are mutually exclusive. If you omit both, uv installs the generic PyPI PyTorch wheel instead of NVIDIA's CUDA-matched build.
 
-Available extras:
+Available collection extras (combine with one CUDA extra above):
 
 .. list-table::
-   :widths: 15 85
+   :widths: 18 82
    :header-rows: 1
 
    * - Extra
@@ -72,46 +64,143 @@ Available extras:
      - Text-to-Speech models, vocoders, and audio codecs
    * - ``audio``
      - Audio processing models (enhancement, separation)
+   * - ``speechlm2``
+     - Speech language models (includes NeMo Automodel)
+   * - ``all``
+     - All of the collections above
+   * - ``cu12`` / ``cu13``
+     - Our pinned CUDA 12.x / 13.x PyTorch build **plus** the matching CUDA Python deps (``cuda-python``, ``numba-cuda``). Linux; pick at most one.
 
-.. _install-from-source:
+.. note::
 
-Install from Source
--------------------
+   ``test`` and ``docs`` are dependency *groups* (PEP 735), not extras. Install them with ``--group`` (e.g. ``uv sync --group test``) — the bracket form ``.[test]`` does not work.
+
+.. _install-compiled-extras:
+
+Optional compiled dependencies for SpeechLM2 / Automodel (``compiled`` / ``compiled-a100``)
+-------------------------------------------------------------------------------------------
+
+The Automodel backend used for SpeechLM2 **does not require any compiled dependencies — it runs without them.** The ``compiled`` and ``compiled-a100`` extras are an *optional* performance add-on: when their source-built GPU kernels are installed, Automodel can route to dedicated accelerated backends (FP8 Transformer kernels via Transformer Engine, FlashAttention, Mamba/state-space layers, and Mixture-of-Experts ops). They contain:
+
+.. list-table::
+   :widths: 30 70
+   :header-rows: 1
+
+   * - Package
+     - Purpose
+   * - ``transformer-engine``
+     - NVIDIA Transformer Engine — FP8 and accelerated Transformer kernels
+   * - ``flash-attn``
+     - FlashAttention attention kernels
+   * - ``mamba-ssm`` + ``causal-conv1d``
+     - Mamba / state-space-model kernels (hybrid Mamba architectures)
+   * - ``nv-grouped-gemm``
+     - Grouped GEMM kernels for Mixture-of-Experts (MoE) layers
+   * - ``deep_ep`` (DeepEP)
+     - Expert-parallel communication kernels for MoE (``compiled`` only — see below)
+   * - ``onnx-ir`` + ``onnxscript``
+     - Pinned ONNX export tooling
+
+Choose the variant that matches your GPU (the two are mutually exclusive):
+
+* ``compiled`` — Hopper/Blackwell and newer (SM90/SM100/SM120, e.g. H100/H200/B200). Includes DeepEP.
+* ``compiled-a100`` — Ampere A100 (SM80). Omits DeepEP, which requires a separately-built, patched version on A100; our Dockerfile auto-builds and installs it when the CUDA 12 base image is selected.
+
+.. warning::
+
+   These packages **build from source** and need a full CUDA build environment — build tools, matching ``TORCH_CUDA_ARCH_LIST`` / ``NVTE_CUDA_ARCHS`` flags, ``--no-build-isolation``, and (for ``compiled``) extra manual build steps that the Dockerfile performs (e.g. flash-attn-4 and DeepEP patches). The supported, reproducible way to get them is the container build, which sets all of this up for you:
+
+   .. code-block:: bash
+
+      # Hopper/Blackwell (default GPU_TARGET=h100plus → compiled)
+      docker buildx build -f docker/Dockerfile -t nemo-speech .
+
+      # Ampere A100 (GPU_TARGET=a100 → compiled-a100)
+      docker buildx build -f docker/Dockerfile \
+        --build-arg BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \
+        --build-arg GPU_TARGET=a100 -t nemo-speech .
+
+   A bare ``uv sync --extra all --extra cu13 --extra compiled`` outside this environment will likely fail to compile.
+
+Using Docker (turnkey, our supported stack)
+--------------------------------------------
 
-For the latest development version or if you plan to contribute, clone the repository and install in editable mode.
+.. note::
 
-The ``test`` extra pulls in **pytest and tooling for the test suite**. It does **not** install NeMo collection dependencies (ASR, TTS, audio, etc.). Add those extras explicitly or imports like ``nemo.collections.asr`` will fail.
+   **NGC container:** *Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here.*
+
+To build the container from source, use the provided ``docker/Dockerfile`` (CUDA 13 / H100+ by default):
 
 .. code-block:: bash
 
-   git clone https://github.com/NVIDIA/NeMo.git
+   git clone https://github.com/NVIDIA-NeMo/NeMo.git
    cd NeMo
+   docker buildx build -f docker/Dockerfile -t nemo-speech .          # CUDA 13 / H100+ (default)
+   docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
+
+For A100, set ``GPU_TARGET=a100``. A100 works with **both CUDA 12 and CUDA 13** — CUDA 13 (the default base image) is recommended; the CUDA 12 base is offered only as a convenience:
 
-   # After PyTorch is installed (see Recommended installation order above):
-   # Collections you need for development (required for nemo.collections.* imports)
-   pip install -e '.[asr,tts]'
+.. code-block:: bash
 
-   # Optional: add test to run pytest with NeMo’s dev test dependencies
-   # pip install -e '.[asr,tts,test]'
+   # A100 on CUDA 13 (recommended) — uses the default CUDA 13 base image
+   docker buildx build -f docker/Dockerfile --build-arg GPU_TARGET=a100 -t nemo-speech:a100 .
 
-Using Docker
-------------
+   # A100 on CUDA 12 (convenience)
+   docker buildx build -f docker/Dockerfile \
+     --build-arg BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \
+     --build-arg GPU_TARGET=a100 -t nemo-speech:a100-cu12 .
+
+See the header of ``docker/Dockerfile`` for all build arguments (``BASE_IMAGE``, ``GPU_TARGET``).
+
+.. _install-from-pypi:
+
+Install from PyPI with pip (fallback — bring your own versions)
+---------------------------------------------------------------
+
+Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.7 for your CPU/CUDA/etc. target — see `PyTorch's install matrix <https://pytorch.org/get-started/locally/>`_), then add NeMo. Your pre-installed PyTorch is kept, not replaced. ``uv pip`` (uv's fast, pip-compatible installer) works just like ``pip``:
+
+.. code-block:: bash
+
+   uv venv --python 3.12          # any Python >= 3.12 your PyTorch supports — or use your own env
+   source .venv/bin/activate
+
+   # 1) Your choice of PyTorch (example: CUDA 12.6 build). Skip if you already have one.
+   uv pip install torch --index-url https://download.pytorch.org/whl/cu126
+
+   # 2) NeMo — your PyTorch above is kept (plain `pip install` works identically)
+   uv pip install 'nemo-toolkit[asr,tts]'        # also: [asr,tts,audio], [speechlm2], etc.
 
-NVIDIA provides Docker containers with NeMo pre-installed. Check the `NeMo GitHub releases <https://github.com/NVIDIA/NeMo/releases>`_ for the latest container tags.
+.. warning::
+
+   Do **not** use ``uv sync --locked`` for a bring-your-own stack — it intentionally applies ``uv.lock`` and replaces your Python/PyTorch/CUDA with the supported container baseline. Use ``uv pip`` (or ``pip``) here; reserve ``uv sync --locked`` for reproducing the supported stack (above).
+
+To instead have the installer pull *our* pinned PyTorch build, add the matching CUDA extra **and** the PyTorch wheel index (``pip`` / ``uv pip`` do not read uv's project index config, so ``--extra-index-url`` is required):
+
+.. code-block:: bash
+
+   pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132   # CUDA 13.x
+   pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126   # CUDA 12.x
+
+.. tip::
+
+   Prefer a conda environment? Create and activate one (``conda create -n nemo python=3.12 -y && conda activate nemo``), then run the same ``uv`` or ``pip`` commands above inside it. NeMo Speech does not require a separate conda CUDA toolkit.
 
 Verify Installation
 -------------------
 
-After installing, verify that NeMo is working:
+After installing, verify that the chosen collection imports:
+
+.. code-block:: bash
+
+   python -c "import nemo.collections.asr as nemo_asr; print('NeMo ASR installed')"
+
+If you installed with ``uv sync`` and have not activated ``.venv``, run the check through ``uv run python``. To also exercise a model download:
 
 .. code-block:: python
 
    import nemo.collections.asr as nemo_asr
-   print("NeMo ASR installed successfully!")
-
-   # Quick test: load a pretrained model
    model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
-   print(f"Model loaded: {model.__class__.__name__}")
+   print(f"Loaded: {model.__class__.__name__}")
 
 What's Next?
 ------------
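The Prerequisites section in this file states minimums of Python 3.12 and PyTorch 2.7 for a bring-your-own stack. A minimal sketch of checking an existing environment against those minimums before installing NeMo follows. The helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the check only reads installed package metadata; it does not import PyTorch.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 12)  # minimums as stated in the Prerequisites section
MIN_TORCH = (2, 7)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release of a version string as ints.

    Local and pre-release suffixes such as "+cu126" or ".dev1" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple) -> bool:
    """True when the numeric release of `version` is at least `minimum`."""
    return release_tuple(version) >= minimum


def check_environment() -> list:
    """Collect human-readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is below 3.12")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, MIN_TORCH):
            problems.append(f"PyTorch {torch_version} is below 2.7")
    return problems
```

Calling `check_environment()` inside the target environment and printing the returned list gives a quick pre-install check. It says nothing about whether the CUDA build matches the GPU driver.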
diff --git a/docs/source/tools/nemo_forced_aligner.rst b/docs/source/tools/nemo_forced_aligner.rst
index d8f89c70447f..2ad87f0dd33a 100644
--- a/docs/source/tools/nemo_forced_aligner.rst
+++ b/docs/source/tools/nemo_forced_aligner.rst
@@ -19,7 +19,7 @@ Demos & Tutorials
 Quickstart
 ----------
 
-1. Install `NeMo <https://github.com/NVIDIA/NeMo#installation>`__.
+1. Install NeMo with the ASR collection. See :ref:`installation`.
 2. Prepare a NeMo-style manifest containing the paths of audio files you would like to proces, and (optionally) their text.
 3. Run NFA's ``align.py`` script with the desired config, e.g.:
 
diff --git a/docs/source/tts/g2p.rst b/docs/source/tts/g2p.rst
index 491095b88732..71f61139290d 100644
--- a/docs/source/tts/g2p.rst
+++ b/docs/source/tts/g2p.rst
@@ -126,7 +126,7 @@ Using this unknown token forces a G2P model to produce the same masking token as
 Requirements
 ------------
 
-G2P requires the NeMo ASR collection to be installed (``pip install nemo_toolkit[asr]``).
+G2P requires the NeMo ASR collection to be installed. See :ref:`installation` and include the ``asr`` extra.
 
 
 References
diff --git a/docs/source/tts/magpietts-finetuning.rst b/docs/source/tts/magpietts-finetuning.rst
index 4c58da1ae990..50f5da384a55 100644
--- a/docs/source/tts/magpietts-finetuning.rst
+++ b/docs/source/tts/magpietts-finetuning.rst
@@ -20,7 +20,7 @@ Before finetuning, you will need:
 - A pretrained Magpie-TTS checkpoint (``pretrained.ckpt`` or ``pretrained.nemo``). Public checkpoints (``https://huggingface.co/nvidia/magpie_tts_multilingual_357m``) are available on Hugging Face.
 - The audio codec model (``https://huggingface.co/nvidia/nemo-nano-codec-22khz-1.89kbps-21.5fps``), available on Hugging Face alongside the TTS checkpoint.
 - A prepared dataset. For faster finetuning audio codec tokens must be pre-extracted from your audio files. See the *Dataset Preparation* section below.
-- NeMo installed from source or via the NeMo container. See the `NeMo GitHub page <https://github.com/NVIDIA/NeMo>`_ for installation instructions.
+- NeMo installed from source or with the local Dockerfile. See :ref:`installation` for installation instructions.
 
 
 Dataset Preparation
diff --git a/nemo/agents/voice_agent/pipecat/services/nemo/stt.py b/nemo/agents/voice_agent/pipecat/services/nemo/stt.py
index bb048f50805e..69111ec4e8d0 100644
--- a/nemo/agents/voice_agent/pipecat/services/nemo/stt.py
+++ b/nemo/agents/voice_agent/pipecat/services/nemo/stt.py
@@ -50,7 +50,7 @@
 
 except ModuleNotFoundError as e:
     logger.error(f"Exception: {e}")
-    logger.error('In order to use NVIDIA NeMo STT, you need to `pip install "nemo_toolkit[all]"`.')
+    logger.error('In order to use NVIDIA NeMo STT, you need to `pip install "nemo-toolkit[all]"`.')
     raise Exception(f"Missing module: {e}")
 
 
diff --git a/nemo/collections/speechlm2/vllm/salm/audio.py b/nemo/collections/speechlm2/vllm/salm/audio.py
index c892364c0a56..6e7ff0c38f86 100644
--- a/nemo/collections/speechlm2/vllm/salm/audio.py
+++ b/nemo/collections/speechlm2/vllm/salm/audio.py
@@ -95,7 +95,7 @@ def _load_nemo_perception(perception_cfg: dict) -> nn.Module:
         from nemo.collections.speechlm2.modules import AudioPerceptionModule
     except ImportError as e:
         raise ImportError(
-            "NeMo is required for the audio encoder. " "Install with: pip install nemo_toolkit[asr]"
+            "NeMo is required for the audio encoder. Install with: pip install 'nemo-toolkit[asr]'"
         ) from e
 
     cfg = DictConfig(perception_cfg)
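Several files in this change share one pattern: catch a failed import and re-raise with an install hint. A minimal sketch of that pattern as a reusable helper follows. The name `require` is hypothetical and does not exist in NeMo; the `from exc` chaining mirrors what `audio.py` already does.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import a module, or raise ImportError carrying an install hint.

    Hypothetical helper for illustration; not part of NeMo.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the missing module stays visible in the traceback.
        raise ImportError(f"{module_name} is required. Install with: {install_hint}") from exc
```

A call such as `require("nemo.collections.asr", "pip install 'nemo-toolkit[asr]'")` would keep the hint in one place instead of repeating the string at each import site.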
diff --git a/nemo/collections/speechlm2/vllm/salm/model.py b/nemo/collections/speechlm2/vllm/salm/model.py
index cffc19a4977d..e3da43cb6986 100644
--- a/nemo/collections/speechlm2/vllm/salm/model.py
+++ b/nemo/collections/speechlm2/vllm/salm/model.py
@@ -28,7 +28,7 @@
 granite-4.0-micro escape hatch).
 
 Requires NeMo toolkit for the audio encoder:
-    pip install nemo_toolkit[asr]
+    pip install 'nemo-toolkit[asr]'
 """
 
 from collections.abc import Iterable
diff --git a/nemo/core/config/templates/model_card.py b/nemo/core/config/templates/model_card.py
index 80b09a2d37cf..3de051b3845e 100644
--- a/nemo/core/config/templates/model_card.py
+++ b/nemo/core/config/templates/model_card.py
@@ -36,9 +36,9 @@
 
 ## NVIDIA NeMo: Training
 
-To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version.
+To train, fine-tune, or experiment with the model, install the PyTorch build for your platform first, then install [NVIDIA NeMo](https://docs.nvidia.com/nemo/speech/nightly/starthere/install.html) with the extras you need.
 ```
-pip install nemo_toolkit['all']
+pip install 'nemo-toolkit[all]'
 ``` 
 
 ## How to Use this Model
diff --git a/nemo/package_info.py b/nemo/package_info.py
index 6f097f59bca1..1fff85f30991 100644
--- a/nemo/package_info.py
+++ b/nemo/package_info.py
@@ -28,8 +28,8 @@
 __contact_names__ = "NVIDIA"
 __contact_emails__ = "nemo-toolkit@nvidia.com"
 __homepage__ = "https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/"
-__repository_url__ = "https://github.com/nvidia/nemo"
-__download_url__ = "https://github.com/NVIDIA/NeMo/releases"
+__repository_url__ = "https://github.com/NVIDIA-NeMo/NeMo"
+__download_url__ = "https://github.com/NVIDIA-NeMo/NeMo/releases"
 __description__ = "NeMo - a toolkit for Conversational AI"
 __license__ = "Apache2"
 __keywords__ = "deep learning, machine learning, gpu, NLP, NeMo, nvidia, pytorch, torch, tts, speech, language"
diff --git a/pyproject.toml b/pyproject.toml
index 33d69147a674..99e702d903cf 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -355,8 +355,8 @@ py-modules = ["nemo"]
 nemo_speechlm = "nemo.collections.speechlm2.vllm.salm:register"
 
 [project.urls]
-Download = "https://github.com/NVIDIA/NeMo/releases"
-Homepage = "https://github.com/nvidia/nemo"
+Download = "https://github.com/NVIDIA-NeMo/NeMo/releases"
+Homepage = "https://github.com/NVIDIA-NeMo/NeMo"
 
 [tool.isort]
 profile = "black"  # black-compatible
diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md
index fc06e979a3a4..6a0de6fc0908 100644
--- a/tools/nemo_forced_aligner/README.md
+++ b/tools/nemo_forced_aligner/README.md
@@ -12,7 +12,7 @@ NFA is a tool for generating token-, word- and segment-level timestamps of speec
 
 
 ## Quickstart
-1. Install [NeMo](https://github.com/NVIDIA/NeMo#installation).
+1. Install [NeMo](https://docs.nvidia.com/nemo/speech/nightly/starthere/install.html) with the ASR collection.
 2. Prepare a NeMo-style manifest containing the paths of audio files you would like to process, and (optionally) their text.
 3. Run NFA's `align.py` script with the desired config, e.g.:
     ``` bash
diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py
index b38c40d68f90..8956adee8738 100644
--- a/tools/nemo_forced_aligner/align.py
+++ b/tools/nemo_forced_aligner/align.py
@@ -48,9 +48,9 @@
     raise ImportError(
         "Missing required dependency for NFA. "
         "Install NeMo with NFA utilities support:\n"
-        "  pip install 'nemo_toolkit[all]>=2.5.0'\n"
+        "  pip install 'nemo-toolkit[all]>=2.5.0'\n"
         "Or install the latest development version:\n"
-        "  pip install git+https://github.com/NVIDIA/NeMo.git"
+        "  pip install git+https://github.com/NVIDIA-NeMo/NeMo.git"
     )
 """
 Align the utterances in manifest_filepath. 
diff --git a/tools/nemo_forced_aligner/align_eou.py b/tools/nemo_forced_aligner/align_eou.py
index f40fa7eadaec..f851bee08ed9 100644
--- a/tools/nemo_forced_aligner/align_eou.py
+++ b/tools/nemo_forced_aligner/align_eou.py
@@ -53,9 +53,9 @@
     raise ImportError(
         "Missing required dependency for NFA. "
         "Install NeMo with NFA utilities support:\n"
-        "  pip install 'nemo_toolkit[all]>=2.5.0'\n"
+        "  pip install 'nemo-toolkit[all]>=2.5.0'\n"
         "Or install the latest development version:\n"
-        "  pip install git+https://github.com/NVIDIA/NeMo.git"
+        "  pip install git+https://github.com/NVIDIA-NeMo/NeMo.git"
     )
 
 """
diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt
index 9daa6d2f2496..cd21068986ef 100644
--- a/tools/nemo_forced_aligner/requirements.txt
+++ b/tools/nemo_forced_aligner/requirements.txt
@@ -1,3 +1,3 @@
-nemo_toolkit[all]
+nemo-toolkit[all]
 prettyprinter # for testing
 pytest # for testing
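The `nemo_toolkit` to `nemo-toolkit` renames throughout this change are a consistency fix, not a behavior change. Installers compare project names after the normalization described in PEP 503, so both spellings resolve to the same package. A short sketch of that rule:

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Both spellings used before and after this change normalize identically.
assert normalize("nemo_toolkit") == normalize("nemo-toolkit") == "nemo-toolkit"
```

The added quotes around `'nemo-toolkit[asr]'` do matter in some shells: unquoted square brackets can be treated as a glob pattern, for example in zsh.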