diff --git a/CLAUDE.md b/CLAUDE.md index 037115ed3aa1..9a355c624155 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -8,13 +8,9 @@ NeMo Speech — toolkit for training/deploying speech models (ASR, TTS, Speech L ## Build & Install -```bash -pip install -e '.[all]' # Full dev install -pip install -e '.[asr]' # ASR only -pip install -e '.[test]' # With test deps -``` +See the canonical installation guide — [`docs/source/starthere/install.rst`](docs/source/starthere/install.rst) (published at https://docs.nvidia.com/nemo/speech/nightly/) — for the uv, pip (bring-your-own Python/PyTorch/CUDA), Docker, and optional `compiled` (SpeechLM2/Automodel) install paths. -Requires Python 3.10+, PyTorch 2.6+. +Dev quickstart: `uv sync --extra all --extra cu13` (Python 3.12+, PyTorch 2.7+; `test`/`docs` are `--group`s, not extras). ## Code Style @@ -49,7 +45,7 @@ Markers: `unit`, `integration`, `system`, `pleasefixme` (broken — skip), `skip Sphinx-based docs live in `docs/source/`. Build with: ```bash -uv sync --group docs # one-time setup +uv sync --locked --group docs # one-time setup (matches CI) uv run make -C docs clean html # full rebuild uv run make -C docs html # incremental rebuild ``` diff --git a/README.md b/README.md index 76cb0ede09ab..a7506cab6839 100644 --- a/README.md +++ b/README.md @@ -49,9 +49,13 @@ For technical documentation, please see the ## Requirements +NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**: + - Python 3.12 or above -- Pytorch 2.6 or above -- NVIDIA GPU (if you intend to do model training) +- PyTorch 2.7 or above (CPU, CUDA, etc. — your choice) +- NVIDIA GPU + CUDA (required for training; recommended for inference) + +If you already have a Python/PyTorch/CUDA stack that satisfies those minimums, NeMo Speech installs on top of it **without replacing it**, so your existing PyTorch build is kept (see the install options below). 
The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement. As of [Pytorch 2.6](https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true), `torch.load` defaults to using `weights_only=True`. Some model checkpoints may require using `weights_only=False`. @@ -68,9 +72,51 @@ can have the risk of arbitrary code execution. ## Install NeMo Speech -NeMo Speech is installable via pip: `pip install 'nemo-toolkit[all]'` -To install with extra dependencies for CUDA 12.x or 13.x, use `pip install 'nemo-toolkit[all,cu12]'` -or `pip install 'nemo-toolkit[all,cu13]'` respectively. +The recommended way to install NeMo Speech is from source with [uv](https://docs.astral.sh/uv/), which reproduces our actively-tested stack from the committed `uv.lock`. If you need different Python/PyTorch/CUDA versions, NeMo also installs over your existing environment via pip — see the [pip fallback](#from-pypi-with-pip-fallback--bring-your-own-versions) below. + +### From source with uv (recommended) + +```bash +git clone https://github.com/NVIDIA-NeMo/NeMo.git +cd NeMo +uv sync --extra all --extra cu13 # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x +``` + +This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run ` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is recommended; omitting both installs the generic PyPI PyTorch wheel instead of NVIDIA's CUDA-matched build). For the **exact** container baseline, add `--locked --python 3.13` (the path the Dockerfile and CI use). + +> **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies.
It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details. + +### Docker (turnkey, our supported stack) + +> **NGC container:** _Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here._ + +To build the container from source (CUDA 13 / H100+ by default): + +```bash +git clone https://github.com/NVIDIA-NeMo/NeMo.git +cd NeMo +docker buildx build -f docker/Dockerfile -t nemo-speech . # CUDA 13 / H100+ (default) +docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash +``` + +For A100, set `GPU_TARGET=a100` — A100 works with **both CUDA 12 and CUDA 13** (CUDA 13, the default base image, is recommended; the CUDA 12 base is a convenience). See the header of [`docker/Dockerfile`](docker/Dockerfile) for all build arguments (`BASE_IMAGE`, `GPU_TARGET`). + +### From PyPI with pip (fallback — bring your own versions) + +Prefer your own Python/PyTorch/CUDA? Install your PyTorch first (any version ≥ 2.7 for your CPU/CUDA/etc. target — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**. `uv pip` (uv's fast, pip-compatible installer) works like `pip`: + +```bash +uv pip install 'nemo-toolkit[asr,tts]' # or plain: pip install 'nemo-toolkit[asr,tts]' +``` + +> ⚠️ Do **not** use `uv sync --locked` for a bring-your-own stack — it applies `uv.lock` and replaces your Python/PyTorch/CUDA with the supported baseline. Use `uv pip`/`pip` here; reserve `uv sync --locked` for reproducing our stack. 
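The pip fallback assumes your environment already meets the stated minimums (Python 3.12+, PyTorch 2.7+). Below is a minimal pre-flight sketch that uses only the standard library. It reads the installed `torch` version from package metadata through `importlib.metadata`, so torch is never imported. `parse_major_minor`, `torch_meets_minimum` and `check_stack` are hypothetical helpers and are not part of NeMo.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimums from the Requirements section (Python 3.12+, PyTorch 2.7+).
MIN_PYTHON = (3, 12)
MIN_TORCH = (2, 7)


def parse_major_minor(ver: str) -> tuple[int, int]:
    """Return (major, minor) from versions like '2.7.1', '2.7.1+cu126' or '2.12.0'."""
    m = re.match(r"(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def torch_meets_minimum(torch_version: str, minimum: tuple[int, int] = MIN_TORCH) -> bool:
    # Compare integer tuples, not strings, so that "2.12" ranks above "2.7".
    return parse_major_minor(torch_version) >= minimum


def check_stack() -> list[str]:
    """Collect problems with the current interpreter and installed torch (empty list means OK)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.12")
    try:
        # Reads package metadata only; torch itself is not imported.
        torch_version = version("torch")
    except PackageNotFoundError:
        problems.append("torch is not installed: install your PyTorch build before NeMo")
    else:
        if not torch_meets_minimum(torch_version):
            problems.append(f"torch {torch_version} is older than 2.7")
    return problems


if __name__ == "__main__":
    issues = check_stack()
    print("\n".join(issues) if issues else "Stack meets the stated minimums.")
```

Run this in the environment you plan to install into, before running `uv pip install` or `pip install`. The tuple comparison is the main design choice here, because a naive string comparison would rank PyTorch 2.12 below 2.7.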
+ +To instead pull *our* pinned PyTorch build, add the CUDA extra and the matching wheel index (pip/uv pip do not read uv's project index config, so `--extra-index-url` is required): + +```bash +pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132 # CUDA 13.x +pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126 # CUDA 12.x +``` ## Contribute to NeMo diff --git a/docs/README.md b/docs/README.md index cdd2a6870ca7..c125027996c6 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,12 +2,10 @@ ## Building the Documentation -1. Create and activate a virtual environment. - -1. Install the documentation dependencies: +1. Install the documentation dependencies into the locked `uv` environment: ```console - $ uv sync --group docs + $ uv sync --locked --group docs ``` 1. Build the documentation: @@ -21,7 +19,7 @@ 1. Build the documentation, as described in the preceding section, but use the following command: ```shell - make -C docs clean linkcheck + uv run make -C docs clean linkcheck ``` 1. Run the link-checking script: diff --git a/docs/source/broken_links_needing_review..json b/docs/source/broken_links_needing_review..json index a7cbd050b36f..6af065c7f236 100644 --- a/docs/source/broken_links_needing_review..json +++ b/docs/source/broken_links_needing_review..json @@ -6,14 +6,6 @@ "uri": "https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/optimizer.py#L793", "info": "Anchor 'L793' not found" } -{ - "filename": "tools/nemo_forced_aligner.rst", - "lineno": 22, - "status": "broken", - "code": 0, - "uri": "https://github.com/NVIDIA/NeMo#installation", - "info": "Anchor 'installation' not found" -} { "filename": "checkpoints/intro.rst", "lineno": 28, diff --git a/docs/source/index.rst b/docs/source/index.rst index 4b10b42a97bf..0dc70f7cc3d1 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -57,11 +57,11 @@ What is NeMo? 
- **Scalable training** — multi-GPU/multi-node via PyTorch Lightning with mixed-precision support - **Simple configuration** — YAML-based experiment configs with `Hydra `__ -Get started in 30 seconds: +Get started (install the PyTorch build for your platform first): .. code-block:: bash - pip install nemo_toolkit[asr,tts] + uv pip install 'nemo-toolkit[asr,tts]' .. code-block:: python diff --git a/docs/source/speechlm2/intro.rst b/docs/source/speechlm2/intro.rst index 791ecad968d9..97ee334d68af 100644 --- a/docs/source/speechlm2/intro.rst +++ b/docs/source/speechlm2/intro.rst @@ -5,7 +5,9 @@ SpeechLM2 The SpeechLM2 collection is still in active development and the code is likely to keep changing. .. note:: - Install with ``pip install nemo-toolkit[speechlm2]`` to get all required dependencies including NeMo Automodel. + Install your chosen compatible PyTorch stack first, then install SpeechLM2 with + ``uv pip install 'nemo-toolkit[speechlm2]'`` (or, from a source checkout, ``uv pip install -e '.[speechlm2]'``) + to get all required dependencies including NeMo Automodel. See :ref:`installation` for details. SpeechLM2 refers to a collection that augments pre-trained Large Language Models (LLMs) with speech understanding and generation capabilities. diff --git a/docs/source/starthere/install.rst b/docs/source/starthere/install.rst index f43184d0d807..5745b25f7980 100644 --- a/docs/source/starthere/install.rst +++ b/docs/source/starthere/install.rst @@ -8,60 +8,52 @@ This page covers how to install NVIDIA NeMo for speech AI tasks (ASR, TTS, speak Prerequisites ------------- -Before installing NeMo, ensure you have: +NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**: #. **Python** 3.12 or above -#. **PyTorch** 2.7+ (install **before** NeMo so CUDA wheels match your GPU driver) -#. **NVIDIA GPU** (required for training; CPU-only inference is possible but slow) +#. **PyTorch** 2.7 or above, for your chosen target (CPU, CUDA, etc.) +#. 
**NVIDIA GPU + CUDA** (required for training; CPU-only inference is possible but slow) +#. **uv** for the fastest source/PyPI workflow (``pip`` also works in a prepared environment) -Recommended installation order ------------------------------- +.. admonition:: Bring your own Python / PyTorch / CUDA + :class: important -Install dependencies in this order when setting up a **local GPU** environment: + The recommended install path is uv (below), which gives you our actively-tested stack. But NeMo Speech can also install *on top of* an existing environment: if you already have a Python, PyTorch, and CUDA stack that satisfies the minimums above, your pre-installed PyTorch is **kept, not replaced** (see :ref:`the pip fallback `). -#. Create and activate a Python environment. -#. Install a **CUDA toolkit** (or rely on a driver + PyTorch bundle that matches your CUDA major version). -#. Install **PyTorch** (and torchvision if you need it) from the index that matches your CUDA build. -#. Install **NeMo** (from PyPI or editable source) **with the extras** for the collections you need (``asr``, ``tts``, etc.). + The versions pinned in ``uv.lock`` and shipped in the official container — **Python 3.13, PyTorch 2.12, CUDA 12.6/13.2** — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement. -Putting PyTorch in place first avoids mismatched CUDA runtimes and makes NeMo’s optional GPU-dependent packages resolve correctly. +.. note:: -**Example (conda + pip, CUDA 13.0 PyTorch wheels):** + As of `PyTorch 2.6 `_, ``torch.load`` defaults to ``weights_only=True``. Some checkpoints require ``weights_only=False``; in that case set ``TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1`` before loading, and only with trusted files (loading untrusted files with full pickle support risks arbitrary code execution). -.. 
code-block:: bash - - # 1) New environment (adjust Python version if your platform requires it) - conda create -n nemo python=3.12 -y - conda activate nemo - - # 2) CUDA toolkit from conda (optional if you already have a compatible toolkit via the driver) - conda install nvidia::cuda-toolkit +.. _install-from-source: - # 3) PyTorch built for CUDA 13.x — change cu130 / URL if you use cu124 or CPU-only - pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130 +Install from Source with uv (recommended) +------------------------------------------ - # 4) NeMo: use extras for ASR/TTS/etc. For a clone of the repo, use editable install (see below) - pip install nemo_toolkit[asr,tts] +The recommended way to install NeMo Speech is from source with `uv `_, which reproduces our actively-tested stack from the committed ``uv.lock``: -Adjust the PyTorch ``--index-url`` (e.g. ``cu124``, ``cu121``, or CPU) to match `PyTorch’s install matrix `_ and your NVIDIA driver. +.. code-block:: bash -Install from PyPI ------------------ + git clone https://github.com/NVIDIA-NeMo/NeMo.git + cd NeMo -The quickest way to install NeMo is via pip. Install only the collections you need: + # CUDA 13.x (recommended). Use --extra cu12 for CUDA 12.x. uv resolves the + # matching PyTorch CUDA wheel automatically from the pinned indexes. + uv sync --extra all --extra cu13 -.. code-block:: bash + # Optional: add the test suite tooling, or the docs build dependencies + # uv sync --extra all --extra cu13 --group test + # uv sync --group docs - # Install ASR and TTS (most common) - pip install nemo_toolkit[asr,tts] +``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run `` or activate the environment with ``source .venv/bin/activate``. For the **exact** container baseline, add ``--locked --python 3.13`` (i.e. 
``uv sync --locked --python 3.13 --extra all --extra cu13``) — this is the path the Dockerfile and CI use. - # Install everything speech-related - pip install nemo_toolkit[asr,tts,audio] +On Linux, pass exactly one of ``--extra cu13`` (recommended) or ``--extra cu12`` — they are mutually exclusive. If you omit both, uv installs the generic PyPI PyTorch wheel instead of NVIDIA's CUDA-matched build. -Available extras: +Available collection extras (combine with one CUDA extra above): .. list-table:: - :widths: 15 85 + :widths: 18 82 :header-rows: 1 * - Extra @@ -72,46 +64,143 @@ Available extras: - Text-to-Speech models, vocoders, and audio codecs * - ``audio`` - Audio processing models (enhancement, separation) + * - ``speechlm2`` + - Speech language models (includes NeMo Automodel) + * - ``all`` + - All of the collections above + * - ``cu12`` / ``cu13`` + - Our pinned CUDA 12.x / 13.x PyTorch build **plus** the matching CUDA Python deps (``cuda-python``, ``numba-cuda``). Linux; pick at most one. -.. _install-from-source: +.. note:: -Install from Source -------------------- + ``test`` and ``docs`` are dependency *groups* (PEP 735), not extras. Install them with ``--group`` (e.g. ``uv sync --group test``) — the bracket form ``.[test]`` does not work. + +.. _install-compiled-extras: + +Optional compiled dependencies for SpeechLM2 / Automodel (``compiled`` / ``compiled-a100``) +------------------------------------------------------------------------------------------- + +The Automodel backend used for SpeechLM2 **does not require any compiled dependencies — it runs without them.** The ``compiled`` and ``compiled-a100`` extras are an *optional* performance add-on: when their source-built GPU kernels are installed, Automodel can route to dedicated accelerated backends (FP8 Transformer kernels via Transformer Engine, FlashAttention, Mamba/state-space layers, and Mixture-of-Experts ops). They contain: + +.. 
list-table:: + :widths: 30 70 + :header-rows: 1 + + * - Package + - Purpose + * - ``transformer-engine`` + - NVIDIA Transformer Engine — FP8 and accelerated Transformer kernels + * - ``flash-attn`` + - FlashAttention attention kernels + * - ``mamba-ssm`` + ``causal-conv1d`` + - Mamba / state-space-model kernels (hybrid Mamba architectures) + * - ``nv-grouped-gemm`` + - Grouped GEMM kernels for Mixture-of-Experts (MoE) layers + * - ``deep_ep`` (DeepEP) + - Expert-parallel communication kernels for MoE (``compiled`` only — see below) + * - ``onnx-ir`` + ``onnxscript`` + - Pinned ONNX export tooling + +Choose the variant that matches your GPU (the two are mutually exclusive): + +* ``compiled`` — Hopper/Blackwell and newer (SM90/SM100/SM120, e.g. H100/H200/B200). Includes DeepEP. +* ``compiled-a100`` — Ampere A100 (SM80). Omits DeepEP, which requires a separately-built, patched version on A100; our Dockerfile auto-builds and installs it when the CUDA 12 base image is selected. + +.. warning:: + + These packages **build from source** and need a full CUDA build environment — build tools, matching ``TORCH_CUDA_ARCH_LIST`` / ``NVTE_CUDA_ARCHS`` flags, ``--no-build-isolation``, and (for ``compiled``) extra manual build steps that the Dockerfile performs (e.g. flash-attn-4 and DeepEP patches). The supported, reproducible way to get them is the container build, which sets all of this up for you: + + .. code-block:: bash + + # Hopper/Blackwell (default GPU_TARGET=h100plus → compiled) + docker buildx build -f docker/Dockerfile -t nemo-speech . + + # Ampere A100 (GPU_TARGET=a100 → compiled-a100) + docker buildx build -f docker/Dockerfile \ + --build-arg BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \ + --build-arg GPU_TARGET=a100 -t nemo-speech . + + A bare ``uv sync --extra all --extra cu13 --extra compiled`` outside this environment will likely fail to compile. 
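The guide uses GPU architecture to choose between the two optional extras: SM80 (A100) gets ``compiled-a100``, and SM90/SM100/SM120 and newer get ``compiled``. It also maps each extra to the Dockerfile's ``GPU_TARGET`` value. The sketch below encodes that rule from a CUDA compute capability. ``compiled_extra_for`` and ``docker_gpu_target`` are hypothetical helpers and are not part of NeMo. Returning ``None`` for architectures the guide does not list (for example SM86 or SM89) is an assumption. Automodel still runs without the extras on those GPUs.

```python
from typing import Optional

# (major, minor) CUDA compute capability -> optional extra, per the install guide:
#   SM80 (A100)                 -> "compiled-a100"  (Docker GPU_TARGET=a100)
#   SM90/SM100/SM120 and newer  -> "compiled"       (Docker GPU_TARGET=h100plus)
GPU_TARGETS = {"compiled": "h100plus", "compiled-a100": "a100"}


def compiled_extra_for(capability: tuple[int, int]) -> Optional[str]:
    major, minor = capability
    if major >= 9:
        return "compiled"
    if (major, minor) == (8, 0):
        return "compiled-a100"
    # Assumption: architectures the guide does not list skip the optional kernels;
    # the Automodel backend still runs without them.
    return None


def docker_gpu_target(extra: Optional[str]) -> Optional[str]:
    return GPU_TARGETS.get(extra) if extra else None


if __name__ == "__main__":
    capability = None
    try:
        import torch

        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability()  # e.g. (9, 0) on H100
    except ImportError:
        pass
    extra = compiled_extra_for(capability) if capability else None
    print(f"extra={extra}, GPU_TARGET={docker_gpu_target(extra)}")
```

The printed ``GPU_TARGET`` is the value you would pass as ``--build-arg GPU_TARGET=...`` to the container build. The container build remains the supported way to actually compile these extras.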
+ +Using Docker (turnkey, our supported stack) +-------------------------------------------- -For the latest development version or if you plan to contribute, clone the repository and install in editable mode. +.. note:: -The ``test`` extra pulls in **pytest and tooling for the test suite**. It does **not** install NeMo collection dependencies (ASR, TTS, audio, etc.). Add those extras explicitly or imports like ``nemo.collections.asr`` will fail. + **NGC container:** *Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here.* + +To build the container from source, use the provided ``docker/Dockerfile`` (CUDA 13 / H100+ by default): .. code-block:: bash - git clone https://github.com/NVIDIA/NeMo.git + git clone https://github.com/NVIDIA-NeMo/NeMo.git cd NeMo + docker buildx build -f docker/Dockerfile -t nemo-speech . # CUDA 13 / H100+ (default) + docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash + +For A100, set ``GPU_TARGET=a100``. A100 works with **both CUDA 12 and CUDA 13** — CUDA 13 (the default base image) is recommended; the CUDA 12 base is offered only as a convenience: - # After PyTorch is installed (see Recommended installation order above): - # Collections you need for development (required for nemo.collections.* imports) - pip install -e '.[asr,tts]' +.. code-block:: bash - # Optional: add test to run pytest with NeMo’s dev test dependencies - # pip install -e '.[asr,tts,test]' + # A100 on CUDA 13 (recommended) — uses the default CUDA 13 base image + docker buildx build -f docker/Dockerfile --build-arg GPU_TARGET=a100 -t nemo-speech:a100 . -Using Docker ------------- + # A100 on CUDA 12 (convenience) + docker buildx build -f docker/Dockerfile \ + --build-arg BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \ + --build-arg GPU_TARGET=a100 -t nemo-speech:a100-cu12 . + +See the header of ``docker/Dockerfile`` for all build arguments (``BASE_IMAGE``, ``GPU_TARGET``). + +.. 
_install-from-pypi: + +Install from PyPI with pip (fallback — bring your own versions) +--------------------------------------------------------------- + +Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.7 for your CPU/CUDA/etc. target — see `PyTorch's install matrix `_), then add NeMo. Your pre-installed PyTorch is kept, not replaced. ``uv pip`` (uv's fast, pip-compatible installer) works just like ``pip``: + +.. code-block:: bash + + uv venv --python 3.12 # any Python >= 3.12 your PyTorch supports — or use your own env + source .venv/bin/activate + + # 1) Your choice of PyTorch (example: CUDA 12.6 build). Skip if you already have one. + uv pip install torch --index-url https://download.pytorch.org/whl/cu126 + + # 2) NeMo — your PyTorch above is kept (plain `pip install` works identically) + uv pip install 'nemo-toolkit[asr,tts]' # also: [asr,tts,audio], [speechlm2], etc. -NVIDIA provides Docker containers with NeMo pre-installed. Check the `NeMo GitHub releases `_ for the latest container tags. +.. warning:: + + Do **not** use ``uv sync --locked`` for a bring-your-own stack — it intentionally applies ``uv.lock`` and replaces your Python/PyTorch/CUDA with the supported container baseline. Use ``uv pip`` (or ``pip``) here; reserve ``uv sync --locked`` for reproducing the supported stack (above). + +To instead have the installer pull *our* pinned PyTorch build, add the matching CUDA extra **and** the PyTorch wheel index (``pip`` / ``uv pip`` do not read uv's project index config, so ``--extra-index-url`` is required): + +.. code-block:: bash + + pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132 # CUDA 13.x + pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126 # CUDA 12.x + +.. tip:: + + Prefer a conda environment? 
Create and activate one (``conda create -n nemo python=3.12 -y && conda activate nemo``), then run the same ``uv`` or ``pip`` commands above inside it. NeMo Speech does not require a separate conda CUDA toolkit. Verify Installation ------------------- -After installing, verify that NeMo is working: +After installing, verify that the chosen collection imports: + +.. code-block:: bash + + python -c "import nemo.collections.asr as nemo_asr; print('NeMo ASR installed')" + +If you installed with ``uv sync`` and have not activated ``.venv``, run the check through ``uv run python``. To also exercise a model download: .. code-block:: python import nemo.collections.asr as nemo_asr - print("NeMo ASR installed successfully!") - - # Quick test: load a pretrained model model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2") - print(f"Model loaded: {model.__class__.__name__}") + print(f"Loaded: {model.__class__.__name__}") What's Next? ------------ diff --git a/docs/source/tools/nemo_forced_aligner.rst b/docs/source/tools/nemo_forced_aligner.rst index d8f89c70447f..2ad87f0dd33a 100644 --- a/docs/source/tools/nemo_forced_aligner.rst +++ b/docs/source/tools/nemo_forced_aligner.rst @@ -19,7 +19,7 @@ Demos & Tutorials Quickstart ---------- -1. Install `NeMo `__. +1. Install NeMo with the ASR collection. See :ref:`installation`. 2. Prepare a NeMo-style manifest containing the paths of audio files you would like to proces, and (optionally) their text. 3. Run NFA's ``align.py`` script with the desired config, e.g.: diff --git a/docs/source/tts/g2p.rst b/docs/source/tts/g2p.rst index 491095b88732..71f61139290d 100644 --- a/docs/source/tts/g2p.rst +++ b/docs/source/tts/g2p.rst @@ -126,7 +126,7 @@ Using this unknown token forces a G2P model to produce the same masking token as Requirements ------------ -G2P requires the NeMo ASR collection to be installed (``pip install nemo_toolkit[asr]``). +G2P requires the NeMo ASR collection to be installed. 
See :ref:`installation` and include the ``asr`` extra. References diff --git a/docs/source/tts/magpietts-finetuning.rst b/docs/source/tts/magpietts-finetuning.rst index 4c58da1ae990..50f5da384a55 100644 --- a/docs/source/tts/magpietts-finetuning.rst +++ b/docs/source/tts/magpietts-finetuning.rst @@ -20,7 +20,7 @@ Before finetuning, you will need: - A pretrained Magpie-TTS checkpoint (``pretrained.ckpt`` or ``pretrained.nemo``). Public checkpoints (``https://huggingface.co/nvidia/magpie_tts_multilingual_357m``) are available on Hugging Face. - The audio codec model (``https://huggingface.co/nvidia/nemo-nano-codec-22khz-1.89kbps-21.5fps``), available on Hugging Face alongside the TTS checkpoint. - A prepared dataset. For faster finetuning audio codec tokens must be pre-extracted from your audio files. See the *Dataset Preparation* section below. -- NeMo installed from source or via the NeMo container. See the `NeMo GitHub page `_ for installation instructions. +- NeMo installed from source or with the local Dockerfile. See :ref:`installation` for installation instructions. 
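G2P needs the ``asr`` extra, and Magpie-TTS finetuning needs a NeMo install with TTS support. You can confirm that the right collections were installed by running the same import the guide's *Verify Installation* step uses, once per collection. The sketch below assumes that a missing extra shows up as an ``ImportError``. ``module_available`` and ``collection_status`` are hypothetical helpers and are not NeMo APIs.

```python
import importlib

# Collection extras named in the install guide; adjust to the ones you installed.
COLLECTIONS = ("asr", "tts", "audio", "speechlm2")


def module_available(module: str) -> bool:
    """Import `module`; any ImportError (including a missing dependency inside it) counts as unavailable."""
    try:
        importlib.import_module(module)
    except ImportError:
        return False
    return True


def collection_status(names=COLLECTIONS) -> dict[str, bool]:
    return {name: module_available(f"nemo.collections.{name}") for name in names}


if __name__ == "__main__":
    for name, ok in collection_status(("asr", "tts")).items():
        hint = "ok" if ok else f"missing: reinstall with the '{name}' extra"
        print(f"nemo.collections.{name}: {hint}")
```

The check performs a real import, not just a file lookup. This matters because a source checkout contains every collection's code even when that collection's extra dependencies are absent.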
Dataset Preparation diff --git a/nemo/agents/voice_agent/pipecat/services/nemo/stt.py b/nemo/agents/voice_agent/pipecat/services/nemo/stt.py index bb048f50805e..69111ec4e8d0 100644 --- a/nemo/agents/voice_agent/pipecat/services/nemo/stt.py +++ b/nemo/agents/voice_agent/pipecat/services/nemo/stt.py @@ -50,7 +50,7 @@ except ModuleNotFoundError as e: logger.error(f"Exception: {e}") - logger.error('In order to use NVIDIA NeMo STT, you need to `pip install "nemo_toolkit[all]"`.') + logger.error('In order to use NVIDIA NeMo STT, you need to `pip install "nemo-toolkit[all]"`.') raise Exception(f"Missing module: {e}") diff --git a/nemo/collections/speechlm2/vllm/salm/audio.py b/nemo/collections/speechlm2/vllm/salm/audio.py index c892364c0a56..6e7ff0c38f86 100644 --- a/nemo/collections/speechlm2/vllm/salm/audio.py +++ b/nemo/collections/speechlm2/vllm/salm/audio.py @@ -95,7 +95,7 @@ def _load_nemo_perception(perception_cfg: dict) -> nn.Module: from nemo.collections.speechlm2.modules import AudioPerceptionModule except ImportError as e: raise ImportError( - "NeMo is required for the audio encoder. " "Install with: pip install nemo_toolkit[asr]" + "NeMo is required for the audio encoder. " "Install with: pip install 'nemo-toolkit[asr]'" ) from e cfg = DictConfig(perception_cfg) diff --git a/nemo/collections/speechlm2/vllm/salm/model.py b/nemo/collections/speechlm2/vllm/salm/model.py index cffc19a4977d..e3da43cb6986 100644 --- a/nemo/collections/speechlm2/vllm/salm/model.py +++ b/nemo/collections/speechlm2/vllm/salm/model.py @@ -28,7 +28,7 @@ granite-4.0-micro escape hatch). 
Requires NeMo toolkit for the audio encoder: - pip install nemo_toolkit[asr] + pip install 'nemo-toolkit[asr]' """ from collections.abc import Iterable diff --git a/nemo/core/config/templates/model_card.py b/nemo/core/config/templates/model_card.py index 80b09a2d37cf..3de051b3845e 100644 --- a/nemo/core/config/templates/model_card.py +++ b/nemo/core/config/templates/model_card.py @@ -36,9 +36,9 @@ ## NVIDIA NeMo: Training -To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version. +To train, fine-tune, or experiment with the model, install the PyTorch build for your platform first, then install [NVIDIA NeMo](https://docs.nvidia.com/nemo/speech/nightly/starthere/install.html) with the extras you need. ``` -pip install nemo_toolkit['all'] +pip install 'nemo-toolkit[all]' ``` ## How to Use this Model diff --git a/nemo/package_info.py b/nemo/package_info.py index 6f097f59bca1..1fff85f30991 100644 --- a/nemo/package_info.py +++ b/nemo/package_info.py @@ -28,8 +28,8 @@ __contact_names__ = "NVIDIA" __contact_emails__ = "nemo-toolkit@nvidia.com" __homepage__ = "https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/" -__repository_url__ = "https://github.com/nvidia/nemo" -__download_url__ = "https://github.com/NVIDIA/NeMo/releases" +__repository_url__ = "https://github.com/NVIDIA-NeMo/NeMo" +__download_url__ = "https://github.com/NVIDIA-NeMo/NeMo/releases" __description__ = "NeMo - a toolkit for Conversational AI" __license__ = "Apache2" __keywords__ = "deep learning, machine learning, gpu, NLP, NeMo, nvidia, pytorch, torch, tts, speech, language" diff --git a/pyproject.toml b/pyproject.toml index 33d69147a674..99e702d903cf 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -355,8 +355,8 @@ py-modules = ["nemo"] nemo_speechlm = "nemo.collections.speechlm2.vllm.salm:register" [project.urls] -Download = 
"https://github.com/NVIDIA/NeMo/releases" -Homepage = "https://github.com/nvidia/nemo" +Download = "https://github.com/NVIDIA-NeMo/NeMo/releases" +Homepage = "https://github.com/NVIDIA-NeMo/NeMo" [tool.isort] profile = "black" # black-compatible diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md index fc06e979a3a4..6a0de6fc0908 100644 --- a/tools/nemo_forced_aligner/README.md +++ b/tools/nemo_forced_aligner/README.md @@ -12,7 +12,7 @@ NFA is a tool for generating token-, word- and segment-level timestamps of speec ## Quickstart -1. Install [NeMo](https://github.com/NVIDIA/NeMo#installation). +1. Install [NeMo](https://docs.nvidia.com/nemo/speech/nightly/starthere/install.html) with the ASR collection. 2. Prepare a NeMo-style manifest containing the paths of audio files you would like to process, and (optionally) their text. 3. Run NFA's `align.py` script with the desired config, e.g.: ``` bash diff --git a/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py index b38c40d68f90..8956adee8738 100644 --- a/tools/nemo_forced_aligner/align.py +++ b/tools/nemo_forced_aligner/align.py @@ -48,9 +48,9 @@ raise ImportError( "Missing required dependency for NFA. " "Install NeMo with NFA utilities support:\n" - " pip install 'nemo_toolkit[all]>=2.5.0'\n" + " pip install 'nemo-toolkit[all]>=2.5.0'\n" "Or install the latest development version:\n" - " pip install git+https://github.com/NVIDIA/NeMo.git" + " pip install git+https://github.com/NVIDIA-NeMo/NeMo.git" ) """ Align the utterances in manifest_filepath. diff --git a/tools/nemo_forced_aligner/align_eou.py b/tools/nemo_forced_aligner/align_eou.py index f40fa7eadaec..f851bee08ed9 100644 --- a/tools/nemo_forced_aligner/align_eou.py +++ b/tools/nemo_forced_aligner/align_eou.py @@ -53,9 +53,9 @@ raise ImportError( "Missing required dependency for NFA. 
" "Install NeMo with NFA utilities support:\n" - " pip install 'nemo_toolkit[all]>=2.5.0'\n" + " pip install 'nemo-toolkit[all]>=2.5.0'\n" "Or install the latest development version:\n" - " pip install git+https://github.com/NVIDIA/NeMo.git" + " pip install git+https://github.com/NVIDIA-NeMo/NeMo.git" ) """ diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt index 9daa6d2f2496..cd21068986ef 100644 --- a/tools/nemo_forced_aligner/requirements.txt +++ b/tools/nemo_forced_aligner/requirements.txt @@ -1,3 +1,3 @@ -nemo_toolkit[all] +nemo-toolkit[all] prettyprinter # for testing pytest # for testing
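The requirement strings above change from `nemo_toolkit[...]` to `nemo-toolkit[...]`, and the extras are now quoted. Both spellings name the same PyPI project, because installers compare project names after PEP 503 normalization. The rename therefore improves consistency without changing what gets installed. The quotes do change behavior in some shells: zsh treats an unquoted `[...]` as a glob pattern and fails with "no matches found". The sketch below reproduces the PEP 503 rule using only the standard library. `normalize` and `requirement_name` are illustrative helpers. The maintained implementation is `packaging.utils.canonicalize_name`.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: each run of '-', '_' or '.' becomes one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str:
    """Normalized project name from a simple requirement such as 'nemo_toolkit[all]>=2.5.0'."""
    m = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if m is None:
        raise ValueError(f"no project name in {req!r}")
    return normalize(m.group(1))


if __name__ == "__main__":
    for req in ("nemo_toolkit[all]>=2.5.0", "nemo-toolkit[all]>=2.5.0", "NeMo.Toolkit[asr]"):
        print(f"{req!r} -> {requirement_name(req)}")
```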