Merged
10 changes: 3 additions & 7 deletions CLAUDE.md
@@ -8,13 +8,9 @@ NeMo Speech — toolkit for training/deploying speech models (ASR, TTS, Speech L

## Build & Install

```bash
pip install -e '.[all]' # Full dev install
pip install -e '.[asr]' # ASR only
pip install -e '.[test]' # With test deps
```
See the canonical installation guide — [`docs/source/starthere/install.rst`](docs/source/starthere/install.rst) (published at https://docs.nvidia.com/nemo/speech/nightly/) — for the uv, pip (bring-your-own Python/PyTorch/CUDA), Docker, and optional `compiled` (SpeechLM2/Automodel) install paths.

Requires Python 3.10+, PyTorch 2.6+.
Dev quickstart: `uv sync --extra all --extra cu13` (Python 3.12+, PyTorch 2.7+; `test`/`docs` are `--group`s, not extras).

## Code Style

@@ -49,7 +45,7 @@ Markers: `unit`, `integration`, `system`, `pleasefixme` (broken — skip), `skip
Sphinx-based docs live in `docs/source/`. Build with:

```bash
uv sync --group docs # one-time setup
uv sync --locked --group docs # one-time setup (matches CI)
uv run make -C docs clean html # full rebuild
uv run make -C docs html # incremental rebuild
```
56 changes: 51 additions & 5 deletions README.md
@@ -49,9 +49,13 @@ For technical documentation, please see the

## Requirements

NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:

- Python 3.12 or above
- Pytorch 2.6 or above
- NVIDIA GPU (if you intend to do model training)
- PyTorch 2.7 or above (CPU, CUDA, etc. — your choice)
- NVIDIA GPU + CUDA (required for training; recommended for inference)

If you already have a Python/PyTorch/CUDA stack that satisfies those minimums, NeMo Speech installs on top of it **without replacing it**, so your existing PyTorch build is kept (see the install options below). The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
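
Before installing over an existing stack, it can help to confirm that the stack meets the minimums listed above. The following is a minimal sketch, not part of NeMo: the helper names (`parse_release`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes PyTorch is installed under the distribution name `torch`. It compares only the leading numeric part of a version string, so a pre-release such as `2.7.0a0` is treated as `2.7.0`.

```python
import re
import sys
from importlib import metadata

# Minimums quoted from the Requirements list above.
MIN_PYTHON = (3, 12)
MIN_TORCH = (2, 7)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple) -> bool:
    """True if the numeric release of `version` is at least `minimum`."""
    return parse_release(version) >= minimum


def check_environment() -> list:
    """Collect human-readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} is older than 3.12")
    try:
        # Assumption: the installed PyTorch distribution is named 'torch'.
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, MIN_TORCH):
            problems.append(f"PyTorch {torch_version} is older than 2.7")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Reading the version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and works even when the installed build cannot load its CUDA libraries.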

As of [PyTorch 2.6](https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true),
`torch.load` defaults to using `weights_only=True`. Some model checkpoints may require using `weights_only=False`.
@@ -68,9 +72,51 @@ can have the risk of arbitrary code execution.

## Install NeMo Speech

NeMo Speech is installable via pip: `pip install 'nemo-toolkit[all]'`
To install with extra dependencies for CUDA 12.x or 13.x, use `pip install 'nemo-toolkit[all,cu12]'`
or `pip install 'nemo-toolkit[all,cu13]'` respectively.
The recommended way to install NeMo Speech is from source with [uv](https://docs.astral.sh/uv/), which reproduces our actively-tested stack from the committed `uv.lock`. If you need different Python/PyTorch/CUDA versions, NeMo also installs over your existing environment via pip — see the [pip fallback](#from-pypi-with-pip-fallback--bring-your-own-versions) below.

### From source with uv (recommended)

```bash
git clone https://github.com/NVIDIA-NeMo/NeMo.git
cd NeMo
uv sync --extra all --extra cu13 # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
```

This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default). For the **exact** container baseline, add `--locked --python 3.13` (the path the Dockerfile and CI use).

> **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies. It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details.

### Docker (turnkey, our supported stack)

> **NGC container:** _Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here._

To build the container from source (CUDA 13 / H100+ by default):

```bash
git clone https://github.com/NVIDIA-NeMo/NeMo.git
cd NeMo
docker buildx build -f docker/Dockerfile -t nemo-speech . # CUDA 13 / H100+ (default)
docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
```

For A100, set `GPU_TARGET=a100` — A100 works with **both CUDA 12 and CUDA 13** (CUDA 13, the default base image, is recommended; the CUDA 12 base is a convenience). See the header of [`docker/Dockerfile`](docker/Dockerfile) for all build arguments (`BASE_IMAGE`, `GPU_TARGET`).

### From PyPI with pip (fallback — bring your own versions)

Prefer your own Python/PyTorch/CUDA? Install your PyTorch first (any version ≥ 2.7 for your CPU/CUDA/etc. target — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**. `uv pip` (uv's fast, pip-compatible installer) works like `pip`:

```bash
uv pip install 'nemo-toolkit[asr,tts]' # or plain: pip install 'nemo-toolkit[asr,tts]'
```

> ⚠️ Do **not** use `uv sync --locked` for a bring-your-own stack — it applies `uv.lock` and replaces your Python/PyTorch/CUDA with the supported baseline. Use `uv pip`/`pip` here; reserve `uv sync --locked` for reproducing our stack.

To instead pull *our* pinned PyTorch build, add the CUDA extra and the matching wheel index (pip/uv pip do not read uv's project index config, so `--extra-index-url` is required):

```bash
pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132 # CUDA 13.x
pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126 # CUDA 12.x
```
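
The pairing of CUDA extra and wheel index in the two commands above can be captured in a small lookup, which is handy when a setup script has to choose one at run time. This is a sketch only: `pip_install_args` is a hypothetical helper, and the mapping contains just the two extra/index pairs shown above.

```python
import shlex

# CUDA extra -> PyTorch wheel index, copied from the two pip commands above.
PYTORCH_INDEX = {
    "cu13": "https://download.pytorch.org/whl/cu132",
    "cu12": "https://download.pytorch.org/whl/cu126",
}


def pip_install_args(cuda_extra: str, extras=("asr", "tts")) -> list:
    """Build the pip argument list for one CUDA extra (hypothetical helper)."""
    if cuda_extra not in PYTORCH_INDEX:
        raise ValueError(f"unknown CUDA extra: {cuda_extra!r}")
    requirement = f"nemo-toolkit[{','.join([*extras, cuda_extra])}]"
    return [
        "pip", "install", requirement,
        "--extra-index-url", PYTORCH_INDEX[cuda_extra],
    ]


if __name__ == "__main__":
    # shlex.join quotes the bracketed requirement so the line is shell-safe.
    print(shlex.join(pip_install_args("cu13")))
```

Returning an argument list rather than a single string lets the caller hand it straight to `subprocess.run` without going through a shell.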

## Contribute to NeMo

8 changes: 3 additions & 5 deletions docs/README.md
@@ -2,12 +2,10 @@

## Building the Documentation

1. Create and activate a virtual environment.

1. Install the documentation dependencies:
1. Install the documentation dependencies into the locked `uv` environment:

```console
$ uv sync --group docs
$ uv sync --locked --group docs
```

1. Build the documentation:
@@ -21,7 +19,7 @@
1. Build the documentation, as described in the preceding section, but use the following command:

```shell
make -C docs clean linkcheck
uv run make -C docs clean linkcheck
```

1. Run the link-checking script:
8 changes: 0 additions & 8 deletions docs/source/broken_links_needing_review..json
@@ -6,14 +6,6 @@
"uri": "https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/optimizer.py#L793",
"info": "Anchor 'L793' not found"
}
{
"filename": "tools/nemo_forced_aligner.rst",
"lineno": 22,
"status": "broken",
"code": 0,
"uri": "https://github.com/NVIDIA/NeMo#installation",
"info": "Anchor 'installation' not found"
}
{
"filename": "checkpoints/intro.rst",
"lineno": 28,
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -57,11 +57,11 @@ What is NeMo?
- **Scalable training** — multi-GPU/multi-node via PyTorch Lightning with mixed-precision support
- **Simple configuration** — YAML-based experiment configs with `Hydra <https://hydra.cc/>`__

Get started in 30 seconds:
Get started (install the PyTorch build for your platform first):

.. code-block:: bash
pip install nemo_toolkit[asr,tts]
uv pip install 'nemo-toolkit[asr,tts]'
.. code-block:: python
4 changes: 3 additions & 1 deletion docs/source/speechlm2/intro.rst
@@ -5,7 +5,9 @@ SpeechLM2
The SpeechLM2 collection is still in active development and the code is likely to keep changing.

.. note::
Install with ``pip install nemo-toolkit[speechlm2]`` to get all required dependencies including NeMo Automodel.
Install your chosen compatible PyTorch stack first, then install SpeechLM2 with
``uv pip install 'nemo-toolkit[speechlm2]'`` (or, from a source checkout, ``uv pip install -e '.[speechlm2]'``)
to get all required dependencies including NeMo Automodel. See :ref:`installation` for details.

SpeechLM2 refers to a collection that augments pre-trained Large Language Models (LLMs) with speech understanding and generation capabilities.
