[Compile] accelerate compilation speed using NVRTC #18519

Kathryn-cat · 2025-11-27T19:45:47Z

This PR adds NVRTC as an alternative to NVCC for faster JIT compilation of CUDA device code, in line with apache/tvm-ffi#283.

It enhances the CUDA compilation backend by:

- Adding Python NVRTC support via the cuda-python bindings
- Removing the legacy C++ NVRTC fallback in favor of a Python-first approach
- Keeping NVCC as the default compiler, with fatbin output (no behavior change for existing users)
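The PR text does not show the Python-side NVRTC code itself. Below is a minimal sketch of what a Python-first NVRTC path can look like with the cuda-python NVRTC bindings, compiling a CUDA source string to CUBIN in-process. `build_nvrtc_options` and `compile_to_cubin` are hypothetical names, not TVM APIs. The `sm_80` default is an assumption. `--use_fast_math` is on by default on the assumption (suggested by the later "add fast math" commit) that it keeps NVRTC code generation comparable to nvcc.

```python
from typing import Iterable, List

try:  # newer cuda-python releases expose the bindings under cuda.bindings
    from cuda.bindings import nvrtc
except ImportError:
    try:  # older cuda-python layout
        from cuda import nvrtc
    except ImportError:
        nvrtc = None  # cuda-python not installed: callers should fall back to nvcc


def build_nvrtc_options(
    arch: str, include_dirs: Iterable[str] = (), use_fast_math: bool = True
) -> List[bytes]:
    """Assemble NVRTC options as the bytes objects the bindings expect."""
    opts = [f"--gpu-architecture={arch}".encode()]
    opts += [f"--include-path={d}".encode() for d in include_dirs]
    if use_fast_math:  # assumption: mirror nvcc's fast-math setting
        opts.append(b"--use_fast_math")
    return opts


def _check(err) -> None:
    """Raise if an NVRTC binding call did not return NVRTC_SUCCESS."""
    if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
        raise RuntimeError(f"NVRTC call failed: {err}")


def compile_to_cubin(
    source: str, arch: str = "sm_80", include_dirs: Iterable[str] = ()
) -> bytes:
    """Hypothetical helper: compile CUDA C++ device code to CUBIN without nvcc."""
    if nvrtc is None:
        raise RuntimeError("cuda-python is not installed; use nvcc instead")
    err, prog = nvrtc.nvrtcCreateProgram(source.encode(), b"kernel.cu", 0, [], [])
    _check(err)
    try:
        opts = build_nvrtc_options(arch, include_dirs)
        (err,) = nvrtc.nvrtcCompileProgram(prog, len(opts), opts)
        if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
            # Surface the compiler log so errors are as readable as nvcc's.
            _, log_size = nvrtc.nvrtcGetProgramLogSize(prog)
            log = b" " * log_size
            nvrtc.nvrtcGetProgramLog(prog, log)
            raise RuntimeError("NVRTC compilation failed:\n" + log.decode(errors="replace"))
        # A real architecture (sm_XX) lets NVRTC emit CUBIN directly.
        err, size = nvrtc.nvrtcGetCUBINSize(prog)
        _check(err)
        cubin = b" " * size  # the binding fills this buffer in place
        (err,) = nvrtc.nvrtcGetCUBIN(prog, cubin)
        _check(err)
        return cubin
    finally:
        nvrtc.nvrtcDestroyProgram(prog)
```

Compiling in-process avoids spawning nvcc and its sub-tools, which is a plausible contributor to the speedups reported below. The PR itself does not break down where the time goes.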

Users can select the compilation backend with the TVM_CUDA_COMPILE_MODE environment variable, set to either "nvcc" or "nvrtc". For example:

TVM_CUDA_COMPILE_MODE=nvrtc python3 your_program.py
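The PR text does not show how TVM parses this variable. A minimal sketch of the selection logic, assuming that an unset or empty variable means the default `nvcc`, that values are matched case-insensitively, and that anything else is rejected. `get_cuda_compile_mode` is a hypothetical name, not TVM's actual function.

```python
import os

VALID_MODES = ("nvcc", "nvrtc")


def get_cuda_compile_mode(default: str = "nvcc") -> str:
    """Hypothetical helper: return the backend chosen via TVM_CUDA_COMPILE_MODE.

    Assumptions: unset/empty -> default, case-insensitive match,
    unknown values raise instead of being silently ignored.
    """
    mode = os.environ.get("TVM_CUDA_COMPILE_MODE", "").strip().lower() or default
    if mode not in VALID_MODES:
        raise ValueError(
            f"TVM_CUDA_COMPILE_MODE must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode
```

Rejecting unknown values early turns a typo such as `nvrct` into an immediate error rather than a silent run on the default backend.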

Here is a short benchmark of the compilation speed of kernels in test_target_codegen_cuda.py.

NVCC vs NVRTC Compilation Time Comparison (Python-side Call)

| Test Case | Code Size (bytes) | NVCC Time (ms) | NVRTC Time (ms) | Speedup |
|---|---|---|---|---|
| `test_crossthread_reduction1` | 1945 | 241.27 | 51.23 | 4.7x |
| `test_cuda_bf16_vectorize_add` | 3760 | 342.72 | 44.50 | 7.7x |
| `test_cuda_const_float_to_half` | 12394 | 272.85 | 31.99 | 8.5x |
| `test_cuda_device_func_call` | 975 | 215.58 | 21.47 | 10.0x |
| `test_cuda_float_const_hex_format` | 685 | 217.39 | 20.52 | 10.6x |
| `test_cuda_floordiv_with_vectorization` | 1050 | 213.88 | 23.32 | 9.2x |
| `test_cuda_inf_nan` | 673 | 214.33 | 24.94 | 8.6x |
| `test_cuda_tensormap` | 755 | 213.91 | 20.74 | 10.3x |
| `test_cuda_thread_sync_inside_condition` | 1007 | 213.43 | 28.29 | 7.5x |
| `test_cuda_vectorize_add` | 908 | 226.81 | 40.39 | 5.6x |
| `test_cuda_vectorize_load` | 734 | 217.25 | 24.02 | 9.0x |
| `test_device_host_call_same_func` | 924 | 216.03 | 21.21 | 10.2x |
| `test_vectorized_intrin1` | 847 | 226.15 | 26.34 | 8.6x |
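The Speedup column is NVCC time divided by NVRTC time. A small script that uses only the numbers from the benchmark table, recomputes that ratio, and adds a geometric-mean summary (a summary statistic the PR itself does not report):

```python
from statistics import geometric_mean

# (test case, NVCC ms, NVRTC ms, reported speedup), copied from the benchmark table.
RESULTS = [
    ("test_crossthread_reduction1", 241.27, 51.23, 4.7),
    ("test_cuda_bf16_vectorize_add", 342.72, 44.50, 7.7),
    ("test_cuda_const_float_to_half", 272.85, 31.99, 8.5),
    ("test_cuda_device_func_call", 215.58, 21.47, 10.0),
    ("test_cuda_float_const_hex_format", 217.39, 20.52, 10.6),
    ("test_cuda_floordiv_with_vectorization", 213.88, 23.32, 9.2),
    ("test_cuda_inf_nan", 214.33, 24.94, 8.6),
    ("test_cuda_tensormap", 213.91, 20.74, 10.3),
    ("test_cuda_thread_sync_inside_condition", 213.43, 28.29, 7.5),
    ("test_cuda_vectorize_add", 226.81, 40.39, 5.6),
    ("test_cuda_vectorize_load", 217.25, 24.02, 9.0),
    ("test_device_host_call_same_func", 216.03, 21.21, 10.2),
    ("test_vectorized_intrin1", 226.15, 26.34, 8.6),
]


def speedups(rows):
    """Speedup = NVCC time / NVRTC time, rounded to one decimal like the table."""
    return {name: round(nvcc / nvrtc, 1) for name, nvcc, nvrtc, _ in rows}


if __name__ == "__main__":
    computed = speedups(RESULTS)
    for name, _, _, reported in RESULTS:
        print(f"{name:40s} computed {computed[name]:5.1f}x  reported {reported:5.1f}x")
    # Geometric mean is the usual way to average ratios such as speedups.
    ratios = [nvcc / nvrtc for _, nvcc, nvrtc, _ in RESULTS]
    print(f"geometric-mean speedup: {geometric_mean(ratios):.1f}x")
```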

NVSHMEM Support

Currently, NVSHMEM is not supported via NVRTC.

- Fallback behavior: when NVSHMEM is required, the compilation pipeline automatically falls back to NVCC, even if TVM_CUDA_COMPILE_MODE is set to nvrtc.
- Future roadmap: NVRTC support for NVSHMEM is planned for follow-up PRs.
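The fallback rule can be expressed as a small dispatch function. This is a sketch under stated assumptions: `needs_nvshmem` is a hypothetical heuristic that simply looks for "nvshmem" in the generated source (the PR does not say how the requirement is detected), and `effective_backend` is a hypothetical name, not TVM's API.

```python
import os


def needs_nvshmem(cuda_source: str) -> bool:
    """Hypothetical heuristic: any NVSHMEM header or API use counts as a requirement."""
    return "nvshmem" in cuda_source


def effective_backend(cuda_source: str) -> str:
    """Return the compiler that will actually be used for this source."""
    requested = os.environ.get("TVM_CUDA_COMPILE_MODE", "nvcc").strip().lower() or "nvcc"
    if requested not in ("nvcc", "nvrtc"):
        raise ValueError(f"unsupported TVM_CUDA_COMPILE_MODE: {requested!r}")
    if requested == "nvrtc" and needs_nvshmem(cuda_source):
        # NVSHMEM is not supported via NVRTC yet, so use nvcc for this module.
        return "nvcc"
    return requested
```

Deciding per source module, rather than once per process, lets NVSHMEM kernels fall back to NVCC while other kernels in the same program still use NVRTC.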

yzh119 · 2025-12-05T08:29:06Z

python/tvm/contrib/nvcc.py

+    Environment Variables
+    ---------------------
+    TVM_CUDA_COMPILE_MODE : str
+        Compiler backend: "nvcc" (default) or "nvrtc"


why not default to nvrtc?

I think we should cross-check the speed difference; once it is confirmed, we can switch the default to nvrtc.
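One way to do the cross-check suggested in the review is to time both backends on identical kernel sources. A minimal harness, assuming `compilers` maps a backend name to any callable that compiles a CUDA source string (for example thin wrappers around the nvcc and NVRTC paths; such wrappers are assumptions, not TVM APIs):

```python
import statistics
import time
from typing import Callable, Dict


def time_backends(
    compilers: Dict[str, Callable[[str], bytes]],
    source: str,
    repeats: int = 5,
    warmup: bool = True,
) -> Dict[str, float]:
    """Median wall-clock compile time in milliseconds for each backend."""
    results = {}
    for name, compile_fn in compilers.items():
        if warmup:
            compile_fn(source)  # exclude one-time costs such as loading libraries
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            compile_fn(source)
            samples.append((time.perf_counter() - start) * 1e3)
        # The median is less sensitive than the mean to an occasional slow run.
        results[name] = statistics.median(samples)
    return results
```

Setting `warmup=False` measures the first-call cost as well, which matters if most programs compile only a handful of kernels.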


Kathryn-cat added 13 commits November 25, 2025 18:30

- init (d63ef67)
- upd (882f7bd)
- upd (c4514c4)
- upd (7e76a59)
- upd (9df7435)
- upd (e8d1657)
- addressed segfault (5601ddb)
- remove host-side deps; unit test passed except CUtensorMap (9040d59)
- fixed int8 tests (568412e)
- CUtensorMap patch (d10e9fe)
- dual compilation problem fixed (a498b1f)
- TVM_CUDA_COMPILE_MODE (5f905e2)
- unit tests (925022f)

Kathryn-cat changed the title from "wip: nvrtc" to "[Compile] accelerate compilation speed using NVRTC" Nov 29, 2025

Kathryn-cat marked this pull request as ready for review November 29, 2025 00:45

Kathryn-cat added 9 commits November 29, 2025 19:03

- remove deps in cmake (073f51e)
- update call site (4008da0)
- gpu ci env (7a28348)
- lint (734bf71)
- skip test if cuda-python is not available (9c36b0b)
- robustify CUDA header files search (c598ba3)
- fix CI (c8969ec)
- fixed nvshmem (6158357)
- nvrtc nvshmem compile (fe4780e)

yzh119 reviewed Dec 5, 2025


Kathryn-cat added 2 commits December 12, 2025 08:09

- remove nvshmem tests (7856b5c)
- fall back to nvcc for nvshmem (143707e)

MasterJH5574 reviewed Dec 12, 2025


python/tvm/contrib/nvcc.py (outdated, resolved)

Kathryn-cat added 2 commits December 12, 2025 08:35

- update error message (2bcbc03)
- lint (b7fb6ca)

Kathryn-cat added 6 commits December 12, 2025 09:30

- lint (e5a9c0e)
- lint (92514c1)
- lint (94f1f56)
- lint (4b11de2)
- lint (4b14e38)
- add fast math to enable perf equal for nvcc and nvrtc (4a92047)

spectrometerHBH approved these changes Jan 8, 2026


spectrometerHBH merged commit fa905d2 into apache:main Jan 8, 2026
10 checks passed

tqchen Dec 5, 2025

5 participants
