Pull requests: NVIDIA/TransformerEngine

Pull requests list

increasing precision tolerance (label: community-contribution, for PRs from external contributors outside the core maintainers, representing community-driven work)
#3060 opened May 29, 2026 by francesco-bertolotti Contributor
4 of 13 tasks
[JAX] Grouped quant+GEMM custom partitioning rules
#3058 opened May 28, 2026 by jberchtold-nvidia Collaborator Draft
13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors. (labels: community-contribution)
#3057 opened May 28, 2026 by plugyawn
7 of 13 tasks
[PyTorch] [torch.compile] torch.compile support for Linear
#3053 opened May 28, 2026 by pggPL Collaborator Draft
13 tasks
[PyTorch] Propagate FP8 graph weight update flag in GroupedLinear (labels: community-contribution)
#3052 opened May 28, 2026 by allenphilipj
Enable NVFP4 fused grouped MLP (labels: community-contribution, org-contribution)
#3048 opened May 27, 2026 by sraman-rgb Contributor
1 of 13 tasks
Feat/selective offload on srelu fuser (labels: community-contribution)
#3047 opened May 27, 2026 by lhb8125 Contributor
13 tasks
Add NVFP4 per-token quantization recipe (labels: community-contribution)
#3045 opened May 26, 2026 by cael-ling Contributor Draft
13 tasks
docs: expand comm gemm overlap guidance (labels: community-contribution)
#3043 opened May 26, 2026 by omribz156
5 of 13 tasks
Use cuDNN for row-scaled NVFP4 grouped GEMM (labels: community-contribution)
#3042 opened May 26, 2026 by zianglih Contributor Draft
[PyTorch Debug] Fix scale_inv_min returning 0 for MXFP8/NVFP4
#3041 opened May 25, 2026 by pggPL Collaborator
6 of 13 tasks
[PyTorch debug] FakeQuant: support Float8BlockScaling and fix MoE / w… (labels: community-contribution)
#3040 opened May 25, 2026 by shangxiaokang Draft
13 tasks
TE_DType in python
#3039 opened May 22, 2026 by vthumbe1503 Collaborator Draft
13 tasks
[fix] Fix CUTLASS grouped GEMM segfault for empty groups (labels: community-contribution)
#3037 opened May 22, 2026 by Baibaifan
[JAX] Expert Parallelism: JAX primitives + VJPs
#3036 opened May 22, 2026 by phu0ngng Collaborator
8 of 13 tasks
Expert Parallelism: common C API + NCCL EP backend
#3034 opened May 22, 2026 by phu0ngng Collaborator
8 of 13 tasks
Add MXFP8 attention unit test with linear and rope layers (labels: community-contribution)
#3033 opened May 22, 2026 by layalir
[Common] Enable NVFP4 2D block scaling in columnwise only
#3027 opened May 21, 2026 by negvet Collaborator
1 of 13 tasks
[PyT] Reduce test sizes in fused attn fp8 vs fp16 to avoid OOM (labels: attention, community-contribution)
#3020 opened May 21, 2026 by vedaanta
1 of 13 tasks