[24.04-linux-nvidia] Backport: Mitigate TLBI errata on various Arm CPUs #463

Open
nvmochs wants to merge 9 commits into NVIDIA:24.04_linux-nvidia from nvmochs:jun2026_tlbi_errata_68

nvmochs commented Jun 13, 2026

Collaborator

These patches address CVE-2025-10263, an Arm TLBI completion erratum where affected CPUs may complete a broadcast TLBI sequence before all memory accesses translated by the invalidated entry are globally observed. The mitigation enables the existing arm64 repeat-TLBI workaround so affected TLBI sequences are followed by an additional broadcast TLBI/DSB sequence.

For NVIDIA platforms, the series adds the required CPU ID coverage and enables CONFIG_ARM64_ERRATUM_4118414 in the NVIDIA kernel annotations so the mitigation is built for both Grace and Vera platforms. The platform config marks this erratum as required for Grace and Vera enablement.

Verification was performed on both Grace and Vera by booting the patched kernel and confirming:

  • CONFIG_ARM64_ERRATUM_4118414=y
  • CONFIG_ARM64_WORKAROUND_REPEAT_TLBI=y
  • kernel log reports the active workaround: CPU features: detected: Broken broadcast TLBI completion
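
The three checks above can be sketched as a small Python helper. This is only an illustration under stated assumptions: it is not the `verify_arm64_erratum_4118414.sh` script used for testing (that script is not shown here), and the function names `check_config`, `check_kernel_log`, and `verify` are hypothetical. The option names and the log message are taken from the list above.

```python
import re

# Option names and log message as listed in the verification steps.
REQUIRED_OPTIONS = (
    "CONFIG_ARM64_ERRATUM_4118414",
    "CONFIG_ARM64_WORKAROUND_REPEAT_TLBI",
)
RUNTIME_MARKER = "CPU features: detected: Broken broadcast TLBI completion"


def check_config(config_text):
    """Map each required option to True only if it is built in (=y)."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return {opt: opt in enabled for opt in REQUIRED_OPTIONS}


def check_kernel_log(log_text):
    """True if the kernel log reports the active workaround."""
    return RUNTIME_MARKER in log_text


def verify(config_text, log_text):
    """Return (overall_pass, per_check_results) for the three checks."""
    results = check_config(config_text)
    results["runtime erratum print"] = check_kernel_log(log_text)
    return all(results.values()), results
```

On a booted system, `config_text` would typically come from the `/boot/config-<release>` file of the running kernel and `log_text` from the output of `dmesg`; both sources are assumptions about the test setup.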

Upstream LKML:
[PATCH 0/3] arm64: errata: Mitigate TLBI errata on various Arm CPUs - https://lore.kernel.org/all/20260609101203.1512409-1-mark.rutland@arm.com/
[PATCH v1] arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU - https://lore.kernel.org/all/20260609234044.3945938-1-sdonthineni@nvidia.com/

linux-next:
60349e64a6c6 arm64: cputype: Add C1-Ultra definitions
d28413bfc5a2 arm64: cputype: Add C1-Premium definitions
cfd391e74134 arm64: errata: Mitigate TLBI errata on various Arm CPUs
ec7216f92e4e arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU


The patches for this 6.8 PR were picked from the v6.6 backport branch that Mark Rutland prepared for this erratum. The branch includes two optimization patches that were already present in later kernels.

LKML: https://lore.kernel.org/all/20260611134248.1700496-1-mark.rutland@arm.com/
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=stable-6.6/arm-4118414/backport

Note: For the optimization patch ("arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI"), I borrowed one hunk from Mark's 6.12 backport (https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=stable-6.12/arm-4118414/backport) because this 6.8 branch already contains a refactored version of some of the TLB code that is not present in v6.6.


LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-bos/+bug/2156557

mrutland-arm and others added 9 commits June 12, 2026 17:38
commit bfd9c93 upstream.

The TLBI instruction accepts XZR as a register argument, and for TLBI
operations with a register argument, there is no functional difference
between using XZR or another GPR which contains zeroes. Operations
without a register argument are encoded as if XZR were used.

Allow the __TLBI_1() macro to use XZR when a register argument is all
zeroes.

Today this only results in a trivial code saving in
__do_compat_cache_op()'s workaround for Neoverse-N1 erratum #1542419. In
subsequent patches this pattern will be used more generally.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Mark: Backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit cae400c3ec49d3ac3a5386f783a90230a2cb26fc git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

commit a8f7868 upstream.

The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several
errata where broadcast TLBI;DSB sequences don't provide all the
architecturally required synchronization. The workaround performs more
work than necessary, and can have significant overhead. This patch
optimizes the workaround, as explained below.

The workaround was originally added for Qualcomm Falkor erratum 1009 in
commit:

  d9ff80f ("arm64: Work around Falkor erratum 1009")

As noted in the message for that commit, the workaround is applied even
in cases where it is not strictly necessary.

The workaround was later reused without changes for:

* Arm Cortex-A76 erratum #1286807
  SDEN v33: https://developer.arm.com/documentation/SDEN-885749/33-0/

* Arm Cortex-A55 erratum #2441007
  SDEN v16: https://developer.arm.com/documentation/SDEN-859338/1600/

* Arm Cortex-A510 erratum #2441009
  SDEN v19: https://developer.arm.com/documentation/SDEN-1873351/1900/

The important details to note are as follows:

1. All relevant errata only affect the ordering and/or completion of
   memory accesses which have been translated by an invalidated TLB
   entry. The actual invalidation of TLB entries is unaffected.

2. The existing workaround is applied to both broadcast and local TLB
   invalidation, whereas for all relevant errata it is only necessary to
   apply a workaround for broadcast invalidation.

3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI
   sequence, whereas for all relevant errata it is only necessary to
   execute a single additional TLBI;DSB sequence after any number of
   TLBIs are completed by a DSB.

   For example, for a sequence of batched TLBIs:

       TLBI <op1>[, <arg1>]
       TLBI <op2>[, <arg2>]
       TLBI <op3>[, <arg3>]
       DSB ISH

   ... the existing workaround will expand this to:

       TLBI <op1>[, <arg1>]
       DSB ISH                  // additional
       TLBI <op1>[, <arg1>]     // additional
       TLBI <op2>[, <arg2>]
       DSB ISH                  // additional
       TLBI <op2>[, <arg2>]     // additional
       TLBI <op3>[, <arg3>]
       DSB ISH                  // additional
       TLBI <op3>[, <arg3>]     // additional
       DSB ISH

   ... whereas it is sufficient to have:

       TLBI <op1>[, <arg1>]
       TLBI <op2>[, <arg2>]
       TLBI <op3>[, <arg3>]
       DSB ISH
       TLBI <opX>[, <argX>]     // additional
       DSB ISH                  // additional

   Using a single additional TLBI and DSB at the end of the sequence can
   have significantly lower overhead as each DSB which completes a TLBI
   must synchronize with other PEs in the system, with potential
   performance effects both locally and system-wide.

4. The existing workaround repeats each specific TLBI operation, whereas
   for all relevant errata it is sufficient for the additional TLBI to
   use *any* operation which will be broadcast, regardless of which
   translation regime or stage of translation the operation applies to.

   For example, for a single TLBI:

       TLBI ALLE2IS
       DSB ISH

   ... the existing workaround will expand this to:

       TLBI ALLE2IS
       DSB ISH
       TLBI ALLE2IS             // additional
       DSB ISH                  // additional

   ... whereas it is sufficient to have:

       TLBI ALLE2IS
       DSB ISH
       TLBI VALE1IS, XZR        // additional
       DSB ISH                  // additional

   As the additional TLBI doesn't have to match a specific earlier TLBI,
   the additional TLBI can be implemented in separate code, with no
   memory of the earlier TLBIs. The additional TLBI can also use a
   cheaper TLBI operation.

5. The existing workaround is applied to both Stage-1 and Stage-2 TLB
   invalidation, whereas for all relevant errata it is only necessary to
   apply a workaround for Stage-1 invalidation.

   Architecturally, TLBI operations which invalidate only Stage-2
   information (e.g. IPAS2E1IS) are not required to invalidate TLB
   entries which combine information from Stage-1 and Stage-2
   translation table entries, and consequently may not complete memory
   accesses translated by those combined entries. In these cases,
   completion of memory accesses is only guaranteed after subsequent
   invalidation of Stage-1 information (e.g. VMALLE1IS).
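
The two expansions described in points 3 and 4 can be compared with a small Python model. This is a sketch for counting instructions only, not kernel code: `expand_existing`, `expand_sufficient`, and `count_dsb` are hypothetical names, and the instruction strings simply mirror the listings above.

```python
def expand_existing(batch):
    """Existing workaround: each TLBI becomes TLBI;DSB;TLBI, then the final DSB."""
    out = []
    for op in batch:
        out += [f"TLBI {op}", "DSB ISH", f"TLBI {op}"]
    out.append("DSB ISH")
    return out


def expand_sufficient(batch, extra_op="VALE1IS, XZR"):
    """Sufficient form: the batch, one DSB, then one extra broadcast TLBI;DSB."""
    out = [f"TLBI {op}" for op in batch]
    out.append("DSB ISH")
    out += [f"TLBI {extra_op}", "DSB ISH"]
    return out


def count_dsb(seq):
    """Number of DSB instructions in an instruction list."""
    return sum(1 for insn in seq if insn.startswith("DSB"))


batch = ["<op1>", "<op2>", "<op3>"]
print(count_dsb(expand_existing(batch)))    # one DSB per TLBI, plus the final one
print(count_dsb(expand_sufficient(batch)))  # always two, whatever the batch size
```

In this model a batch of N TLBIs costs N + 1 DSBs under the existing workaround but only 2 under the sufficient form, which is where the overhead reduction described above comes from.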

Taking the above points into account, this patch reworks the workaround
logic to reduce overhead:

* New __tlbi_sync_s1ish() and __tlbi_sync_s1ish_hyp() functions are
  added and used in place of any dsb(ish) which is used to complete
  broadcast Stage-1 TLB maintenance. When the
  ARM64_WORKAROUND_REPEAT_TLBI workaround is enabled, these helpers will
  execute an additional TLBI;DSB sequence.

  For consistency, it might make sense to add __tlbi_sync_*() helpers
  for local and stage 2 maintenance. For now I've left those with
  open-coded dsb() to keep the diff small.

* The duplication of TLBIs in __TLBI_0() and __TLBI_1() is removed. This
  is no longer needed as the necessary synchronization will happen in
  __tlbi_sync_s1ish() or __tlbi_sync_s1ish_hyp().

* The additional TLBI operation is chosen to have minimal impact:

  - __tlbi_sync_s1ish() uses "TLBI VALE1IS, XZR". This is only used at
    EL1 or at EL2 with {E2H,TGE}=={1,1}, where it will target an unused
    entry for the reserved ASID in the kernel's own translation regime,
    and have no adverse effect.

  - __tlbi_sync_s1ish_hyp() uses "TLBI VALE2IS, XZR". This is only used
    in hyp code, where it will target an unused entry in the hyp code's
    TTBR0 mapping, and should have no adverse effect.

* As __TLBI_0() and __TLBI_1() no longer replace each TLBI with a
  TLBI;DSB;TLBI sequence, batching TLBIs is worthwhile, and there's no
  need for arch_tlbbatch_should_defer() to consider
  ARM64_WORKAROUND_REPEAT_TLBI.
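
The division of labour between the TLBI macros and the new sync helpers might be pictured with the following Python trace recorder. It is a hypothetical model, not the kernel implementation: the class `TlbiTrace` and its flag are invented for illustration, while the method names and the extra operations ("TLBI VALE1IS, XZR" and "TLBI VALE2IS, XZR") follow the description above.

```python
class TlbiTrace:
    """Records the instruction stream that would be issued (illustrative only)."""

    def __init__(self, repeat_tlbi_workaround):
        # Stands in for the ARM64_WORKAROUND_REPEAT_TLBI capability check.
        self.repeat_tlbi_workaround = repeat_tlbi_workaround
        self.insns = []

    def tlbi(self, op, arg=None):
        # Models __TLBI_0()/__TLBI_1() after the patch: no duplication here.
        self.insns.append(f"TLBI {op}" + (f", {arg}" if arg else ""))

    def _sync(self, extra_op):
        # Complete the preceding broadcast TLBIs.
        self.insns.append("DSB ISH")
        if self.repeat_tlbi_workaround:
            # One additional cheap broadcast TLBI;DSB for the whole batch.
            self.insns += [f"TLBI {extra_op}, XZR", "DSB ISH"]

    def sync_s1ish(self):
        """Models __tlbi_sync_s1ish() (EL1, or EL2 with {E2H,TGE}=={1,1})."""
        self._sync("VALE1IS")

    def sync_s1ish_hyp(self):
        """Models __tlbi_sync_s1ish_hyp() (hyp code)."""
        self._sync("VALE2IS")
```

Because the extra TLBI lives only in the sync step, a caller can batch any number of `tlbi()` calls and pay for the workaround once, which is why batching remains worthwhile with the workaround enabled.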

When building defconfig with GCC 15.1.0, compared to v6.19-rc1, this
patch saves ~1KiB of text, makes the vmlinux ~42KiB smaller, and makes
the resulting Image 64KiB smaller:

| [mark@lakrids:~/src/linux]% size vmlinux-*
|     text     data    bss      dec     hex filename
| 21179831 19660919 708216 41548966 279fca6 vmlinux-after
| 21181075 19660903 708216 41550194 27a0172 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -l vmlinux-*
| -rwxr-xr-x 1 mark mark 157771472 Feb  4 12:05 vmlinux-after
| -rwxr-xr-x 1 mark mark 157815432 Feb  4 12:05 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -l Image-*
| -rw-r--r-- 1 mark mark 41007616 Feb  4 12:05 Image-after
| -rw-r--r-- 1 mark mark 41073152 Feb  4 12:05 Image-before
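
The quoted savings can be re-derived from the listings above with a few lines of Python. The variable names are illustrative; the byte counts are copied from the `size` and `ls -l` output.

```python
KIB = 1024

# Values copied from the size/ls output above.
text_before, text_after = 21181075, 21179831
vmlinux_before, vmlinux_after = 157815432, 157771472
image_before, image_after = 41073152, 41007616

text_saved = text_before - text_after            # 1244 bytes
vmlinux_saved = vmlinux_before - vmlinux_after   # 43960 bytes
image_saved = image_before - image_after         # 65536 bytes

# Cross-check: text + data + bss matches the "hex" column of `size`.
assert 21179831 + 19660919 + 708216 == 0x279FCA6
assert 21181075 + 19660903 + 708216 == 0x27A0172

for name, saved in (("text", text_saved), ("vmlinux", vmlinux_saved), ("Image", image_saved)):
    print(f"{name}: {saved} bytes ({saved / KIB:.1f} KiB)")
```

The differences are consistent with the "~1KiB", "~42KiB", and "64KiB" figures in the commit message.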

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Mark: Backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(backported from commit 828d37b78143dd3fed24ea968aa7b328fa47b35f git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
[mochs: Backport to 6.8:
    - kept the local __flush_tlb_range_op(..., lpa2_is_enabled()) calling
      convention.
    - 6.8 already has the newer __flush_tlb_range_nosync() split, matching
      the 6.12 stable shape, so left the nosync helper without a final
      completion barrier and placed __tlbi_sync_s1ish() in the
      __flush_tlb_range() wrapper.
    - kept KVM_PGTABLE_LAST_LEVEL in nVHE fixmap invalidation and replaced
      the final dsb(ish) with __tlbi_sync_s1ish_hyp().
    - dropped the old arch_tlbbatch_should_defer() repeat-TLBI guard, as the
      optimized workaround now performs the extra TLBI at sync points. ]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

commit e185c8a upstream.

Add cpu part and model macro definitions for NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit f88e6e4cbffd46850d740e65735cb388ea3e5760 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

commit 60349e64a6c65f9f0aa118af711b3c7e137f07ff upstream.

Add cputype definitions for C1-Ultra. These will be used for errata
detection in subsequent patches.

These values can be found in the C1-Ultra TRM:

  https://developer.arm.com/documentation/108014/0100/

... in section A.5.1 ("MIDR_EL1, Main ID Register").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit 0a7e94477a86ade945304fbf50f63c95a71ddd64 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

commit d28413bfc5a255957241f1df5d7fd0c2cd74fe18 upstream.

Add cputype definitions for C1-Premium. These will be used for errata
detection in subsequent patches.

These values can be found in the C1-Premium TRM:

  https://developer.arm.com/documentation/109416/0100/

... in section A.5.1 ("MIDR_EL1, Main ID Register").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit 5169c0dfe9b39e5a1f09445edb183f4a754eba68 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

commit cfd391e74134db664feb499d43af286380b10ba8 upstream.

A number of CPUs developed by Arm suffer from errata whereby a broadcast
TLBI;DSB sequence may complete before the global observation of writes
which are translated by an affected TLB entry.

These errata ONLY affect the completion of memory accesses which have
been translated by an invalidated TLB entry, and these errata DO NOT
affect the actual invalidation of TLB entries. TLB entries are removed
correctly.

This issue has been assigned CVE ID CVE-2025-10263.

To mitigate this issue, Arm recommends that software follows any
affected TLBI;DSB sequence with an additional TLBI;DSB, which will
ensure that all memory write effects affected by the first TLBI have
been globally observed. The additional TLBI can use any operation that
is broadcast to affected CPUs, and the additional DSB can use any option
that is sufficient to complete the additional TLBI.

The ARM64_WORKAROUND_REPEAT_TLBI workaround is sufficient to mitigate
the issue. Enable this workaround for affected CPUs, and update the
silicon errata documentation accordingly.

Note that due to the manner in which Arm develops IP and tracks errata,
some CPUs share a common erratum number.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit 3a60639b065ecc1604a5c0b36141468b3402a54f git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

commit ec7216f92e4ebd485b1c6dc6aa3f6064b71a5768 upstream.

NVIDIA Olympus cores are affected by the TLBI completion issue tracked as
CVE-2025-10263. The existing ARM64_ERRATUM_4118414 handling already uses
ARM64_WORKAROUND_REPEAT_TLBI to issue an additional broadcast TLBI;DSB
sequence and ensure affected memory write effects are globally observed.

Add MIDR_NVIDIA_OLYMPUS to the repeat-TLBI match list so the same
mitigation is enabled on affected Olympus systems. Also document the
NVIDIA Olympus erratum in the arm64 silicon errata table and list it in
the Kconfig help text.
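
How a MIDR-based match list selects affected CPUs can be sketched in Python as follows. This is a simplified model under stated assumptions, not the kernel's matching code: the field layout is the architectural MIDR_EL1 layout (implementer in bits [31:24], variant in [23:20], architecture in [19:16], part number in [15:4], revision in [3:0]), the Olympus implementer and part-number values are assumed for illustration, and the names `cpu_model` and `needs_repeat_tlbi` are hypothetical. The real match list contains more entries and may restrict revisions.

```python
# Architectural MIDR_EL1 field positions.
IMPLEMENTER_SHIFT, IMPLEMENTER_MASK = 24, 0xFF
ARCH_SHIFT, ARCH_MASK = 16, 0xF
PARTNUM_SHIFT, PARTNUM_MASK = 4, 0xFFF

# Model match ignores the variant [23:20] and revision [3:0] fields.
MODEL_MASK = (
    (IMPLEMENTER_MASK << IMPLEMENTER_SHIFT)
    | (ARCH_MASK << ARCH_SHIFT)
    | (PARTNUM_MASK << PARTNUM_SHIFT)
)


def cpu_model(implementer, partnum):
    """Build a model value with the architecture field set to 0xF."""
    return (implementer << IMPLEMENTER_SHIFT) | (0xF << ARCH_SHIFT) | (partnum << PARTNUM_SHIFT)


# ASSUMPTION: implementer 0x4E and part 0x010 are placeholders for Olympus.
OLYMPUS = cpu_model(0x4E, 0x010)

# Illustrative subset of the repeat-TLBI match list.
REPEAT_TLBI_MODELS = {OLYMPUS}


def needs_repeat_tlbi(midr):
    """True if the CPU model encoded in midr is in the illustrative list."""
    return (midr & MODEL_MASK) in REPEAT_TLBI_MODELS
```

Matching on the model alone means every variant and revision of a listed part enables the workaround in this sketch.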

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from commit b12413cb2e336ff524ae68da1ab851add6b26542 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

… Cobalt 100 CPU

commit 1940e70a8144bf75e6df26bf6f600862ea7f7ea1 upstream.

Commit fb091ff ("arm64: Subscribe Microsoft Azure Cobalt 100 to ARM
Neoverse N2 errata") states that Microsoft Azure Cobalt 100 CPU "is a
Microsoft implemented CPU based on r0p0 of the ARM Neoverse N2 CPU, and
therefore suffers from all the same errata.".

So enable the workaround for the latest broadcast TLB invalidation bug
on these parts.

Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit ffb902a07bc21daf822a2a3e17a8eb58b4192733 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Enable ARM64_ERRATUM_4118414 to mitigate CVE-2025-10263 on NVIDIA platforms.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

nvmochs commented Jun 13, 2026

Collaborator Author

Testing details...

Note: Only tested 6.8-LTS on Grace as we don’t support that kernel on Vera.

nvidia@gb200-nvl4-47:~/mochs$ sudo ./verify_arm64_erratum_4118414.sh 
INFO: checking config: /boot/config-6.8.12+
PASS: CONFIG_ARM64_ERRATUM_4118414=y
PASS: CONFIG_ARM64_WORKAROUND_REPEAT_TLBI=y
PASS: runtime erratum print found: CPU features: detected: Broken broadcast TLBI completion
RESULT: PASS

nirmoy added the "help wanted" label Jun 13, 2026

nirmoy commented Jun 13, 2026

Collaborator

BaseOS Kernel Review

Summary

Only one minor issue found across the series: in commit 6287a03, the Kconfig help text for the Azure Cobalt 100 entry cites erratum number 4193789, which doesn't match the config symbol ARM64_ERRATUM_4118414. This is a documentation inconsistency only; the MIDR mapping is correct and there's no runtime impact.

Findings: Critical: 0, High: 0, Medium: 0, Low: 1

Latest watcher review: open review

Kernel deb build: failed (failure log, build artifacts)

Head: 5a9e6b0fe3f2

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.
