[24.04-linux-nvidia] Backport: Mitigate TLBI errata on various Arm CPUs#463
Open
nvmochs wants to merge 9 commits into
commit bfd9c93 upstream.

The TLBI instruction accepts XZR as a register argument, and for TLBI operations with a register argument, there is no functional difference between using XZR or another GPR which contains zeroes. Operations without a register argument are encoded as if XZR were used.

Allow the __TLBI_1() macro to use XZR when a register argument is all zeroes.

Today this only results in a trivial code saving in __do_compat_cache_op()'s workaround for Neoverse-N1 erratum #1542419. In subsequent patches this pattern will be used more generally.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Mark: Backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit cae400c3ec49d3ac3a5386f783a90230a2cb26fc git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
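The claim that register-less TLBI operations are encoded as if XZR were used can be checked against the A64 SYS instruction encoding that TLBI aliases. The following is a minimal Python model, not kernel code: the helper name `encode_tlbi` and the table `TLBI_OPS` are made up for illustration, and the field values are taken from my reading of the Arm architecture encoding rather than from this series.

```python
# Model of the A64 "SYS #op1, Cn, Cm, #op2, Xt" encoding that TLBI aliases.
# encode_tlbi() and TLBI_OPS are illustrative names, not kernel interfaces.

SYS_BASE = 0xD5080000  # fixed bits of the SYS instruction (op0 = 0b01, L = 0)
XZR = 31               # register number 31 is the zero register in this encoding

# (op1, CRn, CRm, op2) for two Stage-1 EL1 operations
TLBI_OPS = {
    "VMALLE1IS": (0, 8, 3, 0),  # takes no register argument
    "VALE1IS": (0, 8, 3, 5),    # takes a register argument
}

def encode_tlbi(op, rt=XZR):
    """Return the 32-bit instruction word for `TLBI op[, Xrt]`."""
    op1, crn, crm, op2 = TLBI_OPS[op]
    return SYS_BASE | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt

# The register-less form carries Rt = 31, exactly as if XZR had been written.
print(hex(encode_tlbi("VMALLE1IS")))
# A register-taking form can name XZR directly instead of a zeroed GPR.
print(hex(encode_tlbi("VALE1IS", XZR)))
```

In this model the only difference between `TLBI VALE1IS, XZR` and `TLBI VALE1IS, X0` is the low five bits of the instruction word, which is why using XZR for an all-zero argument only saves the instruction that would otherwise zero a register.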
commit a8f7868 upstream.

The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several errata where broadcast TLBI;DSB sequences don't provide all the architecturally required synchronization. The workaround performs more work than necessary, and can have significant overhead. This patch optimizes the workaround, as explained below.

The workaround was originally added for Qualcomm Falkor erratum 1009 in commit:

  d9ff80f ("arm64: Work around Falkor erratum 1009")

As noted in the message for that commit, the workaround is applied even in cases where it is not strictly necessary.

The workaround was later reused without changes for:

* Arm Cortex-A76 erratum #1286807
  SDEN v33: https://developer.arm.com/documentation/SDEN-885749/33-0/

* Arm Cortex-A55 erratum #2441007
  SDEN v16: https://developer.arm.com/documentation/SDEN-859338/1600/

* Arm Cortex-A510 erratum #2441009
  SDEN v19: https://developer.arm.com/documentation/SDEN-1873351/1900/

The important details to note are as follows:

1. All relevant errata only affect the ordering and/or completion of memory accesses which have been translated by an invalidated TLB entry. The actual invalidation of TLB entries is unaffected.

2. The existing workaround is applied to both broadcast and local TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for broadcast invalidation.

3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI sequence, whereas for all relevant errata it is only necessary to execute a single additional TLBI;DSB sequence after any number of TLBIs are completed by a DSB.

   For example, for a sequence of batched TLBIs:

       TLBI <op1>[, <arg1>]
       TLBI <op2>[, <arg2>]
       TLBI <op3>[, <arg3>]
       DSB ISH

   ...
   the existing workaround will expand this to:

       TLBI <op1>[, <arg1>]
       DSB ISH                  // additional
       TLBI <op1>[, <arg1>]     // additional
       TLBI <op2>[, <arg2>]
       DSB ISH                  // additional
       TLBI <op2>[, <arg2>]     // additional
       TLBI <op3>[, <arg3>]
       DSB ISH                  // additional
       TLBI <op3>[, <arg3>]     // additional
       DSB ISH

   ... whereas it is sufficient to have:

       TLBI <op1>[, <arg1>]
       TLBI <op2>[, <arg2>]
       TLBI <op3>[, <arg3>]
       DSB ISH
       TLBI <opX>[, <argX>]     // additional
       DSB ISH                  // additional

   Using a single additional TLBI and DSB at the end of the sequence can have significantly lower overhead as each DSB which completes a TLBI must synchronize with other PEs in the system, with potential performance effects both locally and system-wide.

4. The existing workaround repeats each specific TLBI operation, whereas for all relevant errata it is sufficient for the additional TLBI to use *any* operation which will be broadcast, regardless of which translation regime or stage of translation the operation applies to.

   For example, for a single TLBI:

       TLBI ALLE2IS
       DSB ISH

   ... the existing workaround will expand this to:

       TLBI ALLE2IS
       DSB ISH
       TLBI ALLE2IS             // additional
       DSB ISH                  // additional

   ... whereas it is sufficient to have:

       TLBI ALLE2IS
       DSB ISH
       TLBI VALE1IS, XZR        // additional
       DSB ISH                  // additional

   As the additional TLBI doesn't have to match a specific earlier TLBI, the additional TLBI can be implemented in separate code, with no memory of the earlier TLBIs. The additional TLBI can also use a cheaper TLBI operation.

5. The existing workaround is applied to both Stage-1 and Stage-2 TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for Stage-1 invalidation.

   Architecturally, TLBI operations which invalidate only Stage-2 information (e.g. IPAS2E1IS) are not required to invalidate TLB entries which combine information from Stage-1 and Stage-2 translation table entries, and consequently may not complete memory accesses translated by those combined entries.
   In these cases, completion of memory accesses is only guaranteed after subsequent invalidation of Stage-1 information (e.g. VMALLE1IS).

Taking the above points into account, this patch reworks the workaround logic to reduce overhead:

* New __tlbi_sync_s1ish() and __tlbi_sync_s1ish_hyp() functions are added and used in place of any dsb(ish) which is used to complete broadcast Stage-1 TLB maintenance. When the ARM64_WORKAROUND_REPEAT_TLBI workaround is enabled, these helpers will execute an additional TLBI;DSB sequence.

  For consistency, it might make sense to add __tlbi_sync_*() helpers for local and stage 2 maintenance. For now I've left those with open-coded dsb() to keep the diff small.

* The duplication of TLBIs in __TLBI_0() and __TLBI_1() is removed. This is no longer needed as the necessary synchronization will happen in __tlbi_sync_s1ish() or __tlbi_sync_s1ish_hyp().

* The additional TLBI operation is chosen to have minimal impact:

  - __tlbi_sync_s1ish() uses "TLBI VALE1IS, XZR". This is only used at EL1 or at EL2 with {E2H,TGE}=={1,1}, where it will target an unused entry for the reserved ASID in the kernel's own translation regime, and have no adverse effect.

  - __tlbi_sync_s1ish_hyp() uses "TLBI VALE2IS, XZR". This is only used in hyp code, where it will target an unused entry in the hyp code's TTBR0 mapping, and should have no adverse effect.

* As __TLBI_0() and __TLBI_1() no longer replace each TLBI with a TLBI;DSB;TLBI sequence, batching TLBIs is worthwhile, and there's no need for arch_tlbbatch_should_defer() to consider ARM64_WORKAROUND_REPEAT_TLBI.
When building defconfig with GCC 15.1.0, compared to v6.19-rc1, this patch saves ~1KiB of text, makes the vmlinux ~42KiB smaller, and makes the resulting Image 64KiB smaller:

| [mark@lakrids:~/src/linux]% size vmlinux-*
|     text      data     bss       dec      hex  filename
| 21179831  19660919  708216  41548966  279fca6  vmlinux-after
| 21181075  19660903  708216  41550194  27a0172  vmlinux-before
| [mark@lakrids:~/src/linux]% ls -l vmlinux-*
| -rwxr-xr-x 1 mark mark 157771472 Feb  4 12:05 vmlinux-after
| -rwxr-xr-x 1 mark mark 157815432 Feb  4 12:05 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -l Image-*
| -rw-r--r-- 1 mark mark 41007616 Feb  4 12:05 Image-after
| -rw-r--r-- 1 mark mark 41073152 Feb  4 12:05 Image-before

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Mark: Backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(backported from commit 828d37b78143dd3fed24ea968aa7b328fa47b35f git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
[mochs: Backport to 6.8:
 - kept the local __flush_tlb_range_op(..., lpa2_is_enabled()) calling convention.
 - 6.8 already has the newer __flush_tlb_range_nosync() split, matching the 6.12 stable shape, so left the nosync helper without a final completion barrier and placed __tlbi_sync_s1ish() in the __flush_tlb_range() wrapper.
 - kept KVM_PGTABLE_LAST_LEVEL in nVHE fixmap invalidation and replaced the final dsb(ish) with __tlbi_sync_s1ish_hyp().
 - dropped the old arch_tlbbatch_should_defer() repeat-TLBI guard, as the optimized workaround now performs the extra TLBI at sync points.
]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
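To make the overhead argument in point 3 of this commit message concrete, the two expansions can be modelled as plain instruction lists and counted. This is a sketch in Python, not the kernel implementation: `old_workaround`, `new_workaround` and `count` are hypothetical names, and the strings only mirror the pseudo-assembly used in the commit message.

```python
def old_workaround(ops):
    """Original scheme: each TLBI becomes TLBI; DSB; TLBI, then the final DSB."""
    seq = []
    for op in ops:
        seq += [f"TLBI {op}", "DSB ISH", f"TLBI {op}"]
    seq.append("DSB ISH")
    return seq

def new_workaround(ops, extra="VALE1IS, XZR"):
    """Optimized scheme: batched TLBIs, one DSB, then one extra TLBI; DSB."""
    seq = [f"TLBI {op}" for op in ops]
    seq.append("DSB ISH")
    seq += [f"TLBI {extra}", "DSB ISH"]
    return seq

def count(seq, mnemonic):
    """Count instructions in `seq` that start with `mnemonic`."""
    return sum(1 for insn in seq if insn.startswith(mnemonic))

ops = ["<op1>", "<op2>", "<op3>"]
for name, seq in (("old", old_workaround(ops)), ("new", new_workaround(ops))):
    print(name, "TLBI:", count(seq, "TLBI"), "DSB:", count(seq, "DSB"))
```

For a batch of N operations the model gives 2N TLBIs and N+1 DSBs under the original scheme, against N+1 TLBIs and 2 DSBs under the optimized one, so the number of barriers no longer grows with the batch size.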
commit e185c8a upstream.

Add cpu part and model macro definitions for NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit f88e6e4cbffd46850d740e65735cb388ea3e5760 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
commit 60349e64a6c65f9f0aa118af711b3c7e137f07ff upstream.

Add cputype definitions for C1-Ultra. These will be used for errata detection in subsequent patches.

These values can be found in the C1-Ultra TRM:

  https://developer.arm.com/documentation/108014/0100/

... in section A.5.1 ("MIDR_EL1, Main ID Register").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit 0a7e94477a86ade945304fbf50f63c95a71ddd64 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
commit d28413bfc5a255957241f1df5d7fd0c2cd74fe18 upstream.

Add cputype definitions for C1-Premium. These will be used for errata detection in subsequent patches.

These values can be found in the C1-Premium TRM:

  https://developer.arm.com/documentation/109416/0100/

... in section A.5.1 ("MIDR_EL1, Main ID Register").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit 5169c0dfe9b39e5a1f09445edb183f4a754eba68 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
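The cputype definitions added by these three commits are values built from fields of MIDR_EL1. As a rough illustration of that layout, here is a small Python model; `midr_cpu_model` and `midr_fields` are hypothetical helper names, and the example uses Neoverse N1 (implementer 0x41, part 0xD0C, from my own knowledge of the architecture) rather than the Olympus, C1-Ultra or C1-Premium values, which are deliberately not reproduced here.

```python
def midr_cpu_model(implementer, partnum):
    """Compose a model value: implementer [31:24], architecture [19:16] = 0xF, part [15:4]."""
    return (implementer << 24) | (0xF << 16) | (partnum << 4)

def midr_fields(midr):
    """Split a MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "partnum": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }

ARM_IMPLEMENTER = 0x41    # assumption: illustrative values, not taken from this series
NEOVERSE_N1_PART = 0xD0C

print(hex(midr_cpu_model(ARM_IMPLEMENTER, NEOVERSE_N1_PART)))
print(midr_fields(0x413FD0C1))  # a Neoverse N1 r3p1 style value
```

The variant and revision fields are left as zero in the composed model value, which is what allows errata code to match either every revision of a part or a specific revision range.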
commit cfd391e74134db664feb499d43af286380b10ba8 upstream.

A number of CPUs developed by Arm suffer from errata whereby a broadcast TLBI;DSB sequence may complete before the global observation of writes which are translated by an affected TLB entry.

These errata ONLY affect the completion of memory accesses which have been translated by an invalidated TLB entry, and these errata DO NOT affect the actual invalidation of TLB entries. TLB entries are removed correctly.

This issue has been assigned CVE ID CVE-2025-10263.

To mitigate this issue, Arm recommends that software follows any affected TLBI;DSB sequence with an additional TLBI;DSB, which will ensure that all memory write effects affected by the first TLBI have been globally observed. The additional TLBI can use any operation that is broadcast to affected CPUs, and the additional DSB can use any option that is sufficient to complete the additional TLBI.

The ARM64_WORKAROUND_REPEAT_TLBI workaround is sufficient to mitigate the issue. Enable this workaround for affected CPUs, and update the silicon errata documentation accordingly.

Note that due to the manner in which Arm develops IP and tracks errata, some CPUs share a common erratum number.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit 3a60639b065ecc1604a5c0b36141468b3402a54f git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
commit ec7216f92e4ebd485b1c6dc6aa3f6064b71a5768 upstream.

NVIDIA Olympus cores are affected by the TLBI completion issue tracked as CVE-2025-10263. The existing ARM64_ERRATUM_4118414 handling already uses ARM64_WORKAROUND_REPEAT_TLBI to issue an additional broadcast TLBI;DSB sequence and ensure affected memory write effects are globally observed.

Add MIDR_NVIDIA_OLYMPUS to the repeat-TLBI match list so the same mitigation is enabled on affected Olympus systems.

Also document the NVIDIA Olympus erratum in the arm64 silicon errata table and list it in the Kconfig help text.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from commit b12413cb2e336ff524ae68da1ab851add6b26542 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
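Adding a CPU to the repeat-TLBI match list amounts to one more entry in a set of model values that the running CPU's MIDR is compared against. A minimal Python sketch of that idea follows. It assumes a match that ignores variant and revision (real kernel entries may restrict revision ranges), the name `needs_repeat_tlbi` is hypothetical, and the list entries are made-up placeholders, not real CPU identifiers.

```python
# Mask keeping implementer [31:24], architecture [19:16] and part number [15:4].
MIDR_CPU_MODEL_MASK = 0xFF0FFFF0

# Hypothetical model values for illustration only; NOT real CPU identifiers.
HYPOTHETICAL_MATCH_LIST = {0x7A0F1230, 0x7A0F4560}

def needs_repeat_tlbi(midr, match_list=HYPOTHETICAL_MATCH_LIST):
    """True if the CPU model, ignoring variant and revision, is in the list."""
    return (midr & MIDR_CPU_MODEL_MASK) in match_list

print(needs_repeat_tlbi(0x7A2F1231))  # same model as 0x7A0F1230, at r2p1
print(needs_repeat_tlbi(0x7A0F7890))  # model not present in the list
```

Under this model, enabling the mitigation for a new part is a data change only: the workaround logic itself stays untouched.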
… Cobalt 100 CPU

commit 1940e70a8144bf75e6df26bf6f600862ea7f7ea1 upstream.

Commit fb091ff ("arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata") states that Microsoft Azure Cobalt 100 CPU "is a Microsoft implemented CPU based on r0p0 of the ARM Neoverse N2 CPU, and therefore suffers from all the same errata.".

So enable the workaround for the latest broadcast TLB invalidation bug on these parts.

Signed-off-by: Will Deacon <will@kernel.org>
[Mark: backport to v6.6.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
(cherry picked from commit ffb902a07bc21daf822a2a3e17a8eb58b4192733 git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stable-6.6/arm-4118414/backport)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Enable ARM64_ERRATUM_4118414 to mitigate CVE-2025-10263 on NVIDIA platforms.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Testing details...

Note: Only tested 6.8-LTS on Grace, as we don’t support that kernel on Vera.
BaseOS Kernel Review

Summary

Only one minor issue found across the series: in commit 6287a03, the Kconfig help text for the Azure Cobalt 100 entry cites erratum number 4193789, which doesn't match the config symbol ARM64_ERRATUM_4118414. This is a documentation inconsistency only; the MIDR mapping is correct and there's no runtime impact.

Findings: Critical: 0, High: 0, Medium: 0, Low: 1

Latest watcher review: open review
Kernel deb build: failed (failure log, build artifacts)
Head:

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.
These patches address CVE-2025-10263, an Arm TLBI completion erratum where affected CPUs may complete a broadcast TLBI sequence before all memory accesses translated by the invalidated entry are globally observed. The mitigation enables the existing arm64 repeat-TLBI workaround so affected TLBI sequences are followed by an additional broadcast TLBI/DSB sequence.
For NVIDIA platforms, the series adds the required CPU ID coverage and enables CONFIG_ARM64_ERRATUM_4118414 in the NVIDIA kernel annotations so the mitigation is built for both Grace and Vera platforms. The platform config marks this erratum as required for Grace and Vera enablement.
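One way to confirm that the option ended up enabled in a built kernel is to look for the symbol in the kernel's config file (on Ubuntu this is normally installed as /boot/config-<release>). The snippet below is only a sketch of such a check in Python: `config_enabled` is a hypothetical helper, and the sample text is made up rather than taken from a real build.

```python
def config_enabled(config_text, symbol):
    """Return True if `symbol` is set to y in kernel .config style text."""
    for line in config_text.splitlines():
        if line.strip() == f"{symbol}=y":
            return True
    return False

# Made-up sample standing in for the contents of a kernel config file.
sample = """\
# CONFIG_EXAMPLE_DISABLED is not set
CONFIG_ARM64_ERRATUM_4118414=y
"""

print(config_enabled(sample, "CONFIG_ARM64_ERRATUM_4118414"))
```

To run the same check against a real system, the sample string could be replaced with the text read from the installed config file.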
Verification was performed on both Grace and Vera by booting the patched kernel and confirming:
Upstream LKML:
[PATCH 0/3] arm64: errata: Mitigate TLBI errata on various Arm CPUs - https://lore.kernel.org/all/20260609101203.1512409-1-mark.rutland@arm.com/
[PATCH v1] arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU - https://lore.kernel.org/all/20260609234044.3945938-1-sdonthineni@nvidia.com/
linux-next:
60349e64a6c6 arm64: cputype: Add C1-Ultra definitions
d28413bfc5a2 arm64: cputype: Add C1-Premium definitions
cfd391e74134 arm64: errata: Mitigate TLBI errata on various Arm CPUs
ec7216f92e4e arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU
The patches for this 6.8 PR were picked from the v6.6 branch that Mark Rutland prepared for this erratum. That branch includes two optimization patches that were already present in later kernels.
LKML: https://lore.kernel.org/all/20260611134248.1700496-1-mark.rutland@arm.com/
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=stable-6.6/arm-4118414/backport
Note: For the optimization patch ("arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI"), I borrowed one hunk from Mark's 6.12 backport (https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=stable-6.12/arm-4118414/backport) because this 6.8 branch already contains a refactored version of some of the TLB code that is not present in v6.6.
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-bos/+bug/2156557