[24.04_linux-nvidia-6.17-next] PCI: mirror PI7C9X3G606GPC Port 4 BAR0/BAR1#442
Conversation
44e1553 to
31881cf
Compare
BaseOS Kernel ReviewSummaryOnly a minor issue found: a misleading comment in drivers/pci/quirks.c where "Tile0/P4" reads like a symbolic device name despite the code using dynamic PCI slot/function numbers. No functional or runtime concerns identified. Findings: Critical: 0, High: 0, Medium: 0, Low: 1 Latest watcher review: open review Kernel deb build: successful (download debs, 4 files) Head: This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review. |
PR Validation ReportPatchscan ✅ No Missing FixesAll cherry-picked commits checked — no missing upstream fixes found. PR Lint ✅ All checks passedDetailsChecking 1 commits... Cherry-pick digest: ┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐ │ Local │ Referenced upstream / Patch subject │ Patch-ID │ Subject │ SoB chain │ ├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤ │ b6da240e14f0 │ [SAUCE] pci: quirks: mirror pi7c9x3g606gpc port 4 bar0 │ N/A │ N/A │ nirmoyd │ └──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘ Lint: all checks passed. |
|
Does this PR need to be applied to the 6.18 reference kernel as well? |
|
Do you have tests (scripts) which can verify this patch set is applied and working? |
Yes, we need that for 6.18 too. I will create a PR for that as well
|
So if the patch is not applied to the kernel that message will not be in the dmesg. We should have a test that verifies that message is in the dmesg or the test should fail that the patch has not been applied, correct? |
Yes, ACK we should have a test may be a greenlit one to check the dmesg to verify the patch. |
31881cf to
5edb468
Compare
5edb468 to
e562a4a
Compare
|
Codex gave me this suggestion but not sure in this environment: NV-Kernels/drivers/pci/quirks.c Lines 6318 to 6320 in e562a4a pci_fixup_resume_early, and may resume the subordinate bus in pci_pm_resume_noirq() before normal resume fixups run: see pci-driver.c:963-L970 ( NV-Kernels/drivers/pci/pci-driver.c Lines 963 to 970 in 42cad5f pci-driver.c:581-L586 ( NV-Kernels/drivers/pci/pci-driver.c Lines 581 to 586 in 42cad5f still stale/zero. I’d move this to DECLARE_PCI_FIXUP_RESUME_EARLY or add an early resume fixup so the mirror is rewritten immediately after pci_restore_state() and before subordinate devices resume. |
e562a4a to
377a0dc
Compare
|
@nirmoy Can you confirm that this is limited to 32-bit bars? Is that because 64-bit bars are not possible in this configuration? |
It make sense to have this. I will update it |
The erratum describes the WAR as copying only BAR0 at offset 0x10. That is sufficient for a 32-bit BAR, but would not be complete for a 64-bit BAR because BAR1 contains the upper address bits. So I am defensively skipping the WAR for 64-bit BARs. The device can be configured with a 64-bit BAR0/BAR1 pair, but on our platform it is configured as a 32-bit BAR. I confirmed that from the BMC readback of the upstream BAR0 value, where the BAR type bits indicate a 32-bit memory BAR. |
Thanks for clarifying. If we end up sending this upstream we may need to relax the checking a bit. |
c2c4176 to
1670e40
Compare
|
I looked at the latest version and agree with change to resume early. I see that the 64-bit bar check was removed. We can do that, but then I think the code needs to properly handle 32 and 64-bit bars being fixed up - I should have been clearer when I said "relax the checking" in my prior comment. Here's what Codex has to say:
|
46a5d4e to
6c7b78e
Compare
|
Codex finding:
Also needs rebase. |
6c7b78e to
48bb4b5
Compare
48bb4b5 to
686413e
Compare
I had this same finding when re-reviewing but dug a bit deeper with Codex. The concern really only applies to the non-boot path...but since the same code is used for both boot and resume, it should be include the update-BAR protection. RESUME_EARLY runs after config restore, so the saved command register may already have memory decoding enabled. Since this patch now claims to support 64-bit BAR0, it should mirror BAR0/BAR1 the same way PCI core updates 64-bit BARs: temporarily clear PCI_COMMAND_MEMORY, write both dwords, then restore the command register. |
Some Pericom/Diodes PI7C9X3G606GPC switches require downstream Port 4 BAR0 to mirror BAR0 of the immediate upstream port. Firmware may apply this during boot, but Linux PCI resource assignment can move the upstream BAR0 and leave Port 4 without the required mirror. Diodes confirmed that Tile0/P4 is OS-visible as device 04, function 0 on the bus below the upstream port. Add a final and early resume quirk for that downstream function. The quirk verifies that the immediate upstream bridge is the same switch, then writes Port 4 BAR0 from the upstream BAR0 after resource assignment and during early resume. If BAR0 is configured as a 64-bit memory BAR, mirror BAR1 as the upper dword as well. Port 4 BAR0 may read back as zero even after a successful write, so the write must be validated by platform-specific means. Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
686413e to
b6da240
Compare
|
Reset validation update: I also tested reset_subordinate. It resets the bus below Port 4, so it did not touch Port 4 BARs or rerun the quirk. Caveat: config space cannot confirm the switch's internal BAR value after reset. That would need the BMC/I2C readback helper, followed by AUX cycle recovery. |
|
Thanks Nirmoy - no further issues from me.
|
|
|
|
|
|
Merged, closing PR. |
Summary
!pdev->mmio_always_on, matching PCI core's non-atomic 64-bit BAR update sequence.Validation
b6da240e14f0481d874da5d76b01e9c511c24ad3.upstream/24.04_linux-nvidia-6.17-nextat5ecb1af86110543dde26a444ababd5999414ce64.git diff --check upstream/24.04_linux-nvidia-6.17-next...HEADscripts/checkpatch.pl --strict --no-signoff --git HEAD~1..HEADreports 0 errors and 0 warnings.make O=/tmp/nv-kernels-pr442-quirks-build defconfigmake O=/tmp/nv-kernels-pr442-quirks-build -j$(nproc) drivers/pci/quirks.onvidia@10.85.119.39and booted DUT10.137.158.190with BMC10.137.159.216available for recovery.BTRFS error, hard NVMe I/O errors, NVMe power-state failures, NVMe timeout/abort, and buffer I/O errors./home/nvidia/nirmoy/pericom-final-validation-20251125T181702Z.References
Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2154457
NVBug: https://nvbugspro.nvidia.com/bug/6205517
NVBug: https://nvbugspro.nvidia.com/bug/6134331
Test artifacts: http://baseos-internal-tools.nvidia.com:8003/