FROMLIST: misc: fastrpc: fix use-after-free of fastrpc_user in workqueue context #594

Open
quic-anane wants to merge 2 commits into qualcomm-linux:qcom-6.18.y from quic-anane:wq_v4

quic-anane commented on May 18, 2026

Changes in this PR
This PR contains two commits:

1. Revert previous patch.
   - Reverts the earlier fastrpc_user reference counting change.
   - This is done to avoid carrying forward a partially updated implementation and to ensure a clean base.

2. Apply latest patch (v4) from mailing list.
   - Applies the full, updated version of the fix.
   - Incorporates all revisions from earlier versions.
   - Ensures correct ordering of:
     - fastrpc_user_put()
     - fastrpc_channel_ctx_put()
   - Consolidates teardown logic into fastrpc_user_free().
   - Fixes use-after-free scenarios in workqueue and error paths.
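
The reference-counting idea behind the fix can be modelled in a few lines. This is only a toy sketch in Python, not the driver code (which is C and would use the kernel's kref helpers). The description does not say which put comes first, so the sketch assumes the user object holds a reference on the channel context and drops it last, inside its own free function. The `Ref` class and the object names are hypothetical stand-ins named after the PR text.

```python
class Ref:
    """Tiny stand-in for a kernel kref: a counter plus a release callback."""

    def __init__(self, name, release):
        self.name = name
        self.count = 1          # the creator holds the initial reference
        self.release = release
        self.freed = False

    def get(self):
        assert not self.freed, f"use-after-free of {self.name}"
        self.count += 1

    def put(self):
        assert not self.freed, f"put on freed {self.name}"
        self.count -= 1
        if self.count == 0:
            self.freed = True
            self.release()


events = []

# Hypothetical objects named after the PR text; not the driver's real structs.
channel_ctx = Ref("channel_ctx", lambda: events.append("channel_ctx freed"))


def user_free():
    # Assumed consolidated teardown: clean up the user first, then drop
    # the reference the user held on the channel context.
    events.append("user freed")
    channel_ctx.put()


channel_ctx.get()                  # the user pins the channel context
user = Ref("user", user_free)

user.get()                         # a queued work item pins the user


def pending_work():
    events.append("work ran")      # the user is still alive here
    user.put()                     # the work item drops its reference


user.put()                         # the file is closed while work is queued
channel_ctx.put()                  # the channel's creator drops its reference
pending_work()                     # the work runs later and frees everything
```

Because the work item took its own reference, closing the file does not free the user underneath it; the last put triggers the teardown, and the channel context outlives the user that points at it.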

CRs-Fixed: 4502232

quic-anane requested review from a team, idlethread, quic-tingweiz and sgaud-quic on May 18, 2026 21:17
@qswat-orbit-external

Merge Check Failed: CR Not Eligible for Merge

CR 4502232 is not eligible for merge.

The parent software image for kernel.qli.2.0 is not development complete.

Entity: kernel.qli.2.0
CR: 4502232
Reason: CR_CANNOT_MERGE

Please ensure the CR passes both CCT (ComponentChangeTasks) and ICT (Integration Change Tasks) validations.

@qcomlnxci

Test Matrix

| Test Case | lemans-evk | monaco-evk | qcs615-ride | qcs6490-rb3gen2 | qcs8300-ride | qcs9100-ride-r3 | x1e80100-crd |
|---|---|---|---|---|---|---|---|
| BT_FW_KMD_Service | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| BT_ON_OFF | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| BT_SCAN | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| CPUFreq_Validation | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| CPU_affinity | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| DSP_AudioPD | ✅ Pass | ◻️ | ⚠️ skip | ◻️ | ✅ Pass | ⚠️ skip | ◻️ |
| Ethernet | ✅ Pass | ◻️ | ⚠️ skip | ◻️ | ⚠️ skip | ⚠️ skip | ◻️ |
| Freq_Scaling | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| GIC | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| IPA | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| Interrupts | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| OpenCV | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| PCIe | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| Probe_Failure_Check | ❌ Fail | ◻️ | ✅ Pass | ◻️ | ❌ Fail | ❌ Fail | ◻️ |
| RMNET | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| UFS_Validation | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| USBHost | ❌ Fail | ◻️ | ❌ Fail | ◻️ | ❌ Fail | ❌ Fail | ◻️ |
| WiFi_Firmware_Driver | ❌ Fail | ◻️ | ❌ Fail | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| WiFi_OnOff | ✅ Pass | ◻️ | ⚠️ skip | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| adsp_remoteproc | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ❌ Fail | ◻️ |
| cdsp_remoteproc | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ❌ Fail | ◻️ |
| gpdsp_remoteproc | ✅ Pass | ◻️ | ⚠️ skip | ◻️ | ✅ Pass | ❌ Fail | ◻️ |
| hotplug | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| irq | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| kaslr | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| pinctrl | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| qcom_hwrng | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| remoteproc | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ❌ Fail | ◻️ |
| rngtest | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| shmbridge | ❌ Fail | ◻️ | ❌ Fail | ◻️ | ❌ Fail | ❌ Fail | ◻️ |
| smmu | ❌ Fail | ◻️ | ❌ Fail | ◻️ | ✅ Pass | ❌ Fail | ◻️ |
| watchdog | ✅ Pass | ◻️ | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ◻️ |
| wpss_remoteproc | ✅ Pass | ◻️ | ⚠️ skip | ◻️ | ✅ Pass | ✅ Pass | ◻️ |

@knaveen-qc

LAVA Failed Case Triage Summary

PR: #594

Job 101881 | SoC qcs9100-ride

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101881

Failed test cases in LAVA job 101881 (SoC: qcs9100-ride).

  Case 1: Remoteproc Boot Failure - SCM PAS init returned -EINVAL (untested machine)
  1. Failed case: Remoteproc Boot Failure - SCM PAS init returned -EINVAL (untested machine)
  2. Root cause: qcom_scm skipped qseecom initialization for the qcs9100/sa8775p machine ID (qseecom: untested machine, skipping), causing qcom_scm_pas_init_image() to return -EINVAL (-22) for all DSP subsystems; both cdsp0 (remoteproc2) and cdsp1 (remoteproc3) fail at the PAS firmware-init SCM call and remain offline.
  3. Possible fix: Add the qcs9100/sa8775p machine ID to the qseecom-tested-machines allowlist in drivers/firmware/qcom/qcom_scm.c (or the equivalent qseecom machine table), then re-run the CI job to confirm all DSP remoteprocs reach running state. This failure is pre-existing and not introduced by the PR.
  4. Detail analysis attachment: failed_case_job101881_1_detailed.md
  Case 2: Remoteproc Boot Failure - PAS Firmware Initialization Error (EINVAL)
  1. Failed case: Remoteproc Boot Failure - PAS Firmware Initialization Error (EINVAL)
  2. Root cause: On qcs9100-ride (sa8775p), qcom_q6v5_pas returns error -22 (EINVAL) from qcom_scm_pas_init_image() for all DSP subsystems (ADSP remoteproc4, CDSP0/1, GPDSP0/1) at boot time (~7.6s), indicating the TrustZone PAS authentication SMC call is rejected. This is consistent with a kernel/DSP firmware image misalignment where the flashed .mbn images do not match the signing expectations of the running kernel build; the PR patch (drivers/misc/fastrpc.c only) is entirely unrelated.
  3. Possible fix: Re-flash the board with a fully aligned build (kernel image and DSP .mbn firmware from the same meta/build drop) and re-trigger the CI job; if the error persists on an aligned build, verify that qcom_scm_pas_supported() returns true for the ADSP/CDSP/GPDSP PAS IDs on sa8775p in the kernel's SCM call availability table.
  4. Detail analysis attachment: failed_case_job101881_2_detailed.md
  Case 3: Remoteproc Boot Failure - PAS firmware initialization error (-EINVAL)
  1. Failed case: Remoteproc Boot Failure - PAS firmware initialization error (-EINVAL)
  2. Root cause: Both gpdsp0 (remoteproc0, 20c00000.remoteproc) and gpdsp1 (remoteproc1, 21c00000.remoteproc) fail to boot on the qcs9100-ride (sa8775p) platform because qcom_q6v5_pas returns error -22 (EINVAL) during PAS/SCM firmware initialization of qcom/sa8775p/gpdsp0.mbn and gpdsp1.mbn; the same error affects all remoteprocs (cdsp0, cdsp1, adsp) and GPU/video firmware, indicating a platform-wide firmware authentication or memory-region layout mismatch between the flashed firmware package and the kernel's PAS expectations. This failure is pre-existing and unrelated to the PR under test (fastrpc only).
  3. Possible fix: This failure is not introduced by PR #594 (fastrpc changes only); re-trigger the CI job with a firmware package aligned to kernel 6.18.25-gf71491fcd9b4 for the sa8775p/qcs9100-ride platform, or investigate the PAS/SCM EINVAL path in qcom_q6v5_pas to identify the specific metadata field rejected by TrustZone for this firmware+kernel combination.
  4. Detail analysis attachment: failed_case_job101881_3_detailed.md
  Case 4: Remoteproc Boot Failure - PAS/SCM authentication returns EINVAL (qseecom untested machine)
  1. Failed case: Remoteproc Boot Failure - PAS/SCM authentication returns EINVAL (qseecom untested machine)
  2. Root cause: On qcs9100-ride (sa8775p), qcom_scm logs "qseecom: untested machine, skipping" at boot, causing qcom_scm_pas_init_image() to return -EINVAL (-22) for all 5 q6v5 PAS subsystems (gpdsp0, gpdsp1, cdsp, cdsp1, adsp); every remoteproc stays offline and the test finds 0 of 5 expected subsystems in running state.
  3. Possible fix: Add the qcs9100/sa8775p machine ID to the qseecom tested-machine allowlist in drivers/firmware/qcom/qcom_scm.c so that qcom_scm_pas_init_image() can authenticate DSP firmware images via TrustZone; verify by re-running the LAVA job and confirming all 5 remoteprocs reach running state.
  4. Detail analysis attachment: failed_case_job101881_4_detailed.md
  Case 5: Probe_Failure_Check - Driver Probe Failure (firmware dependency + DT mismatch)
  1. Failed case: Probe_Failure_Check - Driver Probe Failure (firmware dependency + DT mismatch)
  2. Root cause: Two pre-existing, PR-unrelated errors in dmesg triggered the Probe_Failure_Check scanner: (1) "faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2" (regulatory.db is absent from the rootfs image); (2) "Aquantia AQR115C stmmac-0:08: probe with driver Aquantia AQR115C failed with error -22" (the firmware-name DT property is missing for the PHY at MDIO address 0x08 on the qcs9100-ride board, causing the Aquantia driver to return -EINVAL from probe).
  3. Possible fix: Add regulatory.db (from wireless-regdb) to the rootfs image and add/correct the firmware-name DT property in the qcs9100-ride DTS for the Aquantia PHY node at MDIO address 0x08; alternatively, add these two known-platform errors to the Probe_Failure_Check CI allowlist since they are pre-existing and unrelated to PR #594.
  4. Detail analysis attachment: failed_case_job101881_5_detailed.md
  Case 6: smmu
  1. Failed case: smmu
  2. Root cause: The smmu test script asserts that aa00000.video-codec (Video) and interconnect-lpass-ag-noc (Audio) are attached to IOMMU groups, but neither device appears in /sys/kernel/iommu_groups: aa00000.video-codec never attached because the qcom-iris driver failed firmware init (error -22 initializing firmware qcom/vpu/vpu30_p4_s6_16mb.mbn) before IOMMU attachment could complete, and interconnect-lpass-ag-noc is absent because the LPASS/ADSP subsystem is offline on this qcs9100-ride board; this is a pre-existing platform condition unrelated to the PR.
  3. Possible fix: This failure is pre-existing and not introduced by PR #594 (which only modifies drivers/misc/fastrpc.c for a use-after-free fix); update the smmu test script's critical-master checklist to either skip aa00000.video-codec and interconnect-lpass-ag-noc when their upstream dependencies (VPU firmware / ADSP remoteproc) are known-offline on qcs9100-ride, or mark them as non-fatal warnings rather than hard failures.
  4. Detail analysis attachment: failed_case_job101881_6_detailed.md
  Case 7: USBHost
  1. Failed case: USBHost
  2. Root cause: The USBHost test on qcs9100-ride failed because only USB root hubs (Linux Foundation 2.0/3.0 virtual hub devices) were enumerated on all three buses; no physical USB peripheral device was connected to the board's host-mode USB ports at the time of the test. The test script explicitly checks for non-hub devices and fails with "Only USB hubs detected, no functional USB devices."
  3. Possible fix: Attach a physical USB peripheral (e.g., USB mass storage or HID device) to one of the qcs9100-ride board's USB host ports in the LAVA lab setup, or mark this test as infrastructure-dependent and skip it when no USB device is wired to the DUT.
  4. Detail analysis attachment: failed_case_job101881_7_detailed.md
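
The "non-hub device" check described above can be approximated as follows. This is a hedged sketch, not the actual test script (its logic is not shown in this thread). It assumes the standard `lsusb` line format and treats Linux Foundation root hubs (vendor `1d6b`) and anything whose description contains "hub" as non-functional; a stricter check would read the device class instead. The function name and the sample devices are illustrative.

```python
import re

# Matches the usual lsusb line: "Bus 001 Device 001: ID 1d6b:0002 <description>"
LSUSB_LINE = re.compile(
    r"Bus (\d+) Device (\d+): ID ([0-9a-f]{4}):([0-9a-f]{4})\s*(.*)"
)


def functional_devices(lsusb_output):
    """Return the lsusb lines that look like real peripherals, not hubs."""
    devices = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if not match:
            continue
        vendor, description = match.group(3), match.group(5)
        # Heuristic (assumption): root hubs use vendor 1d6b, and external
        # hubs normally carry "hub" in their description string.
        if vendor == "1d6b" or "hub" in description.lower():
            continue
        devices.append(line.strip())
    return devices


# Illustrative input: two root hubs, as reported for the boards in this run.
only_hubs = (
    "Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub\n"
    "Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub\n"
)

if not functional_devices(only_hubs):
    print("Only USB hubs detected, no functional USB devices")
```

With input like `only_hubs` the list is empty, which matches the failure message quoted in the triage; the check passes as soon as one non-hub peripheral is wired to the board.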
  Case 8: shmbridge - Suppressed (Known Benign: shmbridge CI noise)
  1. Failed case: shmbridge - Suppressed (Known Benign: shmbridge CI noise)
  2. Root cause: The shmbridge test unconditionally fails in this CI environment due to known infrastructure noise; per lava-known-benign-failures.md Rule 1, any shmbridge result:fail is always suppressed and does not indicate a kernel regression. The test script detected qcom_scm-related errors in the kernel log (triggered by the presence of qcom_scm.download_mode=1 in the kernel command line, which is normal boot configuration on qcs9100-ride), but this is a known false-positive trigger in the test harness.
  3. Possible fix: No fix required — suppress this failure as known benign CI noise; no kernel or PR change is needed.
  4. Detail analysis attachment: failed_case_job101881_8_detailed.md
Job 101883 | SoC qcs8300-ride

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101883

Failed test cases in LAVA job 101883 (SoC: qcs8300-ride).

  Case 1: Probe_Failure_Check
  1. Failed case: Probe_Failure_Check
  2. Root cause: Two pre-existing, PR-unrelated probe/firmware errors are caught by the test's dmesg scan: (1) "faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2" (the regulatory.db firmware file is absent from the rootfs for this kernel version), and (2) "Aquantia AQR115C stmmac-0:08: probe with driver Aquantia AQR115C failed with error -22" (the Aquantia AQR115C PHY driver fails because the firmware-name DT property is missing or invalid (-EINVAL) for the stmmac-0:08 MDIO bus node on qcs8300-ride).
  3. Possible fix: No change to the PR is required; fix the qcs8300-ride board DT/rootfs: (1) add the regulatory.db firmware file to the test rootfs image, and (2) add or correct the firmware-name property in the stmmac-0:08 PHY node in the qcs8300 DTS so the Aquantia AQR115C driver can read it without returning -EINVAL.
  4. Detail analysis attachment: failed_case_job101883_1_detailed.md
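
The allowlist approach suggested for Probe_Failure_Check could look roughly like this. It is a minimal sketch under assumptions: the real scanner's patterns are not shown here, so the regular expressions below are modelled only on the two dmesg lines quoted in this triage, and `ALLOWLIST`, `unexpected_failures`, and the third sample line are hypothetical.

```python
import re

# Patterns modelled on the dmesg lines quoted above (assumed, not the
# test kit's actual patterns).
FAILURE_PATTERNS = [
    re.compile(r"probe with driver .+ failed with error -\d+"),
    re.compile(r"Direct firmware load for \S+ failed with error -\d+"),
]

# Hypothetical allowlist of known, pre-existing platform errors.
ALLOWLIST = [
    re.compile(r"Direct firmware load for regulatory\.db failed"),
    re.compile(r"Aquantia AQR115C .*failed with error -22"),
]


def unexpected_failures(dmesg_lines):
    """Return failure lines that are not covered by the allowlist."""
    hits = [
        line for line in dmesg_lines
        if any(p.search(line) for p in FAILURE_PATTERNS)
    ]
    return [
        line for line in hits
        if not any(a.search(line) for a in ALLOWLIST)
    ]


sample = [
    "faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2",
    "Aquantia AQR115C stmmac-0:08: probe with driver Aquantia AQR115C failed with error -22",
    # Made-up line, included only to show that new failures still surface.
    "example-dev 1000000.example: probe with driver example-dev failed with error -110",
]

for line in unexpected_failures(sample):
    print("unexpected:", line)
```

Keeping the allowlist narrow (exact firmware name, exact driver and error code) means a new probe failure in any other driver would still fail the check.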
  Case 2: USBHost
  1. Failed case: USBHost
  2. Root cause: No functional USB peripheral device is physically connected to the USB host port of the qcs8300-ride board in the LAVA lab; the xHCI controller at 0x0a400000 probed correctly and registered Bus 001 with one HS port, but lsusb at test time shows only the Linux virtual root hub (1d6b:0002), so no downstream device was enumerated.
  3. Possible fix: Attach a functional USB peripheral device (e.g., USB storage stick or USB-to-serial adapter) to the USB host port of the qcs8300-ride board in the LAVA lab; this is a board/lab infrastructure gap unrelated to the PR under test.
  4. Detail analysis attachment: failed_case_job101883_2_detailed.md
  Case 3: shmbridge
  1. Failed case: shmbridge
  2. Root cause: The shmbridge test is a known CI infrastructure false positive: it failed because the test script matched the literal string qcom_scm.download_mode=1 in the kernel command line as a "qcom_scm-related error", not because of any actual SCM or shmbridge fault; this failure is unconditionally suppressed per lava-known-benign-failures.md Rule 1.
  3. Possible fix: No fix required; suppress this result as known benign CI noise per Rule 1. No kernel change is needed and this failure must not block PR #594.
  4. Detail analysis attachment: failed_case_job101883_3_detailed.md
  Case 4: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The top-level test case is marked failed by LAVA because three sub-tests reported FAIL: (1) Probe_Failure_Check: regulatory.db firmware missing and Aquantia AQR115C PHY probe failure (-EINVAL) on qcs8300-ride; (2) shmbridge: test script false-positively matched qcom_scm.download_mode=1 in the kernel command line as a qcom_scm error; (3) USBHost: only USB hubs detected, no functional USB devices connected in the lab. None of these failures are introduced by the PR (which only modifies drivers/misc/fastrpc.c).
  3. Possible fix: Suppress the three known false/infra failures in the qcom-linux-testkit test plan for qcs8300-ride: add regulatory.db and Aquantia AQR115C to the Probe_Failure_Check allowlist, fix the shmbridge script to exclude kernel cmdline matches when scanning for qcom_scm errors, and mark USBHost as skipped when no USB devices are present in the lab setup.
  4. Detail analysis attachment: failed_case_job101883_4_detailed.md
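
The shmbridge script fix proposed above (ignore kernel command-line matches when scanning for qcom_scm errors) might be sketched like this. How the real script matches log lines is not shown in this thread, so the keyword test below is an assumption, and the second sample line is invented purely for illustration.

```python
def scm_error_lines(dmesg_lines):
    """Return log lines that mention qcom_scm together with an error word,
    skipping the echoed kernel command line."""
    hits = []
    for line in dmesg_lines:
        # The kernel echoes its boot parameters on a line containing
        # "Kernel command line:"; those are settings, not errors.
        if "Kernel command line:" in line:
            continue
        lowered = line.lower()
        if "qcom_scm" in lowered and ("error" in lowered or "fail" in lowered):
            hits.append(line)
    return hits


boot_log = [
    "[    0.000000] Kernel command line: console=ttyMSM0 qcom_scm.download_mode=1",
    # Invented example of a line that should still be reported.
    "[    1.234567] qcom_scm firmware:scm: example call failed, error -22",
]

print(scm_error_lines(boot_log))
```

Only the second line is reported, so `qcom_scm.download_mode=1` on the command line no longer trips the test while genuine SCM errors still do.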
Job 101885 | SoC qcs6490-rb3gen2

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101885

Failed test cases in LAVA job 101885 (SoC: qcs6490-rb3gen2).

  Case 1: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure. The LAVA dispatcher's http-download action (stage 1.2.1) stalled mid-transfer at ~20% (186 MB of 932 MB) of the rootfs artifact from S3 (qcom-multimedia-image-rb3gen2-core-kit.rootfs.qcomflash.tar.gz) and was killed after exhausting the full 1797-second timeout; exact error: "http-download timed out after 1797 seconds" with error_type: Infrastructure.
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout beyond 1797 s and configure download-retry with at least 2–3 attempts in the LAVA job definition to survive transient S3/network stalls.
  4. Detail analysis attachment: failed_case_job101885_1_detailed.md
  Case 2: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure. The LAVA dispatcher's http-download action stalled mid-transfer at ~20% (186 MB of 932 MB) while downloading the rootfs artifact qcom-multimedia-image-rb3gen2-core-kit.rootfs.qcomflash.tar.gz from AWS S3, and was killed after exhausting the full 1797-second (≈30 min) timeout; error_type: Infrastructure confirms this is a lab-side network connectivity issue, not a kernel or PR defect.
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout and download-retry block timeout in the LAVA job definition (e.g. from ~30 min to 45–60 min), and investigate intermittent S3/network connectivity from the LAVA worker hosting the qcs6490-rb3gen2 board.
  4. Detail analysis attachment: failed_case_job101885_2_detailed.md
  Case 3: downloads
  1. Failed case: downloads
  2. Root cause: Could not be determined confidently from available logs.
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout beyond 1797 s and add a download-retry count > 1 in the LAVA job definition to handle transient S3 throughput degradation.
  4. Detail analysis attachment: failed_case_job101885_3_detailed.md
  Case 4: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure. The http-download stage (level 1.2.1) timed out after 1797 seconds while downloading the 932 MB rootfs artifact qcom-multimedia-image-rb3gen2-core-kit.rootfs.qcomflash.tar.gz from S3; the transfer stalled at ~20% (186 MB) and never resumed, exhausting the 30-minute download-retry budget with only 1 attempt configured. Exact error: "http-download timed out after 1797 seconds".
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout beyond 1797 s and configure download-retry with at least 2–3 attempts in the LAVA job definition to survive transient S3 stalls.
  4. Detail analysis attachment: failed_case_job101885_4_detailed.md
Job 101886 | SoC x1e80100

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101886

Failed test cases in LAVA job 101886 (SoC: x1e80100).

  Case 1: boot-fastboot
  1. Failed case: boot-fastboot
  2. Root cause: The x1e80100 CRD board's ABL (BOOT.MXF.2.4-00541-HAMOA-1) rejected the boot.img during fastboot RAM-boot with remote: 'Failed to load/authenticate boot image: Load Error'. The image was transferred successfully (234920 KB, OKAY) but the firmware's image authentication/verification step failed, indicating the boot.img is unsigned or signed with an untrusted key, or is structurally incompatible with the ABL version on this board.
  3. Possible fix: Verify that the CI pipeline signs boot.img with the correct key trusted by this board's ABL, or confirm the boot.img is built with the correct format (e.g., Android boot image v2/v3 header) expected by BOOT.MXF.2.4-00541-HAMOA-1; if the image format/signing is correct, re-trigger the job to rule out a transient build artifact corruption.
  4. Detail analysis attachment: failed_case_job101886_1_detailed.md
  Case 2: Build Load Failure - Fastboot boot image authentication rejected by firmware
  1. Failed case: Build Load Failure - Fastboot boot image authentication rejected by firmware
  2. Root cause: Result: Build Load Failure; the fastboot boot stage failed because the x1e80100 ABL/UEFI firmware (BOOT.MXF.2.4-00541-HAMOA-1) rejected the boot image with remote: 'Failed to load/authenticate boot image: Load Error'. The boot.img was built with mkbootimg --header_version 2, which is incompatible with the boot image header version expected by the x1e80100 (Snapdragon X Elite) ABL on this board.
  3. Possible fix: Update the postprocess.sh mkbootimg invocation in the LAVA job definition for x1e80100 from --header_version 2 to the correct header version required by the x1e80100 ABL (verify with fastboot getvar all on the board, typically --header_version 4 for Snapdragon X Elite); re-trigger the CI job after the fix to confirm the board boots successfully.
  4. Detail analysis attachment: failed_case_job101886_2_detailed.md
  Case 3: Build Load Failure - Fastboot boot image authentication rejected by ABL
  1. Failed case: Build Load Failure - Fastboot boot image authentication rejected by ABL
  2. Root cause: Result: Build Load Failure. The x1e80100 CRD board's ABL rejected the LAVA-assembled boot.img at the fastboot boot stage on all 3 attempts with FAILED (remote: 'Failed to load/authenticate boot image: Load Error'); the kernel never booted, and this is not caused by the PR's fastrpc driver changes.
  3. Possible fix: Re-trigger the CI job to rule out a transient ABL state; if the failure recurs, verify that the boot.img produced by LAVA's mkbootimg step is signed with the key accepted by this board's ABL secure-boot policy, or confirm the board is in an unlocked/test-signed state that permits unsigned fastboot boot images on the x1e80100 CRD platform.
  4. Detail analysis attachment: failed_case_job101886_3_detailed.md
Job 101887 | SoC monaco-evk

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101887

Failed test cases in LAVA job 101887 (SoC: monaco-evk).

  Case 1: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure. The initramfs artifact download (stage 1.4.1, initramfs-kerneltest-full-image-qcom-armv8a.cpio.gz, 146 MB from S3) stalled at 45% (65 MB) for ~2 min 36 s before the 300 s http-download timeout expired with "http-download timed out after 300 seconds"; the download-retry block had only 1 attempt, so no retry was made and the job was aborted before the monaco-evk board was ever powered on.
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the download-retry block timeout from ~9 min 45 s to 15 min in the LAVA job definition to accommodate the 146 MB initramfs over a congested S3 link.
  4. Detail analysis attachment: failed_case_job101887_1_detailed.md
  Case 2: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure, caused by a transient network throughput collapse during the download-retry (level 1.4) stage; the LAVA worker's HTTP download of initramfs-kerneltest-full-image-qcom-armv8a.cpio.gz (146 MB from AWS S3) stalled at ~50% after ~21 s of healthy transfer (~3.1 MB/s), then crawled to ~0.05 MB/s, exhausting the hard 300 s http-download timeout with the exact error "http-download timed out after 300 seconds".
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the download-retry block timeout from ~9 m 45 s to 15 min in the LAVA job definition, and configure download-retry with at least 2 attempts to survive transient S3 throttling.
  4. Detail analysis attachment: failed_case_job101887_2_detailed.md
  Case 3: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure. The http-download step 1.4.1 timed out after exactly 300 seconds while fetching the 146 MB initramfs ramdisk (initramfs-kerneltest-full-image-qcom-armv8a.cpio.gz) from the meta-qcom S3 bucket; transfer stalled at ~50% (73 MB) with a ~2.5-minute gap between progress ticks, indicating a transient network congestion or S3 throttle event on the LAVA worker's outbound connection. Error: "http-download timed out after 300 seconds".
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the download-retry block timeout from 00:09:45 to 00:15:00 in the LAVA job definition, and consider adding a retry count (retries: 2) for the initramfs download step.
  4. Detail analysis attachment: failed_case_job101887_3_detailed.md
  Case 4: Build Load Failure - HTTP download timeout
  1. Failed case: Build Load Failure - HTTP download timeout
  2. Root cause: Result: Build Load Failure. The http-download action (step 1.4.1) for the initramfs artifact initramfs-kerneltest-full-image-qcom-armv8a.cpio.gz (146 MB) stalled mid-transfer at ~50% after a 2m36s gap with no data, exhausting the 300 s per-attempt timeout; exact error: "http-download timed out after 300 seconds".
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the download-retry block timeout from the current ~10 min to 15 min in the LAVA job definition, and configure download-retry with at least 2 attempts to survive transient S3 connection drops.
  4. Detail analysis attachment: failed_case_job101887_4_detailed.md
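
A quick back-of-the-envelope check puts the timeout figures from the download failures in perspective. The sizes, timeouts, and the ~0.05 MB/s stalled rate are taken from the triage text above; combining the 73 MB remainder from one case analysis with the stalled rate from another is an assumption made only for illustration, and the helper names are hypothetical.

```python
def min_throughput_mb_s(size_mb, timeout_s):
    """Average rate needed to finish a download within the timeout."""
    return size_mb / timeout_s


def seconds_to_finish(remaining_mb, rate_mb_s):
    """Time needed to transfer the remainder at a given rate."""
    return remaining_mb / rate_mb_s


# monaco-evk: 146 MB initramfs, 300 s http-download timeout
initramfs_rate = min_throughput_mb_s(146, 300)

# qcs6490-rb3gen2: 932 MB rootfs, 1797 s http-download timeout
rootfs_rate = min_throughput_mb_s(932, 1797)

# monaco-evk after the stall: about 73 MB left at roughly 0.05 MB/s
stalled_seconds = seconds_to_finish(146 - 73, 0.05)

print(f"initramfs needs {initramfs_rate:.2f} MB/s on average")
print(f"rootfs needs {rootfs_rate:.2f} MB/s on average")
print(f"remainder at the stalled rate: about {stalled_seconds:.0f} s")
```

Both downloads need only about half a megabyte per second on average, so the timeouts are not unreasonably tight for a healthy link. At the stalled rate, however, the remaining half of the initramfs would take roughly 1460 s, which suggests that raising the timeout to 600 s alone would not have rescued that attempt; a fresh retry is the more useful knob.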
Job 101888 | SoC qcs615-ride

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101888

Failed test cases in LAVA job 101888 (SoC: qcs615-ride).

  Case 1: smmu
  1. Failed case: smmu
  2. Root cause: The smmu test script asserts that aa00000.video-codec:video-decoder and aa00000.video-codec:video-encoder sub-devices each have their own IOMMU group entry, but on qcs615-ride with the non-legacy Venus binding these sub-devices are not registered as independent IOMMU clients, causing two "[FAIL] Critical master is missing iommu_group attachment" assertions and a final FAIL=2 result.
  3. Possible fix: Update the qcs615-ride DT (arch/arm64/boot/dts/qcom/qcs615-ride.dts) to add iommu-map entries for the video-decoder and video-encoder child nodes of aa00000.video-codec, or update the smmu test script to accept the non-legacy Venus binding where sub-device IOMMU groups are managed through the parent device node.
  4. Detail analysis attachment: failed_case_job101888_1_detailed.md
  Case 2: USBHost
  1. Failed case: USBHost
  2. Root cause: No physical USB device is connected to the qcs615-ride board's USB host port in the LAVA lab infrastructure: the xHCI controller (at 0x0a800000) probed successfully with 1 port detected, but lsusb enumeration returned only the Linux Foundation 2.0 root hub (1d6b:0002), indicating the test harness has no USB peripheral attached to the board.
  3. Possible fix: Attach a functional USB peripheral (e.g. USB storage or HID device) to the qcs615-ride board's USB host port in the LAVA lab; if the board is correctly cabled, re-trigger the CI job to confirm — this failure is a lab infrastructure/hardware setup issue unrelated to the PR under test.
  4. Detail analysis attachment: failed_case_job101888_2_detailed.md
  Case 3: shmbridge
  1. Failed case: shmbridge
  2. Root cause: The shmbridge test is a known CI infrastructure false positive on qcs615-ride: the test script flags qcom_scm-related errors found in the kernel log (recurring fastrpc glink-edge DSP errors: err: -1), but these are pre-existing background noise unrelated to the SCM/shmbridge subsystem and do not indicate any kernel regression introduced by this PR.
  3. Possible fix: No fix required — suppress this result per Known Benign Failure Rule 1; the shmbridge test always produces false failures in this CI environment regardless of kernel state.
  4. Detail analysis attachment: failed_case_job101888_3_detailed.md
  Case 4: WiFi_Firmware_Driver
  1. Failed case: WiFi_Firmware_Driver
  2. Root cause: The ath11k WiFi driver modules (ath11k, ath11k_pci, ath11k_ahb) could not be loaded because the running kernel (6.18.25-gf71491fcd9b4) does not match the module tree on disk (6.19.0-00717-ge3aded47f3e5), causing a kernel/module version mismatch; as a result no WiFi interface was created, WiFi_OnOff was skipped (not passed), so the known-benign suppression rule does not apply and the failure is genuine.
  3. Possible fix: Rebuild or redeploy the rootfs so that the installed kernel modules in /lib/modules/ match the kernel image version (6.18.25-gf71491fcd9b4) being booted, ensuring ath11k/ath11k_pci/ath11k_ahb modules load successfully on qcs615-ride.
  4. Detail analysis attachment: failed_case_job101888_4_detailed.md
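
The kernel/module mismatch described in this case is easy to detect before running the WiFi tests. The following is a small sketch, assuming the conventional layout where modules live under `/lib/modules/<kernel release>`; the function name is hypothetical and the check is not part of the existing test kit.

```python
import os
import platform


def modules_match(running_release, modules_root="/lib/modules"):
    """Check whether a module tree exists for the running kernel release.

    Returns (matches, installed_releases).
    """
    try:
        installed = sorted(os.listdir(modules_root))
    except FileNotFoundError:
        installed = []
    return running_release in installed, installed


if __name__ == "__main__":
    release = platform.release()   # same string that `uname -r` prints
    ok, installed = modules_match(release)
    if ok:
        print(f"module tree found for {release}")
    else:
        print(f"no module tree for {release}; installed: {installed}")
```

On the board from this case, the running release would be 6.18.25-gf71491fcd9b4 while the only installed tree is 6.19.0-00717-ge3aded47f3e5, so the check would report a mismatch before any attempt to load ath11k.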
  Case 5: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The overall test run was marked failed by LAVA's "unfinished test run" mechanism because result_parse.sh did not emit a final LAVA_SIGNAL_TESTCASE for the suite; the underlying sub-test failures (smmu: video-codec IOMMU sub-device missing iommu_group; shmbridge: tz_armv8_smc_call failed TzStatus=0xFFFFFFFF at boot; USBHost: no functional USB devices; WiFi_Firmware_Driver: ath11k module absent) are all pre-existing platform issues on this qcs615-ride board, unrelated to the PR's fastrpc.c reference-counting changes.
  3. Possible fix: Re-trigger the CI job to confirm reproducibility; the failures are pre-existing board/platform issues (IOMMU sub-device registration, TZ SMC boot error, USB hardware, missing ath11k module) that are not introduced by this PR and require separate platform/infra investigation on the qcs615-ride LAVA worker.
  4. Detail analysis attachment: failed_case_job101888_5_detailed.md
Job 101889 | SoC lemans-evk

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101889

Failed test cases in LAVA job 101889 (SoC: lemans-evk).

  Case 1: Probe_Failure_Check - Firmware Dependency Failure (missing regulatory.db)
  1. Failed case: Probe_Failure_Check - Firmware Dependency Failure (missing regulatory.db)
  2. Root cause: The Probe_Failure_Check test script detected "faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2" in dmesg (cfg80211 regulatory database absent from /lib/firmware on the lemans-evk rootfs); the two Bluetooth firmware failures (qca/wcnhpbtfw21.tlv, qca/hpbtfw21.tlv) are suppressed as known-benign per Rule 3 (BT_ON_OFF PASS), leaving the regulatory.db miss as the sole genuine trigger. This is a pre-existing rootfs/infra gap unrelated to the PR's fastrpc.c changes.
  3. Possible fix: Add regulatory.db to the lemans-evk LAVA test rootfs under /lib/firmware/ (obtainable from the wireless-regdb package), or add a suppression rule in lava-known-benign-failures.md for regulatory.db firmware load failures if this is a known-benign condition on this board; the PR itself requires no changes.
  4. Detail analysis attachment: failed_case_job101889_1_detailed.md
  Case 2: smmu
  1. Failed case: smmu
  2. Root cause: The smmu test script asserts that two devices, aa00000.video-codec (Video) and interconnect-lpass-ag-noc (Audio), appear in /sys/kernel/iommu_groups, but neither is present: the video codec exposes IOMMU groups via its iris sub-devices (iris_pixel.0, iris_non_pixel.0) rather than the parent DT node, and the LPASS audio NoC interconnect is a bus provider, not a DMA master, so it correctly has no IOMMU group attachment.
  3. Possible fix: Update the smmu test script's critical-master list for lemans-evk: replace aa00000.video-codec with iris_pixel.0 and iris_non_pixel.0, and remove interconnect-lpass-ag-noc (it is an interconnect provider, not a DMA master requiring IOMMU protection).
  4. Detail analysis attachment: failed_case_job101889_2_detailed.md
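
The per-board critical-master check discussed in the smmu cases can be sketched against the sysfs layout, where each group directory under `/sys/kernel/iommu_groups` has a `devices` subdirectory listing its members. The smmu test script itself is not shown in this thread, so the function names and the expected-device list below are illustrative assumptions based on the names quoted in the analysis.

```python
import os


def iommu_attached_devices(root="/sys/kernel/iommu_groups"):
    """Map each device name to the IOMMU group it is attached to."""
    attached = {}
    if not os.path.isdir(root):
        return attached
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if not os.path.isdir(devices_dir):
            continue
        for device in os.listdir(devices_dir):
            attached[device] = group
    return attached


def missing_masters(expected, attached):
    """Return the expected devices that have no IOMMU group."""
    return [name for name in expected if name not in attached]


if __name__ == "__main__":
    # Hypothetical lemans-evk list, following the fix proposed above.
    expected = ["iris_pixel.0", "iris_non_pixel.0"]
    missing = missing_masters(expected, iommu_attached_devices())
    if missing:
        print("missing iommu_group attachment:", missing)
    else:
        print("all critical masters are attached")
```

Keeping the expected list per board, rather than one global list, would let the same check accept the iris sub-devices on lemans-evk without requiring the parent aa00000.video-codec node.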
  Case 3: USBHost
  1. Failed case: USBHost
  2. Root cause: The xHCI controller on the lemans-evk (USB bus 2, SuperSpeed port) failed to assign an address to the Genesys Logic USB3.1 hub (usb 2-1: device not accepting address 2, error -62 / two xHCI setup-command timeouts at T+12s and T+18s), leaving only USB hubs visible on the bus; the test script requires at least one non-hub functional USB device and fails with "Only USB hubs detected, no functional USB devices."
  3. Possible fix: Investigate the xHCI SuperSpeed port enumeration failure on the lemans-evk board — check whether the USB3.1 hub downstream device (expected functional USB peripheral) is physically connected and powered, and whether the xHCI Timeout while waiting for setup device command is a pre-existing board/infra issue or a regression introduced by the PR; re-trigger the CI job to determine if this is a transient hardware enumeration glitch, and if it recurs, inspect the DWC3/xHCI driver configuration for the a800000.usb controller on lemans-evk.
  4. Detail analysis attachment: failed_case_job101889_3_detailed.md
  Case 4: shmbridge (suppressed; known benign: shmbridge CI noise)
  1. Failed case: shmbridge (suppressed; known benign: shmbridge CI noise)
  2. Root cause: The shmbridge test script incorrectly flags the kernel command-line string qcom_scm.download_mode=1 as a qcom_scm-related error. This is a known false positive in the CI environment and is unconditionally suppressed per Rule 1 of the LAVA Known Benign Failure Suppressions policy.
  3. Possible fix: No action required — this failure is classified as known CI infrastructure noise and must not be treated as a kernel regression; the PR under test (drivers/misc/fastrpc.c kref refcount fix) is unrelated to qcom_scm or shmbridge.
  4. Detail analysis attachment: failed_case_job101889_4_detailed.md
  Case 5: WiFi_Firmware_Driver (suppressed; known benign: WiFi firmware false positive, WiFi ON/OFF passed)
  1. Failed case: WiFi_Firmware_Driver (suppressed; known benign: WiFi firmware false positive, WiFi ON/OFF passed)
  2. Root cause: The WiFi_Firmware_Driver test failed because the ath12k/ath12k_wifi7/ath12k_pci/ath12k_ahb modules were not loaded at the time the firmware-driver check ran. However, the board's actual WiFi hardware (driven by ath11k/ath11k_pci) was fully functional: the subsequent WiFi_OnOff test passed with interface wlp1s0 toggling up/down successfully, confirming that firmware loaded correctly at runtime. This is a known false positive caused by the test script checking for ath12k-family modules that are not present on this lemans-evk board (which uses ath11k), not a kernel regression introduced by PR #594.
  3. Possible fix: No fix required — suppress this failure per Rule 2 of lava-known-benign-failures.md; the WiFi_OnOff PASS confirms WiFi is fully functional. If the test script should be updated to recognise ath11k/ath11k_pci as valid transport modules for lemans-evk, update the WiFi_Firmware_Driver test's module list to include ath11k variants alongside ath12k variants.
  4. Detail analysis attachment: failed_case_job101889_5_detailed.md
  Case 6: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The LAVA test definition was marked failed because multiple sub-tests reported FAIL: (1) smmu: aa00000.video-codec and interconnect-lpass-ag-noc are not attached to any IOMMU group on lemans-evk, indicating a DT or driver probe issue for Video/Audio masters; (2) WiFi_Firmware_Driver: kernel/rootfs version mismatch (6.18.25-gf71491fcd9b4 running vs 6.19.0-00717-ge3aded47f3e5 modules on rootfs) prevents ath12k/ath12k_wifi7 from loading; (3) shmbridge: false positive triggered by qcom_scm.download_mode=1 in the kernel cmdline being matched as an error; (4) USBHost: no functional USB device connected to the board in the lab; (5) Probe_Failure_Check: known benign WiFi/BT firmware-not-found messages. None of these failures is introduced by the PR's fastrpc.c changes.
  3. Possible fix: The smmu IOMMU group failure for aa00000.video-codec and interconnect-lpass-ag-noc on lemans-evk requires investigation of the DT iommus property for those nodes and whether the video/audio drivers probe correctly on this kernel; the kernel/rootfs version mismatch must be resolved by rebuilding the rootfs against kernel 6.18.25-gf71491fcd9b4 or updating the LAVA job to flash a matching rootfs image; the shmbridge test script should be fixed to exclude qcom_scm.download_mode= from its error pattern; and the USBHost test should be marked as infra-dependent or a USB device should be connected to the lemans-evk lab board.
  4. Detail analysis attachment: failed_case_job101889_6_detailed.md

@qswat-orbit-external

Merge Check Failed: CR Not Eligible for Merge

CR 4502232 is not eligible for merge.

The parent software image for kernel.qli.2.0 is not development complete.

Entity: kernel.qli.2.0
CR: 4502232
Reason: CR_CANNOT_MERGE

Please ensure the CR passes both CCT (ComponentChangeTasks) and ICT (Integration Change Tasks) validations.

Anandu Krishnan E added 2 commits May 20, 2026 14:31
…ser structure"

This reverts commit 14e526a.

This change corresponds to the initial (v1) version shared with the
upstream community.

Revert it to apply the complete v4 revision, which includes additional
fixes and updates not present in the earlier version. The v4 revision
contains these changes as well.

Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
…eue context

There is a race between fastrpc_device_release() and the workqueue
that processes DSP responses. When the user closes the file descriptor,
fastrpc_device_release() frees the fastrpc_user structure. Concurrently,
an in-flight DSP invocation can complete and fastrpc_rpmsg_callback()
schedules context cleanup via schedule_work(&ctx->put_work). If the
workqueue runs fastrpc_context_free() in parallel with or after
fastrpc_device_release() has freed the user structure, it dereferences
the freed fastrpc_user. Depending on the state of the context at the
time of the race, any one of the following accesses can be hit:

 1. fastrpc_buf_free() calls fastrpc_ipa_to_dma_addr(buf->fl->cctx, ...)
    to strip the SID bits from the stored IOVA before passing the
    physical address to dma_free_coherent().

 2. fastrpc_free_map() reads map->fl->cctx->vmperms[0].vmid to
    reconstruct the source permission bitmask needed for the
    qcom_scm_assign_mem() call that returns memory from the DSP VM
    back to HLOS.

 3. fastrpc_free_map() acquires map->fl->lock to safely remove the
    map node from the fl->maps list.

The resulting use-after-free manifests as:

  pc : fastrpc_buf_free+0x38/0x80 [fastrpc]
  lr : fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_put_wq+0x78/0xa0 [fastrpc]
  process_one_work+0x180/0x450
  worker_thread+0x26c/0x388

Add kref-based reference counting to fastrpc_user. Have each invoke
context take a reference on the user at allocation time and release it
when the context is freed. Release the initial reference in
fastrpc_device_release() at file close. Move the teardown of the user
structure — freeing pending contexts, maps, mmaps, and the channel
context reference — into the kref release callback fastrpc_user_free(),
so that it runs only when the last reference is dropped, regardless of
whether that happens at device close or after the final in-flight
context completes.

Link: https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/
Fixes: 6cffd79 ("misc: fastrpc: Add support for dmabuf exporter")
Cc: stable@kernel.org
Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
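
As a rough illustration of the scheme described in the commit message above, the following userspace C sketch models a user object that stays alive until both the file close and the last in-flight context have dropped their references. It is not taken from the patch: struct user, struct invoke_ctx, user_open(), user_get(), user_put(), ctx_alloc(), ctx_free(), and the released flag are all hypothetical names, and a C11 atomic counter stands in for the kernel's struct kref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the driver's definitions. */
struct user {
    atomic_int refcount;   /* the patch embeds a struct kref instead */
    bool *released;        /* illustration hook: set when teardown runs */
};

struct invoke_ctx {
    struct user *fl;       /* each context pins the user it belongs to */
};

/* Models open(): the file descriptor holds the initial reference. */
static struct user *user_open(bool *released)
{
    struct user *u = malloc(sizeof(*u));

    if (!u)
        return NULL;
    atomic_init(&u->refcount, 1);
    u->released = released;
    return u;
}

static void user_get(struct user *u)
{
    atomic_fetch_add(&u->refcount, 1);
}

/* Teardown runs exactly once, in whichever caller drops the last reference. */
static void user_put(struct user *u)
{
    int old = atomic_fetch_sub(&u->refcount, 1);

    assert(old > 0);
    if (old == 1) {
        *u->released = true;   /* stands in for the release callback's work */
        free(u);
    }
}

/* Models context allocation: take a user reference up front. */
static struct invoke_ctx *ctx_alloc(struct user *u)
{
    struct invoke_ctx *ctx = malloc(sizeof(*ctx));

    if (!ctx)
        return NULL;
    user_get(u);
    ctx->fl = u;
    return ctx;
}

/* Models the worker: ctx->fl stays valid until this reference is dropped. */
static void ctx_free(struct invoke_ctx *ctx)
{
    struct user *u = ctx->fl;

    free(ctx);
    user_put(u);
}
```

In this model, calling user_put() for the file close while a context is still outstanding leaves the user allocated; the later ctx_free() drops the last reference and only then runs the teardown. The reverse order (context freed first, file closed last) ends in the same single teardown.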
@qcomlnxci

Test Matrix

Test Case lemans-evk monaco-evk qcs615-ride qcs6490-rb3gen2 qcs8300-ride qcs9100-ride-r3 x1e80100-crd
BT_FW_KMD_Service ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
BT_ON_OFF ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
BT_SCAN ✅ Pass ❌ Fail ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
CPUFreq_Validation ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
CPU_affinity ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
DSP_AudioPD ✅ Pass ✅ Pass ⚠️ skip ✅ Pass ✅ Pass ⚠️ skip ◻️
Ethernet ⚠️ skip ✅ Pass ⚠️ skip ⚠️ skip ⚠️ skip ⚠️ skip ◻️
Freq_Scaling ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
GIC ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
IPA ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
Interrupts ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
OpenCV ✅ Pass ⚠️ skip ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
PCIe ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
Probe_Failure_Check ❌ Fail ❌ Fail ❌ Fail ❌ Fail ❌ Fail ❌ Fail ◻️
RMNET ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
UFS_Validation ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
USBHost ❌ Fail ✅ Pass ❌ Fail ❌ Fail ❌ Fail ❌ Fail ◻️
WiFi_Firmware_Driver ❌ Fail ⚠️ skip ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
WiFi_OnOff ✅ Pass ❌ Fail ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
adsp_remoteproc ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ❌ Fail ◻️
cdsp_remoteproc ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ❌ Fail ◻️
gpdsp_remoteproc ✅ Pass ✅ Pass ⚠️ skip ⚠️ skip ✅ Pass ❌ Fail ◻️
hotplug ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
irq ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
kaslr ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
pinctrl ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
qcom_hwrng ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
remoteproc ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ❌ Fail ◻️
rngtest ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
shmbridge ❌ Fail ✅ Pass ❌ Fail ❌ Fail ❌ Fail ❌ Fail ◻️
smmu ❌ Fail ✅ Pass ❌ Fail ✅ Pass ✅ Pass ❌ Fail ◻️
watchdog ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️
wpss_remoteproc ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ✅ Pass ◻️

@qcomlnxci qcomlnxci requested a review from a team May 20, 2026 12:43
@sgaud-quic sgaud-quic requested a review from shashim-quic May 20, 2026 13:52
@sgaud-quic
Contributor

@shashim-quic requesting fresh review as PR was force pushed after approval.

@qlijarvis

PR #594 — validate-patch

PR: #594

Verdict Issues Detailed Report
⚠️ 0 Full report

Final Summary

  1. Lore link present: Yes — https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/ (commit 2/2 only; commit 1/2 is a revert with no lore link expected)

  2. Lore link matches PR commits: ⚠️ Cannot verify — Network access blocked; manual comparison required

  3. Upstream patch status: Unknown — Cannot fetch lore thread; manual verification required

  4. PR present in qcom-next: ⏭️ Not checked — Git unavailable; manual verification required



🔍 Patch Validation Report

PR: #594
Commits: 2 commits (1 revert + 1 FROMLIST patch)
Verdict: ⚠️ PARTIAL — Cannot fully validate due to environment constraints


Commit 1/2: Revert "FROMLIST: misc: fastrpc: Add reference counting for fastrpc_user structure"

Upstream: N/A (revert commit)
Verdict: PASS (with notes)

Commit Message

Check Status Note
Subject format Proper revert format with original subject
Body explains rationale Clear explanation: reverting v1 to apply v4
Reverted commit SHA 14e526a1006ddfde9e394184d2b3430c4c9280f1
Signed-off-by Present

Analysis

  • Purpose: Reverts an earlier FROMLIST commit (v1) to make way for v4 revision
  • Rationale: v4 contains additional fixes not in v1; cleaner to revert and reapply
  • Diff: Removes kref-based reference counting (35 lines removed, 4 added)
  • Files changed: drivers/misc/fastrpc.c

Notes

This is a preparatory revert for commit 2/2. The revert itself is clean and well-justified.


Commit 2/2: FROMLIST: misc: fastrpc: fix use-after-free of fastrpc_user in workqueue context

Upstream: https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/
Verdict: ⚠️ PARTIAL — Lore patch not fetchable; validation incomplete

Commit Message

Check Status Note
Subject matches expected format FROMLIST: prefix + descriptive subject
Body preserves rationale Detailed race condition explanation with stack trace
Fixes tag present/correct Fixes: 6cffd79504ce ("misc: fastrpc: Add support for dmabuf exporter")
Authorship From: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
Lore link present Link:https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/
Cc: stable tag Cc: stable@kernel.org present
Signed-off-by Present

Diff Summary

File Hunks Lines Status
drivers/misc/fastrpc.c Multiple +52, -22 ⚠️ Cannot verify against lore

Changes:

  • Adds struct kref refcount to fastrpc_user
  • Implements fastrpc_user_free(), fastrpc_user_get(), fastrpc_user_put()
  • Moves teardown logic (pending contexts, maps, mmaps) into kref release callback
  • Updates fastrpc_context_alloc() to take user reference
  • Updates fastrpc_context_free() to release user reference
  • Updates fastrpc_device_release() to use fastrpc_user_put()
  • Initializes kref_init(&fl->refcount) in fastrpc_device_open()

Commit Message Quality

Excellent — The commit message provides:

  1. Clear problem statement (race between device release and workqueue)
  2. Detailed technical explanation of the use-after-free scenarios (3 specific access patterns)
  3. Stack trace showing the crash manifestation
  4. Complete solution description (kref-based reference counting)
  5. Proper tags: Fixes:, Cc: stable@, Link:, Signed-off-by:

Upstream Patch Status

⏭️ SKIPPED — Cannot verify due to network restrictions

Expected verification steps (blocked):

  1. Fetch lore patch via curl https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/raw
  2. Check lore thread for maintainer responses (ACK/NACK/review comments)
  3. Search mainline/stable git logs for merge status

Manual verification required: Check lore.kernel.org directly to determine if this patch has been:

  • ✅ ACKed and merged into mainline/stable
  • ❌ NACKed or rejected
  • ⏳ Still under review

Dependency Check

PASS — Commit 1 (revert) is a prerequisite for commit 2 (v4 patch)

The two-commit structure is correct:

  1. First commit reverts the v1 implementation
  2. Second commit applies the v4 implementation

This ensures a clean application of the v4 revision without conflicts.


qcom-next Presence

⏭️ SKIPPED — Cannot verify due to missing git command

Expected verification steps (blocked):

KERNEL_REPO=/local/mnt/workspace/sgaud/Qgenie/image_pipeline/kernel
git -C "$KERNEL_REPO" fetch origin
git -C "$KERNEL_REPO" checkout qcom-next
git -C "$KERNEL_REPO" log --oneline --grep="fastrpc: fix use-after-free"

Manual verification required: Check if these commits are present in the qcom-next branch.


Issues Found

⚠️ Environment Constraints

  1. Network access blocked — Cannot fetch lore patch for byte-by-byte comparison
  2. Git unavailable — Cannot verify qcom-next presence or upstream merge status
  3. Read-only filesystem — Cannot create temporary files for diff analysis

✅ No Content Issues Detected

Based on the PR patch content alone:

  • Commit messages are well-structured and complete
  • Authorship is correct for FROMLIST prefix
  • Tags (Fixes:, Cc: stable@, Link:, Signed-off-by:) are present and properly formatted
  • The revert + reapply strategy is sound
  • Code changes appear consistent with the commit message description

Recommendation

⚠️ CONDITIONAL APPROVAL — Approve pending manual verification of:

  1. Lore patch comparison: Manually fetch the lore patch and verify that commit 2/2 matches the v4 revision at https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/

    • Check that the diff is identical or contains only legitimate adaptations
    • Verify no extra hunks or semantic changes
  2. Upstream status: Check the lore thread for maintainer feedback

    • If ACKed/merged → safe to merge
    • If NACKed → do not merge
    • If under review → wait for community decision
  3. qcom-next presence: Verify these commits are not already in qcom-next to avoid duplicates

If all three checks pass → ✅ MERGE
If any check fails → ❌ HOLD pending resolution



Validation Completeness

Step Status Blocker
Parse PR and check lore link ✅ Complete
Fetch upstream lore patch ❌ Blocked Network access denied
Check upstream patch status ❌ Blocked Network access denied
Obtain PR diff ✅ Complete
Normalize both patches ❌ Blocked Cannot fetch lore patch
Compare commit messages ⚠️ Partial Cannot verify against lore original
Compare diff content ❌ Blocked Cannot fetch lore patch
Verify file paths and symbols ✅ Complete
Check dependency completeness ✅ Complete
Check qcom-next presence ❌ Blocked Git unavailable
Produce validation report ✅ Complete

Overall: 5/11 steps completed, 5/11 blocked by environment, 1/11 partial

@qlijarvis

PR #594 — checker-log-analyzer

PR: #594
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/26163273222

Checker Result Summary
Checker Result Summary
checkpatch No style issues
dt-binding-check ⏭️ No DT binding changes
dtb-check ⏭️ No DTS changes
sparse-check No static analysis warnings
check-uapi-headers No UAPI changes
check-patch-compliance 2 failures: missing prefix on revert commit, content mismatch on FROMLIST commit
tag-check Commit 1/2 missing required prefix

Detailed report: Full report


🤖 CI Checker Analysis (checker-log-analyzer)

PR: #594 - Revert v1 and apply v4 fastrpc use-after-free fix
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/26163273222

❌ check-patch-compliance

Root cause: Two distinct failures across the two commits in this PR.

Failure details:

Commit 1/2 (dd29e69):

Checking commit: Revert "FROMLIST: misc: fastrpc: Add reference counting for fastrpc_user structure"
Commit summary does not start with a required prefix

Commit 2/2 (b3036d0):

Checking commit: FROMLIST: misc: fastrpc: fix use-after-free of fastrpc_user in workqueue context
Change is different from the one mentioned in Link

Analysis:

  1. Commit 1/2 — Missing prefix on revert:
    The subject line is Revert "FROMLIST: misc: fastrpc: Add reference counting for fastrpc_user structure".
    The checker requires that all commits (including reverts) start with a valid prefix (FROMLIST:, FROMGIT:, UPSTREAM:, BACKPORT:, etc.).
    A revert commit must have its own prefix before the word Revert.

  2. Commit 2/2 — Content mismatch:
    The commit references Link: https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/.
    The checker fetched the upstream patch from that link and found that the content differs from what is in the PR.
    This is expected: the commit message explains that this is the v4 revision, which includes additional fixes not present in the v1 version that was previously merged. The revert (commit 1/2) removes the v1 version, and commit 2/2 applies the v4 version.
    However, the lore link points to the v4 series, and the checker is detecting that the net change (after the revert) does not match a simple cherry-pick of the upstream patch. This is because the revert undoes v1, and then v4 is applied — the diff is not identical to applying v4 on top of the base without v1.

Fix:

For commit 1/2 (revert):

git rebase -i 0136c1ed9c23   # mark commit dd29e69 as 'edit'
git commit --amend -m "UPSTREAM: Revert \"FROMLIST: misc: fastrpc: Add reference counting for fastrpc_user structure\"

This reverts commit 14e526a1006ddfde9e394184d2b3430c4c9280f1.

This change corresponds to the initial (v1) version shared with the
upstream community.

Revert it to apply the complete v4 revision, which includes additional
fixes and updates not present in the earlier version. v4 version
contains this changes as well.

Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>"
git rebase --continue

Choose the correct prefix for the revert:

  • If the original commit being reverted was merged upstream → use UPSTREAM:
  • If the original commit was from a mailing list but not yet merged → use FROMLIST:

For commit 2/2 (content mismatch):

The content mismatch is expected in this case because:

  • The PR is reverting v1 (commit 1/2) and then applying v4 (commit 2/2).
  • The lore link points to the v4 series, which is a replacement for v1, not an incremental patch on top of v1.
  • The checker compares the net diff of commit 2/2 against the upstream patch, but commit 2/2 is applying v4 after reverting v1, so the diff includes both the revert and the new changes.

Two options to resolve:

Option A (Recommended): Document the adaptation in the commit message:

git rebase -i 0136c1ed9c23   # mark commit b3036d0 as 'edit'
git commit --amend

# Add a note after the Link: line:
# Link: https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/
# [Applied v4 after reverting v1 (commit 14e526a1006d); net change differs from upstream due to revert]

git rebase --continue

Option B: Squash the revert and the v4 application into a single commit that replaces v1 with v4. This would make the diff match the upstream v4 patch more closely, but loses the explicit revert history.

Reproduce locally:

cd /path/to/kernel
git checkout qcom-6.18.y
git fetch origin pull/594/head:pr-594
git checkout pr-594

# Check commit 1/2 prefix
git log --oneline -1 dd29e69890d8
# Should start with UPSTREAM:/FROMLIST:/etc. before "Revert"

# Check commit 2/2 content match
b4 am --single-message -C -l -3 https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/ -o /tmp/out
git format-patch -1 b3036d0f901b --stdout > /tmp/pr-commit.patch
# Compare the diffs manually

❌ tag-check

Root cause: Commit 1/2 subject line does not start with a required prefix.

Failure details:

The target branch is qcom-6.18.y, which is not qcom-next or qcom-next-staging. Therefore, all commits must start with one of the required prefixes:

  • FROMLIST: / FROMGIT: / UPSTREAM: / BACKPORT: / QCLINUX: / PENDING: / WORKAROUND:

Commit 1/2:

Subject: Revert "FROMLIST: misc: fastrpc: Add reference counting for fastrpc_user structure"

This commit is missing a prefix before Revert.

Commit 2/2:

Subject: FROMLIST: misc: fastrpc: fix use-after-free of fastrpc_user in workqueue context

This commit has the required FROMLIST: prefix. ✅

Fix:

Same as the fix for check-patch-compliance commit 1/2 above — add a prefix before Revert:

git rebase -i 0136c1ed9c23   # mark commit dd29e69 as 'edit'
git commit --amend -m "UPSTREAM: Revert \"FROMLIST: misc: fastrpc: Add reference counting for fastrpc_user structure\"

This reverts commit 14e526a1006ddfde9e394184d2b3430c4c9280f1.

This change corresponds to the initial (v1) version shared with the
upstream community.

Revert it to apply the complete v4 revision, which includes additional
fixes and updates not present in the earlier version. v4 version
contains this changes as well.

Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>"
git rebase --continue

Reproduce locally:

git log --format="%H %s" 0136c1ed9c23..b3036d0f901b | while read -r sha subject; do
  # A prefixed revert such as "UPSTREAM: Revert ..." already matches this
  # pattern, so no separate revert check is needed.
  if ! printf '%s\n' "$subject" | grep -qE '^(FROMLIST|FROMGIT|UPSTREAM|BACKPORT|QCLINUX|PENDING|WORKAROUND):'; then
    echo "MISSING prefix in commit $sha: $subject"
  fi
done

Verdict

2 blockers must be fixed before merge:

  1. Commit 1/2: Add a required prefix (UPSTREAM: or FROMLIST:) before Revert in the subject line.
  2. Commit 2/2: Document the content adaptation in the commit message (the diff differs from upstream because it applies v4 after reverting v1), or squash the two commits into a single replacement commit.

Once these are addressed, re-trigger the CI run. All other checkers passed.
