Conversation
Scan the built Kafka Connect distribution zip for known vulnerabilities using Trivy. This runs alongside the existing tests on PRs and pushes to main/version branches, and also on release candidate tags, giving visibility into CVEs before a release vote starts. - Table output on all runs (visible in CI logs) - SARIF upload to GitHub Security tab on push events Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the convention used in spark-ci.yml for third-party actions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
i think its better to do this in the docker image publishing step, similar to what Superset is doing |
|
@kevinjqliu I'm not sure I follow. How's this relate to publishing docker images? This is to identify CVEs in the Kafka Connect connector itself. thanks. |
|
Looking at how Trivy is used in Superset, its scanning the docker image only. In this PR, we're unpacking the jars to scan. I think it makes more sense to build a kafka connect image and then use Trivy to scan the image. Just my preference |
Not really clear why we would add the infra to build a docker image just to use trivy. Seems like a direct scan is more parsimonious |
rymurr
left a comment
There was a problem hiding this comment.
LGTM, I just want to see if we can avoid 2x build w/o over complicating the CI job
| scan-ref: '/tmp/kafka-connect-scan' | ||
| scanners: 'vuln' | ||
| severity: 'CRITICAL,HIGH' | ||
| ignore-unfixed: true |
There was a problem hiding this comment.
just to make sure I understand: the scan build will break when there is a hihg severity bug? A minor nit would be more comments in the file.
There was a problem hiding this comment.
At the moment the scan is report-only and won't fail the build.
I've also added inline comments explaining the behaviour.
Thinking about it some more, what'd be the ideal behaviour be? If a PR introduces a CVE, ought the build fail? Or maybe log a PR comment?
There was a problem hiding this comment.
It would be ideal if the trivy scan was 'red' but didn't block the CI job.
There was a problem hiding this comment.
Updated abfb8dc
- A failed step with
continue-on-error: trueshows with an orange/amber warning icon - The overall job still shows as green
Address review feedback: move the vulnerability scan steps into the existing kafka-connect-tests job (gated on JVM 21) instead of running a separate job that duplicates checkout, setup, and compilation. Also adds inline comments explaining the scan behaviour and explicit exit-code: '0' to ensure the scan is report-only (the default would fail the build on findings).
Just looking at general patterns from the apache repos. https://grep.app/search?f.repo.pattern=apache%2F&q=uses%3A+aquasecurity%2Ftrivy-action The only instance of From the docs, it seems like Filesystem scan and Container image scan are similar in that they both scan for Vulnerabilities, Misconfigurations, Secrets, and Licenses. I think it would be helpful here if you can run the change on your fork repo and see if the fs trivy scan catches the currently open CVE for kafka connect (#15440) |
Change exit-code from 0 to 1 so the scan step fails visibly when CRITICAL/HIGH CVEs are found, but add continue-on-error: true so the overall job still passes.
rymurr
left a comment
There was a problem hiding this comment.
The changes themselves look good to me. I am just confused/questioning the github actions config...
My assumption is we want:
- results to be uploaded regardless of pass/fail states of steps above
- the job to fail if it detects errors
- the 2nd scan to run regardless of if the other failed
I think whats happening now is the first always looks like it fails, the second always looks like it passes and the entire job always looks like it passes.
It might be worth trying this a few times in your own fork to make sure this actaully has the edge cases we expect.
| output: 'trivy-results.sarif' | ||
| severity: 'CRITICAL,HIGH' | ||
| ignore-unfixed: true | ||
| exit-code: '0' |
There was a problem hiding this comment.
why is this exit code 0 and the othr is 1? What does this mean anyways...it always exits with 0/1 regardless of state?
Seems this is missing continue on error too? Or is the SARIF scan just different?
Temporary commit for testing Trivy scan behaviour on fork. Adds trivy-test-* to push branch triggers to test push events. Improves inline comments explaining continue-on-error and exit-code. Removes redundant explicit exit-code: '0' from SARIF scan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Thanks for the PR! I think this is a great addition, however I'm concerned about supply chain risk due to the recent (second) compromise of trivy's infra.
See
- https://opensourcemalware.com/repository/https%3A%2F%2Fgithub.com%2Faquasecurity%2Ftrivy%2F
- https://labs.boostsecurity.io/articles/20-days-later-trivy-compromise-act-ii/
- https://www.wiz.io/blog/trivy-compromised-teampcp-supply-chain-attack
I see we are using aquasecurity/trivy-action@e368e328979b113139d6f9068e03accaed98a518 . Glad to see we're using the commit hash.
I believe this is still ongoing; we should wait until theres resolution to proceed with this PR.
EDIT: looks like trivy was just pulled from the list of allowed actions by asf-infra apache/infrastructure-actions#548
https://infra.apache.org/blog/trivy_security_incident.html
@kevinjqliu PR to add it: apache/infrastructure-actions#573 |
Summary
checkcompletes, it buildsdistZip, unpacks it, and scans the bundled jars for CRITICAL/HIGH CVEsContext
Discussion on dev@ mailing list: https://lists.apache.org/thread/kbf98950pzstzgon92st7mh9vrbv5yhb
Confluent Marketplace requires a Trivy scan before listing connectors. This has previously caught CVEs that needed patching (e.g. #14985). Running the scan in CI catches vulnerabilities during development and — critically — on RC tags before the release vote starts, when fixes can still be applied.
This is independent of #15212 (adding the KC artifact to the release process) and can land in either order.