Conversation
There was a problem hiding this comment.
Orca Security Scan Summary
| Status | Check | Issues by priority | |
|---|---|---|---|
| Infrastructure as Code | View in Orca | ||
| SAST | View in Orca | ||
| Secrets | View in Orca | ||
| Vulnerabilities | View in Orca |
bevzzz
left a comment
There was a problem hiding this comment.
re: pinning actions, GitHub allows enforcing this rule on the repository level.
I think we should use that instead of the linter_actions_pinned.sh script.
| PERL_SCRIPT=' | ||
| use utf8; | ||
| use strict; | ||
| use warnings; | ||
|
|
||
| sub escape_property { | ||
| my ($s) = @_; | ||
| $s =~ s/%/%25/g; | ||
| $s =~ s/\r/%0D/g; | ||
| $s =~ s/\n/%0A/g; | ||
| $s =~ s/:/%3A/g; | ||
| $s =~ s/,/%2C/g; | ||
| return $s; | ||
| } | ||
|
|
||
| sub escape_message { | ||
| my ($s) = @_; | ||
| $s =~ s/%/%25/g; | ||
| $s =~ s/\r/%0D/g; | ||
| $s =~ s/\n/%0A/g; | ||
| return $s; | ||
| } | ||
|
|
||
| my $file = ""; | ||
| my $line_in_file = 0; | ||
| my $errors = 0; | ||
| my $in_binary = 0; | ||
| my $binary_pattern = qr/'"$BINARY_PATTERN"'/i; | ||
|
|
||
| while (<STDIN>) { | ||
| chomp; | ||
|
|
||
| # Track file from diff headers | ||
| if (/^\+\+\+ b\/(.+)$/) { | ||
| $file = $1; | ||
| $line_in_file = 0; | ||
| $in_binary = ($file =~ $binary_pattern) ? 1 : 0; | ||
| next; | ||
| } | ||
|
|
||
| # Skip binary file markers | ||
| if (/^Binary files/) { | ||
| $in_binary = 1; | ||
| next; | ||
| } | ||
|
|
||
| # Track hunk headers for line numbers | ||
| if (/^@@ -\d+(?:,\d+)? \+(\d+)/) { | ||
| $line_in_file = $1 - 1; | ||
| next; | ||
| } | ||
|
|
||
| # Count lines in the new file | ||
| if (/^\+/ || /^ /) { | ||
| $line_in_file++; | ||
| } | ||
|
|
||
| # Only scan added lines, skip binary files | ||
| next if $in_binary; | ||
| next unless /^\+/; | ||
| next if /^\+\+\+ (?:$|b\/|\/dev\/null)/; | ||
|
|
||
| # Remove the leading + for scanning | ||
| my $content = substr($_, 1); | ||
|
|
||
| # Check for suspicious invisible Unicode characters: | ||
| # - Bidi overrides and isolates (U+200E-200F, U+202A-202E, U+2066-2069) | ||
| # - Zero-width characters (U+200B-200D, U+2060) | ||
| # - Byte order mark mid-line (U+FEFF) | ||
| # - Soft hyphen (U+00AD) | ||
| # - Mongolian vowel separator (U+180E) | ||
| # - Combining grapheme joiner (U+034F) | ||
| # - Function application and invisible operators (U+2061-2064) | ||
| # - Hangul fillers (U+115F, U+1160, U+3164, U+FFA0) | ||
| # - Interlinear annotation (U+FFF9-FFFB) | ||
| # - Object replacement / replacement char (U+FFFC-FFFD) -- FFFD is sometimes legitimate | ||
| # - Unicode tag block (U+E0001, U+E0020-E007F) | ||
| # - Deprecated format chars (U+206A-206F) | ||
| if ($content =~ /([\x{00AD}\x{034F}\x{115F}\x{1160}\x{180E}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}-\x{2064}\x{2066}-\x{2069}\x{206A}-\x{206F}\x{3164}\x{FE00}-\x{FE0F}\x{FEFF}\x{FFA0}\x{FFF9}-\x{FFFB}\x{E0001}\x{E0020}-\x{E007F}])/) { | ||
| my $char = $1; | ||
| my $codepoint = sprintf("U+%04X", ord($char)); | ||
| my $col = $-[1] + 1; | ||
|
|
||
| if ($ENV{GITHUB_ACTIONS}) { | ||
| my $efile = escape_property($file); | ||
| my $emsg = escape_message("Hidden Unicode character ${codepoint} found"); | ||
| print "::error file=${efile},line=${line_in_file},col=${col}::${emsg}\n"; | ||
| } else { | ||
| print "ERROR: $file:$line_in_file:$col - Hidden Unicode character $codepoint found\n"; | ||
| } | ||
| $errors++; | ||
| } | ||
| } | ||
|
|
||
| if ($errors > 0) { | ||
| print "\nFound $errors hidden Unicode character(s) in added lines.\n"; | ||
| print "These may indicate a trojan-source attack. See https://trojansource.codes/\n"; | ||
| exit 1; | ||
| } else { | ||
| print "No hidden Unicode characters detected.\n"; | ||
| exit 0; | ||
| } | ||
| ' | ||
|
|
||
| get_diff "$@" | perl -CS -e "$PERL_SCRIPT" |
There was a problem hiding this comment.
I googled around and I see that GitHub already displays a warning in the PR diff view if it detects these hidden characters, which seems to be quite similar to what this script would do, i.e. issue a warning.
Paying close attention to the GitHub warnings when reviewing a PR from an outside contributor [^1] is a reasonable expectation for the maintainer and IMO should be sufficient.
Introducing a ~100-line Perl script does not sit right with me. Nothing's wrong with Perl, but it's a language I do not know, so I am not able to maintain and/or validate the script. I suspect that's the case for some/most of the other maintainers of this project.
WDYT?
[^1] -- or a potentially compromised internal contributor
There was a problem hiding this comment.
This is replicating the same approach used in the Weaviate server. I have a plan to later migrate this to a re-usable Github Action if needed, for ease of maintenance.
The warning is good, but it's not a hard stop.
250770f to
e30dcb5
Compare
Align all uses: refs to the immutable commit SHAs used by weaviate/weaviate. Major versions bumped where necessary: checkout v6, docker/login v4, upload-artifact v7, download-artifact v8, cache v5. Tags preserved as comments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e30dcb5 to
53acae2
Compare
Summary
uses:refs in GitHub Actions workflows to the same commit SHAs used byweaviate/weaviate, so this client stays in lockstep with the server# v6) as a trailing comment for readabilityContext
Initial consolidation pass. Going forward, GitHub's repo-level "Require actions to be pinned to a full-length commit SHA" policy (shipped 2025-08-15) will enforce SHA pinning at execution time for every workflow — so no custom linter is needed in this repo.
The scope was originally broader (included
linter_actions_pinned.sh,linter_hidden_unicode.sh, and apr-security-lint.yamlworkflow). Those are dropped here: the native policy makes the SHA linter redundant, and the hidden-Unicode scan will return in a follow-up that delegates to a shared composite action inweaviate/weaviate.Test plan
🤖 Generated with Claude Code