-
Notifications
You must be signed in to change notification settings - Fork 1.9k
JS: Skip minified file if avg line length > 200 #20940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
25eaf27 to
38eef8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves the detection of minified JavaScript files by classifying files with an average line length greater than 200 characters as minified. Previously, only filename-based detection (e.g., *.min.js) was used. The new content-based detection can be disabled by setting the environment variable CODEQL_EXTRACTOR_JAVASCRIPT_ALLOW_MINIFIED_FILES=true.
- Adds a new
isMinified()method that calculates average line length by counting line breaks (handling both\nand\r\n) - Introduces a
skipReasonfield to track why files are skipped during extraction - Makes the minified file filtering configurable via environment variable
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
javascript/extractor/src/com/semmle/js/extractor/ScriptExtractor.java |
Implements the core minified file detection logic based on average line length |
javascript/extractor/src/com/semmle/js/extractor/ParseResultInfo.java |
Adds support for tracking skip reasons when files are not extracted |
javascript/extractor/src/com/semmle/js/extractor/FileExtractor.java |
Updates extraction flow to handle and log skipped files |
javascript/extractor/src/com/semmle/js/extractor/ExtractorConfig.java |
Adds configuration option to allow/disallow minified files |
javascript/extractor/src/com/semmle/js/extractor/EnvironmentVariables.java |
Adds environment variable parsing for minified file handling |
javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java |
Applies the allowMinified configuration to both filename-based and content-based filtering |
javascript/ql/src/change-notes/2025-12-05-skip-minified-files.md |
Documents the new behavior in release notes |
javascript/ql/test/query-tests/filters/ClassifyFiles/ClassifyFiles.expected |
Updates test expectations to reflect that json-like.js is now detected as minified |
javascript/ql/test/query-tests/Security/CWE-400/ReDoS/regexplib/dates.js |
Adds padding lines to prevent this test file from being classified as minified |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
javascript/extractor/src/com/semmle/js/extractor/FileExtractor.java
Outdated
Show resolved
Hide resolved
javascript/ql/src/change-notes/2025-12-05-skip-minified-files.md
Outdated
Show resolved
Hide resolved
….java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor suggestion, otherwise LGTM! 👍
|
I'm wondering if there's some classes of cases we're going to miss by doing that (i.e. I would imagine URLs could result in >200 length lines, which are potentially prone to injection?), and if: 1) we'd be able to find this out when rolling it out; or 2) have a slightly different heuristic to avoid this? I don't have a strong opinion here but something like: const apiEndpoint = "https://api.example.com/v2/users/profile/settings/notifications/preferences/email?includeArchived=true&format=json&apiKey=abc123def456";
let blah = someOtherNormalJSStuff();
if(blah) {
blah.obviouslyNotMinifiedJS();
}Would be considered minified. No clue as to what the data looks like in terms of distribution of line lengths |
I don't think that's quite right. Your code snippet is ~240 characters and 5 line breaks, for an average line length of 48 -- well below the limit. To get above an average of 200, you either need a few really long lines, or a lot of >200 character lines. |
|
Ya you're right thanks @tausbn -- my brain forgot what an average is and read it just as "max" 🙃 . A 200 average length one for the above example would basically a helper |
Co-authored-by: Taus <tausbn@github.com>
Improves the detection of minified files, by classifying files with an average line length over 200 as minified.
Minifiers typically compile a file to one long line, but sometimes there's a copyright comment in front of it, so we need to account for some normal-looking lines. Sometimes multiple such files are concatenated in a bundle.
Note that we already skip certain files based on filenames, such as
*.min.js, but previously we did not detect minification based on contents.