Add raw argument collections for URL-encoded parameters (#3501)#3564
Add raw argument collections for URL-encoded parameters (#3501)#3564Easton97-Jens wants to merge 7 commits into
Conversation
Removed a large block of variable evaluation logic and added new variable cases for ARGS_GET_NAMES_RAW and ARGS_POST_NAMES_RAW.
…r-security-hotspots Use direct empty-string checks instead of `strlen(...) > 0`
There was a problem hiding this comment.
Pull request overview
Adds raw (pre-URL-decoded) argument collections (ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, ARGS_NAMES_RAW, ARGS_GET_NAMES_RAW, ARGS_POST_NAMES_RAW) so rules can inspect original encoded names/values, addressing #3501. The collections are populated in Transaction::extractArguments before URL decoding, and the parser/scanner are updated to recognize the new tokens. The Bison parser sources were also regenerated against a newer Bison version.
Changes:
- New
*_rawvariable headers and corresponding members inTransactionAnchoredVariables;extractArgumentsnow stores the raw key/value via a newaddRawArgumenthelper before decoding. - Parser/scanner (
seclang-parser.yy,seclang-scanner.ll) extended with the new RAW tokens and grammar rules; regeneratedseclang-parser.hhupdated accordingly; obsoleteposition.hh/stack.hhremoved;configure.acnormalizesbison -ytobison. - Added regression test JSON
variable-ARGS_RAW.json(and registers it intest-suite.in); minorstrlen(buf) > 0→buf[0] != '\0'cleanups inpm.ccandmodsecurity_test.cc.
Reviewed changes
Copilot reviewed 20 out of 23 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| headers/modsecurity/transaction.h | Adds raw AnchoredSetVariable/translation-proxy members and constructor wiring. |
| src/transaction.cc | Captures raw key/value and calls addRawArgument after successful addArgument. |
| src/transaction_raw_args.h | New inline helper that sets the raw variables for ARGS_RAW/ARGS_GET_RAW/ARGS_POST_RAW. |
| src/variables/args_raw.h, args_get_raw.h, args_post_raw.h, args_names_raw.h, args_get_names_raw.h, args_post_names_raw.h | New variable definitions for raw collections. |
| src/variables/variable.h | Adds resolution branches for the new RAW collections. |
| src/args_names_raw.h | Misplaced duplicate of src/variables/args_names_raw.h (flagged). |
| src/parser/seclang-parser.yy | Adds RAW tokens/grammar; updates Bison directives and modernization. |
| src/parser/seclang-parser.hh | Regenerated parser header reflecting new tokens and offsets. |
| src/parser/seclang-scanner.ll | Adds scanner rules for the RAW tokens; small ordering fix. |
| src/parser/position.hh, stack.hh | Removed (obsolete with newer Bison). |
| src/operators/pm.cc | Replaces strlen(...)>0 with [0] != '\0'. |
| test/common/modsecurity_test.cc | Same strlen → first-byte check cleanup. |
| test/test-cases/regression/variable-ARGS_RAW.json | New regression suite covering RAW collections. |
| test/test-suite.in | Registers the new regression test file. |
| configure.ac | Normalizes YACC value when autoconf returns bison -y. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@Easton97-Jens, thanks again this great PR. Please take a look at Copilot's suggestions. Beside of the suggestions, I have a feeling that we should check this modification with a "real-world" example, I mean add an Nginx build to the CI workflow, and try these variables through it. I hope we won't be surprised (eg. the server converts the ARGS and the library will never access them... or similar). First, let me check this (don't need to add any extra tests). |
|



Fixes #3501
what
This PR adds raw argument collections for URL-encoded GET and POST parameters.
New collections:
ARGS_RAWARGS_GET_RAWARGS_POST_RAWARGS_NAMES_RAWARGS_GET_NAMES_RAWARGS_POST_NAMES_RAWThe raw values are captured in
Transaction::extractArguments(...)beforeutils::urldecode_nonstrict_inplace(...)is applied. ExistingARGS,ARGS_GET,ARGS_POST, andARGS_NAMESbehavior remains unchanged.This also includes parser/scanner updates needed to recognize the new
collections and keeps the generated parser sources compatible with modern
Bison/Flex versions.
why
Issue #3501 describes problems caused by relying only on already-decoded
argument collections. In cases such as double-encoded input, rules may need
access to the original parameter name or value before libModSecurity performs
its internal URL decoding.
Adding explicit raw collections allows rules to inspect the pre-decoded
URL-encoded input without changing the behavior of existing collections.
testing
ARGS*collections still behave as before