🍒 11649 - Improve performance of regexps in IAST and query obfuscator #11710

Open: manuel-alvarez-alvarez wants to merge 1 commit into release/v1.63.x from malvarez/backport-pr-11649

Conversation

@manuel-alvarez-alvarez

Member

Backport #11649 to release/v1.63.x

Migrate the IAST evidence-redaction regexps to RE2/J for linear-time
matching. RE2/J has no back-references, so the SQL tokenizer is reworked
to find Postgres dollar-quoted literals via a precomputed tag index
(binary search) and to enumerate Oracle q'...' delimiters explicitly
instead of relying on a back-reference. Configured redaction patterns
that are valid under java.util.regex but unsupported by RE2/J fall back
to the defaults instead of failing to compile.
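
The tag-index idea for dollar-quoted literals can be illustrated with a small sketch. This is not the PR's implementation: the class and method names are hypothetical, it uses `java.util.regex` rather than RE2/J so that it runs without extra dependencies, and the handling of unmatched openers (they are simply skipped) is an assumption. The first pass records where every `$$` or `$tag$` candidate starts, grouped by tag text; the second pass pairs each opener with the first identical tag that starts at or after the end of the opener, using a binary search instead of a back-reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find dollar-quoted literals without a back-reference. */
final class DollarQuoteSketch {
  // ASCII-only tag pattern ($$ or $tag$), no back-reference needed.
  private static final Pattern TAG = Pattern.compile("\\$(?:[a-zA-Z_]\\w*)?\\$");

  /** Returns [start, end) offsets of each dollar-quoted literal in sql. */
  static List<int[]> literals(String sql) {
    List<Integer> starts = new ArrayList<>();
    List<String> tags = new ArrayList<>();
    Map<String, List<Integer>> byTag = new HashMap<>();
    Matcher m = TAG.matcher(sql);
    // Pass 1: index every tag candidate. Restarting at start + 1 keeps
    // overlapping candidates such as the closing $a$ in "foo$$a$".
    for (int from = 0; from < sql.length() && m.find(from); from = m.start() + 1) {
      starts.add(m.start());
      tags.add(m.group());
      // Offsets are appended in increasing order, so each list stays sorted.
      byTag.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(m.start());
    }
    // Pass 2: pair each opener with the next identical tag via binary search.
    List<int[]> out = new ArrayList<>();
    int cursor = 0;
    for (int i = 0; i < starts.size(); i++) {
      int start = starts.get(i);
      if (start < cursor) {
        continue; // candidate lies inside a literal that was already emitted
      }
      String tag = tags.get(i);
      List<Integer> same = byTag.get(tag);
      int bodyStart = start + tag.length();
      int pos = Collections.binarySearch(same, bodyStart);
      if (pos < 0) {
        pos = -pos - 1; // insertion point: first offset >= bodyStart
      }
      if (pos < same.size()) {
        int end = same.get(pos) + tag.length();
        out.add(new int[] {start, end});
        cursor = end;
      }
      // Otherwise the opener is unmatched and is skipped (assumed behaviour).
    }
    return out;
  }
}
```

For `SELECT $$x$$, $fn$y$fn$` the sketch reports two ranges covering `$$x$$` and `$fn$y$fn$`; an opener with no closing tag yields no range.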

Replace the query obfuscator's `while (matcher.find())` + per-match
`Strings.replace` loop (O(N*Q)) with a single appendReplacement /
appendTail pass (O(Q)).
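
A minimal sketch of the single-pass replacement follows. The PR does not define N and Q; reading Q as the query length and N as the number of matches is an assumption here. The pattern, the mask, and the class name are hypothetical and only illustrate the `appendReplacement` / `appendTail` idiom from `java.util.regex`, which copies each unmatched stretch of the input exactly once instead of rebuilding the whole string per match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a one-pass query obfuscator. */
final class ObfuscatorSketch {
  // Illustrative pattern only; the real obfuscation pattern is configurable.
  private static final Pattern SENSITIVE =
      Pattern.compile("(?i)(?:password|token)=[^&]*");
  private static final String MASK = "<redacted>";

  static String obfuscate(String query) {
    Matcher m = SENSITIVE.matcher(query);
    // StringBuffer keeps the sketch compatible with Java 8.
    StringBuffer out = new StringBuffer(query.length());
    while (m.find()) {
      // Appends the text since the previous match, then the mask.
      // quoteReplacement guards against '$' or '\' in the mask.
      m.appendReplacement(out, Matcher.quoteReplacement(MASK));
    }
    // Appends whatever follows the last match.
    m.appendTail(out);
    return out.toString();
  }
}
```

With this sketch, `ObfuscatorSketch.obfuscate("user=a&password=p1&token=t2&x=1")` returns `user=a&<redacted>&<redacted>&x=1`, and a query with no match is returned unchanged.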

Add JUnit 5 tests for the tokenizers and the obfuscator, a tokenizer
JMH benchmark, and migrate SensitiveHandlerTest from Groovy to JUnit 5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 92ebc2a)
@manuel-alvarez-alvarez requested review from a team as code owners June 23, 2026 15:25

@manuel-alvarez-alvarez added labels on Jun 23, 2026:

  • type: enhancement (Enhancements and improvements)
  • tag: performance (Performance related changes)
  • comp: asm iast (Application Security Management (IAST))
  • tag: ai generated (Largely based on code generated by an AI or LLM)

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a004f21de6

"\\$(?<ESCAPE>[^$]*?)\\$.*?\\$\\k<ESCAPE>\\$";
private static final String ORACLE_ESCAPED_LITERAL = buildOracleEscapedLiteral();
// $$ or $tag$ where tag is a SQL identifier
private static final String POSTGRESQL_ESCAPED_LITERAL = "\\$(?:[a-zA-Z_]\\w*)?\\$";

P2: Preserve PostgreSQL dollar-quote tags with non-ASCII identifiers

For PostgreSQL, dollar-quote tags follow unquoted identifier rules, which can include non-ASCII letters. Restricting the opener to ASCII means a valid literal like `SELECT $é$secret$é$` is no longer recognized as a dollar-quoted string; the new tokenizer then skips the unmatched `$...$` token and the literal body is not redacted. Please keep this pattern and the tag indexer aligned with PostgreSQL identifier characters rather than only `[a-zA-Z_]\w*`.
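
The difference the review describes can be reproduced with a short sketch. The ASCII pattern below is the one quoted in the diff; the Unicode-aware variant built on `\p{L}` and `\p{N}` is only an assumption about one possible fix, not something the PR contains, and it approximates rather than reproduces PostgreSQL's identifier rules. The sketch uses `java.util.regex`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical comparison of ASCII-only and Unicode-aware tag patterns. */
final class TagPatternSketch {
  // Pattern from the diff: $$ or $tag$ with an ASCII identifier as the tag.
  static final Pattern ASCII_TAG = Pattern.compile("\\$(?:[a-zA-Z_]\\w*)?\\$");
  // Assumed alternative: any Unicode letter or '_' first, then letters,
  // digits, or '_'.
  static final Pattern UNICODE_TAG =
      Pattern.compile("\\$(?:[\\p{L}_][\\p{L}\\p{N}_]*)?\\$");

  /** Returns every non-overlapping tag candidate the pattern finds in sql. */
  static List<String> tags(Pattern pattern, String sql) {
    List<String> found = new ArrayList<>();
    Matcher m = pattern.matcher(sql);
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }
}
```

For the input `SELECT $é$secret$é$`, `tags(ASCII_TAG, ...)` yields the single candidate `$secret$`, so the literal body itself is mistaken for a tag, while `tags(UNICODE_TAG, ...)` yields `$é$` twice, the opener and the closer.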

pr-commenter Bot commented Jun 23, 2026

Debugger benchmarks

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| ci_job_date | 1782229191 | 1782229537 |
| end_time | 2026-06-23T15:41:17 | 2026-06-23T15:47:03 |
| git_branch | master | malvarez/backport-pr-11649 |
| git_commit_sha | 741789f | a004f21 |
| start_time | 2026-06-23T15:39:52 | 2026-06-23T15:45:38 |

See matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| ci_job_id | 1796465621 | 1796465621 |
| ci_pipeline_id | 120536858 | 120536858 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| git_commit_date | 1782228334 | 1782228334 |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 5 unstable metrics.

See unchanged results

| scenario | Δ mean agg_http_req_duration_min | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p75 | Δ mean agg_http_req_duration_p99 | Δ mean throughput |
|---|---|---|---|---|---|
| scenario:noprobe | unstable [-14.706µs; +26.999µs] or [-5.079%; +9.324%] | unstable [-22.916µs; +38.182µs] or [-6.931%; +11.548%] | unstable [-33.085µs; +51.045µs] or [-9.565%; +14.757%] | unstable [-202.935µs; +11.302µs] or [-15.399%; +0.858%] | same |
| scenario:basic | same | same | same | unstable [-273.412µs; -11.870µs] or [-21.956%; -0.953%] | same |
| scenario:loop | same | same | unsure [-13.402µs; -3.703µs] or [-0.148%; -0.041%] | same | same |

Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (330.65 µs) : 309, 353
.   : milestone, 331,
basic (296.443 µs) : 290, 303
.   : milestone, 296,
loop (8.982 ms) : 8977, 8987
.   : milestone, 8982,
section candidate
noprobe (338.283 µs) : 304, 373
.   : milestone, 338,
basic (298.029 µs) : 291, 305
.   : milestone, 298,
loop (8.983 ms) : 8978, 8988
.   : milestone, 8983,
  • baseline results

| Scenario | Request median duration [CI 0.99] |
|---|---|
| noprobe | 330.65 µs [308.75 µs, 352.549 µs] |
| basic | 296.443 µs [289.865 µs, 303.021 µs] |
| loop | 8.982 ms [8.977 ms, 8.987 ms] |

  • candidate results

| Scenario | Request median duration [CI 0.99] |
|---|---|
| noprobe | 338.283 µs [303.54 µs, 373.026 µs] |
| basic | 298.029 µs [290.678 µs, 305.38 µs] |
| loop | 8.983 ms [8.978 ms, 8.988 ms] |
