Skip to content

Creating draft PR with potential fix#36092

Open
jcastro-dotcms wants to merge 6 commits into
mainfrom
support-36004-save-content-with-related-content-performance-improvements
Open

Creating draft PR with potential fix#36092
jcastro-dotcms wants to merge 6 commits into
mainfrom
support-36004-save-content-with-related-content-performance-improvements

Conversation

@jcastro-dotcms

Copy link
Copy Markdown
Member

Proposed Changes

  • change 1
  • change 2

Checklist

  • Tests
  • Translations
  • Security Implications Contemplated (add notes if applicable)

Additional Info

** any additional useful context or info **

Screenshots

Original Updated
** original screenshot ** ** updated screenshot **

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

❌ Issue Linking Required

This PR could not be linked to an issue. All PRs must be linked to an issue for tracking purposes.

How to fix this:

Option 1: Add keyword to PR body (Recommended - auto-removes this comment)
Edit this PR description and add one of these lines:

  • This PR fixes #123 or Fixes: #123

  • This PR closes #123 or Closes: #123

  • This PR resolves #123 or Resolves: #123

  • Other supported keywords: fix, fixed, close, closed, resolve, resolved
    Option 2: Link via GitHub UI (Note: won't clear the failed check)

  1. Go to the PR → Development section (right sidebar)

  2. Click "Link issue" and select an existing issue

  3. Push a new commit or re-run the workflow to clear the failed check
    Option 3: Use branch naming
    Create a new branch with one of these patterns:

  • 123-feature-description (number at start)

  • issue-123-feature-description (issue-number at start)

  • feature-issue-123 (issue-number anywhere)

Why is this required?

Issue linking ensures proper tracking, documentation, and helps maintain project history. It connects your code changes to the problem they solve.---

This comment was automatically generated by the issue linking workflow

@github-actions github-actions Bot added the Area : Backend PR changes Java/Maven backend code label Jun 9, 2026
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @jcastro-dotcms's task in 3m 37s —— View job


Code Review

  • Gathering context
  • Reviewing changed files
  • Posting findings

Summary

The PR rewrites the hot path for saving related content: instead of fetching and hydrating every related contentlet one-by-one, it reads raw identifiers from the tree table, deletes/inserts in bulk, and only re-indexes parents whose relationships actually changed. The approach is sound, but there are several issues worth addressing before merge.


Issues

1. SQL injection via dynamic column names — Semgrep findings are real (TreeFactory.java)

getRelatedIds and getNextTreeOrder build SQL by string-concatenating caller-controlled column names (selectColumn, whereColumn) directly into the statement:

// TreeFactory.java ~L459, ~L510
.setSQL("SELECT " + selectColumn + " FROM tree WHERE " + whereColumn + " = ? ...")
.setSQL("... WHERE " + whereColumn + " = ? AND relation_type = ?")

Both private methods are only ever called with the literals "parent" or "child", so there is no current injection vector. However, the pattern is fragile: there is no allowlist guard, and any future caller passing unsanitised input would be vulnerable. The Semgrep rule is correct to flag this.

Fix: add an allowlist assert/throw at the top of each private method, or collapse the two branches into two named methods that hard-code the column name. Fix this →


2. Non-atomic delete-then-insert in insertTrees (TreeFactory.java ~L610)

new DotConnect().executeBatch("DELETE FROM tree WHERE ...", deleteParams);
new DotConnect().executeBatch("INSERT INTO tree ...", insertParams);

executeBatch is @WrapInTransaction, meaning each call gets its own transaction. Between the DELETE batch completing and the INSERT batch starting there is a window where the rows do not exist. A concurrent read of tree during that window (e.g. getRelatedIdsByParentAndRelationType or a search index refresh triggered from another thread) will see no related content. Under concurrent saves of the same parent or a high-traffic node this can cause:

  • transient empty relationship reads;
  • redundant re-index of unchanged parents (because the index sees an empty relationship set);
  • a lost INSERT if the two executeBatch calls end up on different connections.

Both calls should be wrapped in a single explicit transaction (or the delete should be done inline before each insert row using an INSERT ... ON CONFLICT DO UPDATE / MERGE if the DB supports it).


3. isExemptFromPermissionFiltering may produce wrong results for null users in the admin role check (ESContentletAPIImpl.java)

private boolean isExemptFromPermissionFiltering(final User user) throws DotDataException {
    if (user == null) {
        return false;
    }
    ...
    return APILocator.systemUser().getUserId().equals(user.getUserId())
            || roleAPI.doesUserHaveRole(user, roleAPI.loadCMSAdminRole());
}

Returning false for a null user sends the call into deleteReadableRelatedContent, which then calls permissionAPI.filterCollection(…, user). The PermissionAPI contract for null user is not universally guaranteed — some implementations throw, some treat it as anonymous. This was not the behaviour before the refactor (the old code always hydrated content first), so any existing callers that pass a null user would hit a different code path now. At minimum, document the null-user contract; consider throwing IllegalArgumentException so the contract is explicit.


4. deleteAllRelatedContent re-indexes removed parents but skips the invalidateRelatedContentCache for the non-relationship-field case (ESContentletAPIImpl.java ~L4660)

In deleteReadableRelatedContent, invalidateRelatedContentCache(contentlet, …) is called on the hydrated contentlet for every related item. In the new fast path (deleteAllRelatedContent), the cache-invalidation loop uses the identifier-only overload, which correctly bails out early when !relationship.isRelationshipField(). This is fine for that guard, but non-relationship-field cache eviction that the old code handled via the contentlet-level invalidateRelatedContentCache is now skipped entirely in the fast path. If the Contentlet version of the method does anything beyond clearing RelationshipCache, that logic is silently dropped.

Recommendation: audit both overloads to confirm they're functionally equivalent for all relationship types before merging.


5. Scroll fallback in dependenciesLeftToReindex is triggered incorrectly (ESMappingAPIImpl.java ~L1268)

if (oldSearchResults instanceof PaginatedArrayList
        && ((PaginatedArrayList<?>) oldSearchResults).getTotalResults() > oldSearchResults.size()) {
    // fall back to a scroll by passing MAX_LIMIT + 1
    oldSearchResults = contentletAPI.searchIndex(relatedQuery,
            ESContentletAPIImpl.MAX_LIMIT + 1, 0, null, ...);
}

MAX_LIMIT + 1 (10001) triggers the scroll path internally (limit > MAX_LIMIT), but the method name says "falling back to a scroll search". The scroll API returns a List<ContentletSearch> that is NOT a PaginatedArrayList, so the check above would not re-trigger. However, nothing prevents a second page of results — if a single relationship type has >10 000 indexed documents the scroll approach is correct, but the fallback fires on every call that returns exactly MAX_LIMIT results even when the total is only marginally above the limit. This is an inefficiency (unconditional scroll for edge cases). More importantly, searchIndex with a limit above MAX_LIMIT is not documented and depends on internal branching that could change. Consider using the dedicated scroll API directly.


6. relateContent tree position starts at getNextTreeOrderByParent… but treePosition++ increments it without being used in the child branch (ESContentletAPIImpl.java ~L5160)

int treePosition = related.isHasParent()
        ? TreeFactory.getNextTreeOrderByParentAndRelationType(...)
        : TreeFactory.getNextTreeOrderByChildAndRelationType(...);
...
for (final Contentlet relatedContent : related.getRecords()) {
    if (child) {
        // positionInParent is used here — from contentParents, not treePosition
        newTree = new Tree(..., positionInParent);
    } else {
        newTree = new Tree(..., treePosition);
    }
    positionInParent++;
    if (uniqueRelationshipSet.add(newTree)) {
        treesToInsert.add(newTree);
        treePosition++;   // <-- incremented for both branches
    }
}

When child == true, treePosition is fetched from the DB but only incremented — it is never assigned to any Tree. The DB read is wasted. Minor, but worth cleaning up.


7. No transaction boundary around the delete-then-insert pair in relateContent (ESContentletAPIImpl.java)

deleteRelatedContent and the subsequent TreeFactory.insertTrees call (in relateContent) run in two separate calls on the same thread but without an explicit wrapping transaction. Each uses @WrapInTransaction / @CloseDBIfOpened independently. If the insertTrees fails after deleteRelatedContent has committed, the relationship is permanently lost. The outer checkin flow may wrap this in a transaction, but relateContent is also a public API called standalone — this should be guarded.


@jcastro-dotcms jcastro-dotcms marked this pull request as ready for review June 11, 2026 15:52
@semgrep-code-dotcms-test

Copy link
Copy Markdown
Contributor

Semgrep found 10 CUSTOM_INJECTION-2 findings:

The method identified is susceptible to injection. The input should be validated and properly
escaped.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

Comment on lines +510 to +511
.setSQL("SELECT COALESCE(MAX(tree_order), 0) + 1 AS next_position FROM tree"
+ " WHERE " + whereColumn + " = ? AND relation_type = ?")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semgrep identified a blocking 🔴 issue in your code:
The method identified is susceptible to injection. The input should be validated and properly
escaped.

Why this might be safe to ignore:

This query concatenates only the column name, and in this method that value is constrained by internal callers to the fixed literals "parent" or "child" while user-controlled values are still passed as parameters. The rule matched a generic string-built SQL pattern, but in this context there is no realistic injection path to the database.

To resolve this comment:

🔧 No guidance has been designated for this issue. Fix according to your organization's approved methods.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by CUSTOM_INJECTION-2.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

You can view more details about this finding in the Semgrep AppSec Platform.

Comment on lines +476 to +477
"SELECT * FROM contentlet_version_info WHERE identifier IN ("
+ DotConnect.createParametersPlaceholder(chunk.size()) + ")");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semgrep identified a blocking 🔴 issue in your code:
The method identified is susceptible to injection. The input should be validated and properly
escaped.

Why this might be safe to ignore:

This query builds only the number of parameter placeholders for an IN clause, and the actual identifier values are passed separately with addParam, so user input is not concatenated into SQL. The rule matched a SQL string concatenation pattern, but in this context it does not create a meaningful injection risk.

To resolve this comment:

🔧 No guidance has been designated for this issue. Fix according to your organization's approved methods.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by CUSTOM_INJECTION-2.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

You can view more details about this finding in the Semgrep AppSec Platform.

@semgrep-code-dotcms-test

Copy link
Copy Markdown
Contributor

Semgrep found 3 CUSTOM_INJECTION-2 findings:

The method identified is susceptible to injection. The input should be validated and properly
escaped.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

@jcastro-dotcms

jcastro-dotcms commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Regarding Semgrep's analysis, all three are false positives:

Finding analysis

  • Findings 842408061 + 842408062 — TreeFactory L510–511 (both point at the same two lines, in getNextTreeOrder):
  .setSQL("SELECT COALESCE(MAX(tree_order), 0) + 1 AS next_position FROM tree"
          + " WHERE " + whereColumn + " = ? AND relation_type = ?")

The rule fires on the string concatenation, but the only concatenated variable is whereColumn, which is a private method parameter whose only two callers pass the hardcoded literals "parent" and "child". Everything user-controllable (id, relationType) goes through addParam bound parameters. No injection path exists.

  • Finding 842408064 — VersionableFactoryImpl L476–477:
  .setSQL("SELECT * FROM contentlet_version_info WHERE identifier IN ("
          + DotConnect.createParametersPlaceholder(chunk.size()) + ")");

The concatenated value is the output of createParametersPlaceholder(int) — a generated ?,?,?... string derived from a list size, never from content. All identifiers are bound via addParam. This is also the established house pattern: ContentletIndexAPIImpl.SELECT_CONTENTLET_VERSION_INFO and ESContentFactoryImpl build chunked IN clauses with the exact same helper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant