Phase 3 reindex leaves orphan ES entries in the indicies table #36077

@fabrizzio-dotCMS

Description

Problem Statement

During OpenSearch migration Phase 3 (OS is primary), running a full reindex leaves orphan/garbage entries in the dotCMS indicies table (and the corresponding orphaned physical indices). After the OpenSearch reindex completes and the .os (v3.X) indices are promoted to live/working, the cleanup step fails to remove:

  • the previous ES live/working pointers (NULL version), and
  • the ES-side reindex_live / reindex_working pointers (NULL version) created during the same reindex.

The expected end state of a successful reindex is exactly two rows (live + working). Instead the table accumulates stale ES-side rows alongside the promoted OS rows.

This matches the previously documented delete-orphan bug class for the OS migration: when a bare index name is passed in, the OS provider resolves it without the .os tag, so the OS cleanup/swap targets a non-existent name and the real index/row is left behind. See related work on branch fix/issue-35820-os-index-naming-suffix / PR #35863.
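To make the bug class concrete, here is a minimal sketch in Python rather than the actual dotCMS Java code. The function names (`physical_name_bare`, `physical_name_tagged`, `delete_index`) and the set standing in for the cluster are hypothetical; only the `.os` suffix and the index name come from this issue.

```python
OS_TAG = ".os"  # suffix seen on the promoted OpenSearch index names in this issue


def physical_name_bare(index_name: str) -> str:
    """Hypothetical buggy resolution: the bare name is used as-is."""
    return index_name


def physical_name_tagged(index_name: str) -> str:
    """Hypothetical fixed resolution: make sure the name carries the .os tag."""
    return index_name if index_name.endswith(OS_TAG) else index_name + OS_TAG


def delete_index(cluster: set, name: str) -> bool:
    """Stand-in for a cluster delete call; True only if the index existed."""
    if name in cluster:
        cluster.remove(name)
        return True
    return False


# The physical OS index carries the .os tag; the caller passes the bare name.
cluster = {"cluster_409d7b8c31.live_20260604192544.os"}
bare = "cluster_409d7b8c31.live_20260604192544"

# Each attempt runs against its own copy of the simulated cluster.
deleted_buggy = delete_index(set(cluster), physical_name_bare(bare))
deleted_fixed = delete_index(set(cluster), physical_name_tagged(bare))
```

With the bare name the delete targets an index that does not exist and silently does nothing, which is how the real index (and its row) would be left behind. Whether the Phase 3 cleanup fails for exactly this reason still needs to be confirmed against the code.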

Impact: orphan rows and orphan physical indices accumulate on every Phase-3 reindex, polluting cluster/index state, wasting storage, and potentially breaking later index cleanup/swap operations. No data loss on the OS primary side.

Observed state (after one Phase-3 reindex)

select * from indicies:

#  index_name                                    index_type       index_version
1  cluster_409d7b8c31.working_20260604152231     working          NULL (ES)
2  cluster_409d7b8c31.live_20260604152231        live             NULL (ES)
3  cluster_409d7b8c31.working_20260604192544     reindex_working  NULL (ES)
4  cluster_409d7b8c31.live_20260604192544        reindex_live     NULL (ES)
5  cluster_409d7b8c31.live_20260604192544.os     live             3.X (OS)
6  cluster_409d7b8c31.working_20260604192544.os  working          3.X (OS)

Rows 1–4 (NULL version = ES) are the leftover garbage. Only rows 5–6 (the promoted .os / v3.X pair) should remain.
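The split between leftover and active rows can be reproduced with a small query. The sketch below loads the six observed rows into an in-memory SQLite table and separates them on `index_version`. It assumes the three columns shown above are sufficient (the real dotCMS schema may have more) and that, in Phase 3, a NULL version always marks an ES-side row, as stated in this issue.

```python
import sqlite3

# The six rows observed after one Phase-3 reindex (None maps to SQL NULL).
ROWS = [
    ("cluster_409d7b8c31.working_20260604152231", "working", None),
    ("cluster_409d7b8c31.live_20260604152231", "live", None),
    ("cluster_409d7b8c31.working_20260604192544", "reindex_working", None),
    ("cluster_409d7b8c31.live_20260604192544", "reindex_live", None),
    ("cluster_409d7b8c31.live_20260604192544.os", "live", "3.X"),
    ("cluster_409d7b8c31.working_20260604192544.os", "working", "3.X"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicies (index_name TEXT, index_type TEXT, index_version TEXT)"
)
conn.executemany("INSERT INTO indicies VALUES (?, ?, ?)", ROWS)

# ES-side rows (NULL version) are the leftovers in a Phase-3 environment.
orphans = conn.execute(
    "SELECT index_name, index_type FROM indicies "
    "WHERE index_version IS NULL ORDER BY index_name"
).fetchall()

# The promoted OS pair is whatever carries a version.
active = conn.execute(
    "SELECT index_type FROM indicies "
    "WHERE index_version IS NOT NULL ORDER BY index_type"
).fetchall()
```

The same two `WHERE` clauses can be run directly against the real `indicies` table while reproducing the issue.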

Screenshot evidence (indicies table) to be attached to this issue.

Steps to Reproduce

  1. Stand up a dotCMS environment in OpenSearch migration Phase 3 (OS primary), e.g. the opensearch-upgrade env.
  2. Confirm the indicies table holds the expected live/working pair.
  3. Trigger a full reindex (Maintenance → Reindex, or the reindex API).
  4. Wait for the reindex to complete and the new .os (v3.X) indices to be promoted to live/working.
  5. Query select * from indicies.
  6. Observe: stale ES-side rows (live/working NULL version and reindex_live/reindex_working NULL version) remain alongside the promoted OS rows — 6 rows instead of 2.

Acceptance Criteria

  • After a successful Phase-3 reindex, the indicies table contains exactly the active live + working pair (the promoted .os / v3.X rows) — no leftover ES-side live/working (NULL version) rows.
  • The transient reindex_live / reindex_working rows are removed (or promoted) once the reindex completes — none remain as orphans.
  • The physical indices backing the removed rows are deleted from the cluster (no orphaned .os or bare physical indices left behind).
  • Re-running reindex multiple times in Phase 3 does not accumulate additional orphan rows/indices.
  • Integration coverage exercises a full Phase-3 reindex lifecycle and asserts the final indicies table state (2 rows) and absence of orphan physical indices.
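The final-state assertions in these criteria could be expressed roughly as follows. This is a Python sketch of the check only, not the dotCMS integration test itself. `check_final_state` is a hypothetical helper, and it assumes that the names stored in the indicies table match the physical index names on the cluster one-to-one.

```python
def check_final_state(rows, physical_indices):
    """Return a list of violations; an empty list means the criteria hold.

    rows: (index_name, index_type, index_version) tuples from the indicies table.
    physical_indices: names of the indices that exist on the cluster.
    """
    problems = []

    # Exactly one live and one working row, nothing else.
    types = sorted(index_type for _, index_type, _ in rows)
    if types != ["live", "working"]:
        problems.append(f"expected exactly live + working rows, found {types}")

    # No ES-side (NULL version) rows may remain in Phase 3.
    for name, index_type, version in rows:
        if version is None:
            problems.append(f"leftover ES-side row: {name} ({index_type})")

    # Every physical index must be backed by a row (assumed 1:1 naming).
    registered = {name for name, _, _ in rows}
    for name in sorted(set(physical_indices) - registered):
        problems.append(f"orphan physical index: {name}")

    return problems


expected_rows = [
    ("cluster_409d7b8c31.live_20260604192544.os", "live", "3.X"),
    ("cluster_409d7b8c31.working_20260604192544.os", "working", "3.X"),
]
observed_rows = expected_rows + [
    ("cluster_409d7b8c31.working_20260604152231", "working", None),
    ("cluster_409d7b8c31.live_20260604152231", "live", None),
]

clean = check_final_state(expected_rows, [name for name, _, _ in expected_rows])
dirty = check_final_state(observed_rows, [name for name, _, _ in expected_rows])
```

Running the same check after each of several consecutive reindexes would also cover the criterion that orphans do not accumulate.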

dotCMS Version

main branch (latest) — OpenSearch migration Phase 3 (OS primary), observed in the opensearch-upgrade environment.

Severity

Medium - Some functionality impacted

Links

NA

Projects

Status
In Progress
