Skip to content

SDSTOR-21989: check index_table existence before using it#409

Open
JacksonYao287 wants to merge 2 commits intoeBay:stable/v4.xfrom
JacksonYao287:21989
Open

SDSTOR-21989: check index_table existence before using it#409
JacksonYao287 wants to merge 2 commits intoeBay:stable/v4.xfrom
JacksonYao287:21989

Conversation

@JacksonYao287
Copy link
Copy Markdown
Collaborator

@JacksonYao287 JacksonYao287 commented May 6, 2026

if homeobject crashes after index_table is destroyed in destroy_pg and before the pg is created in baseline resync, we can not find pg index table when restarts. so we need add a sanity check to see if index_table exists. if not, skip refreshing pg metrics

@JacksonYao287 JacksonYao287 requested a review from Besroy May 6, 2026 01:50
@JacksonYao287 JacksonYao287 force-pushed the 21989 branch 2 times, most recently from 0a92251 to dca49f2 Compare May 6, 2026 03:26
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 6, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 35.29412% with 11 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (stable/v4.x@d9009cf). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/lib/homestore_backend/hs_pg_manager.cpp 40.00% 4 Missing and 2 partials ⚠️
src/lib/homestore_backend/hs_blob_manager.cpp 28.57% 3 Missing and 2 partials ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@              Coverage Diff               @@
##             stable/v4.x     #409   +/-   ##
==============================================
  Coverage               ?   52.27%           
==============================================
  Files                  ?       36           
  Lines                  ?     5295           
  Branches               ?      659           
==============================================
  Hits                   ?     2768           
  Misses                 ?     2239           
  Partials               ?      288           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xiaoxichen
Copy link
Copy Markdown
Collaborator

In general LGTM

  1. During recovery, should we better handling destroyed PG? Currently we only partially handled the case that Index is destroyed on destroyed PG. (
    } else {
    RELEASE_ASSERT(hs_pg->pg_sb_->state == PGState::DESTROYED, "IndexTable should be recovered before PG");
    hs_pg->index_table_ = nullptr;
    LOGI("Index table not found for destroyed pg={}, index_table_uuid={}", pg_id, uuid_str);
    }
    )

I think we should redo the HSHomeObject::pg_destroy during recovery.

  1. the check index_table logic spread across the code, is it better to add an accessor into HS_PG? which checks the existence of index and logging (if the PG is destroyed) or assert.

@Besroy
Copy link
Copy Markdown
Contributor

Besroy commented May 7, 2026

Agree, checking the PG state seems safer and cleaner. @yuwmao @koujl Do you recall the previous solution for handling crashes during destroy_pg? Perhaps we can integrate it with this fix as a unified solution

@JacksonYao287
Copy link
Copy Markdown
Collaborator Author

JacksonYao287 commented May 8, 2026

agree, we need to redo pg_destroy to make sure all the pg resource are cleared before pg is recreated by BR or permanently destroyed. cc @yuwmao @koujl if you want to do this, feel free to close this PR.

@JacksonYao287 JacksonYao287 requested review from koujl and yuwmao May 8, 2026 03:54
@yuwmao
Copy link
Copy Markdown
Contributor

yuwmao commented May 8, 2026

No, I haven't met the destroy_pg crash issue before in my memory. But also agree on redo pg_destroy.

Agree, checking the PG state seems safer and cleaner. @yuwmao @koujl Do you recall the previous solution for handling crashes during destroy_pg? Perhaps we can integrate it with this fix as a unified solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants