Fix access point cleanup on instance removal#10179
Fix access point cleanup on instance removal#10179openroad-ci wants to merge 5 commits intoThe-OpenROAD-Project:masterfrom
Conversation
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
|
I found this issue during new Resizer architecture testing. @eder-matheus I added you as a reviewer because I found that you edited the related codes a week ago. |
There was a problem hiding this comment.
Code Review
This pull request refactors access point management in dbInst and dbMPin to ensure idempotency and correctly handle duplicate references. Key changes include updating dbMPin::clearPinAccess to sort and deduplicate access points before destruction and replacing manual cleanup in dbInst with a centralized method call. New unit tests were added to verify these improvements. The review feedback suggests hardening the index check in dbMPin::clearPinAccess against negative values and using structured bindings with std::ranges::unique to enhance code readability.
| @@ -153,8 +155,11 @@ void dbMPin::clearPinAccess(const int pin_access_idx) | |||
| if (pin->aps_.size() <= pin_access_idx) { | |||
There was a problem hiding this comment.
The comparison between pin->aps_.size() (which is unsigned) and pin_access_idx (which is a signed int) can lead to unexpected behavior if pin_access_idx is negative. While it currently works "by accident" for -1 (as it promotes to a large unsigned value), it is safer and clearer to explicitly check for negative values or change the parameter type to uint32_t to match dbInst::getPinAccessIdx().
| if (pin->aps_.size() <= pin_access_idx) { | |
| if (pin_access_idx < 0 || pin->aps_.size() <= static_cast<uint32_t>(pin_access_idx)) { |
| dbVector<dbId<_dbAccessPoint>> aps; | ||
| aps.swap(pin->aps_[pin_access_idx]); | ||
| std::ranges::sort(aps); | ||
| aps.erase(std::ranges::unique(aps).begin(), aps.end()); |
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: Jaehyun Kim <jhkim@precisioninno.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
| srcs = [ | ||
| "TestAccessPoint.cpp", | ||
| ], | ||
| features = ["-layering_check"], |
|
@codex review |
|
Why do we have duplicate APs in the first place? |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
osamahammad21
left a comment
There was a problem hiding this comment.
The only effective change of this PR is clearing the MPin's access points at the end of dbMPin::clearPinAccess
However, the other changes and the test case seem to be defensive, which I generally don't prefer.
dbInst::destroy() tried to remove AP back-references before ITerm deletion, but it used iterm->getAccessPoints(). That API derives APs from the instance's current pin-access index, so it can miss preferred APs stored in the ITerm aps_ map for other pin-access indices.
What do you mean by other pin-access indices? Each instance has one and only one pin access index, so its iterms.
Sort and de-duplicate the copied AP id list before destruction so duplicated AP bookkeeping cannot cause a double destroy.
There should never be duplicate APs. If there is, then this needs to be addressed.
| if (pin_access_idx < 0 | ||
| || pin->aps_.size() <= static_cast<std::size_t>(pin_access_idx)) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
I wonder what made you do this change? It is correct, but I wonder if there is any call that uses a negative pin_access_idx which would be more concerning.
Also, I think the more correct approach to make pin_access_idx argument a size_t or unsigned int instead.
There was a problem hiding this comment.
nvm, just found it's ai review triggered.
| } | ||
| } | ||
| } | ||
| iterm->clearPrefAccessPoints(); |
There was a problem hiding this comment.
Although this is a more efficient approach, the outcome theoretically should be exactly the same. The preferred access point of an iterm is always a subset of its total access points which is returned by getAccessPoints
| dbVector<dbId<_dbAccessPoint>> aps; | ||
| aps.swap(pin->aps_[pin_access_idx]); | ||
| std::ranges::sort(aps); | ||
| const auto duplicate_aps = std::ranges::unique(aps); | ||
| aps.erase(duplicate_aps.begin(), duplicate_aps.end()); | ||
| for (const dbId<_dbAccessPoint>& ap : aps) { | ||
| odb::dbAccessPoint::destroy( | ||
| (odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap)); | ||
| } |
There was a problem hiding this comment.
Theoretically, there shouldn't be duplicate aps. This looks a bit defensive. I would prefer if such a case exists a crash rather than passing unnoticed.
| dbVector<dbId<_dbAccessPoint>> aps; | |
| aps.swap(pin->aps_[pin_access_idx]); | |
| std::ranges::sort(aps); | |
| const auto duplicate_aps = std::ranges::unique(aps); | |
| aps.erase(duplicate_aps.begin(), duplicate_aps.end()); | |
| for (const dbId<_dbAccessPoint>& ap : aps) { | |
| odb::dbAccessPoint::destroy( | |
| (odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap)); | |
| } | |
| auto& aps = pin->aps_[pin_access_idx]; | |
| for (const auto& ap : aps) { | |
| odb::dbAccessPoint::destroy( | |
| (odb::dbAccessPoint*) block->ap_tbl_->getPtr(ap)); | |
| } | |
| aps.clear(); |
@jhkim-pii Actually after checking the code, I found that erasing the mpin's aps list should be done by |
Summary
Problem
nangate45/gcdcrashed in5_1_grtwhilerepair_timingtriggered parasitic updates, incremental global routing, and DRT dirty pin-access database updates.FLOW_VARIANT=legacy_mtandRSZ_POLICY=legacy_mt. The same4_cts.odb/.sdcinput passed withRSZ_POLICY=legacy, which ruled out an already-corrupt input database.legacy_mtwassizeup,buffer, so the multi-threaded legacy repair flow exposed stale OpenDB access point bookkeeping during repeated ECO pin-access updates.Symptom:
Call stack:
Root-cause
OpenDB access point cleanup did not fully maintain the bidirectional references among MPin pin-access slots, ITerm preferred access points, and AccessPoint ITerm back-references.
dbMPin::clearPinAccess()destroyed thedbAccessPointobjects stored inpin->aps_[pin_access_idx], but it did not clear the MPin AP id vector for that pin-access slot. A later update of the same pin-access index could revisit stale AP ids and try to destroy already-destroyed access points.dbInst::destroy()tried to remove AP back-references before ITerm deletion, but it usediterm->getAccessPoints(). That API derives APs from the instance's current pin-access index, so it can miss preferred APs stored in the ITermaps_map for other pin-access indices._dbAccessPoint::iterms_. Later,dbAccessPoint::destroy()iterated_ap->iterms_and dereferenced an already-destroyed ITerm throughblock->iterm_tbl_->getPtr(), hitting the OpenDB allocation assert.Solution
dbMPin::clearPinAccess(), move the AP ids out ofpin->aps_[pin_access_idx]before destroying the AP objects. This leaves the MPin slot empty immediately, prevents stale id reuse on repeated cleanup, and avoids iterator invalidation whiledbAccessPoint::destroy()updates MPin bookkeeping.dbInst::destroy(), replace the current-pin-access-index cleanup withiterm->clearPrefAccessPoints(). That method walks the ITerm's preferred AP map directly and removes this ITerm id from every referenced AP back-reference list before the ITerm object is destroyed.TestAccessPointregression coverage for idempotent MPin pin-access cleanup, duplicate AP ids in an MPin pin-access slot, and stale AP back-reference removal when an instance is destroyed.Testing
cmake --build build --target TestAccessPoint -j4./build/src/odb/test/cpp/TestAccessPoint --gtest_filter='*clear_mpin_access_points_is_idempotent:*clear_mpin_access_points_removes_duplicate_references:*destroy_inst_removes_access_point_back_references:*test_default'bazel test //src/odb/test/cpp:TestAccessPointThe new stale back-reference test fails on
origin/masterHEADa00a5750166bbefore the fix and passes with this branch.