DAOS-18910 cart: preserve UCX provider across re-init #18195

Draft
wangshilong wants to merge 1 commit into release/2.6 from shilongw/DAOS-18910-2.6
@wangshilong
Contributor

Client re-initialization could fail when the attach info used a UCX provider such as ucx+dc_x. The first daos_init() succeeded, but after daos_fini() a second daos_init() could return DER_NONEXIST because CaRT no longer recognized the UCX provider string.

The problem was that UCX provider handling depended on mutating the global provider dictionary at init time and restoring it at finalize time. That made re-init fragile and also left provider parsing inconsistent between init and URI parsing paths.
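To make the fragility concrete, here is a toy model of the mutate-at-init, restore-at-finalize pattern. It is not the CaRT code: every name (`fragile_name`, `fragile_init`, `fragile_fini`) is hypothetical, and the specific failure mode shown (the finalize step leaving the global entry in a state the next lookup no longer matches) is an assumption chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a global provider table. */
static const char *fragile_name = "ucx"; /* what the lookup consults */
static char       *fragile_owned;        /* heap copy installed by init */

/* Init: accept "ucx+<transport>" and overwrite the global entry with it. */
static int fragile_init(const char *prov)
{
	char *copy;

	/* The lookup depends on the global entry still being usable. */
	if (fragile_name == NULL || strncmp(prov, "ucx+", 4) != 0)
		return -1;

	copy = malloc(strlen(prov) + 1);
	if (copy == NULL)
		return -1;
	strcpy(copy, prov);

	free(fragile_owned);
	fragile_owned = copy;
	fragile_name  = fragile_owned; /* global state mutated at init */
	return 0;
}

/* Finalize: release the copy. The restore is deliberately incomplete
 * in this toy: the original "ucx" name is not put back. */
static void fragile_fini(void)
{
	free(fragile_owned);
	fragile_owned = NULL;
	fragile_name  = NULL;
}
```

In this toy, `fragile_init("ucx+dc_x")` returns 0 the first time, and returns -1 when called again after `fragile_fini()`. Any lookup that relies on init and finalize leaving shared state exactly balanced has this kind of exposure.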

Fix this by treating ucx+* strings as CRT_PROV_UCX without rewriting the global provider table, and by storing the concrete provider string in per-provider runtime config so later URI generation and logging keep the original UCX transport name.
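A minimal sketch of the approach described above, under stated assumptions. Only `CRT_PROV_UCX` and the `ucx+` prefix come from this PR; the `sketch_*` names, the table contents, and the config layout are hypothetical and do not mirror the actual patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical provider ids; SKETCH_PROV_UCX plays the role of CRT_PROV_UCX. */
enum sketch_prov {
	SKETCH_PROV_UNKNOWN = -1,
	SKETCH_PROV_OFI_TCP,
	SKETCH_PROV_UCX,
	SKETCH_PROV_COUNT
};

struct sketch_dict_entry {
	const char       *name;
	enum sketch_prov  type;
};

/* Read-only table: never rewritten at init or finalize. */
static const struct sketch_dict_entry sketch_dict[] = {
	{ "ofi+tcp", SKETCH_PROV_OFI_TCP }, /* illustrative entry */
};

/* Per-provider runtime config that keeps the concrete provider string. */
struct sketch_prov_cfg {
	char concrete[64];
};

static struct sketch_prov_cfg sketch_cfg[SKETCH_PROV_COUNT];

/* One parser for both the init path and the URI parsing path. */
static enum sketch_prov sketch_parse(const char *s)
{
	size_t i;

	/* Any "ucx+<transport>" maps to the UCX id; no global state changes. */
	if (strncmp(s, "ucx+", 4) == 0 && s[4] != '\0')
		return SKETCH_PROV_UCX;

	for (i = 0; i < sizeof(sketch_dict) / sizeof(sketch_dict[0]); i++)
		if (strcmp(s, sketch_dict[i].name) == 0)
			return sketch_dict[i].type;

	return SKETCH_PROV_UNKNOWN;
}

/* Init: record the concrete string in the per-provider config. */
static int sketch_init(const char *s)
{
	enum sketch_prov p = sketch_parse(s);

	if (p == SKETCH_PROV_UNKNOWN)
		return -1;
	snprintf(sketch_cfg[p].concrete, sizeof(sketch_cfg[p].concrete), "%s", s);
	return 0;
}

/* Finalize: clear runtime config only; the table is untouched. */
static void sketch_fini(void)
{
	memset(sketch_cfg, 0, sizeof(sketch_cfg));
}

/* URI generation and logging read the original transport name back. */
static const char *sketch_uri_prefix(enum sketch_prov p)
{
	return sketch_cfg[p].concrete;
}
```

With this shape, `sketch_init("ucx+dc_x")`, `sketch_fini()`, `sketch_init("ucx+dc_x")` succeeds both times, and `sketch_uri_prefix(SKETCH_PROV_UCX)` still yields `ucx+dc_x` after each init, because finalize has nothing in the shared table to restore.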

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

github-actions Bot commented May 7, 2026

Ticket title is 'daos_test/suite.py:DaosCoreTest.test_daos_pool - POOL9 MGMT_POOL_FIND rpc failed to 6 ranks: DER_NONEXIST(-1005)'
Status is 'Open'
Labels: '2.6.5rc1,2.6.5rc2,pr_test,scrubbed_2.8,test_2.6.5rc1'
https://daosio.atlassian.net/browse/DAOS-18910

Client re-initialization could fail when the attach info used a UCX
provider such as ucx+dc_x. The first daos_init() succeeded, but after
daos_fini() a second daos_init() could return DER_NONEXIST because CaRT
no longer recognized the UCX provider string.

The problem was that UCX provider handling depended on mutating the
global provider dictionary at init time and restoring it at finalize
time. That made re-init fragile and also left provider parsing
inconsistent between init and URI parsing paths.

Fix this by treating ucx+* strings as CRT_PROV_UCX without rewriting the
global provider table, and by storing the concrete provider string in
per-provider runtime config so later URI generation and logging keep the
original UCX transport name.

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@wangshilong wangshilong force-pushed the shilongw/DAOS-18910-2.6 branch from fc9749c to c676850 Compare May 7, 2026 03:55