gh-132372: Speed up logging.config existing logger handling#150242
gh-132372: Speed up logging.config existing logger handling#150242esadomer wants to merge 4 commits into
Conversation
|
The performance increase is moderate, but the PR reduces code duplication so I am +1. @esadomer On your benchmark I could get a much larger speedud with this small commit 2b9ce22 (on top of your work). I have not fully tested this, but the speedup is > 5x. If you are interested you could test it further and pull into your branch. |
d599b2e to
c1a3122
Compare
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
c1a3122 to
02cf98f
Compare
|
I would appreciate the numbers for n=1,2,3. It's not realistic to assume that there are 500 loggers. I have honestly never seen that in production. But the bisect solution is something I would have used as well so I'm also fine with it. |
| if not child.startswith(prefixed): | ||
| break | ||
| if child in existing_set: | ||
| child_loggers.add(child) |
There was a problem hiding this comment.
Please use a dict here instaed of a set to retain the insertion time.
| prefixed = name + "." | ||
| i = bisect_left(existing, prefixed) | ||
| num_existing = len(existing) | ||
| while i < num_existing: |
There was a problem hiding this comment.
An alternative here would be to use islice(existing, i) which would avoid using i+=1 but this adds an import to itertools.
|
|
||
| def _discard_existing_logger(name, existing, existing_set, child_loggers): | ||
| """Discard a configured logger and record its existing children.""" | ||
| if name in existing_set: |
There was a problem hiding this comment.
The function is named discard_existing_logger but we have a check "if name in existing_set", which does not make sense semantically.
It also updates child_loggers so it does 2 things. I don't have a better name though but I would prefer that you remove the "if name in existing_set" check and make it the responsibility of the caller instead. It will slightly reduce the diff and the indentation of the helper (and you save 1 function call when name is not existing)
|
Thanks, I updated the branch to address the inline comments:
I also added the small Validation after the change:
|
|
Can you try a non-debug build as well? it looks like you're using a DEBUG build for the benchmarks. Having a release (PGO+LTO) build may help in removing noise in general (and show clear improvements) |
| if not isinstance(logger, logging.PlaceHolder): | ||
| logger.setLevel(logging.NOTSET) | ||
| # Equivalent to setLevel(NOTSET), but clear the cache once. | ||
| logger.level = logging.NOTSET |
There was a problem hiding this comment.
Is this change really necessary? while setLevel would clear the cache multiple times, I would prefer that we pass through logger.setLevel public API instead of changing .level directly, just in case someone has a logger subclass that does something else with setLevel.
| if cache_clear_needed: | ||
| root.manager._clear_cache() | ||
|
|
||
| def _discard_existing_logger(name, existing, existing_set, child_loggers): |
There was a problem hiding this comment.
| def _discard_existing_logger(name, existing, existing_set, child_loggers): | |
| def _forget_existing_logger(name, existing, existing_set, child_loggers): |
Maybe something like that instead (since we also update child_loggers)
|
Thanks, good catch. I pushed I also updated the PR body with fresh numbers after that change. To clarify the benchmark setup: the I tried to produce a local CPython Windows Release/PGO build too, but this machine fails before creating |
|
CI note: two jobs failed, both look unrelated to this
I tried to rerun the failed jobs, but I do not have Actions rerun permission on |
|
Ok, release build is still enough. Someone else can do the benchmarks on Linux PGO+LTO later |
|
As for GH failures, it's known (I think GH is unstable currently so don't worry) |
gh-132372
This speeds up
logging.config.fileConfig()andlogging.config.dictConfig()when they need to handle many existing loggers whiledisable_existing_loggersis active.The old code kept the existing logger names in a sorted list, but still used repeated list membership,
index(), tail scans, andremove()calls for every configured logger. This keeps the sorted list for child logger ordering, uses a set for membership/removal tracking, and usesbisect_left()to jump directly to the dotted child logger range.Validation:
PCbuild\amd64\python_d.exe -m test test_logging -j0PCbuild\amd64\python_d.exe Tools\patchcheck\patchcheck.pygit diff --checkBenchmark:
Loaded
upstream/main:Lib/logging/config.pyand the patchedLib/logging/config.pyinto the same Python 3.12.8 release interpreter, then timed only thedictConfig()call while resetting identical existing logger state outside the timer.Interpreter details:
Small
nresults are noisy and effectively neutral:Larger logger sets benefit more clearly: