gh-149584: Do not use page cache for thread/frame/interp reads #149585

Closed
maurycy wants to merge 3 commits into python:main from maurycy:remote-debugging-no-paged

@maurycy
Contributor

@maurycy maurycy commented May 8, 2026

Please see gh-149584 for a very detailed investigation.

2026-05-09T01:22:51.167402000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (remote-debugging-no-paged 7606bb1*) % cat /tmp/busy.py 
x = 0
while True:
    x = (x + 1) % 1000003

Before:

2026-05-09T01:12:48.469595000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 57ef219*) % sudo -E ./python.exe -m profiling.sampling run -r 1000khz -d 15 --pstats -o /dev/null /tmp/busy.py
Captured 1,687,897 samples in 15.00 seconds
Sample rate: 112,526.41 samples/sec
Error rate: 0.00
Warning: missed 13312110 samples from the expected total of 15000007 (88.75%)

After:

[130] 2026-05-09T01:08:09.192039000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (remote-debugging-no-paged 7606bb1*) % sudo -E ./python.exe -m profiling.sampling run -r 1000khz -d 15 --pstats -o /dev/null /tmp/busy.py
Captured 2,657,910 samples in 15.00 seconds
Sample rate: 177,193.96 samples/sec
Error rate: 0.00
Warning: missed 12342093 samples from the expected total of 15000003 (82.28%)
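
For a quick comparison of the two runs, the relative change can be worked out from the figures quoted above. This is only a sketch of the arithmetic; the variable names are illustrative and not part of the profiler.

```python
# Figures copied from the "Before" and "After" runs above.
before_samples = 1_687_897
after_samples = 2_657_910
before_expected = 15_000_007
after_expected = 15_000_003

# Ratio of captured samples over the same 15-second window.
speedup = after_samples / before_samples

# Share of expected samples that were missed in each run.
before_missed_pct = 100 * (before_expected - before_samples) / before_expected
after_missed_pct = 100 * (after_expected - after_samples) / after_expected

print(f"throughput ratio: {speedup:.2f}x")
print(f"missed: {before_missed_pct:.2f}% -> {after_missed_pct:.2f}%")
```

Both runs still fall well short of the requested 1000khz rate; the change narrows the gap rather than closing it.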

Closes #149584

pablogsal added a commit to pablogsal/cpython that referenced this pull request May 10, 2026
Use exact remote reads for interpreter state, thread state, and
interpreter frame structs instead of pulling full remote pages into the
profiler page cache. This matches the core change from
python#149585.
@pablogsal
Member

Closing in favour of #149649. I will add you to the news entry for attribution :)

@pablogsal pablogsal closed this May 10, 2026
pablogsal added a commit that referenced this pull request May 20, 2026
…he cache behavior (#149649)

Use exact remote reads for interpreter state, thread state, and
interpreter frame structs instead of pulling full remote pages into the
profiler page cache. This matches the core change from
#149585.
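
To illustrate why exact reads can be cheaper than page-cached reads for small structs, here is a toy model, not the `_remote_debugging` implementation. `RemoteMemory`, `read_paged`, `read_exact`, the 4096-byte page size, and the 64-byte struct size are all assumptions made for the sketch.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class RemoteMemory:
    """Stand-in for a target process; counts bytes copied out of it."""

    def __init__(self, size):
        self.data = bytes(i % 251 for i in range(size))
        self.bytes_copied = 0

    def read(self, addr, length):
        self.bytes_copied += length
        return self.data[addr:addr + length]


def read_paged(mem, cache, addr, length):
    """Fetch every whole page overlapping [addr, addr + length), then slice."""
    out = bytearray()
    end = addr + length
    while addr < end:
        base = addr - addr % PAGE_SIZE
        if base not in cache:
            cache[base] = mem.read(base, PAGE_SIZE)
        take = min(end, base + PAGE_SIZE) - addr
        out += cache[base][addr - base:addr - base + take]
        addr += take
    return bytes(out)


def read_exact(mem, addr, length):
    """Copy only the bytes that were asked for."""
    return mem.read(addr, length)


STRUCT_SIZE = 64                    # hypothetical struct size
addrs = [0x1010, 0x5080, 0x9100]    # three structs on three different pages

paged = RemoteMemory(0x10000)
cache = {}                          # cleared between samples in the real profiler
for a in addrs:
    read_paged(paged, cache, a, STRUCT_SIZE)

exact = RemoteMemory(0x10000)
for a in addrs:
    read_exact(exact, a, STRUCT_SIZE)

print(paged.bytes_copied, exact.bytes_copied)
```

In this model the paged path copies three full pages to serve three small structs. Because the cache is cleared between samples, that cost would recur on every sample.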

The profiler clears the page cache between samples, so live entries are
always packed at the front. Track the live count and only clear/search
that prefix instead of scanning all 1024 slots on the hot path.
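
The live-prefix idea can be sketched as follows. This is a simplified model rather than the C code in the commit: `PrefixCache` and its method names are hypothetical, and only the 1024-slot size comes from the message above.

```python
class PrefixCache:
    """Fixed-size slot table whose live entries occupy slots [0, live)."""

    def __init__(self, slots=1024):
        self.keys = [None] * slots
        self.values = [None] * slots
        self.live = 0  # number of slots in use since the last clear()

    def lookup(self, key):
        # Search only the live prefix, not every slot.
        for i in range(self.live):
            if self.keys[i] == key:
                return self.values[i]
        return None

    def insert(self, key, value):
        if self.live == len(self.keys):
            return False  # full; a caller would fall back to an uncached read
        self.keys[self.live] = key
        self.values[self.live] = value
        self.live += 1
        return True

    def clear(self):
        # Reset only the slots that were actually used.
        for i in range(self.live):
            self.keys[i] = None
            self.values[i] = None
        self.live = 0


cache = PrefixCache()
cache.insert(0x1000, b"page-a")
cache.insert(0x5000, b"page-b")
print(cache.live, cache.lookup(0x5000))
cache.clear()
print(cache.live, cache.lookup(0x5000))
```

With only a handful of live entries per sample, both `lookup` and `clear` touch a few slots instead of all 1024.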

Use the frame cache to predict the next thread state and top frame
address, then batch interpreter/thread/frame reads with process_vm_readv
when profiling a Linux target. Reuse prefetched frame buffers in the
frame walker when the prediction is valid.
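
A rough model of predict-then-batch, assuming a much simpler layout than CPython's: `Target.readv` stands in for a single `process_vm_readv` call (which accepts several remote ranges at once), and each "struct" here is just one 8-byte pointer. All names and the layout are hypothetical.

```python
import struct

PTR = 8  # pointer size assumed for this model


class Target:
    """Stand-in for a remote process; each readv() models one syscall."""

    def __init__(self, memory):
        self.memory = memory
        self.calls = 0

    def readv(self, ranges):
        self.calls += 1
        return [self.memory[a:a + n] for a, n in ranges]


def read_ptr(buf):
    return struct.unpack("<Q", buf[:PTR])[0]


def sample_chained(target, interp_addr):
    """Follow interp -> thread state -> frame with one read per hop."""
    (interp,) = target.readv([(interp_addr, PTR)])
    tstate_addr = read_ptr(interp)
    (tstate,) = target.readv([(tstate_addr, PTR)])
    frame_addr = read_ptr(tstate)
    (frame,) = target.readv([(frame_addr, PTR)])
    return tstate_addr, frame_addr, frame


def sample_predicted(target, interp_addr, guess):
    """Read all three in one batch, then check the guess against the data."""
    tstate_guess, frame_guess = guess
    interp, tstate, frame = target.readv(
        [(interp_addr, PTR), (tstate_guess, PTR), (frame_guess, PTR)]
    )
    if read_ptr(interp) == tstate_guess and read_ptr(tstate) == frame_guess:
        return tstate_guess, frame_guess, frame  # prefetched frame is valid
    return sample_chained(target, interp_addr)   # stale guess: slow path


def make_memory(tstate_addr, frame_addr, payload):
    mem = bytearray(256)
    struct.pack_into("<Q", mem, 0, tstate_addr)
    struct.pack_into("<Q", mem, tstate_addr, frame_addr)
    struct.pack_into("<Q", mem, frame_addr, payload)
    return bytes(mem)


target = Target(make_memory(64, 128, 0xABCD))
first = sample_chained(target, 0)                # three dependent reads
again = sample_predicted(target, 0, first[:2])   # one batched read
print(target.calls)
```

When the guess is stale, the model pays for the batched read plus the chained fallback, so the approach only helps if predictions are usually right, as one would expect for a thread that stays in the same frame between samples.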

Cache the last FrameInfo tuple per code object/instruction offset, reuse
cached thread id objects, and append cached parent frames directly on
full frame-cache hits. This cuts Python allocation churn in the
steady-state profiler path.
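
One way to picture the FrameInfo caching, reading "per code object/instruction offset" as "remember the last offset and tuple for each code object". That reading, the `FrameInfoCache` class, and the `FrameInfo` fields below are assumptions for illustration, not the actual C structures.

```python
from collections import namedtuple

# Illustrative fields only; the real FrameInfo layout may differ.
FrameInfo = namedtuple("FrameInfo", "filename lineno funcname")


class FrameInfoCache:
    def __init__(self):
        self._last = {}  # code object address -> (instruction offset, FrameInfo)
        self.builds = 0  # how many tuples were actually allocated

    def get(self, code_addr, offset, build):
        entry = self._last.get(code_addr)
        if entry is not None and entry[0] == offset:
            return entry[1]  # same code object and offset: reuse the tuple
        info = build(code_addr, offset)
        self.builds += 1
        self._last[code_addr] = (offset, info)
        return info


def build_info(code_addr, offset):
    # Hypothetical resolver; a real one would read the remote code object.
    return FrameInfo("/tmp/busy.py", 3, "<module>")


frames = FrameInfoCache()
a = frames.get(0x7F00, 12, build_info)
b = frames.get(0x7F00, 12, build_info)
print(a is b, frames.builds)
```

For a tight loop like `/tmp/busy.py`, consecutive samples often land on the same code object and offset, so most samples would reuse an existing tuple instead of allocating a new one.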
Development

Successfully merging this pull request may close these issues.

_remote_debugging: reading whole pages over and over
