When running the `profiling.sampling` module in pstats mode, the output
can be emitted in two different ways: text to stdout or a binary file
when the `--output` argument is set.
The current documentation and help text is confusing as it does not
distinguish between these two output formats so it may be surprising to
the user to get different formats depending whether `--output` is set or not.
When using blocking mode in the remote debugging profiler, ptrace calls
to seize threads can fail with EPERM if the thread has exited between
listing and attaching, is in a special kernel state, or is already being
traced. Previously this raised a RuntimeError that was caught by the
Python sampling loop,and retried indefinitely since EPERM is
a persistent condition that will not resolve on its own.
Treat EPERM the same as ESRCH by returning 1 (skip this thread) instead
of -1 (fatal error). This allows profiling to continue with the threads
that can be traced rather than entering an endless retry loop printing
the same error message repeatedly.
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Optimize base64 encoding/decoding by eliminating loop-carried dependencies. Key changes:
- Add `base64_encode_trio()` and `base64_decode_quad()` helper functions that process complete groups independently
- Add `base64_encode_fast()` and `base64_decode_fast()` wrappers
- Update `b2a_base64` and `a2b_base64` to use fast path for complete groups
Performance gains (encode/decode speedup vs main, PGO builds):
```
64 bytes 64K 1M
Zen2: 1.2x/1.8x 1.7x/2.8x 1.5x/2.8x
Zen4: 1.2x/1.7x 1.6x/3.0x 1.5x/3.0x [old data, likely faster]
M4: 1.3x/1.9x 2.3x/2.8x 2.4x/2.9x [old data, likely faster]
RPi5-32: 1.2x/1.2x 2.4x/2.4x 2.0x/2.1x
```
Based on my exploratory work done in https://github.com/python/cpython/compare/main...gpshead:cpython:claude/vectorize-base64-c-S7Hku
See PR and issue for further thoughts on sometimes MUCH faster SIMD vectorized versions of this.
Remove documentation for inexistant `concurrent.futures.interpreter.ExecutionFailed`
and replace its occurrences by `concurrent.interpreters.ExecutionFailed` since this
is the documented exception.
If you are building with `--with-thread-sanitizer` and don't use the
suppression file, then running configure will report a thread leak.
Call `pthread_join()` to avoid the thread leak.
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Remove untrue part of `__import__` replacement docs
The original statement effectively says that replacing `__import__` at global scope affects import statements, and not only that, but only import statements within the rest of the executing module. None of that has been true since at least Python 2.7, I think.
This was likely missed in python/cpython#69686.