gh-150228: Improve the PEP 829 batch processing APIs (GH-150542)
* gh-150228: Improve the PEP 829 batch processing APIs
As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk,
this implements the batch processing APIs for addsitedir() and friends. We
remove the `defer_processing_start_files` flag which required some implicit
module global state, and promote StartupState to the public documented API.
This also moves the bulk of the module global functions into methods of the
`StartupState` class, so it removes the awkward APIs in 3.15b1. Now, instances
of this class are an accumulator for startup state, using `StartupState.process()`
to process them. Callers can now batch up startup state themselves by using
the methods on this class. The module global functions are shims for this
which preserve the legacy APIs and semantics using the new state class.
This PR also fixes the interleaving regression identified by @ncoghlan in the
same issue. Now, .pth file sys.path extensions are added to sys.path after
the sitedir that the .pth file is found in, restoring the legacy behavior.
Along the way, I've made a lot of improvements to function docstrings,
site.rst documentation, and comments in the code explaining what's going on.
* Add a note that if known_paths is provided to StartupState.__init__(), it
will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.
* Make test_impl_exec_imports_suppressed_by_matching_start() more robust
Based on PR comment, we need to read both the .pth and .start files, and prove
that the .pth file's import line (which passes a bigger increment) is not
called, but the .start file's entry point (which uses the default increment)
is called.
* As per review, move some methods to the private API
_read_pth_file() and _read_start_file() are not intended to be part of the
public API surface outside of the site module, so even though they are used by
methods outside of the StartupState class, make them privately named.
* Resolve several review feedbacks
* Move a `versionadded`
* Better list comprehension formatting (use the output from
`ruff format --line-length 78`)
* Add docs for site.makepath() and point the case-normalization requirement to
this utility function.
* Note that StartupState.process() is not idempotent.
* Address another feedback comment
This time, we get rid of the legacy implementation `reset` local, which was
always difficult to understand, and just implement a return value based on the
processing mode selected.
* Changes based on gh-150228 review
The comment by @encukou that started this change:
```
I still see two red flags here though: an argument that doesn't combine with
other arguments, and (another instance of) changing the return type based on
an argument.
Did you consider adding a StartupState.addsitedir(sitedir) method, instead of
the startup_state argument?
```
As it turns out, this is an even cleaner design. By moving the bulk of the
previous module global functions into `StartupState` methods, we can get rid
of all the awkward `startup_state` keyword-only arguments which conflict
with `known_path` (Petr's first point). We can also get rid of the
return value dichotomy (Petr's second point) because now we can preserve
exactly the Python 3.14 API in the module global functions, and implement
the better APIs in the class methods. We also generally don't have to
pass around `process_known_sitedirs`.
Now the following module global functions are essentially shims around
class methods:
* site.addsitedir() -> StartupState.addsitedir()
* site.addusersitepackages() -> StartupState.addusersitepackages()
* site.addsitepackages() -> StartupState.addsitepackages()
* Additional minor changes
* Remove a now unused parameter
(cherry picked from commit 27ebd9abce)
Co-authored-by: Barry Warsaw <barry@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
gh-149819: fix .pth and .start file processing in subprocess when inheriting PYTHONPATH (GH-150177)
* gh-149819: Fix .pth files not loaded in Python subprocesses
After PR gh-149583 (Fix double evaluation of .pth and .site files in
venvs), .pth files are no longer loaded in subprocesses started with
subprocess.run([sys.executable, ...]). The root cause: main() seeds
known_paths from removeduppaths() with all sys.path entries inherited
from the parent process. addsitedir() then skips .pth processing for
every directory already in known_paths.
Fix:
- main(): call removeduppaths() for dedup but start known_paths as a
fresh empty set, so that addsitedir() processes .pth files in every
site-packages directory regardless of inherited sys.path.
- addsitedir(): move known_paths.add() before the sys.path.append and
guard the append with 'sitedir not in sys.path' to avoid creating
duplicate entries when called with a fresh known_paths.
This preserves the gh-75723 dedup guarantee while allowing subprocesses
to load .pth files.
* Fill out the tests for GH#149888
* Extend _make_start() and _make_pth() to take an optional `basedir` which is used instead of
`site.tmpdir` if given.
* Add test_pth_processed_when_sitedir_already_on_path() to test the core GH#149819 bug: .pth files
in subprocesses aren't handled if PYTHONPATH pointing to the .pth directory is inherited.
* Similarly add test_start_processed_when_sitedir_already_on_path() to verify that .start files in
the same circumstances are also now processed.
* Update Lib/site.py
* Oops! Remove redundant code
---------
(cherry picked from commit 3c298e2e38)
Co-authored-by: Barry Warsaw <barry@python.org>
Co-authored-by: BugBounty Mind <bugbounty-mind@deepseek.tui>
Co-authored-by: scoder <stefan_ml@behnel.de>
* gh-149504: Fix re-entrancy bug when .pth/.start file invokes site.addsitedir() (#149659)
* Add re-entrant tests for gh-149504
* Add end-to-end integration test coverage
This ensures that future whitebox internal test changes do not regress the
public surface semantics.
* Implement a state class to process .pth and .start files
By using this state class and managing implicit and explicit batching, we make it structurally
impossible to get bitten by re-entrant site startup processing.
Fixes#149504
(cherry picked from commit b162307d7f)
* Add myself back to CODEOWNERS
On POSIX systems, excluding macOS framework installs, the lib directory
for the free-threaded build now includes a "t" suffix to avoid conflicts
with a co-located default build installation.
It can be used to set the location of a .python_history file
---------
Co-authored-by: Levi Sabah <0xl3vi@gmail.com>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Fix test_site.test_underpth_basic() when the working directory
contains at least one non-ASCII character: encode the "._pth" file to
UTF-8 and enable the UTF-8 Mode to use UTF-8 for the child process
stdout.
encodings registers the _alias_mbcs() codec search function before
the search_function() codec search function. Previously, the
_alias_mbcs() was never used.
Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code
page, not a fake ANSI code page number.
Remove the test_site.test_aliasing_mbcs() test: the alias is now
implemented in the encodings module, no longer in the site module.
The getpath.py file is frozen at build time and executed as code over a namespace. It is never imported, nor is it meant to be importable or reusable. However, it should be easier to read, modify, and patch than the previous code.
This commit attempts to preserve every previously tested quirk, but these may be changed in the future to better align platforms.
Add --with-platlibdir option to the configure script: name of the
platform-specific library directory, stored in the new sys.platlitdir
attribute. It is used to build the path of platform-specific dynamic
libraries and the path of the standard library.
It is equal to "lib" on most platforms. On Fedora and SuSE, it is
equal to "lib64" on 64-bit systems.
Co-Authored-By: Jan Matějek <jmatejek@suse.com>
Co-Authored-By: Matěj Cepl <mcepl@cepl.eu>
Co-Authored-By: Charalampos Stratakis <cstratak@redhat.com>
* posixpath.expanduser() now returns the input path unchanged if
the HOME environment variable is not set and pwd.getpwuid() raises
KeyError (the current user identifier doesn't exist in the password
database).
* Add test_no_home_directory() to test_site.