2024-09-25 14:31:53 -07:00
|
|
|
|
.. _freethreading-python-howto:
|
|
|
|
|
|
|
2025-06-16 23:32:52 +09:00
|
|
|
|
*********************************
|
|
|
|
|
|
Python support for free threading
|
|
|
|
|
|
*********************************
|
2024-09-25 14:31:53 -07:00
|
|
|
|
|
2025-06-16 23:32:52 +09:00
|
|
|
|
Starting with the 3.13 release, CPython has support for a build of
|
2024-09-25 14:31:53 -07:00
|
|
|
|
Python called :term:`free threading` where the :term:`global interpreter lock`
|
|
|
|
|
|
(GIL) is disabled. Free-threaded execution allows for full utilization of the
|
|
|
|
|
|
available processing power by running threads in parallel on available CPU cores.
|
|
|
|
|
|
While not all software will benefit from this automatically, programs
|
|
|
|
|
|
designed with threading in mind will run faster on multi-core hardware.
|
|
|
|
|
|
|
2025-12-02 11:23:12 +05:30
|
|
|
|
Some third-party packages, in particular ones
|
2025-06-16 23:32:52 +09:00
|
|
|
|
with an :term:`extension module`, may not be ready for use in a
|
|
|
|
|
|
free-threaded build, and will re-enable the :term:`GIL`.
|
2024-09-25 14:31:53 -07:00
|
|
|
|
|
|
|
|
|
|
This document describes the implications of free threading
|
|
|
|
|
|
for Python code. See :ref:`freethreading-extensions-howto` for information on
|
|
|
|
|
|
how to write C extensions that support the free-threaded build.
|
|
|
|
|
|
|
|
|
|
|
|
.. seealso::
|
|
|
|
|
|
|
|
|
|
|
|
:pep:`703` – Making the Global Interpreter Lock Optional in CPython for an
|
|
|
|
|
|
overall description of free-threaded Python.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Installation
|
|
|
|
|
|
============
|
|
|
|
|
|
|
|
|
|
|
|
Starting with Python 3.13, the official macOS and Windows installers
|
|
|
|
|
|
optionally support installing free-threaded Python binaries. The installers
|
|
|
|
|
|
are available at https://www.python.org/downloads/.
|
|
|
|
|
|
|
|
|
|
|
|
For information on other platforms, see the `Installing a Free-Threaded Python
|
2025-05-23 01:37:20 +01:00
|
|
|
|
<https://py-free-threading.github.io/installing-cpython/>`_, a
|
2024-09-25 14:31:53 -07:00
|
|
|
|
community-maintained installation guide for installing free-threaded Python.
|
|
|
|
|
|
|
|
|
|
|
|
When building CPython from source, the :option:`--disable-gil` configure option
|
|
|
|
|
|
should be used to build a free-threaded Python interpreter.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Identifying free-threaded Python
|
|
|
|
|
|
================================
|
|
|
|
|
|
|
|
|
|
|
|
To check if the current interpreter supports free-threading, :option:`python -VV <-V>`
|
2025-06-16 23:32:52 +09:00
|
|
|
|
and :data:`sys.version` contain "free-threading build".
|
2024-09-25 14:31:53 -07:00
|
|
|
|
The new :func:`sys._is_gil_enabled` function can be used to check whether
|
|
|
|
|
|
the GIL is actually disabled in the running process.
|
|
|
|
|
|
|
|
|
|
|
|
The ``sysconfig.get_config_var("Py_GIL_DISABLED")`` configuration variable can
|
|
|
|
|
|
be used to determine whether the build supports free threading. If the variable
|
|
|
|
|
|
is set to ``1``, then the build supports free threading. This is the recommended
|
|
|
|
|
|
mechanism for decisions related to the build configuration.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The global interpreter lock in free-threaded Python
|
|
|
|
|
|
===================================================
|
|
|
|
|
|
|
|
|
|
|
|
Free-threaded builds of CPython support optionally running with the GIL enabled
|
|
|
|
|
|
at runtime using the environment variable :envvar:`PYTHON_GIL` or
|
|
|
|
|
|
the command-line option :option:`-X gil`.
|
|
|
|
|
|
|
|
|
|
|
|
The GIL may also automatically be enabled when importing a C-API extension
|
|
|
|
|
|
module that is not explicitly marked as supporting free threading. A warning
|
|
|
|
|
|
will be printed in this case.
|
|
|
|
|
|
|
|
|
|
|
|
In addition to individual package documentation, the following websites track
|
|
|
|
|
|
the status of popular packages support for free threading:
|
|
|
|
|
|
|
|
|
|
|
|
* https://py-free-threading.github.io/tracking/
|
|
|
|
|
|
* https://hugovk.github.io/free-threaded-wheels/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thread safety
|
|
|
|
|
|
=============
|
|
|
|
|
|
|
|
|
|
|
|
The free-threaded build of CPython aims to provide similar thread-safety
|
|
|
|
|
|
behavior at the Python level to the default GIL-enabled build. Built-in
|
|
|
|
|
|
types like :class:`dict`, :class:`list`, and :class:`set` use internal locks
|
|
|
|
|
|
to protect against concurrent modifications in ways that behave similarly to
|
|
|
|
|
|
the GIL. However, Python has not historically guaranteed specific behavior for
|
|
|
|
|
|
concurrent modifications to these built-in types, so this should be treated
|
|
|
|
|
|
as a description of the current implementation, not a guarantee of current or
|
|
|
|
|
|
future behavior.
|
|
|
|
|
|
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
|
|
|
|
|
|
It's recommended to use the :class:`threading.Lock` or other synchronization
|
|
|
|
|
|
primitives instead of relying on the internal locks of built-in types, when
|
|
|
|
|
|
possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Known limitations
|
|
|
|
|
|
=================
|
|
|
|
|
|
|
|
|
|
|
|
This section describes known limitations of the free-threaded CPython build.
|
|
|
|
|
|
|
|
|
|
|
|
Immortalization
|
|
|
|
|
|
---------------
|
|
|
|
|
|
|
2025-12-02 11:23:12 +05:30
|
|
|
|
In the free-threaded build, some objects are :term:`immortal`.
|
2024-09-25 14:31:53 -07:00
|
|
|
|
Immortal objects are not deallocated and have reference counts that are
|
|
|
|
|
|
never modified. This is done to avoid reference count contention that would
|
|
|
|
|
|
prevent efficient multi-threaded scaling.
|
|
|
|
|
|
|
2025-12-02 11:23:12 +05:30
|
|
|
|
As of the 3.14 release, immortalization is limited to:
|
2024-09-25 14:31:53 -07:00
|
|
|
|
|
2025-12-02 11:23:12 +05:30
|
|
|
|
* Code constants: numeric literals, string literals, and tuple literals
|
|
|
|
|
|
composed of other constants.
|
|
|
|
|
|
* Strings interned by :func:`sys.intern`.
|
2024-09-25 14:31:53 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Frame objects
|
|
|
|
|
|
-------------
|
|
|
|
|
|
|
2025-12-02 11:23:12 +05:30
|
|
|
|
It is not safe to access :attr:`frame.f_locals` from a :ref:`frame <frame-objects>`
|
|
|
|
|
|
object if that frame is currently executing in another thread, and doing so may
|
|
|
|
|
|
crash the interpreter.
|
|
|
|
|
|
|
2024-09-25 14:31:53 -07:00
|
|
|
|
|
|
|
|
|
|
Iterators
|
|
|
|
|
|
---------
|
|
|
|
|
|
|
2025-12-02 11:23:12 +05:30
|
|
|
|
It is generally not thread-safe to access the same iterator object from
|
|
|
|
|
|
multiple threads concurrently, and threads may see duplicate or missing
|
|
|
|
|
|
elements.
|
2024-09-25 14:31:53 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Single-threaded performance
|
|
|
|
|
|
---------------------------
|
|
|
|
|
|
|
|
|
|
|
|
The free-threaded build has additional overhead when executing Python code
|
2025-12-02 11:23:12 +05:30
|
|
|
|
compared to the default GIL-enabled build. The amount of overhead depends
|
|
|
|
|
|
on the workload and hardware. On the pyperformance benchmark suite, the
|
|
|
|
|
|
average overhead ranges from about 1% on macOS aarch64 to 8% on x86-64 Linux
|
|
|
|
|
|
systems.
|
2025-04-09 16:18:54 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Behavioral changes
|
|
|
|
|
|
==================
|
|
|
|
|
|
|
|
|
|
|
|
This section describes CPython behavioural changes with the free-threaded
|
|
|
|
|
|
build.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Context variables
|
|
|
|
|
|
-----------------
|
|
|
|
|
|
|
|
|
|
|
|
In the free-threaded build, the flag :data:`~sys.flags.thread_inherit_context`
|
|
|
|
|
|
is set to true by default which causes threads created with
|
|
|
|
|
|
:class:`threading.Thread` to start with a copy of the
|
|
|
|
|
|
:class:`~contextvars.Context()` of the caller of
|
|
|
|
|
|
:meth:`~threading.Thread.start`. In the default GIL-enabled build, the flag
|
|
|
|
|
|
defaults to false so threads start with an
|
|
|
|
|
|
empty :class:`~contextvars.Context()`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Warning filters
|
|
|
|
|
|
---------------
|
|
|
|
|
|
|
|
|
|
|
|
In the free-threaded build, the flag :data:`~sys.flags.context_aware_warnings`
|
|
|
|
|
|
is set to true by default. In the default GIL-enabled build, the flag defaults
|
|
|
|
|
|
to false. If the flag is true then the :class:`warnings.catch_warnings`
|
|
|
|
|
|
context manager uses a context variable for warning filters. If the flag is
|
|
|
|
|
|
false then :class:`~warnings.catch_warnings` modifies the global filters list,
|
|
|
|
|
|
which is not thread-safe. See the :mod:`warnings` module for more details.
|
2026-05-30 12:49:26 +02:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Increased memory usage
|
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
|
|
The free-threaded build will typically use more memory compared to the default
|
|
|
|
|
|
build. There are multiple reasons for this, mostly due to design decisions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
All interned strings are immortal
|
|
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
|
|
|
|
For modern Python versions (since version 2.3), interning a string (e.g. with
|
|
|
|
|
|
:func:`sys.intern`) does not cause it to become immortal. Instead, if the last
|
|
|
|
|
|
reference to that string disappears, it will be removed from the interned
|
|
|
|
|
|
string table. This is not the case for the free-threaded build and any interned
|
|
|
|
|
|
string will become immortal, surviving until interpreter shutdown.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Non-GC objects have a larger object header
|
|
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
|
|
|
|
The free-threaded build uses a different :c:type:`PyObject` structure. Instead
|
|
|
|
|
|
of having the GC related information allocated before the :c:type:`PyObject`
|
|
|
|
|
|
structure, like in the default build, the GC related info is part of the normal
|
|
|
|
|
|
object header. For example, on the AMD64 platform, ``None`` uses 32 bytes on
|
|
|
|
|
|
the free-threaded build vs 16 bytes for the default build. GC objects (such as
|
|
|
|
|
|
dicts and lists) are the same size for both builds since the free-threaded
|
|
|
|
|
|
build does not use additional space for the GC info.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
QSBR can delay freeing of memory
|
|
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
|
|
|
|
In order to safely implement lock-free data structures, a safe memory
|
|
|
|
|
|
reclamation (SMR) scheme is used, known as quiescent state-based reclamation
|
|
|
|
|
|
(QSBR). This means that the memory backing data structures allowing lock-free
|
|
|
|
|
|
access will use QSBR, which defers the free operation, rather than immediately
|
|
|
|
|
|
freeing the memory. Two examples of these data structures are the list object
|
|
|
|
|
|
and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the CPython
|
|
|
|
|
|
source tree for more details on how QSBR is implemented. Running
|
|
|
|
|
|
:func:`gc.collect` should cause all memory being held by QSBR to be actually
|
|
|
|
|
|
freed. Note that even when QSBR frees the memory, the underlying memory
|
|
|
|
|
|
allocator may not immediately return that memory to the OS and so the resident
|
|
|
|
|
|
set size (RSS) of the process might not decrease.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
mimalloc allocator vs pymalloc
|
|
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
|
|
|
|
The default build will normally use the "pymalloc" memory allocator for small
|
|
|
|
|
|
allocations (512 bytes or smaller). The free-threaded build does not use
|
|
|
|
|
|
pymalloc and allocates all Python objects using the "mimalloc" allocator. The
|
|
|
|
|
|
pymalloc allocator has the following properties that help keep memory usage
|
|
|
|
|
|
low: small per-allocated-block overhead, effective memory fragmentation
|
|
|
|
|
|
prevention, and quick return of free memory to the operating system. The
|
|
|
|
|
|
mimalloc allocator does quite well in these respects as well but can have some
|
|
|
|
|
|
more overhead.
|
|
|
|
|
|
|
|
|
|
|
|
In the free-threaded build, mimalloc manages memory in a number of separate
|
|
|
|
|
|
heaps (currently four). For example, all GC supporting objects are allocated
|
|
|
|
|
|
from their own heap. Using separate heaps means that free memory in one heap
|
|
|
|
|
|
cannot be used for an allocation that uses another heap. Also, some heaps are
|
|
|
|
|
|
configured to use QSBR (quiescent-state based reclamation) when freeing the
|
|
|
|
|
|
memory that backs up the heap (known as "pages" in mimalloc terminology). The
|
|
|
|
|
|
use of QSBR creates a delay between all memory blocks for a page being freed
|
|
|
|
|
|
and the memory page being released, either for new allocations or back to the
|
|
|
|
|
|
OS.
|
|
|
|
|
|
|
|
|
|
|
|
The mimalloc allocator also defers returning freed memory back to the OS. You
|
|
|
|
|
|
can reduce that delay by setting the environment variable
|
|
|
|
|
|
:envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce
|
|
|
|
|
|
the performance of the allocator.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Free-threaded reference counting can cause objects to live longer
|
|
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
|
|
|
|
In the default build, when an object's reference count reaches zero, it is
|
|
|
|
|
|
normally deallocated. The free-threaded build uses "biased reference
|
|
|
|
|
|
counting", with a fast-path for objects "owned" by the current thread and a
|
|
|
|
|
|
slow path for other objects. See :pep:`703` for additional details. Any time
|
|
|
|
|
|
an object's reference count ends up in a "queued" state, deallocation can be
|
|
|
|
|
|
deferred. The queued state is cleared from the "eval breaker" section of the
|
|
|
|
|
|
bytecode evaluator.
|
|
|
|
|
|
|
|
|
|
|
|
The free-threaded build also allows a different mode of reference counting,
|
|
|
|
|
|
known as "deferred reference counting". This mode is enabled by setting a flag
|
|
|
|
|
|
on a per-object basis. Deferred reference counting is enabled for the
|
|
|
|
|
|
following types:
|
|
|
|
|
|
|
|
|
|
|
|
* module objects
|
|
|
|
|
|
* module top-level functions
|
|
|
|
|
|
* class methods defined in the class scope
|
|
|
|
|
|
* descriptor objects
|
|
|
|
|
|
* thread-local objects, created by :class:`threading.local`
|
|
|
|
|
|
|
|
|
|
|
|
When deferred reference counting is enabled, references from Python function
|
|
|
|
|
|
stacks are not added to the reference count. This scheme reduces the overhead
|
|
|
|
|
|
of reference counting, especially for objects used from multiple threads.
|
|
|
|
|
|
Because the stack references are not counted, objects with deferred reference
|
|
|
|
|
|
counting are not immediately freed when their internal reference count goes to
|
|
|
|
|
|
zero. Instead, they are examined by the next GC run and, if no stack
|
|
|
|
|
|
references to them are found, they are freed. This means these objects are
|
|
|
|
|
|
freed by the GC and not when their reference count goes to zero, as is typical.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Per-thread reference counting can delay freeing objects
|
|
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
|
|
|
|
To avoid contention on the reference count fields of frequently shared
|
|
|
|
|
|
objects, the free-threaded build also uses "per-thread reference counting"
|
|
|
|
|
|
for a few selected object types. Rather than updating a single shared
|
|
|
|
|
|
reference count, each thread maintains its own local reference count array,
|
|
|
|
|
|
indexed by a unique id assigned to the object. The true reference count is
|
|
|
|
|
|
only computed by summing the per-thread counts when the object's local
|
|
|
|
|
|
count drops to zero. Per-thread reference counting is currently used for:
|
|
|
|
|
|
|
|
|
|
|
|
* heap type objects (classes created in Python)
|
|
|
|
|
|
* code objects
|
|
|
|
|
|
* the ``__dict__`` of module objects
|
|
|
|
|
|
|
|
|
|
|
|
Because the per-thread counts must be merged back to the object before it
|
|
|
|
|
|
can be deallocated, objects using per-thread reference counting are
|
|
|
|
|
|
typically freed later than they would be in the default build. In
|
|
|
|
|
|
particular, such an object is usually not freed until the thread that
|
|
|
|
|
|
referenced it reaches a safe point (for example, in the "eval breaker"
|
|
|
|
|
|
section of the bytecode evaluator) or exits. Running :func:`gc.collect`
|
|
|
|
|
|
will merge the per-thread counts and allow these objects to be freed.
|