Now that we specialize range iteration in the interpreter for the common
case where the iterator has only one reference, there's not a
significant performance cost to making the iteration thread-safe.
This roughly follows what was done for dictobject to make a lock-free
lookup operation. With this change, the set contains operation scales much
better when used from multiple-threads. The frozenset contains performance
seems unchanged (as already lock-free).
Summary of changes:
* refactor set_lookkey() into set_do_lookup() which now takes a function
pointer that does the entry comparison. This is similar to dictobject and
do_lookup(). In an optimized build, the comparison function is inlined and
there should be no performance cost to this.
* change set_do_lookup() to return a status separately from the entry value
* add set_compare_frozenset() and use if the object is a frozenset. For the
free-threaded build, this avoids some overhead (locking, atomic operations,
incref/decref on key)
* use FT_ATOMIC_* macros as needed for atomic loads and stores
* use a deferred free on the set table array, if shared (only on free-threaded
build, normal build always does an immediate free)
* for free-threaded build, use explicit for loop to zero the table, rather than memcpy()
* when mutating the set, assign so->table to NULL while the change is a
happening. Assign the real table array after the change is done.
Update `bytearray` to contain a `bytes` and provide a zero-copy path to
"extract" the `bytes`. This allows making several code paths more efficient.
This does not move any codepaths to make use of this new API. The documentation
changes include common code patterns which can be made more efficient with
this API.
---
When just changing `bytearray` to contain `bytes` I ran pyperformance on a
`--with-lto --enable-optimizations --with-static-libpython` build and don't see
any major speedups or slowdowns with this; all seems to be in the noise of
my machine (Generally changes under 5% or benchmarks that don't touch
bytes/bytearray).
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Maurycy Pawłowski-Wieroński <5383+maurycy@users.noreply.github.com>
This makes more operations on frame objects thread-safe in the free
threaded build, which fixes some data races that occurred when passing
exceptions between threads.
However, accessing local variables from another thread while its running
is still not thread-safe and may crash the interpreter.
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.
With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.
Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.
Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.
Also add a thorough test that the implementation agrees with this definition.
Author: Greg Price <gnprice@gmail.com>
Co-authored-by: Greg Price <gnprice@gmail.com>
The `dict.get` implementation uses `_Py_dict_lookup_threadsafe`, which is
thread-safe, so we remove the critical section from the argument clinic.
Add a test for concurrent dict get and set operations.
Move creation of a tuple for var-positional parameter out of
_PyArg_UnpackKeywordsWithVararg().
Merge _PyArg_UnpackKeywordsWithVararg() with _PyArg_UnpackKeywords().
Add a new parameter in _PyArg_UnpackKeywords().
The "parameters" and "converters" attributes of ParseArgsCodeGen no
longer contain the var-positional parameter. It is now available as the
"varpos" attribute. Optimize code generation for var-positional
parameter and reuse the same generating code for functions with and without
keyword parameters.
Add special converters for var-positional parameter. "tuple" represents it as
a Python tuple and "array" represents it as a continuous array of PyObject*.
"object" is a temporary alias of "tuple".
Avoid temporary tuple creation when all arguments either positional-only
or vararg.
Objects/setobject.c and Modules/gcmodule.c adapted. This fixes slight
performance regression for set methods, introduced by gh-115112.
They are alternate constructors which only accept numbers
(including objects with special methods __float__, __complex__
and __index__), but not strings.
* Remove the equivalence with real+imag*1j which can be incorrect in corner
cases (non-finite numbers, the sign of zeroes).
* Separately document the three roles of the constructor: parsing a string,
converting a number, and constructing a complex from components.
* Document positional-only parameters of complex(), float(), int() and bool()
as positional-only.
* Add examples for complex() and int().
* Specify the grammar of the string for complex().
* Improve the grammar of the string for float().
* Describe more explicitly the behavior when real and/or imag arguments are
complex numbers. (This will be deprecated in future.)
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used. The following bytes and bytearray methods are adapted:
- count()
- find()
- index()
- rfind()
- rindex()
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used. The following methods are adapted:
- str.count
- str.find
- str.index
- str.rfind
- str.rindex
This makes nearly all the operations on set thread-safe in the free-threaded build, with the exception of `_PySet_NextEntry` and `setiter_iternext`.
Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Starts adding thread safety to dict objects.
Use @critical_section for APIs which are exposed via argument clinic and don't directly correlate with a public C API which needs to acquire the lock
Use a _lock_held suffix for keeping changes to complicated functions simple and just wrapping them with a critical section
Acquire and release the lock in an existing function where it won't be overly disruptive to the existing logic