The dataclasses `__init__` function is generated dynamically by a call to `exec()` and so doesn't have deferred reference counting enabled. Enable deferred reference counting on functions when assigned as an attribute to type objects to avoid reference count contention when creating dataclass instances.
Adds `_PyObject_GetMethodStackRef` which uses stackrefs and takes advantage of deferred reference counting in free-threading while calling method objects in vectorcall.
Add a separate benchmark that measures the effect of
`_PyObject_LookupSpecial()` on scaling.
In the process of cleaning up the scaling benchmarks for inclusion, I
unintentionally changed the "cmodule_function" benchmark to pass an
`int` to `math.floor()` instead of a `float`, which causes it to use the
`_PyObject_LookupSpecial()` code path. `_PyObject_LookupSpecial()` has
its own scaling issues that we want to measure separately from calling a
function on a C module.
These consist of a number of short snippets that help identify scaling
bottlenecks in the free threaded interpreter.
The current bottlenecks are in calling functions in benchmarks that call
functions (due to `LOAD_ATTR` not yet using deferred reference counting)
and when accessing thread-local data.