This got introduced in commit 5884449539
to determine if readline is already linked against curses or tinfo in
the setup.py, which is no longer present.
The switch cases (really TARGET(opcode) macros) have been moved from ceval.c to generated_cases.c.h. That file is generated from instruction definitions in bytecodes.c (which impersonates a C file so the C code it contains can be edited without custom support in e.g. VS Code).
The code generator lives in Tools/cases_generator (it has a README.md explaining how it works). The DSL used to describe the instructions is a work in progress, described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md.
This is surely a work-in-progress. An easy next step could be auto-generating super-instructions.
**IMPORTANT: Merge Conflicts**
If you get a merge conflict for instruction implementations in ceval.c, your best bet is to port your changes to bytecodes.c. That file looks almost the same as the original cases, except instead of `TARGET(NAME)` it uses `inst(NAME)`, and the trailing `DISPATCH()` call is omitted (the code generator adds it automatically).
* Add support for the BOLT post-link binary optimizer
Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.
It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).
Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224),
this commit uses bolt's preferred "instrumentation" approach, as well as adds some non-PIE
flags which enable much better optimizations from bolt.
The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).
The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.
This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.
* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Update configure.ac
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
Remove the "configure --with-cxx-main" build option: it didn't work
for many years. Remove the MAINCC variable from configure and
Makefile.
The MAINCC variable was added by the issue gh-42471: commit
0f48d98b74. Previously, --with-cxx-main
was named --with-cxx.
Keep CXX and LDCXXSHARED variables, even if they are no longer used
by Python build system.
- check for ``dup()`` libc function
- handle missing ``F_DUPFD`` in ``dup2()`` replacement function
- add workaround for WASI libc bug in MSG_TRUNC
- ESHUTDOWN is missing, use EPIPE instead
- POLLPRI is missing, define as 0 (no-op)
All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each others toes.
"test" targets now correctly compile PGO build in a clean repo.
Python now always use the ``%zu`` and ``%zd`` printf formats to
format a size_t or Py_ssize_t number. Building Python 3.12 requires a
C11 compiler, so these printf formats are now always supported.
* PyObject_Print() and _PyObject_Dump() now use the printf %zd format
to display an object reference count.
* Update PY_FORMAT_SIZE_T comment.
* Remove outdated notes about the %zd format in PyBytes_FromFormat()
and PyUnicode_FromFormat() documentations.
* configure no longer checks for the %zd format and no longer defines
PY_FORMAT_SIZE_T macro in pyconfig.h.
* pymacconfig.h no longer undefines PY_FORMAT_SIZE_T: macOS 10.4 is
no longer supported. Python 3.12 now requires macOS 10.6 (Snow
Leopard) or newer.
At compile time, '+z' is already properly used with HP aCC, and shared
libraries are correctly linked with '+b'. The '-fPIC' switch can safely be
dropped.
This makes macOS gdbm provided by Homebrew not segfault through correct
selection of the linked library (-lgdbm_compat) *AND* the correct ndbm-style
header (gdbm-ndbm.h instead of the invalid ndbm.h).
* Move the code for generating Modules/_sre/sre_constants.h from
Lib/re/_constants.py into a separate script
Tools/scripts/generate_sre_constants.py.
* Add target `regen-sre` in the makefile.
* Make target `regen-all` depending on `regen-sre`.