cpython/Modules
Cody Maloney 2f5f19e783
gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755)
This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
Linux system, 5813 -> 4875 on my macOS system).

This always reduces the number of `fstat()` calls, and reduces the number
of seek calls most of the time. Stat was previously called twice: once at
open (to error early on directories), and a second time to get the size
of the file so the whole file can be read in one read. Now the size is
cached from the first call.
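
As a rough illustration of the caching idea described above, here is a minimal Python sketch. It is not the C implementation in `Modules/_io`; the class name `CachedSizeFile`, the attribute `size_estimate`, and the `FALLBACK_CHUNK` constant are hypothetical names chosen for this example.

```python
import os
import stat

# Assumed chunk size used once the cached size turns out to be too small.
FALLBACK_CHUNK = 8192


class CachedSizeFile:
    """Hypothetical stand-in for FileIO: one fstat() at open, reused later."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        try:
            st = os.fstat(self.fd)  # the only fstat() call in this sketch
            if stat.S_ISDIR(st.st_mode):
                raise IsADirectoryError(path)
        except BaseException:
            os.close(self.fd)
            raise
        self.size_estimate = st.st_size  # cached for readall()

    def readall(self):
        # Request one byte more than the cached size so that a file of
        # exactly that size is followed by a single zero-length read.
        remaining = self.size_estimate + 1
        chunks = []
        while True:
            want = remaining if remaining > 0 else FALLBACK_CHUNK
            chunk = os.read(self.fd, want)
            if not chunk:  # end of file
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)

    def close(self):
        os.close(self.fd)
```

Because the loop only stops on an empty read, a file that grew after the `fstat()` is still read completely; it just takes additional `read()` calls.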

The code keeps an optimization: if the user has already read a lot of
data, the current position is subtracted from the number of bytes to
read. Finding the position is somewhat expensive, so it is only done on
larger files; otherwise the code just tries to read the extra bytes and
resizes the PyBytes as needed.
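
The size-based decision can be sketched in Python as follows. The helper name `estimate_read_size` and the 64 KiB value of `LARGE_FILE_CUTOFF` are assumptions made for this example; the text above only says "larger files".

```python
import os

# Assumed cutoff for this sketch; not a value stated in the text above.
LARGE_FILE_CUTOFF = 64 * 1024


def estimate_read_size(fd, cached_size):
    """Return how many bytes to request for a read-to-end (hypothetical helper)."""
    if cached_size > LARGE_FILE_CUTOFF:
        # Large file: pay for one lseek() so already-consumed bytes are
        # not allocated again.
        pos = os.lseek(fd, 0, os.SEEK_CUR)
        remaining = max(cached_size - pos, 0)
    else:
        # Small file: skip the lseek() and accept a slightly oversized
        # buffer that is trimmed after the read.
        remaining = cached_size
    # One extra byte, so a read that returns fewer bytes than requested
    # indicates end-of-file.
    return remaining + 1
```

For a 343-byte file this requests 344 bytes regardless of position, which matches the `read(3, ..., 344)` call in the traces.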

I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].

After the changes, this is everything in one `filename.read_text()`:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

This does make some tradeoffs:
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().

[0]

```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1]
Before small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

After small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

Before large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

After large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2024-07-04 09:17:00 +02:00
_blake2 gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_ctypes gh-61103: Support float and long double complex types in ctypes module (#121248) 2024-07-03 11:08:11 +02:00
_decimal gh-119613: Use C99+ functions instead of Py_IS_NAN/INFINITY/FINITE (#119619) 2024-05-29 09:51:19 +02:00
_hacl gh-99108: Refresh HACL*; update modules accordingly; fix namespacing (GH-117237) 2024-03-26 00:35:26 +00:00
_io gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755) 2024-07-04 09:17:00 +02:00
_multiprocessing gh-117873: Revert _posixshmem.shm_open() change (#118901) 2024-05-13 16:03:52 +02:00
_sqlite Fixes loop variables to be the same types as their limit (GH-120958) 2024-06-24 17:11:47 +01:00
_sre gh-119344: Make critical section API public (#119353) 2024-06-21 15:50:18 -04:00
_ssl gh-111926: Make weakrefs thread-safe in free-threaded builds (#117168) 2024-04-08 10:58:38 -04:00
_testcapi gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00
_testinternalcapi Fixes loop variables to be the same types as their limit (GH-120958) 2024-06-24 17:11:47 +01:00
_testlimitedcapi gh-114329: Fix PyList_GetItemRef() limited C API definition (#117520) 2024-04-03 21:02:42 +00:00
_xxtestfuzz gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
cjkcodecs gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00
clinic gh-117657: Use critical section to make _socket.socket.close thread safe (GH-120490) 2024-07-01 16:38:30 +02:00
expat gh-116741: Upgrade libexpat to 2.6.2 (#117296) 2024-04-22 18:15:08 -07:00
_abc.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_asynciomodule.c gh-107803: fix thread safety issue in double linked list implementation (#121007) 2024-06-26 05:11:32 +00:00
_bisectmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_bz2module.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_codecsmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_collectionsmodule.c gh-117657: Fix data races reported by TSAN in some set methods (#120914) 2024-07-01 15:11:39 -04:00
_complex.h gh-61103: Support float and long double complex types in ctypes module (#121248) 2024-07-03 11:08:11 +02:00
_contextvarsmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_csv.c gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00
_curses_panel.c gh-113565: Improve and harden detection of curses dependencies (#119816) 2024-07-01 08:10:03 +00:00
_cursesmodule.c gh-113565: Improve and harden detection of curses dependencies (#119816) 2024-07-01 08:10:03 +00:00
_datetimemodule.c gh-120713: Normalize year with century for datetime.strftime (GH-120820) 2024-06-29 09:32:42 +03:00
_dbmmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_elementtree.c gh-119577: Adjust DeprecationWarning when testing element truth values in ElementTree (GH-119762) 2024-06-06 20:18:30 -07:00
_functoolsmodule.c gh-121027: Make the functools.partial object a method descriptor (GH-121089) 2024-07-03 09:02:15 +03:00
_gdbmmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_hashopenssl.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_heapqmodule.c Remove almost all unpaired backticks in docstrings (#119231) 2024-05-22 12:35:18 -04:00
_interpchannelsmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_interpqueuesmodule.c gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00
_interpreters_common.h gh-76785: Add More Tests to test_interpreters.test_api (gh-117662) 2024-04-10 18:37:01 -06:00
_interpretersmodule.c Use _PyLong_IsNegative instead of _PyLong_Sign if appropriate. (GH-120493) 2024-06-24 09:49:01 +03:00
_json.c gh-119613: Use C99+ functions instead of Py_IS_NAN/INFINITY/FINITE (#119619) 2024-05-29 09:51:19 +02:00
_localemodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_lsprof.c gh-118362: Fix thread safety around lookups from the type cache in the face of concurrent mutators (#118454) 2024-05-06 10:50:35 -07:00
_lzmamodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_math.h gh-101678: refactor the math module to use special functions from c11 (GH-101679) 2023-02-09 00:40:52 -08:00
_opcode.c gh-120642: Move private PyCode APIs to the internal C API (#120643) 2024-06-26 13:54:03 +02:00
_operator.c gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) 2024-06-21 17:19:31 +02:00
_pickle.c gh-121137: Add missing Py_DECREF calls for ADDITEMS opcode of _pickle.c (#121136) 2024-06-28 14:43:45 -07:00
_posixsubprocess.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_queuemodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_randommodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_scproxy.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_ssl.c gh-117784: Only reference PHA functions ifndef SSL_VERIFY_POST_HANDSHAKE (GH-117785) 2024-07-01 15:28:35 +02:00
_ssl.h GH-103092: isolate _ssl (#104725) 2023-05-22 06:14:48 +05:30
_ssl_data_31.h gh-103142: Upgrade binary builds and CI to OpenSSL 1.1.1u (#105174) 2023-06-01 09:42:18 -07:00
_ssl_data_111.h gh-103142: Upgrade binary builds and CI to OpenSSL 1.1.1u (#105174) 2023-06-01 09:42:18 -07:00
_ssl_data_300.h gh-103142: Upgrade binary builds and CI to OpenSSL 1.1.1u (#105174) 2023-06-01 09:42:18 -07:00
_stat.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_statisticsmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_struct.c gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00
_suggestions.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_sysconfig.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_testbuffer.c gh-120674: Protect multi-line macros in _testbuffer.c and _testcapimodule.c (#120675) 2024-06-18 12:04:52 +00:00
_testcapi_feature_macros.inc gh-91325: Skip Stable ABI checks with Py_TRACE_REFS special build (GH-92046) 2024-01-29 16:45:31 +01:00
_testcapimodule.c gh-119344: Make critical section API public (#119353) 2024-06-21 15:50:18 -04:00
_testclinic.c gh-116322: Rename PyModule_ExperimentalSetGIL to PyUnstable_Module_SetGIL (GH-118645) 2024-05-06 18:59:36 +02:00
_testclinic_limited.c gh-116322: Rename PyModule_ExperimentalSetGIL to PyUnstable_Module_SetGIL (GH-118645) 2024-05-06 18:59:36 +02:00
_testexternalinspection.c bpo-115773: Use the right variable name based on the field we are trying read (#118591) 2024-05-07 14:50:41 +00:00
_testimportmultiple.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_testinternalcapi.c gh-120642: Move private PyCode APIs to the internal C API (#120643) 2024-06-26 13:54:03 +02:00
_testlimitedcapi.c gh-116322: Rename PyModule_ExperimentalSetGIL to PyUnstable_Module_SetGIL (GH-118645) 2024-05-06 18:59:36 +02:00
_testmultiphase.c gh-116322: Rename PyModule_ExperimentalSetGIL to PyUnstable_Module_SetGIL (GH-118645) 2024-05-06 18:59:36 +02:00
_testsinglephase.c gh-119584: Fix test_import Failed Assertion (gh-119623) 2024-05-27 19:35:30 +00:00
_threadmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_tkinter.c gh-119614: Fix truncation of strings with embedded null characters in Tkinter (GH-120909) 2024-06-24 12:17:25 +03:00
_tracemalloc.c gh-116322: Rename PyModule_ExperimentalSetGIL to PyUnstable_Module_SetGIL (GH-118645) 2024-05-06 18:59:36 +02:00
_typingmodule.c gh-118895: Call PyType_Ready() on typing.NoDefault (#118897) 2024-05-10 08:42:00 -07:00
_uuidmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_weakref.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
_winapi.c GH-73991: Add follow_symlinks argument to pathlib.Path.copy() (#120519) 2024-06-19 00:59:54 +00:00
_zoneinfo.c gh-120155: Fix Coverity issue in zoneinfo load_data() (#120232) 2024-06-10 11:54:35 +02:00
addrinfo.h gh-95174: WASI: skip missing sockets functions (GH-95179) 2022-07-27 08:19:23 +02:00
arraymodule.c gh-117557: Improve error messages when a string, bytes or bytearray of length 1 are expected (GH-117631) 2024-05-28 12:01:37 +03:00
atexitmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
binascii.c gh-118314: Fix padding edge case in binascii.a2b_base64 strict mode (GH-118320) 2024-05-07 11:18:45 +02:00
cmathmodule.c gh-119613: Use C99+ functions instead of Py_IS_NAN/INFINITY/FINITE (#119619) 2024-05-29 09:51:19 +02:00
config.c.in gh-104169: Fix test_peg_generator after tokenizer refactoring (#110727) 2023-10-12 09:34:35 +02:00
errnomodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
faulthandler.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
fcntlmodule.c gh-120296: Fix format string of fcntl.ioctl() audit (#120301) 2024-06-10 08:17:50 +00:00
gc_weakref.txt Fix links to old SF bugs (#95648) 2022-08-04 18:12:35 +02:00
gcmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
getaddrinfo.c gh-108767: Replace ctype.h functions with pyctype.h functions (#108772) 2023-09-01 18:36:53 +02:00
getbuildinfo.c gh-106320: Remove private pylifecycle.h functions (#106400) 2023-07-04 09:41:43 +00:00
getnameinfo.c gh-95174: WASI: skip missing sockets functions (GH-95179) 2022-07-27 08:19:23 +02:00
getpath.c gh-115136: Fix possible NULL deref in getpath_joinpath() (GH-115137) 2024-02-08 08:40:38 +00:00
getpath.py gh-117786: Fix venv created from Windows Store install by restoring __PYVENV_LAUNCHER__ smuggling (GH-117814) 2024-04-24 23:00:55 +01:00
getpath_noop.c bpo-45582: Port getpath[p].c to Python (GH-29041) 2021-12-03 00:08:42 +00:00
grpmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
hashlib.h gh-111863: Rename Py_NOGIL to Py_GIL_DISABLED (#111864) 2023-11-20 15:52:00 +02:00
itertoolsmodule.c gh-117657: Fix itertools.count thread safety (#119268) 2024-05-21 10:16:34 -07:00
ld_so_aix.in Issue #10656: Fix out-of-tree building on AIX 2016-11-20 07:56:37 +00:00
main.c gh-120346: Respect PYTHON_BASIC_REPL when running in interactive inspect mode (#120349) 2024-06-11 16:15:01 +00:00
makesetup gh-111225: Link extension modules against libpython on Android (#115780) 2024-02-21 23:18:57 +00:00
makexp_aix bpo-42087: Remove support for AIX 5.3 and below (GH-22830) 2020-11-16 16:16:10 +01:00
mathmodule.c gh-119613: Use C99+ functions instead of Py_IS_NAN/INFINITY/FINITE (#119619) 2024-05-29 09:51:19 +02:00
md5module.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
mmapmodule.c gh-118209: Add Windows structured exception handling to mmap module (GH-118213) 2024-05-10 10:47:30 +01:00
overlapped.c gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00
posixmodule.c gh-120586: Fix several "unused function" warnings in posixmodule.c (#120588) 2024-06-17 09:44:13 +03:00
posixmodule.h gh-85283: Convert grp extension to the limited C API (#116611) 2024-03-12 00:46:53 +00:00
pwdmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
pyexpat.c GH-119462: Enforce invariants of type versioning (GH-120731) 2024-06-19 17:38:45 +01:00
readline.c gh-116322: Rename PyModule_ExperimentalSetGIL to PyUnstable_Module_SetGIL (GH-118645) 2024-05-06 18:59:36 +02:00
README Issue #18093: Factor out the programs that embed the runtime 2014-07-25 21:52:14 +10:00
resource.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
rotatingtree.c gh-116181: Remove Py_BUILD_CORE_BUILTIN and Py_BUILD_CORE_MODULE in rotatingtree.c (#121260) 2024-07-03 13:05:05 +05:30
rotatingtree.h bpo-32150: Expand tabs to spaces in C files. (#4583) 2017-11-28 17:56:10 +02:00
selectmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
Setup gh-76785: Rename _xxsubinterpreters to _interpreters (gh-117791) 2024-04-24 16:18:24 +00:00
Setup.bootstrap.in gh-110721: Remove unused code from suggestions.c after moving PyErr_Display to use the traceback module (#113712) 2024-01-08 15:10:45 +00:00
Setup.stdlib.in gh-111997: C-API for signalling monitoring events (#116413) 2024-05-04 08:23:50 +00:00
sha1module.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
sha2module.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
sha3module.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
signalmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
socketmodule.c gh-117657: Use critical section to make _socket.socket.close thread safe (GH-120490) 2024-07-01 16:38:30 +02:00
socketmodule.h gh-110850: Cleanup pycore_time.h includes (#115724) 2024-02-20 16:50:43 +00:00
symtablemodule.c gh-119933: Improve `SyntaxError` message for invalid type parameters expressions (#119976) 2024-06-17 06:51:03 -07:00
syslogmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
termios.c gh-119770: Make termios ioctl() constants positive (#119840) 2024-05-31 17:18:40 +02:00
timemodule.c gh-121199: Use _Py__has_attribute() in timemodule.c (#121203) 2024-07-01 08:49:33 +00:00
tkappinit.c gh-103538: Remove unused TK_AQUA code (GH-103539) 2023-05-10 18:53:13 +00:00
tkinter.h gh-103532: Remove TKINTER_PROTECT_LOADTK code (GH-103535) 2023-04-14 09:04:16 -05:00
unicodedata.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
unicodedata_db.h gh-96954: Fix make regen-unicodedata in out-of-tree builds (#112118) 2023-11-15 16:42:17 +00:00
unicodename_db.h gh-96954: Fix make regen-unicodedata in out-of-tree builds (#112118) 2023-11-15 16:42:17 +00:00
winreparse.h bpo-31512: Add non-elevated symlink support for Windows (GH-3652) 2019-04-09 11:19:46 -07:00
xxlimited.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
xxlimited_35.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
xxmodule.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
xxsubtype.c gh-116322: Add Py_mod_gil module slot (#116882) 2024-05-03 11:30:55 -04:00
zlibmodule.c gh-121040: Use __attribute__((fallthrough)) (#121044) 2024-06-27 09:58:44 +00:00

Source files for standard library extension modules,
and former extension modules that are now builtin modules.