Commit graph

1467 commits

Author SHA1 Message Date
Victor Stinner
3d4226a832
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.

Changes:

* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
  surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
  the _Py_error_handler enum instead of "int surrogateescape" to pass
  the error handler. These functions now return -3 if the error
  handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
  in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
  as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
  workaround anymore.
2018-08-29 22:21:32 +02:00
Victor Stinner
b2457efc78
bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().

_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.

Changes:

* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
  system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
  now use the interpreter configuration rather than
  Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
  global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
  private functions to only modify Py_FileSystemDefaultEncoding and
  Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
  _PyCoreConfig for the warning.
2018-08-29 13:25:36 +02:00
Alexey Izbyshev
74a307d48e bpo-34435: Add missing NULL check to unicode_encode_ucs1(). (GH-8823)
Reported by Svace static analyzer.
2018-08-19 21:52:04 +03:00
Zackery Spytz
e349bf2358 bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741)
The UTF-7 decoder now raises UnicodeDecodeError for ill-formed
sequences starting with "+" (as specified in RFC 2152).
2018-08-19 07:43:38 +03:00
Victor Stinner
caba55b3b7
bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592)
sys_setcheckinterval() now uses a local variable to parse arguments,
before writing into interp->check_interval.
2018-08-03 15:33:52 +02:00
INADA Naoki
16dfca4d82
bpo-34087: Fix buffer overflow in int(s) and similar functions (GH-8274)
`_PyUnicode_TransformDecimalAndSpaceToASCII()` missed trailing NUL char.
It caused buffer overflow in `_Py_string_to_number_with_underscores()`.

This bug is introduced in 9b6c60cb.
2018-07-14 12:06:43 +09:00
Bup
fc93bd467e Change tp_size to tp_basicsize in comment and realign the comments (GH-6775) 2018-06-19 16:59:55 +08:00
Siddhesh Poyarekar
55edd0c185 bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. (GH-6030)
METH_NOARGS functions need only a single argument but they are cast
into a PyCFunction, which takes two arguments.  This triggers an
invalid function cast warning in gcc8 due to the argument mismatch.
Fix this by adding a dummy unused argument.
2018-04-29 21:59:33 +03:00
Xiang Zhang
2b77a921e6
bpo-29803: remove a redandunt op and fix a comment in unicodeobject.c (#660) 2018-02-13 18:33:32 +08:00
Serhiy Storchaka
b7e2d67f7c
bpo-32827: Fix usage of _PyUnicodeWriter_Prepare() in decoding errors handler. (GH-5636) 2018-02-13 08:27:33 +02:00
oldk
aa0735f597 bpo-32747: Remove trailing spaces in docstrings. (GH-5491) 2018-02-02 10:52:55 +02:00
Xiang Zhang
2c7fd46e11
bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325)
When using customized decode error handlers, it is possible for builtin decoders
to write out-of-bounds and then crash.
2018-01-31 20:48:05 +08:00
INADA Naoki
7cc95f5069
Fix wrong assert in unicodeobject (GH-5340) 2018-01-28 02:07:09 +09:00
INADA Naoki
a49ac99029
bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342) 2018-01-27 14:06:21 +09:00
Victor Stinner
7ed7aead95
bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.

Changes:

* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
  encoding error, they return the position of the error and an error
  message which are used to raise Unicode errors in
  PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
  cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): return more information on decoding error
  and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
  and time zone name.
* Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
  and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
  the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
  _PyUnicode_DecodeCurrentLocaleAndSize() and
  _PyUnicode_EncodeCurrentLocale().
2018-01-15 10:45:49 +01:00
Victor Stinner
cb3ae5588b
bpo-29240: Ignore UTF-8 Mode in time module (#5148)
time.strftime() must use the current LC_CTYPE encoding, not UTF-8
if the UTF-8 mode is enabled.

Add _PyUnicode_DecodeCurrentLocale() function.
2018-01-11 10:37:59 +01:00
Victor Stinner
2cba6b8579
bpo-29240: readline now ignores the UTF-8 Mode (#5145)
Add new fuctions ignoring the UTF-8 mode:

* _Py_DecodeCurrentLocale()
* _Py_EncodeCurrentLocale()
* _PyUnicode_DecodeCurrentLocaleAndSize()
* _PyUnicode_EncodeCurrentLocale()

Modify the readline module to use these functions.

Re-enable test_readline.test_nonascii().
2018-01-10 22:46:15 +01:00
Victor Stinner
9dd762013f
bpo-32030: Add _Py_EncodeLocaleRaw() (#4961)
Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in:

* _Py_wfopen()
* _Py_wreadlink()
* _Py_wrealpath()
* _Py_wstat()
* pymain_open_filename()

These functions are called early during Python intialization, only
the RAW memory allocator must be used.
2017-12-21 16:20:32 +01:00
Victor Stinner
e47e698da6
bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960)
Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead
of using temporary unicode and bytes objects. So Py_EncodeLocale()
doesn't use the Python C API anymore.
2017-12-21 15:45:16 +01:00
Serhiy Storchaka
a5552f023e
bpo-32240: Add the const qualifier to declarations of PyObject* array arguments. (#4746) 2017-12-15 13:11:11 +02:00
Victor Stinner
91106cd9ff
bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
  and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
  UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
  _winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
  mode. As a side effect, open() now uses the UTF-8 encoding by
  default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
  in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
  enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
  return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
  always copy flag values, rather than only copying if the new value
  is greater than the old value.
2017-12-13 12:29:09 +01:00
Victor Stinner
6a54c676e6
bpo-31979: Remove unused align_maxchar() function (#4527) 2017-11-23 19:02:23 +01:00
Serhiy Storchaka
9b6c60cbce
bpo-31979: Simplify transforming decimals to ASCII (#4336)
in int(), float() and complex() parsers.

This also speeds up parsing non-ASCII numbers by around 20%.
2017-11-13 21:23:48 +02:00
Serhiy Storchaka
e2f92de6a9
Add the const qualifier to "char *" variables that refer to literal strings. (#4370) 2017-11-11 13:06:26 +02:00
stratakis
e8b1965639 bpo-23699: Use a macro to reduce boilerplate code in rich comparison functions (GH-793) 2017-11-02 20:32:54 +10:00
Serhiy Storchaka
a2314283ff
bpo-20047: Make bytearray methods partition() and rpartition() rejecting (#4158)
separators that are not bytes-like objects.
2017-10-29 02:11:54 +03:00
Serhiy Storchaka
56cb465cc9 bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058)
and in codecs.escape_decode() when decode an escaped non-ascii byte.
2017-10-20 17:08:15 +03:00
Barry Warsaw
b2e5794870 bpo-31338 (#3374)
* Add Py_UNREACHABLE() as an alias to abort().
* Use Py_UNREACHABLE() instead of assert(0)
* Convert more unreachable code to use Py_UNREACHABLE()
* Document Py_UNREACHABLE() and a few other macros.
2017-09-14 18:13:16 -07:00
Serhiy Storchaka
e3b2b4b8d9 bpo-31393: Fix the use of PyUnicode_READY(). (#3451) 2017-09-08 09:58:51 +03:00
Eric Snow
2ebc5ce42a bpo-30860: Consolidate stateful runtime globals. (#3397)
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals

Other globals are excluded (see globals.txt and check-c-globals.py).
2017-09-07 23:51:28 -06:00
Stefan Krah
f432a3234f bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. (#3157) 2017-08-21 13:09:59 +02:00
Serhiy Storchaka
64e461be09 bpo-22207: Add checks for possible integer overflows in unicodeobject.c. (#2623)
Based on patch by Victor Stinner.
2017-07-11 06:55:25 +03:00
Serhiy Storchaka
f7eae0adfc [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)
Based on patch by Victor Stinner.

Add private C API function _PyUnicode_AsUnicode() which is similar to
PyUnicode_AsUnicode(), but checks for null characters.
2017-06-28 08:30:06 +03:00
Serhiy Storchaka
e613e6add5 bpo-30708: Check for null characters in PyUnicode_AsWideCharString(). (#2285)
Raise a ValueError if the second argument is NULL and the wchar_t\*
string contains null characters.
2017-06-27 16:03:14 +03:00
Serhiy Storchaka
40db90c1ce bpo-29802: Fix reference counting in module-level struct functions (#1213)
when pass arguments of wrong type.
2017-04-20 21:19:31 +03:00
Serhiy Storchaka
b879fe82e7 Expand the PySlice_GetIndicesEx macro. (#1023) 2017-04-08 09:53:51 +03:00
Lisa Roach
43ba8861e0 bpo-29549: Fixes docstring for str.index (#256)
* Updates B.index documentation.

* Updates str.index documentation, makes it Argument Clinic compatible.

* Removes ArgumentClinic code.

* Finishes string.index documentation.

* Updates string.rindex documentation.

* Documents B.rindex.
2017-04-04 22:36:22 -07:00
Serhiy Storchaka
fff9a31a91 bpo-29865: Use PyXXX_GET_SIZE macros rather than Py_SIZE for concrete types. (#748) 2017-03-21 08:53:25 +02:00
Serhiy Storchaka
004e03fb0c bpo-29116: Improve error message for concatenating str with non-str. (#710) 2017-03-19 19:38:42 +02:00
Serhiy Storchaka
202fda55c2 bpo-24037: Add Argument Clinic converter bool(accept={int}). (#485) 2017-03-12 10:10:47 +02:00
Serhiy Storchaka
370fd202f1 Use Py_RETURN_FALSE/Py_RETURN_TRUE rather than PyBool_FromLong(0)/PyBool_FromLong(1). (#567) 2017-03-08 20:47:48 +02:00
Serhiy Storchaka
9f8ad3f39e bpo-29568: Disable any characters between two percents for escaped percent "%%" in the format string for classic string formatting. (GH-513) 2017-03-08 11:51:19 +08:00
Martin Panter
91a8866dc1 Fix grammar in doc string, RST markup 2017-01-24 00:30:06 +00:00
Serhiy Storchaka
228b12edcc Issue #28999: Use Py_RETURN_NONE, Py_RETURN_TRUE and Py_RETURN_FALSE wherever
possible.  Patch is writen with Coccinelle.
2017-01-23 09:47:21 +02:00
Serhiy Storchaka
2a404b63d4 Issue #28769: The result of PyUnicode_AsUTF8AndSize() and PyUnicode_AsUTF8()
is now of type "const char *" rather of "char *".
2017-01-22 23:07:07 +02:00
Victor Stinner
0c4a828cad Run Argument Clinic: METH_VARARGS=>METH_FASTCALL
Issue #29286. Run Argument Clinic to get the new faster METH_FASTCALL calling
convention for functions using "boring" positional arguments.

Manually fix _elementtree: _elementtree_XMLParser_doctype() must remain
consistent with the clinic code.
2017-01-17 02:21:47 +01:00
INADA Naoki
15f94596b6 Issue #20180: forgot to update AC output. 2017-01-16 21:49:13 +09:00
INADA Naoki
3ae2056512 Issue #20180: convert unicode methods to AC. 2017-01-16 20:41:20 +09:00
Xiang Zhang
7a4da324dc Issue #29145: Merge 3.6. 2017-01-10 10:56:38 +08:00
Xiang Zhang
95403d74d7 Issue #29145: Merge 3.5. 2017-01-10 10:54:19 +08:00