1225 commits

Author SHA1 Message Date
Victor Stinner
c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
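As a rough illustration of the code path this commit speeds up, the sketch below round-trips undecodable bytes through the ``surrogateescape`` error handler. The sample bytes and variable names are made up for the example. It shows only the behaviour, not the speed-up.

```python
# Bytes that are not valid ASCII, e.g. a filename in an unknown encoding.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps each undecodable byte 0x80..0xFF
# to a lone surrogate U+DC80..U+DCFF instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")
print(ascii(text))  # 'caf\udce9.txt'

# Encoding with the same handler turns each lone surrogate back into
# the original byte. These are the encoders the commit optimizes.
ascii_bytes = text.encode("ascii", errors="surrogateescape")
latin1_bytes = text.encode("latin-1", errors="surrogateescape")

print(ascii_bytes == raw, latin1_bytes == raw)
```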
Victor Stinner
0030cd52da Issue #25227: Cleanup unicode_encode_ucs1() error handler
* Change limit type from unsigned int to Py_UCS4, to use the same type as the
  "ch" variable (a Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner
54385b206d Issue #24870: revert unwanted change
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner
5ebae87628 Issue #25207, #14626: Fix my commit.
It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner
6174474bea _PyUnicodeWriter_PrepareInternal(): make the assertion more strict 2015-09-22 01:01:17 +02:00
Victor Stinner
ca9381ea01 Issue #24870: Add _PyUnicodeWriter_PrepareKind() macro
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner
5014920cb7 Issue #24870: Reuse the new _Py_error_handler enum
Factorize code with the new get_error_handler() function.

Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner
f96418de05 Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.

The decoder is now up to 60 times as fast for these error handlers.

Also add unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
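For reference, a minimal sketch of what the three error handlers named in this commit do when the ASCII decoder meets a non-ASCII byte. The input bytes are an arbitrary example.

```python
# One invalid byte (0xFF) in the middle of otherwise valid ASCII.
data = b"abc\xffdef"

# "ignore" drops the offending byte.
ignored = data.decode("ascii", errors="ignore")

# "replace" substitutes U+FFFD REPLACEMENT CHARACTER.
replaced = data.decode("ascii", errors="replace")

# "surrogateescape" maps byte 0xFF to the lone surrogate U+DCFF.
escaped = data.decode("ascii", errors="surrogateescape")

# ascii() keeps the output printable on any terminal encoding.
for name, value in [("ignore", ignored), ("replace", replaced),
                    ("surrogateescape", escaped)]:
    print(name, ascii(value))
```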
Zachary Ware
070bd62cfa Closes #21279: Merge with 3.5 2015-08-06 00:05:13 -05:00
Zachary Ware
d987a81d29 Issue #21279: Merge with 3.4 2015-08-06 00:04:23 -05:00
Zachary Ware
79b98df023 Issue #21279: Flesh out str.translate docs
Initial patch by Kinga Farkas, Martin Panter, and John Posner.
2015-08-05 23:54:15 -05:00
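A short usage sketch of ``str.translate``, the method whose documentation this commit expands. The mapping and sample string are invented for illustration.

```python
# str.maketrans() builds a table keyed by code point. A value may be
# a replacement string, or None to delete the character.
table = str.maketrans({"a": "4", "e": "3", "!": None})

result = "leet speak!".translate(table)
print(result)  # l33t sp34k

# A plain dict keyed by ordinals works as well, and a replacement
# may be longer than one character.
expanded = "box".translate({ord("x"): "ks"})
print(expanded)  # boks
```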
Raymond Hettinger
ac2ef65c32 Make the unicode equality test an external function rather than in-lining it.
The real benefit of the unicode specialized function comes from
bypassing the overhead of PyObject_RichCompareBool() and not
from being in-lined (especially since there was almost no shared
data between the caller and callee).  Also, the in-lining was
having a negative effect on code generation for the callee.
2015-07-04 16:04:44 -07:00
Serhiy Storchaka
d4ea03c785 Issue #24284: The startswith and endswith methods of the str class no longer
return True when finding the empty string and the indexes are completely out
of range.
2015-05-31 09:15:51 +03:00
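The behaviour described above can be checked with a sketch like the following. It assumes an interpreter that includes this fix (Python 3.5 or later); the sample string is arbitrary.

```python
s = "abc"

# A start index equal to len(s) is still in range, so the empty
# string matches there.
at_end = s.startswith("", 3)

# A start index past the end is completely out of range; after this
# change the empty string no longer matches.
past_end_start = s.startswith("", 4)
past_end_end = s.endswith("", 4)

print(at_end, past_end_start, past_end_end)  # True False False
```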
Antoine Pitrou
873e0df946 Fix some compilation warnings when using gcc (-Wmaybe-uninitialized). 2015-05-19 21:06:04 +02:00
Antoine Pitrou
f6d1f1fa8a Fix some compilation warnings when using gcc (-Wmaybe-uninitialized). 2015-05-19 21:04:33 +02:00
Serhiy Storchaka
0d4df752ac Issue #15027: The UTF-32 encoder is now 3x to 7x faster. 2015-05-12 23:12:45 +03:00
Serhiy Storchaka
7e9d1d1a1b Issue #23908: os functions now reject paths with an embedded null character
on Windows instead of silently truncating them.

Removed the no longer used _PyUnicode_HasNULChars().
2015-04-20 10:12:28 +03:00
Serhiy Storchaka
1009bf18b3 Issue #23501: Argument Clinic now generates code into separate files by default. 2015-04-03 23:53:51 +03:00
Victor Stinner
1912b39def _PyUnicodeWriter_WriteStr() now checks that the input string is consistent
in debug mode to detect bugs earlier.

_PyUnicodeWriter_Finish() doesn't check if the read only string is consistent,
whereas it does check consistency for strings built by itself.
2015-03-26 09:37:23 +01:00
Serhiy Storchaka
d9d769fcdd Issue #23573: Increased performance of string search operations (str.find,
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
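The sketch below exercises some of the listed operations with a needle and haystack of different kinds. It assumes CPython's PEP 393 string representation, where the euro sign forces a UCS2 string while the ASCII needles are UCS1. It shows that results are unchanged, not the performance gain.

```python
# U+20AC does not fit in one byte, so this string is stored as UCS2.
text = "price: 100\u20ac"

# The needles below are pure ASCII, so they are stored as UCS1.
index = text.find("100")
found = "price" in text
parts = text.partition(":")

# A UCS4 needle (outside the BMP) searched in the UCS2 string.
has_emoji = "\U0001F600" in text

print(index, found, has_emoji)  # 7 True False
```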
Victor Stinner
f50e187724 Fix compiler warnings: comparison between signed and unsigned numbers 2015-03-20 11:32:24 +01:00
Victor Stinner
0c39b1b970 Initialize variables to prevent GCC warnings 2015-03-18 15:02:06 +01:00
Steve Dower
3e96f324dc Issue #23451: Update pyconfig.h for Windows to require Vista headers and remove unnecessary version checks. 2015-03-02 08:01:10 -08:00
Serhiy Storchaka
78a8249127 Issue #23490: Fixed possible crashes related to interoperability between
old-style and new API for strings with 2**30-1 characters.
2015-02-20 21:34:39 +02:00
Serhiy Storchaka
e55181f517 Issue #23490: Fixed possible crashes related to interoperability between
old-style and new API for strings with 2**30-1 characters.
2015-02-20 21:34:06 +02:00
Serhiy Storchaka
4d0d982985 Issue #23446: Use PyMem_New instead of PyMem_Malloc to avoid possible integer
overflows.  Added a few missing PyErr_NoMemory() calls.
2015-02-16 13:33:32 +02:00
Serhiy Storchaka
1a1ff29659 Issue #23446: Use PyMem_New instead of PyMem_Malloc to avoid possible integer
overflows.  Added a few missing PyErr_NoMemory() calls.
2015-02-16 13:28:22 +02:00
Victor Stinner
29dacf2e97 Issue #15859: PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and
PyUnicode_EncodeCodePage() now raise an exception if the object is not a
Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on
platforms other than Windows. Patch written by Campbell Barton.
2015-01-26 16:41:32 +01:00
Serhiy Storchaka
bbd3aa8ece Issue #23321: Fixed a crash in str.decode() when error handler returned
a replacement string longer than the malformed input data.
2015-01-26 01:24:31 +02:00
Serhiy Storchaka
7e4b9057b3 Issue #23321: Fixed a crash in str.decode() when error handler returned
a replacement string longer than the malformed input data.
2015-01-26 01:22:54 +02:00
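To make the scenario concrete, here is a sketch of a decode error handler whose replacement is longer than the malformed input. The handler name ``example_expand`` and the replacement text are hypothetical, and the sketch uses Python 3's ``bytes.decode()`` rather than the exact call named in the commit message.

```python
import codecs

def expand(exc):
    # Replace the malformed bytes with a string longer than the input
    # span, then resume decoding right after the bad bytes.
    if isinstance(exc, UnicodeDecodeError):
        return ("[bad]", exc.end)
    raise exc

# "example_expand" is a made-up handler name for this illustration.
codecs.register_error("example_expand", expand)

# One malformed byte (0xFF) becomes a five-character replacement.
decoded = b"a\xffb".decode("utf-8", errors="example_expand")
print(decoded)  # a[bad]b
```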
Ethan Furman
b95b56150f Issue #20284: Implement PEP 461 2015-01-23 20:05:18 -08:00
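PEP 461 adds %-formatting to bytes and bytearray. A minimal usage sketch, assuming Python 3.5 or later; the values are arbitrary.

```python
# %s accepts bytes-like objects; %d and %x accept integers.
line = b"%s has %d items" % (b"cart", 3)
print(line)  # b'cart has 3 items'

hex_value = b"0x%x" % 255
print(hex_value)  # b'0xff'

# bytearray supports the same operator and returns a bytearray.
buf = bytearray(b"id=%d") % 7
print(buf)  # bytearray(b'id=7')
```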
Serhiy Storchaka
82e07b92b3 Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:33:31 +02:00
Serhiy Storchaka
d3faf43f9b Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:28:37 +02:00
Serhiy Storchaka
b757c83ec6 Issue #22581: Use more "bytes-like object" throughout the docs and comments. 2014-12-05 22:25:22 +02:00
Serhiy Storchaka
133b11b566 Issue #22975: Close block at right place. 2014-12-01 18:56:28 +02:00
Serhiy Storchaka
92bf919ed0 Issue #22581: Use more "bytes-like object" throughout the docs and comments. 2014-12-05 22:26:10 +02:00
Serhiy Storchaka
407249c62b Issue #22975: Close block at right place. 2014-12-01 18:56:54 +02:00
Victor Stinner
3aa979e0cd Issue #20948: Inline makefmt() in unicode_fromformat_arg() 2014-11-18 21:40:51 +01:00
Antoine Pitrou
4e334241b7 Fixed signed/unsigned comparison warning 2014-10-15 23:14:53 +02:00
Benjamin Peterson
736982d36d merge 3.4 (closes #22643) 2014-10-15 12:17:47 -04:00
Benjamin Peterson
9c422f3c3d merge 3.3 2014-10-15 12:17:33 -04:00
Benjamin Peterson
1e211ff10d it suffices to check for PY_SSIZE_T_MAX overflow (#22643) 2014-10-15 12:17:21 -04:00
Benjamin Peterson
315aa40403 Merge 3.4 2014-10-15 11:51:17 -04:00
Benjamin Peterson
60d7a73194 Merge 3.3 2014-10-15 11:51:12 -04:00
Benjamin Peterson
c0e64f5027 make sure length is unsigned 2014-10-15 11:51:05 -04:00
Benjamin Peterson
6925264334 merge 3.4 (#22643) 2014-10-15 11:49:15 -04:00
Benjamin Peterson
1cbb3fe775 merge 3.3 (#22643) 2014-10-15 11:48:41 -04:00
Benjamin Peterson
e1bd38c03c fix integer overflow in unicode case operations (closes #22643) 2014-10-15 11:47:36 -04:00
Gregory P. Smith
8486f9b134 Fix "warning: comparison between signed and unsigned integer expressions"
-Wsign-compare warnings in unicodeobject.c.  These were all a result
of sizeof() being unsigned and being compared to a Py_ssize_t.
Not actual problems.
2014-09-30 00:33:24 -07:00
Benjamin Peterson
fd97a6fb2d merge 3.4 (#22520) 2014-09-29 23:02:56 -04:00