Commit graph

1575 commits

Author SHA1 Message Date
Martin Panter
cda80940ed Issue #15984: Merge PyUnicode doc from 3.5 2016-04-15 02:27:11 +00:00
Martin Panter
6245cb3c01 Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
This affects documentation, code comments, and a debugging messages.
2016-04-15 02:14:19 +00:00
Serhiy Storchaka
21a663ea28 Issue #26057: Got rid of nonneeded use of PyUnicode_FromObject(). 2016-04-13 15:37:23 +03:00
Serhiy Storchaka
f01e408c16 Issue #26200: Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:12:01 +03:00
Serhiy Storchaka
57a01d3a0e Issue #26200: Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:05:40 +03:00
Serhiy Storchaka
ec39756960 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:50:03 +03:00
Serhiy Storchaka
48842714b9 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:45:48 +03:00
Serhiy Storchaka
ab479c49d3 Issue #26494: Fixed crash on iterating exhausting iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:41:15 +03:00
Serhiy Storchaka
fbb1c5ee06 Issue #26494: Fixed crash on iterating exhausting iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:40:02 +03:00
Victor Stinner
f2192855dd Merge 3.5 2016-03-01 22:07:53 +01:00
Victor Stinner
337986740f Issue #26464: Fix unicode_fast_translate() again
Initialize i variable if the string is non-ASCII.
2016-03-01 21:59:58 +01:00
Victor Stinner
3d9d77a3dc Merge 3.5 2016-03-01 21:30:50 +01:00
Victor Stinner
6c9aa8f2bf Fix str.translate()
Issue #26464: Fix str.translate() when string is ASCII and first replacements
removes character, but next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
2016-03-01 21:30:30 +01:00
Victor Stinner
5b96f17b1c Merge 3.5 2016-01-27 17:01:13 +01:00
Victor Stinner
5bc03a6d4d Fix resize_compact()
Issue #26217: resize_compact() must set wstr_length to 0 after freeing the wstr
string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().
2016-01-27 16:56:53 +01:00
Serhiy Storchaka
726fc139a5 Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:44:33 +02:00
Serhiy Storchaka
191321d11b Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:41:34 +02:00
Serhiy Storchaka
ef1585eb9a Issue #25923: Added more const qualifiers to signatures of static and private functions. 2015-12-25 20:01:53 +02:00
Serhiy Storchaka
2d06e84455 Issue #25923: Added the const qualifier to static constant arrays. 2015-12-25 19:53:18 +02:00
Serhiy Storchaka
f006940351 Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka
5a57ade58e Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka
9b3a2eec1c Issues #25890, #25891, #25892: Removed unused variables in Windows code.
Reported by Alexander Riccio.
2015-12-18 10:03:13 +02:00
Serhiy Storchaka
7c088a9b5c Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:05:52 +02:00
Serhiy Storchaka
6648bf5661 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:04:37 +02:00
Serhiy Storchaka
31b9410654 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Serhiy Storchaka
7aa690860e Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Benjamin Peterson
d798dc1034 merge 3.5 (#25630) 2015-11-15 21:57:50 -08:00
Benjamin Peterson
a4d33b3428 make the PyUnicode_FSConverter cleanup set the decrefed argument to NULL (closes #25630) 2015-11-15 21:57:39 -08:00
Serhiy Storchaka
413fdcea21 Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independedly
without special preconditions.
2015-11-14 15:42:17 +02:00
Serhiy Storchaka
4a7c03aab4 Issue #25523: Merge a-to-an corrections from 3.5. 2015-11-02 14:44:29 +02:00
Serhiy Storchaka
a84f6c3dd3 Issue #25523: Merge a-to-an corrections from 3.4. 2015-11-02 14:39:05 +02:00
Serhiy Storchaka
d65c9496da Issue #25523: Further a-to-an corrections. 2015-11-02 14:10:23 +02:00
Victor Stinner
358af13526 Issue #25353: Optimize unicode escape and raw unicode escape encoders to use
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner
6c2cdae9e6 Writer APIs: use empty string singletons
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner
6bd525b656 Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.

Cleanup unicode_encode_ucs1():

* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner
ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
3fa36ff5e4 Issue #25318: Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner
797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner
0016507c16 Issue #25318: Move _PyBytesWriter to bytesobject.c
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner
e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner
fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
74e8fac3c8 Issue #25301: Fix compatibility with ISO C90 2015-10-05 13:49:26 +02:00
Victor Stinner
1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Victor Stinner
eb36fdaad8 Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka
29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka
58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka
28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner
3222da26fe Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner
01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00