Commit graph

1327 commits

Author SHA1 Message Date
Serhiy Storchaka
583a93943c Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster. 2014-01-04 19:25:37 +02:00
Victor Stinner
fa4e68d425 Remove deadcode (HASH macro is no more defined) 2014-01-03 17:42:18 +01:00
Victor Stinner
92a419eea4 Remove now unused variables 2014-01-03 17:39:40 +01:00
Victor Stinner
f3b46b4a66 unicode_char() uses get_latin1_char() to get latin1 singleton characters 2014-01-03 13:16:00 +01:00
Victor Stinner
985a82a6d2 add unicode_char() in unicodeobject.c to factorize code 2014-01-03 12:53:47 +01:00
Larry Hastings
44e2eaab54 Issue #19674: inspect.signature() now produces a correct signature
for some builtins.
2013-11-23 15:37:55 -08:00
Larry Hastings
ebdcb50b8a Issue #19730: Argument Clinic now supports all the existing PyArg
"format units" as legacy converters, as well as two new features:
"self converters" and the "version" directive.
2013-11-23 14:54:00 -08:00
Nick Coghlan
c72e4e6dcc Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
Christian Heimes
985ecdcfc2 ssue #19183: Implement PEP 456 'secure and interchangeable hash algorithm'.
Python now uses SipHash24 on all major platforms.
2013-11-20 11:46:18 +01:00
Victor Stinner
4a58707a34 Add _PyUnicodeWriter_WriteASCIIString() function 2013-11-19 12:54:53 +01:00
Serhiy Storchaka
58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Victor Stinner
6989ba0174 Issue #19581: Change the overallocation factor of _PyUnicodeWriter on Windows
On Windows, a factor of 50% gives best performances.
2013-11-18 21:08:39 +01:00
Larry Hastings
ed4a1c5703 Argument Clinic: rename "self" to "module" for module-level functions. 2013-11-18 09:32:13 -08:00
Ezio Melotti
745d54d2fa #17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs(). 2013-11-16 19:10:57 +02:00
Nick Coghlan
8b097b4ed7 Close #17828: better handling of codec errors
- output type errors now redirect users to the type-neutral
  convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
  will now be automatically wrapped in exceptions that give
  the name of the codec involved
2013-11-13 23:49:21 +10:00
Victor Stinner
66b3270975 _Py_normalize_encoding(): explain how the value 6 was computed 2013-11-07 23:12:23 +01:00
Victor Stinner
df23e30bea Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
if the input string is NULL
2013-11-07 13:33:36 +01:00
Victor Stinner
ad14ccd047 Issue #19512: add _PyUnicode_CompareWithId() function
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.

Add also _PyId_builtins identifier for "builtins" common string.
2013-11-07 00:46:04 +01:00
Victor Stinner
21ea21ef6d Issue #19424: PyUnicode_CompareWithASCIIString() normalizes memcmp() result
to -1, 0, 1
2013-11-04 11:28:26 +01:00
Victor Stinner
f0c7b2af05 Issue #16286: remove duplicated identity check from unicode_compare()
Move the test to PyUnicode_Compare()
2013-11-04 11:27:14 +01:00
Victor Stinner
fd9e44db37 Issue #16286: optimize PyUnicode_RichCompare() for identical strings (same
pointer) for any operator, not only Py_EQ and Py_NE.

Code of bytes_richcompare() and PyUnicode_RichCompare() is now closer.
2013-11-04 11:23:05 +01:00
Victor Stinner
c8bc5377ac Issue #16286: write a new subfunction bytes_compare_eq()
* cleanup bytes_richcompare()
* PyUnicode_RichCompare(): replace a test with a XOR
2013-11-04 11:08:10 +01:00
Victor Stinner
e1b1592fd4 Issue #19424: Fix a compiler warning on comparing signed/unsigned size_t
Patch written by Zachary Ware.
2013-11-03 13:53:12 +01:00
Victor Stinner
a6b9b071a3 Issue #19424: Fix a compiler warning
memcmp() just takes raw pointers
2013-10-30 18:27:13 +01:00
Victor Stinner
602f7cf0b9 Issue #19424: Optimize PyUnicode_CompareWithASCIIString()
Use fast memcmp() instead of a loop using the slow PyUnicode_READ() macro.
strlen() is still necessary to check Unicode string containing null bytes.
2013-10-29 23:31:50 +01:00
Victor Stinner
68b674c9d4 Issue #19437: Fix _PyUnicode_New() (constructor of legacy string), set all
attributes before checking for error. The destructor expects all attributes to
be set. It is now safe to call Py_DECREF(unicode) in the constructor.
2013-10-29 19:31:43 +01:00
Victor Stinner
fa3ba4c3bc Issue #18609: Add a fast-path for "iso8859-1" encoding
On AIX, the locale encoding may be "iso8859-1", which was not a known syntax of
the legacy ISO 8859-1 encoding.

Using a C codec instead of a Python codec is faster but also avoids tricky
issues during Python startup or complex code.
2013-10-29 11:34:05 +01:00
Victor Stinner
a5afb58986 Issue #18408: Fix PyUnicode_AsUTF8AndSize(), raise MemoryError exception on
memory allocation failure
2013-10-29 01:28:23 +01:00
Serhiy Storchaka
c679227e31 Issue #1772673: The type of char* arguments now changed to const char*. 2013-10-19 21:03:34 +03:00
Serhiy Storchaka
55e092f545 Issue #19279: UTF-7 decoder no more produces illegal strings. 2013-10-19 20:39:28 +03:00
Serhiy Storchaka
35804e4c63 Issue #19279: UTF-7 decoder no more produces illegal strings. 2013-10-19 20:38:19 +03:00
Larry Hastings
3182680210 Issue #16612: Add "Argument Clinic", a compile-time preprocessor
for C files to generate argument parsing code.  (See PEP 436.)
2013-10-19 00:09:25 -07:00
Ethan Furman
fb13721b1b Close #18780: %-formatting now prints value for int subclasses with %d, %i, and %u codes. 2013-08-31 10:18:55 -07:00
Antoine Pitrou
9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Raymond Hettinger
e56666d17f Silence compiler warning about an uninitialized variable 2013-08-04 11:51:03 -07:00
Raymond Hettinger
5ed1b38a7d merge 2013-08-04 11:51:35 -07:00
Christian Heimes
b578735dff Check return value of PyType_Ready(&EncodingMapType)
CID 486654
2013-07-20 14:57:28 +02:00
Christian Heimes
26532f7519 Check return value of PyType_Ready(&EncodingMapType)
CID 486654
2013-07-20 14:57:16 +02:00
Victor Stinner
e699e5a218 Issue #18408: Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY()
and _PyUnicode_HAS_WSTR_MEMORY() macros

These macros are called in unicode_dealloc(), whereas the unicode object can be
"inconsistent" if the creation of the object failed.

For example, when unicode_subtype_new() fails on a memory allocation,
_PyUnicode_CheckConsistency() fails with an assertion error because data is
NULL.
2013-07-15 18:22:47 +02:00
Victor Stinner
9e6b4d715c Issue #18408: _PyUnicodeWriter_Finish() now clears its buffer attribute in all
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner
15a0bd3965 Issue #18408: Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner
6f8eeee7b9 Issue #18203: Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar() 2013-07-07 22:57:45 +02:00
Victor Stinner
1a7425f67a Issue #18203: Replace malloc() with PyMem_RawMalloc() at Python initialization
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
  of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes
d47802eef7 Fix ref leak in error case of unicode find, count, formatlong
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes
d47a0456b1 Fix ref leak in error case of unicode index
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes
ea71a525c3 Fix ref leak in error case of unicode rindex and rfind
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes
305e49e17e Fix memory leak in endswith
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka
c89533f72f Issue #18184: PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka
8eeae2126c Issue #18184: PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson
3164f5d565 merge 3.3 (#18183) 2013-06-10 09:24:01 -07:00