Commit graph

1202 commits

Author SHA1 Message Date
Victor Stinner
684d5fd420 Fix PyUnicode_Substring() for start >= length and start > end
Remove the fast-path for 1-character string: unicode_fromascii() and
_PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.
2012-05-03 02:32:34 +02:00
Victor Stinner
b6cd014d75 Unicode: optimize creating of 1-character strings 2012-05-03 02:17:04 +02:00
Victor Stinner
bff7c96834 Issue #14687: Optimize str%tuple for the "%(name)s" syntax
Avoid an useless and expensive call to PyUnicode_READ().
2012-05-03 01:44:59 +02:00
Victor Stinner
e6abb488c9 unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation
of the second argument of PyUnicode_New().

 * Create also align_maxchar() function
 * Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum
   character when ch <= 127 (it is ASCII)
2012-05-02 01:15:40 +02:00
Victor Stinner
438106b66e Issue #14687: Cleanup PyUnicode_Format() 2012-05-02 00:41:57 +02:00
Victor Stinner
b5c3ea3af3 Issue #14687: Optimize str%args
* formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII()
   to not have to check characters, we know that it is really ASCII
 * Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format
   a character: if avoids a call to ucs4lib_find_max_char() to compute
   the maximum character (whereas we already know it, it is just the character
   itself)
2012-05-02 00:29:36 +02:00
Victor Stinner
b80e46eca4 Issue #14687: Avoid an useless duplicated string in PyUnicode_Format() 2012-04-30 05:21:52 +02:00
Victor Stinner
aff3cc659b Issue #14687: Cleanup PyUnicode_Format() 2012-04-30 05:19:21 +02:00
Victor Stinner
b11d91d969 Fix my previous commit: bool is a long, restore the specical case for bool 2012-04-28 00:25:34 +02:00
Victor Stinner
d0880d57b0 Simplify and optimize formatlong()
* Remove _PyBytes_FormatLong(): inline it into formatlong()
 * the input type is always a long, so remove the code for bool
 * don't duplicate the string if the length does not change
 * Use PyUnicode_DATA() instead of _PyUnicode_AsString()
2012-04-27 23:40:13 +02:00
Victor Stinner
94d558b063 Optimize _PyUnicode_FindMaxChar() find pure ASCII strings 2012-04-27 22:26:58 +02:00
Victor Stinner
8f825060f1 Check newly created consistency using _PyUnicode_CheckConsistency(str, 1)
* In debug mode, fill the string data with invalid characters
 * Simplify also reference counting in PyCodec_BackslashReplaceErrors()
   and PyCodec_XMLCharRefReplaceError()
2012-04-27 13:55:39 +02:00
Victor Stinner
718fbf078c _PyUnicode_CheckConsistency() ensures that the unicode string ends with a
null character
2012-04-26 00:39:37 +02:00
Benjamin Peterson
b9f4c9daad make pointer arith c89 2012-04-23 21:45:40 -04:00
Benjamin Peterson
f3b7d86e25 use correct base ptr 2012-04-23 18:07:01 -04:00
Benjamin Peterson
2844a7a6d3 simplify and reformat 2012-04-23 18:00:25 -04:00
Victor Stinner
ece58deb9f Close #14648: Compute correctly maxchar in str.format() for substrin 2012-04-23 23:36:38 +02:00
Benjamin Peterson
64ed576de8 merge 3.2 (#14509) 2012-04-09 15:04:39 -04:00
Benjamin Peterson
ca819c3c9d merge 3.1 (#14509) 2012-04-09 15:01:02 -04:00
Benjamin Peterson
f6622c8a3e fix build without Py_DEBUG and DNDEBUG (closes #14509) 2012-04-09 14:53:07 -04:00
Victor Stinner
afb5205c48 Close #14249: Use bit shifts instead of an union, it's more efficient.
Patch written by Serhiy Storchaka
2012-04-05 22:54:49 +02:00
Victor Stinner
e7eee01f36 Close #14249: Use an union instead of a long to short pointer to avoid aliasing
issue. Speed up UTF-16 by 20%.
2012-04-05 13:44:34 +02:00
Antoine Pitrou
a701388de1 Rename _PyIter_GetBuiltin to _PyObject_GetBuiltin, and do not include it in the stable ABI. 2012-04-05 00:04:20 +02:00
Kristján Valur Jónsson
31668b8f7a Issue #14288: Serialization support for builtin iterators. 2012-04-03 10:49:41 +00:00
Benjamin Peterson
0df542985a grammar 2012-03-26 14:50:32 -04:00
Benjamin Peterson
c067d6661f merge 3.2 2012-03-25 22:41:16 -04:00
Benjamin Peterson
a8755c586e kill this terribly outdated comment 2012-03-25 22:40:54 -04:00
Victor Stinner
0d03478b88 Remove an unused variable 2012-03-06 02:06:01 +01:00
Victor Stinner
c9590ad745 Close #14085: remove assertions from PyUnicode_WRITE macro
Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a
test raising a Python exception.
2012-03-04 01:34:37 +01:00
Ezio Melotti
cda6b6d60d #14081: The sep and maxsplit parameter to str.split, bytes.split, and bytearray.split may now be passed as keyword arguments. 2012-02-26 09:39:55 +02:00
Victor Stinner
b0800dc53b Oops, revert unwanted changes 2012-02-25 00:47:08 +01:00
Victor Stinner
abc649ddbe Issue #14107: fix bigmem tests on str.capitalize(), str.swapcase() and
str.title(). Compute correctly how much memory is required for the test
(memuse).
2012-02-25 00:43:27 +01:00
Antoine Pitrou
842c0f17eb Fix compilation error under Windows (and warnings too). 2012-02-24 13:30:46 +01:00
Victor Stinner
90f50d4df9 Issue #13706: Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF) 2012-02-24 01:44:47 +01:00
Victor Stinner
41a863cb81 Issue #13706: Fix format(int, "n") for locale with non-ASCII thousands separator
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
   (from the locale encoding), instead of decoding them implicitly from latin1
 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
 * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
   character if unicode is NULL
 * Replace MIN/MAX macros by Py_MIN/Py_MAX
 * stringlib/undef.h undefines STRINGLIB_IS_UNICODE
 * stringlib/localeutil.h only supports Unicode
2012-02-24 00:37:51 +01:00
Victor Stinner
b429d3b09c Fix doc of an internal function: unicode_write_cstr() 2012-02-22 21:22:20 +01:00
Antoine Pitrou
ba6bafcfbe Fix compile failure under Windows 2012-02-22 16:41:50 +01:00
Victor Stinner
c516610f0b Optimize str%arg for number formats: %i, %d, %u, %x, %p
Write a specialized function to write an ASCII/latin1 C char* string into a
Python Unicode string.
2012-02-22 13:55:02 +01:00
Victor Stinner
99d7ad0bb0 Micro-optimize computation of maxchar in PyUnicode_TransformDecimalToASCII() 2012-02-22 13:37:39 +01:00
Victor Stinner
da79e632c4 Micro-optimize unicode_expandtabs(): use FILL() macro to write N spaces 2012-02-22 13:37:04 +01:00
Victor Stinner
15e9ed299c PyUnicode_New() and unicode_putchar() check for MAX_UNICODE maximum (U+10FFFF) 2012-02-22 13:36:20 +01:00
Benjamin Peterson
d9a3591ed1 merge 3.2 2012-02-21 11:12:14 -05:00
Benjamin Peterson
e249dcab7a merge 3.2 2012-02-21 11:09:13 -05:00
Benjamin Peterson
69e9727657 ensure no one tries to hash things before the random seed is found 2012-02-21 11:08:50 -05:00
Georg Brandl
16fa2a1097 Forgot the "empty string -> hash == 0" special case for strings. 2012-02-21 00:50:13 +01:00
Georg Brandl
2fb477c0f0 Merge 3.2: Issue #13703 plus some related test suite fixes. 2012-02-21 00:33:36 +01:00
Georg Brandl
09a7c72cad Merge from 3.1: Issue #13703: add a way to randomize the hash values of basic types (str, bytes, datetime)
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.

The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
2012-02-20 21:31:46 +01:00
Georg Brandl
2daf6ae249 Issue #13703: add a way to randomize the hash values of basic types (str, bytes, datetime)
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.

The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
2012-02-20 19:54:16 +01:00
Victor Stinner
c3a6b02d70 (Merge 3.2) Issue #13913: normalize utf-8 codec name in UTF-8 decoder 2012-02-14 01:18:10 +01:00
Victor Stinner
cbe01342bc Issue #13913: normalize utf-8 codec name in UTF-8 decoder 2012-02-14 01:17:45 +01:00