Commit graph

1296 commits

Author SHA1 Message Date
Gregory P. Smith
63e6c3222f Consolidate the occurrances of the prime used as the multiplier when hashing
to a single #define instead of having several copies in several files.

This excludes the Modules/ tree (datetime and expat both have a copy
for their own purposes with no need for it to be the same).
2012-01-14 15:31:34 -08:00
Benjamin Peterson
c8d8b8861e fix possible refleaks if PyUnicode_READY fails 2012-01-14 13:37:31 -05:00
Benjamin Peterson
bac79498c8 always explicitly check for -1 from PyUnicode_READY 2012-01-14 13:34:47 -05:00
Benjamin Peterson
d5890c8db5 add str.casefold() (closes #13752) 2012-01-14 13:23:30 -05:00
Benjamin Peterson
53aa1d7c57 fix possible if unlikely leak 2011-12-20 13:29:45 -06:00
Benjamin Peterson
e51757f6de move do_title to a better place 2012-01-12 21:10:29 -05:00
Benjamin Peterson
821e4cfd01 make fix_decimal_and_space_to_ascii check if it modifies the string 2012-01-12 15:40:18 -05:00
Benjamin Peterson
0c91392fe6 kill capwords implementation which has been disabled since the begining 2012-01-12 15:25:41 -05:00
Benjamin Peterson
b2bf01d824 use full unicode mappings for upper/lower/title case (#12736)
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
Victor Stinner
3fe553160c Add a new PyUnicode_Fill() function
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
2012-01-04 00:33:50 +01:00
Benjamin Peterson
5e458f520c also decref the right thing 2012-01-02 10:12:13 -06:00
Benjamin Peterson
4c13a4a352 ready the correct string 2012-01-02 09:07:38 -06:00
Benjamin Peterson
22a29708fd fix some possible refleaks from PyUnicode_READY error conditions 2012-01-02 09:00:30 -06:00
Benjamin Peterson
9ca3ffac94 == -1 is convention 2012-01-01 16:04:29 -06:00
Benjamin Peterson
e157cf1012 make switch more robust 2012-01-01 15:56:20 -06:00
Benjamin Peterson
c0b95d18fa 4 space indentation 2011-12-20 17:24:05 -06:00
Benjamin Peterson
ead6b53659 fix spacing around switch statements 2011-12-20 17:23:42 -06:00
Benjamin Peterson
822c790527 merge 3.2 2011-12-20 13:32:50 -06:00
Victor Stinner
6099a03202 Issue #13624: Write a specialized UTF-8 encoder to allow more optimization
The main bottleneck was the PyUnicode_READ() macro.
2011-12-18 14:22:26 +01:00
Victor Stinner
73f53b57d1 Optimize str * n for len(str)==1 and UCS-2 or UCS-4 2011-12-18 03:26:31 +01:00
Victor Stinner
f644110816 Issue #13621: Optimize str.replace(char1, char2)
Use findchar() which is more optimized than a dummy loop using
PyUnicode_READ().  PyUnicode_READ() is a complex and slow macro.
2011-12-18 02:43:08 +01:00
Victor Stinner
ab870218e3 Issue #10951: Fix compiler warnings in timemodule.c and unicodeobject.c
Thanks Jérémy Anger for the fix.
2011-12-17 22:39:43 +01:00
Victor Stinner
2f197078fb The locale decoder raises a UnicodeDecodeError instead of an OSError
Search the invalid character using mbrtowc().
2011-12-17 07:08:30 +01:00
Victor Stinner
1b57967b96 Issue #13560: Locale codec functions use the classic "errors" parameter,
instead of surrogateescape

So it would be possible to support more error handlers later.
2011-12-17 05:47:23 +01:00
Victor Stinner
ab59594326 What's New in Python 3.3: complete the deprecation list
Add also FIXMEs in unicodeobject.c
2011-12-17 04:59:06 +01:00
Victor Stinner
1f33f2b0c3 Issue #13560: os.strerror() now uses the current locale encoding instead of UTF-8 2011-12-17 04:45:09 +01:00
Victor Stinner
f2ea71fcc8 Issue #13560: Add PyUnicode_EncodeLocale()
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
   available
 * Document my last changes in Misc/NEWS
2011-12-17 04:13:41 +01:00
Victor Stinner
af02e1c85a Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
   from the current locale encoding
 * _Py_char2wchar() writes an "error code" in the size argument to indicate
   if the function failed because of memory allocation failure or because of a
   decoding error. The function doesn't write the error message directly to
   stderr.
 * Fix time.strftime() (if wcsftime() is missing): decode strftime() result
   from the current locale encoding, not from the filesystem encoding.
2011-12-16 23:56:01 +01:00
Victor Stinner
16e6a80923 PyUnicode_Resize(): warn about canonical representation
Call also directly unicode_resize() in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner
b0a82a6a7f Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
Fix also PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner
bf6e560d0c Make PyUnicode_Copy() private => _PyUnicode_Copy()
Undocument the function.

Make also decode_utf8_errors() as private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner
7a9105a380 resize_copy() now supports legacy ready strings 2011-12-12 00:13:42 +01:00
Victor Stinner
488fa49acf Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
* Rename unicode_resizable() to unicode_modifiable()
 * Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
   that the function is private
 * Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
   to simplify the code
 * unicode_modifiable() return 0 if the hash has been computed or if the string
   is not an exact unicode string
 * Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
   hash has already been computed, you cannot modify a string inplace anymore
 * PyUnicode_Concat() checks for integer overflow
2011-12-12 00:01:39 +01:00
Victor Stinner
c4b495497a Create unicode_result_unchanged() subfunction 2011-12-11 22:44:26 +01:00
Victor Stinner
eaab604829 Fix fixup() for unchanged unicode subtype
If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
2011-12-11 22:22:39 +01:00
Victor Stinner
e6b2d4407a unicode_fromascii() doesn't check string content twice in debug mode
_PyUnicode_CheckConsistency() also checks string content.
2011-12-11 21:54:30 +01:00
Victor Stinner
a1d12bb119 Call directly PyUnicode_DecodeUTF8Stateful() instead of PyUnicode_DecodeUTF8()
* Remove micro-optimization from PyUnicode_FromStringAndSize():
   PyUnicode_DecodeUTF8Stateful() has already these optimizations (for size=0
   and one ascii char).
 * Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove an
   useless variable
2011-12-11 21:53:09 +01:00
Victor Stinner
382955ff4e Use directly unicode_empty instead of PyUnicode_New(0, 0) 2011-12-11 21:44:00 +01:00
Victor Stinner
785938eebd Move the slowest UTF-8 decoder to its own subfunction
* Create decode_utf8_errors()
 * Reuse unicode_fromascii()
 * decode_utf8_errors() doesn't refit at the beginning
 * Remove refit_partial_string(), use unicode_adjust_maxchar() instead
2011-12-11 20:09:03 +01:00
Victor Stinner
84def3774d Fix error handling in resize_compact() 2011-12-11 20:04:56 +01:00
Victor Stinner
8faf8216e4 PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a
character in not in range [U+0000; U+10ffff].
2011-12-08 22:14:11 +01:00
Victor Stinner
551ac95733 Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
And use surrogates macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Victor Stinner
6345be9a14 Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
2011-11-25 20:09:01 +01:00
Benjamin Peterson
1518e8713d and back to the "magic" formula (with a comment) it is 2011-11-23 10:44:52 -06:00
Benjamin Peterson
5944c36931 cave to those who like readable code 2011-11-22 19:05:49 -06:00
Benjamin Peterson
0268675193 fix compiler warning by implementing this more cleverly 2011-11-22 15:29:32 -05:00
Victor Stinner
ca4f20782e find_maxchar_surrogates() reuses surrogate macros 2011-11-22 03:38:40 +01:00
Victor Stinner
0d3721d986 Issue #13441: Disable temporary the check on the maximum character until
the Solaris issue is solved.

But add assertion on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.

Fix also unicode_encode_ucs1() for backslashreplace error handler: Python is
now always "wide".
2011-11-22 03:27:53 +01:00
Victor Stinner
f8facacf30 Fix compiler warnings 2011-11-22 02:30:47 +01:00
Victor Stinner
b84d723509 (Merge 3.2) Issue #13093: Fix error handling on PyUnicode_EncodeDecimal() 2011-11-22 01:50:07 +01:00