Commit graph

307 commits

Author SHA1 Message Date
Ezio Melotti
f7ed5d111b #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. 2012-11-04 23:21:38 +02:00
Mark Dickinson
61254b9391 Issue #14700: merge tests from 3.3. 2012-10-28 10:23:08 +00:00
Mark Dickinson
2a83f16e5e Issue #14700: merge tests from 3.2. 2012-10-28 10:22:22 +00:00
Mark Dickinson
fb90c0934c Issue #14700: Fix buggy overflow checks for large precision and width in new-style and old-style formatting. 2012-10-28 10:18:03 +00:00
Victor Stinner
15a1136547 Issue #16147: PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
on the heap to format numbers.
2012-10-06 23:48:20 +02:00
Victor Stinner
e215d960be Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
* Simplify the code: replace 4 steps with one unique step using the
   _PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to
   store intermediate results which require to allocate an array of pointers on
   the heap.
 * Use the _PyUnicodeWriter API for speed (and its convinient API):
   overallocate the buffer to reduce the number of "realloc()"
 * Implement "width" and "precision" in Python, don't rely on sprintf(). It
   avoids to need of a temporary buffer allocated on the heap: only use a small
   buffer allocated in the stack.
 * Add _PyUnicodeWriter_WriteCstr() function
 * Split PyUnicode_FromFormatV() into two functions: add
   unicode_fromformat_arg().
 * Inline parse_format_flags(): the format of an argument is now only parsed
   once, it's no more needed to have a subfunction.
 * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
   search the next "%" and copy the substring in one chunk, instead of copying
   character per character.
2012-10-06 23:03:36 +02:00
Benjamin Peterson
4eda93723e add another testcase 2012-08-05 15:05:34 -07:00
Brett Cannon
acc0c181a8 Remove a now worthless test. 2012-05-12 17:40:28 -04:00
Victor Stinner
f59c28c930 unicode_writer_finish() checks string consistency 2012-05-09 03:24:14 +02:00
Victor Stinner
ece58deb9f Close #14648: Compute correctly maxchar in str.format() for substrin 2012-04-23 23:36:38 +02:00
Benjamin Peterson
80d07f8251 inherit maxchar of field value where needed (closes #14648) 2012-04-23 10:55:29 -04:00
Eric V. Smith
97722c4132 str.format_map tests don't do what they say: fix to actually implement the intent of the test. Closes #13450. Patch by Akira Li. 2012-03-12 15:26:21 -07:00
Eric V. Smith
edbb6ca084 str.format_map tests don't do what they say: fix to actually implement the intent of the test. Closes #13450. 2012-03-12 15:16:22 -07:00
Benjamin Peterson
d5890c8db5 add str.casefold() (closes #13752) 2012-01-14 13:23:30 -05:00
Benjamin Peterson
b2bf01d824 use full unicode mappings for upper/lower/title case (#12736)
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
Victor Stinner
6345be9a14 Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
2011-11-25 20:09:01 +01:00
Victor Stinner
b84d723509 (Merge 3.2) Issue #13093: Fix error handling on PyUnicode_EncodeDecimal() 2011-11-22 01:50:07 +01:00
Victor Stinner
c814a38f3f Add a test on str.__getnewargs__()
It tests indirectly PyUnicode_Copy(): ensure that the string is a copy.
2011-11-22 01:06:15 +01:00
Victor Stinner
42bf77537e Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API
Add tests for PyUnicode_EncodeDecimal() and
PyUnicode_TransformDecimalToASCII().
2011-11-21 22:52:58 +01:00
Victor Stinner
040e16e3e8 "unicode_internal" codec has been deprecated: fix related tests 2011-11-15 22:44:05 +01:00
Antoine Pitrou
78edf7576e Issue #13333: The UTF-7 decoder now accepts lone surrogates
(the encoder already accepts them).
2011-11-15 01:44:16 +01:00
Antoine Pitrou
5418ee0b9a Issue #13333: The UTF-7 decoder now accepts lone surrogates
(the encoder already accepts them).
2011-11-15 01:42:21 +01:00
Ezio Melotti
40dc919b0d Fix range in test. 2011-11-11 17:00:46 +02:00
Antoine Pitrou
51f6648a31 Make test more inclusive 2011-11-11 13:35:44 +01:00
Antoine Pitrou
dffab19218 Enable commented out test 2011-11-11 13:31:59 +01:00
Antoine Pitrou
2c3b2302ad Issue #13134: optimize finding single-character strings using memchr 2011-10-11 20:29:21 +02:00
Antoine Pitrou
798b4df812 test_unicode was forgetting to run the common string tests for str.find() 2011-10-08 22:42:00 +02:00
Antoine Pitrou
c0bbe7d38a test_unicode was forgetting to run the common string tests for str.find() 2011-10-08 22:41:35 +02:00
Victor Stinner
1d972ad12a Mark 'abc'.expandtab() optimization as specific to CPython
Improve also str.replace(a, a) test
2011-10-07 13:31:46 +02:00
Victor Stinner
59de0ee9e0 str.replace(a, a) is now returning str unchanged if a is a 2011-10-07 10:01:28 +02:00
Ezio Melotti
a9860aeb08 #13054: fix usage of sys.maxunicode after PEP-393. 2011-10-04 19:06:00 +03:00
Antoine Pitrou
e19aa388e8 When expandtabs() would be a no-op, don't create a duplicate string 2011-10-04 16:04:01 +02:00
Victor Stinner
07ac3ebd7b Optimize unicode_subtype_new(): don't encode to wchar_t and decode from wchar_t
Rewrite unicode_subtype_new(): allocate directly the right type.
2011-10-01 16:16:43 +02:00
Benjamin Peterson
811c2f1369 remove "fast-path" for (i)adding strings
These were just an artifact of the old unicode concatenation hack and likely
just penalized other kinds of adding. Also, this fixes __(i)add__ on string
subclasses.
2011-09-30 21:31:21 -04:00
Martin v. Löwis
287eca658d Fix struct sizes. Drop -1, since the resulting string was actually the largest one
that could be allocated.
2011-09-28 10:03:28 +02:00
Martin v. Löwis
d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Ezio Melotti
a3fbde3504 Merge indentation fix and skip decorator with 3.2. 2011-08-23 00:40:09 +03:00
Ezio Melotti
a5c92b4714 Fix indentation and add a skip decorator. 2011-08-23 00:37:08 +03:00
Ezio Melotti
6f2a683a0c #9200: merge with 3.2. 2011-08-22 20:31:11 +03:00
Ezio Melotti
93e7afc5d9 #9200: The str.is* methods now work with strings that contain non-BMP characters even in narrow Unicode builds. 2011-08-22 14:08:38 +03:00
Benjamin Peterson
f8e7543df9 merge 3.2 (#12732) 2011-08-12 22:18:19 -05:00
Benjamin Peterson
f413b80806 in narrow builds, make sure to test codepoints as identifier characters (closes #12732)
This fixes the use of Unicode identifiers outside the BMP in narrow builds.
2011-08-12 22:17:18 -05:00
Victor Stinner
ab1d16b456 Issue #13093: Fix error handling on PyUnicode_EncodeDecimal()
* Add tests for PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII()
 * Remove the unused "e" variable in replace()
2011-11-22 01:45:37 +01:00
Eric V. Smith
c12469df22 Merge from 3.2. 2011-07-18 14:08:55 -04:00
Eric V. Smith
12ebefc9d3 Closes #12579. Positional fields with str.format_map() now raise a ValueError instead of SystemError. 2011-07-18 14:03:41 -04:00
Senthil Kumaran
bc9d8f838b merge from 3.2 2011-07-03 21:05:25 -07:00
Senthil Kumaran
9ebe08d2f6 Fix closes issue12471 - wrong TypeError message when '%i' format spec was used. 2011-07-03 21:03:16 -07:00
Ezio Melotti
bf1253b25a #6780: merge with 3.2. 2011-04-26 06:45:24 +03:00
Ezio Melotti
f2b3f780a1 #6780: merge with 3.1. 2011-04-26 06:40:59 +03:00
Ezio Melotti
ba42fd5801 #6780: fix starts/endswith error message to mention that tuples are accepted too. 2011-04-26 06:09:45 +03:00