Commit graph

1467 commits

Author SHA1 Message Date
Hye-Shik Chang
e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Marc-André Lemburg
d25c650461 Let u'%s' % obj try obj.__unicode__() first and fallback to obj.__str__(). 2004-07-23 16:13:25 +00:00
Nicholas Bastin
9ba301e589 Moved SunPro warning suppression into pyport.h and out of individual
modules and objects.
2004-07-15 15:54:05 +00:00
Marc-André Lemburg
126b44cd41 Fix a copy&paste typo. 2004-07-10 12:04:20 +00:00
Marc-André Lemburg
1dffb120b7 .encode()/.decode() patch part 2. 2004-07-08 19:13:55 +00:00
Marc-André Lemburg
d2d4598ec2 Allow string and unicode return types from .encode()/.decode()
methods on string and unicode objects. Added unicode.decode()
which was missing for no apparent reason.
2004-07-08 17:57:32 +00:00
Nicholas Bastin
1ce9e4cfc1 Fixed end-of-loop code not reached warning when using SunPro C 2004-06-17 18:27:18 +00:00
Hye-Shik Chang
974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Hye-Shik Chang
4057483164 SF Patch #926375: Remove a useless UTF-16 support code that is never
been used. (Suggested by Martin v. Loewis)
2004-04-06 07:24:51 +00:00
Walter Dörwald
cd736e71a3 Fix reallocation bug in unicode.translate(): The code was comparing
characters instead of character pointers to determine space requirements.
2004-02-05 17:36:00 +00:00
Hye-Shik Chang
1bc09b7c2a Cosmetic fix for wrongly indented tabs with ts=4. 2004-01-03 19:35:43 +00:00
Hye-Shik Chang
7fc4cf57b8 Fix unicode.rsplit()'s bug that ignores separater on the end of string when
using specialized splitter for 1 char sep.
2003-12-23 09:10:16 +00:00
Hye-Shik Chang
40e9509dc7 Fix broken xmlcharrefreplace by rev 2.204.
(Pointy hat goes to perky)
2003-12-22 01:31:13 +00:00
Hye-Shik Chang
4a264fb054 SF #859573: Reduce compiler warnings on gcc 3.2 and above. 2003-12-19 01:59:56 +00:00
Hye-Shik Chang
3ae811b57d Add rsplit method for str and unicode builtin types.
SF feature request #801847.
Original patch is written by Sean Reifschneider.
2003-12-15 18:49:53 +00:00
Guido van Rossum
6c9e130524 - Removed FutureWarnings related to hex/oct literals and conversions
and left shifts.  (Thanks to Kalle Svensson for SF patch 849227.)
  This addresses most of the remaining semantic changes promised by
  PEP 237, except for repr() of a long, which still shows the trailing
  'L'.  The PEP appears to promise warnings for operations that
  changed semantics compared to Python 2.3, but this is not
  implemented; we've suffered through enough warnings related to
  hex/oct literals and I think it's best to be silent now.
2003-11-29 23:52:13 +00:00
Raymond Hettinger
4f8f976576 Add optional fillchar argument to ljust(), rjust(), and center() string methods. 2003-11-26 08:21:35 +00:00
Walter Dörwald
4894c30626 Fix a bug in the memory reallocation code of PyUnicode_TranslateCharmap().
charmaptranslate_makespace() allocated more memory than required for the
next replacement but didn't remember that fact, so memory size was growing
exponentially every time a replacement string is longer that one character.
This fixes SF bug #828737.
2003-10-24 14:25:28 +00:00
Martin v. Löwis
6828e18a6a Patch #825679: Clarify semantics of .isfoo on empty strings.
Backported to 2.3.
2003-10-18 09:55:08 +00:00
Jeremy Hylton
504de6bd2c Fix for SF bug [ 817156 ] invalid \U escape gives 0=length unistr. 2003-10-06 05:08:26 +00:00
Tim Peters
ced69f8a20 On c.l.py, Martin v. Löwis said that Py_UNICODE could be of a signed type,
so fiddle Jeremy's fix to live with that.  Also added more comments.

Bugfix candidate (this bug is in all versions of Python, at least since
2.1).
2003-09-16 20:30:58 +00:00
Jeremy Hylton
d808279be3 Double-fix of crash in Unicode freelist handling.
If a length-1 Unicode string was in the freelist and it was
uninitialized or pointed to a very large (magnitude) negative number,
the check

	 unicode_latin1[unicode->str[0]] == unicode

could cause a segmentation violation, e.g. unicode->str[0] is 0xcbcbcbcb.

Fix this in two ways:

1. Change guard befor unicode_latin1[] to test against 256U.  If I
   understand correctly, the unsigned long used to store UCS4 on my
   box was getting converted to a signed long to compare with the
   signed constant 256.

2. Change _PyUnicode_New() to make sure the first element of str is
   always initialized to zero.  There are several places in the code
   where the caller can exit with an error before initializing any
   of str, which would leave junk in str[0].

Also, silence a compiler warning on pointer vs. int arithmetic.

Bug fix candidate.
2003-09-16 19:41:39 +00:00
Jeremy Hylton
deb2dc6658 Change checks of PyUnicode_Resize() return value for clarity.
The unicode_resize() family only returns -1 or 0 so simply checking
for != 0 is sufficient, but somewhat unclear.  Many Python API
functions return < 0 on error, reserving the right to return 0 or 1 on
success.  Change the call sites for consistency with these calls.
2003-09-16 03:41:45 +00:00
Raymond Hettinger
9bfe533c69 SF bug #795506: Wrong handling of string format code for float values.
Adding missing support for '%F'.

Will backport to 2.3.1.
2003-08-27 04:55:52 +00:00
Walter Dörwald
150523efa5 Fix refcounting leak in charmaptranslate_lookup() 2003-08-15 16:52:19 +00:00
Walter Dörwald
9b30f206ee Fix another refcounting leak in PyUnicode_EncodeCharmap(). 2003-08-15 16:26:34 +00:00
Walter Dörwald
d4ade0885c Fix another refcounting leak (in PyUnicode_DecodeUnicodeEscape()). 2003-08-15 15:00:26 +00:00
Walter Dörwald
e5402fb340 Fix refcount leak in PyUnicode_EncodeCharmap(). The bug surfaces
when an encoding error occurs and the callback name is unknown,
i.e. when the callback has to be called. The problem was that
the fact that the callback has already been looked up was only
recorded in a local variable in charmap_encoding_error(), because
charmap_encoding_error() got it's own copy of the errorHandler
pointer instead of a pointer to the pointer in
PyUnicode_EncodeCharmap().
2003-08-14 20:25:29 +00:00
Mark Hammond
0ccda1ee10 Support 'mbcs' as a 'built-in' encoding, so the C API can use it without
defering to the encodings package.
As described in [ 763111 ] mbcs encoding should skip encodings package
2003-07-01 00:13:27 +00:00
Raymond Hettinger
f466793fcc SF patch 703666: Several objects don't decref tmp on failure in subtype_new
Submitted By: Christopher A. Craig

Fillin some missing decrefs.
2003-06-28 20:04:25 +00:00
Martin v. Löwis
9a3a9f7791 Consider \U-escapes in raw-unicode-escape. Fixes #444514. 2003-05-18 12:31:09 +00:00
Neal Norwitz
ffe33b7f24 Attempt to make all the various string *strip methods the same.
* Doc - add doc for when functions were added
 * UserString
 * string object methods
 * string module functions
'chars' is used for the last parameter everywhere.

These changes will be backported, since part of the changes
have already been made, but they were inconsistent.
2003-04-10 22:35:32 +00:00
Guido van Rossum
a7132189d2 Reformat a few docstrings that caused line wraps in help() output. 2003-04-09 19:32:45 +00:00
Walter Dörwald
44f527fea4 Change formatchar(), so that u"%c" % 0xffffffff now raises
an OverflowError instead of a TypeError to be consistent
with "%c" % 256. See SF patch #710127.
2003-04-02 16:37:24 +00:00
Raymond Hettinger
c8df5780e1 Sf patch #700047: unicode object leaks refcount on resizing
Contributed by Hye-Shik Chang.
2003-03-09 07:30:43 +00:00
Neal Norwitz
ec74f2fda7 Add more missing PyErr_NoMemory() after failled memory allocs 2003-02-11 23:05:40 +00:00
Walter Dörwald
f6b56aecad Fix two refcounting bugs 2003-02-09 23:42:56 +00:00
Walter Dörwald
2e0b18af30 Change the treatment of positions returned by PEP293
error handers in the Unicode codecs: Negative
positions are treated as being relative to the end of
the input and out of bounds positions result in an
IndexError.

Also update the PEP and include an explanation of
this in the documentation for codecs.register_error.

Fixes a small bug in iconv_codecs: if the position
from the callback is negative *add* it to the size
instead of substracting it.

From SF patch #677429.
2003-01-31 17:19:08 +00:00
Guido van Rossum
5d9113d8be Implement appropriate __getnewargs__ for all immutable subclassable builtin
types.  The special handling for these can now be removed from save_newobj().
Add some testing for this.

Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
2003-01-29 17:58:45 +00:00
Walter Dörwald
adc727490b Fix charmapencode_lookup(), so that a None value in the mapping
is treated as "character maps to <undefined>" and not as
"character mapping must return integer, None or str".
2003-01-08 22:01:33 +00:00
Walter Dörwald
034d97605d Remove variable owned from PyUnicode_FromEncodedObject, which is unused
(except for Py_DECREF calls) since the introduction of __unicode__.
2003-01-08 20:38:39 +00:00
Marc-André Lemburg
79f57833f3 Patch for bug #659709: bogus computation of float length
Python 2.2.x backport candidate. (This bug has been around since
Python 1.6.)
2002-12-29 19:44:06 +00:00
Neil Schemenauer
ce30bc9f49 Add nb_remainder (i.e. __mod__) slot to unicode type. Fixes SF bug #615506. 2002-11-18 16:10:18 +00:00
Neal Norwitz
80a1bf4b5d Fix SF # 635969, No error "not all arguments converted"
When mwh added extended slicing, strings and unicode became mappings.
Thus, dict was set which prevented an error when doing:
	newstr = 'format without a percent' % string_value

This fix raises an exception again when there are no formats
and % with a string value.
2002-11-12 23:01:12 +00:00
Marc-André Lemburg
9cd87aaa54 Fix for bug #626172: crash using unicode latin1 single char
Python 2.2.3 candidate.
2002-10-23 09:02:46 +00:00
Guido van Rossum
049cd6b563 Fix a nasty endcase reported by Armin Rigo in SF bug 618623:
'%2147483647d' % -123 segfaults.  This was because an integer overflow
in a comparison caused the string resize to be skipped.  After fixing
the overflow, this could call _PyString_Resize() with a negative size,
so I (1) test for that and raise MemoryError instead; (2) also added a
test for negative newsize to _PyString_Resize(), raising SystemError
as for all bad arguments.

An identical bug existed in unicodeobject.c, of course.

Will backport to 2.2.2.
2002-10-11 00:43:48 +00:00
Marc-André Lemburg
24e53b6d91 Add cast to avoid compiler warning. 2002-09-24 09:32:14 +00:00
Neal Norwitz
a0378e1eda Fix part of SF bug # 544248 gcc warning in unicodeobject.c
When --enable-unicode=ucs4, need to cast Py_UNICODE to a char
2002-09-13 13:47:06 +00:00
Guido van Rossum
efc1188239 Fix warnings on 64-bit platforms about casts from pointers to ints.
Two of these were real bugs.
2002-09-12 14:43:41 +00:00
Walter Dörwald
5c1ee17742 Change the unicode.translate docstring to document that
Unicode strings (with arbitrary length) are allowed
as entries in the unicode.translate mapping.

Add a test case for multicharacter replacements.

(Multicharacter replacements were enabled by the
PEP 293 patch)
2002-09-04 20:31:32 +00:00