Commit graph

886 commits

Author SHA1 Message Date
Neal Norwitz
ffe33b7f24 Attempt to make all the various string *strip methods the same.
* Doc - add doc for when functions were added
 * UserString
 * string object methods
 * string module functions
'chars' is used for the last parameter everywhere.

These changes will be backported, since part of the changes
have already been made, but they were inconsistent.
2003-04-10 22:35:32 +00:00
Guido van Rossum
a7132189d2 Reformat a few docstrings that caused line wraps in help() output. 2003-04-09 19:32:45 +00:00
Walter Dörwald
44f527fea4 Change formatchar(), so that u"%c" % 0xffffffff now raises
an OverflowError instead of a TypeError to be consistent
with "%c" % 256. See SF patch #710127.
2003-04-02 16:37:24 +00:00
Raymond Hettinger
c8df5780e1 Sf patch #700047: unicode object leaks refcount on resizing
Contributed by Hye-Shik Chang.
2003-03-09 07:30:43 +00:00
Neal Norwitz
ec74f2fda7 Add more missing PyErr_NoMemory() after failled memory allocs 2003-02-11 23:05:40 +00:00
Walter Dörwald
f6b56aecad Fix two refcounting bugs 2003-02-09 23:42:56 +00:00
Walter Dörwald
2e0b18af30 Change the treatment of positions returned by PEP293
error handers in the Unicode codecs: Negative
positions are treated as being relative to the end of
the input and out of bounds positions result in an
IndexError.

Also update the PEP and include an explanation of
this in the documentation for codecs.register_error.

Fixes a small bug in iconv_codecs: if the position
from the callback is negative *add* it to the size
instead of substracting it.

From SF patch #677429.
2003-01-31 17:19:08 +00:00
Guido van Rossum
5d9113d8be Implement appropriate __getnewargs__ for all immutable subclassable builtin
types.  The special handling for these can now be removed from save_newobj().
Add some testing for this.

Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
2003-01-29 17:58:45 +00:00
Walter Dörwald
adc727490b Fix charmapencode_lookup(), so that a None value in the mapping
is treated as "character maps to <undefined>" and not as
"character mapping must return integer, None or str".
2003-01-08 22:01:33 +00:00
Walter Dörwald
034d97605d Remove variable owned from PyUnicode_FromEncodedObject, which is unused
(except for Py_DECREF calls) since the introduction of __unicode__.
2003-01-08 20:38:39 +00:00
Marc-André Lemburg
79f57833f3 Patch for bug #659709: bogus computation of float length
Python 2.2.x backport candidate. (This bug has been around since
Python 1.6.)
2002-12-29 19:44:06 +00:00
Neil Schemenauer
ce30bc9f49 Add nb_remainder (i.e. __mod__) slot to unicode type. Fixes SF bug #615506. 2002-11-18 16:10:18 +00:00
Neal Norwitz
80a1bf4b5d Fix SF # 635969, No error "not all arguments converted"
When mwh added extended slicing, strings and unicode became mappings.
Thus, dict was set which prevented an error when doing:
	newstr = 'format without a percent' % string_value

This fix raises an exception again when there are no formats
and % with a string value.
2002-11-12 23:01:12 +00:00
Marc-André Lemburg
9cd87aaa54 Fix for bug #626172: crash using unicode latin1 single char
Python 2.2.3 candidate.
2002-10-23 09:02:46 +00:00
Guido van Rossum
049cd6b563 Fix a nasty endcase reported by Armin Rigo in SF bug 618623:
'%2147483647d' % -123 segfaults.  This was because an integer overflow
in a comparison caused the string resize to be skipped.  After fixing
the overflow, this could call _PyString_Resize() with a negative size,
so I (1) test for that and raise MemoryError instead; (2) also added a
test for negative newsize to _PyString_Resize(), raising SystemError
as for all bad arguments.

An identical bug existed in unicodeobject.c, of course.

Will backport to 2.2.2.
2002-10-11 00:43:48 +00:00
Marc-André Lemburg
24e53b6d91 Add cast to avoid compiler warning. 2002-09-24 09:32:14 +00:00
Neal Norwitz
a0378e1eda Fix part of SF bug # 544248 gcc warning in unicodeobject.c
When --enable-unicode=ucs4, need to cast Py_UNICODE to a char
2002-09-13 13:47:06 +00:00
Guido van Rossum
efc1188239 Fix warnings on 64-bit platforms about casts from pointers to ints.
Two of these were real bugs.
2002-09-12 14:43:41 +00:00
Walter Dörwald
5c1ee17742 Change the unicode.translate docstring to document that
Unicode strings (with arbitrary length) are allowed
as entries in the unicode.translate mapping.

Add a test case for multicharacter replacements.

(Multicharacter replacements were enabled by the
PEP 293 patch)
2002-09-04 20:31:32 +00:00
Walter Dörwald
3aeb632c31 PEP 293 implemention (from SF patch http://www.python.org/sf/432401) 2002-09-02 13:14:32 +00:00
Guido van Rossum
2023c9b84a Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the
wrong thing for a unicode subclass when there were zero string
replacements.  The example given in the SF bug report was only one way
to trigger this; replacing a string of length >= 2 that's not found is
another.  The code would actually write outside allocated memory if
replacement string was longer than the search string.

(I wonder how many more of these are lurking?  The unicode code base
is full of wonders.)

Bugfix candidate; this same bug is present in 2.2.1.
2002-08-23 18:50:21 +00:00
Guido van Rossum
8b1a6d694f Code by Inyeol Lee, submitted to SF bug 595350, to implement
the string/unicode method .replace() with a zero-lengt first argument.
Inyeol contributed tests for this too.
2002-08-23 18:21:28 +00:00
Guido van Rossum
76afbd9aa4 Fix some endcase bugs in unicode rfind()/rindex() and endswith().
These were reported and fixed by Inyeol Lee in SF bug 595350.  The
endswith() bug was already fixed in 2.3, but this adds some more test
cases.
2002-08-20 17:29:29 +00:00
Guido van Rossum
54df53a352 More changes of DeprecationWarning to FutureWarning. 2002-08-14 18:38:27 +00:00
Marc-André Lemburg
cc8764ca9d Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.
u'%c' will now raise a ValueError in case the argument is an
integer outside the valid range of Unicode code point ordinals.

Closes SF bug #593581.
2002-08-11 12:23:04 +00:00
Guido van Rossum
078151da90 Implement stage B0 of PEP 237: add warnings for operations that
currently return inconsistent results for ints and longs; in
particular: hex/oct/%u/%o/%x/%X of negative short ints, and x<<n that
either loses bits or changes sign.  (No warnings for repr() of a long,
though that will also change to lose the trailing 'L' eventually.)

This introduces some warnings in the test suite; I'll take care of
those later.
2002-08-11 04:24:12 +00:00
Guido van Rossum
f36921c4b0 Unicode replace() method with empty pattern argument should fail, like
it does for 8-bit strings.
2002-08-09 15:36:48 +00:00
Barry Warsaw
6a043f3fe8 PyUnicode_Contains(): The memcmp() call didn't take into account the
width of Py_UNICODE.  Good catch, MAL.
2002-08-06 19:03:17 +00:00
Barry Warsaw
817918cc3c Committing patch #591250 which provides "str1 in str2" when str1 is a
string of longer than 1 character.
2002-08-06 16:58:21 +00:00
Skip Montanaro
35b37a5c11 tighten up the unicode object's docstring a tad 2002-07-26 16:22:46 +00:00
Jeremy Hylton
938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Martin v. Löwis
6238d2b024 Patch #569753: Remove support for WIN16.
Rename all occurrences of MS_WIN32 to MS_WINDOWS.
2002-06-30 15:26:10 +00:00
Neal Norwitz
20e72130c4 Fix typo in exception message 2002-06-13 21:25:17 +00:00
Martin v. Löwis
14f8b4cfcb Patch #568124: Add doc string macros. 2002-06-13 20:33:02 +00:00
Michael W. Hudson
5efaf7eac8 This is my nearly two year old patch
[ 400998 ] experimental support for extended slicing on lists

somewhat spruced up and better tested than it was when I wrote it.

Includes docs & tests.  The whatsnew section needs expanding, and arrays
should support extended slices -- later.
2002-06-11 10:55:12 +00:00
Marc-André Lemburg
4164439240 Fix a possible segfault. Found be Neal Norvitz. 2002-05-29 13:46:29 +00:00
Marc-André Lemburg
4da6fd63bc Fix for bug [ 561796 ] string.find causes lazy error 2002-05-29 11:33:13 +00:00
Guido van Rossum
cacfc07d08 - A new type object, 'string', is added. This is a common base type
for 'str' and 'unicode', and can be used instead of
  types.StringTypes, e.g. to test whether something is "a string":
  isinstance(x, string) is True for Unicode and 8-bit strings.  This
  is an abstract base class and cannot be instantiated directly.
2002-05-24 19:01:59 +00:00
Raymond Hettinger
0ebac97058 Patch 549187. Improve string formatting error message. 2002-05-21 15:14:57 +00:00
Tim Peters
5de9842b34 Repair widespread misuse of _PyString_Resize. Since it's clear people
don't understand how this function works, also beefed up the docs.  The
most common usage error is of this form (often spread out across gotos):

	if (_PyString_Resize(&s, n) < 0) {
		Py_DECREF(s);
		s = NULL;
		goto outtahere;
	}

The error is that if _PyString_Resize runs out of memory, it automatically
decrefs the input string object s (which also deallocates it, since its
refcount must be 1 upon entry), and sets s to NULL.  So if the "if"
branch ever triggers, it's an error to call Py_DECREF(s):  s is already
NULL!  A correct way to write the above is the simpler (and intended)

	if (_PyString_Resize(&s, n) < 0)
		goto outtahere;

Bugfix candidate.
2002-04-27 18:44:32 +00:00
Tim Peters
602f740bc2 SF patch 549375: Compromise PyUnicode_EncodeUTF8
This implements ideas from Marc-Andre, Martin, Guido and me on Python-Dev.

"Short" Unicode strings are encoded into a "big enough" stack buffer,
then exactly as much string space as they turn out to need is allocated
at the end.  This should have speed benefits akin to Martin's "measure
once, allocate once" strategy, but without needing a distinct measuring
pass.

"Long" Unicode strings allocate as much heap space as they could possibly
need (4 x # Unicode chars), and do a realloc at the end to return the
untouched excess.  Since the overallocation is likely to be substantial,
this shouldn't burden the platform realloc with unusably small excess
blocks.

Also simplified uses of the PyString_xyz functions.  Also added a release-
build check that 4*size doesn't overflow a C int.  Sooner or later, that's
going to happen.
2002-04-27 18:03:26 +00:00
Tim Peters
030a5cebf4 unicode_memchr(): Squashed gratuitous int-vs-size_t mismatch (which
gives a compiler wng under MSVC because of the resulting signed-vs-
unsigned comparison).
2002-04-22 19:00:10 +00:00
Walter Dörwald
de02bcb265 Apply patch diff.txt from SF feature request
http://www.python.org/sf/444708

This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
2002-04-22 17:42:37 +00:00
Tim Peters
0eca65c4c5 PyUnicode_EncodeUTF8(): tightened the memory asserts a bit, and at least
tried to catch some possible arithmetic overflows in the debug build.
2002-04-21 17:28:06 +00:00
Martin v. Löwis
2a7ff35a07 Back out 2.140. 2002-04-21 09:59:45 +00:00
Tim Peters
7e3d961fc1 PyUnicode_EncodeUTF8: squash compiler wng. The difference of two
pointers is a signed type.  Changing "allocated" to a signed int makes
undetected overflow more likely, but there was no overflow detection
before either.
2002-04-21 03:26:37 +00:00
Martin v. Löwis
a4eb14b7a4 Patch #495401: Count number of required bytes for encoding UTF-8 before
allocating the target buffer.
2002-04-20 13:44:01 +00:00
Walter Dörwald
0fe940c862 Return the orginal string only if it's a real str or unicode
instance, otherwise make a copy.
2002-04-15 18:42:15 +00:00
Walter Dörwald
068325ef92 Apply the second version of SF patch http://www.python.org/sf/536241
Add a method zfill to str, unicode and UserString and change
Lib/string.py accordingly.

This activates the zfill version in unicodeobject.c that was
commented out and implements the same in stringobject.c. It also
adds the test for unicode support in Lib/string.py back in and
uses repr() instead() of str() (as it was before Lib/string.py 1.62)
2002-04-15 13:36:47 +00:00
Neil Schemenauer
58aa861fa2 Remove PyMalloc_*. 2002-04-12 03:07:20 +00:00