cpython/Objects/unicodeobject.c
Thomas Wouters 0e3f591aee Merged revisions 46753-51188 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r46755 | brett.cannon | 2006-06-08 18:23:04 +0200 (Thu, 08 Jun 2006) | 4 lines

  Make binascii.hexlify() use s# for its arguments instead of t# to actually
  match its documentation stating it accepts any read-only buffer.
........
  r46757 | brett.cannon | 2006-06-08 19:00:45 +0200 (Thu, 08 Jun 2006) | 8 lines

  Buffer objects would return the read or write buffer for a wrapped object when
  the char buffer was requested.  Now it actually returns the char buffer if
  available or raises a TypeError if it isn't (as is raised for the other buffer
  types if they are not present but requested).

  Not a backport candidate since it does change semantics of the buffer object
  (although it could be argued this is enough of a bug to bother backporting).
........
  r46760 | andrew.kuchling | 2006-06-09 03:10:17 +0200 (Fri, 09 Jun 2006) | 1 line

  Update functools section
........
  r46762 | tim.peters | 2006-06-09 04:11:02 +0200 (Fri, 09 Jun 2006) | 6 lines

  Whitespace normalization.

  Since test_file is implicated in mysterious test failures
  when followed by test_optparse, if I had any brains I'd
  look at the checkin that last changed test_file ;-)
........
  r46763 | tim.peters | 2006-06-09 05:09:42 +0200 (Fri, 09 Jun 2006) | 5 lines

  To boost morale :-), force test_optparse to run immediately
  after test_file until we can figure out how to fix it.
  (See python-dev; at the moment we don't even know which checkin
  caused the problem.)
........
  r46764 | tim.peters | 2006-06-09 05:51:41 +0200 (Fri, 09 Jun 2006) | 6 lines

  AutoFileTests.tearDown():  Removed mysterious undocumented
  try/except.  Remove TESTFN.

  Throughout:  used open() instead of file(), and wrapped
  long lines.
........
  r46765 | tim.peters | 2006-06-09 06:02:06 +0200 (Fri, 09 Jun 2006) | 8 lines

  testUnicodeOpen():  I have no idea why, but making this
  test clean up after itself appears to fix the test failures
  when test_optparse follows test_file.

  test_main():  Get rid of TESTFN no matter what.  That's
  also enough to fix the mystery failures.  Doesn't hurt
  to fix them twice :-)
........
  r46766 | tim.peters | 2006-06-09 07:12:40 +0200 (Fri, 09 Jun 2006) | 6 lines

  Remove the temporary hack to force test_optparse to
  run immediately after test_file.  At least 8 buildbot
  boxes passed since the underlying problem got fixed,
  and they all failed before the fix, so there's no point
  to this anymore.
........
  r46767 | neal.norwitz | 2006-06-09 07:54:18 +0200 (Fri, 09 Jun 2006) | 1 line

  Fix grammar and reflow
........
  r46769 | andrew.kuchling | 2006-06-09 12:22:35 +0200 (Fri, 09 Jun 2006) | 1 line

  Markup fix
........
  r46773 | andrew.kuchling | 2006-06-09 15:15:57 +0200 (Fri, 09 Jun 2006) | 1 line

  [Bug #1472827] Make saxutils.XMLGenerator handle \r\n\t in attribute values by escaping them properly.   2.4 bugfix candidate.
........
  r46778 | kristjan.jonsson | 2006-06-09 18:28:01 +0200 (Fri, 09 Jun 2006) | 2 lines

  Turn off warning about deprecated CRT functions for VisualStudio .NET 2005.
  Make the definition of ARRAYSIZE conditional.  VisualStudio .NET 2005 already has it defined using a better gimmick.
........
  r46779 | phillip.eby | 2006-06-09 18:40:18 +0200 (Fri, 09 Jun 2006) | 2 lines

  Import wsgiref into the stdlib, as of the external version 0.1-r2181.
........
  r46783 | andrew.kuchling | 2006-06-09 18:44:40 +0200 (Fri, 09 Jun 2006) | 1 line

  Add note about XMLGenerator bugfix
........
  r46784 | andrew.kuchling | 2006-06-09 18:46:51 +0200 (Fri, 09 Jun 2006) | 1 line

  Add note about wsgiref
........
  r46785 | brett.cannon | 2006-06-09 19:05:48 +0200 (Fri, 09 Jun 2006) | 2 lines

  Fix inconsistency in naming within an enum.
........
  r46787 | tim.peters | 2006-06-09 19:47:00 +0200 (Fri, 09 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46792 | georg.brandl | 2006-06-09 20:29:52 +0200 (Fri, 09 Jun 2006) | 3 lines

  Test file.__exit__.
........
  r46794 | brett.cannon | 2006-06-09 20:40:46 +0200 (Fri, 09 Jun 2006) | 2 lines

  svn:ignore .pyc and .pyo files.
........
  r46795 | georg.brandl | 2006-06-09 20:45:48 +0200 (Fri, 09 Jun 2006) | 3 lines

  RFE #1491485: str/unicode.endswith()/startswith() now accept a tuple as first argument.
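
  A minimal sketch of the tuple form, run on a current Python 3 rather than
  the 2.5 trunk this entry refers to.  The file names are made up for
  illustration.

```python
# Illustrative only: these file names are invented for the example.
names = ["report.txt", "photo.png", "notes.md"]

# endswith() returns True if the string ends with ANY suffix in the tuple.
text_like = [n for n in names if n.endswith((".txt", ".md"))]
print(text_like)

# startswith() accepts a tuple of prefixes in the same way.
matches_prefix = "unicodeobject.c".startswith(("string", "unicode"))
print(matches_prefix)
```

  The argument has to be a tuple; passing a list raises TypeError.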
........
  r46798 | andrew.kuchling | 2006-06-09 21:03:16 +0200 (Fri, 09 Jun 2006) | 1 line

  Describe startswith()/endswith() change; add reminder about wsgiref
........
  r46799 | tim.peters | 2006-06-09 21:24:44 +0200 (Fri, 09 Jun 2006) | 11 lines

  Implementing a happy idea from Georg Brandl:  make runtest() try to
  clean up files and directories the tests often leave behind by
  mistake.  This is the first time in history I don't have a bogus
  "db_home" directory after running the tests ;-)

  Also worked on runtest's docstring, to say something about all the
  arguments, and to document the non-obvious return values.

  New functions runtest_inner() and cleanup_test_droppings() in
  support of the above.
........
  r46800 | andrew.kuchling | 2006-06-09 21:43:25 +0200 (Fri, 09 Jun 2006) | 1 line

  Remove unused variable
........
  r46801 | andrew.kuchling | 2006-06-09 21:56:05 +0200 (Fri, 09 Jun 2006) | 1 line

  Add some wsgiref text
........
  r46803 | thomas.heller | 2006-06-09 21:59:11 +0200 (Fri, 09 Jun 2006) | 1 line

  set eol-style svn property
........
  r46804 | thomas.heller | 2006-06-09 22:01:01 +0200 (Fri, 09 Jun 2006) | 1 line

  set eol-style svn property
........
  r46805 | georg.brandl | 2006-06-09 22:43:48 +0200 (Fri, 09 Jun 2006) | 3 lines

  Make use of new str.startswith/endswith semantics.
  Occurrences in email and compiler were ignored due to backwards compat requirements.
........
  r46806 | brett.cannon | 2006-06-10 00:31:23 +0200 (Sat, 10 Jun 2006) | 4 lines

  An object with __call__ as an attribute, when called, will have that attribute checked for __call__ itself, and will continue to look until it finds an object without the attribute.  This can lead to an infinite recursion.

  Closes bug #532646, again.  Will be backported.
........
  r46808 | brett.cannon | 2006-06-10 00:45:54 +0200 (Sat, 10 Jun 2006) | 2 lines

  Fix bug introduced in rev. 46806 by not having variable declaration at the top of a block.
........
  r46812 | georg.brandl | 2006-06-10 08:40:50 +0200 (Sat, 10 Jun 2006) | 4 lines

  Apply perky's fix for #1503157: "/".join([u"", u""]) raising OverflowError.
  Also improve error message on overflow.
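
  The expression from the bug report can be checked directly.  This sketch
  assumes a current Python 3, where str is the Unicode type and the u""
  prefix is accepted for compatibility.

```python
# Joining two empty strings should yield just the separator,
# not raise OverflowError.
result = "/".join([u"", u""])
print(repr(result))
```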
........
  r46817 | martin.v.loewis | 2006-06-10 10:14:03 +0200 (Sat, 10 Jun 2006) | 2 lines

  Port cygwin kill_python changes from 2.4 branch.
........
  r46818 | armin.rigo | 2006-06-10 12:57:40 +0200 (Sat, 10 Jun 2006) | 4 lines

  SF bug #1503294.

  PyThreadState_GET() complains if the tstate is NULL, but only in debug mode.
........
  r46819 | martin.v.loewis | 2006-06-10 14:23:46 +0200 (Sat, 10 Jun 2006) | 4 lines

  Patch #1495999: Part two of Windows CE changes.
  - update header checks, using autoconf
  - provide dummies for getenv, environ, and GetVersion
  - adjust MSC_VER check in socketmodule.c
........
  r46820 | skip.montanaro | 2006-06-10 16:09:11 +0200 (Sat, 10 Jun 2006) | 1 line

  document the class, not its initializer
........
  r46821 | greg.ward | 2006-06-10 18:40:01 +0200 (Sat, 10 Jun 2006) | 4 lines

  Sync with Optik docs (rev 518):
    * restore "Extending optparse" section
    * document ALWAYS_TYPED_ACTIONS (SF #1449311)
........
  r46824 | thomas.heller | 2006-06-10 21:51:46 +0200 (Sat, 10 Jun 2006) | 8 lines

  Upgrade to ctypes version 0.9.9.7.

  Summary of changes:

  - support for 'variable sized' data
  - support for anonymous structure/union fields
  - fix severe bug with certain arrays or structures containing more than 256 fields
........
  r46825 | thomas.heller | 2006-06-10 21:55:36 +0200 (Sat, 10 Jun 2006) | 8 lines

  Upgrade to ctypes version 0.9.9.7.

  Summary of changes:

  - support for 'variable sized' data
  - support for anonymous structure/union fields
  - fix severe bug with certain arrays or structures containing more than 256 fields
........
  r46826 | fred.drake | 2006-06-10 22:01:34 +0200 (Sat, 10 Jun 2006) | 4 lines

  SF patch #1303595: improve description of __builtins__, explaining how it
  varies between __main__ and other modules, and strongly suggest not touching
  it but using __builtin__ if absolutely necessary
........
  r46827 | fred.drake | 2006-06-10 22:02:58 +0200 (Sat, 10 Jun 2006) | 1 line

  credit for SF patch #1303595
........
  r46831 | thomas.heller | 2006-06-10 22:29:34 +0200 (Sat, 10 Jun 2006) | 2 lines

  New docs for ctypes.
........
  r46834 | thomas.heller | 2006-06-10 23:07:19 +0200 (Sat, 10 Jun 2006) | 1 line

  Fix a wrong printf format.
........
  r46835 | thomas.heller | 2006-06-10 23:17:58 +0200 (Sat, 10 Jun 2006) | 1 line

  Fix the second occurrence of the problematic printf format.
........
  r46837 | thomas.heller | 2006-06-10 23:56:03 +0200 (Sat, 10 Jun 2006) | 1 line

  Don't use C++ comment.
........
  r46838 | thomas.heller | 2006-06-11 00:01:50 +0200 (Sun, 11 Jun 2006) | 1 line

  Handle failure of PyMem_Realloc.
........
  r46839 | skip.montanaro | 2006-06-11 00:38:13 +0200 (Sun, 11 Jun 2006) | 2 lines

  Suppress warning on MacOSX about possible use before set of proc.
........
  r46840 | tim.peters | 2006-06-11 00:51:45 +0200 (Sun, 11 Jun 2006) | 8 lines

  shuffle() docstring:  Removed warning about sequence length
  versus generator period.  While this was a real weakness of the
  older WH generator for lists with just a few dozen elements,
  and so could potentially bite the naive ;-), the Twister should
  show excellent behavior up to at least 600 elements.

  Module docstring:  reflowed some jarringly short lines.
........
  r46844 | greg.ward | 2006-06-11 02:40:49 +0200 (Sun, 11 Jun 2006) | 4 lines

  Bug #1361643: fix textwrap.dedent() so it handles tabs appropriately,
  i.e. do *not* expand tabs, but treat them as whitespace that is not
  equivalent to spaces.  Add a couple of test cases.  Clarify docs.
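
  A small sketch of the behavior described above, using only
  textwrap.dedent() as it exists in current Python.  The sample strings are
  made up for illustration.

```python
import textwrap

# A tab shared by every line is common leading whitespace and is removed.
tabbed = "\thello\n\tworld"
dedented_tabs = textwrap.dedent(tabbed)

# A tab and two spaces are both whitespace but not equivalent, so these
# lines have no common margin and the text comes back unchanged.
mixed = "  hello\n\tworld"
dedented_mixed = textwrap.dedent(mixed)

print(repr(dedented_tabs))
print(repr(dedented_mixed))
```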
........
  r46850 | neal.norwitz | 2006-06-11 07:44:18 +0200 (Sun, 11 Jun 2006) | 5 lines

  Fix Coverity # 146.  newDBSequenceObject would deref dbobj, so it can't be NULL.

  We know it's not NULL from the ParseTuple and DbObject_Check will verify
  it's not NULL.
........
  r46851 | neal.norwitz | 2006-06-11 07:45:25 +0200 (Sun, 11 Jun 2006) | 4 lines

  Wrap some long lines
  Top/Bottom factor out some common expressions
  Add a XXX comment about widening offset.
........
  r46852 | neal.norwitz | 2006-06-11 07:45:47 +0200 (Sun, 11 Jun 2006) | 1 line

  Add versionadded to doc
........
  r46853 | neal.norwitz | 2006-06-11 07:47:14 +0200 (Sun, 11 Jun 2006) | 3 lines

  Update doc to make it agree with code.
  Bottom factor out some common code.
........
  r46854 | neal.norwitz | 2006-06-11 07:48:14 +0200 (Sun, 11 Jun 2006) | 3 lines

  f_code can't be NULL based on Frame_New and other code that derefs it.
  So there doesn't seem to be much point to checking here.
........
  r46855 | neal.norwitz | 2006-06-11 09:26:27 +0200 (Sun, 11 Jun 2006) | 1 line

  Fix errors found by pychecker
........
  r46856 | neal.norwitz | 2006-06-11 09:26:50 +0200 (Sun, 11 Jun 2006) | 1 line

  warnings was imported at module scope, no need to import again
........
  r46857 | neal.norwitz | 2006-06-11 09:27:56 +0200 (Sun, 11 Jun 2006) | 5 lines

  Fix errors found by pychecker.
  I think these changes are correct, but I'm not sure.  Could someone
  who knows how this module works test it?  It can at least start on
  the cmd line.
........
  r46858 | neal.norwitz | 2006-06-11 10:35:14 +0200 (Sun, 11 Jun 2006) | 1 line

  Fix errors found by pychecker
........
  r46859 | ronald.oussoren | 2006-06-11 16:33:36 +0200 (Sun, 11 Jun 2006) | 4 lines

  This patch improves the L&F of IDLE on OSX. The changes are conditionalized on
  being in an IDLE.app bundle on darwin. This does a slight reorganisation of the
  menus and adds support for file-open events.
........
  r46860 | greg.ward | 2006-06-11 16:42:41 +0200 (Sun, 11 Jun 2006) | 1 line

  SF #1366250: optparse docs: fix inconsistency in variable name; minor tweaks.
........
  r46861 | greg.ward | 2006-06-11 18:24:11 +0200 (Sun, 11 Jun 2006) | 3 lines

  Bug #1498146: fix optparse to handle Unicode strings in option help,
  description, and epilog.
........
  r46862 | thomas.heller | 2006-06-11 19:04:22 +0200 (Sun, 11 Jun 2006) | 2 lines

  Release the GIL during COM method calls, to avoid deadlocks in
  Python coded COM objects.
........
  r46863 | tim.peters | 2006-06-11 21:42:51 +0200 (Sun, 11 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46864 | tim.peters | 2006-06-11 21:43:49 +0200 (Sun, 11 Jun 2006) | 2 lines

  Add missing svn:eol-style property to text files.
........
  r46865 | ronald.oussoren | 2006-06-11 21:45:57 +0200 (Sun, 11 Jun 2006) | 2 lines

  Remove message about using make frameworkinstall, that's no longer necessary
........
  r46866 | ronald.oussoren | 2006-06-11 22:23:29 +0200 (Sun, 11 Jun 2006) | 2 lines

  Use configure to substitute the correct prefix instead of hardcoding
........
  r46867 | ronald.oussoren | 2006-06-11 22:24:45 +0200 (Sun, 11 Jun 2006) | 4 lines

  - Change fixapplepython23.py to ensure that it will run with /usr/bin/python
    on intel macs.
  - Fix some minor problems in the installer for OSX
........
  r46868 | neal.norwitz | 2006-06-11 22:25:56 +0200 (Sun, 11 Jun 2006) | 5 lines

  Try to fix several networking tests.  The problem is that if hosts have
  a search path setup, some of these hosts resolve to the wrong address.
  By appending a period to the hostname, the hostname should only resolve
  to what we want it to resolve to.  Hopefully this doesn't break different bots.
........
  r46869 | neal.norwitz | 2006-06-11 22:42:02 +0200 (Sun, 11 Jun 2006) | 7 lines

  Try to fix another networking test.  The problem is that if hosts have
  a search path setup, some of these hosts resolve to the wrong address.
  By appending a period to the hostname, the hostname should only resolve
  to what we want it to resolve to.  Hopefully this doesn't break different bots.

  Also add more info to failure message to aid debugging test failure.
........
  r46870 | neal.norwitz | 2006-06-11 22:46:46 +0200 (Sun, 11 Jun 2006) | 4 lines

  Fix test on PPC64 buildbot.  It raised an IOError (really a URLError which
  derives from an IOError).  That seems valid.  Env Error includes both OSError
  and IOError, so this seems like a reasonable fix.
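
  The class relationships behind this reasoning can be checked as follows.
  This sketch assumes Python 3, where the names differ from the 2.x trunk:
  URLError lives in urllib.error, and IOError and EnvironmentError are
  aliases of OSError.

```python
import urllib.error

# URLError derives from OSError, so a handler written for the broader
# class also catches it.
is_subclass = issubclass(urllib.error.URLError, OSError)

# In Python 3 the older names are plain aliases of OSError.
aliases_match = IOError is OSError and EnvironmentError is OSError

try:
    raise urllib.error.URLError("simulated failure, no network involved")
except EnvironmentError as exc:
    caught = type(exc).__name__

print(is_subclass, aliases_match, caught)
```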
........
  r46871 | tim.peters | 2006-06-11 22:52:59 +0200 (Sun, 11 Jun 2006) | 10 lines

  compare_generic_iter():  Fixed the failure of test_wsgiref's testFileWrapper
  when running with -O.

  test_simple_validation_error still fails under -O.  That appears to be because
  wsgiref's validate.py uses `assert` statements all over the place to check
  arguments for sanity.  That should all be changed (it's not a logical error
  in the software if a user passes bogus arguments, so this isn't a reasonable
  use for `assert` -- checking external preconditions should generally raise
  ValueError or TypeError instead, as appropriate).
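
  As a sketch of the suggested alternative: explicit raises survive -O,
  whereas assert statements are compiled away.  The function below is
  hypothetical and is not wsgiref's validate.py code.

```python
def check_status(status):
    """Hypothetical argument check; not taken from wsgiref."""
    # A wrong argument type is a caller error, so raise TypeError.
    if not isinstance(status, str):
        raise TypeError("status must be a str, got %s" % type(status).__name__)
    # A right type with a bad value raises ValueError.
    if len(status) < 4 or not status[:3].isdigit():
        raise ValueError("status must start with a 3-digit code: %r" % status)
    return status


print(check_status("200 OK"))
```

  Unlike `assert`, these checks behave the same with and without -O.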
........
  r46872 | neal.norwitz | 2006-06-11 23:38:38 +0200 (Sun, 11 Jun 2006) | 1 line

  Get test to pass on S/390.  Shout if you think this change is incorrect.
........
  r46873 | neal.norwitz | 2006-06-12 04:05:55 +0200 (Mon, 12 Jun 2006) | 1 line

  Cleanup Py_ssize_t a little (get rid of second #ifdef)
........
  r46874 | neal.norwitz | 2006-06-12 04:06:17 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix some Py_ssize_t issues
........
  r46875 | neal.norwitz | 2006-06-12 04:06:42 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix some Py_ssize_t issues
........
  r46876 | neal.norwitz | 2006-06-12 04:07:24 +0200 (Mon, 12 Jun 2006) | 2 lines

  Cleanup: Remove import of types to get StringTypes, we can just use basestring.
........
  r46877 | neal.norwitz | 2006-06-12 04:07:57 +0200 (Mon, 12 Jun 2006) | 1 line

  Don't truncate if size_t is bigger than uint
........
  r46878 | neal.norwitz | 2006-06-12 04:08:41 +0200 (Mon, 12 Jun 2006) | 1 line

  Don't leak the list object if there's an error allocating the item storage.  Backport candidate
........
  r46879 | neal.norwitz | 2006-06-12 04:09:03 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix typo.  Backport if anyone cares. :-)
........
  r46880 | neal.norwitz | 2006-06-12 04:09:34 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix indentation of case and a Py_ssize_t issue.
........
  r46881 | neal.norwitz | 2006-06-12 04:11:18 +0200 (Mon, 12 Jun 2006) | 3 lines

  Get rid of f_restricted too.  Doc the other 4 ints that were already removed
  at the NeedForSpeed sprint.
........
  r46882 | neal.norwitz | 2006-06-12 04:13:21 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix the socket tests so they can be run concurrently.  Backport candidate
........
  r46883 | neal.norwitz | 2006-06-12 04:16:10 +0200 (Mon, 12 Jun 2006) | 1 line

  i and j are initialized below when used.  No need to do it twice
........
  r46884 | neal.norwitz | 2006-06-12 05:05:03 +0200 (Mon, 12 Jun 2006) | 1 line

  Remove unused import
........
  r46885 | neal.norwitz | 2006-06-12 05:05:40 +0200 (Mon, 12 Jun 2006) | 1 line

  Impl ssize_t
........
  r46886 | neal.norwitz | 2006-06-12 05:33:09 +0200 (Mon, 12 Jun 2006) | 6 lines

  Patch #1503046, Conditional compilation of zlib.(de)compressobj.copy

  copy is only in newer versions of zlib.  This should allow zlibmodule
  to work with older versions like the Tru64 buildbot.
........
  r46887 | phillip.eby | 2006-06-12 06:04:32 +0200 (Mon, 12 Jun 2006) | 2 lines

  Sync w/external release 0.1.2.  Please see PEP 360 before making changes to external packages.
........
  r46888 | martin.v.loewis | 2006-06-12 06:26:31 +0200 (Mon, 12 Jun 2006) | 2 lines

  Get rid of function pointer cast.
........
  r46889 | thomas.heller | 2006-06-12 08:05:57 +0200 (Mon, 12 Jun 2006) | 3 lines

  I don't know how that happened, but the entire file contents was
  duplicated.  Thanks to Simon Percivall for the heads up.
........
  r46890 | nick.coghlan | 2006-06-12 10:19:37 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix site module docstring to match the code
........
  r46891 | nick.coghlan | 2006-06-12 10:23:02 +0200 (Mon, 12 Jun 2006) | 1 line

  Fix site module docstring to match the code for Mac OSX, too
........
  r46892 | nick.coghlan | 2006-06-12 10:27:13 +0200 (Mon, 12 Jun 2006) | 1 line

  The site module documentation also described the Windows behaviour incorrectly.
........
  r46893 | nick.coghlan | 2006-06-12 12:17:11 +0200 (Mon, 12 Jun 2006) | 1 line

  Make the -m switch conform to the documentation of sys.path by behaving like the -c switch
........
  r46894 | kristjan.jonsson | 2006-06-12 17:45:12 +0200 (Mon, 12 Jun 2006) | 2 lines

  Fix the CRT argument error handling for VisualStudio .NET 2005.  Install a CRT error handler and disable the assertion for debug builds.  This causes CRT to set errno to EINVAL.
  This update fixes crash cases in the test suite where the default CRT error handler would cause process exit.
........
  r46899 | thomas.heller | 2006-06-12 22:56:48 +0200 (Mon, 12 Jun 2006) | 1 line

  Add pep-291 compatibility markers.
........
  r46901 | ka-ping.yee | 2006-06-13 01:47:52 +0200 (Tue, 13 Jun 2006) | 5 lines

  Add the uuid module.

  This module has been tested so far on Windows XP (Python 2.4 and 2.5a2),
  Mac OS X (Python 2.3, 2.4, and 2.5a2), and Linux (Python 2.4 and 2.5a2).
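
  A brief usage sketch of the module as it exists in current Python.  The
  API of the original 2.5 check-in may differ in details.

```python
import uuid

# Random (version 4) UUID; its string form round-trips through UUID().
random_id = uuid.uuid4()
round_trip = uuid.UUID(str(random_id))

# Name-based (version 5) UUIDs are deterministic for a namespace and name.
named_id = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")

print(random_id.version, round_trip == random_id, named_id.version)
```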
........
  r46902 | tim.peters | 2006-06-13 02:30:01 +0200 (Tue, 13 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46903 | tim.peters | 2006-06-13 02:30:50 +0200 (Tue, 13 Jun 2006) | 2 lines

  Added missing svn:eol-style property to text files.
........
  r46905 | tim.peters | 2006-06-13 05:30:07 +0200 (Tue, 13 Jun 2006) | 5 lines

  get_matching_blocks():  rewrote code & comments so they match; added
  more comments about why it's this way at all; and removed what looked
  like needless expense (sorting (i, j, k) triples directly should give
  exactly the same order as sorting (i, (i, j, k)) pairs).
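
  The ordering claim in parentheses can be checked with a few made-up
  triples.  The data is illustrative, not taken from difflib.

```python
# Made-up (i, j, k) triples for illustration.
triples = [(5, 2, 3), (0, 7, 1), (5, 1, 4), (2, 2, 2)]

# Sort the triples directly.
direct = sorted(triples)

# Sort (i, (i, j, k)) pairs, then strip the leading key again.
decorated = [t for _, t in sorted((t[0], t) for t in triples)]

print(direct == decorated)
```

  Tuples compare element by element, so the extra leading i never changes
  the result.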
........
  r46906 | neal.norwitz | 2006-06-13 06:08:53 +0200 (Tue, 13 Jun 2006) | 1 line

  Don't fail if another process is listening on our port.
........
  r46908 | neal.norwitz | 2006-06-13 10:28:19 +0200 (Tue, 13 Jun 2006) | 2 lines

  Initialize the type object so pychecker can't crash the interpreter.
........
  r46909 | neal.norwitz | 2006-06-13 10:41:06 +0200 (Tue, 13 Jun 2006) | 1 line

  Verify the crash due to EncodingMap not initialized does not return
........
  r46910 | thomas.heller | 2006-06-13 10:56:14 +0200 (Tue, 13 Jun 2006) | 3 lines

  Add some windows datatypes that were missing from this file, and add
  the aliases defined in windows header files for the structures.
........
  r46911 | thomas.heller | 2006-06-13 11:40:14 +0200 (Tue, 13 Jun 2006) | 3 lines

  Add back WCHAR, UINT, DOUBLE, _LARGE_INTEGER, _ULARGE_INTEGER.
  VARIANT_BOOL is a special _ctypes data type, not c_short.
........
  r46912 | ronald.oussoren | 2006-06-13 13:19:56 +0200 (Tue, 13 Jun 2006) | 4 lines

  Linecache contains support for PEP302 loaders, but fails to deal with loaders
  that return None to indicate that the module is valid but no source is
  available. This patch fixes that.
........
  r46913 | andrew.kuchling | 2006-06-13 13:57:04 +0200 (Tue, 13 Jun 2006) | 1 line

  Mention uuid module
........
  r46915 | walter.doerwald | 2006-06-13 14:02:12 +0200 (Tue, 13 Jun 2006) | 2 lines

  Fix passing errors to the encoder and decoder functions.
........
  r46917 | walter.doerwald | 2006-06-13 14:04:43 +0200 (Tue, 13 Jun 2006) | 3 lines

  errors is an attribute in the incremental decoder
  not an argument.
........
  r46919 | andrew.macintyre | 2006-06-13 17:04:24 +0200 (Tue, 13 Jun 2006) | 11 lines

  Patch #1454481:  Make thread stack size runtime tunable.

  Heavily revised, comprising revisions:
  46640 - original trunk revision (backed out in r46655)
  46647 - markup fix (backed out in r46655)
  46692:46918 merged from branch aimacintyre-sf1454481

  branch tested on buildbots (Windows buildbots had problems
  not related to these changes).
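
  A minimal sketch of the tunable as exposed by threading.stack_size() in
  current Python.  The 256 KiB value is an arbitrary choice for
  illustration, and platforms may reject it.

```python
import threading

results = []


def worker():
    # Record which thread ran; the body is otherwise trivial.
    results.append(threading.current_thread().name)


previous = threading.stack_size()  # 0 means the platform default
try:
    threading.stack_size(256 * 1024)  # applies to threads created afterwards
    changed = True
except (ValueError, RuntimeError):
    # ValueError: size rejected; RuntimeError: tuning is unsupported here.
    changed = False

thread = threading.Thread(target=worker, name="small-stack")
thread.start()
thread.join()

if changed:
    threading.stack_size(previous)  # restore the earlier setting

print(results)
```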
........
  r46920 | brett.cannon | 2006-06-13 18:06:55 +0200 (Tue, 13 Jun 2006) | 2 lines

  Remove unused variable.
........
  r46921 | andrew.kuchling | 2006-06-13 18:41:41 +0200 (Tue, 13 Jun 2006) | 1 line

  Add ability to set stack size
........
  r46923 | marc-andre.lemburg | 2006-06-13 19:04:26 +0200 (Tue, 13 Jun 2006) | 2 lines

  Update pybench to version 2.0.
........
  r46924 | marc-andre.lemburg | 2006-06-13 19:07:14 +0200 (Tue, 13 Jun 2006) | 2 lines

  Revert wrong svn copy.
........
  r46925 | andrew.macintyre | 2006-06-13 19:14:36 +0200 (Tue, 13 Jun 2006) | 2 lines

  fix exception usage
........
  r46927 | tim.peters | 2006-06-13 20:37:07 +0200 (Tue, 13 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46928 | marc-andre.lemburg | 2006-06-13 20:56:56 +0200 (Tue, 13 Jun 2006) | 9 lines

  Updated to pybench 2.0.

  See svn.python.org/external/pybench-2.0 for the original import of that
  version.

  Note that platform.py was not copied over from pybench-2.0 since
  it is already part of Python 2.5.
........
  r46929 | andrew.macintyre | 2006-06-13 21:02:35 +0200 (Tue, 13 Jun 2006) | 5 lines

  Increase the small thread stack size to get the test
  to pass reliably on the one buildbot that insists on
  more than 32kB of thread stack.
........
  r46930 | marc-andre.lemburg | 2006-06-13 21:20:07 +0200 (Tue, 13 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46931 | thomas.heller | 2006-06-13 22:18:43 +0200 (Tue, 13 Jun 2006) | 2 lines

  More docs for ctypes.
........
  r46932 | brett.cannon | 2006-06-13 23:34:24 +0200 (Tue, 13 Jun 2006) | 2 lines

  Ignore .pyc and .pyo files in Pybench.
........
  r46933 | brett.cannon | 2006-06-13 23:46:41 +0200 (Tue, 13 Jun 2006) | 7 lines

  If a classic class defined a __coerce__() method that just returned its two
  arguments in reverse, the interpreter would infinitely recurse trying to get a
  coercion that worked.  So put in a recursion check after a coercion is made and
  the next call to attempt to use the coerced values.

  Fixes bug #992017 and closes crashers/coerce.py .
........
  r46936 | gerhard.haering | 2006-06-14 00:24:47 +0200 (Wed, 14 Jun 2006) | 3 lines

  Merged changes from external pysqlite 2.3.0 release. Documentation updates will
  follow in a few hours at the latest. Then we should be ready for beta1.
........
  r46937 | brett.cannon | 2006-06-14 00:26:13 +0200 (Wed, 14 Jun 2006) | 2 lines

  Missed test for rev. 46933; infinite recursion from __coerce__() returning its arguments reversed.
........
  r46938 | gerhard.haering | 2006-06-14 00:53:48 +0200 (Wed, 14 Jun 2006) | 2 lines

  Updated documentation for pysqlite 2.3.0 API.
........
  r46939 | tim.peters | 2006-06-14 06:09:25 +0200 (Wed, 14 Jun 2006) | 10 lines

  SequenceMatcher.get_matching_blocks():  This now guarantees that
  adjacent triples in the result list describe non-adjacent matching
  blocks.  That's _nice_ to have, and Guido said he wanted it.
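
  A sketch of that guarantee, checked against current difflib with made-up
  input strings.  The final triple is the zero-length sentinel, so it is
  left out of the adjacency check.

```python
from difflib import SequenceMatcher

matcher = SequenceMatcher(None, "abxcd", "abcd")
blocks = matcher.get_matching_blocks()

# For adjacent triples (i, j, n) and (i2, j2, n2), excluding the sentinel,
# the second block must not start exactly where the first one ends.
real_blocks = blocks[:-1]
non_adjacent = all(
    first.a + first.size < second.a or first.b + first.size < second.b
    for first, second in zip(real_blocks, real_blocks[1:])
)

print([tuple(block) for block in blocks], non_adjacent)
```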

  Not a bugfix candidate:  Guido or not ;-), this changes visible
  endcase semantics (note that some tests had to change), and
  nothing about this was documented before.  Since it was working
  as designed, and behavior was consistent with the docs, it wasn't
  "a bug".
........
  r46940 | tim.peters | 2006-06-14 06:13:00 +0200 (Wed, 14 Jun 2006) | 2 lines

  Repaired typo in new comment.
........
  r46941 | tim.peters | 2006-06-14 06:15:27 +0200 (Wed, 14 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46942 | fred.drake | 2006-06-14 06:25:02 +0200 (Wed, 14 Jun 2006) | 3 lines

  - make some disabled tests run what they intend when enabled
  - remove some over-zealous triple-quoting
........
  r46943 | fred.drake | 2006-06-14 07:04:47 +0200 (Wed, 14 Jun 2006) | 3 lines

  add tests for two cases that are handled correctly in the current code,
  but that SF patch 1504676 as written mis-handles
........
  r46944 | fred.drake | 2006-06-14 07:15:51 +0200 (Wed, 14 Jun 2006) | 1 line

  explain an XXX in more detail
........
  r46945 | martin.v.loewis | 2006-06-14 07:21:04 +0200 (Wed, 14 Jun 2006) | 1 line

  Patch #1455898: Incremental mode for "mbcs" codec.
........
  r46946 | georg.brandl | 2006-06-14 08:08:31 +0200 (Wed, 14 Jun 2006) | 3 lines

  Bug #1339007: Shelf objects now don't raise an exception in their
  __del__ method when initialization failed.
........
  r46948 | thomas.heller | 2006-06-14 08:18:15 +0200 (Wed, 14 Jun 2006) | 1 line

  Fix docstring.
........
  r46949 | georg.brandl | 2006-06-14 08:29:07 +0200 (Wed, 14 Jun 2006) | 2 lines

  Bug #1501122: mention __gt__ &co in description of comparison order.
........
  r46951 | thomas.heller | 2006-06-14 09:08:38 +0200 (Wed, 14 Jun 2006) | 1 line

  Write more docs.
........
  r46952 | georg.brandl | 2006-06-14 10:31:39 +0200 (Wed, 14 Jun 2006) | 3 lines

  Bug #1153163: describe __add__ vs __radd__ behavior when adding
  objects of same type/of subclasses of the other.
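
  The documented behavior can be sketched with two hypothetical classes.
  This assumes current Python 3 semantics: when the right operand is an
  instance of a subclass that overrides the reflected method, that method
  is tried first.

```python
class Base:
    """Hypothetical class for illustration."""

    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Derived(Base):
    """Subclass that overrides only the reflected method."""

    def __radd__(self, other):
        return "Derived.__radd__"


same_type = Base() + Base()          # same type: __add__ is used
subclass_right = Base() + Derived()  # subclass on the right: its __radd__ wins
subclass_left = Derived() + Base()   # inherited __add__ is used

print(same_type, subclass_right, subclass_left)
```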
........
  r46954 | georg.brandl | 2006-06-14 10:42:11 +0200 (Wed, 14 Jun 2006) | 3 lines

  Bug #1202018: add some common mime.types locations.
........
  r46955 | georg.brandl | 2006-06-14 10:50:03 +0200 (Wed, 14 Jun 2006) | 3 lines

  Bug #1117556: SimpleHTTPServer now tries to find and use the system's
  mime.types file for determining MIME types.
........
  r46957 | thomas.heller | 2006-06-14 11:09:08 +0200 (Wed, 14 Jun 2006) | 1 line

  Document paramflags.
........
  r46958 | thomas.heller | 2006-06-14 11:20:11 +0200 (Wed, 14 Jun 2006) | 1 line

  Add an __all__ list, since this module does 'from ctypes import *'.
........
  r46959 | andrew.kuchling | 2006-06-14 15:59:15 +0200 (Wed, 14 Jun 2006) | 1 line

  Add item
........
  r46961 | georg.brandl | 2006-06-14 18:46:43 +0200 (Wed, 14 Jun 2006) | 3 lines

  Bug #805015: doc error in PyUnicode_FromEncodedObject.
........
  r46962 | gerhard.haering | 2006-06-15 00:28:37 +0200 (Thu, 15 Jun 2006) | 10 lines

  - Added version checks in C code to make sure we don't trigger bugs in older
    SQLite versions.
  - Added version checks in test suite so that we don't execute tests that we
    know will fail with older (buggy) SQLite versions.

  Now, all tests should run against all SQLite versions from 3.0.8 until 3.3.6
  (latest one now). The sqlite3 module can be built against all these SQLite
  versions and the sqlite3 module does its best to not trigger bugs in SQLite,
  but using SQLite 3.3.3 or later is recommended.
........
  r46963 | tim.peters | 2006-06-15 00:38:13 +0200 (Thu, 15 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46964 | neal.norwitz | 2006-06-15 06:54:29 +0200 (Thu, 15 Jun 2006) | 9 lines

  Speculative checkin (requires approval of Gerhard Haering)

  This backs out the test changes in 46962 which prevented crashes
  by not running the tests via a version check.  All the version checks
  added in that rev were removed from the tests.

  Code was added to the error handler in connection.c that seems
  to work with older versions of sqlite including 3.1.3.
........
  r46965 | neal.norwitz | 2006-06-15 07:55:49 +0200 (Thu, 15 Jun 2006) | 1 line

  Try to narrow window of failure on slow/busy boxes (ppc64 buildbot)
........
  r46966 | martin.v.loewis | 2006-06-15 08:45:05 +0200 (Thu, 15 Jun 2006) | 2 lines

  Make import/lookup of mbcs fail on non-Windows systems.
........
  r46967 | ronald.oussoren | 2006-06-15 10:14:18 +0200 (Thu, 15 Jun 2006) | 2 lines

  Patch #1446489 (zipfile: support for ZIP64)
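
  A small sketch of the allowZip64 switch as it exists in current Python's
  zipfile.  The archive here is tiny and in memory, so the ZIP64 extensions
  are permitted but not actually needed.

```python
import io
import zipfile

buffer = io.BytesIO()

# allowZip64=True lets zipfile emit ZIP64 records when an archive grows
# past the classic format's size limits.
with zipfile.ZipFile(buffer, "w", allowZip64=True) as archive:
    archive.writestr("hello.txt", "hello")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    data = archive.read("hello.txt")

print(names, data)
```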
........
  r46968 | neal.norwitz | 2006-06-15 10:16:44 +0200 (Thu, 15 Jun 2006) | 6 lines

  Re-revert this change.  Install the version check and don't run the test
  until Gerhard has time to fully debug the issue.  This affects versions
  before 3.2.1 (possibly only versions earlier than 3.1.3).

  Based on discussion on python-checkins.
........
  r46969 | gregory.p.smith | 2006-06-15 10:52:32 +0200 (Thu, 15 Jun 2006) | 6 lines

  - bsddb: multithreaded DB access using the simple bsddb module interface
    now works reliably.  It has been updated to use automatic BerkeleyDB
    deadlock detection and the bsddb.dbutils.DeadlockWrap wrapper to retry
    database calls that would previously deadlock. [SF python bug #775414]
........
  r46970 | gregory.p.smith | 2006-06-15 11:23:52 +0200 (Thu, 15 Jun 2006) | 2 lines

  minor documentation cleanup.  mention the bsddb.db interface explicitly by name.
........
  r46971 | neal.norwitz | 2006-06-15 11:57:03 +0200 (Thu, 15 Jun 2006) | 5 lines

  Steal the trick from test_compiler to print out a slow msg.
  This will hopefully get the buildbots to pass.  Not sure this
  test will be feasible or even work.  But everything is red now,
  so it can't get much worse.
........
  r46972 | neal.norwitz | 2006-06-15 12:24:49 +0200 (Thu, 15 Jun 2006) | 1 line

  Print some more info to get an idea of how much longer the test will last
........
  r46981 | tim.peters | 2006-06-15 20:04:40 +0200 (Thu, 15 Jun 2006) | 6 lines

  Try to reduce the extreme peak memory and disk-space use
  of this test.  It probably still requires more disk space
  than most buildbots have, and in any case is still so
  intrusive that if we don't find another way to test this I'm
  taking my buildbot offline permanently ;-)
........
  r46982 | tim.peters | 2006-06-15 20:06:29 +0200 (Thu, 15 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r46983 | tim.peters | 2006-06-15 20:07:28 +0200 (Thu, 15 Jun 2006) | 2 lines

  Add missing svn:eol-style property to text files.
........
  r46984 | tim.peters | 2006-06-15 20:38:19 +0200 (Thu, 15 Jun 2006) | 2 lines

  Oops -- I introduced an off-by-6436159488 error.
........
  r46990 | neal.norwitz | 2006-06-16 06:30:34 +0200 (Fri, 16 Jun 2006) | 1 line

  Disable this test until we can determine what to do about it
........
  r46991 | neal.norwitz | 2006-06-16 06:31:06 +0200 (Fri, 16 Jun 2006) | 1 line

  Param name is dir, not directory.  Update docstring.  Backport candidate
........
  r46992 | neal.norwitz | 2006-06-16 06:31:28 +0200 (Fri, 16 Jun 2006) | 1 line

  Add missing period in comment.
........
  r46993 | neal.norwitz | 2006-06-16 06:32:43 +0200 (Fri, 16 Jun 2006) | 1 line

  Fix whitespace, there are memory leaks in this module.
........
  r46995 | fred.drake | 2006-06-17 01:45:06 +0200 (Sat, 17 Jun 2006) | 3 lines

  SF patch 1504676: Make sgmllib char and entity references pluggable
  (implementation/tests contributed by Sam Ruby)
........
  r46996 | fred.drake | 2006-06-17 03:07:54 +0200 (Sat, 17 Jun 2006) | 1 line

  fix change that broke the htmllib tests
........
  r46998 | martin.v.loewis | 2006-06-17 11:15:14 +0200 (Sat, 17 Jun 2006) | 3 lines

  Patch #763580:  Add name and value arguments to
  Tkinter variable classes.
........
  r46999 | martin.v.loewis | 2006-06-17 11:20:41 +0200 (Sat, 17 Jun 2006) | 2 lines

  Patch #1096231: Add default argument to wm_iconbitmap.
........
  r47000 | martin.v.loewis | 2006-06-17 11:25:15 +0200 (Sat, 17 Jun 2006) | 2 lines

  Patch #1494750: Destroy master after deleting children.
........
  r47003 | george.yoshida | 2006-06-17 18:31:52 +0200 (Sat, 17 Jun 2006) | 2 lines

  markup fix
........
  r47005 | george.yoshida | 2006-06-17 18:39:13 +0200 (Sat, 17 Jun 2006) | 4 lines

  Update url.

  Old url returned status code:301 Moved permanently.
........
  r47007 | martin.v.loewis | 2006-06-17 20:44:27 +0200 (Sat, 17 Jun 2006) | 2 lines

  Patch #812986: Update the canvas even if not tracing.
........
  r47008 | martin.v.loewis | 2006-06-17 21:03:26 +0200 (Sat, 17 Jun 2006) | 2 lines

  Patch #815924: Restore ability to pass type= and icon=
........
  r47009 | neal.norwitz | 2006-06-18 00:37:45 +0200 (Sun, 18 Jun 2006) | 1 line

  Fix typo in docstring
........
  r47010 | neal.norwitz | 2006-06-18 00:38:15 +0200 (Sun, 18 Jun 2006) | 1 line

  Fix memory leak reported by valgrind while running test_subprocess
........
  r47011 | fred.drake | 2006-06-18 04:57:35 +0200 (Sun, 18 Jun 2006) | 1 line

  remove unnecessary markup
........
  r47013 | neal.norwitz | 2006-06-18 21:35:01 +0200 (Sun, 18 Jun 2006) | 7 lines

  Prevent spurious leaks when running regrtest.py -R.  There may be more
  issues that crop up from time to time, but this change seems to have been
  pretty stable (no spurious warnings) for about a week.

  Other modules which use threads may require similar use of
  threading_setup/threading_cleanup from test_support.
........
  r47014 | neal.norwitz | 2006-06-18 21:37:40 +0200 (Sun, 18 Jun 2006) | 9 lines

  The hppa ubuntu box sometimes hangs forever in these tests.  My guess
  is that the wait is failing for some reason.  Use WNOHANG, so we won't
  wait until the buildbot kills the test suite.

  I haven't been able to reproduce the failure, so I'm not sure if
  this will help or not.  Hopefully, this change will cause the test
  to fail, rather than hang.  That will be better since we will get
  the rest of the test results.  It may also help us debug the real problem.
........
  r47015 | neal.norwitz | 2006-06-18 22:10:24 +0200 (Sun, 18 Jun 2006) | 1 line

  Revert 47014 until it is more robust
........
  r47016 | thomas.heller | 2006-06-18 23:27:04 +0200 (Sun, 18 Jun 2006) | 6 lines

  Fix typos.
  Fix doctest example.
  Mention in the tutorial that 'errcheck' is explained in the ref manual.
  Use better wording in some places.
  Remove code examples that shouldn't be in the tutorial.
  Remove some XXX notices.
........
  r47017 | georg.brandl | 2006-06-19 00:17:29 +0200 (Mon, 19 Jun 2006) | 3 lines

  Patch #1507676: improve exception messages in abstract.c, object.c and typeobject.c.
........
  r47018 | neal.norwitz | 2006-06-19 07:40:44 +0200 (Mon, 19 Jun 2006) | 1 line

  Use Py_ssize_t
........
  r47019 | georg.brandl | 2006-06-19 08:35:54 +0200 (Mon, 19 Jun 2006) | 3 lines

  Add news entry about error msg improvement.
........
  r47020 | thomas.heller | 2006-06-19 09:07:49 +0200 (Mon, 19 Jun 2006) | 2 lines

  Try to repair the failing test on the OpenBSD buildbot.  Trial and error...
........
  r47021 | tim.peters | 2006-06-19 09:45:16 +0200 (Mon, 19 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r47022 | walter.doerwald | 2006-06-19 10:07:50 +0200 (Mon, 19 Jun 2006) | 4 lines

  Patch #1506645: add Python wrappers for the curses functions
  is_term_resized, resize_term and resizeterm. This uses three
  separate configure checks (one for each function).
........
  r47023 | walter.doerwald | 2006-06-19 10:14:09 +0200 (Mon, 19 Jun 2006) | 2 lines

  Make check order match in configure and configure.in.
........
  r47024 | tim.peters | 2006-06-19 10:14:28 +0200 (Mon, 19 Jun 2006) | 3 lines

  Repair KeyError when running test_threaded_import under -R,
  as reported by Neal on python-dev.
........
  r47025 | thomas.heller | 2006-06-19 10:32:46 +0200 (Mon, 19 Jun 2006) | 3 lines

  Next try to fix the OpenBSD buildbot tests:
  Use ctypes.util.find_library to locate the C runtime library
  on platforms where it returns useful results.
........
  r47026 | tim.peters | 2006-06-19 11:09:44 +0200 (Mon, 19 Jun 2006) | 13 lines

  TestHelp.make_parser():  This was making a permanent change to
  os.environ (setting envar COLUMNS), which at least caused
  test_float_default() to fail if the tests were run more than once.

  This repairs the test_optparse -R failures Neal reported on
  python-dev.  It also explains some seemingly bizarre test_optparse
  failures we saw a couple weeks ago on the buildbots, when
  test_optparse failed due to test_file failing to clean up after
  itself, and then test_optparse failed in an entirely different
  way when regrtest's -w option ran test_optparse a second time.
  It's now obvious that make_parser() permanently changing os.environ
  was responsible for the second half of that.
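
  A common remedy for this kind of leak is to save the variable and restore
  it when the test is done.  The following is only a minimal sketch of that
  pattern, not the code from the actual fix; the helper name temporary_env
  is hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(name, value):
    """Set os.environ[name] for the duration of a block, then restore it."""
    missing = object()
    saved = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        if saved is missing:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            os.environ[name] = saved


# The change to COLUMNS is visible only inside the with block.
with temporary_env("COLUMNS", "80"):
    inside = os.environ["COLUMNS"]
```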
........
  r47027 | anthony.baxter | 2006-06-19 14:04:15 +0200 (Mon, 19 Jun 2006) | 2 lines

  Preparing for 2.5b1.
........
  r47029 | fred.drake | 2006-06-19 19:31:16 +0200 (Mon, 19 Jun 2006) | 1 line

  remove non-working document formats from edist
........
  r47030 | gerhard.haering | 2006-06-19 23:17:35 +0200 (Mon, 19 Jun 2006) | 5 lines

  Fixed a memory leak that was introduced with incorrect usage of the Python weak
  reference API in pysqlite 2.2.1.

  Bumped pysqlite version number to upcoming pysqlite 2.3.1 release.
........
  r47032 | ka-ping.yee | 2006-06-20 00:49:36 +0200 (Tue, 20 Jun 2006) | 2 lines

  Remove Python 2.3 compatibility comment.
........
  r47033 | trent.mick | 2006-06-20 01:21:25 +0200 (Tue, 20 Jun 2006) | 2 lines

  Upgrade pyexpat to expat 2.0.0 (http://python.org/sf/1462338).
........
  r47034 | trent.mick | 2006-06-20 01:57:41 +0200 (Tue, 20 Jun 2006) | 3 lines

  [ 1295808 ] expat symbols should be namespaced in pyexpat
  (http://python.org/sf/1295808)
........
  r47039 | andrew.kuchling | 2006-06-20 13:52:16 +0200 (Tue, 20 Jun 2006) | 1 line

  Uncomment wsgiref section
........
  r47040 | andrew.kuchling | 2006-06-20 14:15:09 +0200 (Tue, 20 Jun 2006) | 1 line

  Add four library items
........
  r47041 | andrew.kuchling | 2006-06-20 14:19:54 +0200 (Tue, 20 Jun 2006) | 1 line

  Terminology and typography fixes
........
  r47042 | andrew.kuchling | 2006-06-20 15:05:12 +0200 (Tue, 20 Jun 2006) | 1 line

  Add introductory paragraphs summarizing the release; minor edits
........
  r47043 | andrew.kuchling | 2006-06-20 15:11:29 +0200 (Tue, 20 Jun 2006) | 1 line

  Minor edits and rearrangements; markup fix
........
  r47044 | andrew.kuchling | 2006-06-20 15:20:30 +0200 (Tue, 20 Jun 2006) | 1 line

  [Bug #1504456] Mention xml -> xmlcore change
........
  r47047 | brett.cannon | 2006-06-20 19:30:26 +0200 (Tue, 20 Jun 2006) | 2 lines

  Raise TestSkipped when the test socket connection is refused.
........
  r47049 | brett.cannon | 2006-06-20 21:20:17 +0200 (Tue, 20 Jun 2006) | 2 lines

  Fix typo of exception name.
........
  r47053 | brett.cannon | 2006-06-21 18:57:57 +0200 (Wed, 21 Jun 2006) | 5 lines

  At the C level, tuple arguments are passed in directly to the exception
  constructor, meaning it is treated as *args, not as a single argument.  This
  means using the 'message' attribute won't work (until Py3K comes around),
  and so one must grab from 'args' to get the error number.
........
  r47054 | andrew.kuchling | 2006-06-21 19:10:18 +0200 (Wed, 21 Jun 2006) | 1 line

  Link to LibRef module documentation
........
  r47055 | andrew.kuchling | 2006-06-21 19:17:10 +0200 (Wed, 21 Jun 2006) | 1 line

  Note some of Barry's work
........
  r47056 | andrew.kuchling | 2006-06-21 19:17:28 +0200 (Wed, 21 Jun 2006) | 1 line

  Bump version
........
  r47057 | georg.brandl | 2006-06-21 19:45:17 +0200 (Wed, 21 Jun 2006) | 3 lines

  fix [ 1509132 ] compiler module builds incorrect AST for TryExceptFinally
........
  r47058 | georg.brandl | 2006-06-21 19:52:36 +0200 (Wed, 21 Jun 2006) | 3 lines

  Make test_fcntl aware of netbsd3.
........
  r47059 | georg.brandl | 2006-06-21 19:53:17 +0200 (Wed, 21 Jun 2006) | 3 lines

  Patch #1509001: expected skips for netbsd3.
........
  r47060 | gerhard.haering | 2006-06-21 22:55:04 +0200 (Wed, 21 Jun 2006) | 2 lines

  Removed call to enable_callback_tracebacks that slipped in by accident.
........
  r47061 | armin.rigo | 2006-06-21 23:58:50 +0200 (Wed, 21 Jun 2006) | 13 lines

  Fix for an obscure bug introduced by revs 46806 and 46808, with a test.
  The problem of checking too eagerly for recursive calls is the
  following: if a RuntimeError is caused by recursion, and if code needs
  to normalize it immediately (as in the 2nd test), then
  PyErr_NormalizeException() needs a call to the RuntimeError class to
  instantiate it, and this hits the recursion limit again...  causing
  PyErr_NormalizeException() to never finish.

  Moved this particular recursion check to slot_tp_call(), which is not
  involved in instantiating built-in exceptions.

  Backport candidate.
........
  r47064 | neal.norwitz | 2006-06-22 08:30:50 +0200 (Thu, 22 Jun 2006) | 3 lines

  Copy the wsgiref package during make install.
........
  r47065 | neal.norwitz | 2006-06-22 08:35:30 +0200 (Thu, 22 Jun 2006) | 1 line

  Reset the doc date to today for the automatic doc builds
........
  r47067 | andrew.kuchling | 2006-06-22 15:10:23 +0200 (Thu, 22 Jun 2006) | 1 line

  Mention how to suppress warnings
........
  r47069 | georg.brandl | 2006-06-22 16:46:17 +0200 (Thu, 22 Jun 2006) | 3 lines

  Set lineno correctly on list, tuple and dict literals.
........
  r47070 | georg.brandl | 2006-06-22 16:46:46 +0200 (Thu, 22 Jun 2006) | 4 lines

  Test for correct compilation of try-except-finally stmt.
  Test for correct lineno on list, tuple, dict literals.
........
  r47071 | fred.drake | 2006-06-22 17:50:08 +0200 (Thu, 22 Jun 2006) | 1 line

  fix markup nit
........
  r47072 | brett.cannon | 2006-06-22 18:49:14 +0200 (Thu, 22 Jun 2006) | 6 lines

  'warnings' was improperly requiring that a command-line Warning category be
  both a subclass of Warning and a subclass of types.ClassType.  The latter is no
  longer true thanks to new-style exceptions.

  Closes bug #1510580.  Thanks to AMK for the test.
........
  r47073 | ronald.oussoren | 2006-06-22 20:33:54 +0200 (Thu, 22 Jun 2006) | 3 lines

  MacOSX: Add a message to the first screen of the installer that tells
  users how to avoid updates to their shell profile.
........
  r47074 | georg.brandl | 2006-06-22 21:02:18 +0200 (Thu, 22 Jun 2006) | 3 lines

  Fix my name ;)
........
  r47075 | thomas.heller | 2006-06-22 21:07:36 +0200 (Thu, 22 Jun 2006) | 2 lines

  Small fixes, mostly in the markup.
........
  r47076 | peter.astrand | 2006-06-22 22:06:46 +0200 (Thu, 22 Jun 2006) | 1 line

  Make it possible to run test_subprocess.py on Python 2.2, which lacks test_support.is_resource_enabled.
........
  r47077 | peter.astrand | 2006-06-22 22:21:26 +0200 (Thu, 22 Jun 2006) | 1 line

  Applied patch #1506758: Prevent MemoryErrors with large MAXFD.
........
  r47079 | neal.norwitz | 2006-06-23 05:32:44 +0200 (Fri, 23 Jun 2006) | 1 line

  Fix refleak
........
  r47080 | fred.drake | 2006-06-23 08:03:45 +0200 (Fri, 23 Jun 2006) | 9 lines

  - SF bug #853506: IP6 address parsing in sgmllib
    ('[' and ']' were not accepted in unquoted attribute values)

  - cleaned up tests of character and entity reference decoding so the
    tests cover the documented relationships among handle_charref,
    handle_entityref, convert_charref, convert_codepoint, and
    convert_entityref, without bringing up Unicode issues that sgmllib
    cannot be involved in
........
  r47085 | andrew.kuchling | 2006-06-23 21:23:40 +0200 (Fri, 23 Jun 2006) | 11 lines

  Fit Makefile for the Python doc environment better; this is a step toward
  including the howtos in the build process.

  	* Put LaTeX output in ../paper-<whatever>/.
  	* Put HTML output in ../html/
  	* Explain some of the Makefile variables
  	* Remove some cruft dating to my environment (e.g. the 'web' target)

  This makefile isn't currently invoked by the documentation build process,
  so these changes won't destabilize anything.
........
  r47086 | hyeshik.chang | 2006-06-23 23:16:18 +0200 (Fri, 23 Jun 2006) | 5 lines

  Bug #1511381: codec_getstreamcodec() in codec.c is corrected to
  omit a default "error" argument for NULL pointer.  This allows
  the parser to take a codec from cjkcodecs again.
  (Reported by Taewook Kang and reviewed by Walter Doerwald)
........
  r47091 | ronald.oussoren | 2006-06-25 22:44:16 +0200 (Sun, 25 Jun 2006) | 6 lines

  Workaround for bug #1512124

  Without this patch IDLE will get unresponsive when you open the debugger
  window on OSX. This happens both with the system Tcl/Tk on Tiger and with the
  latest universal download from tk-components.sf.net.
........
  r47092 | ronald.oussoren | 2006-06-25 23:14:19 +0200 (Sun, 25 Jun 2006) | 3 lines

  Drop the calldll demos for macos; calldll isn't present anymore, so there is
  no need to keep the demos around.
........
  r47093 | ronald.oussoren | 2006-06-25 23:15:58 +0200 (Sun, 25 Jun 2006) | 3 lines

  Use a path without a double slash to compile the .py files after installation
  (macosx, binary installer). This fixes bug #1508369 for python 2.5.
........
  r47094 | ronald.oussoren | 2006-06-25 23:19:06 +0200 (Sun, 25 Jun 2006) | 3 lines

  Also install the .egg-info files in Lib. This will cause wsgiref.egg-info to
  be installed.
........
  r47097 | andrew.kuchling | 2006-06-26 14:40:02 +0200 (Mon, 26 Jun 2006) | 1 line

  [Bug #1511998] Various comments from Nick Coghlan; thanks!
........
  r47098 | andrew.kuchling | 2006-06-26 14:43:43 +0200 (Mon, 26 Jun 2006) | 1 line

  Describe workaround for PyRange_New()'s removal
........
  r47099 | andrew.kuchling | 2006-06-26 15:08:24 +0200 (Mon, 26 Jun 2006) | 5 lines

  [Bug #1512163] Fix typo.

  This change will probably break tests on FreeBSD buildbots, but I'll check in
  a fix for that next.
........
  r47100 | andrew.kuchling | 2006-06-26 15:12:16 +0200 (Mon, 26 Jun 2006) | 9 lines

  [Bug #1512163] Use one set of locking methods, lockf();
  remove the flock() calls.

  On FreeBSD, the two methods lockf() and flock() end up using the same
  mechanism and the second one fails.  A Linux man page claims that the
  two methods are orthogonal (so locks acquired one way don't interact
  with locks acquired the other way) but that clearly must be false.
........
  r47101 | andrew.kuchling | 2006-06-26 15:23:10 +0200 (Mon, 26 Jun 2006) | 5 lines

  Add a test for a conflicting lock.

  On slow machines, maybe the time intervals (2 sec, 0.5 sec) will be too tight.
  I'll see how the buildbots like it.
........
  r47103 | andrew.kuchling | 2006-06-26 16:33:24 +0200 (Mon, 26 Jun 2006) | 1 line

  Windows doesn't have os.fork().  I'll just disable this test for now
........
  r47106 | andrew.kuchling | 2006-06-26 19:00:35 +0200 (Mon, 26 Jun 2006) | 9 lines

  Attempt to fix build failure on OS X and Debian alpha; the symptom is
  consistent with os.wait() returning immediately because some other
  subprocess had previously exited; the test suite then immediately
  tries to lock the mailbox and gets an error saying it's already
  locked.

  To fix this, do a waitpid() so the test suite only continues once
  the intended child process has exited.
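
  The difference between the two calls can be sketched as follows.  This is
  an illustration for POSIX systems in current Python, not the test suite's
  code; the helper names spawn and wait_for_child are hypothetical.

```python
import os


def wait_for_child(pid):
    """Block until the child with this exact pid exits; return its exit code."""
    # os.wait() would return as soon as *any* child exits; os.waitpid(pid, 0)
    # only returns for the requested child.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def spawn(exit_code):
    """Fork a child that exits immediately with exit_code (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # In the child: leave at once without running parent cleanup code.
        os._exit(exit_code)
    return pid


other = spawn(3)      # an unrelated child that may well exit first
intended = spawn(7)   # the child the caller actually cares about
intended_code = wait_for_child(intended)
other_code = wait_for_child(other)
```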
........
  r47113 | neal.norwitz | 2006-06-27 06:06:46 +0200 (Tue, 27 Jun 2006) | 1 line

  Ignore some more warnings in the dynamic linker on an older gentoo
........
  r47114 | neal.norwitz | 2006-06-27 06:09:13 +0200 (Tue, 27 Jun 2006) | 6 lines

  Instead of doing a make test, run the regression tests out of the installed
  copy.  This will hopefully catch problems where directories are added
  under Lib/ but not to Makefile.pre.in.  This breaks out the 2 runs
  of the test suite with and without -O which is also nicer.
........
  r47115 | neal.norwitz | 2006-06-27 06:12:58 +0200 (Tue, 27 Jun 2006) | 5 lines

  Fix SF bug #1513032, 'make install' failure on FreeBSD 5.3.

  No need to install lib-old, it's empty in 2.5.
........
  r47116 | neal.norwitz | 2006-06-27 06:23:06 +0200 (Tue, 27 Jun 2006) | 1 line

  Test unimportant change to verify buildbot does not try to build
........
  r47117 | neal.norwitz | 2006-06-27 06:26:30 +0200 (Tue, 27 Jun 2006) | 1 line

  Try again: test unimportant change to verify buildbot does not try to build
........
  r47118 | neal.norwitz | 2006-06-27 06:28:56 +0200 (Tue, 27 Jun 2006) | 1 line

  Verify buildbot picks up these changes (really needs testing after last change to Makefile.pre.in)
........
  r47121 | vinay.sajip | 2006-06-27 09:34:37 +0200 (Tue, 27 Jun 2006) | 1 line

  Removed buggy exception handling in doRollover of rotating file handlers. Exceptions now propagate to caller.
........
  r47123 | ronald.oussoren | 2006-06-27 12:08:25 +0200 (Tue, 27 Jun 2006) | 3 lines

  MacOSX: fix rather dumb buglet that made it impossible to create extensions on
  OSX 10.3 when using a binary distribution built on 10.4.
........
  r47125 | tim.peters | 2006-06-27 13:52:49 +0200 (Tue, 27 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r47128 | ronald.oussoren | 2006-06-27 14:53:52 +0200 (Tue, 27 Jun 2006) | 8 lines

  Use statically built copies of zlib and bzip2 to build the OSX installer; that
  way the resulting binaries have a better chance of running on 10.3.

  This patch also updates the search logic for sleepycat db3/4, without this
  patch you cannot use a sleepycat build with a non-standard prefix; with this
  you can (at least on OSX) if you add the prefix to CPPFLAGS/LDFLAGS at
  configure-time. This change is needed to build the binary installer for OSX.
........
  r47131 | ronald.oussoren | 2006-06-27 17:45:32 +0200 (Tue, 27 Jun 2006) | 5 lines

  macosx: Install a libpython2.5.a inside the framework as a symlink to the actual
  dylib at the root of the framework, that way tools that expect a unix-like
  install (python-config, but more importantly external products like
  mod_python) work correctly.
........
  r47137 | neal.norwitz | 2006-06-28 07:03:22 +0200 (Wed, 28 Jun 2006) | 4 lines

  According to the man pages on Gentoo Linux and Tru64, EACCES or EAGAIN
  can be returned if fcntl (lockf) fails.  This fixes the test failure
  on Tru64 by checking for either error rather than just EAGAIN.
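
  A check of this kind can be sketched as below.  The sketch assumes current
  Python 3, where a failed lock attempt surfaces as an OSError; the helper
  name is_lock_conflict is hypothetical and is not taken from the test.

```python
import errno


def is_lock_conflict(exc):
    """Return True if an OSError from a failed lockf() means 'already locked'."""
    # Either errno may be reported for a conflicting lock, so accept both.
    return exc.errno in (errno.EACCES, errno.EAGAIN)
```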
........
  r47139 | neal.norwitz | 2006-06-28 08:28:31 +0200 (Wed, 28 Jun 2006) | 5 lines

  Fix bug #1512695: cPickle.loads could crash if it was interrupted with
  a KeyboardInterrupt since PyTuple_Pack was passed a NULL.

  Will backport.
........
  r47142 | nick.coghlan | 2006-06-28 12:41:47 +0200 (Wed, 28 Jun 2006) | 1 line

  Make full module name available as __module_name__ even when __name__ is set to something else (like '__main__')
........
  r47143 | armin.rigo | 2006-06-28 12:49:51 +0200 (Wed, 28 Jun 2006) | 2 lines

  A couple of crashers of the "won't fix" kind.
........
  r47147 | andrew.kuchling | 2006-06-28 16:25:20 +0200 (Wed, 28 Jun 2006) | 1 line

  [Bug #1508766] Add docs for uuid module; docs written by George Yoshida, with minor rearrangements by me.
........
  r47148 | andrew.kuchling | 2006-06-28 16:27:21 +0200 (Wed, 28 Jun 2006) | 1 line

  [Bug #1508766] Add docs for uuid module; this puts the module in the 'Internet Protocols' section.  Arguably this module could also have gone in the chapters on strings or encodings, maybe even the crypto chapter.  Fred, please move if you see fit.
........
  r47151 | georg.brandl | 2006-06-28 22:23:25 +0200 (Wed, 28 Jun 2006) | 3 lines

  Fix end_fill().
........
  r47153 | trent.mick | 2006-06-28 22:30:41 +0200 (Wed, 28 Jun 2006) | 2 lines

  Mention the expat upgrade and pyexpat fix I put in 2.5b1.
........
  r47154 | fred.drake | 2006-06-29 02:51:53 +0200 (Thu, 29 Jun 2006) | 6 lines

  SF bug #1504333: sgmllib should allow angle brackets in quoted values
  (modified patch by Sam Ruby; changed to use separate REs for start and end
   tags to reduce matching cost for end tags; extended tests; updated to avoid
   breaking previous changes to support IPv6 addresses in unquoted attribute
   values)
........
  r47156 | fred.drake | 2006-06-29 04:57:48 +0200 (Thu, 29 Jun 2006) | 1 line

  document recent bugfixes in sgmllib
........
  r47158 | neal.norwitz | 2006-06-29 06:10:08 +0200 (Thu, 29 Jun 2006) | 10 lines

  Add new utility function, reap_children(), to test_support.  This should
  be called at the end of each test that spawns children (perhaps it
  should be called from regrtest instead?).  This will hopefully prevent
  some of the unexplained failures in the buildbots (hppa and alpha)
  during tests that spawn children.  The problems were not reproducible.
  There were many zombies that remained at the end of several tests.
  In the worst case, this shouldn't cause any more problems,
  though it may not help either.  Time will tell.
........
  r47159 | neal.norwitz | 2006-06-29 07:48:14 +0200 (Thu, 29 Jun 2006) | 5 lines

  This should fix the buildbot failure on s/390 which can't connect to gmail.org.
  It makes the error message consistent and always sends to stderr.

  It would be much better for all the networking tests to hit only python.org.
........
  r47161 | thomas.heller | 2006-06-29 20:34:15 +0200 (Thu, 29 Jun 2006) | 3 lines

  Protect the thread api calls in the _ctypes extension module within
  #ifdef WITH_THREADS/#endif blocks.  Found by Sam Rushing.
........
  r47162 | martin.v.loewis | 2006-06-29 20:58:44 +0200 (Thu, 29 Jun 2006) | 2 lines

  Patch #1509163: MS Toolkit Compiler no longer available
........
  r47163 | skip.montanaro | 2006-06-29 21:20:09 +0200 (Thu, 29 Jun 2006) | 1 line

  add string methods to index
........
  r47164 | vinay.sajip | 2006-06-30 02:13:08 +0200 (Fri, 30 Jun 2006) | 1 line

  Fixed bug in fileConfig() which failed to clear logging._handlerList
........
  r47166 | tim.peters | 2006-06-30 08:18:39 +0200 (Fri, 30 Jun 2006) | 2 lines

  Whitespace normalization.
........
  r47170 | neal.norwitz | 2006-06-30 09:32:16 +0200 (Fri, 30 Jun 2006) | 1 line

  Silence compiler warning
........
  r47171 | neal.norwitz | 2006-06-30 09:32:46 +0200 (Fri, 30 Jun 2006) | 1 line

  Another problem reported by Coverity.  Backport candidate.
........
  r47175 | thomas.heller | 2006-06-30 19:44:54 +0200 (Fri, 30 Jun 2006) | 2 lines

  Revert the use of PY_FORMAT_SIZE_T in PyErr_Format.
........
  r47176 | tim.peters | 2006-06-30 20:34:51 +0200 (Fri, 30 Jun 2006) | 2 lines

  Remove now-unused fiddling with PY_FORMAT_SIZE_T.
........
  r47177 | georg.brandl | 2006-06-30 20:47:56 +0200 (Fri, 30 Jun 2006) | 3 lines

  Document decorator usage of property.
........
  r47181 | fred.drake | 2006-06-30 21:29:25 +0200 (Fri, 30 Jun 2006) | 4 lines

  - consistency nit: always include "()" in \function and \method
    (*should* be done by the presentation, but that requires changes all over)
  - avoid spreading the __name meme
........
  r47188 | vinay.sajip | 2006-07-01 12:45:20 +0200 (Sat, 01 Jul 2006) | 1 line

  Added entry for fileConfig() bugfix.
........
  r47189 | vinay.sajip | 2006-07-01 12:47:20 +0200 (Sat, 01 Jul 2006) | 1 line

  Added duplicate call to fileConfig() to ensure that it cleans up after itself correctly.
........
  r47190 | martin.v.loewis | 2006-07-01 17:33:37 +0200 (Sat, 01 Jul 2006) | 2 lines

  Release all forwarded functions in .close. Fixes #1513223.
........
  r47191 | fred.drake | 2006-07-01 18:28:20 +0200 (Sat, 01 Jul 2006) | 7 lines

  SF bug #1296433 (Expat bug #1515266): Unchecked calls to character data
  handler would cause a segfault.  This merges in Expat's lib/xmlparse.c
  revisions 1.154 and 1.155, which fix this and a closely related problem
  (the latter does not affect Python).

  Moved the crasher test to the tests for xml.parsers.expat.
........
  r47197 | gerhard.haering | 2006-07-02 19:48:30 +0200 (Sun, 02 Jul 2006) | 4 lines

  The sqlite3 module did cut off data from the SQLite database at the first null
  character before sending it to a custom converter. This has been fixed now.
........
  r47198 | martin.v.loewis | 2006-07-02 20:44:00 +0200 (Sun, 02 Jul 2006) | 1 line

  Correct arithmetic in access on Win32. Fixes #1513646.
........
  r47203 | thomas.heller | 2006-07-03 09:58:09 +0200 (Mon, 03 Jul 2006) | 1 line

  Cleanup: Remove commented out code.
........
  r47204 | thomas.heller | 2006-07-03 09:59:50 +0200 (Mon, 03 Jul 2006) | 1 line

  Don't run the doctests with Python 2.3 because it doesn't have the ELLIPSIS flag.
........
  r47205 | thomas.heller | 2006-07-03 10:04:05 +0200 (Mon, 03 Jul 2006) | 7 lines

  Fixes so that _ctypes can be compiled with the MingW compiler.

  It seems that the definition of '__attribute__(x)' was responsible for
  the compiler ignoring the '__fastcall' attribute on the
  ffi_closure_SYSV function in libffi_msvc/ffi.c, took me quite some
  time to figure this out.
........
  r47206 | thomas.heller | 2006-07-03 10:08:14 +0200 (Mon, 03 Jul 2006) | 11 lines

  Add a new function uses_seh() to the _ctypes extension module.  This
  will return True if Windows Structured Exception handling (SEH) is
  used when calling functions, False otherwise.

  Currently, only MSVC supports SEH.

  Fix the test so that it doesn't crash when run with MingW compiled
  _ctypes.  Note that two tests are still failing when mingw is used, I
  suspect structure layout differences and function calling conventions
  between MSVC and MingW.
........
  r47207 | tim.peters | 2006-07-03 10:23:19 +0200 (Mon, 03 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r47208 | martin.v.loewis | 2006-07-03 11:44:00 +0200 (Mon, 03 Jul 2006) | 3 lines

  Only setup canvas when it is first created.
  Fixes #1514703
........
  r47209 | martin.v.loewis | 2006-07-03 12:05:30 +0200 (Mon, 03 Jul 2006) | 3 lines

  Reimplement turtle.circle using a polyline, to allow correct
  filling of arcs. Also fixes #1514693.
........
  r47210 | martin.v.loewis | 2006-07-03 12:19:49 +0200 (Mon, 03 Jul 2006) | 3 lines

  Bug #1514693: Update turtle's heading when switching between
  degrees and radians.
........
  r47211 | martin.v.loewis | 2006-07-03 13:12:06 +0200 (Mon, 03 Jul 2006) | 2 lines

  Document functions added in 2.3 and 2.5.
........
  r47212 | martin.v.loewis | 2006-07-03 14:19:50 +0200 (Mon, 03 Jul 2006) | 3 lines

  Bug #1417699: Reject locale-specific decimal point in float()
  and atof().
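
  The resulting behaviour can be illustrated with a small sketch in current
  Python.  The wrapper name parse_float is hypothetical; the point is only
  that float() accepts '.' as the decimal point and rejects ','.

```python
def parse_float(text):
    """Return float(text), or None if text is not a valid float literal."""
    try:
        return float(text)
    except ValueError:
        # float() does not consult the locale, so "1,5" is rejected.
        return None
```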
........
  r47213 | martin.v.loewis | 2006-07-03 14:28:58 +0200 (Mon, 03 Jul 2006) | 3 lines

  Bug #1267547: Put proper recursive setup.py call into the
  spec file generated by bdist_rpm.
........
  r47215 | martin.v.loewis | 2006-07-03 15:01:35 +0200 (Mon, 03 Jul 2006) | 3 lines

  Patch #825417: Fix timeout processing in expect,
  read_until. Will backport to 2.4.
........
  r47218 | martin.v.loewis | 2006-07-03 15:47:40 +0200 (Mon, 03 Jul 2006) | 2 lines

  Put method-wrappers into trashcan. Fixes #927248.
........
  r47219 | andrew.kuchling | 2006-07-03 16:07:30 +0200 (Mon, 03 Jul 2006) | 1 line

  [Bug #1515932] Clarify description of slice assignment
........
  r47220 | andrew.kuchling | 2006-07-03 16:16:09 +0200 (Mon, 03 Jul 2006) | 4 lines

  [Bug #1511911] Clarify description of optional arguments to sorted()
     by improving the xref to the section on lists, and by
     copying the explanations of the arguments (with a slight modification).
........
  r47223 | kristjan.jonsson | 2006-07-03 16:59:05 +0200 (Mon, 03 Jul 2006) | 1 line

  Fix build problems with the platform SDK on windows.  It is not sufficient to test for the C compiler version when determining if we have the secure CRT from microsoft.  Must test with an undocumented macro, __STDC_SECURE_LIB__ too.
........
  r47224 | ronald.oussoren | 2006-07-04 14:30:22 +0200 (Tue, 04 Jul 2006) | 7 lines

  Sync the darwin/x86 port libffi with the copy in PyObjC. This fixes a number
  of bugs in that port. The most annoying ones were due to some subtle differences
  between the documented ABI and the actual implementation :-(

  (there are no python unittests that fail without this patch, but without it
   some of libffi's unittests fail).
........
  r47234 | georg.brandl | 2006-07-05 10:21:00 +0200 (Wed, 05 Jul 2006) | 3 lines

  Remove remaining references to OverflowWarning.
........
  r47236 | thomas.heller | 2006-07-05 11:13:56 +0200 (Wed, 05 Jul 2006) | 3 lines

  Fix the bitfield test when _ctypes is compiled with MingW.  Structures
  containing bitfields may have different layout on MSVC and MingW.
........
  r47237 | thomas.wouters | 2006-07-05 13:03:49 +0200 (Wed, 05 Jul 2006) | 15 lines


  Fix bug in passing tuples to string.Template. All other values (with working
  str() or repr()) would work, just not multi-value tuples. Probably not a
  backport candidate, since it changes the behaviour of passing a
  single-element tuple:

  >>> string.Template("$foo").substitute(dict(foo=(1,)))

  '(1,)'

  versus

  '1'
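
  The two results are consistent with how the % operator treats a tuple on
  its right-hand side.  The sketch below shows that behaviour in current
  Python; it illustrates the formatting rule and is not the patch itself.

```python
import string

value = (1,)

# A bare tuple on the right of % is unpacked as the argument list.
unpacked = "%s" % value
# Wrapping the value in a one-element tuple formats the tuple itself.
wrapped = "%s" % (value,)

# Current string.Template formats the tuple as a single value.
substituted = string.Template("$foo").substitute(dict(foo=value))
```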
........
  r47241 | georg.brandl | 2006-07-05 16:18:45 +0200 (Wed, 05 Jul 2006) | 2 lines

  Patch #1517490: fix glitches in filter() docs.
........
  r47244 | georg.brandl | 2006-07-05 17:50:05 +0200 (Wed, 05 Jul 2006) | 2 lines

  no need to elaborate "string".
........
  r47251 | neal.norwitz | 2006-07-06 06:28:59 +0200 (Thu, 06 Jul 2006) | 3 lines

  Fix refleaks reported by Shane Hathaway in SF patch #1515361.  This change
  contains only the changes related to leaking the copy variable.
........
  r47253 | fred.drake | 2006-07-06 07:13:22 +0200 (Thu, 06 Jul 2006) | 4 lines

  - back out Expat change; the final fix to Expat will be different
  - change the pyexpat wrapper to not be so sensitive to this detail of the
    Expat implementation (the ex-crasher test still passes)
........
  r47257 | neal.norwitz | 2006-07-06 08:45:08 +0200 (Thu, 06 Jul 2006) | 1 line

  Add a NEWS entry for a recent pyexpat fix
........
  r47258 | martin.v.loewis | 2006-07-06 08:55:58 +0200 (Thu, 06 Jul 2006) | 2 lines

  Add sqlite3.dll to the DLLs component, not to the TkDLLs component.
  Fixes #1517388.
........
  r47259 | martin.v.loewis | 2006-07-06 09:05:21 +0200 (Thu, 06 Jul 2006) | 1 line

  Properly quote compileall and Lib paths in case TARGETDIR has a space.
........
  r47260 | thomas.heller | 2006-07-06 09:50:18 +0200 (Thu, 06 Jul 2006) | 5 lines

  Revert the change done in svn revision 47206:

  Add a new function uses_seh() to the _ctypes extension module.  This
  will return True if Windows Structured Exception handling (SEH) is
  used when calling functions, False otherwise.
........
  r47261 | armin.rigo | 2006-07-06 09:58:18 +0200 (Thu, 06 Jul 2006) | 3 lines

  A couple of examples about how to attack the fact that _PyType_Lookup()
  returns a borrowed ref.  Many of the calls are open to attack.
........
  r47262 | thomas.heller | 2006-07-06 10:28:14 +0200 (Thu, 06 Jul 2006) | 2 lines

  The test that calls a function with invalid arguments and catches the
  resulting Windows access violation will not be run by default.
........
  r47263 | thomas.heller | 2006-07-06 10:48:35 +0200 (Thu, 06 Jul 2006) | 5 lines

  Patch #1517790: It is now possible to use custom objects in the ctypes
  foreign function argtypes sequence as long as they provide a
  from_param method, no longer is it required that the object is a
  ctypes type.
........
  r47264 | thomas.heller | 2006-07-06 10:58:40 +0200 (Thu, 06 Jul 2006) | 2 lines

  Document the Structure and Union constructors.
........
  r47265 | thomas.heller | 2006-07-06 11:11:22 +0200 (Thu, 06 Jul 2006) | 2 lines

  Document the changes in svn revision 47263, from patch #1517790.
........
  r47267 | ronald.oussoren | 2006-07-06 12:13:35 +0200 (Thu, 06 Jul 2006) | 7 lines

  This patch solves the problem Skip was seeing with zlib: it ensures that
  configure uses similar compiler flags to setup.py when doing the zlib test.

  Without this patch configure would use the first shared library on the linker
  path; with this patch it uses the first shared or static library on that path,
  just like setup.py.
........
  r47268 | thomas.wouters | 2006-07-06 12:48:28 +0200 (Thu, 06 Jul 2006) | 4 lines


  NEWS entry for r47267: fixing configure's zlib probing.
........
  r47269 | fredrik.lundh | 2006-07-06 14:29:24 +0200 (Thu, 06 Jul 2006) | 3 lines

  added XMLParser alias for cElementTree compatibility
........
  r47271 | nick.coghlan | 2006-07-06 14:53:04 +0200 (Thu, 06 Jul 2006) | 1 line

  Revert the __module_name__ changes made in rev 47142. We'll revisit this in Python 2.6
........
  r47272 | nick.coghlan | 2006-07-06 15:04:56 +0200 (Thu, 06 Jul 2006) | 1 line

  Update the tutorial section on relative imports
........
  r47273 | nick.coghlan | 2006-07-06 15:35:27 +0200 (Thu, 06 Jul 2006) | 1 line

  Ignore ImportWarning by default
........
  r47274 | nick.coghlan | 2006-07-06 15:41:34 +0200 (Thu, 06 Jul 2006) | 1 line

  Cover ImportWarning, PendingDeprecationWarning and simplefilter() in the warnings module docs
........
  r47275 | nick.coghlan | 2006-07-06 15:47:18 +0200 (Thu, 06 Jul 2006) | 1 line

  Add NEWS entries for the ImportWarning change and documentation update
........
  r47276 | andrew.kuchling | 2006-07-06 15:57:28 +0200 (Thu, 06 Jul 2006) | 1 line

  ImportWarning is now silent by default
........
  r47277 | thomas.heller | 2006-07-06 17:06:05 +0200 (Thu, 06 Jul 2006) | 2 lines

  Document the correct return type of PyLong_AsUnsignedLongLongMask.
........
  r47278 | hyeshik.chang | 2006-07-06 17:21:52 +0200 (Thu, 06 Jul 2006) | 2 lines

  Add a testcase for r47086 which fixed a bug in codec_getstreamcodec().
........
  r47279 | hyeshik.chang | 2006-07-06 17:39:24 +0200 (Thu, 06 Jul 2006) | 3 lines

  Test using all CJK encodings for the testcases which don't require
  specific encodings.
........
  r47280 | martin.v.loewis | 2006-07-06 21:28:03 +0200 (Thu, 06 Jul 2006) | 2 lines

  Properly generate logical file ids. Fixes #1515998.
  Also correct typo in Control.mapping.
........
  r47287 | neal.norwitz | 2006-07-07 08:03:15 +0200 (Fri, 07 Jul 2006) | 17 lines

  Restore rev 47014:

  The hppa ubuntu box sometimes hangs forever in these tests.  My guess
  is that the wait is failing for some reason.  Use WNOHANG, so we won't
  wait until the buildbot kills the test suite.

  I haven't been able to reproduce the failure, so I'm not sure if
  this will help or not.  Hopefully, this change will cause the test
  to fail, rather than hang.  That will be better since we will get
  the rest of the test results.  It may also help us debug the real problem.

  *** The reason this originally failed was because there were many
  zombie children outstanding before rev 47158 cleaned them up.
  There are still hangs in test_subprocess that need to be addressed,
  but that will take more work.  This should close some holes.
........
  r47289 | georg.brandl | 2006-07-07 10:15:12 +0200 (Fri, 07 Jul 2006) | 3 lines

  Fix RFC number.
........
  r50489 | neal.norwitz | 2006-07-08 07:31:37 +0200 (Sat, 08 Jul 2006) | 1 line

  Fix SF bug #1519018: 'as' is now validated properly in import statements
........
  r50490 | georg.brandl | 2006-07-08 14:15:27 +0200 (Sat, 08 Jul 2006) | 3 lines

  Add an additional test for bug #1519018.
........
  r50491 | tim.peters | 2006-07-08 21:55:05 +0200 (Sat, 08 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50493 | neil.schemenauer | 2006-07-09 18:16:34 +0200 (Sun, 09 Jul 2006) | 2 lines

  Fix AST compiler bug #1501934: incorrect LOAD/STORE_GLOBAL generation.
........
  r50495 | neil.schemenauer | 2006-07-09 23:19:29 +0200 (Sun, 09 Jul 2006) | 2 lines

  Fix SF bug 1441486: bad unary minus folding in compiler.
........
  r50497 | neal.norwitz | 2006-07-10 00:14:42 +0200 (Mon, 10 Jul 2006) | 4 lines

  On 64 bit systems, int literals that use less than 64 bits are now ints
  rather than longs.  This also fixes the test for eval(-sys.maxint - 1).
........
  r50500 | neal.norwitz | 2006-07-10 02:04:44 +0200 (Mon, 10 Jul 2006) | 4 lines

  Bug #1512814, Fix incorrect lineno's when code at module scope
  started after line 256.
........
  r50501 | neal.norwitz | 2006-07-10 02:05:34 +0200 (Mon, 10 Jul 2006) | 1 line

  Fix doco.  Backport candidate.
........
  r50503 | neal.norwitz | 2006-07-10 02:23:17 +0200 (Mon, 10 Jul 2006) | 5 lines

  Part of SF patch #1484695.  This removes dead code.  The chksum was
  already verified in .frombuf() on the lines above.  If there was
  a problem an exception is raised, so there was no way this condition
  could have been true.
........
  r50504 | neal.norwitz | 2006-07-10 03:18:57 +0200 (Mon, 10 Jul 2006) | 3 lines

  Patch #1516912: improve Modules support for OpenVMS.
........
  r50506 | neal.norwitz | 2006-07-10 04:36:41 +0200 (Mon, 10 Jul 2006) | 7 lines

  Patch #1504046: Add documentation for xml.etree.

  /F wrote the text docs, Englebert Gruber massaged it to latex and I
  did some more massaging to try and improve the consistency and
  fix some name mismatches between the declaration and text.
........
  r50509 | martin.v.loewis | 2006-07-10 09:23:48 +0200 (Mon, 10 Jul 2006) | 2 lines

  Introduce DISTUTILS_USE_SDK as a flag to determine whether the
  SDK environment should be used. Fixes #1508010.
........
  r50510 | martin.v.loewis | 2006-07-10 09:26:41 +0200 (Mon, 10 Jul 2006) | 1 line

  Change error message to indicate that VS2003 is necessary to build extension modules, not the .NET SDK.
........
  r50511 | martin.v.loewis | 2006-07-10 09:29:41 +0200 (Mon, 10 Jul 2006) | 1 line

  Add svn:ignore.
........
  r50512 | anthony.baxter | 2006-07-10 09:41:04 +0200 (Mon, 10 Jul 2006) | 1 line

  preparing for 2.5b2
........
  r50513 | thomas.heller | 2006-07-10 11:10:28 +0200 (Mon, 10 Jul 2006) | 2 lines

  Fix bug #1518190: accept any integer or long value in the
  ctypes.c_void_p constructor.
........
  r50514 | thomas.heller | 2006-07-10 11:31:06 +0200 (Mon, 10 Jul 2006) | 3 lines

  Fixed a segfault when ctypes.wintypes were imported on
  non-Windows machines.
........
  r50516 | thomas.heller | 2006-07-10 13:11:10 +0200 (Mon, 10 Jul 2006) | 3 lines

  Assigning None to pointer type structure fields possibly overwrote
  the wrong fields.
........
  r50517 | thomas.heller | 2006-07-10 13:17:37 +0200 (Mon, 10 Jul 2006) | 5 lines

  Moved the ctypes news entries from the 'Library' section into the
  'Extension Modules' section where they belong, probably.

  This destroys the original order of the news entries; don't know
  if that is important or not.
........
  r50526 | phillip.eby | 2006-07-10 21:03:29 +0200 (Mon, 10 Jul 2006) | 2 lines

  Fix SF#1516184 and add a test to prevent regression.
........
  r50528 | phillip.eby | 2006-07-10 21:18:35 +0200 (Mon, 10 Jul 2006) | 2 lines

  Fix SF#1457312: bad socket error handling in distutils "upload" command.
........
  r50537 | peter.astrand | 2006-07-10 22:39:49 +0200 (Mon, 10 Jul 2006) | 1 line

  Make it possible to run test_subprocess.py with Python 2.2, which lacks test_support.reap_children().
........
  r50541 | tim.peters | 2006-07-10 23:08:24 +0200 (Mon, 10 Jul 2006) | 5 lines

  After approval from Anthony, merge the tim-current_frames
  branch into the trunk.  This adds a new sys._current_frames()
  function, which returns a dict mapping thread id to topmost
  thread stack frame.
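
  A minimal sketch of how the new function can be used, written for a
  current Python 3 interpreter rather than the 2.5 trunk.  The helper
  name snapshot_frames is made up for illustration; only
  sys._current_frames() comes from this change.

```python
import sys
import threading


def snapshot_frames():
    """Map each thread id to (function name, line number) of its topmost frame."""
    # sys._current_frames() returns a dict: thread id -> topmost stack frame.
    current = sys._current_frames()
    return {
        thread_id: (frame.f_code.co_name, frame.f_lineno)
        for thread_id, frame in current.items()
    }


frames = snapshot_frames()
# The calling thread is always present in the mapping.
print(frames[threading.get_ident()])
```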
........
  r50542 | tim.peters | 2006-07-10 23:11:49 +0200 (Mon, 10 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50553 | martin.v.loewis | 2006-07-11 00:11:28 +0200 (Tue, 11 Jul 2006) | 4 lines

  Patch #1519566: Remove unused _tofill member.
  Make begin_fill idempotent.
  Update demo2 to demonstrate filling of concave shapes.
........
  r50567 | anthony.baxter | 2006-07-11 04:04:09 +0200 (Tue, 11 Jul 2006) | 4 lines

  #1494314: Fix a regression with high-numbered sockets in 2.4.3. This
  means that select() on sockets > FD_SETSIZE (typically 1024) works again.
  The patch makes sockets use poll() internally where available.
........
  r50568 | tim.peters | 2006-07-11 04:17:48 +0200 (Tue, 11 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50575 | thomas.heller | 2006-07-11 18:42:05 +0200 (Tue, 11 Jul 2006) | 1 line

  Add missing Py_DECREF.
........
  r50576 | thomas.heller | 2006-07-11 18:44:25 +0200 (Tue, 11 Jul 2006) | 1 line

  Add missing Py_DECREFs.
........
  r50579 | andrew.kuchling | 2006-07-11 19:20:16 +0200 (Tue, 11 Jul 2006) | 1 line

  Bump version number;  add sys._current_frames
........
  r50582 | thomas.heller | 2006-07-11 20:28:35 +0200 (Tue, 11 Jul 2006) | 3 lines

  When a foreign function is retrieved by calling __getitem__ on a ctypes
  library instance, do not set it as an attribute.
........
  r50583 | thomas.heller | 2006-07-11 20:40:50 +0200 (Tue, 11 Jul 2006) | 2 lines

  Change the ctypes version number to 1.0.0.
........
  r50597 | neal.norwitz | 2006-07-12 07:26:17 +0200 (Wed, 12 Jul 2006) | 3 lines

  Bug #1520864: unpacking singleton tuples in for loop (for x, in) works again.
........
  r50598 | neal.norwitz | 2006-07-12 07:26:35 +0200 (Wed, 12 Jul 2006) | 1 line

  Fix function name in error msg
........
  r50599 | neal.norwitz | 2006-07-12 07:27:46 +0200 (Wed, 12 Jul 2006) | 4 lines

  Fix uninitialized memory read reported by Valgrind when running doctest.
  This could happen if size == 0.
........
  r50600 | neal.norwitz | 2006-07-12 09:28:29 +0200 (Wed, 12 Jul 2006) | 1 line

  Actually change the MAGIC #.  Create a new section for 2.5c1 and mention the impact of changing the MAGIC #.
........
  r50601 | thomas.heller | 2006-07-12 10:43:47 +0200 (Wed, 12 Jul 2006) | 3 lines

  Fix #1467450: ctypes now uses RTLD_GLOBAL by default on OSX 10.3 to
  load shared libraries.
........
  r50604 | thomas.heller | 2006-07-12 16:25:18 +0200 (Wed, 12 Jul 2006) | 3 lines

  Fix the wrong description of LibraryLoader.LoadLibrary, and document
  the DEFAULT_MODE constant.
........
  r50607 | georg.brandl | 2006-07-12 17:31:17 +0200 (Wed, 12 Jul 2006) | 3 lines

  Accept long options "--help" and "--version".
........
  r50617 | thomas.heller | 2006-07-13 11:53:47 +0200 (Thu, 13 Jul 2006) | 3 lines

  A misspelled preprocessor symbol caused ctypes to be always compiled
  without thread support.  Replaced WITH_THREADS with WITH_THREAD.
........
  r50619 | thomas.heller | 2006-07-13 19:01:14 +0200 (Thu, 13 Jul 2006) | 3 lines

  Fix #1521375.  When running with root privileges, 'gcc -o /dev/null'
  did overwrite /dev/null.  Use a temporary file instead of /dev/null.
........
  r50620 | thomas.heller | 2006-07-13 19:05:13 +0200 (Thu, 13 Jul 2006) | 2 lines

  Fix misleading words.
........
  r50622 | andrew.kuchling | 2006-07-13 19:37:26 +0200 (Thu, 13 Jul 2006) | 1 line

  Typo fix
........
  r50629 | georg.brandl | 2006-07-14 09:12:54 +0200 (Fri, 14 Jul 2006) | 3 lines

  Patch #1521874: grammar errors in doanddont.tex.
........
  r50630 | neal.norwitz | 2006-07-14 09:20:04 +0200 (Fri, 14 Jul 2006) | 1 line

  Try to improve grammar further.
........
  r50631 | martin.v.loewis | 2006-07-14 11:58:55 +0200 (Fri, 14 Jul 2006) | 1 line

  Extend build_ssl to Win64, using VSExtComp.
........
  r50632 | martin.v.loewis | 2006-07-14 14:10:09 +0200 (Fri, 14 Jul 2006) | 1 line

  Add debug output to analyse buildbot failure.
........
  r50633 | martin.v.loewis | 2006-07-14 14:31:05 +0200 (Fri, 14 Jul 2006) | 1 line

  Fix Debug build of _ssl.
........
  r50636 | andrew.kuchling | 2006-07-14 15:32:38 +0200 (Fri, 14 Jul 2006) | 1 line

  Mention new options
........
  r50638 | peter.astrand | 2006-07-14 16:04:45 +0200 (Fri, 14 Jul 2006) | 1 line

  Bug #1223937: CalledProcessError.errno -> CalledProcessError.returncode.
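
  A small sketch of what the rename means for callers, assuming a
  current Python 3 interpreter.  The helper name exit_status is
  hypothetical; the attribute read is the returncode named above.

```python
import subprocess
import sys


def exit_status(argv):
    """Run argv and return its exit status, using check_call's exception."""
    try:
        subprocess.check_call(argv)
    except subprocess.CalledProcessError as exc:
        # The failing exit status is exposed as .returncode (not .errno).
        return exc.returncode
    return 0


# Spawn a child interpreter that exits with status 3.
status = exit_status([sys.executable, "-c", "import sys; sys.exit(3)"])
print(status)
```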
........
  r50640 | thomas.heller | 2006-07-14 17:01:05 +0200 (Fri, 14 Jul 2006) | 4 lines

  Make the prototypes of our private PyUnicode_FromWideChar and
  PyUnicode_AsWideChar replacement functions compatible to the official
  functions by using Py_ssize_t instead of int.
........
  r50643 | thomas.heller | 2006-07-14 19:51:14 +0200 (Fri, 14 Jul 2006) | 3 lines

  Patch #1521817: The index range checking on ctypes arrays containing
  exactly one element is enabled again.
........
  r50647 | thomas.heller | 2006-07-14 20:22:50 +0200 (Fri, 14 Jul 2006) | 2 lines

  Updates for the ctypes documentation.
........
  r50655 | fredrik.lundh | 2006-07-14 23:45:48 +0200 (Fri, 14 Jul 2006) | 3 lines

  typo
........
  r50664 | george.yoshida | 2006-07-15 18:03:49 +0200 (Sat, 15 Jul 2006) | 2 lines

  Bug #15187702 : ext/win-cookbook.html has a broken link to distutils
........
  r50667 | bob.ippolito | 2006-07-15 18:53:15 +0200 (Sat, 15 Jul 2006) | 1 line

  Patch #1220874: Update the binhex module for Mach-O.
........
  r50671 | fred.drake | 2006-07-16 03:21:20 +0200 (Sun, 16 Jul 2006) | 1 line

  clean up some link markup
........
  r50673 | neal.norwitz | 2006-07-16 03:50:38 +0200 (Sun, 16 Jul 2006) | 4 lines

  Bug #1512814, Fix incorrect lineno's when code within a function
  had more than 255 blank lines.  Byte codes need to go first, line #s second.
........
  r50674 | neal.norwitz | 2006-07-16 04:00:32 +0200 (Sun, 16 Jul 2006) | 5 lines

  a & b were dereffed above, so they are known to be valid pointers.
  z is known to be NULL, nothing to DECREF.

  Reported by Klocwork, #107.
........
  r50675 | neal.norwitz | 2006-07-16 04:02:57 +0200 (Sun, 16 Jul 2006) | 5 lines

  self is dereffed (and passed as first arg), so it's known to be good.
  func is returned from PyArg_ParseTuple and also dereffed.

  Reported by Klocwork, #30 (self one at least).
........
  r50676 | neal.norwitz | 2006-07-16 04:05:35 +0200 (Sun, 16 Jul 2006) | 4 lines

  proto was dereffed above and is known to be good.  No need for X.

  Reported by Klocwork, #39.
........
  r50677 | neal.norwitz | 2006-07-16 04:15:27 +0200 (Sun, 16 Jul 2006) | 5 lines

  Fix memory leaks in some conditions.

  Reported by Klocwork #152.
........
  r50678 | neal.norwitz | 2006-07-16 04:17:36 +0200 (Sun, 16 Jul 2006) | 4 lines

  Fix memory leak under some conditions.

  Reported by Klocwork, #98.
........
  r50679 | neal.norwitz | 2006-07-16 04:22:30 +0200 (Sun, 16 Jul 2006) | 8 lines

  Use sizeof(buffer) instead of duplicating the constants to ensure they won't
  be wrong.

  The real change is to pass (bufsz - 1) to PyOS_ascii_formatd and 1
  to strncat.  strncat copies n+1 bytes from src (not dest).

  Reported by Klocwork #58.
........
  r50680 | neal.norwitz | 2006-07-16 04:32:03 +0200 (Sun, 16 Jul 2006) | 5 lines

  Handle a NULL name properly.

  Reported by Klocwork #67
........
  r50681 | neal.norwitz | 2006-07-16 04:35:47 +0200 (Sun, 16 Jul 2006) | 6 lines

  PyFunction_SetDefaults() is documented as taking None or a tuple.
  A NULL would crash the PyTuple_Check().  Now make NULL return a SystemError.

  Reported by Klocwork #73.
........
  r50683 | neal.norwitz | 2006-07-17 02:55:45 +0200 (Mon, 17 Jul 2006) | 5 lines

  Stop INCREFing name, then checking if it's NULL.  name (f_name) should never
  be NULL so assert it.  Fix one place where we could have passed NULL.

  Reported by Klocwork #66.
........
  r50684 | neal.norwitz | 2006-07-17 02:57:15 +0200 (Mon, 17 Jul 2006) | 5 lines

  otherset is known to be non-NULL based on checks before and DECREF after.
  DECREF otherset rather than XDECREF in error conditions too.

  Reported by Klocwork #154.
........
  r50685 | neal.norwitz | 2006-07-17 02:59:04 +0200 (Mon, 17 Jul 2006) | 7 lines

  Reported by Klocwork #151.

  v2 can be NULL if exception2 is NULL.  I don't think that condition can happen,
  but I'm not sure it can't either.  Now the code will protect against either
  being NULL.
........
  r50686 | neal.norwitz | 2006-07-17 03:00:16 +0200 (Mon, 17 Jul 2006) | 1 line

  Add NEWS entry for a bunch of fixes due to warnings produced by Klocwork's static analysis tool.
........
  r50687 | fred.drake | 2006-07-17 07:47:52 +0200 (Mon, 17 Jul 2006) | 3 lines

  document xmlcore (still minimal; needs mention in each of the xml.* modules)
  SF bug #1504456 (partial)
........
  r50688 | georg.brandl | 2006-07-17 15:23:46 +0200 (Mon, 17 Jul 2006) | 3 lines

  Remove usage of sets module (patch #1500609).
........
  r50689 | georg.brandl | 2006-07-17 15:26:33 +0200 (Mon, 17 Jul 2006) | 3 lines

  Add missing NEWS item (#1522771)
........
  r50690 | andrew.kuchling | 2006-07-17 18:47:54 +0200 (Mon, 17 Jul 2006) | 1 line

  Attribute more features
........
  r50692 | kurt.kaiser | 2006-07-17 23:59:27 +0200 (Mon, 17 Jul 2006) | 8 lines

  Patch 1479219 - Tal Einat
  1. 'as' highlighted as builtin in comment string on import line
  2. Comments such as "#False identity" which start with a keyword immediately
     after the '#' character aren't colored as comments.
  3. u or U beginning unicode string not correctly highlighted

  Closes bug 1325071
........
  r50693 | barry.warsaw | 2006-07-18 01:07:51 +0200 (Tue, 18 Jul 2006) | 16 lines

  decode_rfc2231(): Be more robust against buggy RFC 2231 encodings.
  Specifically, instead of raising a ValueError when there is a single tick in
  the parameter, simply return the entire string unquoted, with None for
  both the charset and the language.  Also, if there are more than 2 ticks in
  the parameter, interpret the first three parts as the standard RFC 2231 parts,
  then the rest of the parts as the encoded string.

  Test cases added.

  Original fewer-than-3-parts fix by Tokio Kikuchi.

  Resolves SF bug # 1218081.  I will back port the fix and tests to Python 2.4
  (email 3.0) and Python 2.3 (email 2.5).

  Also, bump the version number to email 4.0.1, removing the 'alpha' moniker.
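
  The splitting rule described above can be sketched as follows.  This
  is one reading of the description, not the code from the email
  package; the function name split_rfc2231 is made up for illustration
  and no %-decoding is attempted.

```python
TICK = "'"


def split_rfc2231(value):
    """Split an RFC 2231 value into (charset, language, text).

    Fewer than two ticks: treat the whole string as plain text and report
    None for charset and language.  Two or more ticks: the first two
    fields are charset and language, and everything after the second
    tick (including any further ticks) is the encoded string.
    """
    parts = value.split(TICK, 2)
    if len(parts) < 3:
        return None, None, value
    charset, language, text = parts
    return charset, language, text


print(split_rfc2231("us-ascii'en'hello"))
print(split_rfc2231("it's"))
print(split_rfc2231("us-ascii'en'a'b"))
```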
........
  r50695 | kurt.kaiser | 2006-07-18 06:03:16 +0200 (Tue, 18 Jul 2006) | 2 lines

  Rebinding Tab key was inserting 'tab' instead of 'Tab'.  Bug 1179168.
........
  r50696 | brett.cannon | 2006-07-18 06:41:36 +0200 (Tue, 18 Jul 2006) | 6 lines

  Fix bug #1520914.  Starting in 2.4, time.strftime() began to check the bounds
  of values in the time tuple passed in.  Unfortunately people came to rely on
  undocumented behaviour of setting unneeded values to 0, regardless of if it was
  within the valid range.  Now those values force the value internally to the
  minimum value when 0 is passed in.
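
  A short illustration of the calling pattern this fix keeps working,
  assuming a current Python 3 interpreter: the caller only cares about
  the year and pads every other field of the time tuple with 0.

```python
import time

# Only the year matters to this caller; the remaining fields are 0,
# which is out of range for month and day but is accepted and forced
# to a valid value internally.
padded = (2006, 0, 0, 0, 0, 0, 0, 0, 0)
stamp = time.strftime("%Y-%m-%d", padded)
print(stamp)
```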
........
  r50697 | facundo.batista | 2006-07-18 14:16:13 +0200 (Tue, 18 Jul 2006) | 1 line

  Comments and docs cleanups, and some little fixes, provided by Santiágo Peresón
........
  r50704 | martin.v.loewis | 2006-07-18 19:46:31 +0200 (Tue, 18 Jul 2006) | 2 lines

  Patch #1524429: Use repr instead of backticks again.
........
  r50706 | tim.peters | 2006-07-18 23:55:15 +0200 (Tue, 18 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50708 | tim.peters | 2006-07-19 02:03:19 +0200 (Wed, 19 Jul 2006) | 18 lines

  SF bug 1524317: configure --without-threads fails to build

  Moved the code for _PyThread_CurrentFrames() up, so it's no longer
  in a huge "#ifdef WITH_THREAD" block (I didn't realize it /was/ in
  one).

  Changed test_sys's test_current_frames() so it passes with or without
  thread support compiled in.

  Note that test_sys fails when Python is compiled without threads,
  but for an unrelated reason (the old test_exit() fails with an
  indirect ImportError on the `thread` module).  There are also
  other unrelated compilation failures without threads, in extension
  modules (like ctypes); at least the core compiles again.

  Do we really support --without-threads?  If so, there are several
  problems remaining.
........
  r50713 | thomas.heller | 2006-07-19 11:09:32 +0200 (Wed, 19 Jul 2006) | 4 lines

  Make sure the _ctypes extension can be compiled when WITH_THREAD is
  not defined on Windows, even if that configuration is probably not
  supported at all.
........
  r50715 | martin.v.loewis | 2006-07-19 19:18:32 +0200 (Wed, 19 Jul 2006) | 4 lines

  Revert r50706 (Whitespace normalization) and
  r50697: Comments and docs cleanups, and some little fixes
  per recommendation from Raymond Hettinger.
........
  r50719 | phillip.eby | 2006-07-20 17:54:16 +0200 (Thu, 20 Jul 2006) | 4 lines

  Fix SF#1516184 (again) and add a test to prevent regression.
  (There was a problem with empty filenames still causing recursion)
........
  r50720 | georg.brandl | 2006-07-20 18:28:39 +0200 (Thu, 20 Jul 2006) | 3 lines

  Guard for _active being None in __del__ method.
........
  r50721 | vinay.sajip | 2006-07-20 18:28:39 +0200 (Thu, 20 Jul 2006) | 1 line

  Updated documentation for TimedRotatingFileHandler relating to how rollover files are named. The previous documentation was wrongly the same as for RotatingFileHandler.
........
  r50731 | fred.drake | 2006-07-20 22:11:57 +0200 (Thu, 20 Jul 2006) | 1 line

  markup fix
........
  r50739 | kurt.kaiser | 2006-07-21 00:22:52 +0200 (Fri, 21 Jul 2006) | 7 lines

  Avoid occasional failure to detect closing paren properly.
  Patch 1407280 Tal Einat

  M    ParenMatch.py
  M    NEWS.txt
  M    CREDITS.txt
........
  r50740 | vinay.sajip | 2006-07-21 01:20:12 +0200 (Fri, 21 Jul 2006) | 1 line

  Addressed SF#1524081 by using a dictionary to map level names to syslog priority names, rather than a string.lower().
........
  r50741 | neal.norwitz | 2006-07-21 07:29:58 +0200 (Fri, 21 Jul 2006) | 1 line

  Add some asserts that we got good params passed
........
  r50742 | neal.norwitz | 2006-07-21 07:31:02 +0200 (Fri, 21 Jul 2006) | 5 lines

  Move the initialization of some pointers earlier.  The problem is
  that if we call Py_DECREF(frame) like we do if allocating locals fails,
  frame_dealloc() will try to use these bogus values and crash.
........
  r50743 | neal.norwitz | 2006-07-21 07:32:28 +0200 (Fri, 21 Jul 2006) | 4 lines

  Handle allocation failures gracefully.  Found with failmalloc.
  Many (all?) of these could be backported.
........
  r50745 | neal.norwitz | 2006-07-21 09:59:02 +0200 (Fri, 21 Jul 2006) | 1 line

  Speel initialise write.  Tanks Anthony.
........
  r50746 | neal.norwitz | 2006-07-21 09:59:47 +0200 (Fri, 21 Jul 2006) | 2 lines

  Handle more memory allocation failures without crashing.
........
  r50754 | barry.warsaw | 2006-07-21 16:51:07 +0200 (Fri, 21 Jul 2006) | 23 lines

  More RFC 2231 improvements for the email 4.0 package.  As Mark Sapiro rightly
  points out there are really two types of continued headers defined in this
  RFC (i.e. "encoded" parameters with the form "name*0*=" and unencoded
  parameters with the form "name*0="), but we were handling them both the
  same way and that isn't correct.

  This patch should be much more RFC compliant in that only encoded params are
  %-decoded and the charset/language information is only extracted if there are
  any encoded params in the segments.  If there are no encoded params then the
  RFC says that there will be no charset/language parts.

  Note however that this will change the return value for Message.get_param() in
  some cases.  For example, whereas before if you had all unencoded param
  continuations you would have still gotten a 3-tuple back from this method
  (with charset and language == None), you will now get just a string.  I don't
  believe this is a backward incompatible change though because the
  documentation for this method already indicates that either return value is
  possible and that you must do an isinstance(val, tuple) check to discriminate
  between the two.  (Yeah that API kind of sucks but we can't change /that/
  without breaking code.)

  Test cases, some documentation updates, and a NEWS item accompany this patch.
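
  The isinstance(val, tuple) check mentioned above can be sketched like
  this, assuming a current Python 3 email package.  The helper name
  param_text and the sample header are made up for illustration.

```python
import email


def param_text(value):
    """Return the text of a get_param() result, whichever form it takes."""
    if isinstance(value, tuple):
        # Encoded parameter: (charset, language, text).
        charset, language, text = value
        return text
    # Unencoded parameter: already a plain string.
    return value


raw = (
    "Content-Type: text/plain; title*=us-ascii'en'hello%20world;"
    ' name="a.txt"\n'
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)
title = param_text(msg.get_param("title"))
name = param_text(msg.get_param("name"))
print(title, name)
```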
........
  r50759 | georg.brandl | 2006-07-21 19:36:31 +0200 (Fri, 21 Jul 2006) | 3 lines

  Fix check for empty list (vs. None).
........
  r50771 | brett.cannon | 2006-07-22 00:44:07 +0200 (Sat, 22 Jul 2006) | 2 lines

  Remove an XXX marker in a comment.
........
  r50773 | neal.norwitz | 2006-07-22 18:20:49 +0200 (Sat, 22 Jul 2006) | 1 line

  Fix more memory allocation issues found with failmalloc.
........
  r50774 | neal.norwitz | 2006-07-22 19:00:57 +0200 (Sat, 22 Jul 2006) | 1 line

  Don't fail if the directory already exists
........
  r50775 | greg.ward | 2006-07-23 04:25:53 +0200 (Sun, 23 Jul 2006) | 6 lines

  Be a lot smarter about whether this test passes: instead of assuming
  that a 2.93 sec audio file will always take 3.1 sec (as it did on the
  hardware I had when I first wrote the test), expect that it will take
  2.93 sec +/- 10%, and only fail if it's outside of that range.
  Compute the expected
........
  r50776 | kurt.kaiser | 2006-07-23 06:19:49 +0200 (Sun, 23 Jul 2006) | 2 lines

  Tooltips failed on new-style class __init__ args.  Bug 1027566 Loren Guthrie
........
  r50777 | neal.norwitz | 2006-07-23 09:50:36 +0200 (Sun, 23 Jul 2006) | 1 line

  Handle more mem alloc issues found with failmalloc
........
  r50778 | neal.norwitz | 2006-07-23 09:51:58 +0200 (Sun, 23 Jul 2006) | 5 lines

  If the for loop isn't entered, entryblock will be NULL.  If passed
  to stackdepth_walk it will be dereffed.

  Not sure if I found with failmalloc or Klocwork #55.
........
  r50779 | neal.norwitz | 2006-07-23 09:53:14 +0200 (Sun, 23 Jul 2006) | 4 lines

  Move the initialization of size_a down below the check for a being NULL.

  Reported by Klocwork #106
........
  r50780 | neal.norwitz | 2006-07-23 09:55:55 +0200 (Sun, 23 Jul 2006) | 9 lines

  Check the allocation of b_objects and return if there was a failure.
  Also fix a few memory leaks in other failure scenarios.

  It seems that if b_objects == Py_None, we will have an extra ref to
  b_objects.  Add XXX comment so hopefully someone documents why the
  else isn't necessary or adds it in.

  Reported by Klocwork #20
........
  r50781 | neal.norwitz | 2006-07-23 09:57:11 +0200 (Sun, 23 Jul 2006) | 2 lines

  Fix memory leaks spotted by Klocwork #37.
........
  r50782 | neal.norwitz | 2006-07-23 09:59:00 +0200 (Sun, 23 Jul 2006) | 5 lines

  nextlink can be NULL if teedataobject_new fails, so use XINCREF.
  Ensure that dataobj is never NULL.

  Reported by Klocwork #102
........
  r50783 | neal.norwitz | 2006-07-23 10:01:43 +0200 (Sun, 23 Jul 2006) | 8 lines

  Ensure we don't write beyond errText.  I think I got this right, but
  it definitely could use some review to ensure I'm not off by one
  and there's no possible overflow/wrap-around of bytes_left.
  Reported by Klocwork #1.

  Fix a problem if there is a failure allocating self->db.
  Found with failmalloc.
........
  r50784 | ronald.oussoren | 2006-07-23 11:41:09 +0200 (Sun, 23 Jul 2006) | 3 lines

  Without this patch CMD-W won't close EditorWindows on MacOS X. This solves
  part of bug #1517990.
........
  r50785 | ronald.oussoren | 2006-07-23 11:46:11 +0200 (Sun, 23 Jul 2006) | 5 lines

  Fix for bug #1517996: Class and Path browsers show Tk menu

  This patch replaces the menubar that is used by AquaTk for windows without a
  menubar of their own by one that is more appropriate for IDLE.
........
  r50786 | andrew.macintyre | 2006-07-23 14:57:02 +0200 (Sun, 23 Jul 2006) | 2 lines

  Build updates for OS/2 EMX port
........
  r50787 | andrew.macintyre | 2006-07-23 15:00:04 +0200 (Sun, 23 Jul 2006) | 3 lines

  bugfix: PyThread_start_new_thread() returns the thread ID, not a flag;
  will backport.
........
  r50789 | andrew.macintyre | 2006-07-23 15:04:00 +0200 (Sun, 23 Jul 2006) | 2 lines

  Get mailbox module working on OS/2 EMX port.
........
  r50791 | greg.ward | 2006-07-23 18:05:51 +0200 (Sun, 23 Jul 2006) | 1 line

  Resync optparse with Optik 1.5.3: minor tweaks for/to tests.
........
  r50794 | martin.v.loewis | 2006-07-24 07:05:22 +0200 (Mon, 24 Jul 2006) | 2 lines

  Update list of unsupported systems. Fixes #1510853.
........
  r50795 | martin.v.loewis | 2006-07-24 12:26:33 +0200 (Mon, 24 Jul 2006) | 1 line

  Patch #1448199: Release GIL around ConnectRegistry.
........
  r50796 | martin.v.loewis | 2006-07-24 13:54:53 +0200 (Mon, 24 Jul 2006) | 3 lines

  Patch #1232023: Don't include empty path component from registry,
  so that the current directory does not get added to sys.path.
  Also fixes #1526785.
........
  r50797 | martin.v.loewis | 2006-07-24 14:54:17 +0200 (Mon, 24 Jul 2006) | 3 lines

  Bug #1524310: Properly report errors from FindNextFile in os.listdir.
  Will backport to 2.4.
........
  r50800 | georg.brandl | 2006-07-24 15:28:57 +0200 (Mon, 24 Jul 2006) | 7 lines

  Patch #1523356: fix determining include dirs in python-config.

  Also don't install "python-config" when doing altinstall, but
  always install "python-config2.x" and make a link to it like
  with the main executable.
........
  r50802 | georg.brandl | 2006-07-24 15:46:47 +0200 (Mon, 24 Jul 2006) | 3 lines

  Patch #1527744: right order of includes in order to have HAVE_CONIO_H defined properly.
........
  r50803 | georg.brandl | 2006-07-24 16:09:56 +0200 (Mon, 24 Jul 2006) | 3 lines

  Patch #1515343: Fix printing of deprecated string exceptions with a
  value in the traceback module.
........
  r50804 | kurt.kaiser | 2006-07-24 19:13:23 +0200 (Mon, 24 Jul 2006) | 7 lines

  EditorWindow failed when used stand-alone if sys.ps1 not set.
  Bug 1010370 Dave Florek

  M    EditorWindow.py
  M    PyShell.py
  M    NEWS.txt
........
  r50805 | kurt.kaiser | 2006-07-24 20:05:51 +0200 (Mon, 24 Jul 2006) | 6 lines

  - EditorWindow.test() was failing.  Bug 1417598

  M    EditorWindow.py
  M    ScriptBinding.py
  M    NEWS.txt
........
  r50808 | georg.brandl | 2006-07-24 22:11:35 +0200 (Mon, 24 Jul 2006) | 3 lines

  Repair accidental NameError.
........
  r50809 | tim.peters | 2006-07-24 23:02:15 +0200 (Mon, 24 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50810 | greg.ward | 2006-07-25 04:11:12 +0200 (Tue, 25 Jul 2006) | 3 lines

  Don't use standard assert: want tests to fail even when run with -O.
  Delete cruft.
........
  r50811 | tim.peters | 2006-07-25 06:07:22 +0200 (Tue, 25 Jul 2006) | 10 lines

  current_frames_with_threads():  There's actually no way
  to guess /which/ line the spawned thread is in at the time
  sys._current_frames() is called:  we know it finished
  enter_g.set(), but can't know whether the instruction
  counter has advanced to the following leave_g.wait().
  The latter is overwhelmingly most likely, but not guaranteed,
  and I see that the "x86 Ubuntu dapper (icc) trunk" buildbot
  found it on the other line once.  Changed the test so it
  passes in either case.
........
  r50815 | martin.v.loewis | 2006-07-25 11:53:12 +0200 (Tue, 25 Jul 2006) | 2 lines

  Bug #1525817: Don't truncate short lines in IDLE's tool tips.
........
  r50816 | martin.v.loewis | 2006-07-25 12:05:47 +0200 (Tue, 25 Jul 2006) | 3 lines

  Bug #978833: Really close underlying socket in _socketobject.close.
  Will backport to 2.4.
........
  r50817 | martin.v.loewis | 2006-07-25 12:11:14 +0200 (Tue, 25 Jul 2006) | 1 line

  Revert incomplete checkin.
........
  r50819 | georg.brandl | 2006-07-25 12:22:34 +0200 (Tue, 25 Jul 2006) | 4 lines

  Patch #1525766: correctly pass onerror arg to recursive calls
  of pkg.walk_packages. Also improve the docstrings.
........
  r50825 | brett.cannon | 2006-07-25 19:32:20 +0200 (Tue, 25 Jul 2006) | 2 lines

  Add comment for changes to test_ossaudiodev.
........
  r50826 | brett.cannon | 2006-07-25 19:34:36 +0200 (Tue, 25 Jul 2006) | 3 lines

  Fix a bug in the messages for an assert failure where not enough arguments to a string
  were being converted in the format.
........
  r50828 | armin.rigo | 2006-07-25 20:09:57 +0200 (Tue, 25 Jul 2006) | 2 lines

  Document why is and is not a good way to fix the gc_inspection crasher.
........
  r50829 | armin.rigo | 2006-07-25 20:11:07 +0200 (Tue, 25 Jul 2006) | 5 lines

  Added another crasher, which hit me today (I was not intentionally
  writing such code, of course, but it took some gdb time to figure out
  what my bug was).
........
  r50830 | armin.rigo | 2006-07-25 20:38:39 +0200 (Tue, 25 Jul 2006) | 3 lines

  Document the crashers that will not go away soon as "won't fix",
  and explain why.
........
  r50831 | ronald.oussoren | 2006-07-25 21:13:35 +0200 (Tue, 25 Jul 2006) | 3 lines

  Install the compatibility symlink to libpython.a on OSX using 'ln -sf' instead
  of 'ln -s'; this avoids problems when reinstalling Python.
........
  r50832 | ronald.oussoren | 2006-07-25 21:20:54 +0200 (Tue, 25 Jul 2006) | 7 lines

  Fix for bug #1525447 (renaming to MacOSmodule.c would also work, but not
  without causing problems for anyone that is on a case-insensitive filesystem).

  Setup.py tries to compile the MacOS extension from MacOSmodule.c, while the
  actual file is named macosmodule.c. This is no problem on the (default)
  case-insensitive filesystem, but doesn't work on case-sensitive filesystems.
........
  r50833 | ronald.oussoren | 2006-07-25 22:28:55 +0200 (Tue, 25 Jul 2006) | 7 lines

  Fix bug #1517990: IDLE keybindings on OSX

  This adds a new key definition for OSX, which is slightly different from the
  classic mac definition.

  Also add NEWS item for a couple of bugfixes I added recently.
........
  r50834 | tim.peters | 2006-07-26 00:30:24 +0200 (Wed, 26 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50839 | neal.norwitz | 2006-07-26 06:00:18 +0200 (Wed, 26 Jul 2006) | 1 line

  Hmm, only python2.x is installed, not plain python.  Did that change recently?
........
  r50840 | barry.warsaw | 2006-07-26 07:54:46 +0200 (Wed, 26 Jul 2006) | 6 lines

  Forward port some fixes that were in email 2.5 but for some reason didn't make
  it into email 4.0.  Specifically, in Message.get_content_charset(), handle RFC
  2231 headers that contain an encoding not known to Python, or a character in
  the data that isn't in the charset encoding.  Also forward port the
  appropriate unit tests.
........
  r50841 | georg.brandl | 2006-07-26 09:23:32 +0200 (Wed, 26 Jul 2006) | 3 lines

  NEWS entry for #1525766.
........
  r50842 | georg.brandl | 2006-07-26 09:40:17 +0200 (Wed, 26 Jul 2006) | 3 lines

  Bug #1459963: properly capitalize HTTP header names.
........
  r50843 | georg.brandl | 2006-07-26 10:03:10 +0200 (Wed, 26 Jul 2006) | 6 lines

  Part of bug #1523610: fix miscalculation of buffer length.

  Also add a guard against NULL in converttuple and add a test case
  (that previously would have crashed).
........
  r50844 | martin.v.loewis | 2006-07-26 14:12:56 +0200 (Wed, 26 Jul 2006) | 3 lines

  Bug #978833: Really close underlying socket in _socketobject.close.
  Fix httplib.HTTPConnection.getresponse to not close the
  socket if it is still needed for the response.
........
  r50845 | andrew.kuchling | 2006-07-26 19:16:52 +0200 (Wed, 26 Jul 2006) | 1 line

  [Bug #1471938] Fix build problem on Solaris 8 by conditionalizing the use of mvwgetnstr(); it was conditionalized a few lines below.  Fix from Paul Eggert.  I also tried out the STRICT_SYSV_CURSES case and am therefore removing the 'untested' comment.
........
  r50846 | andrew.kuchling | 2006-07-26 19:18:01 +0200 (Wed, 26 Jul 2006) | 1 line

  Correct error message
........
  r50847 | andrew.kuchling | 2006-07-26 19:19:39 +0200 (Wed, 26 Jul 2006) | 1 line

  Minor grammar fix
........
  r50848 | andrew.kuchling | 2006-07-26 19:22:21 +0200 (Wed, 26 Jul 2006) | 1 line

  Put news item in right section
........
  r50850 | andrew.kuchling | 2006-07-26 20:03:12 +0200 (Wed, 26 Jul 2006) | 1 line

  Use sys.exc_info()
........
  r50851 | andrew.kuchling | 2006-07-26 20:15:45 +0200 (Wed, 26 Jul 2006) | 1 line

  Use sys.exc_info()
........
  r50852 | phillip.eby | 2006-07-26 21:48:27 +0200 (Wed, 26 Jul 2006) | 4 lines

  Allow the 'onerror' argument to walk_packages() to catch any Exception, not
  just ImportError.  This allows documentation tools to better skip unimportable
  packages.
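
  A hedged sketch of how a documentation tool might use this, written
  against the current pkgutil.walk_packages() signature.  The package name
  demo_broken_pkg and both helper functions are made up for illustration.

```python
import os
import pkgutil
import sys
import tempfile


def list_modules_skipping_broken(root):
    """Walk packages under root, recording packages that fail to import."""
    failed = []
    found = [info.name
             for info in pkgutil.walk_packages([root], onerror=failed.append)]
    return found, failed


def demo():
    """Build a throwaway package whose import raises a non-ImportError."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_broken_pkg")   # hypothetical name
        os.mkdir(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w") as fh:
            fh.write("raise ValueError('not importable')\n")
        sys.path.insert(0, root)
        try:
            return list_modules_skipping_broken(root)
        finally:
            sys.path.remove(root)
            sys.modules.pop("demo_broken_pkg", None)
```

  The broken package is still listed, but its failure is handed to the
  onerror callback instead of aborting the walk.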
........
  r50854 | tim.peters | 2006-07-27 01:23:15 +0200 (Thu, 27 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50855 | tim.peters | 2006-07-27 03:14:53 +0200 (Thu, 27 Jul 2006) | 21 lines

  Bug #1521947:  possible bug in mystrtol.c with recent gcc.

  In general, C doesn't define anything about what happens when
  an operation on a signed integral type overflows, and PyOS_strtol()
  did several formally undefined things of that nature on signed
  longs.  Some version of gcc apparently tries to exploit that now,
  and PyOS_strtol() could fail to detect overflow then.

  Tried to repair all that, although it seems at least as likely to me
  that we'll get screwed by bad platform definitions for LONG_MIN
  and/or LONG_MAX now.  For that reason, I don't recommend backporting
  this.

  Note that I have no box on which this makes a lick of difference --
  can't really test it, except to note that it didn't break anything
  on my boxes.

  Silent change:  PyOS_strtol() used to return the hard-coded 0x7fffffff
  in case of overflow.  Now it returns LONG_MAX.  They're the same only on
  32-bit boxes (although C doesn't guarantee that either ...).
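
  For illustration only: one common way to detect overflow without ever
  performing an overflowing signed operation is to test the accumulator
  against the limit before multiplying.  This is a Python sketch of that
  general technique, not the code in mystrtol.c; strtol_clamped and its
  long_bits parameter are hypothetical.

```python
import string

_DIGITS = string.digits + string.ascii_lowercase


def strtol_clamped(text, base=10, long_bits=64):
    """Parse an integer prefix of text, clamping to the 'long' range.

    Returns (value, overflowed).
    """
    long_max = (1 << (long_bits - 1)) - 1
    long_min = -long_max - 1

    s = text.strip()
    negative = s.startswith("-")
    if s[:1] in ("+", "-"):
        s = s[1:]

    # Largest magnitude that is still representable for this sign.
    limit = long_max + 1 if negative else long_max
    acc = 0
    for ch in s:
        digit = _DIGITS.find(ch.lower())
        if digit < 0 or digit >= base:
            break                      # stop at the first invalid character
        # Check *before* computing acc * base + digit, so the overflowing
        # value is never formed.
        if acc > (limit - digit) // base:
            return (long_min if negative else long_max), True
        acc = acc * base + digit
    return (-acc if negative else acc), False
```

  With long_bits=32 the clamped result is 0x7fffffff, and with long_bits=64
  it is not, which is the difference the "silent change" note refers to.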
........
  r50856 | neal.norwitz | 2006-07-27 05:51:58 +0200 (Thu, 27 Jul 2006) | 6 lines

  Don't kill a normal instance of python running on windows when checking
  to kill a cygwin instance.  build\\python.exe was matching a normal windows
  instance.  Prefix that with a \\ to ensure build is a directory and not
  PCbuild.  As discussed on python-dev.
........
  r50857 | neal.norwitz | 2006-07-27 05:55:39 +0200 (Thu, 27 Jul 2006) | 5 lines

  Closure can't be NULL at this point since we know it's a tuple.

  Reported by Klocwork # 74.
........
  r50858 | neal.norwitz | 2006-07-27 06:04:50 +0200 (Thu, 27 Jul 2006) | 1 line

  No functional change.  Add comment and assert to describe why there cannot be overflow which was reported by Klocwork.  Discussed on python-dev
........
  r50859 | martin.v.loewis | 2006-07-27 08:38:16 +0200 (Thu, 27 Jul 2006) | 3 lines

  Bump distutils version to 2.5, as several new features
  have been introduced since 2.4.
........
  r50860 | andrew.kuchling | 2006-07-27 14:18:20 +0200 (Thu, 27 Jul 2006) | 1 line

  Reformat docstring; fix typo
........
  r50861 | georg.brandl | 2006-07-27 17:05:36 +0200 (Thu, 27 Jul 2006) | 6 lines

  Add test_main() methods. These three tests were never run
  by regrtest.py.

  We really need a simpler testing framework.
........
  r50862 | tim.peters | 2006-07-27 17:09:20 +0200 (Thu, 27 Jul 2006) | 2 lines

  News for patch #1529686.
........
  r50863 | tim.peters | 2006-07-27 17:11:00 +0200 (Thu, 27 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50864 | georg.brandl | 2006-07-27 17:38:33 +0200 (Thu, 27 Jul 2006) | 3 lines

  Amend news entry.
........
  r50865 | georg.brandl | 2006-07-27 18:08:15 +0200 (Thu, 27 Jul 2006) | 3 lines

  Make uuid test suite pass on this box by requesting output with LC_ALL=C.
........
  r50866 | andrew.kuchling | 2006-07-27 20:37:33 +0200 (Thu, 27 Jul 2006) | 1 line

  Add example
........
  r50867 | thomas.heller | 2006-07-27 20:39:55 +0200 (Thu, 27 Jul 2006) | 9 lines

  Remove code that is no longer used (ctypes.com).

  Fix the DllGetClassObject and DllCanUnloadNow so that they forward the
  call to the comtypes.server.inprocserver module.

  The latter was never documented, never used by published code, and
  didn't work anyway, so I think it does not deserve a NEWS entry (but I
  might be wrong).
........
  r50868 | andrew.kuchling | 2006-07-27 20:41:21 +0200 (Thu, 27 Jul 2006) | 1 line

  Typo fix ('publically' is rare, poss. non-standard)
........
  r50869 | andrew.kuchling | 2006-07-27 20:42:41 +0200 (Thu, 27 Jul 2006) | 1 line

  Add missing word
........
  r50870 | andrew.kuchling | 2006-07-27 20:44:10 +0200 (Thu, 27 Jul 2006) | 1 line

  Repair typos
........
  r50872 | andrew.kuchling | 2006-07-27 20:53:33 +0200 (Thu, 27 Jul 2006) | 1 line

  Update URL; add example
........
  r50873 | andrew.kuchling | 2006-07-27 21:07:29 +0200 (Thu, 27 Jul 2006) | 1 line

  Add punctuation mark; add some examples
........
  r50874 | andrew.kuchling | 2006-07-27 21:11:07 +0200 (Thu, 27 Jul 2006) | 1 line

  Mention base64 module; rewrite last sentence to be more positive
........
  r50875 | andrew.kuchling | 2006-07-27 21:12:49 +0200 (Thu, 27 Jul 2006) | 1 line

  If binhex is higher-level than binascii, it should come first in the chapter
........
  r50876 | tim.peters | 2006-07-27 22:47:24 +0200 (Thu, 27 Jul 2006) | 28 lines

  check_node():  stop spraying mystery output to stderr.

  When a node number disagrees, keep track of all sources & the
  node numbers they reported, and stick all that in the error message.

  Changed all callers to supply a non-empty "source" argument; made
  the "source" argument non-optional.

  On my box, test_uuid still fails, but with the less confusing output:

  AssertionError: different sources disagree on node:
      from source 'getnode1', node was 00038a000015
      from source 'getnode2', node was 00038a000015
      from source 'ipconfig', node was 001111b2b7bf

  Only the last one appears to be correct; e.g.,

  C:\Code\python\PCbuild>getmac

  Physical Address    Transport Name
  =================== ==========================================================
  00-11-11-B2-B7-BF   \Device\Tcpip_{190FB163-5AFD-4483-86A1-2FE16AC61FF1}
  62-A1-AC-6C-FD-BE   \Device\Tcpip_{8F77DF5A-EA3D-4F1D-975E-D472CEE6438A}
  E2-1F-01-C6-5D-88   \Device\Tcpip_{CD18F76B-2EF3-409F-9B8A-6481EE70A1E4}

  I can't find anything on my box with MAC 00-03-8a-00-00-15, and am
  not clear on where that comes from.
........
  r50878 | andrew.kuchling | 2006-07-28 00:40:05 +0200 (Fri, 28 Jul 2006) | 1 line

  Reword paragraph
........
  r50879 | andrew.kuchling | 2006-07-28 00:49:38 +0200 (Fri, 28 Jul 2006) | 1 line

  Add example
........
  r50880 | andrew.kuchling | 2006-07-28 00:49:54 +0200 (Fri, 28 Jul 2006) | 1 line

  Add example
........
  r50881 | barry.warsaw | 2006-07-28 01:43:15 +0200 (Fri, 28 Jul 2006) | 27 lines

  Patch #1520294: Support for getset and member descriptors in types.py,
  inspect.py, and pydoc.py.  Specifically, this allows for querying the type of
  an object against these built-in C types and more importantly, for getting
  their docstrings printed in the interactive interpreter's help() function.

  This patch includes a new built-in module called _types which provides
  definitions of getset and member descriptors for use by the types.py module.
  These types are exposed as types.GetSetDescriptorType and
  types.MemberDescriptorType.  Query functions are provided as
  inspect.isgetsetdescriptor() and inspect.ismemberdescriptor().  The
  implementations of these are robust enough to work with Python implementations
  other than CPython, which may not have these fundamental types.

  The patch also includes documentation and test suite updates.

  I commit these changes now under these guiding principles:

  1. Silence is assent.  The release manager has not said "no", and of the few
     people that cared enough to respond to the thread, the worst vote was "0".

  2. It's easier to ask for forgiveness than permission.

  3. It's so dang easy to revert stuff in svn, that you could view this as a
     forcing function. :)

  Windows build patches will follow.
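
  A small usage sketch of the query functions named above.  The helper
  describe_descriptor is hypothetical, and the attributes used to exercise
  it are ones that happen to be getset and member descriptors on CPython;
  other implementations may classify them differently.

```python
import inspect
import types


def describe_descriptor(obj):
    """Classify obj using the inspect query functions."""
    if inspect.isgetsetdescriptor(obj):
        return "getset"
    if inspect.ismemberdescriptor(obj):
        return "member"
    return "other"


# On CPython, function.__code__ is a getset descriptor and
# function.__globals__ is a member descriptor.
kinds = {
    "__code__": describe_descriptor(types.FunctionType.__code__),
    "__globals__": describe_descriptor(types.FunctionType.__globals__),
}
```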
........
  r50882 | tim.peters | 2006-07-28 01:44:37 +0200 (Fri, 28 Jul 2006) | 4 lines

  Bug #1529297:  The rewrite of doctest for Python 2.4 unintentionally
  lost that tests are sorted by name before being run.  ``DocTestFinder``
  has been changed to sort the list of tests it returns.
........
  r50883 | tim.peters | 2006-07-28 01:45:48 +0200 (Fri, 28 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50884 | tim.peters | 2006-07-28 01:46:36 +0200 (Fri, 28 Jul 2006) | 2 lines

  Add missing svn:eol-style property to text files.
........
  r50885 | barry.warsaw | 2006-07-28 01:50:40 +0200 (Fri, 28 Jul 2006) | 4 lines

  Enable the building of the _types module on Windows.

  Note that this has only been tested for VS 2003 since that's all I have.
........
  r50887 | tim.peters | 2006-07-28 02:23:15 +0200 (Fri, 28 Jul 2006) | 7 lines

  defdict_reduce():  Plug leaks.

  We didn't notice these before because test_defaultdict didn't
  actually do anything before Georg fixed that earlier today.
  Neal's next refleak run then showed test_defaultdict leaking
  9 references on each run.  That's repaired by this checkin.
........
  r50888 | tim.peters | 2006-07-28 02:30:00 +0200 (Fri, 28 Jul 2006) | 2 lines

  News about the repaired memory leak in defaultdict.
........
  r50889 | gregory.p.smith | 2006-07-28 03:35:25 +0200 (Fri, 28 Jul 2006) | 7 lines

  - pybsddb Bug #1527939: bsddb module DBEnv dbremove and dbrename
    methods now allow their database parameter to be None as the
    sleepycat API allows.

  Also adds an appropriate test case for DBEnv.dbrename and dbremove.
........
  r50895 | neal.norwitz | 2006-07-28 06:22:34 +0200 (Fri, 28 Jul 2006) | 1 line

  Ensure the actual number matches the expected count
........
  r50896 | tim.peters | 2006-07-28 06:51:59 +0200 (Fri, 28 Jul 2006) | 6 lines

  Live with that "the hardware address" is an ill-defined
  concept, and that different ways of trying to find "the
  hardware address" may return different results.  Certainly
  true on both of my Windows boxes, and in different ways
  (see whining on python-dev).
........
  r50897 | neal.norwitz | 2006-07-28 09:21:27 +0200 (Fri, 28 Jul 2006) | 3 lines

  Try to find the MAC addr on various flavours of Unix.  This seems hopeless.
  This reduces the test_uuid failures, but there's still another method failing.
........
  r50898 | martin.v.loewis | 2006-07-28 09:45:49 +0200 (Fri, 28 Jul 2006) | 2 lines

  Add UUID for upcoming 2.5b3.
........
  r50899 | matt.fleming | 2006-07-28 13:27:27 +0200 (Fri, 28 Jul 2006) | 3 lines

  Allow socketmodule to compile on NetBSD -current, whose bluetooth API
  differs from both Linux and FreeBSD. Accepted by Neal Norwitz.
........
  r50900 | andrew.kuchling | 2006-07-28 14:07:12 +0200 (Fri, 28 Jul 2006) | 1 line

  [Patch #1529811] Correction to description of r|* mode
........
  r50901 | andrew.kuchling | 2006-07-28 14:18:22 +0200 (Fri, 28 Jul 2006) | 1 line

  Typo fix
........
  r50902 | andrew.kuchling | 2006-07-28 14:32:43 +0200 (Fri, 28 Jul 2006) | 1 line

  Add example
........
  r50903 | andrew.kuchling | 2006-07-28 14:33:19 +0200 (Fri, 28 Jul 2006) | 1 line

  Add example
........
  r50904 | andrew.kuchling | 2006-07-28 14:45:55 +0200 (Fri, 28 Jul 2006) | 1 line

  Don't overwrite built-in name; add some blank lines for readability
........
  r50905 | andrew.kuchling | 2006-07-28 14:48:07 +0200 (Fri, 28 Jul 2006) | 1 line

  Add example.  Should I propagate this example to all the other DBM-ish modules, too?
........
  r50912 | georg.brandl | 2006-07-28 20:31:39 +0200 (Fri, 28 Jul 2006) | 3 lines

  Patch #1529686: also run test_email_codecs with regrtest.py.
........
  r50913 | georg.brandl | 2006-07-28 20:36:01 +0200 (Fri, 28 Jul 2006) | 3 lines

  Fix spelling.
........
  r50915 | thomas.heller | 2006-07-28 21:42:40 +0200 (Fri, 28 Jul 2006) | 3 lines

  Remove a useless XXX comment.
  Cosmetic changes to the code so that the #ifdef _UNICODE block
  doesn't mess emacs code formatting.
........
  r50916 | phillip.eby | 2006-07-28 23:12:07 +0200 (Fri, 28 Jul 2006) | 5 lines

  Bug #1529871: The speed enhancement patch #921466 broke Python's compliance
  with PEP 302.  This was fixed by adding an ``imp.NullImporter`` type that is
  used in ``sys.path_importer_cache`` to cache non-directory paths and avoid
  excessive filesystem operations during imports.
........
  r50917 | phillip.eby | 2006-07-28 23:31:54 +0200 (Fri, 28 Jul 2006) | 2 lines

  Fix svn merge spew.
........
  r50918 | thomas.heller | 2006-07-28 23:43:20 +0200 (Fri, 28 Jul 2006) | 4 lines

  Patch #1529514: More openbsd platforms for ctypes.
  Regenerated Modules/_ctypes/libffi/configure with autoconf 2.59.

  Approved by Neal.
........
  r50922 | georg.brandl | 2006-07-29 10:51:21 +0200 (Sat, 29 Jul 2006) | 2 lines

  Bug #835255: The "closure" argument to new.function() is now documented.
........
  r50924 | georg.brandl | 2006-07-29 11:33:26 +0200 (Sat, 29 Jul 2006) | 3 lines

  Bug #1441397: The compiler module now recognizes module and function
  docstrings correctly as it did in Python 2.4.
........
  r50925 | georg.brandl | 2006-07-29 12:25:46 +0200 (Sat, 29 Jul 2006) | 4 lines

  Revert rev 42617, it was introduced to work around bug #1441397.
  test_compiler now passes again.
........
  r50926 | fred.drake | 2006-07-29 15:22:49 +0200 (Sat, 29 Jul 2006) | 1 line

  update target version number
........
  r50927 | andrew.kuchling | 2006-07-29 15:56:48 +0200 (Sat, 29 Jul 2006) | 1 line

  Add example
........
  r50928 | andrew.kuchling | 2006-07-29 16:04:47 +0200 (Sat, 29 Jul 2006) | 1 line

  Update URL
........
  r50930 | andrew.kuchling | 2006-07-29 16:08:15 +0200 (Sat, 29 Jul 2006) | 1 line

  Reword paragraph to match the order of the subsequent sections
........
  r50931 | andrew.kuchling | 2006-07-29 16:21:15 +0200 (Sat, 29 Jul 2006) | 1 line

  [Bug #1529157] Mention raw_input() and input(); while I'm at it, reword the description a bit
........
  r50932 | andrew.kuchling | 2006-07-29 16:42:48 +0200 (Sat, 29 Jul 2006) | 1 line

  [Bug #1519571] Document some missing functions: setup(), title(), done()
........
  r50933 | andrew.kuchling | 2006-07-29 16:43:55 +0200 (Sat, 29 Jul 2006) | 1 line

  Fix docstring punctuation
........
  r50934 | andrew.kuchling | 2006-07-29 17:10:32 +0200 (Sat, 29 Jul 2006) | 1 line

  [Bug #1414697] Change docstring of set/frozenset types to specify that the contents are unique.  Raymond, please feel free to edit or revert.
........
  r50935 | andrew.kuchling | 2006-07-29 17:35:21 +0200 (Sat, 29 Jul 2006) | 1 line

  [Bug #1530382] Document SSL.server(), .issuer() methods
........
  r50936 | andrew.kuchling | 2006-07-29 17:42:46 +0200 (Sat, 29 Jul 2006) | 1 line

  Typo fix
........
  r50937 | andrew.kuchling | 2006-07-29 17:43:13 +0200 (Sat, 29 Jul 2006) | 1 line

  Tweak wording
........
  r50938 | matt.fleming | 2006-07-29 17:55:30 +0200 (Sat, 29 Jul 2006) | 2 lines

  Fix typo
........
  r50939 | andrew.kuchling | 2006-07-29 17:57:08 +0200 (Sat, 29 Jul 2006) | 6 lines

  [Bug #1528258] Mention that the 'data' argument can be None.

  The constructor docs referred the reader to the add_data() method's docs,
  but they weren't very helpful.  I've simply copied an earlier explanation
  of 'data' that's more useful.
........
  r50940 | andrew.kuchling | 2006-07-29 18:08:40 +0200 (Sat, 29 Jul 2006) | 1 line

  Set bug/patch count.  Take a bow, everyone!
........
  r50941 | fred.drake | 2006-07-29 18:56:15 +0200 (Sat, 29 Jul 2006) | 18 lines

  expunge the xmlcore changes:
    41667, 41668 - initial switch to xmlcore
    47044        - mention of xmlcore in What's New
    50687        - mention of xmlcore in the library reference

  re-apply xmlcore changes to xml:
    41674        - line ending changes (re-applied manually), directory props
    41677        - add cElementTree wrapper
    41678        - PSF licensing for etree
    41812        - whitespace normalization
    42724        - fix svn:eol-style settings
    43681, 43682 - remove Python version-compatibility cruft from minidom
    46773        - fix encoding of \r\n\t in attr values in saxutils
    47269        - added XMLParser alias for cElementTree compatibility

  additional tests were added in Lib/test/test_sax.py that failed with
  the xmlcore changes; these relate to SF bugs #1511497, #1513611
........
  r50942 | andrew.kuchling | 2006-07-29 20:14:07 +0200 (Sat, 29 Jul 2006) | 17 lines

  Reorganize the docs for 'file' and 'open()' after some discussion with Fred.

  We want to encourage users to write open() when opening a file, but
  open() was described with a single paragraph and
  'file' had lots of explanation of the mode and bufsize arguments.

  I've shrunk the description of 'file' to cross-reference to the 'File
  objects' section, and to open() for an explanation of the arguments.

  open() now has all the paragraphs about the mode string.  The bufsize
  argument was moved up so that it isn't buried at the end; now there's
  1 paragraph on mode, 1 on bufsize, and then 3 more on mode.  Various
  other edits and rearrangements were made in the process.

  It's probably best to read the final text and not to try to make sense
  of the diffs.
........
  r50943 | fred.drake | 2006-07-29 20:19:19 +0200 (Sat, 29 Jul 2006) | 1 line

  restore test un-intentionally removed in the xmlcore purge (revision 50941)
........
  r50944 | fred.drake | 2006-07-29 20:33:29 +0200 (Sat, 29 Jul 2006) | 3 lines

  make the reference to older versions of the documentation a link
  to the right page on python.org
........
  r50945 | fred.drake | 2006-07-29 21:09:01 +0200 (Sat, 29 Jul 2006) | 1 line

  document the footnote usage pattern
........
  r50947 | fred.drake | 2006-07-29 21:14:10 +0200 (Sat, 29 Jul 2006) | 1 line

  emphasize an oddball nuance of LaTeX comment syntax
........
  r50948 | andrew.kuchling | 2006-07-29 21:24:04 +0200 (Sat, 29 Jul 2006) | 1 line

  [Patch #1490989 from Skip Montanaro]  Mention debugging builds in the API documentation.  I've changed Skip's patch to point to Misc/SpecialBuilds and fiddled with the markup a bit.
........
  r50949 | neal.norwitz | 2006-07-29 21:29:35 +0200 (Sat, 29 Jul 2006) | 6 lines

  Disable these tests until they are reliable across platforms.
  These problems may mask more important, real problems.

  One or both methods are known to fail on: Solaris, OpenBSD, Debian, Ubuntu.
  They pass on Windows and some Linux boxes.
........
  r50950 | andrew.kuchling | 2006-07-29 21:50:37 +0200 (Sat, 29 Jul 2006) | 1 line

  [Patch #1068277] Clarify that os.path.exists() can return False depending on permissions.  Fred approved committing this patch in December 2004!
........
  r50952 | fred.drake | 2006-07-29 22:04:42 +0200 (Sat, 29 Jul 2006) | 6 lines

  SF bug #1193966: Weakref types documentation misplaced

  The information about supporting weakrefs with types defined in C extensions
  is moved to the Extending & Embedding manual.  Py_TPFLAGS_HAVE_WEAKREFS is
  no longer mentioned since it is part of Py_TPFLAGS_DEFAULT.
........
  r50953 | skip.montanaro | 2006-07-29 22:06:05 +0200 (Sat, 29 Jul 2006) | 4 lines

  Add a comment to the csv reader documentation that explains why the
  treatment of newlines changed in 2.5.  Pulled almost verbatim from a comment
  by Andrew McNamara in <http://python.org/sf/1465014>.
........
  r50954 | neal.norwitz | 2006-07-29 22:20:52 +0200 (Sat, 29 Jul 2006) | 3 lines

  If the executable doesn't exist, there's no reason to try to start it.
  This prevents garbage about command not found being printed on Solaris.
........
  r50955 | fred.drake | 2006-07-29 22:21:25 +0200 (Sat, 29 Jul 2006) | 1 line

  fix minor markup error that introduced extra punctuation
........
  r50957 | neal.norwitz | 2006-07-29 22:37:08 +0200 (Sat, 29 Jul 2006) | 3 lines

  Disable test_getnode too, since this is also unreliable.
........
  r50958 | andrew.kuchling | 2006-07-29 23:27:12 +0200 (Sat, 29 Jul 2006) | 1 line

  Follow TeX's conventions for hyphens
........
  r50959 | andrew.kuchling | 2006-07-29 23:30:21 +0200 (Sat, 29 Jul 2006) | 1 line

  Fix case for 'Unix'
........
  r50960 | fred.drake | 2006-07-30 01:34:57 +0200 (Sun, 30 Jul 2006) | 1 line

  markup cleanups
........
  r50961 | andrew.kuchling | 2006-07-30 02:27:34 +0200 (Sun, 30 Jul 2006) | 1 line

  Minor typo fixes
........
  r50962 | andrew.kuchling | 2006-07-30 02:37:56 +0200 (Sun, 30 Jul 2006) | 1 line

  [Bug #793553] Correct description of keyword arguments for SSL authentication
........
  r50963 | tim.peters | 2006-07-30 02:58:15 +0200 (Sun, 30 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50964 | fred.drake | 2006-07-30 05:03:43 +0200 (Sun, 30 Jul 2006) | 1 line

  lots of markup nits, most commonly Unix/unix --> \UNIX
........
  r50965 | fred.drake | 2006-07-30 07:41:28 +0200 (Sun, 30 Jul 2006) | 1 line

  update information on wxPython, from Robin Dunn
........
  r50966 | fred.drake | 2006-07-30 07:49:49 +0200 (Sun, 30 Jul 2006) | 4 lines

  remove possibly-outdated comment on what GUI toolkit is most commonly used;
  it is hard to know whether this is right, and it does not add valuable reference information
  at any rate
........
  r50967 | fred.drake | 2006-07-30 07:55:39 +0200 (Sun, 30 Jul 2006) | 3 lines

  - remove yet another reference to how commonly Tkinter is (thought to be) used
  - fix an internal section reference
........
  r50968 | neal.norwitz | 2006-07-30 08:53:31 +0200 (Sun, 30 Jul 2006) | 4 lines

  Patch #1531113: Fix augmented assignment with yield expressions.
  Also fix a SystemError when trying to assign to yield expressions.
........
  r50969 | neal.norwitz | 2006-07-30 08:55:48 +0200 (Sun, 30 Jul 2006) | 5 lines

  Add PyErr_WarnEx() so C code can pass the stacklevel to warnings.warn().
  This provides the proper warning for struct.pack().
  PyErr_Warn() is now deprecated in favor of PyErr_WarnEx().
  As mentioned by Tim Peters on python-dev.
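
  The Python-level counterpart of the stacklevel argument can be sketched
  as follows.  This shows warnings.warn() rather than the C function, and
  old_api and call_and_capture are hypothetical names used only here.

```python
import warnings


def old_api():
    """Hypothetical deprecated function used only for illustration."""
    # stacklevel=2 attributes the warning to whoever called old_api(),
    # rather than to this line inside old_api() itself.
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_and_capture():
    """Call old_api() and return its result plus the recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = old_api()
    return value, caught
```

  With stacklevel=1 (the default) the warning would point at the library
  code, which is rarely what the person reading the message needs.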
........
  r50970 | neal.norwitz | 2006-07-30 08:57:04 +0200 (Sun, 30 Jul 2006) | 3 lines

  Bug #1515471: string.replace() accepts character buffers again.
  Pass the char* and size around rather than PyObject's.
........
  r50971 | neal.norwitz | 2006-07-30 08:59:13 +0200 (Sun, 30 Jul 2006) | 1 line

  Whitespace normalization
........
  r50973 | georg.brandl | 2006-07-30 12:53:32 +0200 (Sun, 30 Jul 2006) | 3 lines

  Clarify that __op__ methods must return NotImplemented if they don't support the operation.
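
  A short sketch of the documented rule, using a hypothetical Meters class.
  Returning NotImplemented (rather than raising) lets the interpreter try
  the reflected method on the other operand before giving up.

```python
class Meters:
    """Hypothetical value type used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented   # let Python try other.__radd__(self)
        return Meters(self.value + other.value)

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented
        return self.value == other.value
```

  If neither operand supports the operation, the interpreter raises
  TypeError on the caller's behalf.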
........
  r50974 | georg.brandl | 2006-07-30 13:07:23 +0200 (Sun, 30 Jul 2006) | 3 lines

  Bug #1002398: The documentation for os.path.sameopenfile now correctly
  refers to file descriptors, not file objects.
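
  A minimal example of the corrected description, assuming a temporary
  file is acceptable for the demonstration.  The function name is made up;
  the point is that both arguments are integer descriptors.

```python
import os
import tempfile


def same_file_via_descriptors():
    """Compare two descriptors that refer to the same open file."""
    with tempfile.NamedTemporaryFile() as first:
        fd_copy = os.dup(first.fileno())   # second descriptor, same file
        try:
            # Pass descriptors (ints), not the file objects themselves.
            return os.path.sameopenfile(first.fileno(), fd_copy)
        finally:
            os.close(fd_copy)
```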
........
  r50977 | martin.v.loewis | 2006-07-30 15:00:31 +0200 (Sun, 30 Jul 2006) | 3 lines

  Don't copy directory stat times in shutil.copytree on Windows
  Fixes #1525866.
........
  r50978 | martin.v.loewis | 2006-07-30 15:14:05 +0200 (Sun, 30 Jul 2006) | 3 lines

  Base __version__ on sys.version_info, as distutils is
  no longer maintained separately.
........
  r50979 | martin.v.loewis | 2006-07-30 15:27:31 +0200 (Sun, 30 Jul 2006) | 3 lines

  Mention Cygwin in distutils error message about a missing VS 2003.
  Fixes #1257728.
........
  r50982 | martin.v.loewis | 2006-07-30 16:09:47 +0200 (Sun, 30 Jul 2006) | 5 lines

  Drop usage of test -e in configure as it is not portable.
  Fixes #1439538
  Will backport to 2.4
  Also regenerate pyconfig.h.in.
........
  r50984 | georg.brandl | 2006-07-30 18:20:10 +0200 (Sun, 30 Jul 2006) | 3 lines

  Fix makefile changes for python-config.
........
  r50985 | george.yoshida | 2006-07-30 18:37:37 +0200 (Sun, 30 Jul 2006) | 2 lines

  Rename struct.pack_to to struct.pack_into as changed in revision 46642.
........
  r50986 | george.yoshida | 2006-07-30 18:41:30 +0200 (Sun, 30 Jul 2006) | 2 lines

  Typo fix
........
  r50987 | neal.norwitz | 2006-07-30 21:18:13 +0200 (Sun, 30 Jul 2006) | 1 line

  Add some asserts and update comments
........
  r50988 | neal.norwitz | 2006-07-30 21:18:38 +0200 (Sun, 30 Jul 2006) | 1 line

  Verify that the signal handlers were really called
........
  r50989 | neal.norwitz | 2006-07-30 21:20:42 +0200 (Sun, 30 Jul 2006) | 3 lines

  Try to prevent hangs on Tru64/Alpha buildbot.  I'm not certain this will help
  and may need to be reverted if it causes problems.
........
  r50990 | georg.brandl | 2006-07-30 22:18:51 +0200 (Sun, 30 Jul 2006) | 2 lines

  Bug #1531349: right <-> left glitch in __rop__ description.
........
  r50992 | tim.peters | 2006-07-31 03:46:03 +0200 (Mon, 31 Jul 2006) | 2 lines

  Whitespace normalization.
........
  r50993 | andrew.mcnamara | 2006-07-31 04:27:48 +0200 (Mon, 31 Jul 2006) | 2 lines

  Redo the comment about the 2.5 change in quoted-newline handling.
........
  r50994 | tim.peters | 2006-07-31 04:40:23 +0200 (Mon, 31 Jul 2006) | 10 lines

  ZipFile.close():  Killed one of the struct.pack deprecation
  warnings on Win32.

  Also added an XXX about the line:

                  pos3 = self.fp.tell()

  `pos3` is never referenced, and I have no idea what the code
  intended to do instead.
........
  r50996 | tim.peters | 2006-07-31 04:53:03 +0200 (Mon, 31 Jul 2006) | 8 lines

  ZipFile.close():  Kill the other struct.pack deprecation
  warning on Windows.

  Afraid I can't detect a pattern to when the pack formats decide
  to use a signed or unsigned format code -- appears nearly
  arbitrary to my eyes.  So I left all the pack formats alone and
  changed the special-case data values instead.
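
  As an illustration of changing the data value rather than the format
  code: masking a possibly negative value into the unsigned 32-bit range
  keeps an unsigned format happy.  This sketch is not the zipfile code;
  pack_u32 is a hypothetical helper, and in current Python an out-of-range
  value raises struct.error rather than a deprecation warning.

```python
import struct


def pack_u32(value):
    """Pack value as a little-endian unsigned 32-bit field.

    Masking maps a negative value such as -1 onto its two's-complement
    bit pattern (0xFFFFFFFF), which the unsigned "L" code accepts.
    """
    return struct.pack("<L", value & 0xFFFFFFFF)
```

  The bytes are the same ones a signed "<l" format would have produced for
  the original negative value.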
........
  r50997 | skip.montanaro | 2006-07-31 05:09:45 +0200 (Mon, 31 Jul 2006) | 1 line

  minor tweaks
........
  r50998 | skip.montanaro | 2006-07-31 05:11:11 +0200 (Mon, 31 Jul 2006) | 1 line

  minor tweaks
........
  r50999 | andrew.kuchling | 2006-07-31 14:20:24 +0200 (Mon, 31 Jul 2006) | 1 line

  Add refcounts for PyErr_WarnEx
........
  r51000 | andrew.kuchling | 2006-07-31 14:39:05 +0200 (Mon, 31 Jul 2006) | 9 lines

  Document PyErr_WarnEx.  (Bad Neal!  No biscuit!)

  Is the explanation of the 'stacklevel' parameter clear?  Please feel free
  to edit it.

  I don't have LaTeX installed on this machine, so haven't verified that the
  markup is correct.  Will check tonight, or maybe the automatic doc build will
  tell me.
........
  r51001 | andrew.kuchling | 2006-07-31 14:52:26 +0200 (Mon, 31 Jul 2006) | 1 line

  Add PyErr_WarnEx()
........
  r51002 | andrew.kuchling | 2006-07-31 15:18:27 +0200 (Mon, 31 Jul 2006) | 1 line

  Mention csv newline changes
........
  r51003 | andrew.kuchling | 2006-07-31 17:22:58 +0200 (Mon, 31 Jul 2006) | 1 line

  Typo fix
........
  r51004 | andrew.kuchling | 2006-07-31 17:23:43 +0200 (Mon, 31 Jul 2006) | 1 line

  Remove reference to  notation
........
  r51005 | georg.brandl | 2006-07-31 18:00:34 +0200 (Mon, 31 Jul 2006) | 3 lines

  Fix function name.
........
  r51006 | andrew.kuchling | 2006-07-31 18:10:24 +0200 (Mon, 31 Jul 2006) | 1 line

  [Bug #1514540] Instead of putting the standard types in a section, put them in a chapter of their own.  This means string methods will now show up in the ToC.  (Should the types come before or after the functions+exceptions+constants chapter?  I've put them after, for now.)
........
  r51007 | andrew.kuchling | 2006-07-31 18:22:05 +0200 (Mon, 31 Jul 2006) | 1 line

  [Bug #848556] Remove \d* from second alternative to avoid exponential case when repeating match
........
  r51008 | andrew.kuchling | 2006-07-31 18:27:57 +0200 (Mon, 31 Jul 2006) | 1 line

  Update list of files; fix a typo
........
  r51013 | andrew.kuchling | 2006-08-01 18:24:30 +0200 (Tue, 01 Aug 2006) | 1 line

  typo fix
........
  r51018 | thomas.heller | 2006-08-01 18:54:43 +0200 (Tue, 01 Aug 2006) | 2 lines

  Fix a potential segfault and various potential refcount leaks
  in the cast() function.
........
  r51020 | thomas.heller | 2006-08-01 19:46:10 +0200 (Tue, 01 Aug 2006) | 1 line

  Minimal useful docstring for CopyComPointer.
........
  r51021 | andrew.kuchling | 2006-08-01 20:16:15 +0200 (Tue, 01 Aug 2006) | 8 lines

  [Patch #1520905] Attempt to suppress core file created by test_subprocess.py.
  Patch by Douglas Greiman.

  The test_run_abort() testcase produces a core file on Unix systems,
  even though the test is successful. This can be confusing or alarming
  to someone who runs 'make test' and then finds that the Python
  interpreter apparently crashed.
........
  r51023 | georg.brandl | 2006-08-01 20:49:24 +0200 (Tue, 01 Aug 2006) | 3 lines

  os.urandom no longer masks unrelated exceptions like SystemExit or
  KeyboardInterrupt.
........
  r51025 | thomas.heller | 2006-08-01 21:14:15 +0200 (Tue, 01 Aug 2006) | 2 lines

  Speed up PyType_stgdict and PyObject_stgdict.
........
  r51027 | ronald.oussoren | 2006-08-01 22:30:31 +0200 (Tue, 01 Aug 2006) | 3 lines

  Make sure the postinstall action that optionally updates the user's profile
  on MacOS X actually works correctly in all cases.
........
  r51028 | ronald.oussoren | 2006-08-01 23:00:57 +0200 (Tue, 01 Aug 2006) | 4 lines

  This fixes bug #1527397: PythonLauncher runs scripts with the wrong working
  directory. It also fixes a bug where PythonLauncher failed to launch scripts
  when the scriptname (or the path to the script) contains quotes.
........
  r51031 | tim.peters | 2006-08-02 05:27:46 +0200 (Wed, 02 Aug 2006) | 2 lines

  Whitespace normalization.
........
  r51032 | tim.peters | 2006-08-02 06:12:36 +0200 (Wed, 02 Aug 2006) | 19 lines

  Try to squash struct.pack warnings on the "amd64 gentoo trunk"
  buildbot (& possibly other 64-bit boxes) during test_gzip.

  The native zlib crc32 function returns an unsigned 32-bit integer,
  which the Python wrapper implicitly casts to C long.  Therefore the
  same crc can "look negative" on a 32-bit box but "look positive" on
  a 64-bit box.  This patch papers over that platform difference when
  writing the crc to file.

  It may be better to change the Python wrapper, either to make
  the result "look positive" on all platforms (which means it may
  have to return a Python long at times on a 32-bit box), or to
  keep the sign the same across boxes.  But that would be a visible
  change in what users see, while the current hack changes no
  visible behavior (well, apart from stopping the struct deprecation
  warning).

  Note that the module-level write32() function is no longer used.
........
  r51033 | neal.norwitz | 2006-08-02 06:27:11 +0200 (Wed, 02 Aug 2006) | 4 lines

  Prevent memory leak on error.

  Reported by Klocwork #36
........
  r51034 | tim.peters | 2006-08-02 07:20:08 +0200 (Wed, 02 Aug 2006) | 9 lines

  _Stream.close():  Try to kill struct.pack() warnings when
  writing the crc to file on the "PPC64 Debian trunk" buildbot
  when running test_tarfile.

  This is again a case where the native zlib crc is an unsigned
  32-bit int, but the Python wrapper implicitly casts it to
  signed C long, so that "the sign bit looks different" on
  different platforms.
........
  r51035 | ronald.oussoren | 2006-08-02 08:10:10 +0200 (Wed, 02 Aug 2006) | 2 lines

  Updated documentation for the script that builds the OSX installer.
........
  r51036 | neal.norwitz | 2006-08-02 08:14:22 +0200 (Wed, 02 Aug 2006) | 2 lines

  _PyWeakref_GetWeakrefCount() now returns a Py_ssize_t instead of long.
........
  r51037 | neal.norwitz | 2006-08-02 08:15:10 +0200 (Wed, 02 Aug 2006) | 1 line

  v is already checked for NULL, so just DECREF it
........
  r51038 | neal.norwitz | 2006-08-02 08:19:19 +0200 (Wed, 02 Aug 2006) | 1 line

  Let us know when there was a problem and the child had to kill the parent
........
  r51039 | neal.norwitz | 2006-08-02 08:46:21 +0200 (Wed, 02 Aug 2006) | 5 lines

  Patch #1519025 and bug #926423: If a KeyboardInterrupt occurs during
  a socket operation on a socket with a timeout, the exception will be
  caught correctly.  Previously, the exception was not caught.
........
  r51040 | neal.norwitz | 2006-08-02 09:09:32 +0200 (Wed, 02 Aug 2006) | 1 line

  Add some explanation about Klocwork and Coverity static analysis
........
  r51041 | anthony.baxter | 2006-08-02 09:43:09 +0200 (Wed, 02 Aug 2006) | 1 line

  pre-release machinations
........
  r51043 | thomas.heller | 2006-08-02 13:35:31 +0200 (Wed, 02 Aug 2006) | 4 lines

  A few more words about what ctypes does.
  Document that using the wrong calling convention can also raise
  'ValueError: Procedure called with the wrong number of arguments'.
........
  r51045 | thomas.heller | 2006-08-02 14:00:13 +0200 (Wed, 02 Aug 2006) | 1 line

  Fix a mistake.
........
  r51046 | martin.v.loewis | 2006-08-02 15:53:55 +0200 (Wed, 02 Aug 2006) | 3 lines

  Correction of patch #1455898: In the mbcs decoder, set final=False
  for stream decoder, but final=True for the decode function.
........
  r51049 | tim.peters | 2006-08-02 20:19:35 +0200 (Wed, 02 Aug 2006) | 2 lines

  Add missing svn:eol-style property to text files.
........
  r51079 | neal.norwitz | 2006-08-04 06:50:21 +0200 (Fri, 04 Aug 2006) | 3 lines

  Bug #1531405, format_exception no longer raises an exception if
  str(exception) raised an exception.
........
  r51080 | neal.norwitz | 2006-08-04 06:58:47 +0200 (Fri, 04 Aug 2006) | 11 lines

  Bug #1191458: tracing over for loops now produces a line event
  on each iteration.  I'm not positive this is the best way to handle
  this.  I'm also not sure that there aren't other cases where
  the lnotab is generated incorrectly.  It would be great if people
  that use pdb or tracing could test heavily.

  Also:
   * Remove dead/duplicated code that wasn't used/necessary
     because we already handled the docstring prior to entering the loop.
   * add some debugging code into the compiler (#if 0'd out).
........
  r51081 | neal.norwitz | 2006-08-04 07:09:28 +0200 (Fri, 04 Aug 2006) | 4 lines

  Bug #1333982: string/number constants were inappropriately stored
  in the byte code and co_consts even if they were not used, ie
  immediately popped off the stack.
........
  r51082 | neal.norwitz | 2006-08-04 07:12:19 +0200 (Fri, 04 Aug 2006) | 1 line

  There were really two issues
........
  r51084 | fred.drake | 2006-08-04 07:17:21 +0200 (Fri, 04 Aug 2006) | 1 line

  SF patch #1534048 (bug #1531003): fix typo in error message
........
  r51085 | gregory.p.smith | 2006-08-04 07:17:47 +0200 (Fri, 04 Aug 2006) | 3 lines

  fix typos
........
  r51087 | georg.brandl | 2006-08-04 08:03:53 +0200 (Fri, 04 Aug 2006) | 3 lines

  Fix bug caused by first decrefing, then increfing.
........
  r51109 | neil.schemenauer | 2006-08-04 18:20:30 +0200 (Fri, 04 Aug 2006) | 5 lines

  Fix the 'compiler' package to generate correct code for MAKE_CLOSURE.
  In the 2.5 development cycle, MAKE_CLOSURE was changed to take free
  variables as a tuple rather than as individual items on the stack.
  Closes patch #1534084.
........
  r51110 | georg.brandl | 2006-08-04 20:03:37 +0200 (Fri, 04 Aug 2006) | 3 lines

  Change fix for segfaulting property(), add a NEWS entry and a test.
........
  r51111 | georg.brandl | 2006-08-04 20:07:34 +0200 (Fri, 04 Aug 2006) | 3 lines

  Better fix for bug #1531405, not executing str(value) twice.
........
  r51112 | thomas.heller | 2006-08-04 20:17:40 +0200 (Fri, 04 Aug 2006) | 1 line

  On Windows, make PyErr_Warn an exported function again.
........
  r51113 | thomas.heller | 2006-08-04 20:57:34 +0200 (Fri, 04 Aug 2006) | 4 lines

  Fix #1530448 - fix ctypes build failure on solaris 10.

  The '-mimpure-text' linker flag is required when linking _ctypes.so.
........
  r51114 | thomas.heller | 2006-08-04 21:49:31 +0200 (Fri, 04 Aug 2006) | 3 lines

  Fix #1534738: win32 debug version of _msi must be _msi_d.pyd, not _msi.pyd.
  Fix the name of the pdb file as well.
........
  r51115 | andrew.kuchling | 2006-08-04 22:37:43 +0200 (Fri, 04 Aug 2006) | 1 line

  Typo fixes
........
  r51116 | andrew.kuchling | 2006-08-04 23:10:03 +0200 (Fri, 04 Aug 2006) | 1 line

  Fix mangled sentence
........
  r51118 | tim.peters | 2006-08-05 00:00:35 +0200 (Sat, 05 Aug 2006) | 2 lines

  Whitespace normalization.
........
  r51119 | bob.ippolito | 2006-08-05 01:59:21 +0200 (Sat, 05 Aug 2006) | 5 lines

  Fix #1530559, struct.pack raises TypeError where it used to convert.
  Passing float arguments to struct.pack when integers are expected
  now triggers a DeprecationWarning.
........
  r51123 | georg.brandl | 2006-08-05 08:10:54 +0200 (Sat, 05 Aug 2006) | 3 lines

  Patch #1534922: correct and enhance unittest docs.
........
  r51126 | georg.brandl | 2006-08-06 09:06:33 +0200 (Sun, 06 Aug 2006) | 2 lines

  Bug #1535182: really test the xreadlines() method of bz2 objects.
........
  r51128 | georg.brandl | 2006-08-06 09:26:21 +0200 (Sun, 06 Aug 2006) | 4 lines

  Bug #1535081: A leading underscore has been added to the names of
  the md5 and sha modules, so add it in Modules/Setup.dist too.
........
  r51129 | georg.brandl | 2006-08-06 10:23:54 +0200 (Sun, 06 Aug 2006) | 3 lines

  Bug #1535165: fixed a segfault in input() and raw_input() when
  sys.stdin is closed.
........
  r51131 | georg.brandl | 2006-08-06 11:17:16 +0200 (Sun, 06 Aug 2006) | 2 lines

  Don't produce output in test_builtin.
........
  r51133 | andrew.macintyre | 2006-08-06 14:37:03 +0200 (Sun, 06 Aug 2006) | 4 lines

  test_threading now skips testing alternate thread stack sizes on
  platforms that don't support changing thread stack size.
........
  r51134 | andrew.kuchling | 2006-08-07 00:07:04 +0200 (Mon, 07 Aug 2006) | 2 lines

  [Patch #1464056] Ensure that we use the panelw library when linking with ncursesw.
  Once I see how the buildbots react, I'll backport this to 2.4.
........
  r51137 | georg.brandl | 2006-08-08 13:52:34 +0200 (Tue, 08 Aug 2006) | 3 lines

  webbrowser: Silence stderr output if no gconftool or gnome browser found
........
  r51138 | georg.brandl | 2006-08-08 13:56:21 +0200 (Tue, 08 Aug 2006) | 7 lines

  Remove "non-mapping" and "non-sequence" from TypeErrors raised by
  PyMapping_Size and PySequence_Size.

  Because len() tries first sequence, then mapping size, it will always
  raise a "non-mapping object has no len" error which is confusing.
........
  r51139 | thomas.heller | 2006-08-08 19:37:00 +0200 (Tue, 08 Aug 2006) | 3 lines

  memcmp() can return values other than -1, 0, and +1 but tp_compare
  must not.
........
  r51140 | thomas.heller | 2006-08-08 19:39:20 +0200 (Tue, 08 Aug 2006) | 1 line

  Remove accidentally committed, duplicated test.
........
  r51147 | andrew.kuchling | 2006-08-08 20:50:14 +0200 (Tue, 08 Aug 2006) | 1 line

  Reword paragraph to clarify
........
  r51148 | andrew.kuchling | 2006-08-08 20:56:08 +0200 (Tue, 08 Aug 2006) | 1 line

  Move obmalloc item into C API section
........
  r51149 | andrew.kuchling | 2006-08-08 21:00:14 +0200 (Tue, 08 Aug 2006) | 1 line

  'Other changes' section now has only one item; move the item elsewhere and remove the section
........
  r51150 | andrew.kuchling | 2006-08-08 21:00:34 +0200 (Tue, 08 Aug 2006) | 1 line

  Bump version number
........
  r51151 | georg.brandl | 2006-08-08 22:11:22 +0200 (Tue, 08 Aug 2006) | 2 lines

  Bug #1536828: typo: TypeType should have been StringType.
........
  r51153 | georg.brandl | 2006-08-08 22:13:13 +0200 (Tue, 08 Aug 2006) | 2 lines

  Bug #1536660: separate two words.
........
  r51155 | georg.brandl | 2006-08-08 22:48:10 +0200 (Tue, 08 Aug 2006) | 3 lines

  ``str`` is now the same object as ``types.StringType``.
........
  r51156 | tim.peters | 2006-08-09 02:52:26 +0200 (Wed, 09 Aug 2006) | 2 lines

  Whitespace normalization.
........
  r51158 | georg.brandl | 2006-08-09 09:03:22 +0200 (Wed, 09 Aug 2006) | 4 lines

  Introduce an upper bound on tuple nesting depth in
  C argument format strings; fixes rest of #1523610.
........
  r51160 | martin.v.loewis | 2006-08-09 09:57:39 +0200 (Wed, 09 Aug 2006) | 4 lines

  __hash__ may now return long int; the final hash
    value is obtained by invoking hash on the long int.
  Fixes #1536021.
........
  r51168 | andrew.kuchling | 2006-08-09 15:03:41 +0200 (Wed, 09 Aug 2006) | 1 line

  [Bug #1536021] Mention __hash__ change
........
  r51169 | andrew.kuchling | 2006-08-09 15:57:05 +0200 (Wed, 09 Aug 2006) | 1 line

  [Patch #1534027] Add notes on locale module changes
........
  r51170 | andrew.kuchling | 2006-08-09 16:05:35 +0200 (Wed, 09 Aug 2006) | 1 line

  Add missing 'self' parameters
........
  r51171 | andrew.kuchling | 2006-08-09 16:06:19 +0200 (Wed, 09 Aug 2006) | 1 line

  Reindent code
........
  r51172 | armin.rigo | 2006-08-09 16:55:26 +0200 (Wed, 09 Aug 2006) | 2 lines

  Fix and test for an infinite C recursion.
........
  r51173 | ronald.oussoren | 2006-08-09 16:56:33 +0200 (Wed, 09 Aug 2006) | 2 lines

  It's unlikely that future versions will require _POSIX_C_SOURCE
........
  r51178 | armin.rigo | 2006-08-09 17:37:26 +0200 (Wed, 09 Aug 2006) | 2 lines

  Concatenation on a long string breaks (SF #1526585).
........
  r51180 | kurt.kaiser | 2006-08-09 18:46:15 +0200 (Wed, 09 Aug 2006) | 8 lines

  1.  When used w/o subprocess, all exceptions were preceded by an error
      message claiming they were IDLE internal errors (since 1.2a1).
  2.  Add Ronald Oussoren to CREDITS

  M    NEWS.txt
  M    PyShell.py
  M    CREDITS.txt
........
  r51181 | kurt.kaiser | 2006-08-09 19:47:15 +0200 (Wed, 09 Aug 2006) | 4 lines

  As a slight enhancement to the previous checkin, improve the
  internal error reporting by moving message to IDLE console.
........
  r51182 | andrew.kuchling | 2006-08-09 20:23:14 +0200 (Wed, 09 Aug 2006) | 1 line

  Typo fix
........
  r51183 | kurt.kaiser | 2006-08-09 22:34:46 +0200 (Wed, 09 Aug 2006) | 2 lines

  ToggleTab dialog was setting indent to 8 even if cancelled (since 1.2a1).
........
  r51184 | martin.v.loewis | 2006-08-10 01:42:18 +0200 (Thu, 10 Aug 2006) | 2 lines

  Add some commentary on -mimpure-text.
........
  r51185 | tim.peters | 2006-08-10 02:58:49 +0200 (Thu, 10 Aug 2006) | 2 lines

  Add missing svn:eol-style property to text files.
........
  r51186 | kurt.kaiser | 2006-08-10 03:41:17 +0200 (Thu, 10 Aug 2006) | 2 lines

  Changing tokenize (39046) to detect dedent broke tabnanny check (since 1.2a1)
........
  r51187 | tim.peters | 2006-08-10 05:01:26 +0200 (Thu, 10 Aug 2006) | 13 lines

  test_copytree_simple():  This was leaving behind two new temp
  directories each time it ran, at least on Windows.

  Several changes:  explicitly closed all files; wrapped long
  lines; stopped suppressing errors when removing a file or
  directory fails (removing /shouldn't/ fail!); and changed
  what appeared to be incorrect usage of os.removedirs() (that
  doesn't remove empty directories at and /under/ the given
  path, instead it must be given an empty leaf directory and
  then deletes empty directories moving /up/ the path -- could
  be that the conceptually simpler shutil.rmtree() was really
  actually intended here).
........
2006-08-11 14:57:12 +00:00

7974 lines
208 KiB
C

/*
Unicode implementation based on original code by Fredrik Lundh,
modified by Marc-Andre Lemburg <mal@lemburg.com> according to the
Unicode Integration Proposal (see file Misc/unicode.txt).
Major speed upgrades to the method implementations at the Reykjavik
NeedForSpeed sprint, by Fredrik Lundh and Andrew Dalke.
Copyright (c) Corporation for National Research Initiatives.
--------------------------------------------------------------------
The original string type implementation is:
Copyright (c) 1999 by Secret Labs AB
Copyright (c) 1999 by Fredrik Lundh
By obtaining, using, and/or copying this software and/or its
associated documentation, you agree that you have read, understood,
and will comply with the following terms and conditions:
Permission to use, copy, modify, and distribute this software and its
associated documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appears in all
copies, and that both that copyright notice and this permission notice
appear in supporting documentation, and that the name of Secret Labs
AB or the author not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.
SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
--------------------------------------------------------------------
*/
#define PY_SSIZE_T_CLEAN
#include "Python.h"
#include "unicodeobject.h"
#include "ucnhash.h"
#ifdef MS_WINDOWS
#include <windows.h>
#endif
/* Limit for the Unicode object free list */
#define MAX_UNICODE_FREELIST_SIZE 1024
/* Limit for the Unicode object free list stay alive optimization.
The implementation will keep allocated Unicode memory intact for
all objects on the free list having a size less than this
limit. This reduces malloc() overhead for small Unicode objects.
At worst this will result in MAX_UNICODE_FREELIST_SIZE *
(sizeof(PyUnicodeObject) + KEEPALIVE_SIZE_LIMIT +
malloc()-overhead) bytes of unused garbage.
Setting the limit to 0 effectively turns the feature off.
Note: This is an experimental feature ! If you get core dumps when
using Unicode objects, turn this feature off.
*/
#define KEEPALIVE_SIZE_LIMIT 9
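/* Worked example (illustrative sketch, not part of the original file):
   the worst-case bound quoted in the comment above, evaluated for assumed
   sizes.  The sketch_* and SKETCH_* names are hypothetical, and every
   size passed in is an assumption; real values depend on the platform,
   the Py_UNICODE width and the allocator.  As read from unicode_dealloc(),
   a retained buffer holds a string shorter than the limit plus its
   terminator, so this sketch counts the limit in characters, not bytes. */
enum {
    SKETCH_FREELIST_MAX = 1024,    /* mirrors MAX_UNICODE_FREELIST_SIZE */
    SKETCH_KEEPALIVE_LIMIT = 9     /* mirrors KEEPALIVE_SIZE_LIMIT */
};

static unsigned long
sketch_keepalive_worst_case(unsigned long object_size,
                            unsigned long char_size,
                            unsigned long malloc_overhead)
{
    /* one parked object + its largest retained buffer + allocator slack,
       multiplied by the maximum number of parked objects */
    return SKETCH_FREELIST_MAX *
        (object_size + SKETCH_KEEPALIVE_LIMIT * char_size + malloc_overhead);
}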
/* Endianness switches; defaults to little endian */
#ifdef WORDS_BIGENDIAN
# define BYTEORDER_IS_BIG_ENDIAN
#else
# define BYTEORDER_IS_LITTLE_ENDIAN
#endif
/* --- Globals ------------------------------------------------------------
The globals are initialized by the _PyUnicode_Init() API and should
not be used before calling that API.
*/
#ifdef __cplusplus
extern "C" {
#endif
/* Free list for Unicode objects */
static PyUnicodeObject *unicode_freelist;
static int unicode_freelist_size;
/* The empty Unicode object is shared to improve performance. */
static PyUnicodeObject *unicode_empty;
/* Single character Unicode strings in the Latin-1 range are being
shared as well. */
static PyUnicodeObject *unicode_latin1[256];
/* Default encoding to use and assume when NULL is passed as encoding
parameter; it is initialized by _PyUnicode_Init().
Always use the PyUnicode_SetDefaultEncoding() and
PyUnicode_GetDefaultEncoding() APIs to access this global.
*/
static char unicode_default_encoding[100];
Py_UNICODE
PyUnicode_GetMax(void)
{
#ifdef Py_UNICODE_WIDE
    return 0x10FFFF;
#else
    /* This is actually an illegal character, so it should
       not be passed to unichr. */
    return 0xFFFF;
#endif
}
/* --- Bloom Filters ----------------------------------------------------- */

/* Helpers implementing simple "bloom filters" for Unicode characters.
   To keep things simple, we use a single bitmask, with the low 5 bits
   of each Unicode character as the bit index. */

/* The linebreak mask is set up by _PyUnicode_Init() below. */

#define BLOOM_MASK unsigned long

static BLOOM_MASK bloom_linebreak;

/* The shifted constant is unsigned long so that bit 31 is well defined
   and the result has the same width as BLOOM_MASK. */
#define BLOOM(mask, ch) ((mask) & (1UL << ((ch) & 0x1F)))

#define BLOOM_LINEBREAK(ch)\
    (BLOOM(bloom_linebreak, (ch)) && Py_UNICODE_ISLINEBREAK((ch)))

Py_LOCAL_INLINE(BLOOM_MASK) make_bloom_mask(Py_UNICODE* ptr, Py_ssize_t len)
{
    /* calculate simple bloom-style bitmask for a given unicode string */

    BLOOM_MASK mask;
    Py_ssize_t i;

    mask = 0;
    for (i = 0; i < len; i++)
        mask |= (1UL << (ptr[i] & 0x1F));

    return mask;
}

Py_LOCAL_INLINE(int) unicode_member(Py_UNICODE chr, Py_UNICODE* set, Py_ssize_t setlen)
{
    Py_ssize_t i;

    for (i = 0; i < setlen; i++)
        if (set[i] == chr)
            return 1;

    return 0;
}

/* Parenthesized so that the expansion stays a single operand when the
   macro is used inside a larger expression. */
#define BLOOM_MEMBER(mask, chr, set, setlen)\
    (BLOOM(mask, chr) && unicode_member(chr, set, setlen))
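/* Illustrative sketch (not part of the original file): how a one-word
   bloom mask acts as a cheap pre-filter in front of an exact scan.  All
   sketch_* names are hypothetical, and plain unsigned values stand in
   for Py_UNICODE.  A clear bit proves that the character is absent; a
   set bit only means "maybe", so the exact comparison still decides. */
static unsigned long
sketch_make_mask(const unsigned int *set, long len)
{
    unsigned long mask = 0;
    long i;

    for (i = 0; i < len; i++)
        mask |= 1UL << (set[i] & 0x1F);    /* low 5 bits select the bit */
    return mask;
}

static int
sketch_maybe_member(unsigned long mask, unsigned int ch)
{
    return (mask & (1UL << (ch & 0x1F))) != 0;
}

static int
sketch_is_member(unsigned long mask, unsigned int ch,
                 const unsigned int *set, long len)
{
    long i;

    if (!sketch_maybe_member(mask, ch))
        return 0;                          /* definitely not in the set */
    for (i = 0; i < len; i++)              /* possible hit: confirm it */
        if (set[i] == ch)
            return 1;
    return 0;
}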
/* --- Unicode Object ----------------------------------------------------- */

static
int unicode_resize(register PyUnicodeObject *unicode,
                   Py_ssize_t length)
{
    void *oldstr;

    /* Shortcut if there's nothing much to do. */
    if (unicode->length == length)
        goto reset;

    /* Resizing shared object (unicode_empty or single character
       objects) in-place is not allowed. Use PyUnicode_Resize()
       instead ! */

    if (unicode == unicode_empty ||
        (unicode->length == 1 &&
         unicode->str[0] < 256U &&
         unicode_latin1[unicode->str[0]] == unicode)) {
        PyErr_SetString(PyExc_SystemError,
                        "can't resize shared unicode objects");
        return -1;
    }

    /* We allocate one more Py_UNICODE element (not one byte) to make sure
       the string is U+0000 terminated.  The overallocation is also used by
       fastsearch, which assumes that it's safe to look at str[length]
       (without making any assumptions about what it contains). */

    oldstr = unicode->str;
    PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
    if (!unicode->str) {
        /* Resize failed: restore the old buffer so the object stays valid. */
        unicode->str = (Py_UNICODE *)oldstr;
        PyErr_NoMemory();
        return -1;
    }
    unicode->str[length] = 0;
    unicode->length = length;

 reset:
    /* Reset the object caches */
    if (unicode->defenc) {
        Py_DECREF(unicode->defenc);
        unicode->defenc = NULL;
    }
    unicode->hash = -1;

    return 0;
}
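/* Illustrative sketch (not part of the original file): the "keep the old
   buffer if growing fails" pattern used by unicode_resize(), written
   against the standard realloc() instead of PyMem_RESIZE.  struct
   sketch_buf and sketch_buf_resize() are hypothetical names, and
   unsigned short stands in for Py_UNICODE. */
#include <stdlib.h>

struct sketch_buf {
    unsigned short *str;    /* length + 1 elements, the last one is 0 */
    size_t length;
};

static int
sketch_buf_resize(struct sketch_buf *b, size_t length)
{
    unsigned short *newstr;

    /* Reject sizes for which (length + 1) * element size would overflow. */
    if (length > (size_t)-1 / sizeof(unsigned short) - 1)
        return -1;

    newstr = realloc(b->str, (length + 1) * sizeof(unsigned short));
    if (newstr == NULL)
        return -1;          /* b->str is still valid and unchanged */

    newstr[length] = 0;     /* terminator lives in the extra element */
    b->str = newstr;
    b->length = length;
    return 0;
}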
/* We allocate one more Py_UNICODE element to make sure the string is
   U+0000 terminated -- XXX is this needed ?

   XXX This allocator could further be enhanced by assuring that the
       free list never reduces its size below 1.

*/

static
PyUnicodeObject *_PyUnicode_New(Py_ssize_t length)
{
    register PyUnicodeObject *unicode;

    /* Optimization for empty strings */
    if (length == 0 && unicode_empty != NULL) {
        Py_INCREF(unicode_empty);
        return unicode_empty;
    }

    /* Unicode freelist & memory allocation */
    if (unicode_freelist) {
        unicode = unicode_freelist;
        unicode_freelist = *(PyUnicodeObject **)unicode;
        unicode_freelist_size--;
        if (unicode->str) {
            /* Keep-Alive optimization: we only upsize the buffer,
               never downsize it. */
            if ((unicode->length < length) &&
                unicode_resize(unicode, length) < 0) {
                PyMem_DEL(unicode->str);
                goto onError;
            }
        }
        else {
            unicode->str = PyMem_NEW(Py_UNICODE, length + 1);
        }
        PyObject_INIT(unicode, &PyUnicode_Type);
    }
    else {
        unicode = PyObject_New(PyUnicodeObject, &PyUnicode_Type);
        if (unicode == NULL)
            return NULL;
        unicode->str = PyMem_NEW(Py_UNICODE, length + 1);
    }

    if (!unicode->str) {
        PyErr_NoMemory();
        goto onError;
    }
    /* Initialize the first element to guard against cases where
     * the caller fails before initializing str -- unicode_resize()
     * reads str[0], and the Keep-Alive optimization can keep memory
     * allocated for str alive across a call to unicode_dealloc(unicode).
     * We don't want unicode_resize to read uninitialized memory in
     * that case.
     */
    unicode->str[0] = 0;
    unicode->str[length] = 0;
    unicode->length = length;
    unicode->hash = -1;
    unicode->defenc = NULL;
    return unicode;

 onError:
    _Py_ForgetReference((PyObject *)unicode);
    PyObject_Del(unicode);
    return NULL;
}
static
void unicode_dealloc(register PyUnicodeObject *unicode)
{
    if (PyUnicode_CheckExact(unicode) &&
        unicode_freelist_size < MAX_UNICODE_FREELIST_SIZE) {
        /* Keep-Alive optimization */
        if (unicode->length >= KEEPALIVE_SIZE_LIMIT) {
            PyMem_DEL(unicode->str);
            unicode->str = NULL;
            unicode->length = 0;
        }
        if (unicode->defenc) {
            Py_DECREF(unicode->defenc);
            unicode->defenc = NULL;
        }
        /* Add to free list */
        *(PyUnicodeObject **)unicode = unicode_freelist;
        unicode_freelist = unicode;
        unicode_freelist_size++;
    }
    else {
        PyMem_DEL(unicode->str);
        Py_XDECREF(unicode->defenc);
        unicode->ob_type->tp_free((PyObject *)unicode);
    }
}
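/* Illustrative sketch (not part of the original file): a bounded,
   intrusive free list in the style of unicode_freelist.  The real code
   overwrites the first word of the parked object with the link; here a
   dedicated first member plays that role.  All sketch_* names and the
   limit of 2 are hypothetical. */
#include <stddef.h>

#define SKETCH_FREELIST_LIMIT 2

struct sketch_obj {
    struct sketch_obj *next;    /* only meaningful while on the free list */
    int payload;
};

static struct sketch_obj *sketch_freelist = NULL;
static int sketch_freelist_size = 0;

/* Park obj for reuse.  Returns 0 when the list is full, in which case
   the caller has to release the object itself. */
static int
sketch_freelist_put(struct sketch_obj *obj)
{
    if (sketch_freelist_size >= SKETCH_FREELIST_LIMIT)
        return 0;
    obj->next = sketch_freelist;
    sketch_freelist = obj;
    sketch_freelist_size++;
    return 1;
}

/* Take the most recently parked object, or NULL if none is available. */
static struct sketch_obj *
sketch_freelist_get(void)
{
    struct sketch_obj *obj = sketch_freelist;

    if (obj != NULL) {
        sketch_freelist = obj->next;
        sketch_freelist_size--;
    }
    return obj;
}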
int PyUnicode_Resize(PyObject **unicode, Py_ssize_t length)
{
    register PyUnicodeObject *v;

    /* Argument checks */
    if (unicode == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }
    v = (PyUnicodeObject *)*unicode;
    if (v == NULL || !PyUnicode_Check(v) || v->ob_refcnt != 1 || length < 0) {
        PyErr_BadInternalCall();
        return -1;
    }

    /* Resizing unicode_empty and single character objects is not
       possible since these are being shared. We simply return a fresh
       copy with the same Unicode content. */
    if (v->length != length &&
        (v == unicode_empty || v->length == 1)) {
        PyUnicodeObject *w = _PyUnicode_New(length);
        if (w == NULL)
            return -1;
        Py_UNICODE_COPY(w->str, v->str,
                        length < v->length ? length : v->length);
        Py_DECREF(*unicode);
        *unicode = (PyObject *)w;
        return 0;
    }

    /* Note that we don't have to modify *unicode for unshared Unicode
       objects, since we can modify them in-place. */
    return unicode_resize(v, length);
}

/* Internal API for use in unicodeobject.c only ! */
#define _PyUnicode_Resize(unicodevar, length) \
        PyUnicode_Resize(((PyObject **)(unicodevar)), length)
PyObject *PyUnicode_FromUnicode(const Py_UNICODE *u,
                                Py_ssize_t size)
{
    PyUnicodeObject *unicode;

    /* If the Unicode data is known at construction time, we can apply
       some optimizations which share commonly used objects. */
    if (u != NULL) {

        /* Optimization for empty strings */
        if (size == 0 && unicode_empty != NULL) {
            Py_INCREF(unicode_empty);
            return (PyObject *)unicode_empty;
        }

        /* Single character Unicode objects in the Latin-1 range are
           shared when using this constructor */
        if (size == 1 && *u < 256) {
            unicode = unicode_latin1[*u];
            if (!unicode) {
                unicode = _PyUnicode_New(1);
                if (!unicode)
                    return NULL;
                unicode->str[0] = *u;
                unicode_latin1[*u] = unicode;
            }
            Py_INCREF(unicode);
            return (PyObject *)unicode;
        }
    }

    unicode = _PyUnicode_New(size);
    if (!unicode)
        return NULL;

    /* Copy the Unicode data into the new object */
    if (u != NULL)
        Py_UNICODE_COPY(unicode->str, u, size);

    return (PyObject *)unicode;
}
#ifdef HAVE_WCHAR_H

PyObject *PyUnicode_FromWideChar(register const wchar_t *w,
                                 Py_ssize_t size)
{
    PyUnicodeObject *unicode;

    if (w == NULL) {
        PyErr_BadInternalCall();
        return NULL;
    }

    unicode = _PyUnicode_New(size);
    if (!unicode)
        return NULL;

    /* Copy the wchar_t data into the new object */
#ifdef HAVE_USABLE_WCHAR_T
    memcpy(unicode->str, w, size * sizeof(wchar_t));
#else
    {
        register Py_UNICODE *u;
        register Py_ssize_t i;
        u = PyUnicode_AS_UNICODE(unicode);
        for (i = size; i > 0; i--)
            *u++ = *w++;
    }
#endif

    return (PyObject *)unicode;
}

Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode,
                                wchar_t *w,
                                Py_ssize_t size)
{
    if (unicode == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }

    /* If possible, try to copy the 0-termination as well */
    if (size > PyUnicode_GET_SIZE(unicode))
        size = PyUnicode_GET_SIZE(unicode) + 1;

#ifdef HAVE_USABLE_WCHAR_T
    memcpy(w, unicode->str, size * sizeof(wchar_t));
#else
    {
        register Py_UNICODE *u;
        register Py_ssize_t i;
        u = PyUnicode_AS_UNICODE(unicode);
        for (i = size; i > 0; i--)
            *w++ = *u++;
    }
#endif

    if (size > PyUnicode_GET_SIZE(unicode))
        return PyUnicode_GET_SIZE(unicode);
    else
        return size;
}

#endif
PyObject *PyUnicode_FromOrdinal(int ordinal)
{
    Py_UNICODE s[1];

#ifdef Py_UNICODE_WIDE
    if (ordinal < 0 || ordinal > 0x10ffff) {
        PyErr_SetString(PyExc_ValueError,
                        "unichr() arg not in range(0x110000) "
                        "(wide Python build)");
        return NULL;
    }
#else
    if (ordinal < 0 || ordinal > 0xffff) {
        PyErr_SetString(PyExc_ValueError,
                        "unichr() arg not in range(0x10000) "
                        "(narrow Python build)");
        return NULL;
    }
#endif

    s[0] = (Py_UNICODE)ordinal;
    return PyUnicode_FromUnicode(s, 1);
}

PyObject *PyUnicode_FromObject(register PyObject *obj)
{
    /* XXX Perhaps we should make this API an alias of
           PyObject_Unicode() instead ?! */
    if (PyUnicode_CheckExact(obj)) {
        Py_INCREF(obj);
        return obj;
    }
    if (PyUnicode_Check(obj)) {
        /* For a Unicode subtype that's not a Unicode object,
           return a true Unicode object with the same data. */
        return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(obj),
                                     PyUnicode_GET_SIZE(obj));
    }
    return PyUnicode_FromEncodedObject(obj, NULL, "strict");
}
PyObject *PyUnicode_FromEncodedObject(register PyObject *obj,
                                      const char *encoding,
                                      const char *errors)
{
    const char *s = NULL;
    Py_ssize_t len;
    PyObject *v;

    if (obj == NULL) {
        PyErr_BadInternalCall();
        return NULL;
    }

#if 0
    /* For b/w compatibility we also accept Unicode objects provided
       that no encoding is given and then redirect to
       PyObject_Unicode() which then applies the additional logic for
       Unicode subclasses.

       NOTE: This API should really only be used for objects which
             represent *encoded* Unicode !

    */
    if (PyUnicode_Check(obj)) {
        if (encoding) {
            PyErr_SetString(PyExc_TypeError,
                            "decoding Unicode is not supported");
            return NULL;
        }
        return PyObject_Unicode(obj);
    }
#else
    if (PyUnicode_Check(obj)) {
        PyErr_SetString(PyExc_TypeError,
                        "decoding Unicode is not supported");
        return NULL;
    }
#endif

    /* Coerce object */
    if (PyString_Check(obj)) {
        s = PyString_AS_STRING(obj);
        len = PyString_GET_SIZE(obj);
    }
    else if (PyObject_AsCharBuffer(obj, &s, &len)) {
        /* Overwrite the error message with something more useful in
           case of a TypeError. */
        if (PyErr_ExceptionMatches(PyExc_TypeError))
            PyErr_Format(PyExc_TypeError,
                         "coercing to Unicode: need string or buffer, "
                         "%.80s found",
                         obj->ob_type->tp_name);
        goto onError;
    }

    /* Convert to Unicode */
    if (len == 0) {
        Py_INCREF(unicode_empty);
        v = (PyObject *)unicode_empty;
    }
    else
        v = PyUnicode_Decode(s, len, encoding, errors);

    return v;

 onError:
    return NULL;
}
PyObject *PyUnicode_Decode(const char *s,
                           Py_ssize_t size,
                           const char *encoding,
                           const char *errors)
{
    PyObject *buffer = NULL, *unicode;

    if (encoding == NULL)
        encoding = PyUnicode_GetDefaultEncoding();

    /* Shortcuts for common default encodings */
    if (strcmp(encoding, "utf-8") == 0)
        return PyUnicode_DecodeUTF8(s, size, errors);
    else if (strcmp(encoding, "latin-1") == 0)
        return PyUnicode_DecodeLatin1(s, size, errors);
#if defined(MS_WINDOWS) && defined(HAVE_USABLE_WCHAR_T)
    else if (strcmp(encoding, "mbcs") == 0)
        return PyUnicode_DecodeMBCS(s, size, errors);
#endif
    else if (strcmp(encoding, "ascii") == 0)
        return PyUnicode_DecodeASCII(s, size, errors);

    /* Decode via the codec registry */
    buffer = PyBuffer_FromMemory((void *)s, size);
    if (buffer == NULL)
        goto onError;
    unicode = PyCodec_Decode(buffer, encoding, errors);
    if (unicode == NULL)
        goto onError;
    if (!PyUnicode_Check(unicode)) {
        PyErr_Format(PyExc_TypeError,
                     "decoder did not return an unicode object (type=%.400s)",
                     unicode->ob_type->tp_name);
        Py_DECREF(unicode);
        goto onError;
    }
    Py_DECREF(buffer);
    return unicode;

 onError:
    Py_XDECREF(buffer);
    return NULL;
}
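/* Illustrative sketch (not part of the original file): the shortcut test
   in PyUnicode_Decode() is an exact, case-sensitive strcmp(), so only the
   spellings listed there bypass the codec registry; any other spelling
   falls through to the generic lookup.  The enum and function names are
   hypothetical, and the platform-specific "mbcs" branch is left out. */
#include <string.h>

enum sketch_route {
    SKETCH_UTF8,
    SKETCH_LATIN1,
    SKETCH_ASCII,
    SKETCH_REGISTRY
};

static enum sketch_route
sketch_pick_decoder(const char *encoding)
{
    if (strcmp(encoding, "utf-8") == 0)
        return SKETCH_UTF8;
    if (strcmp(encoding, "latin-1") == 0)
        return SKETCH_LATIN1;
    if (strcmp(encoding, "ascii") == 0)
        return SKETCH_ASCII;
    return SKETCH_REGISTRY;    /* e.g. "UTF-8" or "utf8" */
}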
PyObject *PyUnicode_AsDecodedObject(PyObject *unicode,
                                    const char *encoding,
                                    const char *errors)
{
    PyObject *v;

    if (!PyUnicode_Check(unicode)) {
        PyErr_BadArgument();
        goto onError;
    }

    if (encoding == NULL)
        encoding = PyUnicode_GetDefaultEncoding();

    /* Decode via the codec registry */
    v = PyCodec_Decode(unicode, encoding, errors);
    if (v == NULL)
        goto onError;
    return v;

 onError:
    return NULL;
}

PyObject *PyUnicode_Encode(const Py_UNICODE *s,
                           Py_ssize_t size,
                           const char *encoding,
                           const char *errors)
{
    PyObject *v, *unicode;

    unicode = PyUnicode_FromUnicode(s, size);
    if (unicode == NULL)
        return NULL;
    v = PyUnicode_AsEncodedString(unicode, encoding, errors);
    Py_DECREF(unicode);
    return v;
}

PyObject *PyUnicode_AsEncodedObject(PyObject *unicode,
                                    const char *encoding,
                                    const char *errors)
{
    PyObject *v;

    if (!PyUnicode_Check(unicode)) {
        PyErr_BadArgument();
        goto onError;
    }

    if (encoding == NULL)
        encoding = PyUnicode_GetDefaultEncoding();

    /* Encode via the codec registry */
    v = PyCodec_Encode(unicode, encoding, errors);
    if (v == NULL)
        goto onError;
    return v;

 onError:
    return NULL;
}
PyObject *PyUnicode_AsEncodedString(PyObject *unicode,
                                    const char *encoding,
                                    const char *errors)
{
    PyObject *v;

    if (!PyUnicode_Check(unicode)) {
        PyErr_BadArgument();
        goto onError;
    }

    if (encoding == NULL)
        encoding = PyUnicode_GetDefaultEncoding();

    /* Shortcuts for common default encodings */
    if (errors == NULL) {
        if (strcmp(encoding, "utf-8") == 0)
            return PyUnicode_AsUTF8String(unicode);
        else if (strcmp(encoding, "latin-1") == 0)
            return PyUnicode_AsLatin1String(unicode);
#if defined(MS_WINDOWS) && defined(HAVE_USABLE_WCHAR_T)
        else if (strcmp(encoding, "mbcs") == 0)
            return PyUnicode_AsMBCSString(unicode);
#endif
        else if (strcmp(encoding, "ascii") == 0)
            return PyUnicode_AsASCIIString(unicode);
    }

    /* Encode via the codec registry */
    v = PyCodec_Encode(unicode, encoding, errors);
    if (v == NULL)
        goto onError;
    if (!PyString_Check(v)) {
        PyErr_Format(PyExc_TypeError,
                     "encoder did not return a string object (type=%.400s)",
                     v->ob_type->tp_name);
        Py_DECREF(v);
        goto onError;
    }
    return v;

 onError:
    return NULL;
}

PyObject *_PyUnicode_AsDefaultEncodedString(PyObject *unicode,
                                            const char *errors)
{
    PyObject *v = ((PyUnicodeObject *)unicode)->defenc;

    if (v)
        return v;
    v = PyUnicode_AsEncodedString(unicode, NULL, errors);
    if (v && errors == NULL)
        ((PyUnicodeObject *)unicode)->defenc = v;
    return v;
}
Py_UNICODE *PyUnicode_AsUnicode(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
goto onError;
}
return PyUnicode_AS_UNICODE(unicode);
onError:
return NULL;
}
Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
goto onError;
}
return PyUnicode_GET_SIZE(unicode);
onError:
return -1;
}
const char *PyUnicode_GetDefaultEncoding(void)
{
return unicode_default_encoding;
}
int PyUnicode_SetDefaultEncoding(const char *encoding)
{
PyObject *v;
/* Make sure the encoding is valid. As a side effect, this also
loads the encoding into the codec registry cache. */
v = _PyCodec_Lookup(encoding);
if (v == NULL)
goto onError;
Py_DECREF(v);
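strncpy(unicode_default_encoding,
encoding,
sizeof(unicode_default_encoding) - 1);
/* strncpy() does not terminate the buffer when the source fills it */
unicode_default_encoding[sizeof(unicode_default_encoding) - 1] = '\0';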
return 0;
onError:
return -1;
}
/* Error handling callback helper:
build the arguments, call the callback and check its result;
if no exception occurred, copy the replacement to the output
and adjust the various state variables.
Returns 0 on success, -1 on error.
*/
static
int unicode_decode_call_errorhandler(const char *errors, PyObject **errorHandler,
const char *encoding, const char *reason,
const char *input, Py_ssize_t insize, Py_ssize_t *startinpos, Py_ssize_t *endinpos, PyObject **exceptionObject, const char **inptr,
PyObject **output, Py_ssize_t *outpos, Py_UNICODE **outptr)
{
static char *argparse = "O!n;decoding error handler must return (unicode, int) tuple";
PyObject *restuple = NULL;
PyObject *repunicode = NULL;
Py_ssize_t outsize = PyUnicode_GET_SIZE(*output);
Py_ssize_t requiredsize;
Py_ssize_t newpos;
Py_UNICODE *repptr;
Py_ssize_t repsize;
int res = -1;
if (*errorHandler == NULL) {
*errorHandler = PyCodec_LookupError(errors);
if (*errorHandler == NULL)
goto onError;
}
if (*exceptionObject == NULL) {
*exceptionObject = PyUnicodeDecodeError_Create(
encoding, input, insize, *startinpos, *endinpos, reason);
if (*exceptionObject == NULL)
goto onError;
}
else {
if (PyUnicodeDecodeError_SetStart(*exceptionObject, *startinpos))
goto onError;
if (PyUnicodeDecodeError_SetEnd(*exceptionObject, *endinpos))
goto onError;
if (PyUnicodeDecodeError_SetReason(*exceptionObject, reason))
goto onError;
}
restuple = PyObject_CallFunctionObjArgs(*errorHandler, *exceptionObject, NULL);
if (restuple == NULL)
goto onError;
if (!PyTuple_Check(restuple)) {
PyErr_SetString(PyExc_TypeError, &argparse[4]);
goto onError;
}
if (!PyArg_ParseTuple(restuple, argparse, &PyUnicode_Type, &repunicode, &newpos))
goto onError;
if (newpos<0)
newpos = insize+newpos;
if (newpos<0 || newpos>insize) {
PyErr_Format(PyExc_IndexError, "position %zd from error handler out of bounds", newpos);
goto onError;
}
/* need more space? (at least enough for what we
have+the replacement+the rest of the string (starting
at the new input position), so we won't have to check space
when there are no errors in the rest of the string) */
repptr = PyUnicode_AS_UNICODE(repunicode);
repsize = PyUnicode_GET_SIZE(repunicode);
requiredsize = *outpos + repsize + insize-newpos;
if (requiredsize > outsize) {
if (requiredsize<2*outsize)
requiredsize = 2*outsize;
if (PyUnicode_Resize(output, requiredsize) < 0)
goto onError;
*outptr = PyUnicode_AS_UNICODE(*output) + *outpos;
}
*endinpos = newpos;
*inptr = input + newpos;
Py_UNICODE_COPY(*outptr, repptr, repsize);
*outptr += repsize;
*outpos += repsize;
/* we made it! */
res = 0;
onError:
Py_XDECREF(restuple);
return res;
}
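/* Illustrative sketch only, not part of this file: the buffer growth rule
used by unicode_decode_call_errorhandler() above, restated as a standalone
function. The name grown_size() and the use of plain long in place of
Py_ssize_t are assumptions made for the example.

```c
#include <assert.h>

// Hypothetical helper: returns the output size (in code units) wanted
// after an error handler has supplied a replacement string.
static long grown_size(long outsize, long outpos, long repsize,
                       long insize, long newpos)
{
    // Room for what is already written, the replacement, and the
    // rest of the input starting at the handler's resume position.
    long required = outpos + repsize + (insize - newpos);

    if (required <= outsize)
        return outsize;          // current buffer is large enough
    if (required < 2 * outsize)
        required = 2 * outsize;  // grow at least geometrically
    return required;
}

// Small self-check of the three cases.
static void check_grown_size(void)
{
    assert(grown_size(10, 4, 1, 10, 5) == 10);   // 4 + 1 + 5 fits
    assert(grown_size(10, 9, 3, 10, 9) == 20);   // 13 needed, doubled
    assert(grown_size(10, 9, 30, 10, 9) == 40);  // 40 needed, above 2x
}
```

Reserving space for the whole remaining input up front is what lets the
decoders skip per-character space checks after an error has been handled.
*/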
/* --- UTF-7 Codec -------------------------------------------------------- */
/* see RFC2152 for details */
static
char utf7_special[128] = {
/* indicate whether a UTF-7 character is special i.e. cannot be directly
encoded:
0 - not special
1 - special
2 - whitespace (optional)
3 - RFC2152 Set O (optional) */
1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 3, 3, 3, 3, 3, 3, 0, 0, 0, 3, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0,
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 3, 3, 3,
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 1, 1,
};
/* Note: The comparison (c) <= 0 is a trick to work-around gcc
warnings about the comparison always being false; since
utf7_special[0] is 1, we can safely make that one comparison
true */
#define SPECIAL(c, encodeO, encodeWS) \
((c) > 127 || (c) <= 0 || utf7_special[(c)] == 1 || \
(encodeWS && (utf7_special[(c)] == 2)) || \
(encodeO && (utf7_special[(c)] == 3)))
#define B64(n) \
("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[(n) & 0x3f])
#define B64CHAR(c) \
(isalnum(c) || (c) == '+' || (c) == '/')
#define UB64(c) \
((c) == '+' ? 62 : (c) == '/' ? 63 : (c) >= 'a' ? \
(c) - 71 : (c) >= 'A' ? (c) - 65 : (c) + 4 )
#define ENCODE(out, ch, bits) \
while (bits >= 6) { \
*out++ = B64(ch >> (bits-6)); \
bits -= 6; \
}
#define DECODE(out, ch, bits, surrogate) \
while (bits >= 16) { \
Py_UNICODE outCh = (Py_UNICODE) ((ch >> (bits-16)) & 0xffff); \
bits -= 16; \
if (surrogate) { \
/* We have already generated an error for the high surrogate \
so let's not bother seeing if the low surrogate is correct or not */ \
surrogate = 0; \
} else if (0xDC00 <= outCh && outCh <= 0xDFFF) { \
/* This is a surrogate pair. Unfortunately we can't represent \
it in a 16-bit character */ \
surrogate = 1; \
errmsg = "code pairs are not supported"; \
goto utf7Error; \
} else { \
*out++ = outCh; \
} \
}
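/* Illustrative sketch only, not part of this file: the B64 and UB64 macros
above are inverse mappings between 6-bit values and modified-BASE64 digits.
The functions below restate the same expressions so the round trip can be
checked in isolation. The names b64_digit() and b64_value() are
hypothetical, and an ASCII execution character set is assumed, as it is by
the UB64 arithmetic itself.

```c
#include <assert.h>

// Same table lookup as B64(n): 6-bit value to BASE64 digit.
static char b64_digit(unsigned int n)
{
    return "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[n & 0x3f];
}

// Same arithmetic as UB64(c): BASE64 digit to 6-bit value.
// 'a' is 97 in ASCII, so c - 71 maps 'a'..'z' to 26..51;
// '0' is 48, so c + 4 maps '0'..'9' to 52..61.
static int b64_value(int c)
{
    return c == '+' ? 62 : c == '/' ? 63 : c >= 'a' ? c - 71 :
           c >= 'A' ? c - 65 : c + 4;
}

// Every 6-bit value must survive encode-then-decode.
static void check_b64_round_trip(void)
{
    unsigned int n;
    for (n = 0; n < 64; n++)
        assert(b64_value(b64_digit(n)) == (int)n);
}
```

UB64 only gives meaningful results for characters that B64CHAR accepts,
which is why the decoder tests B64CHAR before calling it.
*/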
PyObject *PyUnicode_DecodeUTF7(const char *s,
Py_ssize_t size,
const char *errors)
{
const char *starts = s;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
const char *e;
PyUnicodeObject *unicode;
Py_UNICODE *p;
const char *errmsg = "";
int inShift = 0;
unsigned int bitsleft = 0;
unsigned long charsleft = 0;
int surrogate = 0;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
unicode = _PyUnicode_New(size);
if (!unicode)
return NULL;
if (size == 0)
return (PyObject *)unicode;
p = unicode->str;
e = s + size;
while (s < e) {
Py_UNICODE ch;
restart:
ch = *s;
if (inShift) {
if ((ch == '-') || !B64CHAR(ch)) {
inShift = 0;
s++;
/* p, charsleft, bitsleft, surrogate = */ DECODE(p, charsleft, bitsleft, surrogate);
if (bitsleft >= 6) {
/* The shift sequence has a partial character in it. If
bitsleft < 6 then we could just classify it as padding
but that is not the case here */
errmsg = "partial character in shift sequence";
goto utf7Error;
}
/* According to RFC2152 the remaining bits should be zero. We
choose to signal an error/insert a replacement character
here to indicate the potential of a misencoded character. */
/* On x86, a << b == a << (b%32) so make sure that bitsleft != 0 */
if (bitsleft && charsleft << (sizeof(charsleft) * 8 - bitsleft)) {
errmsg = "non-zero padding bits in shift sequence";
goto utf7Error;
}
if (ch == '-') {
if ((s < e) && (*(s) == '-')) {
*p++ = '-';
inShift = 1;
}
} else if (SPECIAL(ch,0,0)) {
errmsg = "unexpected special character";
goto utf7Error;
} else {
*p++ = ch;
}
} else {
charsleft = (charsleft << 6) | UB64(ch);
bitsleft += 6;
s++;
/* p, charsleft, bitsleft, surrogate = */ DECODE(p, charsleft, bitsleft, surrogate);
}
}
else if ( ch == '+' ) {
startinpos = s-starts;
s++;
if (s < e && *s == '-') {
s++;
*p++ = '+';
} else
{
inShift = 1;
bitsleft = 0;
}
}
else if (SPECIAL(ch,0,0)) {
errmsg = "unexpected special character";
s++;
goto utf7Error;
}
else {
*p++ = ch;
s++;
}
continue;
utf7Error:
outpos = p-PyUnicode_AS_UNICODE(unicode);
endinpos = s-starts;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"utf7", errmsg,
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&unicode, &outpos, &p))
goto onError;
}
if (inShift) {
outpos = p-PyUnicode_AS_UNICODE(unicode);
endinpos = size;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"utf7", "unterminated shift sequence",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&unicode, &outpos, &p))
goto onError;
if (s < e)
goto restart;
}
if (_PyUnicode_Resize(&unicode, p - PyUnicode_AS_UNICODE(unicode)) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)unicode;
onError:
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
Py_DECREF(unicode);
return NULL;
}
PyObject *PyUnicode_EncodeUTF7(const Py_UNICODE *s,
Py_ssize_t size,
int encodeSetO,
int encodeWhiteSpace,
const char *errors)
{
PyObject *v;
/* It might be possible to tighten this worst case */
Py_ssize_t cbAllocated = 5 * size;
int inShift = 0;
Py_ssize_t i = 0;
unsigned int bitsleft = 0;
unsigned long charsleft = 0;
char * out;
char * start;
if (size == 0)
return PyString_FromStringAndSize(NULL, 0);
if (cbAllocated / 5 != size) /* overflow! */
return PyErr_NoMemory();
v = PyString_FromStringAndSize(NULL, cbAllocated);
if (v == NULL)
return NULL;
start = out = PyString_AS_STRING(v);
for (;i < size; ++i) {
Py_UNICODE ch = s[i];
if (!inShift) {
if (ch == '+') {
*out++ = '+';
*out++ = '-';
} else if (SPECIAL(ch, encodeSetO, encodeWhiteSpace)) {
charsleft = ch;
bitsleft = 16;
*out++ = '+';
/* out, charsleft, bitsleft = */ ENCODE(out, charsleft, bitsleft);
inShift = bitsleft > 0;
} else {
*out++ = (char) ch;
}
} else {
if (!SPECIAL(ch, encodeSetO, encodeWhiteSpace)) {
*out++ = B64(charsleft << (6-bitsleft));
charsleft = 0;
bitsleft = 0;
/* Characters not in the BASE64 set implicitly unshift the sequence
so no '-' is required, except if the character is itself a '-' */
if (B64CHAR(ch) || ch == '-') {
*out++ = '-';
}
inShift = 0;
*out++ = (char) ch;
} else {
bitsleft += 16;
charsleft = (charsleft << 16) | ch;
/* out, charsleft, bitsleft = */ ENCODE(out, charsleft, bitsleft);
/* If the next character is special then we don't need to terminate
the shift sequence. If the next character is not a BASE64 character
or '-' then the shift sequence will be terminated implicitly and we
don't have to insert a '-'. */
if (bitsleft == 0) {
if (i + 1 < size) {
Py_UNICODE ch2 = s[i+1];
if (SPECIAL(ch2, encodeSetO, encodeWhiteSpace)) {
} else if (B64CHAR(ch2) || ch2 == '-') {
*out++ = '-';
inShift = 0;
} else {
inShift = 0;
}
}
else {
*out++ = '-';
inShift = 0;
}
}
}
}
}
if (bitsleft) {
*out++= B64(charsleft << (6-bitsleft) );
*out++ = '-';
}
_PyString_Resize(&v, out - start);
return v;
}
#undef SPECIAL
#undef B64
#undef B64CHAR
#undef UB64
#undef ENCODE
#undef DECODE
/* --- UTF-8 Codec -------------------------------------------------------- */
static
char utf8_code_length[256] = {
/* Map UTF-8 encoded prefix byte to sequence length. zero means
illegal prefix. see RFC 2279 for details */
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0, 0
};
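/* Illustrative sketch only, not part of this file: the utf8_code_length
table above can be derived from the bit pattern of the lead byte. The
function below computes the same mapping without a table; its name,
utf8_len_from_lead(), is hypothetical.

```c
#include <assert.h>

// Returns the sequence length announced by a UTF-8 lead byte,
// or 0 if the byte cannot start a sequence.
static int utf8_len_from_lead(unsigned char b)
{
    if (b < 0x80) return 1;   // 0xxxxxxx: ASCII
    if (b < 0xC0) return 0;   // 10xxxxxx: continuation byte
    if (b < 0xE0) return 2;   // 110xxxxx
    if (b < 0xF0) return 3;   // 1110xxxx
    if (b < 0xF8) return 4;   // 11110xxx
    if (b < 0xFC) return 5;   // 111110xx (RFC 2279 only)
    if (b < 0xFE) return 6;   // 1111110x (RFC 2279 only)
    return 0;                 // 0xFE and 0xFF never appear
}

// Spot-check one byte from each row group of the table.
static void check_utf8_len(void)
{
    assert(utf8_len_from_lead(0x41) == 1);
    assert(utf8_len_from_lead(0x80) == 0);
    assert(utf8_len_from_lead(0xC3) == 2);
    assert(utf8_len_from_lead(0xE2) == 3);
    assert(utf8_len_from_lead(0xF0) == 4);
    assert(utf8_len_from_lead(0xFF) == 0);
}
```

A table lookup trades 256 bytes of static data for a branch-free
classification in the decoder's inner loop.
*/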
PyObject *PyUnicode_DecodeUTF8(const char *s,
Py_ssize_t size,
const char *errors)
{
return PyUnicode_DecodeUTF8Stateful(s, size, errors, NULL);
}
PyObject *PyUnicode_DecodeUTF8Stateful(const char *s,
Py_ssize_t size,
const char *errors,
Py_ssize_t *consumed)
{
const char *starts = s;
int n;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
const char *e;
PyUnicodeObject *unicode;
Py_UNICODE *p;
const char *errmsg = "";
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* Note: size is always at least as large as the resulting Unicode
character count */
unicode = _PyUnicode_New(size);
if (!unicode)
return NULL;
if (size == 0) {
if (consumed)
*consumed = 0;
return (PyObject *)unicode;
}
/* Unpack UTF-8 encoded data */
p = unicode->str;
e = s + size;
while (s < e) {
Py_UCS4 ch = (unsigned char)*s;
if (ch < 0x80) {
*p++ = (Py_UNICODE)ch;
s++;
continue;
}
n = utf8_code_length[ch];
if (s + n > e) {
if (consumed)
break;
else {
errmsg = "unexpected end of data";
startinpos = s-starts;
endinpos = size;
goto utf8Error;
}
}
switch (n) {
case 0:
errmsg = "unexpected code byte";
startinpos = s-starts;
endinpos = startinpos+1;
goto utf8Error;
case 1:
errmsg = "internal error";
startinpos = s-starts;
endinpos = startinpos+1;
goto utf8Error;
case 2:
if ((s[1] & 0xc0) != 0x80) {
errmsg = "invalid data";
startinpos = s-starts;
endinpos = startinpos+2;
goto utf8Error;
}
ch = ((s[0] & 0x1f) << 6) + (s[1] & 0x3f);
if (ch < 0x80) {
startinpos = s-starts;
endinpos = startinpos+2;
errmsg = "illegal encoding";
goto utf8Error;
}
else
*p++ = (Py_UNICODE)ch;
break;
case 3:
if ((s[1] & 0xc0) != 0x80 ||
(s[2] & 0xc0) != 0x80) {
errmsg = "invalid data";
startinpos = s-starts;
endinpos = startinpos+3;
goto utf8Error;
}
ch = ((s[0] & 0x0f) << 12) + ((s[1] & 0x3f) << 6) + (s[2] & 0x3f);
if (ch < 0x0800) {
/* Note: UTF-8 encodings of surrogates are considered
legal UTF-8 sequences;
XXX For wide builds (UCS-4) we should probably try
to recombine the surrogates into a single code
unit.
*/
errmsg = "illegal encoding";
startinpos = s-starts;
endinpos = startinpos+3;
goto utf8Error;
}
else
*p++ = (Py_UNICODE)ch;
break;
case 4:
if ((s[1] & 0xc0) != 0x80 ||
(s[2] & 0xc0) != 0x80 ||
(s[3] & 0xc0) != 0x80) {
errmsg = "invalid data";
startinpos = s-starts;
endinpos = startinpos+4;
goto utf8Error;
}
ch = ((s[0] & 0x7) << 18) + ((s[1] & 0x3f) << 12) +
((s[2] & 0x3f) << 6) + (s[3] & 0x3f);
/* validate and convert to UTF-16 */
if ((ch < 0x10000) /* minimum value allowed for 4
byte encoding */
|| (ch > 0x10ffff)) /* maximum value allowed for
UTF-16 */
{
errmsg = "illegal encoding";
startinpos = s-starts;
endinpos = startinpos+4;
goto utf8Error;
}
#ifdef Py_UNICODE_WIDE
*p++ = (Py_UNICODE)ch;
#else
/* compute and append the two surrogates: */
/* translate from 10000..10FFFF to 0..FFFF */
ch -= 0x10000;
/* high surrogate = top 10 bits added to D800 */
*p++ = (Py_UNICODE)(0xD800 + (ch >> 10));
/* low surrogate = bottom 10 bits added to DC00 */
*p++ = (Py_UNICODE)(0xDC00 + (ch & 0x03FF));
#endif
break;
default:
/* Other sizes are only needed for UCS-4 */
errmsg = "unsupported Unicode code range";
startinpos = s-starts;
endinpos = startinpos+n;
goto utf8Error;
}
s += n;
continue;
utf8Error:
outpos = p-PyUnicode_AS_UNICODE(unicode);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"utf8", errmsg,
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&unicode, &outpos, &p))
goto onError;
}
if (consumed)
*consumed = s-starts;
/* Adjust length */
if (_PyUnicode_Resize(&unicode, p - unicode->str) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)unicode;
onError:
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
Py_DECREF(unicode);
return NULL;
}
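/* Illustrative sketch only, not part of this file: the three-byte case of
the decoder above, reduced to a standalone function. It applies the same
two checks (continuation-byte pattern, then minimum value) and the same
bit arithmetic. The name decode_utf8_3() and the -1 error return are
assumptions made for the example; the real decoder reports errors through
unicode_decode_call_errorhandler().

```c
#include <assert.h>

// Decodes one three-byte UTF-8 sequence starting at s.
// Returns the code point, or -1 if the sequence is rejected.
static long decode_utf8_3(const unsigned char *s)
{
    unsigned long ch;

    // Both trailing bytes must look like 10xxxxxx ("invalid data").
    if ((s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
        return -1;

    ch = ((unsigned long)(s[0] & 0x0f) << 12)
         + ((s[1] & 0x3f) << 6) + (s[2] & 0x3f);

    // Values below 0x0800 fit in two bytes, so a three-byte form
    // is an overlong encoding ("illegal encoding").
    if (ch < 0x0800)
        return -1;
    return (long)ch;
}

static void check_decode_utf8_3(void)
{
    static const unsigned char euro[] = { 0xE2, 0x82, 0xAC };
    static const unsigned char overlong[] = { 0xE0, 0x80, 0x80 };
    static const unsigned char broken[] = { 0xE2, 0x41, 0xAC };

    assert(decode_utf8_3(euro) == 0x20AC);   // EURO SIGN
    assert(decode_utf8_3(overlong) == -1);
    assert(decode_utf8_3(broken) == -1);
}
```

Like the decoder above, this sketch does not reject encoded surrogates
(U+D800..U+DFFF).
*/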
/* Allocation strategy: if the string is short, convert into a stack buffer
and allocate exactly as much space as needed at the end. Else allocate the
maximum possibly needed (4 result bytes per Unicode character), and return
the excess memory at the end.
*/
PyObject *
PyUnicode_EncodeUTF8(const Py_UNICODE *s,
Py_ssize_t size,
const char *errors)
{
#define MAX_SHORT_UNICHARS 300 /* largest size we'll do on the stack */
Py_ssize_t i; /* index into s of next input character */
PyObject *v; /* result string object */
char *p; /* next free byte in output buffer */
Py_ssize_t nallocated; /* number of result bytes allocated */
Py_ssize_t nneeded; /* number of result bytes needed */
char stackbuf[MAX_SHORT_UNICHARS * 4];
assert(s != NULL);
assert(size >= 0);
if (size <= MAX_SHORT_UNICHARS) {
/* Write into the stack buffer; nallocated can't overflow.
* At the end, we'll allocate exactly as much heap space as it
* turns out we need.
*/
nallocated = Py_SAFE_DOWNCAST(sizeof(stackbuf), size_t, int);
v = NULL; /* will allocate after we're done */
p = stackbuf;
}
else {
/* Overallocate on the heap, and give the excess back at the end. */
nallocated = size * 4;
if (nallocated / 4 != size) /* overflow! */
return PyErr_NoMemory();
v = PyString_FromStringAndSize(NULL, nallocated);
if (v == NULL)
return NULL;
p = PyString_AS_STRING(v);
}
for (i = 0; i < size;) {
Py_UCS4 ch = s[i++];
if (ch < 0x80)
/* Encode ASCII */
*p++ = (char) ch;
else if (ch < 0x0800) {
/* Encode as two bytes (U+0080..U+07FF, which includes Latin-1) */
*p++ = (char)(0xc0 | (ch >> 6));
*p++ = (char)(0x80 | (ch & 0x3f));
}
else {
/* Encode UCS2 Unicode ordinals */
if (ch < 0x10000) {
/* Special case: check for high surrogate */
if (0xD800 <= ch && ch <= 0xDBFF && i != size) {
Py_UCS4 ch2 = s[i];
/* Check for low surrogate and combine the two to
form a UCS4 value */
if (0xDC00 <= ch2 && ch2 <= 0xDFFF) {
ch = ((ch - 0xD800) << 10 | (ch2 - 0xDC00)) + 0x10000;
i++;
goto encodeUCS4;
}
/* Fall through: handles isolated high surrogates */
}
*p++ = (char)(0xe0 | (ch >> 12));
*p++ = (char)(0x80 | ((ch >> 6) & 0x3f));
*p++ = (char)(0x80 | (ch & 0x3f));
continue;
}
encodeUCS4:
/* Encode UCS4 Unicode ordinals */
*p++ = (char)(0xf0 | (ch >> 18));
*p++ = (char)(0x80 | ((ch >> 12) & 0x3f));
*p++ = (char)(0x80 | ((ch >> 6) & 0x3f));
*p++ = (char)(0x80 | (ch & 0x3f));
}
}
if (v == NULL) {
/* This was stack allocated. */
nneeded = p - stackbuf;
assert(nneeded <= nallocated);
v = PyString_FromStringAndSize(stackbuf, nneeded);
}
else {
/* Cut back to size actually needed. */
nneeded = p - PyString_AS_STRING(v);
assert(nneeded <= nallocated);
_PyString_Resize(&v, nneeded);
}
return v;
#undef MAX_SHORT_UNICHARS
}
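/* Illustrative sketch only, not part of this file: how the encoder above
turns a UTF-16 surrogate pair into a single four-byte UTF-8 sequence. The
helper names combine_surrogates() and encode_utf8_4() are hypothetical;
the arithmetic is the same as in the encodeUCS4 path.

```c
#include <assert.h>

// Combine a high surrogate (0xD800..0xDBFF) and a low surrogate
// (0xDC00..0xDFFF) into one code point above 0xFFFF.
static unsigned long combine_surrogates(unsigned int hi, unsigned int lo)
{
    return ((((unsigned long)hi - 0xD800) << 10) | (lo - 0xDC00)) + 0x10000;
}

// Write the four-byte UTF-8 form of ch (0x10000..0x10FFFF) to out.
static void encode_utf8_4(unsigned long ch, unsigned char out[4])
{
    out[0] = (unsigned char)(0xf0 | (ch >> 18));
    out[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (ch & 0x3f));
}

static void check_surrogate_to_utf8(void)
{
    unsigned char out[4];
    unsigned long ch = combine_surrogates(0xD83D, 0xDE00);

    assert(ch == 0x1F600UL);
    encode_utf8_4(ch, out);
    assert(out[0] == 0xF0 && out[1] == 0x9F);
    assert(out[2] == 0x98 && out[3] == 0x80);
}
```

An isolated high surrogate takes the three-byte path in the encoder above
instead, which this sketch does not cover.
*/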
PyObject *PyUnicode_AsUTF8String(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
NULL);
}
/* --- UTF-16 Codec ------------------------------------------------------- */
PyObject *
PyUnicode_DecodeUTF16(const char *s,
Py_ssize_t size,
const char *errors,
int *byteorder)
{
return PyUnicode_DecodeUTF16Stateful(s, size, errors, byteorder, NULL);
}
PyObject *
PyUnicode_DecodeUTF16Stateful(const char *s,
Py_ssize_t size,
const char *errors,
int *byteorder,
Py_ssize_t *consumed)
{
const char *starts = s;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
PyUnicodeObject *unicode;
Py_UNICODE *p;
const unsigned char *q, *e;
int bo = 0; /* assume native ordering by default */
const char *errmsg = "";
/* Offsets from q for retrieving byte pairs in the right order. */
#ifdef BYTEORDER_IS_LITTLE_ENDIAN
int ihi = 1, ilo = 0;
#else
int ihi = 0, ilo = 1;
#endif
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* Note: size is always at least as large as the resulting Unicode
character count */
unicode = _PyUnicode_New(size);
if (!unicode)
return NULL;
if (size == 0)
return (PyObject *)unicode;
/* Unpack UTF-16 encoded data */
p = unicode->str;
q = (unsigned char *)s;
e = q + size;
if (byteorder)
bo = *byteorder;
/* Check for BOM marks (U+FEFF) in the input and adjust current
byte order setting accordingly. In native mode, the leading BOM
mark is skipped, in all other modes, it is copied to the output
stream as-is (giving a ZWNBSP character). */
if (bo == 0) {
if (size >= 2) {
const Py_UNICODE bom = (q[ihi] << 8) | q[ilo];
#ifdef BYTEORDER_IS_LITTLE_ENDIAN
if (bom == 0xFEFF) {
q += 2;
bo = -1;
}
else if (bom == 0xFFFE) {
q += 2;
bo = 1;
}
#else
if (bom == 0xFEFF) {
q += 2;
bo = 1;
}
else if (bom == 0xFFFE) {
q += 2;
bo = -1;
}
#endif
}
}
if (bo == -1) {
/* force LE */
ihi = 1;
ilo = 0;
}
else if (bo == 1) {
/* force BE */
ihi = 0;
ilo = 1;
}
while (q < e) {
Py_UNICODE ch;
/* remaining bytes at the end? (size should be even) */
if (e-q<2) {
if (consumed)
break;
errmsg = "truncated data";
startinpos = ((const char *)q)-starts;
endinpos = ((const char *)e)-starts;
goto utf16Error;
/* The remaining input chars are ignored if the callback
chooses to skip the input */
}
ch = (q[ihi] << 8) | q[ilo];
q += 2;
if (ch < 0xD800 || ch > 0xDFFF) {
*p++ = ch;
continue;
}
/* UTF-16 code pair: */
if (q >= e) {
errmsg = "unexpected end of data";
startinpos = (((const char *)q)-2)-starts;
endinpos = ((const char *)e)-starts;
goto utf16Error;
}
if (0xD800 <= ch && ch <= 0xDBFF) {
Py_UNICODE ch2 = (q[ihi] << 8) | q[ilo];
q += 2;
if (0xDC00 <= ch2 && ch2 <= 0xDFFF) {
#ifndef Py_UNICODE_WIDE
*p++ = ch;
*p++ = ch2;
#else
*p++ = (((ch & 0x3FF)<<10) | (ch2 & 0x3FF)) + 0x10000;
#endif
continue;
}
else {
errmsg = "illegal UTF-16 surrogate";
startinpos = (((const char *)q)-4)-starts;
endinpos = startinpos+2;
goto utf16Error;
}
}
errmsg = "illegal encoding";
startinpos = (((const char *)q)-2)-starts;
endinpos = startinpos+2;
/* Fall through to report the error */
utf16Error:
outpos = p-PyUnicode_AS_UNICODE(unicode);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"utf16", errmsg,
starts, size, &startinpos, &endinpos, &exc, (const char **)&q,
(PyObject **)&unicode, &outpos, &p))
goto onError;
}
if (byteorder)
*byteorder = bo;
if (consumed)
*consumed = (const char *)q-starts;
/* Adjust length */
if (_PyUnicode_Resize(&unicode, p - unicode->str) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)unicode;
onError:
Py_DECREF(unicode);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
}
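/* Illustrative sketch only, not part of this file: the BOM check in the
decoder above, written in terms of raw bytes rather than a natively
assembled 16-bit value, so the result does not depend on the host byte
order. The name utf16_bom_order() is hypothetical; the return values follow
the byteorder convention used above (-1 little endian, 1 big endian,
0 no BOM found).

```c
#include <assert.h>
#include <stddef.h>

// Inspects the first two bytes of a UTF-16 stream for U+FEFF.
static int utf16_bom_order(const unsigned char *q, size_t size)
{
    if (size < 2)
        return 0;
    if (q[0] == 0xFF && q[1] == 0xFE)
        return -1;   // FF FE: U+FEFF stored low byte first
    if (q[0] == 0xFE && q[1] == 0xFF)
        return 1;    // FE FF: U+FEFF stored high byte first
    return 0;
}

static void check_utf16_bom(void)
{
    static const unsigned char le[] = { 0xFF, 0xFE, 0x41, 0x00 };
    static const unsigned char be[] = { 0xFE, 0xFF, 0x00, 0x41 };
    static const unsigned char none[] = { 0x41, 0x00 };

    assert(utf16_bom_order(le, sizeof(le)) == -1);
    assert(utf16_bom_order(be, sizeof(be)) == 1);
    assert(utf16_bom_order(none, sizeof(none)) == 0);
}
```

The decoder above only performs this check when the caller passes a
byteorder of 0; with an explicit byte order a leading U+FEFF is copied to
the output.
*/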
PyObject *
PyUnicode_EncodeUTF16(const Py_UNICODE *s,
Py_ssize_t size,
const char *errors,
int byteorder)
{
PyObject *v;
unsigned char *p;
#ifdef Py_UNICODE_WIDE
Py_ssize_t i, pairs;
#else
const int pairs = 0;
#endif
/* Offsets from p for storing byte pairs in the right order. */
#ifdef BYTEORDER_IS_LITTLE_ENDIAN
int ihi = 1, ilo = 0;
#else
int ihi = 0, ilo = 1;
#endif
#define STORECHAR(CH) \
do { \
p[ihi] = ((CH) >> 8) & 0xff; \
p[ilo] = (CH) & 0xff; \
p += 2; \
} while(0)
#ifdef Py_UNICODE_WIDE
for (i = pairs = 0; i < size; i++)
if (s[i] >= 0x10000)
pairs++;
#endif
v = PyString_FromStringAndSize(NULL,
2 * (size + pairs + (byteorder == 0)));
if (v == NULL)
return NULL;
p = (unsigned char *)PyString_AS_STRING(v);
if (byteorder == 0)
STORECHAR(0xFEFF);
if (size == 0)
return v;
if (byteorder == -1) {
/* force LE */
ihi = 1;
ilo = 0;
}
else if (byteorder == 1) {
/* force BE */
ihi = 0;
ilo = 1;
}
while (size-- > 0) {
Py_UNICODE ch = *s++;
Py_UNICODE ch2 = 0;
#ifdef Py_UNICODE_WIDE
if (ch >= 0x10000) {
ch2 = 0xDC00 | ((ch-0x10000) & 0x3FF);
ch = 0xD800 | ((ch-0x10000) >> 10);
}
#endif
STORECHAR(ch);
if (ch2)
STORECHAR(ch2);
}
return v;
#undef STORECHAR
}
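/* Illustrative sketch only, not part of this file: the two steps the
encoder above performs for a code point outside the BMP on wide builds,
namely splitting it into a surrogate pair and storing each 16-bit unit
with explicit byte offsets. The helper names split_surrogates() and
store_unit() are hypothetical; store_unit() mirrors the STORECHAR macro
without advancing a pointer.

```c
#include <assert.h>

// Split a code point in 0x10000..0x10FFFF into high and low surrogates.
static void split_surrogates(unsigned long ch,
                             unsigned int *hi, unsigned int *lo)
{
    *lo = 0xDC00 | (unsigned int)((ch - 0x10000) & 0x3FF);
    *hi = 0xD800 | (unsigned int)((ch - 0x10000) >> 10);
}

// Store one 16-bit unit; ihi and ilo are the byte offsets (0 or 1)
// of the high and low byte, as in the encoder above.
static void store_unit(unsigned int unit, int ihi, int ilo,
                       unsigned char out[2])
{
    out[ihi] = (unsigned char)((unit >> 8) & 0xff);
    out[ilo] = (unsigned char)(unit & 0xff);
}

static void check_utf16_store(void)
{
    unsigned int hi, lo;
    unsigned char le[2], be[2];

    split_surrogates(0x1F600UL, &hi, &lo);
    assert(hi == 0xD83D && lo == 0xDE00);

    store_unit(hi, 1, 0, le);   // little endian: low byte first
    store_unit(hi, 0, 1, be);   // big endian: high byte first
    assert(le[0] == 0x3D && le[1] == 0xD8);
    assert(be[0] == 0xD8 && be[1] == 0x3D);
}
```

Choosing the offsets once, before the loop, keeps the per-character work
free of byte-order branches.
*/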
PyObject *PyUnicode_AsUTF16String(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeUTF16(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
NULL,
0);
}
/* --- Unicode Escape Codec ----------------------------------------------- */
static _PyUnicode_Name_CAPI *ucnhash_CAPI = NULL;
PyObject *PyUnicode_DecodeUnicodeEscape(const char *s,
Py_ssize_t size,
const char *errors)
{
const char *starts = s;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
int i;
PyUnicodeObject *v;
Py_UNICODE *p;
const char *end;
char* message;
Py_UCS4 chr = 0xffffffff; /* in case 'getcode' messes up */
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* Escaped strings will always be longer than the resulting
Unicode string, so we start with size here and then reduce the
length after conversion to the true value.
(but if the error callback returns a long replacement string
we'll have to allocate more space) */
v = _PyUnicode_New(size);
if (v == NULL)
goto onError;
if (size == 0)
return (PyObject *)v;
p = PyUnicode_AS_UNICODE(v);
end = s + size;
while (s < end) {
unsigned char c;
Py_UNICODE x;
int digits;
/* Non-escape characters are interpreted as Unicode ordinals */
if (*s != '\\') {
*p++ = (unsigned char) *s++;
continue;
}
startinpos = s-starts;
/* \ - Escapes */
s++;
switch (*s++) {
/* \x escapes */
case '\n': break;
case '\\': *p++ = '\\'; break;
case '\'': *p++ = '\''; break;
case '\"': *p++ = '\"'; break;
case 'b': *p++ = '\b'; break;
case 'f': *p++ = '\014'; break; /* FF */
case 't': *p++ = '\t'; break;
case 'n': *p++ = '\n'; break;
case 'r': *p++ = '\r'; break;
case 'v': *p++ = '\013'; break; /* VT */
case 'a': *p++ = '\007'; break; /* BEL, not classic C */
/* \OOO (octal) escapes */
case '0': case '1': case '2': case '3':
case '4': case '5': case '6': case '7':
x = s[-1] - '0';
if ('0' <= *s && *s <= '7') {
x = (x<<3) + *s++ - '0';
if ('0' <= *s && *s <= '7')
x = (x<<3) + *s++ - '0';
}
*p++ = x;
break;
/* hex escapes */
/* \xXX */
case 'x':
digits = 2;
message = "truncated \\xXX escape";
goto hexescape;
/* \uXXXX */
case 'u':
digits = 4;
message = "truncated \\uXXXX escape";
goto hexescape;
/* \UXXXXXXXX */
case 'U':
digits = 8;
message = "truncated \\UXXXXXXXX escape";
hexescape:
chr = 0;
outpos = p-PyUnicode_AS_UNICODE(v);
if (s+digits>end) {
endinpos = size;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"unicodeescape", "end of string in escape sequence",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
goto nextByte;
}
for (i = 0; i < digits; ++i) {
c = (unsigned char) s[i];
if (!isxdigit(c)) {
endinpos = (s+i+1)-starts;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"unicodeescape", message,
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
goto nextByte;
}
chr = (chr<<4) & ~0xF;
if (c >= '0' && c <= '9')
chr += c - '0';
else if (c >= 'a' && c <= 'f')
chr += 10 + c - 'a';
else
chr += 10 + c - 'A';
}
s += i;
if (chr == 0xffffffff && PyErr_Occurred())
/* _decoding_error will have already written into the
target buffer. */
break;
store:
/* when we get here, chr is a 32-bit unicode character */
if (chr <= 0xffff)
/* UCS-2 character */
*p++ = (Py_UNICODE) chr;
else if (chr <= 0x10ffff) {
/* UCS-4 character. Either store directly, or as
surrogate pair. */
#ifdef Py_UNICODE_WIDE
*p++ = chr;
#else
chr -= 0x10000L;
*p++ = 0xD800 + (Py_UNICODE) (chr >> 10);
*p++ = 0xDC00 + (Py_UNICODE) (chr & 0x03FF);
#endif
} else {
endinpos = s-starts;
outpos = p-PyUnicode_AS_UNICODE(v);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"unicodeescape", "illegal Unicode character",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
}
break;
/* \N{name} */
case 'N':
message = "malformed \\N character escape";
if (ucnhash_CAPI == NULL) {
/* load the unicode data module */
PyObject *m, *api;
m = PyImport_ImportModule("unicodedata");
if (m == NULL)
goto ucnhashError;
api = PyObject_GetAttrString(m, "ucnhash_CAPI");
Py_DECREF(m);
if (api == NULL)
goto ucnhashError;
ucnhash_CAPI = (_PyUnicode_Name_CAPI *)PyCObject_AsVoidPtr(api);
Py_DECREF(api);
if (ucnhash_CAPI == NULL)
goto ucnhashError;
}
if (*s == '{') {
const char *start = s+1;
/* look for the closing brace */
while (s < end && *s != '}')
s++;
if (s > start && s < end && *s == '}') {
/* found a name. look it up in the unicode database */
message = "unknown Unicode character name";
s++;
if (ucnhash_CAPI->getcode(NULL, start, (int)(s-start-1), &chr))
goto store;
}
}
endinpos = s-starts;
outpos = p-PyUnicode_AS_UNICODE(v);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"unicodeescape", message,
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
break;
default:
if (s > end) {
message = "\\ at end of string";
s--;
endinpos = s-starts;
outpos = p-PyUnicode_AS_UNICODE(v);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"unicodeescape", message,
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
}
else {
*p++ = '\\';
*p++ = (unsigned char)s[-1];
}
break;
}
nextByte:
;
}
if (_PyUnicode_Resize(&v, p - PyUnicode_AS_UNICODE(v)) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)v;
ucnhashError:
PyErr_SetString(
PyExc_UnicodeError,
"\\N escapes not supported (can't load unicodedata module)"
);
Py_XDECREF(v);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
onError:
Py_XDECREF(v);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
}
Py_LOCAL_INLINE(const Py_UNICODE *) findchar(const Py_UNICODE *s,
Py_ssize_t size,
Py_UNICODE ch)
{
/* like wcschr, but doesn't stop at NUL characters */
while (size-- > 0) {
if (*s == ch)
return s;
s++;
}
return NULL;
}
/* Return a Unicode-Escape string version of the Unicode object.
If quotes is true, the string is enclosed in u"" or u'' quotes as
appropriate.
*/
static
PyObject *unicodeescape_string(const Py_UNICODE *s,
Py_ssize_t size,
int quotes)
{
PyObject *repr;
char *p;
static const char *hexdigit = "0123456789abcdef";
repr = PyString_FromStringAndSize(NULL, 2 + 6*size + 1);
if (repr == NULL)
return NULL;
p = PyString_AS_STRING(repr);
if (quotes) {
*p++ = 'u';
*p++ = (findchar(s, size, '\'') &&
!findchar(s, size, '"')) ? '"' : '\'';
}
while (size-- > 0) {
Py_UNICODE ch = *s++;
/* Escape quotes and backslashes */
if ((quotes &&
ch == (Py_UNICODE) PyString_AS_STRING(repr)[1]) || ch == '\\') {
*p++ = '\\';
*p++ = (char) ch;
continue;
}
#ifdef Py_UNICODE_WIDE
/* Map 21-bit characters to '\U00xxxxxx' */
else if (ch >= 0x10000) {
Py_ssize_t offset = p - PyString_AS_STRING(repr);
/* Resize the string if necessary */
if (offset + 12 > PyString_GET_SIZE(repr)) {
if (_PyString_Resize(&repr, PyString_GET_SIZE(repr) + 100))
return NULL;
p = PyString_AS_STRING(repr) + offset;
}
*p++ = '\\';
*p++ = 'U';
*p++ = hexdigit[(ch >> 28) & 0x0000000F];
*p++ = hexdigit[(ch >> 24) & 0x0000000F];
*p++ = hexdigit[(ch >> 20) & 0x0000000F];
*p++ = hexdigit[(ch >> 16) & 0x0000000F];
*p++ = hexdigit[(ch >> 12) & 0x0000000F];
*p++ = hexdigit[(ch >> 8) & 0x0000000F];
*p++ = hexdigit[(ch >> 4) & 0x0000000F];
*p++ = hexdigit[ch & 0x0000000F];
continue;
}
#endif
/* Map UTF-16 surrogate pairs to Unicode \UXXXXXXXX escapes */
else if (ch >= 0xD800 && ch < 0xDC00) {
Py_UNICODE ch2;
Py_UCS4 ucs;
ch2 = *s++;
size--;
if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) {
ucs = (((ch & 0x03FF) << 10) | (ch2 & 0x03FF)) + 0x00010000;
*p++ = '\\';
*p++ = 'U';
*p++ = hexdigit[(ucs >> 28) & 0x0000000F];
*p++ = hexdigit[(ucs >> 24) & 0x0000000F];
*p++ = hexdigit[(ucs >> 20) & 0x0000000F];
*p++ = hexdigit[(ucs >> 16) & 0x0000000F];
*p++ = hexdigit[(ucs >> 12) & 0x0000000F];
*p++ = hexdigit[(ucs >> 8) & 0x0000000F];
*p++ = hexdigit[(ucs >> 4) & 0x0000000F];
*p++ = hexdigit[ucs & 0x0000000F];
continue;
}
/* Fall through: isolated surrogates are copied as-is */
s--;
size++;
}
/* Map 16-bit characters to '\uxxxx' */
if (ch >= 256) {
*p++ = '\\';
*p++ = 'u';
*p++ = hexdigit[(ch >> 12) & 0x000F];
*p++ = hexdigit[(ch >> 8) & 0x000F];
*p++ = hexdigit[(ch >> 4) & 0x000F];
*p++ = hexdigit[ch & 0x000F];
}
/* Map special whitespace to '\t', '\n', '\r' */
else if (ch == '\t') {
*p++ = '\\';
*p++ = 't';
}
else if (ch == '\n') {
*p++ = '\\';
*p++ = 'n';
}
else if (ch == '\r') {
*p++ = '\\';
*p++ = 'r';
}
/* Map non-printable US ASCII to '\xhh' */
else if (ch < ' ' || ch >= 0x7F) {
*p++ = '\\';
*p++ = 'x';
*p++ = hexdigit[(ch >> 4) & 0x000F];
*p++ = hexdigit[ch & 0x000F];
}
/* Copy everything else as-is */
else
*p++ = (char) ch;
}
if (quotes)
*p++ = PyString_AS_STRING(repr)[1];
*p = '\0';
_PyString_Resize(&repr, p - PyString_AS_STRING(repr));
return repr;
}
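/* Illustrative sketch only, not part of this file: the nibble-by-nibble
hex formatting that unicodeescape_string() above uses for 16-bit
characters, isolated into a helper. The name escape_u16() and the
fixed seven-byte output buffer are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

// Writes the six-character escape (backslash, 'u', four lowercase hex
// digits) for a 16-bit code unit into out, then a terminating NUL.
static void escape_u16(unsigned int ch, char out[7])
{
    static const char hexdigit[] = "0123456789abcdef";

    out[0] = '\\';
    out[1] = 'u';
    out[2] = hexdigit[(ch >> 12) & 0xF];   // most significant nibble
    out[3] = hexdigit[(ch >> 8) & 0xF];
    out[4] = hexdigit[(ch >> 4) & 0xF];
    out[5] = hexdigit[ch & 0xF];           // least significant nibble
    out[6] = '\0';
}

static void check_escape_u16(void)
{
    char buf[7];

    escape_u16(0x20AC, buf);
    assert(strcmp(buf, "\\u20ac") == 0);
}
```

Indexing a digit string avoids a sprintf() call per character and always
produces lowercase digits, matching the output format above.
*/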
PyObject *PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s,
Py_ssize_t size)
{
return unicodeescape_string(s, size, 0);
}
PyObject *PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeUnicodeEscape(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode));
}
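/* Illustrative sketch, not part of this file's API: the surrogate-pair
branch of unicodeescape_string() combines a high and a low UTF-16
surrogate into one code point and writes it as eight hex digits. The
helpers below, combine_surrogates() and format_U_escape(), are
hypothetical names used only to show that arithmetic in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
// Returns 0 if (hi, lo) is not a valid high/low surrogate pair.
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    return ((((uint32_t)hi & 0x03FF) << 10) | ((uint32_t)lo & 0x03FF))
           + 0x10000;
}

// Hypothetical helper: write a backslash, 'U' and eight lowercase hex
// digits (10 characters plus the terminating NUL) into out.
static void format_U_escape(uint32_t ucs, char out[11])
{
    static const char hexdigit[] = "0123456789abcdef";
    char *p = out;
    int shift;

    assert(out != NULL);
    *p++ = '\\';
    *p++ = 'U';
    for (shift = 28; shift >= 0; shift -= 4)
        *p++ = hexdigit[(ucs >> shift) & 0xF];
    *p = '\0';
}
```

For example, the pair 0xD83D, 0xDE00 combines to the code point 0x1F600.
*/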
/* --- Raw Unicode Escape Codec ------------------------------------------- */
PyObject *PyUnicode_DecodeRawUnicodeEscape(const char *s,
Py_ssize_t size,
const char *errors)
{
const char *starts = s;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
PyUnicodeObject *v;
Py_UNICODE *p;
const char *end;
const char *bs;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* Escaped strings are never shorter than the resulting Unicode
string, so we start with size here and then reduce the length
after conversion to the true value. (A decoding error handler
might still have to resize the string.) */
v = _PyUnicode_New(size);
if (v == NULL)
goto onError;
if (size == 0)
return (PyObject *)v;
p = PyUnicode_AS_UNICODE(v);
end = s + size;
while (s < end) {
unsigned char c;
Py_UCS4 x;
int i;
int count;
/* Non-escape characters are interpreted as Unicode ordinals */
if (*s != '\\') {
*p++ = (unsigned char)*s++;
continue;
}
startinpos = s-starts;
/* \u-escapes are only interpreted if the number of leading
backslashes is odd */
bs = s;
for (;s < end;) {
if (*s != '\\')
break;
*p++ = (unsigned char)*s++;
}
if (((s - bs) & 1) == 0 ||
s >= end ||
(*s != 'u' && *s != 'U')) {
continue;
}
p--;
count = *s=='u' ? 4 : 8;
s++;
/* \uXXXX with 4 hex digits, \Uxxxxxxxx with 8 */
outpos = p-PyUnicode_AS_UNICODE(v);
for (x = 0, i = 0; i < count; ++i, ++s) {
c = (unsigned char)*s;
if (!isxdigit(c)) {
endinpos = s-starts;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"rawunicodeescape", "truncated \\uXXXX",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
goto nextByte;
}
x = (x<<4) & ~0xF;
if (c >= '0' && c <= '9')
x += c - '0';
else if (c >= 'a' && c <= 'f')
x += 10 + c - 'a';
else
x += 10 + c - 'A';
}
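#ifndef Py_UNICODE_WIDE
if (x > 0xFFFF) {
/* Does not fit into a 16-bit Py_UNICODE: report the whole escape
and do not store the truncated value */
endinpos = s-starts;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"rawunicodeescape", "\\Uxxxxxxxx out of range",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
goto nextByte;
}
#endif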
*p++ = x;
nextByte:
;
}
if (_PyUnicode_Resize(&v, p - PyUnicode_AS_UNICODE(v)) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)v;
onError:
Py_XDECREF(v);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
}
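/* Illustrative sketch, not part of the codec: the raw-unicode-escape
decoder above treats 'u' or 'U' as the start of an escape only when it
follows an odd number of backslashes. The hypothetical helper
raw_escape_digits() below isolates that parity test; its name and
return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: s points at a run of backslashes inside [s, end).
// Returns the number of hex digits the escape expects (4 for 'u',
// 8 for 'U'), or 0 if the run is to be copied literally.
static int raw_escape_digits(const char *s, const char *end)
{
    const char *bs = s;

    assert(s != NULL && end != NULL && s <= end);
    while (s < end && *s == '\\')
        s++;
    // An even run, or a run at the very end of the input, is literal.
    if (((s - bs) & 1) == 0 || s >= end)
        return 0;
    if (*s == 'u')
        return 4;
    if (*s == 'U')
        return 8;
    return 0;
}
```

With one or three leading backslashes the escape is interpreted; with
two, both backslashes and the following 'u' are copied unchanged.
*/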
PyObject *PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s,
Py_ssize_t size)
{
PyObject *repr;
char *p;
char *q;
static const char *hexdigit = "0123456789abcdef";
#ifdef Py_UNICODE_WIDE
repr = PyString_FromStringAndSize(NULL, 10 * size);
#else
repr = PyString_FromStringAndSize(NULL, 6 * size);
#endif
if (repr == NULL)
return NULL;
if (size == 0)
return repr;
p = q = PyString_AS_STRING(repr);
while (size-- > 0) {
Py_UNICODE ch = *s++;
#ifdef Py_UNICODE_WIDE
/* Map 32-bit characters to '\Uxxxxxxxx' */
if (ch >= 0x10000) {
*p++ = '\\';
*p++ = 'U';
*p++ = hexdigit[(ch >> 28) & 0xf];
*p++ = hexdigit[(ch >> 24) & 0xf];
*p++ = hexdigit[(ch >> 20) & 0xf];
*p++ = hexdigit[(ch >> 16) & 0xf];
*p++ = hexdigit[(ch >> 12) & 0xf];
*p++ = hexdigit[(ch >> 8) & 0xf];
*p++ = hexdigit[(ch >> 4) & 0xf];
*p++ = hexdigit[ch & 15];
}
else
#endif
/* Map 16-bit characters to '\uxxxx' */
if (ch >= 256) {
*p++ = '\\';
*p++ = 'u';
*p++ = hexdigit[(ch >> 12) & 0xf];
*p++ = hexdigit[(ch >> 8) & 0xf];
*p++ = hexdigit[(ch >> 4) & 0xf];
*p++ = hexdigit[ch & 15];
}
/* Copy everything else as-is */
else
*p++ = (char) ch;
}
*p = '\0';
_PyString_Resize(&repr, p - q);
return repr;
}
PyObject *PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeRawUnicodeEscape(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode));
}
/* --- Unicode Internal Codec ------------------------------------------- */
PyObject *_PyUnicode_DecodeUnicodeInternal(const char *s,
Py_ssize_t size,
const char *errors)
{
const char *starts = s;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
PyUnicodeObject *v;
Py_UNICODE *p;
const char *end;
const char *reason;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
#ifdef Py_UNICODE_WIDE
Py_UNICODE unimax = PyUnicode_GetMax();
#endif
v = _PyUnicode_New((size+Py_UNICODE_SIZE-1)/ Py_UNICODE_SIZE);
if (v == NULL)
goto onError;
if (PyUnicode_GetSize((PyObject *)v) == 0)
return (PyObject *)v;
p = PyUnicode_AS_UNICODE(v);
end = s + size;
while (s < end) {
memcpy(p, s, sizeof(Py_UNICODE));
/* We have to sanity check the raw data, otherwise doom looms for
some malformed UCS-4 data. */
if (
#ifdef Py_UNICODE_WIDE
*p > unimax || *p < 0 ||
#endif
end-s < Py_UNICODE_SIZE
)
{
startinpos = s - starts;
if (end-s < Py_UNICODE_SIZE) {
endinpos = end-starts;
reason = "truncated input";
}
else {
endinpos = s - starts + Py_UNICODE_SIZE;
reason = "illegal code point (> 0x10FFFF)";
}
outpos = p - PyUnicode_AS_UNICODE(v);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"unicode_internal", reason,
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p)) {
goto onError;
}
}
else {
p++;
s += Py_UNICODE_SIZE;
}
}
if (_PyUnicode_Resize(&v, p - PyUnicode_AS_UNICODE(v)) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)v;
onError:
Py_XDECREF(v);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
}
/* --- Latin-1 Codec ------------------------------------------------------ */
PyObject *PyUnicode_DecodeLatin1(const char *s,
Py_ssize_t size,
const char *errors)
{
PyUnicodeObject *v;
Py_UNICODE *p;
/* Latin-1 is equivalent to the first 256 ordinals in Unicode. */
if (size == 1) {
Py_UNICODE r = *(unsigned char*)s;
return PyUnicode_FromUnicode(&r, 1);
}
v = _PyUnicode_New(size);
if (v == NULL)
goto onError;
if (size == 0)
return (PyObject *)v;
p = PyUnicode_AS_UNICODE(v);
while (size-- > 0)
*p++ = (unsigned char)*s++;
return (PyObject *)v;
onError:
Py_XDECREF(v);
return NULL;
}
/* create or adjust a UnicodeEncodeError */
static void make_encode_exception(PyObject **exceptionObject,
const char *encoding,
const Py_UNICODE *unicode, Py_ssize_t size,
Py_ssize_t startpos, Py_ssize_t endpos,
const char *reason)
{
if (*exceptionObject == NULL) {
*exceptionObject = PyUnicodeEncodeError_Create(
encoding, unicode, size, startpos, endpos, reason);
}
else {
if (PyUnicodeEncodeError_SetStart(*exceptionObject, startpos))
goto onError;
if (PyUnicodeEncodeError_SetEnd(*exceptionObject, endpos))
goto onError;
if (PyUnicodeEncodeError_SetReason(*exceptionObject, reason))
goto onError;
return;
onError:
Py_DECREF(*exceptionObject);
*exceptionObject = NULL;
}
}
/* raises a UnicodeEncodeError */
static void raise_encode_exception(PyObject **exceptionObject,
const char *encoding,
const Py_UNICODE *unicode, Py_ssize_t size,
Py_ssize_t startpos, Py_ssize_t endpos,
const char *reason)
{
make_encode_exception(exceptionObject,
encoding, unicode, size, startpos, endpos, reason);
if (*exceptionObject != NULL)
PyCodec_StrictErrors(*exceptionObject);
}
/* error handling callback helper:
build arguments, call the callback and check the arguments,
put the result into newpos and return the replacement string, which
has to be freed by the caller */
static PyObject *unicode_encode_call_errorhandler(const char *errors,
PyObject **errorHandler,
const char *encoding, const char *reason,
const Py_UNICODE *unicode, Py_ssize_t size, PyObject **exceptionObject,
Py_ssize_t startpos, Py_ssize_t endpos,
Py_ssize_t *newpos)
{
static char *argparse = "O!n;encoding error handler must return (unicode, int) tuple";
PyObject *restuple;
PyObject *resunicode;
if (*errorHandler == NULL) {
*errorHandler = PyCodec_LookupError(errors);
if (*errorHandler == NULL)
return NULL;
}
make_encode_exception(exceptionObject,
encoding, unicode, size, startpos, endpos, reason);
if (*exceptionObject == NULL)
return NULL;
restuple = PyObject_CallFunctionObjArgs(
*errorHandler, *exceptionObject, NULL);
if (restuple == NULL)
return NULL;
if (!PyTuple_Check(restuple)) {
PyErr_SetString(PyExc_TypeError, &argparse[4]);
Py_DECREF(restuple);
return NULL;
}
if (!PyArg_ParseTuple(restuple, argparse, &PyUnicode_Type,
&resunicode, newpos)) {
Py_DECREF(restuple);
return NULL;
}
if (*newpos<0)
*newpos = size+*newpos;
if (*newpos<0 || *newpos>size) {
PyErr_Format(PyExc_IndexError, "position %zd from error handler out of bounds", *newpos);
Py_DECREF(restuple);
return NULL;
}
Py_INCREF(resunicode);
Py_DECREF(restuple);
return resunicode;
}
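/* Illustrative sketch, not part of the codec: the error-handler helper
above accepts a negative resume position, interprets it relative to the
end of the input, and then rejects anything outside 0..size. The
hypothetical function normalize_newpos() below shows just that bounds
logic, using ptrdiff_t in place of Py_ssize_t so that it compiles
without Python.h.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: negative positions count from the end of the
// input. Returns 0 on success, -1 if the position is out of bounds.
static int normalize_newpos(ptrdiff_t *newpos, ptrdiff_t size)
{
    assert(newpos != NULL && size >= 0);
    if (*newpos < 0)
        *newpos += size;
    if (*newpos < 0 || *newpos > size)
        return -1;
    return 0;
}
```

Note that a position equal to size is accepted: it means "resume at the
end of the input".
*/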
static PyObject *unicode_encode_ucs1(const Py_UNICODE *p,
Py_ssize_t size,
const char *errors,
int limit)
{
/* output object */
PyObject *res;
/* pointers to the beginning and end+1 of input */
const Py_UNICODE *startp = p;
const Py_UNICODE *endp = p + size;
/* pointer to the beginning of the unencodable characters */
/* const Py_UNICODE *badp = NULL; */
/* pointer into the output */
char *str;
/* current output position */
Py_ssize_t respos = 0;
Py_ssize_t ressize;
const char *encoding = (limit == 256) ? "latin-1" : "ascii";
const char *reason = (limit == 256) ? "ordinal not in range(256)" : "ordinal not in range(128)";
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* the following variable is used for caching string comparisons
* -1=not initialized, 0=unknown, 1=strict, 2=replace, 3=ignore, 4=xmlcharrefreplace */
int known_errorHandler = -1;
/* allocate enough for a simple encoding without
replacements, if we need more, we'll resize */
res = PyString_FromStringAndSize(NULL, size);
if (res == NULL)
goto onError;
if (size == 0)
return res;
str = PyString_AS_STRING(res);
ressize = size;
while (p<endp) {
Py_UNICODE c = *p;
/* can we encode this? */
if (c<limit) {
/* no overflow check, because we know that the space is enough */
*str++ = (char)c;
++p;
}
else {
Py_ssize_t unicodepos = p-startp;
Py_ssize_t requiredsize;
PyObject *repunicode;
Py_ssize_t repsize;
Py_ssize_t newpos;
Py_ssize_t respos;
Py_UNICODE *uni2;
/* startpos for collecting unencodable chars */
const Py_UNICODE *collstart = p;
const Py_UNICODE *collend = p;
/* find all unencodable characters */
while ((collend < endp) && ((*collend)>=limit))
++collend;
/* cache callback name lookup (if not done yet, i.e. it's the first error) */
if (known_errorHandler==-1) {
if ((errors==NULL) || (!strcmp(errors, "strict")))
known_errorHandler = 1;
else if (!strcmp(errors, "replace"))
known_errorHandler = 2;
else if (!strcmp(errors, "ignore"))
known_errorHandler = 3;
else if (!strcmp(errors, "xmlcharrefreplace"))
known_errorHandler = 4;
else
known_errorHandler = 0;
}
switch (known_errorHandler) {
case 1: /* strict */
raise_encode_exception(&exc, encoding, startp, size, collstart-startp, collend-startp, reason);
goto onError;
case 2: /* replace */
while (collstart++<collend)
*str++ = '?'; /* fall through */
case 3: /* ignore */
p = collend;
break;
case 4: /* xmlcharrefreplace */
respos = str-PyString_AS_STRING(res);
/* determine replacement size (temporarily (mis)uses p) */
for (p = collstart, repsize = 0; p < collend; ++p) {
if (*p<10)
repsize += 2+1+1;
else if (*p<100)
repsize += 2+2+1;
else if (*p<1000)
repsize += 2+3+1;
else if (*p<10000)
repsize += 2+4+1;
#ifndef Py_UNICODE_WIDE
else
repsize += 2+5+1;
#else
else if (*p<100000)
repsize += 2+5+1;
else if (*p<1000000)
repsize += 2+6+1;
else
repsize += 2+7+1;
#endif
}
requiredsize = respos+repsize+(endp-collend);
if (requiredsize > ressize) {
if (requiredsize<2*ressize)
requiredsize = 2*ressize;
if (_PyString_Resize(&res, requiredsize))
goto onError;
str = PyString_AS_STRING(res) + respos;
ressize = requiredsize;
}
/* generate replacement (temporarily (mis)uses p) */
for (p = collstart; p < collend; ++p) {
str += sprintf(str, "&#%d;", (int)*p);
}
p = collend;
break;
default:
repunicode = unicode_encode_call_errorhandler(errors, &errorHandler,
encoding, reason, startp, size, &exc,
collstart-startp, collend-startp, &newpos);
if (repunicode == NULL)
goto onError;
/* need more space? (at least enough for what we
have+the replacement+the rest of the string, so
we won't have to check space for encodable characters) */
respos = str-PyString_AS_STRING(res);
repsize = PyUnicode_GET_SIZE(repunicode);
requiredsize = respos+repsize+(endp-collend);
if (requiredsize > ressize) {
if (requiredsize<2*ressize)
requiredsize = 2*ressize;
if (_PyString_Resize(&res, requiredsize)) {
Py_DECREF(repunicode);
goto onError;
}
str = PyString_AS_STRING(res) + respos;
ressize = requiredsize;
}
/* check if there is anything unencodable in the replacement
and copy it to the output */
for (uni2 = PyUnicode_AS_UNICODE(repunicode);repsize-->0; ++uni2, ++str) {
c = *uni2;
if (c >= limit) {
raise_encode_exception(&exc, encoding, startp, size,
unicodepos, unicodepos+1, reason);
Py_DECREF(repunicode);
goto onError;
}
*str = (char)c;
}
p = startp + newpos;
Py_DECREF(repunicode);
}
}
}
/* Resize if we allocated too much */
respos = str-PyString_AS_STRING(res);
if (respos<ressize)
/* If this fails, res will be NULL */
_PyString_Resize(&res, respos);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return res;
onError:
Py_XDECREF(res);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
}
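/* Illustrative sketch, not part of the codec: the xmlcharrefreplace
branch above sizes each replacement as 2 characters for "&#", the
decimal digits of the code point, and 1 for ";". The hypothetical
helpers charref_size() and write_charref() below compute the same
length with a loop instead of a comparison ladder and then format the
reference; snprintf() is assumed to be available (C99).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: length of the character reference for n,
// excluding the terminating NUL.
static size_t charref_size(unsigned long n)
{
    size_t digits = 1;

    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return 2 + digits + 1;
}

// Hypothetical helper: write the reference into buf (capacity cap,
// including room for the NUL). Returns the number of characters
// written, or 0 if the reference does not fit.
static size_t write_charref(char *buf, size_t cap, unsigned long n)
{
    size_t need = charref_size(n);

    assert(buf != NULL);
    if (need + 1 > cap)
        return 0;
    return (size_t)snprintf(buf, cap, "&#%lu;", n);
}
```

Computing the size before writing is what lets the encoder resize its
output buffer once per run of unencodable characters.
*/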
PyObject *PyUnicode_EncodeLatin1(const Py_UNICODE *p,
Py_ssize_t size,
const char *errors)
{
return unicode_encode_ucs1(p, size, errors, 256);
}
PyObject *PyUnicode_AsLatin1String(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeLatin1(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
NULL);
}
/* --- 7-bit ASCII Codec -------------------------------------------------- */
PyObject *PyUnicode_DecodeASCII(const char *s,
Py_ssize_t size,
const char *errors)
{
const char *starts = s;
PyUnicodeObject *v;
Py_UNICODE *p;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
const char *e;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* ASCII is equivalent to the first 128 ordinals in Unicode. */
if (size == 1 && *(unsigned char*)s < 128) {
Py_UNICODE r = *(unsigned char*)s;
return PyUnicode_FromUnicode(&r, 1);
}
v = _PyUnicode_New(size);
if (v == NULL)
goto onError;
if (size == 0)
return (PyObject *)v;
p = PyUnicode_AS_UNICODE(v);
e = s + size;
while (s < e) {
register unsigned char c = (unsigned char)*s;
if (c < 128) {
*p++ = c;
++s;
}
else {
startinpos = s-starts;
endinpos = startinpos + 1;
outpos = p - (Py_UNICODE *)PyUnicode_AS_UNICODE(v);
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"ascii", "ordinal not in range(128)",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p))
goto onError;
}
}
if (p - PyUnicode_AS_UNICODE(v) < PyUnicode_GET_SIZE(v))
if (_PyUnicode_Resize(&v, p - PyUnicode_AS_UNICODE(v)) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)v;
onError:
Py_XDECREF(v);
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return NULL;
}
PyObject *PyUnicode_EncodeASCII(const Py_UNICODE *p,
Py_ssize_t size,
const char *errors)
{
return unicode_encode_ucs1(p, size, errors, 128);
}
PyObject *PyUnicode_AsASCIIString(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeASCII(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
NULL);
}
#if defined(MS_WINDOWS) && defined(HAVE_USABLE_WCHAR_T)
/* --- MBCS codecs for Windows -------------------------------------------- */
#if SIZEOF_INT < SIZEOF_SSIZE_T
#define NEED_RETRY
#endif
/* XXX This code is limited to "true" double-byte encodings, as
a) it assumes an incomplete character consists of a single byte, and
b) IsDBCSLeadByte (probably) does not work for non-DBCS multi-byte
encodings, see IsDBCSLeadByteEx documentation. */
static int is_dbcs_lead_byte(const char *s, int offset)
{
const char *curr = s + offset;
if (IsDBCSLeadByte(*curr)) {
const char *prev = CharPrev(s, curr);
return (prev == curr) || !IsDBCSLeadByte(*prev) || (curr - prev == 2);
}
return 0;
}
/*
* Decode MBCS string into unicode object. If 'final' is set, converts
* trailing lead-byte too. Returns consumed size on success, -1 otherwise.
*/
static int decode_mbcs(PyUnicodeObject **v,
const char *s, /* MBCS string */
int size, /* size of MBCS string in bytes */
int final)
{
Py_UNICODE *p;
Py_ssize_t n = 0;
int usize = 0;
assert(size >= 0);
/* Skip trailing lead-byte unless 'final' is set */
if (!final && size >= 1 && is_dbcs_lead_byte(s, size - 1))
--size;
/* First get the size of the result */
if (size > 0) {
usize = MultiByteToWideChar(CP_ACP, 0, s, size, NULL, 0);
if (usize == 0) {
PyErr_SetFromWindowsErrWithFilename(0, NULL);
return -1;
}
}
if (*v == NULL) {
/* Create unicode object */
*v = _PyUnicode_New(usize);
if (*v == NULL)
return -1;
}
else {
/* Extend unicode object */
n = PyUnicode_GET_SIZE(*v);
if (_PyUnicode_Resize(v, n + usize) < 0)
return -1;
}
/* Do the conversion */
if (size > 0) {
p = PyUnicode_AS_UNICODE(*v) + n;
if (0 == MultiByteToWideChar(CP_ACP, 0, s, size, p, usize)) {
PyErr_SetFromWindowsErrWithFilename(0, NULL);
return -1;
}
}
return size;
}
PyObject *PyUnicode_DecodeMBCSStateful(const char *s,
Py_ssize_t size,
const char *errors,
Py_ssize_t *consumed)
{
PyUnicodeObject *v = NULL;
int done;
if (consumed)
*consumed = 0;
#ifdef NEED_RETRY
retry:
if (size > INT_MAX)
done = decode_mbcs(&v, s, INT_MAX, 0);
else
#endif
done = decode_mbcs(&v, s, (int)size, !consumed);
if (done < 0) {
Py_XDECREF(v);
return NULL;
}
if (consumed)
*consumed += done;
#ifdef NEED_RETRY
if (size > INT_MAX) {
s += done;
size -= done;
goto retry;
}
#endif
return (PyObject *)v;
}
PyObject *PyUnicode_DecodeMBCS(const char *s,
Py_ssize_t size,
const char *errors)
{
return PyUnicode_DecodeMBCSStateful(s, size, errors, NULL);
}
/*
* Convert unicode into string object (MBCS).
* Returns 0 on success, -1 otherwise.
*/
static int encode_mbcs(PyObject **repr,
const Py_UNICODE *p, /* unicode */
int size) /* size of unicode */
{
int mbcssize = 0;
Py_ssize_t n = 0;
assert(size >= 0);
/* First get the size of the result */
if (size > 0) {
mbcssize = WideCharToMultiByte(CP_ACP, 0, p, size, NULL, 0, NULL, NULL);
if (mbcssize == 0) {
PyErr_SetFromWindowsErrWithFilename(0, NULL);
return -1;
}
}
if (*repr == NULL) {
/* Create string object */
*repr = PyString_FromStringAndSize(NULL, mbcssize);
if (*repr == NULL)
return -1;
}
else {
/* Extend string object */
n = PyString_Size(*repr);
if (_PyString_Resize(repr, n + mbcssize) < 0)
return -1;
}
/* Do the conversion */
if (size > 0) {
char *s = PyString_AS_STRING(*repr) + n;
if (0 == WideCharToMultiByte(CP_ACP, 0, p, size, s, mbcssize, NULL, NULL)) {
PyErr_SetFromWindowsErrWithFilename(0, NULL);
return -1;
}
}
return 0;
}
PyObject *PyUnicode_EncodeMBCS(const Py_UNICODE *p,
Py_ssize_t size,
const char *errors)
{
PyObject *repr = NULL;
int ret;
#ifdef NEED_RETRY
retry:
if (size > INT_MAX)
ret = encode_mbcs(&repr, p, INT_MAX);
else
#endif
ret = encode_mbcs(&repr, p, (int)size);
if (ret < 0) {
Py_XDECREF(repr);
return NULL;
}
#ifdef NEED_RETRY
if (size > INT_MAX) {
p += INT_MAX;
size -= INT_MAX;
goto retry;
}
#endif
return repr;
}
PyObject *PyUnicode_AsMBCSString(PyObject *unicode)
{
if (!PyUnicode_Check(unicode)) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeMBCS(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
NULL);
}
#undef NEED_RETRY
#endif /* MS_WINDOWS && HAVE_USABLE_WCHAR_T */
/* --- Character Mapping Codec -------------------------------------------- */
PyObject *PyUnicode_DecodeCharmap(const char *s,
Py_ssize_t size,
PyObject *mapping,
const char *errors)
{
const char *starts = s;
Py_ssize_t startinpos;
Py_ssize_t endinpos;
Py_ssize_t outpos;
const char *e;
PyUnicodeObject *v;
Py_UNICODE *p;
Py_ssize_t extrachars = 0;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
Py_UNICODE *mapstring = NULL;
Py_ssize_t maplen = 0;
/* Default to Latin-1 */
if (mapping == NULL)
return PyUnicode_DecodeLatin1(s, size, errors);
v = _PyUnicode_New(size);
if (v == NULL)
goto onError;
if (size == 0)
return (PyObject *)v;
p = PyUnicode_AS_UNICODE(v);
e = s + size;
if (PyUnicode_CheckExact(mapping)) {
mapstring = PyUnicode_AS_UNICODE(mapping);
maplen = PyUnicode_GET_SIZE(mapping);
while (s < e) {
unsigned char ch = *s;
Py_UNICODE x = 0xfffe; /* illegal value */
if (ch < maplen)
x = mapstring[ch];
if (x == 0xfffe) {
/* undefined mapping */
outpos = p-PyUnicode_AS_UNICODE(v);
startinpos = s-starts;
endinpos = startinpos+1;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"charmap", "character maps to <undefined>",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p)) {
goto onError;
}
continue;
}
*p++ = x;
++s;
}
}
else {
while (s < e) {
unsigned char ch = *s;
PyObject *w, *x;
/* Get mapping (char ordinal -> integer, Unicode char or None) */
w = PyInt_FromLong((long)ch);
if (w == NULL)
goto onError;
x = PyObject_GetItem(mapping, w);
Py_DECREF(w);
if (x == NULL) {
if (PyErr_ExceptionMatches(PyExc_LookupError)) {
/* No mapping found means: mapping is undefined. */
PyErr_Clear();
x = Py_None;
Py_INCREF(x);
} else
goto onError;
}
/* Apply mapping */
if (PyInt_Check(x)) {
long value = PyInt_AS_LONG(x);
if (value < 0 || value > 65535) {
PyErr_SetString(PyExc_TypeError,
"character mapping must be in range(65536)");
Py_DECREF(x);
goto onError;
}
*p++ = (Py_UNICODE)value;
}
else if (x == Py_None) {
/* undefined mapping */
outpos = p-PyUnicode_AS_UNICODE(v);
startinpos = s-starts;
endinpos = startinpos+1;
if (unicode_decode_call_errorhandler(
errors, &errorHandler,
"charmap", "character maps to <undefined>",
starts, size, &startinpos, &endinpos, &exc, &s,
(PyObject **)&v, &outpos, &p)) {
Py_DECREF(x);
goto onError;
}
Py_DECREF(x);
continue;
}
else if (PyUnicode_Check(x)) {
Py_ssize_t targetsize = PyUnicode_GET_SIZE(x);
if (targetsize == 1)
/* 1-1 mapping */
*p++ = *PyUnicode_AS_UNICODE(x);
else if (targetsize > 1) {
/* 1-n mapping */
if (targetsize > extrachars) {
/* resize first */
Py_ssize_t oldpos = p - PyUnicode_AS_UNICODE(v);
Py_ssize_t needed = (targetsize - extrachars) + \
(targetsize << 2);
extrachars += needed;
if (_PyUnicode_Resize(&v,
PyUnicode_GET_SIZE(v) + needed) < 0) {
Py_DECREF(x);
goto onError;
}
p = PyUnicode_AS_UNICODE(v) + oldpos;
}
Py_UNICODE_COPY(p,
PyUnicode_AS_UNICODE(x),
targetsize);
p += targetsize;
extrachars -= targetsize;
}
/* 1-0 mapping: skip the character */
}
else {
/* wrong return value */
PyErr_SetString(PyExc_TypeError,
"character mapping must return integer, None or unicode");
Py_DECREF(x);
goto onError;
}
Py_DECREF(x);
++s;
}
}
if (p - PyUnicode_AS_UNICODE(v) < PyUnicode_GET_SIZE(v))
if (_PyUnicode_Resize(&v, p - PyUnicode_AS_UNICODE(v)) < 0)
goto onError;
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
return (PyObject *)v;
onError:
Py_XDECREF(errorHandler);
Py_XDECREF(exc);
Py_XDECREF(v);
return NULL;
}
/* Charmap encoding: the lookup table */
struct encoding_map{
PyObject_HEAD
unsigned char level1[32];
int count2, count3;
unsigned char level23[1];
};
static PyObject*
encoding_map_size(PyObject *obj, PyObject* args)
{
struct encoding_map *map = (struct encoding_map*)obj;
return PyInt_FromLong(sizeof(*map) - 1 + 16*map->count2 +
128*map->count3);
}
static PyMethodDef encoding_map_methods[] = {
{"size", encoding_map_size, METH_NOARGS,
PyDoc_STR("Return the size (in bytes) of this object") },
{ 0 }
};
static void
encoding_map_dealloc(PyObject* o)
{
PyObject_FREE(o);
}
static PyTypeObject EncodingMapType = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"EncodingMap", /*tp_name*/
sizeof(struct encoding_map), /*tp_basicsize*/
0, /*tp_itemsize*/
/* methods */
encoding_map_dealloc, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash*/
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT, /*tp_flags*/
0, /*tp_doc*/
0, /*tp_traverse*/
0, /*tp_clear*/
0, /*tp_richcompare*/
0, /*tp_weaklistoffset*/
0, /*tp_iter*/
0, /*tp_iternext*/
encoding_map_methods, /*tp_methods*/
0, /*tp_members*/
0, /*tp_getset*/
0, /*tp_base*/
0, /*tp_dict*/
0, /*tp_descr_get*/
0, /*tp_descr_set*/
0, /*tp_dictoffset*/
0, /*tp_init*/
0, /*tp_alloc*/
0, /*tp_new*/
0, /*tp_free*/
0, /*tp_is_gc*/
};
PyObject*
PyUnicode_BuildEncodingMap(PyObject* string)
{
Py_UNICODE *decode;
PyObject *result;
struct encoding_map *mresult;
int i;
int need_dict = 0;
unsigned char level1[32];
unsigned char level2[512];
unsigned char *mlevel1, *mlevel2, *mlevel3;
int count2 = 0, count3 = 0;
if (!PyUnicode_Check(string) || PyUnicode_GetSize(string) != 256) {
PyErr_BadArgument();
return NULL;
}
decode = PyUnicode_AS_UNICODE(string);
memset(level1, 0xFF, sizeof level1);
memset(level2, 0xFF, sizeof level2);
/* If there isn't a one-to-one mapping of NULL to \0,
or if there are non-BMP characters, we need to use
a mapping dictionary. */
if (decode[0] != 0)
need_dict = 1;
for (i = 1; i < 256; i++) {
int l1, l2;
if (decode[i] == 0
#ifdef Py_UNICODE_WIDE
|| decode[i] > 0xFFFF
#endif
) {
need_dict = 1;
break;
}
if (decode[i] == 0xFFFE)
/* unmapped character */
continue;
l1 = decode[i] >> 11;
l2 = decode[i] >> 7;
if (level1[l1] == 0xFF)
level1[l1] = count2++;
if (level2[l2] == 0xFF)
level2[l2] = count3++;
}
if (count2 >= 0xFF || count3 >= 0xFF)
need_dict = 1;
if (need_dict) {
PyObject *result = PyDict_New();
PyObject *key, *value;
if (!result)
return NULL;
for (i = 0; i < 256; i++) {
key = value = NULL;
key = PyInt_FromLong(decode[i]);
value = PyInt_FromLong(i);
if (!key || !value)
goto failed1;
if (PyDict_SetItem(result, key, value) == -1)
goto failed1;
Py_DECREF(key);
Py_DECREF(value);
}
return result;
failed1:
Py_XDECREF(key);
Py_XDECREF(value);
Py_DECREF(result);
return NULL;
}
/* Create a three-level trie */
result = PyObject_MALLOC(sizeof(struct encoding_map) +
16*count2 + 128*count3 - 1);
if (!result)
return PyErr_NoMemory();
PyObject_Init(result, &EncodingMapType);
mresult = (struct encoding_map*)result;
mresult->count2 = count2;
mresult->count3 = count3;
mlevel1 = mresult->level1;
mlevel2 = mresult->level23;
mlevel3 = mresult->level23 + 16*count2;
memcpy(mlevel1, level1, 32);
memset(mlevel2, 0xFF, 16*count2);
memset(mlevel3, 0, 128*count3);
count3 = 0;
for (i = 1; i < 256; i++) {
int o1, o2, o3, i2, i3;
if (decode[i] == 0xFFFE)
/* unmapped character */
continue;
o1 = decode[i]>>11;
o2 = (decode[i]>>7) & 0xF;
i2 = 16*mlevel1[o1] + o2;
if (mlevel2[i2] == 0xFF)
mlevel2[i2] = count3++;
o3 = decode[i] & 0x7F;
i3 = 128*mlevel2[i2] + o3;
mlevel3[i3] = i;
}
return result;
}
static int
encoding_map_lookup(Py_UNICODE c, PyObject *mapping)
{
struct encoding_map *map = (struct encoding_map*)mapping;
int l1 = c>>11;
int l2 = (c>>7) & 0xF;
int l3 = c & 0x7F;
int i;
#ifdef Py_UNICODE_WIDE
if (c > 0xFFFF) {
return -1;
}
#endif
if (c == 0)
return 0;
/* level 1*/
i = map->level1[l1];
if (i == 0xFF) {
return -1;
}
/* level 2*/
i = map->level23[16*i+l2];
if (i == 0xFF) {
return -1;
}
/* level 3 */
i = map->level23[16*map->count2 + 128*i + l3];
if (i == 0) {
return -1;
}
return i;
}
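/* Illustrative sketch, not part of the codec: the encoding map above
splits a 16-bit code point into a 5-bit, a 4-bit and a 7-bit index
(c>>11, (c>>7)&0xF, c&0x7F) and walks three tables. The standalone
version below uses fixed-capacity arrays and fills the tables in one
pass, which differs from PyUnicode_BuildEncodingMap(); struct demo_map
and the demo_map_* functions are hypothetical names for this example.

```c
#include <assert.h>
#include <string.h>

enum { MAX_L2 = 8, MAX_L3 = 8 };

// level1: 32 entries indexed by the top 5 bits of the code point.
// level2: blocks of 16 entries indexed by the next 4 bits.
// level3: blocks of 128 bytes indexed by the low 7 bits.
struct demo_map {
    unsigned char level1[32];
    unsigned char level2[16 * MAX_L2];
    unsigned char level3[128 * MAX_L3];
    int count2, count3;
};

static void demo_map_init(struct demo_map *m)
{
    memset(m->level1, 0xFF, sizeof m->level1);   // 0xFF marks "no block"
    memset(m->level2, 0xFF, sizeof m->level2);
    memset(m->level3, 0, sizeof m->level3);      // 0 marks "unmapped"
    m->count2 = m->count3 = 0;
}

// Map code point c (1..0xFFFF) to byte b (1..255). Returns 0 on
// success, -1 if the fixed capacity is exhausted.
static int demo_map_add(struct demo_map *m, unsigned c, unsigned char b)
{
    unsigned l1 = c >> 11, l2 = (c >> 7) & 0xF, l3 = c & 0x7F;
    unsigned i2, i3;

    assert(c > 0 && c <= 0xFFFF && b != 0);
    if (m->level1[l1] == 0xFF) {
        if (m->count2 >= MAX_L2)
            return -1;
        m->level1[l1] = (unsigned char)m->count2++;
    }
    i2 = 16u * m->level1[l1] + l2;
    if (m->level2[i2] == 0xFF) {
        if (m->count3 >= MAX_L3)
            return -1;
        m->level2[i2] = (unsigned char)m->count3++;
    }
    i3 = 128u * m->level2[i2] + l3;
    m->level3[i3] = b;
    return 0;
}

// Returns the byte for c, or -1 if c is unmapped.
static int demo_map_lookup(const struct demo_map *m, unsigned c)
{
    unsigned i;

    if (c > 0xFFFF)
        return -1;
    if (c == 0)
        return 0;
    i = m->level1[c >> 11];
    if (i == 0xFF)
        return -1;
    i = m->level2[16u * i + ((c >> 7) & 0xF)];
    if (i == 0xFF)
        return -1;
    i = m->level3[128u * i + (c & 0x7F)];
    return i == 0 ? -1 : (int)i;
}
```

Because byte 0 doubles as the "unmapped" marker in level 3, the code
point 0 has to be special-cased, as the real lookup does.
*/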
/* Look up the character c in the mapping. If the character
can't be found, Py_None is returned (or NULL, if another
error occurred). */
static PyObject *charmapencode_lookup(Py_UNICODE c, PyObject *mapping)
{
PyObject *w = PyInt_FromLong((long)c);
PyObject *x;
if (w == NULL)
return NULL;
x = PyObject_GetItem(mapping, w);
Py_DECREF(w);
if (x == NULL) {
if (PyErr_ExceptionMatches(PyExc_LookupError)) {
/* No mapping found means: mapping is undefined. */
PyErr_Clear();
x = Py_None;
Py_INCREF(x);
return x;
} else
return NULL;
}
else if (x == Py_None)
return x;
else if (PyInt_Check(x)) {
long value = PyInt_AS_LONG(x);
if (value < 0 || value > 255) {
PyErr_SetString(PyExc_TypeError,
"character mapping must be in range(256)");
Py_DECREF(x);
return NULL;
}
return x;
}
else if (PyString_Check(x))
return x;
else {
/* wrong return value */
PyErr_SetString(PyExc_TypeError,
"character mapping must return integer, None or str");
Py_DECREF(x);
return NULL;
}
}
static int
charmapencode_resize(PyObject **outobj, Py_ssize_t *outpos, Py_ssize_t requiredsize)
{
Py_ssize_t outsize = PyString_GET_SIZE(*outobj);
/* exponentially overallocate to minimize reallocations */
if (requiredsize < 2*outsize)
requiredsize = 2*outsize;
if (_PyString_Resize(outobj, requiredsize)) {
return 0;
}
return 1;
}
typedef enum charmapencode_result {
enc_SUCCESS, enc_FAILED, enc_EXCEPTION
}charmapencode_result;
/* Look up the character, put the result in the output string and adjust
various state variables. Reallocate the output string if not enough
space is available. Return enc_SUCCESS if the replacement was written,
enc_FAILED if the mapping was undefined (in which case no character
was written), or enc_EXCEPTION if an error (such as a failed
reallocation) occurred. */
static
charmapencode_result charmapencode_output(Py_UNICODE c, PyObject *mapping,
PyObject **outobj, Py_ssize_t *outpos)
{
PyObject *rep;
char *outstart;
Py_ssize_t outsize = PyString_GET_SIZE(*outobj);
if (mapping->ob_type == &EncodingMapType) {
int res = encoding_map_lookup(c, mapping);
Py_ssize_t requiredsize = *outpos+1;
if (res == -1)
return enc_FAILED;
if (outsize<requiredsize)
if (!charmapencode_resize(outobj, outpos, requiredsize))
return enc_EXCEPTION;
outstart = PyString_AS_STRING(*outobj);
outstart[(*outpos)++] = (char)res;
return enc_SUCCESS;
}
rep = charmapencode_lookup(c, mapping);
if (rep==NULL)
return enc_EXCEPTION;
else if (rep==Py_None) {
Py_DECREF(rep);
return enc_FAILED;
} else {
if (PyInt_Check(rep)) {
Py_ssize_t requiredsize = *outpos+1;
if (outsize<requiredsize)
if (!charmapencode_resize(outobj, outpos, requiredsize)) {
Py_DECREF(rep);
return enc_EXCEPTION;
}
outstart = PyString_AS_STRING(*outobj);
outstart[(*outpos)++] = (char)PyInt_AS_LONG(rep);
}
else {
const char *repchars = PyString_AS_STRING(rep);
Py_ssize_t repsize = PyString_GET_SIZE(rep);
Py_ssize_t requiredsize = *outpos+repsize;
if (outsize<requiredsize)
if (!charmapencode_resize(outobj, outpos, requiredsize)) {
Py_DECREF(rep);
return enc_EXCEPTION;
}
outstart = PyString_AS_STRING(*outobj);
memcpy(outstart + *outpos, repchars, repsize);
*outpos += repsize;
}
}
Py_DECREF(rep);
return enc_SUCCESS;
}
/* handle an error in PyUnicode_EncodeCharmap
Return 0 on success, -1 on error */
static
int charmap_encoding_error(
const Py_UNICODE *p, Py_ssize_t size, Py_ssize_t *inpos, PyObject *mapping,
PyObject **exceptionObject,
int *known_errorHandler, PyObject **errorHandler, const char *errors,
PyObject **res, Py_ssize_t *respos)
{
PyObject *repunicode = NULL; /* initialize to prevent gcc warning */
Py_ssize_t repsize;
Py_ssize_t newpos;
Py_UNICODE *uni2;
/* startpos for collecting unencodable chars */
Py_ssize_t collstartpos = *inpos;
Py_ssize_t collendpos = *inpos+1;
Py_ssize_t collpos;
char *encoding = "charmap";
char *reason = "character maps to <undefined>";
charmapencode_result x;
/* find all unencodable characters */
while (collendpos < size) {
PyObject *rep;
if (mapping->ob_type == &EncodingMapType) {
int res = encoding_map_lookup(p[collendpos], mapping);
if (res != -1)
break;
++collendpos;
continue;
}
rep = charmapencode_lookup(p[collendpos], mapping);
if (rep==NULL)
return -1;
else if (rep!=Py_None) {
Py_DECREF(rep);
break;
}
Py_DECREF(rep);
++collendpos;
}
/* cache callback name lookup
* (if not done yet, i.e. it's the first error) */
if (*known_errorHandler==-1) {
if ((errors==NULL) || (!strcmp(errors, "strict")))
*known_errorHandler = 1;
else if (!strcmp(errors, "replace"))
*known_errorHandler = 2;
else if (!strcmp(errors, "ignore"))
*known_errorHandler = 3;
else if (!strcmp(errors, "xmlcharrefreplace"))
*known_errorHandler = 4;
else
*known_errorHandler = 0;
}
switch (*known_errorHandler) {
case 1: /* strict */
raise_encode_exception(exceptionObject, encoding, p, size, collstartpos, collendpos, reason);
return -1;
case 2: /* replace */
for (collpos = collstartpos; collpos<collendpos; ++collpos) {
x = charmapencode_output('?', mapping, res, respos);
if (x==enc_EXCEPTION) {
return -1;
}
else if (x==enc_FAILED) {
raise_encode_exception(exceptionObject, encoding, p, size, collstartpos, collendpos, reason);
return -1;
}
}
/* fall through */
case 3: /* ignore */
*inpos = collendpos;
break;
case 4: /* xmlcharrefreplace */
/* generate replacement */
for (collpos = collstartpos; collpos < collendpos; ++collpos) {
char buffer[2+29+1+1];
char *cp;
sprintf(buffer, "&#%d;", (int)p[collpos]);
for (cp = buffer; *cp; ++cp) {
x = charmapencode_output(*cp, mapping, res, respos);
if (x==enc_EXCEPTION)
return -1;
else if (x==enc_FAILED) {
raise_encode_exception(exceptionObject, encoding, p, size, collstartpos, collendpos, reason);
return -1;
}
}
}
*inpos = collendpos;
break;
default:
repunicode = unicode_encode_call_errorhandler(errors, errorHandler,
encoding, reason, p, size, exceptionObject,
collstartpos, collendpos, &newpos);
if (repunicode == NULL)
return -1;
/* generate replacement */
repsize = PyUnicode_GET_SIZE(repunicode);
for (uni2 = PyUnicode_AS_UNICODE(repunicode); repsize-->0; ++uni2) {
x = charmapencode_output(*uni2, mapping, res, respos);
if (x==enc_EXCEPTION) {
Py_DECREF(repunicode); /* don't leak the replacement on error */
return -1;
}
else if (x==enc_FAILED) {
Py_DECREF(repunicode);
raise_encode_exception(exceptionObject, encoding, p, size, collstartpos, collendpos, reason);
return -1;
}
}
*inpos = newpos;
Py_DECREF(repunicode);
}
return 0;
}
PyObject *PyUnicode_EncodeCharmap(const Py_UNICODE *p,
Py_ssize_t size,
PyObject *mapping,
const char *errors)
{
/* output object */
PyObject *res = NULL;
/* current input position */
Py_ssize_t inpos = 0;
/* current output position */
Py_ssize_t respos = 0;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* the following variable is used for caching string comparisons
* -1=not initialized, 0=unknown, 1=strict, 2=replace,
* 3=ignore, 4=xmlcharrefreplace */
int known_errorHandler = -1;
/* Default to Latin-1 */
if (mapping == NULL)
return PyUnicode_EncodeLatin1(p, size, errors);
/* allocate enough for a simple encoding without
replacements, if we need more, we'll resize */
res = PyString_FromStringAndSize(NULL, size);
if (res == NULL)
goto onError;
if (size == 0)
return res;
while (inpos<size) {
/* try to encode it */
charmapencode_result x = charmapencode_output(p[inpos], mapping, &res, &respos);
if (x==enc_EXCEPTION) /* error */
goto onError;
if (x==enc_FAILED) { /* unencodable character */
if (charmap_encoding_error(p, size, &inpos, mapping,
&exc,
&known_errorHandler, &errorHandler, errors,
&res, &respos)) {
goto onError;
}
}
else
/* done with this character => adjust input position */
++inpos;
}
/* Resize if we allocated too much */
if (respos<PyString_GET_SIZE(res)) {
if (_PyString_Resize(&res, respos))
goto onError;
}
Py_XDECREF(exc);
Py_XDECREF(errorHandler);
return res;
onError:
Py_XDECREF(res);
Py_XDECREF(exc);
Py_XDECREF(errorHandler);
return NULL;
}
PyObject *PyUnicode_AsCharmapString(PyObject *unicode,
PyObject *mapping)
{
if (!PyUnicode_Check(unicode) || mapping == NULL) {
PyErr_BadArgument();
return NULL;
}
return PyUnicode_EncodeCharmap(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
mapping,
NULL);
}
/* create or adjust a UnicodeTranslateError */
static void make_translate_exception(PyObject **exceptionObject,
const Py_UNICODE *unicode, Py_ssize_t size,
Py_ssize_t startpos, Py_ssize_t endpos,
const char *reason)
{
if (*exceptionObject == NULL) {
*exceptionObject = PyUnicodeTranslateError_Create(
unicode, size, startpos, endpos, reason);
}
else {
if (PyUnicodeTranslateError_SetStart(*exceptionObject, startpos))
goto onError;
if (PyUnicodeTranslateError_SetEnd(*exceptionObject, endpos))
goto onError;
if (PyUnicodeTranslateError_SetReason(*exceptionObject, reason))
goto onError;
return;
onError:
Py_DECREF(*exceptionObject);
*exceptionObject = NULL;
}
}
/* raises a UnicodeTranslateError */
static void raise_translate_exception(PyObject **exceptionObject,
const Py_UNICODE *unicode, Py_ssize_t size,
Py_ssize_t startpos, Py_ssize_t endpos,
const char *reason)
{
make_translate_exception(exceptionObject,
unicode, size, startpos, endpos, reason);
if (*exceptionObject != NULL)
PyCodec_StrictErrors(*exceptionObject);
}
/* error handling callback helper:
build arguments, call the callback and check the result,
put the new position into newpos and return the replacement string, which
has to be freed by the caller */
static PyObject *unicode_translate_call_errorhandler(const char *errors,
PyObject **errorHandler,
const char *reason,
const Py_UNICODE *unicode, Py_ssize_t size, PyObject **exceptionObject,
Py_ssize_t startpos, Py_ssize_t endpos,
Py_ssize_t *newpos)
{
static char *argparse = "O!n;translating error handler must return (unicode, int) tuple";
Py_ssize_t i_newpos;
PyObject *restuple;
PyObject *resunicode;
if (*errorHandler == NULL) {
*errorHandler = PyCodec_LookupError(errors);
if (*errorHandler == NULL)
return NULL;
}
make_translate_exception(exceptionObject,
unicode, size, startpos, endpos, reason);
if (*exceptionObject == NULL)
return NULL;
restuple = PyObject_CallFunctionObjArgs(
*errorHandler, *exceptionObject, NULL);
if (restuple == NULL)
return NULL;
if (!PyTuple_Check(restuple)) {
/* &argparse[4] skips the "O!n;" prefix; the text is a plain message,
not a format string */
PyErr_SetString(PyExc_TypeError, &argparse[4]);
Py_DECREF(restuple);
return NULL;
}
if (!PyArg_ParseTuple(restuple, argparse, &PyUnicode_Type,
&resunicode, &i_newpos)) {
Py_DECREF(restuple);
return NULL;
}
if (i_newpos<0)
*newpos = size+i_newpos;
else
*newpos = i_newpos;
if (*newpos<0 || *newpos>size) {
PyErr_Format(PyExc_IndexError, "position %zd from error handler out of bounds", *newpos);
Py_DECREF(restuple);
return NULL;
}
Py_INCREF(resunicode);
Py_DECREF(restuple);
return resunicode;
}
/* Look up the character c in the mapping and put the result in *result,
which must be decrefed by the caller. *result is set to NULL when the
mapping has no entry for c (meaning: use a 1:1 mapping).
Return 0 on success, -1 on error */
static
int charmaptranslate_lookup(Py_UNICODE c, PyObject *mapping, PyObject **result)
{
PyObject *w = PyInt_FromLong((long)c);
PyObject *x;
if (w == NULL)
return -1;
x = PyObject_GetItem(mapping, w);
Py_DECREF(w);
if (x == NULL) {
if (PyErr_ExceptionMatches(PyExc_LookupError)) {
/* No mapping found means: use 1:1 mapping. */
PyErr_Clear();
*result = NULL;
return 0;
} else
return -1;
}
else if (x == Py_None) {
*result = x;
return 0;
}
else if (PyInt_Check(x)) {
long value = PyInt_AS_LONG(x);
long max = PyUnicode_GetMax();
if (value < 0 || value > max) {
PyErr_Format(PyExc_TypeError,
"character mapping must be in range(0x%lx)", max+1);
Py_DECREF(x);
return -1;
}
*result = x;
return 0;
}
else if (PyUnicode_Check(x)) {
*result = x;
return 0;
}
else {
/* wrong return value */
PyErr_SetString(PyExc_TypeError,
"character mapping must return integer, None or unicode");
Py_DECREF(x);
return -1;
}
}
/* ensure that *outobj is at least requiredsize characters long,
if not reallocate and adjust various state variables.
Return 0 on success, -1 on error */
static
int charmaptranslate_makespace(PyObject **outobj, Py_UNICODE **outp,
Py_ssize_t requiredsize)
{
Py_ssize_t oldsize = PyUnicode_GET_SIZE(*outobj);
if (requiredsize > oldsize) {
/* remember old output position */
Py_ssize_t outpos = *outp-PyUnicode_AS_UNICODE(*outobj);
/* exponentially overallocate to minimize reallocations */
if (requiredsize < 2 * oldsize)
requiredsize = 2 * oldsize;
if (_PyUnicode_Resize(outobj, requiredsize) < 0)
return -1;
*outp = PyUnicode_AS_UNICODE(*outobj) + outpos;
}
return 0;
}
/* look up the character, put the result in the output string and adjust
various state variables. On return, *res holds a new reference to the
mapping result, NULL if no mapping was found (the character is copied
1:1), or Py_None if the mapping was undefined (in which case no
character was written).
The caller must decref *res.
Return 0 on success, -1 on error. */
static
int charmaptranslate_output(const Py_UNICODE *startinp, const Py_UNICODE *curinp,
Py_ssize_t insize, PyObject *mapping, PyObject **outobj, Py_UNICODE **outp,
PyObject **res)
{
if (charmaptranslate_lookup(*curinp, mapping, res))
return -1;
if (*res==NULL) {
/* not found => default to 1:1 mapping */
*(*outp)++ = *curinp;
}
else if (*res==Py_None)
;
else if (PyInt_Check(*res)) {
/* no overflow check, because we know that the space is enough */
*(*outp)++ = (Py_UNICODE)PyInt_AS_LONG(*res);
}
else if (PyUnicode_Check(*res)) {
Py_ssize_t repsize = PyUnicode_GET_SIZE(*res);
if (repsize==1) {
/* no overflow check, because we know that the space is enough */
*(*outp)++ = *PyUnicode_AS_UNICODE(*res);
}
else if (repsize!=0) {
/* more than one character */
Py_ssize_t requiredsize = (*outp-PyUnicode_AS_UNICODE(*outobj)) +
(insize - (curinp-startinp)) +
repsize - 1;
if (charmaptranslate_makespace(outobj, outp, requiredsize))
return -1;
memcpy(*outp, PyUnicode_AS_UNICODE(*res), sizeof(Py_UNICODE)*repsize);
*outp += repsize;
}
}
else
return -1;
return 0;
}
PyObject *PyUnicode_TranslateCharmap(const Py_UNICODE *p,
Py_ssize_t size,
PyObject *mapping,
const char *errors)
{
/* output object */
PyObject *res = NULL;
/* pointers to the beginning and end+1 of input */
const Py_UNICODE *startp = p;
const Py_UNICODE *endp = p + size;
/* pointer into the output */
Py_UNICODE *str;
/* current output position */
Py_ssize_t respos = 0;
char *reason = "character maps to <undefined>";
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
/* the following variable is used for caching string comparisons
* -1=not initialized, 0=unknown, 1=strict, 2=replace,
* 3=ignore, 4=xmlcharrefreplace */
int known_errorHandler = -1;
if (mapping == NULL) {
PyErr_BadArgument();
return NULL;
}
/* allocate enough for a simple 1:1 translation without
replacements, if we need more, we'll resize */
res = PyUnicode_FromUnicode(NULL, size);
if (res == NULL)
goto onError;
if (size == 0)
return res;
str = PyUnicode_AS_UNICODE(res);
while (p<endp) {
/* try to translate it */
PyObject *x = NULL;
if (charmaptranslate_output(startp, p, size, mapping, &res, &str, &x)) {
Py_XDECREF(x);
goto onError;
}
Py_XDECREF(x);
if (x!=Py_None) /* it worked => adjust input pointer */
++p;
else { /* untranslatable character */
PyObject *repunicode = NULL; /* initialize to prevent gcc warning */
Py_ssize_t repsize;
Py_ssize_t newpos;
Py_UNICODE *uni2;
/* startpos for collecting untranslatable chars */
const Py_UNICODE *collstart = p;
const Py_UNICODE *collend = p+1;
const Py_UNICODE *coll;
/* find all untranslatable characters */
while (collend < endp) {
if (charmaptranslate_lookup(*collend, mapping, &x))
goto onError;
Py_XDECREF(x);
if (x!=Py_None)
break;
++collend;
}
/* cache callback name lookup
* (if not done yet, i.e. it's the first error) */
if (known_errorHandler==-1) {
if ((errors==NULL) || (!strcmp(errors, "strict")))
known_errorHandler = 1;
else if (!strcmp(errors, "replace"))
known_errorHandler = 2;
else if (!strcmp(errors, "ignore"))
known_errorHandler = 3;
else if (!strcmp(errors, "xmlcharrefreplace"))
known_errorHandler = 4;
else
known_errorHandler = 0;
}
switch (known_errorHandler) {
case 1: /* strict */
raise_translate_exception(&exc, startp, size, collstart-startp, collend-startp, reason);
goto onError;
case 2: /* replace */
/* No need to check for space, this is a 1:1 replacement */
for (coll = collstart; coll<collend; ++coll)
*str++ = '?';
/* fall through */
case 3: /* ignore */
p = collend;
break;
case 4: /* xmlcharrefreplace */
/* generate replacement (temporarily (mis)uses p) */
for (p = collstart; p < collend; ++p) {
char buffer[2+29+1+1];
char *cp;
sprintf(buffer, "&#%d;", (int)*p);
if (charmaptranslate_makespace(&res, &str,
(str-PyUnicode_AS_UNICODE(res))+strlen(buffer)+(endp-collend)))
goto onError;
for (cp = buffer; *cp; ++cp)
*str++ = *cp;
}
p = collend;
break;
default:
repunicode = unicode_translate_call_errorhandler(errors, &errorHandler,
reason, startp, size, &exc,
collstart-startp, collend-startp, &newpos);
if (repunicode == NULL)
goto onError;
/* generate replacement */
repsize = PyUnicode_GET_SIZE(repunicode);
if (charmaptranslate_makespace(&res, &str,
(str-PyUnicode_AS_UNICODE(res))+repsize+(endp-collend))) {
Py_DECREF(repunicode);
goto onError;
}
for (uni2 = PyUnicode_AS_UNICODE(repunicode); repsize-->0; ++uni2)
*str++ = *uni2;
p = startp + newpos;
Py_DECREF(repunicode);
}
}
}
/* Resize if we allocated too much */
respos = str-PyUnicode_AS_UNICODE(res);
if (respos<PyUnicode_GET_SIZE(res)) {
if (_PyUnicode_Resize(&res, respos) < 0)
goto onError;
}
Py_XDECREF(exc);
Py_XDECREF(errorHandler);
return res;
onError:
Py_XDECREF(res);
Py_XDECREF(exc);
Py_XDECREF(errorHandler);
return NULL;
}
PyObject *PyUnicode_Translate(PyObject *str,
PyObject *mapping,
const char *errors)
{
PyObject *result;
str = PyUnicode_FromObject(str);
if (str == NULL)
goto onError;
result = PyUnicode_TranslateCharmap(PyUnicode_AS_UNICODE(str),
PyUnicode_GET_SIZE(str),
mapping,
errors);
Py_DECREF(str);
return result;
onError:
Py_XDECREF(str);
return NULL;
}
/* --- Decimal Encoder ---------------------------------------------------- */
int PyUnicode_EncodeDecimal(Py_UNICODE *s,
Py_ssize_t length,
char *output,
const char *errors)
{
Py_UNICODE *p, *end;
PyObject *errorHandler = NULL;
PyObject *exc = NULL;
const char *encoding = "decimal";
const char *reason = "invalid decimal Unicode string";
/* the following variable is used for caching string comparisons
* -1=not initialized, 0=unknown, 1=strict, 2=replace, 3=ignore, 4=xmlcharrefreplace */
int known_errorHandler = -1;
if (output == NULL) {
PyErr_BadArgument();
return -1;
}
p = s;
end = s + length;
while (p < end) {
register Py_UNICODE ch = *p;
int decimal;
PyObject *repunicode;
Py_ssize_t repsize;
Py_ssize_t newpos;
Py_UNICODE *uni2;
Py_UNICODE *collstart;
Py_UNICODE *collend;
if (Py_UNICODE_ISSPACE(ch)) {
*output++ = ' ';
++p;
continue;
}
decimal = Py_UNICODE_TODECIMAL(ch);
if (decimal >= 0) {
*output++ = '0' + decimal;
++p;
continue;
}
if (0 < ch && ch < 256) {
*output++ = (char)ch;
++p;
continue;
}
/* All other characters are considered unencodable */
collstart = p;
collend = p+1;
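/* extend the run while the next character is also unencodable,
i.e. stop at the first Latin-1 character, space or decimal digit */
while (collend < end) {
if ((0 < *collend && *collend < 256) ||
Py_UNICODE_ISSPACE(*collend) ||
Py_UNICODE_TODECIMAL(*collend) >= 0)
break;
++collend;
}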
/* cache callback name lookup
* (if not done yet, i.e. it's the first error) */
if (known_errorHandler==-1) {
if ((errors==NULL) || (!strcmp(errors, "strict")))
known_errorHandler = 1;
else if (!strcmp(errors, "replace"))
known_errorHandler = 2;
else if (!strcmp(errors, "ignore"))
known_errorHandler = 3;
else if (!strcmp(errors, "xmlcharrefreplace"))
known_errorHandler = 4;
else
known_errorHandler = 0;
}
switch (known_errorHandler) {
case 1: /* strict */
raise_encode_exception(&exc, encoding, s, length, collstart-s, collend-s, reason);
goto onError;
case 2: /* replace */
for (p = collstart; p < collend; ++p)
*output++ = '?';
/* fall through */
case 3: /* ignore */
p = collend;
break;
case 4: /* xmlcharrefreplace */
/* generate replacement (temporarily (mis)uses p) */
for (p = collstart; p < collend; ++p)
output += sprintf(output, "&#%d;", (int)*p);
p = collend;
break;
default:
repunicode = unicode_encode_call_errorhandler(errors, &errorHandler,
encoding, reason, s, length, &exc,
collstart-s, collend-s, &newpos);
if (repunicode == NULL)
goto onError;
/* generate replacement */
repsize = PyUnicode_GET_SIZE(repunicode);
for (uni2 = PyUnicode_AS_UNICODE(repunicode); repsize-->0; ++uni2) {
Py_UNICODE ch = *uni2;
if (Py_UNICODE_ISSPACE(ch))
*output++ = ' ';
else {
decimal = Py_UNICODE_TODECIMAL(ch);
if (decimal >= 0)
*output++ = '0' + decimal;
else if (0 < ch && ch < 256)
*output++ = (char)ch;
else {
Py_DECREF(repunicode);
raise_encode_exception(&exc, encoding,
s, length, collstart-s, collend-s, reason);
goto onError;
}
}
}
p = s + newpos;
Py_DECREF(repunicode);
}
}
/* 0-terminate the output string */
*output++ = '\0';
Py_XDECREF(exc);
Py_XDECREF(errorHandler);
return 0;
onError:
Py_XDECREF(exc);
Py_XDECREF(errorHandler);
return -1;
}
/* --- Helpers ------------------------------------------------------------ */
#define STRINGLIB_CHAR Py_UNICODE
#define STRINGLIB_LEN PyUnicode_GET_SIZE
#define STRINGLIB_NEW PyUnicode_FromUnicode
#define STRINGLIB_STR PyUnicode_AS_UNICODE
Py_LOCAL_INLINE(int)
STRINGLIB_CMP(const Py_UNICODE* str, const Py_UNICODE* other, Py_ssize_t len)
{
if (str[0] != other[0])
return 1;
return memcmp((void*) str, (void*) other, len * sizeof(Py_UNICODE));
}
#define STRINGLIB_EMPTY unicode_empty
#include "stringlib/fastsearch.h"
#include "stringlib/count.h"
#include "stringlib/find.h"
#include "stringlib/partition.h"
/* helper macro to fixup start/end slice values */
#define FIX_START_END(obj) \
if (start < 0) \
start += (obj)->length; \
if (start < 0) \
start = 0; \
if (end > (obj)->length) \
end = (obj)->length; \
if (end < 0) \
end += (obj)->length; \
if (end < 0) \
end = 0;
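/* Illustration only, not compiled as part of this file: a minimal sketch of
the clamping steps that FIX_START_END performs, written as a stand-alone
function over plain integers. The names clamp_slice and clamp_slice_demo are
hypothetical and do not exist in CPython; ptrdiff_t stands in for Py_ssize_t.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-alone equivalent of the FIX_START_END steps.
// Negative indices count from the end; out-of-range values are clamped.
static void clamp_slice(ptrdiff_t length, ptrdiff_t *start, ptrdiff_t *end)
{
    if (*start < 0)
        *start += length;
    if (*start < 0)
        *start = 0;
    if (*end > length)
        *end = length;
    if (*end < 0)
        *end += length;
    if (*end < 0)
        *end = 0;
}

// Usage check for a string of length 10.
static void clamp_slice_demo(void)
{
    ptrdiff_t start = -3, end = 100;
    clamp_slice(10, &start, &end);
    assert(start == 7 && end == 10);

    start = 2;
    end = -1;
    clamp_slice(10, &start, &end);
    assert(start == 2 && end == 9);
}
```

As in the macro, start is not clamped against length, so callers still have
to cope with start > end.
*/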
Py_ssize_t PyUnicode_Count(PyObject *str,
PyObject *substr,
Py_ssize_t start,
Py_ssize_t end)
{
Py_ssize_t result;
PyUnicodeObject* str_obj;
PyUnicodeObject* sub_obj;
str_obj = (PyUnicodeObject*) PyUnicode_FromObject(str);
if (!str_obj)
return -1;
sub_obj = (PyUnicodeObject*) PyUnicode_FromObject(substr);
if (!sub_obj) {
Py_DECREF(str_obj);
return -1;
}
FIX_START_END(str_obj);
result = stringlib_count(
str_obj->str + start, end - start, sub_obj->str, sub_obj->length
);
Py_DECREF(sub_obj);
Py_DECREF(str_obj);
return result;
}
Py_ssize_t PyUnicode_Find(PyObject *str,
PyObject *sub,
Py_ssize_t start,
Py_ssize_t end,
int direction)
{
Py_ssize_t result;
str = PyUnicode_FromObject(str);
if (!str)
return -2;
sub = PyUnicode_FromObject(sub);
if (!sub) {
Py_DECREF(str);
return -2;
}
if (direction > 0)
result = stringlib_find_slice(
PyUnicode_AS_UNICODE(str), PyUnicode_GET_SIZE(str),
PyUnicode_AS_UNICODE(sub), PyUnicode_GET_SIZE(sub),
start, end
);
else
result = stringlib_rfind_slice(
PyUnicode_AS_UNICODE(str), PyUnicode_GET_SIZE(str),
PyUnicode_AS_UNICODE(sub), PyUnicode_GET_SIZE(sub),
start, end
);
Py_DECREF(str);
Py_DECREF(sub);
return result;
}
static
int tailmatch(PyUnicodeObject *self,
PyUnicodeObject *substring,
Py_ssize_t start,
Py_ssize_t end,
int direction)
{
if (substring->length == 0)
return 1;
FIX_START_END(self);
end -= substring->length;
if (end < start)
return 0;
if (direction > 0) {
if (Py_UNICODE_MATCH(self, end, substring))
return 1;
} else {
if (Py_UNICODE_MATCH(self, start, substring))
return 1;
}
return 0;
}
Py_ssize_t PyUnicode_Tailmatch(PyObject *str,
PyObject *substr,
Py_ssize_t start,
Py_ssize_t end,
int direction)
{
Py_ssize_t result;
str = PyUnicode_FromObject(str);
if (str == NULL)
return -1;
substr = PyUnicode_FromObject(substr);
if (substr == NULL) {
Py_DECREF(str);
return -1;
}
result = tailmatch((PyUnicodeObject *)str,
(PyUnicodeObject *)substr,
start, end, direction);
Py_DECREF(str);
Py_DECREF(substr);
return result;
}
/* Apply fixfct filter to the Unicode object self and return a
reference to the modified object */
static
PyObject *fixup(PyUnicodeObject *self,
int (*fixfct)(PyUnicodeObject *s))
{
PyUnicodeObject *u;
u = (PyUnicodeObject*) PyUnicode_FromUnicode(NULL, self->length);
if (u == NULL)
return NULL;
Py_UNICODE_COPY(u->str, self->str, self->length);
if (!fixfct(u) && PyUnicode_CheckExact(self)) {
/* fixfct should return TRUE if it modified the buffer. If
FALSE, return a reference to the original buffer instead
(to save space, not time) */
Py_INCREF(self);
Py_DECREF(u);
return (PyObject*) self;
}
return (PyObject*) u;
}
static
int fixupper(PyUnicodeObject *self)
{
Py_ssize_t len = self->length;
Py_UNICODE *s = self->str;
int status = 0;
while (len-- > 0) {
register Py_UNICODE ch;
ch = Py_UNICODE_TOUPPER(*s);
if (ch != *s) {
status = 1;
*s = ch;
}
s++;
}
return status;
}
static
int fixlower(PyUnicodeObject *self)
{
Py_ssize_t len = self->length;
Py_UNICODE *s = self->str;
int status = 0;
while (len-- > 0) {
register Py_UNICODE ch;
ch = Py_UNICODE_TOLOWER(*s);
if (ch != *s) {
status = 1;
*s = ch;
}
s++;
}
return status;
}
static
int fixswapcase(PyUnicodeObject *self)
{
Py_ssize_t len = self->length;
Py_UNICODE *s = self->str;
int status = 0;
while (len-- > 0) {
if (Py_UNICODE_ISUPPER(*s)) {
*s = Py_UNICODE_TOLOWER(*s);
status = 1;
} else if (Py_UNICODE_ISLOWER(*s)) {
*s = Py_UNICODE_TOUPPER(*s);
status = 1;
}
s++;
}
return status;
}
static
int fixcapitalize(PyUnicodeObject *self)
{
Py_ssize_t len = self->length;
Py_UNICODE *s = self->str;
int status = 0;
if (len == 0)
return 0;
if (Py_UNICODE_ISLOWER(*s)) {
*s = Py_UNICODE_TOUPPER(*s);
status = 1;
}
s++;
while (--len > 0) {
if (Py_UNICODE_ISUPPER(*s)) {
*s = Py_UNICODE_TOLOWER(*s);
status = 1;
}
s++;
}
return status;
}
static
int fixtitle(PyUnicodeObject *self)
{
register Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register Py_UNICODE *e;
int previous_is_cased;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1) {
Py_UNICODE ch = Py_UNICODE_TOTITLE(*p);
if (*p != ch) {
*p = ch;
return 1;
}
else
return 0;
}
e = p + PyUnicode_GET_SIZE(self);
previous_is_cased = 0;
for (; p < e; p++) {
register const Py_UNICODE ch = *p;
if (previous_is_cased)
*p = Py_UNICODE_TOLOWER(ch);
else
*p = Py_UNICODE_TOTITLE(ch);
if (Py_UNICODE_ISLOWER(ch) ||
Py_UNICODE_ISUPPER(ch) ||
Py_UNICODE_ISTITLE(ch))
previous_is_cased = 1;
else
previous_is_cased = 0;
}
return 1;
}
PyObject *
PyUnicode_Join(PyObject *separator, PyObject *seq)
{
PyObject *internal_separator = NULL;
const Py_UNICODE blank = ' ';
const Py_UNICODE *sep = &blank;
Py_ssize_t seplen = 1;
PyUnicodeObject *res = NULL; /* the result */
Py_ssize_t res_alloc = 100; /* # allocated Py_UNICODE units for string in res */
Py_ssize_t res_used; /* # used Py_UNICODE units */
Py_UNICODE *res_p; /* pointer to next free unit in res's string area */
PyObject *fseq; /* PySequence_Fast(seq) */
Py_ssize_t seqlen; /* len(fseq) -- number of items in sequence */
PyObject *item;
Py_ssize_t i;
fseq = PySequence_Fast(seq, "");
if (fseq == NULL) {
return NULL;
}
/* Grrrr. A codec may be invoked to convert str objects to
* Unicode, and so it's possible to call back into Python code
* during PyUnicode_FromObject(), and so it's possible for a sick
* codec to change the size of fseq (if seq is a list). Therefore
* we have to keep refetching the size -- can't assume seqlen
* is invariant.
*/
seqlen = PySequence_Fast_GET_SIZE(fseq);
/* If empty sequence, return u"". */
if (seqlen == 0) {
res = _PyUnicode_New(0);
goto Done;
}
/* If singleton sequence with an exact Unicode, return that. */
if (seqlen == 1) {
item = PySequence_Fast_GET_ITEM(fseq, 0);
if (PyUnicode_CheckExact(item)) {
Py_INCREF(item);
res = (PyUnicodeObject *)item;
goto Done;
}
}
/* At least two items to join, or one that isn't exact Unicode. */
if (seqlen > 1) {
/* Set up sep and seplen -- they're needed. */
if (separator == NULL) {
sep = &blank;
seplen = 1;
}
else {
internal_separator = PyUnicode_FromObject(separator);
if (internal_separator == NULL)
goto onError;
sep = PyUnicode_AS_UNICODE(internal_separator);
seplen = PyUnicode_GET_SIZE(internal_separator);
/* In case PyUnicode_FromObject() mutated seq. */
seqlen = PySequence_Fast_GET_SIZE(fseq);
}
}
/* Get space. */
res = _PyUnicode_New(res_alloc);
if (res == NULL)
goto onError;
res_p = PyUnicode_AS_UNICODE(res);
res_used = 0;
for (i = 0; i < seqlen; ++i) {
Py_ssize_t itemlen;
Py_ssize_t new_res_used;
item = PySequence_Fast_GET_ITEM(fseq, i);
/* Convert item to Unicode. */
if (! PyUnicode_Check(item) && ! PyString_Check(item)) {
PyErr_Format(PyExc_TypeError,
"sequence item %zd: expected string or Unicode,"
" %.80s found",
i, item->ob_type->tp_name);
goto onError;
}
item = PyUnicode_FromObject(item);
if (item == NULL)
goto onError;
/* We own a reference to item from here on. */
/* In case PyUnicode_FromObject() mutated seq. */
seqlen = PySequence_Fast_GET_SIZE(fseq);
/* Make sure we have enough space for the separator and the item. */
itemlen = PyUnicode_GET_SIZE(item);
new_res_used = res_used + itemlen;
if (new_res_used < 0)
goto Overflow;
if (i < seqlen - 1) {
new_res_used += seplen;
if (new_res_used < 0)
goto Overflow;
}
if (new_res_used > res_alloc) {
/* double allocated size until it's big enough */
do {
res_alloc += res_alloc;
if (res_alloc <= 0)
goto Overflow;
} while (new_res_used > res_alloc);
if (_PyUnicode_Resize(&res, res_alloc) < 0) {
Py_DECREF(item);
goto onError;
}
res_p = PyUnicode_AS_UNICODE(res) + res_used;
}
/* Copy item, and maybe the separator. */
Py_UNICODE_COPY(res_p, PyUnicode_AS_UNICODE(item), itemlen);
res_p += itemlen;
if (i < seqlen - 1) {
Py_UNICODE_COPY(res_p, sep, seplen);
res_p += seplen;
}
Py_DECREF(item);
res_used = new_res_used;
}
/* Shrink res to match the used area; this probably can't fail,
* but it's cheap to check.
*/
if (_PyUnicode_Resize(&res, res_used) < 0)
goto onError;
Done:
Py_XDECREF(internal_separator);
Py_DECREF(fseq);
return (PyObject *)res;
Overflow:
PyErr_SetString(PyExc_OverflowError,
"join() result is too long for a Python string");
Py_DECREF(item);
/* fall through */
onError:
Py_XDECREF(internal_separator);
Py_DECREF(fseq);
Py_XDECREF(res);
return NULL;
}
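/* Illustration only, not compiled as part of this file: a minimal sketch of
the "double until big enough" growth policy used for res_alloc in
PyUnicode_Join above. The helper names are hypothetical. Unlike the loop
above, which detects overflow after the addition (res_alloc <= 0), this
sketch checks before doubling so that it never relies on signed wraparound;
that is a swapped-in technique, not what the function above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: doubles capacity until it covers needed.
// Returns the new capacity, or -1 if it cannot be doubled safely.
static ptrdiff_t grow_capacity(ptrdiff_t capacity, ptrdiff_t needed)
{
    if (capacity <= 0)
        return -1;                  // doubling would never make progress
    while (needed > capacity) {
        if (capacity > PTRDIFF_MAX / 2)
            return -1;              // doubling would overflow
        capacity += capacity;
    }
    return capacity;
}

// Usage check: starting from the initial allocation of 100 units.
static void grow_capacity_demo(void)
{
    assert(grow_capacity(100, 100) == 100);   // already large enough
    assert(grow_capacity(100, 250) == 400);   // 100 -> 200 -> 400
    assert(grow_capacity(PTRDIFF_MAX / 2 + 1, PTRDIFF_MAX) == -1);
}
```

Doubling keeps the number of resize calls logarithmic in the final length;
the result is shrunk to the used size once at the end.
*/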
static
PyUnicodeObject *pad(PyUnicodeObject *self,
Py_ssize_t left,
Py_ssize_t right,
Py_UNICODE fill)
{
PyUnicodeObject *u;
if (left < 0)
left = 0;
if (right < 0)
right = 0;
if (left == 0 && right == 0 && PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return self;
}
u = _PyUnicode_New(left + self->length + right);
if (u) {
if (left)
Py_UNICODE_FILL(u->str, fill, left);
Py_UNICODE_COPY(u->str + left, self->str, self->length);
if (right)
Py_UNICODE_FILL(u->str + left + self->length, fill, right);
}
return u;
}
#define SPLIT_APPEND(data, left, right) \
str = PyUnicode_FromUnicode((data) + (left), (right) - (left)); \
if (!str) \
goto onError; \
if (PyList_Append(list, str)) { \
Py_DECREF(str); \
goto onError; \
} \
else \
Py_DECREF(str);
static
PyObject *split_whitespace(PyUnicodeObject *self,
PyObject *list,
Py_ssize_t maxcount)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len = self->length;
PyObject *str;
for (i = j = 0; i < len; ) {
/* find a token */
while (i < len && Py_UNICODE_ISSPACE(self->str[i]))
i++;
j = i;
while (i < len && !Py_UNICODE_ISSPACE(self->str[i]))
i++;
if (j < i) {
if (maxcount-- <= 0)
break;
SPLIT_APPEND(self->str, j, i);
while (i < len && Py_UNICODE_ISSPACE(self->str[i]))
i++;
j = i;
}
}
if (j < len) {
SPLIT_APPEND(self->str, j, len);
}
return list;
onError:
Py_DECREF(list);
return NULL;
}
PyObject *PyUnicode_Splitlines(PyObject *string,
int keepends)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len;
PyObject *list;
PyObject *str;
Py_UNICODE *data;
string = PyUnicode_FromObject(string);
if (string == NULL)
return NULL;
data = PyUnicode_AS_UNICODE(string);
len = PyUnicode_GET_SIZE(string);
list = PyList_New(0);
if (!list)
goto onError;
for (i = j = 0; i < len; ) {
Py_ssize_t eol;
/* Find a line and append it */
while (i < len && !BLOOM_LINEBREAK(data[i]))
i++;
/* Skip the line break reading CRLF as one line break */
eol = i;
if (i < len) {
if (data[i] == '\r' && i + 1 < len &&
data[i+1] == '\n')
i += 2;
else
i++;
if (keepends)
eol = i;
}
SPLIT_APPEND(data, j, eol);
j = i;
}
if (j < len) {
SPLIT_APPEND(data, j, len);
}
Py_DECREF(string);
return list;
onError:
Py_XDECREF(list);
Py_DECREF(string);
return NULL;
}
static
PyObject *split_char(PyUnicodeObject *self,
PyObject *list,
Py_UNICODE ch,
Py_ssize_t maxcount)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len = self->length;
PyObject *str;
for (i = j = 0; i < len; ) {
if (self->str[i] == ch) {
if (maxcount-- <= 0)
break;
SPLIT_APPEND(self->str, j, i);
i = j = i + 1;
} else
i++;
}
if (j <= len) {
SPLIT_APPEND(self->str, j, len);
}
return list;
onError:
Py_DECREF(list);
return NULL;
}
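/* Illustration only, not compiled as part of this file: a minimal sketch of
the loop structure in split_char above, reduced to counting the pieces that
would be appended for a plain char buffer. The helper names are hypothetical.
It assumes maxcount is non-negative, as it is after split() has mapped
negative values to PY_SSIZE_T_MAX.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: counts the pieces split_char would append.
static size_t count_split_pieces(const char *s, ptrdiff_t len, char ch,
                                 ptrdiff_t maxcount)
{
    ptrdiff_t i, j;
    size_t pieces = 0;

    for (i = j = 0; i < len; ) {
        if (s[i] == ch) {
            if (maxcount-- <= 0)
                break;              // split budget exhausted
            pieces++;               // piece covering [j, i)
            i = j = i + 1;
        } else
            i++;
    }
    if (j <= len)
        pieces++;                   // trailing piece [j, len), possibly empty
    return pieces;
}

// Usage check on a few small inputs.
static void count_split_pieces_demo(void)
{
    assert(count_split_pieces("a,b,c", 5, ',', 10) == 3);
    assert(count_split_pieces("a,b,c", 5, ',', 1) == 2);   // "a" and "b,c"
    assert(count_split_pieces("a,,", 3, ',', 10) == 3);    // "a", "", ""
}
```

Because j never exceeds len, the trailing piece is always emitted, which is
why splitting an empty string on a character yields one empty piece.
*/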
static
PyObject *split_substring(PyUnicodeObject *self,
PyObject *list,
PyUnicodeObject *substring,
Py_ssize_t maxcount)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len = self->length;
Py_ssize_t sublen = substring->length;
PyObject *str;
for (i = j = 0; i <= len - sublen; ) {
if (Py_UNICODE_MATCH(self, i, substring)) {
if (maxcount-- <= 0)
break;
SPLIT_APPEND(self->str, j, i);
i = j = i + sublen;
} else
i++;
}
if (j <= len) {
SPLIT_APPEND(self->str, j, len);
}
return list;
onError:
Py_DECREF(list);
return NULL;
}
static
PyObject *rsplit_whitespace(PyUnicodeObject *self,
PyObject *list,
Py_ssize_t maxcount)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len = self->length;
PyObject *str;
for (i = j = len - 1; i >= 0; ) {
/* find a token */
while (i >= 0 && Py_UNICODE_ISSPACE(self->str[i]))
i--;
j = i;
while (i >= 0 && !Py_UNICODE_ISSPACE(self->str[i]))
i--;
if (j > i) {
if (maxcount-- <= 0)
break;
SPLIT_APPEND(self->str, i + 1, j + 1);
while (i >= 0 && Py_UNICODE_ISSPACE(self->str[i]))
i--;
j = i;
}
}
if (j >= 0) {
SPLIT_APPEND(self->str, 0, j + 1);
}
if (PyList_Reverse(list) < 0)
goto onError;
return list;
onError:
Py_DECREF(list);
return NULL;
}
static
PyObject *rsplit_char(PyUnicodeObject *self,
PyObject *list,
Py_UNICODE ch,
Py_ssize_t maxcount)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len = self->length;
PyObject *str;
for (i = j = len - 1; i >= 0; ) {
if (self->str[i] == ch) {
if (maxcount-- <= 0)
break;
SPLIT_APPEND(self->str, i + 1, j + 1);
j = i = i - 1;
} else
i--;
}
if (j >= -1) {
SPLIT_APPEND(self->str, 0, j + 1);
}
if (PyList_Reverse(list) < 0)
goto onError;
return list;
onError:
Py_DECREF(list);
return NULL;
}
static
PyObject *rsplit_substring(PyUnicodeObject *self,
PyObject *list,
PyUnicodeObject *substring,
Py_ssize_t maxcount)
{
register Py_ssize_t i;
register Py_ssize_t j;
Py_ssize_t len = self->length;
Py_ssize_t sublen = substring->length;
PyObject *str;
for (i = len - sublen, j = len; i >= 0; ) {
if (Py_UNICODE_MATCH(self, i, substring)) {
if (maxcount-- <= 0)
break;
SPLIT_APPEND(self->str, i + sublen, j);
j = i;
i -= sublen;
} else
i--;
}
if (j >= 0) {
SPLIT_APPEND(self->str, 0, j);
}
if (PyList_Reverse(list) < 0)
goto onError;
return list;
onError:
Py_DECREF(list);
return NULL;
}
#undef SPLIT_APPEND
static
PyObject *split(PyUnicodeObject *self,
PyUnicodeObject *substring,
Py_ssize_t maxcount)
{
PyObject *list;
if (maxcount < 0)
maxcount = PY_SSIZE_T_MAX;
list = PyList_New(0);
if (!list)
return NULL;
if (substring == NULL)
return split_whitespace(self,list,maxcount);
else if (substring->length == 1)
return split_char(self,list,substring->str[0],maxcount);
else if (substring->length == 0) {
Py_DECREF(list);
PyErr_SetString(PyExc_ValueError, "empty separator");
return NULL;
}
else
return split_substring(self,list,substring,maxcount);
}
static
PyObject *rsplit(PyUnicodeObject *self,
PyUnicodeObject *substring,
Py_ssize_t maxcount)
{
PyObject *list;
if (maxcount < 0)
maxcount = PY_SSIZE_T_MAX;
list = PyList_New(0);
if (!list)
return NULL;
if (substring == NULL)
return rsplit_whitespace(self,list,maxcount);
else if (substring->length == 1)
return rsplit_char(self,list,substring->str[0],maxcount);
else if (substring->length == 0) {
Py_DECREF(list);
PyErr_SetString(PyExc_ValueError, "empty separator");
return NULL;
}
else
return rsplit_substring(self,list,substring,maxcount);
}
static
PyObject *replace(PyUnicodeObject *self,
PyUnicodeObject *str1,
PyUnicodeObject *str2,
Py_ssize_t maxcount)
{
PyUnicodeObject *u;
if (maxcount < 0)
maxcount = PY_SSIZE_T_MAX;
if (str1->length == str2->length) {
/* same length */
Py_ssize_t i;
if (str1->length == 1) {
/* replace characters */
Py_UNICODE u1, u2;
if (!findchar(self->str, self->length, str1->str[0]))
goto nothing;
u = (PyUnicodeObject*) PyUnicode_FromUnicode(NULL, self->length);
if (!u)
return NULL;
Py_UNICODE_COPY(u->str, self->str, self->length);
u1 = str1->str[0];
u2 = str2->str[0];
for (i = 0; i < u->length; i++)
if (u->str[i] == u1) {
if (--maxcount < 0)
break;
u->str[i] = u2;
}
} else {
i = fastsearch(
self->str, self->length, str1->str, str1->length, FAST_SEARCH
);
if (i < 0)
goto nothing;
u = (PyUnicodeObject*) PyUnicode_FromUnicode(NULL, self->length);
if (!u)
return NULL;
Py_UNICODE_COPY(u->str, self->str, self->length);
while (i <= self->length - str1->length)
if (Py_UNICODE_MATCH(self, i, str1)) {
if (--maxcount < 0)
break;
Py_UNICODE_COPY(u->str+i, str2->str, str2->length);
i += str1->length;
} else
i++;
}
} else {
Py_ssize_t n, i, j, e;
Py_ssize_t product, new_size, delta;
Py_UNICODE *p;
/* replace strings */
n = stringlib_count(self->str, self->length, str1->str, str1->length);
if (n > maxcount)
n = maxcount;
if (n == 0)
goto nothing;
/* new_size = self->length + n * (str2->length - str1->length); */
delta = (str2->length - str1->length);
if (delta == 0) {
new_size = self->length;
} else {
product = n * delta;
if ((product / delta) != n) {
PyErr_SetString(PyExc_OverflowError,
"replace string is too long");
return NULL;
}
new_size = self->length + product;
if (new_size < 0) {
PyErr_SetString(PyExc_OverflowError,
"replace string is too long");
return NULL;
}
}
u = _PyUnicode_New(new_size);
if (!u)
return NULL;
i = 0;
p = u->str;
e = self->length - str1->length;
if (str1->length > 0) {
while (n-- > 0) {
/* look for next match */
j = i;
while (j <= e) {
if (Py_UNICODE_MATCH(self, j, str1))
break;
j++;
}
if (j > i) {
if (j > e)
break;
/* copy unchanged part [i:j] */
Py_UNICODE_COPY(p, self->str+i, j-i);
p += j - i;
}
/* copy substitution string */
if (str2->length > 0) {
Py_UNICODE_COPY(p, str2->str, str2->length);
p += str2->length;
}
i = j + str1->length;
}
if (i < self->length)
/* copy tail [i:] */
Py_UNICODE_COPY(p, self->str+i, self->length-i);
} else {
/* interleave */
while (n > 0) {
Py_UNICODE_COPY(p, str2->str, str2->length);
p += str2->length;
if (--n <= 0)
break;
*p++ = self->str[i++];
}
Py_UNICODE_COPY(p, self->str+i, self->length-i);
}
}
return (PyObject *) u;
nothing:
/* nothing to replace; return original string (when possible) */
if (PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject *) self;
}
return PyUnicode_FromUnicode(self->str, self->length);
}
/* --- Unicode Object Methods --------------------------------------------- */
PyDoc_STRVAR(title__doc__,
"S.title() -> unicode\n\
\n\
Return a titlecased version of S, i.e. words start with title case\n\
characters, all remaining cased characters have lower case.");
static PyObject*
unicode_title(PyUnicodeObject *self)
{
return fixup(self, fixtitle);
}
PyDoc_STRVAR(capitalize__doc__,
"S.capitalize() -> unicode\n\
\n\
Return a capitalized version of S, i.e. make the first character\n\
have upper case.");
static PyObject*
unicode_capitalize(PyUnicodeObject *self)
{
return fixup(self, fixcapitalize);
}
#if 0
PyDoc_STRVAR(capwords__doc__,
"S.capwords() -> unicode\n\
\n\
Apply .capitalize() to all words in S and return the result with\n\
normalized whitespace (all whitespace strings are replaced by ' ').");
static PyObject*
unicode_capwords(PyUnicodeObject *self)
{
PyObject *list;
PyObject *item;
Py_ssize_t i;
/* Split into words */
list = split(self, NULL, -1);
if (!list)
return NULL;
/* Capitalize each word */
for (i = 0; i < PyList_GET_SIZE(list); i++) {
item = fixup((PyUnicodeObject *)PyList_GET_ITEM(list, i),
fixcapitalize);
if (item == NULL)
goto onError;
Py_DECREF(PyList_GET_ITEM(list, i));
PyList_SET_ITEM(list, i, item);
}
/* Join the words to form a new string */
item = PyUnicode_Join(NULL, list);
onError:
Py_DECREF(list);
return (PyObject *)item;
}
#endif
/* Argument converter. Coerces to a single unicode character */
static int
convert_uc(PyObject *obj, void *addr)
{
Py_UNICODE *fillcharloc = (Py_UNICODE *)addr;
PyObject *uniobj;
Py_UNICODE *unistr;
uniobj = PyUnicode_FromObject(obj);
if (uniobj == NULL) {
PyErr_SetString(PyExc_TypeError,
"The fill character cannot be converted to Unicode");
return 0;
}
if (PyUnicode_GET_SIZE(uniobj) != 1) {
PyErr_SetString(PyExc_TypeError,
"The fill character must be exactly one character long");
Py_DECREF(uniobj);
return 0;
}
unistr = PyUnicode_AS_UNICODE(uniobj);
*fillcharloc = unistr[0];
Py_DECREF(uniobj);
return 1;
}
PyDoc_STRVAR(center__doc__,
"S.center(width[, fillchar]) -> unicode\n\
\n\
Return S centered in a Unicode string of length width. Padding is\n\
done using the specified fill character (default is a space).");
static PyObject *
unicode_center(PyUnicodeObject *self, PyObject *args)
{
Py_ssize_t marg, left;
Py_ssize_t width;
Py_UNICODE fillchar = ' ';
if (!PyArg_ParseTuple(args, "n|O&:center", &width, convert_uc, &fillchar))
return NULL;
if (self->length >= width && PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject*) self;
}
marg = width - self->length;
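/* Split the padding between both sides.  When marg is odd, the extra
fill character goes on the left if width is odd, else on the right. */
left = marg / 2 + (marg & width & 1);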
return (PyObject*) pad(self, left, marg - left, fillchar);
}
#if 0
/* This code should go into some future Unicode collation support
module. The basic comparison should compare ordinals on a naive
basis (this is what Java does and thus JPython too). */
/* speedy UTF-16 code point order comparison */
/* gleaned from: */
/* http://www-4.ibm.com/software/developer/library/utf16.html?dwzone=unicode */
static short utf16Fixup[32] =
{
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0x2000, -0x800, -0x800, -0x800, -0x800
};
static int
unicode_compare(PyUnicodeObject *str1, PyUnicodeObject *str2)
{
Py_ssize_t len1, len2;
Py_UNICODE *s1 = str1->str;
Py_UNICODE *s2 = str2->str;
len1 = str1->length;
len2 = str2->length;
while (len1 > 0 && len2 > 0) {
Py_UNICODE c1, c2;
c1 = *s1++;
c2 = *s2++;
if (c1 > (1<<11) * 26)
c1 += utf16Fixup[c1>>11];
if (c2 > (1<<11) * 26)
c2 += utf16Fixup[c2>>11];
/* now c1 and c2 are in UTF-32-compatible order */
if (c1 != c2)
return (c1 < c2) ? -1 : 1;
len1--; len2--;
}
return (len1 < len2) ? -1 : (len1 != len2);
}
#else
static int
unicode_compare(PyUnicodeObject *str1, PyUnicodeObject *str2)
{
register Py_ssize_t len1, len2;
Py_UNICODE *s1 = str1->str;
Py_UNICODE *s2 = str2->str;
len1 = str1->length;
len2 = str2->length;
while (len1 > 0 && len2 > 0) {
Py_UNICODE c1, c2;
c1 = *s1++;
c2 = *s2++;
if (c1 != c2)
return (c1 < c2) ? -1 : 1;
len1--; len2--;
}
return (len1 < len2) ? -1 : (len1 != len2);
}
#endif
int PyUnicode_Compare(PyObject *left,
PyObject *right)
{
PyUnicodeObject *u = NULL, *v = NULL;
int result;
/* Coerce the two arguments */
u = (PyUnicodeObject *)PyUnicode_FromObject(left);
if (u == NULL)
goto onError;
v = (PyUnicodeObject *)PyUnicode_FromObject(right);
if (v == NULL)
goto onError;
/* Shortcut for empty or interned objects */
if (v == u) {
Py_DECREF(u);
Py_DECREF(v);
return 0;
}
result = unicode_compare(u, v);
Py_DECREF(u);
Py_DECREF(v);
return result;
onError:
Py_XDECREF(u);
Py_XDECREF(v);
return -1;
}
int PyUnicode_Contains(PyObject *container,
PyObject *element)
{
PyObject *str, *sub;
int result;
/* Coerce the two arguments */
sub = PyUnicode_FromObject(element);
if (!sub) {
PyErr_SetString(PyExc_TypeError,
"'in <string>' requires string as left operand");
return -1;
}
str = PyUnicode_FromObject(container);
if (!str) {
Py_DECREF(sub);
return -1;
}
result = stringlib_contains_obj(str, sub);
Py_DECREF(str);
Py_DECREF(sub);
return result;
}
/* Concat to string or Unicode object giving a new Unicode object. */
PyObject *PyUnicode_Concat(PyObject *left,
PyObject *right)
{
PyUnicodeObject *u = NULL, *v = NULL, *w;
/* Coerce the two arguments */
u = (PyUnicodeObject *)PyUnicode_FromObject(left);
if (u == NULL)
goto onError;
v = (PyUnicodeObject *)PyUnicode_FromObject(right);
if (v == NULL)
goto onError;
/* Shortcuts */
if (v == unicode_empty) {
Py_DECREF(v);
return (PyObject *)u;
}
if (u == unicode_empty) {
Py_DECREF(u);
return (PyObject *)v;
}
/* Concat the two Unicode strings */
w = _PyUnicode_New(u->length + v->length);
if (w == NULL)
goto onError;
Py_UNICODE_COPY(w->str, u->str, u->length);
Py_UNICODE_COPY(w->str + u->length, v->str, v->length);
Py_DECREF(u);
Py_DECREF(v);
return (PyObject *)w;
onError:
Py_XDECREF(u);
Py_XDECREF(v);
return NULL;
}
PyDoc_STRVAR(count__doc__,
"S.count(sub[, start[, end]]) -> int\n\
\n\
Return the number of non-overlapping occurrences of substring sub in\n\
Unicode string S[start:end]. Optional arguments start and end are\n\
interpreted as in slice notation.");
static PyObject *
unicode_count(PyUnicodeObject *self, PyObject *args)
{
PyUnicodeObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
PyObject *result;
if (!PyArg_ParseTuple(args, "O|O&O&:count", &substring,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
substring = (PyUnicodeObject *)PyUnicode_FromObject(
(PyObject *)substring);
if (substring == NULL)
return NULL;
FIX_START_END(self);
result = PyInt_FromSsize_t(
stringlib_count(self->str + start, end - start,
substring->str, substring->length)
);
Py_DECREF(substring);
return result;
}
PyDoc_STRVAR(encode__doc__,
"S.encode([encoding[,errors]]) -> string or unicode\n\
\n\
Encodes S using the codec registered for encoding. encoding defaults\n\
to the default encoding. errors may be given to set a different error\n\
handling scheme. Default is 'strict' meaning that encoding errors raise\n\
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and\n\
'xmlcharrefreplace' as well as any other name registered with\n\
codecs.register_error that can handle UnicodeEncodeErrors.");
static PyObject *
unicode_encode(PyUnicodeObject *self, PyObject *args)
{
char *encoding = NULL;
char *errors = NULL;
PyObject *v;
if (!PyArg_ParseTuple(args, "|ss:encode", &encoding, &errors))
return NULL;
v = PyUnicode_AsEncodedObject((PyObject *)self, encoding, errors);
if (v == NULL)
goto onError;
if (!PyString_Check(v) && !PyUnicode_Check(v)) {
PyErr_Format(PyExc_TypeError,
"encoder did not return a string/unicode object "
"(type=%.400s)",
v->ob_type->tp_name);
Py_DECREF(v);
return NULL;
}
return v;
onError:
return NULL;
}
PyDoc_STRVAR(decode__doc__,
"S.decode([encoding[,errors]]) -> string or unicode\n\
\n\
Decodes S using the codec registered for encoding. encoding defaults\n\
to the default encoding. errors may be given to set a different error\n\
handling scheme. Default is 'strict' meaning that encoding errors raise\n\
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'\n\
as well as any other name registered with codecs.register_error that is\n\
able to handle UnicodeDecodeErrors.");
static PyObject *
unicode_decode(PyUnicodeObject *self, PyObject *args)
{
char *encoding = NULL;
char *errors = NULL;
PyObject *v;
if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors))
return NULL;
v = PyUnicode_AsDecodedObject((PyObject *)self, encoding, errors);
if (v == NULL)
goto onError;
if (!PyString_Check(v) && !PyUnicode_Check(v)) {
PyErr_Format(PyExc_TypeError,
"decoder did not return a string/unicode object "
"(type=%.400s)",
v->ob_type->tp_name);
Py_DECREF(v);
return NULL;
}
return v;
onError:
return NULL;
}
PyDoc_STRVAR(expandtabs__doc__,
"S.expandtabs([tabsize]) -> unicode\n\
\n\
Return a copy of S where all tab characters are expanded using spaces.\n\
If tabsize is not given, a tab size of 8 characters is assumed.");
static PyObject*
unicode_expandtabs(PyUnicodeObject *self, PyObject *args)
{
Py_UNICODE *e;
Py_UNICODE *p;
Py_UNICODE *q;
Py_ssize_t i, j;
PyUnicodeObject *u;
int tabsize = 8;
if (!PyArg_ParseTuple(args, "|i:expandtabs", &tabsize))
return NULL;
/* First pass: determine size of output string */
i = j = 0;
e = self->str + self->length;
for (p = self->str; p < e; p++)
if (*p == '\t') {
if (tabsize > 0)
j += tabsize - (j % tabsize);
}
else {
j++;
if (*p == '\n' || *p == '\r') {
i += j;
j = 0;
}
}
/* Second pass: create output string and fill it */
u = _PyUnicode_New(i + j);
if (!u)
return NULL;
j = 0;
q = u->str;
for (p = self->str; p < e; p++)
if (*p == '\t') {
if (tabsize > 0) {
i = tabsize - (j % tabsize);
j += i;
while (i--)
*q++ = ' ';
}
}
else {
j++;
*q++ = *p;
if (*p == '\n' || *p == '\r')
j = 0;
}
return (PyObject*) u;
}
PyDoc_STRVAR(find__doc__,
"S.find(sub [,start [,end]]) -> int\n\
\n\
Return the lowest index in S where substring sub is found,\n\
such that sub is contained within S[start:end]. Optional\n\
arguments start and end are interpreted as in slice notation.\n\
\n\
Return -1 on failure.");
static PyObject *
unicode_find(PyUnicodeObject *self, PyObject *args)
{
PyObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
Py_ssize_t result;
if (!PyArg_ParseTuple(args, "O|O&O&:find", &substring,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
substring = PyUnicode_FromObject(substring);
if (!substring)
return NULL;
result = stringlib_find_slice(
PyUnicode_AS_UNICODE(self), PyUnicode_GET_SIZE(self),
PyUnicode_AS_UNICODE(substring), PyUnicode_GET_SIZE(substring),
start, end
);
Py_DECREF(substring);
return PyInt_FromSsize_t(result);
}
static PyObject *
unicode_getitem(PyUnicodeObject *self, Py_ssize_t index)
{
if (index < 0 || index >= self->length) {
PyErr_SetString(PyExc_IndexError, "string index out of range");
return NULL;
}
return (PyObject*) PyUnicode_FromUnicode(&self->str[index], 1);
}
static long
unicode_hash(PyUnicodeObject *self)
{
/* Since Unicode objects compare equal to their ASCII string
counterparts, they should use the individual character values
as basis for their hash value. This is needed to assure that
strings and Unicode objects behave in the same way as
dictionary keys. */
register Py_ssize_t len;
register Py_UNICODE *p;
register long x;
if (self->hash != -1)
return self->hash;
len = PyUnicode_GET_SIZE(self);
p = PyUnicode_AS_UNICODE(self);
x = *p << 7;
while (--len >= 0)
x = (1000003*x) ^ *p++;
x ^= PyUnicode_GET_SIZE(self);
if (x == -1)
x = -2;
self->hash = x;
return x;
}
PyDoc_STRVAR(index__doc__,
"S.index(sub [,start [,end]]) -> int\n\
\n\
Like S.find() but raise ValueError when the substring is not found.");
static PyObject *
unicode_index(PyUnicodeObject *self, PyObject *args)
{
Py_ssize_t result;
PyObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
if (!PyArg_ParseTuple(args, "O|O&O&:index", &substring,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
substring = PyUnicode_FromObject(substring);
if (!substring)
return NULL;
result = stringlib_find_slice(
PyUnicode_AS_UNICODE(self), PyUnicode_GET_SIZE(self),
PyUnicode_AS_UNICODE(substring), PyUnicode_GET_SIZE(substring),
start, end
);
Py_DECREF(substring);
if (result < 0) {
PyErr_SetString(PyExc_ValueError, "substring not found");
return NULL;
}
return PyInt_FromSsize_t(result);
}
PyDoc_STRVAR(islower__doc__,
"S.islower() -> bool\n\
\n\
Return True if all cased characters in S are lowercase and there is\n\
at least one cased character in S, False otherwise.");
static PyObject*
unicode_islower(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
int cased;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1)
return PyBool_FromLong(Py_UNICODE_ISLOWER(*p));
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
cased = 0;
for (; p < e; p++) {
register const Py_UNICODE ch = *p;
if (Py_UNICODE_ISUPPER(ch) || Py_UNICODE_ISTITLE(ch))
return PyBool_FromLong(0);
else if (!cased && Py_UNICODE_ISLOWER(ch))
cased = 1;
}
return PyBool_FromLong(cased);
}
PyDoc_STRVAR(isupper__doc__,
"S.isupper() -> bool\n\
\n\
Return True if all cased characters in S are uppercase and there is\n\
at least one cased character in S, False otherwise.");
static PyObject*
unicode_isupper(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
int cased;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1)
return PyBool_FromLong(Py_UNICODE_ISUPPER(*p) != 0);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
cased = 0;
for (; p < e; p++) {
register const Py_UNICODE ch = *p;
if (Py_UNICODE_ISLOWER(ch) || Py_UNICODE_ISTITLE(ch))
return PyBool_FromLong(0);
else if (!cased && Py_UNICODE_ISUPPER(ch))
cased = 1;
}
return PyBool_FromLong(cased);
}
PyDoc_STRVAR(istitle__doc__,
"S.istitle() -> bool\n\
\n\
Return True if S is a titlecased string and there is at least one\n\
character in S, i.e. upper- and titlecase characters may only\n\
follow uncased characters and lowercase characters only cased ones.\n\
Return False otherwise.");
static PyObject*
unicode_istitle(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
int cased, previous_is_cased;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1)
return PyBool_FromLong((Py_UNICODE_ISTITLE(*p) != 0) ||
(Py_UNICODE_ISUPPER(*p) != 0));
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
cased = 0;
previous_is_cased = 0;
for (; p < e; p++) {
register const Py_UNICODE ch = *p;
if (Py_UNICODE_ISUPPER(ch) || Py_UNICODE_ISTITLE(ch)) {
if (previous_is_cased)
return PyBool_FromLong(0);
previous_is_cased = 1;
cased = 1;
}
else if (Py_UNICODE_ISLOWER(ch)) {
if (!previous_is_cased)
return PyBool_FromLong(0);
previous_is_cased = 1;
cased = 1;
}
else
previous_is_cased = 0;
}
return PyBool_FromLong(cased);
}
PyDoc_STRVAR(isspace__doc__,
"S.isspace() -> bool\n\
\n\
Return True if all characters in S are whitespace\n\
and there is at least one character in S, False otherwise.");
static PyObject*
unicode_isspace(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1 &&
Py_UNICODE_ISSPACE(*p))
return PyBool_FromLong(1);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
for (; p < e; p++) {
if (!Py_UNICODE_ISSPACE(*p))
return PyBool_FromLong(0);
}
return PyBool_FromLong(1);
}
PyDoc_STRVAR(isalpha__doc__,
"S.isalpha() -> bool\n\
\n\
Return True if all characters in S are alphabetic\n\
and there is at least one character in S, False otherwise.");
static PyObject*
unicode_isalpha(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1 &&
Py_UNICODE_ISALPHA(*p))
return PyBool_FromLong(1);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
for (; p < e; p++) {
if (!Py_UNICODE_ISALPHA(*p))
return PyBool_FromLong(0);
}
return PyBool_FromLong(1);
}
PyDoc_STRVAR(isalnum__doc__,
"S.isalnum() -> bool\n\
\n\
Return True if all characters in S are alphanumeric\n\
and there is at least one character in S, False otherwise.");
static PyObject*
unicode_isalnum(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1 &&
Py_UNICODE_ISALNUM(*p))
return PyBool_FromLong(1);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
for (; p < e; p++) {
if (!Py_UNICODE_ISALNUM(*p))
return PyBool_FromLong(0);
}
return PyBool_FromLong(1);
}
PyDoc_STRVAR(isdecimal__doc__,
"S.isdecimal() -> bool\n\
\n\
Return True if all characters in S are decimal\n\
and there is at least one character in S, False otherwise.");
static PyObject*
unicode_isdecimal(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1 &&
Py_UNICODE_ISDECIMAL(*p))
return PyBool_FromLong(1);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
for (; p < e; p++) {
if (!Py_UNICODE_ISDECIMAL(*p))
return PyBool_FromLong(0);
}
return PyBool_FromLong(1);
}
PyDoc_STRVAR(isdigit__doc__,
"S.isdigit() -> bool\n\
\n\
Return True if all characters in S are digits\n\
and there is at least one character in S, False otherwise.");
static PyObject*
unicode_isdigit(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1 &&
Py_UNICODE_ISDIGIT(*p))
return PyBool_FromLong(1);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
for (; p < e; p++) {
if (!Py_UNICODE_ISDIGIT(*p))
return PyBool_FromLong(0);
}
return PyBool_FromLong(1);
}
PyDoc_STRVAR(isnumeric__doc__,
"S.isnumeric() -> bool\n\
\n\
Return True if all characters in S are numeric\n\
and there is at least one character in S, False otherwise.");
static PyObject*
unicode_isnumeric(PyUnicodeObject *self)
{
register const Py_UNICODE *p = PyUnicode_AS_UNICODE(self);
register const Py_UNICODE *e;
/* Shortcut for single character strings */
if (PyUnicode_GET_SIZE(self) == 1 &&
Py_UNICODE_ISNUMERIC(*p))
return PyBool_FromLong(1);
/* Special case for empty strings */
if (PyUnicode_GET_SIZE(self) == 0)
return PyBool_FromLong(0);
e = p + PyUnicode_GET_SIZE(self);
for (; p < e; p++) {
if (!Py_UNICODE_ISNUMERIC(*p))
return PyBool_FromLong(0);
}
return PyBool_FromLong(1);
}
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> unicode\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");
static PyObject*
unicode_join(PyObject *self, PyObject *data)
{
return PyUnicode_Join(self, data);
}
static Py_ssize_t
unicode_length(PyUnicodeObject *self)
{
return self->length;
}
PyDoc_STRVAR(ljust__doc__,
"S.ljust(width[, fillchar]) -> unicode\n\
\n\
Return S left justified in a Unicode string of length width. Padding is\n\
done using the specified fill character (default is a space).");
static PyObject *
unicode_ljust(PyUnicodeObject *self, PyObject *args)
{
Py_ssize_t width;
Py_UNICODE fillchar = ' ';
if (!PyArg_ParseTuple(args, "n|O&:ljust", &width, convert_uc, &fillchar))
return NULL;
if (self->length >= width && PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject*) self;
}
return (PyObject*) pad(self, 0, width - self->length, fillchar);
}
PyDoc_STRVAR(lower__doc__,
"S.lower() -> unicode\n\
\n\
Return a copy of the string S converted to lowercase.");
static PyObject*
unicode_lower(PyUnicodeObject *self)
{
return fixup(self, fixlower);
}
#define LEFTSTRIP 0
#define RIGHTSTRIP 1
#define BOTHSTRIP 2
/* Arrays indexed by above */
static const char *stripformat[] = {"|O:lstrip", "|O:rstrip", "|O:strip"};
/* Skip the leading "|O:" of the format string to get the method name */
#define STRIPNAME(i) (stripformat[i]+3)
/* externally visible for str.strip(unicode) */
PyObject *
_PyUnicode_XStrip(PyUnicodeObject *self, int striptype, PyObject *sepobj)
{
Py_UNICODE *s = PyUnicode_AS_UNICODE(self);
Py_ssize_t len = PyUnicode_GET_SIZE(self);
Py_UNICODE *sep = PyUnicode_AS_UNICODE(sepobj);
Py_ssize_t seplen = PyUnicode_GET_SIZE(sepobj);
Py_ssize_t i, j;
BLOOM_MASK sepmask = make_bloom_mask(sep, seplen);
i = 0;
if (striptype != RIGHTSTRIP) {
while (i < len && BLOOM_MEMBER(sepmask, s[i], sep, seplen)) {
i++;
}
}
j = len;
if (striptype != LEFTSTRIP) {
do {
j--;
} while (j >= i && BLOOM_MEMBER(sepmask, s[j], sep, seplen));
j++;
}
if (i == 0 && j == len && PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject*)self;
}
else
return PyUnicode_FromUnicode(s+i, j-i);
}
static PyObject *
do_strip(PyUnicodeObject *self, int striptype)
{
Py_UNICODE *s = PyUnicode_AS_UNICODE(self);
Py_ssize_t len = PyUnicode_GET_SIZE(self), i, j;
i = 0;
if (striptype != RIGHTSTRIP) {
while (i < len && Py_UNICODE_ISSPACE(s[i])) {
i++;
}
}
j = len;
if (striptype != LEFTSTRIP) {
do {
j--;
} while (j >= i && Py_UNICODE_ISSPACE(s[j]));
j++;
}
if (i == 0 && j == len && PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject*)self;
}
else
return PyUnicode_FromUnicode(s+i, j-i);
}
static PyObject *
do_argstrip(PyUnicodeObject *self, int striptype, PyObject *args)
{
PyObject *sep = NULL;
if (!PyArg_ParseTuple(args, (char *)stripformat[striptype], &sep))
return NULL;
if (sep != NULL && sep != Py_None) {
if (PyUnicode_Check(sep))
return _PyUnicode_XStrip(self, striptype, sep);
else if (PyString_Check(sep)) {
PyObject *res;
sep = PyUnicode_FromObject(sep);
if (sep==NULL)
return NULL;
res = _PyUnicode_XStrip(self, striptype, sep);
Py_DECREF(sep);
return res;
}
else {
PyErr_Format(PyExc_TypeError,
"%s arg must be None, unicode or str",
STRIPNAME(striptype));
return NULL;
}
}
return do_strip(self, striptype);
}
PyDoc_STRVAR(strip__doc__,
"S.strip([chars]) -> unicode\n\
\n\
Return a copy of the string S with leading and trailing\n\
whitespace removed.\n\
If chars is given and not None, remove characters in chars instead.\n\
If chars is a str, it will be converted to unicode before stripping.");
static PyObject *
unicode_strip(PyUnicodeObject *self, PyObject *args)
{
if (PyTuple_GET_SIZE(args) == 0)
return do_strip(self, BOTHSTRIP); /* Common case */
else
return do_argstrip(self, BOTHSTRIP, args);
}
PyDoc_STRVAR(lstrip__doc__,
"S.lstrip([chars]) -> unicode\n\
\n\
Return a copy of the string S with leading whitespace removed.\n\
If chars is given and not None, remove characters in chars instead.\n\
If chars is a str, it will be converted to unicode before stripping.");
static PyObject *
unicode_lstrip(PyUnicodeObject *self, PyObject *args)
{
if (PyTuple_GET_SIZE(args) == 0)
return do_strip(self, LEFTSTRIP); /* Common case */
else
return do_argstrip(self, LEFTSTRIP, args);
}
PyDoc_STRVAR(rstrip__doc__,
"S.rstrip([chars]) -> unicode\n\
\n\
Return a copy of the string S with trailing whitespace removed.\n\
If chars is given and not None, remove characters in chars instead.\n\
If chars is a str, it will be converted to unicode before stripping.");
static PyObject *
unicode_rstrip(PyUnicodeObject *self, PyObject *args)
{
if (PyTuple_GET_SIZE(args) == 0)
return do_strip(self, RIGHTSTRIP); /* Common case */
else
return do_argstrip(self, RIGHTSTRIP, args);
}
static PyObject*
unicode_repeat(PyUnicodeObject *str, Py_ssize_t len)
{
PyUnicodeObject *u;
Py_UNICODE *p;
Py_ssize_t nchars;
size_t nbytes;
if (len < 0)
len = 0;
if (len == 1 && PyUnicode_CheckExact(str)) {
/* no repeat, return original string */
Py_INCREF(str);
return (PyObject*) str;
}
/* ensure # of chars needed doesn't overflow Py_ssize_t and # of bytes
* needed doesn't overflow size_t
*/
nchars = len * str->length;
if (len && nchars / len != str->length) {
PyErr_SetString(PyExc_OverflowError,
"repeated string is too long");
return NULL;
}
nbytes = (nchars + 1) * sizeof(Py_UNICODE);
if (nbytes / sizeof(Py_UNICODE) != (size_t)(nchars + 1)) {
PyErr_SetString(PyExc_OverflowError,
"repeated string is too long");
return NULL;
}
u = _PyUnicode_New(nchars);
if (!u)
return NULL;
p = u->str;
if (str->length == 1 && len > 0) {
Py_UNICODE_FILL(p, str->str[0], len);
} else {
Py_ssize_t done = 0; /* number of characters copied so far */
if (done < nchars) {
Py_UNICODE_COPY(p, str->str, str->length);
done = str->length;
}
while (done < nchars) {
/* double the copied prefix each pass; Py_ssize_t avoids truncation */
Py_ssize_t n = (done <= nchars-done) ? done : nchars-done;
Py_UNICODE_COPY(p+done, p, n);
done += n;
}
}
return (PyObject*) u;
}
PyObject *PyUnicode_Replace(PyObject *obj,
PyObject *subobj,
PyObject *replobj,
Py_ssize_t maxcount)
{
PyObject *self;
PyObject *str1;
PyObject *str2;
PyObject *result;
self = PyUnicode_FromObject(obj);
if (self == NULL)
return NULL;
str1 = PyUnicode_FromObject(subobj);
if (str1 == NULL) {
Py_DECREF(self);
return NULL;
}
str2 = PyUnicode_FromObject(replobj);
if (str2 == NULL) {
Py_DECREF(self);
Py_DECREF(str1);
return NULL;
}
result = replace((PyUnicodeObject *)self,
(PyUnicodeObject *)str1,
(PyUnicodeObject *)str2,
maxcount);
Py_DECREF(self);
Py_DECREF(str1);
Py_DECREF(str2);
return result;
}
PyDoc_STRVAR(replace__doc__,
"S.replace(old, new[, count]) -> unicode\n\
\n\
Return a copy of S with all occurrences of substring\n\
old replaced by new. If the optional argument count is\n\
given, only the first count occurrences are replaced.");
static PyObject*
unicode_replace(PyUnicodeObject *self, PyObject *args)
{
PyUnicodeObject *str1;
PyUnicodeObject *str2;
Py_ssize_t maxcount = -1;
PyObject *result;
if (!PyArg_ParseTuple(args, "OO|n:replace", &str1, &str2, &maxcount))
return NULL;
str1 = (PyUnicodeObject *)PyUnicode_FromObject((PyObject *)str1);
if (str1 == NULL)
return NULL;
str2 = (PyUnicodeObject *)PyUnicode_FromObject((PyObject *)str2);
if (str2 == NULL) {
Py_DECREF(str1);
return NULL;
}
result = replace(self, str1, str2, maxcount);
Py_DECREF(str1);
Py_DECREF(str2);
return result;
}
static
PyObject *unicode_repr(PyObject *unicode)
{
return unicodeescape_string(PyUnicode_AS_UNICODE(unicode),
PyUnicode_GET_SIZE(unicode),
1);
}
PyDoc_STRVAR(rfind__doc__,
"S.rfind(sub [,start [,end]]) -> int\n\
\n\
Return the highest index in S where substring sub is found,\n\
such that sub is contained within S[start:end]. Optional\n\
arguments start and end are interpreted as in slice notation.\n\
\n\
Return -1 on failure.");
static PyObject *
unicode_rfind(PyUnicodeObject *self, PyObject *args)
{
PyObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
Py_ssize_t result;
if (!PyArg_ParseTuple(args, "O|O&O&:rfind", &substring,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
substring = PyUnicode_FromObject(substring);
if (!substring)
return NULL;
result = stringlib_rfind_slice(
PyUnicode_AS_UNICODE(self), PyUnicode_GET_SIZE(self),
PyUnicode_AS_UNICODE(substring), PyUnicode_GET_SIZE(substring),
start, end
);
Py_DECREF(substring);
return PyInt_FromSsize_t(result);
}
PyDoc_STRVAR(rindex__doc__,
"S.rindex(sub [,start [,end]]) -> int\n\
\n\
Like S.rfind() but raise ValueError when the substring is not found.");
static PyObject *
unicode_rindex(PyUnicodeObject *self, PyObject *args)
{
PyObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
Py_ssize_t result;
if (!PyArg_ParseTuple(args, "O|O&O&:rindex", &substring,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
substring = PyUnicode_FromObject(substring);
if (!substring)
return NULL;
result = stringlib_rfind_slice(
PyUnicode_AS_UNICODE(self), PyUnicode_GET_SIZE(self),
PyUnicode_AS_UNICODE(substring), PyUnicode_GET_SIZE(substring),
start, end
);
Py_DECREF(substring);
if (result < 0) {
PyErr_SetString(PyExc_ValueError, "substring not found");
return NULL;
}
return PyInt_FromSsize_t(result);
}
PyDoc_STRVAR(rjust__doc__,
"S.rjust(width[, fillchar]) -> unicode\n\
\n\
Return S right justified in a Unicode string of length width. Padding is\n\
done using the specified fill character (default is a space).");
static PyObject *
unicode_rjust(PyUnicodeObject *self, PyObject *args)
{
Py_ssize_t width;
Py_UNICODE fillchar = ' ';
if (!PyArg_ParseTuple(args, "n|O&:rjust", &width, convert_uc, &fillchar))
return NULL;
if (self->length >= width && PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject*) self;
}
return (PyObject*) pad(self, width - self->length, 0, fillchar);
}
static PyObject*
unicode_slice(PyUnicodeObject *self, Py_ssize_t start, Py_ssize_t end)
{
/* standard clamping */
if (start < 0)
start = 0;
if (end < 0)
end = 0;
if (end > self->length)
end = self->length;
if (start == 0 && end == self->length && PyUnicode_CheckExact(self)) {
/* full slice, return original string */
Py_INCREF(self);
return (PyObject*) self;
}
if (start > end)
start = end;
/* copy slice */
return (PyObject*) PyUnicode_FromUnicode(self->str + start,
end - start);
}
PyObject *PyUnicode_Split(PyObject *s,
PyObject *sep,
Py_ssize_t maxsplit)
{
PyObject *result;
s = PyUnicode_FromObject(s);
if (s == NULL)
return NULL;
if (sep != NULL) {
sep = PyUnicode_FromObject(sep);
if (sep == NULL) {
Py_DECREF(s);
return NULL;
}
}
result = split((PyUnicodeObject *)s, (PyUnicodeObject *)sep, maxsplit);
Py_DECREF(s);
Py_XDECREF(sep);
return result;
}
PyDoc_STRVAR(split__doc__,
"S.split([sep [,maxsplit]]) -> list of strings\n\
\n\
Return a list of the words in S, using sep as the\n\
delimiter string. If maxsplit is given, at most maxsplit\n\
splits are done. If sep is not specified or is None,\n\
any whitespace string is a separator.");
static PyObject*
unicode_split(PyUnicodeObject *self, PyObject *args)
{
PyObject *substring = Py_None;
Py_ssize_t maxcount = -1;
if (!PyArg_ParseTuple(args, "|On:split", &substring, &maxcount))
return NULL;
if (substring == Py_None)
return split(self, NULL, maxcount);
else if (PyUnicode_Check(substring))
return split(self, (PyUnicodeObject *)substring, maxcount);
else
return PyUnicode_Split((PyObject *)self, substring, maxcount);
}
PyObject *
PyUnicode_Partition(PyObject *str_in, PyObject *sep_in)
{
PyObject* str_obj;
PyObject* sep_obj;
PyObject* out;
str_obj = PyUnicode_FromObject(str_in);
if (!str_obj)
return NULL;
sep_obj = PyUnicode_FromObject(sep_in);
if (!sep_obj) {
Py_DECREF(str_obj);
return NULL;
}
out = stringlib_partition(
str_obj, PyUnicode_AS_UNICODE(str_obj), PyUnicode_GET_SIZE(str_obj),
sep_obj, PyUnicode_AS_UNICODE(sep_obj), PyUnicode_GET_SIZE(sep_obj)
);
Py_DECREF(sep_obj);
Py_DECREF(str_obj);
return out;
}
PyObject *
PyUnicode_RPartition(PyObject *str_in, PyObject *sep_in)
{
PyObject* str_obj;
PyObject* sep_obj;
PyObject* out;
str_obj = PyUnicode_FromObject(str_in);
if (!str_obj)
return NULL;
sep_obj = PyUnicode_FromObject(sep_in);
if (!sep_obj) {
Py_DECREF(str_obj);
return NULL;
}
out = stringlib_rpartition(
str_obj, PyUnicode_AS_UNICODE(str_obj), PyUnicode_GET_SIZE(str_obj),
sep_obj, PyUnicode_AS_UNICODE(sep_obj), PyUnicode_GET_SIZE(sep_obj)
);
Py_DECREF(sep_obj);
Py_DECREF(str_obj);
return out;
}
PyDoc_STRVAR(partition__doc__,
"S.partition(sep) -> (head, sep, tail)\n\
\n\
Searches for the separator sep in S, and returns the part before it,\n\
the separator itself, and the part after it. If the separator is not\n\
found, returns S and two empty strings.");
static PyObject*
unicode_partition(PyUnicodeObject *self, PyObject *separator)
{
return PyUnicode_Partition((PyObject *)self, separator);
}
PyDoc_STRVAR(rpartition__doc__,
"S.rpartition(sep) -> (head, sep, tail)\n\
\n\
Searches for the separator sep in S, starting at the end of S, and returns\n\
the part before it, the separator itself, and the part after it. If the\n\
separator is not found, returns S and two empty strings.");
static PyObject*
unicode_rpartition(PyUnicodeObject *self, PyObject *separator)
{
return PyUnicode_RPartition((PyObject *)self, separator);
}
PyObject *PyUnicode_RSplit(PyObject *s,
PyObject *sep,
Py_ssize_t maxsplit)
{
PyObject *result;
s = PyUnicode_FromObject(s);
if (s == NULL)
return NULL;
if (sep != NULL) {
sep = PyUnicode_FromObject(sep);
if (sep == NULL) {
Py_DECREF(s);
return NULL;
}
}
result = rsplit((PyUnicodeObject *)s, (PyUnicodeObject *)sep, maxsplit);
Py_DECREF(s);
Py_XDECREF(sep);
return result;
}
PyDoc_STRVAR(rsplit__doc__,
"S.rsplit([sep [,maxsplit]]) -> list of strings\n\
\n\
Return a list of the words in S, using sep as the\n\
delimiter string, starting at the end of the string and\n\
working to the front. If maxsplit is given, at most maxsplit\n\
splits are done. If sep is not specified or is None, any whitespace\n\
string is a separator.");
static PyObject*
unicode_rsplit(PyUnicodeObject *self, PyObject *args)
{
PyObject *substring = Py_None;
Py_ssize_t maxcount = -1;
if (!PyArg_ParseTuple(args, "|On:rsplit", &substring, &maxcount))
return NULL;
if (substring == Py_None)
return rsplit(self, NULL, maxcount);
else if (PyUnicode_Check(substring))
return rsplit(self, (PyUnicodeObject *)substring, maxcount);
else
return PyUnicode_RSplit((PyObject *)self, substring, maxcount);
}
PyDoc_STRVAR(splitlines__doc__,
"S.splitlines([keepends]) -> list of strings\n\
\n\
Return a list of the lines in S, breaking at line boundaries.\n\
Line breaks are not included in the resulting list unless keepends\n\
is given and true.");
static PyObject*
unicode_splitlines(PyUnicodeObject *self, PyObject *args)
{
int keepends = 0;
if (!PyArg_ParseTuple(args, "|i:splitlines", &keepends))
return NULL;
return PyUnicode_Splitlines((PyObject *)self, keepends);
}
static
PyObject *unicode_str(PyUnicodeObject *self)
{
return PyUnicode_AsEncodedString((PyObject *)self, NULL, NULL);
}
PyDoc_STRVAR(swapcase__doc__,
"S.swapcase() -> unicode\n\
\n\
Return a copy of S with uppercase characters converted to lowercase\n\
and vice versa.");
static PyObject*
unicode_swapcase(PyUnicodeObject *self)
{
return fixup(self, fixswapcase);
}
PyDoc_STRVAR(translate__doc__,
"S.translate(table) -> unicode\n\
\n\
Return a copy of the string S, where all characters have been mapped\n\
through the given translation table, which must be a mapping of\n\
Unicode ordinals to Unicode ordinals, Unicode strings or None.\n\
Unmapped characters are left untouched. Characters mapped to None\n\
are deleted.");
static PyObject*
unicode_translate(PyUnicodeObject *self, PyObject *table)
{
return PyUnicode_TranslateCharmap(self->str,
self->length,
table,
"ignore");
}
PyDoc_STRVAR(upper__doc__,
"S.upper() -> unicode\n\
\n\
Return a copy of S converted to uppercase.");
static PyObject*
unicode_upper(PyUnicodeObject *self)
{
return fixup(self, fixupper);
}
PyDoc_STRVAR(zfill__doc__,
"S.zfill(width) -> unicode\n\
\n\
Pad a numeric string S with zeros on the left, to fill a field\n\
of the specified width. The string S is never truncated.");
static PyObject *
unicode_zfill(PyUnicodeObject *self, PyObject *args)
{
Py_ssize_t fill;
PyUnicodeObject *u;
Py_ssize_t width;
if (!PyArg_ParseTuple(args, "n:zfill", &width))
return NULL;
if (self->length >= width) {
if (PyUnicode_CheckExact(self)) {
Py_INCREF(self);
return (PyObject*) self;
}
else
return PyUnicode_FromUnicode(
PyUnicode_AS_UNICODE(self),
PyUnicode_GET_SIZE(self)
);
}
fill = width - self->length;
u = pad(self, fill, 0, '0');
if (u == NULL)
return NULL;
if (u->str[fill] == '+' || u->str[fill] == '-') {
/* move sign to beginning of string */
u->str[0] = u->str[fill];
u->str[fill] = '0';
}
return (PyObject*) u;
}
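/* Illustration only, kept inside a comment so it is never compiled with
   this file.  The sign handling in unicode_zfill() above (pad first, then
   swap the sign with the first padding zero) may be easier to follow on
   plain C strings.  zfill_ascii() is a hypothetical stand-alone analogue,
   not a function of this file, and it works on char rather than
   Py_UNICODE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical analogue of unicode_zfill() for NUL-terminated C strings.
// Returns a malloc()ed string the caller must free(), or NULL on failure.
static char *
zfill_ascii(const char *s, size_t width)
{
    size_t len = strlen(s);
    size_t fill = (width > len) ? width - len : 0;
    char *out = malloc(len + fill + 1);
    if (out == NULL)
        return NULL;
    memset(out, '0', fill);             // left padding
    memcpy(out + fill, s, len + 1);     // original text plus terminator
    if (fill > 0 && (out[fill] == '+' || out[fill] == '-')) {
        // move sign to beginning of string, as the code above does
        out[0] = out[fill];
        out[fill] = '0';
    }
    return out;
}
```

   With this sketch, zfill_ascii("-42", 6) yields "-00042", and a string
   that is already at least `width` long is copied unchanged.
*/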
#if 0
static PyObject*
unicode_freelistsize(PyUnicodeObject *self)
{
return PyInt_FromLong(unicode_freelist_size);
}
#endif
PyDoc_STRVAR(startswith__doc__,
"S.startswith(prefix[, start[, end]]) -> bool\n\
\n\
Return True if S starts with the specified prefix, False otherwise.\n\
With optional start, test S beginning at that position.\n\
With optional end, stop comparing S at that position.\n\
prefix can also be a tuple of strings to try.");
static PyObject *
unicode_startswith(PyUnicodeObject *self,
PyObject *args)
{
PyObject *subobj;
PyUnicodeObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
int result;
if (!PyArg_ParseTuple(args, "O|O&O&:startswith", &subobj,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
if (PyTuple_Check(subobj)) {
Py_ssize_t i;
for (i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
substring = (PyUnicodeObject *)PyUnicode_FromObject(
PyTuple_GET_ITEM(subobj, i));
if (substring == NULL)
return NULL;
result = tailmatch(self, substring, start, end, -1);
Py_DECREF(substring);
if (result) {
Py_RETURN_TRUE;
}
}
/* nothing matched */
Py_RETURN_FALSE;
}
substring = (PyUnicodeObject *)PyUnicode_FromObject(subobj);
if (substring == NULL)
return NULL;
result = tailmatch(self, substring, start, end, -1);
Py_DECREF(substring);
return PyBool_FromLong(result);
}
PyDoc_STRVAR(endswith__doc__,
"S.endswith(suffix[, start[, end]]) -> bool\n\
\n\
Return True if S ends with the specified suffix, False otherwise.\n\
With optional start, test S beginning at that position.\n\
With optional end, stop comparing S at that position.\n\
suffix can also be a tuple of strings to try.");
static PyObject *
unicode_endswith(PyUnicodeObject *self,
PyObject *args)
{
PyObject *subobj;
PyUnicodeObject *substring;
Py_ssize_t start = 0;
Py_ssize_t end = PY_SSIZE_T_MAX;
int result;
if (!PyArg_ParseTuple(args, "O|O&O&:endswith", &subobj,
_PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
return NULL;
if (PyTuple_Check(subobj)) {
Py_ssize_t i;
for (i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
substring = (PyUnicodeObject *)PyUnicode_FromObject(
PyTuple_GET_ITEM(subobj, i));
if (substring == NULL)
return NULL;
result = tailmatch(self, substring, start, end, +1);
Py_DECREF(substring);
if (result) {
Py_RETURN_TRUE;
}
}
Py_RETURN_FALSE;
}
substring = (PyUnicodeObject *)PyUnicode_FromObject(subobj);
if (substring == NULL)
return NULL;
result = tailmatch(self, substring, start, end, +1);
Py_DECREF(substring);
return PyBool_FromLong(result);
}
static PyObject *
unicode_getnewargs(PyUnicodeObject *v)
{
return Py_BuildValue("(u#)", v->str, v->length);
}
static PyMethodDef unicode_methods[] = {
/* Order is according to common usage: often used methods should
appear first, since lookup is done sequentially. */
{"encode", (PyCFunction) unicode_encode, METH_VARARGS, encode__doc__},
{"replace", (PyCFunction) unicode_replace, METH_VARARGS, replace__doc__},
{"split", (PyCFunction) unicode_split, METH_VARARGS, split__doc__},
{"rsplit", (PyCFunction) unicode_rsplit, METH_VARARGS, rsplit__doc__},
{"join", (PyCFunction) unicode_join, METH_O, join__doc__},
{"capitalize", (PyCFunction) unicode_capitalize, METH_NOARGS, capitalize__doc__},
{"title", (PyCFunction) unicode_title, METH_NOARGS, title__doc__},
{"center", (PyCFunction) unicode_center, METH_VARARGS, center__doc__},
{"count", (PyCFunction) unicode_count, METH_VARARGS, count__doc__},
{"expandtabs", (PyCFunction) unicode_expandtabs, METH_VARARGS, expandtabs__doc__},
{"find", (PyCFunction) unicode_find, METH_VARARGS, find__doc__},
{"partition", (PyCFunction) unicode_partition, METH_O, partition__doc__},
{"index", (PyCFunction) unicode_index, METH_VARARGS, index__doc__},
{"ljust", (PyCFunction) unicode_ljust, METH_VARARGS, ljust__doc__},
{"lower", (PyCFunction) unicode_lower, METH_NOARGS, lower__doc__},
{"lstrip", (PyCFunction) unicode_lstrip, METH_VARARGS, lstrip__doc__},
{"decode", (PyCFunction) unicode_decode, METH_VARARGS, decode__doc__},
/* {"maketrans", (PyCFunction) unicode_maketrans, METH_VARARGS, maketrans__doc__}, */
{"rfind", (PyCFunction) unicode_rfind, METH_VARARGS, rfind__doc__},
{"rindex", (PyCFunction) unicode_rindex, METH_VARARGS, rindex__doc__},
{"rjust", (PyCFunction) unicode_rjust, METH_VARARGS, rjust__doc__},
{"rstrip", (PyCFunction) unicode_rstrip, METH_VARARGS, rstrip__doc__},
{"rpartition", (PyCFunction) unicode_rpartition, METH_O, rpartition__doc__},
{"splitlines", (PyCFunction) unicode_splitlines, METH_VARARGS, splitlines__doc__},
{"strip", (PyCFunction) unicode_strip, METH_VARARGS, strip__doc__},
{"swapcase", (PyCFunction) unicode_swapcase, METH_NOARGS, swapcase__doc__},
{"translate", (PyCFunction) unicode_translate, METH_O, translate__doc__},
{"upper", (PyCFunction) unicode_upper, METH_NOARGS, upper__doc__},
{"startswith", (PyCFunction) unicode_startswith, METH_VARARGS, startswith__doc__},
{"endswith", (PyCFunction) unicode_endswith, METH_VARARGS, endswith__doc__},
{"islower", (PyCFunction) unicode_islower, METH_NOARGS, islower__doc__},
{"isupper", (PyCFunction) unicode_isupper, METH_NOARGS, isupper__doc__},
{"istitle", (PyCFunction) unicode_istitle, METH_NOARGS, istitle__doc__},
{"isspace", (PyCFunction) unicode_isspace, METH_NOARGS, isspace__doc__},
{"isdecimal", (PyCFunction) unicode_isdecimal, METH_NOARGS, isdecimal__doc__},
{"isdigit", (PyCFunction) unicode_isdigit, METH_NOARGS, isdigit__doc__},
{"isnumeric", (PyCFunction) unicode_isnumeric, METH_NOARGS, isnumeric__doc__},
{"isalpha", (PyCFunction) unicode_isalpha, METH_NOARGS, isalpha__doc__},
{"isalnum", (PyCFunction) unicode_isalnum, METH_NOARGS, isalnum__doc__},
{"zfill", (PyCFunction) unicode_zfill, METH_VARARGS, zfill__doc__},
#if 0
{"capwords", (PyCFunction) unicode_capwords, METH_NOARGS, capwords__doc__},
#endif
#if 0
/* This one is just used for debugging the implementation. */
{"freelistsize", (PyCFunction) unicode_freelistsize, METH_NOARGS},
#endif
{"__getnewargs__", (PyCFunction)unicode_getnewargs, METH_NOARGS},
{NULL, NULL}
};
static PyObject *
unicode_mod(PyObject *v, PyObject *w)
{
if (!PyUnicode_Check(v)) {
Py_INCREF(Py_NotImplemented);
return Py_NotImplemented;
}
return PyUnicode_Format(v, w);
}
static PyNumberMethods unicode_as_number = {
0, /*nb_add*/
0, /*nb_subtract*/
0, /*nb_multiply*/
unicode_mod, /*nb_remainder*/
};
static PySequenceMethods unicode_as_sequence = {
(lenfunc) unicode_length, /* sq_length */
PyUnicode_Concat, /* sq_concat */
(ssizeargfunc) unicode_repeat, /* sq_repeat */
(ssizeargfunc) unicode_getitem, /* sq_item */
(ssizessizeargfunc) unicode_slice, /* sq_slice */
0, /* sq_ass_item */
0, /* sq_ass_slice */
PyUnicode_Contains, /* sq_contains */
};
static PyObject*
unicode_subscript(PyUnicodeObject* self, PyObject* item)
{
PyNumberMethods *nb = item->ob_type->tp_as_number;
if (nb != NULL && nb->nb_index != NULL) {
Py_ssize_t i = nb->nb_index(item);
if (i == -1 && PyErr_Occurred())
return NULL;
if (i < 0)
i += PyUnicode_GET_SIZE(self);
return unicode_getitem(self, i);
} else if (PySlice_Check(item)) {
Py_ssize_t start, stop, step, slicelength, cur, i;
Py_UNICODE* source_buf;
Py_UNICODE* result_buf;
PyObject* result;
if (PySlice_GetIndicesEx((PySliceObject*)item, PyUnicode_GET_SIZE(self),
&start, &stop, &step, &slicelength) < 0) {
return NULL;
}
if (slicelength <= 0) {
return PyUnicode_FromUnicode(NULL, 0);
} else {
source_buf = PyUnicode_AS_UNICODE((PyObject*)self);
result_buf = (Py_UNICODE *)PyMem_MALLOC(slicelength*
sizeof(Py_UNICODE));
if (result_buf == NULL)
return PyErr_NoMemory();
for (cur = start, i = 0; i < slicelength; cur += step, i++) {
result_buf[i] = source_buf[cur];
}
result = PyUnicode_FromUnicode(result_buf, slicelength);
PyMem_FREE(result_buf);
return result;
}
} else {
PyErr_SetString(PyExc_TypeError, "string indices must be integers");
return NULL;
}
}
static PyMappingMethods unicode_as_mapping = {
(lenfunc)unicode_length, /* mp_length */
(binaryfunc)unicode_subscript, /* mp_subscript */
(objobjargproc)0, /* mp_ass_subscript */
};
static Py_ssize_t
unicode_buffer_getreadbuf(PyUnicodeObject *self,
Py_ssize_t index,
const void **ptr)
{
if (index != 0) {
PyErr_SetString(PyExc_SystemError,
"accessing non-existent unicode segment");
return -1;
}
*ptr = (void *) self->str;
return PyUnicode_GET_DATA_SIZE(self);
}
static Py_ssize_t
unicode_buffer_getwritebuf(PyUnicodeObject *self, Py_ssize_t index,
const void **ptr)
{
PyErr_SetString(PyExc_TypeError,
"cannot use unicode as modifiable buffer");
return -1;
}
static int
unicode_buffer_getsegcount(PyUnicodeObject *self,
Py_ssize_t *lenp)
{
if (lenp)
*lenp = PyUnicode_GET_DATA_SIZE(self);
return 1;
}
static Py_ssize_t
unicode_buffer_getcharbuf(PyUnicodeObject *self,
Py_ssize_t index,
const void **ptr)
{
PyObject *str;
if (index != 0) {
PyErr_SetString(PyExc_SystemError,
"accessing non-existent unicode segment");
return -1;
}
str = _PyUnicode_AsDefaultEncodedString((PyObject *)self, NULL);
if (str == NULL)
return -1;
*ptr = (void *) PyString_AS_STRING(str);
return PyString_GET_SIZE(str);
}
/* Helpers for PyUnicode_Format() */
static PyObject *
getnextarg(PyObject *args, Py_ssize_t arglen, Py_ssize_t *p_argidx)
{
Py_ssize_t argidx = *p_argidx;
if (argidx < arglen) {
(*p_argidx)++;
if (arglen < 0)
return args;
else
return PyTuple_GetItem(args, argidx);
}
PyErr_SetString(PyExc_TypeError,
"not enough arguments for format string");
return NULL;
}
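/* Illustration only, kept inside a comment so it is never compiled with
   this file.  getnextarg() above relies on a sentinel convention set up by
   PyUnicode_Format(): a tuple uses arglen = len(tuple) and argidx = 0,
   while a single non-tuple argument uses arglen = -1 and argidx = -2, so
   that exactly one call succeeds.  The sketch below models that with plain
   ints; next_arg() and its int-based "arguments" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical model: `args` is either an array of `arglen` ints (the
// tuple case) or a pointer to one int when arglen == -1 (the scalar case).
static const int *
next_arg(const int *args, ptrdiff_t arglen, ptrdiff_t *p_argidx)
{
    ptrdiff_t argidx = *p_argidx;
    if (argidx < arglen) {
        (*p_argidx)++;
        if (arglen < 0)
            return args;            // scalar: hand back the one value
        return &args[argidx];       // tuple: hand back element argidx
    }
    return NULL;                    // not enough arguments
}
```

   In the scalar case the first call moves argidx from -2 to -1 and
   returns the value; the second call sees -1 < -1 fail and reports an
   error.  The same comparison, argidx < arglen, is what the formatter
   uses afterwards to detect leftover arguments.
*/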
#define F_LJUST (1<<0)
#define F_SIGN (1<<1)
#define F_BLANK (1<<2)
#define F_ALT (1<<3)
#define F_ZERO (1<<4)
static Py_ssize_t
strtounicode(Py_UNICODE *buffer, const char *charbuffer)
{
register Py_ssize_t i;
Py_ssize_t len = strlen(charbuffer);
for (i = len - 1; i >= 0; i--)
buffer[i] = (Py_UNICODE) charbuffer[i];
return len;
}
static int
doubletounicode(Py_UNICODE *buffer, size_t len, const char *format, double x)
{
Py_ssize_t result;
PyOS_ascii_formatd((char *)buffer, len, format, x);
result = strtounicode(buffer, (char *)buffer);
return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}
static int
longtounicode(Py_UNICODE *buffer, size_t len, const char *format, long x)
{
Py_ssize_t result;
PyOS_snprintf((char *)buffer, len, format, x);
result = strtounicode(buffer, (char *)buffer);
return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}
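/* Illustration only, kept inside a comment so it is never compiled with
   this file.  doubletounicode() and longtounicode() above format narrow
   characters into the front of the wide buffer and then let
   strtounicode() widen them in place, which is why that loop runs from
   the last character down to the first.  The sketch below reproduces the
   idea with uint32_t standing in for Py_UNICODE; widen_in_place() and
   format_long_wide() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Widen the NUL-terminated byte string stored at the start of `buffer`
// into one uint32_t per character, reusing the same storage.
static size_t
widen_in_place(uint32_t *buffer)
{
    const unsigned char *bytes = (const unsigned char *)buffer;
    size_t len = strlen((const char *)bytes);
    size_t i;
    // Walk backwards: wide slot i starts at byte sizeof(uint32_t) * i,
    // which is >= i, so storing it never clobbers a byte still unread.
    for (i = len; i-- > 0; )
        buffer[i] = bytes[i];
    return len;
}

// Format `x` in decimal into a wide buffer that holds `nchars` elements.
static size_t
format_long_wide(uint32_t *buffer, size_t nchars, long x)
{
    // nchars bytes is a safe limit because each element is wider than 1.
    snprintf((char *)buffer, nchars, "%ld", x);
    return widen_in_place(buffer);
}
```

   A forward loop would overwrite bytes 1..3 while storing slot 0 and
   corrupt the characters that have not been read yet.  Like
   strtounicode(), the sketch does not write a wide terminator; callers
   use the returned length.
*/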
/* XXX To save some code duplication, formatfloat/long/int could have been
shared with stringobject.c, converting from 8-bit to Unicode after the
formatting is done. */
static int
formatfloat(Py_UNICODE *buf,
size_t buflen,
int flags,
int prec,
int type,
PyObject *v)
{
/* fmt = '%#.' + `prec` + `type`
worst case length = 3 + 10 (len of INT_MAX) + 1 = 14 (use 20)*/
char fmt[20];
double x;
x = PyFloat_AsDouble(v);
if (x == -1.0 && PyErr_Occurred())
return -1;
if (prec < 0)
prec = 6;
if (type == 'f' && (fabs(x) / 1e25) >= 1e25)
type = 'g';
/* Worst case length calc to ensure no buffer overrun:
'g' formats:
fmt = %#.<prec>g
buf = '-' + [0-9]*prec + '.' + 'e+' + (longest exp
for any double rep.)
len = 1 + prec + 1 + 2 + 5 = 9 + prec
'f' formats:
buf = '-' + [0-9]*x + '.' + [0-9]*prec (with x < 50)
len = 1 + 50 + 1 + prec = 52 + prec
If prec=0 the effective precision is 1 (the leading digit is
always given), therefore increase the length by one.
*/
if ((type == 'g' && buflen <= (size_t)10 + (size_t)prec) ||
(type == 'f' && buflen <= (size_t)53 + (size_t)prec)) {
PyErr_SetString(PyExc_OverflowError,
"formatted float is too long (precision too large?)");
return -1;
}
PyOS_snprintf(fmt, sizeof(fmt), "%%%s.%d%c",
(flags&F_ALT) ? "#" : "",
prec, type);
return doubletounicode(buf, buflen, fmt, x);
}
static PyObject*
formatlong(PyObject *val, int flags, int prec, int type)
{
char *buf;
int i, len;
PyObject *str; /* temporary string object. */
PyUnicodeObject *result;
str = _PyString_FormatLong(val, flags, prec, type, &buf, &len);
if (!str)
return NULL;
result = _PyUnicode_New(len);
if (!result) {
Py_DECREF(str);
return NULL;
}
for (i = 0; i < len; i++)
result->str[i] = buf[i];
result->str[len] = 0;
Py_DECREF(str);
return (PyObject*)result;
}
static int
formatint(Py_UNICODE *buf,
size_t buflen,
int flags,
int prec,
int type,
PyObject *v)
{
/* fmt = '%#.' + `prec` + 'l' + `type`
* worst case length = 3 + 19 (worst len of INT_MAX on 64-bit machine)
* + 1 + 1
* = 24
*/
char fmt[64]; /* plenty big enough! */
char *sign;
long x;
x = PyInt_AsLong(v);
if (x == -1 && PyErr_Occurred())
return -1;
if (x < 0 && type == 'u') {
type = 'd';
}
if (x < 0 && (type == 'x' || type == 'X' || type == 'o'))
sign = "-";
else
sign = "";
if (prec < 0)
prec = 1;
/* buf = '+'/'-'/'' + '0'/'0x'/'' + '[0-9]'*max(prec, len(x in octal))
* worst case buf = '-0x' + [0-9]*prec, where prec >= 11
*/
if (buflen <= 14 || buflen <= (size_t)3 + (size_t)prec) {
PyErr_SetString(PyExc_OverflowError,
"formatted integer is too long (precision too large?)");
return -1;
}
if ((flags & F_ALT) &&
(type == 'x' || type == 'X')) {
/* When converting under %#x or %#X, there are a number
* of issues that cause pain:
* - when 0 is being converted, the C standard leaves off
* the '0x' or '0X', which is inconsistent with other
* %#x/%#X conversions and inconsistent with Python's
* hex() function
* - there are platforms that violate the standard and
* convert 0 with the '0x' or '0X'
* (Metrowerks, Compaq Tru64)
* - there are platforms that give '0x' when converting
* under %#X, but convert 0 in accordance with the
* standard (OS/2 EMX)
*
* We can achieve the desired consistency by inserting our
* own '0x' or '0X' prefix, and substituting %x/%X in place
* of %#x/%#X.
*
* Note that this is the same approach as used in
* formatint() in stringobject.c
*/
PyOS_snprintf(fmt, sizeof(fmt), "%s0%c%%.%dl%c",
sign, type, prec, type);
}
else {
PyOS_snprintf(fmt, sizeof(fmt), "%s%%%s.%dl%c",
sign, (flags&F_ALT) ? "#" : "",
prec, type);
}
if (sign[0])
return longtounicode(buf, buflen, fmt, -x);
else
return longtounicode(buf, buflen, fmt, x);
}
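/* Illustration only, kept inside a comment so it is never compiled with
   this file.  The %#x / %#X workaround described in formatint() above
   (write the "0x" prefix as literal text and format the digits with plain
   %x) can be sketched on char buffers as follows.  format_alt_hex() is a
   hypothetical name.  Unlike formatint(), which passes -x, the sketch
   takes the magnitude in unsigned arithmetic so that LONG_MIN is also
   handled; that difference is an assumption of the sketch, not a
   description of this file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Format x in hexadecimal with an explicit "0x" or "0X" prefix, so that
// zero gets a prefix too, regardless of how the C library treats "%#x".
// `type` must be 'x' or 'X'; `prec` is the minimum number of digits.
static int
format_alt_hex(char *buf, size_t buflen, long x, char type, int prec)
{
    char fmt[64];
    const char *sign = (x < 0) ? "-" : "";
    unsigned long magnitude = (x < 0) ? 0UL - (unsigned long)x
                                      : (unsigned long)x;
    // Builds e.g. "-0x%.1lx": sign and prefix are literal text in fmt.
    snprintf(fmt, sizeof(fmt), "%s0%c%%.%dl%c", sign, type, prec, type);
    return snprintf(buf, buflen, fmt, magnitude);
}
```

   With this sketch, 0 formats as "0x0", 255 with type 'X' as "0XFF", and
   -255 with a precision of 4 as "-0x00ff".
*/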
static int
formatchar(Py_UNICODE *buf,
size_t buflen,
PyObject *v)
{
/* presume that the buffer is at least 2 characters long */
if (PyUnicode_Check(v)) {
if (PyUnicode_GET_SIZE(v) != 1)
goto onError;
buf[0] = PyUnicode_AS_UNICODE(v)[0];
}
else if (PyString_Check(v)) {
if (PyString_GET_SIZE(v) != 1)
goto onError;
buf[0] = (Py_UNICODE)PyString_AS_STRING(v)[0];
}
else {
/* Integer input truncated to a character */
long x;
x = PyInt_AsLong(v);
if (x == -1 && PyErr_Occurred())
goto onError;
#ifdef Py_UNICODE_WIDE
if (x < 0 || x > 0x10ffff) {
PyErr_SetString(PyExc_OverflowError,
"%c arg not in range(0x110000) "
"(wide Python build)");
return -1;
}
#else
if (x < 0 || x > 0xffff) {
PyErr_SetString(PyExc_OverflowError,
"%c arg not in range(0x10000) "
"(narrow Python build)");
return -1;
}
#endif
buf[0] = (Py_UNICODE) x;
}
buf[1] = '\0';
return 1;
onError:
PyErr_SetString(PyExc_TypeError,
"%c requires int or char");
return -1;
}
/* fmt%(v1,v2,...) is roughly equivalent to sprintf(fmt, v1, v2, ...)
FORMATBUFLEN is the length of the buffer in which the floats, ints, &
chars are formatted. XXX This is a magic number. Each formatting
routine does bounds checking to ensure no overflow, but a better
solution may be to malloc a buffer of appropriate size for each
format. For now, the current solution is sufficient.
*/
#define FORMATBUFLEN (size_t)120
PyObject *PyUnicode_Format(PyObject *format,
PyObject *args)
{
Py_UNICODE *fmt, *res;
Py_ssize_t fmtcnt, rescnt, reslen, arglen, argidx;
int args_owned = 0;
PyUnicodeObject *result = NULL;
PyObject *dict = NULL;
PyObject *uformat;
if (format == NULL || args == NULL) {
PyErr_BadInternalCall();
return NULL;
}
uformat = PyUnicode_FromObject(format);
if (uformat == NULL)
return NULL;
fmt = PyUnicode_AS_UNICODE(uformat);
fmtcnt = PyUnicode_GET_SIZE(uformat);
reslen = rescnt = fmtcnt + 100;
result = _PyUnicode_New(reslen);
if (result == NULL)
goto onError;
res = PyUnicode_AS_UNICODE(result);
if (PyTuple_Check(args)) {
arglen = PyTuple_Size(args);
argidx = 0;
}
else {
arglen = -1;
argidx = -2;
}
if (args->ob_type->tp_as_mapping && !PyTuple_Check(args) &&
!PyObject_TypeCheck(args, &PyBaseString_Type))
dict = args;
while (--fmtcnt >= 0) {
if (*fmt != '%') {
if (--rescnt < 0) {
rescnt = fmtcnt + 100;
reslen += rescnt;
if (_PyUnicode_Resize(&result, reslen) < 0)
goto onError;
res = PyUnicode_AS_UNICODE(result) + reslen - rescnt;
--rescnt;
}
*res++ = *fmt++;
}
else {
/* Got a format specifier */
int flags = 0;
Py_ssize_t width = -1;
int prec = -1;
Py_UNICODE c = '\0';
Py_UNICODE fill;
PyObject *v = NULL;
PyObject *temp = NULL;
Py_UNICODE *pbuf;
Py_UNICODE sign;
Py_ssize_t len;
Py_UNICODE formatbuf[FORMATBUFLEN]; /* For format{float,int,char}() */
fmt++;
if (*fmt == '(') {
Py_UNICODE *keystart;
Py_ssize_t keylen;
PyObject *key;
int pcount = 1;
if (dict == NULL) {
PyErr_SetString(PyExc_TypeError,
"format requires a mapping");
goto onError;
}
++fmt;
--fmtcnt;
keystart = fmt;
/* Skip over balanced parentheses */
while (pcount > 0 && --fmtcnt >= 0) {
if (*fmt == ')')
--pcount;
else if (*fmt == '(')
++pcount;
fmt++;
}
keylen = fmt - keystart - 1;
if (fmtcnt < 0 || pcount > 0) {
PyErr_SetString(PyExc_ValueError,
"incomplete format key");
goto onError;
}
#if 0
/* keys are converted to strings using UTF-8 and
then looked up since Python uses strings to hold
variable names etc. in its namespaces and we
wouldn't want to break common idioms. */
key = PyUnicode_EncodeUTF8(keystart,
keylen,
NULL);
#else
key = PyUnicode_FromUnicode(keystart, keylen);
#endif
if (key == NULL)
goto onError;
if (args_owned) {
Py_DECREF(args);
args_owned = 0;
}
args = PyObject_GetItem(dict, key);
Py_DECREF(key);
if (args == NULL) {
goto onError;
}
args_owned = 1;
arglen = -1;
argidx = -2;
}
while (--fmtcnt >= 0) {
switch (c = *fmt++) {
case '-': flags |= F_LJUST; continue;
case '+': flags |= F_SIGN; continue;
case ' ': flags |= F_BLANK; continue;
case '#': flags |= F_ALT; continue;
case '0': flags |= F_ZERO; continue;
}
break;
}
if (c == '*') {
v = getnextarg(args, arglen, &argidx);
if (v == NULL)
goto onError;
if (!PyInt_Check(v)) {
PyErr_SetString(PyExc_TypeError,
"* wants int");
goto onError;
}
width = PyInt_AsLong(v);
if (width < 0) {
flags |= F_LJUST;
width = -width;
}
if (--fmtcnt >= 0)
c = *fmt++;
}
else if (c >= '0' && c <= '9') {
width = c - '0';
while (--fmtcnt >= 0) {
c = *fmt++;
if (c < '0' || c > '9')
break;
if ((width*10) / 10 != width) {
PyErr_SetString(PyExc_ValueError,
"width too big");
goto onError;
}
width = width*10 + (c - '0');
}
}
if (c == '.') {
prec = 0;
if (--fmtcnt >= 0)
c = *fmt++;
if (c == '*') {
v = getnextarg(args, arglen, &argidx);
if (v == NULL)
goto onError;
if (!PyInt_Check(v)) {
PyErr_SetString(PyExc_TypeError,
"* wants int");
goto onError;
}
prec = PyInt_AsLong(v);
if (prec < 0)
prec = 0;
if (--fmtcnt >= 0)
c = *fmt++;
}
else if (c >= '0' && c <= '9') {
prec = c - '0';
while (--fmtcnt >= 0) {
c = *fmt++;
if (c < '0' || c > '9')
break;
if ((prec*10) / 10 != prec) {
PyErr_SetString(PyExc_ValueError,
"prec too big");
goto onError;
}
prec = prec*10 + (c - '0');
}
}
} /* prec */
if (fmtcnt >= 0) {
if (c == 'h' || c == 'l' || c == 'L') {
if (--fmtcnt >= 0)
c = *fmt++;
}
}
if (fmtcnt < 0) {
PyErr_SetString(PyExc_ValueError,
"incomplete format");
goto onError;
}
if (c != '%') {
v = getnextarg(args, arglen, &argidx);
if (v == NULL)
goto onError;
}
sign = 0;
fill = ' ';
switch (c) {
case '%':
pbuf = formatbuf;
/* presume that buffer length is at least 1 */
pbuf[0] = '%';
len = 1;
break;
case 's':
case 'r':
if (PyUnicode_Check(v) && c == 's') {
temp = v;
Py_INCREF(temp);
}
else {
PyObject *unicode;
if (c == 's')
temp = PyObject_Unicode(v);
else
temp = PyObject_Repr(v);
if (temp == NULL)
goto onError;
if (PyUnicode_Check(temp))
/* nothing to do */;
else if (PyString_Check(temp)) {
/* convert string to Unicode */
unicode = PyUnicode_Decode(PyString_AS_STRING(temp),
PyString_GET_SIZE(temp),
NULL,
"strict");
Py_DECREF(temp);
temp = unicode;
if (temp == NULL)
goto onError;
}
else {
Py_DECREF(temp);
PyErr_SetString(PyExc_TypeError,
"%s argument has non-string str()");
goto onError;
}
}
pbuf = PyUnicode_AS_UNICODE(temp);
len = PyUnicode_GET_SIZE(temp);
if (prec >= 0 && len > prec)
len = prec;
break;
case 'i':
case 'd':
case 'u':
case 'o':
case 'x':
case 'X':
if (c == 'i')
c = 'd';
if (PyLong_Check(v)) {
temp = formatlong(v, flags, prec, c);
if (!temp)
goto onError;
pbuf = PyUnicode_AS_UNICODE(temp);
len = PyUnicode_GET_SIZE(temp);
sign = 1;
}
else {
pbuf = formatbuf;
len = formatint(pbuf, sizeof(formatbuf)/sizeof(Py_UNICODE),
flags, prec, c, v);
if (len < 0)
goto onError;
sign = 1;
}
if (flags & F_ZERO)
fill = '0';
break;
case 'e':
case 'E':
case 'f':
case 'F':
case 'g':
case 'G':
if (c == 'F')
c = 'f';
pbuf = formatbuf;
len = formatfloat(pbuf, sizeof(formatbuf)/sizeof(Py_UNICODE),
flags, prec, c, v);
if (len < 0)
goto onError;
sign = 1;
if (flags & F_ZERO)
fill = '0';
break;
case 'c':
pbuf = formatbuf;
len = formatchar(pbuf, sizeof(formatbuf)/sizeof(Py_UNICODE), v);
if (len < 0)
goto onError;
break;
default:
PyErr_Format(PyExc_ValueError,
"unsupported format character '%c' (0x%x) "
"at index %i",
(31<=c && c<=126) ? (char)c : '?',
(int)c,
(int)(fmt -1 - PyUnicode_AS_UNICODE(uformat)));
goto onError;
}
if (sign) {
if (*pbuf == '-' || *pbuf == '+') {
sign = *pbuf++;
len--;
}
else if (flags & F_SIGN)
sign = '+';
else if (flags & F_BLANK)
sign = ' ';
else
sign = 0;
}
if (width < len)
width = len;
if (rescnt - (sign != 0) < width) {
reslen -= rescnt;
rescnt = width + fmtcnt + 100;
reslen += rescnt;
if (reslen < 0) {
Py_XDECREF(temp);
PyErr_NoMemory();
goto onError;
}
if (_PyUnicode_Resize(&result, reslen) < 0) {
Py_XDECREF(temp);
goto onError;
}
res = PyUnicode_AS_UNICODE(result)
+ reslen - rescnt;
}
if (sign) {
if (fill != ' ')
*res++ = sign;
rescnt--;
if (width > len)
width--;
}
if ((flags & F_ALT) && (c == 'x' || c == 'X')) {
assert(pbuf[0] == '0');
assert(pbuf[1] == c);
if (fill != ' ') {
*res++ = *pbuf++;
*res++ = *pbuf++;
}
rescnt -= 2;
width -= 2;
if (width < 0)
width = 0;
len -= 2;
}
if (width > len && !(flags & F_LJUST)) {
do {
--rescnt;
*res++ = fill;
} while (--width > len);
}
if (fill == ' ') {
if (sign)
*res++ = sign;
if ((flags & F_ALT) && (c == 'x' || c == 'X')) {
assert(pbuf[0] == '0');
assert(pbuf[1] == c);
*res++ = *pbuf++;
*res++ = *pbuf++;
}
}
Py_UNICODE_COPY(res, pbuf, len);
res += len;
rescnt -= len;
while (--width >= len) {
--rescnt;
*res++ = ' ';
}
if (dict && (argidx < arglen) && c != '%') {
PyErr_SetString(PyExc_TypeError,
"not all arguments converted during string formatting");
Py_XDECREF(temp);
goto onError;
}
Py_XDECREF(temp);
} /* '%' */
} /* until end */
if (argidx < arglen && !dict) {
PyErr_SetString(PyExc_TypeError,
"not all arguments converted during string formatting");
goto onError;
}
if (_PyUnicode_Resize(&result, reslen - rescnt) < 0)
goto onError;
if (args_owned) {
Py_DECREF(args);
}
Py_DECREF(uformat);
return (PyObject *)result;
onError:
Py_XDECREF(result);
Py_DECREF(uformat);
if (args_owned) {
Py_DECREF(args);
}
return NULL;
}
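/* Illustration only, kept inside a comment so it is never compiled with
   this file.  The width and precision loops in PyUnicode_Format() above
   detect overflow after the fact with (width*10) / 10 != width, which
   depends on how the platform treats signed overflow.  One possible
   alternative, shown here only as a sketch, is to test before
   multiplying.  append_digit() is a hypothetical helper and assumes the
   accumulated value is non-negative.

```c
#include <assert.h>
#include <limits.h>

// Append decimal digit `digit` (0..9) to *value.  Returns 0 on success,
// or -1 (leaving *value untouched) if the result would exceed LONG_MAX.
static int
append_digit(long *value, int digit)
{
    // value*10 + digit <= LONG_MAX  <=>  value <= (LONG_MAX - digit) / 10
    if (*value > (LONG_MAX - digit) / 10)
        return -1;
    *value = *value * 10 + digit;
    return 0;
}
```

   A caller would feed it c - '0' for each digit character and raise
   ValueError when it returns -1, in place of the check-after-multiply
   pattern.
*/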
static PyBufferProcs unicode_as_buffer = {
(readbufferproc) unicode_buffer_getreadbuf,
(writebufferproc) unicode_buffer_getwritebuf,
(segcountproc) unicode_buffer_getsegcount,
(charbufferproc) unicode_buffer_getcharbuf,
};
static PyObject *
unicode_subtype_new(PyTypeObject *type, PyObject *args, PyObject *kwds);
static PyObject *
unicode_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
PyObject *x = NULL;
static char *kwlist[] = {"string", "encoding", "errors", 0};
char *encoding = NULL;
char *errors = NULL;
if (type != &PyUnicode_Type)
return unicode_subtype_new(type, args, kwds);
if (!PyArg_ParseTupleAndKeywords(args, kwds, "|Oss:unicode",
kwlist, &x, &encoding, &errors))
return NULL;
if (x == NULL)
return (PyObject *)_PyUnicode_New(0);
if (encoding == NULL && errors == NULL)
return PyObject_Unicode(x);
else
return PyUnicode_FromEncodedObject(x, encoding, errors);
}
static PyObject *
unicode_subtype_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
PyUnicodeObject *tmp, *pnew;
Py_ssize_t n;
assert(PyType_IsSubtype(type, &PyUnicode_Type));
tmp = (PyUnicodeObject *)unicode_new(&PyUnicode_Type, args, kwds);
if (tmp == NULL)
return NULL;
assert(PyUnicode_Check(tmp));
pnew = (PyUnicodeObject *) type->tp_alloc(type, n = tmp->length);
if (pnew == NULL) {
Py_DECREF(tmp);
return NULL;
}
pnew->str = PyMem_NEW(Py_UNICODE, n+1);
if (pnew->str == NULL) {
_Py_ForgetReference((PyObject *)pnew);
PyObject_Del(pnew);
Py_DECREF(tmp);
return PyErr_NoMemory();
}
Py_UNICODE_COPY(pnew->str, tmp->str, n+1);
pnew->length = n;
pnew->hash = tmp->hash;
Py_DECREF(tmp);
return (PyObject *)pnew;
}
PyDoc_STRVAR(unicode_doc,
"unicode(string [, encoding[, errors]]) -> object\n\
\n\
Create a new Unicode object from the given encoded string.\n\
encoding defaults to the current default string encoding.\n\
errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.");
PyTypeObject PyUnicode_Type = {
PyObject_HEAD_INIT(&PyType_Type)
0, /* ob_size */
"unicode", /* tp_name */
sizeof(PyUnicodeObject), /* tp_basicsize */
0, /* tp_itemsize */
/* Slots */
(destructor)unicode_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
(cmpfunc) unicode_compare, /* tp_compare */
unicode_repr, /* tp_repr */
&unicode_as_number, /* tp_as_number */
&unicode_as_sequence, /* tp_as_sequence */
&unicode_as_mapping, /* tp_as_mapping */
(hashfunc) unicode_hash, /* tp_hash*/
0, /* tp_call*/
(reprfunc) unicode_str, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
0, /* tp_setattro */
&unicode_as_buffer, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
unicode_doc, /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
unicode_methods, /* tp_methods */
0, /* tp_members */
0, /* tp_getset */
&PyBaseString_Type, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
unicode_new, /* tp_new */
PyObject_Del, /* tp_free */
};
/* Initialize the Unicode implementation */
void _PyUnicode_Init(void)
{
int i;
/* XXX - move this array to unicodectype.c ? */
Py_UNICODE linebreak[] = {
0x000A, /* LINE FEED */
0x000D, /* CARRIAGE RETURN */
0x001C, /* FILE SEPARATOR */
0x001D, /* GROUP SEPARATOR */
0x001E, /* RECORD SEPARATOR */
0x0085, /* NEXT LINE */
0x2028, /* LINE SEPARATOR */
0x2029, /* PARAGRAPH SEPARATOR */
};
/* Init the implementation */
unicode_freelist = NULL;
unicode_freelist_size = 0;
unicode_empty = _PyUnicode_New(0);
if (!unicode_empty)
return;
strcpy(unicode_default_encoding, "ascii");
for (i = 0; i < 256; i++)
unicode_latin1[i] = NULL;
if (PyType_Ready(&PyUnicode_Type) < 0)
Py_FatalError("Can't initialize 'unicode'");
/* initialize the linebreak bloom filter */
bloom_linebreak = make_bloom_mask(
linebreak, sizeof(linebreak) / sizeof(linebreak[0])
);
if (PyType_Ready(&EncodingMapType) < 0)
Py_FatalError("Can't initialize encoding map type");
}
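/* Illustration only, kept inside a comment so it is never compiled with
   this file.  _PyUnicode_Init() above builds bloom_linebreak from the
   linebreak table.  make_bloom_mask() is defined elsewhere in this file
   and is not shown in this section, so the sketch below is an assumption
   about the general technique (a one-word bloom filter keyed on the low
   five bits of each code point), not a copy of that function.
   make_mask() and maybe_member() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Set one bit per character, chosen by the character's low five bits.
static uint32_t
make_mask(const uint32_t *chars, size_t n)
{
    uint32_t mask = 0;
    size_t i;
    for (i = 0; i < n; i++)
        mask |= UINT32_C(1) << (chars[i] & 0x1F);
    return mask;
}

// Returns 0 if `ch` is definitely not in the set, 1 if it might be.
static int
maybe_member(uint32_t mask, uint32_t ch)
{
    return (int)((mask >> (ch & 0x1F)) & 1);
}
```

   A zero result rules a character out cheaply; a non-zero result still
   needs an exact test, because unrelated characters can share the same
   low bits (for instance 'J', 0x4A, collides with LINE FEED, 0x0A).
*/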
/* Finalize the Unicode implementation */
void
_PyUnicode_Fini(void)
{
PyUnicodeObject *u;
int i;
Py_XDECREF(unicode_empty);
unicode_empty = NULL;
for (i = 0; i < 256; i++) {
if (unicode_latin1[i]) {
Py_DECREF(unicode_latin1[i]);
unicode_latin1[i] = NULL;
}
}
for (u = unicode_freelist; u != NULL;) {
PyUnicodeObject *v = u;
u = *(PyUnicodeObject **)u;
if (v->str)
PyMem_DEL(v->str);
Py_XDECREF(v->defenc);
PyObject_Del(v);
}
unicode_freelist = NULL;
unicode_freelist_size = 0;
}
#ifdef __cplusplus
}
#endif
/*
Local variables:
c-basic-offset: 4
indent-tabs-mode: nil
End:
*/