mirror of
https://github.com/python/cpython.git
synced 2025-11-01 06:01:29 +00:00
Issue 9873: the URL parsing functions now accept ASCII encoded byte sequences in addition to character strings
This commit is contained in:
parent 43f0c27be7
commit 9fc443cf59
5 changed files with 606 additions and 140 deletions
@@ -24,7 +24,15 @@ following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
@@ -242,6 +250,161 @@ The :mod:`urllib.parse` module defines the following functions:
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.
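As a quick sketch of the structured :func:`urldefrag` result described above (assuming Python 3.2 or later, where the return value gained named attributes), the two components are available both by attribute name and by tuple index:

```python
from urllib.parse import urldefrag

# urldefrag splits off the fragment identifier, if any
result = urldefrag('http://www.python.org/doc/#intro')
print(result.url)        # URL with the fragment removed
print(result.fragment)   # the fragment identifier
# Index access still works, since the result is a tuple subclass
print(result[0] == result.url and result[1] == result.fragment)
```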
Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.
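The str/bytes behaviour described above can be sketched as follows (assuming Python 3.2 or later; the exact wording of the :exc:`TypeError` message is an implementation detail):

```python
from urllib.parse import urlsplit

# str in, str out
print(urlsplit('http://www.python.org/doc/').netloc)    # str result

# bytes in, bytes out
print(urlsplit(b'http://www.python.org/doc/').netloc)   # bytes result

# Mixing str and bytes arguments in one call is rejected
try:
    urlsplit('www.python.org', b'http')
except TypeError as exc:
    print('rejected:', exc)
```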
To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for :meth:`decode` methods).
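A short sketch of the conversion methods just described (assuming Python 3.2 or later, where these methods were added):

```python
from urllib.parse import urlsplit

str_result = urlsplit('http://www.python.org/doc/')
bytes_result = str_result.encode()   # default encoding is 'ascii'
print(bytes_result.netloc)           # now bytes data

# decode() converts back; the round trip reproduces the original result
print(bytes_result.decode() == str_result)
```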
Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences, as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.
.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be
   removed. For :func:`urlsplit` and :func:`urlparse` results, all noted
   changes will be made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the
   original parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'
The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.

The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2
URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.
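For instance (a minimal sketch; see the individual function descriptions below for the full parameter details), :func:`quote` percent-encodes special characters and UTF-8 encodes non-ASCII text, and :func:`unquote` reverses both steps:

```python
from urllib.parse import quote, unquote

# '/' is in the default safe set, so path separators survive quoting;
# the space becomes %20 and the non-ASCII character becomes UTF-8 bytes
quoted = quote('/El Niño/')
print(quoted)            # '/El%20Ni%C3%B1o/'
print(unquote(quoted))   # '/El Niño/'
```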
.. function:: quote(string, safe='/', encoding=None, errors=None)
@@ -322,8 +485,7 @@ The :mod:`urllib.parse` module defines the following functions:
   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

@@ -340,12 +502,13 @@ The :mod:`urllib.parse` module defines the following functions:
   the optional parameter *doseq* evaluates to *True*, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   When the *query* parameter is a :class:`str`, the *safe*, *encoding* and
   *errors* parameters are passed down to :func:`quote_plus` for encoding.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.
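A small sketch of the round trip just described (assuming Python 3.2 or later):

```python
from urllib.parse import urlencode, parse_qsl

pairs = [('name', 'web frameworks'), ('q', 'django'), ('q', 'flask')]
encoded = urlencode(pairs)
print(encoded)   # spaces become '+', order and repeated keys are preserved

# parse_qsl reverses the encoding back into the original list of pairs
print(parse_qsl(encoded) == pairs)
```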
@@ -376,57 +539,3 @@ The :mod:`urllib.parse` module defines the following functions:
   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.


.. _urlparse-result-object:

Results of :func:`urlparse` and :func:`urlsplit`
------------------------------------------------

The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
subclasses of the :class:`tuple` type. These subclasses add the attributes
described in those functions, as well as provide an additional method:

.. method:: ParseResult.geturl()

   Return the re-combined version of the original URL as a string. This may differ
   from the original URL in that the scheme will always be normalized to lower case
   and empty components may be dropped. Specifically, empty parameters, queries,
   and fragment identifiers will be removed.

   The result of this method is a fixpoint if passed back through the original
   parsing function:

      >>> import urllib.parse
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urllib.parse.urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urllib.parse.urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'

The following classes provide the implementations of the parse results:

.. class:: BaseResult

   Base class for the concrete result classes. This provides most of the
   attribute definitions. It does not provide a :meth:`geturl` method. It is
   derived from :class:`tuple`, but does not override the :meth:`__init__` or
   :meth:`__new__` methods.

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results. The :meth:`__new__` method is
   overridden to support checking that the right number of arguments are passed.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results. The :meth:`__new__` method is
   overridden to support checking that the right number of arguments are passed.
@@ -573,6 +573,14 @@ New, Improved, and Deprecated Modules
(Contributed by Rodolpho Eckhardt and Nick Coghlan, :issue:`10220`.)

.. XXX: Mention inspect.getattr_static (Michael Foord)
.. XXX: Mention urllib.parse changes
   Issue 9873 (Nick Coghlan):
     - ASCII byte sequence support in URL parsing
     - named tuple for urldefrag return value
   Issue 5468 (Dan Mahn) for urlencode:
     - bytes input support
     - non-UTF8 percent encoding of non-ASCII characters
   Issue 2987 for IPv6 (RFC2732) support in urlparse

Multi-threading
===============
@@ -24,6 +24,17 @@
    ("&a=b", [('a', 'b')]),
    ("a=a+b&b=b+c", [('a', 'a b'), ('b', 'b c')]),
    ("a=1&a=2", [('a', '1'), ('a', '2')]),
    (b"", []),
    (b"&", []),
    (b"&&", []),
    (b"=", [(b'', b'')]),
    (b"=a", [(b'', b'a')]),
    (b"a", [(b'a', b'')]),
    (b"a=", [(b'a', b'')]),
    (b"a=", [(b'a', b'')]),
    (b"&a=b", [(b'a', b'b')]),
    (b"a=a+b&b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
    (b"a=1&a=2", [(b'a', b'1'), (b'a', b'2')]),
]

class UrlParseTestCase(unittest.TestCase):
@@ -86,7 +97,7 @@ def test_qsl(self):

    def test_roundtrips(self):
        str_cases = [
            ('file:///tmp/junk.txt',
             ('file', '', '/tmp/junk.txt', '', '', ''),
             ('file', '', '/tmp/junk.txt', '', '')),

@@ -110,16 +121,21 @@ def test_roundtrips(self):
             ('git+ssh', 'git@github.com','/user/project.git',
              '','',''),
             ('git+ssh', 'git@github.com','/user/project.git',
              '', '')),
            ]
        def _encode(t):
            return (t[0].encode('ascii'),
                    tuple(x.encode('ascii') for x in t[1]),
                    tuple(x.encode('ascii') for x in t[2]))
        bytes_cases = [_encode(x) for x in str_cases]
        for url, parsed, split in str_cases + bytes_cases:
            self.checkRoundtrips(url, parsed, split)
    def test_http_roundtrips(self):
        # urllib.parse.urlsplit treats 'http:' as an optimized special case,
        # so we test both 'http:' and 'https:' in all the following.
        # Three cheers for white box knowledge!
        str_cases = [
            ('://www.python.org',
             ('www.python.org', '', '', '', ''),
             ('www.python.org', '', '', '')),

@@ -136,19 +152,34 @@ def test_http_roundtrips(self):
             ('a', '/b/c/d', 'p', 'q', 'f'),
             ('a', '/b/c/d;p', 'q', 'f')),
            ]
        def _encode(t):
            return (t[0].encode('ascii'),
                    tuple(x.encode('ascii') for x in t[1]),
                    tuple(x.encode('ascii') for x in t[2]))
        bytes_cases = [_encode(x) for x in str_cases]
        str_schemes = ('http', 'https')
        bytes_schemes = (b'http', b'https')
        str_tests = str_schemes, str_cases
        bytes_tests = bytes_schemes, bytes_cases
        for schemes, test_cases in (str_tests, bytes_tests):
            for scheme in schemes:
                for url, parsed, split in test_cases:
                    url = scheme + url
                    parsed = (scheme,) + parsed
                    split = (scheme,) + split
                    self.checkRoundtrips(url, parsed, split)
    def checkJoin(self, base, relurl, expected):
        str_components = (base, relurl, expected)
        self.assertEqual(urllib.parse.urljoin(base, relurl), expected)
        bytes_components = baseb, relurlb, expectedb = [
                            x.encode('ascii') for x in str_components]
        self.assertEqual(urllib.parse.urljoin(baseb, relurlb), expectedb)
def test_unparse_parse(self):
|
def test_unparse_parse(self):
|
||||||
for u in ['Python', './Python','x-newscheme://foo.com/stuff','x://y','x:/y','x:/','/',]:
|
str_cases = ['Python', './Python','x-newscheme://foo.com/stuff','x://y','x:/y','x:/','/',]
|
||||||
|
bytes_cases = [x.encode('ascii') for x in str_cases]
|
||||||
|
for u in str_cases + bytes_cases:
|
||||||
self.assertEqual(urllib.parse.urlunsplit(urllib.parse.urlsplit(u)), u)
|
self.assertEqual(urllib.parse.urlunsplit(urllib.parse.urlsplit(u)), u)
|
||||||
self.assertEqual(urllib.parse.urlunparse(urllib.parse.urlparse(u)), u)
|
self.assertEqual(urllib.parse.urlunparse(urllib.parse.urlparse(u)), u)
|
||||||
|
|
||||||
|
|
@ -328,7 +359,7 @@ def test_urljoins(self):
|
||||||
self.checkJoin(SIMPLE_BASE, 'http:g?y/./x','http://a/b/c/g?y/./x')
|
self.checkJoin(SIMPLE_BASE, 'http:g?y/./x','http://a/b/c/g?y/./x')
|
||||||
|
|
||||||
def test_RFC2732(self):
|
def test_RFC2732(self):
|
||||||
for url, hostname, port in [
|
str_cases = [
|
||||||
('http://Test.python.org:5432/foo/', 'test.python.org', 5432),
|
('http://Test.python.org:5432/foo/', 'test.python.org', 5432),
|
||||||
('http://12.34.56.78:5432/foo/', '12.34.56.78', 5432),
|
('http://12.34.56.78:5432/foo/', '12.34.56.78', 5432),
|
||||||
('http://[::1]:5432/foo/', '::1', 5432),
|
('http://[::1]:5432/foo/', '::1', 5432),
|
||||||
|
|
@@ -349,20 +380,26 @@ def test_RFC2732(self):
            ('http://[::12.34.56.78]/foo/', '::12.34.56.78', None),
            ('http://[::ffff:12.34.56.78]/foo/',
             '::ffff:12.34.56.78', None),
            ]
        def _encode(t):
            return t[0].encode('ascii'), t[1].encode('ascii'), t[2]
        bytes_cases = [_encode(x) for x in str_cases]
        for url, hostname, port in str_cases + bytes_cases:
            urlparsed = urllib.parse.urlparse(url)
            self.assertEqual((urlparsed.hostname, urlparsed.port), (hostname, port))

        str_cases = [
            'http://::12.34.56.78]/',
            'http://[::1/foo/',
            'ftp://[::1/foo/bad]/bad',
            'http://[::1/foo/bad]/bad',
            'http://[::ffff:12.34.56.78']
        bytes_cases = [x.encode('ascii') for x in str_cases]
        for invalid_url in str_cases + bytes_cases:
            self.assertRaises(ValueError, urllib.parse.urlparse, invalid_url)
    def test_urldefrag(self):
        str_cases = [
            ('http://python.org#frag', 'http://python.org', 'frag'),
            ('http://python.org', 'http://python.org', ''),
            ('http://python.org/#frag', 'http://python.org/', 'frag'),

@@ -373,8 +410,16 @@ def test_urldefrag(self):
            ('http://python.org/p?q', 'http://python.org/p?q', ''),
            (RFC1808_BASE, 'http://a/b/c/d;p?q', 'f'),
            (RFC2396_BASE, 'http://a/b/c/d;p?q', ''),
        ]
        def _encode(t):
            return type(t)(x.encode('ascii') for x in t)
        bytes_cases = [_encode(x) for x in str_cases]
        for url, defrag, frag in str_cases + bytes_cases:
            result = urllib.parse.urldefrag(url)
            self.assertEqual(result.geturl(), url)
            self.assertEqual(result, (defrag, frag))
            self.assertEqual(result.url, defrag)
            self.assertEqual(result.fragment, frag)
    def test_urlsplit_attributes(self):
        url = "HTTP://WWW.PYTHON.ORG/doc/#frag"

@@ -390,7 +435,8 @@ def test_urlsplit_attributes(self):
        self.assertEqual(p.port, None)
        # geturl() won't return exactly the original URL in this case
        # since the scheme is always case-normalized
        # We handle this by ignoring the first 4 characters of the URL
        self.assertEqual(p.geturl()[4:], url[4:])

        url = "http://User:Pass@www.python.org:080/doc/?query=yes#frag"
        p = urllib.parse.urlsplit(url)

@@ -422,6 +468,45 @@ def test_urlsplit_attributes(self):
        self.assertEqual(p.port, 80)
        self.assertEqual(p.geturl(), url)

        # And check them all again, only with bytes this time
        url = b"HTTP://WWW.PYTHON.ORG/doc/#frag"
        p = urllib.parse.urlsplit(url)
        self.assertEqual(p.scheme, b"http")
        self.assertEqual(p.netloc, b"WWW.PYTHON.ORG")
        self.assertEqual(p.path, b"/doc/")
        self.assertEqual(p.query, b"")
        self.assertEqual(p.fragment, b"frag")
        self.assertEqual(p.username, None)
        self.assertEqual(p.password, None)
        self.assertEqual(p.hostname, b"www.python.org")
        self.assertEqual(p.port, None)
        self.assertEqual(p.geturl()[4:], url[4:])

        url = b"http://User:Pass@www.python.org:080/doc/?query=yes#frag"
        p = urllib.parse.urlsplit(url)
        self.assertEqual(p.scheme, b"http")
        self.assertEqual(p.netloc, b"User:Pass@www.python.org:080")
        self.assertEqual(p.path, b"/doc/")
        self.assertEqual(p.query, b"query=yes")
        self.assertEqual(p.fragment, b"frag")
        self.assertEqual(p.username, b"User")
        self.assertEqual(p.password, b"Pass")
        self.assertEqual(p.hostname, b"www.python.org")
        self.assertEqual(p.port, 80)
        self.assertEqual(p.geturl(), url)

        url = b"http://User@example.com:Pass@www.python.org:080/doc/?query=yes#frag"
        p = urllib.parse.urlsplit(url)
        self.assertEqual(p.scheme, b"http")
        self.assertEqual(p.netloc, b"User@example.com:Pass@www.python.org:080")
        self.assertEqual(p.path, b"/doc/")
        self.assertEqual(p.query, b"query=yes")
        self.assertEqual(p.fragment, b"frag")
        self.assertEqual(p.username, b"User@example.com")
        self.assertEqual(p.password, b"Pass")
        self.assertEqual(p.hostname, b"www.python.org")
        self.assertEqual(p.port, 80)
        self.assertEqual(p.geturl(), url)
    def test_attributes_bad_port(self):
        """Check handling of non-integer ports."""

@@ -433,6 +518,15 @@ def test_attributes_bad_port(self):
        self.assertEqual(p.netloc, "www.example.net:foo")
        self.assertRaises(ValueError, lambda: p.port)

        # Once again, repeat ourselves to test bytes
        p = urllib.parse.urlsplit(b"http://www.example.net:foo")
        self.assertEqual(p.netloc, b"www.example.net:foo")
        self.assertRaises(ValueError, lambda: p.port)

        p = urllib.parse.urlparse(b"http://www.example.net:foo")
        self.assertEqual(p.netloc, b"www.example.net:foo")
        self.assertRaises(ValueError, lambda: p.port)
    def test_attributes_without_netloc(self):
        # This example is straight from RFC 3261. It looks like it
        # should allow the username, hostname, and port to be filled

@@ -456,10 +550,30 @@ def test_attributes_without_netloc(self):
        self.assertEqual(p.port, None)
        self.assertEqual(p.geturl(), uri)

        # You guessed it, repeating the test with bytes input
        uri = b"sip:alice@atlanta.com;maddr=239.255.255.1;ttl=15"
        p = urllib.parse.urlsplit(uri)
        self.assertEqual(p.netloc, b"")
        self.assertEqual(p.username, None)
        self.assertEqual(p.password, None)
        self.assertEqual(p.hostname, None)
        self.assertEqual(p.port, None)
        self.assertEqual(p.geturl(), uri)

        p = urllib.parse.urlparse(uri)
        self.assertEqual(p.netloc, b"")
        self.assertEqual(p.username, None)
        self.assertEqual(p.password, None)
        self.assertEqual(p.hostname, None)
        self.assertEqual(p.port, None)
        self.assertEqual(p.geturl(), uri)
    def test_noslash(self):
        # Issue 1637: http://foo.com?query is legal
        self.assertEqual(urllib.parse.urlparse("http://example.com?blahblah=/foo"),
                         ('http', 'example.com', '', '', 'blahblah=/foo', ''))
        self.assertEqual(urllib.parse.urlparse(b"http://example.com?blahblah=/foo"),
                         (b'http', b'example.com', b'', b'', b'blahblah=/foo', b''))
    def test_withoutscheme(self):
        # Test urlparse without scheme

@@ -472,6 +586,13 @@ def test_withoutscheme(self):
                         ('','www.python.org:80','','','',''))
        self.assertEqual(urllib.parse.urlparse("http://www.python.org:80"),
                         ('http','www.python.org:80','','','',''))
        # Repeat for bytes input
        self.assertEqual(urllib.parse.urlparse(b"path"),
                         (b'',b'',b'path',b'',b'',b''))
        self.assertEqual(urllib.parse.urlparse(b"//www.python.org:80"),
                         (b'',b'www.python.org:80',b'',b'',b'',b''))
        self.assertEqual(urllib.parse.urlparse(b"http://www.python.org:80"),
                         (b'http',b'www.python.org:80',b'',b'',b'',b''))
    def test_portseparator(self):
        # Issue 754016 makes changes for port separator ':' from scheme separator

@@ -481,6 +602,13 @@ def test_portseparator(self):
        self.assertEqual(urllib.parse.urlparse("https:"),('https','','','','',''))
        self.assertEqual(urllib.parse.urlparse("http://www.python.org:80"),
                         ('http','www.python.org:80','','','',''))
        # As usual, need to check bytes input as well
        self.assertEqual(urllib.parse.urlparse(b"path:80"),
                         (b'',b'',b'path:80',b'',b'',b''))
        self.assertEqual(urllib.parse.urlparse(b"http:"),(b'http',b'',b'',b'',b'',b''))
        self.assertEqual(urllib.parse.urlparse(b"https:"),(b'https',b'',b'',b'',b'',b''))
        self.assertEqual(urllib.parse.urlparse(b"http://www.python.org:80"),
                         (b'http',b'www.python.org:80',b'',b'',b'',b''))
def test_usingsys(self):
|
def test_usingsys(self):
|
||||||
# Issue 3314: sys module is used in the error
|
# Issue 3314: sys module is used in the error
|
||||||
|
|
@ -492,6 +620,71 @@ def test_anyscheme(self):
|
||||||
('s3', 'foo.com', '/stuff', '', '', ''))
|
('s3', 'foo.com', '/stuff', '', '', ''))
|
||||||
self.assertEqual(urllib.parse.urlparse("x-newscheme://foo.com/stuff"),
|
self.assertEqual(urllib.parse.urlparse("x-newscheme://foo.com/stuff"),
|
||||||
('x-newscheme', 'foo.com', '/stuff', '', '', ''))
|
('x-newscheme', 'foo.com', '/stuff', '', '', ''))
|
||||||
|
# And for bytes...
|
||||||
|
self.assertEqual(urllib.parse.urlparse(b"s3://foo.com/stuff"),
|
||||||
|
(b's3', b'foo.com', b'/stuff', b'', b'', b''))
|
||||||
|
self.assertEqual(urllib.parse.urlparse(b"x-newscheme://foo.com/stuff"),
|
||||||
|
(b'x-newscheme', b'foo.com', b'/stuff', b'', b'', b''))
|
||||||
|
|
||||||
|
def test_mixed_types_rejected(self):
|
||||||
|
# Several functions that process either strings or ASCII encoded bytes
|
||||||
|
# accept multiple arguments. Check they reject mixed type input
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlparse("www.python.org", b"http")
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlparse(b"www.python.org", "http")
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlsplit("www.python.org", b"http")
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlsplit(b"www.python.org", "http")
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlunparse(( b"http", "www.python.org","","","",""))
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlunparse(("http", b"www.python.org","","","",""))
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlunsplit((b"http", "www.python.org","","",""))
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urlunsplit(("http", b"www.python.org","","",""))
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urljoin("http://python.org", b"http://python.org")
|
||||||
|
with self.assertRaisesRegexp(TypeError, "Cannot mix str"):
|
||||||
|
urllib.parse.urljoin(b"http://python.org", "http://python.org")
|
||||||
|
|
||||||
|
def _check_result_type(self, str_type):
|
||||||
|
num_args = len(str_type._fields)
|
||||||
|
bytes_type = str_type._encoded_counterpart
|
||||||
|
self.assertIs(bytes_type._decoded_counterpart, str_type)
|
||||||
|
str_args = ('',) * num_args
|
||||||
|
bytes_args = (b'',) * num_args
|
||||||
|
str_result = str_type(*str_args)
|
||||||
|
bytes_result = bytes_type(*bytes_args)
|
||||||
|
encoding = 'ascii'
|
||||||
|
errors = 'strict'
|
||||||
|
self.assertEqual(str_result, str_args)
|
||||||
|
self.assertEqual(bytes_result.decode(), str_args)
|
||||||
|
self.assertEqual(bytes_result.decode(), str_result)
|
||||||
|
self.assertEqual(bytes_result.decode(encoding), str_args)
|
||||||
|
self.assertEqual(bytes_result.decode(encoding), str_result)
|
||||||
|
self.assertEqual(bytes_result.decode(encoding, errors), str_args)
|
||||||
|
self.assertEqual(bytes_result.decode(encoding, errors), str_result)
|
||||||
|
self.assertEqual(bytes_result, bytes_args)
|
||||||
|
self.assertEqual(str_result.encode(), bytes_args)
|
||||||
|
self.assertEqual(str_result.encode(), bytes_result)
|
||||||
|
self.assertEqual(str_result.encode(encoding), bytes_args)
|
||||||
|
self.assertEqual(str_result.encode(encoding), bytes_result)
|
||||||
|
self.assertEqual(str_result.encode(encoding, errors), bytes_args)
|
||||||
|
self.assertEqual(str_result.encode(encoding, errors), bytes_result)
|
||||||
|
|
||||||
|
def test_result_pairs(self):
|
||||||
|
# Check encoding and decoding between result pairs
|
||||||
|
result_types = [
|
||||||
|
urllib.parse.DefragResult,
|
||||||
|
urllib.parse.SplitResult,
|
||||||
|
urllib.parse.ParseResult,
|
||||||
|
]
|
||||||
|
for result_type in result_types:
|
||||||
|
self._check_result_type(result_type)
|
||||||
|
|
||||||
|
|
||||||
def test_main():
|
def test_main():
|
||||||
support.run_unittest(UrlParseTestCase)
|
support.run_unittest(UrlParseTestCase)
|
||||||
|
|
|
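The tests above exercise the headline change: the parsing functions now accept ASCII byte sequences and hand back bytes components. A quick demonstration of the new behaviour (any Python 3.2 or later):

```python
from urllib.parse import urlparse

# str input produces str components, exactly as before
r = urlparse("http://www.python.org:80/path")
assert r.scheme == 'http'
assert r.netloc == 'www.python.org:80'

# bytes input now parses too, producing bytes components
rb = urlparse(b"http://www.python.org:80/path")
assert rb.scheme == b'http'
assert rb.netloc == b'www.python.org:80'
assert rb.path == b'/path'
```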
Lib/urllib/parse.py:

@@ -60,6 +60,7 @@
                '0123456789'
                '+-.')
 
+# XXX: Consider replacing with functools.lru_cache
 MAX_CACHE_SIZE = 20
 _parse_cache = {}
 
@@ -69,66 +70,210 @@ def clear_cache():
    _safe_quoters.clear()
 
 
-class ResultMixin(object):
-    """Shared methods for the parsed result objects."""
+# Helpers for bytes handling
+# For 3.2, we deliberately require applications that
+# handle improperly quoted URLs to do their own
+# decoding and encoding. If valid use cases are
+# presented, we may relax this by using latin-1
+# decoding internally for 3.3
+_implicit_encoding = 'ascii'
+_implicit_errors = 'strict'
+
+def _noop(obj):
+    return obj
+
+def _encode_result(obj, encoding=_implicit_encoding,
+                        errors=_implicit_errors):
+    return obj.encode(encoding, errors)
+
+def _decode_args(args, encoding=_implicit_encoding,
+                       errors=_implicit_errors):
+    return tuple(x.decode(encoding, errors) if x else '' for x in args)
+
+def _coerce_args(*args):
+    # Invokes decode if necessary to create str args
+    # and returns the coerced inputs along with
+    # an appropriate result coercion function
+    #   - noop for str inputs
+    #   - encoding function otherwise
+    str_input = isinstance(args[0], str)
+    for arg in args[1:]:
+        # We special-case the empty string to support the
+        # "scheme=''" default argument to some functions
+        if arg and isinstance(arg, str) != str_input:
+            raise TypeError("Cannot mix str and non-str arguments")
+    if str_input:
+        return args + (_noop,)
+    return _decode_args(args) + (_encode_result,)
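`_coerce_args` is the heart of the patch: it decodes bytes arguments to str so the existing parsing logic stays unchanged, hands back a coercion function that re-encodes the result, and refuses mixed str/bytes input. The observable behaviour:

```python
from urllib.parse import urlparse

# bytes in, bytes out: the result is re-encoded on the way back
assert urlparse(b"http://example.com/x").path == b'/x'

# mixing str and bytes arguments is rejected
try:
    urlparse("www.python.org", b"http")
except TypeError as exc:
    assert "Cannot mix str" in str(exc)
else:
    raise AssertionError("expected TypeError")
```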
+# Result objects are more helpful than simple tuples
+class _ResultMixinStr(object):
+    """Standard approach to encoding parsed results from str to bytes"""
+    __slots__ = ()
+
+    def encode(self, encoding='ascii', errors='strict'):
+        return self._encoded_counterpart(*(x.encode(encoding, errors) for x in self))
+
+
+class _ResultMixinBytes(object):
+    """Standard approach to decoding parsed results from bytes to str"""
+    __slots__ = ()
+
+    def decode(self, encoding='ascii', errors='strict'):
+        return self._decoded_counterpart(*(x.decode(encoding, errors) for x in self))
+
+
+class _NetlocResultMixinBase(object):
+    """Shared methods for the parsed result objects containing a netloc element"""
+    __slots__ = ()
 
    @property
    def username(self):
-       netloc = self.netloc
-       if "@" in netloc:
-           userinfo = netloc.rsplit("@", 1)[0]
-           if ":" in userinfo:
-               userinfo = userinfo.split(":", 1)[0]
-           return userinfo
-       return None
+       return self._userinfo[0]
 
    @property
    def password(self):
-       netloc = self.netloc
-       if "@" in netloc:
-           userinfo = netloc.rsplit("@", 1)[0]
-           if ":" in userinfo:
-               return userinfo.split(":", 1)[1]
-       return None
+       return self._userinfo[1]
 
    @property
    def hostname(self):
-       netloc = self.netloc.split('@')[-1]
-       if '[' in netloc and ']' in netloc:
-           return netloc.split(']')[0][1:].lower()
-       elif ':' in netloc:
-           return netloc.split(':')[0].lower()
-       elif netloc == '':
-           return None
-       else:
-           return netloc.lower()
+       hostname = self._hostinfo[0]
+       if not hostname:
+           hostname = None
+       elif hostname is not None:
+           hostname = hostname.lower()
+       return hostname
 
    @property
    def port(self):
-       netloc = self.netloc.split('@')[-1].split(']')[-1]
-       if ':' in netloc:
-           port = netloc.split(':')[1]
-           return int(port, 10)
-       else:
-           return None
+       port = self._hostinfo[1]
+       if port is not None:
+           port = int(port, 10)
+       return port
+
+
+class _NetlocResultMixinStr(_NetlocResultMixinBase, _ResultMixinStr):
+    __slots__ = ()
+
+    @property
+    def _userinfo(self):
+        netloc = self.netloc
+        userinfo, have_info, hostinfo = netloc.rpartition('@')
+        if have_info:
+            username, have_password, password = userinfo.partition(':')
+            if not have_password:
+                password = None
+        else:
+            username = password = None
+        return username, password
+
+    @property
+    def _hostinfo(self):
+        netloc = self.netloc
+        _, _, hostinfo = netloc.rpartition('@')
+        _, have_open_br, bracketed = hostinfo.partition('[')
+        if have_open_br:
+            hostname, _, port = bracketed.partition(']')
+            _, have_port, port = port.partition(':')
+        else:
+            hostname, have_port, port = hostinfo.partition(':')
+        if not have_port:
+            port = None
+        return hostname, port
+
+
+class _NetlocResultMixinBytes(_NetlocResultMixinBase, _ResultMixinBytes):
+    __slots__ = ()
+
+    @property
+    def _userinfo(self):
+        netloc = self.netloc
+        userinfo, have_info, hostinfo = netloc.rpartition(b'@')
+        if have_info:
+            username, have_password, password = userinfo.partition(b':')
+            if not have_password:
+                password = None
+        else:
+            username = password = None
+        return username, password
+
+    @property
+    def _hostinfo(self):
+        netloc = self.netloc
+        _, _, hostinfo = netloc.rpartition(b'@')
+        _, have_open_br, bracketed = hostinfo.partition(b'[')
+        if have_open_br:
+            hostname, _, port = bracketed.partition(b']')
+            _, have_port, port = port.partition(b':')
+        else:
+            hostname, have_port, port = hostinfo.partition(b':')
+        if not have_port:
+            port = None
+        return hostname, port
+
 
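The private `_userinfo`/`_hostinfo` split above backs the public `username`, `password`, `hostname` and `port` properties, including bracketed IPv6 hosts:

```python
from urllib.parse import urlsplit

r = urlsplit("http://user:secret@[::1]:8080/index")
assert r.username == 'user'
assert r.password == 'secret'
assert r.hostname == '::1'   # brackets stripped, hostname lowercased
assert r.port == 8080        # returned as an int
```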
 from collections import namedtuple
 
-class SplitResult(namedtuple('SplitResult', 'scheme netloc path query fragment'), ResultMixin):
+_DefragResultBase = namedtuple('DefragResult', 'url fragment')
+_SplitResultBase = namedtuple('SplitResult', 'scheme netloc path query fragment')
+_ParseResultBase = namedtuple('ParseResult', 'scheme netloc path params query fragment')
+
+# For backwards compatibility, alias _NetlocResultMixinStr
+# ResultBase is no longer part of the documented API, but it is
+# retained since deprecating it isn't worth the hassle
+ResultBase = _NetlocResultMixinStr
+
+# Structured result objects for string data
+class DefragResult(_DefragResultBase, _ResultMixinStr):
     __slots__ = ()
+    def geturl(self):
+        if self.fragment:
+            return self.url + '#' + self.fragment
+        else:
+            return self.url
+
+class SplitResult(_SplitResultBase, _NetlocResultMixinStr):
+    __slots__ = ()
     def geturl(self):
         return urlunsplit(self)
 
-class ParseResult(namedtuple('ParseResult', 'scheme netloc path params query fragment'), ResultMixin):
 
+class ParseResult(_ParseResultBase, _NetlocResultMixinStr):
     __slots__ = ()
-
     def geturl(self):
         return urlunparse(self)
 
+# Structured result objects for bytes data
+class DefragResultBytes(_DefragResultBase, _ResultMixinBytes):
+    __slots__ = ()
+    def geturl(self):
+        if self.fragment:
+            return self.url + b'#' + self.fragment
+        else:
+            return self.url
+
+class SplitResultBytes(_SplitResultBase, _NetlocResultMixinBytes):
+    __slots__ = ()
+    def geturl(self):
+        return urlunsplit(self)
+
+class ParseResultBytes(_ParseResultBase, _NetlocResultMixinBytes):
+    __slots__ = ()
+    def geturl(self):
+        return urlunparse(self)
+
+# Set up the encode/decode result pairs
+def _fix_result_transcoding():
+    _result_pairs = (
+        (DefragResult, DefragResultBytes),
+        (SplitResult, SplitResultBytes),
+        (ParseResult, ParseResultBytes),
+    )
+    for _decoded, _encoded in _result_pairs:
+        _decoded._encoded_counterpart = _encoded
+        _encoded._decoded_counterpart = _decoded
+
+_fix_result_transcoding()
+del _fix_result_transcoding
+
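`_fix_result_transcoding` links each str result class to its bytes counterpart, which is what makes `encode()` and `decode()` round-trip between the two families:

```python
from urllib.parse import SplitResult, SplitResultBytes

s = SplitResult('http', 'example.com', '/p', 'q=1', 'frag')
b = s.encode()                  # SplitResult -> SplitResultBytes
assert isinstance(b, SplitResultBytes)
assert b.netloc == b'example.com'
assert b.decode() == s          # and back again
```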
@@ -136,13 +281,15 @@ def urlparse(url, scheme='', allow_fragments=True):
    Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
+   url, scheme, _coerce_result = _coerce_args(url, scheme)
    tuple = urlsplit(url, scheme, allow_fragments)
    scheme, netloc, url, query, fragment = tuple
    if scheme in uses_params and ';' in url:
        url, params = _splitparams(url)
    else:
        params = ''
-   return ParseResult(scheme, netloc, url, params, query, fragment)
+   result = ParseResult(scheme, netloc, url, params, query, fragment)
+   return _coerce_result(result)
 
 def _splitparams(url):
    if '/' in url:
@@ -167,11 +314,12 @@ def urlsplit(url, scheme='', allow_fragments=True):
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
+   url, scheme, _coerce_result = _coerce_args(url, scheme)
    allow_fragments = bool(allow_fragments)
    key = url, scheme, allow_fragments, type(url), type(scheme)
    cached = _parse_cache.get(key, None)
    if cached:
-       return cached
+       return _coerce_result(cached)
    if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
        clear_cache()
    netloc = query = fragment = ''
@@ -191,7 +339,7 @@ def urlsplit(url, scheme='', allow_fragments=True):
                url, query = url.split('?', 1)
            v = SplitResult(scheme, netloc, url, query, fragment)
            _parse_cache[key] = v
-           return v
+           return _coerce_result(v)
        if url.endswith(':') or not url[i+1].isdigit():
            for c in url[:i]:
                if c not in scheme_chars:
@@ -209,17 +357,18 @@ def urlsplit(url, scheme='', allow_fragments=True):
        url, query = url.split('?', 1)
    v = SplitResult(scheme, netloc, url, query, fragment)
    _parse_cache[key] = v
-   return v
+   return _coerce_result(v)
 
 def urlunparse(components):
    """Put a parsed URL back together again. This may result in a
    slightly different, but equivalent URL, if the URL that was parsed
    originally had redundant delimiters, e.g. a ? with an empty query
    (the draft states that these are equivalent)."""
-   scheme, netloc, url, params, query, fragment = components
+   scheme, netloc, url, params, query, fragment, _coerce_result = (
+                                                 _coerce_args(*components))
    if params:
        url = "%s;%s" % (url, params)
-   return urlunsplit((scheme, netloc, url, query, fragment))
+   return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))
 
 def urlunsplit(components):
    """Combine the elements of a tuple as returned by urlsplit() into a
@@ -227,7 +376,8 @@ def urlunsplit(components):
    This may result in a slightly different, but equivalent URL, if the URL that
    was parsed originally had unnecessary delimiters (for example, a ? with an
    empty query; the RFC states that these are equivalent)."""
-   scheme, netloc, url, query, fragment = components
+   scheme, netloc, url, query, fragment, _coerce_result = (
+                                         _coerce_args(*components))
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
@@ -237,7 +387,7 @@ def urlunsplit(components):
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
-   return url
+   return _coerce_result(url)
 
 def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
@@ -246,32 +396,33 @@ def urljoin(base, url, allow_fragments=True):
        return url
    if not url:
        return base
+   base, url, _coerce_result = _coerce_args(base, url)
    bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
            urlparse(base, '', allow_fragments)
    scheme, netloc, path, params, query, fragment = \
            urlparse(url, bscheme, allow_fragments)
    if scheme != bscheme or scheme not in uses_relative:
-       return url
+       return _coerce_result(url)
    if scheme in uses_netloc:
        if netloc:
-           return urlunparse((scheme, netloc, path,
-                              params, query, fragment))
+           return _coerce_result(urlunparse((scheme, netloc, path,
+                                             params, query, fragment)))
        netloc = bnetloc
    if path[:1] == '/':
-       return urlunparse((scheme, netloc, path,
-                          params, query, fragment))
+       return _coerce_result(urlunparse((scheme, netloc, path,
+                                         params, query, fragment)))
    if not path:
        path = bpath
        if not params:
            params = bparams
        else:
            path = path[:-1]
-           return urlunparse((scheme, netloc, path,
-                              params, query, fragment))
+           return _coerce_result(urlunparse((scheme, netloc, path,
+                                             params, query, fragment)))
        if not query:
            query = bquery
-       return urlunparse((scheme, netloc, path,
-                          params, query, fragment))
+       return _coerce_result(urlunparse((scheme, netloc, path,
+                                         params, query, fragment)))
    segments = bpath.split('/')[:-1] + path.split('/')
    # XXX The stuff below is bogus in various ways...
    if segments[-1] == '.':
@@ -293,8 +444,8 @@ def urljoin(base, url, allow_fragments=True):
            segments[-1] = ''
        elif len(segments) >= 2 and segments[-1] == '..':
            segments[-2:] = ['']
-   return urlunparse((scheme, netloc, '/'.join(segments),
-                      params, query, fragment))
+   return _coerce_result(urlunparse((scheme, netloc, '/'.join(segments),
+                                     params, query, fragment)))
 
 def urldefrag(url):
    """Removes any existing fragment from URL.
@@ -303,12 +454,14 @@ def urldefrag(url):
    the URL contained no fragments, the second element is the
    empty string.
    """
+   url, _coerce_result = _coerce_args(url)
    if '#' in url:
        s, n, p, a, q, frag = urlparse(url)
        defrag = urlunparse((s, n, p, a, q, ''))
-       return defrag, frag
    else:
-       return url, ''
+       frag = ''
+       defrag = url
+   return _coerce_result(DefragResult(defrag, frag))
 
 def unquote_to_bytes(string):
    """unquote_to_bytes('abc%20def') -> b'abc def'."""
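As changed above, `urldefrag()` now returns a `DefragResult`; it still unpacks like the old 2-tuple but adds named fields and `geturl()`:

```python
from urllib.parse import urldefrag

d = urldefrag("http://python.org/page#section")
assert d == ("http://python.org/page", "section")   # tuple-compatible
assert d.url == "http://python.org/page"
assert d.fragment == "section"
assert d.geturl() == "http://python.org/page#section"
```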
@@ -420,6 +573,7 @@ def parse_qsl(qs, keep_blank_values=False, strict_parsing=False):
 
    Returns a list, as G-d intended.
    """
+   qs, _coerce_result = _coerce_args(qs)
    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
    r = []
    for name_value in pairs:
@@ -435,10 +589,9 @@ def parse_qsl(qs, keep_blank_values=False, strict_parsing=False):
        else:
            continue
        if len(nv[1]) or keep_blank_values:
-           name = unquote(nv[0].replace('+', ' '))
-           value = unquote(nv[1].replace('+', ' '))
+           name = _coerce_result(unquote(nv[0].replace('+', ' ')))
+           value = _coerce_result(unquote(nv[1].replace('+', ' ')))
            r.append((name, value))
-
    return r
 
 def unquote_plus(string, encoding='utf-8', errors='replace'):
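With the `_coerce_args`/`_coerce_result` calls added to `parse_qsl`, query strings can now be parsed from bytes as well, with names and values coming back as bytes:

```python
from urllib.parse import parse_qsl

assert parse_qsl("a=1&b=two") == [('a', '1'), ('b', 'two')]
assert parse_qsl(b"a=1&b=two") == [(b'a', b'1'), (b'b', b'two')]
```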
Misc/NEWS:

@@ -43,6 +43,9 @@ Core and Builtins
 Library
 -------
 
+- Issue #9873: The URL parsing functions in urllib.parse now accept
+  ASCII byte sequences as input in addition to character strings.
+
 - Issue #10586: The statistics API for the new functools.lru_cache has
   been changed to a single cache_info() method returning a named tuple.
 