mirror of https://github.com/python/cpython.git
synced 2026-06-18 15:42:21 +00:00
[3.14] gh-134837: Correct and improve base85 documentation for base64 module (GH-145843) (GH-149743)
(cherry picked from commit e667d62f11)
Co-authored-by: David Huggins-Daines <dhd@ecolingui.ca>
parent 76ecef8099
commit 9ad8a1b955
2 changed files with 81 additions and 43 deletions
@@ -16,8 +16,10 @@
 This module provides functions for encoding binary data to printable
 ASCII characters and decoding such encodings back to binary data.
 This includes the :ref:`encodings specified in <base64-rfc-4648>`
-:rfc:`4648` (Base64, Base32 and Base16)
-and the non-standard :ref:`Base85 encodings <base64-base-85>`.
+:rfc:`4648` (Base64, Base32 and Base16), the :ref:`Base85 encoding
+<base64-base-85>` specified in `PDF 2.0
+<https://pdfa.org/resource/iso-32000-2/>`_, and non-standard variants
+of Base85 used elsewhere.

 There are two interfaces provided by this module. The modern interface
 supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
@@ -189,19 +191,28 @@ POST request.
 Base85 Encodings
 -----------------

-Base85 encoding is not formally specified but rather a de facto standard,
-thus different systems perform the encoding differently.
+Base85 encoding is a family of algorithms which represent four bytes
+using five ASCII characters. Originally implemented in the Unix
+``btoa(1)`` utility, a version of it was later adopted by Adobe in the
+PostScript language and is standardized in PDF 2.0 (ISO 32000-2).
+This version, in both its ``btoa`` and PDF variants, is implemented by
+:func:`a85encode`.

-The :func:`a85encode` and :func:`b85encode` functions in this module are two implementations of
-the de facto standard. You should call the function with the Base85
-implementation used by the software you intend to work with.
+A separate version, using a different output character set, was
+defined as an April Fool's joke in :rfc:`1924` but is now used by Git
+and other software. This version is implemented by :func:`b85encode`.

-The two functions present in this module differ in how they handle the following:
+Finally, a third version, using yet another output character set
+designed for safe inclusion in programming language strings, is
+defined by ZeroMQ and implemented here by :func:`z85encode`.

-* Whether to include enclosing ``<~`` and ``~>`` markers
-* Whether to include newline characters
-* The set of ASCII characters used for encoding
-* Handling of null bytes
+The functions present in this module differ in how they handle the following:
+
+* Whether to include and expect enclosing ``<~`` and ``~>`` markers.
+* Whether to fold the input into multiple lines.
+* The set of ASCII characters used for encoding.
+* Compact encodings of sequences of spaces and null bytes.
+* The encoding of zero-padding bytes applied to the input.

 Refer to the documentation of the individual functions for more information.

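To illustrate two of the differences listed in the reworded Base85 section (the output character set and the compact encoding of null bytes), here is a minimal sketch that uses only `base64.a85encode`, `base64.b85encode` and their decoders, all available since Python 3.4. The input `b"Man "` is an arbitrary choice; its Ascii85 form `9jqo^` is the commonly quoted example for this encoding.

```python
import base64

data = b"Man "  # exactly one 4-byte group

ascii85 = base64.a85encode(data)  # btoa/PDF alphabet: '!' (0x21) through 'u' (0x75)
rfc1924 = base64.b85encode(data)  # RFC 1924 alphabet, as used by Git

print(ascii85)  # b'9jqo^'
print(rfc1924)  # same length, different characters

# Both variants map four bytes to five ASCII characters...
assert len(ascii85) == len(rfc1924) == 5
# ...but the alphabets differ, so the encodings are not interchangeable.
assert ascii85 != rfc1924
assert base64.a85decode(ascii85) == data
assert base64.b85decode(rfc1924) == data

# Compact encoding of null bytes: a85encode writes an all-zero group as b'z',
# while b85encode always emits five characters.
zeros = bytes(4)
print(base64.a85encode(zeros))  # b'z'
print(base64.b85encode(zeros))  # b'00000'
```

Which function to use depends on the software on the other end, for example a PDF consumer versus Git-style binary patches.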
@@ -212,17 +223,22 @@ Refer to the documentation of the individual functions for more information.

 *foldspaces* is an optional flag that uses the special short sequence 'y'
 instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
-feature is not supported by the "standard" Ascii85 encoding.
+feature is not supported by the standard encoding used in PDF.

 *wrapcol* controls whether the output should have newline (``b'\n'``)
 characters added to it. If this is non-zero, each output line will be
 at most this many characters long, excluding the trailing newline.

-*pad* controls whether the input is padded to a multiple of 4
-before encoding. Note that the ``btoa`` implementation always pads.
+*pad* controls whether zero-padding applied to the end of the input
+is fully retained in the output encoding, as done by ``btoa``,
+producing an exact multiple of 5 bytes of output. This is not part
+of the standard encoding used in PDF, as it does not preserve the
+length of the data.

-*adobe* controls whether the encoded byte sequence is framed with ``<~``
-and ``~>``, which is used by the Adobe implementation.
+*adobe* controls whether the encoded byte sequence is framed with
+``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
+that while ASCII85Decode streams in PDF documents *must* be
+terminated with ``~>``, they *must not* use a leading ``<~``.

 .. versionadded:: 3.4

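The reworded *foldspaces*, *pad* and *adobe* descriptions for `a85encode` can be checked with a few calls. This is a sketch against the public `base64` API only; the inputs are arbitrary.

```python
import base64

# foldspaces=True: a group of four spaces becomes btoa's 'y' shorthand
# instead of the regular five-character group.
spaces = b"    "
folded = base64.a85encode(spaces, foldspaces=True)
regular = base64.a85encode(spaces)
assert folded == b"y" and len(regular) == 5

# A 3-byte input is zero-padded to 4 bytes before encoding. With the
# default pad=False, the character contributed by the padding is dropped,
# so decoding gives back exactly the original 3 bytes.
short = base64.a85encode(b"Man")
assert len(short) == 4
assert base64.a85decode(short) == b"Man"

# With pad=True all five characters are kept (as btoa does); the padding
# byte then reappears on decoding, so the data length is not preserved.
padded = base64.a85encode(b"Man", pad=True)
assert len(padded) == 5
assert base64.a85decode(padded) == b"Man\x00"

# adobe=True frames the result like a PostScript base-85 string literal.
framed = base64.a85encode(b"Man ", adobe=True)
print(framed)  # b'<~9jqo^~>'
```

The `pad=True` case shows why the new text says this mode does not preserve the length of the data: the decoder cannot tell the trailing zero byte apart from real input.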
@@ -234,10 +250,12 @@ Refer to the documentation of the individual functions for more information.

 *foldspaces* is a flag that specifies whether the 'y' short sequence
 should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
-This feature is not supported by the "standard" Ascii85 encoding.
+This feature is not supported by the standard Ascii85 encoding used in
+PDF and PostScript.

-*adobe* controls whether the input sequence is in Adobe Ascii85 format
-(i.e. is framed with <~ and ~>).
+*adobe* controls whether the ``<~`` and ``~>`` markers are
+present. While the leading ``<~`` is not required, the input must
+end with ``~>``, or a :exc:`ValueError` is raised.

 *ignorechars* should be a byte string containing characters to ignore
 from the input. This should only contain whitespace characters, and by
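The new `a85decode` wording makes the marker rules explicit: with *adobe* true, the closing `~>` is mandatory and the opening `<~` is optional. A minimal sketch of both rules, plus the opt-in nature of the `y` shorthand; the encoded sample `9jqo^` is simply the Ascii85 form of `b"Man "`.

```python
import base64

# adobe=True: the trailing "~>" is required, the leading "<~" is optional.
with_start = base64.a85decode(b"<~9jqo^~>", adobe=True)
without_start = base64.a85decode(b"9jqo^~>", adobe=True)  # PDF ASCII85Decode style
assert with_start == without_start == b"Man "

# Without the closing marker, a ValueError is raised.
try:
    base64.a85decode(b"<~9jqo^", adobe=True)
except ValueError as exc:
    missing_end_rejected = True
    print("rejected:", exc)
else:
    missing_end_rejected = False

# The 'y' shorthand is accepted only when foldspaces=True.
assert base64.a85decode(b"y", foldspaces=True) == b"    "
try:
    base64.a85decode(b"y")
except ValueError:
    y_rejected = True
else:
    y_rejected = False
```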
@@ -251,8 +269,11 @@ Refer to the documentation of the individual functions for more information.
 Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
 git-style binary diffs) and return the encoded :class:`bytes`.

-If *pad* is true, the input is padded with ``b'\0'`` so its length is a
-multiple of 4 bytes before encoding.
+The input is padded with ``b'\0'`` so its length is a multiple of 4
+bytes before encoding. If *pad* is true, all the resulting
+characters are retained in the output, which will always be a
+multiple of 5 bytes, and thus the length of the data may not be
+preserved on decoding.

 .. versionadded:: 3.4

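The corrected `b85encode` text says the input is always zero-padded internally and that *pad* only decides whether the padding characters are kept. A short sketch of both cases; the 3-byte input is arbitrary.

```python
import base64

data = b"git"  # 3 bytes, so one byte of zero padding is needed

# Default pad=False: the character produced only by the padding is dropped,
# and b85decode restores exactly the original bytes.
plain = base64.b85encode(data)
assert len(plain) == 4
assert base64.b85decode(plain) == data

# pad=True keeps every character of the padded group: the output length is a
# multiple of 5, and the zero padding byte is still there after decoding.
padded = base64.b85encode(data, pad=True)
assert len(padded) == 5
assert base64.b85decode(padded) == data + b"\x00"
```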
@@ -260,8 +281,7 @@ Refer to the documentation of the individual functions for more information.
 .. function:: b85decode(b)

 Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
-return the decoded :class:`bytes`. Padding is implicitly removed, if
-necessary.
+return the decoded :class:`bytes`.

 .. versionadded:: 3.4

@@ -269,8 +289,12 @@ Refer to the documentation of the individual functions for more information.
 .. function:: z85encode(s)

 Encode the :term:`bytes-like object` *s* using Z85 (as used in ZeroMQ)
-and return the encoded :class:`bytes`. See `Z85 specification
-<https://rfc.zeromq.org/spec/32/>`_ for more information.
+and return the encoded :class:`bytes`.
+
+The `ZeroMQ specification <https://rfc.zeromq.org/spec/32/>`_
+requires the length of Z85-encoded data to be a multiple of 5
+bytes. To produce compliant data frames, you must pad the input
+data to this function to a multiple of 4 bytes.

 .. versionadded:: 3.13

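The new `z85encode` note leaves padding to the caller. A minimal sketch of that, assuming Python 3.13 or later for `base64.z85encode`; the `z85_frame` helper is hypothetical and not part of the module. For older interpreters, the sketch falls back to a local stand-in that translates `b85encode` output into the Z85 alphabet, which is an assumption of this example rather than a documented API.

```python
import base64

try:
    from base64 import z85encode  # added in Python 3.13
except ImportError:
    # Stand-in for older Pythons (an assumption of this sketch, not part of
    # the module): Z85 uses the same arithmetic as b85encode with a
    # different 85-character alphabet, so translating the output suffices.
    _B85 = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~"
    _Z85 = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#"
    _B85_TO_Z85 = bytes.maketrans(_B85, _Z85)

    def z85encode(s):
        return base64.b85encode(s).translate(_B85_TO_Z85)


def z85_frame(data: bytes) -> bytes:
    """Hypothetical helper: zero-pad data to a multiple of 4 bytes so that
    the Z85 output is a multiple of 5 characters."""
    return z85encode(data + b"\x00" * (-len(data) % 4))


# Test vector from the Z85 specification (ZeroMQ RFC 32).
assert z85encode(bytes.fromhex("864FD26FB559F75B")) == b"HelloWorld"

# Unpadded 3-byte input yields 4 characters, which is not a valid Z85 frame;
# padding first yields 5.
assert len(z85encode(b"abc")) == 4
assert len(z85_frame(b"abc")) == 5
```

This sketch does not address how the receiver learns the original length; the zero bytes added by `z85_frame` are still present after decoding, so that is left to the application protocol.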
@@ -278,8 +302,7 @@ Refer to the documentation of the individual functions for more information.
 .. function:: z85decode(s)

 Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
-return the decoded :class:`bytes`. See `Z85 specification
-<https://rfc.zeromq.org/spec/32/>`_ for more information.
+return the decoded :class:`bytes`.

 .. versionadded:: 3.13

@@ -352,3 +375,11 @@ recommended to review the security section for any code deployed to production.
 Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
 base64 encoding.

+`ISO 32000-2 Portable document format - Part 2: PDF 2.0 <https://pdfa.org/resource/iso-32000-2/>`_
+Section 7.4.3, "ASCII85Decode Filter," provides the definition
+of the Ascii85 encoding used in PDF and PostScript, including
+the output character set and the details of data length preservation
+using zero-padding and partial output groups.
+
+`ZeroMQ RFC 32/Z85 <https://rfc.zeromq.org/spec/32/>`_
+The "Formal Specification" section provides the character set used in Z85.

@@ -325,17 +325,20 @@ def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):

 foldspaces is an optional flag that uses the special short sequence 'y'
 instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
-feature is not supported by the "standard" Adobe encoding.
+feature is not supported by the standard encoding used in PDF.

-wrapcol controls whether the output should have newline (b'\\n') characters
-added to it. If this is non-zero, each output line will be at most this
-many characters long, excluding the trailing newline.
+If wrapcol is non-zero, insert a newline (b'\\n') character after at most
+every wrapcol characters.

-pad controls whether the input is padded to a multiple of 4 before
-encoding. Note that the btoa implementation always pads.
+pad controls whether zero-padding applied to the end of the input
+is fully retained in the output encoding, as done by btoa,
+producing an exact multiple of 5 bytes of output.
+
+adobe controls whether the encoded byte sequence is framed with <~
+and ~>, as in a PostScript base-85 string literal. Note that
+while ASCII85Decode streams in PDF documents must be terminated
+with ~>, they must not use a leading <~.

-adobe controls whether the encoded byte sequence is framed with <~ and ~>,
-which is used by the Adobe implementation.
 """
 global _a85chars, _a85chars2
 # Delay the initialization of tables to not waste memory
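In the module docstring, the *wrapcol* sentence of `a85encode` is reworded to "insert a newline after at most every wrapcol characters". A minimal sketch of that behaviour follows. The 64-byte input and the width of 20 are arbitrary, and the decoding step relies on `a85decode` skipping `b'\n'`, which is part of its default *ignorechars*.

```python
import base64

data = bytes(range(64))

# Non-zero wrapcol: a newline (b'\n') is inserted after at most every
# wrapcol output characters.
wrapped = base64.a85encode(data, wrapcol=20)
lines = wrapped.split(b"\n")
assert len(lines) > 1
assert all(len(line) <= 20 for line in lines)

# Removing the newlines gives the unwrapped encoding, and a85decode skips
# them anyway because b'\n' is in its default ignorechars.
assert wrapped.replace(b"\n", b"") == base64.a85encode(data)
assert base64.a85decode(wrapped) == data
```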
@@ -364,12 +367,14 @@ def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):
 def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
 """Decode the Ascii85 encoded bytes-like object or ASCII string b.

-foldspaces is a flag that specifies whether the 'y' short sequence should be
-accepted as shorthand for 4 consecutive spaces (ASCII 0x20). This feature is
-not supported by the "standard" Adobe encoding.
+foldspaces is a flag that specifies whether the 'y' short sequence
+should be accepted as shorthand for 4 consecutive spaces (ASCII
+0x20). This feature is not supported by the standard Ascii85
+encoding used in PDF and PostScript.

-adobe controls whether the input sequence is in Adobe Ascii85 format (i.e.
-is framed with <~ and ~>).
+adobe controls whether the <~ and ~> markers are present. While
+the leading <~ is not required, the input must end with ~>, or a
+ValueError is raised.

 ignorechars should be a byte string containing characters to ignore from the
 input. This should only contain whitespace characters, and by default
@@ -442,8 +447,10 @@ def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
 def b85encode(b, pad=False):
 """Encode bytes-like object b in base85 format and return a bytes object.

-If pad is true, the input is padded with b'\\0' so its length is a multiple of
-4 bytes before encoding.
+The input is padded with b'\0' so its length is a multiple of 4
+bytes before encoding. If pad is true, all the resulting
+characters are retained in the output, which will always be a
+multiple of 5 bytes.
 """
 global _b85chars, _b85chars2
 # Delay the initialization of tables to not waste memory