:mod:`email.utils`: Miscellaneous utilities
-------------------------------------------

.. module:: email.utils
   :synopsis: Miscellaneous email package utilities.

**Source code:** :source:`Lib/email/utils.py`

--------------

There are a couple of useful utilities provided in the :mod:`email.utils`
module:
					
						
.. function:: localtime(dt=None)

   Return local time as an aware datetime object.  If called without
   arguments, return the current time.  Otherwise, the *dt* argument should be a
   :class:`~datetime.datetime` instance, and it is converted to the local time
   zone according to the system time zone database.  If *dt* is naive (that
   is, ``dt.tzinfo`` is ``None``), it is assumed to be in local time.  The
   *isdst* parameter is ignored.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.12 3.14
      The *isdst* parameter.

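
As a rough illustration of :func:`localtime`, the sketch below calls it with
and without an argument.  The variable names and the sample date are invented
for the example; the local zone depends on the machine running the code.

```python
from datetime import datetime, timezone
from email.utils import localtime

# No argument: the current time as an aware datetime in the local zone.
now = localtime()

# An aware datetime is converted to the local zone; the instant is unchanged.
utc_noon = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)  # sample value
local_noon = localtime(utc_noon)

print(now.tzinfo is not None)   # the result is always aware
print(local_noon == utc_noon)   # same point in time, different zone
```
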
					
						
.. function:: make_msgid(idstring=None, domain=None)

   Returns a string suitable for an :rfc:`2822`\ -compliant
   :mailheader:`Message-ID` header.  Optional *idstring*, if given, is a string
   used to strengthen the uniqueness of the message id.  Optional *domain*, if
   given, provides the portion of the msgid after the '@'.  The default is the
   local hostname.  It is not normally necessary to override this default, but
   it may be useful in certain cases, such as constructing a distributed system
   that uses a consistent domain name across multiple hosts.

   .. versionchanged:: 3.2
      Added the *domain* keyword.

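
A minimal sketch of :func:`make_msgid` follows.  The *idstring* and *domain*
values are placeholders chosen for the example, and the exact layout of the
part before the ``@`` is an implementation detail that should not be relied on.

```python
from email.utils import make_msgid

# Both arguments are illustrative values, not defaults of the library.
msgid = make_msgid(idstring="order-42", domain="mail.example.com")

print(msgid)  # something like <...order-42@mail.example.com>
```

Passing *domain* explicitly also avoids a lookup of the local hostname.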
					
						
The remaining functions are part of the legacy (``Compat32``) email API.  There
is no need to directly use these with the new API, since the parsing and
formatting they provide is done automatically by the header parsing machinery
of the new API.
					
						
.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes, and
   double quotes replaced by backslash-double quote.


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*. If *str* begins
   and ends with double quotes, they are stripped off.  Likewise if *str* begins
   and ends with angle brackets, they are stripped off.

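
The following sketch shows :func:`quote` and :func:`unquote` on a few sample
strings.  The strings are made up for the example.

```python
from email.utils import quote, unquote

name = 'Alice "Al" Smith'

# quote() escapes double quotes and backslashes with a backslash.
quoted = quote(name)
print(quoted)

# unquote() strips one pair of surrounding double quotes or angle brackets.
print(unquote('"' + quoted + '"'))
print(unquote("<alice@example.com>"))
```
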
					
						
.. function:: parseaddr(address)

   Parse *address* -- which should be the value of some address-containing field
   such as :mailheader:`To` or :mailheader:`Cc` -- into its constituent
   *realname* and *email address* parts.  Returns a 2-tuple of that information,
   unless the parse fails, in which case a 2-tuple of ``('', '')`` is returned.

   .. versionchanged:: 3.12
      For security reasons, addresses that were ambiguous and could parse into
      multiple different addresses now cause ``('', '')`` to be returned
      instead of only one of the *potential* addresses.

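
As a small illustration, the sketch below runs :func:`parseaddr` on two
well-formed sample values and on an empty string.  The addresses are
placeholders.

```python
from email.utils import parseaddr

# A display name followed by an address in angle brackets.
realname, address = parseaddr("Alice Example <alice@example.com>")
print(realname, address)

# A bare address has an empty realname.
print(parseaddr("bob@example.com"))

# Nothing to parse: the "failure" pair is returned instead of an exception.
print(parseaddr(""))
```

Because failure is signalled by ``('', '')`` rather than by an exception,
callers typically check whether the second element is empty.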
					
						
.. function:: formataddr(pair, charset='utf-8')

   The inverse of :func:`parseaddr`, this takes a 2-tuple of the form ``(realname,
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
   :mailheader:`Cc` header.  If the first element of *pair* is false, then the
   second element is returned unmodified.

   Optional *charset* is the character set that will be used in the :rfc:`2047`
   encoding of the ``realname`` if the ``realname`` contains non-ASCII
   characters.  Can be an instance of :class:`str` or a
   :class:`~email.charset.Charset`.  Defaults to ``utf-8``.

   .. versionchanged:: 3.3
      Added the *charset* option.

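
The sketch below shows :func:`formataddr` for an ASCII name, an empty name and
a non-ASCII name.  The names and addresses are sample values; the exact
:rfc:`2047` encoding chosen for the non-ASCII name is left to the library, so
the example only prints it.

```python
from email.utils import formataddr

# ASCII realname: "realname <address>".
plain = formataddr(("Alice Example", "alice@example.com"))
print(plain)

# A false realname returns the address unmodified.
bare = formataddr(("", "alice@example.com"))
print(bare)

# A non-ASCII realname is RFC 2047 encoded using *charset* (utf-8 by default).
encoded = formataddr(("Zoë Müller", "zoe@example.com"))
print(encoded)
```
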
					
						
.. function:: getaddresses(fieldvalues)

   This function returns a list of 2-tuples of the form returned by
   :func:`parseaddr`.  *fieldvalues* is a sequence of header field values as
   might be returned by
   :meth:`Message.get_all <email.message.Message.get_all>`.  Here's a simple
   example that gets all the recipients of a message::

      from email.utils import getaddresses

      tos = msg.get_all('to', [])
      ccs = msg.get_all('cc', [])
      resent_tos = msg.get_all('resent-to', [])
      resent_ccs = msg.get_all('resent-cc', [])
      all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)

   When parsing fails for a single fieldvalue, a 2-tuple of ``('', '')``
   is returned in its place.  Other errors in parsing the list of
   addresses, such as a fieldvalue seemingly parsing into multiple
   addresses, may result in a list containing a single empty 2-tuple
   ``[('', '')]`` being returned rather than returning potentially
   invalid output.

   Example malformed input parsing:

   .. doctest::

      >>> from email.utils import getaddresses
      >>> getaddresses(['alice@example.com <bob@example.com>', 'me@example.com'])
      [('', '')]

   .. versionchanged:: 3.12
      The 2-tuple of ``('', '')`` in the returned values when parsing
      fails was added to address a security issue.
					
						
.. function:: parsedate(date)

   Attempts to parse a date according to the rules in :rfc:`2822`.  However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases.  *date* is a string containing an :rfc:`2822`
   date, such as ``"Mon, 20 Nov 1995 19:12:08 -0500"``.  If it succeeds in parsing
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
   :func:`time.mktime`; otherwise ``None`` will be returned.  Note that indexes 6,
   7, and 8 of the result tuple are not usable.
					
						
.. function:: parsedate_tz(date)

   Performs the same function as :func:`parsedate`, but returns either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
   :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC
   (which is the official term for Greenwich Mean Time) [#]_.  If the input string
   has no timezone, the last element of the tuple returned is ``0``, which represents
   UTC. Note that indexes 6, 7, and 8 of the result tuple are not usable.

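
To make the difference between :func:`parsedate` and :func:`parsedate_tz`
concrete, the sketch below parses the sample date string used in the text
above.  The variable names are invented for the example.

```python
from email.utils import parsedate, parsedate_tz

date_header = "Mon, 20 Nov 1995 19:12:08 -0500"

fields = parsedate(date_header)        # 9-tuple; the UTC offset is dropped
fields_tz = parsedate_tz(date_header)  # 10-tuple; last item is the offset

print(fields[:6])      # year, month, day, hour, minute, second
print(fields_tz[9])    # offset from UTC in seconds (negative west of UTC)

# Input that cannot be parsed yields None rather than an exception.
print(parsedate("not a date"))
```
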
					
						
.. function:: parsedate_to_datetime(date)

   The inverse of :func:`format_datetime`.  Performs the same function as
   :func:`parsedate`, but on success returns a :class:`~datetime.datetime`;
   otherwise ``ValueError`` is raised if *date* contains an invalid value such
   as an hour greater than 23 or a timezone offset not between -24 and 24 hours.
   If the input date has a timezone of ``-0000``, the ``datetime`` will be a naive
   ``datetime``, and if the date is conforming to the RFCs it will represent a
   time in UTC but with no indication of the actual source timezone of the
   message the date comes from.  If the input date has any other valid timezone
   offset, the ``datetime`` will be an aware ``datetime`` with the
   corresponding :class:`~datetime.timezone` :class:`~datetime.tzinfo`.

   .. versionadded:: 3.3

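
The three cases described for :func:`parsedate_to_datetime` (a numeric offset,
``-0000``, and an invalid value) can be sketched as follows.  The date strings
are sample inputs.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# A numeric offset gives an aware datetime carrying that offset.
aware = parsedate_to_datetime("Mon, 20 Nov 1995 19:12:08 -0500")
print(aware.utcoffset() == timedelta(hours=-5))

# "-0000" gives a naive datetime (UTC, source timezone unknown).
naive = parsedate_to_datetime("Mon, 20 Nov 1995 19:12:08 -0000")
print(naive.tzinfo is None)

# An out-of-range hour is rejected with ValueError.
try:
    parsedate_to_datetime("Mon, 20 Nov 1995 25:12:08 -0500")
    raised = False
except ValueError:
    raised = True
print(raised)
```
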
					
						
.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC
   timestamp (seconds since the Epoch).  If the timezone item in the
   tuple is ``None``, assume local time.

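
A short sketch of :func:`mktime_tz` combined with :func:`parsedate_tz`.  The
comparison value is built with :mod:`datetime` purely as a cross-check for the
example.

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_tz

fields = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
timestamp = mktime_tz(fields)

# 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.
expected = datetime(1995, 11, 21, 0, 12, 8, tzinfo=timezone.utc).timestamp()
print(timestamp == expected)
```
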
					
						
.. function:: formatdate(timeval=None, localtime=False, usegmt=False)

   Returns a date string as per :rfc:`2822`, e.g.::

      Fri, 09 Nov 2001 01:08:47 -0000

   Optional *timeval* if given is a floating point time value as accepted by
   :func:`time.gmtime` and :func:`time.localtime`, otherwise the current time is
   used.

   Optional *localtime* is a flag that when ``True``, interprets *timeval*, and
   returns a date relative to the local timezone instead of UTC, properly taking
   daylight saving time into account. The default is ``False`` meaning UTC is
   used.

   Optional *usegmt* is a flag that when ``True``, outputs a date string with the
   timezone as an ASCII string ``GMT``, rather than a numeric ``-0000``. This is
   needed for some protocols (such as HTTP). This only applies when *localtime* is
   ``False``.  The default is ``False``.

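
The effect of *usegmt* can be seen in the sketch below, which formats the
Epoch itself (``timeval=0``, a value chosen only to keep the output
predictable).

```python
from email.utils import formatdate

# Default: UTC with the numeric "-0000" zone.
default_form = formatdate(0)
print(default_form)   # Thu, 01 Jan 1970 00:00:00 -0000

# usegmt=True: the zone is written as the string "GMT", as HTTP expects.
gmt_form = formatdate(0, usegmt=True)
print(gmt_form)       # Thu, 01 Jan 1970 00:00:00 GMT
```
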
					
						
.. function:: format_datetime(dt, usegmt=False)

   Like :func:`formatdate`, but the input is a :class:`~datetime.datetime`
   instance.  If it is
   a naive datetime, it is assumed to be "UTC with no information about the
   source timezone", and the conventional ``-0000`` is used for the timezone.
   If it is an aware ``datetime``, then the numeric timezone offset is used.
   If it is an aware ``datetime`` with offset zero, then *usegmt* may be set to
   ``True``, in which case the string ``GMT`` is used instead of the numeric
   timezone offset.  This provides a way to generate standards-conformant HTTP
   date headers.

   .. versionadded:: 3.3

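
The naive, aware and *usegmt* cases of :func:`format_datetime` can be sketched
as follows.  The dates are sample values; the UTC case uses
``datetime.timezone.utc`` as the zero-offset timezone.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Naive datetime: treated as UTC with an unknown source zone ("-0000").
naive = datetime(2001, 11, 9, 1, 8, 47)
print(format_datetime(naive))

# Aware datetime: the numeric offset of its timezone is used.
eastern = datetime(2001, 11, 8, 20, 8, 47, tzinfo=timezone(timedelta(hours=-5)))
print(format_datetime(eastern))

# Aware UTC datetime with usegmt=True: the zone is written as "GMT".
utc = datetime(2001, 11, 9, 1, 8, 47, tzinfo=timezone.utc)
print(format_datetime(utc, usegmt=True))
```
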
					
						
.. function:: decode_rfc2231(s)

   Decode the string *s* according to :rfc:`2231`.


.. function:: encode_rfc2231(s, charset=None, language=None)

   Encode the string *s* according to :rfc:`2231`.  Optional *charset* and
   *language*, if given, are the character set name and language name to use.  If
   neither is given, *s* is returned as-is.  If *charset* is given but *language*
   is not, the string is encoded using the empty string for *language*.

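
A hedged sketch of the two :rfc:`2231` helpers follows.  The file name is a
sample value.  Note that, as far as this example shows,
:func:`decode_rfc2231` only splits the encoded string into its three parts and
leaves the percent-encoding of the value in place.

```python
from email.utils import decode_rfc2231, encode_rfc2231

# Encode a non-ASCII parameter value with an explicit charset and language.
encoded = encode_rfc2231("naïve file.txt", charset="utf-8", language="en")
print(encoded)

# Split the encoded form back into charset, language and (still quoted) value.
charset, language, value = decode_rfc2231(encoded)
print(charset, language, value)
```
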
					
						
.. function:: collapse_rfc2231_value(value, errors='replace', fallback_charset='us-ascii')

   When a header parameter is encoded in :rfc:`2231` format,
   :meth:`Message.get_param <email.message.Message.get_param>` may return a
   3-tuple containing the character set,
   language, and value.  :func:`collapse_rfc2231_value` turns this into a unicode
   string.  Optional *errors* is passed to the *errors* argument of :class:`str`'s
   :func:`~str.encode` method; it defaults to ``'replace'``.  Optional
   *fallback_charset* specifies the character set to use if the one in the
   :rfc:`2231` header is not known by Python; it defaults to ``'us-ascii'``.

   For convenience, if the *value* passed to :func:`collapse_rfc2231_value` is not
   a tuple, it should be a string and it is returned unquoted.

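
The sketch below feeds :func:`collapse_rfc2231_value` a hand-built 3-tuple and
a plain quoted string.  The tuple is an assumption about the shape of what
``get_param`` returns for an encoded parameter (the value holds the raw bytes
as Latin-1 code points); it is constructed by hand here rather than taken from
a parsed message.

```python
from email.utils import collapse_rfc2231_value

# Hand-built (charset, language, value) tuple; "\xc3\xaf" are the two UTF-8
# bytes of "ï" carried as individual characters.
param = ("utf-8", "en", "na\xc3\xafve file.txt")
collapsed = collapse_rfc2231_value(param)
print(collapsed)

# A non-tuple value is simply returned unquoted.
unquoted = collapse_rfc2231_value('"plain.txt"')
print(unquoted)
```
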
					
						
.. function:: decode_params(params)

   Decode parameters list according to :rfc:`2231`.  *params* is a sequence of
   2-tuples containing elements of the form ``(content-type, string-value)``.


.. rubric:: Footnotes

.. [#] Note that the sign of the timezone offset is the opposite of the sign of the
   ``time.timezone`` variable for the same timezone; the latter variable follows
   the POSIX standard while this module follows :rfc:`2822`.