| 
									
										
										
										
											2012-05-25 15:01:48 -04:00
										 |  |  | :mod:`email` Package Architecture
 | 
					
						
							|  |  |  | =================================
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Overview
 | 
					
						
							|  |  |  | --------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The email package consists of three major components:
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Model
 | 
					
						
							|  |  |  |         An object structure that represents an email message, and provides an
 | 
					
						
							|  |  |  |         API for creating, querying, and modifying a message.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Parser
 | 
					
						
							|  |  |  |         Takes a sequence of characters or bytes and produces a model of the
 | 
					
						
							|  |  |  |         email message represented by those characters or bytes.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Generator
 | 
					
						
							|  |  |  |         Takes a model and turns it into a sequence of characters or bytes.  The
 | 
					
						
							|  |  |  |         sequence can either be intended for human consumption (a printable
 | 
					
						
							|  |  |  |         unicode string) or bytes suitable for transmission over the wire.  In
 | 
					
						
							|  |  |  |         the latter case all data is properly encoded using the content transfer
 | 
					
						
							|  |  |  |         encodings specified by the relevant RFCs.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Conceptually the package is organized around the model.  The model provides both
 | 
					
						
							|  |  |  | "external" APIs intended for use by application programs using the library,
 | 
					
						
							|  |  |  | and "internal" APIs intended for use by the Parser and Generator components.
 | 
					
						
							| 
									
										
										
										
											2013-08-10 18:47:07 +03:00
										 |  |  | This division is intentionally a bit fuzzy; the API described by this
 | 
					
						
							|  |  |  | documentation is all a public, stable API.  This allows for an application
 | 
					
						
							|  |  |  | with special needs to implement its own parser and/or generator.
 | 
					
						
							| 
									
										
										
										
											2012-05-25 15:01:48 -04:00
										 |  |  | 
 | 
					
						
							|  |  |  | In addition to the three major functional components, there is a third key
 | 
					
						
							|  |  |  | component to the architecture:
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Policy
 | 
					
						
							|  |  |  |         An object that specifies various behavioral settings and carries
 | 
					
						
							|  |  |  |         implementations of various behavior-controlling methods.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The Policy framework provides a simple and convenient way to control the
 | 
					
						
							|  |  |  | behavior of the library, making it possible for the library to be used in a
 | 
					
						
							|  |  |  | very flexible fashion while leveraging the common code required to parse,
 | 
					
						
							|  |  |  | represent, and generate message-like objects.  For example, in addition to the
 | 
					
						
							|  |  |  | default :rfc:`5322` email message policy, we also have a policy that manages
 | 
					
						
							|  |  |  | HTTP headers in a fashion compliant with :rfc:`2616`.  Individual policy
 | 
					
						
							|  |  |  | controls, such as the maximum line length produced by the generator, can also
 | 
					
						
							|  |  |  | be controlled individually to meet specialized application requirements.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The Model
 | 
					
						
							|  |  |  | ---------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The message model is implemented by the :class:`~email.message.Message` class.
 | 
					
						
							|  |  |  | The model divides a message into the two fundamental parts discussed by the
 | 
					
						
							|  |  |  | RFC: the header section and the body.  The `Message` object acts as a
 | 
					
						
							|  |  |  | pseudo-dictionary of named headers.  Its dictionary interface provides
 | 
					
						
							|  |  |  | convenient access to individual headers by name.  However, all headers are kept
 | 
					
						
							|  |  |  | internally in an ordered list, so that the information about the order of the
 | 
					
						
							|  |  |  | headers in the original message is preserved.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The `Message` object also has a `payload` that holds the body.  A `payload` can
 | 
					
						
							|  |  |  | be one of two things: data, or a list of `Message` objects.  The latter is used
 | 
					
						
							|  |  |  | to represent a multipart MIME message.  Lists can be nested arbitrarily deeply
 | 
					
						
							|  |  |  | in order to represent the message, with all terminal leaves having non-list
 | 
					
						
							|  |  |  | data payloads.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Message Lifecycle
 | 
					
						
							|  |  |  | -----------------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The general lifecyle of a message is:
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Creation
 | 
					
						
							|  |  |  |         A `Message` object can be created by a Parser, or it can be
 | 
					
						
							|  |  |  |         instantiated as an empty message by an application.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Manipulation
 | 
					
						
							|  |  |  |         The application may examine one or more headers, and/or the
 | 
					
						
							|  |  |  |         payload, and it may modify one or more headers and/or
 | 
					
						
							|  |  |  |         the payload.  This may be done on the top level `Message`
 | 
					
						
							|  |  |  |         object, or on any sub-object.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     Finalization
 | 
					
						
							|  |  |  |         The Model is converted into a unicode or binary stream,
 | 
					
						
							|  |  |  |         or the model is discarded.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Header Policy Control During Lifecycle
 | 
					
						
							|  |  |  | --------------------------------------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | One of the major controls exerted by the Policy is the management of headers
 | 
					
						
							|  |  |  | during the `Message` lifecycle.  Most applications don't need to be aware of
 | 
					
						
							|  |  |  | this.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | A header enters the model in one of two ways: via a Parser, or by being set to
 | 
					
						
							|  |  |  | a specific value by an application program after the Model already exists.
 | 
					
						
							|  |  |  | Similarly, a header exits the model in one of two ways: by being serialized by
 | 
					
						
							|  |  |  | a Generator, or by being retrieved from a Model by an application program.  The
 | 
					
						
							|  |  |  | Policy object provides hooks for all four of these pathways.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The model storage for headers is a list of (name, value) tuples.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The Parser identifies headers during parsing, and passes them to the
 | 
					
						
							|  |  |  | :meth:`~email.policy.Policy.header_source_parse` method of the Policy.  The
 | 
					
						
							|  |  |  | result of that method is the (name, value) tuple to be stored in the model.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When an application program supplies a header value (for example, through the
 | 
					
						
							|  |  |  | `Message` object `__setitem__` interface), the name and the value are passed to
 | 
					
						
							|  |  |  | the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
 | 
					
						
							|  |  |  | returns the (name, value) tuple to be stored in the model.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When an application program retrieves a header (through any of the dict or list
 | 
					
						
							|  |  |  | interfaces of `Message`), the name and value are passed to the
 | 
					
						
							|  |  |  | :meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
 | 
					
						
							|  |  |  | obtain the value returned to the application.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When a Generator requests a header during serialization, the name and value are
 | 
					
						
							|  |  |  | passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
 | 
					
						
							|  |  |  | returns a string containing line breaks in the appropriate places.  The
 | 
					
						
							|  |  |  | :meth:`~email.policy.Policy.cte_type` Policy control determines whether or
 | 
					
						
							|  |  |  | not Content Transfer Encoding is performed on the data in the header.  There is
 | 
					
						
							|  |  |  | also a :meth:`~email.policy.Policy.binary_fold` method for use by generators
 | 
					
						
							|  |  |  | that produce binary output, which returns the folded header as binary data,
 | 
					
						
							|  |  |  | possibly folded at different places than the corresponding string would be.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Handling Binary Data
 | 
					
						
							|  |  |  | --------------------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | In an ideal world all message data would conform to the RFCs, meaning that the
 | 
					
						
							|  |  |  | parser could decode the message into the idealized unicode message that the
 | 
					
						
							|  |  |  | sender originally wrote.  In the real world, the email package must also be
 | 
					
						
							|  |  |  | able to deal with badly formatted messages, including messages containing
 | 
					
						
							|  |  |  | non-ASCII characters that either have no indicated character set or are not
 | 
					
						
							|  |  |  | valid characters in the indicated character set.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Since email messages are *primarily* text data, and operations on message data
 | 
					
						
							|  |  |  | are primarily text operations (except for binary payloads of course), the model
 | 
					
						
							|  |  |  | stores all text data as unicode strings.  Un-decodable binary inside text
 | 
					
						
							|  |  |  | data is handled by using the `surrogateescape` error handler of the ASCII
 | 
					
						
							|  |  |  | codec.  As with the binary filenames the error handler was introduced to
 | 
					
						
							|  |  |  | handle, this allows the email package to "carry" the binary data received
 | 
					
						
							|  |  |  | during parsing along until the output stage, at which time it is regenerated
 | 
					
						
							|  |  |  | in its original form.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This carried binary data is almost entirely an implementation detail.  The one
 | 
					
						
							|  |  |  | place where it is visible in the API is in the "internal" API.  A Parser must
 | 
					
						
							|  |  |  | do the `surrogateescape` encoding of binary input data, and pass that data to
 | 
					
						
							|  |  |  | the appropriate Policy method.  The "internal" interface used by the Generator
 | 
					
						
							|  |  |  | to access header values preserves the `surrogateescaped` bytes.  All other
 | 
					
						
							|  |  |  | interfaces convert the binary data either back into bytes or into a safe form
 | 
					
						
							|  |  |  | (losing information in some cases).
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Backward Compatibility
 | 
					
						
							|  |  |  | ----------------------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The :class:`~email.policy.Policy.Compat32` Policy provides backward
 | 
					
						
							|  |  |  | compatibility with version 5.1 of the email package.  It does this via the
 | 
					
						
							|  |  |  | following implementation of the four+1 Policy methods described above:
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | header_source_parse
 | 
					
						
							|  |  |  |     Splits the first line on the colon to obtain the name, discards any spaces
 | 
					
						
							|  |  |  |     after the colon, and joins the remainder of the line with all of the
 | 
					
						
							|  |  |  |     remaining lines, preserving the linesep characters to obtain the value.
 | 
					
						
							|  |  |  |     Trailing carriage return and/or linefeed characters are stripped from the
 | 
					
						
							|  |  |  |     resulting value string.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | header_store_parse
 | 
					
						
							|  |  |  |     Returns the name and value exactly as received from the application.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | header_fetch_parse
 | 
					
						
							|  |  |  |     If the value contains any `surrogateescaped` binary data, return the value
 | 
					
						
							|  |  |  |     as a :class:`~email.header.Header` object, using the character set
 | 
					
						
							|  |  |  |     `unknown-8bit`.  Otherwise just returns the value.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | fold
 | 
					
						
							|  |  |  |     Uses :class:`~email.header.Header`'s folding to fold headers in the
 | 
					
						
							|  |  |  |     same way the email5.1 generator did.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | binary_fold
 | 
					
						
							|  |  |  |     Same as fold, but encodes to 'ascii'.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | New Algorithm
 | 
					
						
							|  |  |  | -------------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | header_source_parse
 | 
					
						
							|  |  |  |     Same as legacy behavior.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | header_store_parse
 | 
					
						
							|  |  |  |     Same as legacy behavior.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | header_fetch_parse
 | 
					
						
							|  |  |  |     If the value is already a header object, returns it.  Otherwise, parses the
 | 
					
						
							|  |  |  |     value using the new parser, and returns the resulting object as the value.
 | 
					
						
							|  |  |  |     `surrogateescaped` bytes get turned into unicode unknown character code
 | 
					
						
							|  |  |  |     points.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | fold
 | 
					
						
							|  |  |  |     Uses the new header folding algorithm, respecting the policy settings.
 | 
					
						
							|  |  |  |     surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
 | 
					
						
							|  |  |  |     ``cte_type=7bit`` or ``8bit``.  Returns a string.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     At some point there will also be a ``cte_type=unicode``, and for that
 | 
					
						
							|  |  |  |     policy fold will serialize the idealized unicode message with RFC-like
 | 
					
						
							|  |  |  |     folding, converting any surrogateescaped bytes into the unicode
 | 
					
						
							|  |  |  |     unknown character glyph.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | binary_fold
 | 
					
						
							|  |  |  |     Uses the new header folding algorithm, respecting the policy settings.
 | 
					
						
							|  |  |  |     surrogateescaped bytes are encoded using the `unknown-8bit` charset for
 | 
					
						
							|  |  |  |     ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
 | 
					
						
							|  |  |  |     Returns bytes.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     At some point there will also be a ``cte_type=unicode``, and for that
 | 
					
						
							|  |  |  |     policy binary_fold will serialize the message according to :rfc:``5335``.
 |