\declaremodule{standard}{email.header}
\modulesynopsis{Representing non-ASCII headers}

\rfc{2822} is the base standard that describes the format of email
messages.  It derives from the older \rfc{822} standard, which came
into widespread use at a time when most email was composed of \ASCII{}
characters only.  \rfc{2822} is a specification written assuming email
contains only 7-bit \ASCII{} characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now
be used in email messages.  The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
number of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.header} and \module{email.charset} modules.

If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
using a string for the header value.  Import the \class{Header} class from the
\module{email.header} module.  For example:

\begin{verbatim}
>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=


\end{verbatim}

The \mailheader{Subject} field here needed to contain a non-\ASCII{}
character.  We arranged this by creating a \class{Header} instance and
passing in the character set that the byte string was encoded in.  When
the \class{Message} instance was subsequently flattened, the
\mailheader{Subject} field was properly \rfc{2047} encoded.  MIME-aware
mail readers would show this header using the embedded ISO-8859-1
character.

\versionadded{2.2.2}

Here is the \class{Header} class description:

\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
    maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
    errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.

Optional \var{s} is the initial header value.  If \code{None} (the
default), the initial header value is not set.  You can later append
to the header with \method{append()} method calls.  \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.

Optional \var{charset} serves two purposes: it has the same meaning as
the \var{charset} argument to the \method{append()} method.  It also
sets the default character set for all subsequent \method{append()}
calls that omit the \var{charset} argument.  If \var{charset} is not
provided in the constructor (the default), the \code{us-ascii}
character set is used both as \var{s}'s initial charset and as the
default for subsequent \method{append()} calls.

The maximum line length can be specified explicitly via
\var{maxlinelen}.  To split the first line at a shorter length (to
account for the field name, which isn't included in \var{s},
e.g. \mailheader{Subject}), pass in the name of the field in
\var{header_name}.  The default \var{maxlinelen} is 76, and the
default value for \var{header_name} is \code{None}, meaning it is not
taken into account for the first line of a long, split header.

Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.

Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}
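
As a rough illustration of \var{maxlinelen} and \var{header_name}, the
sketch below folds a long, pure-\ASCII{} value.  It assumes a Python 3
interpreter (unlike the interactive session earlier in this section,
which uses Python 2 syntax); the sample text and the variable names are
made up for illustration, and the exact fold points may differ between
versions.

```python
from email.header import Header

# Hypothetical sample value: thirty short ASCII words, far longer than
# the 40-column limit requested below.
text = ' '.join('word%02d' % i for i in range(30))

# Ask for lines of at most 40 characters, and reserve room on the first
# line for the "Subject: " prefix, which is not part of the value itself.
h = Header(text, maxlinelen=40, header_name='Subject')

folded = h.encode()
lines = folded.split('\n')

for line in lines:
    print(repr(line))
```

With \var{header_name} given, the first line should come out shorter
than the continuation lines, because space is left for the field name.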
					
						
\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.

Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.charset}) or the name of a character set, which
will be converted to a \class{Charset} instance.  A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.

\var{s} may be a byte string or a Unicode string.  If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.

If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string.  In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}.  The
first character set that does not raise a \exception{UnicodeError} is
used.

Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}
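
The following is a minimal sketch of building one header from chunks in
several character sets with \method{append()}.  It assumes a Python 3
interpreter, where ordinary string literals are Unicode strings and
byte strings are written with a \code{b} prefix; the sample words are
arbitrary.

```python
from email.header import Header

# Start with a plain ASCII chunk, then append chunks in other charsets.
h = Header('Hello')
h.append('p\xf6stal', 'iso-8859-1')      # Latin-1 text
h.append('\u4e16\u754c', 'utf-8')        # non-Latin text, sent as UTF-8

# Each non-ASCII chunk becomes its own RFC 2047 encoded word.
encoded = h.encode()
print(encoded)

# A byte string that cannot be decoded with the stated charset is
# rejected with a UnicodeError (here, a non-ASCII byte under us-ascii).
try:
    Header().append(b'p\xf6stal', 'us-ascii')
except UnicodeError:
    decode_failed = True
else:
    decode_failed = False
print(decode_failed)
```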
					
						
\begin{methoddesc}[Header]{encode}{\optional{splitchars}}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings.  Optional \var{splitchars} is a string
containing characters to split long ASCII lines on, in rough support
of \rfc{2822}'s \emph{highest level syntactic breaks}.  This doesn't
affect \rfc{2047} encoded lines.
\end{methoddesc}
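
To make the effect of \var{splitchars} concrete, here is a small sketch
that folds the same \ASCII{} value twice, once preferring semicolons
and once preferring spaces.  It assumes a Python 3 interpreter; the
sample value and the 30-column limit are invented for illustration, and
other versions of the package may choose different break points.

```python
from email.header import Header

# Hypothetical parameter-style value with semicolon-separated groups.
value = ('name=alpha beta; name=gamma delta; '
         'name=epsilon zeta; name=eta theta')

h = Header(value, maxlinelen=30)

# Prefer breaking right after a semicolon...
by_semicolon = h.encode(splitchars=';').split('\n')
# ...versus breaking at the whitespace closest to the length limit.
by_space = h.encode(splitchars=' ').split('\n')

for line in by_semicolon:
    print(repr(line))
for line in by_space:
    print(repr(line))
```

Breaking after semicolons keeps each group on one line where it fits,
which is the kind of higher-level syntactic break the method tries to
honor.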
					
						
The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.

\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}.  Useful for
\code{str(aHeader)}.
\end{methoddesc}

\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function.  Returns the
header as a Unicode string.
\end{methoddesc}

\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}

\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}
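
A brief sketch of the comparison operators, assuming a Python 3
interpreter.  Only \class{Header}-to-\class{Header} comparisons are
shown, and the sample values are arbitrary.

```python
from email.header import Header

a = Header('p\xf6stal', 'iso-8859-1')
b = Header('p\xf6stal', 'iso-8859-1')
c = Header('postal', 'iso-8859-1')

# Two headers built from the same value and charset compare equal,
# even though they are distinct objects.
same = (a == b)
# Headers with different values compare unequal.
different = (a != c)

print(same, different)
```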
					
						
The \module{email.header} module also provides the following
convenience functions.

\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.

This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header.  \var{charset} is
\code{None} for non-encoded parts of the header, otherwise a lower
case string containing the name of the character set specified in the
encoded string.

Here's an example:

\begin{verbatim}
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}
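
Because \function{decode_header()} does not convert the character set,
the caller still has to turn each part into text.  The sketch below
shows one way to do that for a header that mixes a plain prefix with an
encoded word.  It assumes a Python 3 interpreter, where decoded parts
are typically byte strings; \code{to_text} is a hypothetical helper
written for this example, not part of the module.

```python
from email.header import decode_header


def to_text(decoded, charset):
    """Hypothetical helper: turn one decode_header() pair into text."""
    if isinstance(decoded, bytes):
        # Parts without a charset are plain ASCII.
        return decoded.decode(charset or 'ascii')
    return decoded


parts = decode_header('Re: =?iso-8859-1?q?p=F6stal?=')
texts = [to_text(decoded, charset) for decoded, charset in parts]

for (decoded, charset), text in zip(parts, texts):
    print(repr(charset), repr(text))
```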
					
						
\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
    header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.

\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.

This function takes one such sequence of pairs and returns a
\class{Header} instance.  Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
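
As a small round-trip sketch, the pairs returned by
\function{decode_header()} can be fed straight into
\function{make_header()} and encoded again.  This assumes a Python 3
interpreter; the header value is the same sample used earlier in this
section, and the variable names are illustrative.

```python
from email.header import Header, decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='

# Decode into (decoded_string, charset) pairs, then rebuild a Header.
h = make_header(decode_header(raw), header_name='Subject')

# Encoding the rebuilt header should give back an equivalent value.
round_tripped = h.encode()
print(round_tripped)
```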