| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \section{\module{csv} --- CSV File Reading and Writing} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \declaremodule{standard}{csv} | 
					
						
							|  |  |  | \modulesynopsis{Write and read tabular data to and from delimited files.} | 
					
						
							| 
									
										
										
										
											2003-04-24 18:47:31 +00:00
										 |  |  | \sectionauthor{Skip Montanaro}{skip@pobox.com} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \versionadded{2.3} | 
					
						
							|  |  |  | \index{csv} | 
					
						
							|  |  |  | \indexii{data}{tabular} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The so-called CSV (Comma Separated Values) format is the most common import | 
					
						
							|  |  |  | and export format for spreadsheets and databases.  There is no ``CSV | 
					
						
							|  |  |  | standard'', so the format is operationally defined by the many applications | 
					
						
							|  |  |  | which read and write it.  The lack of a standard means that subtle | 
					
						
							|  |  |  | differences often exist in the data produced and consumed by different | 
					
						
							|  |  |  | applications.  These differences can make it annoying to process CSV files | 
					
						
							|  |  |  | from multiple sources.  Still, while the delimiters and quoting characters | 
					
						
							|  |  |  | vary, the overall format is similar enough that it is possible to write a | 
					
						
							|  |  |  | single module which can efficiently manipulate such data, hiding the details | 
					
						
							|  |  |  | of reading and writing the data from the programmer. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 15:14:49 +00:00
										 |  |  | The \module{csv} module implements classes to read and write tabular data in | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | CSV format.  It allows programmers to say, ``write this data in the format | 
					
						
							|  |  |  | preferred by Excel,'' or ``read data from this file which was generated by | 
					
						
							|  |  |  | Excel,'' without knowing the precise details of the CSV format used by | 
					
						
							|  |  |  | Excel.  Programmers can also describe the CSV formats understood by other | 
					
						
							|  |  |  | applications or define their own special-purpose CSV formats. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 15:14:49 +00:00
										 |  |  | The \module{csv} module's \class{reader} and \class{writer} objects read and | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | write sequences.  Programmers can also read and write data in dictionary | 
					
						
							|  |  |  | form using the \class{DictReader} and \class{DictWriter} classes. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \begin{notice} | 
					
						
							|  |  |  |   This version of the \module{csv} module doesn't support Unicode | 
					
						
							|  |  |  |   input.  Also, there are currently some issues regarding \ASCII{} NUL | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |   characters.  Accordingly, all input should be UTF-8 or printable | 
					
						
							|  |  |  |   \ASCII{} to be safe; see the examples in section~\ref{csv-examples}. | 
					
						
							|  |  |  |   These restrictions will be removed in the future. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{notice} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{seealso} | 
					
						
							|  |  |  | %  \seemodule{array}{Arrays of uniformly types numeric values.}
 | 
					
						
							|  |  |  |   \seepep{305}{CSV File API} | 
					
						
							|  |  |  |          {The Python Enhancement Proposal which proposed this addition | 
					
						
							|  |  |  |           to Python.} | 
					
						
							|  |  |  | \end{seealso} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-08-31 05:44:54 +00:00
										 |  |  | \subsection{Module Contents \label{csv-contents}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 15:14:49 +00:00
										 |  |  | The \module{csv} module defines the following functions: | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{funcdesc}{reader}{csvfile\optional{, | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  |                          dialect=\code{'excel'}}\optional{, fmtparam}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Return a reader object which will iterate over lines in the given | 
					
						
							|  |  |  | {}\var{csvfile}.  \var{csvfile} can be any object which supports the | 
					
						
							|  |  |  | iterator protocol and returns a string each time its \method{next} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | method is called - file objects and list objects are both suitable.   | 
					
						
							|  |  |  | If \var{csvfile} is a file object, it must be opened with | 
					
						
							| 
									
										
										
										
											2003-07-02 15:32:48 +00:00
										 |  |  | the 'b' flag on platforms where that makes a difference.  An optional | 
					
						
							|  |  |  | {}\var{dialect} parameter can be given | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | which is used to define a set of parameters specific to a particular CSV | 
					
						
							|  |  |  | dialect.  It may be an instance of a subclass of the \class{Dialect} | 
					
						
							|  |  |  | class or one of the strings returned by the \function{list_dialects} | 
					
						
							|  |  |  | function.  The other optional {}\var{fmtparam} keyword arguments can be | 
					
						
							|  |  |  | given to override individual formatting parameters in the current | 
					
						
							|  |  |  | dialect.  For more information about the dialect and formatting | 
					
						
							| 
									
										
										
										
											2003-09-10 18:54:49 +00:00
										 |  |  | parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Parameters'' for details of these parameters. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | All data read are returned as strings.  No automatic data type | 
					
						
							|  |  |  | conversion is performed. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{funcdesc}{writer}{csvfile\optional{, | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  |                          dialect=\code{'excel'}}\optional{, fmtparam}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Return a writer object responsible for converting the user's data into | 
					
						
							| 
									
										
										
										
											2003-07-02 15:32:48 +00:00
										 |  |  | delimited strings on the given file-like object.  \var{csvfile} can be any | 
					
						
							|  |  |  | object with a \function{write} method.  If \var{csvfile} is a file object, | 
					
						
							|  |  |  | it must be opened with the 'b' flag on platforms where that makes a | 
					
						
							|  |  |  | difference.  An optional | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | {}\var{dialect} parameter can be given which is used to define a set of | 
					
						
							|  |  |  | parameters specific to a particular CSV dialect.  It may be an instance | 
					
						
							|  |  |  | of a subclass of the \class{Dialect} class or one of the strings | 
					
						
							|  |  |  | returned by the \function{list_dialects} function.  The other optional | 
					
						
							|  |  |  | {}\var{fmtparam} keyword arguments can be given to override individual | 
					
						
							|  |  |  | formatting parameters in the current dialect.  For more information | 
					
						
							|  |  |  | about the dialect and formatting parameters, see | 
					
						
							| 
									
										
										
										
											2003-09-10 18:54:49 +00:00
										 |  |  | section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters'' for | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | details of these parameters.  To make it as easy as possible to | 
					
						
							|  |  |  | interface with modules which implement the DB API, the value | 
					
						
							|  |  |  | \constant{None} is written as the empty string.  While this isn't a | 
					
						
							|  |  |  | reversible transformation, it makes it easier to dump SQL NULL data values | 
					
						
							|  |  |  | to CSV files without preprocessing the data returned from a | 
					
						
							|  |  |  | \code{cursor.fetch*()} call.  All other non-string data are stringified | 
					
						
							|  |  |  | with \function{str()} before being written. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | \begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}} | 
					
						
							|  |  |  | Associate \var{dialect} with \var{name}.  \var{name} must be a string | 
					
						
							|  |  |  | or Unicode object. The dialect can be specified either by passing a | 
					
						
							|  |  |  | sub-class of \class{Dialect}, or by \var{fmtparam} keyword arguments, | 
					
						
							|  |  |  | or both, with keyword arguments overriding parameters of the dialect. | 
					
						
							|  |  |  | For more information about the dialect and formatting parameters, see | 
					
						
							|  |  |  | section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters'' | 
					
						
							|  |  |  | for details of these parameters. | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{funcdesc}{unregister_dialect}{name} | 
					
						
							|  |  |  | Delete the dialect associated with \var{name} from the dialect registry.  An | 
					
						
							|  |  |  | \exception{Error} is raised if \var{name} is not a registered dialect | 
					
						
							|  |  |  | name. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{funcdesc}{get_dialect}{name} | 
					
						
							|  |  |  | Return the dialect associated with \var{name}.  An \exception{Error} is | 
					
						
							|  |  |  | raised if \var{name} is not a registered dialect name. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{funcdesc}{list_dialects}{} | 
					
						
							|  |  |  | Return the names of all registered dialects. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | \begin{funcdesc}{field_size_limit}{\optional{new_limit}} | 
					
						
							|  |  |  |   Returns the current maximum field size allowed by the parser. If | 
					
						
							|  |  |  |   \var{new_limit} is given, this becomes the new limit. | 
					
						
							|  |  |  |   \versionadded{2.5} | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 15:14:49 +00:00
										 |  |  | The \module{csv} module defines the following classes: | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-10-03 14:03:01 +00:00
										 |  |  | \begin{classdesc}{DictReader}{csvfile\optional{, | 
					
						
							|  |  |  | 			      fieldnames=\constant{None},\optional{, | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  |                               restkey=\constant{None}\optional{, | 
					
						
							|  |  |  | 			      restval=\constant{None}\optional{, | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  |                               dialect=\code{'excel'}\optional{, | 
					
						
							| 
									
										
										
										
											2004-04-16 03:21:01 +00:00
										 |  |  | 			      *args, **kwds}}}}}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Create an object which operates like a regular reader but maps the | 
					
						
							| 
									
										
										
										
											2003-10-03 14:03:01 +00:00
										 |  |  | information read into a dict whose keys are given by the optional | 
					
						
							|  |  |  | {} \var{fieldnames} | 
					
						
							|  |  |  | parameter.  If the \var{fieldnames} parameter is omitted, the values in | 
					
						
							|  |  |  | the first row of the \var{csvfile} will be used as the fieldnames. | 
					
						
							|  |  |  | If the row read has fewer fields than the fieldnames sequence, | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | the value of \var{restval} will be used as the default value.  If the row | 
					
						
							|  |  |  | read has more fields than the fieldnames sequence, the remaining data is | 
					
						
							|  |  |  | added as a sequence keyed by the value of \var{restkey}.  If the row read | 
					
						
							|  |  |  | has fewer fields than the fieldnames sequence, the remaining keys take the | 
					
						
							| 
									
										
										
										
											2004-04-16 03:21:01 +00:00
										 |  |  | value of the optional \var{restval} parameter.  Any other optional or | 
					
						
							|  |  |  | keyword arguments are passed to the underlying \class{reader} instance. | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{classdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{, | 
					
						
							|  |  |  |                               restval=""\optional{, | 
					
						
							|  |  |  |                               extrasaction=\code{'raise'}\optional{, | 
					
						
							| 
									
										
										
										
											2004-04-16 03:21:01 +00:00
										 |  |  |                               dialect=\code{'excel'}\optional{, | 
					
						
							|  |  |  | 			      *args, **kwds}}}}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Create an object which operates like a regular writer but maps dictionaries | 
					
						
							|  |  |  | onto output rows.  The \var{fieldnames} parameter identifies the order in | 
					
						
							|  |  |  | which values in the dictionary passed to the \method{writerow()} method are | 
					
						
							|  |  |  | written to the \var{csvfile}.  The optional \var{restval} parameter | 
					
						
							|  |  |  | specifies the value to be written if the dictionary is missing a key in | 
					
						
							|  |  |  | \var{fieldnames}.  If the dictionary passed to the \method{writerow()} | 
					
						
							|  |  |  | method contains a key not found in \var{fieldnames}, the optional | 
					
						
							|  |  |  | \var{extrasaction} parameter indicates what action to take.  If it is set | 
					
						
							|  |  |  | to \code{'raise'} a \exception{ValueError} is raised.  If it is set to | 
					
						
							| 
									
										
										
										
											2004-04-16 03:21:01 +00:00
										 |  |  | \code{'ignore'}, extra values in the dictionary are ignored.  Any other | 
					
						
							|  |  |  | optional or keyword arguments are passed to the underlying \class{writer} | 
					
						
							|  |  |  | instance. | 
					
						
							| 
									
										
										
										
											2003-10-03 14:03:01 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | Note that unlike the \class{DictReader} class, the \var{fieldnames} | 
					
						
							|  |  |  | parameter of the \class{DictWriter} is not optional.  Since Python's | 
					
						
							|  |  |  | \class{dict} objects are not ordered, there is not enough information | 
					
						
							|  |  |  | available to deduce the order in which the row should be written to the | 
					
						
							|  |  |  | \var{csvfile}. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{classdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{classdesc*}{Dialect}{} | 
					
						
							|  |  |  | The \class{Dialect} class is a container class relied on primarily for its | 
					
						
							|  |  |  | attributes, which are used to define the parameters for a specific | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \class{reader} or \class{writer} instance. | 
					
						
							|  |  |  | \end{classdesc*} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-05 06:58:15 +00:00
										 |  |  | \begin{classdesc}{excel}{} | 
					
						
							|  |  |  | The \class{excel} class defines the usual properties of an Excel-generated | 
					
						
							|  |  |  | CSV file. | 
					
						
							|  |  |  | \end{classdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{classdesc}{excel_tab}{} | 
					
						
							|  |  |  | The \class{excel_tab} class defines the usual properties of an | 
					
						
							|  |  |  | Excel-generated TAB-delimited file. | 
					
						
							|  |  |  | \end{classdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-05-19 15:33:36 +00:00
										 |  |  | \begin{classdesc}{Sniffer}{} | 
					
						
							|  |  |  | The \class{Sniffer} class is used to deduce the format of a CSV file. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{classdesc} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-12-28 15:56:58 +00:00
										 |  |  | The \class{Sniffer} class provides two methods: | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-05-19 15:33:36 +00:00
										 |  |  | \begin{methoddesc}{sniff}{sample\optional{,delimiters=None}} | 
					
						
							|  |  |  | Analyze the given \var{sample} and return a \class{Dialect} subclass | 
					
						
							|  |  |  | reflecting the parameters found.  If the optional \var{delimiters} parameter | 
					
						
							|  |  |  | is given, it is interpreted as a string containing possible valid delimiter | 
					
						
							|  |  |  | characters. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{methoddesc} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \begin{methoddesc}{has_header}{sample} | 
					
						
							|  |  |  | Analyze the sample text (presumed to be in CSV format) and return | 
					
						
							|  |  |  | \constant{True} if the first row appears to be a series of column | 
					
						
							|  |  |  | headers. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The \module{csv} module defines the following constants: | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-06-04 15:30:13 +00:00
										 |  |  | \begin{datadesc}{QUOTE_ALL} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Instructs \class{writer} objects to quote all fields. | 
					
						
							|  |  |  | \end{datadesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{datadesc}{QUOTE_MINIMAL} | 
					
						
							|  |  |  | Instructs \class{writer} objects to only quote those fields which contain | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | special characters such as \var{delimiter}, \var{quotechar} or any of the | 
					
						
							|  |  |  | characters in \var{lineterminator}. | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{datadesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{datadesc}{QUOTE_NONNUMERIC} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | Instructs \class{writer} objects to quote all non-numeric | 
					
						
							|  |  |  | fields.  | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Instructs the reader to convert all non-quoted fields to type \var{float}. | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{datadesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{datadesc}{QUOTE_NONE} | 
					
						
							|  |  |  | Instructs \class{writer} objects to never quote fields.  When the current | 
					
						
							|  |  |  | \var{delimiter} occurs in output data it is preceded by the current | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | \var{escapechar} character.  If \var{escapechar} is not set, the writer | 
					
						
							|  |  |  | will raise \exception{Error} if any characters that require escaping | 
					
						
							|  |  |  | are encountered. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Instructs \class{reader} to perform no special processing of quote characters. | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{datadesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The \module{csv} module defines the following exception: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{excdesc}{Error} | 
					
						
							|  |  |  | Raised by any of the functions when an error is detected. | 
					
						
							|  |  |  | \end{excdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \subsection{Dialects and Formatting Parameters\label{csv-fmt-params}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | To make it easier to specify the format of input and output records, | 
					
						
							|  |  |  | specific formatting parameters are grouped together into dialects.  A | 
					
						
							|  |  |  | dialect is a subclass of the \class{Dialect} class having a set of specific | 
					
						
							|  |  |  | methods and a single \method{validate()} method.  When creating \class{reader} | 
					
						
							|  |  |  | or \class{writer} objects, the programmer can specify a string or a subclass | 
					
						
							|  |  |  | of the \class{Dialect} class as the dialect parameter.  In addition to, or | 
					
						
							|  |  |  | instead of, the \var{dialect} parameter, the programmer can also specify | 
					
						
							|  |  |  | individual formatting parameters, which have the same names as the | 
					
						
							| 
									
										
										
										
											2003-08-31 05:44:54 +00:00
										 |  |  | attributes defined below for the \class{Dialect} class. | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | Dialects support the following attributes: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{delimiter} | 
					
						
							|  |  |  | A one-character string used to separate fields.  It defaults to \code{','}. | 
					
						
							|  |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{doublequote} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | Controls how instances of \var{quotechar} appearing inside a field should | 
					
						
							|  |  |  | be themselves be quoted.  When \constant{True}, the character is doubled. | 
					
						
							|  |  |  | When \constant{False}, the \var{escapechar} is used as a prefix to the | 
					
						
							|  |  |  | \var{quotechar}.  It defaults to \constant{True}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | On output, if \var{doublequote} is \constant{False} and no | 
					
						
							|  |  |  | \var{escapechar} is set, \exception{Error} is raised if a \var{quotechar} | 
					
						
							|  |  |  | is found in a field. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{escapechar} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | A one-character string used by the writer to escape the \var{delimiter} if | 
					
						
							|  |  |  | \var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar} | 
					
						
							|  |  |  | if \var{doublequote} is \constant{False}. On reading, the \var{escapechar} | 
					
						
							|  |  |  | removes any special meaning from the following character. It defaults | 
					
						
							|  |  |  | to \constant{None}, which disables escaping. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{lineterminator} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | The string used to terminate lines produced by the \class{writer}. | 
					
						
							|  |  |  | It defaults to \code{'\e r\e n'}.  | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \note{The \class{reader} is hard-coded to recognise either \code{'\e r'} | 
					
						
							|  |  |  | or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}. This | 
					
						
							|  |  |  | behavior may change in the future.} | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{quotechar} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | A one-character string used to quote fields containing special characters, | 
					
						
							|  |  |  | such as the \var{delimiter} or \var{quotechar}, or which contain new-line | 
					
						
							|  |  |  | characters.  It defaults to \code{'"'}. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{quoting} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | Controls when quotes should be generated by the writer and recognised | 
					
						
							|  |  |  | by the reader.  It can take on any of the \constant{QUOTE_*} constants | 
					
						
							|  |  |  | (see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}. | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[Dialect]{skipinitialspace} | 
					
						
							|  |  |  | When \constant{True}, whitespace immediately following the \var{delimiter} | 
					
						
							|  |  |  | is ignored.  The default is \constant{False}. | 
					
						
							|  |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \subsection{Reader Objects} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | Reader objects (\class{DictReader} instances and objects returned by | 
					
						
							| 
									
										
										
										
											2003-08-31 05:44:54 +00:00
										 |  |  | the \function{reader()} function) have the following public methods: | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \begin{methoddesc}[csv reader]{next}{} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Return the next row of the reader's iterable object as a list, parsed | 
					
						
							|  |  |  | according to the current dialect. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | Reader objects have the following public attributes: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[csv reader]{dialect} | 
					
						
							|  |  |  | A read-only description of the dialect in use by the parser. | 
					
						
							|  |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[csv reader]{line_num} | 
					
						
							|  |  |  |  The number of lines read from the source iterator. This is not the same | 
					
						
							|  |  |  |  as the number of records returned, as records can span multiple lines. | 
					
						
							|  |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \subsection{Writer Objects} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2004-01-21 13:47:04 +00:00
										 |  |  | \class{Writer} objects (\class{DictWriter} instances and objects returned by | 
					
						
							|  |  |  | the \function{writer()} function) have the following public methods.  A | 
					
						
							|  |  |  | {}\var{row} must be a sequence of strings or numbers for \class{Writer} | 
					
						
							|  |  |  | objects and a dictionary mapping fieldnames to strings or numbers (by | 
					
						
							|  |  |  | passing them through \function{str()} first) for {}\class{DictWriter} | 
					
						
							|  |  |  | objects.  Note that complex numbers are written out surrounded by parens. | 
					
						
							|  |  |  | This may cause some problems for other programs which read CSV files | 
					
						
							|  |  |  | (assuming they support complex numbers at all). | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \begin{methoddesc}[csv writer]{writerow}{row} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | Write the \var{row} parameter to the writer's file object, formatted | 
					
						
							|  |  |  | according to the current dialect. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | \begin{methoddesc}[csv writer]{writerows}{rows} | 
					
						
							| 
									
										
										
										
											2004-01-21 13:47:04 +00:00
										 |  |  | Write all the \var{rows} parameters (a list of \var{row} objects as | 
					
						
							|  |  |  | described above) to the writer's file object, formatted | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | according to the current dialect. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | Writer objects have the following public attribute: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{memberdesc}[csv writer]{dialect} | 
					
						
							|  |  |  | A read-only description of the dialect in use by the writer. | 
					
						
							|  |  |  | \end{memberdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  | \subsection{Examples\label{csv-examples}} | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | The simplest example of reading a CSV file: | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | import csv | 
					
						
							| 
									
										
										
										
											2004-08-07 15:11:24 +00:00
										 |  |  | reader = csv.reader(open("some.csv", "rb")) | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | for row in reader: | 
					
						
							|  |  |  |     print row | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | Reading a file with an alternate format: | 
					
						
							| 
									
										
										
										
											2004-07-08 19:49:10 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import csv | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE) | 
					
						
							| 
									
										
										
										
											2004-07-08 19:49:10 +00:00
										 |  |  | for row in reader: | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  |     print row | 
					
						
							| 
									
										
										
										
											2004-07-08 19:49:10 +00:00
										 |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | The corresponding simplest possible writing example is: | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							| 
									
										
										
										
											2003-04-25 18:02:34 +00:00
										 |  |  | import csv | 
					
						
							| 
									
										
										
										
											2004-08-07 15:11:24 +00:00
										 |  |  | writer = csv.writer(open("some.csv", "wb")) | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | writer.writerows(someiterable) | 
					
						
							| 
									
										
										
										
											2003-03-20 23:29:12 +00:00
										 |  |  | \end{verbatim} | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | Registering a new dialect: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import csv | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | reader = csv.reader(open("passwd", "rb"), 'unixpwd') | 
					
						
							|  |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | A slightly more advanced use of the reader - catching and reporting errors: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import csv, sys | 
					
						
							|  |  |  | filename = "some.csv" | 
					
						
							|  |  |  | reader = csv.reader(open(filename, "rb")) | 
					
						
							|  |  |  | try: | 
					
						
							|  |  |  |     for row in reader: | 
					
						
							|  |  |  |         print row | 
					
						
							|  |  |  | except csv.Error, e: | 
					
						
							|  |  |  |     sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
 | 
					
						
							|  |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | And while the module doesn't directly support parsing strings, it can | 
					
						
							|  |  |  | easily be done: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import csv | 
					
						
							| 
									
										
										
										
											2006-02-16 14:57:05 +00:00
										 |  |  | for row in csv.reader(['one,two,three']): | 
					
						
							|  |  |  |     print row | 
					
						
							| 
									
										
										
										
											2005-01-12 11:47:57 +00:00
										 |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  | The \module{csv} module doesn't directly support reading and writing | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  | Unicode, but it is 8-bit-clean save for some problems with \ASCII{} NUL | 
					
						
							|  |  |  | characters.  So you can write functions or classes that handle the | 
					
						
							|  |  |  | encoding and decoding for you as long as you avoid encodings like | 
					
						
							|  |  |  | UTF-16 that use NULs.  UTF-8 is recommended. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \function{unicode_csv_reader} below is a generator that wraps | 
					
						
							|  |  |  | \class{csv.reader} to handle Unicode CSV data (a list of Unicode | 
					
						
							|  |  |  | strings).  \function{utf_8_encoder} is a generator that encodes the | 
					
						
							|  |  |  | Unicode strings as UTF-8, one string (or row) at a time.  The encoded | 
					
						
							|  |  |  | strings are parsed by the CSV reader, and | 
					
						
							|  |  |  | \function{unicode_csv_reader} decodes the UTF-8-encoded cells back | 
					
						
							|  |  |  | into Unicode: | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import csv | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  | def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs): | 
					
						
							|  |  |  |     # csv.py doesn't do Unicode; encode temporarily as UTF-8: | 
					
						
							|  |  |  |     csv_reader = csv.reader(utf_8_encoder(unicode_csv_data), | 
					
						
							|  |  |  |                             dialect=dialect, **kwargs) | 
					
						
							|  |  |  |     for row in csv_reader: | 
					
						
							|  |  |  |         # decode UTF-8 back to Unicode, cell by cell: | 
					
						
							|  |  |  |         yield [unicode(cell, 'utf-8') for cell in row] | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | def utf_8_encoder(unicode_csv_data): | 
					
						
							|  |  |  |     for line in unicode_csv_data: | 
					
						
							|  |  |  |         yield line.encode('utf-8') | 
					
						
							|  |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For all other encodings the following \class{UnicodeReader} and | 
					
						
							|  |  |  | \class{UnicodeWriter} classes can be used. They take an additional | 
					
						
							|  |  |  | \var{encoding} parameter in their constructor and make sure that the data | 
					
						
							|  |  |  | passes the real reader or writer encoded as UTF-8: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import csv, codecs, cStringIO | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | class UTF8Recoder: | 
					
						
							|  |  |  |     """ | 
					
						
							|  |  |  |     Iterator that reads an encoded stream and reencodes the input to UTF-8 | 
					
						
							|  |  |  |     """ | 
					
						
							|  |  |  |     def __init__(self, f, encoding): | 
					
						
							|  |  |  |         self.reader = codecs.getreader(encoding)(f) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def __iter__(self): | 
					
						
							|  |  |  |         return self | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def next(self): | 
					
						
							|  |  |  |         return self.reader.next().encode("utf-8") | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  | class UnicodeReader: | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |     """ | 
					
						
							|  |  |  |     A CSV reader which will iterate over lines in the CSV file "f", | 
					
						
							|  |  |  |     which is encoded in the given encoding. | 
					
						
							|  |  |  |     """ | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  |     def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |         f = UTF8Recoder(f, encoding) | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  |         self.reader = csv.reader(f, dialect=dialect, **kwds) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def next(self): | 
					
						
							|  |  |  |         row = self.reader.next() | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |         return [unicode(s, "utf-8") for s in row] | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  |     def __iter__(self): | 
					
						
							|  |  |  |         return self | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | class UnicodeWriter: | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |     """ | 
					
						
							|  |  |  |     A CSV writer which will write rows to CSV file "f", | 
					
						
							|  |  |  |     which is encoded in the given encoding. | 
					
						
							|  |  |  |     """ | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  |     def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |         # Redirect output to a queue | 
					
						
							|  |  |  |         self.queue = cStringIO.StringIO() | 
					
						
							|  |  |  |         self.writer = csv.writer(self.queue, dialect=dialect, **kwds) | 
					
						
							|  |  |  |         self.stream = f | 
					
						
							|  |  |  |         self.encoder = codecs.getincrementalencoder(encoding)() | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  |     def writerow(self, row): | 
					
						
							| 
									
										
										
										
											2006-04-21 10:40:58 +00:00
										 |  |  |         self.writer.writerow([s.encode("utf-8") for s in row]) | 
					
						
							|  |  |  |         # Fetch UTF-8 output from the queue ... | 
					
						
							|  |  |  |         data = self.queue.getvalue() | 
					
						
							|  |  |  |         data = data.decode("utf-8") | 
					
						
							|  |  |  |         # ... and reencode it into the target encoding | 
					
						
							|  |  |  |         data = self.encoder.encode(data) | 
					
						
							|  |  |  |         # write to the target stream | 
					
						
							|  |  |  |         self.stream.write(data) | 
					
						
							|  |  |  |         # empty queue | 
					
						
							|  |  |  |         self.queue.truncate(0) | 
					
						
							| 
									
										
										
										
											2005-03-18 16:56:37 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  |     def writerows(self, rows): | 
					
						
							|  |  |  |         for row in rows: | 
					
						
							|  |  |  |             self.writerow(row) | 
					
						
							|  |  |  | \end{verbatim} |