\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.  There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it.  The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications.  These differences can make it annoying to process CSV files
from multiple sources.  Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format.  It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel.  Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences.  Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\note{The first version of the \module{csv} module doesn't support Unicode
input.  Also, there are currently some issues regarding \ASCII{} NUL
characters.  Accordingly, all input should generally be plain \ASCII{} to be
safe.  These restrictions will be removed in the future.}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents}


The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}.  \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called.  An optional \var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect.  It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function.  The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect.  For more information about the dialect and formatting
parameters, see section~\ref{fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings.  No automatic data type
conversion is performed.
\end{funcdesc}
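
The iterator-based interface described for \function{reader} can be
sketched as follows.  This is only an illustration: it assumes a current
Python~3 interpreter rather than the 2.3 release documented here, and the
sample lines are invented.  A plain list of strings stands in for
\var{csvfile}, and the result shows that numeric-looking fields stay strings.

```python
import csv

# Any iterable that yields one string per line can stand in for a file.
lines = ["name,qty", "widget,3", '"nut, hex",12']

rows = list(csv.reader(lines))

# Every field comes back as a string; numbers are not converted.
for row in rows:
    print(row)
```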

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object.  An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect.  It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function.  The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect.  For more information
about the dialect and formatting parameters, see
section~\ref{fmt-params}, ``Dialects and Formatting Parameters''.  To
make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string.  While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call.  All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
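
The handling of \constant{None} and of non-string values can be seen in a
small sketch.  It assumes Python~3, where \code{io.StringIO} serves as the
file-like object; the row contents are made up for the example.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# None becomes an empty field; the integer and float are stringified.
writer.writerow(["spam", None, 42, 1.5])

output = buffer.getvalue()
print(repr(output))
```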

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}.  \var{dialect} must be a subclass
of \class{csv.Dialect}.  \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry.  An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}.  An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
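
The four registry functions might be used together roughly as shown
here.  The class \code{PipeDialect} and the name \code{'pipes'} are
hypothetical, chosen only for this sketch, and the code assumes Python~3.

```python
import csv

class PipeDialect(csv.Dialect):
    """Hypothetical dialect for pipe-separated files."""
    delimiter = "|"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL

csv.register_dialect("pipes", PipeDialect)
registered = "pipes" in csv.list_dialects()

# The registered name can now be passed wherever a dialect is expected.
rows = list(csv.reader(["a|b|c"], dialect="pipes"))

# After unregistering, looking the name up again raises csv.Error.
csv.unregister_dialect("pipes")
try:
    csv.get_dialect("pipes")
    lookup_failed = False
except csv.Error:
    lookup_failed = True
```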


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
                              restkey=\code{None}\optional{,
                              restval=\code{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              fmtparam}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the \var{fieldnames}
parameter.  If the row read has more fields than the fieldnames sequence,
the remaining data is added as a sequence keyed by the value of
\var{restkey}.  If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter.  All other parameters are interpreted as for regular readers.
\end{classdesc}
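
A short sketch of how \var{restkey} and \var{restval} come into play,
assuming Python~3.  The field names, the key \code{'overflow'} and the
sample rows are invented for illustration.

```python
import csv

# First row has two surplus fields, second row is missing one.
lines = ["alice,30,extra1,extra2", "bob"]

reader = csv.DictReader(
    lines,
    fieldnames=["name", "age"],
    restkey="overflow",   # key that collects surplus fields
    restval="unknown",    # value used for missing fields
)
records = [dict(row) for row in reader]
```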


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=\code{""}\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{, fmtparam}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows.  The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}.  The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}.  If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take.  If it is set
to \code{'raise'}, a \exception{ValueError} is raised.  If it is set to
\code{'ignore'}, extra values in the dictionary are ignored.  All other
parameters are interpreted as for regular writers.
\end{classdesc}
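
The two \var{extrasaction} settings and the effect of \var{restval} can be
compared in a sketch like the following.  It assumes Python~3 and uses
made-up field names and rows.

```python
import csv
import io

fieldnames = ["name", "age"]

# Lenient writer: fill missing keys with "n/a", drop unknown keys.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames, restval="n/a",
                        extrasaction="ignore")
writer.writerow({"name": "alice", "age": 30})
writer.writerow({"name": "bob", "shoe_size": 44})  # no age, one extra key
output = buffer.getvalue()

# Default writer: extrasaction is 'raise', so an unknown key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames)
try:
    strict.writerow({"name": "carol", "shoe_size": 38})
    raised = False
except ValueError:
    raised = True
```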


\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.  Dialect objects support the
following data attributes:

\begin{memberdesc}[string]{delimiter}
A one-character string used to separate fields.  It defaults to \code{","}.
\end{memberdesc}

\begin{memberdesc}[boolean]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted.  When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}.  It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}.  It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[string]{lineterminator}
The string used to terminate lines in the CSV file.  It defaults to
\code{"\e r\e n"}.
\end{memberdesc}

\begin{memberdesc}[string]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}.  It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[integer]{quoting}
Controls when quotes should be generated by the writer.  It can take on any
of the \code{QUOTE_*} constants defined below and defaults to
\constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[boolean]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored.  The default is \constant{False}.
\end{memberdesc}

\end{classdesc*}
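
To make the \var{doublequote}, \var{escapechar} and \var{skipinitialspace}
attributes more concrete, the sketch below passes them as keyword
arguments.  It assumes Python~3; \code{write_row} is a hypothetical helper
defined only for this example, and the exact escaped output may differ
between releases, so only the doubled form is spelled out in the comment.

```python
import csv
import io

def write_row(row, **fmtparams):
    """Hypothetical helper: format one row and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()

row = ['say "hi"', "plain"]

# doublequote defaults to True: embedded quotes are doubled,
# giving "say ""hi""",plain followed by the line terminator.
doubled = write_row(row)

# With doublequote off, each embedded quote is prefixed by escapechar.
escaped = write_row(row, doublequote=False, escapechar="\\")

# Reading the escaped text back with the same settings restores the row.
restored = list(csv.reader([escaped], doublequote=False, escapechar="\\"))

# skipinitialspace drops the blank after each delimiter when reading.
spaced = list(csv.reader(["a, b, c"], skipinitialspace=True))
```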

The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields.  When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character.  When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}
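
The effect of three of these constants on the same row can be compared
side by side.  This is a sketch under the assumption of Python~3;
\code{render} is a hypothetical helper and the row is invented.

```python
import csv
import io

def render(row, **fmtparams):
    """Hypothetical helper: format one row and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()

row = ["a,b", "plain", 7]

# Only the field containing the delimiter is quoted.
minimal = render(row, quoting=csv.QUOTE_MINIMAL)

# Both strings are quoted; the integer is left bare.
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)

# Nothing is quoted; the embedded delimiter is escaped instead.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")
```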


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects.  A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method.  When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass
of the \class{Dialect} class as the dialect parameter.  In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined above for the \class{Dialect} class.
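
Both ways of selecting a format, by dialect and by individual parameter,
are sketched below for semicolon-separated input.  The code assumes
Python~3.  \code{Semicolon} is a hypothetical subclass, and
\code{csv.excel}, the class behind the \code{'excel'} dialect name, is
provided by the released module although it is not described in this
section.

```python
import csv

lines = ["1;2;3", "4;5;6"]

# Start from the 'excel' dialect but override its delimiter.
by_param = list(csv.reader(lines, dialect="excel", delimiter=";"))

# The same settings bundled into a Dialect subclass.
class Semicolon(csv.excel):
    delimiter = ";"

by_class = list(csv.reader(lines, dialect=Semicolon))
```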


\subsection{Reader Objects}

\class{DictReader} and \class{reader} objects have the following public
methods:

\begin{methoddesc}{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}
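
Fetching a single row explicitly is handy for peeling off a header line.
The sketch assumes Python~3, where the built-in \function{next()} takes the
place of calling the \method{next} method directly; the data is invented.

```python
import csv

reader = csv.reader(["h1,h2", "1,2", "3,4"])

header = next(reader)      # consume one row explicitly
remaining = list(reader)   # iteration continues where next() stopped
```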


\subsection{Writer Objects}

\class{DictWriter} and \class{writer} objects have the following public
methods:

\begin{methoddesc}{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}{writerows}{rows}
Write each row in the \var{rows} parameter to the writer's file object,
formatted according to the current dialect.
\end{methoddesc}
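
The difference between the two methods is simply one row versus many, as
this sketch suggests.  It assumes Python~3 and an in-memory
\code{io.StringIO} target; the rows are made up.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow(["id", "name"])             # one row
writer.writerows([[1, "ann"], [2, "bob"]])  # any iterable of rows

output = buffer.getvalue()
```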


\subsection{Examples}

The ``Hello, world'' of csv reading is

\begin{verbatim}
    import csv

    reader = csv.reader(file("some.csv"))
    for row in reader:
        print row
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
    import csv
    # someiterable stands for any iterable of row sequences
    writer = csv.writer(file("some.csv", "w"))
    for row in someiterable:
        writer.writerow(row)
\end{verbatim}
