mirror of
				https://github.com/python/cpython.git
				synced 2025-11-04 07:31:38 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			461 lines
		
	
	
	
		
			18 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
			
		
		
	
	
			461 lines
		
	
	
	
		
			18 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
\section{\module{csv} --- CSV File Reading and Writing}
 | 
						|
 | 
						|
\declaremodule{standard}{csv}
 | 
						|
\modulesynopsis{Write and read tabular data to and from delimited files.}
 | 
						|
\sectionauthor{Skip Montanaro}{skip@pobox.com}
 | 
						|
 | 
						|
\versionadded{2.3}
 | 
						|
\index{csv}
 | 
						|
\indexii{data}{tabular}
 | 
						|
 | 
						|
The so-called CSV (Comma Separated Values) format is the most common import
 | 
						|
and export format for spreadsheets and databases.  There is no ``CSV
 | 
						|
standard'', so the format is operationally defined by the many applications
 | 
						|
which read and write it.  The lack of a standard means that subtle
 | 
						|
differences often exist in the data produced and consumed by different
 | 
						|
applications.  These differences can make it annoying to process CSV files
 | 
						|
from multiple sources.  Still, while the delimiters and quoting characters
 | 
						|
vary, the overall format is similar enough that it is possible to write a
 | 
						|
single module which can efficiently manipulate such data, hiding the details
 | 
						|
of reading and writing the data from the programmer.
 | 
						|
 | 
						|
The \module{csv} module implements classes to read and write tabular data in
 | 
						|
CSV format.  It allows programmers to say, ``write this data in the format
 | 
						|
preferred by Excel,'' or ``read data from this file which was generated by
 | 
						|
Excel,'' without knowing the precise details of the CSV format used by
 | 
						|
Excel.  Programmers can also describe the CSV formats understood by other
 | 
						|
applications or define their own special-purpose CSV formats.
 | 
						|
 | 
						|
The \module{csv} module's \class{reader} and \class{writer} objects read and
 | 
						|
write sequences.  Programmers can also read and write data in dictionary
 | 
						|
form using the \class{DictReader} and \class{DictWriter} classes.
 | 
						|
 | 
						|
\begin{notice}
 | 
						|
  This version of the \module{csv} module doesn't support Unicode
 | 
						|
  input.  Also, there are currently some issues regarding \ASCII{} NUL
 | 
						|
  characters.  Accordingly, all input should generally be printable
 | 
						|
  \ASCII{} to be safe.  These restrictions will be removed in the future.
 | 
						|
\end{notice}
 | 
						|
 | 
						|
\begin{seealso}
 | 
						|
%  \seemodule{array}{Arrays of uniformly types numeric values.}
 | 
						|
  \seepep{305}{CSV File API}
 | 
						|
         {The Python Enhancement Proposal which proposed this addition
 | 
						|
          to Python.}
 | 
						|
\end{seealso}
 | 
						|
 | 
						|
 | 
						|
\subsection{Module Contents \label{csv-contents}}
 | 
						|
 | 
						|
The \module{csv} module defines the following functions:
 | 
						|
 | 
						|
\begin{funcdesc}{reader}{csvfile\optional{,
 | 
						|
                         dialect=\code{'excel'}}\optional{, fmtparam}}
 | 
						|
Return a reader object which will iterate over lines in the given
 | 
						|
{}\var{csvfile}.  \var{csvfile} can be any object which supports the
 | 
						|
iterator protocol and returns a string each time its \method{next}
 | 
						|
method is called - file objects and list objects are both suitable.  
 | 
						|
If \var{csvfile} is a file object, it must be opened with
 | 
						|
the 'b' flag on platforms where that makes a difference.  An optional
 | 
						|
{}\var{dialect} parameter can be given
 | 
						|
which is used to define a set of parameters specific to a particular CSV
 | 
						|
dialect.  It may be an instance of a subclass of the \class{Dialect}
 | 
						|
class or one of the strings returned by the \function{list_dialects}
 | 
						|
function.  The other optional {}\var{fmtparam} keyword arguments can be
 | 
						|
given to override individual formatting parameters in the current
 | 
						|
dialect.  For more information about the dialect and formatting
 | 
						|
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
 | 
						|
Parameters'' for details of these parameters.
 | 
						|
 | 
						|
All data read are returned as strings.  No automatic data type
 | 
						|
conversion is performed.
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
\begin{funcdesc}{writer}{csvfile\optional{,
 | 
						|
                         dialect=\code{'excel'}}\optional{, fmtparam}}
 | 
						|
Return a writer object responsible for converting the user's data into
 | 
						|
delimited strings on the given file-like object.  \var{csvfile} can be any
 | 
						|
object with a \function{write} method.  If \var{csvfile} is a file object,
 | 
						|
it must be opened with the 'b' flag on platforms where that makes a
 | 
						|
difference.  An optional
 | 
						|
{}\var{dialect} parameter can be given which is used to define a set of
 | 
						|
parameters specific to a particular CSV dialect.  It may be an instance
 | 
						|
of a subclass of the \class{Dialect} class or one of the strings
 | 
						|
returned by the \function{list_dialects} function.  The other optional
 | 
						|
{}\var{fmtparam} keyword arguments can be given to override individual
 | 
						|
formatting parameters in the current dialect.  For more information
 | 
						|
about the dialect and formatting parameters, see
 | 
						|
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters'' for
 | 
						|
details of these parameters.  To make it as easy as possible to
 | 
						|
interface with modules which implement the DB API, the value
 | 
						|
\constant{None} is written as the empty string.  While this isn't a
 | 
						|
reversible transformation, it makes it easier to dump SQL NULL data values
 | 
						|
to CSV files without preprocessing the data returned from a
 | 
						|
\code{cursor.fetch*()} call.  All other non-string data are stringified
 | 
						|
with \function{str()} before being written.
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
\begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}}
 | 
						|
Associate \var{dialect} with \var{name}.  \var{name} must be a string
 | 
						|
or Unicode object. The dialect can be specified either by passing a
 | 
						|
sub-class of \class{Dialect}, or by \var{fmtparam} keyword arguments,
 | 
						|
or both, with keyword arguments overriding parameters of the dialect.
 | 
						|
For more information about the dialect and formatting parameters, see
 | 
						|
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''
 | 
						|
for details of these parameters.
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
\begin{funcdesc}{unregister_dialect}{name}
 | 
						|
Delete the dialect associated with \var{name} from the dialect registry.  An
 | 
						|
\exception{Error} is raised if \var{name} is not a registered dialect
 | 
						|
name.
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
\begin{funcdesc}{get_dialect}{name}
 | 
						|
Return the dialect associated with \var{name}.  An \exception{Error} is
 | 
						|
raised if \var{name} is not a registered dialect name.
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
\begin{funcdesc}{list_dialects}{}
 | 
						|
Return the names of all registered dialects.
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
\begin{funcdesc}{field_size_limit}{\optional{new_limit}}
 | 
						|
  Returns the current maximum field size allowed by the parser. If
 | 
						|
  \var{new_limit} is given, this becomes the new limit.
 | 
						|
  \versionadded{2.5}
 | 
						|
\end{funcdesc}
 | 
						|
 | 
						|
 | 
						|
The \module{csv} module defines the following classes:
 | 
						|
 | 
						|
\begin{classdesc}{DictReader}{csvfile\optional{,
 | 
						|
			      fieldnames=\constant{None},\optional{,
 | 
						|
                              restkey=\constant{None}\optional{,
 | 
						|
			      restval=\constant{None}\optional{,
 | 
						|
                              dialect=\code{'excel'}\optional{,
 | 
						|
			      *args, **kwds}}}}}}
 | 
						|
Create an object which operates like a regular reader but maps the
 | 
						|
information read into a dict whose keys are given by the optional
 | 
						|
{} \var{fieldnames}
 | 
						|
parameter.  If the \var{fieldnames} parameter is omitted, the values in
 | 
						|
the first row of the \var{csvfile} will be used as the fieldnames.
 | 
						|
If the row read has fewer fields than the fieldnames sequence,
 | 
						|
the value of \var{restval} will be used as the default value.  If the row
 | 
						|
read has more fields than the fieldnames sequence, the remaining data is
 | 
						|
added as a sequence keyed by the value of \var{restkey}.  If the row read
 | 
						|
has fewer fields than the fieldnames sequence, the remaining keys take the
 | 
						|
value of the optional \var{restval} parameter.  Any other optional or
 | 
						|
keyword arguments are passed to the underlying \class{reader} instance.
 | 
						|
\end{classdesc}
 | 
						|
 | 
						|
 | 
						|
\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
 | 
						|
                              restval=""\optional{,
 | 
						|
                              extrasaction=\code{'raise'}\optional{,
 | 
						|
                              dialect=\code{'excel'}\optional{,
 | 
						|
			      *args, **kwds}}}}}
 | 
						|
Create an object which operates like a regular writer but maps dictionaries
 | 
						|
onto output rows.  The \var{fieldnames} parameter identifies the order in
 | 
						|
which values in the dictionary passed to the \method{writerow()} method are
 | 
						|
written to the \var{csvfile}.  The optional \var{restval} parameter
 | 
						|
specifies the value to be written if the dictionary is missing a key in
 | 
						|
\var{fieldnames}.  If the dictionary passed to the \method{writerow()}
 | 
						|
method contains a key not found in \var{fieldnames}, the optional
 | 
						|
\var{extrasaction} parameter indicates what action to take.  If it is set
 | 
						|
to \code{'raise'} a \exception{ValueError} is raised.  If it is set to
 | 
						|
\code{'ignore'}, extra values in the dictionary are ignored.  Any other
 | 
						|
optional or keyword arguments are passed to the underlying \class{writer}
 | 
						|
instance.
 | 
						|
 | 
						|
Note that unlike the \class{DictReader} class, the \var{fieldnames}
 | 
						|
parameter of the \class{DictWriter} is not optional.  Since Python's
 | 
						|
\class{dict} objects are not ordered, there is not enough information
 | 
						|
available to deduce the order in which the row should be written to the
 | 
						|
\var{csvfile}.
 | 
						|
 | 
						|
\end{classdesc}
 | 
						|
 | 
						|
\begin{classdesc*}{Dialect}{}
 | 
						|
The \class{Dialect} class is a container class relied on primarily for its
 | 
						|
attributes, which are used to define the parameters for a specific
 | 
						|
\class{reader} or \class{writer} instance.
 | 
						|
\end{classdesc*}
 | 
						|
 | 
						|
\begin{classdesc}{excel}{}
 | 
						|
The \class{excel} class defines the usual properties of an Excel-generated
 | 
						|
CSV file.
 | 
						|
\end{classdesc}
 | 
						|
 | 
						|
\begin{classdesc}{excel_tab}{}
 | 
						|
The \class{excel_tab} class defines the usual properties of an
 | 
						|
Excel-generated TAB-delimited file.
 | 
						|
\end{classdesc}
 | 
						|
 | 
						|
\begin{classdesc}{Sniffer}{}
 | 
						|
The \class{Sniffer} class is used to deduce the format of a CSV file.
 | 
						|
\end{classdesc}
 | 
						|
 | 
						|
The \class{Sniffer} class provides a single method:
 | 
						|
 | 
						|
\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
 | 
						|
Analyze the given \var{sample} and return a \class{Dialect} subclass
 | 
						|
reflecting the parameters found.  If the optional \var{delimiters} parameter
 | 
						|
is given, it is interpreted as a string containing possible valid delimiter
 | 
						|
characters.
 | 
						|
\end{methoddesc}
 | 
						|
 | 
						|
\begin{methoddesc}{has_header}{sample}
 | 
						|
Analyze the sample text (presumed to be in CSV format) and return
 | 
						|
\constant{True} if the first row appears to be a series of column
 | 
						|
headers.
 | 
						|
\end{methoddesc}
 | 
						|
 | 
						|
 | 
						|
The \module{csv} module defines the following constants:
 | 
						|
 | 
						|
\begin{datadesc}{QUOTE_ALL}
 | 
						|
Instructs \class{writer} objects to quote all fields.
 | 
						|
\end{datadesc}
 | 
						|
 | 
						|
\begin{datadesc}{QUOTE_MINIMAL}
 | 
						|
Instructs \class{writer} objects to only quote those fields which contain
 | 
						|
special characters such as \var{delimiter}, \var{quotechar} or any of the
 | 
						|
characters in \var{lineterminator}.
 | 
						|
\end{datadesc}
 | 
						|
 | 
						|
\begin{datadesc}{QUOTE_NONNUMERIC}
 | 
						|
Instructs \class{writer} objects to quote all non-numeric
 | 
						|
fields. 
 | 
						|
 | 
						|
Instructs the reader to convert all non-quoted fields to type \var{float}.
 | 
						|
\end{datadesc}
 | 
						|
 | 
						|
\begin{datadesc}{QUOTE_NONE}
 | 
						|
Instructs \class{writer} objects to never quote fields.  When the current
 | 
						|
\var{delimiter} occurs in output data it is preceded by the current
 | 
						|
\var{escapechar} character.  If \var{escapechar} is not set, the writer
 | 
						|
will raise \exception{Error} if any characters that require escaping
 | 
						|
are encountered.
 | 
						|
 | 
						|
Instructs \class{reader} to perform no special processing of quote characters.
 | 
						|
\end{datadesc}
 | 
						|
 | 
						|
 | 
						|
The \module{csv} module defines the following exception:
 | 
						|
 | 
						|
\begin{excdesc}{Error}
 | 
						|
Raised by any of the functions when an error is detected.
 | 
						|
\end{excdesc}
 | 
						|
 | 
						|
 | 
						|
\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}
 | 
						|
 | 
						|
To make it easier to specify the format of input and output records,
 | 
						|
specific formatting parameters are grouped together into dialects.  A
 | 
						|
dialect is a subclass of the \class{Dialect} class having a set of specific
 | 
						|
methods and a single \method{validate()} method.  When creating \class{reader}
 | 
						|
or \class{writer} objects, the programmer can specify a string or a subclass
 | 
						|
of the \class{Dialect} class as the dialect parameter.  In addition to, or
 | 
						|
instead of, the \var{dialect} parameter, the programmer can also specify
 | 
						|
individual formatting parameters, which have the same names as the
 | 
						|
attributes defined below for the \class{Dialect} class.
 | 
						|
 | 
						|
Dialects support the following attributes:
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{delimiter}
 | 
						|
A one-character string used to separate fields.  It defaults to \code{','}.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{doublequote}
 | 
						|
Controls how instances of \var{quotechar} appearing inside a field should
 | 
						|
be themselves be quoted.  When \constant{True}, the character is doubled.
 | 
						|
When \constant{False}, the \var{escapechar} is used as a prefix to the
 | 
						|
\var{quotechar}.  It defaults to \constant{True}.
 | 
						|
 | 
						|
On output, if \var{doublequote} is \constant{False} and no
 | 
						|
\var{escapechar} is set, \exception{Error} is raised if a \var{quotechar}
 | 
						|
is found in a field.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{escapechar}
 | 
						|
A one-character string used by the writer to escape the \var{delimiter} if
 | 
						|
\var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar}
 | 
						|
if \var{doublequote} is \constant{False}. On reading, the \var{escapechar}
 | 
						|
removes any special meaning from the following character. It defaults
 | 
						|
to \constant{None}, which disables escaping.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{lineterminator}
 | 
						|
The string used to terminate lines produced by the \class{writer}.
 | 
						|
It defaults to \code{'\e r\e n'}. 
 | 
						|
 | 
						|
\note{The \class{reader} is hard-coded to recognise either \code{'\e r'}
 | 
						|
or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}. This
 | 
						|
behavior may change in the future.}
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{quotechar}
 | 
						|
A one-character string used to quote fields containing special characters,
 | 
						|
such as the \var{delimiter} or \var{quotechar}, or which contain new-line
 | 
						|
characters.  It defaults to \code{'"'}.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{quoting}
 | 
						|
Controls when quotes should be generated by the writer and recognised
 | 
						|
by the reader.  It can take on any of the \constant{QUOTE_*} constants
 | 
						|
(see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[Dialect]{skipinitialspace}
 | 
						|
When \constant{True}, whitespace immediately following the \var{delimiter}
 | 
						|
is ignored.  The default is \constant{False}.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
 | 
						|
\subsection{Reader Objects}
 | 
						|
 | 
						|
Reader objects (\class{DictReader} instances and objects returned by
 | 
						|
the \function{reader()} function) have the following public methods:
 | 
						|
 | 
						|
\begin{methoddesc}[csv reader]{next}{}
 | 
						|
Return the next row of the reader's iterable object as a list, parsed
 | 
						|
according to the current dialect.
 | 
						|
\end{methoddesc}
 | 
						|
 | 
						|
Reader objects have the following public attributes:
 | 
						|
 | 
						|
\begin{memberdesc}[csv reader]{dialect}
 | 
						|
A read-only description of the dialect in use by the parser.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
\begin{memberdesc}[csv reader]{line_num}
 | 
						|
 The number of lines read from the source iterator. This is not the same
 | 
						|
 as the number of records returned, as records can span multiple lines.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
 | 
						|
\subsection{Writer Objects}
 | 
						|
 | 
						|
\class{Writer} objects (\class{DictWriter} instances and objects returned by
 | 
						|
the \function{writer()} function) have the following public methods.  A
 | 
						|
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
 | 
						|
objects and a dictionary mapping fieldnames to strings or numbers (by
 | 
						|
passing them through \function{str()} first) for {}\class{DictWriter}
 | 
						|
objects.  Note that complex numbers are written out surrounded by parens.
 | 
						|
This may cause some problems for other programs which read CSV files
 | 
						|
(assuming they support complex numbers at all).
 | 
						|
 | 
						|
\begin{methoddesc}[csv writer]{writerow}{row}
 | 
						|
Write the \var{row} parameter to the writer's file object, formatted
 | 
						|
according to the current dialect.
 | 
						|
\end{methoddesc}
 | 
						|
 | 
						|
\begin{methoddesc}[csv writer]{writerows}{rows}
 | 
						|
Write all the \var{rows} parameters (a list of \var{row} objects as
 | 
						|
described above) to the writer's file object, formatted
 | 
						|
according to the current dialect.
 | 
						|
\end{methoddesc}
 | 
						|
 | 
						|
Writer objects have the following public attribute:
 | 
						|
 | 
						|
\begin{memberdesc}[csv writer]{dialect}
 | 
						|
A read-only description of the dialect in use by the writer.
 | 
						|
\end{memberdesc}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsection{Examples}
 | 
						|
 | 
						|
The simplest example of reading a CSV file:
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv
 | 
						|
reader = csv.reader(open("some.csv", "rb"))
 | 
						|
for row in reader:
 | 
						|
    print row
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
Reading a file with an alternate format:
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv
 | 
						|
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
 | 
						|
for row in reader:
 | 
						|
    print row
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
The corresponding simplest possible writing example is:
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv
 | 
						|
writer = csv.writer(open("some.csv", "wb"))
 | 
						|
writer.writerows(someiterable)
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
Registering a new dialect:
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv
 | 
						|
 | 
						|
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
 | 
						|
 | 
						|
reader = csv.reader(open("passwd", "rb"), 'unixpwd')
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
A slightly more advanced use of the reader - catching and reporting errors:
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv, sys
 | 
						|
filename = "some.csv"
 | 
						|
reader = csv.reader(open(filename, "rb"))
 | 
						|
try:
 | 
						|
    for row in reader:
 | 
						|
        print row
 | 
						|
except csv.Error, e:
 | 
						|
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
And while the module doesn't directly support parsing strings, it can
 | 
						|
easily be done:
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv
 | 
						|
print csv.reader(['one,two,three'])[0]
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
The \module{csv} module doesn't directly support reading and writing
 | 
						|
Unicode, but it is 8-bit clean save for some problems with \ASCII{} NUL
 | 
						|
characters, so you can write classes that handle the encoding and decoding
 | 
						|
for you as long as you avoid encodings like utf-16 that use NULs.
 | 
						|
 | 
						|
\begin{verbatim}
 | 
						|
import csv
 | 
						|
 | 
						|
class UnicodeReader:
 | 
						|
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
 | 
						|
        self.reader = csv.reader(f, dialect=dialect, **kwds)
 | 
						|
        self.encoding = encoding
 | 
						|
 | 
						|
    def next(self):
 | 
						|
        row = self.reader.next()
 | 
						|
        return [unicode(s, self.encoding) for s in row]
 | 
						|
 | 
						|
    def __iter__(self):
 | 
						|
        return self
 | 
						|
 | 
						|
class UnicodeWriter:
 | 
						|
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
 | 
						|
        self.writer = csv.writer(f, dialect=dialect, **kwds)
 | 
						|
        self.encoding = encoding
 | 
						|
 | 
						|
    def writerow(self, row):
 | 
						|
        self.writer.writerow([s.encode("utf-8") for s in row])
 | 
						|
 | 
						|
    def writerows(self, rows):
 | 
						|
        for row in rows:
 | 
						|
            self.writerow(row)
 | 
						|
\end{verbatim}
 | 
						|
 | 
						|
They should work just like the \class{csv.reader} and \class{csv.writer}
 | 
						|
classes but add an \var{encoding} parameter.
 |