2012-10-06 13:49:34 +02:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. _xml:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								XML Processing Modules
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								======================
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. module:: xml
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								   :synopsis: Package containing XML processing modules
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. sectionauthor:: Christian Heimes <christian@python.org>
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. sectionauthor:: Georg Brandl <georg@python.org>
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2012-10-06 13:49:34 +02:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Python's interfaces for processing XML are grouped in the ``xml`` package.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. warning::
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								   The XML modules are not secure against erroneous or maliciously
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								   constructed data.  If you need to parse untrusted or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								   unauthenticated data see the :ref:`xml-vulnerabilities` and
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								   :ref:`defused-packages` sections.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:47:23 +01:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2012-10-06 13:49:34 +02:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								It is important to note that modules in the :mod:`xml` package require that
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								there be at least one SAX-compliant XML parser available. The Expat parser is
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								included with Python, so the :mod:`xml.parsers.expat` module will always be
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								available.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages are the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								definition of the Python bindings for the DOM and SAX interfaces.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The XML handling submodules are:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
							 | 
						
					
						
							
								
									
										
										
										
											2014-01-31 11:30:36 -06:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  XML processor
							 | 
						
					
						
							
								
									
										
										
										
											2012-10-06 13:49:34 +02:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								..
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								* :mod:`xml.dom`: the DOM API definition
							 | 
						
					
						
							
								
									
										
										
										
											2013-12-22 01:57:01 +01:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								* :mod:`xml.dom.minidom`: a minimal DOM implementation
							 | 
						
					
						
							
								
									
										
										
										
											2012-10-06 13:49:34 +02:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								* :mod:`xml.dom.pulldom`: support for building partial DOM trees
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								..
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								* :mod:`xml.sax`: SAX2 base classes and convenience functions
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								* :mod:`xml.parsers.expat`: the Expat parser binding
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. _xml-vulnerabilities:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								XML vulnerabilities
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								-------------------
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The XML processing modules are not secure against maliciously constructed data.
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								An attacker can abuse XML features to carry out denial of service attacks,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								access local files, generate network connections to other machines, or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								circumvent firewalls.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								The following table gives an overview of the known attacks and whether
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the various modules are vulnerable to them.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								=========================  ========  =========  =========  ========  =========
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								kind                       sax       etree      minidom    pulldom   xmlrpc
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								=========================  ========  =========  =========  ========  =========
							 | 
						
					
						
							
								
									
										
										
										
											2013-10-12 18:19:33 +02:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								billion laughs             **Yes**   **Yes**    **Yes**    **Yes**   **Yes**
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								quadratic blowup           **Yes**   **Yes**    **Yes**    **Yes**   **Yes**
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								external entity expansion  **Yes**   No    (1)  No    (2)  **Yes**   No    (3)
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								DTD retrieval              **Yes**   No         No         **Yes**   No
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								decompression bomb         No        No         No         No        **Yes**
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								=========================  ========  =========  =========  ========  =========
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								   :exc:`ParserError` when an entity occurs.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								   the unexpanded entity verbatim.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								billion laughs / exponential entity expansion
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  uses multiple levels of nested entities. Each entity refers to another entity
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  several times, and the final entity definition contains a small string.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  The exponential expansion results in several gigabytes of text and
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  consumes lots of memory and CPU time.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								quadratic blowup entity expansion
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  entity expansion, too. Instead of nested entities it repeats one large entity
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  with a couple of thousand chars over and over again. The attack isn't as
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  efficient as the exponential case but it avoids triggering parser countermeasures
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  that forbid deeply-nested entities.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								external entity expansion
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Entity declarations can contain more than just text for replacement. They can
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  also point to external resources or local files. The XML
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  parser accesses the resource and embeds the content into the XML document.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								DTD retrieval
							 | 
						
					
						
							
								
									
										
										
										
											2014-01-13 13:51:17 -05:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  definitions from remote or local locations. The feature has similar
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  implications as the external entity expansion issue.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								decompression bomb
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  Decompression bombs (aka `ZIP bomb`_) apply to all XML libraries
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  that can parse compressed XML streams such as gzipped HTTP streams or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  LZMA-compressed
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  files. For an attacker it can reduce the amount of transmitted data by three
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  magnitudes or more.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								The documentation for `defusedxml`_ on PyPI has further information about
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								all known attack vectors with examples and references.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								.. _defused-packages:
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								The :mod:`defusedxml` and :mod:`defusedexpat` Packages
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								------------------------------------------------------
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								XML parsers that prevent any potentially malicious operation. Use of this
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								package is recommended for any server code that parses untrusted XML data. The
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								package also ships with example exploits and extended documentation on more
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								XML exploits such as XPath injection.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2014-03-15 21:13:56 -07:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								`defusedexpat`_ provides a modified libexpat and a patched
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								:mod:`pyexpat` module that have countermeasures against entity expansion
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								DoS attacks. The :mod:`defusedexpat` module still allows a sane and configurable amount of entity
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								expansions. The modifications may be included in some future release of Python,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								but will not be included in any bugfix releases of
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Python because they break backward compatibility.
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-28 09:11:44 +01:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
							 | 
						
					
						
							
								
									
										
										
										
											2013-03-26 17:35:55 +01:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. _Billion Laughs: http://en.wikipedia.org/wiki/Billion_laughs
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. _ZIP bomb: http://en.wikipedia.org/wiki/Zip_bomb
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								.. _DTD: http://en.wikipedia.org/wiki/Document_Type_Definition
							 |