.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules

.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>

**Source code:** :source:`Lib/xml/`

--------------

Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data.  If you need to parse untrusted or
   unauthenticated data, see the :ref:`xml-vulnerabilities` and
   :ref:`defusedxml-package` sections.

The modules in the :mod:`xml` package require at least one SAX-compliant XML
parser to be available. The Expat parser is included with Python, so the
:mod:`xml.parsers.expat` module is always available.

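As a quick check of the statement above, the following sketch asks
:mod:`xml.sax` for its default parser and reads the version string of the
bundled Expat library. The variable names are illustrative, and the driver
module that is reported may differ on builds that register other SAX drivers.

```python
import xml.sax
from xml.parsers import expat

# make_parser() returns the first SAX driver it can import; on a standard
# CPython build this is expected to be the Expat-based reader.
parser = xml.sax.make_parser()
driver_module = type(parser).__module__

# EXPAT_VERSION is a string describing the underlying Expat library.
expat_version = expat.EXPAT_VERSION

print(driver_module, expat_version)
```
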
The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding


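To give a feel for how two of the submodules listed above differ, here is a
minimal sketch that reads the same small, trusted document with
:mod:`xml.etree.ElementTree` and with :mod:`xml.dom.minidom`. The document and
the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# A small, trusted sample document (hypothetical data).
DOC = "<library><book id='1'>Dune</book><book id='2'>Emma</book></library>"

# ElementTree: elements behave like lightweight containers.
root = ET.fromstring(DOC)
et_titles = [book.text for book in root.findall("book")]

# minidom: the same data through the DOM interface, where element text
# lives in child Text nodes.
dom = minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("book")]

print(et_titles, dom_titles)
```

Both approaches read the same titles; ElementTree usually needs less code,
while minidom follows the DOM API more closely.
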
.. _xml-vulnerabilities:

XML vulnerabilities
-------------------

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse XML features to carry out denial of service attacks,
access local files, generate network connections to other machines, or
circumvent firewalls.

The following table gives an overview of the known attacks and whether
the various modules are vulnerable to them.

=========================  ==============   ===============   ==============   ==============   ==============
kind                       sax              etree             minidom          pulldom          xmlrpc
=========================  ==============   ===============   ==============   ==============   ==============
billion laughs             **Vulnerable**   **Vulnerable**    **Vulnerable**   **Vulnerable**   **Vulnerable**
quadratic blowup           **Vulnerable**   **Vulnerable**    **Vulnerable**   **Vulnerable**   **Vulnerable**
external entity expansion  Safe (4)         Safe (1)          Safe (2)         Safe (4)         Safe (3)
`DTD`_ retrieval           Safe (4)         Safe              Safe             Safe (4)         Safe
decompression bomb         Safe             Safe              Safe             Safe             **Vulnerable**
=========================  ==============   ===============   ==============   ==============   ==============

1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`~xml.etree.ElementTree.ParseError` when an entity occurs.
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
3. :mod:`xmlrpc.client` doesn't expand external entities and omits them.
4. Since Python 3.7.1, external general entities are no longer processed by
   default.


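The behaviour described in notes 1 and 4 can be observed with a short sketch.
The document below is hypothetical: its ``SYSTEM`` identifier is a placeholder
path that is never opened, and ``parse_with_elementtree`` is a helper written
for this example only, not part of the standard library.

```python
import xml.etree.ElementTree as ET
import xml.sax
from xml.sax.handler import feature_external_ges

# Hypothetical document: the SYSTEM identifier is a placeholder path and is
# never opened by this example.
DOCUMENT = (
    '<!DOCTYPE note [<!ENTITY ext SYSTEM "file:///nonexistent/example.txt">]>'
    "<note>&ext;</note>"
)


def parse_with_elementtree(text):
    """Return the root element's text, or None if ElementTree rejects the input."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        # Note 1: the external entity is not expanded and parsing fails.
        return None


result = parse_with_elementtree(DOCUMENT)

# Note 4: external general entities are off by default since 3.7.1; setting
# the feature explicitly makes the intent visible in the code.
sax_parser = xml.sax.make_parser()
sax_parser.setFeature(feature_external_ges, False)
external_ges_enabled = sax_parser.getFeature(feature_external_ges)

print(result, external_ges_enabled)
```

Setting the SAX feature explicitly is a defensive choice; it does not protect
against the entity expansion attacks listed in the table.
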
billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack, also known as exponential entity expansion,
  uses multiple levels of nested entities. Each entity refers to another entity
  several times, and the final entity definition contains a small string.
  The exponential expansion results in several gigabytes of text and
  consumes large amounts of memory and CPU time.

quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too. Instead of nested entities it repeats one large entity
  of a few thousand characters over and over again. The attack isn't as
  efficient as the exponential case, but it avoids triggering parser
  countermeasures that forbid deeply nested entities.

external entity expansion
  Entity declarations can contain more than just text for replacement. They can
  also point to external resources or local files. The XML
  parser accesses the resource and embeds the content into the XML document.

`DTD`_ retrieval
  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
  definitions from remote or local locations. The feature has similar
  implications to the external entity expansion issue.

decompression bomb
  Decompression bombs (aka `ZIP bomb`_) apply to all XML libraries
  that can parse compressed XML streams such as gzipped HTTP streams or
  LZMA-compressed files. Compression lets an attacker reduce the amount of
  transmitted data by three orders of magnitude or more.

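The compression ratio behind a decompression bomb, and one possible
countermeasure, can be sketched with :mod:`zlib`. This is an illustration
under stated assumptions rather than a complete defence: ``zlib`` stands in
for the gzip and LZMA streams mentioned above, and ``bounded_decompress`` is
a hypothetical helper written for this example.

```python
import zlib


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Inflate zlib data, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj()
    # Ask for at most one byte more than the limit so overflow is detectable
    # without ever inflating the whole payload.
    output = decompressor.decompress(data, limit + 1)
    if len(output) > limit:
        raise ValueError("decompressed data exceeds the configured limit")
    return output


# About 1 MB of highly repetitive XML compresses to a very small payload.
original = b"<root>" + b"<item/>" * 150_000 + b"</root>"
payload = zlib.compress(original, 9)
ratio = len(original) / len(payload)

print(f"{len(original)} bytes -> {len(payload)} bytes (ratio {ratio:.0f}:1)")
```

Capping the output size while decompressing, instead of checking it
afterwards, keeps memory use bounded even for hostile input.
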
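The growth rates of the two entity expansion attacks described above can be
worked out with simple arithmetic. The sketch below uses hypothetical helper
names and deliberately tiny parameters; ``MAX_DEMO_CHARS`` is an arbitrary cap
added for this example so that the builder refuses to produce a document with
a large expansion.

```python
import xml.etree.ElementTree as ET

MAX_DEMO_CHARS = 10_000  # arbitrary safety cap for this sketch


def exponential_size(levels: int, fanout: int, payload_chars: int) -> int:
    """Characters produced when each level references the previous one `fanout` times."""
    return payload_chars * fanout ** levels


def quadratic_size(entity_chars: int, references: int) -> int:
    """Characters produced when one entity is referenced `references` times."""
    return entity_chars * references


def build_nested_entities(levels: int, fanout: int, payload: str) -> str:
    """Build a small document whose top entity expands to fanout**levels payloads."""
    if exponential_size(levels, fanout, len(payload)) > MAX_DEMO_CHARS:
        raise ValueError("expansion too large for a demonstration")
    lines = ["<!DOCTYPE demo [", f'<!ENTITY e0 "{payload}">']
    for level in range(1, levels + 1):
        references = f"&e{level - 1};" * fanout
        lines.append(f'<!ENTITY e{level} "{references}">')
    lines.append("]>")
    lines.append(f"<demo>&e{levels};</demo>")
    return "\n".join(lines)


# Three levels of three references each yield 3**3 = 27 copies of the payload.
small = build_nested_entities(levels=3, fanout=3, payload="lol")
expanded = ET.fromstring(small).text

print(len(small), len(expanded))
print(exponential_size(9, 10, 3))        # nine levels of ten references
print(quadratic_size(50_000, 50_000))    # one 50,000-character entity, 50,000 references
```

With nine levels of ten references, a three-character payload grows to three
billion characters, while the document that declares it stays under a
kilobyte.
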
The documentation for `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.

.. _defusedxml-package:

The :mod:`defusedxml` Package
-----------------------------

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. Use of this
package is recommended for any server code that parses untrusted XML data. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.


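A minimal usage sketch follows, assuming the third-party `defusedxml`_ package
has been installed from PyPI. ``parse_untrusted`` is a hypothetical wrapper
written for this example; it raises an error when the package is missing
instead of silently falling back to the standard library parser.

```python
try:
    # Third-party package, not part of the standard library.
    import defusedxml.ElementTree as safe_ET
except ImportError:
    safe_ET = None


def parse_untrusted(text: str):
    """Parse untrusted XML with defusedxml; never fall back to the stdlib parser."""
    if safe_ET is None:
        raise RuntimeError("defusedxml is not installed")
    return safe_ET.fromstring(text)
```

Failing loudly is a deliberate choice here: a silent fallback would
reintroduce the vulnerabilities listed in the table above.
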
.. _defusedxml: https://pypi.org/project/defusedxml/
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition