mirror of
				https://github.com/python/cpython.git
				synced 2025-11-03 23:21:29 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			146 lines
		
	
	
	
		
			6.4 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			146 lines
		
	
	
	
		
			6.4 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
.. _xml:
 | 
						|
 | 
						|
XML Processing Modules
 | 
						|
======================
 | 
						|
 | 
						|
.. module:: xml
 | 
						|
   :synopsis: Package containing XML processing modules
 | 
						|
 | 
						|
.. sectionauthor:: Christian Heimes <christian@python.org>
 | 
						|
.. sectionauthor:: Georg Brandl <georg@python.org>
 | 
						|
 | 
						|
**Source code:** :source:`Lib/xml/`
 | 
						|
 | 
						|
--------------
 | 
						|
 | 
						|
Python's interfaces for processing XML are grouped in the ``xml`` package.
 | 
						|
 | 
						|
.. warning::
 | 
						|
 | 
						|
   The XML modules are not secure against erroneous or maliciously
 | 
						|
   constructed data.  If you need to parse untrusted or
 | 
						|
   unauthenticated data see the :ref:`xml-vulnerabilities` and
 | 
						|
   :ref:`defusedxml-package` sections.
 | 
						|
 | 
						|
It is important to note that modules in the :mod:`xml` package require that
 | 
						|
there be at least one SAX-compliant XML parser available. The Expat parser is
 | 
						|
included with Python, so the :mod:`xml.parsers.expat` module will always be
 | 
						|
available.
 | 
						|
 | 
						|
The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages are the
 | 
						|
definition of the Python bindings for the DOM and SAX interfaces.
 | 
						|
 | 
						|
The XML handling submodules are:
 | 
						|
 | 
						|
* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
 | 
						|
  XML processor
 | 
						|
 | 
						|
..
 | 
						|
 | 
						|
* :mod:`xml.dom`: the DOM API definition
 | 
						|
* :mod:`xml.dom.minidom`: a minimal DOM implementation
 | 
						|
* :mod:`xml.dom.pulldom`: support for building partial DOM trees
 | 
						|
 | 
						|
..
 | 
						|
 | 
						|
* :mod:`xml.sax`: SAX2 base classes and convenience functions
 | 
						|
* :mod:`xml.parsers.expat`: the Expat parser binding
 | 
						|
 | 
						|
 | 
						|
.. _xml-vulnerabilities:
 | 
						|
 | 
						|
XML vulnerabilities
 | 
						|
-------------------
 | 
						|
 | 
						|
The XML processing modules are not secure against maliciously constructed data.
 | 
						|
An attacker can abuse XML features to carry out denial of service attacks,
 | 
						|
access local files, generate network connections to other machines, or
 | 
						|
circumvent firewalls.
 | 
						|
 | 
						|
The following table gives an overview of the known attacks and whether
 | 
						|
the various modules are vulnerable to them.
 | 
						|
 | 
						|
=========================  ==================  ==================  ==================  ==================  ==================
 | 
						|
kind                       sax                 etree               minidom             pulldom             xmlrpc
 | 
						|
=========================  ==================  ==================  ==================  ==================  ==================
 | 
						|
billion laughs             **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)
 | 
						|
quadratic blowup           **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)
 | 
						|
external entity expansion  Safe (5)            Safe (2)            Safe (3)            Safe (5)            Safe (4)
 | 
						|
`DTD`_ retrieval           Safe (5)            Safe                Safe                Safe (5)            Safe
 | 
						|
decompression bomb         Safe                Safe                Safe                Safe                **Vulnerable**
 | 
						|
large tokens               **Vulnerable** (6)  **Vulnerable** (6)  **Vulnerable** (6)  **Vulnerable** (6)  **Vulnerable** (6)
 | 
						|
=========================  ==================  ==================  ==================  ==================  ==================
 | 
						|
 | 
						|
1. Expat 2.4.1 and newer is not vulnerable to the "billion laughs" and
 | 
						|
   "quadratic blowup" vulnerabilities. Items still listed as vulnerable due to
 | 
						|
   potential reliance on system-provided libraries. Check
 | 
						|
   :const:`!pyexpat.EXPAT_VERSION`.
 | 
						|
2. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
 | 
						|
   :exc:`~xml.etree.ElementTree.ParseError` when an entity occurs.
 | 
						|
3. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
 | 
						|
   the unexpanded entity verbatim.
 | 
						|
4. :mod:`xmlrpc.client` doesn't expand external entities and omits them.
 | 
						|
5. Since Python 3.7.1, external general entities are no longer processed by
 | 
						|
   default.
 | 
						|
6. Expat 2.6.0 and newer is not vulnerable to denial of service
 | 
						|
   through quadratic runtime caused by parsing large tokens.
 | 
						|
   Items still listed as vulnerable due to
 | 
						|
   potential reliance on system-provided libraries. Check
 | 
						|
   :const:`!pyexpat.EXPAT_VERSION`.
 | 
						|
 | 
						|
 | 
						|
billion laughs / exponential entity expansion
 | 
						|
  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
 | 
						|
  uses multiple levels of nested entities. Each entity refers to another entity
 | 
						|
  several times, and the final entity definition contains a small string.
 | 
						|
  The exponential expansion results in several gigabytes of text and
 | 
						|
  consumes lots of memory and CPU time.
 | 
						|
 | 
						|
quadratic blowup entity expansion
 | 
						|
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
 | 
						|
  entity expansion, too. Instead of nested entities it repeats one large entity
 | 
						|
  with a couple of thousand chars over and over again. The attack isn't as
 | 
						|
  efficient as the exponential case but it avoids triggering parser countermeasures
 | 
						|
  that forbid deeply nested entities.
 | 
						|
 | 
						|
external entity expansion
 | 
						|
  Entity declarations can contain more than just text for replacement. They can
 | 
						|
  also point to external resources or local files. The XML
 | 
						|
  parser accesses the resource and embeds the content into the XML document.
 | 
						|
 | 
						|
`DTD`_ retrieval
 | 
						|
  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
 | 
						|
  definitions from remote or local locations. The feature has similar
 | 
						|
  implications as the external entity expansion issue.
 | 
						|
 | 
						|
decompression bomb
 | 
						|
  Decompression bombs (aka `ZIP bomb`_) apply to all XML libraries
 | 
						|
  that can parse compressed XML streams such as gzipped HTTP streams or
 | 
						|
  LZMA-compressed
 | 
						|
  files. For an attacker it can reduce the amount of transmitted data by three
 | 
						|
  magnitudes or more.
 | 
						|
 | 
						|
large tokens
 | 
						|
  Expat needs to re-parse unfinished tokens; without the protection
 | 
						|
  introduced in Expat 2.6.0, this can lead to quadratic runtime that can
 | 
						|
  be used to cause denial of service in the application parsing XML.
 | 
						|
  The issue is known as :cve:`2023-52425`.
 | 
						|
 | 
						|
The documentation for :pypi:`defusedxml` on PyPI has further information about
 | 
						|
all known attack vectors with examples and references.
 | 
						|
 | 
						|
.. _defusedxml-package:
 | 
						|
 | 
						|
The :mod:`!defusedxml` Package
 | 
						|
------------------------------
 | 
						|
 | 
						|
:pypi:`defusedxml` is a pure Python package with modified subclasses of all stdlib
 | 
						|
XML parsers that prevent any potentially malicious operation. Use of this
 | 
						|
package is recommended for any server code that parses untrusted XML data. The
 | 
						|
package also ships with example exploits and extended documentation on more
 | 
						|
XML exploits such as XPath injection.
 | 
						|
 | 
						|
 | 
						|
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
 | 
						|
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
 | 
						|
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition
 |