Canonical source repository for PyYAML
Ingy döt Net 34a9bf8235
Dos in merge key (#937)
* Make local test targets self-contained

Most developer machines do not have libyaml headers installed, so the
old default test flow attempted to build the optional extension, failed
to find yaml.h, and then only exercised the pure Python fallback. The
extension target also required out-of-band system setup before it could
pass.

Split the Makefile test flow into explicit pure-Python and LibYAML
targets, and make the default test target run both. The pure-Python
target now forces the extension off during build, while the LibYAML
target builds a pinned local yaml/libyaml checkout and wires the
include, library, and runtime paths into the extension build.

Document the new Makefile workflow in the README, including make test,
the individual test targets, the pinned LIBYAML-REF default, and the
PYTHON-VERSION override for testing against a specific Python.

This makes both test modes pass from a clean checkout without requiring
developers to install libyaml development packages by hand.

* Fix exponential expansion DoS via duplicate merge key aliases

* Fix exponential expansion based DoS in merge key processing

Resolves #897
Supersedes #916

### Summary
The merge key (`<<`) constructor implementation in
`SafeConstructor.flatten_mapping()` was open to a Denial of
Service (DoS) attack with exponential time and memory
complexity. When mapping/sequence nodes are merged using
anchors/aliases, duplicate references to the same alias point
to the same MappingNode instance in Python. During merge key
processing, the node values are copied and extended in place.
If the same node appears multiple times at different levels,
this causes exponential amplification of the elements list:
`2^(n+1) - 1`.

A small document under 1 KB can trigger millions of element
list extensions, exhausting CPU and memory during safe loading.
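
The growth can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not PyYAML's code: it assumes `n` nesting levels where each level references the previous level's alias twice, and the helper name `naive_merge_sizes` is hypothetical.

```python
def naive_merge_sizes(levels, refs=2):
    """Return the pair-list size at each level of a toy merge chain.

    Assumption: every level merges the previous level `refs` times, and each
    merge extends the list with a full copy of the already-flattened pairs.
    """
    pairs = [("k", "v")]           # level 0: one ordinary key/value pair
    sizes = [len(pairs)]
    for _ in range(levels):
        merged = []
        for _ in range(refs):      # the same alias referenced `refs` times
            merged.extend(pairs)   # every duplicate reference copies everything
        pairs = merged
        sizes.append(len(pairs))
    return sizes


sizes = naive_merge_sizes(10)
print(sizes[-1], sum(sizes))  # 1024 2047
```

Summed over all levels, the model yields `2^(n+1) - 1` pairs; for `n = 22` that is 8,388,607.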

### Hardened Fix
This commit resolves the vulnerability and hardens merge key
processing against secondary vectors:
1. Tracks node identity using object ID (`id(node)`) in a single
`seen` set scoped to the parent mapping's `flatten_mapping()`
execution.
2. Checks and skips duplicate node references inside SequenceNode
merge keys (resolving PR-916).
3. Checks and skips duplicate node references across separate,
independent MappingNode merge keys in the same mapping (e.g.,
repeating `<<: *anchor` multiple times).
4. Ensures C-based loaders (e.g., `CSafeLoader`, `CLoader`) are
also protected since they inherit constructor logic from
`SafeConstructor`.
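
The identity tracking described in the list above might look roughly like the following. This is a minimal sketch on a toy node type, assuming a per-call `seen` set keyed by `id()`; the `Node` class and `flatten` function are hypothetical stand-ins, not PyYAML's `MappingNode` or `flatten_mapping()`.

```python
class Node:
    """Toy stand-in for a YAML mapping node (hypothetical, not PyYAML's)."""

    def __init__(self, pairs=(), merges=()):
        self.pairs = list(pairs)    # ordinary key/value pairs
        self.merges = list(merges)  # nodes referenced through `<<`


def flatten(node):
    """Fold merge references into node.pairs, merging each node only once."""
    seen = set()                    # ids of nodes already merged into this mapping
    merged = []
    for ref in node.merges:
        if id(ref) in seen:         # duplicate alias to the same object: skip it
            continue
        seen.add(id(ref))
        flatten(ref)
        merged.extend(ref.pairs)
    node.merges = []
    node.pairs = merged + node.pairs


# 22 levels, each referencing the previous level twice.
node = Node(pairs=[("k", "v")])
for _ in range(22):
    node = Node(merges=[node, node])

flatten(node)
print(len(node.pairs))  # 1
```

In this model the `merges` list stands for both cases in the list: several entries inside one sequence-valued merge key, and several separate `<<` keys in the same mapping.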

### Performance Impact
- Sequence-nested merge duplicates: Loading a 22-level nested
document drops from 3.76s to 0.0028s (O(N) linear complexity).
- Mapping-level merge duplicates: Loading a 20-level nested
document drops from 0.93s to 0.0026s.

### Tests
- Added regression tests to
`tests/legacy_tests/data/construct-merge.data` and
`tests/legacy_tests/data/construct-merge.code` covering both
duplicate sequence merges and duplicate direct merges.

* Fix CI toolchain drift failures

The PR was failing before it reached the PyYAML regression tests in two
places: the cp38 wheel jobs installed latest cibuildwheel, whose 4.x
series no longer supports Python 3.8, and the Windows arm64 libyaml build
used a newer CMake that rejects libyaml 0.2.5's old minimum policy version.

Pin the cp38 wheel jobs to cibuildwheel<4, and quote the matrix-provided
pip package spec so bash does not parse the '<4' constraint as input
redirection. Pass CMAKE_POLICY_VERSION_MINIMUM=3.5 to the Windows libyaml
configure step, and disable fail-fast for the Windows libyaml matrix so
one architecture does not hide the others.

* Add merge key deduplication regression tests

* Add time-bounded DoS prevention tests to verify fix

* Strengthen merge key DoS timeout tests

* Fix merge key fan-out expansion DoS

* Use structural merge DoS regression tests

Replace the timing-based merge DoS tests with deterministic checks of
the flattened mapping shape. Wall-clock thresholds are fragile on slow
or emulated CI runners and do not directly prove that the expansion was
prevented.

The fan-out regression case uses a small levels=4, width=4 payload from
the PR #916 shape. With the old behavior, flattening root expands to
1024 repeated key/value pairs and the test fails immediately because the
key list contains 1020 extra entries. With the fix, duplicate merge pairs
are collapsed during flattening and the same root mapping contains only
k0, k1, k2, and k3.

This keeps the test fast even if the old bug returns: it observes the
structural amplification directly instead of waiting for a large
pathological input to time out.
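
A structural check of this kind can be sketched on a toy model. Everything here is an assumption for illustration: the fan-out shape is one guess that reproduces the quoted 1024 and 1020 figures, pair deduplication by `id()` is a stand-in for however the fix collapses duplicate pairs, and `Node`, `flatten`, and `build_fanout` are hypothetical names.

```python
class Node:
    """Toy stand-in for a YAML mapping node (hypothetical, not PyYAML's)."""

    def __init__(self, pairs=(), merges=()):
        self.pairs = list(pairs)    # ordinary key/value pairs
        self.merges = list(merges)  # nodes referenced through `<<`


def flatten(node, dedup):
    """Fold merges into node.pairs; with dedup, keep each pair object once."""
    seen = set()
    merged = []
    for ref in node.merges:
        flatten(ref, dedup)
        for pair in ref.pairs:
            if dedup:
                if id(pair) in seen:
                    continue
                seen.add(id(pair))
            merged.append(pair)
    node.merges = []
    node.pairs = merged + node.pairs


def build_fanout(levels, width):
    """Assumed shape: `width` leaf mappings, then `levels` layers of `width`
    mappings that each merge the whole previous layer, then a root."""
    layer = [Node(pairs=[(f"k{i}", i)]) for i in range(width)]
    for _ in range(levels):
        layer = [Node(merges=layer) for _ in range(width)]
    return Node(merges=layer)


naive = build_fanout(levels=4, width=4)
flatten(naive, dedup=False)
print(len(naive.pairs), len(naive.pairs) - 4)  # 1024 1020

fixed = build_fanout(levels=4, width=4)
flatten(fixed, dedup=True)
print([key for key, _ in fixed.pairs])  # ['k0', 'k1', 'k2', 'k3']
```

Asserting on the key list rather than on elapsed time keeps the check deterministic on slow runners.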

* Hybrid merge keys optimization

* A hybrid combination of the fixes proposed by frenzymadness and akshat-sj to prevent exponential compute while resolving nested anchor aliases.
* Further optimization is possible with broader caching of the anchor node graph, but it would require careful design of the invalidation policy.

---------

Co-authored-by: Akshat Singh Jaswal <sja.akshat@gmail.com>
Co-authored-by: Aaron Bronow <abronow@gmail.com>
Co-authored-by: Matt Davis <nitzmahone@redhat.com>
2026-06-17 18:15:29 -04:00
.github Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
examples Use full_load in yaml-highlight example (#359) 2019-12-20 20:38:46 +01:00
lib Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
packaging Update CI and Build Targets for Python 3.14 and Windows/arm64 (#864) 2025-09-25 11:13:58 -07:00
tests Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
yaml Update CI and Build Targets for Python 3.14 and Windows/arm64 (#864) 2025-09-25 11:13:58 -07:00
.gitignore Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
announcement.msg 6.0.1 release 2023-08-28 15:29:27 -07:00
CHANGES Refresh CHANGES from release/6.0, bump to 7.0.0.dev0 (#821) 2024-08-06 15:43:04 -07:00
LICENSE 5.4 release 2021-01-19 14:07:59 -05:00
Makefile Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
MANIFEST.in Conditional support for Cython3.x, CI updates (#808) 2024-06-10 15:24:15 -07:00
pyproject.toml Update CI and Build Targets for Python 3.14 and Windows/arm64 (#864) 2025-09-25 11:13:58 -07:00
README.md Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
setup.py fix setup.py test and issue deprecation warning (#820) 2024-08-06 13:06:05 -07:00
tox.ini Move tests to pytest 2023-11-10 11:37:15 -08:00

PyYAML

A full-featured YAML processing framework for Python

Installation

To install, type python setup.py install.

By default, the setup.py script checks whether LibYAML is installed and if so, builds and installs LibYAML bindings. To skip the check and force installation of LibYAML bindings, use the option --with-libyaml: python setup.py --with-libyaml install. To disable the check and skip building and installing LibYAML bindings, use --without-libyaml: python setup.py --without-libyaml install.

When LibYAML bindings are installed, you may use the fast LibYAML-based parser and emitter as follows:

>>> import yaml
>>> yaml.load(stream, Loader=yaml.CLoader)
>>> yaml.dump(data, Dumper=yaml.CDumper)

If you don't trust the input YAML stream, you should use:

>>> yaml.safe_load(stream)

Testing

PyYAML includes a comprehensive test suite.

To run the complete local test suite, type:

make test

This creates a local Python environment, runs the pure Python tests, builds a local copy of LibYAML, and then runs the LibYAML extension tests. The local LibYAML build is pinned by LIBYAML-REF, which defaults to 0.2.5.

To run only one test mode:

make test-python
make test-libyaml

To test with a specific Python version:

make test PYTHON-VERSION=3.13.5

Further Information

License

The PyYAML module was written by Kirill Simonov <xi@resolvent.net>. It is currently maintained by the YAML and Python communities.

PyYAML is released under the MIT license.

See the file LICENSE for more details.