pyyaml/tests
Ingy döt Net 34a9bf8235
Dos in merge key (#937)
* Make local test targets self-contained

Most developer machines do not have libyaml headers installed, so the
old default test flow attempted to build the optional extension, failed
to find yaml.h, and then only exercised the pure Python fallback. The
extension target also required out-of-band system setup before it could
pass.

Split the Makefile test flow into explicit pure-Python and LibYAML
targets, and make the default test target run both. The pure-Python
target now forces the extension off during build, while the LibYAML
target builds a pinned local yaml/libyaml checkout and wires the
include, library, and runtime paths into the extension build.

Document the new Makefile workflow in the README, including make test,
the individual test targets, the pinned LIBYAML-REF default, and the
PYTHON-VERSION override for testing against a specific Python.

This makes both test modes pass from a clean checkout without requiring
developers to install libyaml development packages by hand.

* Fix exponential expansion DoS via duplicate merge key aliases

* Fix exponential expansion based DoS in merge key processing

Resolves #897
Supercedes #916

### Summary
The merge key (`<<`) constructor implementation in
`SafeConstructor.flatten_mapping()` was vulnerable to an
exponential time and memory complexity Denial of Service (DoS)
vulnerability. When mapping/sequence nodes are merged using
anchors/aliases, duplicate references to the same alias point
to the same MappingNode instance in Python. During merge key
processing, the node values are copied and extended in-place.
If the same node appears multiple times at different levels,
this causes exponential amplification of the elements list:
`2^(n+1) - 1`.

A small document under 1 KB can trigger millions of element
list extensions, exhausting CPU and memory during safe loading.

### Hardened Fix
This commit resolves the vulnerability and hardens it against
secondary vectors:
1. Tracks node identity using object ID (`id(node)`) in a single
`seen` set scoped to the parent mapping's `flatten_mapping()`
execution.
2. Checks and skips duplicate node references inside SequenceNode
merge keys (resolving PR-916).
3. Checks and skips duplicate node references across separate,
independent MappingNode merge keys in the same mapping (e.g.,
repeating `<<: *anchor` multiple times).
4. Ensures C-based loaders (e.g., `CSafeLoader`, `CLoader`) are
also protected since they inherit constructor logic from
`SafeConstructor`.

### Performance Impact
- Sequence-nested merge duplicates: Loading a 22-level nested
document drops from 3.76s to 0.0028s (O(N) linear complexity).
- Mapping-level merge duplicates: Loading a 20-level nested
document drops from 0.93s to 0.0026s.

### Tests
- Added regression tests to
`tests/legacy_tests/data/construct-merge.data` and
`tests/legacy_tests/data/construct-merge.code` covering both
duplicate sequence merges and duplicate direct merges.

* Fix CI toolchain drift failures

The PR was failing before it reached the PyYAML regression tests in two
places: the cp38 wheel jobs installed latest cibuildwheel, whose 4.x
series no longer supports Python 3.8, and the Windows arm64 libyaml build
used a newer CMake that rejects libyaml 0.2.5's old minimum policy version.

Pin the cp38 wheel jobs to cibuildwheel<4, and quote the matrix-provided
pip package spec so bash does not parse the '<4' constraint as input
redirection. Pass CMAKE_POLICY_VERSION_MINIMUM=3.5 to the Windows libyaml
configure step, and disable fail-fast for the Windows libyaml matrix so
one architecture does not hide the others.

* Add merge key deduplication regression tests

* Add time-bounded DoS prevention tests to verify fix

* Strengthen merge key DoS timeout tests

* Fix merge key fan-out expansion DoS

* Use structural merge DoS regression tests

Replace the timing-based merge DoS tests with deterministic checks of
the flattened mapping shape. Wall-clock thresholds are fragile on slow
or emulated CI runners and do not directly prove that the expansion was
prevented.

The fan-out regression case uses a small levels=4, width=4 payload from
the PR #916 shape. With the old behavior, flattening root expands to
1024 repeated key/value pairs and the test fails immediately because the
key list contains 1020 extra entries. With the fix, duplicate merge pairs
are collapsed during flattening and the same root mapping contains only
k0, k1, k2, and k3.

This keeps the test fast even if the old bug returns: it observes the
structural amplification directly instead of waiting for a large
pathological input to time out.

* Hybrid merge keys optimization

* A hybrid combination of fixes proposed by frenzymadness and akshat-sj to prevent exponential compute while resolving nested anchor aliases.
* Further optimization is possible with broader caching of anchor node graph, but would require careful design of invalidation policy.

---------

Co-authored-by: Akshat Singh Jaswal <sja.akshat@gmail.com>
Co-authored-by: Aaron Bronow <abronow@gmail.com>
Co-authored-by: Matt Davis <nitzmahone@redhat.com>
2026-06-17 18:15:29 -04:00
..
legacy_tests Dos in merge key (#937) 2026-06-17 18:15:29 -04:00
test_dump_load.py Move tests to pytest 2023-11-10 11:37:15 -08:00
test_merge.py Dos in merge key (#937) 2026-06-17 18:15:29 -04:00