Added a feature to extract images from OLE2 BIFF streams.
This work was derived from InQuest's blog post about extracting XLM and
images from XLS files:
https://inquest.net/blog/2019/01/29/Carving-Sneaky-XLM-Files
Assorted ole2 parser code cleanup and massive error handling cleanup.
Also fixed the following:
- The XLS parser could fail to process all BIFF records if some of the
records contained unexpected data or were otherwise malformed. Because
the record size is already known, we can skip over the "malformed"
record and continue with the rest.
- Fixed an issue where the ole2 header size was improperly calculated,
failing to account for the new "has_xlm" boolean added for context.
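
To illustrate the first fix, here is a minimal Python sketch of the
skip-and-continue idea. It is not the parser code from this patch. It
assumes the usual BIFF framing of a 2-byte record type followed by a
2-byte little-endian length, and the names iter_biff_records,
process_records, and handlers are hypothetical.

```python
import struct

def iter_biff_records(stream: bytes):
    """Yield (record_type, payload) for each BIFF record in the stream."""
    offset = 0
    while offset + 4 <= len(stream):
        # Each record starts with a 4-byte header: type, then payload length.
        rec_type, rec_len = struct.unpack_from("<HH", stream, offset)
        offset += 4
        if offset + rec_len > len(stream):
            break  # truncated final record, nothing more to read
        yield rec_type, stream[offset:offset + rec_len]
        # The length is known up front, so we always land on the next header.
        offset += rec_len

def process_records(stream: bytes, handlers: dict):
    """Run per-type handlers; skip records whose handler rejects the payload."""
    processed, skipped = [], []
    for rec_type, payload in iter_biff_records(stream):
        handler = handlers.get(rec_type)
        if handler is None:
            continue  # record type we do not care about
        try:
            processed.append(handler(payload))
        except (ValueError, struct.error):
            # Malformed record: note it and carry on with the rest.
            skipped.append(rec_type)
    return processed, skipped
```

A handler that fails on one record no longer stops the loop, because
the iterator advances by the declared length rather than by how much
the handler managed to parse.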
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.
This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.
The code is based on Didier Stevens's plugin_biff for oledump.py.
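
One common way to detect XLM is to look for BOUNDSHEET records that
declare an Excel 4.0 macro sheet. The Python sketch below shows that
idea only; it is not necessarily how this patch or plugin_biff does it.
It assumes the BIFF8 BoundSheet8 layout (4-byte stream offset, 1-byte
visibility, 1-byte sheet type, then a short string with a 1-byte
character count and a 1-byte flags field). find_xlm_sheets is a
hypothetical name.

```python
import struct

BOUNDSHEET = 0x0085      # BoundSheet8 record type
SHEET_TYPE_MACRO = 0x01  # sheet type value for an Excel 4.0 macro sheet

def find_xlm_sheets(workbook_stream: bytes):
    """Return the names of sheets declared as Excel 4.0 macro sheets."""
    names = []
    offset = 0
    while offset + 4 <= len(workbook_stream):
        rec_type, rec_len = struct.unpack_from("<HH", workbook_stream, offset)
        payload = workbook_stream[offset + 4:offset + 4 + rec_len]
        offset += 4 + rec_len
        # Need at least offset(4) + visibility(1) + type(1) + cch(1) + flags(1).
        if rec_type != BOUNDSHEET or len(payload) < 8:
            continue
        sheet_type, cch, flags = struct.unpack_from("<BBB", payload, 5)
        if sheet_type != SHEET_TYPE_MACRO:
            continue
        raw = payload[8:]
        if flags & 0x01:
            # High-byte flag set: name is stored as UTF-16LE.
            name = raw[:cch * 2].decode("utf-16-le", errors="replace")
        else:
            # Compressed form: one byte per character.
            name = raw[:cch].decode("latin-1")
        names.append(name)
    return names
```

A non-empty result means the workbook declares at least one macro
sheet, which is a cheap first signal before extracting any formulas.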
- Existing VBA extraction code uses undocumented cache structures.
This code uses the documented way of accessing VBA projects.
- Adds additional detail to the dumped information:
Project name, Project doc string, ...
- All VBA projects are dumped into a single file.
Malware authors are currently evading detection by spreading
malicious code over several projects. It is hard to write
signatures if only part of the malicious code is visible.
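
A toy Python illustration of why a single dump helps signature
writing. The header format, the function names, and the "all tokens
present" signature are assumptions made for this example, not the dump
format or matching engine used by this code.

```python
def dump_projects(projects):
    """Concatenate (name, source) pairs into one dump, one header per project."""
    return "".join(
        f"' ---- project: {name} ----\n{source}\n" for name, source in projects
    )

def matches(text, tokens):
    """Toy signature: every token must appear somewhere in the text."""
    return all(token in text for token in tokens)

# Two projects, each holding only half of the behavior of interest.
projects = [
    ("ProjectA", "Declare Function URLDownloadToFile Lib \"urlmon\""),
    ("ProjectB", "Sub Run()\n    Shell path\nEnd Sub"),
]
signature = ["URLDownloadToFile", "Shell"]

# Scanned separately, neither project satisfies the signature.
per_project_hits = [matches(source, signature) for _, source in projects]

# Scanned as one dump, both tokens are visible at once.
combined_hit = matches(dump_projects(projects), signature)
```

Here per_project_hits contains no match while combined_hit is true,
which is the situation the single-file dump is meant to address.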