Store URLs found in HTML `<a>` and `<form>` tags during scan of HTML files
when recording scan metadata.
HTML URL recording will be ON by default, but is a part of the
generate-metadata-json feature.
The generate-metadata-json feature is OFF by default.
This introduces a new general scan option:
- libclamav: `CL_SCAN_GENERAL_STORE_HTML_URLS`.
- ClamD: `JsonStoreHTMLUrls`.
- ClamScan: `--json-store-html-urls`
Thank you Matt Jolly for the helpful comment on the pull request.
The C-Rust FFI code is needlessly complex. Now that we are calling into
magic_scan from Rust, we can simply hand off the <style> block contents
to Rust code to handle extraction and scanning.
This commit adds a feature to find, decode, and scan each image found
within HTML <style> tags where the image data is embedded in `url()`
function parameters a base64 blob
In C in the html normalization process we extract style tag contents
to new buffer for processing. We call into a new feature in Rust code to
find and decode each image (if there are multiple).
Once extracted, the images are scanned as contained files of unknown
type, and file type identifcation will determine the actual type.
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.
This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.
The code is based on Didier Steven's plugin_biff for oletools.py.
use only cli_readline() we don't need exact conversion
drop unused functions,
simplify encoding_norm_readline(), and rename to encoding_normalize_toascii()
git-svn: trunk@3571