26 commits

Author SHA1 Message Date
Val Snyder
7ff29b8c37 Bump copyright dates for 2025 2025-02-14 10:24:30 -05:00
Andy Ragusa
666e047f2b Store URLs from HTML when recording scan metadata json
Store URLs found in HTML `<a>` and `<form>` tags during scan of HTML files
when recording scan metadata.

HTML URL recording will be ON by default, but is a part of the
generate-metadata-json feature.
The generate-metadata-json feature is OFF by default.

This introduces a new general scan option:
- libclamav: `CL_SCAN_GENERAL_STORE_HTML_URLS`.
- ClamD: `JsonStoreHTMLUrls`.
- ClamScan: `--json-store-html-urls`

Thank you Matt Jolly for the helpful comment on the pull request.
2024-09-11 13:40:29 -04:00
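As a rough illustration of the behavior described in the commit above, the following Python sketch collects URLs from `<a>` and `<form>` tags. It is not the libclamav implementation (which is written in C); the names `UrlCollector`, `URL_ATTRS`, and `collect_html_urls`, and the choice of the `href` and `action` attributes, are assumptions made for this example only.

```python
from html.parser import HTMLParser


class UrlCollector(HTMLParser):
    """Collect URLs from <a href=...> and <form action=...> tags."""

    # Hypothetical mapping of tag name to the attribute assumed to hold its URL.
    URL_ATTRS = {"a": "href", "form": "action"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def collect_html_urls(html_text):
    """Return the URLs found in <a> and <form> tags, in document order."""
    parser = UrlCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls
```

Calling `collect_html_urls` on a document yields a list that a caller could then store alongside other scan metadata; tags such as `<img>` are ignored in this sketch.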
Micah Snyder
6e1afbbb62 Reduce C-Rust FFI complexity for HTML CSS image extraction logic
The C-Rust FFI code is needlessly complex. Now that we are calling into
magic_scan from Rust, we can simply hand off the <style> block contents
to Rust code to handle extraction and scanning.
2024-04-15 12:27:13 -07:00
Micah Snyder
9cb28e51e6 Bump copyright dates for 2024 2024-01-22 11:27:17 -05:00
Micah Snyder
6eebecc303 Bump copyright for 2023 2023-02-12 11:20:22 -08:00
Micah Snyder
6f54fe2d66 Find and scan base64'd images found in HTML <style> url() args
This commit adds a feature to find, decode, and scan each image found
within HTML <style> tags where the image data is embedded in `url()`
function parameters as a base64 blob.

In C, during the HTML normalization process, we extract the style tag
contents to a new buffer for processing. We call into a new feature in
Rust code to find and decode each image (if there are multiple).

Once extracted, the images are scanned as contained files of unknown
type, and file type identification will determine the actual type.
2023-02-07 22:02:02 -06:00
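The find-and-decode step described in the commit above can be sketched in Python as follows. This is only an illustration under stated assumptions: the real feature lives in Rust code within ClamAV, and the regular expression `DATA_URL_RE` and the function `extract_style_images` are hypothetical names and patterns invented for this example.

```python
import base64
import binascii
import re

# Assumed pattern for illustration: url(data:<mime>;base64,<payload>) with
# optional quotes around the data URL. Not the pattern ClamAV uses.
DATA_URL_RE = re.compile(
    r"url\(\s*['\"]?data:[^;,)]*;base64,([A-Za-z0-9+/=\s]+?)['\"]?\s*\)",
    re.IGNORECASE,
)


def extract_style_images(style_text):
    """Return the decoded bytes of each base64 blob found in url() arguments."""
    images = []
    for match in DATA_URL_RE.finditer(style_text):
        # Drop any whitespace inside the payload before decoding.
        payload = re.sub(r"\s+", "", match.group(1))
        try:
            images.append(base64.b64decode(payload, validate=True))
        except (binascii.Error, ValueError):
            # Skip payloads that are not valid base64.
            continue
    return images
```

Each returned byte string corresponds to one embedded image. In this sketch the declared MIME type is ignored, mirroring the commit's note that the extracted data is treated as a file of unknown type and left to file type identification.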
Micah Snyder
60d9465a73 Minor code cleanup
Convert integer bools to bool bools.
2023-02-07 22:02:02 -06:00
micasnyd
140c88aa4e Bump copyright for 2022
Includes minor format corrections.
2022-01-09 14:23:25 -07:00
Micah Snyder (micasnyd)
b9ca6ea103 Update copyright dates for 2021
Also fixes up clang-format.
2021-03-19 15:12:26 -07:00
Jonas Zaddach (jzaddach)
d5a733ef90 XLM (Excel 4.0) macro detection and extraction
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.

This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.

The code is based on Didier Stevens' plugin_biff for oletools.py.
2020-04-29 14:19:41 -07:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
6289eda8e0 Eliminating AUTHORS file, and moving acknowledgements for various source code contributions to the file comment blocks for the individual files, as appropriate. 2018-03-06 17:44:05 -05:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Török Edvin
32f7e1d77b fmapify screnc 2011-06-13 12:03:26 +03:00
aCaB
49cc1e3c35 s/struct F_MAP/fmap_t/ 2009-10-02 18:09:31 +02:00
aCaB
084d19aa8c (some) html to fmap 2009-08-31 06:16:12 +02:00
aCaB
ba65fdc815 port htmlnorm to fmap 2009-08-22 16:31:14 +02:00
Török Edvin
f2b71eb961 extract URLs from mail body (bb #1482).
git-svn: trunk@5014
2009-04-02 20:36:22 +00:00
Török Edvin
7d4b5f164a use clistrdup/free instead of blobs (bb #828)
git-svn: trunk@4203
2008-09-23 20:52:33 +00:00
Tomasz Kojm
2023340a41 update copyrights and stick more files to GPLv2; move and add more credits to the AUTHORS file; add COPYING.BSD
git-svn: trunk@3749
2008-04-02 15:24:51 +00:00
Török Edvin
b3fc7f9747 use entconv to detect UTF-16BE, and UCS-4 variants
use only cli_readline() we don't need exact conversion
drop unused functions,
simplify encoding_norm_readline(), and rename to encoding_normalize_toascii()


git-svn: trunk@3571
2008-02-01 19:38:52 +00:00
Török Edvin
a6de01aa14 handle NULL characters in HTML files. (bb #539).
git-svn: trunk@3543
2008-01-25 16:39:40 +00:00
Török Edvin
462e8e5eb3 apply next set of patches for enabling phishing code
git-svn: trunk@3043
2007-05-01 16:46:52 +00:00
Sven Strickroth
a99111f050 remove old CVS-stuff and make the repository look more like SVN
git-svn: trunk@2755
2007-02-17 19:02:20 +00:00
Renamed from clamav-devel/libclamav/htmlnorm.h