With OpenSSL 1, `EVP_get_digestbyname()` fails when given "sha2-*" algorithm names.
It must be given "sha256", etc., instead.
I made a shim that does the conversion, and I made an improvement to ignore case
when converting alg names to our hash type enumeration.
Also fixes a few compiler warnings.
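
The name conversion described above can be sketched roughly as follows. This is only an illustration, not the shim from this commit: the function names `names_equal_nocase()` and `openssl1_digest_name()` are hypothetical, and the mapping table is an assumed set of "sha2-*" names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality using only standard C. */
static int names_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/*
 * Hypothetical shim: map a "sha2-*" style name to the spelling that
 * OpenSSL 1.x accepts. The table below is an assumption for illustration.
 * Returns the input pointer unchanged when no mapping applies.
 */
static const char *openssl1_digest_name(const char *alg)
{
    static const struct {
        const char *from;
        const char *to;
    } map[] = {
        {"sha2-224", "sha224"},
        {"sha2-256", "sha256"},
        {"sha2-384", "sha384"},
        {"sha2-512", "sha512"},
    };
    size_t i;

    if (alg == NULL)
        return NULL;

    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (names_equal_nocase(alg, map[i].from))
            return map[i].to;
    }
    return alg;
}
```

A caller would pass the result to `EVP_get_digestbyname()` instead of the original name, so that "SHA2-256" and "sha2-256" both resolve to "sha256".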
Add `cl_scanfile_ex()`, `cl_scanmap_ex()`, and `cl_scandesc_ex()`
functions that provide the following additional parameters:
  hash_hint
      (Optional) A NULL terminated string of the file hash so that
      libclamav does not need to calculate it.
  [out] hash_out
      (Optional) A NULL terminated string of the file hash.
      The caller is responsible for freeing the string.
  hash_alg
      The hashing algorithm used for either `hash_hint` or `hash_out`.
      Supported algorithms are "md5", "sha1", "sha2-256".
      If not specified, the default is "sha2-256".
  file_type_hint
      (Optional) A NULL terminated string of the file type hint.
      E.g. "pe", "elf", "zip", etc.
      You may also use ClamAV type names such as "CL_TYPE_PE".
      ClamAV will ignore the hint if it is not familiar with the specified type.
      See also: https://docs.clamav.net/appendix/FileTypes.html#file-types
  file_type_out
      (Optional) A NULL terminated string of the file type
      of the top layer as determined by ClamAV.
      Will take the form of the standard ClamAV file type format. E.g. "CL_TYPE_PE".
      See also: https://docs.clamav.net/appendix/FileTypes.html#file-types
CLAM-2626
I accidentally introduced a NULL-dereference bug when scanning any OOXML
file in https://github.com/Cisco-Talos/clamav/pull/1548
I overlooked the test failure out of haste. 😔
The NULL-dereference happens because the `unzip_search()` feature
allowed searching some other file than the one that is currently being
scanned, which you would do by setting `ctx` to NULL and setting an
`fmap` parameter instead.
In practice, the current layer's `fmap` from the `ctx` was always passed in.
This fix makes it so `unzip_search()` and related functions only
take the `ctx` parameter and do not have an `fmap` or `fsize` parameter.
(Note: the `fsize` was never needed, because `fmap->len` takes care of that.)
CLAM-2837
This is just preliminary support for identifying an assortment of
different AI model files.
So far, this detects the following types:
- GGML GGUF (.gguf)
- ONNX AI (.onnx)
- TensorFlow Lite (.tflite)
Additional types to consider:
- SafeTensors (.safetensors)
- TensorFlow (.pb, .ckpt, .tfrecords)
- Keras (.keras)
- pickle (.pkl)
- numpy (.npy, .npz)
- coreml (.coreml)
- PyTorch (.pt, .pth, .bin, .mar, .pte, .pt2, .ptl)
Outside of being able to differentiate by file type, the scanner
will treat CL_TYPE_AI_MODEL the same as CL_TYPE_BINARY_DATA.
We're not adding parsers to further process these files, for now.
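
As a rough illustration of how two of the detected formats can be told apart by their leading bytes, here is a minimal sketch. It is not how libclamav performs file typing (that is driven by its file type magic signatures). The enum and function names are hypothetical. ONNX is left out because it is a serialized protobuf message with no fixed magic string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch only. */
enum model_kind {
    MODEL_UNKNOWN = 0,
    MODEL_GGUF,
    MODEL_TFLITE
};

/*
 * Sniff a buffer holding the start of a file.
 * Only checks well-known magic bytes; it does not validate the file.
 */
static enum model_kind sniff_model_kind(const unsigned char *buf, size_t len)
{
    /* GGUF files start with the four ASCII bytes "GGUF". */
    if (len >= 4 && memcmp(buf, "GGUF", 4) == 0)
        return MODEL_GGUF;

    /* TensorFlow Lite FlatBuffers carry the file identifier "TFL3" at offset 4. */
    if (len >= 8 && memcmp(buf + 4, "TFL3", 4) == 0)
        return MODEL_TFLITE;

    return MODEL_UNKNOWN;
}
```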
It may be necessary to differentiate between *.pyc and other binary
types in case additional processing is needed.
Outside of being able to differentiate by file type, the scanner
will treat CL_TYPE_PYTHON_COMPILED the same as CL_TYPE_BINARY_DATA.
That is, we're not adding a parser at this time to further break down
.pyc files.
Includes rudimentary support for getting slices from FMap's and for
interacting with libclamav's context structure.
For now, this will use a Cisco-Talos org fork of the onenote_parser
until the feature to open a OneNote section from a slice (instead
of from a filepath) is added upstream.
Added a new scan option to alert on broken media (graphics) file
formats. This feature mitigates the risk of malformed media files
intended to exploit vulnerabilities in other software. At present
media validation exists for JPEG, TIFF, PNG, and GIF files.
To enable this feature, set `AlertBrokenMedia yes` in clamd.conf, or
use the `--alert-broken-media` option when using `clamscan`.
These options are disabled by default for now.
Application developers may enable this scan option by enabling
`CL_SCAN_HEURISTIC_BROKEN_MEDIA` for the `heuristic` scan option bit
field.
Fixed PNG parser logic bugs that caused an excess of parsing errors
and fixed a stack exhaustion issue affecting some systems when
scanning PNG files. PNG file type detection was disabled via
signature database update for 0.103.0 to mitigate effects from these
bugs.
Fixed an issue where PNG and GIF files no longer work with Target:5
(graphics) signatures if detected as CL_TYPE_PNG/GIF rather than as
CL_TYPE_GRAPHICS. Target types now support up to 10 possible file
types to make way for additional graphics types in future releases.
Scanning JPEG, TIFF, PNG, and GIF files will no longer return "parse"
errors when file format validation fails. Instead, the scan will alert
with the "Heuristics.Broken.Media" signature prefix and a descriptive
suffix to indicate the issue, provided that the "alert broken media"
feature is enabled.
GIF format validation will no longer fail if the GIF image is missing
the trailer byte, as this appears to be a relatively common issue in
otherwise functional GIF files.
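
The relaxed trailer rule can be pictured with a simplified sketch like the one below. It is not the libclamav GIF parser: it only looks at the 6-byte header and the final byte of the buffer instead of walking the GIF blocks, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define GIF_TRAILER 0x3B /* the ';' byte that terminates a GIF data stream */

/* Hypothetical result type for this sketch only. */
enum gif_check {
    GIF_NOT_GIF = 0,
    GIF_OK,
    GIF_OK_MISSING_TRAILER
};

/*
 * Check the GIF signature and report a missing trailer as a
 * non-fatal condition rather than as a validation failure.
 */
static enum gif_check check_gif_envelope(const unsigned char *buf, size_t len)
{
    if (len < 6 ||
        (memcmp(buf, "GIF87a", 6) != 0 && memcmp(buf, "GIF89a", 6) != 0))
        return GIF_NOT_GIF;

    /* Simplification: treat the last byte of the buffer as the trailer slot. */
    if (buf[len - 1] != GIF_TRAILER)
        return GIF_OK_MISSING_TRAILER;

    return GIF_OK;
}
```

A caller following the behavior described above would treat both `GIF_OK` and `GIF_OK_MISSING_TRAILER` as passing validation.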
Added a TIFF dynamic configuration (DCONF) option, which was missing.
This will allow us to disable TIFF format validation via signature
database update in the event that it proves to be problematic.
This feature already exists for many other file types.
Added CL_TYPE_JPEG and CL_TYPE_TIFF types.
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.
| current name | new name |
| ------------------------- | --------------------------------- |
| magic_scandesc() | cli_magic_scan() |
| cli_magic_scandesc_type() | <delete> |
| cli_magic_scandesc() | cli_magic_scan_desc() |
| cli_base_scandesc() | cli_magic_scan_desc_type() |
| cli_partition_scandesc() | <delete> |
| cli_map_scandesc() | magic_scan_nested_fmap_type() |
| cli_map_scan() | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc() | cli_magic_scan_buff() |
| cli_scanbuff() | cli_scan_buff() |
| cli_scandesc() | cli_scan_desc() |
| cli_fmap_scandesc() | cli_scan_fmap() |
| cli_scanfile() | cli_magic_scan_file() |
| cli_scandir() | cli_magic_scan_dir() |
| cli_filetype2() | cli_determine_fmap_type() |
| cli_filetype() | cli_compare_ftm_file() |
| cli_partitiontype() | cli_compare_ftm_partition() |
| cli_scanraw() | scanraw() |
EGG extraction support includes deflate, bzip2, and lzma decompression. AZO (LZO?) decompression is not yet supported. Solid archives are not yet supported. Split archives may have some limited success.
This commit also includes updates to the autoconf iconv.m4 file to enable detection of libiconv in alternative install locations.
* fmapify: (54 commits)
workaround for unrar not supporting fmap.
stfu on large lzma allocs
handle 7z encryption detection albeit post extraction and blockencrypted
add 7z SFX support - bb#3063
fix makefile for external LLVM 2.9
fix wrong interaction between prescan_cb caching and postscan_cb
bytecode_watchdog: fix use of unaddressable data
UPgrade lzma SDK to version 9.20 Also fmapify
export cl_fmap_close
cli_map_scandesc convenience API
Introduce cli_map_scandesc to scan a portion of the existing file
fix utf16_to_utf8, and add testcase
cli_utf16_to_utf8
fmapify jpeg_exploit
fmaify cli_scan_riff
fmapify mydoom
export filetype cb
factor out common code
fix mem API of new fmap
unit tests for new fmap scan API
...
Conflicts:
libclamav/Makefile.in
libclamav/c++/Makefile.am
libclamav/c++/Makefile.in
libclamav/filetypes_int.h
libclamav/scanners.c
libclamav/str.c
unit_tests/check_clamav.c
The allowed sector size is within 2048 to 2448 (2352 raw + 96 sub).
Right now the only file system supported is plain iso9660 with
optional Joliet extensions.
Additionally, files with multiple extents and interleaved files are not
supported.
Finally, due to the multiple possible ways to interpret the content
of a cd/dvd, I cannot guarantee that we scan the "right" files.
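
The sector size limits stated at the top of this message amount to a simple range check. A minimal sketch, with hypothetical macro and function names:

```c
#include <stddef.h>

#define SECTOR_SIZE_MIN 2048u          /* plain user-data sector */
#define SECTOR_SIZE_MAX (2352u + 96u)  /* raw sector plus subchannel data = 2448 */

/* Returns nonzero when the sector size falls inside the allowed range. */
static int sector_size_allowed(size_t sector_size)
{
    return sector_size >= SECTOR_SIZE_MIN && sector_size <= SECTOR_SIZE_MAX;
}
```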