Adds a fuzzy hash test for the ClamAV demon JPEG attached to the XLS
document. PNG is not yet tested, because the fuzzy hash implementation
isn't properly hashing that file yet.
This test covers a regression where malware detection wasn't properly
being tracked for OLE2 (XLS) image extraction/scanning.
In 0.104 and prior, the function for adding a logical subsignature
was being used for NDB sigs, FTM sigs, *and* Yara strings.
When cleaning up the logic for handling different types of logical
sig subsignatures, we accidentally broke support for regex strings in
Yara rules.
This commit adds new logic for recording if the Yara string is a regex
string, by adding a regex subsig opt. Then in a new function for adding
different types of Yara strings, we check if it's a regex string or not
before adding it as either a PCRE pattern or an AC/BM pattern.
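A minimal sketch of that dispatch (hypothetical names; the real code uses
the actual PCRE and AC/BM matcher APIs):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stubs standing in for the real matcher entry points. */
    static int add_pcre_pattern(const char *p)  { printf("PCRE:  %s\n", p); return 0; }
    static int add_ac_bm_pattern(const char *p) { printf("AC/BM: %s\n", p); return 0; }

    typedef struct {
        const char *pattern;
        bool is_regex; /* recorded via the new regex subsig option */
    } yara_string;

    static int add_yara_string(const yara_string *s)
    {
        return s->is_regex ? add_pcre_pattern(s->pattern)
                           : add_ac_bm_pattern(s->pattern);
    }

    int main(void)
    {
        yara_string plain = {"hello world", false};
        yara_string regex = {"/ho(la|las)/", true};
        return add_yara_string(&plain) | add_yara_string(&regex);
    }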
Resolves: https://github.com/Cisco-Talos/clamav/issues/494
Also adds a basic test for a Yara regex rule.
I observed undefined symbol errors when linking the bytecode_runtime
object with the check_* test executables on Windows when using LLVM
built from source. I'm not sure why, exactly, but these symbols should
all be present in the ClamAV::libclamav library that we're linking with,
so I don't know why we link with the object library targets in addition
to the libclamav shared or static library target.
This commit removes linking with those extra object library targets.
The `have_clamjit` global is used in the unit tests but doesn't appear
to be exported when I was testing the external LLVM runtime support PR,
resulting in an undefined symbol issue. Converting this to a function
that returns 0 or 1 instead of a global variable resolved the issue.
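A sketch of the change (the guard macro here is an assumption, not the
real build define):

    /* Before: an exported global, awkward to consume across a DLL boundary. */
    /* extern int have_clamjit; */

    /* After: a function export that the unit tests can call. */
    int have_clamjit(void)
    {
    #ifdef HAVE_CLAMJIT /* assumed guard name */
        return 1;
    #else
        return 0;
    #endif
    }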
We must pass the LLVM library dependencies to the libclamav_rust
build.rs script so it links the libclamav_rust unit test executable with
LLVM.
Also:
- We can remove the libtinfo dependency that was hardcoded for the LLVM
3.6 support, and must remove it for the build to work on Alpine and macOS.
- Also, increased the libcheck default timeout from 60s to 300s after
experiencing a failure while testing this.
- Also made one of the valgrind suppressions more generic to account for
inline optimization differences observed in testing this.
We've observed a number of false positives with iconv, pthread, libxml2,
etc regarding memory allocation on init that is never cleaned up but
isn't a leak. Now we're seeing a whole tonne more from Rust libraries
with wild stack traces that couldn't easily be suppressed.
So the solution is to disable alerts for possible leaks and only
show definite leaks.
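One way to express this with standard valgrind options (the exact flags
in our test configuration may differ):

    valgrind --leak-check=full \
             --show-leak-kinds=definite \
             --errors-for-leak-kinds=definite \
             clamd ...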
Add a new logical signature subsignature type for matching on images
with image fuzzy hashes.
Image fuzzy hash subsignatures follow this format:
fuzzy_img#<hash>#<dist>
In this initial implementation, the hamming distance (dist) is ignored
and only exact fuzzy hash matches will alert.
Fuzzy hash matching is only performed for supported image types.
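For example, a logical signature using such a subsignature might look
like this (the hash below is a made-up placeholder, not a real image
hash):

    Example.Img.FuzzyHash;Engine:150-255,Target:0;0;fuzzy_img#af2ad01ed42993c7#0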
Also: removed some excessive debug log messages on start-up.
Fixed an issue where the signature name (virname) is being allocated and
stored for every subsignature, or even every sub-pattern in an AC-pattern
(i.e. NDB sig or LDB subsig) containing a `{n-m}` or `*` wildcard.
This fix is only for LDB subsigs, though. NDB signatures are still
allocating one virname per sub-pattern.
This fix was required because I needed a place to store the virname with
fuzzy-hash subsignatures. Storing it in the fuzzy-hash subsig
metadata the way AC-pattern, PCRE, and BComp subsigs were doing it
wouldn't work, because it would cross the C-Rust FFI boundary, and giving
pointers to Rust-allocated data is dicey. Not to mention native Rust
strings are different than C strings. Anyway, the correct thing to do
was to store the virname with the actual logical signature.
TODO: Keep track of NDB signatures in the same way and store the virname
for NDB sigs there instead of in AC-patterns so that we can get rid of
the virname field in the AC-pattern struct.
Improvements to the logg() API appear to have corrected a bug wherein
debug/warning/error messages were writing to stdout instead of stderr.
Now that they're going to stderr correctly, this fixes the test to match.
Don't crop logs so we can see the whole valgrind output in large logs.
Generate suppression rules, in case they're needed for slight variations
on existing suppressions or sometimes new suppressions.
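valgrind supports this with a standard option:

    valgrind --gen-suppressions=all ...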
Increase valgrind's stack size for initial thread because I observed some
unusual errors in custom tests on Debian 10 like this:
7: [INFO]: ==9911== Process terminating with default action of signal 15 (SIGTERM)
7: [INFO]: ==9911== at 0x401A4FB: __open_nocancel (open64_nocancel.c:45)
7: [INFO]: ==9911== by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911== by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911== by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911== by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911== by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911== by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911== by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911== by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911== by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911== by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911== by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911== by 0x1: ???
7: [INFO]: ==9911== Jump to the invalid address stated on the next line
7: [INFO]: ==9911== at 0x1036: ???
7: [INFO]: ==9911== by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911== by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911== by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911== by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911== by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911== by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911== by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911== by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911== by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911== by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911== by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911== by 0x1: ???
7: [INFO]: ==9911== Address 0x1036 is not stack'd, malloc'd or (recently) free'd
7: [INFO]: ==9911==
7: [INFO]: ==9911==
7: [INFO]: ==9911== Process terminating with default action of signal 11 (SIGSEGV)
7: [INFO]: ==9911== Bad permissions for mapped region at address 0x1036
7: [INFO]: ==9911== at 0x1036: ???
7: [INFO]: ==9911== by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911== by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911== by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911== by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911== by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911== by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911== by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911== by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911== by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911== by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911== by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911== by 0x1: ???
Per this conversation: https://bugs.kde.org/show_bug.cgi?id=359705
I'm testing to see if the issue is resolved by increasing valgrind's
main stack size.
https://valgrind.org/docs/manual/manual-core.html claims that the
default is the current `ulimit` value or 16MB, whichever is lower.
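The option in question takes a byte count; the value below is just a
placeholder:

    valgrind --main-stacksize=33554432 ...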
The logic for parsing a logical subsignature isn't clearly identified
and has been, perhaps mistakenly or out of convenience, used when
parsing NDB signatures in addition to LDB subsignatures. What this means
is that you can technically use a PCRE subsignature in an NDB file and
clam won't complain about it. It won't work, however, because a PCRE
subsignature requires another matching subsignature to trigger it, but
it will parse. The same is likely true for byte-compare subsignatures.
This commit restructures that logic a bit so subsignature parsing has
its own function and is more organized.
I also renamed the functions a little bit and added lots of comments.
I fixed a few minor warnings relating to format string characters.
The change in str.c:cli_ldbtokenize is to prevent a buffer under-read if
you were to use the function on the start of a buffer, as is now done in
this commit.
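A minimal sketch of the hazard and the guard (illustrative; not the
literal str.c code):

    #include <stdbool.h>
    #include <stddef.h>

    /* When splitting on an unescaped delimiter, only peek at the previous
     * byte if one exists; at i == 0 the look-behind would under-read. */
    static bool delim_is_escaped(const char *buf, size_t i)
    {
        return i > 0 && buf[i - 1] == '\\';
    }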
* Added loglevel parameter to logg()
* Fix logg and mprintf internals with new loglevels
* Update all logg calls to set loglevel (see the example below)
* Update all mprintf calls to set loglevel
* Fix hidden logg calls
* Executed clam-format
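Roughly, the call sites changed like this (fragment; the loglevel macro
name is illustrative):

    /* Before: the loglevel was encoded in a prefix character ('!' = error). */
    logg("!Can't open file %s\n", filename);

    /* After: the loglevel is an explicit parameter. */
    logg(LOGG_ERROR, "Can't open file %s\n", filename);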
Add support for compiling with external TomsFastMath library provided by
the system instead of compiling the vendored copy into libclamav.
The vendored source is still built directly into libclamav instead of as
a separate library the way libmspack is.
The rationale is that:
A) it's more complicated to deal with possibly compiling as static or
dynamic, and also
B) libmspack and libunrar are compiled separately primarily because of
licensing concerns. TomsFastMath is public domain, so that isn't a concern.
Resolves: https://bugzilla.clamav.net/show_bug.cgi?id=12562
Adds a clamscan test to verify that --allmatch mode works for the
clam.exe program when loading MD5, SHA1, SHA256, PE-section, & PE-import
hash signatures.
Add a basic unit test for the new libclamav_rust `logging.rs` module.
This test simply initializes logging and then prints out a message with
each of the `log` macros.
Also set the Rust edition to 2018 because the default is the 2015
edition in which using external crates is very clunky.
For the Rust test support in CMake this commit adds support for
cross-compiling the Rust tests.
Rust tests must be built for the same LLVM triple (target platform) as
the rest of the project. In particular this is needed to build both
x64 and x86 packages on a 64bit Windows host.
For Alpine, we observed that the LLVM triple for the host platform tools
may be either:
- x86_64-unknown-linux-musl, or
- x86_64-alpine-linux-musl
To support it either way, we look up the host triple with `rustc -vV`
and use that if the musl libc exists. This is a bit hacky and
unfortunately means that we probably can't cross-compile to other
platforms when running on a musl libc host. There is probably room
to improve cross-compiling support.
The Rust test programs must link with libclamav, libclammspack, and
possibly libclamunrar_iface and libclamunrar plus all of the library
dependencies for those libraries.
To do this, we pass the path of each library in environment variables
when building the libclamav_rust unit test program.
Within `libclamav_rust/build.rs`, we read those environment variables.
If set, we parse each into library path and name components to use
as directives for how to build the unit test program.
See: https://doc.rust-lang.org/cargo/reference/build-scripts.html
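For example, given a (hypothetical) path like
/clamav/build/libclamav/libclamav.so in one of those variables, build.rs
would emit directives along these lines:

    cargo:rustc-link-search=native=/clamav/build/libclamav
    cargo:rustc-link-lib=dylib=clamav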
Our `build.rs` file ignores the library path environment variables if
they're not set, which is necessary when building the libclamav_rust
library itself, when libclamunrar isn't static, and when not linking with
a libiconv external to libc.
Rust test programs are built and executed in subdirectory under:
<target>/<llvm triple>/<config>/deps
where "target" for libclamav_rust tests is set to <build>/unit_tests
For example:
clamav/build/unit_tests/x86_64-pc-windows-msvc/debug/deps/clamav_rust-7e1343f8a2bff1cc.exe
Since this program isn't co-located with the rest of the libraries
we also have to set environment variables so the test program can find and
load the shared libraries:
- Windows: PATH
- macOS: DYLD_LIBRARY_PATH
We already set LD_LIBRARY_PATH when not Windows for similar reasons.
Note: In build.rs, we iterate references to LIB_ENV_LINK & Co because
older Rust versions do not implement Iterator for [&str].
The `realpath()` function on macOS will fail when scanning a symlink
that doesn't exist because it tries to determine the real path of the
thing the symlink points to and fails when that thing doesn't exist.
This behavior is different than on Linux or FreeBSD Unix.
I'm fixing this by opening the symlink with `O_SYMLINK` then getting
realpath of the link using `cli_get_filepath_from_filedesc()`, much
like we do on Windows.
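A sketch of the approach (error handling trimmed; the helper call is
commented out so the fragment stands alone):

    #include <fcntl.h>
    #include <unistd.h>

    /* On macOS, open the link itself rather than its (possibly missing)
     * target, then resolve the path from the descriptor. */
    static char *realpath_of_symlink(const char *link_path)
    {
        char *real_path = NULL;
        int fd = open(link_path, O_RDONLY | O_SYMLINK);
        if (fd == -1)
            return NULL;
        /* cli_get_filepath_from_filedesc(fd, &real_path); */
        close(fd);
        return real_path;
    }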
Side note: I did try using macOS's _DARWIN_BETTER_REALPATH variant to
see if that resolved the issue, but it didn't seem to.
This resolves https://bugzilla.clamav.net/show_bug.cgi?id=12792
This commit also removes the problematic "access denied" clamd test from
the test suite. This commit broke that test on macOS because the error
message when clamdscan fails to `open()` the file to check the
realpath() is different than the error message when clamd tries to scan
it, but has access denied.
(It's like "Permission denied" instead of "Access denied")
It's not worth #ifdef'ing around macOS to get the test to pass because this
test is already causing problems in 32-bit Docker environments where
access isn't actually denied to the user (!). Honestly, it's not worth
keeping the test that simply verifies ClamD's error message when denied
matches expectations.
I also switched to use the C_DARWIN macro instead of __APPLE__ because
I know C_DARWIN is defined in clamav-config.h on macOS, and I found that
__APPLE__ wasn't defined when I tested.
Extends the existing freshclam 403 test to check if a repeated freshclam
403 response outputs the cool-down message instead of the original 403
response message.
The fmap module provides a mechanism for creating a mapping into an
existing map at an offset and length. It's used when a file is found
within an uncompressed archive or when embedded files are found via
embedded file type recognition in scanraw(). This is the
"fmap_duplicate()" function. Duplicate fmaps just reference the original
fmap's 'data' or file handle/descriptor while allowing the caller to
treat it like a new map using offsets and lengths that don't account for
the original/actual file dimensions.
fmaps keep track of this with m->nested_offset & m->real_len, which
admittedly have confusing names. I found incorrect uses of these in a
handful of locations. Notably:
- In cli_magic_scan_nested_fmap_type().
The force-to-disk feature would have been checking incorrect sizes and
may have written incorrect offsets for duplicate fmaps.
- In XDP parser.
- A bunch of places from the previous commit when making dupe maps.
This commit fixes those and adds lots of documentation to the fmap.h API
to try to prevent confusion in the future.
nested_offset should never be referenced outside of fmap.c/h.
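For reference, the relationship between those fields (an illustrative
model, not the real fmap struct):

    #include <stddef.h>

    /* Model of a duplicate fmap's view into the original data. */
    typedef struct {
        size_t nested_offset; /* where the duplicate starts in the original */
        size_t real_len;      /* nested_offset + the duplicate's length */
    } fmap_model;

    /* A caller-relative offset must be shifted into the original's space. */
    static size_t to_real_offset(const fmap_model *m, size_t caller_offset)
    {
        return m->nested_offset + caller_offset; /* must stay < m->real_len */
    }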
The fmap_* functions for accessing or reading map data have two
implementations, mem_* or handle_*, depending on the data source.
I found issues with some of these so I made a unit test that covers each
of the functions I'm concerned about for both types of data sources and
for both original fmaps and nested/duplicate fmaps.
With the tests, I found and fixed issues in these fmap functions:
- handle_need_offstr(): must account for the nested_offset in dupe maps.
- handle_gets(): must account for nested_offset and use len & real_len
correctly.
- mem_need_offstr(): must account for nested_offset in dupe maps.
- mem_gets(): must account for nested_offset and use len & real_len
correctly.
Moved the CDBRANGE() macro out of the function definition for better
legibility.
Fixed a few warnings.
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.
Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".
At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.
But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".
To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).
I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).
This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.
For illustration, imagine a tarball named foo.tar.gz with this structure:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| └── baz.exe | PE | 2 | 1 |
But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe | PE | 0 | 0 |
| ├── sfx.zip | ZIP | 1 | 1 |
| │ └── hello.txt | ASCII | 2 | 0 |
| └── sfx.7z | 7Z | 1 | 1 |
| └── world.txt | ASCII | 2 | 0 |
(A) If we scan for embedded files at any layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| ├── foo.tar | TAR | 1 | 0 |
| │ ├── bar.zip | ZIP | 2 | 1 |
| │ │ └── hola.txt | ASCII | 3 | 0 |
| │ ├── baz.exe | PE | 2 | 1 |
| │ │ ├── sfx.zip | ZIP | 3 | 1 |
| │ │ │ └── hello.txt | ASCII | 4 | 0 |
| │ │ └── sfx.7z | 7Z | 3 | 1 |
| │ │ └── world.txt | ASCII | 4 | 0 |
| │ ├── sfx.zip | ZIP | 2 | 1 |
| │ │ └── hello.txt | ASCII | 3 | 0 |
| │ └── sfx.7z | 7Z | 2 | 1 |
| │ └── world.txt | ASCII | 3 | 0 |
| ├── sfx.zip | ZIP | 1 | 1 |
| └── sfx.7z | 7Z | 1 | 1 |
(A) is bad because it scans content more than once.
Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.
The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats, including GZ and TAR.
(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| ├── baz.exe | PE | 2 | 1 |
| ├── sfx.zip | ZIP | 2 | 1 |
| │ └── hello.txt | ASCII | 3 | 0 |
| └── sfx.7z | 7Z | 2 | 1 |
| └── world.txt | ASCII | 3 | 0 |
(B) is almost right. But we can achieve it easily enough by only scanning
for embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfx.zip and sfx.7z are in foo.tar instead of in baz.exe.
The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if inside of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip | ZIP | 0 | 0 |
| └── bar.exe | PE | 1 | 1 |
| └── sfx.zip | ZIP | 2 | 2 |
Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.
(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| └── baz.exe | PE | 2 | 1 |
| ├── sfx.zip | ZIP | 3 | 1 |
| │ └── hello.txt | ASCII | 4 | 0 |
| └── sfx.7z | 7Z | 3 | 1 |
| └── world.txt | ASCII | 4 | 0 |
(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection to when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.
So this commit aims to solve the issue the (B)-way.
Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types rely on
finding data partway through or near the end of a file before
reassigning the entire file as the new type.
Other fixes and considerations in this commit:
- The utf16 HTML parser has weak error handling, particularly with respect
to creating a nested fmap for scanning the ASCII-decoded file.
This commit cleans up the error handling and wraps the nested scan with
the recursion-stack push()/pop() for correct recursion tracking.
Before this commit, each container layer had a flag to indicate if the
container layer is valid.
We need something similar so that the cli_recursion_stack_get_*()
functions ignore normalized layers. Details...
Imagine an LDB signature for HTML content that specifies a ZIP
container. If the signature actually alerts on the normalized HTML and
you don't ignore normalized layers for the container check, it will
appear as though the alert is in an HTML container rather than a ZIP
container.
This commit accomplishes this with a boolean you set in the scan context
before scanning a new layer. Then when the new fmap is created, it will
use that flag to set a similar flag for the layer. The context flag is
then reset so that subsequent layers don't inherit it (see the sketch
below).
The flag allows the new recursion_stack_get() function to ignore
normalized layers when iterating the stack to return a layer at a
requested index, negative or positive.
Scanning extracted/normalized JavaScript and VBA should also
use the 'layer is normalized' flag.
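A small model of the one-shot flag (hypothetical names):

    #include <stdbool.h>

    typedef struct {
        bool next_layer_is_normalized; /* set just before scanning a new layer */
    } scan_ctx_model;

    typedef struct {
        bool is_normalized; /* layers with this set are skipped by the
                             * cli_recursion_stack_get_*() lookups */
    } layer_model;

    static void push_layer(scan_ctx_model *ctx, layer_model *layer)
    {
        layer->is_normalized = ctx->next_layer_is_normalized;
        ctx->next_layer_is_normalized = false; /* one-shot: reset immediately */
    }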
- This commit also fixes Heuristic.Broken.Executable alert for ELF files
to make sure that:
A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
respects the FP check).
B) all broken-executable alerts for ELF only happen if the
SCAN_HEURISTIC_BROKEN option is enabled.
- This commit also cleans up the error handling in cli_magic_scan_dir().
This was needed so we could correctly apply the layer-is-normalized-flag
to all VBA macros extracted to a directory when scanning the directory.
- Also fix an issue where exceeding scan maximums wouldn't cause embedded
file detection scans to abort. Granted we don't actually want to abort
if max filesize or max recursion depth are exceeded... only if max
scansize, max files, and max scantime are exceeded.
Add 'abort_scan' flag to scan context, to protect against depending on
correct error propagation for fatal conditions. Instead, setting this
flag in the scan context should guarantee that a fatal condition deep in
scan recursion isn't lost, which would result in more stuff being scanned
instead of aborting. This shouldn't be necessary, but some status codes
like CL_ETIMEOUT never used to be fatal and it's easier to do this than
to verify every parser only returns CL_ETIMEOUT and other "fatal
status codes" in fatal conditions.
- Remove duplicate is_tar() prototype from filestypes.c and include
is_tar.h instead.
- Presently we create the fmap hash when creating the fmap.
This wastes a bit of CPU if the hash is never needed.
Now that we're creating fmaps for all embedded files discovered with
file type recognition scans, this is a much more frequent occurrence and
really slows things down.
This commit fixes the issue by only creating fmap hashes as needed.
This should not only resolve the performance impact of creating fmaps
for all embedded files, but should also improve performance in general.
- Add allmatch check to the zip parser after the central-header meta
match. That way we don't get multiple alerts for the same match except in
allmatch mode. Clean up error handling in the zip parser a tiny bit.
- Fixes to ensure that the scan limits such as scansize, filesize,
recursion depth, # of embedded files, and scantime are always reported
if AlertExceedsMax (--alert-exceeds-max) is enabled.
- Fixed an issue where non-fatal alerts for exceeding scan maximums may
mask signature matches later on. I changed it so these alerts use the
"possibly unwanted" alert-type and thus only alert if no other alerts
were found or if all-match or heuristic-precedence are enabled.
- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
when the --gen-json feature is enabled. These will show up once under
"ParseErrors" the first time a limit is exceeded. In the present
implementation, only one limits-exceeded event will be added, so as to
prevent a malicious or malformed sample from filling the JSON buffer
with millions of events and using a tonne of RAM.
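For example, the JSON metadata might include a fragment like this (the
event suffix is illustrative):

    "ParseErrors": [
        "Heuristics.Limits.Exceeded.MaxFileSize"
    ]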
There is a bug where freshclam fails to detect if a downloaded CDIFF is
empty. In 0.103 this, combined with a CDN caching issue, could result in
freshclam downloading a daily.cvd but failing to update, putting it in a
sort of infinite loop. In 0.104 this issue manifests slightly
differently, requiring freshclam to run up to 3x before you get over the
empty-CVD hump and are back to normal updates.
This commit updates an existing cdiff test with the zero-byte cdiff + an
out-of-date CVD to confirm the bug. The following commit will fix it.
When running multiple parallel processes of "xor_testfile.py", there was a
race condition between checking for the existence of the directory and
creating it. Now this is handled as a dependency in CMake.
Xcode (and perhaps some other generators?) do not like targets that have
only object files. See:
https://cmake.org/cmake/help/latest/command/add_library.html#object-libraries
And: https://cmake.org/pipermail/cmake/2016-May/063479.html
This issue manifests when using `-G Xcode` on macOS: the library
dylibs are missing when linking with other binaries.
This commit removes the object libraries for libclamav, libfreshclam,
libclamunrar_iface, libclamunrar, libclammspack, and (lib)common
because they were used by static or shared libs that didn't
themselves have any added sources.
Add getter & setter for the debug flag, so it isn't referenced by unit
tests or other code that links with libclamav. This is needed because
global variables are exported symbols on Windows.
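A sketch of the accessors (hypothetical names):

    #include <stdint.h>

    static uint8_t debug_flag; /* no longer needs to be an exported global */

    void set_debug_flag(uint8_t enabled) { debug_flag = enabled; }
    uint8_t get_debug_flag(void) { return debug_flag; }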
The access-denied test and excludepath tests both relied on the full
path of the test file to be in the expected results. This fails if
you're working within a path that has a symlink because clamd and
clamdscan determine real-paths before scanning and end up sending
back the real-path in the results, not the original path.
This fixes the tests by removing the full paths from the expected
results.
I also cleaned up some type safety warnings.
Python 3.5 compatibility fixes for Debian 9, etc that lack 3.6+.
Change a python f-string to an old-style `"".format()`.
Convert Path objects to strings for older `shutil` APIs that don't
accept Paths.
Added a test to verify that clamscan can extract images from an XLS
document. The document has 2 images: a PNG and JPEG version of the
clamav demon/logo. The test requires the json metadata feature to verify
that the MD5s of the images are correct.
No other image formats were tested because despite the format allegedly
supporting other image formats, Excel converts TIFF, BMP, and GIF images
to PNG files when you insert them.
The split test files are flagged by some AV's because they look like
broken executables. Instead of splitting the test files to prevent
detections, we should encrypt them. This commit replaces the "reassemble
testfiles" script with a basic "XOR testfiles" script that can be used
to encrypt or decrypt test files. This commit also of course then
replaces all the split files with xor'ed files.
The test and unit_tests directories were a bit of a mess, so I
reorganized them all into unit_tests with all of the test files placed
under "unit_tests/input" using subdirectories for different types of files.
Adds a basic test to validate that ExcludePath correctly excludes a
subdirectory but does not exclude subsequent files. As with the other
ClamD/Scan tests, it will test in each mode: regular, stream, and
fdpass (if available).
Unlike the other tests, this one tests ClamDScan with Valgrind instead
of ClamD.
Refactored the clamd_test.py file to reduce duplicate code, and support
enabling and disabling valgrind when running ClamDScan and ClamD.
Add pytest to the github actions environments because the results when
using pytest are far easier to read.
There appear to be minor leaks in clamd that can occur when shutting
down immediately after a command (e.g. RELOAD).
These are causing intermittent clamd test failures.
It seems like they're caused by a thread leaking occasionally,
due to not exiting before the program terminates.
I don't believe these to be a serious issue. Tracking down the exact
cause and crafting a fix for the leaks isn't worth the effort.
This commit adds valgrind suppression rules to stabilize the tests.
The non-existent file test has a hack to "expect" a weird error message
caused by the '\v' character rather than the file not actually existing.
Recently something(?) changed and the test started reporting yet a
different message or no message.
Removing the '\v' special character fixes the test so it actually tests
a non-existent file and returns the same message as on other operating
systems.
Add a test where freshclam received a zero-byte cdiff to trigger a whole
CVD database download, and the CVD served is older than advertised.
This is a regression test for a bug found & fixed by Andrew Williams.
Adds 3 tests to validate that:
1. a CDIFF update works
2. a CDIFF partial update (with 1 missing CDIFF) works
and that a subsequent update is ok with being 1 behind
3. a CDIFF partial update (with 2 missing CDIFFs) works
and that a subsequent update will try to get the WHOLE CVD -
because being 2+ CDIFFs behind without any update isn't good enough.
Also fixed a minor bug so that the database name is properly displayed
when a partial update occurs instead of displaying "(null)".
Also changed the freshclam test port to 8001 to deconflict with
CVD-Update, in case that's running in the background.
TODO: Make the tests smarter so they find an open port instead of
hoping that 8001 is available.
Some users have scripts set up from long ago to delete mirrors.dat if
FreshClam failed. We used to recommend this if people had technical
issues because mirrors.dat would store a bunch of entries indicating
that all of their regional mirrors were failing and then FreshClam would
give up.
The new freshclam DAT file no longer stores that kind of information.
Deleting the DAT file is no longer sound advice.
We very much want the UUID, which is generated when creating the DAT
file, to persist between runs. So unless people go and change the
scripts to delete freshclam.dat instead, this commit should resolve the
concern.