Commit graph

757 commits

Author SHA1 Message Date
ragusaa
1c6746853f
Fixed heap buffer overflow while loading signatures
There is a possible overflow read when loading PDB and WDB phishing
signatures.

This issue is not a vulnerability.

Changed const char pointers to uint8_t pointers when they are to be used
with data, as well as removing asserts and adding additional error
checking.

Thank you Michał Dardas for reporting this issue.

This fix also resolves:
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43845
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43812
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43866

This commit also fixes a minor leak of pattern matching trans nodes
that was observed when testing with the MPOOL module disabled.
2022-05-16 18:29:25 -07:00
Micah Snyder
d86811ad47 Test: Add test for alerting on images extracted from XLS
Using a fuzzy hash test for the clamav daemon JPEG attached to the XLS
document.  Not yet testing PNG, because the fuzzy hash implementation
isn't properly hashing that file, yet.

This test is for a regression where malware detection wasn't properly
being tracked for OLE2 (XLS) image extraction / scanning.
2022-04-29 14:21:06 -07:00
Micah Snyder
0ad864ab83
Yara: Fix support for regex strings
In 0.104 and prior, the function for adding a logical subsignature
was being used for NDB sigs, FTM, sigs, *and* Yara strings.

When cleaning up the logic for handling different types of logical
sig subsignatures, we accidentally broke support for regex strings in
yara rules.

This commit adds new logic for recording if the Yara string is a regex
string, by adding a regex subsig opt. Then in a new function for adding
different types of Yara strings, we check if it's a regex string or not
before adding as either a PCRE pattern or as an AC/BM pattern.

Resolves: https://github.com/Cisco-Talos/clamav/issues/494

Also add a basic test for yara regex rule.
2022-03-22 16:52:18 -07:00
Micah Snyder
3acada6069 Tests: silence LGTM-com static analysis warning 2022-03-09 20:35:42 -08:00
Micah Snyder
426dd461f6 CMake: Fix win32 linking issue affecting unit tests w/ LLVM runtime
I observed undefined symbol errors when linking the bytecode_runtime
object with the check_* test executables on Windows when using LLVM
built from source. I'm not sure why, exactly, but these symbols should
all be present in the ClamAV::libclamav library that we're linking with,
so I don't know why we link with the object library targets in addition
to the libclamav shared/or/static library target.

This commit removes linking with those extra object library targets.
2022-03-09 20:35:42 -08:00
Micah Snyder
b07b1a65cb Fix linker issues with global variable used in tests
The `have_clamjit` global is used in the unit tests but doesn't appear
to be exported when I was testing the external LLVM runtime support PR,
resulting in an undefined symbol issue. Converting this to a function
that returns 0 or 1 instead of a global variable resolved the issue.
2022-03-09 20:35:42 -08:00
Micah Snyder
8bf70207d5 CMake: Fix LLVM linking issues: libclamav_rust, -ltinfo
We must pass the LLVM library dependencies to the libclamav_rust
build.rs script so it links the libclamav_rust unit test executable with
LLVM.

Also:
- We can remove the libtinfo dependency that was hardcoded for the LLVM
  3.6 support, and must remove it for the build to work on Alpine, macOS.
- Also, increased the libcheck default timeout from 60s to 300s after
  experiencing a failure while testing this.
- Also made one of the valgrind suppressions more generic to account for
  inline optimization differences observed in testing this.
2022-03-09 20:35:42 -08:00
Micah Snyder
943f4c4e0e Tests: Increase default timeout to 10 minutes
I observed the libclamav_valgrind test exceeding the 200sec timeout and
have heard of occasional timeouts from users.
2022-03-02 21:44:48 -07:00
Micah Snyder
390c264480 Tests, Valgrind: suppress jpeg decoder error fp
Observed the following:

7: [INFO]: ==13953== Conditional jump or move depends on uninitialised value(s)
7: [INFO]: ==13953==    at 0x52F1DA5: drop_in_place<core::result::Result<(), std::sync::mpsc::SendError<alloc::vec::Vec<u8, alloc::alloc::Global>>>> (mod.rs:192)
7: [INFO]: ==13953==    by 0x52F1DA5: {closure#0} (multithreaded.rs:77)
7: [INFO]: ==13953==    by 0x52F1DA5: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (panic.rs:347)
7: [INFO]: ==13953==    by 0x52EB913: do_call<std::panic::AssertUnwindSafe<jpeg_decoder::worker::multithreaded::spawn_worker_thread::{closure#0}>, ()> (panicking.rs:401)
7: [INFO]: ==13953==    by 0x52EB913: try<(), std::panic::AssertUnwindSafe<jpeg_decoder::worker::multithreaded::spawn_worker_thread::{closure#0}>> (panicking.rs:365)
7: [INFO]: ==13953==    by 0x52EB913: catch_unwind<std::panic::AssertUnwindSafe<jpeg_decoder::worker::multithreaded::spawn_worker_thread::{closure#0}>, ()> (panic.rs:434)
7: [INFO]: ==13953==    by 0x52EB913: halt_unwinding<jpeg_decoder::worker::multithreaded::spawn_worker_thread::{closure#0}, ()> (unwind.rs:17)
7: [INFO]: ==13953==    by 0x52EB913: {closure#0}<jpeg_decoder::worker::multithreaded::spawn_worker_thread::{closure#0}> (mod.rs:97)
7: [INFO]: ==13953==    by 0x52EB913: _ZN77_$LT$rayon_core..job..HeapJob$LT$BODY$GT$$u20$as$u20$rayon_core..job..Job$GT$7execute17h0490359412a3f331E.llvm.897720431843799762 (job.rs:167)
7: [INFO]: ==13953==    by 0x50052C0: execute (job.rs:59)
7: [INFO]: ==13953==    by 0x50052C0: execute (registry.rs:749)
7: [INFO]: ==13953==    by 0x50052C0: rayon_core::registry::WorkerThread::wait_until_cold (registry.rs:726)
7: [INFO]: ==13953==    by 0x53007AC: wait_until<rayon_core::latch::CountLatch> (registry.rs:700)
7: [INFO]: ==13953==    by 0x53007AC: main_loop (registry.rs:833)
7: [INFO]: ==13953==    by 0x53007AC: rayon_core::registry::ThreadBuilder::run (registry.rs:55)
7: [INFO]: ==13953==    by 0x5303104: {closure#0} (registry.rs:100)
7: [INFO]: ==13953==    by 0x5303104: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
7: [INFO]: ==13953==    by 0x52FD0AA: {closure#0}<rayon_core::registry::{impl#2}::spawn::{closure#0}, ()> (mod.rs:476)
7: [INFO]: ==13953==    by 0x52FD0AA: call_once<(), std:🧵:{impl#0}::spawn_unchecked::{closure#0}::{closure#0}> (panic.rs:347)
7: [INFO]: ==13953==    by 0x52FD0AA: do_call<std::panic::AssertUnwindSafe<std:🧵:{impl#0}::spawn_unchecked::{closure#0}::{closure#0}>, ()> (panicking.rs:401)
7: [INFO]: ==13953==    by 0x52FD0AA: try<(), std::panic::AssertUnwindSafe<std:🧵:{impl#0}::spawn_unchecked::{closure#0}::{closure#0}>> (panicking.rs:365)
7: [INFO]: ==13953==    by 0x52FD0AA: catch_unwind<std::panic::AssertUnwindSafe<std:🧵:{impl#0}::spawn_unchecked::{closure#0}::{closure#0}>, ()> (panic.rs:434)
7: [INFO]: ==13953==    by 0x52FD0AA: {closure#0}<rayon_core::registry::{impl#2}::spawn::{closure#0}, ()> (mod.rs:475)
7: [INFO]: ==13953==    by 0x52FD0AA: core::ops::function::FnOnce::call_once{{vtable-shim}} (function.rs:227)
7: [INFO]: ==13953==    by 0x53BC4E6: call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> (boxed.rs:1572)
7: [INFO]: ==13953==    by 0x53BC4E6: call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> (boxed.rs:1572)
7: [INFO]: ==13953==    by 0x53BC4E6: std::sys::unix:🧵:Thread:🆕:thread_start (thread.rs:74)
7: [INFO]: ==13953==    by 0x6EF4EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
7: [INFO]: ==13953==    by 0x6C1DB0C: clone (in /usr/lib64/libc-2.17.so)

The issue stems from jpeg decoder error handling.
It appears to be a false positive.
2022-03-02 13:12:59 -07:00
Micah Snyder
20d7041dec Tests: Add basic tests for image fuzzy hash signature matching 2022-03-02 13:12:59 -07:00
Micah Snyder
ad353b1ca5 Ctest, Valgrind: disable possibly-lost alerts
We've observed a number of false positives with iconv, pthread, libxml2,
etc regarding memory allocation on init that is never cleaned up but
isn't a leak. Now we're seeing a whole tonne more from Rust libraries
with wild stack traces that couldn't be easily suppress.

So the solution is to disable any alerts from possible-leaks and only
show definite-leaks.
2022-03-02 13:12:59 -07:00
Micah Snyder
fd587c741c Image fuzzy hash: new logical sub-signature feature
Add a new logical signature subsignature type for matching on images
with image fuzzy hashes.

Image fuzzy hash subsigantures follow this format:

    fuzzy_img#<hash>#<dist>

In this initial implementation, the hamming distance (dist) is ignored
and only exact fuzzy hash matches will alert.

Fuzzy hash matching is only performed for supported image types.

Also: removed some excessive debug log messages on start-up.

Fixed an issue where the signature name (virname) is being allocated and
stored for every subsignature or even ever sub-pattern in an AC-pattern
(i.e. NDB sig or LDB subsig) containing a `{n-m}` or `*` wildcard.
This fix is only for LDB subsigs though. NDB signatures are still
allocaing one virname per sub-pattern.

This fix was required because I needed a place to store the virname with
fuzzy-hash subsignatures. Storing it in the fuzzy-hash subsig
metadatathe way AC-pattern, PCRE, and BComp subsigs were doing it
wouldn't work because it would cross the C-Rust FFI boundary and giving
pointers to Rust allocated stuff is dicey. Not to mention native Rust
strings are different thatn C strings. Anyways, the correct thing to do
was to store the virname with the actual logical signature.

TODO: Keep track of NDB signatures in the same way and store the virname
for NDB sigs there instead of in AC-patterns so that we can get rid of
the virname field in the AC-pattern struct.
2022-03-02 13:12:59 -07:00
Micah Snyder
618e3b6bcf Valgrind,Tests: Add to more variants of statx suppression 2022-02-23 16:32:26 -07:00
Micah Snyder
a9bb5e37a5 Correct stdout/stderr expectations for freshclam tests
Improvements to the logg() API appear to have corrected a bug wherein
debug/warning/error messages were writing to stdout instead of stderr.

Now that they're going to stderr correctly, this fixes the test to match.
2022-02-23 16:32:26 -07:00
Micah Snyder
dc34fd2c75 Tests: valgrind test improvements.
Don't crop logs so we can see the whole valgrind output in large logs.

Generate suppression rules, in case they're needed for slight variations
on existing suppressions or sometimes new suppressions.

Increase valgrind's stack size for initial thread because I observed some
unusual errors in custom tests on Debian 10 like this:

7: [INFO]: ==9911== Process terminating with default action of signal 15 (SIGTERM)
7: [INFO]: ==9911==    at 0x401A4FB: __open_nocancel (open64_nocancel.c:45)
7: [INFO]: ==9911==    by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911==    by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911==    by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911==    by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911==    by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911==    by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911==    by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911==    by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911==    by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911==    by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911==    by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911==    by 0x1: ???
7: [INFO]: ==9911== Jump to the invalid address stated on the next line
7: [INFO]: ==9911==    at 0x1036: ???
7: [INFO]: ==9911==    by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911==    by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911==    by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911==    by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911==    by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911==    by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911==    by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911==    by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911==    by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911==    by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911==    by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911==    by 0x1: ???
7: [INFO]: ==9911==  Address 0x1036 is not stack'd, malloc'd or (recently) free'd
7: [INFO]: ==9911==
7: [INFO]: ==9911==
7: [INFO]: ==9911== Process terminating with default action of signal 11 (SIGSEGV)
7: [INFO]: ==9911==  Bad permissions for mapped region at address 0x1036
7: [INFO]: ==9911==    at 0x1036: ???
7: [INFO]: ==9911==    by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911==    by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911==    by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911==    by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911==    by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911==    by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911==    by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911==    by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911==    by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911==    by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911==    by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911==    by 0x1: ???

Per this conversation: https://bugs.kde.org/show_bug.cgi?id=359705
I'm testing to see if the issue is resolved by increasing valgrind's
main stack size.

https://valgrind.org/docs/manual/manual-core.html claims that the
default is the current `ulimit` value or 16MB, whichever is lower.
2022-02-23 16:32:26 -07:00
Micah Snyder
27ac9fe92e CMake: Fix static build with unrar disabled
Fix test build issues when unrar is disabled and the static build is
enabled.
2022-02-23 16:32:26 -07:00
Micah Snyder
350a2faf67 DB read logic cleanup, fix some warnings
The logic for parsing a logical subsignature isn't clearly identified
and has been, perhaps mistakenly or out of convenience, used to when
parsing NDB signatures in addition to LDB subsignatures. What this means
is that you can technically use a PCRE subsignature in an NDB file and
clam won't complain about it. It won't work however, because a PCRE
subsignature requires another matching subsignature to trigger it, but
it will parse. The same is likely true for byte-compare subsignatures.

This commit restructures that logic a bit so subsignature parsing has
its own function and is more organized.
I also renamed the functions a little bit and added lots of comments.

I fixed a few minor warnings relating to format string characters.

The change in str.c:cli_ldbtokenize is to prevent a buffer under-read if
you were to use the function on the start of a buffer, as is now down in
this commit.
2022-02-23 12:28:31 -07:00
mko-x
a21cc6dcd7
Add explicit log level parameter to application logging API
* Added loglevel parameter to logg()

* Fix logg and mprintf internals with new loglevels

* Update all logg calls to set loglevel

* Update all mprintf calls to set loglevel

* Fix hidden logg calls

* Executed clam-format
2022-02-15 15:13:55 -08:00
Micah Snyder
9bdc28b052 CMake: Support external TomsFastMath library (libtfm)
Add support for compiling with external TomsFastMath library provided by
the system instead of compiling the vendored copy into libclamav.

The vendored source is still built directly into libclamav instead of as
a separate library the way libmspack is done.
The rationale is that:
A) it's more complicated to deal with possibly compiling as static or
dynamic, and also
B) libmspack and libunrar are compiled separately primarily because of
licensing concerns. TomsFastMath public domain, so that isn't a concern.

Resolves: https://bugzilla.clamav.net/show_bug.cgi?id=12562
2022-02-10 12:54:23 -07:00
Micah Snyder
7b4e421be7 Test: Test all-match mode works for all hash sigs
Adds a clamscan test to verify that --allmatch mode works for the
clam.exe program when loading MD5, SHA1, SHA256, PE-section, & PE-import
hash signatures.
2022-01-14 12:51:14 -07:00
Scott Hutton
a549e1fd90 valgrind: statx() suppression improvements
Reduce rule strictness for variants that appeared on centos 7
2022-01-10 12:18:33 -07:00
Micah Snyder
5691d57ff2 tests: Suppress valgrind FP's caused by Rust std library hack
See:
- https://github.com/fede1024/rust-rdkafka/blob/master/rdkafka.suppressions
- https://sourceware.org/git/?p=valgrind.git;a=commit;h=2a7d3ae768f9e5b29acd5cb743c3fb13640a391c
2022-01-10 12:18:33 -07:00
micasnyd
140c88aa4e Bump copyright for 2022
Includes minor format corrections.
2022-01-09 14:23:25 -07:00
Scott Hutton
37e5b573aa Add Rust logging module unit test & integrate with CMake / CTest
Add a basic unit test for the new libclamav_rust `logging.rs` module.
This test simply initializes logging and then prints out a message with
each of the `log` macros.

Also set the Rust edition to 2018 because the default is the 2015
edition in which using external crates is very clunky.

For the Rust test support in CMake this commit adds support for
cross-compiling the Rust tests.
Rust tests must be built for the same LLVM triple (target platform) as
the rest of the project. In particular this is needed to build both
x64 and x86 packages on a 64bit Windows host.

For Alpine, we observed that the LLVM triple for the host platform tools
may be either:
- x86_64-unknown-linux-musl, or
- x86_64-alpine-linux-musl
To support it either way, we look up the host triple with `rustc -vV`
and use that if the musl libc exists. This is a big hacky and
unfortunately means that we probably can't cross-compile to other
platforms when running on a musl libc host. There are probably
improvements to be made to improve cross compiling support.

The Rust test programs must link with libclamav, libclammspack, and
possibly libclamunrar_iface and libclamunrar plus all of the library
dependencies for those libraries.
To do this, we pass the path of each library in environment variables
when building the libclamav_rust unit test program.
Within `libclamav_rust/build.rs`, we read those environment variables.
If set, we parse each into library path and name components to use
as directives for how to build the unit test program.
See: https://doc.rust-lang.org/cargo/reference/build-scripts.html

Our `build.rs` file ignores the library path environment variables if
thye're not set, which is necessary when building the libclamav_rust
library and when libclamunrar isn't static and for when not linking with
a libiconv external to libc.

Rust test programs are built and executed in subdirectory under:
  <target>/<llvm triple>/<config>/deps
where "target" for libclamav_rust tests is set to <build>/unit_tests
For example:
  clamav/build/unit_tests/x86_64-pc-windows-msvc/debug/deps/clamav_rust-7e1343f8a2bff1cc.exe
Since this program isn't co-located with the rest of the libraries
we also have to set environment variables so the test program can find and
load the shared libraries:
- Windows: PATH
- macOS: DYLD_LIBRARY_PATH
We already set LD_LIBRARY_PATH when not Windows for similar reasons.

Note: In build.rs, we iterate references to LIB_ENV_LINK & Co because
older Rust versions do implement Iterator for [&str].
2021-11-23 10:43:14 -08:00
krnick
656f4da6f6 Tests: Hoist repeated code outside if statement 2021-11-22 16:35:47 -08:00
Micah Snyder
78f99bef34 ClamD: Fix clamdscan error with broken symlink on macOS
The `realpath()` function on macOS will fail when scanning a symlink
that doesn't exist because it tries to determine the real path of the
thing the symlink points to and fails when that thing doesn't exist.
This behavior is different than on Linux or FreeBSD Unix.

I'm fixing this by opening the symlink with `O_SYMLINK` then getting
realpath of the link using `cli_get_filepath_from_filedesc()`, much
like we do on Windows.

Side note: I did try using macOS's _DARWIN_BETTER_REALPATH variant to
see if that resolved the issue, but it didn't seem to.

This resolves https://bugzilla.clamav.net/show_bug.cgi?id=12792

This commit also removes the problematic "access denied" clamd test from
the test suite. This commit broke that test on macOS because the error
message when clamdscan fails to `open()` the file to check the
realpath() is different than the error message when clamd tries to scan
it, but has access denied.

(It's like "Permission denied" instead of "Access denied")

It's not worth #ifdef'ing around macOS to get a test pass because this
test is already causing problems in 32-bit Docker environments where
access isn't actually denied to the user (!). Honestly, it's not worth
keeping the test that simply verifies ClamD's error message when denied
matches expectations.

I also switched to use the C_DARWIN macro instead of __APPLE__ because
I kno C_DARWIN is defined in clamav-config.h on macOS, and I found that
__APPLE__ wasn't defined when I tested.
2021-10-30 21:35:06 -07:00
Micah Snyder
985f0e9d7c test: extend freshclam 403 test to test cool-down
Extends the existing freshclam 403 test to check if a repeated freshclam
403 response outputs the cool-down message instead of the original 403
response message.
2021-10-29 19:34:01 -07:00
Micah Snyder
0354482e16 Fix issues reading from uncompressed nested files
The fmap module provides a mechanism for creating a mapping into an
existing map at an offset and length that's used when a file is found
with an uncompressed archive or when embedded files are found with
embedded file type recognition in scanraw(). This is the
"fmap_duplicate()" function. Duplicate fmaps just reference the original
fmap's 'data' or file handle/descriptor while allowing the caller to
treat it like a new map using offsets and lengths that don't account for
the original/actual file dimensions.

fmap's keep track of this with m->nested_offset & m->real_len, which
admittedly have confusing names. I found incorrect uses of these in a
handful of locations. Notably:
- In cli_magic_scan_nested_fmap_type().
  The force-to-disk feature would have been checking incorrect sizes and
  may have written incorrect offsets for duplicate fmaps.
- In XDP parser.
- A bunch of places from the previous commit when making dupe maps.

This commit fixes those and adds lots of documentation to the fmap.h API
to try to prevent confusion in the future.

nested_offset should never be referenced outside of fmap.c/h.

The fmap_* functions for accessing or reading map data have two
implementations, mem_* or handle_*, depending the data source.
I found issues with some of these so I made a unit test that covers each
of the functions I'm concerned about for both types of data sources and
for both original fmaps and nested/duplicate fmaps.

With the tests, I found and fixed issues in these fmap functions:
- handle_need_offstr(): must account for the nested_offset in dupe maps.
- handle_gets(): must account for nested_offset and use len & real_len
  correctly.
- mem_need_offstr(): must account for nested_offset in dupe maps.
- mem_gets(): must account for nested_offset and use len & real_len
  correctly.

Moved CDBRANGE() macro out of function definition so for better
legibility.

Fixed a few warnings.
2021-10-25 16:02:29 -07:00
Micah Snyder
db013a2bfd libclamav: Fix scan recursion tracking
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.

Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".

At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.

But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".

To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).

I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).

This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.

For illustration, imagine a tarball named foo.tar.gz with this structure:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |

But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe                   | PE    | 0         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| │   └── hello.txt         | ASCII | 2         | 0                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |
|     └── world.txt         | ASCII | 2         | 0                 |

(A) If we scan for embedded files at any layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| ├── foo.tar               | TAR   | 1         | 0                 |
| │   ├── bar.zip           | ZIP   | 2         | 1                 |
| │   │   └── hola.txt      | ASCII | 3         | 0                 |
| │   ├── baz.exe           | PE    | 2         | 1                 |
| │   │   ├── sfx.zip       | ZIP   | 3         | 1                 |
| │   │   │   └── hello.txt | ASCII | 4         | 0                 |
| │   │   └── sfx.7z        | 7Z    | 3         | 1                 |
| │   │       └── world.txt | ASCII | 4         | 0                 |
| │   ├── sfx.zip           | ZIP   | 2         | 1                 |
| │   │   └── hello.txt     | ASCII | 3         | 0                 |
| │   └── sfx.7z            | 7Z    | 2         | 1                 |
| │       └── world.txt     | ASCII | 3         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |

(A) is bad because it scans content more than once.

Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.

The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats to include GZ and TAR.

(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     ├── baz.exe           | PE    | 2         | 1                 |
|     ├── sfx.zip           | ZIP   | 2         | 1                 |
|     │   └── hello.txt     | ASCII | 3         | 0                 |
|     └── sfx.7z            | 7Z    | 2         | 1                 |
|         └── world.txt     | ASCII | 3         | 0                 |

(B) is almost right. But we can achieve it easily enough only scanning for
embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfz.zip and sfx.7z are in foo.tar instead of in baz.exe.

The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if insize of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip                   | ZIP   | 0         | 0                 |
| └── bar.exe               | PE    | 1         | 1                 |
|     └── sfx.zip           | ZIP   | 2         | 2                 |

Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.

(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |
|         ├── sfx.zip       | ZIP   | 3         | 1                 |
|         │   └── hello.txt | ASCII | 4         | 0                 |
|         └── sfx.7z        | 7Z    | 3         | 1                 |
|             └── world.txt | ASCII | 4         | 0                 |

(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection only when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.

So this commit aims to solve the issue the (B)-way.

Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types rely on
finding data partway through or near the ned of a file before
reassigning the entire file as the new type.

Other fixes and considerations in this commit:

- The utf16 HTML parser has weak error handling, particularly with respect
  to creating a nested fmap for scanning the ascii decoded file.
  This commit cleans up the error handling and wraps the nested scan with
  the recursion-stack push()/pop() for correct recursion tracking.

  Before this commit, each container layer had a flag to indicate if the
  container layer is valid.
  We need something similar so that the cli_recursion_stack_get_*()
  functions ignore normalized layers. Details...

  Imagine an LDB signature for HTML content that specifies a ZIP
  container. If the signature actually alerts on the normalized HTML and
  you don't ignore normalized layers for the container check, it will
  appear as though the alert is in an HTML container rather than a ZIP
  container.

  This commit accomplishes this with a boolean you set in the scan context
  before scanning a new layer. Then when the new fmap is created, it will
  use that flag to set similar flag for the layer. The context flag is
  reset those that anything after this doesn't have that flag.
  The flag allows the new recursion_stack_get() function to ignore
  normalized layers when iterating the stack to return a layer at a
  requested index, negative or positive.

  Scanning normalized extracted/normalized javascript and VBA should also
  use the 'layer is normalized' flag.

- This commit also fixes Heuristic.Broken.Executable alert for ELF files
  to make sure that:

  A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
  respects the FP check).

  B) all broken-executable alerts for ELF only happen if the
  SCAN_HEURISTIC_BROKEN option is enabled.

- This commit also cleans up the error handling in cli_magic_scan_dir().
  This was needed so we could correctly apply the layer-is-normalized-flag
  to all VBA macros extracted to a directory when scanning the directory.

- Also fix an issue where exceeding scan maximums wouldn't cause embedded
  file detection scans to abort. Granted we don't actually want to abort
  if max filesize or max recursion depth are exceeded... only if max
  scansize, max files, and max scantime are exceeded.

  Add 'abort_scan' flag to scan context, to protect against depending on
  correct error propagation for fatal conditions. Instead, setting this
  flag in the scan context should guarantee that a fatal condition deep in
  scan recursion isn't lost which result in more stuff being scanned
  instead of aborting. This shouldn't be necessary, but some status codes
  like CL_ETIMEOUT never used to be fatal and it's easier to do this than
  to verify every parser only returns CL_ETIMEOUT and other "fatal
  status codes" in fatal conditions.

- Remove duplicate is_tar() prototype from filestypes.c and include
  is_tar.h instead.

- Presently we create the fmap hash when creating the fmap.
  This wastes a bit of CPU if the hash is never needed.
  Now that we're creating fmap's for all embedded files discovered with
  file type recognition scans, this is a much more frequent occurence and
  really slows things down.

  This commit fixes the issue by only creating fmap hashes as needed.
  This should not only resolve the perfomance impact of creating fmap's
  for all embedded files, but also should improve performance in general.

- Add allmatch check to the zip parser after the central-header meta
  match. That way we don't multiple alerts with the same match except in
  allmatch mode. Clean up error handling in the zip parser a tiny bit.

- Fixes to ensure that the scan limits such as scansize, filesize,
  recursion depth, # of embedded files, and scantime are always reported
  if AlertExceedsMax (--alert-exceeds-max) is enabled.

- Fixed an issue where non-fatal alerts for exceeding scan maximums may
  mask signature matches later on. I changed it so these alerts use the
  "possibly unwanted" alert-type and thus only alert if no other alerts
  were found or if all-match or heuristic-precedence are enabled.

- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
  when the --gen-json feature is enabled. These will show up once under
  "ParseErrors" the first time a limit is exceeded. In the present
  implementation, only one limits-exceeded events will be added, so as to
  prevent a malicious or malformed sample from filling the JSON buffer
  with millions of events and using a tonne of RAM.
2021-10-25 16:02:29 -07:00
Micah Snyder
7b77717b40 test: freshclam cdiff test tuning
There is a bug where freshclam fails to detect if a downloaded CDIFF is
empty. In 0.103 this, combined with a CDN caching issue could result in
freshclam downloading a daily.cvd but failing to update, putting it in a
sort of infinite loop. In 0.104 this issue manifests slightly
differently, requiring freshclam to run up to 3x before you get over the
empty-CVD hump and are back to normal updates.

This commit updates an existing cdiff test with the zero-byte cdiff + an
out-of-date CVD to confirm the bug. The following commit will fix it.
2021-10-07 17:40:42 -07:00
Alexander Sulfrian
c5c3b7558e CMake: Fix race condition with parallel builds
If running multiple parallel processes of "xor_testfile.py" there was a
race condition between checking for the existence of the directory and
creating it. Now this is handled as a dependency in CMake.
2021-09-27 13:03:24 -07:00
Micah Snyder
4a9cff9214 CMake: support Xcode builds
Xcode (and perhaps some other generators?) do not like targets that have
only object files. See:
https://cmake.org/cmake/help/latest/command/add_library.html#object-libraries

And: https://cmake.org/pipermail/cmake/2016-May/063479.html

This issue manifests when using `-G Xcode` on macOS as the library
dylibs being missing when linking with other binaries.

This commit removes the object libraries for libclamav, libfreshclam,
libclamunrar_iface, libclamunrar, libclammspack, and (lib)common
because they were used by static or shared libs that didn't
themselves have any added sources.

Add getter & setter for the debug flag, so it isn't referenced by unit
tests or other code that links with libclamav. This is needed because
global variables are exported symbols on Windows.
2021-08-18 13:53:34 -07:00
Micah Snyder
7c0236fc96 tests: Fix clamd tests when path has symlink
The access-denied test and excludepath tests both relied on the full
path of the test file to be in the expected results. This fails if
you're working within a path that has a symlink because clamd and
clamdscan determine real-paths before scanning and end up sending
back the real-path in the results, not the original path.

This fixes the tests by removing the full paths from the expected
results.

I also cleaned up some type safety warnings.
2021-08-17 12:40:23 -07:00
Micah Snyder
8cfaf5f5f6 Test: CDIFF update with UNC paths (Windows)
This is a regression test for https://github.com/Cisco-Talos/clamav/pull/226
2021-08-16 12:10:11 -07:00
Andy Ragusa
c4af06c317 Fix ENABLE_UNRAR=off build
Cmake errors out when the ENABLE_UNRAR=off option is used.  This commit
addresses that.
2021-07-31 11:17:27 -07:00
Andy Ragusa
4db6e1de0a Tests: tune valgrind suppression rule
Handle the case where thrmgr_dispatch_internal was called from somewhere
other than thrmgr_group_dispatch, triggering the valgrind supression
rule.
2021-07-30 14:45:43 -07:00
Micah Snyder
4b400b9b1e Test: Verify that pdf bytecode hooks execute 2021-07-19 14:47:25 -07:00
Micah Snyder
d46832d5cf clamav.net URL update for new docs, github issues
Replace new bugzilla ticket links with links to github issues.
Replace clamav.net/documentation links with docs.clamav.net equivalents.
2021-07-17 15:28:02 -07:00
Micah Snyder
8557bc7a65 Test: python 3.5 compatibility fix
Python 3.5 compatibility fixes for Debian 9, etc that lack 3.6+.

Change a python f-string to an old-style `"".format()`.

Convert Path objects to strings for older `shutil` APIs that don't
accept Paths.
2021-07-17 15:27:36 -07:00
Micah Snyder
b406e7e4d6 Add feature test for XLS image (JPG & PNG) extraction
Added a test to verify that clamscan can extract images from an XLS
document. The document has 2 images: a PNG and JPEG version of the
clamav demon/logo. The test requires the json metadata feature to verify
that the MD5 of the images are correct.

No other image formats were tested because despite the format allegedly
supporting other imate formats, Excel converts TIFF, BMP, and GIF images
to PNG files when you insert them.
2021-07-17 10:39:27 -07:00
Micah Snyder
201e1b12a7 XOR test files; clean up tests directory
The split test files are flagged by some AV's because they look like
broken executables. Instead of splitting the test files to prevent
detections, we should encrypt them. This commit replaces the "reassemble
testfiles" script with a basic "XOR testfiles" script that can be used
to encrypt or decrypt test files. This commit also of course then
replaces all the split files with xor'ed files.

The test and unit_tests directories were a bit of a mess, so I
reorganized them all into unit_tests with all of the test files placed
under "unit_tests/input" using subdirectories for different types of files.
2021-07-17 10:39:27 -07:00
Micah Snyder
9451224323 Test: ClamDScan ExcludePath; Valgrind
Adds a basic test to validate that ExcludePath correctly excludes a
subdirectory but does not exclude subsequent files. As with the other
ClamD/Scan tests, it will test in each mode: regular, stream, and
fdpass (if available).

Unlike the other tests, this one tests ClamDScan with Valgrind instead
of ClamD.

Refactored the clamd_test.py file to reduce duplicate code, and support
enabling and disabling valgrind when running ClamDScan and ClamD.

Add pytest to the github actions environments because the results when
using pytest are far easier to read.
2021-07-15 11:56:13 -07:00
Micah Snyder
ba51d40625 Add valgrind suppression rule for non-serious clamd leaks
There appear to be minors leak in clamd that can occur when shutting-
down immediately after a command (e.g. RELOAD).

These are causing intermittent clamd test failures.

It seems like they're caused by a thread leaking occasionally,
due to not exiting before the program terminates.

I don't believe these to be a serious issue. Tracking down the exact
cause and crafting a fix for the leaks isn't worth the effort.
This commit adds valgrind suppression rules to stabilize the tests.
2021-07-15 11:46:51 -07:00
Micah Snyder
45f228e12c Tests: fix flakey ClamD test on Windows
The non-existent file test has a hack to "expect" a wierd error message
caused by the '\v' character rather than the file not actually existing.
Recently something(?) changed and the test started reporting yet a
different message or no message.

Removing the '\v' special character fixes the test so it actually tests
a non-existent file and returns the same message as on other operating
systems.
2021-07-05 15:21:49 -07:00
Micah Snyder
ee8a62baf7 Test: freshclam w/ zero-byte cdiff & cvd out-of-date
Add a test where freshclam received a zero-byte cdiff to trigger a whole
CVD database download, and the CVD served is older than advertised.

This is a regression test for a bug found & fixed by Andrew Williams.
2021-06-23 18:28:23 -04:00
Micah Snyder
cbe60b30b0 Test: Basic freshclam CDIFF tests
Adds 3 tests to validate that:

1. a CDIFF update works

2. a CDIFF partial update (with 1 missing CDIFF) works
   and that a subsequent update is ok with being 1 behind

3. a CDIFF partial update (with 2 missing CDIFFs) works
   and that a subsequent update will try to get the WHOLE CVD -
   because being 2+ CDIFFs behind without any update isn't good enough.

Also fixed a minor bug so that the database name is properly displayed
when a partial update occurs instead of displaying "(null)".

Also changed the freshclam test port to 8001 to deconflict with
CVD-Update, in case that's running in the background.

TODO: Make the tests smarter so they find an open port instead of
hoping that 8001 is available.
2021-06-22 18:56:01 -07:00
Micah Snyder
d1ccf7747d clang-format housekeeping 2021-06-18 16:34:59 -07:00
Jonas Zaddach
db573de148 Fixed unit tests 2021-06-16 15:50:26 -07:00
Micah Snyder
535867e12f FreshClam: rename mirrors.dat to freshclam.dat
Some users have scripts set up from long ago to delete mirrors.dat if
FreshClam failed. We used to recommend this if people had technical
issues because mirrors.dat would store a bunch of entries indicating
that all of their regional mirrors were failing and then FreshClam would
give up.

The new freshclam DAT file no longer stores that kind of information.
Deleting the DAT file is no longer sound advice.
We very much want the UUID, which is generated when creating the DAT
file, to persist between runs. So unless people go and change the
scripts to delete freshclam.dat instead, this commit should resolve the
concern.
2021-06-01 18:19:12 -07:00
Yasuhiro Kimura
fb479870cc Fix configuration error when ENABLE_EXTERNAL_MSPACK is ON
This fixes https://bugzilla.clamav.net/show_bug.cgi?id=12759
2021-06-01 18:11:35 -07:00