Adds a fuzzy hash test for the ClamAV demon JPEG attached to the XLS
document. PNG is not yet tested, because the fuzzy hash implementation
isn't properly hashing that file yet.
This test covers a regression where malware detection wasn't properly
being tracked for OLE2 (XLS) image extraction/scanning.
In 0.104 and prior, the function for adding a logical subsignature
was being used for NDB sigs, FTM sigs, *and* Yara strings.
When cleaning up the logic for handling different types of logical
sig subsignatures, we accidentally broke support for regex strings in
Yara rules.
This commit adds new logic for recording if the Yara string is a regex
string, by adding a regex subsig opt. Then in a new function for adding
different types of Yara strings, we check if it's a regex string or not
before adding it as either a PCRE pattern or an AC/BM pattern.
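A minimal sketch of that dispatch (hypothetical names; the real code uses
the actual PCRE and AC/BM matcher APIs):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stubs standing in for the real matcher entry points. */
    static int add_pcre_pattern(const char *p)  { printf("PCRE:  %s\n", p); return 0; }
    static int add_ac_bm_pattern(const char *p) { printf("AC/BM: %s\n", p); return 0; }

    typedef struct {
        const char *pattern;
        bool is_regex; /* recorded via the new regex subsig option */
    } yara_string;

    static int add_yara_string(const yara_string *s)
    {
        return s->is_regex ? add_pcre_pattern(s->pattern)
                           : add_ac_bm_pattern(s->pattern);
    }

    int main(void)
    {
        yara_string plain = {"hello world", false};
        yara_string regex = {"/ho(la|las)/", true};
        return add_yara_string(&plain) | add_yara_string(&regex);
    }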
Resolves: https://github.com/Cisco-Talos/clamav/issues/494
Also adds a basic test for a Yara regex rule.
I observed undefined symbol errors when linking the bytecode_runtime
object with the check_* test executables on Windows when using LLVM
built from source. I'm not sure why, exactly, but these symbols should
all be present in the ClamAV::libclamav library that we're linking with,
so I don't know why we link with the object library targets in addition
to the libclamav shared or static library target.
This commit removes linking with those extra object library targets.
The `have_clamjit` global is used in the unit tests but doesn't appear
to be exported when I was testing the external LLVM runtime support PR,
resulting in an undefined symbol issue. Converting this to a function
that returns 0 or 1 instead of a global variable resolved the issue.
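A sketch of the change (the guard macro here is an assumption, not the
real build define):

    /* Before: an exported global, awkward to consume across a DLL boundary. */
    /* extern int have_clamjit; */

    /* After: a function export that the unit tests can call. */
    int have_clamjit(void)
    {
    #ifdef HAVE_CLAMJIT /* assumed guard name */
        return 1;
    #else
        return 0;
    #endif
    }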
We must pass the LLVM library dependencies to the libclamav_rust
build.rs script so it links the libclamav_rust unit test executable with
LLVM.
Also:
- We can remove the libtinfo dependency that was hardcoded for the LLVM
3.6 support, and must remove it for the build to work on Alpine and macOS.
- Also, increased the libcheck default timeout from 60s to 300s after
experiencing a failure while testing this.
- Also made one of the valgrind suppressions more generic to account for
inline optimization differences observed in testing this.
We've observed a number of false positives with iconv, pthread, libxml2,
etc regarding memory allocation on init that is never cleaned up but
isn't a leak. Now we're seeing a whole tonne more from Rust libraries
with wild stack traces that couldn't easily be suppressed.
So the solution is to disable alerts for possible leaks and only
show definite leaks.
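One way to express this with standard valgrind options (the exact flags
in our test configuration may differ):

    valgrind --leak-check=full \
             --show-leak-kinds=definite \
             --errors-for-leak-kinds=definite \
             clamd ...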
Add a new logical signature subsignature type for matching on images
with image fuzzy hashes.
Image fuzzy hash subsignatures follow this format:
fuzzy_img#<hash>#<dist>
In this initial implementation, the hamming distance (dist) is ignored
and only exact fuzzy hash matches will alert.
Fuzzy hash matching is only performed for supported image types.
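For example, a logical signature using such a subsignature might look
like this (the hash below is a made-up placeholder, not a real image
hash):

    Example.Img.FuzzyHash;Engine:150-255,Target:0;0;fuzzy_img#af2ad01ed42993c7#0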
Also: removed some excessive debug log messages on start-up.
Fixed an issue where the signature name (virname) is being allocated and
stored for every subsignature, or even every sub-pattern in an AC-pattern
(i.e. NDB sig or LDB subsig) containing a `{n-m}` or `*` wildcard.
This fix is only for LDB subsigs, though. NDB signatures are still
allocating one virname per sub-pattern.
This fix was required because I needed a place to store the virname with
fuzzy-hash subsignatures. Storing it in the fuzzy-hash subsig
metadata the way AC-pattern, PCRE, and BComp subsigs were doing it
wouldn't work, because it would cross the C-Rust FFI boundary, and giving
pointers to Rust-allocated data is dicey. Not to mention native Rust
strings are different than C strings. Anyway, the correct thing to do
was to store the virname with the actual logical signature.
TODO: Keep track of NDB signatures in the same way and store the virname
for NDB sigs there instead of in AC-patterns so that we can get rid of
the virname field in the AC-pattern struct.
Improvements to the logg() API appear to have corrected a bug wherein
debug/warning/error messages were writing to stdout instead of stderr.
Now that they're going to stderr correctly, this fixes the test to match.
Don't crop logs so we can see the whole valgrind output in large logs.
Generate suppression rules, in case they're needed for slight variations
on existing suppressions or sometimes new suppressions.
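valgrind supports this with a standard option:

    valgrind --gen-suppressions=all ...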
Increase valgrind's stack size for initial thread because I observed some
unusual errors in custom tests on Debian 10 like this:
7: [INFO]: ==9911== Process terminating with default action of signal 15 (SIGTERM)
7: [INFO]: ==9911== at 0x401A4FB: __open_nocancel (open64_nocancel.c:45)
7: [INFO]: ==9911== by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911== by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911== by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911== by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911== by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911== by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911== by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911== by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911== by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911== by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911== by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911== by 0x1: ???
7: [INFO]: ==9911== Jump to the invalid address stated on the next line
7: [INFO]: ==9911== at 0x1036: ???
7: [INFO]: ==9911== by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911== by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911== by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911== by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911== by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911== by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911== by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911== by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911== by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911== by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911== by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911== by 0x1: ???
7: [INFO]: ==9911== Address 0x1036 is not stack'd, malloc'd or (recently) free'd
7: [INFO]: ==9911==
7: [INFO]: ==9911==
7: [INFO]: ==9911== Process terminating with default action of signal 11 (SIGSEGV)
7: [INFO]: ==9911== Bad permissions for mapped region at address 0x1036
7: [INFO]: ==9911== at 0x1036: ???
7: [INFO]: ==9911== by 0x4006F65: open_verify.constprop.12 (dl-load.c:1728)
7: [INFO]: ==9911== by 0x4007737: open_path (dl-load.c:2057)
7: [INFO]: ==9911== by 0x4008C17: _dl_map_object (dl-load.c:2297)
7: [INFO]: ==9911== by 0x400D291: openaux (dl-deps.c:64)
7: [INFO]: ==9911== by 0x401956A: _dl_catch_exception (dl-error-skeleton.c:196)
7: [INFO]: ==9911== by 0x400D605: _dl_map_object_deps (dl-deps.c:248)
7: [INFO]: ==9911== by 0x4003AFA: dl_main (rtld.c:1733)
7: [INFO]: ==9911== by 0x401864F: _dl_sysdep_start (dl-sysdep.c:253)
7: [INFO]: ==9911== by 0x4002117: _dl_start_final (rtld.c:415)
7: [INFO]: ==9911== by 0x4002117: _dl_start (rtld.c:522)
7: [INFO]: ==9911== by 0x4001097: ??? (in /lib/x86_64-linux-gnu/ld-2.28.so)
7: [INFO]: ==9911== by 0x1: ???
Per this conversation: https://bugs.kde.org/show_bug.cgi?id=359705
I'm testing to see if the issue is resolved by increasing valgrind's
main stack size.
https://valgrind.org/docs/manual/manual-core.html claims that the
default is the current `ulimit` value or 16MB, whichever is lower.
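The option in question takes a byte count; the value below is just a
placeholder:

    valgrind --main-stacksize=33554432 ...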
The logic for parsing a logical subsignature isn't clearly identified
and has been, perhaps mistakenly or out of convenience, used when
parsing NDB signatures in addition to LDB subsignatures. What this means
is that you can technically use a PCRE subsignature in an NDB file and
clam won't complain about it. It won't work, however, because a PCRE
subsignature requires another matching subsignature to trigger it, but
it will parse. The same is likely true for byte-compare subsignatures.
This commit restructures that logic a bit so subsignature parsing has
its own function and is more organized.
I also renamed the functions a little bit and added lots of comments.
I fixed a few minor warnings relating to format string characters.
The change in str.c:cli_ldbtokenize is to prevent a buffer under-read if
you were to use the function on the start of a buffer, as is now done in
this commit.
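A minimal sketch of the hazard and the guard (illustrative; not the
literal str.c code):

    #include <stdbool.h>
    #include <stddef.h>

    /* When splitting on an unescaped delimiter, only peek at the previous
     * byte if one exists; at i == 0 the look-behind would under-read. */
    static bool delim_is_escaped(const char *buf, size_t i)
    {
        return i > 0 && buf[i - 1] == '\\';
    }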
* Added loglevel parameter to logg()
* Fix logg and mprintf internals with new loglevels
* Update all logg calls to set loglevel (see the example below)
* Update all mprintf calls to set loglevel
* Fix hidden logg calls
* Executed clam-format
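Roughly, the call sites changed like this (fragment; the loglevel macro
name is illustrative):

    /* Before: the loglevel was encoded in a prefix character ('!' = error). */
    logg("!Can't open file %s\n", filename);

    /* After: the loglevel is an explicit parameter. */
    logg(LOGG_ERROR, "Can't open file %s\n", filename);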
Add support for compiling with external TomsFastMath library provided by
the system instead of compiling the vendored copy into libclamav.
The vendored source is still built directly into libclamav instead of as
a separate library the way libmspack is.
The rationale is that:
A) it's more complicated to deal with possibly compiling as static or
dynamic, and also
B) libmspack and libunrar are compiled separately primarily because of
licensing concerns. TomsFastMath is public domain, so that isn't a concern.
Resolves: https://bugzilla.clamav.net/show_bug.cgi?id=12562
Adds a clamscan test to verify that --allmatch mode works for the
clam.exe program when loading MD5, SHA1, SHA256, PE-section, & PE-import
hash signatures.
Add a basic unit test for the new libclamav_rust `logging.rs` module.
This test simply initializes logging and then prints out a message with
each of the `log` macros.
Also set the Rust edition to 2018 because the default is the 2015
edition in which using external crates is very clunky.
For the Rust test support in CMake this commit adds support for
cross-compiling the Rust tests.
Rust tests must be built for the same LLVM triple (target platform) as
the rest of the project. In particular this is needed to build both
x64 and x86 packages on a 64bit Windows host.
For Alpine, we observed that the LLVM triple for the host platform tools
may be either:
- x86_64-unknown-linux-musl, or
- x86_64-alpine-linux-musl
To support it either way, we look up the host triple with `rustc -vV`
and use that if the musl libc exists. This is a bit hacky and
unfortunately means that we probably can't cross-compile to other
platforms when running on a musl libc host. There is probably room
to improve cross-compiling support.
The Rust test programs must link with libclamav, libclammspack, and
possibly libclamunrar_iface and libclamunrar plus all of the library
dependencies for those libraries.
To do this, we pass the path of each library in environment variables
when building the libclamav_rust unit test program.
Within `libclamav_rust/build.rs`, we read those environment variables.
If set, we parse each into library path and name components to use
as directives for how to build the unit test program.
See: https://doc.rust-lang.org/cargo/reference/build-scripts.html
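For example, given a (hypothetical) path like
/clamav/build/libclamav/libclamav.so in one of those variables, build.rs
would emit directives along these lines:

    cargo:rustc-link-search=native=/clamav/build/libclamav
    cargo:rustc-link-lib=dylib=clamav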
Our `build.rs` file ignores the library path environment variables if
they're not set, which is necessary when building the libclamav_rust
library itself, when libclamunrar isn't static, and when not linking with
a libiconv external to libc.
Rust test programs are built and executed in subdirectory under:
<target>/<llvm triple>/<config>/deps
where "target" for libclamav_rust tests is set to <build>/unit_tests
For example:
clamav/build/unit_tests/x86_64-pc-windows-msvc/debug/deps/clamav_rust-7e1343f8a2bff1cc.exe
Since this program isn't co-located with the rest of the libraries
we also have to set environment variables so the test program can find and
load the shared libraries:
- Windows: PATH
- macOS: DYLD_LIBRARY_PATH
We already set LD_LIBRARY_PATH when not Windows for similar reasons.
Note: In build.rs, we iterate references to LIB_ENV_LINK & Co because
older Rust versions do not implement Iterator for [&str].
The `realpath()` function on macOS will fail when scanning a symlink
that doesn't exist because it tries to determine the real path of the
thing the symlink points to and fails when that thing doesn't exist.
This behavior is different than on Linux or FreeBSD Unix.
I'm fixing this by opening the symlink with `O_SYMLINK` then getting
realpath of the link using `cli_get_filepath_from_filedesc()`, much
like we do on Windows.
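A sketch of the approach (error handling trimmed; the helper call is
commented out so the fragment stands alone):

    #include <fcntl.h>
    #include <unistd.h>

    /* On macOS, open the link itself rather than its (possibly missing)
     * target, then resolve the path from the descriptor. */
    static char *realpath_of_symlink(const char *link_path)
    {
        char *real_path = NULL;
        int fd = open(link_path, O_RDONLY | O_SYMLINK);
        if (fd == -1)
            return NULL;
        /* cli_get_filepath_from_filedesc(fd, &real_path); */
        close(fd);
        return real_path;
    }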
Side note: I did try using macOS's _DARWIN_BETTER_REALPATH variant to
see if that resolved the issue, but it didn't seem to.
This resolves https://bugzilla.clamav.net/show_bug.cgi?id=12792
This commit also removes the problematic "access denied" clamd test from
the test suite. This commit broke that test on macOS because the error
message when clamdscan fails to `open()` the file to check the
realpath() is different than the error message when clamd tries to scan
it, but has access denied.
(It's like "Permission denied" instead of "Access denied")
It's not worth #ifdef'ing around macOS to get the test to pass because this
test is already causing problems in 32-bit Docker environments where
access isn't actually denied to the user (!). Honestly, it's not worth
keeping the test that simply verifies ClamD's error message when denied
matches expectations.
I also switched to use the C_DARWIN macro instead of __APPLE__ because
I know C_DARWIN is defined in clamav-config.h on macOS, and I found that
__APPLE__ wasn't defined when I tested.
Extends the existing freshclam 403 test to check if a repeated freshclam
403 response outputs the cool-down message instead of the original 403
response message.
The fmap module provides a mechanism for creating a mapping into an
existing map at an offset and length. It's used when a file is found
within an uncompressed archive or when embedded files are found via
embedded file type recognition in scanraw(). This is the
"fmap_duplicate()" function. Duplicate fmaps just reference the original
fmap's 'data' or file handle/descriptor while allowing the caller to
treat it like a new map using offsets and lengths that don't account for
the original/actual file dimensions.
fmaps keep track of this with m->nested_offset & m->real_len, which
admittedly have confusing names. I found incorrect uses of these in a
handful of locations. Notably:
- In cli_magic_scan_nested_fmap_type().
The force-to-disk feature would have been checking incorrect sizes and
may have written incorrect offsets for duplicate fmaps.
- In XDP parser.
- A bunch of places from the previous commit when making dupe maps.
This commit fixes those and adds lots of documentation to the fmap.h API
to try to prevent confusion in the future.
nested_offset should never be referenced outside of fmap.c/h.
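For reference, the relationship between those fields (an illustrative
model, not the real fmap struct):

    #include <stddef.h>

    /* Model of a duplicate fmap's view into the original data. */
    typedef struct {
        size_t nested_offset; /* where the duplicate starts in the original */
        size_t real_len;      /* nested_offset + the duplicate's length */
    } fmap_model;

    /* A caller-relative offset must be shifted into the original's space. */
    static size_t to_real_offset(const fmap_model *m, size_t caller_offset)
    {
        return m->nested_offset + caller_offset; /* must stay < m->real_len */
    }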
The fmap_* functions for accessing or reading map data have two
implementations, mem_* or handle_*, depending on the data source.
I found issues with some of these so I made a unit test that covers each
of the functions I'm concerned about for both types of data sources and
for both original fmaps and nested/duplicate fmaps.
With the tests, I found and fixed issues in these fmap functions:
- handle_need_offstr(): must account for the nested_offset in dupe maps.
- handle_gets(): must account for nested_offset and use len & real_len
correctly.
- mem_need_offstr(): must account for nested_offset in dupe maps.
- mem_gets(): must account for nested_offset and use len & real_len
correctly.
Moved the CDBRANGE() macro out of the function definition for better
legibility.
Fixed a few warnings.
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.
Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".
At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.
But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".
To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).
I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).
This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.
For illustration, imagine a tarball named foo.tar.gz with this structure:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| └── baz.exe | PE | 2 | 1 |
But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe | PE | 0 | 0 |
| ├── sfx.zip | ZIP | 1 | 1 |
| │ └── hello.txt | ASCII | 2 | 0 |
| └── sfx.7z | 7Z | 1 | 1 |
| └── world.txt | ASCII | 2 | 0 |
(A) If we scan for embedded files at any layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| ├── foo.tar | TAR | 1 | 0 |
| │ ├── bar.zip | ZIP | 2 | 1 |
| │ │ └── hola.txt | ASCII | 3 | 0 |
| │ ├── baz.exe | PE | 2 | 1 |
| │ │ ├── sfx.zip | ZIP | 3 | 1 |
| │ │ │ └── hello.txt | ASCII | 4 | 0 |
| │ │ └── sfx.7z | 7Z | 3 | 1 |
| │ │ └── world.txt | ASCII | 4 | 0 |
| │ ├── sfx.zip | ZIP | 2 | 1 |
| │ │ └── hello.txt | ASCII | 3 | 0 |
| │ └── sfx.7z | 7Z | 2 | 1 |
| │ └── world.txt | ASCII | 3 | 0 |
| ├── sfx.zip | ZIP | 1 | 1 |
| └── sfx.7z | 7Z | 1 | 1 |
(A) is bad because it scans content more than once.
Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.
The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats, including GZ and TAR.
(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| ├── baz.exe | PE | 2 | 1 |
| ├── sfx.zip | ZIP | 2 | 1 |
| │ └── hello.txt | ASCII | 3 | 0 |
| └── sfx.7z | 7Z | 2 | 1 |
| └── world.txt | ASCII | 3 | 0 |
(B) is almost right. But we can achieve it easily enough by only scanning
for embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfx.zip and sfx.7z are in foo.tar instead of in baz.exe.
The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if inside of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip | ZIP | 0 | 0 |
| └── bar.exe | PE | 1 | 1 |
| └── sfx.zip | ZIP | 2 | 2 |
Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.
(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| └── baz.exe | PE | 2 | 1 |
| ├── sfx.zip | ZIP | 3 | 1 |
| │ └── hello.txt | ASCII | 4 | 0 |
| └── sfx.7z | 7Z | 3 | 1 |
| └── world.txt | ASCII | 4 | 0 |
(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection to when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.
So this commit aims to solve the issue the (B)-way.
Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types rely on
finding data partway through or near the end of a file before
reassigning the entire file as the new type.
Other fixes and considerations in this commit:
- The utf16 HTML parser has weak error handling, particularly with respect
to creating a nested fmap for scanning the ASCII-decoded file.
This commit cleans up the error handling and wraps the nested scan with
the recursion-stack push()/pop() for correct recursion tracking.
Before this commit, each container layer had a flag to indicate if the
container layer is valid.
We need something similar so that the cli_recursion_stack_get_*()
functions ignore normalized layers. Details...
Imagine an LDB signature for HTML content that specifies a ZIP
container. If the signature actually alerts on the normalized HTML and
you don't ignore normalized layers for the container check, it will
appear as though the alert is in an HTML container rather than a ZIP
container.
This commit accomplishes this with a boolean you set in the scan context
before scanning a new layer. Then when the new fmap is created, it will
use that flag to set a similar flag for the layer. The context flag is
then reset so that subsequent layers don't inherit it (see the sketch
below).
The flag allows the new recursion_stack_get() function to ignore
normalized layers when iterating the stack to return a layer at a
requested index, negative or positive.
Scanning extracted/normalized JavaScript and VBA should also
use the 'layer is normalized' flag.
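A small model of the one-shot flag (hypothetical names):

    #include <stdbool.h>

    typedef struct {
        bool next_layer_is_normalized; /* set just before scanning a new layer */
    } scan_ctx_model;

    typedef struct {
        bool is_normalized; /* layers with this set are skipped by the
                             * cli_recursion_stack_get_*() lookups */
    } layer_model;

    static void push_layer(scan_ctx_model *ctx, layer_model *layer)
    {
        layer->is_normalized = ctx->next_layer_is_normalized;
        ctx->next_layer_is_normalized = false; /* one-shot: reset immediately */
    }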
- This commit also fixes Heuristic.Broken.Executable alert for ELF files
to make sure that:
A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
respects the FP check).
B) all broken-executable alerts for ELF only happen if the
SCAN_HEURISTIC_BROKEN option is enabled.
- This commit also cleans up the error handling in cli_magic_scan_dir().
This was needed so we could correctly apply the layer-is-normalized-flag
to all VBA macros extracted to a directory when scanning the directory.
- Also fix an issue where exceeding scan maximums wouldn't cause embedded
file detection scans to abort. Granted we don't actually want to abort
if max filesize or max recursion depth are exceeded... only if max
scansize, max files, and max scantime are exceeded.
Add 'abort_scan' flag to scan context, to protect against depending on
correct error propagation for fatal conditions. Instead, setting this
flag in the scan context should guarantee that a fatal condition deep in
scan recursion isn't lost, which would result in more stuff being scanned
instead of aborting. This shouldn't be necessary, but some status codes
like CL_ETIMEOUT never used to be fatal and it's easier to do this than
to verify every parser only returns CL_ETIMEOUT and other "fatal
status codes" in fatal conditions.
- Remove duplicate is_tar() prototype from filestypes.c and include
is_tar.h instead.
- Presently we create the fmap hash when creating the fmap.
This wastes a bit of CPU if the hash is never needed.
Now that we're creating fmaps for all embedded files discovered with
file type recognition scans, this is a much more frequent occurrence and
really slows things down.
This commit fixes the issue by only creating fmap hashes as needed.
This should not only resolve the performance impact of creating fmaps
for all embedded files, but should also improve performance in general.
- Add allmatch check to the zip parser after the central-header meta
match. That way we don't get multiple alerts for the same match except in
allmatch mode. Clean up error handling in the zip parser a tiny bit.
- Fixes to ensure that the scan limits such as scansize, filesize,
recursion depth, # of embedded files, and scantime are always reported
if AlertExceedsMax (--alert-exceeds-max) is enabled.
- Fixed an issue where non-fatal alerts for exceeding scan maximums may
mask signature matches later on. I changed it so these alerts use the
"possibly unwanted" alert-type and thus only alert if no other alerts
were found or if all-match or heuristic-precedence are enabled.
- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
when the --gen-json feature is enabled. These will show up once under
"ParseErrors" the first time a limit is exceeded. In the present
implementation, only one limits-exceeded event will be added, so as to
prevent a malicious or malformed sample from filling the JSON buffer
with millions of events and using a tonne of RAM.
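For example, the JSON metadata might include a fragment like this (the
event suffix is illustrative):

    "ParseErrors": [
        "Heuristics.Limits.Exceeded.MaxFileSize"
    ]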
There is a bug where freshclam fails to detect if a downloaded CDIFF is
empty. In 0.103 this, combined with a CDN caching issue, could result in
freshclam downloading a daily.cvd but failing to update, putting it in a
sort of infinite loop. In 0.104 this issue manifests slightly
differently, requiring freshclam to run up to 3x before you get over the
empty-CVD hump and are back to normal updates.
This commit updates an existing cdiff test with the zero-byte cdiff + an
out-of-date CVD to confirm the bug. The following commit will fix it.
When running multiple parallel processes of "xor_testfile.py", there was a
race condition between checking for the existence of the directory and
creating it. Now this is handled as a dependency in CMake.
Xcode (and perhaps some other generators?) do not like targets that have
only object files. See:
https://cmake.org/cmake/help/latest/command/add_library.html#object-libraries
And: https://cmake.org/pipermail/cmake/2016-May/063479.html
This issue manifests when using `-G Xcode` on macOS: the library
dylibs are missing when linking with other binaries.
This commit removes the object libraries for libclamav, libfreshclam,
libclamunrar_iface, libclamunrar, libclammspack, and (lib)common
because they were used by static or shared libs that didn't
themselves have any added sources.
Add getter & setter for the debug flag, so it isn't referenced by unit
tests or other code that links with libclamav. This is needed because
global variables are exported symbols on Windows.
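A sketch of the accessors (hypothetical names):

    #include <stdint.h>

    static uint8_t debug_flag; /* no longer needs to be an exported global */

    void set_debug_flag(uint8_t enabled) { debug_flag = enabled; }
    uint8_t get_debug_flag(void) { return debug_flag; }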
The access-denied test and excludepath tests both relied on the full
path of the test file to be in the expected results. This fails if
you're working within a path that has a symlink because clamd and
clamdscan determine real-paths before scanning and end up sending
back the real-path in the results, not the original path.
This fixes the tests by removing the full paths from the expected
results.
I also cleaned up some type safety warnings.
Python 3.5 compatibility fixes for Debian 9, etc that lack 3.6+.
Change a python f-string to an old-style `"".format()`.
Convert Path objects to strings for older `shutil` APIs that don't
accept Paths.
Added a test to verify that clamscan can extract images from an XLS
document. The document has 2 images: a PNG and JPEG version of the
clamav demon/logo. The test requires the json metadata feature to verify
that the MD5s of the images are correct.
No other image formats were tested because despite the format allegedly
supporting other image formats, Excel converts TIFF, BMP, and GIF images
to PNG files when you insert them.
The split test files are flagged by some AV's because they look like
broken executables. Instead of splitting the test files to prevent
detections, we should encrypt them. This commit replaces the "reassemble
testfiles" script with a basic "XOR testfiles" script that can be used
to encrypt or decrypt test files. This commit also of course then
replaces all the split files with xor'ed files.
The test and unit_tests directories were a bit of a mess, so I
reorganized them all into unit_tests with all of the test files placed
under "unit_tests/input" using subdirectories for different types of files.
Adds a basic test to validate that ExcludePath correctly excludes a
subdirectory but does not exclude subsequent files. As with the other
ClamD/Scan tests, it will test in each mode: regular, stream, and
fdpass (if available).
Unlike the other tests, this one tests ClamDScan with Valgrind instead
of ClamD.
Refactored the clamd_test.py file to reduce duplicate code, and support
enabling and disabling valgrind when running ClamDScan and ClamD.
Add pytest to the github actions environments because the results when
using pytest are far easier to read.
There appear to be minor leaks in clamd that can occur when shutting
down immediately after a command (e.g. RELOAD).
These are causing intermittent clamd test failures.
It seems like they're caused by a thread leaking occasionally,
due to not exiting before the program terminates.
I don't believe these to be a serious issue. Tracking down the exact
cause and crafting a fix for the leaks isn't worth the effort.
This commit adds valgrind suppression rules to stabilize the tests.
The non-existent file test has a hack to "expect" a weird error message
caused by the '\v' character rather than the file not actually existing.
Recently something(?) changed and the test started reporting yet a
different message or no message.
Removing the '\v' special character fixes the test so it actually tests
a non-existent file and returns the same message as on other operating
systems.
Add a test where freshclam received a zero-byte cdiff to trigger a whole
CVD database download, and the CVD served is older than advertised.
This is a regression test for a bug found & fixed by Andrew Williams.
Adds 3 tests to validate that:
1. a CDIFF update works
2. a CDIFF partial update (with 1 missing CDIFF) works
and that a subsequent update is ok with being 1 behind
3. a CDIFF partial update (with 2 missing CDIFFs) works
and that a subsequent update will try to get the WHOLE CVD -
because being 2+ CDIFFs behind without any update isn't good enough.
Also fixed a minor bug so that the database name is properly displayed
when a partial update occurs instead of displaying "(null)".
Also changed the freshclam test port to 8001 to deconflict with
CVD-Update, in case that's running in the background.
TODO: Make the tests smarter so they find an open port instead of
hoping that 8001 is available.
Some users have scripts set up from long ago to delete mirrors.dat if
FreshClam failed. We used to recommend this if people had technical
issues because mirrors.dat would store a bunch of entries indicating
that all of their regional mirrors were failing and then FreshClam would
give up.
The new freshclam DAT file no longer stores that kind of information.
Deleting the DAT file is no longer sound advice.
We very much want the UUID, which is generated when creating the DAT
file, to persist between runs. So unless people go and change the
scripts to delete freshclam.dat instead, this commit should resolve the
concern.