Commit graph

138 commits

Author SHA1 Message Date
Val S.
1a2515eea9
Fix compiler warning
Mismatched declaration and definition.
2025-10-14 14:05:12 -04:00
Val S.
a77a271fb5
Reduce unnecessary scanning of embedded file FPs (#1571)
When embedded file type recognition finds a possible embedded file, it
is scanned as a new embedded file even if it turns out to be a false
positive and parsing fails. My solution is to pre-parse the file headers
as little as possible to determine whether the file is valid. If
possible, also determine the file size based on the headers. That way we
don't have to scan additional data when the embedded file is not at the
very end.

This commit adds header checks prior to embedded ZIP, ARJ, and CAB
scanning. For these types I was also able to use the header checks to
determine the object size so as to prevent excessive pattern matching.
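
For illustration, a rough sketch of the kind of header pre-check this adds,
using ZIP as the example (simplified; the real ZIP/ARJ/CAB checks and the
actual helper names differ):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical result of the pre-parse step. */
    typedef struct {
        bool valid;  /* header looks plausible */
        size_t size; /* object size if it could be determined, else 0 */
    } embedded_check_t;

    static embedded_check_t precheck_embedded_zip(const uint8_t *buf, size_t len)
    {
        embedded_check_t res = {false, 0};
        static const uint8_t zip_magic[4] = {0x50, 0x4b, 0x03, 0x04}; /* "PK\3\4" */

        if (len < 30) /* minimum ZIP local file header length */
            return res;
        if (memcmp(buf, zip_magic, sizeof(zip_magic)) != 0)
            return res; /* false positive: don't bother scanning this offset */

        /* Little-endian fields of the local file header. */
        uint32_t csize = (uint32_t)buf[18] | ((uint32_t)buf[19] << 8) |
                         ((uint32_t)buf[20] << 16) | ((uint32_t)buf[21] << 24);
        uint16_t namelen  = (uint16_t)buf[26] | ((uint16_t)buf[27] << 8);
        uint16_t extralen = (uint16_t)buf[28] | ((uint16_t)buf[29] << 8);

        res.valid = true;
        if (csize != 0 && csize != 0xFFFFFFFF) /* streamed/zip64 entries can't be bounded here */
            res.size = 30 + (size_t)namelen + (size_t)extralen + csize;
        return res;
    }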

TODO: Add the same for RAR, EGG, 7Z, NULSFT, AUTOIT, IShield, and PDF.

This commit also removes duplicate matching for embedded MSEXE.
The embedded MSEXE detection and scanning logic was accidentally
creating an extra duplicate layer in between scanning and detection
because of the logic within the `cli_scanembpe()` function.
That function was effectively doing the header check which this commit
adds for ZIP, ARJ, and CAB but minus the size check.
Note: It is unfortunately not possible to get an accurate size from PE
file headers.
The `cli_scanembpe()` function also used to dump to a temp file, which
has been unnecessary ever since FMAPs were extended to support windows
into other FMAPs. So this commit removes the intermediate layer and no
longer drops a temp file for each embedded PE file.

Further, this commit adds configuration and DCONF safeguards around all
embedded file type scanning.

Finally, this commit adds a set of tests to validate proper extraction
of embedded ZIP, ARJ, CAB, and MSEXE files.

CLAM-2862

Co-authored-by: TheRaynMan <draynor@sourcefire.com>
2025-09-23 15:57:28 -04:00
Valerie Snyder
f7e60d566f
Record unique object-id for each layer scanned
Every time we push a new map onto the scanning recursion context, give
it a unique object id number, which counts from zero.

Moved the location where we add metadata for each file from the
"cli_magic_scan" function over to the "recursion stack push" function.

Include a "path" as a parameter for creating a new fmap, and rename some
related variables and functions to be more intuitive.
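
For illustration, a simplified sketch of the push bookkeeping (struct and
function names here are hypothetical, not the real internal API):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct layer {
        uint64_t object_id; /* unique per scanned layer, counts from 0 */
        const char *path;   /* path passed in when the fmap was created, may be NULL */
    } layer_t;

    typedef struct scan_ctx {
        uint64_t next_object_id;
        layer_t stack[32];  /* fixed-size for the sketch */
        size_t depth;
    } scan_ctx_t;

    static int recursion_stack_push(scan_ctx_t *ctx, const char *path)
    {
        if (ctx->depth >= sizeof(ctx->stack) / sizeof(ctx->stack[0]))
            return -1; /* recursion limit reached */
        ctx->stack[ctx->depth].object_id = ctx->next_object_id++; /* hand out the id here */
        ctx->stack[ctx->depth].path      = path;
        ctx->depth++;
        return 0;
    }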

CLAM-2796
See also: CLAM-2485, CLAM-2626
2025-08-14 21:23:33 -04:00
Valerie Snyder
aa7b7e9421
Swap clean cache from MD5 to SHA2-256
Change the clean-cache to use SHA2-256 instead of MD5.
Note that all references are changed to specify "SHA2-256" now instead
of "SHA256", for clarity. But there is no plan to add support for SHA3
algorithms at this time.

Significant code cleanup. E.g.:
- Implemented goto-done error handling.
- Used `uint8_t *` instead of `unsigned char *`.
- Used `bool` for boolean checks, rather than `int`.
- Used `#defines` instead of magic numbers.
- Removed duplicate `#defines` for things like hash length.

Add new option to calculate and record additional hash types when the
"generate metadata JSON" feature is enabled:
- libclamav option: `CL_SCAN_GENERAL_STORE_EXTRA_HASHES`
- clamscan option: `--json-store-extra-hashes` (default off)
- clamd.conf option: `JsonStoreExtraHashes` (default 'no')

Renamed the sigtool option `--sha256` to `--sha2-256`.
The original option is still functional, but is deprecated.

For the "generate metadata JSON" feature, the file hash is now stored as
"sha2-256" instead of "FileMD5". If you enable the "extra hashes" option,
then it will also record "md5" and "sha1".

Deprecate and disable the internal "SHA collect" feature.
This option had been hidden behind C #ifdef checks for an option that
wasn't exposed through CMake, so it was basically unavailable anyway.

Changed the code to calculate file hashes when they're needed and no sooner.

For the FP feature in the matcher module, I have mimicked the
optimization in the FMAP scan routine so that it can calculate multiple
hashes in a single pass of the file.
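
For illustration, a rough sketch of the single-pass idea using OpenSSL's EVP
API (not the actual matcher/fmap code):

    #include <openssl/evp.h>
    #include <stdio.h>

    static int hash_one_pass(FILE *fp,
                             unsigned char md5[EVP_MAX_MD_SIZE],
                             unsigned char sha1[EVP_MAX_MD_SIZE],
                             unsigned char sha256[EVP_MAX_MD_SIZE])
    {
        EVP_MD_CTX *c_md5    = EVP_MD_CTX_new();
        EVP_MD_CTX *c_sha1   = EVP_MD_CTX_new();
        EVP_MD_CTX *c_sha256 = EVP_MD_CTX_new();
        unsigned char buf[8192];
        size_t n;
        int ret = -1;

        if (!c_md5 || !c_sha1 || !c_sha256)
            goto done;
        if (!EVP_DigestInit_ex(c_md5, EVP_md5(), NULL) ||
            !EVP_DigestInit_ex(c_sha1, EVP_sha1(), NULL) ||
            !EVP_DigestInit_ex(c_sha256, EVP_sha256(), NULL))
            goto done;

        /* One pass over the data feeds all three digest contexts. */
        while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
            EVP_DigestUpdate(c_md5, buf, n);
            EVP_DigestUpdate(c_sha1, buf, n);
            EVP_DigestUpdate(c_sha256, buf, n);
        }

        if (EVP_DigestFinal_ex(c_md5, md5, NULL) &&
            EVP_DigestFinal_ex(c_sha1, sha1, NULL) &&
            EVP_DigestFinal_ex(c_sha256, sha256, NULL))
            ret = 0;
    done:
        EVP_MD_CTX_free(c_md5);
        EVP_MD_CTX_free(c_sha1);
        EVP_MD_CTX_free(c_sha256);
        return ret;
    }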

The `HandlerType` feature stores a hash of the file in the scan ctx to
prevent retyping the exact same data more than once.
I removed that hash field and replaced it with an attribute flag that is
applied to the new recursion stack layer when retyping a file.
This also closes a minor bug that would prevent retyping a file with an
all-zero hash. :)

The work upgrading cache.c to support SHA2-256 sized hashes is thanks to:
https://github.com/m-sola

CLAM-255
CLAM-1858
CLAM-1859
CLAM-1860
2025-08-14 21:23:30 -04:00
Val Snyder
7ff29b8c37
Bump copyright dates for 2025 2025-02-14 10:24:30 -05:00
Micah Snyder
a729aafc38 Remove PCRE dead code
As of ClamAV 0.105, PCRE2 is required. PCRE (1) is not an option, and
there is also no option to disable PCRE support.

This commit removes the dead code associated with those old build
options.
2024-04-13 12:34:15 -04:00
Micah Snyder
3ae9c1e434 Add LHA/LZH archive support
File type magic signatures chosen based on the extensions supported
by the Rust delharc crate.

See: https://docs.rs/delharc/latest/delharc/
2024-04-09 10:35:22 -04:00
Micah Snyder
9cb28e51e6 Bump copyright dates for 2024 2024-01-22 11:27:17 -05:00
RainRat
caf324e544
Fix typos (no functional changes) 2023-11-26 18:01:19 -05:00
Micah Snyder
6eebecc303 Bump copyright for 2023 2023-02-12 11:20:22 -08:00
Micah Snyder
f7b139a776 PE, ELF, Mach-O: code cleanup
The header parsing / executable metadata collecting functions for the
PE, ELF, and Mach-O file types were using `int` for the return type.
Mostly they were returning 0 for success and -1, -2, -3, or -4 for
failure. But in some cases they were returning cl_error_t enum values
for failure. Regardless, the function using them was treating 0 as
success and non-zero as failure, which it stored as -1 ... every time.

This commit switches them all to use cl_error_t.  I am continuing to
store the final result as 0 / -1 in the `peinfo` struct, but outside of
that everything has been made consistent.

While I was working on that, I got a tad sidetracked.  I noticed that
the target type isn't an enum, or even a set of #defines. So I made an
enum and changed the code that uses target types to use the enum.
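
For illustration, roughly the shape of such an enum (the actual names in the
code may differ); the values track the signature Target numbers documented
for signature writing:

    typedef enum {
        TARGET_GENERIC  = 0,  /* any file */
        TARGET_PE       = 1,
        TARGET_OLE2     = 2,
        TARGET_HTML     = 3,
        TARGET_MAIL     = 4,
        TARGET_GRAPHICS = 5,
        TARGET_ELF      = 6,
        TARGET_ASCII    = 7,  /* normalized ASCII text */
        TARGET_MACHO    = 9,
        TARGET_PDF      = 10,
        TARGET_FLASH    = 11,
        TARGET_JAVA     = 12,
        TARGET_OTHER    = 14  /* binary (unidentified) files */
    } target_type_t;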

I also removed the `target` parameter from a number of functions that
don't actually use it at all. Some recursion was masking the fact that
it was an unused parameter which is why there was no warning about it.
2022-10-19 13:13:57 -07:00
Micah Snyder
73088d261b Fix issue detecting embedded zips attached to small files
If initial file type recognition comes back as an SFX type, which may
happen for small files that do not get recognized as any other file type
and contain a zip entry somewhere in the middle, then the type will be
set to that SFX type. This is a problem because later on when we go to
do embedded file type recognition, we explicitly skip SFX types, in
addition to TARs and other types that are parsed elsewhere and have a
high embedded file type recognition FP-rate because they aren't
compressed.

This commit prohibits that initial FTM check from selecting an SFX type.
The SFX type will be rediscovered in `scanraw()` where the type is
handled/parsed.
2022-10-19 13:13:57 -07:00
Micah Snyder
29a761219a Matcher: code cleanup, fix possible leaks
Added inline documentation and did some general cleanup of
`cli_scan_buff()`, and then updated the function comment now that I
understand the function a little better.

While doing this, I found that the calls to cli_ac_initdata were being
done regardless of whether or not logically initialized matcher data was
required or used.  But the call to free that matcher data was only being
done when AC-data was not provided by the caller.  This would be a leak.
I fixed this by only initializing the AC data when AC data is not
provided by the caller.
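
For illustration, a minimal sketch of the ownership rule (names hypothetical):
only initialize local AC data when the caller didn't supply any, and only
free what this function itself initialized.

    #include <stddef.h>

    typedef struct ac_data {
        int initialized; /* stand-in for the real matcher data */
    } ac_data_t;

    /* stubs for illustration */
    static int ac_initdata(ac_data_t *d) { d->initialized = 1; return 0; }
    static void ac_freedata(ac_data_t *d) { d->initialized = 0; }

    static int scan_buff_sketch(const unsigned char *buf, size_t len, ac_data_t *caller_data)
    {
        ac_data_t local   = {0};
        ac_data_t *mdata  = caller_data;

        if (!mdata) {
            /* caller gave us nothing: initialize (and later free) our own */
            if (ac_initdata(&local) != 0)
                return -1;
            mdata = &local;
        }

        /* ... pattern matching against buf/len using mdata ... */
        (void)buf;
        (void)len;

        if (mdata == &local)
            ac_freedata(&local); /* free only what we initialized ourselves */
        return 0;
    }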
2022-10-19 13:13:57 -07:00
Micah Snyder
858b541a51 Matcher: Remove allmatch checks and significantly tidy code
Significantly tidy the `cli_scan_fmap()` function, and add comments to
explain how it all works.

Add SHA1 and SHA256 digest variables to the FMAP structure in addition
to the existing MD5. Add a function to set the hash so that when we
calculate the hashes for hash matching, we save them for subsequent FP
matching. This enabled me to remove the extra "hash-only" FP check from
`cli_scan_fmap()`. This will also make it easier to switch the clean
cache hash algorithm to SHA256 in the future.
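
For illustration, a sketch of the idea (field names are illustrative, not the
exact fmap layout): keep each digest alongside a "have it" flag so a hash
computed once for hash-signature matching can be reused for the FP check.

    #include <stdbool.h>
    #include <string.h>

    typedef struct fmap_hashes {
        unsigned char md5[16];
        unsigned char sha1[20];
        unsigned char sha256[32];
        bool have_md5, have_sha1, have_sha256;
    } fmap_hashes_t;

    static void fmap_set_sha256(fmap_hashes_t *f, const unsigned char digest[32])
    {
        memcpy(f->sha256, digest, 32);
        f->have_sha256 = true; /* later callers can skip recomputing it */
    }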

Remove extra allmatch checks that are no longer required.

Add a new header to prevent #include order issues.
2022-10-19 13:13:57 -07:00
Andy Ragusa
778a4b1341 Corrected types to remove warnings. 2022-10-18 14:04:36 -07:00
Micah Snyder
cd3134568a Code quality: Refactor layer attributes as scan parameter
The current implementation sets a "next layer attributes" flag field
in the scan context. This may introduce bugs if accidentally not cleared
during error handling, causing that attribute to be applied to a
different layer than intended.

This commit resolves that by adding an attribute flag to the major
internal scan functions and removing the "next layer attributes" from
the scan context. This attributes flag shares the same flag fields as
the attributes flag in the new file inspection callback and the flags
are defined in `clamav.h`.
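
For illustration, a simplified sketch of the change in shape (function and
flag names here are hypothetical, not the real internal API):

    #include <stdint.h>

    #define LAYER_ATTR_NONE       0x0
    #define LAYER_ATTR_NORMALIZED 0x1 /* example attribute bit */

    /* Old shape: the attribute was stashed in the context and could leak onto
     * the wrong layer if an error path forgot to clear it:
     *     ctx->next_layer_attributes |= LAYER_ATTR_NORMALIZED;
     *     scan_nested_fmap(ctx, map);
     *
     * New shape: the attributes for exactly one layer travel as an argument. */
    static int scan_nested_fmap(void *ctx, void *map, uint32_t attributes)
    {
        (void)ctx;
        (void)map;
        (void)attributes; /* applied only to the layer being pushed; nothing persists in ctx */
        return 0;
    }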
2022-10-13 08:57:44 -07:00
Andy Ragusa
b3a3b358b0 Speed up freeing of signatures
Speed up freeing of signatures by tracking all malloced blocks instead
of having to find duplicates in our data structures on signature unload.
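
For illustration, a simplified sketch of the tracking approach (not the actual
loader code): every block allocated while loading is appended to a flat list,
so unloading frees the list in one walk.

    #include <stdlib.h>

    typedef struct block_list {
        void **blocks;
        size_t count, capacity;
    } block_list_t;

    static void *tracked_malloc(block_list_t *pool, size_t size)
    {
        void *p = malloc(size);
        if (!p)
            return NULL;
        if (pool->count == pool->capacity) {
            size_t cap = pool->capacity ? pool->capacity * 2 : 64;
            void **nb  = realloc(pool->blocks, cap * sizeof(void *));
            if (!nb) {
                free(p);
                return NULL;
            }
            pool->blocks   = nb;
            pool->capacity = cap;
        }
        pool->blocks[pool->count++] = p;
        return p;
    }

    static void tracked_free_all(block_list_t *pool)
    {
        for (size_t i = 0; i < pool->count; i++)
            free(pool->blocks[i]); /* no need to hunt for duplicates elsewhere */
        free(pool->blocks);
        pool->blocks = NULL;
        pool->count = pool->capacity = 0;
    }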
2022-10-07 08:30:57 -07:00
Scott Hutton
21d1f7defc Various Rust-related code cleanup
* Broke out the variants of error/result handling in `frs_error.rs`.
  Made syntax slightly cleaner for `frs_call!`, explicitly moving
  the output variables *out* of the function call so as not to make
  the parameter order confusing.

* Wrapped the FuzzyHash map into a container rather than exposing
  the HashMap directly.  Simplifies casting, and allows it to feel
  more like a class with methods.

* Fixed various clippy complaints regarding unsafe, etc.

* Rename `frs_error.rs` to `ffi_utils.rs` and migrated ffi-specific
  features like the `validate_str_param!()` macro to this new module.
2022-03-02 13:12:59 -07:00
Micah Snyder
fd587c741c Image fuzzy hash: new logical sub-signature feature
Add a new logical signature subsignature type for matching on images
with image fuzzy hashes.

Image fuzzy hash subsignatures follow this format:

    fuzzy_img#<hash>#<dist>

In this initial implementation, the hamming distance (dist) is ignored
and only exact fuzzy hash matches will alert.

Fuzzy hash matching is only performed for supported image types.

Also: removed some excessive debug log messages on start-up.

Fixed an issue where the signature name (virname) was being allocated and
stored for every subsignature, or even every sub-pattern in an AC-pattern
(i.e. NDB sig or LDB subsig) containing a `{n-m}` or `*` wildcard.
This fix is only for LDB subsigs though. NDB signatures are still
allocating one virname per sub-pattern.

This fix was required because I needed a place to store the virname with
fuzzy-hash subsignatures. Storing it in the fuzzy-hash subsig metadata
the way AC-pattern, PCRE, and BComp subsigs were doing it wouldn't work,
because it would cross the C-Rust FFI boundary and giving pointers to
Rust-allocated stuff is dicey. Not to mention native Rust strings are
different than C strings. Anyway, the correct thing to do was to store
the virname with the actual logical signature.

TODO: Keep track of NDB signatures in the same way and store the virname
for NDB sigs there instead of in AC-patterns so that we can get rid of
the virname field in the AC-pattern struct.
2022-03-02 13:12:59 -07:00
micasnyd
140c88aa4e Bump copyright for 2022
Includes minor format corrections.
2022-01-09 14:23:25 -07:00
Micah Snyder
db013a2bfd libclamav: Fix scan recursion tracking
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.

Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".

At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.

But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".

To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).

I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).

This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.

For illustration, imagine a tarball named foo.tar.gz with this structure:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |

But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe                   | PE    | 0         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| │   └── hello.txt         | ASCII | 2         | 0                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |
|     └── world.txt         | ASCII | 2         | 0                 |

(A) If we scan for embedded files at any layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| ├── foo.tar               | TAR   | 1         | 0                 |
| │   ├── bar.zip           | ZIP   | 2         | 1                 |
| │   │   └── hola.txt      | ASCII | 3         | 0                 |
| │   ├── baz.exe           | PE    | 2         | 1                 |
| │   │   ├── sfx.zip       | ZIP   | 3         | 1                 |
| │   │   │   └── hello.txt | ASCII | 4         | 0                 |
| │   │   └── sfx.7z        | 7Z    | 3         | 1                 |
| │   │       └── world.txt | ASCII | 4         | 0                 |
| │   ├── sfx.zip           | ZIP   | 2         | 1                 |
| │   │   └── hello.txt     | ASCII | 3         | 0                 |
| │   └── sfx.7z            | 7Z    | 2         | 1                 |
| │       └── world.txt     | ASCII | 3         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |

(A) is bad because it scans content more than once.

Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.

The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats to include GZ and TAR.

(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     ├── baz.exe           | PE    | 2         | 1                 |
|     ├── sfx.zip           | ZIP   | 2         | 1                 |
|     │   └── hello.txt     | ASCII | 3         | 0                 |
|     └── sfx.7z            | 7Z    | 2         | 1                 |
|         └── world.txt     | ASCII | 3         | 0                 |

(B) is almost right. But we can achieve it easily enough by only scanning
for embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfx.zip and sfx.7z are in foo.tar instead of in baz.exe.

The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if inside of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip                   | ZIP   | 0         | 0                 |
| └── bar.exe               | PE    | 1         | 1                 |
|     └── sfx.zip           | ZIP   | 2         | 2                 |

Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.

(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |
|         ├── sfx.zip       | ZIP   | 3         | 1                 |
|         │   └── hello.txt | ASCII | 4         | 0                 |
|         └── sfx.7z        | 7Z    | 3         | 1                 |
|             └── world.txt | ASCII | 4         | 0                 |

(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection only when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.

So this commit aims to solve the issue the (B)-way.

Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types relies on
finding data partway through or near the end of a file before
reassigning the entire file as the new type.

Other fixes and considerations in this commit:

- The utf16 HTML parser has weak error handling, particularly with respect
  to creating a nested fmap for scanning the ascii decoded file.
  This commit cleans up the error handling and wraps the nested scan with
  the recursion-stack push()/pop() for correct recursion tracking.

  Before this commit, each container layer had a flag to indicate if the
  container layer is valid.
  We need something similar so that the cli_recursion_stack_get_*()
  functions ignore normalized layers. Details...

  Imagine an LDB signature for HTML content that specifies a ZIP
  container. If the signature actually alerts on the normalized HTML and
  you don't ignore normalized layers for the container check, it will
  appear as though the alert is in an HTML container rather than a ZIP
  container.

  This commit accomplishes this with a boolean you set in the scan context
  before scanning a new layer. Then when the new fmap is created, it will
  use that flag to set a similar flag for the layer. The context flag is
  then reset so that subsequent layers don't inherit it.
  The flag allows the new recursion_stack_get() function to ignore
  normalized layers when iterating the stack to return a layer at a
  requested index, negative or positive.

  Scanning extracted/normalized JavaScript and VBA should also use the
  'layer is normalized' flag.

- This commit also fixes Heuristic.Broken.Executable alert for ELF files
  to make sure that:

  A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
  respects the FP check).

  B) all broken-executable alerts for ELF only happen if the
  SCAN_HEURISTIC_BROKEN option is enabled.

- This commit also cleans up the error handling in cli_magic_scan_dir().
  This was needed so we could correctly apply the layer-is-normalized-flag
  to all VBA macros extracted to a directory when scanning the directory.

- Also fix an issue where exceeding scan maximums wouldn't cause embedded
  file detection scans to abort. Granted we don't actually want to abort
  if max filesize or max recursion depth are exceeded... only if max
  scansize, max files, and max scantime are exceeded.

  Add 'abort_scan' flag to scan context, to protect against depending on
  correct error propagation for fatal conditions. Instead, setting this
  flag in the scan context should guarantee that a fatal condition deep in
  scan recursion isn't lost, which would result in more stuff being scanned
  instead of aborting. This shouldn't be necessary, but some status codes
  like CL_ETIMEOUT never used to be fatal and it's easier to do this than
  to verify every parser only returns CL_ETIMEOUT and other "fatal
  status codes" in fatal conditions.

- Remove duplicate is_tar() prototype from filetypes.c and include
  is_tar.h instead.

- Presently we create the fmap hash when creating the fmap.
  This wastes a bit of CPU if the hash is never needed.
  Now that we're creating fmaps for all embedded files discovered with
  file type recognition scans, this is a much more frequent occurrence and
  really slows things down.

  This commit fixes the issue by only creating fmap hashes as needed.
  This should not only resolve the performance impact of creating fmaps
  for all embedded files, but should also improve performance in general.

- Add allmatch check to the zip parser after the central-header meta
  match. That way we don't get multiple alerts with the same match except
  in allmatch mode. Clean up error handling in the zip parser a tiny bit.

- Fixes to ensure that the scan limits such as scansize, filesize,
  recursion depth, # of embedded files, and scantime are always reported
  if AlertExceedsMax (--alert-exceeds-max) is enabled.

- Fixed an issue where non-fatal alerts for exceeding scan maximums may
  mask signature matches later on. I changed it so these alerts use the
  "possibly unwanted" alert-type and thus only alert if no other alerts
  were found or if all-match or heuristic-precedence are enabled.

- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
  when the --gen-json feature is enabled. These will show up once under
  "ParseErrors" the first time a limit is exceeded. In the present
  implementation, only one limits-exceeded event will be added, so as to
  prevent a malicious or malformed sample from filling the JSON buffer
  with millions of events and using a tonne of RAM.
2021-10-25 16:02:29 -07:00
Micah Snyder
81402e1abb Inline doxygen documentation fixup
Fixup input/output params to be annotated with [in,out], not [in/out].
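
For example (hypothetical function, shown only to illustrate the corrected
annotation style):

    #include <stddef.h>

    /**
     * @brief Copy and normalize a buffer.
     *
     * @param[in]     src  Source buffer.
     * @param[in,out] dst  Destination buffer; existing contents may be reused.
     * @param[in]     len  Number of bytes to process.
     */
    void normalize_copy(const char *src, char *dst, size_t len);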

Note: skipped some other incorrectly annotated [out] params that are
already staged to be fixed in a different PR.
2021-07-17 10:39:27 -07:00
Micah Snyder
090c8990e3
libclamav, clamscan: load/unload callbacks & progress meters
Add progress callbacks to libclamav for:
- database load
- engine compile
- engine free

Add a progress bar to clamscan for load & compile.
These are disabled if you run with --debug, if stdout is not a TTY, or if
you are using one of --quiet, --infected, or --no-summary.

Added code so you can test the engine-free callback by building with
ENABLE_ENGINE_FREE_PROGRESSBAR defined.

The compile & free progress callbacks pre-calculate the number of
tasks to complete to estimate the progress. Some tasks may take longer
than others, so the progress speed may appear to vary a little.

The callbacks return a cl_error_t, but the return value doesn't currently
do anything. It is reserved for future use.
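
For illustration, a rough usage sketch. The setter and callback names below
are what I believe landed in clamav.h (cl_engine_set_clcb_sigload_progress
and friends); check the header for the exact prototypes before relying on
this.

    #include <stdio.h>
    #include <clamav.h>

    static cl_error_t load_progress(size_t total_items, size_t now_completed, void *context)
    {
        (void)context;
        fprintf(stderr, "\rLoading: %zu/%zu", now_completed, total_items);
        return CL_SUCCESS; /* return value is reserved for future use */
    }

    static void register_progress_sketch(struct cl_engine *engine)
    {
        /* assumed setter name, see note above */
        cl_engine_set_clcb_sigload_progress(engine, load_progress, NULL);
    }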

Minor formatting change in matcher-ac.c to counteract weird
clang-format behavior, and to make it easier to read.

Added progress callbacks and clamscan progress bars to the news.
2021-07-16 11:47:23 -07:00
Micah Snyder (micasnyd)
b9ca6ea103 Update copyright dates for 2021
Also fixes up clang-format.
2021-03-19 15:12:26 -07:00
Micah Snyder
4cce1fcd20 GIF, PNG bugfixes; Add AlertBrokenMedia option
Added a new scan option to alert on broken media (graphics) file
formats. This feature mitigates the risk of malformed media files
intended to exploit vulnerabilities in other software. At present
media validation exists for JPEG, TIFF, PNG, and GIF files.

To enable this feature, set `AlertBrokenMedia yes` in clamd.conf, or
use the `--alert-broken-media` option when using `clamscan`.
These options are disabled by default for now.

Application developers may enable this scan option by enabling
`CL_SCAN_HEURISTIC_BROKEN_MEDIA` for the `heuristic` scan option bit
field.
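
For example, an application might set the bit roughly like this (using the
cl_scan_options struct from clamav.h; the surrounding setup is omitted):

    #include <clamav.h>
    #include <string.h>

    static void enable_broken_media(struct cl_scan_options *options)
    {
        memset(options, 0, sizeof(struct cl_scan_options));
        options->parse     |= ~0;                             /* enable all parsers */
        options->heuristic |= CL_SCAN_HEURISTIC_BROKEN_MEDIA; /* alert on broken media */
    }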

Fixed PNG parser logic bugs that caused an excess of parsing errors
and fixed a stack exhaustion issue affecting some systems when
scanning PNG files. PNG file type detection was disabled via
signature database update for 0.103.0 to mitigate effects from these
bugs.

Fixed an issue where PNG and GIF files no longer work with Target:5
(graphics) signatures if detected as CL_TYPE_PNG/GIF rather than as
CL_TYPE_GRAPHICS. Target types now support up to 10 possible file
types to make way for additional graphics types in future releases.

Scanning JPEG, TIFF, PNG, and GIF files will no longer return "parse"
errors when file format validation fails. Instead, the scan will alert
with the "Heuristics.Broken.Media" signature prefix and a descriptive
suffix to indicate the issue, provided that the "alert broken media"
feature is enabled.

GIF format validation will no longer fail if the GIF image is missing
the trailer byte, as this appears to be a relatively common issue in
otherwise functional GIF files.

Added a TIFF dynamic configuration (DCONF) option, which was missing.
This will allow us to disable TIFF format validation via signature
database update in the event that it proves to be problematic.
This feature already exists for many other file types.

Added CL_TYPE_JPEG and CL_TYPE_TIFF types.
2021-01-28 12:54:47 -08:00
Mickey Sola
9ea3b93018 Recurse all fpmaps when doing fpchecks
Changes cli_checkfp_virus to a recursive function which checks all
parent fmaps in the context for false positives

Simplifies params needed for cli_checkfp_virus to use the current digest
and fmap length that reside within the fmap struct itself
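
For illustration, a conceptual sketch of the recursive walk (types and the
allowlist helper are hypothetical stand-ins, not the real code):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct fmap_sketch {
        struct fmap_sketch *parent;  /* containing layer, NULL at the top */
        const unsigned char *digest; /* digest stored in the fmap itself */
        size_t len;
    } fmap_sketch_t;

    /* stub standing in for the allowlist lookup */
    static bool digest_is_allowlisted(const unsigned char *digest, size_t len)
    {
        (void)digest;
        (void)len;
        return false;
    }

    static bool checkfp_recursive(const fmap_sketch_t *map)
    {
        if (!map)
            return false;
        if (digest_is_allowlisted(map->digest, map->len))
            return true;                       /* this layer is a known-clean file */
        return checkfp_recursive(map->parent); /* otherwise check the parent layers */
    }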
2020-08-03 12:11:56 -07:00
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan APIs that create fmaps, and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat so as to prevent the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted TDB-type macros and LSIG-type macros to enums for improved
  type safety.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
5f4f69102d Correcting types from int to cl_error_t where appropriate. Eliminating unused variables and referencing unused parameters to remove warnings. 2019-10-02 16:08:25 -04:00
Andrew
7ba310e605 PE parsing code improvements, db loading bug fixes
Consolidate the PE parsing code into one function.  I tried to preserve all existing functionality from the previous, distinct implementations to a large extent (with the exceptions mentioned below).  If I noticed potential bugs/improvements, I added a TODO statement about those so that they can be fixed in a smaller commit later.  Also, there are more TODOs in places where I'm not entirely sure why certain actions are performed - more research is needed for these.

I'm submitting a pull request now so that regression testing can be done, and because merging what I have thus far will likely have fewer conflicts than if I try to merge later.

PE parsing code improvements:
- PEs without all 16 data directories are parsed more appropriately now
- Added lots more debug statements

Also:
 - Allow MAX_BC and MAX_TRACKED_PCRE to be specified via CFLAGS

    When doing performance testing with the latest CVD, MAX_BC and
    MAX_TRACKED_PCRE need to be raised to track all the events.
    Allow these to be specified via CFLAGS by not redefining them
    if they are already defined

- Fix an issue preventing wildcard sizes in .MDB/.MSB rules

    I'm not sure what the original intent of the check I removed was,
    but it prevents using wildcard sizes in .MDB/.MSB rules.  AFAICT
    these wildcard sizes should be handled appropriately by the MD5
    section hash computation code, so I don't think a check on that
    is needed.

- Fix several issues related to db loading
     - .imp files will now get loaded if they exist in a directory passed
       via clamscan's '-d' flag
     - .pwdb files will now get loaded if they exist in a directory passed
       via clamscan's '-d' flag even when compiling without yara support
     - Changes to .imp, .ign, and .ign2 files will now be reflected in calls
       to cl_statinidir and cl_statchkdir (and also .pwdb files, even when
       compiling without yara support)
     - The contents of .sfp files won't be included in some of the signature
       counts, and the contents of .cud files will be
     - Any local.gdb files will no longer be loaded twice

- For .imp files, you are no longer required to specify a minimum flevel for wildcard rules, since this isn't needed
2019-10-02 16:08:20 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
b3e82e5e61 Replacing libclamav/cltypes.h with clamav-types.h.in, which generates a header clamav-types.h that we install alongside clamav.h. 2019-10-02 16:08:17 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
38fe8b69a0 Added .clang-format style rules, clam-format script to automate formatting of ClamAV code, and preparing select files so that clang-format does not alter carefully formatted sections. 2019-10-02 16:08:16 -04:00
Mickey Sola
2b6c456a1b bcomp - updates and fixes following code review 2018-12-02 23:07:03 -05:00
Mickey Sola
18ff502920 refactoring byte compare functionality as a subsig; adding loader and matchers for bytecompare subsig 2018-12-02 23:07:03 -05:00
Micah Snyder
d0cba11ea7 adding back changes to eliminate warnings from mspack, matcher, others, and readdb. 2017-09-21 13:10:01 -04:00
Micah Snyder
169af0fc67 Revert "eliminating warnings. mostly correcting variable types. also correcting struct initialization in a couple instances (var = {0} does not zero the memory on all platforms). Also some minor formatting corrections in areas I was already working. eliminated some unused variables."
This reverts commit 84a7f40288.
2017-09-20 12:37:07 -04:00
Micah Snyder
84a7f40288 eliminating warnings. mostly correcting variable types. also correcting struct initialization in a couple instances (var = {0} does not zero the memory on all platforms). Also some minor formatting corrections in areas I was already working. eliminated some unused variables. 2017-08-15 14:00:07 -04:00
Steven Morgan
cbf5017a7d bb11805 fix multiple results. Refactor false positive and heuristic precedence logic. 2017-04-18 12:07:06 -04:00
Kevin Lin
87b2a1a9e3 add 'Intermediates' field to target description block
(allows specification of any number of intermediate containers)
2017-02-01 17:33:02 -05:00
Kevin Lin
984f90ca4f bb#11587 - track linked bcs on matchers for target 7 normalization 2016-06-28 15:19:50 -04:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Kevin Lin
e7b3198df2 bb#9858 - added target 14 for binary (unidentified) files 2015-07-23 16:37:15 -04:00
Steven Morgan
7665e02d5b Add support for YARA private rules and referencing other rules in a YARA condition. 2015-06-19 16:33:59 -04:00
Steven Morgan
b7999b89c9 YARA: capture offsets in matcher and use for processing YARA condition 'at' clauses. 2015-03-30 17:12:01 -04:00
Steven Morgan
f51f42e95c Capture YARA compiled condition string and anchor in struct cli_ac_lsig. 2015-03-06 17:10:47 -05:00
Steven Morgan
9de400559d refactor and simplify cli_lsig_eval, add new function cli_exp_eval to loop thru the lsig table and call either lsig_eval or yara_eval. 2015-03-03 19:25:13 -05:00
Kevin Lin
b5b3fecd6c unioned lsig logic and future yara conditional 2015-02-11 10:36:43 -08:00