Valgrind suppression for false positive in Rust png-decoder crate

Fix embedded RAR archive extraction issue
If the current layer has a file descriptor, ClamAV is passing the path for that file to the UnRAR module, even if the RAR we want to scan is just some small embedded bit (e.g. detected by RARSFX signature). We need to drop the RAR portion to a new file for the UnRAR module because it does not accept file buffers to be scanned, only file paths.

CLAM-2900

Note this commit also changes `scanners.c` to use `access()` on Windows instead of `_access_s()`. ClamAV defines `access()` to map to a custom `access_w32()` function on Windows. We already use it everywhere else.
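The decision described above can be sketched in isolation. This is a minimal illustration, not the ClamAV implementation: `layer_map` and `needs_temp_copy` are hypothetical names, the struct only mirrors the `fmap` fields visible in the diff, and on Windows the sketch maps `access` to the CRT's `_access` as a stand-in for ClamAV's own `access_w32()` mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <io.h>
#define access _access /* stand-in only; ClamAV maps access() to access_w32() */
#ifndef R_OK
#define R_OK 4
#endif
#else
#include <unistd.h>
#endif

/* Hypothetical, trimmed-down stand-in for the fmap fields used in the check. */
typedef struct {
    const char *path;     /* path of the file backing the map, or NULL */
    size_t nested_offset; /* offset of this layer inside the backing file */
    size_t len;           /* length of this layer */
    size_t real_len;      /* length of the whole backing file */
} layer_map;

/* Returns true when the layer has to be written to its own temp file before
 * a path can be handed to an extractor that only accepts file paths. */
static bool needs_temp_copy(const layer_map *map, bool scan_unprivileged)
{
    assert(map != NULL);

    if (scan_unprivileged) return true;                /* may not reuse the original path */
    if (NULL == map->path) return true;                /* not file-backed at all */
    if (0 != access(map->path, R_OK)) return true;     /* file exists but is not readable */
    if (map->nested_offset > 0) return true;           /* layer starts inside the file */
    if (map->len < map->real_len) return true;         /* layer is only part of the file */
    return false;
}
```

With a single `access()` call at the call site, the platform difference lives in one mapping rather than in an `#ifdef` around every check, which is the simplification the diff makes in `scanners.c`.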
2025-10-19 10:23:17 +00:00 · 2025-10-14 19:18:22 -04:00 · 2025-10-14 18:23:56 -04:00 · 2025-10-14 14:05:13 -04:00 · 2025-10-14 14:05:12 -04:00 · 2025-10-14 14:05:12 -04:00
3 changed files with 11 additions and 12 deletions
--- a/libclamav/scanners.c
+++ b/libclamav/scanners.c
@@ -418,11 +418,7 @@ static cl_error_t cli_scanrar_file(const char *filepath, int desc, cli_ctx *ctx)
                     * File should be extracted...
                     * ... make sure we have read permissions to the file.
                     */
-#ifdef _WIN32
-                    if (0 != _access_s(extract_fullpath, R_OK)) {
-#else
                    if (0 != access(extract_fullpath, R_OK)) {
-#endif
                        cli_dbgmsg("RAR: Don't have read permissions, attempting to change file permissions to make it readable..\n");
 #ifdef _WIN32
                        if (0 != _chmod(extract_fullpath, _S_IREAD)) {
@@ -533,17 +529,11 @@ static cl_error_t cli_scanrar(cli_ctx *ctx)
    char *tmpname = NULL;
    int tmpfd     = -1;

-#ifdef _WIN32
-    if ((SCAN_UNPRIVILEGED) ||
-        (NULL == ctx->fmap->path) ||
-        (0 != _access_s(ctx->fmap->path, R_OK)) ||
-        (ctx->fmap->nested_offset > 0) || (ctx->fmap->len < ctx->fmap->real_len)) {
-#else
    if ((SCAN_UNPRIVILEGED) ||
        (NULL == ctx->fmap->path) ||
        (0 != access(ctx->fmap->path, R_OK)) ||
        (ctx->fmap->nested_offset > 0) || (ctx->fmap->len < ctx->fmap->real_len)) {
-#endif
+
        /* If map is not file-backed have to dump to file for scanrar. */
        status = fmap_dump_to_file(ctx->fmap, ctx->fmap->path, ctx->this_layer_tmpdir, &tmpname, &tmpfd, 0, SIZE_MAX);
        if (status != CL_SUCCESS) {
--- a/libclamav/xlm_extract.c
+++ b/libclamav/xlm_extract.c
@@ -4950,7 +4950,7 @@ cl_error_t cli_extract_xlm_macros_and_images(const char *dir, cli_ctx *ctx, char
    }

    if (CL_VIRUS == cli_scan_desc(out_fd, ctx, CL_TYPE_SCRIPT, false, NULL, AC_SCAN_VIR,
-                                  NULL, "xlm-macro", tempfile, LAYER_ATTRIBUTES_NORMALIZED)) {
+                                  NULL, "xlm-macro", tempfile, LAYER_ATTRIBUTES_NONE)) {
        status = CL_VIRUS;
        goto done;
    }
--- a/unit_tests/valgrind.supp
+++ b/unit_tests/valgrind.supp
@@ -376,3 +376,12 @@
   fun:start_thread
   fun:clone
 }
+{
+   <image-fuzzy-hash-png-decoder>
+   Memcheck:Cond
+   ...
+   fun:read_header_info<std::io::cursor::Cursor<&[u8]>>
+   ...
+   fun:fuzzy_hash_calculate_image
+   ...
+}
Author	SHA1	Message	Date
Val S.	ac34e12bac	Valgrind suppression for false positive in Rust png-decoder crate	2025-10-14 19:18:22 -04:00
Val S.	50326da519	Fix embedded RAR archive extraction issue If the current layer has a file descriptor, ClamAV is passing the path for that file to the UnRAR module, even if the RAR we want to scan is just some small embedded bit (e.g. detected by RARSFX signature). We need to drop the RAR portion to a new file for the UnRAR module because it does not accept file buffers to be scanned, only file paths. CLAM-2900 Note this commit also changes `scanners.c` to use `access()` on Windows instead of `_access_s()`. ClamAV defines `access()` to map to a custom `access_w32()` function on Windows. We already use it everywhere else.	2025-10-14 18:23:56 -04:00
Val S.	b1c1f1840c	Update Rust dependencies; Fix image fuzzy hash values Large range testing identified some files where image fuzzy hashing produces different hashes with ClamAV 1.5 vs 1.4. With my investigation, I found the issue is with changes in Rust library dependencies, though it actually wasn't any change with the 'image' or 'jpeg-decoder' crates. After running a simple `cargo update` to update all non-pinned versions, I confirmed that this does not affect the minimum supported Rust version (MSRV). CLAM-2899	2025-10-14 14:05:13 -04:00
Val S.	1a2515eea9	Fix compiler warning Mismatched declaration and definition.	2025-10-14 14:05:12 -04:00
Val S.	0462dae12a	Increase limit for finding PE files embedded in other PE files I am seeing missed detections since we changed to prohibit embedded file type identification when inside an embedded file. In particular, I'm seeing this issue with PE files that contain multiple other MSEXE as well as a variety of false positives for PE file headers. For example, imagine a PE with four concatenated DLLs, like so: ``` [ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ] ``` And note that false positives for embedded MSEXE files are fairly common. So there may be a few mixed in there. Before limiting embedded file identification we might interpret the file structure something like this: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1: { embedded MSEXE #1: false positive, embedded MSEXE #2: DLL #2: { embedded MSEXE #1: DLL #3: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: DLL #4 } embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #4 } embedded MSEXE #3: DLL #3, embedded MSEXE #4: false positive, embedded MSEXE #5: false positive, embedded MSEXE #6: false positive, embedded MSEXE #7: false positive, embedded MSEXE #8: DLL #4 } } ``` This is obviously terrible, which is why we don't allow detecting embedded files within other embedded files. 
So after we enforce that limit, the same file may be interpreted like this instead: ``` MSEXE: { embedded MSEXE #1: false positive, embedded MSEXE #2: false positive, embedded MSEXE #3: false positive, embedded MSEXE #4: DLL #1, embedded MSEXE #5: false positive, embedded MSEXE #6: DLL #2, embedded MSEXE #7: DLL #3, embedded MSEXE #8: false positive, embedded MSEXE #9: false positive, embedded MSEXE #10: false positive, embedded MSEXE #11: false positive, embedded MSEXE #12: DLL #4 } ``` That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit for embedded type matches (limit 10, but 12 found). That means we won't see or extract the 4th DLL anymore. My solution is to lift the limit when adding a matched MSEXE type. We already do this for matched ZIPSFX types. While doing this, I've significantly tidied up the limits checks to make them more readable, and removed duplicate checks from within the `ac_addtype()` function. CLAM-2897	2025-10-14 14:05:12 -04:00
Val S.	3bd6c575c2	Loosen restrictions on embedded file identification In regression testing against a large sample set, I found that strictly disallowing any embedded file identification if any previous layer was an embedded file resulted in missed detections. Specifically, I found an MSEXE file which has an embedded RAR, which in turn had another MSEXE that itself had an embedded 7ZIP containing... malware. sha256: c3cf573fd3d1568348506bf6dd6152d99368a7dc19037d135d5903bc1958ea85 To make it so ClamAV can extract all that, we must loosen the restriction and allow prior layers to be embedded, just not the current layer. I've also added some logic to prevent attempting to extract an object at the same offset twice. The `fpt->offset`s appear in order, but if you have multiple file type magic signatures match on the same address, like maybe you accidentally load two different .ftm files, then you'd get duplicates and double-extraction. As a bonus, I found I could also skip over offsets within a valid ZIP, ARJ, and CAB since we now have the length of those from the header check and as I know we don't want to extract embedded contents in that way.	2025-10-14 14:05:12 -04:00
Val S.	63997273a8	Fix issue detecting VBA projects Previously for documents containing VBA projects, the VBA was treated as an object within the document and not as a normalized version of the document. I apparently switched it to say that the VBA is a normalized version of the document. This kind of makes sense in that presently JavaScript extracted from HTML is treated as a normalized version of the HTML. But it probably shouldn't. Normalized layers are treated as the same file as the parent. So now those older signatures that match on VBA projects using "Container:CL_TYPE_MSOLE2" are failing to match. So this commit switches it back. VBA project bits written out to a temp file for scanning will be treated as being contained within the document. CLAM-2896 Extracted XLM macros had the same issue.	2025-10-14 14:04:55 -04:00
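
The embedded-type limit and the duplicate-offset check described in commits 0462dae12a and 3bd6c575c2 can be illustrated with a small sketch. It is written under assumptions and is not the code in `ac_addtype()`: the names `match_type`, `type_match`, `is_exempt`, and `select_matches` are hypothetical, exempt matches are assumed to still count toward the total, and only the value 10 for `MAX_EMBEDDED_OBJ` comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EMBEDDED_OBJ 10 /* limit quoted in the commit message */

/* Hypothetical type tags; the real ClamAV file type enum is much larger. */
typedef enum { TYPE_OTHER, TYPE_MSEXE, TYPE_ZIPSFX } match_type;

typedef struct {
    match_type type;
    size_t offset; /* where the file type magic matched */
} type_match;

/* Types assumed to be exempt from the cap, per the commit message. */
static bool is_exempt(match_type t)
{
    return t == TYPE_MSEXE || t == TYPE_ZIPSFX;
}

/* Copies the accepted matches from in[] (sorted by offset) into out[], which
 * must have room for n_in entries. Returns the number of matches kept. */
static size_t select_matches(const type_match *in, size_t n_in, type_match *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n_in; i++) {
        /* Never extract twice from the same offset. */
        if (kept > 0 && out[kept - 1].offset == in[i].offset) continue;

        /* Past the cap, only exempt types are still recorded. */
        if (kept >= MAX_EMBEDDED_OBJ && !is_exempt(in[i].type)) continue;

        out[kept++] = in[i];
    }
    return kept;
}
```

Fed the twelve MSEXE matches from the example above, this keeps all twelve, so the match for DLL #4 is no longer dropped, while twelve matches of a non-exempt type would still be cut off at ten.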