clamav/libclamav/unarj.h
Val S. a77a271fb5
Reduce unnecessary scanning of embedded file FPs (#1571)
When embedded file type recognition finds a possible embedded file, it
is being scanned as a new embedded file even if it turns out it was a
false positive and parsing fails. My solution is to pre-parse the file
headers as little as possible to determine if it is valid. If possible,
also determine the file size based on the headers. That will make it so
we don't have to scan additional data when the embedded file is not at
the very end.

This commit adds header checks prior to embedded ZIP, ARJ, and CAB
scanning. For these types I was also able to use the header checks to
determine the object size so as to prevent excessive pattern matching.

TODO: Add the same for RAR, EGG, 7Z, NULSFT, AUTOIT, IShield, and PDF.

This commit also removes duplicate matching for embedded MSEXE.
The embedded MSEXE detection and scanning logic was accidentally
creating an extra duplicate layer in between scanning and detection
because of the logic within the `cli_scanembpe()` function.
That function was effectively doing the header check which this commit
adds for ZIP, ARJ, and CAB but minus the size check.
Note: It is unfortunately not possible to get an accurate size from PE
file headers.
The `cli_scanembpe()` function also used to dump to a temp file for no
reason since FMAPs were extended to support windows into other FMAPs.
So this commit removes the intermediate layer as well as dropping a temp
file for each embedded PE file.

Further, this commit adds configuration and DCONF safeguards around all
embedded file type scanning.

Finally, this commit adds a set of tests to validate proper extraction
of embedded ZIP, ARJ, CAB, and MSEXE files.

CLAM-2862

Co-authored-by: TheRaynMan <draynor@sourcefire.com>
2025-09-23 15:57:28 -04:00


/*
 *  Extract component parts of ARJ archives
 *
 *  Copyright (C) 2013-2025 Cisco Systems, Inc. and/or its affiliates. All rights reserved.
 *  Copyright (C) 2007-2013 Sourcefire, Inc.
 *
 *  Authors: Nigel Horne
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License version 2 as
 *  published by the Free Software Foundation.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 *  MA 02110-1301, USA.
 */
#ifndef __UNARJ_H
#define __UNARJ_H
#include "clamav.h"
#include "others.h"
#include "fmap.h"
typedef struct arj_metadata_tag {
    char *filename;
    uint32_t comp_size;
    uint32_t orig_size;
    int encrypted;
    int ofd;
    uint8_t method;
    fmap_t *map;
    size_t offset;
} arj_metadata_t;
/**
 * @brief Verify ARJ file header and get size of ARJ based on headers.
 *
 * Does not extract or scan the file.
 *
 * @param[in,out] ctx   Scan context
 * @param offset        Offset of the file header
 * @param[out] size     Will be set to the size of the file header + file data.
 * @return cl_error_t   CL_SUCCESS on success, or an error code on failure.
 */
cl_error_t cli_unarj_header_check(cli_ctx *ctx, uint32_t offset, size_t *size);
cl_error_t cli_unarj_open(fmap_t *map, const char *dirname, arj_metadata_t *metadata);
cl_error_t cli_unarj_prepare_file(arj_metadata_t *metadata);
cl_error_t cli_unarj_extract_file(const char *dirname, arj_metadata_t *metadata);
#endif
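
The commit message above describes pre-parsing just enough of a header to reject false positives before scanning. The sketch below illustrates that idea for ARJ in isolation. It is not the ClamAV implementation of `cli_unarj_header_check()`: the function name `sketch_arj_looks_valid` and the `SKETCH_` constants are hypothetical, and it works on a plain byte buffer rather than a `cli_ctx` or `fmap_t`. It assumes the usual ARJ layout: a two-byte header ID (0x60 0xEA), a little-endian 16-bit basic header size of at most 2600 bytes, then the basic header followed by a 4-byte CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch; they are not defined in unarj.h. */
#define SKETCH_ARJ_MAGIC_0 0x60
#define SKETCH_ARJ_MAGIC_1 0xEA
#define SKETCH_ARJ_MAX_BASIC_HDR 2600u

/*
 * Returns 1 if buf[offset..] plausibly starts an ARJ main header, else 0.
 * Only the header ID and the basic header size are inspected; nothing is
 * decompressed and no CRC is verified.
 */
int sketch_arj_looks_valid(const uint8_t *buf, size_t len, size_t offset)
{
    uint16_t basic_hdr_size;

    /* Need at least the 2-byte ID and the 2-byte size field. */
    if (buf == NULL || offset > len || len - offset < 4)
        return 0;

    if (buf[offset] != SKETCH_ARJ_MAGIC_0 || buf[offset + 1] != SKETCH_ARJ_MAGIC_1)
        return 0;

    /* The size field is stored little-endian. */
    basic_hdr_size = (uint16_t)(buf[offset + 2] | (buf[offset + 3] << 8));

    /* A size of 0 marks the end of an archive, so it cannot start one. */
    if (basic_hdr_size == 0 || basic_hdr_size > SKETCH_ARJ_MAX_BASIC_HDR)
        return 0;

    /* The basic header and its 4-byte CRC must fit in the remaining data. */
    if (len - offset - 4 < (size_t)basic_hdr_size + 4)
        return 0;

    return 1;
}
```

A caller in this style would run the cheap check first and hand the candidate to the full parser only when it returns 1, so that a stray 0x60 0xEA byte pair inside unrelated data is not scanned as a new embedded archive.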