/*
 *  Copyright (C) 2013-2021 Cisco Systems, Inc. and/or its affiliates. All rights reserved.
 *  Copyright (C) 2007-2013 Sourcefire, Inc.
 *
 *  Authors: Tomasz Kojm
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License version 2 as
 *  published by the Free Software Foundation.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 *  MA 02110-1301, USA.
 */

#if HAVE_CONFIG_H
#include "clamav-config.h"
#endif

#ifndef _WIN32
#include <sys/time.h>
#endif
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libgen.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#ifdef HAVE_SYS_PARAM_H
#include <sys/param.h>
#endif
#include <fcntl.h>
#include <dirent.h>
#ifdef HAVE_SYS_TIMES_H
#include <sys/times.h>
#endif

#define DCONF_ARCH ctx->dconf->archive
#define DCONF_DOC ctx->dconf->doc
#define DCONF_MAIL ctx->dconf->mail
#define DCONF_OTHER ctx->dconf->other
Add CMake build tooling
This patch adds experimental-quality CMake build tooling.
The libmspack build required a modification to use "" instead of <> for
header #includes. This will hopefully be included in the libmspack
upstream project when adding CMake build tooling to libmspack.
Removed use of libltdl when using CMake.
Flex & Bison are now required to build.
If -DMAINTAINER_MODE, then GPERF is also required, though it currently
doesn't actually do anything. TODO!
I found that the autotools build system was generating the lexer output
but not actually compiling it, instead using previously generated (and
manually renamed) lexer c source. As a consequence, changes to the .l
and .y files weren't making it into the build. To resolve this, I
removed generated flex/bison files and fixed the tooling to use the
freshly generated files. Flex and bison are now required build tools.
On Windows, this adds a dependency on the winflexbison package,
which can be obtained using Chocolatey or may be manually installed.
CMake tooling only has partial support for building with external LLVM
library, and no support for the internal LLVM (to be removed in the
future). I.e. The CMake build currently only supports the bytecode
interpreter.
Many files used include paths relative to the top source directory or
relative to the current project, rather than relative to each build
target. Modern CMake support requires including internal dependency
headers the same way you would external dependency headers (albeit
with "" instead of <>). This meant correcting all header includes to
be relative to the build targets and not relative to the workspace.
For example, ...
```c
#include "../libclamav/clamav.h"
#include "clamd/clamd_others.h"
```
... becomes:
```c
// libclamav
#include "clamav.h"
// clamd
#include "clamd_others.h"
```
Fixes header name conflicts by renaming a few of the files.
Converted the "shared" code into a static library, which depends on
libclamav. The ironically named "shared" static library provides
features common to the ClamAV apps which are not required in
libclamav itself and are not intended for use by downstream projects.
This change was required for correct modern CMake practices but was
also required to use the automake "subdir-objects" option.
This eliminates warnings when running autoreconf which, in the next
version of autoconf & automake are likely to break the build.
libclamav used to build in multiple stages where an earlier stage is
a static library containing utils required by the "shared" code.
Linking clamdscan and clamdtop with this libclamav utils static lib
allowed these two apps to function without libclamav. While this is
nice in theory, the practical gains are minimal and it complicates
the build system. As such, the autotools and CMake tooling was
simplified for improved maintainability and this feature was thrown
out. clamdtop and clamdscan now require libclamav to function.
Removed the nopthreads version of the autotools
libclamav_internal_utils static library and added pthread linking to
a couple apps that may have issues building on some platforms without
it, with the intention of removing needless complexity from the
source. Kept the regular version of libclamav_internal_utils.la
though it is no longer used anywhere but in libclamav.
Added an experimental doxygen build option which attempts to build
clamav.h and libfreshclam doxygen html docs.
The CMake build tooling also may build the example program(s), which
isn't a feature in the Autotools build system.
Changed C standard to C90+ due to inline linking issues with socket.h
when linking libfreshclam.so on Linux.
Generate common.rc for win32.
Fix tabs/spaces in shared Makefile.am, and remove vestigial ifndef
from misc.c.
Add CMake files to the automake dist, so users can try the new
CMake tooling w/out having to build from a git clone.
clamonacc changes:
- Renamed FANOTIFY macro to HAVE_SYS_FANOTIFY_H to better match other
similar macros.
- Added a new clamav-clamonacc.service systemd unit file, based on
the work of ChadDevOps & Aaron Brighton.
- Added missing clamonacc man page.
Updates to clamdscan man page, add missing options.
Remove vestigial CL_NOLIBCLAMAV definitions (all apps now use
libclamav).
Rename Windows mspack.dll to libmspack.dll so all ClamAV-built
libraries have the lib-prefix with Visual Studio as with CMake.
#include <zlib.h>

#include "clamav.h"
#include "others.h"
#include "dconf.h"
#include "scanners.h"
#include "matcher-ac.h"
#include "matcher-bm.h"
#include "matcher.h"
#include "ole2_extract.h"
#include "vba_extract.h"
#include "xlm_extract.h"
#include "msexpand.h"
#include "mbox.h"
#include "libmspack.h"
#include "pe.h"
#include "elf.h"
#include "filetypes.h"
#include "htmlnorm.h"
#include "untar.h"
#include "special.h"
#include "binhex.h"
/* #include "uuencode.h" */
#include "tnef.h"
#include "sis.h"
#include "pdf.h"
#include "str.h"
#include "entconv.h"
#include "rtf.h"
#include "unarj.h"
#include "nsis/nulsft.h"
#include "autoit.h"
#include "textnorm.h"
#include "unzip.h"
#include "dlp.h"
#include "default.h"
#include "cpio.h"
#include "macho.h"
#include "ishield.h"
#include "7z_iface.h"
#include "fmap.h"
#include "cache.h"
#include "events.h"
#include "swf.h"
#include "jpeg.h"
#include "gif.h"
#include "png.h"
#include "iso9660.h"
#include "dmg.h"
#include "xar.h"
#include "hfsplus.h"
#include "xz_iface.h"
#include "mbr.h"
#include "gpt.h"
#include "apm.h"
#include "ooxml.h"
#include "xdp.h"
#include "json_api.h"
#include "msxml.h"
#include "tiff.h"
#include "hwp.h"
#include "msdoc.h"
PE parsing code improvements, db loading bug fixes
Consolidate the PE parsing code into one function. I tried to preserve all existing functionality from the previous, distinct implementations to a large extent (with the exceptions mentioned below). If I noticed potential bugs/improvements, I added a TODO statement about those so that they can be fixed in a smaller commit later. Also, there are more TODOs in places where I'm not entirely sure why certain actions are performed - more research is needed for these.
I'm submitting a pull request now so that regression testing can be done, and because merging what I have so far will likely cause fewer conflicts than merging later.
PE parsing code improvements:
- PEs without all 16 data directories are parsed more appropriately now
- Added lots more debug statements
Also:
- Allow MAX_BC and MAX_TRACKED_PCRE to be specified via CFLAGS
When doing performance testing with the latest CVD, MAX_BC and
MAX_TRACKED_PCRE need to be raised to track all the events.
Allow these to be specified via CFLAGS by not redefining them
if they are already defined
- Fix an issue preventing wildcard sizes in .MDB/.MSB rules
I'm not sure what the original intent of the check I removed was,
but it prevents using wildcard sizes in .MDB/.MSB rules. AFAICT
these wildcard sizes should be handled appropriately by the MD5
section hash computation code, so I don't think a check on that
is needed.
- Fix several issues related to db loading
- .imp files will now get loaded if they exist in a directory passed
via clamscan's '-d' flag
- .pwdb files will now get loaded if they exist in a directory passed
via clamscan's '-d' flag even when compiling without yara support
- Changes to .imp, .ign, and .ign2 files will now be reflected in calls
to cl_statinidir and cl_statchkdir (and also .pwdb files, even when
compiling without yara support)
- The contents of .sfp files won't be included in some of the signature
counts, and the contents of .cud files will be
- Any local.gdb files will no longer be loaded twice
- For .imp files, you are no longer required to specify a minimum flevel for wildcard rules, since this isn't needed
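The "Allow MAX_BC and MAX_TRACKED_PCRE to be specified via CFLAGS" item above relies on the usual guard-the-default pattern. A minimal sketch of that pattern follows; the default value of 64 and the two accessor functions are hypothetical and exist only for illustration, they are not taken from the ClamAV sources.

```c
/* Only define the default when the build did not already supply a value,
 * e.g. via CFLAGS="-DMAX_BC=128 -DMAX_TRACKED_PCRE=128".
 * The value 64 is a placeholder assumption, not the real ClamAV default. */
#ifndef MAX_BC
#define MAX_BC 64
#endif

#ifndef MAX_TRACKED_PCRE
#define MAX_TRACKED_PCRE 64
#endif

/* Hypothetical helpers that report the limits in effect for this build. */
unsigned int effective_max_bc(void)
{
    return (unsigned int)MAX_BC;
}

unsigned int effective_max_tracked_pcre(void)
{
    return (unsigned int)MAX_TRACKED_PCRE;
}
```

Because the defaults sit behind `#ifndef`, a value passed on the compiler command line wins and no redefinition warning is raised.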
#include "execs.h"
#include "egg.h"
// libclamunrar_iface
#include "unrar_iface.h"

#ifdef HAVE_BZLIB_H
#include <bzlib.h>
#endif

#include <fcntl.h>
#include <string.h>

cl_error_t cli_magic_scan_dir(const char *dirname, cli_ctx *ctx)
{
    DIR *dd;
    struct dirent *dent;
    STATBUF statbuf;
    char *fname;
    unsigned int viruses_found = 0;

    if ((dd = opendir(dirname)) != NULL) {
        while ((dent = readdir(dd))) {
            if (dent->d_ino) {
                if (strcmp(dent->d_name, ".") && strcmp(dent->d_name, "..")) {
                    /* build the full name */
                    fname = cli_malloc(strlen(dirname) + strlen(dent->d_name) + 2);
                    if (!fname) {
                        closedir(dd);
                        cli_dbgmsg("cli_magic_scan_dir: Unable to allocate memory for filename\n");
                        return CL_EMEM;
                    }

                    sprintf(fname, "%s" PATHSEP "%s", dirname, dent->d_name);

                    /* stat the file */
                    if (LSTAT(fname, &statbuf) != -1) {
                        if (S_ISDIR(statbuf.st_mode) && !S_ISLNK(statbuf.st_mode)) {
                            if (cli_magic_scan_dir(fname, ctx) == CL_VIRUS) {
                                free(fname);

                                if (SCAN_ALLMATCHES) {
                                    viruses_found++;
                                    continue;
                                }

                                closedir(dd);
                                return CL_VIRUS;
                            }
                        } else {
                            if (S_ISREG(statbuf.st_mode)) {
                                if (cli_magic_scan_file(fname, ctx, dent->d_name) == CL_VIRUS) {
                                    free(fname);

                                    if (SCAN_ALLMATCHES) {
                                        viruses_found++;
                                        continue;
                                    }

                                    closedir(dd);
                                    return CL_VIRUS;
                                }
                            }
                        }
                    }
                    free(fname);
                }
            }
        }
    } else {
        cli_dbgmsg("cli_magic_scan_dir: Can't open directory %s.\n", dirname);
        return CL_EOPEN;
    }

    closedir(dd);
    if (SCAN_ALLMATCHES && viruses_found)
        return CL_VIRUS;
    return CL_CLEAN;
}

/**
 * @brief  Scan the metadata using cli_matchmeta()
 *
 * @param metadata  unrar metadata structure
 * @param ctx       scanning context structure
 * @param files     number of files
 * @return cl_error_t  Returns CL_CLEAN if nothing found, CL_VIRUS if something found, CL_EUNPACK if encrypted.
 */
static cl_error_t cli_unrar_scanmetadata(unrar_metadata_t *metadata, cli_ctx *ctx, unsigned int files)
{
    cl_error_t status = CL_CLEAN;

    cli_dbgmsg("RAR: %s, crc32: 0x%x, encrypted: %u, compressed: %u, normal: %u, method: %u, ratio: %u\n",
               metadata->filename, metadata->crc, metadata->encrypted, (unsigned int)metadata->pack_size,
               (unsigned int)metadata->unpack_size, metadata->method,
               metadata->pack_size ? (unsigned int)(metadata->unpack_size / metadata->pack_size) : 0);

    if (CL_VIRUS == cli_matchmeta(ctx, metadata->filename, metadata->pack_size, metadata->unpack_size, metadata->encrypted, files, metadata->crc, NULL)) {
        status = CL_VIRUS;
    } else if (SCAN_HEURISTIC_ENCRYPTED_ARCHIVE && metadata->encrypted) {
        cli_dbgmsg("RAR: Encrypted files found in archive.\n");
        status = CL_EUNPACK;
    }

    return status;
}

static cl_error_t cli_scanrar(const char *filepath, int desc, cli_ctx *ctx)
{
    cl_error_t status          = CL_EPARSE;
    cl_unrar_error_t unrar_ret = UNRAR_ERR;

    unsigned int file_count    = 0;
    unsigned int viruses_found = 0;

    uint32_t nEncryptedFilesFound = 0;
    uint32_t nTooLargeFilesFound  = 0;

    void *hArchive = NULL;

    char *comment         = NULL;
    uint32_t comment_size = 0;

    unrar_metadata_t metadata;
    char *filename_base    = NULL;
    char *extract_fullpath = NULL;
    char *comment_fullpath = NULL;

    UNUSEDPARAM(desc);

    if (filepath == NULL || ctx == NULL) {
        cli_dbgmsg("RAR: Invalid arguments!\n");
        return CL_EARG;
    }

    cli_dbgmsg("in scanrar()\n");

    /* Zero out the metadata struct before we read the header */
    memset(&metadata, 0, sizeof(unrar_metadata_t));

    /*
     * Open the archive.
     */
    if (UNRAR_OK != (unrar_ret = cli_unrar_open(filepath, &hArchive, &comment, &comment_size, cli_debug_flag))) {
        if (unrar_ret == UNRAR_ENCRYPTED) {
            cli_dbgmsg("RAR: Encrypted main header\n");
            status = CL_EUNPACK;
            goto done;
        }
        if (unrar_ret == UNRAR_EMEM) {
            status = CL_EMEM;
            goto done;
        } else if (unrar_ret == UNRAR_EOPEN) {
            status = CL_EOPEN;
            goto done;
        } else {
            status = CL_EFORMAT;
            goto done;
        }
    }

    /* If the archive header had a comment, write it to the comment dir. */
    if ((comment != NULL) && (comment_size > 0)) {

        if (ctx->engine->keeptmp) {
            int comment_fd = -1;
            if (!(comment_fullpath = cli_gentemp_with_prefix(ctx->sub_tmpdir, "comments"))) {
                status = CL_EMEM;
                goto done;
            }

            comment_fd = open(comment_fullpath, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
            if (comment_fd < 0) {
                cli_dbgmsg("RAR: ERROR: Failed to open output file\n");
            } else {
                cli_dbgmsg("RAR: Writing the archive comment to temp file: %s\n", comment_fullpath);
                /* write() reports failure with -1, so treat anything <= 0 as an error */
                if (write(comment_fd, comment, comment_size) <= 0) {
                    cli_dbgmsg("RAR: ERROR: Failed to write to output file\n");
                }
                close(comment_fd);
            }
        }

        /* Scan the comment */
        status = cli_magic_scan_buff(comment, comment_size, ctx, NULL);

        if ((status == CL_VIRUS) && SCAN_ALLMATCHES) {
            status = CL_CLEAN;
            viruses_found++;
        }

        if ((status == CL_VIRUS) || (status == CL_BREAK)) {
            goto done;
        }
    }

    /*
     * Read & scan each file header.
     * Extract & scan each file.
     *
     * Skip files if they will exceed max filesize or max scansize.
     * Count the number of encrypted file headers and encrypted files.
     *  - Alert if there are encrypted files,
     *      if the Heuristic for encrypted archives is enabled,
     *      and if we have not detected a signature match.
     */
    do {
        status = CL_CLEAN;

        /* Zero out the metadata struct before we read the header */
        memset(&metadata, 0, sizeof(unrar_metadata_t));

        /*
         * Get the header information for the next file in the archive.
         */
        unrar_ret = cli_unrar_peek_file_header(hArchive, &metadata);
        if (unrar_ret != UNRAR_OK) {
            if (unrar_ret == UNRAR_ENCRYPTED) {
                /* Found an encrypted file header, must skip. */
                cli_dbgmsg("RAR: Encrypted file header, unable to read file metadata and file contents. Skipping file...\n");
                nEncryptedFilesFound += 1;

                if (UNRAR_OK != cli_unrar_skip_file(hArchive)) {
                    /* Failed to skip!  Break extraction loop. */
                    cli_dbgmsg("RAR: Failed to skip file. RAR archive extraction has failed.\n");
                    break;
                }
            } else if (unrar_ret == UNRAR_BREAK) {
                /* No more files. Break extraction loop. */
                cli_dbgmsg("RAR: No more files in archive.\n");
                break;
            } else {
                /* Memory error or some other error reading the header info. */
                cli_dbgmsg("RAR: Error (%u) reading file header!\n", unrar_ret);
                break;
            }
        } else {
            file_count += 1;

            /*
             * Scan the metadata for the file in question since the content was clean, or we're running in all-match.
             */
            status = cli_unrar_scanmetadata(&metadata, ctx, file_count);
            if ((status == CL_VIRUS) && SCAN_ALLMATCHES) {
                status = CL_CLEAN;
                viruses_found++;
            }
            if ((status == CL_VIRUS) || (status == CL_BREAK)) {
                break;
            }

            /* Check if we've already exceeded the scan limit */
            if (cli_checklimits("RAR", ctx, 0, 0, 0))
                break;

            if (metadata.is_dir) {
                /* Entry is a directory. Skip. */
                cli_dbgmsg("RAR: Found directory. Skipping to next file.\n");

                if (UNRAR_OK != cli_unrar_skip_file(hArchive)) {
                    /* Failed to skip!  Break extraction loop. */
                    cli_dbgmsg("RAR: Failed to skip directory. RAR archive extraction has failed.\n");
                    break;
                }
            } else if (cli_checklimits("RAR", ctx, metadata.unpack_size, 0, 0)) {
                /* File size exceeds maxfilesize, must skip extraction.
                 * Although we may be able to scan the metadata */
                nTooLargeFilesFound += 1;

                cli_dbgmsg("RAR: Next file is too large (%" PRIu64 " bytes); it would exceed max scansize.  Skipping to next file.\n", metadata.unpack_size);

                if (UNRAR_OK != cli_unrar_skip_file(hArchive)) {
                    /* Failed to skip!  Break extraction loop. */
                    cli_dbgmsg("RAR: Failed to skip file. RAR archive extraction has failed.\n");
                    break;
                }
            } else if (metadata.encrypted != 0) {
                /* Found an encrypted file, must skip. */
                cli_dbgmsg("RAR: Encrypted file, unable to extract file contents. Skipping file...\n");
                nEncryptedFilesFound += 1;

                if (UNRAR_OK != cli_unrar_skip_file(hArchive)) {
                    /* Failed to skip!  Break extraction loop. */
                    cli_dbgmsg("RAR: Failed to skip file. RAR archive extraction has failed.\n");
                    break;
                }
            } else {
                /*
                 * Extract the file...
                 */
                if (NULL != metadata.filename) {
                    (void)cli_basename(metadata.filename, strlen(metadata.filename), &filename_base);
                }

                if (!(ctx->engine->keeptmp) ||
                    (NULL == filename_base)) {
                    extract_fullpath = cli_gentemp(ctx->sub_tmpdir);
                } else {
                    extract_fullpath = cli_gentemp_with_prefix(ctx->sub_tmpdir, filename_base);
                }
                if (NULL == extract_fullpath) {
                    cli_dbgmsg("RAR: Memory error allocating filename for extracted file.\n");
                    status = CL_EMEM;
                    break;
                }
                cli_dbgmsg("RAR: Extracting file: %s to %s\n", metadata.filename, extract_fullpath);

                unrar_ret = cli_unrar_extract_file(hArchive, extract_fullpath, NULL);
                if (unrar_ret != UNRAR_OK) {
                    /*
                     * Some other error extracting the file
                     */
                    cli_dbgmsg("RAR: Error extracting file: %s\n", metadata.filename);

                    /* TODO:
                     *  may need to manually skip the file depending on what, specifically, cli_unrar_extract_file() returned.
                     */
                } else {
                    /*
                     * File should be extracted...
                     * ... make sure we have read permissions to the file.
                     */
#ifdef _WIN32
                    if (0 != _access_s(extract_fullpath, R_OK)) {
#else
                    if (0 != access(extract_fullpath, R_OK)) {
#endif
                        cli_dbgmsg("RAR: Don't have read permissions, attempting to change file permissions to make it readable..\n");
#ifdef _WIN32
                        if (0 != _chmod(extract_fullpath, _S_IREAD)) {
#else
                        if (0 != chmod(extract_fullpath, S_IRUSR | S_IRGRP)) {
#endif
                            cli_dbgmsg("RAR: Failed to change permission bits so the extracted file is readable..\n");
                        }
                    }

                    /*
                     * ... scan the extracted file.
                     */
                    cli_dbgmsg("RAR: Extraction complete.  Scanning now...\n");
                    status = cli_magic_scan_file(extract_fullpath, ctx, filename_base);
                    if (status == CL_EOPEN) {
                        cli_dbgmsg("RAR: File not found, Extraction failed!\n");
                        status = CL_CLEAN;
                    } else {
                        /* Delete the tempfile if not --leave-temps */
                        if (!ctx->engine->keeptmp)
                            if (cli_unlink(extract_fullpath))
                                cli_dbgmsg("RAR: Failed to unlink the extracted file: %s\n", extract_fullpath);

                        if (status == CL_VIRUS) {
                            cli_dbgmsg("RAR: infected with %s\n", cli_get_last_virus(ctx));
                            status = CL_VIRUS;
                            viruses_found++;
                        }
                    }
                }

                /* Free up the filepath */
                if (NULL != extract_fullpath) {
                    free(extract_fullpath);
                    extract_fullpath = NULL;
                }
            }
        }

        if (status == CL_VIRUS) {
            if (SCAN_ALLMATCHES)
                status = CL_SUCCESS;
            else
                break;
        }

        if (ctx->engine->maxscansize && ctx->scansize >= ctx->engine->maxscansize) {
            status = CL_CLEAN;
            break;
        }

        /*
         * Free up any malloced metadata...
         */
        if (metadata.filename != NULL) {
            free(metadata.filename);
            metadata.filename = NULL;
        }
        if (NULL != filename_base) {
            free(filename_base);
            filename_base = NULL;
        }

    } while (status == CL_CLEAN);

    if (status == CL_BREAK)
        status = CL_CLEAN;

done:
if ( NULL ! = comment ) {
free ( comment ) ;
comment = NULL ;
}
2003-07-29 15:48:06 +00:00
2018-07-30 20:19:28 -04:00
if ( NULL ! = comment_fullpath ) {
if ( ! ctx - > engine - > keeptmp ) {
cli_rmdirs ( comment_fullpath ) ;
}
free ( comment_fullpath ) ;
comment_fullpath = NULL ;
}
2007-01-20 11:38:54 +00:00
2018-07-30 20:19:28 -04:00
if ( NULL ! = hArchive ) {
cli_unrar_close ( hArchive ) ;
hArchive = NULL ;
}
2003-07-29 15:48:06 +00:00
2018-07-30 20:19:28 -04:00
if ( NULL ! = filename_base ) {
free ( filename_base ) ;
filename_base = NULL ;
}
2007-01-13 00:01:39 +00:00
2018-07-30 20:19:28 -04:00
if ( metadata . filename ! = NULL ) {
free ( metadata . filename ) ;
metadata . filename = NULL ;
}
if ( NULL ! = extract_fullpath ) {
free ( extract_fullpath ) ;
extract_fullpath = NULL ;
2003-07-29 15:48:06 +00:00
}
2018-07-30 20:19:28 -04:00
if ( ( CL_VIRUS ! = status ) & & ( ( CL_EUNPACK = = status ) | | ( nEncryptedFilesFound > 0 ) ) ) {
2020-03-19 21:23:54 -04:00
/* If user requests enabled the Heuristic for encrypted archives... */
if ( SCAN_HEURISTIC_ENCRYPTED_ARCHIVE ) {
2018-07-30 20:19:28 -04:00
if ( CL_VIRUS = = cli_append_virus ( ctx , " Heuristics.Encrypted.RAR " ) ) {
status = CL_VIRUS ;
}
}
if ( status ! = CL_VIRUS ) {
status = CL_CLEAN ;
}
}
cli_dbgmsg ( " RAR: Exit code: %d \n " , status ) ;
2004-03-16 19:39:49 +00:00
2018-07-20 22:28:48 -04:00
if ( SCAN_ALLMATCHES & & viruses_found )
2018-07-30 20:19:28 -04:00
status = CL_VIRUS ;
return status ;
2003-07-29 15:48:06 +00:00
}
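/*
 * Sketch (not libclamav code): the exit logic above can be read as a small
 * pure function. The enum and the function name are hypothetical stand-ins
 * for cl_error_t and the inline code, and the sketch assumes that appending
 * the heuristic detection always reports a virus, which may not hold for
 * cli_append_virus() in every scan mode.
 */

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libclamav status codes used above. */
typedef enum { ST_CLEAN = 0, ST_VIRUS, ST_EUNPACK, ST_EMEM } scan_status_t;

/*
 * Resolve the final status of an archive scan:
 *  - encrypted content (ST_EUNPACK or a non-zero encrypted file count)
 *    becomes a detection if the heuristic is on, otherwise clean;
 *  - in all-match mode, any earlier detection wins at the very end.
 */
static scan_status_t resolve_archive_status(scan_status_t status,
                                            unsigned int encrypted_files,
                                            bool heuristic_enabled,
                                            bool allmatch,
                                            unsigned int viruses_found)
{
    if (status != ST_VIRUS && (status == ST_EUNPACK || encrypted_files > 0)) {
        /* Assumption: the heuristic alert is always recorded as a virus. */
        status = heuristic_enabled ? ST_VIRUS : ST_CLEAN;
    }

    if (allmatch && viruses_found > 0) {
        status = ST_VIRUS;
    }

    return status;
}
```

/*
 * Note that, as in the code above, a non-virus error status is replaced by
 * clean whenever encrypted files were counted and the heuristic is off.
 */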
2018-10-08 12:59:42 -04:00
/**
* @ brief Scan the metadata using cli_matchmeta ( )
*
* @ param metadata egg metadata structure
* @ param ctx scanning context structure
* @ param files number of files
* @ return cl_error_t Returns CL_CLEAN if nothing found , CL_VIRUS if something found , CL_EUNPACK if encrypted .
*/
static cl_error_t cli_egg_scanmetadata ( cl_egg_metadata * metadata , cli_ctx * ctx , unsigned int files )
{
cl_error_t status = CL_CLEAN ;
cli_dbgmsg ( " EGG: %s, encrypted: %u, compressed: %u, normal: %u, ratio: %u \n " ,
metadata - > filename , metadata - > encrypted , ( unsigned int ) metadata - > pack_size ,
( unsigned int ) metadata - > unpack_size ,
metadata - > pack_size ? ( unsigned int ) ( metadata - > unpack_size / metadata - > pack_size ) : 0 ) ;
if ( CL_VIRUS = = cli_matchmeta ( ctx , metadata - > filename , metadata - > pack_size , metadata - > unpack_size , metadata - > encrypted , files , 0 , NULL ) ) {
status = CL_VIRUS ;
} else if ( SCAN_HEURISTIC_ENCRYPTED_ARCHIVE & & metadata - > encrypted ) {
cli_dbgmsg ( " EGG: Encrypted files found in archive. \n " ) ;
status = CL_EUNPACK ;
}
return status ;
}
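/*
 * Sketch: the ratio printed in the debug message above guards against a
 * zero packed size before dividing. The helper name is hypothetical; it only
 * isolates that guard and the truncating cast used for the %u format.
 */

```c
#include <stdint.h>

/* Integer compression ratio; 0 when the packed size is unknown or zero. */
static unsigned int compression_ratio(uint64_t unpack_size, uint64_t pack_size)
{
    if (pack_size == 0) {
        return 0; /* avoid dividing by zero */
    }
    return (unsigned int)(unpack_size / pack_size);
}
```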
static cl_error_t cli_scanegg ( cli_ctx * ctx , size_t sfx_offset )
{
2019-08-16 17:18:59 -07:00
cl_error_t status = CL_EPARSE ;
2019-07-01 16:08:14 -04:00
cl_error_t egg_ret = CL_EPARSE ;
2018-10-08 12:59:42 -04:00
unsigned int file_count = 0 ;
unsigned int viruses_found = 0 ;
uint32_t nEncryptedFilesFound = 0 ;
uint32_t nTooLargeFilesFound = 0 ;
void * hArchive = NULL ;
2019-08-16 17:18:59 -07:00
char * * comments = NULL ;
2019-05-24 10:00:35 -04:00
uint32_t nComments = 0 ;
2018-10-08 12:59:42 -04:00
cl_egg_metadata metadata ;
char * filename_base = NULL ;
char * extract_fullpath = NULL ;
char * comment_fullpath = NULL ;
if ( ctx = = NULL ) {
cli_dbgmsg ( " EGG: Invalid arguments! \n " ) ;
return CL_EARG ;
}
cli_dbgmsg ( " in scanegg() \n " ) ;
/* Zero out the metadata struct before we read the header */
memset ( & metadata , 0 , sizeof ( cl_egg_metadata ) ) ;
/*
* Open the archive .
*/
2019-05-24 10:00:35 -04:00
if ( CL_SUCCESS ! = ( egg_ret = cli_egg_open ( * ctx - > fmap , sfx_offset , & hArchive , & comments , & nComments ) ) ) {
2019-07-01 16:08:14 -04:00
if ( egg_ret = = CL_EUNPACK ) {
2018-10-08 12:59:42 -04:00
cli_dbgmsg ( " EGG: Encrypted main header \n " ) ;
status = CL_EUNPACK ;
goto done ;
}
2019-07-01 16:08:14 -04:00
if ( egg_ret = = CL_EMEM ) {
2018-10-08 12:59:42 -04:00
status = CL_EMEM ;
goto done ;
} else {
status = CL_EFORMAT ;
goto done ;
}
}
/* If the archive header had a comment, write it to the comment dir. */
2019-05-24 10:00:35 -04:00
if ( comments ! = NULL ) {
uint32_t i ;
for ( i = 0 ; i < nComments ; i + + ) {
/*
* Drop the comment to a temp file , if requested
*/
if ( ctx - > engine - > keeptmp ) {
2019-08-16 17:18:59 -07:00
int comment_fd = - 1 ;
2019-05-24 10:00:35 -04:00
size_t prefixLen = strlen ( " comments_ " ) + 5 ;
2019-08-16 17:18:59 -07:00
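char * prefix = ( char * ) malloc ( prefixLen + 1 ) ;
if ( NULL = = prefix ) {
cli_dbgmsg ( " EGG: Memory error allocating prefix for comment file. \n " ) ;
status = CL_EMEM ;
goto done ;
}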
2018-10-08 12:59:42 -04:00
2019-05-24 10:00:35 -04:00
snprintf ( prefix , prefixLen , " comments_%u " , i ) ;
prefix [ prefixLen ] = ' \0 ' ;
2020-03-19 21:23:54 -04:00
if ( ! ( comment_fullpath = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , prefix ) ) ) {
2019-05-24 10:00:35 -04:00
free ( prefix ) ;
status = CL_EMEM ;
goto done ;
}
free ( prefix ) ;
comment_fd = open ( comment_fullpath , O_WRONLY | O_CREAT | O_TRUNC | O_BINARY , 0600 ) ;
if ( comment_fd < 0 ) {
cli_dbgmsg ( " EGG: ERROR: Failed to open output file \n " ) ;
2018-10-08 12:59:42 -04:00
} else {
2019-05-24 10:00:35 -04:00
cli_dbgmsg ( " EGG: Writing the archive comment to temp file: %s \n " , comment_fullpath ) ;
if ( write ( comment_fd , comments [ i ] , strlen ( comments [ i ] ) ) < 0 ) {
cli_dbgmsg ( " EGG: ERROR: Failed to write to output file \n " ) ;
}
2020-07-15 08:39:32 -07:00
close ( comment_fd ) ;
2018-10-08 12:59:42 -04:00
}
2019-05-24 10:00:35 -04:00
free ( comment_fullpath ) ;
comment_fullpath = NULL ;
2018-10-08 12:59:42 -04:00
}
2019-05-24 10:00:35 -04:00
/*
* Scan the comment .
*/
2020-03-21 14:15:28 -04:00
status = cli_magic_scan_buff ( comments [ i ] , strlen ( comments [ i ] ) , ctx , NULL ) ;
2018-10-08 12:59:42 -04:00
2019-05-24 10:00:35 -04:00
if ( ( status = = CL_VIRUS ) & & SCAN_ALLMATCHES ) {
status = CL_CLEAN ;
viruses_found + + ;
}
if ( ( status = = CL_VIRUS ) | | ( status = = CL_BREAK ) ) {
goto done ;
}
2018-10-08 12:59:42 -04:00
}
}
/*
* Read & scan each file header .
* Extract & scan each file .
*
* Skip files if they will exceed max filesize or max scansize .
* Count the number of encrypted file headers and encrypted files .
* - Alert if there are encrypted files ,
* if the Heuristic for encrypted archives is enabled ,
* and if we have not detected a signature match .
*/
do {
status = CL_CLEAN ;
/* Zero out the metadata struct before we read the header */
2019-08-24 20:48:50 -04:00
memset ( & metadata , 0 , sizeof ( cl_egg_metadata ) ) ;
2018-10-08 12:59:42 -04:00
/*
* Get the header information for the next file in the archive .
*/
egg_ret = cli_egg_peek_file_header ( hArchive , & metadata ) ;
2019-07-01 16:08:14 -04:00
if ( egg_ret ! = CL_SUCCESS ) {
if ( egg_ret = = CL_EUNPACK ) {
2018-10-08 12:59:42 -04:00
/* Found an encrypted file header, must skip. */
cli_dbgmsg ( " EGG: Encrypted file header, unable to read file metadata and file contents. Skipping file... \n " ) ;
nEncryptedFilesFound + = 1 ;
2019-07-01 16:08:14 -04:00
if ( CL_SUCCESS ! = cli_egg_skip_file ( hArchive ) ) {
2018-10-08 12:59:42 -04:00
/* Failed to skip! Break extraction loop. */
cli_dbgmsg ( " EGG: Failed to skip file. EGG archive extraction has failed. \n " ) ;
break ;
}
2019-07-01 16:08:14 -04:00
} else if ( egg_ret = = CL_BREAK ) {
2018-10-08 12:59:42 -04:00
/* No more files. Break extraction loop. */
cli_dbgmsg ( " EGG: No more files in archive. \n " ) ;
break ;
} else {
/* Memory error or some other error reading the header info. */
cli_dbgmsg ( " EGG: Error (%u) reading file header! \n " , egg_ret ) ;
break ;
}
} else {
file_count + = 1 ;
/*
* Scan the metadata for the file in question before its content is extracted .
*/
status = cli_egg_scanmetadata ( & metadata , ctx , file_count ) ;
if ( ( status = = CL_VIRUS ) & & SCAN_ALLMATCHES ) {
status = CL_CLEAN ;
viruses_found + + ;
}
if ( ( status = = CL_VIRUS ) | | ( status = = CL_BREAK ) ) {
break ;
}
/* Check if we've already exceeded the scan limit */
if ( cli_checklimits ( " EGG " , ctx , 0 , 0 , 0 ) )
break ;
if ( metadata . is_dir ) {
/* Entry is a directory. Skip. */
cli_dbgmsg ( " EGG: Found directory. Skipping to next file. \n " ) ;
2019-07-01 16:08:14 -04:00
if ( CL_SUCCESS ! = cli_egg_skip_file ( hArchive ) ) {
2018-10-08 12:59:42 -04:00
/* Failed to skip! Break extraction loop. */
cli_dbgmsg ( " EGG: Failed to skip directory. EGG archive extraction has failed. \n " ) ;
break ;
}
} else if ( cli_checklimits ( " EGG " , ctx , metadata . unpack_size , 0 , 0 ) ) {
/* File size exceeds maxfilesize, must skip extraction.
* Although we may be able to scan the metadata */
nTooLargeFilesFound + = 1 ;
cli_dbgmsg ( " EGG: Next file is too large (% " PRIu64 " bytes); it would exceed max scansize. Skipping to next file. \n " , metadata . unpack_size ) ;
2019-07-01 16:08:14 -04:00
if ( CL_SUCCESS ! = cli_egg_skip_file ( hArchive ) ) {
2018-10-08 12:59:42 -04:00
/* Failed to skip! Break extraction loop. */
cli_dbgmsg ( " EGG: Failed to skip file. EGG archive extraction has failed. \n " ) ;
break ;
}
} else if ( metadata . encrypted ! = 0 ) {
/* Found an encrypted file, must skip. */
cli_dbgmsg ( " EGG: Encrypted file, unable to extract file contents. Skipping file... \n " ) ;
nEncryptedFilesFound + = 1 ;
2019-07-01 16:08:14 -04:00
if ( CL_SUCCESS ! = cli_egg_skip_file ( hArchive ) ) {
2018-10-08 12:59:42 -04:00
/* Failed to skip! Break extraction loop. */
cli_dbgmsg ( " EGG: Failed to skip file. EGG archive extraction has failed. \n " ) ;
break ;
}
} else {
/*
* Extract the file . . .
*/
char * extract_filename = NULL ;
char * extract_buffer = NULL ;
size_t extract_buffer_len = 0 ;
cli_dbgmsg ( " EGG: Extracting file: %s \n " , metadata . filename ) ;
egg_ret = cli_egg_extract_file ( hArchive , ( const char * * ) & extract_filename , ( const char * * ) & extract_buffer , & extract_buffer_len ) ;
2019-07-01 16:08:14 -04:00
if ( egg_ret ! = CL_SUCCESS ) {
2018-10-08 12:59:42 -04:00
/*
* Some other error extracting the file
*/
cli_dbgmsg ( " EGG: Error extracting file: %s \n " , metadata . filename ) ;
} else if ( ! extract_buffer | | 0 = = extract_buffer_len ) {
/*
* Empty file . Skip .
*/
cli_dbgmsg ( " EGG: Skipping empty file: %s \n " , metadata . filename ) ;
2019-09-07 07:29:01 -07:00
if ( NULL ! = extract_filename ) {
free ( extract_filename ) ;
extract_filename = NULL ;
}
if ( NULL ! = extract_buffer ) {
free ( extract_buffer ) ;
extract_buffer = NULL ;
}
2018-10-08 12:59:42 -04:00
} else {
/*
* Drop to a temp file , if requested .
*/
2020-03-19 21:23:54 -04:00
if ( NULL ! = metadata . filename ) {
( void ) cli_basename ( metadata . filename , strlen ( metadata . filename ) , & filename_base ) ;
}
2018-10-08 12:59:42 -04:00
if ( ctx - > engine - > keeptmp ) {
int extracted_fd = - 1 ;
2020-03-19 21:23:54 -04:00
if ( NULL = = filename_base ) {
extract_fullpath = cli_gentemp ( ctx - > sub_tmpdir ) ;
} else {
extract_fullpath = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , filename_base ) ;
}
if ( NULL = = extract_fullpath ) {
cli_dbgmsg ( " EGG: Memory error allocating filename for extracted file. \n " ) ;
2018-10-08 12:59:42 -04:00
status = CL_EMEM ;
break ;
}
extracted_fd = open ( extract_fullpath , O_WRONLY | O_CREAT | O_TRUNC | O_BINARY , 0600 ) ;
if ( extracted_fd < 0 ) {
cli_dbgmsg ( " EGG: ERROR: Failed to open output file \n " ) ;
} else {
cli_dbgmsg ( " EGG: Writing the extracted file contents to temp file: %s \n " , extract_fullpath ) ;
if ( write ( extracted_fd , extract_buffer , extract_buffer_len ) < 0 ) {
cli_dbgmsg ( " EGG: ERROR: Failed to write to output file \n " ) ;
}
/* Close the descriptor whether or not the write succeeded. */
close ( extracted_fd ) ;
extracted_fd = - 1 ;
}
}
/*
2020-03-19 21:23:54 -04:00
* Scan the extracted file . . .
2018-10-08 12:59:42 -04:00
*/
cli_dbgmsg ( " EGG: Extraction complete. Scanning now... \n " ) ;
2020-03-21 14:15:28 -04:00
status = cli_magic_scan_buff ( extract_buffer , extract_buffer_len , ctx , filename_base ) ;
2018-10-08 12:59:42 -04:00
if ( status = = CL_VIRUS ) {
cli_dbgmsg ( " EGG: infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
status = CL_VIRUS ;
viruses_found + + ;
}
2020-03-19 21:23:54 -04:00
if ( NULL ! = filename_base ) {
free ( filename_base ) ;
filename_base = NULL ;
}
2018-10-08 12:59:42 -04:00
if ( NULL ! = extract_filename ) {
free ( extract_filename ) ;
extract_filename = NULL ;
}
if ( NULL ! = extract_buffer ) {
free ( extract_buffer ) ;
extract_buffer = NULL ;
}
}
/* Free up the file path */
if ( NULL ! = extract_fullpath ) {
free ( extract_fullpath ) ;
extract_fullpath = NULL ;
}
}
}
if ( status = = CL_VIRUS ) {
if ( SCAN_ALLMATCHES )
status = CL_SUCCESS ;
else
break ;
}
if ( ctx - > engine - > maxscansize & & ctx - > scansize > = ctx - > engine - > maxscansize ) {
status = CL_CLEAN ;
break ;
}
/*
* TODO : Free up any malloced metadata . . .
*/
if ( metadata . filename ! = NULL ) {
free ( metadata . filename ) ;
metadata . filename = NULL ;
}
} while ( status = = CL_CLEAN ) ;
if ( status = = CL_BREAK )
status = CL_CLEAN ;
done :
if ( NULL ! = comment_fullpath ) {
free ( comment_fullpath ) ;
comment_fullpath = NULL ;
}
if ( NULL ! = hArchive ) {
cli_egg_close ( hArchive ) ;
hArchive = NULL ;
}
if ( NULL ! = filename_base ) {
free ( filename_base ) ;
filename_base = NULL ;
}
if ( metadata . filename ! = NULL ) {
free ( metadata . filename ) ;
metadata . filename = NULL ;
}
if ( NULL ! = extract_fullpath ) {
free ( extract_fullpath ) ;
extract_fullpath = NULL ;
}
if ( ( CL_VIRUS ! = status ) & & ( ( CL_EUNPACK = = status ) | | ( nEncryptedFilesFound > 0 ) ) ) {
2020-03-19 21:23:54 -04:00
/* If the user enabled the heuristic for encrypted archives... */
if ( SCAN_HEURISTIC_ENCRYPTED_ARCHIVE ) {
2018-10-08 12:59:42 -04:00
if ( CL_VIRUS = = cli_append_virus ( ctx , " Heuristics.Encrypted.EGG " ) ) {
status = CL_VIRUS ;
}
}
if ( status ! = CL_VIRUS ) {
status = CL_CLEAN ;
}
}
cli_dbgmsg ( " EGG: Exit code: %d \n " , status ) ;
if ( SCAN_ALLMATCHES & & viruses_found )
status = CL_VIRUS ;
return status ;
}
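/*
 * Sketch: write() reports failure with -1 and may write fewer bytes than
 * requested, so temp-file dumps like the ones above are more robust with a
 * loop. The helper below is a hypothetical illustration using only POSIX
 * write(); elsewhere this file uses cli_writen() in that role.
 */

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes from buf to fd.
 * Returns 0 on success and -1 on error. A zero-byte write for a non-zero
 * request is treated as an error so the loop cannot spin forever.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing anything: retry */
            }
            return -1;
        }
        if (n == 0) {
            return -1;
        }
        p += (size_t)n;
        len -= (size_t)n;
    }

    return 0;
}
```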
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanarj ( cli_ctx * ctx , off_t sfx_offset )
2007-07-11 10:14:08 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
cl_error_t rc ;
int file = 0 ;
2017-08-08 17:38:17 -04:00
arj_metadata_t metadata ;
char * dir ;
int virus_found = 0 ;
2007-07-11 10:14:08 +00:00
cli_dbgmsg ( " in cli_scanarj() \n " ) ;
2016-03-08 14:37:20 -05:00
memset ( & metadata , 0 , sizeof ( arj_metadata_t ) ) ;
2017-08-08 17:38:17 -04:00
/* generate the temporary directory */
2020-03-27 16:06:22 -04:00
if ( ! ( dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " arj-tmp " ) ) )
2017-08-08 17:38:17 -04:00
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " ARJ: Can't create temporary directory %s \n " , dir ) ;
free ( dir ) ;
return CL_ETMPDIR ;
2007-07-11 10:14:08 +00:00
}
2011-06-10 19:54:43 +03:00
ret = cli_unarj_open ( * ctx - > fmap , dir , & metadata , sfx_offset ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) {
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
free ( dir ) ;
cli_dbgmsg ( " ARJ: Error: %s \n " , cl_strerror ( ret ) ) ;
return ret ;
}
2018-12-03 12:40:13 -05:00
do {
2020-01-08 16:11:26 -05:00
2009-10-23 20:49:12 +02:00
metadata . filename = NULL ;
2018-12-03 12:40:13 -05:00
ret = cli_unarj_prepare_file ( dir , & metadata ) ;
if ( ret ! = CL_SUCCESS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " ARJ: cli_unarj_prepare_file Error: %s \n " , cl_strerror ( ret ) ) ;
break ;
}
file + + ;
2018-12-03 12:40:13 -05:00
if ( cli_matchmeta ( ctx , metadata . filename , metadata . comp_size , metadata . orig_size , metadata . encrypted , file , 0 , NULL ) = = CL_VIRUS ) {
if ( ! SCAN_ALLMATCHES ) {
2016-06-08 16:25:34 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
if ( metadata . filename )
free ( metadata . filename ) ;
free ( dir ) ;
return CL_VIRUS ;
}
virus_found = 1 ;
2018-12-03 12:40:13 -05:00
ret = CL_SUCCESS ;
2016-06-08 16:25:34 -04:00
}
2010-01-14 23:32:35 +01:00
2018-12-03 12:40:13 -05:00
if ( ( ret = cli_checklimits ( " ARJ " , ctx , metadata . orig_size , metadata . comp_size , 0 ) ) ! = CL_CLEAN ) {
2017-08-08 17:38:17 -04:00
ret = CL_SUCCESS ;
if ( metadata . filename )
free ( metadata . filename ) ;
continue ;
}
ret = cli_unarj_extract_file ( dir , & metadata ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " ARJ: cli_unarj_extract_file Error: %s \n " , cl_strerror ( ret ) ) ;
}
2018-12-03 12:40:13 -05:00
if ( metadata . ofd > = 0 ) {
if ( lseek ( metadata . ofd , 0 , SEEK_SET ) = = - 1 ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " ARJ: call to lseek() failed \n " ) ;
}
2020-03-21 14:15:28 -04:00
rc = cli_magic_scan_desc ( metadata . ofd , NULL , ctx , metadata . filename ) ;
2017-08-08 17:38:17 -04:00
close ( metadata . ofd ) ;
2018-12-03 12:40:13 -05:00
if ( rc = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " ARJ: infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
2018-12-03 12:40:13 -05:00
if ( ! SCAN_ALLMATCHES ) {
2016-06-08 16:25:34 -04:00
ret = CL_VIRUS ;
2018-12-03 12:40:13 -05:00
if ( metadata . filename ) {
2016-06-08 16:25:34 -04:00
free ( metadata . filename ) ;
metadata . filename = NULL ;
}
break ;
}
virus_found = 1 ;
2018-12-03 12:40:13 -05:00
ret = CL_SUCCESS ;
2017-08-08 17:38:17 -04:00
}
}
2018-12-03 12:40:13 -05:00
if ( metadata . filename ) {
2017-08-08 17:38:17 -04:00
free ( metadata . filename ) ;
metadata . filename = NULL ;
}
} while ( ret = = CL_SUCCESS ) ;
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
2007-07-11 10:14:08 +00:00
free ( dir ) ;
2018-12-03 12:40:13 -05:00
if ( metadata . filename ) {
2017-08-08 17:38:17 -04:00
free ( metadata . filename ) ;
2007-07-11 10:14:08 +00:00
}
2016-06-08 16:25:34 -04:00
if ( virus_found ! = 0 )
ret = CL_VIRUS ;
2007-07-11 10:14:08 +00:00
cli_dbgmsg ( " ARJ: Exit code: %d \n " , ret ) ;
2007-12-07 09:40:51 +00:00
if ( ret = = CL_BREAK )
2017-08-08 17:38:17 -04:00
ret = CL_CLEAN ;
2007-07-11 10:14:08 +00:00
return ret ;
}
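/*
 * Sketch: cli_scanarj() creates a private scratch directory (mode 0700) and
 * removes it on exit unless temporary files are to be kept. The two helpers
 * below are hypothetical and use plain mkdir()/rmdir(); rmdir() only removes
 * an empty directory, so it is a simplification of what cli_rmdirs() is used
 * for above.
 */

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a scratch directory readable and writable by the owner only. */
static int scratch_dir_create(const char *path)
{
    return mkdir(path, 0700);
}

/* Remove the (empty) scratch directory unless the caller wants it kept. */
static int scratch_dir_cleanup(const char *path, bool keep_tmp)
{
    if (keep_tmp) {
        return 0; /* leave the directory in place for inspection */
    }
    return rmdir(path);
}
```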
2003-07-29 15:48:06 +00:00
2019-05-04 15:54:54 -04:00
static cl_error_t cli_scangzip_with_zib_from_the_80s ( cli_ctx * ctx , unsigned char * buff )
2017-08-08 17:38:17 -04:00
{
2019-05-04 15:54:54 -04:00
int fd ;
cl_error_t ret ;
size_t outsize = 0 ;
int bytes ;
2010-02-09 16:36:14 +01:00
fmap_t * map = * ctx - > fmap ;
char * tmpname ;
gzFile gz ;
2013-02-19 15:56:26 -05:00
ret = fmap_fd ( map ) ;
2017-08-08 17:38:17 -04:00
if ( ret < 0 )
return CL_EDUP ;
2013-02-19 15:56:26 -05:00
fd = dup ( ret ) ;
2017-08-08 17:38:17 -04:00
if ( fd < 0 )
return CL_EDUP ;
2018-12-03 12:40:13 -05:00
if ( ! ( gz = gzdopen ( fd , " rb " ) ) ) {
2017-08-08 17:38:17 -04:00
close ( fd ) ;
return CL_EOPEN ;
}
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tmpname , & fd ) ) ! = CL_SUCCESS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Can't generate temporary file. \n " ) ;
gzclose ( gz ) ;
/* gzclose() above already closed the dup'd descriptor; fd holds no open file here. */
return ret ;
}
2018-12-03 12:40:13 -05:00
while ( ( bytes = gzread ( gz , buff , FILEBUFF ) ) > 0 ) {
2017-08-08 17:38:17 -04:00
outsize + = bytes ;
if ( cli_checklimits ( " GZip " , ctx , outsize , 0 , 0 ) ! = CL_CLEAN )
break ;
2019-05-04 15:54:54 -04:00
if ( cli_writen ( fd , buff , ( size_t ) bytes ) ! = ( size_t ) bytes ) {
2017-08-08 17:38:17 -04:00
close ( fd ) ;
gzclose ( gz ) ;
2018-12-03 12:40:13 -05:00
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
free ( tmpname ) ;
return CL_EWRITE ;
}
2010-02-09 16:36:14 +01:00
}
gzclose ( gz ) ;
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_magic_scan_desc ( fd , tmpname , ctx , NULL ) ) = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
}
free ( tmpname ) ;
return CL_VIRUS ;
2010-02-09 16:36:14 +01:00
}
close ( fd ) ;
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
if ( cli_unlink ( tmpname ) )
ret = CL_EUNLINK ;
2010-02-09 16:36:14 +01:00
free ( tmpname ) ;
return ret ;
}
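/*
 * Sketch: the fallback above calls dup() before gzdopen() because the stream
 * takes ownership of the descriptor it is given and closes it in gzclose().
 * The hypothetical helper below shows the same ownership pattern with the
 * libc analogues fdopen()/fclose(), so it needs no zlib.
 */

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read up to cap bytes through a stdio stream layered on a duplicate of fd.
 * Returns the number of bytes read, or -1 on failure. The caller's fd stays
 * open; only the duplicate is closed by fclose().
 */
static long read_via_dup(int fd, char *out, size_t cap)
{
    int copy = dup(fd);
    FILE *fp;
    size_t n;

    if (copy < 0) {
        return -1;
    }

    fp = fdopen(copy, "rb");
    if (fp == NULL) {
        close(copy); /* the stream never took ownership */
        return -1;
    }

    n = fread(out, 1, cap, fp);
    fclose(fp); /* closes the duplicate, not the caller's fd */

    return (long)n;
}
```

/*
 * The duplicate shares its file offset with the original descriptor, so
 * reads through the stream still advance the caller's position.
 */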
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scangzip ( cli_ctx * ctx )
2003-07-29 15:48:06 +00:00
{
2020-03-19 21:23:54 -04:00
int fd ;
cl_error_t ret = CL_CLEAN ;
2017-08-08 17:38:17 -04:00
unsigned char buff [ FILEBUFF ] ;
char * tmpname ;
z_stream z ;
size_t at = 0 , outsize = 0 ;
fmap_t * map = * ctx - > fmap ;
2003-11-09 19:26:44 +00:00
cli_dbgmsg ( " in cli_scangzip() \n " ) ;
2009-09-01 23:33:17 +02:00
memset ( & z , 0 , sizeof ( z ) ) ;
2018-12-03 12:40:13 -05:00
if ( ( ret = inflateInit2 ( & z , MAX_WBITS + 16 ) ) ! = Z_OK ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: InflateInit failed: %d \n " , ret ) ;
return cli_scangzip_with_zib_from_the_80s ( ctx , buff ) ;
}
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tmpname , & fd ) ) ! = CL_SUCCESS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Can't generate temporary file. \n " ) ;
inflateEnd ( & z ) ;
return ret ;
}
2018-12-03 12:40:13 -05:00
while ( at < map - > len ) {
2017-08-08 17:38:17 -04:00
unsigned int bytes = MIN ( map - > len - at , map - > pgsz ) ;
2018-12-03 12:40:13 -05:00
if ( ! ( z . next_in = ( void * ) fmap_need_off_once ( map , at , bytes ) ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Can't read %u bytes @ %lu. \n " , bytes , ( long unsigned ) at ) ;
inflateEnd ( & z ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
free ( tmpname ) ;
return CL_EREAD ;
}
at + = bytes ;
z . avail_in = bytes ;
2018-12-03 12:40:13 -05:00
do {
2017-08-08 17:38:17 -04:00
int inf ;
z . avail_out = sizeof ( buff ) ;
2018-12-03 12:40:13 -05:00
z . next_out = buff ;
inf = inflate ( & z , Z_NO_FLUSH ) ;
if ( inf ! = Z_OK & & inf ! = Z_STREAM_END & & inf ! = Z_BUF_ERROR ) {
if ( sizeof ( buff ) = = z . avail_out ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Bad stream, nothing in output buffer. \n " ) ;
at = map - > len ;
break ;
2018-12-03 12:40:13 -05:00
} else {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Bad stream, data in output buffer. \n " ) ;
/* no break yet, flush extracted bytes to file */
}
}
2019-05-04 15:54:54 -04:00
if ( cli_writen ( fd , buff , sizeof ( buff ) - z . avail_out ) = = ( size_t ) - 1 ) {
2017-08-08 17:38:17 -04:00
inflateEnd ( & z ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
free ( tmpname ) ;
return CL_EWRITE ;
}
outsize + = sizeof ( buff ) - z . avail_out ;
2018-12-03 12:40:13 -05:00
if ( cli_checklimits ( " GZip " , ctx , outsize , 0 , 0 ) ! = CL_CLEAN ) {
2017-08-08 17:38:17 -04:00
at = map - > len ;
break ;
}
2018-12-03 12:40:13 -05:00
if ( inf = = Z_STREAM_END ) {
2017-08-08 17:38:17 -04:00
at - = z . avail_in ;
inflateReset ( & z ) ;
break ;
2018-12-03 12:40:13 -05:00
} else if ( inf ! = Z_OK & & inf ! = Z_BUF_ERROR ) {
2017-08-08 17:38:17 -04:00
at = map - > len ;
break ;
}
} while ( z . avail_out = = 0 ) ;
}
inflateEnd ( & z ) ;
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_magic_scan_desc ( fd , tmpname , ctx , NULL ) ) = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " GZip: Infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
}
free ( tmpname ) ;
return CL_VIRUS ;
2003-07-29 15:48:06 +00:00
}
2007-08-31 19:55:09 +00:00
close ( fd ) ;
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
if ( cli_unlink ( tmpname ) )
ret = CL_EUNLINK ;
2009-09-01 23:33:17 +02:00
free ( tmpname ) ;
2003-07-29 15:48:06 +00:00
return ret ;
}
2008-07-25 00:44:01 +00:00
# ifndef HAVE_BZLIB_H
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanbzip ( cli_ctx * ctx )
2017-08-08 17:38:17 -04:00
{
2008-07-25 00:44:01 +00:00
cli_warnmsg ( " cli_scanbzip: bzip2 support not compiled in \n " ) ;
return CL_CLEAN ;
}
# else
2003-07-29 15:48:06 +00:00
# ifdef NOBZ2PREFIX
2011-06-10 21:22:46 +03:00
# define BZ2_bzDecompressInit bzDecompressInit
# define BZ2_bzDecompress bzDecompress
# define BZ2_bzDecompressEnd bzDecompressEnd
2003-07-29 15:48:06 +00:00
# endif
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanbzip ( cli_ctx * ctx )
2003-07-29 15:48:06 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
int fd , rc ;
2020-01-30 09:15:44 -08:00
uint64_t size = 0 ;
2011-06-10 21:22:46 +03:00
char * tmpname ;
bz_stream strm ;
size_t off = 0 ;
size_t avail ;
char buf [ FILEBUFF ] ;
memset ( & strm , 0 , sizeof ( strm ) ) ;
strm . next_out = buf ;
strm . avail_out = sizeof ( buf ) ;
rc = BZ2_bzDecompressInit ( & strm , 0 , 0 ) ;
2018-12-03 12:40:13 -05:00
if ( BZ_OK ! = rc ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Bzip: DecompressInit failed: %d \n " , rc ) ;
return CL_EOPEN ;
}
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tmpname , & fd ) ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Bzip: Can't generate temporary file. \n " ) ;
BZ2_bzDecompressEnd ( & strm ) ;
return ret ;
}
2018-12-03 12:40:13 -05:00
do {
if ( ! strm . avail_in ) {
2017-08-08 17:38:17 -04:00
strm . next_in = ( void * ) fmap_need_off_once_len ( * ctx - > fmap , off , FILEBUFF , & avail ) ;
strm . avail_in = avail ;
off + = avail ;
2018-12-03 12:40:13 -05:00
if ( ! strm . avail_in ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Bzip: premature end of compressed stream \n " ) ;
break ;
}
}
rc = BZ2_bzDecompress ( & strm ) ;
2018-12-03 12:40:13 -05:00
if ( BZ_OK ! = rc & & BZ_STREAM_END ! = rc ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Bzip: decompress error: %d \n " , rc ) ;
break ;
}
2018-12-03 12:40:13 -05:00
if ( ! strm . avail_out | | BZ_STREAM_END = = rc ) {
2017-08-08 17:38:17 -04:00
size + = sizeof ( buf ) - strm . avail_out ;
2018-12-03 12:40:13 -05:00
if ( cli_writen ( fd , buf , sizeof ( buf ) - strm . avail_out ) ! = sizeof ( buf ) - strm . avail_out ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Bzip: Can't write to file. \n " ) ;
BZ2_bzDecompressEnd ( & strm ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
}
free ( tmpname ) ;
return CL_EWRITE ;
}
if ( cli_checklimits ( " Bzip " , ctx , size , 0 , 0 ) ! = CL_CLEAN )
break ;
strm . next_out = buf ;
strm . avail_out = sizeof ( buf ) ;
}
2011-06-10 21:22:46 +03:00
} while ( BZ_STREAM_END ! = rc ) ;
2003-07-29 15:48:06 +00:00
2011-06-10 21:22:46 +03:00
BZ2_bzDecompressEnd ( & strm ) ;
2005-04-06 22:48:06 +00:00
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_magic_scan_desc ( fd , tmpname , ctx , NULL ) ) = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Bzip: Infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
ret = CL_EUNLINK ;
free ( tmpname ) ;
return ret ;
}
}
free ( tmpname ) ;
return CL_VIRUS ;
2003-07-29 15:48:06 +00:00
}
2007-08-31 19:55:09 +00:00
close ( fd ) ;
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
if ( cli_unlink ( tmpname ) )
ret = CL_EUNLINK ;
2011-06-10 21:22:46 +03:00
free ( tmpname ) ;
2003-07-29 15:48:06 +00:00
return ret ;
}
# endif
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanxz ( cli_ctx * ctx )
2013-10-08 17:17:44 -04:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
int fd , rc ;
2013-10-08 17:17:44 -04:00
unsigned long int size = 0 ;
char * tmpname ;
2014-07-09 13:16:31 -04:00
struct CLI_XZ strm ;
2013-10-08 17:17:44 -04:00
size_t off = 0 ;
size_t avail ;
2014-07-09 13:16:31 -04:00
unsigned char * buf ;
2013-10-08 17:17:44 -04:00
2014-07-09 13:16:31 -04:00
buf = cli_malloc ( CLI_XZ_OBUF_SIZE ) ;
2018-12-03 12:40:13 -05:00
if ( buf = = NULL ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanxz: no memory for decompression buffer. \n " ) ;
2013-10-09 15:41:55 -04:00
return CL_EMEM ;
}
2014-07-09 13:16:31 -04:00
memset ( & strm , 0x00 , sizeof ( struct CLI_XZ ) ) ;
2018-12-03 12:40:13 -05:00
strm . next_out = buf ;
2013-10-09 15:41:55 -04:00
strm . avail_out = CLI_XZ_OBUF_SIZE ;
2018-12-03 12:40:13 -05:00
rc = cli_XzInit ( & strm ) ;
if ( rc ! = XZ_RESULT_OK ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanxz: DecompressInit failed: %i \n " , rc ) ;
2013-10-09 15:41:55 -04:00
free ( buf ) ;
2017-08-08 17:38:17 -04:00
return CL_EOPEN ;
2013-10-08 17:17:44 -04:00
}
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tmpname , & fd ) ) ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanxz: Can't generate temporary file. \n " ) ;
cli_XzShutdown ( & strm ) ;
2013-10-09 15:41:55 -04:00
free ( buf ) ;
2017-08-08 17:38:17 -04:00
return ret ;
2013-10-08 17:17:44 -04:00
}
cli_dbgmsg ( " cli_scanxz: decompressing to file %s \n " , tmpname ) ;
2018-12-03 12:40:13 -05:00
do {
2013-10-08 17:17:44 -04:00
/* set up input buffer */
2018-12-03 12:40:13 -05:00
if ( ! strm . avail_in ) {
strm . next_in = ( void * ) fmap_need_off_once_len ( * ctx - > fmap , off , CLI_XZ_IBUF_SIZE , & avail ) ;
2017-08-08 17:38:17 -04:00
strm . avail_in = avail ;
off + = avail ;
2018-12-03 12:40:13 -05:00
if ( ! strm . avail_in ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanxz: premature end of compressed stream \n " ) ;
2013-10-08 17:17:44 -04:00
ret = CL_EFORMAT ;
2017-08-08 17:38:17 -04:00
goto xz_exit ;
}
}
2013-10-08 17:17:44 -04:00
/* xz decompress a chunk */
2017-08-08 17:38:17 -04:00
rc = cli_XzDecode ( & strm ) ;
2018-12-03 12:40:13 -05:00
if ( XZ_RESULT_OK ! = rc & & XZ_STREAM_END ! = rc ) {
if ( rc = = XZ_DIC_HEURISTIC ) {
2018-07-20 22:28:48 -04:00
ret = cli_append_virus ( ctx , " Heuristics.XZ.DicSizeLimit " ) ;
2017-06-19 15:41:17 -04:00
goto xz_exit ;
}
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanxz: decompress error: %d \n " , rc ) ;
2013-10-08 17:17:44 -04:00
ret = CL_EFORMAT ;
goto xz_exit ;
2017-08-08 17:38:17 -04:00
}
2013-10-08 17:17:44 -04:00
//cli_dbgmsg("cli_scanxz: xz decompressed %li of %li available bytes\n",
// avail - strm.avail_in, avail);
2017-08-08 17:38:17 -04:00
2013-10-08 17:17:44 -04:00
/* write decompress buffer */
2018-12-03 12:40:13 -05:00
if ( ! strm . avail_out | | rc = = XZ_STREAM_END ) {
2017-08-08 17:38:17 -04:00
size_t towrite = CLI_XZ_OBUF_SIZE - strm . avail_out ;
size + = towrite ;
2013-10-08 17:17:44 -04:00
//cli_dbgmsg("Writing %li bytes to XZ decompress temp file(%li byte total)\n",
// towrite, size);
2019-05-04 15:54:54 -04:00
if ( cli_writen ( fd , buf , towrite ) ! = towrite ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanxz: Can't write to file. \n " ) ;
2013-10-08 17:17:44 -04:00
ret = CL_EWRITE ;
goto xz_exit ;
2017-08-08 17:38:17 -04:00
}
2018-12-03 12:40:13 -05:00
if ( cli_checklimits ( " cli_scanxz " , ctx , size , 0 , 0 ) ! = CL_CLEAN ) {
2013-10-08 17:17:44 -04:00
cli_warnmsg ( " cli_scanxz: decompress file size exceeds limits - "
2017-08-08 17:38:17 -04:00
" only scanning %li bytes \n " ,
size ) ;
break ;
2013-10-08 17:17:44 -04:00
}
2018-12-03 12:40:13 -05:00
strm . next_out = buf ;
2017-08-08 17:38:17 -04:00
strm . avail_out = CLI_XZ_OBUF_SIZE ;
}
2013-10-08 17:17:44 -04:00
} while ( XZ_STREAM_END ! = rc ) ;
/* scan decompressed file */
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_magic_scan_desc ( fd , tmpname , ctx , NULL ) ) = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanxz: Infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
2013-10-08 17:17:44 -04:00
}
2017-08-08 17:38:17 -04:00
xz_exit :
2013-10-08 17:17:44 -04:00
cli_XzShutdown ( & strm ) ;
close ( fd ) ;
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
if ( cli_unlink ( tmpname ) & & ret = = CL_CLEAN )
2013-10-08 17:17:44 -04:00
ret = CL_EUNLINK ;
free ( tmpname ) ;
2013-10-09 15:41:55 -04:00
free ( buf ) ;
2013-10-08 17:17:44 -04:00
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanszdd ( cli_ctx * ctx )
2004-05-02 00:51:01 +00:00
{
2020-03-19 21:23:54 -04:00
int ofd ;
cl_error_t ret ;
2017-08-08 17:38:17 -04:00
char * tmpname ;
2004-09-07 21:19:02 +00:00
2005-06-11 20:27:59 +00:00
cli_dbgmsg ( " in cli_scanszdd() \n " ) ;
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tmpname , & ofd ) ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " MSEXPAND: Can't generate temporary file/descriptor \n " ) ;
return ret ;
2004-05-02 00:51:01 +00:00
}
2011-06-10 19:09:38 +02:00
ret = cli_msexpand ( ctx , ofd ) ;
2004-05-02 00:51:01 +00:00
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) { /* CL_VIRUS or some error */
2017-08-08 17:38:17 -04:00
close ( ofd ) ;
if ( ! ctx - > engine - > keeptmp )
if ( cli_unlink ( tmpname ) )
ret = CL_EUNLINK ;
free ( tmpname ) ;
return ret ;
2004-05-02 00:51:01 +00:00
}
2007-12-13 23:18:03 +00:00
cli_dbgmsg ( " MSEXPAND: Decompressed into %s \n " , tmpname ) ;
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_desc ( ofd , tmpname , ctx , NULL ) ;
2007-12-13 23:18:03 +00:00
close ( ofd ) ;
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
if ( cli_unlink ( tmpname ) )
ret = CL_EUNLINK ;
free ( tmpname ) ;
2007-12-13 23:18:03 +00:00
2004-05-02 00:51:01 +00:00
return ret ;
}
2019-05-04 15:54:54 -04:00
static cl_error_t vba_scandata ( const unsigned char * data , size_t len , cli_ctx * ctx )
2011-11-18 15:25:04 +01:00
{
2017-08-08 17:38:17 -04:00
struct cli_matcher * groot = ctx - > engine - > root [ 0 ] ;
struct cli_matcher * troot = ctx - > engine - > root [ 2 ] ;
struct cli_ac_data gmdata , tmdata ;
struct cli_ac_data * mdata [ 2 ] ;
2019-05-04 15:54:54 -04:00
cl_error_t ret ;
2017-08-08 17:38:17 -04:00
unsigned int viruses_found = 0 ;
2011-11-18 15:25:04 +01:00
2017-08-08 17:38:17 -04:00
if ( ( ret = cli_ac_initdata ( & tmdata , troot - > ac_partsigs , troot - > ac_lsigs , troot - > ac_reloff_num , CLI_DEFAULT_AC_TRACKLEN ) ) )
return ret ;
2011-11-18 15:25:04 +01:00
2018-12-03 12:40:13 -05:00
if ( ( ret = cli_ac_initdata ( & gmdata , groot - > ac_partsigs , groot - > ac_lsigs , groot - > ac_reloff_num , CLI_DEFAULT_AC_TRACKLEN ) ) ) {
2017-08-08 17:38:17 -04:00
cli_ac_freedata ( & tmdata ) ;
return ret ;
2011-11-18 15:25:04 +01:00
}
mdata [ 0 ] = & tmdata ;
mdata [ 1 ] = & gmdata ;
2020-03-21 14:15:28 -04:00
ret = cli_scan_buff ( data , len , 0 , ctx , CL_TYPE_MSOLE2 , mdata ) ;
2012-11-30 11:16:47 -08:00
if ( ret = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
2011-11-18 15:25:04 +01:00
2018-12-03 12:40:13 -05:00
if ( ret = = CL_CLEAN | | ( ret = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2017-08-08 17:38:17 -04:00
fmap_t * map = * ctx - > fmap ;
2020-03-19 21:23:54 -04:00
* ctx - > fmap = fmap_open_memory ( data , len , NULL ) ;
2016-10-24 18:03:36 -04:00
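if (*ctx->fmap == NULL) {
    /* Restore the caller's map and release matcher data before bailing out */
    *ctx->fmap = map;
    cli_ac_freedata(&tmdata);
    cli_ac_freedata(&gmdata);
    return CL_EMEM;
}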
2017-08-08 17:38:17 -04:00
ret = cli_exp_eval ( ctx , troot , & tmdata , NULL , NULL ) ;
if ( ret = = CL_VIRUS )
viruses_found + + ;
2012-11-30 11:16:47 -08:00
2018-07-20 22:28:48 -04:00
if ( ret = = CL_CLEAN | | ( ret = = CL_VIRUS & & SCAN_ALLMATCHES ) )
2017-08-08 17:38:17 -04:00
ret = cli_exp_eval ( ctx , groot , & gmdata , NULL , NULL ) ;
2016-10-24 18:03:36 -04:00
funmap ( * ctx - > fmap ) ;
* ctx - > fmap = map ;
2011-11-18 15:25:04 +01:00
}
cli_ac_freedata ( & tmdata ) ;
cli_ac_freedata ( & gmdata ) ;
2021-04-08 19:16:11 -07:00
return (ret != CL_CLEAN) ? ret : (viruses_found ? CL_VIRUS : CL_CLEAN);
2011-11-18 15:25:04 +01:00
}
2020-04-28 13:32:07 -07:00
# define min(x, y) ((x) < (y) ? (x) : (y))
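/**
 * Find a file in a directory tree.
 * \param filename    Name of the file to find
 * \param dir         Directory path to search (subdirectories are searched recursively)
 * \param result      Buffer that receives the path of the directory containing the file
 * \param result_size Size of the result buffer, in bytes
 * \return CL_SUCCESS if the file was found; an error code otherwise
 *         (e.g. CL_ENULLARG if result is NULL, CL_EOPEN if no match was found)
 */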
2020-03-19 21:23:54 -04:00
cl_error_t find_file ( const char * filename , const char * dir , char * result , size_t result_size )
2020-04-28 13:32:07 -07:00
{
DIR * dd ;
struct dirent * dent ;
char fullname [ PATH_MAX ] ;
cl_error_t ret ;
size_t len ;
STATBUF statbuf ;
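if (!filename || !dir || !result) {
    return CL_ENULLARG;
}
if (result_size == 0) {
    /* Nothing can be stored; this also keeps result[len - 1] below in bounds */
    return CL_EARG;
}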
if ( ( dd = opendir ( dir ) ) ! = NULL ) {
while ( ( dent = readdir ( dd ) ) ) {
if ( dent - > d_ino ) {
if ( strcmp ( dent - > d_name , " . " ) ! = 0 & & strcmp ( dent - > d_name , " .. " ) ! = 0 ) {
snprintf ( fullname , sizeof ( fullname ) , " %s " PATHSEP " %s " , dir , dent - > d_name ) ;
fullname [ sizeof ( fullname ) - 1 ] = ' \0 ' ;
/* stat the file */
if ( LSTAT ( fullname , & statbuf ) ! = - 1 ) {
if ( S_ISDIR ( statbuf . st_mode ) & & ! S_ISLNK ( statbuf . st_mode ) ) {
ret = find_file ( filename , fullname , result , result_size ) ;
if ( ret = = CL_SUCCESS ) {
closedir ( dd ) ;
return ret ;
}
2020-03-19 21:23:54 -04:00
} else if ( S_ISREG ( statbuf . st_mode ) ) {
2020-04-28 13:32:07 -07:00
if ( strcmp ( dent - > d_name , filename ) = = 0 ) {
2020-03-19 21:23:54 -04:00
len = min ( strlen ( dir ) + 1 , result_size ) ;
2020-04-28 13:32:07 -07:00
memcpy ( result , dir , len ) ;
result [ len - 1 ] = ' \0 ' ;
closedir ( dd ) ;
return CL_SUCCESS ;
}
}
}
}
}
}
closedir ( dd ) ;
}
return CL_EOPEN ;
}
/**
* Scan an OLE directory for a VBA project .
 * Unlike cli_vba_scandir, this function uses the dir file to locate VBA modules.
*/
2020-08-07 23:48:20 -07:00
static cl_error_t cli_vba_scandir_new ( const char * dirname , cli_ctx * ctx , struct uniq * U , int * has_macros )
2020-04-28 13:32:07 -07:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_SUCCESS ;
2020-04-28 13:32:07 -07:00
uint32_t hashcnt = 0 ;
2020-03-19 21:23:54 -04:00
char * hash = NULL ;
2020-04-28 13:32:07 -07:00
char path [ PATH_MAX ] ;
char filename [ PATH_MAX ] ;
2020-08-07 23:48:20 -07:00
int tempfd = - 1 ;
int viruses_found = 0 ;
2020-04-28 13:32:07 -07:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " dir " , 3 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " cli_vba_scandir_new: uniq_get('dir') failed with ret code (%d)! \n " , ret ) ;
return ret ;
}
while ( hashcnt ) {
2020-03-19 21:23:54 -04:00
//Find the directory containing the extracted dir file. This is complicated
//because ClamAV doesn't use the file names from the OLE file, but temporary names,
//and we have neither the complete path of the dir file in the OLE container,
//nor the mapping of the temporary directory names to their OLE names.
2020-04-28 13:32:07 -07:00
snprintf ( filename , sizeof ( filename ) , " %s_%u " , hash , hashcnt ) ;
filename [ sizeof ( filename ) - 1 ] = ' \0 ' ;
if ( CL_SUCCESS = = find_file ( filename , dirname , path , sizeof ( path ) ) ) {
cli_dbgmsg ( " cli_vba_scandir_new: Found dir file: %s \n " , path ) ;
2020-08-07 23:48:20 -07:00
if ( ( ret = cli_vba_readdir_new ( ctx , path , U , hash , hashcnt , & tempfd , has_macros ) ) ! = CL_SUCCESS ) {
2020-04-28 13:32:07 -07:00
//FIXME: Since we only know the stream name of the OLE2 stream, but not its path inside the
// OLE2 archive, we don't know if we have the right file. The only thing we can do is
// iterate all of them until one succeeds.
2020-03-19 21:23:54 -04:00
cli_dbgmsg ( " cli_vba_scandir_new: Failed to read dir from %s, trying others (error: %s (%d)) \n " , path , cl_strerror ( ret ) , ( int ) ret ) ;
2020-04-28 13:32:07 -07:00
ret = CL_SUCCESS ;
hashcnt - - ;
continue ;
}
2020-08-07 23:48:20 -07:00
# if HAVE_JSON
if ( * has_macros & & SCAN_COLLECT_METADATA & & ( ctx - > wrkproperty ! = NULL ) ) {
cli_jsonbool ( ctx - > wrkproperty , " HasMacros " , 1 ) ;
json_object * macro_languages = cli_jsonarray ( ctx - > wrkproperty , " MacroLanguages " ) ;
if ( macro_languages ) {
cli_jsonstr ( macro_languages , NULL , " VBA " ) ;
} else {
cli_dbgmsg ( " [cli_vba_scandir_new] Failed to add \" VBA \" entry to MacroLanguages JSON array \n " ) ;
}
}
# endif
if ( SCAN_HEURISTIC_MACROS & & * has_macros ) {
2020-08-10 14:57:28 -07:00
ret = cli_append_virus ( ctx , " Heuristics.OLE2.ContainsMacros.VBA " ) ;
2020-08-07 23:48:20 -07:00
if ( ret = = CL_VIRUS ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
goto done ;
}
}
}
/*
* Now rewind the extracted vba - project output FD and scan it !
*/
2020-04-28 13:32:07 -07:00
if ( lseek ( tempfd , 0 , SEEK_SET ) ! = 0 ) {
cli_dbgmsg ( " cli_vba_scandir_new: Failed to seek to beginning of temporary VBA project file \n " ) ;
ret = CL_ESEEK ;
goto done ;
}
ctx - > recursion + = 1 ;
cli_set_container ( ctx , CL_TYPE_MSOLE2 , 0 ) ; //TODO: set correct container size
2020-08-07 23:48:20 -07:00
ret = cli_scan_desc ( tempfd , ctx , CL_TYPE_SCRIPT , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ;
2020-04-28 13:32:07 -07:00
close ( tempfd ) ;
tempfd = - 1 ;
ctx - > recursion - = 1 ;
2020-08-07 23:48:20 -07:00
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
goto done ;
}
}
2020-04-28 13:32:07 -07:00
}
hashcnt - - ;
}
done :
if ( tempfd ! = - 1 ) {
close ( tempfd ) ;
tempfd = - 1 ;
}
2020-08-07 23:48:20 -07:00
if ( viruses_found > 0 )
ret = CL_VIRUS ;
2020-04-28 13:32:07 -07:00
return ret ;
}
2020-08-07 23:48:20 -07:00
static cl_error_t cli_vba_scandir ( const char * dirname , cli_ctx * ctx , struct uniq * U , int * has_macros )
2003-07-29 15:48:06 +00:00
{
2020-08-07 23:48:20 -07:00
cl_error_t status = CL_CLEAN ;
cl_error_t ret ;
2020-08-12 18:14:39 -07:00
int i , j ;
2019-05-04 15:54:54 -04:00
size_t data_len ;
2017-08-08 17:38:17 -04:00
vba_project_t * vba_project ;
2020-08-07 23:48:20 -07:00
DIR * dd = NULL ;
2017-08-08 17:38:17 -04:00
struct dirent * dent ;
STATBUF statbuf ;
char * fullname , vbaname [ 1024 ] ;
unsigned char * data ;
char * hash ;
2019-01-22 14:05:05 -05:00
uint32_t hashcnt = 0 ;
2017-08-08 17:38:17 -04:00
unsigned int viruses_found = 0 ;
2004-09-07 21:19:02 +00:00
cli_dbgmsg ( " VBADir: %s \n " , dirname ) ;
2019-01-22 14:05:05 -05:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " _vba_project " , 12 , NULL , & hashcnt ) ) ) {
cli_dbgmsg ( " VBADir: uniq_get('_vba_project') failed with ret code (%d)! \n " , ret ) ;
2020-08-07 23:48:20 -07:00
status = ret ;
goto done ;
2019-01-22 14:05:05 -05:00
}
while ( hashcnt ) {
if ( ! ( vba_project = ( vba_project_t * ) cli_vba_readdir ( dirname , U , hashcnt ) ) ) {
hashcnt - - ;
2017-08-08 17:38:17 -04:00
continue ;
2019-01-22 14:05:05 -05:00
}
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
for ( i = 0 ; i < vba_project - > count ; i + + ) {
2019-01-22 14:05:05 -05:00
for ( j = 1 ; ( unsigned int ) j < = vba_project - > colls [ i ] ; j + + ) {
2020-08-12 18:14:39 -07:00
int fd = - 1 ;
2017-08-08 17:38:17 -04:00
snprintf ( vbaname , 1024 , " %s " PATHSEP " %s_%u " , vba_project - > dir , vba_project - > name [ i ] , j ) ;
vbaname [ sizeof ( vbaname ) - 1 ] = ' \0 ' ;
2020-08-12 18:14:39 -07:00
fd = open ( vbaname , O_RDONLY | O_BINARY ) ;
2019-01-22 14:05:05 -05:00
if ( fd = = - 1 ) {
2017-08-08 17:38:17 -04:00
continue ;
2019-01-22 14:05:05 -05:00
}
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " VBADir: Decompress VBA project '%s_%u' \n " , vba_project - > name [ i ] , j ) ;
data = ( unsigned char * ) cli_vba_inflate ( fd , vba_project - > offset [ i ] , & data_len ) ;
close ( fd ) ;
2020-08-07 23:48:20 -07:00
* has_macros = * has_macros + 1 ;
2018-12-03 12:40:13 -05:00
if (!data) {
    cli_dbgmsg("VBADir: WARNING: VBA project '%s_%u' decompressed to NULL\n", vba_project->name[i], j);
} else {
2017-08-08 17:38:17 -04:00
/* cli_dbgmsg("Project content:\n%s", data); */
if ( ctx - > scanned )
* ctx - > scanned + = data_len / CL_COUNT_PRECISION ;
2018-12-03 12:40:13 -05:00
if ( ctx - > engine - > keeptmp ) {
2017-08-08 17:38:17 -04:00
char * tempfile ;
int of ;
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tempfile , & of ) ) ! = CL_SUCCESS ) {
2017-08-08 17:38:17 -04:00
cli_warnmsg ( " VBADir: WARNING: VBA project '%s_%u' cannot be dumped to file \n " , vba_project - > name [ i ] , j ) ;
2020-08-07 23:48:20 -07:00
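/* Release the decompressed data and the project before leaving the loops */
free(data);
cli_free_vba_project(vba_project);
status = ret;
goto done;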
2017-08-08 17:38:17 -04:00
}
2018-12-03 12:40:13 -05:00
if ( cli_writen ( of , data , data_len ) ! = data_len ) {
2017-08-08 17:38:17 -04:00
cli_warnmsg ( " VBADir: WARNING: VBA project '%s_%u' failed to write to file \n " , vba_project - > name [ i ] , j ) ;
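close(of);
free(tempfile);
/* Release the decompressed data and the project before leaving the loops */
free(data);
cli_free_vba_project(vba_project);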
2020-08-07 23:48:20 -07:00
status = CL_EWRITE ;
goto done ;
2017-08-08 17:38:17 -04:00
}
cli_dbgmsg("VBADir: VBA project '%s_%u' dumped to %s\n", vba_project->name[i], j, tempfile);
close(of); /* the dump descriptor is not needed once the data is written */
free(tempfile);
}
2018-12-03 12:40:13 -05:00
if ( vba_scandata ( data , data_len , ctx ) = = CL_VIRUS ) {
2020-08-06 22:15:33 -07:00
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
2017-08-08 17:38:17 -04:00
free ( data ) ;
2020-08-07 23:48:20 -07:00
status = CL_VIRUS ;
2017-08-08 17:38:17 -04:00
break ;
}
}
free ( data ) ;
}
}
2020-08-06 22:15:33 -07:00
2020-08-07 23:48:20 -07:00
if ( status = = CL_VIRUS )
2020-08-06 22:15:33 -07:00
break ;
2017-08-08 17:38:17 -04:00
}
2019-01-22 14:05:05 -05:00
cli_free_vba_project ( vba_project ) ;
vba_project = NULL ;
2020-08-07 23:48:20 -07:00
if ( status = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
break ;
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2017-08-08 17:38:17 -04:00
}
2020-08-07 23:48:20 -07:00
if ( status = = CL_CLEAN | | ( status = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2019-01-22 14:05:05 -05:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " powerpoint document " , 19 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " VBADir: uniq_get('powerpoint document') failed with ret code (%d)! \n " , ret ) ;
2020-08-07 23:48:20 -07:00
status = ret ;
goto done ;
2019-01-22 14:05:05 -05:00
}
while ( hashcnt ) {
2020-08-12 18:14:39 -07:00
int fd = - 1 ;
2017-08-08 17:38:17 -04:00
snprintf ( vbaname , 1024 , " %s " PATHSEP " %s_%u " , dirname , hash , hashcnt ) ;
vbaname [ sizeof ( vbaname ) - 1 ] = ' \0 ' ;
2020-08-12 18:14:39 -07:00
fd = open ( vbaname , O_RDONLY | O_BINARY ) ;
2019-01-22 14:05:05 -05:00
if ( fd = = - 1 ) {
hashcnt - - ;
2017-08-08 17:38:17 -04:00
continue ;
2019-01-22 14:05:05 -05:00
}
2018-12-03 12:40:13 -05:00
if ( ( fullname = cli_ppt_vba_read ( fd , ctx ) ) ) {
2020-08-10 14:57:28 -07:00
ret = cli_magic_scan_dir ( fullname , ctx ) ;
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( fullname ) ;
free ( fullname ) ;
2020-08-10 14:57:28 -07:00
if ( ret = = CL_VIRUS ) {
status = CL_VIRUS ;
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
2020-08-12 18:14:39 -07:00
close ( fd ) ;
2020-08-10 14:57:28 -07:00
break ;
}
}
2017-08-08 17:38:17 -04:00
}
close ( fd ) ;
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2017-08-08 17:38:17 -04:00
}
}
2020-08-07 23:48:20 -07:00
if ( status = = CL_CLEAN | | ( status = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2019-01-22 14:05:05 -05:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " worddocument " , 12 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " VBADir: uniq_get('worddocument') failed with ret code (%d)! \n " , ret ) ;
2020-08-07 23:48:20 -07:00
status = ret ;
goto done ;
2019-01-22 14:05:05 -05:00
}
while ( hashcnt ) {
2020-08-12 18:14:39 -07:00
int fd = - 1 ;
2017-08-08 17:38:17 -04:00
snprintf ( vbaname , sizeof ( vbaname ) , " %s " PATHSEP " %s_%u " , dirname , hash , hashcnt ) ;
vbaname [ sizeof ( vbaname ) - 1 ] = ' \0 ' ;
2020-08-12 18:14:39 -07:00
fd = open ( vbaname , O_RDONLY | O_BINARY ) ;
2019-01-22 14:05:05 -05:00
if ( fd = = - 1 ) {
hashcnt - - ;
2017-08-08 17:38:17 -04:00
continue ;
2019-01-22 14:05:05 -05:00
}
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
if ( ! ( vba_project = ( vba_project_t * ) cli_wm_readdir ( fd ) ) ) {
2017-08-08 17:38:17 -04:00
close ( fd ) ;
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2017-08-08 17:38:17 -04:00
continue ;
}
2018-12-03 12:40:13 -05:00
for ( i = 0 ; i < vba_project - > count ; i + + ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " VBADir: Decompress WM project macro:%d key:%d length:%d \n " , i , vba_project - > key [ i ] , vba_project - > length [ i ] ) ;
data = ( unsigned char * ) cli_wm_decrypt_macro ( fd , vba_project - > offset [ i ] , vba_project - > length [ i ] , vba_project - > key [ i ] ) ;
2018-12-03 12:40:13 -05:00
if ( ! data ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " VBADir: WARNING: WM project '%s' macro %d decrypted to NULL \n " , vba_project - > name [ i ] , i ) ;
2018-12-03 12:40:13 -05:00
} else {
2017-08-08 17:38:17 -04:00
/* Disabled: the decrypted macro buffer is not guaranteed to be NUL-terminated. */
/* cli_dbgmsg("Project content:\n%s", data); */
if ( ctx - > scanned )
* ctx - > scanned + = vba_project - > length [ i ] / CL_COUNT_PRECISION ;
2018-12-03 12:40:13 -05:00
if ( vba_scandata ( data , vba_project - > length [ i ] , ctx ) = = CL_VIRUS ) {
2020-08-06 22:15:33 -07:00
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
2017-08-08 17:38:17 -04:00
free ( data ) ;
2020-08-07 23:48:20 -07:00
status = CL_VIRUS ;
2017-08-08 17:38:17 -04:00
break ;
}
}
free ( data ) ;
}
}
close ( fd ) ;
2019-01-22 14:05:05 -05:00
cli_free_vba_project ( vba_project ) ;
vba_project = NULL ;
2020-08-07 23:48:20 -07:00
if ( status = = CL_VIRUS & & ! SCAN_ALLMATCHES ) {
break ;
2017-08-08 17:38:17 -04:00
}
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2017-08-08 17:38:17 -04:00
}
}
2014-04-23 18:15:29 -04:00
# if HAVE_JSON
2014-04-21 16:44:26 -04:00
/* JSON Output Summary Information */
2018-12-03 12:40:13 -05:00
if ( SCAN_COLLECT_METADATA & & ( ctx - > wrkproperty ! = NULL ) ) {
2019-01-22 14:05:05 -05:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " _5_summaryinformation " , 21 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " VBADir: uniq_get('_5_summaryinformation') failed with ret code (%d)! \n " , ret ) ;
2020-08-07 23:48:20 -07:00
status = ret ;
goto done ;
2019-01-22 14:05:05 -05:00
}
while ( hashcnt ) {
2020-08-12 18:14:39 -07:00
int fd = - 1 ;
2017-08-08 17:38:17 -04:00
snprintf ( vbaname , sizeof ( vbaname ) , " %s " PATHSEP " %s_%u " , dirname , hash , hashcnt ) ;
vbaname [ sizeof ( vbaname ) - 1 ] = ' \0 ' ;
fd = open ( vbaname , O_RDONLY | O_BINARY ) ;
2018-12-03 12:40:13 -05:00
if ( fd > = 0 ) {
2014-04-23 18:15:29 -04:00
cli_dbgmsg ( " VBADir: detected a '_5_summaryinformation' stream \n " ) ;
2014-04-24 14:22:00 -04:00
/* JSONOLE2 - what to do if something breaks? */
2014-04-23 18:15:29 -04:00
cli_ole2_summary_json ( ctx , fd , 0 ) ;
close ( fd ) ;
}
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2014-04-23 18:15:29 -04:00
}
2014-04-21 16:44:26 -04:00
2019-01-22 14:05:05 -05:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " _5_documentsummaryinformation " , 29 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " VBADir: uniq_get('_5_documentsummaryinformation') failed with ret code (%d)! \n " , ret ) ;
2020-08-07 23:48:20 -07:00
status = ret ;
goto done ;
2019-01-22 14:05:05 -05:00
}
while ( hashcnt ) {
2020-08-12 18:14:39 -07:00
int fd = - 1 ;
2017-08-08 17:38:17 -04:00
snprintf ( vbaname , sizeof ( vbaname ) , " %s " PATHSEP " %s_%u " , dirname , hash , hashcnt ) ;
vbaname [ sizeof ( vbaname ) - 1 ] = ' \0 ' ;
fd = open ( vbaname , O_RDONLY | O_BINARY ) ;
2018-12-03 12:40:13 -05:00
if ( fd > = 0 ) {
2014-04-23 18:15:29 -04:00
cli_dbgmsg ( " VBADir: detected a '_5_documentsummaryinformation' stream \n " ) ;
2014-04-24 14:22:00 -04:00
/* JSONOLE2 - what to do if something breaks? */
2014-04-23 18:15:29 -04:00
cli_ole2_summary_json ( ctx , fd , 1 ) ;
close ( fd ) ;
}
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2014-04-21 16:44:26 -04:00
}
}
2017-08-08 17:38:17 -04:00
# endif
2014-04-21 16:44:26 -04:00
2020-08-07 23:48:20 -07:00
if ( status ! = CL_CLEAN & & ! ( status = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
goto done ;
}
2005-04-14 18:42:13 +00:00
/* Check directory for embedded OLE objects */
2019-01-22 14:05:05 -05:00
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , " _1_ole10native " , 14 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " VBADir: uniq_get('_1_ole10native') failed with ret code (%d)! \n " , ret ) ;
2020-08-07 23:48:20 -07:00
status = ret ;
goto done ;
2019-01-22 14:05:05 -05:00
}
while ( hashcnt ) {
2020-08-12 18:14:39 -07:00
int fd = - 1 ;
2017-08-08 17:38:17 -04:00
snprintf ( vbaname , sizeof ( vbaname ) , " %s " PATHSEP " %s_%u " , dirname , hash , hashcnt ) ;
vbaname [ sizeof ( vbaname ) - 1 ] = ' \0 ' ;
fd = open ( vbaname , O_RDONLY | O_BINARY ) ;
2018-12-03 12:40:13 -05:00
if ( fd > = 0 ) {
2017-08-08 17:38:17 -04:00
ret = cli_scan_ole10 ( fd , ctx ) ;
close ( fd ) ;
2020-08-07 23:48:20 -07:00
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
status = ret ;
goto done ;
}
}
2017-08-08 17:38:17 -04:00
}
2019-01-22 14:05:05 -05:00
hashcnt - - ;
2005-04-14 18:42:13 +00:00
}
2008-05-27 16:30:47 +00:00
/* ACAB: since we now hash filenames and handle collisions we
* could avoid recursion by removing the block below and by
* flattening the paths in ole2_walk_property_tree ( case 1 ) */
2018-12-03 12:40:13 -05:00
if ( ( dd = opendir ( dirname ) ) ! = NULL ) {
while ( ( dent = readdir ( dd ) ) ) {
if ( dent - > d_ino ) {
if ( strcmp ( dent - > d_name , " . " ) & & strcmp ( dent - > d_name , " .. " ) ) {
2017-08-08 17:38:17 -04:00
/* build the full name */
fullname = cli_malloc ( strlen ( dirname ) + strlen ( dent - > d_name ) + 2 ) ;
2018-12-03 12:40:13 -05:00
if ( ! fullname ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_vba_scandir: Unable to allocate memory for fullname \n " ) ;
2020-08-10 14:57:28 -07:00
status = CL_EMEM ;
2017-08-08 17:38:17 -04:00
break ;
}
sprintf ( fullname , " %s " PATHSEP " %s " , dirname , dent - > d_name ) ;
/* stat the file */
2018-12-03 12:40:13 -05:00
if ( LSTAT ( fullname , & statbuf ) ! = - 1 ) {
2017-08-08 17:38:17 -04:00
if ( S_ISDIR ( statbuf . st_mode ) & & ! S_ISLNK ( statbuf . st_mode ) )
2020-08-07 23:48:20 -07:00
if ( cli_vba_scandir ( fullname , ctx , U , has_macros ) = = CL_VIRUS ) {
2020-08-06 22:15:33 -07:00
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
2020-08-10 14:57:28 -07:00
status = CL_VIRUS ;
2017-08-08 17:38:17 -04:00
free ( fullname ) ;
break ;
}
}
}
free ( fullname ) ;
}
}
}
2018-12-03 12:40:13 -05:00
} else {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " VBADir: Can't open directory %s. \n " , dirname ) ;
2020-08-07 23:48:20 -07:00
status = CL_EOPEN ;
goto done ;
}
done :
if ( NULL ! = dd ) {
closedir ( dd ) ;
2003-07-29 15:48:06 +00:00
}
2014-05-19 16:45:52 -04:00
# if HAVE_JSON
2020-08-07 23:48:20 -07:00
if ( * has_macros & & SCAN_COLLECT_METADATA & & ( ctx - > wrkproperty ! = NULL ) ) {
2014-05-16 12:39:45 -04:00
cli_jsonbool ( ctx - > wrkproperty , " HasMacros " , 1 ) ;
2020-04-29 14:19:41 -07:00
json_object * macro_languages = cli_jsonarray ( ctx - > wrkproperty , " MacroLanguages " ) ;
if ( macro_languages ) {
cli_jsonstr ( macro_languages , NULL , " VBA " ) ;
} else {
cli_dbgmsg("[cli_vba_scandir] Failed to add \"VBA\" entry to MacroLanguages JSON array\n");
}
}
2014-05-19 16:45:52 -04:00
# endif
2020-08-07 23:48:20 -07:00
if ( SCAN_HEURISTIC_MACROS & & * has_macros ) {
2020-08-10 14:57:28 -07:00
ret = cli_append_virus ( ctx , " Heuristics.OLE2.ContainsMacros.VBA " ) ;
2017-08-08 17:38:17 -04:00
if ( ret = = CL_VIRUS )
2017-04-18 12:03:36 -04:00
viruses_found + + ;
2010-10-29 19:04:23 +02:00
}
2020-08-07 23:48:20 -07:00
2020-08-08 21:02:47 -07:00
if ( viruses_found > 0 ) {
2020-08-07 23:48:20 -07:00
status = CL_VIRUS ;
}
return status ;
2004-04-20 22:33:42 +00:00
}
2020-04-29 14:19:41 -07:00
static cl_error_t cli_xlm_scandir ( const char * dirname , cli_ctx * ctx , struct uniq * U )
{
cl_error_t ret = CL_CLEAN ;
char * hash = NULL ;
uint32_t hashcnt = 0 ;
unsigned int viruses_found = 0 ;
char STR_WORKBOOK [ ] = " workbook " ;
char STR_BOOK [ ] = " book " ;
cli_dbgmsg ( " XLMDir: %s \n " , dirname ) ;
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , STR_WORKBOOK , sizeof ( STR_WORKBOOK ) - 1 , & hash , & hashcnt ) ) ) {
if ( CL_SUCCESS ! = ( ret = uniq_get ( U , STR_BOOK , sizeof ( STR_BOOK ) - 1 , & hash , & hashcnt ) ) ) {
cli_dbgmsg ( " XLMDir: uniq_get('%s') failed with ret code (%d)! \n " , STR_BOOK , ret ) ;
return ret ;
}
}
for ( ; hashcnt > 0 ; hashcnt - - ) {
if ( ( ret = cli_xlm_extract_macros ( dirname , ctx , U , hash , hashcnt ) ) ! = CL_SUCCESS ) {
2020-07-16 13:39:47 -07:00
switch ( ret ) {
case CL_VIRUS :
case CL_EMEM :
return ret ;
default :
cli_dbgmsg("XLMDir: An error occurred when parsing XLM BIFF temp file, skipping to next file.\n");
}
2020-04-29 14:19:41 -07:00
}
}
if ( SCAN_HEURISTIC_MACROS ) {
2020-08-10 14:57:28 -07:00
ret = cli_append_virus ( ctx , " Heuristics.OLE2.ContainsMacros.XLM " ) ;
2020-04-29 14:19:41 -07:00
if ( ret = = CL_VIRUS )
viruses_found + + ;
}
if ( SCAN_ALLMATCHES & & viruses_found )
return CL_VIRUS ;
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanhtml ( cli_ctx * ctx )
2005-01-14 14:56:09 +00:00
{
2012-11-27 17:15:02 -05:00
char * tempname , fullname [ 1024 ] ;
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
int fd ;
2018-12-03 12:40:13 -05:00
fmap_t * map = * ctx - > fmap ;
2012-11-27 17:15:02 -05:00
unsigned int viruses_found = 0 ;
2018-12-03 12:40:13 -05:00
uint64_t curr_len = map - > len ;
2005-01-14 14:56:09 +00:00
cli_dbgmsg ( " in cli_scanhtml() \n " ) ;
2012-11-27 17:15:02 -05:00
/* CL_ENGINE_MAX_HTMLNORMALIZE */
2018-12-03 12:40:13 -05:00
if ( curr_len > ctx - > engine - > maxhtmlnormalize ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanhtml: exiting (file larger than MaxHTMLNormalize) \n " ) ;
return CL_CLEAN ;
2007-01-14 13:25:14 +00:00
}
2020-03-27 16:06:22 -04:00
if ( ! ( tempname = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " html-tmp " ) ) )
2017-08-08 17:38:17 -04:00
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( tempname , 0700 ) ) {
2007-01-14 13:25:14 +00:00
cli_errmsg ( " cli_scanhtml: Can't create temporary directory %s \n " , tempname ) ;
2017-08-08 17:38:17 -04:00
free ( tempname ) ;
2005-01-14 14:56:09 +00:00
return CL_ETMPDIR ;
}
2008-03-18 22:44:39 +00:00
cli_dbgmsg ( " cli_scanhtml: using tempdir %s \n " , tempname ) ;
2017-08-08 17:38:17 -04:00
html_normalise_map ( map , tempname , NULL , ctx - > dconf ) ;
snprintf ( fullname , 1024 , " %s " PATHSEP " nocomment.html " , tempname ) ;
fd = open ( fullname , O_RDONLY | O_BINARY ) ;
2018-12-03 12:40:13 -05:00
if ( fd > = 0 ) {
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_scan_desc ( fd , ctx , CL_TYPE_HTML , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ) = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
close ( fd ) ;
}
2018-12-03 12:40:13 -05:00
if ( ret = = CL_CLEAN | | ( ret = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2017-08-08 17:38:17 -04:00
/* CL_ENGINE_MAX_HTMLNOTAGS */
curr_len = map - > len ;
2018-12-03 12:40:13 -05:00
if ( curr_len > ctx - > engine - > maxhtmlnotags ) {
2017-08-08 17:38:17 -04:00
/* we're not interested in scanning large files in notags form */
/* TODO: don't even create notags if file is over limit */
cli_dbgmsg ( " cli_scanhtml: skipping notags (normalized size over MaxHTMLNoTags) \n " ) ;
2018-12-03 12:40:13 -05:00
} else {
2017-08-08 17:38:17 -04:00
snprintf ( fullname , 1024 , " %s " PATHSEP " notags.html " , tempname ) ;
fd = open ( fullname , O_RDONLY | O_BINARY ) ;
2018-12-03 12:40:13 -05:00
if ( fd > = 0 ) {
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_scan_desc ( fd , ctx , CL_TYPE_HTML , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ) = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
close ( fd ) ;
}
}
}
2018-12-03 12:40:13 -05:00
if ( ret = = CL_CLEAN | | ( ret = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2017-08-08 17:38:17 -04:00
snprintf ( fullname , 1024 , " %s " PATHSEP " javascript " , tempname ) ;
fd = open ( fullname , O_RDONLY | O_BINARY ) ;
2018-12-03 12:40:13 -05:00
if ( fd > = 0 ) {
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_scan_desc ( fd , ctx , CL_TYPE_HTML , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ) = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
2018-12-03 12:40:13 -05:00
if ( ret = = CL_CLEAN | | ( ret = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_scan_desc ( fd , ctx , CL_TYPE_TEXT_ASCII , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ) = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
}
close ( fd ) ;
}
}
2018-12-03 12:40:13 -05:00
if ( ret = = CL_CLEAN | | ( ret = = CL_VIRUS & & SCAN_ALLMATCHES ) ) {
2017-08-08 17:38:17 -04:00
snprintf ( fullname , 1024 , " %s " PATHSEP " rfc2397 " , tempname ) ;
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_dir ( fullname , ctx ) ;
2020-03-23 18:53:12 -04:00
if ( CL_EOPEN = = ret ) {
/* If the directory doesn't exist, that's fine */
ret = CL_CLEAN ;
}
2017-08-08 17:38:17 -04:00
}
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( tempname ) ;
free ( tempname ) ;
2018-07-20 22:28:48 -04:00
if ( SCAN_ALLMATCHES & & viruses_found )
2017-08-08 17:38:17 -04:00
return CL_VIRUS ;
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanscript ( cli_ctx * ctx )
2017-08-08 17:38:17 -04:00
{
const unsigned char * buff ;
2019-03-12 12:45:19 -04:00
unsigned char * normalized = NULL ;
2017-08-08 17:38:17 -04:00
struct text_norm_state state ;
char * tmpname = NULL ;
2020-03-19 21:23:54 -04:00
int ofd = - 1 ;
cl_error_t ret ;
2017-08-08 17:38:17 -04:00
struct cli_matcher * troot ;
uint32_t maxpatlen , offset = 0 ;
struct cli_matcher * groot ;
struct cli_ac_data gmdata , tmdata ;
2019-03-12 12:45:19 -04:00
int gmdata_initialized = 0 ;
int tmdata_initialized = 0 ;
2017-08-08 17:38:17 -04:00
struct cli_ac_data * mdata [ 2 ] ;
fmap_t * map ;
2018-12-03 12:40:13 -05:00
size_t at = 0 ;
2017-08-08 17:38:17 -04:00
unsigned int viruses_found = 0 ;
uint64_t curr_len ;
struct cli_target_info info ;
if ( ! ctx | | ! ctx - > engine - > root )
return CL_ENULLARG ;
2018-12-03 12:40:13 -05:00
map = * ctx - > fmap ;
curr_len = map - > len ;
groot = ctx - > engine - > root [ 0 ] ;
troot = ctx - > engine - > root [ 7 ] ;
2017-08-08 17:38:17 -04:00
maxpatlen = troot ? troot - > maxpatlen : 0 ;
2019-01-08 00:09:08 -05:00
// Initialize info so it's safe to pass to destroy later
cli_targetinfo_init ( & info ) ;
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " in cli_scanscript() \n " ) ;
/* CL_ENGINE_MAX_SCRIPTNORMALIZE */
2018-12-03 12:40:13 -05:00
if ( curr_len > ctx - > engine - > maxscriptnormalize ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg("cli_scanscript: exiting (file larger than MaxScriptNormalize)\n");
2019-03-12 12:45:19 -04:00
ret = CL_CLEAN ;
goto done ;
2017-08-08 17:38:17 -04:00
}
2018-12-03 12:40:13 -05:00
if ( ! ( normalized = cli_malloc ( SCANBUFF + maxpatlen ) ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg("cli_scanscript: Unable to malloc %u bytes\n", SCANBUFF + maxpatlen);
2019-03-12 12:45:19 -04:00
ret = CL_EMEM ;
goto done ;
2017-08-08 17:38:17 -04:00
}
text_normalize_init ( & state , normalized , SCANBUFF + maxpatlen ) ;
2018-12-03 12:40:13 -05:00
if ( ( ret = cli_ac_initdata ( & tmdata , troot ? troot - > ac_partsigs : 0 , troot ? troot - > ac_lsigs : 0 , troot ? troot - > ac_reloff_num : 0 , CLI_DEFAULT_AC_TRACKLEN ) ) ) {
2019-03-12 12:45:19 -04:00
goto done ;
2017-08-08 17:38:17 -04:00
}
2019-03-12 12:45:19 -04:00
tmdata_initialized = 1 ;
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
if ( ( ret = cli_ac_initdata ( & gmdata , groot - > ac_partsigs , groot - > ac_lsigs , groot - > ac_reloff_num , CLI_DEFAULT_AC_TRACKLEN ) ) ) {
2019-03-12 12:45:19 -04:00
goto done ;
2017-08-08 17:38:17 -04:00
}
2019-03-12 12:45:19 -04:00
gmdata_initialized = 1 ;
2017-08-08 17:38:17 -04:00
/* dump to disk only if explicitly asked to
* or if necessary to check relative offsets ,
* otherwise we can process just in - memory */
2018-12-03 12:40:13 -05:00
if ( ctx - > engine - > keeptmp | | ( troot & & ( troot - > ac_reloff_num > 0 | | troot - > linked_bcs ) ) ) {
2020-03-19 21:23:54 -04:00
if ( ( ret = cli_gentempfd ( ctx - > sub_tmpdir , & tmpname , & ofd ) ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanscript: Can't generate temporary file/descriptor \n " ) ;
goto done ;
}
if ( ctx - > engine - > keeptmp )
cli_dbgmsg ( " cli_scanscript: saving normalized file to %s \n " , tmpname ) ;
}
mdata [ 0 ] = & tmdata ;
mdata [ 1 ] = & gmdata ;
/* If there's a relative offset in troot or triggered bytecodes, normalize to file.*/
2018-12-03 12:40:13 -05:00
if ( troot & & ( troot - > ac_reloff_num > 0 | | troot - > linked_bcs ) ) {
2017-08-08 17:38:17 -04:00
size_t map_off = 0 ;
2018-12-03 12:40:13 -05:00
while ( map_off < map - > len ) {
2017-08-08 17:38:17 -04:00
size_t written ;
if ( ! ( written = text_normalize_map ( & state , map , map_off ) ) )
break ;
map_off + = written ;
2018-12-03 12:40:13 -05:00
if ( write ( ofd , state . out , state . out_pos ) = = - 1 ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanscript: can't write to file %s \n " , tmpname ) ;
ret = CL_EWRITE ;
goto done ;
}
text_normalize_reset ( & state ) ;
}
/* Temporarily store the normalized file map in the context. */
2020-03-19 21:23:54 -04:00
* ctx - > fmap = fmap ( ofd , 0 , 0 , NULL ) ;
2018-12-03 12:40:13 -05:00
if ( ! ( * ctx - > fmap ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanscript: could not map file %s \n " , tmpname ) ;
2018-12-03 12:40:13 -05:00
} else {
2017-08-08 17:38:17 -04:00
/* scan map */
2020-03-21 14:15:28 -04:00
ret = cli_scan_fmap ( ctx , CL_TYPE_TEXT_ASCII , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ;
2018-12-03 12:40:13 -05:00
if ( ret = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
viruses_found + + ;
}
funmap ( * ctx - > fmap ) ;
}
* ctx - > fmap = map ;
2018-12-03 12:40:13 -05:00
} else {
2017-08-08 17:38:17 -04:00
/* Normalizing to a temporary file and rescanning it is moderately costly,
 * so when there are no relative offsets or linked bytecodes, scan the
 * normalized buffers in memory instead. */
2018-12-03 12:40:13 -05:00
if ( troot ) {
2017-08-08 17:38:17 -04:00
cli_targetinfo ( & info , 7 , map ) ;
ret = cli_ac_caloff ( troot , & tmdata , & info ) ;
if ( ret )
goto done ;
}
2005-01-14 14:56:09 +00:00
2018-12-03 12:40:13 -05:00
while ( 1 ) {
2017-08-08 17:38:17 -04:00
size_t len = MIN ( map - > pgsz , map - > len - at ) ;
2018-12-03 12:40:13 -05:00
buff = fmap_need_off_once ( map , at , len ) ;
2017-08-08 17:38:17 -04:00
at + = len ;
2018-12-03 12:40:13 -05:00
if ( ! buff | | ! len | | state . out_pos + len > state . out_len ) {
2017-08-08 17:38:17 -04:00
/* flush if error/EOF, or too little buffer space left */
2018-12-03 12:40:13 -05:00
if ( ( ofd ! = - 1 ) & & ( write ( ofd , state . out , state . out_pos ) = = - 1 ) ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanscript: can't write to file %s \n " , tmpname ) ;
close ( ofd ) ;
ofd = - 1 ;
/* we can continue to scan in memory */
}
/* when we flush the buffer also scan */
2020-03-21 14:15:28 -04:00
if ( cli_scan_buff ( state . out , state . out_pos , offset , ctx , CL_TYPE_TEXT_ASCII , mdata ) = = CL_VIRUS ) {
2018-07-20 22:28:48 -04:00
if ( SCAN_ALLMATCHES )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
2018-12-03 12:40:13 -05:00
else {
2017-08-08 17:38:17 -04:00
ret = CL_VIRUS ;
break ;
}
}
if ( ctx - > scanned )
* ctx - > scanned + = state . out_pos / CL_COUNT_PRECISION ;
offset + = state . out_pos ;
/* carry over maxpatlen from previous buffer */
if ( state . out_pos > maxpatlen )
memmove ( state . out , state . out + state . out_pos - maxpatlen , maxpatlen ) ;
text_normalize_reset ( & state ) ;
state . out_pos = maxpatlen ;
}
if ( ! len )
break ;
2018-12-03 12:40:13 -05:00
if ( ! buff | | text_normalize_buffer ( & state , buff , len ) ! = len ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanscript: short read during normalizing \n " ) ;
2012-11-27 17:15:02 -05:00
}
}
2005-01-14 14:56:09 +00:00
}
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_VIRUS | | SCAN_ALLMATCHES ) {
2017-08-08 17:38:17 -04:00
if ( ( ret = cli_exp_eval ( ctx , troot , & tmdata , NULL , NULL ) ) = = CL_VIRUS )
viruses_found + + ;
2018-07-20 22:28:48 -04:00
if ( ret ! = CL_VIRUS | | SCAN_ALLMATCHES )
2017-08-08 17:38:17 -04:00
if ( ( ret = cli_exp_eval ( ctx , groot , & gmdata , NULL , NULL ) ) = = CL_VIRUS )
viruses_found + + ;
2008-07-08 11:33:32 +00:00
}
2017-08-08 17:38:17 -04:00
done :
2019-01-08 00:09:08 -05:00
cli_targetinfo_destroy ( & info ) ;
2019-03-12 12:45:19 -04:00
if ( NULL ! = normalized ) {
free ( normalized ) ;
}
if ( tmdata_initialized ) {
cli_ac_freedata ( & tmdata ) ;
}
if ( gmdata_initialized ) {
cli_ac_freedata ( & gmdata ) ;
}
2017-08-08 17:38:17 -04:00
if ( ofd ! = - 1 )
close ( ofd ) ;
2018-12-03 12:40:13 -05:00
if ( tmpname ! = NULL ) {
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_unlink ( tmpname ) ;
free ( tmpname ) ;
2005-01-14 14:56:09 +00:00
}
2017-08-08 17:38:17 -04:00
if ( viruses_found )
return CL_VIRUS ;
2005-01-14 14:56:09 +00:00
return ret ;
}
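/* Illustration only, not part of libclamav: a minimal sketch of the
 * carry-over idea used by the in-memory branch of cli_scanscript() above,
 * where the tail of each flushed buffer is moved to the front of the next
 * one so that a pattern straddling two buffers is still seen. All names
 * (sketch_*, SKETCH_CHUNK, SKETCH_MAXPAT) are hypothetical, the matcher is
 * a naive memcmp() search rather than the Aho-Corasick matcher, and the
 * sketch carries patlen - 1 bytes where the code above carries maxpatlen. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK 8   /* hypothetical; tiny so boundaries are easy to hit */
#define SKETCH_MAXPAT 16 /* hypothetical upper bound on pattern length */

/* Naive substring test over one buffer. */
static int sketch_buf_contains(const unsigned char *buf, size_t len,
                               const unsigned char *pat, size_t patlen)
{
    size_t i;

    if (patlen == 0 || len < patlen)
        return 0;
    for (i = 0; i + patlen <= len; i++) {
        if (memcmp(buf + i, pat, patlen) == 0)
            return 1;
    }
    return 0;
}

/* Scan data chunk by chunk, carrying the tail of each window forward. */
static int sketch_chunked_contains(const unsigned char *data, size_t len,
                                   const char *pat)
{
    unsigned char window[SKETCH_CHUNK + SKETCH_MAXPAT];
    size_t patlen  = strlen(pat);
    size_t carried = 0, at = 0;

    assert(patlen > 0 && patlen <= SKETCH_MAXPAT);

    while (at < len) {
        size_t n    = (len - at < SKETCH_CHUNK) ? len - at : SKETCH_CHUNK;
        size_t used = carried + n;
        size_t keep;

        /* append the next chunk after the bytes carried over */
        memcpy(window + carried, data + at, n);
        at += n;

        if (sketch_buf_contains(window, used, (const unsigned char *)pat, patlen))
            return 1;

        /* keep the last patlen - 1 bytes: enough to complete any match
         * that starts in this window and ends in the next one */
        keep = (used < patlen - 1) ? used : patlen - 1;
        memmove(window, window + used - keep, keep);
        carried = keep;
    }
    return 0;
}
```

/* With an 8-byte chunk, "BADSIG" in "aaaaaaBADSIGbbbb" is split as "BA" and
 * "DSIG" across two chunks; it is found only because the tail is carried. */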
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanhtml_utf16 ( cli_ctx * ctx )
2006-10-25 15:40:47 +00:00
{
2017-08-08 17:38:17 -04:00
char * tempname , * decoded ;
const char * buff ;
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
int fd , bytes ;
2018-12-03 12:40:13 -05:00
size_t at = 0 ;
2017-08-08 17:38:17 -04:00
fmap_t * map = * ctx - > fmap ;
2006-10-25 15:40:47 +00:00
cli_dbgmsg ( " in cli_scanhtml_utf16() \n " ) ;
2020-03-27 16:06:22 -04:00
if ( ! ( tempname = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " html-utf16-tmp " ) ) )
2017-08-08 17:38:17 -04:00
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2020-03-30 20:42:44 -04:00
if ( ( fd = open ( tempname , O_RDWR | O_CREAT | O_TRUNC | O_BINARY , S_IRUSR | S_IWUSR ) ) < 0 ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanhtml_utf16: Can't create file %s \n " , tempname ) ;
free ( tempname ) ;
return CL_EOPEN ;
2006-10-25 15:40:47 +00:00
}
2008-03-18 22:44:39 +00:00
cli_dbgmsg ( " cli_scanhtml_utf16: using tempfile %s \n " , tempname ) ;
2018-12-03 12:40:13 -05:00
while ( at < map - > len ) {
2017-08-08 17:38:17 -04:00
bytes = MIN ( map - > len - at , map - > pgsz * 16 ) ;
2018-12-03 12:40:13 -05:00
if ( ! ( buff = fmap_need_off_once ( map , at , bytes ) ) ) {
2017-08-08 17:38:17 -04:00
close ( fd ) ;
cli_unlink ( tempname ) ;
free ( tempname ) ;
return CL_EREAD ;
}
at + = bytes ;
decoded = cli_utf16toascii ( buff , bytes ) ;
2018-12-03 12:40:13 -05:00
if ( decoded ) {
if ( write ( fd , decoded , bytes / 2 ) = = - 1 ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanhtml_utf16: Can't write to file %s \n " , tempname ) ;
free ( decoded ) ;
close ( fd ) ;
cli_unlink ( tempname ) ;
free ( tempname ) ;
return CL_EWRITE ;
}
free ( decoded ) ;
}
2006-10-25 15:40:47 +00:00
}
2020-03-19 21:23:54 -04:00
* ctx - > fmap = fmap ( fd , 0 , 0 , NULL ) ;
2018-12-03 12:40:13 -05:00
if ( * ctx - > fmap ) {
2017-08-08 17:38:17 -04:00
ret = cli_scanhtml ( ctx ) ;
funmap ( * ctx - > fmap ) ;
2018-12-03 12:40:13 -05:00
} else
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanhtml_utf16: fmap of %s failed \n " , tempname ) ;
2009-08-31 06:16:12 +02:00
* ctx - > fmap = map ;
2006-10-25 15:40:47 +00:00
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
2017-08-08 17:38:17 -04:00
if ( cli_unlink ( tempname ) )
ret = CL_EUNLINK ;
2018-12-03 12:40:13 -05:00
} else
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanhtml_utf16: Decoded HTML data saved in %s \n " , tempname ) ;
2006-10-25 15:40:47 +00:00
free ( tempname ) ;
return ret ;
}
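/* Illustration only: cli_scanhtml_utf16() above writes bytes / 2 output
 * bytes per input block, i.e. one output byte per 2-byte UTF-16 code unit.
 * The sketch below shows one way such a narrowing could work, assuming
 * little-endian input. sketch_utf16le_to_ascii() is a hypothetical helper
 * and is NOT the implementation of cli_utf16toascii(), whose exact mapping
 * is not shown in this file. */

```c
#include <assert.h>
#include <stddef.h>

/* Narrow UTF-16LE code units to ASCII; anything outside 0x00..0x7f becomes
 * '?'. A trailing odd byte is ignored. Returns the number of bytes written
 * to out, which must have room for inlen / 2 bytes. */
static size_t sketch_utf16le_to_ascii(const unsigned char *in, size_t inlen,
                                      char *out)
{
    size_t i, j = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i + 1 < inlen; i += 2) {
        unsigned int unit = in[i] | ((unsigned int)in[i + 1] << 8);

        out[j++] = (unit < 0x80) ? (char)unit : '?';
    }
    return j;
}
```

/* The output is exactly half the size of the (even-length) input, which is
 * why the caller can size its write as bytes / 2. */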
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanole2 ( cli_ctx * ctx )
2004-04-20 22:33:42 +00:00
{
2020-08-06 22:15:33 -07:00
char * dir = NULL ;
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
2020-04-29 14:19:41 -07:00
struct uniq * files = NULL ;
2020-08-06 22:15:33 -07:00
int has_vba = 0 , has_xlm = 0 , has_macros = 0 , viruses_found = 0 ;
2004-09-07 21:19:02 +00:00
2004-04-20 22:33:42 +00:00
cli_dbgmsg ( " in cli_scanole2() \n " ) ;
2020-08-06 22:15:33 -07:00
if ( ctx - > engine - > maxreclevel & & ctx - > recursion > = ctx - > engine - > maxreclevel ) {
ret = CL_EMAXREC ;
goto done ;
}
2008-02-11 13:18:41 +00:00
2004-04-20 22:33:42 +00:00
/* generate the temporary directory */
2020-08-06 22:15:33 -07:00
if ( NULL = = ( dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " ole2-tmp " ) ) ) {
ret = CL_EMEM ;
goto done ;
}
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " OLE2: Can't create temporary directory %s \n " , dir ) ;
free ( dir ) ;
2020-08-12 18:14:39 -07:00
dir = NULL ;
2020-08-06 22:15:33 -07:00
ret = CL_ETMPDIR ;
goto done ;
2004-04-20 22:33:42 +00:00
}
2020-04-29 14:19:41 -07:00
ret = cli_ole2_extract ( dir , ctx , & files , & has_vba , & has_xlm ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_CLEAN & & ret ! = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " OLE2: %s \n " , cl_strerror ( ret ) ) ;
2020-08-06 22:15:33 -07:00
goto done ;
}
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
goto done ;
}
2004-04-20 22:33:42 +00:00
}
2020-04-29 14:19:41 -07:00
if ( has_vba & & files ) {
2008-05-27 16:30:47 +00:00
ctx - > recursion + + ;
2008-02-11 13:18:41 +00:00
2020-08-06 22:15:33 -07:00
ret = cli_vba_scandir ( dir , ctx , files , & has_macros ) ;
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
ctx - > recursion - - ;
goto done ;
}
2020-04-28 13:32:07 -07:00
}
2020-04-29 14:19:41 -07:00
2020-08-07 23:48:20 -07:00
ret = cli_vba_scandir_new ( dir , ctx , files , & has_macros ) ;
2020-08-06 22:15:33 -07:00
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
ctx - > recursion - - ;
goto done ;
}
}
2020-04-29 14:19:41 -07:00
ctx - > recursion - - ;
}
2020-08-06 22:15:33 -07:00
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
goto done ;
}
}
2020-04-29 14:19:41 -07:00
if ( has_xlm & & files ) {
ctx - > recursion + + ;
ret = cli_xlm_scandir ( dir , ctx , files ) ;
2020-08-06 22:15:33 -07:00
if ( CL_VIRUS = = ret ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
ctx - > recursion - - ;
goto done ;
}
}
2017-08-08 17:38:17 -04:00
ctx - > recursion - - ;
2004-04-20 22:33:42 +00:00
}
2020-08-07 23:48:20 -07:00
if ( ( has_xlm | | has_vba ) & & files ) {
2020-08-10 14:57:28 -07:00
ctx - > recursion + + ;
2020-08-07 23:48:20 -07:00
if ( CL_VIRUS = = cli_magic_scan_dir ( dir , ctx ) ) {
viruses_found + + ;
if ( ! SCAN_ALLMATCHES ) {
2020-08-10 14:57:28 -07:00
ctx - > recursion - - ;
2020-08-07 23:48:20 -07:00
goto done ;
}
}
2020-08-10 14:57:28 -07:00
ctx - > recursion - - ;
2020-08-06 22:15:33 -07:00
}
done :
2020-04-29 14:19:41 -07:00
if ( files ) {
uniq_free ( files ) ;
}
2020-08-06 22:15:33 -07:00
if ( NULL ! = dir ) {
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
free ( dir ) ;
}
2020-08-07 23:48:20 -07:00
if ( viruses_found > 0 ) {
ret = CL_VIRUS ;
}
2004-04-20 22:33:42 +00:00
return ret ;
2003-07-29 15:48:06 +00:00
}
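/* Illustration only: a simplified model of the result handling in
 * cli_scanole2() above. Each stage yields a verdict; in all-match mode a
 * detection is counted and scanning continues, otherwise the first
 * detection jumps to the single cleanup label. At the end, any recorded
 * detection overrides the last stage's return code. The enum and function
 * names are hypothetical, and unlike the real function this sketch stops
 * on every hard error. */

```c
#include <assert.h>
#include <stddef.h>

enum sketch_verdict { SKETCH_CLEAN = 0, SKETCH_VIRUS = 1, SKETCH_ERROR = 2 };

/* stages[] stands in for the return codes of successive scan calls. */
static enum sketch_verdict sketch_run_stages(const enum sketch_verdict *stages,
                                             size_t n, int allmatch,
                                             unsigned int *found_out)
{
    enum sketch_verdict ret = SKETCH_CLEAN;
    unsigned int found      = 0;
    size_t i;

    assert(stages != NULL || n == 0);

    for (i = 0; i < n; i++) {
        ret = stages[i];
        if (ret == SKETCH_VIRUS) {
            found++;
            if (!allmatch)
                goto done; /* first match is enough */
        } else if (ret != SKETCH_CLEAN) {
            goto done; /* hard error: stop scanning */
        }
    }

done:
    /* single exit point: resources would be released here */
    if (found_out != NULL)
        *found_out = found;
    if (found > 0)
        ret = SKETCH_VIRUS; /* a detection wins over a later CLEAN or error */
    return ret;
}
```

/* For the stage results { VIRUS, CLEAN, VIRUS } the function returns VIRUS
 * in both modes, but counts two detections in all-match mode and one
 * otherwise. */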
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scantar ( cli_ctx * ctx , unsigned int posix )
2004-09-07 21:19:02 +00:00
{
2017-08-08 17:38:17 -04:00
char * dir ;
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
2004-09-07 21:19:02 +00:00
cli_dbgmsg ( " in cli_scantar() \n " ) ;
/* generate temporary directory */
2020-03-27 16:06:22 -04:00
if ( ! ( dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " tar-tmp " ) ) )
2017-08-08 17:38:17 -04:00
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " Tar: Can't create temporary directory %s \n " , dir ) ;
free ( dir ) ;
return CL_ETMPDIR ;
2004-09-07 21:19:02 +00:00
}
2011-06-13 11:57:59 +03:00
ret = cli_untar ( dir , posix , ctx ) ;
2004-09-07 21:19:02 +00:00
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
2004-09-07 21:19:02 +00:00
free ( dir ) ;
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanscrenc ( cli_ctx * ctx )
2004-09-13 10:30:14 +00:00
{
2017-08-08 17:38:17 -04:00
char * tempname ;
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
2004-09-13 10:30:14 +00:00
cli_dbgmsg ( " in cli_scanscrenc() \n " ) ;
2020-03-27 16:06:22 -04:00
if ( ! ( tempname = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " screnc-tmp " ) ) )
2017-08-08 17:38:17 -04:00
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( tempname , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg("ScrEnc: Can't create temporary directory %s\n", tempname);
free ( tempname ) ;
return CL_ETMPDIR ;
2004-09-13 10:30:14 +00:00
}
2011-06-13 12:03:26 +03:00
if ( html_screnc_decode ( * ctx - > fmap , tempname ) )
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_dir ( tempname , ctx ) ;
2004-09-13 10:30:14 +00:00
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( tempname ) ;
2004-09-13 10:30:14 +00:00
free ( tempname ) ;
return ret ;
}
2004-09-18 00:14:00 +00:00
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanriff ( cli_ctx * ctx )
2005-02-05 15:50:18 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
2005-02-05 15:50:18 +00:00
2017-04-18 12:03:36 -04:00
if ( cli_check_riff_exploit ( ctx ) = = 2 )
2017-08-08 17:38:17 -04:00
ret = cli_append_virus ( ctx , " Heuristics.Exploit.W32.MS05-002 " ) ;
2005-02-05 15:50:18 +00:00
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scancryptff ( cli_ctx * ctx )
2005-11-14 21:02:26 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN;
int ndesc; /* file descriptor returned by open(), not a cl_error_t */
2017-08-08 17:38:17 -04:00
unsigned int i ;
const unsigned char * src ;
unsigned char * dest = NULL ;
char * tempfile ;
size_t pos ;
size_t bread ;
2005-11-14 21:02:26 +00:00
/* Skip the CryptFF file header */
2011-06-13 12:12:01 +03:00
pos = 0x10 ;
2005-11-14 21:02:26 +00:00
2018-12-03 12:40:13 -05:00
if ( ( dest = ( unsigned char * ) cli_malloc ( FILEBUFF ) ) = = NULL ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " CryptFF: Can't allocate memory \n " ) ;
2005-11-14 21:02:26 +00:00
return CL_EMEM ;
}
2020-03-27 16:06:22 -04:00
if ( ! ( tempfile = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " cryptff " ) ) ) {
2017-08-08 17:38:17 -04:00
free ( dest ) ;
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
}
2020-03-30 20:42:44 -04:00
if ( ( ndesc = open ( tempfile , O_RDWR | O_CREAT | O_TRUNC | O_BINARY , S_IRUSR | S_IWUSR ) ) < 0 ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " CryptFF: Can't create file %s \n " , tempfile ) ;
free ( dest ) ;
free ( tempfile ) ;
return CL_ECREAT ;
2005-11-14 21:02:26 +00:00
}
2018-12-03 12:40:13 -05:00
for ( ; ( src = fmap_need_off_once_len ( * ctx - > fmap , pos , FILEBUFF , & bread ) ) & & bread ; pos + = bread ) {
2017-08-08 17:38:17 -04:00
for ( i = 0 ; i < bread ; i + + )
dest [ i ] = src [ i ] ^ ( unsigned char ) 0xff ;
2019-05-04 15:54:54 -04:00
if ( cli_writen ( ndesc , dest , bread ) = = ( size_t ) - 1 ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " CryptFF: Can't write to descriptor %d \n " , ndesc ) ;
free ( dest ) ;
close ( ndesc ) ;
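if (!ctx->engine->keeptmp)
    cli_unlink(tempfile); /* don't leave a partially written temp file behind */
free(tempfile);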
return CL_EWRITE ;
}
2005-11-14 21:02:26 +00:00
}
free ( dest ) ;
cli_dbgmsg ( " CryptFF: Scanning decrypted data \n " ) ;
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_magic_scan_desc ( ndesc , tempfile , ctx , NULL ) ) = = CL_VIRUS )
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " CryptFF: Infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
2005-11-14 21:02:26 +00:00
close ( ndesc ) ;
2017-08-08 17:38:17 -04:00
if ( ctx - > engine - > keeptmp )
cli_dbgmsg("CryptFF: Decrypted data saved in %s\n", tempfile);
else if ( cli_unlink ( tempfile ) )
ret = CL_EUNLINK ;
2005-11-14 21:02:26 +00:00
free ( tempfile ) ;
return ret ;
}
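/* Illustration only: the transform applied by cli_scancryptff() above is a
 * byte-wise XOR with 0xff over everything after the 0x10-byte header. The
 * sketch below does the same on an in-memory buffer; the names
 * sketch_cryptff_decode and SKETCH_HDR_LEN are hypothetical, and no temp
 * file or fmap handling is modeled. */

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HDR_LEN 0x10 /* header size skipped by the code above */

/* Decode the payload of an in-memory CryptFF image into out, which must
 * have room for len - SKETCH_HDR_LEN bytes. Returns the payload length,
 * or 0 if the input is no longer than the header. */
static size_t sketch_cryptff_decode(const unsigned char *file, size_t len,
                                    unsigned char *out)
{
    size_t i, n;

    assert(file != NULL && out != NULL);

    if (len <= SKETCH_HDR_LEN)
        return 0;

    n = len - SKETCH_HDR_LEN;
    for (i = 0; i < n; i++)
        out[i] = (unsigned char)(file[SKETCH_HDR_LEN + i] ^ 0xff);
    return n;
}
```

/* XOR with 0xff is its own inverse, so running the same loop over the
 * decoded payload reproduces the encoded bytes. */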
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanpdf ( cli_ctx * ctx , off_t offset )
2005-05-03 00:10:46 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret ;
2020-03-27 16:06:22 -04:00
char * dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " pdf-tmp " ) ;
2005-05-03 00:10:46 +00:00
2017-08-08 17:38:17 -04:00
if ( ! dir )
return CL_EMEM ;
2005-05-03 00:10:46 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Can't create temporary directory for PDF file %s \n " , dir ) ;
free ( dir ) ;
return CL_ETMPDIR ;
2005-05-03 00:10:46 +00:00
}
2009-08-31 05:37:43 +02:00
ret = cli_pdf ( dir , ctx , offset ) ;
2005-05-03 00:10:46 +00:00
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
2005-05-03 00:10:46 +00:00
free ( dir ) ;
return ret ;
}
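The NAME_MAX versus PATH_MAX point in the commit note above can be illustrated with a minimal, standalone sketch. It is not ClamAV code: `join_path` is a hypothetical helper, and the fallback values for the two limits are assumptions for platforms whose `<limits.h>` does not define them. NAME_MAX bounds a single file name component while PATH_MAX bounds a whole path, so a buffer sized by NAME_MAX may be too small for a deeply nested temp path.

```c
#include <limits.h>
#include <stdio.h>

/* Fallbacks are assumptions, used only if <limits.h> lacks the macros. */
#ifndef NAME_MAX
#define NAME_MAX 255
#endif
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper: write "dir/name" into out, which holds cap bytes.
 * Returns 0 on success, or -1 if the joined path would not fit
 * (snprintf reports the length it wanted to write, so truncation is detectable).
 */
static int join_path(char *out, size_t cap, const char *dir, const char *name)
{
    int n = snprintf(out, cap, "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return 0;
}
```

On typical systems where NAME_MAX is 255, joining a 300-character directory with a short file name fails for a `char[NAME_MAX + 1]` buffer and succeeds for a `char[PATH_MAX]` buffer.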
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scantnef ( cli_ctx * ctx )
2005-03-25 15:17:13 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret ;
2020-03-27 16:06:22 -04:00
char * dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " tnef-tmp " ) ;
2005-03-25 15:17:13 +00:00
2017-08-08 17:38:17 -04:00
if ( ! dir )
return CL_EMEM ;
2005-03-25 15:17:13 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Can't create temporary directory for tnef file %s \n " , dir ) ;
free ( dir ) ;
return CL_ETMPDIR ;
2005-03-25 15:17:13 +00:00
}
2011-06-13 11:47:41 +03:00
ret = cli_tnef ( dir , ctx ) ;
2005-03-25 15:17:13 +00:00
2017-08-08 17:38:17 -04:00
if ( ret = = CL_CLEAN )
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_dir ( dir , ctx ) ;
2005-03-25 15:17:13 +00:00
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
2005-03-25 15:17:13 +00:00
free ( dir ) ;
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanuuencoded ( cli_ctx * ctx )
2006-01-21 18:37:48 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret ;
2020-03-27 16:06:22 -04:00
char * dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " uuencoded-tmp " ) ;
2006-01-21 18:37:48 +00:00
2017-08-08 17:38:17 -04:00
if ( ! dir )
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Can't create temporary directory for uuencoded file %s \n " , dir ) ;
free ( dir ) ;
return CL_ETMPDIR ;
2006-01-21 18:37:48 +00:00
}
2009-09-10 03:19:43 +02:00
ret = cli_uuencode ( dir , * ctx - > fmap ) ;
2006-01-21 18:37:48 +00:00
2017-08-08 17:38:17 -04:00
if ( ret = = CL_CLEAN )
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_dir ( dir , ctx ) ;
2006-01-21 18:37:48 +00:00
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
2006-01-21 18:37:48 +00:00
free ( dir ) ;
return ret ;
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scanmail ( cli_ctx * ctx )
2003-07-29 15:48:06 +00:00
{
2017-08-08 17:38:17 -04:00
char * dir ;
2020-03-19 21:23:54 -04:00
cl_error_t ret ;
2017-08-08 17:38:17 -04:00
unsigned int viruses_found = 0 ;
2003-07-29 15:48:06 +00:00
2008-02-06 21:19:10 +00:00
cli_dbgmsg ( " Starting cli_scanmail(), recursion = %u \n " , ctx - > recursion ) ;
2003-07-29 15:48:06 +00:00
2004-11-16 17:10:47 +00:00
/* generate the temporary directory */
2020-03-27 16:06:22 -04:00
if ( ! ( dir = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " mail-tmp " ) ) )
2017-08-08 17:38:17 -04:00
return CL_EMEM ;
2008-03-06 20:19:22 +00:00
2018-12-03 12:40:13 -05:00
if ( mkdir ( dir , 0700 ) ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " Mail: Can't create temporary directory %s \n " , dir ) ;
free ( dir ) ;
return CL_ETMPDIR ;
2004-11-16 17:10:47 +00:00
}
2003-07-29 15:48:06 +00:00
2004-11-16 17:10:47 +00:00
/*
* Extract the attachments into the temporary directory
*/
2018-12-03 12:40:13 -05:00
if ( ( ret = cli_mbox ( dir , ctx ) ) ) {
2018-07-20 22:28:48 -04:00
if ( ret = = CL_VIRUS & & SCAN_ALLMATCHES )
2017-08-08 17:38:17 -04:00
viruses_found + + ;
2018-12-03 12:40:13 -05:00
else {
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
free ( dir ) ;
return ret ;
}
2004-11-16 17:10:47 +00:00
}
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_dir ( dir , ctx ) ;
2004-11-16 17:10:47 +00:00
2017-08-08 17:38:17 -04:00
if ( ! ctx - > engine - > keeptmp )
cli_rmdirs ( dir ) ;
2004-11-16 17:10:47 +00:00
free ( dir ) ;
2016-06-08 16:25:34 -04:00
if ( viruses_found )
2017-08-08 17:38:17 -04:00
return CL_VIRUS ;
2004-11-16 17:10:47 +00:00
return ret ;
2003-07-29 15:48:06 +00:00
}
2020-03-19 21:23:54 -04:00
static cl_error_t cli_scan_structured ( cli_ctx * ctx )
2008-04-16 18:47:42 +00:00
{
2017-08-08 17:38:17 -04:00
char buf [ 8192 ] ;
2019-08-16 17:18:59 -07:00
size_t result = 0 ;
2018-12-03 12:40:13 -05:00
unsigned int cc_count = 0 ;
2017-08-08 17:38:17 -04:00
unsigned int ssn_count = 0 ;
2018-12-03 12:40:13 -05:00
int done = 0 ;
2017-08-08 17:38:17 -04:00
fmap_t * map ;
size_t pos = 0 ;
2016-04-21 12:13:57 -04:00
int ( * ccfunc ) ( const unsigned char * buffer , size_t length , int cc_only ) ;
2019-05-04 15:54:54 -04:00
int ( * ssnfunc ) ( const unsigned char * buffer , size_t length ) ;
2017-08-08 17:38:17 -04:00
unsigned int viruses_found = 0 ;
if ( ctx = = NULL )
return CL_ENULLARG ;
2008-04-16 18:47:42 +00:00
2012-07-23 14:43:36 -04:00
map = * ctx - > fmap ;
2012-07-12 10:21:00 -04:00
2017-08-08 17:38:17 -04:00
if ( ctx - > engine - > min_cc_count = = 1 )
ccfunc = dlp_has_cc ;
2008-04-16 18:47:42 +00:00
else
2017-08-08 17:38:17 -04:00
ccfunc = dlp_get_cc_count ;
2008-04-16 18:47:42 +00:00
2018-12-03 12:40:13 -05:00
switch ( SCAN_HEURISTIC_STRUCTURED_SSN_NORMAL | SCAN_HEURISTIC_STRUCTURED_SSN_STRIPPED ) {
case ( CL_SCAN_HEURISTIC_STRUCTURED_SSN_NORMAL | CL_SCAN_HEURISTIC_STRUCTURED_SSN_STRIPPED ) :
if ( ctx - > engine - > min_ssn_count = = 1 )
ssnfunc = dlp_has_ssn ;
else
ssnfunc = dlp_get_ssn_count ;
break ;
2008-04-16 18:47:42 +00:00
2018-12-03 12:40:13 -05:00
case CL_SCAN_HEURISTIC_STRUCTURED_SSN_NORMAL :
if ( ctx - > engine - > min_ssn_count = = 1 )
ssnfunc = dlp_has_normal_ssn ;
else
ssnfunc = dlp_get_normal_ssn_count ;
break ;
2008-04-16 18:47:42 +00:00
2018-12-03 12:40:13 -05:00
case CL_SCAN_HEURISTIC_STRUCTURED_SSN_STRIPPED :
if ( ctx - > engine - > min_ssn_count = = 1 )
ssnfunc = dlp_has_stripped_ssn ;
else
ssnfunc = dlp_get_stripped_ssn_count ;
break ;
2008-04-18 17:14:20 +00:00
2018-12-03 12:40:13 -05:00
default :
ssnfunc = NULL ;
2008-04-16 18:47:42 +00:00
}
2019-05-04 15:54:54 -04:00
while ( ! done & & ( ( result = fmap_readn ( map , buf , pos , 8191 ) ) > 0 ) & & ( result ! = ( size_t ) - 1 ) ) {
2017-08-08 17:38:17 -04:00
pos + = result ;
2016-04-21 12:13:57 -04:00
if ( ( cc_count + = ccfunc ( ( const unsigned char * ) buf , result ,
2019-08-27 17:33:22 -04:00
( ctx - > options - > heuristic & CL_SCAN_HEURISTIC_STRUCTURED_CC ) ? 1 : 0 ) ) > = ctx - > engine - > min_cc_count ) {
2017-08-08 17:38:17 -04:00
done = 1 ;
}
2008-04-18 17:14:20 +00:00
2018-12-03 12:40:13 -05:00
if ( ssnfunc & & ( ( ssn_count + = ssnfunc ( ( const unsigned char * ) buf , result ) ) > = ctx - > engine - > min_ssn_count ) ) {
2017-08-08 17:38:17 -04:00
done = 1 ;
}
2008-04-16 18:47:42 +00:00
}
2018-12-03 12:40:13 -05:00
if ( cc_count ! = 0 & & cc_count > = ctx - > engine - > min_cc_count ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scan_structured: %u credit card numbers detected \n " , cc_count ) ;
2018-12-03 12:40:13 -05:00
if ( CL_VIRUS = = cli_append_virus ( ctx , " Heuristics.Structured.CreditCardNumber " ) ) {
if ( SCAN_ALLMATCHES ) {
2017-04-18 12:03:36 -04:00
viruses_found + + ;
2018-12-03 12:40:13 -05:00
} else {
2017-04-18 12:03:36 -04:00
return CL_VIRUS ;
2017-08-08 17:38:17 -04:00
}
}
2008-04-16 18:47:42 +00:00
}
2018-12-03 12:40:13 -05:00
if ( ssn_count ! = 0 & & ssn_count > = ctx - > engine - > min_ssn_count ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scan_structured: %u social security numbers detected \n " , ssn_count ) ;
2018-12-03 12:40:13 -05:00
if ( CL_VIRUS = = cli_append_virus ( ctx , " Heuristics.Structured.SSN " ) ) {
if ( SCAN_ALLMATCHES ) {
2017-04-18 12:03:36 -04:00
viruses_found + + ;
2018-12-03 12:40:13 -05:00
} else {
2017-04-18 12:03:36 -04:00
return CL_VIRUS ;
2017-08-08 17:38:17 -04:00
}
}
2008-04-16 18:47:42 +00:00
}
2016-06-08 16:25:34 -04:00
if ( viruses_found )
2017-08-08 17:38:17 -04:00
return CL_VIRUS ;
2008-04-16 18:47:42 +00:00
return CL_CLEAN ;
}
2019-05-04 15:54:54 -04:00
static cl_error_t cli_scanembpe ( cli_ctx * ctx , off_t offset )
2007-03-12 21:31:40 +00:00
{
2019-05-04 15:54:54 -04:00
cl_error_t ret = CL_CLEAN ;
int fd ;
size_t bytes ;
size_t size = 0 ;
size_t todo ;
2017-08-08 17:38:17 -04:00
const char * buff ;
char * tmpname ;
fmap_t * map = * ctx - > fmap ;
unsigned int corrupted_input ;
2007-03-12 21:31:40 +00:00
2020-03-27 16:06:22 -04:00
tmpname = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , " embedded-pe " ) ;
2017-08-08 17:38:17 -04:00
if ( ! tmpname )
return CL_EMEM ;
2007-03-12 21:31:40 +00:00
2020-03-30 20:42:44 -04:00
if ( ( fd = open ( tmpname , O_RDWR | O_CREAT | O_TRUNC | O_BINARY , S_IRUSR | S_IWUSR ) ) < 0 ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " cli_scanembpe: Can't create file %s \n " , tmpname ) ;
free ( tmpname ) ;
return CL_ECREAT ;
2007-03-12 21:31:40 +00:00
}
2010-07-29 03:55:24 +02:00
todo = map - > len - offset ;
2018-12-03 12:40:13 -05:00
while ( 1 ) {
2017-08-08 17:38:17 -04:00
bytes = MIN ( todo , map - > pgsz ) ;
if ( ! bytes )
break ;
2018-12-03 12:40:13 -05:00
if ( ! ( buff = fmap_need_off_once ( map , offset + size , bytes ) ) ) {
2017-08-08 17:38:17 -04:00
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
}
free ( tmpname ) ;
return CL_EREAD ;
}
size + = bytes ;
todo - = bytes ;
if ( cli_checklimits ( " cli_scanembpe " , ctx , size , 0 , 0 ) ! = CL_CLEAN )
break ;
2018-12-03 12:40:13 -05:00
if ( cli_writen ( fd , buff , bytes ) ! = bytes ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanembpe: Can't write to temporary file \n " ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
}
free ( tmpname ) ;
return CL_EWRITE ;
}
2007-03-12 21:31:40 +00:00
}
2008-02-13 10:34:58 +00:00
ctx - > recursion + + ;
2018-12-03 12:40:13 -05:00
corrupted_input = ctx - > corrupted_input ;
2010-10-18 13:23:51 +02:00
ctx - > corrupted_input = 1 ;
2020-03-21 14:15:28 -04:00
ret = cli_magic_scan_desc ( fd , tmpname , ctx , NULL ) ;
2010-10-18 13:23:51 +02:00
ctx - > corrupted_input = corrupted_input ;
2018-12-03 12:40:13 -05:00
if ( ret = = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
cli_dbgmsg ( " cli_scanembpe: Infected with %s \n " , cli_get_last_virus ( ctx ) ) ;
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
2020-02-28 18:29:35 -05:00
ctx - > recursion - - ;
2017-08-08 17:38:17 -04:00
return CL_EUNLINK ;
}
}
free ( tmpname ) ;
2020-02-28 18:29:35 -05:00
ctx - > recursion - - ;
2017-08-08 17:38:17 -04:00
return CL_VIRUS ;
2007-03-12 21:31:40 +00:00
}
2008-02-13 10:34:58 +00:00
ctx - > recursion - - ;
2007-03-12 21:31:40 +00:00
close ( fd ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
2017-08-08 17:38:17 -04:00
free ( tmpname ) ;
return CL_EUNLINK ;
}
2008-04-08 17:45:05 +00:00
}
2007-03-12 21:31:40 +00:00
free ( tmpname ) ;
2020-03-21 14:15:28 -04:00
/* intentionally ignore possible errors from cli_magic_scan_desc */
2007-03-12 21:31:40 +00:00
return CL_CLEAN ;
}
2018-10-19 20:43:19 -07:00
# if defined(_WIN32) || defined(C_LINUX) || defined(C_DARWIN)
2011-02-14 19:19:20 +02:00
# define PERF_MEASURE
# endif
# ifdef PERF_MEASURE
2017-08-08 17:38:17 -04:00
static struct
{
2011-02-14 19:19:20 +02:00
enum perfev id ;
const char * name ;
enum ev_type type ;
} perf_events [ ] = {
{ PERFT_SCAN , " full scan " , ev_time } ,
{ PERFT_PRECB , " prescan cb " , ev_time } ,
{ PERFT_POSTCB , " postscan cb " , ev_time } ,
{ PERFT_CACHE , " cache " , ev_time } ,
{ PERFT_FT , " filetype " , ev_time } ,
{ PERFT_CONTAINER , " container " , ev_time } ,
{ PERFT_SCRIPT , " script " , ev_time } ,
{ PERFT_PE , " pe " , ev_time } ,
{ PERFT_RAW , " raw " , ev_time } ,
{ PERFT_RAWTYPENO , " raw container " , ev_time } ,
{ PERFT_MAP , " map " , ev_time } ,
2017-08-08 17:38:17 -04:00
{ PERFT_BYTECODE , " bytecode " , ev_time } ,
{ PERFT_KTIME , " kernel " , ev_int } ,
{ PERFT_UTIME , " user " , ev_int } } ;
2011-02-14 19:19:20 +02:00
static void get_thread_times ( uint64_t * kt , uint64_t * ut )
{
# ifdef _WIN32
2017-08-08 17:38:17 -04:00
FILETIME c , e , k , u ;
ULARGE_INTEGER kl , ul ;
2018-12-03 12:40:13 -05:00
if ( ! GetThreadTimes ( GetCurrentThread ( ) , & c , & e , & k , & u ) ) {
2017-08-08 17:38:17 -04:00
* kt = * ut = 0 ;
return ;
2011-02-14 19:19:20 +02:00
}
2018-12-03 12:40:13 -05:00
kl . LowPart = k . dwLowDateTime ;
2011-02-14 19:19:20 +02:00
kl . HighPart = k . dwHighDateTime ;
2018-12-03 12:40:13 -05:00
ul . LowPart = u . dwLowDateTime ;
2011-02-14 19:19:20 +02:00
ul . HighPart = u . dwHighDateTime ;
2018-12-03 12:40:13 -05:00
* kt = kl . QuadPart / 10 ;
* ut = ul . QuadPart / 10 ;
2011-02-14 19:19:20 +02:00
# else
struct tms tbuf ;
2019-05-03 18:25:17 -04:00
if ( times ( & tbuf ) ! = ( ( clock_t ) - 1 ) ) {
2017-08-08 17:38:17 -04:00
clock_t tck = sysconf ( _SC_CLK_TCK ) ;
2018-12-03 12:40:13 -05:00
* kt = ( ( uint64_t ) 1000000 ) * tbuf . tms_stime / tck ;
* ut = ( ( uint64_t ) 1000000 ) * tbuf . tms_utime / tck ;
} else {
2017-08-08 17:38:17 -04:00
* kt = * ut = 0 ;
2011-02-14 19:19:20 +02:00
}
# endif
}
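The POSIX branch of get_thread_times() above scales clock ticks to microseconds. A minimal sketch of that arithmetic follows, assuming a hypothetical helper name `ticks_to_usec`. The guard against a zero tick rate is an addition for illustration and is not in the original.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a tick count to microseconds.
 * ticks_per_sec would come from sysconf(_SC_CLK_TCK) in real code.
 * The multiplication is done in 64 bits before dividing, as in the
 * original expression, so precision is not lost to integer division.
 */
static uint64_t ticks_to_usec(uint64_t ticks, uint64_t ticks_per_sec)
{
    if (ticks_per_sec == 0)
        return 0; /* assumption: treat an unknown tick rate as "no data" */
    return (UINT64_C(1000000) * ticks) / ticks_per_sec;
}
```

With a tick rate of 100 per second, one tick corresponds to 10000 microseconds.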
static inline void perf_init ( cli_ctx * ctx )
{
2017-08-08 17:38:17 -04:00
uint64_t kt , ut ;
2011-02-14 19:19:20 +02:00
unsigned i ;
2018-07-20 22:28:48 -04:00
if ( ! SCAN_DEV_COLLECT_PERF_INFO )
2017-08-08 17:38:17 -04:00
return ;
2011-02-14 19:19:20 +02:00
ctx - > perf = cli_events_new ( PERFT_LAST ) ;
2018-12-03 12:40:13 -05:00
for ( i = 0 ; i < sizeof ( perf_events ) / sizeof ( perf_events [ 0 ] ) ; i + + ) {
2017-08-08 17:38:17 -04:00
if ( cli_event_define ( ctx - > perf , perf_events [ i ] . id , perf_events [ i ] . name ,
perf_events [ i ] . type , multiple_sum ) = = - 1 )
continue ;
2011-02-14 19:19:20 +02:00
}
cli_event_time_start ( ctx - > perf , PERFT_SCAN ) ;
get_thread_times ( & kt , & ut ) ;
cli_event_int ( ctx - > perf , PERFT_KTIME , - kt ) ;
cli_event_int ( ctx - > perf , PERFT_UTIME , - ut ) ;
}
2017-08-08 17:38:17 -04:00
static inline void perf_done ( cli_ctx * ctx )
2011-02-14 19:19:20 +02:00
{
char timestr [ 512 ] ;
char * p ;
unsigned i ;
2017-08-08 17:38:17 -04:00
uint64_t kt , ut ;
2011-02-14 19:25:22 +02:00
char * pend ;
2011-02-14 19:19:20 +02:00
cli_events_t * perf = ctx - > perf ;
if ( ! perf )
2017-08-08 17:38:17 -04:00
return ;
2011-02-14 19:19:20 +02:00
2018-12-03 12:40:13 -05:00
p = timestr ;
pend = timestr + sizeof ( timestr ) - 1 ;
2011-02-14 19:19:20 +02:00
* pend = 0 ;
cli_event_time_stop ( perf , PERFT_SCAN ) ;
get_thread_times ( & kt , & ut ) ;
cli_event_int ( perf , PERFT_KTIME , kt ) ;
cli_event_int ( perf , PERFT_UTIME , ut ) ;
2018-12-03 12:40:13 -05:00
for ( i = 0 ; i < sizeof ( perf_events ) / sizeof ( perf_events [ 0 ] ) ; i + + ) {
2017-08-08 17:38:17 -04:00
union ev_val val ;
unsigned count ;
2011-02-14 19:19:20 +02:00
2017-08-08 17:38:17 -04:00
cli_event_get ( perf , perf_events [ i ] . id , & val , & count ) ;
if ( p < pend )
p + = snprintf ( p , pend - p , " %s: %d.%03ums, " , perf_events [ i ] . name ,
( signed ) ( val . v_int / 1000 ) ,
( unsigned ) ( val . v_int % 1000 ) ) ;
2011-02-14 19:19:20 +02:00
}
* p = 0 ;
cli_infomsg ( ctx , " performance: %s \n " , timestr ) ;
cli_events_free ( perf ) ;
ctx - > perf = NULL ;
}
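perf_done() above builds its report by repeatedly appending to a fixed buffer with snprintf(). Because snprintf() returns the length it would have written, a truncated append would advance the write pointer past the end marker. The following is a sketch of a clamped append under that reading, not a change to the function: `append_clamped` is a hypothetical helper, and `end` is assumed to point at the last byte of the buffer, reserved for the terminator, like `pend` above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at p without ever moving
 * past end. Returns the new write position, which is always <= end.
 */
static char *append_clamped(char *p, char *end, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (p >= end)
        return end;

    va_start(ap, fmt);
    /* Room for (end - p) characters plus the terminating NUL at *end. */
    n = vsnprintf(p, (size_t)(end - p) + 1, fmt, ap);
    va_end(ap);

    if (n < 0) {
        *p = '\0'; /* formatting error: leave the buffer terminated */
        return p;
    }
    /* On truncation, vsnprintf reports the untruncated length: clamp it. */
    return ((size_t)n > (size_t)(end - p)) ? end : p + n;
}
```

A caller would replace `p += snprintf(p, pend - p, ...)` with `p = append_clamped(p, pend, ...)`, after which a final `*p = 0` always stays inside the buffer.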
2017-08-08 17:38:17 -04:00
static inline void perf_start ( cli_ctx * ctx , int id )
2011-02-14 19:19:20 +02:00
{
cli_event_time_start ( ctx - > perf , id ) ;
}
2017-08-08 17:38:17 -04:00
static inline void perf_stop ( cli_ctx * ctx , int id )
2011-02-14 19:19:20 +02:00
{
cli_event_time_stop ( ctx - > perf , id ) ;
}
2017-08-08 17:38:17 -04:00
static inline void perf_nested_start ( cli_ctx * ctx , int id , int nestedid )
2011-02-14 19:19:20 +02:00
{
cli_event_time_nested_start ( ctx - > perf , id , nestedid ) ;
}
2017-08-08 17:38:17 -04:00
static inline void perf_nested_stop ( cli_ctx * ctx , int id , int nestedid )
2011-02-14 19:19:20 +02:00
{
cli_event_time_nested_stop ( ctx - > perf , id , nestedid ) ;
}
# else
2017-08-08 17:38:17 -04:00
static inline void perf_init ( cli_ctx * ctx )
{
UNUSEDPARAM ( ctx ) ;
}
static inline void perf_start ( cli_ctx * ctx , int id )
{
UNUSEDPARAM ( ctx ) ;
UNUSEDPARAM ( id ) ;
}
static inline void perf_stop ( cli_ctx * ctx , int id )
{
UNUSEDPARAM ( ctx ) ;
UNUSEDPARAM ( id ) ;
}
static inline void perf_nested_start ( cli_ctx * ctx , int id , int nestedid )
{
UNUSEDPARAM ( ctx ) ;
UNUSEDPARAM ( id ) ;
UNUSEDPARAM ( nestedid ) ;
}
static inline void perf_nested_stop ( cli_ctx * ctx , int id , int nestedid )
{
UNUSEDPARAM ( ctx ) ;
UNUSEDPARAM ( id ) ;
UNUSEDPARAM ( nestedid ) ;
}
2018-12-03 12:40:13 -05:00
static inline void perf_done ( cli_ctx * ctx )
{
UNUSEDPARAM ( ctx ) ;
}
2011-02-14 19:19:20 +02:00
# endif
2020-03-19 21:23:54 -04:00
/**
 * @brief Perform raw scan of current fmap.
 *
 * @param ctx      Current scan context.
 * @param type     File type.
 * @param typercg  Enable type recognition (file typing scan results).
 *                 If 0, will be a regular ac-mode scan.
 * @param dettype  [out] If typercg enabled and scan detects HTML or MAIL types,
 *                 will output HTML or MAIL types after performing HTML/MAIL scans.
 * @param refhash  Hash of current fmap.
 * @return cl_error_t
 */
2020-03-21 14:15:28 -04:00
static cl_error_t scanraw ( cli_ctx * ctx , cli_file_t type , uint8_t typercg , cli_file_t * dettype , unsigned char * refhash )
2006-01-25 12:11:31 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN , nret = CL_CLEAN ;
2017-08-08 17:38:17 -04:00
struct cli_matched_type * ftoffset = NULL , * fpt ;
struct cli_exe_info peinfo ;
unsigned int acmode = AC_SCAN_VIR , break_loop = 0 ;
fmap_t * map = * ctx - > fmap ;
2020-03-21 11:36:53 -04:00
# if HAVE_JSON
struct json_object * parent_property = NULL ;
# else
void * parent_property = NULL ;
# endif
2006-01-25 12:11:31 +00:00
2018-12-03 12:40:13 -05:00
if ( ctx - > engine - > maxreclevel & & ctx - > recursion > = ctx - > engine - > maxreclevel ) {
2016-08-24 17:39:20 -04:00
cli_check_blockmax ( ctx , CL_EMAXREC ) ;
2009-06-10 20:50:49 +00:00
return CL_EMAXREC ;
2016-08-24 17:39:20 -04:00
}
2009-06-10 20:50:49 +00:00
2020-08-12 00:18:53 -07:00
if ( ( typercg ) & &
2021-01-19 14:23:02 -08:00
// We should also omit bzips, but DMG's may be detected in bzips. (type != CL_TYPE_BZ) && /* Omit BZ files because they can contain portions of original files like zip file entries that cause invalid extractions and lots of warnings. Decompress first, then scan! */
2020-12-17 19:28:15 -08:00
( type ! = CL_TYPE_GZ ) & & /* Omit GZ files because they can contain portions of original files like zip file entries that cause invalid extractions and lots of warnings. Decompress first, then scan! */
( type ! = CL_TYPE_GPT ) & & /* Omit GPT files because it's an image format that we can extract and scan manually. */
( type ! = CL_TYPE_CPIO_OLD ) & & /* Omit CPIO_OLD files because it's an image format that we can extract and scan manually. */
( type ! = CL_TYPE_ZIP ) & & /* Omit ZIP files because it'll detect each zip file entry as SFXZIP, which is a waste. We'll extract it and then scan. */
( type ! = CL_TYPE_OLD_TAR ) & & /* Omit OLD TAR files because it's a raw archive format that we can extract and scan manually. */
( type ! = CL_TYPE_POSIX_TAR ) ) { /* Omit POSIX TAR files because it's a raw archive format that we can extract and scan manually. */
2020-08-12 00:18:53 -07:00
/*
2020-12-17 19:28:15 -08:00
 * Enable file type recognition scan mode if requested, except for some problematic types (above).
2020-08-12 00:18:53 -07:00
*/
2017-08-08 17:38:17 -04:00
acmode | = AC_SCAN_FT ;
2020-08-12 00:18:53 -07:00
}
2006-01-25 12:11:31 +00:00
2020-08-12 00:18:53 -07:00
perf_start ( ctx , PERFT_RAW ) ;
2020-03-21 14:15:28 -04:00
ret = cli_scan_fmap ( ctx , type = = CL_TYPE_TEXT_ASCII ? CL_TYPE_ANY : type , 0 , & ftoffset , acmode , NULL , refhash ) ;
2011-02-14 19:19:20 +02:00
perf_stop ( ctx , PERFT_RAW ) ;
2006-01-25 12:11:31 +00:00
2019-02-12 15:10:04 -05:00
// TODO I think this causes embedded file extraction to stop when a
2020-03-21 14:15:28 -04:00
// signature has matched in cli_scan_fmap, which wouldn't be what
2019-02-12 15:10:04 -05:00
// we want if allmatch is specified.
2018-12-03 12:40:13 -05:00
if ( ret > = CL_TYPENO ) {
2017-08-08 17:38:17 -04:00
perf_nested_start ( ctx , PERFT_RAWTYPENO , PERFT_SCAN ) ;
ctx - > recursion + + ;
2013-10-23 16:21:46 -04:00
fpt = ftoffset ;
2018-12-03 12:40:13 -05:00
while ( fpt ) {
2017-01-19 12:24:46 -05:00
/* set current level as container AFTER recursing */
cli_set_container ( ctx , fpt - > type , map - > len ) ;
2020-03-21 11:36:53 -04:00
if ( fpt - > offset > 0 ) {
/*
 * Scan embedded file types.
*/
2019-05-23 22:50:04 -04:00
# if HAVE_JSON
2020-03-21 11:36:53 -04:00
if ( SCAN_COLLECT_METADATA & & ctx - > wrkproperty ) {
json_object * arrobj ;
parent_property = ctx - > wrkproperty ;
if ( ! json_object_object_get_ex ( parent_property , " EmbeddedObjects " , & arrobj ) ) {
arrobj = json_object_new_array ( ) ;
if ( NULL = = arrobj ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " scanraw: no memory for json properties object \n " ) ;
2020-03-21 11:36:53 -04:00
nret = CL_EMEM ;
break ;
}
json_object_object_add ( parent_property , " EmbeddedObjects " , arrobj ) ;
}
ctx - > wrkproperty = json_object_new_object ( ) ;
if ( NULL = = ctx - > wrkproperty ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " scanraw: no memory for json properties object \n " ) ;
2020-03-21 11:36:53 -04:00
nret = CL_EMEM ;
break ;
}
json_object_array_add ( arrobj , ctx - > wrkproperty ) ;
ret = cli_jsonstr ( ctx - > wrkproperty , " FileType " , cli_ftname ( fpt - > type ) ) ;
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " scanraw: failed to add string to json object \n " ) ;
2020-03-21 11:36:53 -04:00
nret = CL_EMEM ;
break ;
}
ret = cli_jsonint64 ( ctx - > wrkproperty , " Offset " , ( int64_t ) fpt - > offset ) ;
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " scanraw: failed to add int to json object \n " ) ;
2020-03-21 11:36:53 -04:00
nret = CL_EMEM ;
break ;
}
}
2019-05-23 22:50:04 -04:00
# endif
2018-12-03 12:40:13 -05:00
switch ( fpt - > type ) {
case CL_TYPE_MHTML :
if ( SCAN_PARSE_MAIL & & ( DCONF_MAIL & MAIL_CONF_MBOX ) ) {
cli_dbgmsg ( " MHTML signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = ret = cli_scanmail ( ctx ) ;
}
break ;
2016-05-02 17:32:03 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_XDP :
if ( SCAN_PARSE_PDF & & ( DCONF_DOC & DOC_CONF_PDF ) ) {
cli_dbgmsg ( " XDP signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = ret = cli_scanxdp ( ctx ) ;
}
break ;
case CL_TYPE_XML_WORD :
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_MSXML ) ) {
cli_dbgmsg ( " XML-WORD signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = ret = cli_scanmsxml ( ctx ) ;
}
break ;
case CL_TYPE_XML_XL :
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_MSXML ) ) {
cli_dbgmsg ( " XML-XL signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = ret = cli_scanmsxml ( ctx ) ;
}
break ;
case CL_TYPE_XML_HWP :
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_HWP ) ) {
cli_dbgmsg ( " XML-HWP signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = ret = cli_scanhwpml ( ctx ) ;
}
break ;
case CL_TYPE_RARSFX :
if ( type ! = CL_TYPE_RAR & & have_rar & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_RAR ) ) {
const char * filepath = NULL ;
int fd = - 1 ;
2018-07-30 20:19:28 -04:00
2018-12-03 12:40:13 -05:00
char * tmpname = NULL ;
int tmpfd = - 1 ;
size_t csize = map - > len - fpt - > offset ; /* not precise */
2018-07-30 20:19:28 -04:00
2018-12-03 12:40:13 -05:00
cli_set_container ( ctx , CL_TYPE_RAR , csize ) ;
cli_dbgmsg ( " RAR/RAR-SFX signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
2018-07-30 20:19:28 -04:00
2020-01-23 17:42:33 -08:00
# ifdef _WIN32
if ( ( fpt - > offset ! = 0 ) | | ( SCAN_UNPRIVILEGED ) | | ( NULL = = ctx - > sub_filepath ) | | ( 0 ! = _access_s ( ctx - > sub_filepath , R_OK ) ) ) {
# else
if ( ( fpt - > offset ! = 0 ) | | ( SCAN_UNPRIVILEGED ) | | ( NULL = = ctx - > sub_filepath ) | | ( 0 ! = access ( ctx - > sub_filepath , R_OK ) ) ) {
# endif
2018-12-03 12:40:13 -05:00
/*
2020-01-23 17:42:33 -08:00
* If map is not file - backed , or offset is not at the start of the file . . .
* . . . have to dump to file for scanrar .
*/
2020-03-19 21:23:54 -04:00
nret = fmap_dump_to_file ( map , ctx - > sub_filepath , ctx - > sub_tmpdir , & tmpname , & tmpfd , fpt - > offset , fpt - > offset + csize ) ;
2018-12-03 12:40:13 -05:00
if ( nret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " scanraw: failed to generate temporary file. \n " ) ;
2018-12-03 12:40:13 -05:00
ret = nret ;
break_loop = 1 ;
break ;
}
filepath = tmpname ;
fd = tmpfd ;
} else {
/* Use the original file and file descriptor. */
filepath = ctx - > sub_filepath ;
fd = fmap_fd ( map ) ;
2013-11-22 19:41:46 -05:00
}
2018-07-30 20:19:28 -04:00
2018-12-03 12:40:13 -05:00
/* scan file */
nret = cli_scanrar ( filepath , fd , ctx ) ;
2018-07-30 20:19:28 -04:00
2020-01-23 17:42:33 -08:00
if ( ( NULL = = tmpname ) & & ( CL_EOPEN = = nret ) ) {
/*
* Failed to open the file using the original filename .
* Try writing the file descriptor to a temp file and try again .
*/
2020-03-19 21:23:54 -04:00
nret = fmap_dump_to_file ( map , ctx - > sub_filepath , ctx - > sub_tmpdir , & tmpname , & tmpfd , fpt - > offset , fpt - > offset + csize ) ;
2020-01-23 17:42:33 -08:00
if ( nret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " scanraw: failed to generate temporary file. \n " ) ;
2020-01-23 17:42:33 -08:00
ret = nret ;
break_loop = 1 ;
break ;
}
filepath = tmpname ;
fd = tmpfd ;
/* try to scan again */
nret = cli_scanrar ( filepath , fd , ctx ) ;
}
2018-12-03 12:40:13 -05:00
if ( tmpfd ! = - 1 ) {
/* If dumped tempfile, need to cleanup */
close ( tmpfd ) ;
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
ret = nret = CL_EUNLINK ;
break_loop = 1 ;
}
2013-11-22 19:41:46 -05:00
}
}
2018-07-30 20:19:28 -04:00
2018-12-03 12:40:13 -05:00
if ( tmpname ! = NULL ) {
free ( tmpname ) ;
}
2013-11-22 19:41:46 -05:00
}
2018-12-03 12:40:13 -05:00
break ;
2013-10-23 16:21:46 -04:00
2018-10-08 12:59:42 -04:00
case CL_TYPE_EGGSFX :
if ( type ! = CL_TYPE_EGG & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_EGG ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_EGG , csize ) ;
cli_dbgmsg ( " EGG/EGG-SFX signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanegg ( ctx , fpt - > offset ) ;
}
break ;
2018-12-03 12:40:13 -05:00
case CL_TYPE_ZIPSFX :
if ( type ! = CL_TYPE_ZIP & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_ZIP ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_ZIP , csize ) ;
cli_dbgmsg ( " ZIP/ZIP-SFX signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_unzip_single ( ctx , fpt - > offset ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_CABSFX :
if ( type ! = CL_TYPE_MSCAB & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CAB ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_MSCAB , csize ) ;
cli_dbgmsg ( " CAB/CAB-SFX signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanmscab ( ctx , fpt - > offset ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_ARJSFX :
if ( type ! = CL_TYPE_ARJ & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_ARJ ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_ARJ , csize ) ;
cli_dbgmsg ( " ARJ-SFX signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanarj ( ctx , fpt - > offset ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_7ZSFX :
if ( type ! = CL_TYPE_7Z & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_7Z ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_7Z , csize ) ;
cli_dbgmsg ( " 7Zip-SFX signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_7unz ( ctx , fpt - > offset ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_ISO9660 :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_ISO9660 ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_ISO9660 , csize ) ;
cli_dbgmsg ( " ISO9660 signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scaniso ( ctx , fpt - > offset ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_NULSFT :
if ( SCAN_PARSE_ARCHIVE & & type = = CL_TYPE_MSEXE & & ( DCONF_ARCH & ARCH_CONF_NSIS ) & &
fpt - > offset > 4 ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_NULSFT , csize ) ;
cli_dbgmsg ( " NSIS signature found at %u \n " , ( unsigned int ) fpt - > offset - 4 ) ;
nret = cli_scannulsft ( ctx , fpt - > offset - 4 ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_AUTOIT :
if ( SCAN_PARSE_ARCHIVE & & type = = CL_TYPE_MSEXE & & ( DCONF_ARCH & ARCH_CONF_AUTOIT ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_AUTOIT , csize ) ;
cli_dbgmsg ( " AUTOIT signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanautoit ( ctx , fpt - > offset + 23 ) ;
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_ISHIELD_MSI :
if ( SCAN_PARSE_ARCHIVE & & type = = CL_TYPE_MSEXE & & ( DCONF_ARCH & ARCH_CONF_ISHIELD ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
2020-08-12 00:18:53 -07:00
cli_set_container ( ctx , CL_TYPE_ISHIELD_MSI , csize ) ;
2018-12-03 12:40:13 -05:00
cli_dbgmsg ( " ISHIELD-MSI signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanishield_msi ( ctx , fpt - > offset + 14 ) ;
2014-03-13 15:25:33 -04:00
}
2018-12-03 12:40:13 -05:00
break ;
case CL_TYPE_DMG :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_DMG ) ) {
cli_dbgmsg ( " DMG signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scandmg ( ctx ) ;
2014-03-13 15:25:33 -04:00
}
2018-12-03 12:40:13 -05:00
break ;
2014-03-13 15:25:33 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_MBR :
if ( SCAN_PARSE_ARCHIVE ) {
int iret = cli_mbr_check2 ( ctx , 0 ) ;
if ( ( iret = = CL_TYPE_GPT ) & & ( DCONF_ARCH & ARCH_CONF_GPT ) ) {
cli_dbgmsg ( " Recognized GUID Partition Table file \n " ) ;
cli_set_container ( ctx , CL_TYPE_GPT , map - > len ) ;
cli_dbgmsg ( " GPT signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scangpt ( ctx , 0 ) ;
} else if ( ( iret = = CL_CLEAN ) & & ( DCONF_ARCH & ARCH_CONF_MBR ) ) {
cli_dbgmsg ( " MBR signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanmbr ( ctx , 0 ) ;
}
}
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_PDF :
if ( type ! = CL_TYPE_PDF & & SCAN_PARSE_PDF & & ( DCONF_DOC & DOC_CONF_PDF ) ) {
size_t csize = map - > len - fpt - > offset ; /* not precise */
cli_set_container ( ctx , CL_TYPE_PDF , csize ) ;
cli_dbgmsg ( " PDF signature found at %u \n " , ( unsigned int ) fpt - > offset ) ;
nret = cli_scanpdf ( ctx , fpt - > offset ) ;
2013-10-23 16:21:46 -04:00
}
2018-12-03 12:40:13 -05:00
break ;
case CL_TYPE_MSEXE :
if ( SCAN_PARSE_PE & & ( type = = CL_TYPE_MSEXE | | type = = CL_TYPE_ZIP | | type = = CL_TYPE_MSOLE2 ) & & ctx - > dconf - > pe ) {
uint64_t curr_len = map - > len ;
size_t csize = map - > len - fpt - > offset ; /* not precise */
/* CL_ENGINE_MAX_EMBEDDED_PE */
if ( curr_len > ctx - > engine - > maxembeddedpe ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " scanraw: MaxEmbeddedPE exceeded \n " ) ;
2018-12-03 12:40:13 -05:00
break ;
}
cli_set_container ( ctx , CL_TYPE_MSEXE , csize ) ;
2019-01-08 00:09:08 -05:00
cli_exe_info_init ( & peinfo , fpt - > offset ) ;
// TODO We could probably substitute in a quicker
// method of determining whether a PE file exists
// at this offset.
2019-05-10 16:38:57 -04:00
if ( cli_peheader ( map , & peinfo , CLI_PEHEADER_OPT_NONE , NULL ) ! = 0 ) {
/* Despite failing, peinfo memory may have been allocated and must be freed. */
cli_exe_info_destroy ( & peinfo ) ;
} else {
2018-12-03 12:40:13 -05:00
cli_dbgmsg ( " *** Detected embedded PE file at %u *** \n " ,
( unsigned int ) fpt - > offset ) ;
2019-05-10 16:38:57 -04:00
/* Immediately free up peinfo allocated memory, prior to any recursion */
2019-01-08 00:09:08 -05:00
cli_exe_info_destroy ( & peinfo ) ;
2018-12-03 12:40:13 -05:00
nret = cli_scanembpe ( ctx , fpt - > offset ) ;
break_loop = 1 ; /* we can stop here and other
2019-05-10 16:38:57 -04:00
* embedded executables will
* be found recursively
* through the above call
*/
2019-02-12 15:10:04 -05:00
// TODO This method of embedded PE extraction
// is kinda gross in that:
// - if you have an executable that contains
// 20 other exes, the bytes associated with
// the last exe will have been included in
// hash computations and things 20 times
// (as overlay data to the previously
// extracted exes).
// - if you have a signed embedded exe, it
// will fail to validate after extraction
// bc it has overlay data, which is a
// violation of the Authenticode spec.
// - this method of extraction is subject to
// the recursion limit, which is fairly
// low by default (I think 16)
//
// It'd be awesome if we could compute the PE
// size from the PE header and just extract
// that.
2018-12-03 12:40:13 -05:00
}
2013-10-23 16:21:46 -04:00
}
2018-12-03 12:40:13 -05:00
break ;
2013-10-23 16:21:46 -04:00
2018-12-03 12:40:13 -05:00
default :
2020-03-21 14:15:28 -04:00
cli_warnmsg ( " scanraw: Type %u not handled in fpt loop \n " , fpt - > type ) ;
2017-08-08 17:38:17 -04:00
}
2020-03-21 11:36:53 -04:00
}
2007-03-23 15:11:45 +00:00
2020-01-31 11:52:00 -08:00
if ( nret = = CL_VIRUS | | nret = = CL_EMEM | | break_loop )
2013-10-23 16:21:46 -04:00
break ;
2007-03-23 15:11:45 +00:00
2013-10-23 16:21:46 -04:00
fpt = fpt - > next ;
2020-03-21 11:36:53 -04:00
# if HAVE_JSON
if ( NULL ! = parent_property ) {
ctx - > wrkproperty = ( struct json_object * ) ( parent_property ) ;
parent_property = NULL ;
}
# endif
2013-10-23 16:21:46 -04:00
}
2017-08-08 17:38:17 -04:00
if ( nret ! = CL_VIRUS )
2021-01-19 14:23:02 -08:00
/*
* Now run the other file type parsers that may rely on file type
* recognition to determine the actual file type .
*/
2018-12-03 12:40:13 -05:00
switch ( ret ) {
case CL_TYPE_HTML :
/* bb#11196 - autoit script file misclassified as HTML */
if ( cli_get_container_intermediate ( ctx , - 2 ) = = CL_TYPE_AUTOIT ) {
ret = CL_TYPE_TEXT_ASCII ;
2021-01-19 14:23:02 -08:00
} else if ( SCAN_PARSE_HTML & &
( type = = CL_TYPE_TEXT_ASCII | |
type = = CL_TYPE_GIF ) & & /* Scan GIFs for embedded HTML/Javascript */
2018-12-03 12:40:13 -05:00
( DCONF_DOC & DOC_CONF_HTML ) ) {
* dettype = CL_TYPE_HTML ;
nret = cli_scanhtml ( ctx ) ;
}
break ;
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_MAIL :
cli_set_container ( ctx , CL_TYPE_MAIL , map - > len ) ;
if ( SCAN_PARSE_MAIL & & type = = CL_TYPE_TEXT_ASCII & & ( DCONF_MAIL & MAIL_CONF_MBOX ) ) {
* dettype = CL_TYPE_MAIL ;
nret = cli_scanmail ( ctx ) ;
}
break ;
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
default :
break ;
2017-08-08 17:38:17 -04:00
}
perf_nested_stop ( ctx , PERFT_RAWTYPENO , PERFT_SCAN ) ;
ctx - > recursion - - ;
ret = nret ;
}
2020-03-21 11:36:53 -04:00
# if HAVE_JSON
if ( NULL ! = parent_property ) {
ctx - > wrkproperty = ( struct json_object * ) ( parent_property ) ;
}
# endif
2018-12-03 12:40:13 -05:00
while ( ftoffset ) {
fpt = ftoffset ;
2017-08-08 17:38:17 -04:00
ftoffset = ftoffset - > next ;
free ( fpt ) ;
}
if ( ret = = CL_VIRUS )
cli_dbgmsg ( " %s found \n " , cli_get_last_virus ( ctx ) ) ;
2007-02-25 21:46:38 +00:00
2006-01-25 12:11:31 +00:00
return ret ;
}
2017-08-08 17:38:17 -04:00
static void emax_reached ( cli_ctx * ctx )
{
2010-03-05 18:09:00 +01:00
fmap_t * * ctx_fmap = ctx - > fmap ;
2010-03-10 14:58:18 +02:00
if ( ! ctx_fmap )
2017-08-08 17:38:17 -04:00
return ;
2018-12-03 12:40:13 -05:00
while ( * ctx_fmap ) {
fmap_t * map = * ctx_fmap ;
2017-08-08 17:38:17 -04:00
map - > dont_cache_flag = 1 ;
ctx_fmap - - ;
2010-03-05 18:09:00 +01:00
}
cli_dbgmsg ( " emax_reached: marked parents as non cacheable \n " ) ;
}
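/*
 * Illustrative sketch, not part of libclamav: the loop in emax_reached ( )
 * walks the fmap stack downwards until it reads a NULL entry, so it appears
 * to rely on a NULL sentinel sitting below the first real map. The names
 * demo_map and demo_mark_parents are hypothetical and only model that walk.
 */

```c
#include <stddef.h>

/* Hypothetical stand-in for fmap_t; only the flag used here is modelled. */
struct demo_map {
    int dont_cache_flag;
};

/*
 * Marks every map from `top` down to (but not including) the NULL sentinel
 * and returns how many maps were marked.
 * Assumes the array holds a NULL entry below the first real map.
 */
static size_t demo_mark_parents(struct demo_map **top)
{
    size_t marked = 0;

    if (!top)
        return 0;

    while (*top) {
        (*top)->dont_cache_flag = 1;
        marked++;
        top--; /* step towards the parent map */
    }
    return marked;
}
```

/*
 * Without the sentinel the walk would read below the start of the array,
 * which is why the sketch states the assumption explicitly.
 */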
2010-03-05 17:11:45 +01:00
# define LINESTR(x) #x
# define LINESTR2(x) LINESTR(x)
2017-08-08 17:38:17 -04:00
# define __AT__ " at line " LINESTR2(__LINE__)
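/*
 * Illustrative sketch, not part of libclamav: why __AT__ needs two macro
 * levels. The # operator stringizes its argument without expanding it, so a
 * single level would yield the text "__LINE__"; the extra level lets the
 * preprocessor expand __LINE__ to a number first. The DEMO_ names below are
 * hypothetical copies of the macros above.
 */

```c
#include <string.h>

#define DEMO_STR(x) #x
#define DEMO_STR2(x) DEMO_STR(x)
#define DEMO_AT " at line " DEMO_STR2(__LINE__)

/* Returns the location suffix for the line on which DEMO_AT is expanded. */
static const char *demo_location(void)
{
    return DEMO_AT;
}

/* Returns 1 when one-level stringizing leaves the macro name unexpanded. */
static int demo_single_level_is_literal(void)
{
    return strcmp(DEMO_STR(__LINE__), "__LINE__") == 0;
}
```

/*
 * Adjacent string literals are concatenated at compile time, so the suffix
 * costs nothing at run time when appended to a debug message.
 */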
2011-07-04 17:00:55 +03:00
2020-08-12 00:18:53 -07:00
static cl_error_t dispatch_prescan_callback ( clcb_pre_scan cb , cli_ctx * ctx , const char * filetype , bitset_t * old_hook_lsig_matches , void * parent_property , unsigned char * hash , size_t hashed_size , int * run_cleanup )
2014-04-30 15:01:05 -04:00
{
2020-03-19 21:23:54 -04:00
cl_error_t res = CL_CLEAN ;
2014-04-30 15:01:05 -04:00
2014-07-09 13:16:31 -04:00
UNUSEDPARAM ( parent_property ) ;
UNUSEDPARAM ( hash ) ;
UNUSEDPARAM ( hashed_size ) ;
2014-04-30 15:01:05 -04:00
* run_cleanup = 0 ;
2018-12-03 12:40:13 -05:00
if ( cb ) {
2014-04-30 15:01:05 -04:00
perf_start ( ctx , PERFT_PRECB ) ;
2018-12-03 12:40:13 -05:00
switch ( cb ( fmap_fd ( * ctx - > fmap ) , filetype , ctx - > cb_ctx ) ) {
case CL_BREAK :
2020-08-12 00:18:53 -07:00
cli_dbgmsg ( " dispatch_prescan_callback: file whitelisted by callback \n " ) ;
2018-12-03 12:40:13 -05:00
perf_stop ( ctx , PERFT_PRECB ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
/* returns CL_CLEAN */
* run_cleanup = 1 ;
break ;
case CL_VIRUS :
2020-08-12 00:18:53 -07:00
cli_dbgmsg ( " dispatch_prescan_callback: file blacklisted by callback \n " ) ;
2018-12-03 12:40:13 -05:00
cli_append_virus ( ctx , " Detected.By.Callback " ) ;
perf_stop ( ctx , PERFT_PRECB ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
* run_cleanup = 1 ;
res = CL_VIRUS ;
break ;
case CL_CLEAN :
break ;
default :
2020-08-12 00:18:53 -07:00
cli_warnmsg ( " dispatch_prescan_callback: ignoring bad return code from callback \n " ) ;
2014-04-30 15:01:05 -04:00
}
2011-03-04 18:27:32 +01:00
2014-04-30 15:01:05 -04:00
perf_stop ( ctx , PERFT_PRECB ) ;
}
return res ;
}
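/*
 * Illustrative sketch, not part of libclamav: the decision table that
 * dispatch_prescan_callback ( ) implements, reduced to a self-contained
 * form. The enum values, demo_cb type, and function names are hypothetical
 * and do not correspond to cl_error_t values or the clcb_pre_scan signature.
 */

```c
#include <stddef.h>

/* Hypothetical result codes; not ClamAV's cl_error_t values. */
enum demo_code { DEMO_CLEAN = 0, DEMO_VIRUS, DEMO_BREAK, DEMO_OTHER };

typedef enum demo_code (*demo_cb)(void *user);

/*
 * Calls `cb` if one is set and maps its answer to a scan result.
 * *run_cleanup is set to 1 when the caller should stop and clean up.
 */
static enum demo_code demo_dispatch(demo_cb cb, void *user, int *run_cleanup)
{
    enum demo_code res = DEMO_CLEAN;

    *run_cleanup = 0;
    if (!cb)
        return res;

    switch (cb(user)) {
        case DEMO_BREAK: /* trusted: stop scanning, report clean */
            *run_cleanup = 1;
            break;
        case DEMO_VIRUS: /* blocked: stop scanning, report a detection */
            *run_cleanup = 1;
            res = DEMO_VIRUS;
            break;
        case DEMO_CLEAN: /* no opinion: continue with the normal scan */
            break;
        default: /* unknown code: ignore it and continue */
            break;
    }
    return res;
}

/* Test helper: returns whatever code `user` points at. */
static enum demo_code demo_echo(void *user)
{
    return *(enum demo_code *)user;
}
```

/*
 * Returning the result and the cleanup flag separately lets the caller tell
 * "clean, keep scanning" apart from "clean, stop now".
 */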
2011-03-04 18:27:32 +01:00
2020-03-21 14:15:28 -04:00
cl_error_t cli_magic_scan ( cli_ctx * ctx , cli_file_t type )
2003-07-29 15:48:06 +00:00
{
2020-03-19 21:23:54 -04:00
cl_error_t ret = CL_CLEAN ;
cl_error_t cb_retcode ;
2017-08-08 17:38:17 -04:00
cli_file_t dettype = 0 ;
2018-12-03 12:40:13 -05:00
uint8_t typercg = 1 ;
2017-08-08 17:38:17 -04:00
size_t hashed_size ;
2020-02-23 12:38:18 -05:00
unsigned char * hash = NULL ;
2017-08-08 17:38:17 -04:00
bitset_t * old_hook_lsig_matches ;
const char * filetype ;
int cache_clean = 0 , res ;
int run_cleanup = 0 ;
2014-04-23 17:37:23 -04:00
# if HAVE_JSON
2017-08-08 17:38:17 -04:00
struct json_object * parent_property = NULL ;
2014-04-30 15:01:05 -04:00
# else
2017-08-08 17:38:17 -04:00
void * parent_property = NULL ;
2014-04-23 11:47:30 -04:00
# endif
2005-03-22 21:26:27 +00:00
2020-03-19 21:23:54 -04:00
char * old_temp_path = NULL ;
char * new_temp_path = NULL ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > engine ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " CRITICAL: engine == NULL \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_ENULLARG ;
goto early_ret ;
2009-03-02 16:36:23 +00:00
}
2018-12-03 12:40:13 -05:00
if ( ! ( ctx - > engine - > dboptions & CL_DB_COMPILED ) ) {
2017-08-08 17:38:17 -04:00
cli_errmsg ( " CRITICAL: engine not compiled \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EMALFDB ;
goto early_ret ;
2003-07-29 15:48:06 +00:00
}
2018-12-03 12:40:13 -05:00
if ( ctx - > engine - > maxreclevel & & ctx - > recursion > ctx - > engine - > maxreclevel ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: Archive recursion limit exceeded (%u, max: %u) \n " , ctx - > recursion , ctx - > engine - > maxreclevel ) ;
2017-08-08 17:38:17 -04:00
emax_reached ( ctx ) ;
2016-08-24 17:39:20 -04:00
cli_check_blockmax ( ctx , CL_EMAXREC ) ;
2020-03-19 21:23:54 -04:00
ret = CL_CLEAN ;
goto early_ret ;
2012-11-27 17:15:02 -05:00
}
2020-04-18 10:46:57 -04:00
if ( ( * ctx - > fmap ) - > len < = 5 ) {
cli_dbgmsg ( " cli_magic_scan: File is too small (%zu bytes), ignoring. \n " , ( * ctx - > fmap ) - > len ) ;
ret = CL_CLEAN ;
goto early_ret ;
}
2018-12-03 12:40:13 -05:00
if ( cli_updatelimits ( ctx , ( * ctx - > fmap ) - > len ) ! = CL_CLEAN ) {
2017-08-08 17:38:17 -04:00
emax_reached ( ctx ) ;
2020-03-19 21:23:54 -04:00
ret = CL_CLEAN ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
}
if ( ctx - > engine - > keeptmp ) {
2020-07-15 08:39:32 -07:00
char * fmap_basename = NULL ;
2020-03-19 21:23:54 -04:00
/*
* Keep - temp enabled , so create a sub - directory to provide extraction directory recursion .
*/
if ( ( NULL ! = ( * ctx - > fmap ) - > name ) & &
( CL_SUCCESS = = cli_basename ( ( * ctx - > fmap ) - > name , strlen ( ( * ctx - > fmap ) - > name ) , & fmap_basename ) ) ) {
/*
* The fmap has a name , so include it in the new sub - directory name .
*/
new_temp_path = cli_gentemp_with_prefix ( ctx - > sub_tmpdir , fmap_basename ) ;
2020-07-15 08:39:32 -07:00
free ( fmap_basename ) ;
2020-03-19 21:23:54 -04:00
if ( NULL = = new_temp_path ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " cli_magic_scan: Failed to generate temp directory name. \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EMEM ;
goto early_ret ;
}
} else {
/*
* The fmap has no name or we failed to get the basename .
*/
2020-03-27 16:06:22 -04:00
new_temp_path = cli_gentemp ( ctx - > sub_tmpdir ) ;
2020-03-19 21:23:54 -04:00
if ( NULL = = new_temp_path ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " cli_magic_scan: Failed to generate temp directory name. \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EMEM ;
goto early_ret ;
}
}
old_temp_path = ctx - > sub_tmpdir ;
ctx - > sub_tmpdir = new_temp_path ;
if ( mkdir ( ctx - > sub_tmpdir , 0700 ) ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " cli_magic_scan: Can't create tmp sub-directory for scan: %s. \n " , ctx - > sub_tmpdir ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EACCES ;
goto early_ret ;
}
2009-08-30 19:14:49 +02:00
}
2020-02-23 12:38:18 -05:00
2020-02-28 18:29:35 -05:00
hash = ( * ctx - > fmap ) - > maphash ;
2020-02-23 12:38:18 -05:00
hashed_size = ( * ctx - > fmap ) - > len ;
2013-03-05 11:21:29 -05:00
old_hook_lsig_matches = ctx - > hook_lsig_matches ;
2018-12-03 12:40:13 -05:00
if ( type = = CL_TYPE_PART_ANY ) {
2017-08-08 17:38:17 -04:00
typercg = 0 ;
2013-09-17 16:45:48 -04:00
}
2009-08-30 19:14:49 +02:00
2021-01-19 14:23:02 -08:00
/*
* Perform file typing from the start of the file .
*/
2011-06-14 17:00:06 +02:00
perf_start ( ctx , PERFT_FT ) ;
2018-12-03 12:40:13 -05:00
if ( ( type = = CL_TYPE_ANY ) | | type = = CL_TYPE_PART_ANY ) {
2020-03-21 14:15:28 -04:00
type = cli_determine_fmap_type ( * ctx - > fmap , ctx - > engine , type ) ;
2014-03-24 18:45:29 -04:00
}
2011-06-14 17:00:06 +02:00
perf_stop ( ctx , PERFT_FT ) ;
2018-12-03 12:40:13 -05:00
if ( type = = CL_TYPE_ERROR ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: cli_determine_fmap_type returned CL_TYPE_ERROR \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EREAD ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2011-06-14 03:26:30 +02:00
}
2011-06-14 17:00:06 +02:00
filetype = cli_ftname ( type ) ;
2014-04-16 16:40:56 -04:00
2014-04-23 17:37:23 -04:00
# if HAVE_JSON
2018-12-03 12:40:13 -05:00
if ( SCAN_COLLECT_METADATA ) {
2021-01-19 14:23:02 -08:00
/*
* Create JSON object to record metadata during the scan .
*/
2018-12-03 12:40:13 -05:00
if ( NULL = = ctx - > properties ) {
2019-05-23 22:50:04 -04:00
ctx - > properties = json_object_new_object ( ) ;
if ( NULL = = ctx - > properties ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " cli_magic_scan: no memory for json properties object \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EMEM ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2019-05-23 22:50:04 -04:00
}
ctx - > wrkproperty = ctx - > properties ;
ret = cli_jsonstr ( ctx - > properties , " Magic " , " CLAMJSONv0 " ) ;
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2019-05-23 22:50:04 -04:00
}
ret = cli_jsonstr ( ctx - > properties , " RootFileType " , filetype ) ;
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2014-04-23 11:47:30 -04:00
}
2019-05-23 22:50:04 -04:00
2018-12-03 12:40:13 -05:00
} else {
2020-03-19 21:23:54 -04:00
json_object * arrobj ;
2014-04-23 11:47:30 -04:00
parent_property = ctx - > wrkproperty ;
2018-12-03 12:40:13 -05:00
if ( ! json_object_object_get_ex ( parent_property , " ContainedObjects " , & arrobj ) ) {
2014-04-23 11:47:30 -04:00
arrobj = json_object_new_array ( ) ;
2018-12-03 12:40:13 -05:00
if ( NULL = = arrobj ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " cli_magic_scan: no memory for json properties object \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EMEM ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2014-04-23 11:47:30 -04:00
}
2014-04-28 17:01:57 -04:00
json_object_object_add ( parent_property , " ContainedObjects " , arrobj ) ;
2014-04-23 11:47:30 -04:00
}
ctx - > wrkproperty = json_object_new_object ( ) ;
2018-12-03 12:40:13 -05:00
if ( NULL = = ctx - > wrkproperty ) {
2020-03-21 14:15:28 -04:00
cli_errmsg ( " cli_magic_scan: no memory for json properties object \n " ) ;
2020-03-19 21:23:54 -04:00
ret = CL_EMEM ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2014-04-23 11:47:30 -04:00
}
json_object_array_add ( arrobj , ctx - > wrkproperty ) ;
2020-03-19 21:23:54 -04:00
}
2019-05-23 22:50:04 -04:00
2020-03-19 21:23:54 -04:00
if ( ( * ctx - > fmap ) - > name ) {
ret = cli_jsonstr ( ctx - > wrkproperty , " FileName " , ( * ctx - > fmap ) - > name ) ;
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
}
}
if ( ctx - > sub_filepath ) {
ret = cli_jsonstr ( ctx - > wrkproperty , " FilePath " , ctx - > sub_filepath ) ;
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2019-05-23 22:50:04 -04:00
}
2014-04-23 11:47:30 -04:00
}
2014-04-30 15:01:05 -04:00
ret = cli_jsonstr ( ctx - > wrkproperty , " FileType " , filetype ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2014-04-16 16:40:56 -04:00
}
2014-04-30 15:01:05 -04:00
ret = cli_jsonint ( ctx - > wrkproperty , " FileSize " , ( * ctx - > fmap ) - > len ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2014-04-16 16:40:56 -04:00
}
}
# endif
2020-08-12 00:18:53 -07:00
ret = dispatch_prescan_callback ( ctx - > engine - > cb_pre_cache , ctx , filetype , old_hook_lsig_matches , parent_property , hash , hashed_size , & run_cleanup ) ;
2018-12-03 12:40:13 -05:00
if ( run_cleanup ) {
2020-03-19 21:23:54 -04:00
if ( ret = = CL_VIRUS ) {
2020-08-03 12:11:56 -07:00
ret = cli_checkfp ( ctx ) ;
2020-03-19 21:23:54 -04:00
goto done ;
} else {
ret = CL_CLEAN ;
goto done ;
}
2014-04-30 15:01:05 -04:00
}
2011-06-14 03:26:30 +02:00
2021-01-19 14:23:02 -08:00
/*
* Check if we ' ve already scanned this file before .
*/
2011-02-14 19:19:20 +02:00
perf_start ( ctx , PERFT_CACHE ) ;
2018-07-20 22:28:48 -04:00
if ( ! ( SCAN_COLLECT_METADATA ) )
2015-11-04 14:46:46 -05:00
res = cache_check ( hash , ctx ) ;
2017-01-04 13:20:29 -05:00
else
res = CL_VIRUS ;
2014-04-24 14:22:00 -04:00
# if HAVE_JSON
2018-12-03 12:40:13 -05:00
if ( SCAN_COLLECT_METADATA /* ctx.options->general & CL_SCAN_GENERAL_COLLECT_METADATA && ctx->wrkproperty != NULL */ ) {
2014-04-24 14:22:00 -04:00
char hashstr [ 33 ] ;
2015-11-04 14:46:46 -05:00
snprintf ( hashstr , 33 , " %02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x " ,
hash [ 0 ] , hash [ 1 ] , hash [ 2 ] , hash [ 3 ] , hash [ 4 ] , hash [ 5 ] , hash [ 6 ] , hash [ 7 ] ,
hash [ 8 ] , hash [ 9 ] , hash [ 10 ] , hash [ 11 ] , hash [ 12 ] , hash [ 13 ] , hash [ 14 ] , hash [ 15 ] ) ;
2014-04-24 14:22:00 -04:00
2014-04-30 15:01:05 -04:00
ret = cli_jsonstr ( ctx - > wrkproperty , " FileMD5 " , hashstr ) ;
2015-11-04 14:46:46 -05:00
if ( ctx - > engine - > engine_options & ENGINE_OPTIONS_DISABLE_CACHE )
2020-02-23 12:38:18 -05:00
memset ( hash , 0 , 16 ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2014-04-24 14:22:00 -04:00
}
}
# endif
2018-12-03 12:40:13 -05:00
if ( res ! = CL_VIRUS ) {
2017-08-08 17:38:17 -04:00
perf_stop ( ctx , PERFT_CACHE ) ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: returning %d %s (no post, no cache) \n " , ret , __AT__ ) ;
2020-03-19 21:23:54 -04:00
goto early_ret ;
2010-01-05 18:15:59 +01:00
}
2012-05-08 15:34:27 +02:00
2011-02-14 19:19:20 +02:00
perf_stop ( ctx , PERFT_CACHE ) ;
2010-01-19 16:38:12 +02:00
ctx - > hook_lsig_matches = NULL ;
2010-01-15 03:00:15 +01:00
2018-12-03 12:40:13 -05:00
if ( ! ( ( ctx - > options - > general & ~ CL_SCAN_GENERAL_ALLMATCHES ) | | ( ctx - > options - > parse ) | | ( ctx - > options - > heuristic ) | | ( ctx - > options - > mail ) | | ( ctx - > options - > dev ) ) | | ( ctx - > recursion = = ctx - > engine - > maxreclevel ) ) { /* raw mode (stdin, etc.) or last level of recursion */
if ( ctx - > recursion = = ctx - > engine - > maxreclevel ) {
2016-08-24 17:39:20 -04:00
cli_check_blockmax ( ctx , CL_EMAXREC ) ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: Hit recursion limit, only scanning raw file \n " ) ;
2018-12-03 12:40:13 -05:00
} else
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: Raw mode: No support for special files \n " ) ;
2010-03-05 17:11:45 +01:00
2020-08-12 00:18:53 -07:00
ret = dispatch_prescan_callback ( ctx - > engine - > cb_pre_scan , ctx , filetype , old_hook_lsig_matches , parent_property , hash , hashed_size , & run_cleanup ) ;
2018-12-03 12:40:13 -05:00
if ( run_cleanup ) {
2020-03-19 21:23:54 -04:00
if ( ret = = CL_VIRUS ) {
2020-08-03 12:11:56 -07:00
ret = cli_checkfp ( ctx ) ;
2020-03-19 21:23:54 -04:00
}
goto done ;
2017-08-08 17:38:17 -04:00
}
2020-03-19 21:23:54 -04:00
2020-03-21 14:15:28 -04:00
if ( ( ret = cli_scan_fmap ( ctx , CL_TYPE_ANY , 0 , NULL , AC_SCAN_VIR , NULL , hash ) ) = = CL_VIRUS )
cli_dbgmsg ( " cli_magic_scan: %s found in descriptor %d \n " , cli_get_last_virus ( ctx ) , fmap_fd ( * ctx - > fmap ) ) ;
2018-12-03 12:40:13 -05:00
else if ( ret = = CL_CLEAN ) {
2017-08-08 17:38:17 -04:00
if ( ctx - > recursion ! = ctx - > engine - > maxreclevel )
cache_clean = 1 ; /* Only cache if limits are not reached */
else
emax_reached ( ctx ) ;
}
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-19 21:23:54 -04:00
goto done ;
2008-02-06 21:19:10 +00:00
}
2003-07-29 15:48:06 +00:00
2020-08-12 00:18:53 -07:00
ret = dispatch_prescan_callback ( ctx - > engine - > cb_pre_scan , ctx , filetype , old_hook_lsig_matches , parent_property , hash , hashed_size , & run_cleanup ) ;
2018-12-03 12:40:13 -05:00
if ( run_cleanup ) {
2020-03-19 21:23:54 -04:00
if ( ret = = CL_VIRUS ) {
2020-08-03 12:11:56 -07:00
ret = cli_checkfp ( ctx ) ;
2020-03-19 21:23:54 -04:00
}
goto done ;
2014-04-30 15:01:05 -04:00
}
2011-03-04 18:27:32 +01:00
2010-05-07 19:47:11 +02:00
# ifdef HAVE__INTERNAL__SHA_COLLECT
2017-08-08 17:38:17 -04:00
if ( ! ctx - > sha_collect & & type = = CL_TYPE_MSEXE )
ctx - > sha_collect = 1 ;
2010-05-07 19:47:11 +02:00
# endif
2004-02-06 13:46:08 +00:00
2010-01-19 16:38:12 +02:00
ctx - > hook_lsig_matches = cli_bitset_init ( ) ;
2018-12-03 12:40:13 -05:00
if ( ! ctx - > hook_lsig_matches ) {
2017-08-08 17:38:17 -04:00
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-19 21:23:54 -04:00
ret = CL_EMEM ;
goto done ;
2010-03-05 19:56:43 +02:00
}
2010-01-19 16:38:12 +02:00
2018-12-03 12:40:13 -05:00
if ( type ! = CL_TYPE_IGNORED & & ctx - > engine - > sdb ) {
2021-01-19 14:23:02 -08:00
/*
* If the self-protection mechanism is enabled, do the scanraw() scan first,
* before extracting with a file type parser.
*/
2020-03-21 14:15:28 -04:00
ret = scanraw ( ctx , type , 0 , & dettype , ( ctx - > engine - > engine_options & ENGINE_OPTIONS_DISABLE_CACHE ) ? NULL : hash ) ;
2020-01-31 11:52:00 -08:00
if ( ret = = CL_EMEM | | ret = = CL_VIRUS ) {
2020-08-03 12:11:56 -07:00
ret = cli_checkfp ( ctx ) ;
2017-08-08 17:38:17 -04:00
cli_bitset_free ( ctx - > hook_lsig_matches ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-19 21:23:54 -04:00
goto done ;
2017-08-08 17:38:17 -04:00
}
2006-01-25 12:11:31 +00:00
}
2021-01-19 14:23:02 -08:00
/*
* Run the file type parsers that we normally use before the raw scan .
*/
2008-02-06 21:19:10 +00:00
ctx - > recursion + + ;
2011-02-14 19:19:20 +02:00
perf_nested_start ( ctx , PERFT_CONTAINER , PERFT_SCAN ) ;
2017-01-19 12:24:46 -05:00
/* set current level as container AFTER recursing */
cli_set_container ( ctx , type , ( * ctx - > fmap ) - > len ) ;
2018-12-03 12:40:13 -05:00
switch ( type ) {
case CL_TYPE_IGNORED :
break ;
case CL_TYPE_HWP3 :
if ( SCAN_PARSE_HWP3 & & ( DCONF_DOC & DOC_CONF_HWP ) )
ret = cli_scanhwp3 ( ctx ) ;
break ;
case CL_TYPE_HWPOLE2 :
if ( SCAN_PARSE_OLE2 & & ( DCONF_ARCH & ARCH_CONF_OLE2 ) )
ret = cli_scanhwpole2 ( ctx ) ;
break ;
case CL_TYPE_XML_WORD :
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_MSXML ) )
ret = cli_scanmsxml ( ctx ) ;
break ;
case CL_TYPE_XML_XL :
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_MSXML ) )
ret = cli_scanmsxml ( ctx ) ;
break ;
case CL_TYPE_XML_HWP :
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_HWP ) )
ret = cli_scanhwpml ( ctx ) ;
break ;
case CL_TYPE_XDP :
if ( SCAN_PARSE_PDF & & ( DCONF_DOC & DOC_CONF_PDF ) )
ret = cli_scanxdp ( ctx ) ;
break ;
case CL_TYPE_RAR :
if ( have_rar & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_RAR ) ) {
const char * filepath = NULL ;
int fd = - 1 ;
char * tmpname = NULL ;
int tmpfd = - 1 ;
2020-01-23 17:42:33 -08:00
# ifdef _WIN32
if ( ( SCAN_UNPRIVILEGED ) | | ( NULL = = ctx - > sub_filepath ) | | ( 0 ! = _access_s ( ctx - > sub_filepath , R_OK ) ) ) {
# else
if ( ( SCAN_UNPRIVILEGED ) | | ( NULL = = ctx - > sub_filepath ) | | ( 0 ! = access ( ctx - > sub_filepath , R_OK ) ) ) {
# endif
2018-12-03 12:40:13 -05:00
/* If map is not file-backed have to dump to file for scanrar. */
2020-03-19 21:23:54 -04:00
ret = fmap_dump_to_file ( * ctx - > fmap , ctx - > sub_filepath , ctx - > sub_tmpdir , & tmpname , & tmpfd , 0 , SIZE_MAX ) ;
2018-12-03 12:40:13 -05:00
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: failed to generate temporary file. \n " ) ;
2018-12-03 12:40:13 -05:00
break ;
}
filepath = tmpname ;
fd = tmpfd ;
} else {
/* Use the original file and file descriptor. */
filepath = ctx - > sub_filepath ;
fd = fmap_fd ( * ctx - > fmap ) ;
2017-08-08 17:38:17 -04:00
}
2018-07-30 20:19:28 -04:00
2018-12-03 12:40:13 -05:00
/* scan file */
ret = cli_scanrar ( filepath , fd , ctx ) ;
2018-07-30 20:19:28 -04:00
2020-01-23 17:42:33 -08:00
if ( ( NULL = = tmpname ) & & ( CL_EOPEN = = ret ) ) {
/*
* Failed to open the file using the original filename .
* Try writing the file descriptor to a temp file and try again .
*/
2020-03-19 21:23:54 -04:00
ret = fmap_dump_to_file ( * ctx - > fmap , ctx - > sub_filepath , ctx - > sub_tmpdir , & tmpname , & tmpfd , 0 , SIZE_MAX ) ;
2020-01-23 17:42:33 -08:00
if ( ret ! = CL_SUCCESS ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan: failed to generate temporary file. \n " ) ;
2020-01-23 17:42:33 -08:00
break ;
}
filepath = tmpname ;
fd = tmpfd ;
/* try to scan again */
ret = cli_scanrar ( filepath , fd , ctx ) ;
}
2018-12-03 12:40:13 -05:00
if ( tmpfd ! = - 1 ) {
/* If dumped tempfile, need to cleanup */
close ( tmpfd ) ;
if ( ! ctx - > engine - > keeptmp ) {
if ( cli_unlink ( tmpname ) ) {
ret = CL_EUNLINK ;
}
2018-07-30 20:19:28 -04:00
}
}
2018-12-03 12:40:13 -05:00
if ( tmpname ! = NULL ) {
free ( tmpname ) ;
}
2017-08-08 17:38:17 -04:00
}
2018-12-03 12:40:13 -05:00
break ;
2017-08-08 17:38:17 -04:00
2018-10-08 12:59:42 -04:00
case CL_TYPE_EGG :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_EGG ) )
ret = cli_scanegg ( ctx , 0 ) ;
break ;
2018-12-03 12:40:13 -05:00
case CL_TYPE_OOXML_WORD :
case CL_TYPE_OOXML_PPT :
case CL_TYPE_OOXML_XL :
case CL_TYPE_OOXML_HWP :
2014-04-24 14:22:00 -04:00
# if HAVE_JSON
2018-12-03 12:40:13 -05:00
if ( SCAN_PARSE_XMLDOCS & & ( DCONF_DOC & DOC_CONF_OOXML ) ) {
if ( SCAN_COLLECT_METADATA & & ( ctx - > wrkproperty ! = NULL ) ) {
ret = cli_process_ooxml ( ctx , type ) ;
if ( ret = = CL_EMEM | | ret = = CL_ENULLARG ) {
/* critical error */
break ;
} else if ( ret ! = CL_SUCCESS ) {
/*
2019-08-16 17:18:59 -07:00
* non - critical return = > allow for the CL_TYPE_ZIP scan to occur
* cli_process_ooxml other possible returns :
* CL_ETIMEOUT , CL_EMAXSIZE , CL_EMAXFILES , CL_EPARSE ,
* CL_EFORMAT , CL_BREAK , CL_ESTAT
*/
2018-12-03 12:40:13 -05:00
ret = CL_SUCCESS ;
}
2017-08-08 17:38:17 -04:00
}
}
2014-04-24 14:22:00 -04:00
# endif
2020-08-06 22:15:33 -07:00
/* fall-through */
2018-12-03 12:40:13 -05:00
case CL_TYPE_ZIP :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_ZIP ) )
ret = cli_unzip ( ctx ) ;
break ;
case CL_TYPE_GZ :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_GZ ) )
ret = cli_scangzip ( ctx ) ;
break ;
case CL_TYPE_BZ :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_BZ ) )
ret = cli_scanbzip ( ctx ) ;
break ;
case CL_TYPE_XZ :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_XZ ) )
ret = cli_scanxz ( ctx ) ;
break ;
case CL_TYPE_GPT :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_GPT ) )
ret = cli_scangpt ( ctx , 0 ) ;
break ;
case CL_TYPE_APM :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_APM ) )
ret = cli_scanapm ( ctx ) ;
break ;
case CL_TYPE_ARJ :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_ARJ ) )
ret = cli_scanarj ( ctx , 0 ) ;
break ;
case CL_TYPE_NULSFT :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_NSIS ) )
ret = cli_scannulsft ( ctx , 0 ) ;
break ;
case CL_TYPE_AUTOIT :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_AUTOIT ) )
ret = cli_scanautoit ( ctx , 23 ) ;
break ;
case CL_TYPE_MSSZDD :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_SZDD ) )
ret = cli_scanszdd ( ctx ) ;
break ;
case CL_TYPE_MSCAB :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CAB ) )
ret = cli_scanmscab ( ctx , 0 ) ;
break ;
case CL_TYPE_HTML :
if ( SCAN_PARSE_HTML & & ( DCONF_DOC & DOC_CONF_HTML ) )
ret = cli_scanhtml ( ctx ) ;
break ;
2020-12-17 19:28:15 -08:00
2018-12-03 12:40:13 -05:00
case CL_TYPE_HTML_UTF16 :
if ( SCAN_PARSE_HTML & & ( DCONF_DOC & DOC_CONF_HTML ) )
ret = cli_scanhtml_utf16 ( ctx ) ;
break ;
case CL_TYPE_SCRIPT :
if ( ( DCONF_DOC & DOC_CONF_SCRIPT ) & & dettype ! = CL_TYPE_HTML )
ret = cli_scanscript ( ctx ) ;
break ;
case CL_TYPE_SWF :
if ( SCAN_PARSE_SWF & & ( DCONF_DOC & DOC_CONF_SWF ) )
ret = cli_scanswf ( ctx ) ;
break ;
case CL_TYPE_RTF :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_DOC & DOC_CONF_RTF ) )
ret = cli_scanrtf ( ctx ) ;
break ;
case CL_TYPE_MAIL :
if ( SCAN_PARSE_MAIL & & ( DCONF_MAIL & MAIL_CONF_MBOX ) )
ret = cli_scanmail ( ctx ) ;
break ;
case CL_TYPE_MHTML :
if ( SCAN_PARSE_MAIL & & ( DCONF_MAIL & MAIL_CONF_MBOX ) )
ret = cli_scanmail ( ctx ) ;
break ;
case CL_TYPE_TNEF :
if ( SCAN_PARSE_MAIL & & ( DCONF_MAIL & MAIL_CONF_TNEF ) )
ret = cli_scantnef ( ctx ) ;
break ;
case CL_TYPE_UUENCODED :
if ( DCONF_OTHER & OTHER_CONF_UUENC )
ret = cli_scanuuencoded ( ctx ) ;
break ;
case CL_TYPE_MSCHM :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CHM ) )
ret = cli_scanmschm ( ctx ) ;
break ;
case CL_TYPE_MSOLE2 :
if ( SCAN_PARSE_OLE2 & & ( DCONF_ARCH & ARCH_CONF_OLE2 ) )
ret = cli_scanole2 ( ctx ) ;
break ;
case CL_TYPE_7Z :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_7Z ) )
ret = cli_7unz ( ctx , 0 ) ;
break ;
case CL_TYPE_POSIX_TAR :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_TAR ) )
ret = cli_scantar ( ctx , 1 ) ;
break ;
case CL_TYPE_OLD_TAR :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_TAR ) )
ret = cli_scantar ( ctx , 0 ) ;
break ;
case CL_TYPE_CPIO_OLD :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CPIO ) )
ret = cli_scancpio_old ( ctx ) ;
break ;
case CL_TYPE_CPIO_ODC :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CPIO ) )
ret = cli_scancpio_odc ( ctx ) ;
break ;
case CL_TYPE_CPIO_NEWC :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CPIO ) )
ret = cli_scancpio_newc ( ctx , 0 ) ;
break ;
case CL_TYPE_CPIO_CRC :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_CPIO ) )
ret = cli_scancpio_newc ( ctx , 1 ) ;
break ;
case CL_TYPE_BINHEX :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_BINHEX ) )
ret = cli_binhex ( ctx ) ;
break ;
case CL_TYPE_SCRENC :
if ( DCONF_OTHER & OTHER_CONF_SCRENC )
ret = cli_scanscrenc ( ctx ) ;
break ;
case CL_TYPE_RIFF :
if ( SCAN_HEURISTICS & & ( DCONF_OTHER & OTHER_CONF_RIFF ) )
ret = cli_scanriff ( ctx ) ;
break ;
case CL_TYPE_GRAPHICS :
GIF, PNG bugfixes; Add AlertBrokenMedia option
Added a new scan option to alert on broken media (graphics) file
formats. This feature mitigates the risk of malformed media files
intended to exploit vulnerabilities in other software. At present
media validation exists for JPEG, TIFF, PNG, and GIF files.
To enable this feature, set `AlertBrokenMedia yes` in clamd.conf, or
use the `--alert-broken-media` option when using `clamscan`.
These options are disabled by default for now.
Application developers may enable this scan option by enabling
`CL_SCAN_HEURISTIC_BROKEN_MEDIA` for the `heuristic` scan option bit
field.
Fixed PNG parser logic bugs that caused an excess of parsing errors
and fixed a stack exhaustion issue affecting some systems when
scanning PNG files. PNG file type detection was disabled via
signature database update for 0.103.0 to mitigate effects from these
bugs.
Fixed an issue where PNG and GIF files no longer work with Target:5
(graphics) signatures if detected as CL_TYPE_PNG/GIF rather than as
CL_TYPE_GRAPHICS. Target types now support up to 10 possible file
types to make way for additional graphics types in future releases.
Scanning JPEG, TIFF, PNG, and GIF files will no longer return "parse"
errors when file format validation fails. Instead, the scan will alert
with the "Heuristics.Broken.Media" signature prefix and a descriptive
suffix to indicate the issue, provided that the "alert broken media"
feature is enabled.
GIF format validation will no longer fail if the GIF image is missing
the trailer byte, as this appears to be a relatively common issue in
otherwise functional GIF files.
Added a TIFF dynamic configuration (DCONF) option, which was missing.
This will allow us to disable TIFF format validation via signature
database update in the event that it proves to be problematic.
This feature already exists for many other file types.
Added CL_TYPE_JPEG and CL_TYPE_TIFF types.
2020-11-04 15:49:43 -08:00
/*
2020-12-01 16:29:02 -08:00
* This case is for unhandled graphics types such as BMP , JPEG 2000 , etc .
*
* Note : JPEG 2000 is a very different format from JPEG , JPEG / JFIF , JPEG / Exif , JPEG / SPIFF ( 1994 , 1997 )
* JPEG 2000 is not handled by cli_scanjpeg or cli_parsejpeg .
2020-11-04 15:49:43 -08:00
*/
2018-12-03 12:40:13 -05:00
break ;
2019-09-16 14:56:27 -04:00
case CL_TYPE_GIF :
2020-11-04 15:49:43 -08:00
if ( SCAN_HEURISTICS & & SCAN_HEURISTIC_BROKEN_MEDIA & & ( DCONF_OTHER & OTHER_CONF_GIF ) )
2019-09-16 14:56:27 -04:00
ret = cli_parsegif ( ctx ) ;
break ;
2017-03-04 00:08:03 +01:00
2017-02-21 19:58:09 +01:00
case CL_TYPE_PNG :
2020-01-08 16:11:26 -05:00
if ( SCAN_HEURISTICS & & ( DCONF_OTHER & OTHER_CONF_PNG ) )
2020-11-04 15:49:43 -08:00
ret = cli_parsepng ( ctx ) ; /* PNG parser detects a couple of CVEs as well as Broken.Media */
break ;
case CL_TYPE_JPEG :
if ( SCAN_HEURISTICS & & ( DCONF_OTHER & OTHER_CONF_JPEG ) )
2020-12-01 22:52:08 -08:00
ret = cli_parsejpeg ( ctx ) ; /* JPG parser detects MS04-028 exploits as well as Broken.Media */
2020-11-04 15:49:43 -08:00
break ;
case CL_TYPE_TIFF :
if ( SCAN_HEURISTICS & & SCAN_HEURISTIC_BROKEN_MEDIA & & ( DCONF_OTHER & OTHER_CONF_TIFF ) & & ret ! = CL_VIRUS )
ret = cli_parsetiff ( ctx ) ;
2017-02-21 19:58:09 +01:00
break ;
2018-12-03 12:40:13 -05:00
case CL_TYPE_PDF : /* FIXMELIMITS: pdf should be an archive! */
if ( SCAN_PARSE_PDF & & ( DCONF_DOC & DOC_CONF_PDF ) )
ret = cli_scanpdf ( ctx , 0 ) ;
break ;
case CL_TYPE_CRYPTFF :
if ( DCONF_OTHER & OTHER_CONF_CRYPTFF )
ret = cli_scancryptff ( ctx ) ;
break ;
case CL_TYPE_ELF :
if ( SCAN_PARSE_ELF & & ctx - > dconf - > elf )
ret = cli_scanelf ( ctx ) ;
break ;
case CL_TYPE_MACHO :
if ( ctx - > dconf - > macho )
ret = cli_scanmacho ( ctx , NULL ) ;
break ;
case CL_TYPE_MACHO_UNIBIN :
if ( ctx - > dconf - > macho )
ret = cli_scanmacho_unibin ( ctx ) ;
break ;
case CL_TYPE_SIS :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_SIS ) )
ret = cli_scansis ( ctx ) ;
break ;
case CL_TYPE_XAR :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_XAR ) )
ret = cli_scanxar ( ctx ) ;
break ;
case CL_TYPE_PART_HFSPLUS :
if ( SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_HFSPLUS ) )
ret = cli_scanhfsplus ( ctx ) ;
break ;
case CL_TYPE_BINARY_DATA :
case CL_TYPE_TEXT_UTF16BE :
if ( SCAN_HEURISTICS & & ( DCONF_OTHER & OTHER_CONF_MYDOOMLOG ) )
ret = cli_check_mydoom_log ( ctx ) ;
break ;
case CL_TYPE_TEXT_ASCII :
if ( SCAN_HEURISTIC_STRUCTURED & & ( DCONF_OTHER & OTHER_CONF_DLP ) )
/* TODO: consider calling this from cli_scanscript() for
2019-08-16 17:18:59 -07:00
* a normalised text
*/
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
ret = cli_scan_structured ( ctx ) ;
break ;
2017-08-08 17:38:17 -04:00
2018-12-03 12:40:13 -05:00
default :
break ;
2003-07-29 15:48:06 +00:00
}
2011-02-14 19:19:20 +02:00
perf_nested_stop ( ctx , PERFT_CONTAINER , PERFT_SCAN ) ;
2008-02-06 21:19:10 +00:00
ctx - > recursion - - ;
2004-04-20 22:33:42 +00:00
2021-01-19 14:23:02 -08:00
/*
* Perform the raw scan , which may include file type recognition signatures .
*/
2018-12-03 12:40:13 -05:00
if ( ret = = CL_VIRUS & & ! SCAN_ALLMATCHES ) {
2017-08-08 17:38:17 -04:00
cli_bitset_free ( ctx - > hook_lsig_matches ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-19 21:23:54 -04:00
goto done ;
2009-08-30 19:14:49 +02:00
}
2008-04-16 18:47:42 +00:00
2021-01-19 14:23:02 -08:00
/* Disable type recognition for the raw scan for zip files larger than maxziptypercg */
2018-12-03 12:40:13 -05:00
if ( type = = CL_TYPE_ZIP & & SCAN_PARSE_ARCHIVE & & ( DCONF_ARCH & ARCH_CONF_ZIP ) ) {
2017-08-08 17:38:17 -04:00
/* CL_ENGINE_MAX_ZIPTYPERCG */
uint64_t curr_len = ( * ctx - > fmap ) - > len ;
2018-12-03 12:40:13 -05:00
if ( curr_len > ctx - > engine - > maxziptypercg ) {
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " cli_magic_scan_desc: Not checking for embedded PEs (zip file > MaxZipTypeRcg) \n " ) ;
2017-08-08 17:38:17 -04:00
typercg = 0 ;
}
2007-07-16 15:58:54 +00:00
}
2008-02-19 18:43:42 +00:00
/* CL_TYPE_HTML: raw HTML files are not scanned, unless safety measure activated via DCONF */
2018-12-03 12:40:13 -05:00
if ( type ! = CL_TYPE_IGNORED & & ( type ! = CL_TYPE_HTML | | ! ( SCAN_PARSE_HTML ) | | ! ( DCONF_DOC & DOC_CONF_HTML_SKIPRAW ) ) & & ! ctx - > engine - > sdb ) {
2020-03-21 14:15:28 -04:00
res = scanraw ( ctx , type , typercg , & dettype , ( ctx - > engine - > engine_options & ENGINE_OPTIONS_DISABLE_CACHE ) ? NULL : hash ) ;
2018-12-03 12:40:13 -05:00
if ( res ! = CL_CLEAN ) {
switch ( res ) {
/* List of scan halts, runtime errors only! */
case CL_EUNLINK :
case CL_ESTAT :
case CL_ESEEK :
case CL_EWRITE :
case CL_EDUP :
case CL_ETMPFILE :
case CL_ETMPDIR :
case CL_EMEM :
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " Descriptor[%d]: scanraw error %s \n " , fmap_fd ( * ctx - > fmap ) , cl_strerror ( res ) ) ;
2018-12-03 12:40:13 -05:00
cli_bitset_free ( ctx - > hook_lsig_matches ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-19 21:23:54 -04:00
ret = res ;
goto done ;
2019-02-12 15:10:04 -05:00
/* CL_VIRUS = malware found, check FP and report.
* Likewise , if the file was determined to be trusted , then we
* can also finish with the scan . ( Ex : EXE with a valid
* Authenticode sig . ) */
case CL_VERIFIED :
// For now just convert CL_VERIFIED to CL_CLEAN, since
// CL_VERIFIED isn't used elsewhere
res = CL_CLEAN ;
// Fall through
2018-12-03 12:40:13 -05:00
case CL_VIRUS :
ret = res ;
if ( SCAN_ALLMATCHES )
break ;
cli_bitset_free ( ctx - > hook_lsig_matches ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-19 21:23:54 -04:00
goto done ;
2019-08-16 17:18:59 -07:00
/* The CL_ETIMEOUT "MAX" condition should set exceeds max flag and exit out quietly. */
case CL_ETIMEOUT :
cli_check_blockmax ( ctx , ret ) ;
cli_bitset_free ( ctx - > hook_lsig_matches ) ;
ctx - > hook_lsig_matches = old_hook_lsig_matches ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " Descriptor[%d]: Stopping after scanraw reached %s \n " ,
2019-08-16 17:18:59 -07:00
fmap_fd ( * ctx - > fmap ) , cl_strerror ( res ) ) ;
2020-03-19 21:23:54 -04:00
ret = CL_CLEAN ;
goto done ;
2019-08-16 17:18:59 -07:00
/* All other "MAX" conditions should still fully scan the current file */
2018-12-03 12:40:13 -05:00
case CL_EMAXREC :
case CL_EMAXSIZE :
case CL_EMAXFILES :
ret = res ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " Descriptor[%d]: Continuing after scanraw reached %s \n " ,
2018-12-03 12:40:13 -05:00
fmap_fd ( * ctx - > fmap ) , cl_strerror ( res ) ) ;
2017-08-08 17:38:17 -04:00
break ;
2018-12-03 12:40:13 -05:00
/* Other errors must not block further scans below
2019-08-16 17:18:59 -07:00
* This specifically includes CL_EFORMAT & CL_EREAD & CL_EUNPACK
* Malformed / truncated files could report as any of these three .
*/
2018-12-03 12:40:13 -05:00
default :
ret = res ;
2020-03-21 14:15:28 -04:00
cli_dbgmsg ( " Descriptor[%d]: Continuing after scanraw error %s \n " ,
2018-12-03 12:40:13 -05:00
fmap_fd ( * ctx - > fmap ) , cl_strerror ( res ) ) ;
2017-08-08 17:38:17 -04:00
}
}
2003-12-12 17:48:47 +00:00
}
2003-07-29 15:48:06 +00:00
2021-01-19 14:23:02 -08:00
/*
* Now run the rest of the file type parsers .
*/
2008-02-06 21:19:10 +00:00
ctx - > recursion + + ;
2018-12-03 12:40:13 -05:00
switch ( type ) {
/* bytecode hooks triggered by a lsig must be a hook
2019-08-16 17:18:59 -07:00
* called from one of the functions here */
2018-12-03 12:40:13 -05:00
case CL_TYPE_TEXT_ASCII :
case CL_TYPE_TEXT_UTF16BE :
case CL_TYPE_TEXT_UTF16LE :
case CL_TYPE_TEXT_UTF8 :
perf_nested_start ( ctx , PERFT_SCRIPT , PERFT_SCAN ) ;
if ( ( DCONF_DOC & DOC_CONF_SCRIPT ) & & dettype ! = CL_TYPE_HTML & & ( ret ! = CL_VIRUS | | SCAN_ALLMATCHES ) & & SCAN_PARSE_HTML )
ret = cli_scanscript ( ctx ) ;
if ( SCAN_PARSE_MAIL & & ( DCONF_MAIL & MAIL_CONF_MBOX ) & & ret ! = CL_VIRUS & & ( cli_get_container ( ctx , - 1 ) = = CL_TYPE_MAIL | | dettype = = CL_TYPE_MAIL ) ) {
2020-03-21 14:15:28 -04:00
ret = cli_scan_fmap ( ctx , CL_TYPE_MAIL , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ;
2018-12-03 12:40:13 -05:00
}
perf_nested_stop ( ctx , PERFT_SCRIPT , PERFT_SCAN ) ;
break ;
/* Due to performance reasons all executables were first scanned
2019-08-16 17:18:59 -07:00
* in raw mode . Now we will try to unpack them
*/
2018-12-03 12:40:13 -05:00
case CL_TYPE_MSEXE :
perf_nested_start ( ctx , PERFT_PE , PERFT_SCAN ) ;
if ( SCAN_PARSE_PE & & ctx - > dconf - > pe ) {
unsigned int corrupted_input = ctx - > corrupted_input ;
ret = cli_scanpe ( ctx ) ;
ctx - > corrupted_input = corrupted_input ;
}
perf_nested_stop ( ctx , PERFT_PE , PERFT_SCAN ) ;
break ;
2019-03-19 15:28:49 +01:00
case CL_TYPE_ELF :
perf_nested_start ( ctx , PERFT_ELF , PERFT_SCAN ) ;
ret = cli_unpackelf ( ctx ) ;
perf_nested_stop ( ctx , PERFT_ELF , PERFT_SCAN ) ;
break ;
2019-04-17 20:30:21 +02:00
case CL_TYPE_MACHO :
case CL_TYPE_MACHO_UNIBIN :
perf_nested_start ( ctx , PERFT_MACHO , PERFT_SCAN ) ;
ret = cli_unpackmacho ( ctx ) ;
perf_nested_stop ( ctx , PERFT_MACHO , PERFT_SCAN ) ;
break ;
2018-12-03 12:40:13 -05:00
case CL_TYPE_BINARY_DATA :
2020-03-21 14:15:28 -04:00
ret = cli_scan_fmap ( ctx , CL_TYPE_OTHER , 0 , NULL , AC_SCAN_VIR , NULL , NULL ) ;
2018-12-03 12:40:13 -05:00
break ;
default :
break ;
2004-07-11 14:50:25 +00:00
}

    ctx->recursion--;
    cli_bitset_free(ctx->hook_lsig_matches);
    ctx->hook_lsig_matches = old_hook_lsig_matches;

    switch (ret) {
        /* Limits exceeded */
        case CL_ETIMEOUT:
        case CL_EMAXREC:
        case CL_EMAXSIZE:
        case CL_EMAXFILES:
            cli_check_blockmax(ctx, ret);
            /* fall-through */
        /* Malformed file cases */
        case CL_EFORMAT:
        case CL_EREAD:
        case CL_EUNPACK:
            cli_dbgmsg("Descriptor[%d]: %s\n", fmap_fd(*ctx->fmap), cl_strerror(ret));
            ret = CL_CLEAN;
            goto done;
        case CL_CLEAN:
            cache_clean = 1;
            ret = CL_CLEAN;
            goto done;
        default:
            goto done;
    }

done:
#if HAVE_JSON
    ctx->wrkproperty = (struct json_object *)(parent_property);
#endif

    if (ret == CL_CLEAN && ctx->found_possibly_unwanted) {
        cb_retcode = CL_VIRUS;
    } else {
        if (ret == CL_CLEAN && ctx->num_viruses != 0)
            cb_retcode = CL_VIRUS;
        else
            cb_retcode = ret;
    }

    cli_dbgmsg("cli_magic_scan_desc: returning %d %s\n", ret, __AT__);
    if (ctx->engine->cb_post_scan) {
        const char *virusname = NULL;
        perf_start(ctx, PERFT_POSTCB);
        if (cb_retcode == CL_VIRUS)
            virusname = cli_get_last_virus(ctx);
        switch (ctx->engine->cb_post_scan(fmap_fd(*ctx->fmap), cb_retcode, virusname, ctx->cb_ctx)) {
            case CL_BREAK:
                cli_dbgmsg("cli_magic_scan_desc: file whitelisted by post_scan callback\n");
                perf_stop(ctx, PERFT_POSTCB);
                ret = CL_CLEAN;
                break;
            case CL_VIRUS:
                cli_dbgmsg("cli_magic_scan_desc: file blacklisted by post_scan callback\n");
                cli_append_virus(ctx, "Detected.By.Callback");
                perf_stop(ctx, PERFT_POSTCB);
                if (ret != CL_VIRUS) {
                    ret = cli_checkfp(ctx);
                }
                break;
            case CL_CLEAN:
                break;
            default:
                cli_warnmsg("cli_magic_scan_desc: ignoring bad return code from post_scan callback\n");
        }
        perf_stop(ctx, PERFT_POSTCB);
    }
    if (cb_retcode == CL_CLEAN && cache_clean) {
        perf_start(ctx, PERFT_CACHE);
        if (!(SCAN_COLLECT_METADATA))
            cache_add(hash, hashed_size, ctx);
        perf_stop(ctx, PERFT_CACHE);
    }
    if (ret == CL_VIRUS && SCAN_ALLMATCHES) {
        ret = CL_CLEAN;
    }

early_ret:
    if ((ctx->engine->keeptmp) && (NULL != old_temp_path)) {
        /* Use rmdir to remove empty tmp subdirectories. If rmdir fails, it wasn't empty. */
        (void)rmdir(ctx->sub_tmpdir);
        free((void *)ctx->sub_tmpdir);
        ctx->sub_tmpdir = old_temp_path;
    }

#if HAVE_JSON
    if (NULL != parent_property) {
        ctx->wrkproperty = (struct json_object *)(parent_property);
    }
#endif

    return ret;
}

cl_error_t cli_magic_scan_desc_type(int desc, const char *filepath, cli_ctx *ctx, cli_file_t type, const char *name)
{
    STATBUF sb;
    cl_error_t status = CL_CLEAN;

    if (!ctx) {
        return CL_EARG;
    }

    const char *parent_filepath = ctx->sub_filepath;
    ctx->sub_filepath = filepath;

#ifdef HAVE__INTERNAL__SHA_COLLECT
    if (ctx->sha_collect > 0)
        ctx->sha_collect = 0;
#endif
    cli_dbgmsg("in cli_magic_scan_desc_type (reclevel: %u/%u)\n", ctx->recursion, ctx->engine->maxreclevel);
    if (FSTAT(desc, &sb) == -1) {
        cli_errmsg("cli_magic_scan: Can't fstat descriptor %d\n", desc);
        status = CL_ESTAT;
        cli_dbgmsg("cli_magic_scan_desc_type: returning %d %s (no post, no cache)\n", status, __AT__);
        goto done;
    }
    if (sb.st_size <= 5) {
        cli_dbgmsg("Small data (%u bytes)\n", (unsigned int)sb.st_size);
        status = CL_CLEAN;
        cli_dbgmsg("cli_magic_scan_desc_type: returning %d %s (no post, no cache)\n", status, __AT__);
        goto done;
    }

    ctx->fmap++;
    perf_start(ctx, PERFT_MAP);
    if (!(*ctx->fmap = fmap(desc, 0, sb.st_size, name))) {
        cli_errmsg("CRITICAL: fmap() failed\n");
        ctx->fmap--;
        perf_stop(ctx, PERFT_MAP);
        status = CL_EMEM;
        cli_dbgmsg("cli_magic_scan_desc_type: returning %d %s (no post, no cache)\n", status, __AT__);
        goto done;
    }
    perf_stop(ctx, PERFT_MAP);

    status = cli_magic_scan(ctx, type);

    funmap(*ctx->fmap);
    ctx->fmap--;

done:
    ctx->sub_filepath = parent_filepath;
    return status;
}

cl_error_t cli_magic_scan_desc(int desc, const char *filepath, cli_ctx *ctx, const char *name)
{
    return cli_magic_scan_desc_type(desc, filepath, ctx, CL_TYPE_ANY, name);
}

cl_error_t cl_scandesc(int desc, const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions)
{
    return cl_scandesc_callback(desc, filename, virname, scanned, engine, scanoptions, NULL);
}

/**
 * @brief   Scan an offset/length into a file map.
 *
 * Magic-scan some portion of an existing fmap.
 *
 * @param map       File map.
 * @param offset    Offset into file map.
 * @param length    Length from offset.
 * @param ctx       Scanning context structure.
 * @param type      CL_TYPE of data to be scanned.
 * @param name      (optional) Original name of the file (to set fmap name metadata)
 * @return int      CL_SUCCESS, or an error code.
 */
static cl_error_t magic_scan_nested_fmap_type(cl_fmap_t *map, size_t offset, size_t length, cli_ctx *ctx, cli_file_t type, const char *name)
{
    cl_error_t ret = CL_CLEAN;

    cli_dbgmsg("magic_scan_nested_fmap_type: [%zu, +%zu), [" STDi64 ", +%zu)\n",
               map->nested_offset, map->len,
               (int64_t)offset, length);
    if (offset >= map->len) {
        cli_dbgmsg("Invalid offset: %zu\n", offset);
        return CL_CLEAN;
    }
    if (!length)
        length = map->len - offset;
    if (length > map->len - offset) {
        cli_dbgmsg("Data truncated: %zu -> %zu\n",
                   length, map->len - offset);
        length = map->len - offset;
    }
    if (length <= 5) {
        cli_dbgmsg("Small data (%zu bytes)\n", length);
        return CL_CLEAN;
    }

    ctx->fmap++;
    *ctx->fmap = fmap_duplicate(map, offset, length, name);
    if (NULL == *ctx->fmap) {
        cli_dbgmsg("Failed to duplicate fmap for scan of fmap subsection\n");
        ctx->fmap--;
        return CL_CLEAN;
    }

    ret = cli_magic_scan(ctx, type);

    free_duplicate_fmap(*ctx->fmap); /* This fmap is just a duplicate. */
    *ctx->fmap = NULL;
    ctx->fmap--;

    return ret;
}
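
/*
 * Illustrative sketch only, not part of libclamav: the offset/length
 * normalization performed by magic_scan_nested_fmap_type() above, reduced to
 * plain arithmetic. The helper name example_clamp_subrange and its return
 * convention (1 = worth scanning, 0 = skip) are hypothetical. The rules are
 * the ones visible above: a zero length means "to the end of the map", an
 * over-long length is truncated, and data of 5 bytes or fewer is skipped.
 */
static inline int example_clamp_subrange(size_t map_len, size_t offset, size_t *length)
{
    if (offset >= map_len)
        return 0; /* offset lies outside the map: nothing to scan */
    if (*length == 0 || *length > map_len - offset)
        *length = map_len - offset; /* default to, or truncate at, the end of the map */
    return *length > 5; /* small data is not scanned */
}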

/* For map scans that may be forced to disk */
cl_error_t cli_magic_scan_nested_fmap_type(cl_fmap_t *map, size_t offset, size_t length, cli_ctx *ctx, cli_file_t type, const char *name)
{
    size_t old_off = map->nested_offset;
    size_t old_len = map->len;
    cl_error_t ret = CL_CLEAN;

    cli_dbgmsg("cli_magic_scan_nested_fmap_type: [%zu, +%zu)\n", offset, length);
    if (offset >= old_len) {
        cli_dbgmsg("Invalid offset: %zu\n", offset);
        return CL_CLEAN;
    }

    if (ctx->engine->engine_options & ENGINE_OPTIONS_FORCE_TO_DISK) {
        /* if this is forced to disk, then need to write the nested map and scan it */
        const uint8_t *mapdata = NULL;
        char *tempfile = NULL;
        int fd = -1;
        size_t nread = 0;

        /* Then check length */
        if (!length)
            length = old_len - offset;
        if (length > old_len - offset) {
            cli_dbgmsg("cli_magic_scan_nested_fmap_type: Data truncated: %zu -> %zu\n", length, old_len - offset);
            length = old_len - offset;
        }
        if (length <= 5) {
            cli_dbgmsg("cli_magic_scan_nested_fmap_type: Small data (%u bytes)\n", (unsigned int)length);
            return CL_CLEAN;
        }
        if (!CLI_ISCONTAINED(old_off, old_len, old_off + offset, length)) {
            cli_dbgmsg("cli_magic_scan_nested_fmap_type: map error occurred [%zu, %zu]\n", old_off, old_len);
            return CL_CLEAN;
        }

        /* Length checked, now get map */
        mapdata = fmap_need_off_once_len(map, offset, length, &nread);
        if (!mapdata || (nread != length)) {
            cli_errmsg("cli_magic_scan_nested_fmap_type: could not map sub-file\n");
            return CL_EMAP;
        }

        ret = cli_gentempfd(ctx->sub_tmpdir, &tempfile, &fd);
        if (ret != CL_SUCCESS) {
            return ret;
        }

        cli_dbgmsg("cli_magic_scan_nested_fmap_type: writing nested map content to temp file %s\n", tempfile);
        if (cli_writen(fd, mapdata, length) == (size_t)-1) {
            cli_errmsg("cli_magic_scan_nested_fmap_type: cli_writen error writing subdoc temporary file.\n");
            ret = CL_EWRITE;
        } else {
            /* scan the temp file; skipped after a failed write so CL_EWRITE is not overwritten */
            ret = cli_magic_scan_desc_type(fd, tempfile, ctx, type, name);
        }

        /* remove the temp file, if needed */
        if (fd >= 0) {
            close(fd);
        }
        if (!ctx->engine->keeptmp) {
            if (cli_unlink(tempfile)) {
                cli_errmsg("cli_magic_scan_nested_fmap_type: error unlinking tempfile %s\n", tempfile);
                ret = CL_EUNLINK;
            }
        }
        free(tempfile);
    } else {
        /* Not forced to disk, use nested map */
        ret = magic_scan_nested_fmap_type(map, offset, length, ctx, type, name);
    }

    return ret;
}

cl_error_t cli_magic_scan_buff(const void *buffer, size_t length, cli_ctx *ctx, const char *name)
{
    cl_error_t ret;
    fmap_t *map = NULL;

    map = fmap_open_memory(buffer, length, name);
    if (!map) {
        return CL_EMAP;
    }

    ret = cli_magic_scan_nested_fmap_type(map, 0, length, ctx, CL_TYPE_ANY, name);

    funmap(map);

    return ret;
}

/**
 * @brief   The main function to initiate a scan of an fmap.
 *
 * @param map               File map.
 * @param filepath          (optional, recommended) filepath of the open file descriptor or file map.
 * @param[out] virname      Will be set to a statically allocated (i.e. need not be freed) signature name if the scan matches against a signature.
 * @param[out] scanned      The number of bytes scanned.
 * @param engine            The scanning engine.
 * @param scanoptions       Scanning options.
 * @param[inout] context    An opaque context structure allowing the caller to record details about the sample being scanned.
 * @return int              CL_CLEAN, CL_VIRUS, or an error code if an error occurred during the scan.
 */
static cl_error_t scan_common(cl_fmap_t *map, const char *filepath, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context)
{
    cli_ctx ctx;
    cl_error_t rc;

    char *target_basename = NULL;
    char *new_temp_prefix = NULL;
    size_t new_temp_prefix_len;
    char *new_temp_path = NULL;
    time_t current_time;
    struct tm tm_struct;
    fmap_t **fmap_head = NULL;

    if (NULL == map) {
        return CL_ENULLARG;
    }

    /* We have a limit of around 2GB (INT_MAX - 2). Enforce it here. */
    /* TODO: Large file support is large-ly untested. Remove this restriction
     * and test with a large set of large files of various types. libclamav's
     * integer type safety has come a long way since 2014, so it's possible
     * we could lift this restriction, but at least one of the parsers is
     * bound to behave badly with large files. */
    if ((size_t)(map->real_len) > (size_t)(INT_MAX - 2))
        return CL_CLEAN;

    memset(&ctx, '\0', sizeof(cli_ctx));
    ctx.engine = engine;
    ctx.virname = virname;
    ctx.scanned = scanned;
    ctx.options = malloc(sizeof(struct cl_scan_options));
    if (!ctx.options) {
        /* Do not memcpy() into a failed allocation. */
        rc = CL_EMEM;
        goto done;
    }
    memcpy(ctx.options, scanoptions, sizeof(struct cl_scan_options));
    ctx.found_possibly_unwanted = 0;
    ctx.containers = cli_calloc(sizeof(cli_ctx_container), ctx.engine->maxreclevel + 2);
    if (!ctx.containers) {
        rc = CL_EMEM;
        goto done;
    }
    cli_set_container(&ctx, CL_TYPE_ANY, 0);
    ctx.dconf = (struct cli_dconf *)engine->dconf;
    ctx.cb_ctx = context;
    fmap_head = cli_calloc(sizeof(fmap_t *), ctx.engine->maxreclevel + 3);
    if (!fmap_head) {
        rc = CL_EMEM;
        goto done;
    }
    if (!(ctx.hook_lsig_matches = cli_bitset_init())) {
        rc = CL_EMEM;
        goto done;
    }

    /*
     * The first fmap in ctx.fmap must be NULL so we can fmap-- while not NULL.
     * But we need an fmap to be set so we can append viruses or report the
     * fmap's file descriptor in the virus found callback (like for deferred
     * low-severity alerts).
     */
    ctx.fmap = fmap_head + 1;
    *ctx.fmap = map;

    perf_init(&ctx);

    if (ctx.engine->maxscantime != 0) {
        if (gettimeofday(&ctx.time_limit, NULL) == 0) {
            uint32_t secs = ctx.engine->maxscantime / 1000;
            uint32_t usecs = (ctx.engine->maxscantime % 1000) * 1000;
            ctx.time_limit.tv_sec += secs;
            ctx.time_limit.tv_usec += usecs;
            if (ctx.time_limit.tv_usec >= 1000000) {
                ctx.time_limit.tv_usec -= 1000000;
                ctx.time_limit.tv_sec++;
            }
        } else {
            char buf[64];
            cli_dbgmsg("scan_common: gettimeofday error: %s\n", cli_strerror(errno, buf, 64));
        }
    }

    if (filepath != NULL) {
        ctx.target_filepath = strdup(filepath);
    }

    /*
     * Create a tmp sub-directory for the temp files generated by this scan.
     *
     * If keeptmp (LeaveTemporaryFiles / --leave-temps) is enabled, we'll include the
     * basename in the tmp directory.
     * If keeptmp is not enabled, we'll just call it "scantemp".
     */
    current_time = time(NULL);
#ifdef _WIN32
    if (0 != localtime_s(&tm_struct, &current_time)) {
#else
    if (!localtime_r(&current_time, &tm_struct)) {
#endif
        cli_errmsg("scan_common: Failed to get local time.\n");
        rc = CL_ESTAT;
        goto done;
    }

    if ((ctx.engine->keeptmp) &&
        (NULL != ctx.target_filepath) &&
        (CL_SUCCESS == cli_basename(ctx.target_filepath, strlen(ctx.target_filepath), &target_basename))) {
        /* Include the basename in the temp directory */
        new_temp_prefix_len = strlen("YYYYMMDD_HHMMSS-") + strlen(target_basename);
        new_temp_prefix = cli_calloc(1, new_temp_prefix_len + 1);
        if (!new_temp_prefix) {
            cli_errmsg("scan_common: Failed to allocate memory for temp directory name.\n");
            rc = CL_EMEM;
            goto done;
        }
        /* The size passed to strftime() must include room for the terminating NUL. */
        strftime(new_temp_prefix, new_temp_prefix_len + 1, "%Y%m%d_%H%M%S-", &tm_struct);
        strcpy(new_temp_prefix + strlen("YYYYMMDD_HHMMSS-"), target_basename);
    } else {
        /* Just use date */
        new_temp_prefix_len = strlen("YYYYMMDD_HHMMSS-scantemp");
        new_temp_prefix = cli_calloc(1, new_temp_prefix_len + 1);
        if (!new_temp_prefix) {
            cli_errmsg("scan_common: Failed to allocate memory for temp directory name.\n");
            rc = CL_EMEM;
            goto done;
        }
        /* The size passed to strftime() must include room for the terminating NUL. */
        strftime(new_temp_prefix, new_temp_prefix_len + 1, "%Y%m%d_%H%M%S-scantemp", &tm_struct);
    }

    /* Place the new temp sub-directory within the configured temp directory */
    new_temp_path = cli_gentemp_with_prefix(ctx.engine->tmpdir, new_temp_prefix);
    free(new_temp_prefix);
    if (NULL == new_temp_path) {
        cli_errmsg("scan_common: Failed to generate temp directory name.\n");
        rc = CL_EMEM;
        goto done;
    }

    ctx.sub_tmpdir = new_temp_path;

    if (mkdir(ctx.sub_tmpdir, 0700)) {
        cli_errmsg("Can't create temporary directory for scan: %s.\n", ctx.sub_tmpdir);
        rc = CL_EACCES;
        goto done;
    }

    cli_logg_setup(&ctx);

    rc = cli_magic_scan(&ctx, CL_TYPE_ANY);

    if (rc == CL_CLEAN && ctx.found_possibly_unwanted) {
        cli_virus_found_cb(&ctx);
    }

#if HAVE_JSON
    if (ctx.options->general & CL_SCAN_GENERAL_COLLECT_METADATA && (ctx.properties != NULL)) {
        json_object *jobj;
        const char *jstring;

        /* set value of unique root object tag */
        if (json_object_object_get_ex(ctx.properties, "FileType", &jobj)) {
            enum json_type type;
            const char *jstr;

            type = json_object_get_type(jobj);
            if (type == json_type_string) {
                jstr = json_object_get_string(jobj);
                cli_jsonstr(ctx.properties, "RootFileType", jstr);
            }
        }

        /* serialize json properties to string */
#ifdef JSON_C_TO_STRING_NOSLASHESCAPE
        jstring = json_object_to_json_string_ext(ctx.properties, JSON_C_TO_STRING_PRETTY | JSON_C_TO_STRING_NOSLASHESCAPE);
#else
        jstring = json_object_to_json_string_ext(ctx.properties, JSON_C_TO_STRING_PRETTY);
#endif
        if (NULL == jstring) {
            cli_errmsg("scan_common: no memory for json serialization.\n");
            rc = CL_EMEM;
        } else {
            int ret = CL_SUCCESS;
            struct cli_matcher *iroot = ctx.engine->root[13];
            cli_dbgmsg("%s\n", jstring);

            if ((rc != CL_VIRUS) || (ctx.options->general & CL_SCAN_GENERAL_ALLMATCHES)) {
                /* run bytecode preclass hook; generate fmap if needed for running hook */
                struct cli_bc_ctx *bc_ctx = cli_bytecode_context_alloc();
                if (!bc_ctx) {
                    cli_errmsg("scan_common: can't allocate memory for bc_ctx\n");
                    rc = CL_EMEM;
                } else {
                    cli_bytecode_context_setctx(bc_ctx, &ctx);
                    rc = cli_bytecode_runhook(&ctx, ctx.engine, bc_ctx, BC_PRECLASS, map);
                    cli_bytecode_context_destroy(bc_ctx);
                }

                /* backwards compatibility: scan the json string unless a virus was detected */
                if (rc != CL_VIRUS && (iroot->ac_lsigs || iroot->ac_patterns
#ifdef HAVE_PCRE
                                       || iroot->pcre_metas
#endif // HAVE_PCRE
                                       )) {
                    cli_dbgmsg("scan_common: running deprecated preclass bytecodes for target type 13\n");
                    ctx.options->general &= ~CL_SCAN_GENERAL_COLLECT_METADATA;
                    rc = cli_magic_scan_buff(jstring, strlen(jstring), &ctx, NULL);
                }
            }

            /* Invoke file props callback */
            if (ctx.engine->cb_file_props != NULL) {
                ret = ctx.engine->cb_file_props(jstring, rc, ctx.cb_ctx);
                if (ret != CL_SUCCESS)
                    rc = ret;
            }

            /* keeptmp file processing for file properties json string */
            if (ctx.engine->keeptmp) {
                int fd = -1;
                char *tmpname = NULL;

                if ((ret = cli_newfilepathfd(ctx.sub_tmpdir, "metadata.json", &tmpname, &fd)) != CL_SUCCESS) {
                    cli_dbgmsg("scan_common: Can't create json properties file, ret = %i.\n", ret);
                } else {
                    if (cli_writen(fd, jstring, strlen(jstring)) == (size_t)-1)
                        cli_dbgmsg("scan_common: cli_writen error writing json properties file.\n");
                    else
                        cli_dbgmsg("json written to: %s\n", tmpname);
                }
                if (fd != -1)
                    close(fd);
                if (NULL != tmpname)
                    free(tmpname);
            }
        }
        cli_json_delobj(ctx.properties); /* frees all json memory */
    }
#endif // HAVE_JSON

    if (rc == CL_CLEAN) {
        if ((ctx.found_possibly_unwanted) ||
            ((ctx.num_viruses != 0) &&
             ((ctx.options->general & CL_SCAN_GENERAL_ALLMATCHES) ||
              (ctx.options->heuristic & CL_SCAN_HEURISTIC_EXCEEDS_MAX)))) {
            rc = CL_VIRUS;
        }
    }

    cli_logg_unsetup();

done:
    if (NULL != ctx.sub_tmpdir) {
        if (!ctx.engine->keeptmp) {
            (void)cli_rmdirs(ctx.sub_tmpdir);
        }
        free(ctx.sub_tmpdir);
    }

    if (NULL != target_basename) {
        free(target_basename);
    }
    if (NULL != ctx.target_filepath) {
        free(ctx.target_filepath);
    }
    if (NULL != ctx.perf) {
        perf_done(&ctx);
    }
    if (NULL != ctx.hook_lsig_matches) {
        cli_bitset_free(ctx.hook_lsig_matches);
    }
    if (NULL != fmap_head) {
        free(fmap_head);
    }
    if (NULL != ctx.containers) {
        free(ctx.containers);
    }
    if (NULL != ctx.options) {
        free(ctx.options);
    }

    return rc;
}
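
/*
 * Illustrative sketch only, not part of libclamav: the scan deadline
 * arithmetic used in scan_common() above, shown in isolation. A limit given in
 * milliseconds is split into whole seconds and microseconds, added to a start
 * time, and any microsecond overflow is carried into the seconds field. The
 * names example_timeval and example_deadline_add_ms are hypothetical stand-ins
 * for struct timeval and the inline code above.
 */
struct example_timeval {
    long sec;
    long usec; /* expected to be in [0, 999999] on input */
};

static inline struct example_timeval example_deadline_add_ms(struct example_timeval start, unsigned long limit_ms)
{
    start.sec += (long)(limit_ms / 1000);
    start.usec += (long)((limit_ms % 1000) * 1000);
    if (start.usec >= 1000000) {
        /* at most one carry is needed: both addends are below one second */
        start.usec -= 1000000;
        start.sec++;
    }
    return start;
}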

cl_error_t cl_scandesc_callback(int desc, const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context)
{
    cl_error_t status = CL_SUCCESS;
    cl_fmap_t *map = NULL;
    STATBUF sb;
    char *filename_base = NULL;

    if (FSTAT(desc, &sb) == -1) {
        cli_errmsg("cl_scandesc_callback: Can't fstat descriptor %d\n", desc);
        status = CL_ESTAT;
        goto done;
    }
    if (sb.st_size <= 5) {
        cli_dbgmsg("cl_scandesc_callback: File too small (" STDu64 " bytes), ignoring\n", (uint64_t)sb.st_size);
        status = CL_CLEAN;
        goto done;
    }
    if ((uint64_t)sb.st_size > engine->maxfilesize) {
        cli_dbgmsg("cl_scandesc_callback: File too large (" STDu64 " bytes), ignoring\n", (uint64_t)sb.st_size);
        if (scanoptions->heuristic & CL_SCAN_HEURISTIC_EXCEEDS_MAX) {
            engine->cb_virus_found(desc, "Heuristics.Limits.Exceeded", context);
            status = CL_VIRUS;
        } else {
            status = CL_CLEAN;
        }
        goto done;
    }

    if (NULL != filename) {
        (void)cli_basename(filename, strlen(filename), &filename_base);
    }

    if (NULL == (map = fmap(desc, 0, sb.st_size, filename_base))) {
        cli_errmsg("CRITICAL: fmap() failed\n");
        status = CL_EMEM;
        goto done;
    }

    status = scan_common(map, filename, virname, scanned, engine, scanoptions, context);

done:
    if (NULL != map) {
        funmap(map);
    }
    if (NULL != filename_base) {
        free(filename_base);
    }

    return status;
}

cl_error_t cl_scanmap_callback(cl_fmap_t *map, const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context)
{
    /* Reject a NULL map before it is dereferenced below. */
    if (NULL == map) {
        return CL_ENULLARG;
    }

    if (map->real_len > engine->maxfilesize) {
        cli_dbgmsg("cl_scanmap_callback: File too large (%zu bytes), ignoring\n", map->real_len);
        if (scanoptions->heuristic & CL_SCAN_HEURISTIC_EXCEEDS_MAX) {
            engine->cb_virus_found(fmap_fd(map), "Heuristics.Limits.Exceeded", context);
            return CL_VIRUS;
        }
        return CL_CLEAN;
    }

    return scan_common(map, filename, virname, scanned, engine, scanoptions, context);
}

cl_error_t cli_found_possibly_unwanted(cli_ctx *ctx)
{
    if (cli_get_last_virus(ctx)) {
        cli_dbgmsg("found Possibly Unwanted: %s\n", cli_get_last_virus(ctx));
        if (SCAN_HEURISTIC_PRECEDENCE) {
            /* we found a heuristic match, don't scan further,
             * but consider it a virus. */
            cli_dbgmsg("cli_found_possibly_unwanted: CL_VIRUS\n");
            return CL_VIRUS;
        }
        /* heuristic scan isn't taking precedence, keep scanning.
         * If this is part of an archive, and
         * we find a real malware we report that instead of the
         * heuristic match */
        ctx->found_possibly_unwanted = 1;
    } else {
        cli_warnmsg("cli_found_possibly_unwanted called, but virname is not set\n");
    }
    emax_reached(ctx);
    return CL_CLEAN;
}

cl_error_t cli_magic_scan_file(const char *filename, cli_ctx *ctx, const char *original_name)
{
    int fd = -1;
    cl_error_t ret = CL_EOPEN;

    /* internal version of cl_scanfile with arec/mrec preserved */
    fd = safe_open(filename, O_RDONLY | O_BINARY);
    if (fd < 0) {
        goto done;
    }

    ret = cli_magic_scan_desc(fd, filename, ctx, original_name);

done:
    if (fd >= 0) {
        close(fd);
    }
    return ret;
}

cl_error_t cl_scanfile(const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions)
{
    return cl_scanfile_callback(filename, virname, scanned, engine, scanoptions, NULL);
}

cl_error_t cl_scanfile_callback(const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context)
{
    int fd;
    cl_error_t ret;
    const char *fname = cli_to_utf8_maybe_alloc(filename);

    if (!fname)
        return CL_EARG;

    if ((fd = safe_open(fname, O_RDONLY | O_BINARY)) == -1) {
        /* Save errno first: free() may change it. */
        int open_errno = errno;

        /* Release the converted name on the error path too, so it is not leaked. */
        if (fname != filename)
            free((char *)fname);

        if (open_errno == EACCES) {
            return CL_EACCES;
        } else {
            return CL_EOPEN;
        }
    }

    if (fname != filename)
        free((char *)fname);

    ret = cl_scandesc_callback(fd, filename, virname, scanned, engine, scanoptions, context);
    close(fd);

    return ret;
}

/*
Local Variables:
   c-basic-offset: 4
End:
*/