110 commits

Author SHA1 Message Date
Micah Snyder
11ef77007b Improve tmp sub-directory names
At present many parsers create tmp subdirectories to store extracted
files.  For parsers like the vba parser, this is required as the
directory is later scanned.  For other parsers, these subdirectories are
probably not helpful now that we provide recursive sub-dirs when
--leave-temps is enabled.  It's not quite as simple as removing the extra
subdirectories, however.  Certain parsers, like autoit, don't create
sufficiently unique filenames, which would result in file name
collisions when --leave-temps is not enabled.

The best thing to do would be to make sure each parser uses unique
filenames and doesn't rely on cli_magic_scan_dir() to scan extracted
content before removing the extra subdirectory.  In the meantime, this
commit gives the extra subdirectories meaningful names to improve
readability.

This commit also:

- Provides the 'bmp' prefix for extracted PE icons.

- Removes empty tmp subdirs when extracting rtf files, to eliminate
  clutter.

- The PDF parser sometimes creates tmp files when decompressing streams
  before it knows if there is actually any content to decompress.  This
  resulted in a large number of empty files.  While it would be best to
  avoid creating empty files in the first place, that's not quite as
  simple as it sounds.  This commit does the next best thing and deletes the
  tmp files if nothing was actually extracted, even if --leave-temps is
  enabled.

- Removes the "scantemp" prefix for unnamed fmaps scanned with
  cli_magic_scan().  The 5-character hashes given to tmp files with
  prefixes resulted in occasional file name collisions when extracting
  certain file types with thousands of embedded files.

- The VBA and TAR parsers mistakenly used NAME_MAX instead of PATH_MAX,
  resulting in truncated file paths and failed extraction when
  --leave-temps is enabled and a lot of recursion is in play.  This commit
  switches them from NAME_MAX to PATH_MAX.
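A minimal sketch of the failure mode, for illustration only: `build_tmp_path()` is a hypothetical helper, not the VBA or TAR parser code. It joins a directory and a file name with `snprintf()` and reports truncation, which is what happens when a deeply nested --leave-temps path is written into a buffer sized for a single path component (`NAME_MAX`, 255 on Linux) rather than a whole path (`PATH_MAX`, 4096 on Linux). The fallback values are assumptions for platforms where `<limits.h>` omits them.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Assumed fallbacks (Linux values) where <limits.h> does not define them. */
#ifndef NAME_MAX
#define NAME_MAX 255
#endif
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical helper: write "dir/name" into buf (buflen bytes).
 * Returns 0 on success, -1 if the result would be truncated. */
static int build_tmp_path(char *buf, size_t buflen, const char *dir, const char *name)
{
    int n = snprintf(buf, buflen, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= buflen)
        return -1; /* encoding error, or output did not fit */
    return 0;
}
```

With a directory string of roughly 600 characters, the call fails against a `NAME_MAX`-sized buffer and succeeds against a `PATH_MAX`-sized one.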
2020-06-03 11:00:53 -04:00
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan APIs that create fmaps, and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
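A minimal sketch of an fmap-like header that optionally owns a copy of the file name, assuming a simplified structure: `struct fmap_sketch`, `fmap_open_sketch()` and `fmap_close_sketch()` are hypothetical names, not libclamav's `fmap_t` API. Callers that have no name pass `NULL`, mirroring the optional argument described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an fmap header. */
struct fmap_sketch {
    size_t len; /* length of the mapped data */
    char *name; /* owned copy of the file name, or NULL if none */
};

/* Create a header; `name` is optional and may be NULL. */
static struct fmap_sketch *fmap_open_sketch(size_t len, const char *name)
{
    struct fmap_sketch *m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    m->len = len;
    if (name != NULL) {
        size_t n = strlen(name) + 1;
        m->name = malloc(n);
        if (m->name == NULL) {
            free(m);
            return NULL;
        }
        memcpy(m->name, name, n); /* copy including the terminating NUL */
    }
    return m;
}

/* Free the header and the name it owns. */
static void fmap_close_sketch(struct fmap_sketch *m)
{
    if (m != NULL) {
        free(m->name);
        free(m);
    }
}
```

Copying the name lets the caller release its own buffer right after the call; whether libclamav copies or borrows the string is not stated here.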

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat to reduce the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted TDB-type and LSIG-type macros to enums for improved type
  safety.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, and rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Jonas Zaddach (jzaddach)
cd977727f0 Add LZMA & BZip2 decompression to bytecode API
Adds LZMA and BZip2 decompression routines to the bytecode API.
The ability to decompress LZMA and BZip2 streams is particularly
useful for bytecode signatures that extend clamav executable
unpacking capabilities.

Of note, the LZMA format is not well standardized. This API
expects the stream to start with the LZMA_Alone header.

Also fixed a bug in LZMA dictionary size setting.
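A minimal sketch of the header the stream is expected to start with, based on the legacy `.lzma` (LZMA_Alone) layout: one properties byte encoding `lc`, `lp` and `pb` as `(pb * 5 + lp) * 9 + lc`, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size in which all ones means "unknown". The struct and `parse_lzma_alone()` are hypothetical and are not part of the bytecode API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of the 13-byte LZMA_Alone header. */
struct lzma_alone_hdr {
    unsigned lc, lp, pb;   /* literal context bits, literal pos bits, pos bits */
    uint32_t dict_size;    /* dictionary size in bytes */
    uint64_t uncompressed; /* UINT64_MAX means unknown (end marker used) */
};

/* Parse the header; returns 0 on success, -1 if too short or invalid. */
static int parse_lzma_alone(const unsigned char *buf, size_t len, struct lzma_alone_hdr *h)
{
    unsigned d;
    int i;

    if (len < 13)
        return -1;
    d = buf[0];
    if (d >= 9 * 5 * 5) /* properties byte must be below 225 */
        return -1;
    h->lc = d % 9;
    d /= 9;
    h->lp = d % 5;
    h->pb = d / 5;

    h->dict_size = 0;
    for (i = 3; i >= 0; i--) /* bytes 1..4, little-endian */
        h->dict_size = (h->dict_size << 8) | buf[1 + i];

    h->uncompressed = 0;
    for (i = 7; i >= 0; i--) /* bytes 5..12, little-endian */
        h->uncompressed = (h->uncompressed << 8) | buf[5 + i];
    return 0;
}
```

For example, the bytes `5D 00 00 80 00` followed by eight `FF` bytes decode as lc=3, lp=0, pb=2, an 8 MiB dictionary and an unknown uncompressed size.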
2020-04-29 09:26:07 -07:00
Micah Snyder
50455664a7 libclamav: Fix fmap leak in bytecode runtime
Fixes an fmap leak in the bytecode switch_input() API.  The
switch_input() API provides a way to read from an extracted file instead
of reading from the current file.  The issue is that the current
implementation fails to free the fmap created to read from the extracted
file on cleanup or when switching back to the original fmap.  In
addition, it fails to use the cli_bytecode_context_setfile() function
to restore the file_size in the context for the current fmap.

Fixes a couple of fmap leaks in the unit tests.
2020-04-20 11:26:43 -07:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
53c2cb1b02 Error handling improvements in bytecode api function to alleviate coverity complaints. 2019-10-02 16:08:25 -04:00
Micah Snyder
4524c398f3 Argument and return types for fmap_readn(), cli_writen(), cli_readn() converted to use size_t instead of int. 2019-10-02 16:08:25 -04:00
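A minimal sketch of a read-until-full-or-EOF helper with `size_t` lengths, for illustration only: `readn_sketch()` is hypothetical and is not libclamav's `cli_readn()`, whose exact error convention may differ. Using `size_t` avoids truncating lengths above `INT_MAX`; here `(size_t)-1` is an assumed error sentinel because a `size_t` cannot be negative.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes, retrying on short reads
 * and EINTR. Returns the number of bytes read (fewer only at EOF),
 * or (size_t)-1 on error. */
static size_t readn_sketch(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0)
            break; /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return (size_t)-1;
        }
        total += (size_t)n;
    }
    return total;
}
```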
Micah Snyder
ee40795fe2 Converted mpool calls to macros when USE_MPOOL is defined to clearly differentiate between function and macro behavior. 2019-10-02 16:08:25 -04:00
Micah Snyder
479a9a235a Fixes for issues identified by coverity. 2019-10-02 16:08:19 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
b3e82e5e61 Replacing libclamav/cltypes.h with clamav-types.h.in, which generates a header clamav-types.h that we install alongside clamav.h. 2019-10-02 16:08:17 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
Micah Snyder (micasnyd)
f61e92da8f Changing numerous scan options' names, primarily those of heuristic signature alert options. Original options (command line and clamd) will remain as deprecated & undocumented for a couple of releases. Added 2 extra scan options to allow users to differentiate between alerting on encrypted archives vs encrypted documents (bb11911). 2018-12-02 23:06:59 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renamed references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
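A minimal sketch of the idea, with every name and flag value hypothetical (they are not ClamAV's actual definitions): options are split into per-category 32-bit fields, so each category has its own room for new flags, and a helper folds them back into a single legacy `uint32_t` for callers that still expect the old format.

```c
#include <stdint.h>

/* Hypothetical per-category flag values (illustration only). */
#define OPT_GENERAL_ALLMATCH            0x1u
#define OPT_PARSE_ARCHIVE               0x1u
#define OPT_PARSE_PDF                   0x2u
#define OPT_HEURISTIC_ENCRYPTED_ARCHIVE 0x1u

/* Hypothetical legacy single-bitfield values. */
#define LEGACY_ALLMATCH  0x1u
#define LEGACY_ARCHIVE   0x2u
#define LEGACY_PDF       0x4u
#define LEGACY_ENCRYPTED 0x8u

/* Options split into one bitfield per category. */
struct scan_options_sketch {
    uint32_t general;
    uint32_t parse;
    uint32_t heuristic;
};

/* Fold the structured options back into one legacy uint32_t. */
static uint32_t to_legacy_flags(const struct scan_options_sketch *o)
{
    uint32_t f = 0;
    if (o->general & OPT_GENERAL_ALLMATCH)
        f |= LEGACY_ALLMATCH;
    if (o->parse & OPT_PARSE_ARCHIVE)
        f |= LEGACY_ARCHIVE;
    if (o->parse & OPT_PARSE_PDF)
        f |= LEGACY_PDF;
    if (o->heuristic & OPT_HEURISTIC_ENCRYPTED_ARCHIVE)
        f |= LEGACY_ENCRYPTED;
    return f;
}
```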
2018-12-02 23:06:59 -05:00
Micah Snyder (micasnyd)
89d5207b31 Added new pdf object stream parsing capability. 2018-12-02 23:06:58 -05:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Steven Morgan
48fef7b8ec 11898 - fix unit test failure with zlib 1.2.9+. Patch provided by Marc Deslauriers. 2017-08-18 16:06:26 -04:00
Steven Morgan
ea4ab2bccc bb11742 fix compile error in bytecode_api.c on Mac OS X. 2017-02-15 14:07:50 -05:00
Mickey Sola
631cb6a005 Fixes and updates to intermediate container sig rules based on code review 2017-02-01 17:33:15 -05:00
klin
031fe00a4d restructure container typing system to use array (#2) 2017-01-19 12:24:46 -05:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Shawn Webb
cd94be7a52 Silence a bunch of compiler warnings in libclamav 2014-07-10 18:11:49 -04:00
Shawn Webb
60d8d2c352 Move all the crypto API to clamav.h 2014-07-01 19:38:01 -04:00
Steven Morgan
6c048b8a30 Use json_object_object_get_ex() rather than json_object_object_get(), which is deprecated in json-c 0.10 2014-06-06 14:38:45 -04:00
Kevin Lin
9048572cec bytecode_api: fixed variable assignment issue 2014-06-03 12:43:23 -04:00
Kevin Lin
c6a3b294a9 bytecode: fixed a compiler issue and warnings 2014-06-03 11:47:57 -04:00
Kevin Lin
3107a6c24f bytecode: fixed issue with older versions of g++ 2014-06-03 11:19:01 -04:00
Steven Morgan
51f8cc3c18 More json header includes. 2014-05-23 10:11:32 -04:00
Kevin Lin
546e168bb7 api: added safety checks 2014-05-06 18:18:05 -04:00
Kevin Lin
61e3637d08 bytecode api: added support for querying int and booleans from json properties 2014-05-06 16:15:08 -04:00
Kevin Lin
fa7ae4ccbc bytecode api: updated copyright information
bytecode api: added json properties reading implementation
2014-05-06 16:13:48 -04:00
Shawn Webb
b2e7c931d0 Use OpenSSL for hashing. 2014-02-08 00:31:12 -05:00
Kevin Lin
90c0acc762 formatted a number of bytecode files, converted tabs to spaces 2014-01-16 17:57:40 -05:00
Shawn Webb
9691454612 bb6091 - check lseek() return 2013-02-28 19:32:29 -05:00
David Raynor
4a836f4310 CID #10418 2013-02-13 14:21:37 -05:00
Ryan Pentney
791868e80e I don't always test my code, but when I do... I do it in production. 2013-02-07 11:23:31 -08:00
Steve Morgan
6ad45a2931 add initial allscan/allmatch mode to libclamav, clamd, clamdscan, and clamscan with unit tests 2012-10-18 14:12:58 -07:00
Shawn webb
6a049897d9 BB#5455 2012-07-10 13:17:45 -04:00
Török Edvin
cc4d540831 bb #4324
memcpy() crashes because GCC sees 'struct cli_exe_section*'
and assumes that section is aligned to at least 4 bytes.
But it isn't, so change the parameter to just 'void*'.

(Casting doesn't help, as GCC sees through it).

Also fixes part 1 of bb #3789.
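A minimal sketch of the fix pattern, with hypothetical names (`struct section_sketch`, `copy_section()`): since casting alone did not help, the parameter itself is typed `const void *`, so the compiler cannot infer 4-byte alignment from a struct pointer type, and `memcpy()` copies from a possibly misaligned address into an aligned destination.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a section record. */
struct section_sketch {
    uint32_t rva;
    uint32_t vsz;
};

/* The source is `const void *`, so no alignment is implied by its type;
 * memcpy() must handle any source address. */
static void copy_section(struct section_sketch *dst, const void *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```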
2012-02-29 17:04:16 +02:00
Török Edvin
f304dc688a fmapify: fix const-ness warnings 2012-01-05 14:16:09 +02:00
Török Edvin
3d664817f6 fix recursion level crash (bb #3706).
Thanks to Stephane Chazelas for the analysis.
2011-10-08 12:12:22 +03:00
Török Edvin
acc8bccb89 bb #2307. 2010-10-19 16:23:19 +03:00
Török Edvin
e4fedabef4 Warn about zlib version mismatches (bb #2072).
In libclamav: if zlib version at runtime is older than at compile time, warn.
If they are the same, or the runtime version is newer, don't warn.

clamconf warns always on mismatch.

Mismatch can happen if:
 - you build zlib yourself, but as a static lib, and the compiler picks the
   old shared lib (but new headers!)
 - you have 2 zlibs installed, and the old one takes precedence

Libclamav doesn't warn about mismatches due to zlib upgrades since this is
normal.
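A minimal sketch of the policy above, assuming dotted numeric version strings: `version_cmp()` and `should_warn()` are hypothetical helpers. In a real build the two arguments would be zlib's run-time `zlibVersion()` and compile-time `ZLIB_VERSION`; the sketch takes plain strings so it runs without zlib. Comparing numerically matters because a plain `strcmp()` would sort "1.2.11" before "1.2.3".

```c
#include <stdio.h>

/* Hypothetical helper: compare dotted versions such as "1.2.11" and "1.2.3"
 * component by component (missing components count as 0).
 * Returns -1, 0 or 1. */
static int version_cmp(const char *a, const char *b)
{
    int va[4] = {0, 0, 0, 0}, vb[4] = {0, 0, 0, 0};
    int i;

    sscanf(a, "%d.%d.%d.%d", &va[0], &va[1], &va[2], &va[3]);
    sscanf(b, "%d.%d.%d.%d", &vb[0], &vb[1], &vb[2], &vb[3]);
    for (i = 0; i < 4; i++) {
        if (va[i] != vb[i])
            return va[i] < vb[i] ? -1 : 1;
    }
    return 0;
}

/* libclamav's policy as described: warn only if the runtime is older. */
static int should_warn(const char *runtime, const char *compiled)
{
    return version_cmp(runtime, compiled) < 0;
}
```

The stricter clamconf behavior would correspond to warning whenever `version_cmp()` returns non-zero.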
2010-10-18 14:16:43 +03:00
Török Edvin
4116c65d1b Add bytecode API to determine whether running under JIT. 2010-10-18 12:35:39 +03:00
Török Edvin
d7531f2ad2 Fix warnings. 2010-10-18 12:24:11 +03:00
Török Edvin
ae8dc8c2bc Gather bytecode events from bytecode API. 2010-10-18 10:48:18 +03:00
Török Edvin
f73212dc62 Fix bytecode virusname reporting (bb #2255).
Also adds the ability to stop a hook from executing, and to mark
a virus as heuristic (by using a BC.Heuristic* name).
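A minimal sketch of the naming convention, assuming a plain prefix match: `is_heuristic_name()` is hypothetical and is not the bytecode runtime's actual check.

```c
#include <string.h>

/* Hypothetical check: a bytecode-reported name counts as heuristic
 * when it starts with "BC.Heuristic". */
static int is_heuristic_name(const char *virname)
{
    static const char prefix[] = "BC.Heuristic";
    return virname != NULL && strncmp(virname, prefix, sizeof(prefix) - 1) == 0;
}
```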
2010-09-10 22:11:32 +03:00
Török Edvin
1dae00ebf4 bytecode: add icon match API. 2010-08-02 18:21:24 +03:00