44 commits

Author SHA1 Message Date
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| old name                  | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan APIs that create fmaps, and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
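The "file name or NULL" convention described above can be sketched roughly as follows. This is a simplified, hypothetical illustration: `fmap_sketch_t`, `fmap_sketch_open()` and `fmap_sketch_close()` are invented for this example and do not reflect libclamav's actual `fmap_t` layout or constructor signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an fmap header. */
typedef struct fmap_sketch {
    const unsigned char *data; /* mapped bytes (not owned by the map) */
    size_t len;                /* number of mapped bytes */
    char *name;                /* optional file name; NULL if none was given */
} fmap_sketch_t;

/* Create a map over an existing buffer. `name` is optional: callers that
 * have no meaningful file name pass NULL. */
static fmap_sketch_t *fmap_sketch_open(const unsigned char *data, size_t len, const char *name)
{
    fmap_sketch_t *map;

    assert(data != NULL || len == 0);

    map = calloc(1, sizeof(*map));
    if (map == NULL)
        return NULL;

    map->data = data;
    map->len  = len;

    if (name != NULL) {
        /* Keep a private copy so the name outlives the caller's string. */
        size_t n  = strlen(name) + 1;
        map->name = malloc(n);
        if (map->name == NULL) {
            free(map);
            return NULL;
        }
        memcpy(map->name, name, n);
    }
    return map;
}

static void fmap_sketch_close(fmap_sketch_t *map)
{
    if (map == NULL)
        return;
    free(map->name); /* free(NULL) is a no-op */
    free(map);
}
```

Copying the name into the map, rather than borrowing the caller's pointer, is an assumed design choice here; it keeps the name valid for later metadata or temp-file output after the caller's buffer goes away.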

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat to reduce the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted TDB-type and LSIG-type macros to enums for improved
  type safety.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use gotos for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and makes it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
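The goto-based error handling mentioned for magic_scandesc() follows a common C idiom: every failure path jumps to one cleanup label, so resources are released in a single place. The sketch below is a generic, hypothetical example (`scan_sketch()` and its status codes are invented), not the actual magic_scandesc() code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { SKETCH_SUCCESS = 0, SKETCH_EMEM, SKETCH_EOPEN, SKETCH_EREAD } sketch_status_t;

/* Hypothetical scan step: read up to 4 KiB of `path` into a heap buffer
 * and report how many bytes were read. Every exit goes through `done`. */
static sketch_status_t scan_sketch(const char *path, size_t *bytes_read)
{
    sketch_status_t status = SKETCH_SUCCESS;
    unsigned char *buf     = NULL;
    FILE *fp               = NULL;

    assert(path != NULL && bytes_read != NULL);
    *bytes_read = 0;

    buf = malloc(4096);
    if (buf == NULL) {
        status = SKETCH_EMEM;
        goto done;
    }

    fp = fopen(path, "rb");
    if (fp == NULL) {
        status = SKETCH_EOPEN;
        goto done;
    }

    *bytes_read = fread(buf, 1, 4096, fp);
    if (ferror(fp)) {
        status = SKETCH_EREAD;
        goto done;
    }

    /* ... scanning of buf[0 .. *bytes_read) would happen here ... */

done:
    /* Single cleanup point: safe whether or not each resource was acquired. */
    if (fp != NULL)
        fclose(fp);
    free(buf);
    return status;
}
```

Adding a new resource (for example, a per-scan tmp directory) then only needs one more acquisition step and one more release line under `done`, which is the maintainability benefit the commit describes.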
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
4524c398f3 Argument and return types for fmap_readn(), cli_writen(), cli_readn() converted to use size_t instead of int. 2019-10-02 16:08:25 -04:00
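Using `size_t` lets a read helper represent any buffer length without the truncation and sign problems of `int`. A minimal POSIX sketch of a `readn`-style loop follows; `readn_sketch()` is a hypothetical name, and returning `(size_t)-1` on error is an assumed convention, not necessarily how cli_readn() reports failures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to `count` bytes from `fd`, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than `count` only at EOF),
 * or (size_t)-1 on error (hypothetical convention). */
static size_t readn_sketch(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total     = 0;

    assert(buf != NULL || count == 0);

    while (total < count) {
        ssize_t r = read(fd, p + total, count - total);
        if (r < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return (size_t)-1;
        }
        if (r == 0)
            break; /* end of file */
        total += (size_t)r;
    }
    return total;
}
```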
Micah Snyder
ca8b4c466e Assortment of warning fixes. 2019-10-02 16:08:25 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
8cf9b527b0 Updated win32 3rdparty libxml2 to version 2.9.8. 2018-12-02 23:07:01 -05:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
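Prefixing temp file names with the scanned file's basename can be sketched as plain string handling. This is a hypothetical illustration: `basename_sketch()`, `make_tmpname_sketch()` and the `<tmpdir>/<basename>.<suffix>` layout are assumptions for this example, and the caller is assumed to supply the random suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return a pointer to the last path component of `path`
 * (accepts both '/' and '\\' as separators). */
static const char *basename_sketch(const char *path)
{
    const char *base = path;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            base = p + 1;
    }
    return base;
}

/* Build "<tmpdir>/<basename>.<suffix>" into `out`, or "<tmpdir>/<suffix>"
 * when no scanned path is known. Returns 0 on success, -1 if the result
 * would not fit in `outlen` bytes. */
static int make_tmpname_sketch(char *out, size_t outlen, const char *tmpdir,
                               const char *scanned_path, const char *suffix)
{
    const char *prefix = scanned_path ? basename_sketch(scanned_path) : "";
    int n;

    assert(out != NULL && tmpdir != NULL && suffix != NULL);

    if (prefix[0] != '\0')
        n = snprintf(out, outlen, "%s/%s.%s", tmpdir, prefix, suffix);
    else
        n = snprintf(out, outlen, "%s/%s", tmpdir, suffix);

    /* snprintf returns the untruncated length; >= outlen means truncation. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```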
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renamed references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match the majority of other heuristics.
2018-12-02 23:06:59 -05:00
Micah Snyder
78a06dae12 eliminating warnings that crop up when using --with-libjson 2017-08-28 17:49:25 -04:00
Steven Morgan
928864b818 fix file descriptor leak for msxml documents - patch from Chris Miserva. 2017-01-17 12:27:07 -05:00
Steven Morgan
22cb38ed24 pull request #53(2/4): Spelling fix by klemens(ka7). 2016-10-19 15:57:45 -04:00
Steven Morgan
10bcbe7a46 clean up file/memory in error case. 2016-07-08 12:15:12 -04:00
Anthony Chan
46b9e8c2ca Fix bug in msxml_parse_element which may leave behind empty temp file and leak a little memory 2016-07-08 12:04:17 -04:00
Kevin Lin
4016c0f994 msxml_parser: suppress xml2 parser errors and warnings to clamav debug 2016-05-26 17:05:36 -04:00
Kevin Lin
cd70b7cad8 msxml_parser: add custom callback data slot 2016-05-26 17:05:36 -04:00
Kevin Lin
cb7403214b msxml_parser: change method of setting callback system; add comment_cb 2016-05-26 17:05:36 -04:00
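A callback table with per-node-kind handlers (including a comment callback) and an opaque caller data slot, as described in the two msxml_parser commits above, can be sketched like this. All names are hypothetical; the real msxml_parser types and walker differ, and the nodes here are pre-parsed stand-ins rather than libxml2 reader output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table: optional handlers plus an opaque pointer
 * that the walker passes back to every handler untouched. */
typedef struct xml_cbs_sketch {
    int (*element_cb)(const char *name, void *cbdata);
    int (*comment_cb)(const char *text, void *cbdata);
    void *cbdata; /* caller-owned state; never dereferenced by the walker */
} xml_cbs_sketch_t;

typedef enum { NODE_ELEMENT, NODE_COMMENT } node_kind_t;

typedef struct {
    node_kind_t kind;
    const char *value;
} node_sketch_t;

/* Dispatch each node to its handler, if one is set. A non-zero return
 * from a callback stops the walk and is propagated to the caller. */
static int walk_sketch(const node_sketch_t *nodes, size_t count, const xml_cbs_sketch_t *cbs)
{
    assert(cbs != NULL && (nodes != NULL || count == 0));

    for (size_t i = 0; i < count; i++) {
        int rc = 0;
        if (nodes[i].kind == NODE_ELEMENT && cbs->element_cb != NULL)
            rc = cbs->element_cb(nodes[i].value, cbs->cbdata);
        else if (nodes[i].kind == NODE_COMMENT && cbs->comment_cb != NULL)
            rc = cbs->comment_cb(nodes[i].value, cbs->cbdata);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Example callback: counts comments through the cbdata slot. */
static int count_comment_cb(const char *text, void *cbdata)
{
    (void)text;
    (*(size_t *)cbdata)++;
    return 0;
}
```

The data slot lets each caller carry its own state (counters, output handles) without globals.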
Kevin Lin
6732844acb msxml_parser: flags for modifying reader usage (json, walk) 2016-05-26 17:05:36 -04:00
Kevin Lin
c2df9f79d3 mhtml: wrapper for xml parsing using libxml2 htmlparser 2016-05-26 17:05:35 -04:00
Kevin Lin
80ba6f2c9a clang compiler warning corrections 2016-02-18 11:59:08 -05:00
Kevin Lin
99967b5790 compile bugfix for non-json builds 2016-01-15 16:49:19 -05:00
Kevin Lin
d45186458b squash hwp and msxml parser memory leaks 2016-01-15 15:34:01 -05:00
Kevin Lin
523e4264e0 msxml_parser: add MSXML_JSON_MULTI option for tracking multiple entries for same key 2015-12-17 16:18:17 -05:00
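Tracking multiple entries for the same key, in the spirit of MSXML_JSON_MULTI, can be sketched with a small fixed-capacity record. This is hypothetical: `record_add()` and `SKETCH_MULTI` are invented, and the "last write wins" behavior without the flag is an assumption, not documented msxml_parser behavior.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_KEYS   8
#define SKETCH_MAX_VALUES 4
#define SKETCH_MULTI      0x1u /* hypothetical flag */

typedef struct {
    const char *key;
    const char *values[SKETCH_MAX_VALUES];
    size_t nvalues;
} entry_sketch_t;

typedef struct {
    entry_sketch_t entries[SKETCH_MAX_KEYS];
    size_t nentries;
} record_sketch_t;

/* Record key=value. Without SKETCH_MULTI a repeated key keeps one value
 * (last write wins, an assumption); with it, values accumulate like a JSON
 * array. Strings are borrowed, not copied. Returns 0, or -1 when full. */
static int record_add(record_sketch_t *rec, const char *key, const char *value, unsigned flags)
{
    size_t i;

    assert(rec != NULL && key != NULL && value != NULL);

    for (i = 0; i < rec->nentries; i++) {
        if (strcmp(rec->entries[i].key, key) == 0)
            break;
    }

    if (i == rec->nentries) { /* new key */
        if (rec->nentries == SKETCH_MAX_KEYS)
            return -1;
        rec->entries[i].key     = key;
        rec->entries[i].nvalues = 0;
        rec->nentries++;
    }

    entry_sketch_t *e = &rec->entries[i];
    if (!(flags & SKETCH_MULTI) && e->nvalues > 0) {
        e->values[0] = value; /* single-valued key: overwrite */
        return 0;
    }
    if (e->nvalues == SKETCH_MAX_VALUES)
        return -1;
    e->values[e->nvalues++] = value;
    return 0;
}
```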
Kevin Lin
416456da73 msxml_parser: add callback-based scanning mechanism 2015-12-16 16:16:01 -05:00
Kevin Lin
66e314847c msxml_parser: add MSXML_SCAN_B64_TRIM4 key field (for HWPML) 2015-12-16 16:16:01 -05:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Kevin Lin
f8f9d7e1f5 cid 12151/12150 - correct condition for checking allmatch in parsing msxml documents (revised) 2015-08-21 14:41:37 -04:00
Kevin Lin
f8f2ff9480 cid 12144 - simplify null ctx check in msxml parsing 2015-08-19 11:14:52 -04:00
Kevin Lin
4ba7e7fe90 cid 12151/12150 - correct condition for checking allmatch in parsing msxml documents 2015-08-19 11:14:52 -04:00
Kevin Lin
3d374eec31 msxml: virus detection and allmatch fixes 2015-04-29 17:17:31 -04:00
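The allmatch behavior referenced in these commits means scanning continues past the first detection instead of returning immediately. A generic sketch of that control flow follows; `scan_items_sketch()` is hypothetical, and the precomputed verdicts stand in for real scans of embedded content.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_CLEAN = 0, SKETCH_VIRUS = 1 } sketch_verdict_t;

/* Walk `n` embedded items. Without allmatch, stop at the first detection;
 * with allmatch, keep going and count every detection. */
static sketch_verdict_t scan_items_sketch(const sketch_verdict_t *items, size_t n,
                                          bool allmatch, size_t *detections)
{
    sketch_verdict_t overall = SKETCH_CLEAN;

    assert(detections != NULL && (items != NULL || n == 0));
    *detections = 0;

    for (size_t i = 0; i < n; i++) {
        if (items[i] == SKETCH_VIRUS) {
            (*detections)++;
            overall = SKETCH_VIRUS;
            if (!allmatch)
                break; /* first match is enough */
        }
    }
    return overall;
}
```

A wrong allmatch condition typically shows up as either stopping too early in allmatch mode or continuing needlessly in normal mode, which is the kind of bug the Coverity-driven fixes above target.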
Kevin Lin
48d1a07597 msxml: memory issues with tempfiles 2015-04-17 11:25:44 -04:00
Kevin Lin
25a8a0f91a msxml: memory fixes 2015-04-16 12:31:09 -04:00
Kevin Lin
f773990c28 msxml: final suppression of parsing errors (for release) 2015-04-01 13:21:15 -04:00
Kevin Lin
f17cd8d16f added clamav-specific xmlerror handler for msxml 2015-04-01 12:14:44 -04:00
Kevin Lin
d2efc60c01 win32: fixed build in regards to msxml 2015-03-16 12:07:03 -04:00
Kevin Lin
e014b62302 code optimizations and clean-up 2015-03-13 16:06:55 -04:00
Kevin Lin
4e2ae35b58 json_api: added parse error reporting function
msxml: added parsing error reporting to preclass json
2015-03-13 14:18:37 -04:00
Kevin Lin
d7fa810a82 msxml: added value parser, taken from ooxml parser 2015-03-13 13:29:04 -04:00
Kevin Lin
6c627868d3 msxml: added timeout checks at various processing loops 2015-03-13 13:19:20 -04:00
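Timeout checks inside processing loops can be sketched as a deadline test performed once per iteration. The helper names below are hypothetical and use only standard `time()`/`difftime()`; libclamav's own time-limit handling is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* True once `limit_secs` have elapsed since `start`.
 * A limit of 0 (or less) disables the check. */
static bool timed_out_sketch(time_t start, double limit_secs)
{
    if (limit_secs <= 0)
        return false;
    return difftime(time(NULL), start) >= limit_secs;
}

/* Process `n` items, checking the deadline on every iteration so a
 * pathological document cannot keep the loop running indefinitely.
 * Returns how many items were processed before stopping. */
static size_t process_with_timeout_sketch(size_t n, time_t start, double limit_secs, bool *timed_out)
{
    size_t i;

    assert(timed_out != NULL);
    *timed_out = false;

    for (i = 0; i < n; i++) {
        if (timed_out_sketch(start, limit_secs)) {
            *timed_out = true;
            break;
        }
        /* ... per-node parsing work would go here ... */
    }
    return i;
}
```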
Kevin Lin
66af3adf04 msxml: fixed an issue with json element counting 2015-03-13 11:45:23 -04:00
Kevin Lin
1629a6614d msxml: finished keyinfo type handling
msxml: fixed wrkptr json keyinfo type
2015-03-13 11:23:40 -04:00
Kevin Lin
e5d4ae99b7 msxml: added keyinfo type for attributes 2015-03-13 10:32:35 -04:00
Kevin Lin
5994bee6ad added new source file for shared code between ooxml and msxml 2015-03-12 19:58:16 -04:00