Commit graph

102 commits

Author SHA1 Message Date
Jan Smutny
b33e4be3ea zip: Fix false negative w. HeuristicScanPrecedence 2020-07-25 11:15:21 -07:00
Micah Snyder
e2f59af30a Clang-format touchup 2020-07-24 16:37:25 -07:00
Micah Snyder
d1f209e879 Fix fmap NULL deref in preclass bytecode hook
If using a bytecode signature that makes use of the BC_PRECLASS hook and
if it alerts, a NULL dereference may occur.  This change fixes that.

Also fixed unrelated memory leaks introduced recently when adding file
name extraction to the zip parser and rar parser.
2020-06-03 11:00:53 -04:00
Micah Snyder
11ef77007b Improve tmp sub-directory names
At present many parsers create tmp subdirectories to store extracted
files.  For parsers like the vba parser, this is required as the
directory is later scanned.  For other parsers, these subdirectories are
probably not helpful now that we provide recursive sub-dirs when
--leave-temps is enabled.  It's not quite as simple as removing the extra
subdirectories, however.  Certain parsers, like autoit, don't create very
unique filenames and would result in file name collisions when
--leave-temps is not enabled.

The best thing to do would be to make sure each parser uses unique
filenames and doesn't rely on cli_magic_scan_dir() to scan extracted
content before removing the extra subdirectory.  In the meantime, this
commit gives the extra subdirectories meaningful names to improve
readability.

This commit also:

- Provides the 'bmp' prefix for extracted PE icons.

- Removes empty tmp subdirs when extracting rtf files, to eliminate
  clutter.

- The PDF parser sometimes creates tmp files when decompressing streams
  before it knows if there is actually any content to decompress.  This
  resulted in a large number of empty files.  While it would be best to
  avoid creating empty files in the first place, that's not quite as
  as it sounds.  This commit does the next best thing and deletes the
  tmp files if nothing was actually extracted, even if --leave-temps is
  enabled.

- Removes the "scantemp" prefix for unnamed fmaps scanned with
  cli_magic_scan().  The 5-character hashes given to tmp files with
  prefixes resulted in occasional file name collisions when extracting
  certain file types with thousands of embedded files.

- The VBA and TAR parsers mistakenly used NAME_MAX instead of PATH_MAX,
  resulting in truncated file paths and failed extraction  when
  --leave-temps is enabled and a lot of recursion is in play.  This commit
  switches them from NAME_MAX to PATH_MAX.
2020-06-03 11:00:53 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat so as to prevent the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted a TDB-type macros and LSIG-type macros to enums for improved
  type safey.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
74bbc91433 Modifies zip scanning behavior so it scans files using zip records from the catalogue which provides deduplication of file records resulting in faster extraction and scan time and reducing the likelihood of alerting on non-malicious duplicate file entries as overlapping files. 2019-10-02 16:08:30 -04:00
Micah Snyder
a04a7627b3 bb12356 - Improvement to overlapping zip files detection logic.
Increases zip overlapping file checking to equal the max scan files setting.
2019-10-02 16:08:30 -04:00
Micah Snyder (micasnyd)
6a0abb897a Adds --max-scantime clamscan option and MaxScanTime clamd config option.
--max-scantime replaces the --timelimit clamscan option that had been experimental.
Default max-scantime set to 2 minutes (120000 milliseconds).
2019-10-02 16:08:29 -04:00
Micah Snyder
e3d269a933 Adds detection and heuristic alert for zips with overlapping files, preventing extraction of non-recursive zip bombs. 2019-10-02 16:08:26 -04:00
Micah Snyder
ad6e0f70cb Adds unzip parser code readability improvements; doxygen function comments. 2019-10-02 16:08:26 -04:00
Micah Snyder
4524c398f3 Argument and return types for fmap_readn(), cli_writen(), cli_readn() converted to use size_t instead of int. 2019-10-02 16:08:25 -04:00
Micah Snyder
5f4f69102d Correcting types from int to cl_error_t where appropriate. Eliminating unused variables and referencing unused parameters to remove warnings. 2019-10-02 16:08:25 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder (micasnyd)
f3fd2ac2e3 Adjustment to Zip extraction logic to make Z_BUF_ERROR error code non-fatal, allowing scans of partially decompressed files. 2018-12-02 23:07:03 -05:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
Micah Snyder (micasnyd)
f61e92da8f Changing numerous scan options' names, primarily those of heuristic signatature alert options. Original options (command line and clamd) will remain as deprecated & undocumented for a couple releases. Added 2 extra scan options to allow users to differentiate between alerting on encrypted archives vs encrypted documents (bb11911). 2018-12-02 23:06:59 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renaming references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
2018-12-02 23:06:59 -05:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Steven Morgan
cbf5017a7d bb11805 fix multiple results. Refactor false positive and heuristic precedence logic. 2017-04-18 12:07:06 -04:00
Matthew Boedicker
1b9b5f6dad bb11605 - Update the error code to CL_ETMPFILE
Signed-off-by: Slawek Ligus <sligus@pivotal.io>
2016-07-15 12:23:21 -04:00
Steven Morgan
7a307529d8 bb11580 - make cli_matchmeta() respect allmatch. 2016-06-08 16:25:34 -04:00
Kevin Lin
51b8cc326d unzip: check for ctx value as requests do not supply a ctx 2016-05-11 14:49:20 -04:00
Steven Morgan
6146fae115 bb11560 - make cdb signatures also operate on central directory file names because they can differ from the file names in the local headers. 2016-05-09 13:53:40 -04:00
Steven Morgan
9276cd1f5f bb11547 - print all CDBNAME entries for a zip file when using the -z flag. 2016-03-29 16:18:51 -04:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Kevin Lin
202a5daec7 cid 12212 - fix zip decryption failure state 2015-08-19 11:14:49 -04:00
Kevin Lin
038cb67a35 pwdb: restructured storage for time efficiency 2015-07-21 10:39:38 -04:00
Kevin Lin
f5f7b7a1b9 dconf: added passwd dconf for archives, applied to unzip 2015-07-21 10:39:38 -04:00
Kevin Lin
0b119e6f78 unzip: debug message consistency 2015-07-21 10:39:38 -04:00
Kevin Lin
1ac97cf036 unzip: added scanning of decrypted files 2015-07-21 10:39:38 -04:00
Kevin Lin
a60ec79975 unzip: added traditional PKWARE decryption password verification 2015-07-21 10:39:38 -04:00
Kevin Lin
93a9a942f7 ooxml: fixed a number of potential memory issues 2014-11-25 13:29:39 -05:00
Kevin Lin
c8c80ddfd9 bb#11145 - added function to determine ooxml filetype
unzip: adjusted dir/file searching mechanism
2014-10-14 17:21:56 -04:00
Shawn Webb
cd94be7a52 Silence a bunch of compiler warnings in libclamav 2014-07-10 18:11:49 -04:00
Shawn Webb
60d8d2c352 Move all the crypto API to clamav.h 2014-07-01 19:38:01 -04:00
Kevin Lin
871b862eb5 added changes from peer code review 2014-06-27 16:30:41 -04:00
Kevin Lin
20b45621cb added pre-class timeouts for ms-docs and pe files 2014-06-25 13:08:18 -04:00
Kevin Lin
25556519f9 ooxml: moved ooxml specific functions to new source
added new source files to Makefile and win32 project
autojunk'd
2014-05-01 16:59:01 -04:00
Kevin Lin
23509742df ooxml: outputs properties to json file
ooxml: json uses integers and booleans with conversion
2014-05-01 16:49:17 -04:00
Kevin Lin
52017fbba3 ooxml: added process of locating property files
ooxml: added property file specific callbacks
2014-04-30 18:01:55 -04:00
Kevin Lin
f85f7323b3 ooxml: parsing for [Content_Types].xml 2014-04-30 16:45:33 -04:00
Kevin Lin
252a31c587 Merge branch 'master' of git.clam.sourcefire.com:/var/lib/git/clamav-devel 2014-04-28 17:09:20 -04:00
Kevin Lin
c8c878f92e zip(ooxml): added function for search by name
zip(ooxml): added callback functionality for simgle file processing
2014-04-28 12:45:51 -04:00
Steven Morgan
92e0ae1504 bz#10978 fix for allmatch/unzip issue. 2014-04-25 17:40:17 -04:00
Shawn Webb
b2e7c931d0 Use OpenSSL for hashing. 2014-02-08 00:31:12 -05:00
David Raynor
9eff941820 cid #11399 2013-03-12 13:13:14 -04:00
Shawn Webb
7e40bab956 bb6099 - check return value of lseek() 2013-02-28 21:01:40 -05:00
Steve Morgan
6ad45a2931 add initial allscan/allmatch mode to libclamav, clamd, clamdscan, and clamscan with unit tests 2012-10-18 14:12:58 -07:00