64 commits

Author SHA1 Message Date
Micah Snyder
0a24f70218 Rename Heuristics.Email.ExceedsMax alerts
Rename Heuristics.Email.ExceedsMax alerts to start with
Heuristics.Limits.Exceeded.Email instead, so that all heuristic alerts
for exceeded scan limits have the same prefix.
2021-10-29 15:01:25 -07:00
Micah Snyder
db013a2bfd libclamav: Fix scan recursion tracking
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.

Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".

At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.

But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".

To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).

I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).
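
A minimal sketch of the recursion-stack idea described above, not the libclamav implementation. The names `layer_t`, `recursion_stack_t`, `stack_push()`, `stack_pop()`, and the `MAX_RECURSION` value are hypothetical; the fmap pointer and type code are plain stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECURSION 17 /* hypothetical limit, chosen only for this sketch */

/* One layer of the stack: the map being scanned plus its size and type. */
typedef struct {
    const void *fmap;           /* stand-in for a pointer to the layer's fmap */
    size_t size;                /* size of the mapped data */
    int type;                   /* stand-in for a file type code */
    uint32_t nested_fmap_level; /* 0 for a new buffer, parent + 1 otherwise */
} layer_t;

typedef struct {
    layer_t layers[MAX_RECURSION];
    uint32_t depth; /* number of layers currently on the stack */
} recursion_stack_t;

/* Push a layer. Returns false when the stack is full, so the array can
 * never be overrun. */
static bool stack_push(recursion_stack_t *s, const void *fmap, size_t size,
                       int type, bool is_new_buffer)
{
    assert(s != NULL);
    if (s->depth >= MAX_RECURSION) {
        return false;
    }

    layer_t *layer = &s->layers[s->depth];
    layer->fmap = fmap;
    layer->size = size;
    layer->type = type;

    /* A new buffer (or the very first layer) resets the nested fmap level;
     * an fmap nested inside the parent's data increments it. */
    if (is_new_buffer || s->depth == 0) {
        layer->nested_fmap_level = 0;
    } else {
        layer->nested_fmap_level = s->layers[s->depth - 1].nested_fmap_level + 1;
    }

    s->depth++;
    return true;
}

/* Pop the top layer and return its fmap, or NULL if the stack is empty. */
static const void *stack_pop(recursion_stack_t *s)
{
    assert(s != NULL);
    if (s->depth == 0) {
        return NULL;
    }
    s->depth--;
    return s->layers[s->depth].fmap;
}
```

Because the depth changes only inside push and pop, the recursion level and the layer list cannot drift apart in this sketch.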

This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.

For illustration, imagine a tarball named foo.tar.gz with this structure:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |

But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe                   | PE    | 0         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| │   └── hello.txt         | ASCII | 2         | 0                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |
|     └── world.txt         | ASCII | 2         | 0                 |

(A) If we scan for embedded files at any layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| ├── foo.tar               | TAR   | 1         | 0                 |
| │   ├── bar.zip           | ZIP   | 2         | 1                 |
| │   │   └── hola.txt      | ASCII | 3         | 0                 |
| │   ├── baz.exe           | PE    | 2         | 1                 |
| │   │   ├── sfx.zip       | ZIP   | 3         | 1                 |
| │   │   │   └── hello.txt | ASCII | 4         | 0                 |
| │   │   └── sfx.7z        | 7Z    | 3         | 1                 |
| │   │       └── world.txt | ASCII | 4         | 0                 |
| │   ├── sfx.zip           | ZIP   | 2         | 1                 |
| │   │   └── hello.txt     | ASCII | 3         | 0                 |
| │   └── sfx.7z            | 7Z    | 2         | 1                 |
| │       └── world.txt     | ASCII | 3         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |

(A) is bad because it scans content more than once.

Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.

The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats, including GZ and TAR.

(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     ├── baz.exe           | PE    | 2         | 1                 |
|     ├── sfx.zip           | ZIP   | 2         | 1                 |
|     │   └── hello.txt     | ASCII | 3         | 0                 |
|     └── sfx.7z            | 7Z    | 2         | 1                 |
|         └── world.txt     | ASCII | 3         | 0                 |

(B) is almost right. But we can achieve it easily enough by only scanning for
embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfx.zip and sfx.7z are in foo.tar instead of in baz.exe.

The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if inside of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip                   | ZIP   | 0         | 0                 |
| └── bar.exe               | PE    | 1         | 1                 |
|     └── sfx.zip           | ZIP   | 2         | 2                 |

Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.

(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |
|         ├── sfx.zip       | ZIP   | 3         | 1                 |
|         │   └── hello.txt | ASCII | 4         | 0                 |
|         └── sfx.7z        | 7Z    | 3         | 1                 |
|             └── world.txt | ASCII | 4         | 0                 |

(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection only when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.

So this commit aims to solve the issue the (B)-way.
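
The (B) rule can be sketched as a simple gate on the nested fmap level. This is an illustration under assumptions, not the libclamav code: `layer_info_t`, `embedded_scan_allowed()`, and `count_embedded_scans()` are hypothetical names, and the sketch models only the level check, not any per-type restrictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer record holding only what the gate needs. */
typedef struct {
    const char *name;
    uint32_t nested_fmap_level;
} layer_info_t;

/* Option (B): scan for embedded files only when the layer sits at the
 * start of a new buffer, i.e. its nested fmap level is 0. */
static bool embedded_scan_allowed(const layer_info_t *layer)
{
    assert(layer != NULL);
    return layer->nested_fmap_level == 0;
}

/* Count how many of the given layers would get an embedded-file scan. */
static size_t count_embedded_scans(const layer_info_t *layers, size_t count)
{
    size_t allowed = 0;
    for (size_t i = 0; i < count; i++) {
        if (embedded_scan_allowed(&layers[i])) {
            allowed++;
        }
    }
    return allowed;
}
```

Fed the levels from the first table, the gate allows foo.tar.gz, foo.tar, and hola.txt, and skips bar.zip and baz.exe, so the archives embedded in baz.exe are found once, while scanning foo.tar.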

Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types relies on
finding data partway through or near the end of a file before
reassigning the entire file as the new type.

Other fixes and considerations in this commit:

- The utf16 HTML parser has weak error handling, particularly with respect
  to creating a nested fmap for scanning the ascii decoded file.
  This commit cleans up the error handling and wraps the nested scan with
  the recursion-stack push()/pop() for correct recursion tracking.

  Before this commit, each container layer had a flag to indicate if the
  container layer is valid.
  We need something similar so that the cli_recursion_stack_get_*()
  functions ignore normalized layers. Details...

  Imagine an LDB signature for HTML content that specifies a ZIP
  container. If the signature actually alerts on the normalized HTML and
  you don't ignore normalized layers for the container check, it will
  appear as though the alert is in an HTML container rather than a ZIP
  container.

  This commit accomplishes this with a boolean you set in the scan context
  before scanning a new layer. Then when the new fmap is created, it will
  use that flag to set a similar flag for the layer. The context flag is
  then reset so that anything after this doesn't have that flag.
  The flag allows the new recursion_stack_get() function to ignore
  normalized layers when iterating the stack to return a layer at a
  requested index, negative or positive.

  Scanning extracted/normalized JavaScript and VBA should also
  use the 'layer is normalized' flag.

- This commit also fixes Heuristic.Broken.Executable alert for ELF files
  to make sure that:

  A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
  respects the FP check).

  B) all broken-executable alerts for ELF only happen if the
  SCAN_HEURISTIC_BROKEN option is enabled.

- This commit also cleans up the error handling in cli_magic_scan_dir().
  This was needed so we could correctly apply the layer-is-normalized-flag
  to all VBA macros extracted to a directory when scanning the directory.

- Also fix an issue where exceeding scan maximums wouldn't cause embedded
  file detection scans to abort. Granted we don't actually want to abort
  if max filesize or max recursion depth are exceeded... only if max
  scansize, max files, and max scantime are exceeded.

  Add 'abort_scan' flag to scan context, to protect against depending on
  correct error propagation for fatal conditions. Instead, setting this
  flag in the scan context should guarantee that a fatal condition deep in
  scan recursion isn't lost, which would result in more stuff being scanned
  instead of aborting. This shouldn't be necessary, but some status codes
  like CL_ETIMEOUT never used to be fatal and it's easier to do this than
  to verify every parser only returns CL_ETIMEOUT and other "fatal
  status codes" in fatal conditions.

- Remove duplicate is_tar() prototype from filetypes.c and include
  is_tar.h instead.

- Presently we create the fmap hash when creating the fmap.
  This wastes a bit of CPU if the hash is never needed.
  Now that we're creating fmaps for all embedded files discovered with
  file type recognition scans, this is a much more frequent occurrence and
  really slows things down.

  This commit fixes the issue by only creating fmap hashes as needed.
  This should not only resolve the performance impact of creating fmaps
  for all embedded files, but also should improve performance in general.

- Add allmatch check to the zip parser after the central-header meta
  match. That way we don't get multiple alerts with the same match except
  in allmatch mode. Clean up error handling in the zip parser a tiny bit.

- Fixes to ensure that the scan limits such as scansize, filesize,
  recursion depth, # of embedded files, and scantime are always reported
  if AlertExceedsMax (--alert-exceeds-max) is enabled.

- Fixed an issue where non-fatal alerts for exceeding scan maximums may
  mask signature matches later on. I changed it so these alerts use the
  "possibly unwanted" alert-type and thus only alert if no other alerts
  were found or if all-match or heuristic-precedence are enabled.

- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
  when the --gen-json feature is enabled. These will show up once under
  "ParseErrors" the first time a limit is exceeded. In the present
  implementation, only one limits-exceeded event will be added, so as to
  prevent a malicious or malformed sample from filling the JSON buffer
  with millions of events and using a tonne of RAM.
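
The 'layer is normalized' item above describes looking up a layer by positive or negative index while skipping normalized layers. A minimal sketch of that lookup follows. It is not the libclamav function: `norm_layer_t` and `stack_get()` are hypothetical names, and the index convention (0 is the outermost layer, -1 is the current layer) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layer record: a type code plus the normalized flag. */
typedef struct {
    int type;
    bool is_normalized;
} norm_layer_t;

/* Return the layer at `index`, counting only layers that are not
 * normalized. Assumed convention: index >= 0 counts from the outermost
 * layer (0), index < 0 counts back from the current layer (-1).
 * Returns NULL when no such layer exists. */
static const norm_layer_t *stack_get(const norm_layer_t *layers, size_t depth,
                                     int index)
{
    assert(layers != NULL || depth == 0);

    if (index >= 0) {
        int remaining = index;
        for (size_t i = 0; i < depth; i++) {
            if (layers[i].is_normalized) {
                continue; /* normalized layers are invisible to lookups */
            }
            if (remaining == 0) {
                return &layers[i];
            }
            remaining--;
        }
    } else {
        int remaining = -(index + 1); /* -1 -> 0, -2 -> 1, ... */
        for (size_t i = depth; i > 0; i--) {
            if (layers[i - 1].is_normalized) {
                continue;
            }
            if (remaining == 0) {
                return &layers[i - 1];
            }
            remaining--;
        }
    }
    return NULL;
}
```

With a stack of ZIP, HTML, and normalized HTML, asking for index -2 yields the ZIP layer, which matches the container check described in the LDB signature example.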
2021-10-25 16:02:29 -07:00
Micah Snyder
efd8ac5244 Manpages: Add environment variables to the docs
The CURL_CA_BUNDLE environment variable used by freshclam & clamsubmit to
specify a custom path to a CA bundle is undocumented.

Feature was added here: https://bugzilla.clamav.net/show_bug.cgi?id=12504

Resolves: https://github.com/Cisco-Talos/clamav/issues/175

Also document:
- clamd/clamscan: using LD_LIBRARY_PATH to find libclamunrar_iface.so/dylib
- sigtool: using SIGNDUSER, SIGNDPASS for auth creds when building CVD

This info also needs to be added to the online documentation.
2021-08-17 10:33:15 -07:00
Micah Snyder
1cda765843 CMake: Fix build on systems lacking inttypes format string macros
Define _SF64_PREFIX and _SF32_PREFIX on systems that do not have these
macros: PRIu64, PRIx64, PRIi64, PRIu32, PRIi32, PRIx32

This logic is the same as in the previous build system, here:
https://github.com/Cisco-Talos/clamav/blob/rel/0.102/m4/reorganization/types.m4#L83
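
A rough sketch of this kind of fallback, assuming `<inttypes.h>` exists but may lack the format macros. Only `_SF64_PREFIX`, `_SF32_PREFIX`, and the `PRI*` names come from the commit message; the way the prefixes are chosen here (from the widths of `long` and `int`) and the helpers `format_u64()` and `format_x32()` are assumptions for illustration, not the logic in the build system.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Assumption: derive the printf length modifier from the width of long
 * (64-bit values) and int (32-bit values). */
#if ULONG_MAX > 0xFFFFFFFFUL
#define _SF64_PREFIX "l"
#else
#define _SF64_PREFIX "ll"
#endif

#if UINT_MAX >= 0xFFFFFFFFUL
#define _SF32_PREFIX ""
#else
#define _SF32_PREFIX "l"
#endif

/* Define each macro only if the system headers did not provide it. */
#ifndef PRIu64
#define PRIu64 _SF64_PREFIX "u"
#endif
#ifndef PRIx64
#define PRIx64 _SF64_PREFIX "x"
#endif
#ifndef PRIi64
#define PRIi64 _SF64_PREFIX "i"
#endif
#ifndef PRIu32
#define PRIu32 _SF32_PREFIX "u"
#endif
#ifndef PRIi32
#define PRIi32 _SF32_PREFIX "i"
#endif
#ifndef PRIx32
#define PRIx32 _SF32_PREFIX "x"
#endif

/* Format an unsigned 64-bit value as decimal into buf. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu64, value);
}

/* Format an unsigned 32-bit value as lowercase hex into buf. */
static int format_x32(char *buf, size_t len, uint32_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIx32, value);
}
```

On a system that already defines the macros, the `#ifndef` guards leave the standard definitions untouched.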

Patch courtesy of Mark Fortescue.
2021-08-05 16:54:02 -07:00
Micah Snyder
cf63dad598 clamav.net URL update for new docs (2)
Additional link fixes, missed in the previous commit.
2021-07-17 16:21:47 -07:00
Micah Snyder
51165518a5 clamscan: add missing cert-related options to manpage 2021-05-27 19:24:18 -07:00
Micah Snyder
cd2f2975b9 Docs: Warn against running untrusted bytecode
Add notices to man pages and help strings cautioning against running
bytecode signatures from untrusted sources.

Also adds missing BytecodeUnsigned option to clamd.conf.sample files.
2020-07-09 15:38:15 -07:00
Micah Snyder (micasnyd)
6e17eb5e97 Adds missing clamscan --max-scantime documentation 2020-04-01 17:21:46 -07:00
Micah Snyder
a6165cd487 bb12151: Added --foreground to clamd help output and man page. Also correcting the default bytecode timeout in the clamscan man page. 2018-12-02 23:07:06 -05:00
Micah Snyder (micasnyd)
f61e92da8f Changing numerous scan options' names, primarily those of heuristic signature alert options. Original options (command line and clamd) will remain as deprecated & undocumented for a couple releases. Added 2 extra scan options to allow users to differentiate between alerting on encrypted archives vs encrypted documents (bb11911). 2018-12-02 23:06:59 -05:00
Micah Snyder
f67a9b7508 bb12118: Lowering the default PCRERecMatchLimit from 5000 to 2000, to minimize risk of segfault due to bug in older versions of libpcre/libpcrev2. 2018-12-02 23:06:58 -05:00
Micah Snyder
964a1e7321 Converting http urls to https urls. Primary focus was on clamav.net urls. I updated a couple others and fixed a few broken links as well. There are many (non-clamav.net) urls I didn't address, especially in 3rd party or contrib code. 2018-04-02 07:58:33 -04:00
Micah Snyder
e098cdc557 Updating help strings, to include a couple missing items as well as copyrights. updating man page files as well. 2018-02-14 12:08:36 -05:00
Micah Snyder
a1da16eee7 bb11025: Correcting PUA URL in man pages and shared optparser. 2018-02-08 16:00:09 -05:00
Micah Snyder
22880de038 eliminating additional option references to stat collection and submission until such time as a new stats website and associated clamav code is ready. 2017-10-24 13:38:37 -04:00
Mickey Sola
7a85da5c9a increasing size of pcre match limit 2017-03-01 16:19:17 -05:00
Steven Morgan
166174bcf0 pull request #53(1/4): Spelling fix by klemens(ka7). 2016-10-19 12:26:33 -04:00
Steven Morgan
e7dfe57d3a bb11522 - additional block-max w.i.p. : clamd, man pages. 2016-09-20 17:45:40 -04:00
Steven Morgan
ce6becd511 bb11471 - add clamscan parameter --normalize=no for yara compatibility. 2016-06-02 18:09:25 -04:00
Steven Morgan
c18363244b bb1436 - clamscan 'block-macros' option. Patch by Kai Risku. 2016-03-10 18:26:33 -05:00
Kevin Lin
ea9ffd291b add scanning options for scanning xml-based documents (MSXML, OOXML, HWPML) and HWP3 2016-02-02 14:23:19 -05:00
Kevin Lin
731c8e6213 hwp3.x: add support for maximum recursive calls to hwp3 parsing 2016-01-19 14:28:48 -05:00
Steven Morgan
779c0fdc9a Usage message and man page updates for the clamscan --disable-cache option. 2015-12-21 16:50:33 -05:00
Steven Morgan
4de9676764 Fix clamscan and clamd.conf man page web links for PUA categories. 2015-06-12 15:26:15 -04:00
Kevin Lin
c94a95b821 updated documentation on '--statistics' option 2015-02-19 12:47:20 -05:00
Kevin Lin
877bca9b3e updated PCRE functionality documentation 2015-02-05 08:31:52 -08:00
Joel Esler
00fb0d9118 Fixed broken links.
Across the whole of the product.
2014-09-02 11:29:35 -04:00
Shawn Webb
2e10c4d76b Document the new stats feature in manpages and help text 2014-03-07 13:59:17 -05:00
Kevin Lin
067bce5fbc engine: added max-iconspe(MaxIconsPE) option and docs 2014-03-07 10:23:18 -05:00
Kevin Lin
e33d8379c1 docs: added documentation on partition intersection heuristic 2014-03-05 17:37:47 -05:00
Kevin Lin
4b5895b8bc docs: added documentation on max-partitions option 2014-03-05 17:22:13 -05:00
Steven Morgan
06e02797fd additional manpage info for max-scansize. 2014-02-05 10:58:16 -05:00
Kevin Lin
4f4b8d1f04 updated man page documentation for clamscan, freshclam, and *.conf files 2013-12-04 17:12:35 -05:00
Steve Morgan
54402320c0 Add bytecode performance statistics 2012-12-05 15:48:52 -08:00
Steve Morgan
1feaa72a34 for allscan mode, update usage messages and man pages 2012-11-27 14:48:50 -08:00
Tomasz Kojm
ac090cf581 docs: clarify behavior of --scan-*/Scan* options (bb#3134) 2011-08-02 17:05:20 +02:00
Tomasz Kojm
3bf5b5bdae fix copy&paste error 2011-03-28 22:51:22 +02:00
Tomasz Kojm
62315ce69a clamd: add new config option BytecodeUnsigned (bb#2537); drop
"None" from BytecodeSecurity
clamscan: add new switch --bytecode-unsigned and drop --bytecode-trust-all
2011-02-17 19:17:35 +01:00
Tomasz Kojm
8c57a6c1b7 clamscan: add new options --follow-(dir|file)-symlinks (bb#1870) 2010-12-28 18:24:51 +01:00
Tomasz Kojm
021b6720f5 update 'SEE ALSO' (bb#2006) 2010-05-06 17:02:53 +02:00
Tomasz Kojm
50b0bd804f document bytecode timeout 2010-03-24 18:24:12 +01:00
Tomasz Kojm
c4910836f3 update manuals 2010-03-19 17:42:25 +01:00
Tomasz Kojm
8770404a47 clamscan: properly report errors from libclamav; simplify error codes 2010-02-04 21:33:03 +01:00
Tomasz Kojm
9da619b44e clamscan: properly describe --include/exclude (bb#1765) 2009-12-04 14:20:12 +01:00
Tomasz Kojm
208ceae5c7 clamd, clamscan: add support for OfficialDatabaseOnly/--official-db-only (bb#1743) 2009-11-10 19:30:33 +01:00
Tomasz Kojm
6a4dd9dc6b clamd, clamscan, libclamav: drop support for MailFollowURLs (bb#1677) 2009-08-06 22:29:13 +02:00
Tomasz Kojm
2086dc5cab clamd, clamscan: add support for CrossFilesystems/--cross-fs (bb#1607) 2009-08-05 16:27:48 +02:00
Tomasz Kojm
c2b6681b79 clamscan, clamdscan: add support for --file-list/-f
git-svn: trunk@5069
2009-05-21 13:43:05 +00:00
aCaB
32ec634439 bb#1508
git-svn: trunk@5020
2009-04-03 11:09:00 +00:00
Tomasz Kojm
269d520dfb shared/optparser.c, clamscan: use the new option parser
git-svn: trunk@4580
2008-12-30 10:33:43 +00:00