Commit graph

230 commits

Author SHA1 Message Date
John Humlick
dda0a70d90 cdiff: Replace cdiff-apply feature with Rust implementation
Apply both .cdiff and .script CVD patches.

Note: A script is a non-compressed and unsigned file containing cdiff
commands. There is no header or footer that should be processed.

This Rust-based implementation of the cdiff-apply feature includes
equivalent features as found in the C-based implementation:
- cdiff file signature validation against sha256 of the file contents
- Gz decoding of file contents
- File open command
- File close command
- Signature add command
- Line delete command
- Xchg command
- Move command
- Unlink command

This Rust implementation adds cdiff-apply unit tests to verify correct
functionality.
2022-01-10 12:18:33 -07:00
micasnyd
140c88aa4e Bump copyright for 2022
Includes minor format corrections.
2022-01-09 14:23:25 -07:00
Micah Snyder
01ca0a2edd Fix issues spotted by Coverity from fmap recursion fix
CID 361074: fmap.c: Possible invalid dereference if status != success
and the new map was not yet allocated.

CID 361077: others.c: Structurally dead code revealed a bug in the
cli_recursion_stack_get_size() function.

CID 361080, 361078, 361083: sigtool.c: Inverted check for if engine
needs to be free'd, could leak the engine structure.

CID 361075: sigtool.c: Missed a `return -1` that should've been `goto
done;` and would leak the new_map buffer.

CID 361079: sigtool/vba.c: Checking if we should free the new_map on
failure only if ctx also needs to be free'd, which would leak the
new_map if ctx was not allocated yet.
2021-10-29 12:35:15 -07:00
Micah Snyder
0354482e16 Fix issues reading from uncompressed nested files
The fmap module provides a mechanism for creating a mapping into an
existing map at an offset and length that's used when a file is found
with an uncompressed archive or when embedded files are found with
embedded file type recognition in scanraw(). This is the
"fmap_duplicate()" function. Duplicate fmaps just reference the original
fmap's 'data' or file handle/descriptor while allowing the caller to
treat it like a new map using offsets and lengths that don't account for
the original/actual file dimensions.

fmap's keep track of this with m->nested_offset & m->real_len, which
admittedly have confusing names. I found incorrect uses of these in a
handful of locations. Notably:
- In cli_magic_scan_nested_fmap_type().
  The force-to-disk feature would have been checking incorrect sizes and
  may have written incorrect offsets for duplicate fmaps.
- In XDP parser.
- A bunch of places from the previous commit when making dupe maps.

This commit fixes those and adds lots of documentation to the fmap.h API
to try to prevent confusion in the future.

nested_offset should never be referenced outside of fmap.c/h.

The fmap_* functions for accessing or reading map data have two
implementations, mem_* or handle_*, depending the data source.
I found issues with some of these so I made a unit test that covers each
of the functions I'm concerned about for both types of data sources and
for both original fmaps and nested/duplicate fmaps.

With the tests, I found and fixed issues in these fmap functions:
- handle_need_offstr(): must account for the nested_offset in dupe maps.
- handle_gets(): must account for nested_offset and use len & real_len
  correctly.
- mem_need_offstr(): must account for nested_offset in dupe maps.
- mem_gets(): must account for nested_offset and use len & real_len
  correctly.

Moved CDBRANGE() macro out of function definition so for better
legibility.

Fixed a few warnings.
2021-10-25 16:02:29 -07:00
Micah Snyder
db013a2bfd libclamav: Fix scan recursion tracking
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.

Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".

At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.

But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".

To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).

I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).

This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.

For illustration, imagine a tarball named foo.tar.gz with this structure:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |

But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe                   | PE    | 0         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| │   └── hello.txt         | ASCII | 2         | 0                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |
|     └── world.txt         | ASCII | 2         | 0                 |

(A) If we scan for embedded files at any layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| ├── foo.tar               | TAR   | 1         | 0                 |
| │   ├── bar.zip           | ZIP   | 2         | 1                 |
| │   │   └── hola.txt      | ASCII | 3         | 0                 |
| │   ├── baz.exe           | PE    | 2         | 1                 |
| │   │   ├── sfx.zip       | ZIP   | 3         | 1                 |
| │   │   │   └── hello.txt | ASCII | 4         | 0                 |
| │   │   └── sfx.7z        | 7Z    | 3         | 1                 |
| │   │       └── world.txt | ASCII | 4         | 0                 |
| │   ├── sfx.zip           | ZIP   | 2         | 1                 |
| │   │   └── hello.txt     | ASCII | 3         | 0                 |
| │   └── sfx.7z            | 7Z    | 2         | 1                 |
| │       └── world.txt     | ASCII | 3         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |

(A) is bad because it scans content more than once.

Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.

The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats to include GZ and TAR.

(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     ├── baz.exe           | PE    | 2         | 1                 |
|     ├── sfx.zip           | ZIP   | 2         | 1                 |
|     │   └── hello.txt     | ASCII | 3         | 0                 |
|     └── sfx.7z            | 7Z    | 2         | 1                 |
|         └── world.txt     | ASCII | 3         | 0                 |

(B) is almost right. But we can achieve it easily enough only scanning for
embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfz.zip and sfx.7z are in foo.tar instead of in baz.exe.

The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if insize of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip                   | ZIP   | 0         | 0                 |
| └── bar.exe               | PE    | 1         | 1                 |
|     └── sfx.zip           | ZIP   | 2         | 2                 |

Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.

(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |
|         ├── sfx.zip       | ZIP   | 3         | 1                 |
|         │   └── hello.txt | ASCII | 4         | 0                 |
|         └── sfx.7z        | 7Z    | 3         | 1                 |
|             └── world.txt | ASCII | 4         | 0                 |

(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection only when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.

So this commit aims to solve the issue the (B)-way.

Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types rely on
finding data partway through or near the ned of a file before
reassigning the entire file as the new type.

Other fixes and considerations in this commit:

- The utf16 HTML parser has weak error handling, particularly with respect
  to creating a nested fmap for scanning the ascii decoded file.
  This commit cleans up the error handling and wraps the nested scan with
  the recursion-stack push()/pop() for correct recursion tracking.

  Before this commit, each container layer had a flag to indicate if the
  container layer is valid.
  We need something similar so that the cli_recursion_stack_get_*()
  functions ignore normalized layers. Details...

  Imagine an LDB signature for HTML content that specifies a ZIP
  container. If the signature actually alerts on the normalized HTML and
  you don't ignore normalized layers for the container check, it will
  appear as though the alert is in an HTML container rather than a ZIP
  container.

  This commit accomplishes this with a boolean you set in the scan context
  before scanning a new layer. Then when the new fmap is created, it will
  use that flag to set similar flag for the layer. The context flag is
  reset those that anything after this doesn't have that flag.
  The flag allows the new recursion_stack_get() function to ignore
  normalized layers when iterating the stack to return a layer at a
  requested index, negative or positive.

  Scanning normalized extracted/normalized javascript and VBA should also
  use the 'layer is normalized' flag.

- This commit also fixes Heuristic.Broken.Executable alert for ELF files
  to make sure that:

  A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
  respects the FP check).

  B) all broken-executable alerts for ELF only happen if the
  SCAN_HEURISTIC_BROKEN option is enabled.

- This commit also cleans up the error handling in cli_magic_scan_dir().
  This was needed so we could correctly apply the layer-is-normalized-flag
  to all VBA macros extracted to a directory when scanning the directory.

- Also fix an issue where exceeding scan maximums wouldn't cause embedded
  file detection scans to abort. Granted we don't actually want to abort
  if max filesize or max recursion depth are exceeded... only if max
  scansize, max files, and max scantime are exceeded.

  Add 'abort_scan' flag to scan context, to protect against depending on
  correct error propagation for fatal conditions. Instead, setting this
  flag in the scan context should guarantee that a fatal condition deep in
  scan recursion isn't lost which result in more stuff being scanned
  instead of aborting. This shouldn't be necessary, but some status codes
  like CL_ETIMEOUT never used to be fatal and it's easier to do this than
  to verify every parser only returns CL_ETIMEOUT and other "fatal
  status codes" in fatal conditions.

- Remove duplicate is_tar() prototype from filestypes.c and include
  is_tar.h instead.

- Presently we create the fmap hash when creating the fmap.
  This wastes a bit of CPU if the hash is never needed.
  Now that we're creating fmap's for all embedded files discovered with
  file type recognition scans, this is a much more frequent occurence and
  really slows things down.

  This commit fixes the issue by only creating fmap hashes as needed.
  This should not only resolve the perfomance impact of creating fmap's
  for all embedded files, but also should improve performance in general.

- Add allmatch check to the zip parser after the central-header meta
  match. That way we don't multiple alerts with the same match except in
  allmatch mode. Clean up error handling in the zip parser a tiny bit.

- Fixes to ensure that the scan limits such as scansize, filesize,
  recursion depth, # of embedded files, and scantime are always reported
  if AlertExceedsMax (--alert-exceeds-max) is enabled.

- Fixed an issue where non-fatal alerts for exceeding scan maximums may
  mask signature matches later on. I changed it so these alerts use the
  "possibly unwanted" alert-type and thus only alert if no other alerts
  were found or if all-match or heuristic-precedence are enabled.

- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
  when the --gen-json feature is enabled. These will show up once under
  "ParseErrors" the first time a limit is exceeded. In the present
  implementation, only one limits-exceeded events will be added, so as to
  prevent a malicious or malformed sample from filling the JSON buffer
  with millions of events and using a tonne of RAM.
2021-10-25 16:02:29 -07:00
Micah Snyder
90e4d66f7c OLE2 / XLS document image extraction
Added a feature to extract images from OLE2 BIFF streams.
This work was derived from InQuests blog post about extracting XLM and
images from XLS files:
https://inquest.net/blog/2019/01/29/Carving-Sneaky-XLM-Files

Assorted ole2 parser code cleanup and massive error handling cleanup.

Also fixed the following:

- The XLS parser may fail to process all BIFF records if some of the
records contain unexpected data or is otherwise malformed. Because the
record size is already known, we can skip over the "malformed" record
and continue with the rest.

- Fixed an issue where the ole2 header size was improperly calculated,
failing to account for the new "has_xlm" boolean added for context.
2021-07-17 10:39:27 -07:00
Micah Snyder (micasnyd)
13af789f4e SigTool: fix insufficient buffer size for --list-sigs
SigTool's --list-sigs feature can't handle long LDB signatures because the buffer size was wrong.
2021-06-24 14:08:49 -07:00
Micah Snyder
0255f29a72 Blacklist & Whitelist verbiage
Improvements to use modern block list and allow list verbiage.

blacklist -> block list
whitelist -> allow listed
blacklisted -> blocked
whitelisted -> allowed

In the case of certificate verification, use "trust" or "verify" when
something is allowed.

Also changed domainlist -> domain list (or DomainList) to match.
2021-05-27 14:16:00 -07:00
Micah Snyder
c025afd683 Rename "shared" library to "common"
The named "shared" is confusing, especially now that these features are
built as a static library instead of being directly compiled into the
various applications.
2021-04-20 17:31:19 -07:00
Micah Snyder
bae444a25b clang-format housekeeping 2021-04-09 19:08:14 -07:00
Micah Snyder (micasnyd)
b9ca6ea103 Update copyright dates for 2021
Also fixes up clang-format.
2021-03-19 15:12:26 -07:00
Micah Snyder
afbf0b6180 Fix Windows text file EOL conversion issues
On Windows, files open()'ed without the O_BINARY flag will have new-line
LF (aka \n) converted to CRLF (aka \r\n) automatically when read from or
written to. This is undesirable for all scan targets AND temp files
because it affects pattern matching and with hashing.

This commit converts a handful of instances throughout the codebase
where it appears that O_BINARY was mistakenly omitted and could result
in unexpected behavior on Windows.

Git on Windows also converts LF -> CRLF for "text" files, for editing
purposes.
This is problematic for scan files and test files that should match
verbatim.
We can prevent this issue by marking .ref test files as "binary" in the
.gitattributes file and by always opening scan files and temp files as
binary.

In this commit I've also removed the `ChangeLog merge=cl-merge` line
that was once used to reduce ChangeLog merge conflicts by using the
gnulib git-merge-changlog tool. This project now categorizes changes in
the NEWS.md.
For finer detail, git commit history is fully accessible on github.com.
2021-02-25 11:41:28 -08:00
ihsinme
d8976f907c sigtool: Fix leak in script-to-cdiff error handling 2021-02-17 11:43:00 -08:00
thinksilicon
9ce1b70cd7 sigtool --decode-sigs: handle escaped / in regexes
Sigtool's decode-sigs function incorrectly handles escaped forward
slashes in regexes. This code change finds the correct position of
the regex_end, so CFLAGS will be printed correctly.
2021-02-14 19:31:25 -08:00
Micah Snyder (micasnyd)
9e20cdf6ea Add CMake build tooling
This patch adds experimental-quality CMake build tooling.

The libmspack build required a modification to use "" instead of <> for
header #includes. This will hopefully be included in the libmspack
upstream project when adding CMake build tooling to libmspack.

Removed use of libltdl when using CMake.

Flex & Bison are now required to build.

If -DMAINTAINER_MODE, then GPERF is also required, though it currently
doesn't actually do anything.  TODO!

I found that the autotools build system was generating the lexer output
but not actually compiling it, instead using previously generated (and
manually renamed) lexer c source. As a consequence, changes to the .l
and .y files weren't making it into the build. To resolve this, I
removed generated flex/bison files and fixed the tooling to use the
freshly generated files. Flex and bison are now required build tools.
On Windows, this adds a dependency on the winflexbison package,
which can be obtained using Chocolatey or may be manually installed.

CMake tooling only has partial support for building with external LLVM
library, and no support for the internal LLVM (to be removed in the
future). I.e. The CMake build currently only supports the bytecode
interpreter.

Many files used include paths relative to the top source directory or
relative to the current project, rather than relative to each build
target. Modern CMake support requires including internal dependency
headers the same way you would external dependency headers (albeit
with "" instead of <>). This meant correcting all header includes to
be relative to the build targets and not relative to the workspace.

For example, ...

```c
include "../libclamav/clamav.h"
include "clamd/clamd_others.h"
```

... becomes:

```c
// libclamav
include "clamav.h"

// clamd
include "clamd_others.h"
```

Fixes header name conflicts by renaming a few of the files.

Converted the "shared" code into a static library, which depends on
libclamav. The ironically named "shared" static library provides
features common to the ClamAV apps which are not required in
libclamav itself and are not intended for use by downstream projects.
This change was required for correct modern CMake practices but was
also required to use the automake "subdir-objects" option.
This eliminates warnings when running autoreconf which, in the next
version of autoconf & automake are likely to break the build.

libclamav used to build in multiple stages where an earlier stage is
a static library containing utils required by the "shared" code.
Linking clamdscan and clamdtop with this libclamav utils static lib
allowed these two apps to function without libclamav. While this is
nice in theory, the practical gains are minimal and it complicates
the build system. As such, the autotools and CMake tooling was
simplified for improved maintainability and this feature was thrown
out. clamdtop and clamdscan now require libclamav to function.

Removed the nopthreads version of the autotools
libclamav_internal_utils static library and added pthread linking to
a couple apps that may have issues building on some platforms without
it, with the intention of removing needless complexity from the
source. Kept the regular version of libclamav_internal_utils.la
though it is no longer used anywhere but in libclamav.

Added an experimental doxygen build option which attempts to build
clamav.h and libfreshclam doxygen html docs.

The CMake build tooling also may build the example program(s), which
isn't a feature in the Autotools build system.

Changed C standard to C90+ due to inline linking issues with socket.h
when linking libfreshclam.so on Linux.

Generate common.rc for win32.

Fix tabs/spaces in shared Makefile.am, and remove vestigial ifndef
from misc.c.

Add CMake files to the automake dist, so users can try the new
CMake tooling w/out having to build from a git clone.

clamonacc changes:
- Renamed FANOTIFY macro to HAVE_SYS_FANOTIFY_H to better match other
  similar macros.
- Added a new clamav-clamonacc.service systemd unit file, based on
  the work of ChadDevOps & Aaron Brighton.
- Added missing clamonacc man page.

Updates to clamdscan man page, add missing options.

Remove vestigial CL_NOLIBCLAMAV definitions (all apps now use
libclamav).

Rename Windows mspack.dll to libmspack.dll so all ClamAV-built
libraries have the lib-prefix with Visual Studio as with CMake.
2020-08-13 00:25:34 -07:00
Andrew
1d66184a7d More Coverity bug fixes
Looking through the list of issues, I spotted some easy ones and submitted
some fixes:

- 225229 - In cli_rarload: Leak of memory or pointers to system resources.
If finding the necessary libunrar functions fails (should be rare),we now
dlclose libunrar.

225224 - In main (freshclam.c): A copied piece of code is inconsistent with
the original (CWE-398). A minor copy-paste error was present, and optOutList
could be cleaned up in one of the failure edge cases.

225228 - In decodecdb: Out-of-bounds access to a buffer (CWE-119). Off by one
error when tokenizing certain CDB sig fields for printing with sigtool. Ex:

$ cat test.cdb
a:CL_TYPE_7Z:1-2-3:/.*/:1-2-3:1-2-3:0:1-2-3::

$ cat test.cdb | ../installed/bin/sigtool --decode
VIRUS NAME: a
CONTAINER TYPE: CL_TYPE_7Z
CONTAINER SIZE: WITHIN RANGE 1 to 2
FILENAME REGEX: /.*/
COMPRESSED FILESIZE: WITHIN RANGE 1 to 2
UNCOMPRESSED FILESIZE: WITHIN RANGE 1 to 2
ENCRYPTION: NO
FILE POSITION: =================================================================
==17245==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffe3136d10 at pc 0x7f0f31c3f414 bp 0x7fffe3136c70 sp 0x7fffe3136c60
WRITE of size 8 at 0x7fffe3136d10 thread T0
    #0 0x7f0f31c3f413 in cli_strtokenize ../../libclamav/str.c:524
    #1 0x559e9797dc91 in decodecdb ../../sigtool/sigtool.c:2929
    #2 0x559e9797ea66 in decodesig ../../sigtool/sigtool.c:3058
    #3 0x559e9797f31e in decodesigs ../../sigtool/sigtool.c:3162
    #4 0x559e97981fbc in main ../../sigtool/sigtool.c:3638
    #5 0x7f0f3100fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #6 0x559e9795a1d9 in _start (/home/zelda/workspace/clamav-devel/installed/bin/sigtool+0x381d9)

Address 0x7fffe3136d10 is located in stack of thread T0 at offset 48 in frame
    #0 0x559e9797d113 in decodecdb ../../sigtool/sigtool.c:2840

  This frame has 1 object(s):
    [32, 48) 'range' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ../../libclamav/str.c:524 in cli_strtokenize

- 225223 - In cli_egg_deflate_decompress: Reads an uninitialized pointer or
its target (CWE-457). Certain fail cases would call inflateEnd on an
uninitialized stream. Now it’s only called after initialization occurs.

- 225220 - In buildcld: Use of an uninitialized variable (CWE-457). Certain
fail cases would result in oldDir being used before initialization. It now
gets zeroed before the first fail case.

- 225219 - In cli_egg_open: Leak of memory or pointers to system resources
(CWE-404).  If certain realloc’s failed, several structures would not be cleaned up

- 225218 - In cli_scanhwpml: Code block is unreachable because of the syntactic
structure of the code (CWE-561).  With certain macros set, there could be two
consecutive return statements.
2020-06-17 16:02:39 -04:00
Micah Snyder
c110392780 Change permission for new tmp files from RWX to RW 2020-06-03 11:00:53 -04:00
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned whether its "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat so as to prevent the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted a TDB-type macros and LSIG-type macros to enums for improved
  type safey.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Jonas Zaddach (jzaddach)
d5a733ef90 XLM (Excel 4.0) macro detection and extraction
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.

This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.

The code is based on Didier Steven's plugin_biff for oletools.py.
2020-04-29 14:19:41 -07:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
5f4f69102d Correcting types from int to cl_error_t where appropriate. Eliminating unused variables and referencing unused parameters to remove warnings. 2019-10-02 16:08:25 -04:00
Micah Snyder
737ec1ef21 Corrections to freshclam logging initialization. Added notation to --help output for --stdout option to indicate that debug messages will not be redirected. Changing direct calls to cli_dbgmsg_internal to use cli_dbgmsg, as cli_dbgmsg_internal always prints, even when --debug is not enabled. 2019-10-02 16:08:23 -04:00
Micah Snyder
cef54eaf8f Freshclam refresh. This update makes libcurl a hard requirement for ClamAV.
New features added to freshclam:
- Update signature definitions over HTTPS.
- Support for HTTP protocol v1.1 (formerly v1.0).
- New libfreshclam library with an all new API and versioning separate from libclamav (v2.0.0). This library is now build and installed alongside libclamav as a hard dependency of freshclam.
- The ability to opt-in and opt-out of standard and optional official ClamAV databases (ExtraDatabase, ExcludeDatabase)
- The option to specify the protocol and port number of official and private mirror servers.
- Support for additional types of proxy servers beyond plain HTTP (SOCKS 4, SOCKS 5).

Features removed from freshclam:
- Mirror management (mirrors.dat) file. This feature is no longer needed as official signature databases are distributed using a paid content delivery network (Cloudflare).

This commit also adds the following features for Windows users:
- The clamsubmit tool.
- The json-c library dependency, which will enable the --gen-json option in clamscan.
- Third party libraries under the win32/3rdparty directory have been removed. Developers will need to build the libraries separately from ClamAV and provide the headers and lib/dll library files the same way they do for OpenSSL. This includes libxml2, pthread-win32, bzip2, zlib, pcre2 as well as new dependencies: curl, json-c. Developers are encouraged to use the build tool Mussels to simplify this task.
2019-10-02 16:08:22 -04:00
Andrew
bc6ea0c30a Fix memleaks in sigtool
Primarily addresses an issue with --test-sigs that causes the
process to run out of memory if testing many sigs against a
given file
2019-10-02 16:08:20 -04:00
Andrew
e8169c7053 Multiple blacklist sigs can now match with allmatch
Also, move the cert-related DCONF cfg checks to more
appropriate locations. One change in behavior:

PE_CONF_CATALOG will disable loading trusted hashes from
.cat files, but won't disable Authenticode hash checking
completely (PE_CONF_CERTS does this).
2019-10-02 16:08:20 -04:00
Andrew
92088f91f1 Add support for cert blacklisting and whitelisting upfront
Instead of checking the Authenticode header as an FP prevention
mechanism, we now check it in the beginning if it exists. Also,
we can now do actual blacklisting with .crb rules (previously, a
blacklist rule just let you override a whitelist rule).
2019-10-02 16:08:20 -04:00
Andrew
14d52d0c63 Use genhash_pe instead of checkfp_pe for section hash computation
cli_checkfp_pe is now effectively the function that just checks
the Authenticode hash.  This makes the code less complicated,
and adds some minor improvements:
 - section hashes are no longer computed if there is no stats
   callback function (at least in that part of the code)
 - We now actually set the len field in the stats_section_t
   structure
 - If an error occurs when computing a section hash, we skip
   that section instead of not computing any hashes
2019-10-02 16:08:20 -04:00
Andrew
ef24839531 Add TODOs in sigtool.c 2019-10-02 16:08:20 -04:00
chips
8a5f206964 Update sigtool.c
fix bug: fd open but no close,it makes handle is occupied
2019-10-02 16:08:19 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Andrew
64ecd1099c Fix support for authenticode signatures from external .cat files
This commit adds back in support for whitelisting files based on
signatures from .cat files loaded in via a '-d' flag to clamscan.
This also makes it so that a .crb blacklist rule match can't be
overruled by a signature in a .cat file
2018-12-02 23:07:06 -05:00
Mickey Sola
17360f03be scan_options - fixing up segfault caused by zeroed out scan_options struct when using sigtool to test signatures 2018-12-02 23:07:03 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renaming references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
2018-12-02 23:06:59 -05:00
Micah Snyder
964a1e7321 Converting http urls to https urls. Primary focus was on clamav.net urls. I updated a couple others and fixes a few broken links as well. There are many (non-clamav.net) urls I didn't address, especially in 3rd party or contrib code. 2018-04-02 07:58:33 -04:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Micah Snyder
e098cdc557 Updating help strings, to include a couple missing items as well as copyrights. updating man page files as well. 2018-02-14 12:08:36 -05:00
Steven Morgan
ed47868b3f bb11823 - command line copyright dates. 2017-07-18 16:43:11 -04:00
Mickey Sola
631cb6a005 Fixes and updates to intermediate container sig rules based on code review 2017-02-01 17:33:15 -05:00
klin
031fe00a4d restructure container typing system to use array (#2) 2017-01-19 12:24:46 -05:00
Steven Morgan
7286695f58 bb17595 (FireAmp) - add sigtool support for building fp-only virus databases. 2016-10-12 18:16:51 -04:00
Kevin Lin
9c30a4fc6e sigtool: patch hybrid cvd generation 2016-08-17 11:31:56 -04:00
Kevin Lin
8d37842072 win32: fixes for sigtool imphash linking 2016-07-13 17:05:43 -04:00
Kevin Lin
634c859458 imphash: code review and clean up 2016-07-13 15:08:30 -04:00
Kevin Lin
832d44e748 sig: convert .ith to .imp; add .imp to sigtool 2016-07-13 15:08:30 -04:00
Kevin Lin
3cc632adc8 sigtool: properly generates and reports pe section hashes (mdb) 2016-07-13 15:08:30 -04:00
Mickey Sola
a86c600350 bb11553 - allowing sigtool to ignore comments in signature files 2016-04-18 10:35:07 -04:00
Steven Morgan
09e0a9a720 Add sigtool support for decoding *.ftm signatures (version 1). 2016-03-30 19:30:08 -04:00
Kevin Lin
5eaf0b320a bb#11003 - fix dconf and option handling for nocert and dumpcert 2016-03-17 11:15:52 -04:00