Commit graph

74 commits

Author SHA1 Message Date
Valerie Snyder
51adfb8b61
ClamScan & libclamav: improve precision of bytes-scanned, bytes-read
The ClamScan scan summary prints bytes scanned and bytes read in
multiples of 4096 (aka `CL_COUNT_PRECISION`), as is provided by the
`cl_scanfile()`, `cl_scandesc()`, `cl_scanfile_callback()`, and
`cl_scandesc_callback()` functions.

I believe this imprecision was the result of using an `unsigned long int`
which may be 64-bit or 32-bit, depending on platform. I believe the
intention was to be able to support scanning more than 4 GiB of data.

Since the new `cl_scan*_ex()` functions use a `uint64_t`, which
guarantees a 64-bit integer and supports ~16,777,216 terabytes, I find no
reason not to report an accurate count.

For the legacy scan functions (above) I've kept the `CL_COUNT_PRECISION`
behavior to maintain backwards compatibility.

I have also improved the bytes scanned/read output to report GiB, MiB,
KiB, or B as appropriate. Previously, it always reported "MB".
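
As a rough illustration of the unit selection described above, here is a minimal sketch in C. The helper name `format_bytes` and the two-decimal output are assumptions made for this example, not ClamScan's actual code or output format.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not ClamAV code): format an exact byte count
 * using the largest binary unit that fits. */
static void format_bytes(uint64_t bytes, char *out, size_t out_len)
{
    const uint64_t KiB = 1024ULL;
    const uint64_t MiB = KiB * 1024ULL;
    const uint64_t GiB = MiB * 1024ULL;

    if (bytes >= GiB)
        snprintf(out, out_len, "%.2f GiB", (double)bytes / (double)GiB);
    else if (bytes >= MiB)
        snprintf(out, out_len, "%.2f MiB", (double)bytes / (double)MiB);
    else if (bytes >= KiB)
        snprintf(out, out_len, "%.2f KiB", (double)bytes / (double)KiB);
    else
        snprintf(out, out_len, "%" PRIu64 " B", bytes); /* exact count */
}
```

With this sketch, a count of 1536 bytes is rendered in KiB, while anything under 1024 bytes is printed as an exact byte count.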

CLAM-1433
2025-08-14 22:39:15 -04:00
Valerie Snyder
31dcec1e42
libclamav: Add engine option to toggle temp directory recursion
Temp directory recursion in ClamAV is when each layer of a scan gets its
own temp directory in the parent layer's temp directory.

In addition to temp directory recursion, ClamAV has been creating a new
subdirectory for each file scan as a risk-averse method to ensure
no temporary file leaks fill up the disk.
Creating a directory is relatively slow, particularly on Windows, when
scanning a lot of very small files.

This commit:

1. Separates the temp directory recursion feature from the leave-temps
   feature so that libclamav can leave temp files without making
   subdirectories for each file scanned.

2. Makes it so that when temp directory recursion is off, libclamav
   will just use the configured temp directory for all files.

The new option to enable temp directory recursion is for libclamav only
at this time. It is off by default, and you can enable it like this:

```c
cl_engine_set_num(engine, CL_ENGINE_TMPDIR_RECURSION, 1);
```

For the `clamscan` and `clamd` programs, temp directory recursion will
be enabled when `--leave-temps` / `LeaveTemporaryFiles` is enabled.

The difference is that when disabled, it will return to using the
configured temp directory without making a subdirectory for each file
scanned, so as to improve scan performance for small files, mostly on
Windows.

Under the hood, this commit also:

1. Cleans up how we keep track of tmpdirs for each layer.
   The goal here is to align how we keep track of layer-specific stuff
   using the scan_layer structure.

2. Cleans up how we record metadata JSON for embedded files.
   Note: Embedded files being different from Contained files, as they
         are extracted not with a parser, but by finding them with
         file type magic signatures.

CLAM-1583
2025-08-14 22:38:58 -04:00
Val Snyder
7ff29b8c37
Bump copyright dates for 2025 2025-02-14 10:24:30 -05:00
Micah Snyder
3ae9c1e434 Add LHA/LZH archive support
File type magic signatures chosen based on the extensions supported
by Rust delharc crate.

See: https://docs.rs/delharc/latest/delharc/
2024-04-09 10:35:22 -04:00
Micah Snyder
405829ee88 Refine max-allocation and safer-allocation function and macro names
We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.

Renamed the cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't have the
NULL check before trying to dup, and so may result in a NULL-deref
crash.

Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the
NULL-check is preferred.
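
To make the NULL-check point concrete, here is a minimal sketch of a NULL-tolerant duplicate function. The name `sketch_safer_strdup` is hypothetical and the body is an assumption about the general technique, not the actual libclamav implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch (not libclamav code): duplicate a string, but
 * return NULL for a NULL input instead of dereferencing it. */
static char *sketch_safer_strdup(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL; /* passing NULL to plain strdup() is undefined behavior */

    len  = strlen(s) + 1; /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);

    return copy; /* caller frees; NULL on allocation failure */
}
```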
2024-03-15 13:18:47 -04:00
Micah Snyder
8e04c25fec Rename clamav memory allocation functions
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2

... and these names do not convey their purpose.

This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2

The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-frees.

So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2

These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
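
The two behaviors described above can be sketched roughly as follows. The function names, the 1 GiB limit, and the choice to reject zero-byte requests are assumptions for illustration only; the real libclamav wrappers may differ in all of these details.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative limit only; the real libclamav threshold may differ. */
#define SKETCH_MAX_ALLOCATION ((size_t)1 << 30)

/* Hypothetical sketch of a size-limited malloc for sizes derived from
 * untrusted input. */
static void *sketch_max_malloc(size_t size)
{
    if (size == 0 || size > SKETCH_MAX_ALLOCATION)
        return NULL; /* refuse sizes a crafted file could inflate */
    return malloc(size);
}

/* Hypothetical sketch of a realloc without a size limit that never
 * frees the original pointer on a zero-byte request. */
static void *sketch_safer_realloc(void *ptr, size_t size)
{
    if (size == 0)
        return NULL; /* realloc() is not called, so ptr stays valid */
    return realloc(ptr, size);
}
```

Because the zero-byte case never reaches `realloc()`, the caller still owns the original pointer and frees it exactly once.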
2024-03-15 13:18:47 -04:00
Micah Snyder
6d6e04ddf8 Optimization: replace limited allocation calls
There are a large number of allocations for fixed-size buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be lower than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.

This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC

I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.

I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-frees.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocations for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.

There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
2024-03-15 13:18:47 -04:00
Micah Snyder
9cb28e51e6 Bump copyright dates for 2024 2024-01-22 11:27:17 -05:00
Micah Snyder
0d3dc86f90 Coverity-514958: Error handling check with getpagesize call
`cli_getpagesize()` may return -1 in an error condition.
If it does, let's just treat it as 4096.

I believe the actual Coverity complaint is a false positive, but it's
fair to account for the error case and this should silence it.
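
A minimal sketch of the fallback described above, assuming the raw page size arrives as a signed value. The helper name is hypothetical, and treating zero like an error is an extra assumption beyond the -1 case mentioned in the message.

```c
#include <stddef.h>

#define SKETCH_DEFAULT_PAGESIZE 4096

/* Hypothetical helper (not libclamav code): turn a possibly failed
 * page size query into a usable value. */
static size_t pagesize_or_default(long reported)
{
    if (reported <= 0)
        return SKETCH_DEFAULT_PAGESIZE; /* error case: fall back to 4096 */
    return (size_t)reported;
}
```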
2023-08-16 21:08:01 -07:00
Micah Snyder
6eebecc303 Bump copyright for 2023 2023-02-12 11:20:22 -08:00
Micah Snyder
fcd8902cb2 HWP3, ASN1, blob: Remove all-match checks 2022-10-19 13:13:57 -07:00
Micah Snyder
cd3134568a Code quality: Refactor layer attributes as scan parameter
The current implementation sets a "next layer attributes" flag field
in the scan context. This may introduce bugs if accidentally not cleared
during error handling, causing that attribute to be applied to a
different layer than intended.

This commit resolves that by adding an attribute flag to the major
internal scan functions and removing the "next layer attributes" from
the scan context. This attributes flag shares the same flag fields as
the attributes flag in the new file inspection callback and the flags
are defined in `clamav.h`.
2022-10-13 08:57:44 -07:00
mko-x
a21cc6dcd7
Add explicit log level parameter to application logging API
* Added loglevel parameter to logg()

* Fix logg and mprintf internals with new loglevels

* Update all logg calls to set loglevel

* Update all mprintf calls to set loglevel

* Fix hidden logg calls

* Executed clam-format
2022-02-15 15:13:55 -08:00
micasnyd
140c88aa4e Bump copyright for 2022
Includes minor format corrections.
2022-01-09 14:23:25 -07:00
Micah Snyder
d46832d5cf clamav.net URL update for new docs, github issues
Replace new bugzilla ticket links with links to github issues.
Replace clamav.net/documentation links with docs.clamav.net equivalents.
2021-07-17 15:28:02 -07:00
Micah Snyder (micasnyd)
b9ca6ea103 Update copyright dates for 2021
Also fixes up clang-format.
2021-03-19 15:12:26 -07:00
Micah Snyder
e2f59af30a Clang-format touchup 2020-07-24 16:37:25 -07:00
Andy Ragusa (aragusa)
2049078622 fuzz-22348 null deref in egg utf8 conversion
Corrected memory leaks and a null dereference in the egg utf8 conversion.
2020-07-13 19:31:27 -07:00
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan APIs that create fmaps, and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat so as to reduce the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted TDB-type macros and LSIG-type macros to enums for improved
  type safety.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
cef54eaf8f Freshclam refresh. This update makes libcurl a hard requirement for ClamAV.
New features added to freshclam:
- Update signature definitions over HTTPS.
- Support for HTTP protocol v1.1 (formerly v1.0).
- New libfreshclam library with an all new API and versioning separate from libclamav (v2.0.0). This library is now built and installed alongside libclamav as a hard dependency of freshclam.
- The ability to opt-in and opt-out of standard and optional official ClamAV databases (ExtraDatabase, ExcludeDatabase)
- The option to specify the protocol and port number of official and private mirror servers.
- Support for additional types of proxy servers beyond plain HTTP (SOCKS 4, SOCKS 5).

Features removed from freshclam:
- Mirror management (mirrors.dat) file. This feature is no longer needed as official signature databases are distributed using a paid content delivery network (Cloudflare).

This commit also adds the following features for Windows users:
- The clamsubmit tool.
- The json-c library dependency, which will enable the --gen-json option in clamscan.
- Third party libraries under the win32/3rdparty directory have been removed. Developers will need to build the libraries separately from ClamAV and provide the headers and lib/dll library files the same way they do for OpenSSL. This includes libxml2, pthread-win32, bzip2, zlib, pcre2 as well as new dependencies: curl, json-c. Developers are encouraged to use the build tool Mussels to simplify this task.
2019-10-02 16:08:22 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renamed references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
2018-12-02 23:06:59 -05:00
Micah Snyder
964a1e7321 Converting http urls to https urls. Primary focus was on clamav.net urls. I updated a couple others and fixed a few broken links as well. There are many (non-clamav.net) urls I didn't address, especially in 3rd party or contrib code. 2018-04-02 07:58:33 -04:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Steven Morgan
7a307529d8 bb11580 - make cli_matchmeta() respect allmatch. 2016-06-08 16:25:34 -04:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Shawn Webb
cd94be7a52 Silence a bunch of compiler warnings in libclamav 2014-07-10 18:11:49 -04:00
Shawn Webb
60d8d2c352 Move all the crypto API to clamav.h 2014-07-01 19:38:01 -04:00
Shawn Webb
b2e7c931d0 Use OpenSSL for hashing. 2014-02-08 00:31:12 -05:00
Steve Morgan
b81cbc263c some corrections and refinements identified during 0.97 retrofit 2012-10-25 12:36:05 -07:00
Shawn webb
a2a004df25 BB#3737 - Value too large for specified data type
Create compile-time preprocessor defines for switching from calling
stat() to stat64(). Add --enable-stat64 switch in configure script.
2012-07-16 15:36:49 -04:00
Tomasz Kojm
53d41b9793 libclamav/blob.c: properly scan files when LeaveTemporaryFiles is enabled (bb#2447) 2010-12-28 13:05:00 +01:00
Tomasz Kojm
bb1e844cc2 fix some warnings 2010-01-27 16:06:12 +01:00
Tomasz Kojm
2ecbd98a5e cdb: handle mail files 2010-01-15 16:24:16 +01:00
Tomasz Kojm
55094a9c76 libclamav: base code for unified container metadata matcher (bb#1579) 2010-01-07 18:26:12 +01:00
aCaB
58481352d5 win32 paths handling 2009-09-24 19:07:39 +02:00
aCaB
081f64735d win32#2 2009-09-24 16:24:07 +02:00
aCaB
be4bf7f4ab win32 2009-09-24 16:08:52 +02:00
aCaB
cb680655f1 unify mail-container scans 2009-08-30 23:57:20 +02:00
aCaB
86d59b249e fix portability issues for fseeko, sysconf(_SC_PAGESIZE), getpagesize() (bb#1658) 2009-07-16 14:21:25 +02:00
Tomasz Kojm
e06afe8e8e libclamav: fix handling of signature offsets in cli_scanbuff() (bb#1546)
git-svn: trunk@5026
2009-04-06 20:01:09 +00:00
aCaB
f2d79ab352 bb#1456
git-svn: trunk@4925
2009-03-11 18:04:01 +00:00
Tomasz Kojm
0138619577 libclamav/matcher.c: cli_scanbuff: add support for external acdata
git-svn: trunk@4781
2009-02-13 12:42:35 +00:00
Tomasz Kojm
33068e0973 libclamav: drop cl_settempdir(); use cl_engine_set() with CL_ENGINE_TMPDIR and CL_ENGINE_KEEPTMP instead
git-svn: trunk@4416
2008-11-14 22:23:39 +00:00
Török Edvin
6a21552ef2 have configure define NDEBUG unless we use --enable-debug, instead of having
to #ifndef CL_DEBUG #define NDEBUG #endif in each .c file that uses assert.
If you want assertions enabled you'll need to use --enable-debug to configure,
as until now, no change there.
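
For context, this relies on standard C behavior: when `NDEBUG` is defined before `<assert.h>` is included, `assert()` expands to nothing. A minimal sketch, using a hypothetical function that is not part of ClamAV:

```c
#include <assert.h>

/* Hypothetical example function: the assertion is active in builds
 * without NDEBUG and compiled out when NDEBUG is defined. */
static int checked_div(int a, int b)
{
    assert(b != 0); /* removed by the preprocessor under -DNDEBUG */
    return a / b;
}
```

Building this with `-DNDEBUG` on the compiler command line has the same effect as a build system defining `NDEBUG` globally.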

git-svn: trunk@4343
2008-11-06 14:27:18 +00:00
Tomasz Kojm
6670d61d4b drop support for Cygwin (due to broken ClamAV builds)
git-svn: trunk@4143
2008-08-25 21:59:33 +00:00