Add `cl_scanfile_ex()`, `cl_scanmap_ex()`, and `cl_scandesc_ex()`
functions that provide the following additional parameters:
- hash_hint (optional): A NULL-terminated string of the file hash, so that
  libclamav does not need to calculate it.
- hash_out [out] (optional): A NULL-terminated string of the file hash.
  The caller is responsible for freeing the string.
- hash_alg: The hashing algorithm used for either `hash_hint` or `hash_out`.
  Supported algorithms are "md5", "sha1", and "sha2-256".
  If not specified, the default is "sha2-256".
- file_type_hint (optional): A NULL-terminated string of the file type hint,
  e.g. "pe", "elf", "zip".
  You may also use ClamAV type names such as "CL_TYPE_PE".
  ClamAV will ignore the hint if it does not recognize the specified type.
  See also: https://docs.clamav.net/appendix/FileTypes.html#file-types
- file_type_out [out] (optional): A NULL-terminated string of the file type
  of the top layer as determined by ClamAV, in the standard ClamAV file
  type format, e.g. "CL_TYPE_PE".
  See also: https://docs.clamav.net/appendix/FileTypes.html#file-types
CLAM-2626
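To make the `hash_alg` rules concrete, here is a minimal C sketch of how a caller might normalize the algorithm name and sanity-check a `hash_hint` before passing it in. The helper names are hypothetical (not libclamav API), and the sketch assumes that "not specified" means a NULL `hash_alg` and that hashes are hex-encoded strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not libclamav API: map an optional hash_alg to a
 * supported name, applying the documented "sha2-256" default when NULL.
 * Returns NULL for an unsupported algorithm. */
static const char *resolve_hash_alg(const char *hash_alg)
{
    static const char *const supported[] = {"md5", "sha1", "sha2-256"};

    if (hash_alg == NULL) {
        return "sha2-256"; /* assumption: NULL means "not specified" */
    }
    for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++) {
        if (strcmp(hash_alg, supported[i]) == 0) {
            return supported[i];
        }
    }
    return NULL;
}

/* Hypothetical: hex characters expected for each supported algorithm. */
static size_t expected_hex_len(const char *alg)
{
    if (strcmp(alg, "md5") == 0) {
        return 32;
    }
    if (strcmp(alg, "sha1") == 0) {
        return 40;
    }
    return 64; /* sha2-256 */
}

/* Hypothetical pre-check: right length for the algorithm and all hex digits. */
static int hash_hint_looks_valid(const char *hash_hint, const char *hash_alg)
{
    const char *alg = resolve_hash_alg(hash_alg);
    size_t len;

    if (alg == NULL || hash_hint == NULL) {
        return 0;
    }
    len = strlen(hash_hint);
    if (len != expected_hex_len(alg)) {
        return 0;
    }
    return strspn(hash_hint, "0123456789abcdefABCDEF") == len;
}
```

For example, an MD5 hint of 32 hex digits passes when `hash_alg` is "md5" but fails when `hash_alg` is NULL, because the default algorithm expects 64 digits.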
On Windows, the cli_basename function should treat both '/' and '\' as path
separators, since most Windows APIs also accept both.
On Linux/Unix, it also makes sense to treat '\' as a path separator when the
filepath is used mostly for informational purposes, or when it may have come
from a Windows system.
But in situations where the path is needed for some critical action,
like moving or deleting a file, we can't treat it as a path separator.
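A minimal sketch of the separator rule, using a hypothetical `basename_sketch()` rather than ClamAV's actual `cli_basename()` (whose signature is not shown here). The caller decides whether '\' counts as a separator, mirroring the informational-versus-critical distinction.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch, not ClamAV's cli_basename: return a pointer to the
 * last path component of `path`. '/' is always a separator; '\\' is treated
 * as one only when `backslash_is_sep` is true (e.g. on Windows, or when the
 * path is informational only). Trailing separators are not stripped. */
static const char *basename_sketch(const char *path, bool backslash_is_sep)
{
    const char *base = path;

    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/' || (backslash_is_sep && *p == '\\')) {
            base = p + 1; /* component starts right after the separator */
        }
    }
    return base;
}
```

For example, on Linux a file literally named `odd\name.txt` must keep its full name when it is about to be moved or deleted, so the caller would pass `false` in that situation.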
We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.
Renamed cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't check for
NULL before trying to duplicate the string, and so a NULL argument may
result in a NULL-dereference crash.
Also remove the unused STRDUP (_OR_GOTO_DONE) macro, since the variant
with the NULL check is preferred.
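The sketch below illustrates both ideas with hypothetical names: a NULL-checking strdup and a macro that follows the `_OR_GOTO_DONE` convention by jumping to a `done` label on failure. It assumes the macro accepts extra error-handling statements as variadic arguments; ClamAV's real macro definitions may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical NULL-safe strdup; not ClamAV's definition. */
static char *safer_strdup_sketch(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL) {
        return NULL; /* plain strdup(NULL) would dereference NULL */
    }
    len  = strlen(s) + 1;
    copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Hypothetical macro using the _OR_GOTO_DONE naming convention: on failure
 * it runs the caller-supplied statement(s), then jumps to `done`. */
#define SAFER_STRDUP_OR_GOTO_DONE(var, src, ...) \
    do {                                         \
        (var) = safer_strdup_sketch(src);        \
        if ((var) == NULL) {                     \
            __VA_ARGS__;                         \
            goto done;                           \
        }                                        \
    } while (0)

/* Example caller: returns 0 on success, -2 if the copy failed. */
static int copy_name(const char *name, char **out)
{
    int status = -1; /* generic failure until proven otherwise */
    char *tmp  = NULL;

    SAFER_STRDUP_OR_GOTO_DONE(tmp, name, status = -2);

    *out   = tmp;
    tmp    = NULL; /* ownership moved to *out */
    status = 0;

done:
    free(tmp); /* free(NULL) is a no-op */
    return status;
}
```

The macro name makes the hidden control flow visible at the call site: a reader knows that the line may leave the function through `done`.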
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose.
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc variants also have an additional feature: they will not
free your pointer if you try to realloc to 0 bytes. Whether realloc()
frees the memory in that case is not guaranteed by the C standard, and
only some realloc implementations do so. These wrappers standardize on not
freeing, which should prevent accidental double-frees.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap support is not found), so as to
match the behavior of the mpool_realloc/2 functions, which do not use
the allocation limit.
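A minimal sketch of the realloc semantics described here, using hypothetical `*_sketch` names rather than ClamAV's functions: a size of 0 is refused without freeing the caller's pointer, and the bounded variant additionally rejects sizes above a caller-supplied limit.

```c
#include <stdlib.h>

/* Shared core (hypothetical): refuse size 0 without freeing `ptr`;
 * otherwise defer to realloc(), which leaves `ptr` valid if it fails. */
static void *realloc_no_free_on_zero(void *ptr, size_t size)
{
    if (size == 0) {
        return NULL; /* unlike some realloc() implementations, do not free */
    }
    return realloc(ptr, size);
}

/* Like the cli_max_realloc idea: also reject sizes above a caller limit,
 * e.g. when a length field in a scanned file claims an absurd size. */
static void *max_realloc_sketch(void *ptr, size_t size, size_t max_size)
{
    if (size > max_size) {
        return NULL; /* the caller still owns ptr */
    }
    return realloc_no_free_on_zero(ptr, size);
}

/* Like the cli_safer_realloc idea: same zero-size behavior, no maximum. */
static void *safer_realloc_sketch(void *ptr, size_t size)
{
    return realloc_no_free_on_zero(ptr, size);
}
```

Because neither variant frees `ptr` on failure, the caller keeps a single ownership rule: if the call returns NULL, the original pointer is still valid and still theirs to free.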
My recent fix for the issue where a '\' followed by ':' in a YARA regex
string would fail to parse introduced a new issue that broke loading a
signature in the current daily.ldb database.
Unbeknownst to me at the time, you can have multiple PCRE subsignatures
in a logical signature, so long as they're the last subsignatures.
The previous fix made the signature parser merge more than one PCRE
subsignature into a single malformed regex string.
This commit essentially reverts the previous fix, while keeping some of
the code readability improvements in that function.
Instead, it addresses the original problem a different way: I'm simply
checking if the signature name (signame) starts with "YARA". If it does,
we don't tokenize it by ':' delimiters.
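The new check amounts to a prefix comparison on the signature name. A sketch, assuming a hypothetical helper name and a non-NULL `signame`:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: decide whether a subsignature should be split on
 * ':' delimiters. Names starting with "YARA" are left untokenized.
 * Precondition (assumed): signame is non-NULL. */
static bool should_tokenize_by_colon(const char *signame)
{
    return strncmp(signame, "YARA", 4) != 0;
}
```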
There is an issue parsing PCRE patterns if the pattern contains a '/' in
the middle, followed by a ':'. When splitting the subsignature (or YARA
regex string) by ':' delimiters to identify the offset, the parser
inadvertently treats the '/' in the middle of the sig as the end of
the PCRE string, and therefore considers the ':' in the string a
valid delimiter instead of ignoring it for being inside the regex
string.
The solution I came up with is to ignore all content after a '/' when
tokenizing, rather than ignoring content between a matching pair of '/'
characters.
This works for LDB signatures because PCRE subsignatures are always
the last subsignatures and because a ':' never comes *after* the PCRE
string.
It works for YARA rules because the `cli_tokenize()` function is only
ever used on the regex strings, never on the whole rule.
Fixes: https://github.com/Cisco-Talos/clamav/issues/594
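A minimal sketch of the tokenizing rule (split on the delimiter until the first '/', then keep the rest as one token). The function name and signature are hypothetical; this is not ClamAV's `cli_tokenize()`. Like strtok-style tokenizers, it modifies the buffer in place.

```c
#include <stddef.h>

/* Hypothetical sketch: split `buf` in place on `delim`, but once a '/' is
 * seen, stop splitting so the rest of the buffer (the PCRE) stays a single
 * token. Returns the number of tokens stored in `tokens`, at most
 * `max_tokens`; extra delimiters stay inside the last token. */
static size_t tokenize_until_slash(char *buf, char delim, char **tokens,
                                   size_t max_tokens)
{
    size_t count = 0;
    char *p      = buf;

    if (max_tokens == 0) {
        return 0;
    }
    tokens[count++] = p;
    for (; *p != '\0'; p++) {
        if (*p == '/') {
            break; /* everything from here on belongs to the last token */
        }
        if (*p == delim && count < max_tokens) {
            *p              = '\0'; /* terminate the previous token */
            tokens[count++] = p + 1;
        }
    }
    return count;
}
```

Given `200:/foo\/bar:baz/` with ':' as the delimiter, this yields two tokens, `200` and `/foo\/bar:baz/`; the ':' after the escaped slash is left alone because it follows the first '/'.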
The logic for parsing a logical subsignature isn't clearly separated out,
and has been (perhaps mistakenly, or out of convenience) used when
parsing NDB signatures in addition to LDB subsignatures. This means
that you can technically use a PCRE subsignature in an NDB file and
ClamAV won't complain about it. It won't work, however, because a PCRE
subsignature requires another matching subsignature to trigger it, but
it will parse. The same is likely true for byte-compare subsignatures.
This commit restructures that logic a bit so that subsignature parsing
has its own function and is more organized.
I also renamed some of the functions and added lots of comments.
I fixed a few minor warnings relating to format string characters.
The change in str.c:cli_ldbtokenize is to prevent a buffer under-read if
you were to use the function on the start of a buffer, as is now done in
this commit.
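The commit message doesn't show the `cli_ldbtokenize` change itself. Assuming the tokenizer inspects the byte before a delimiter (for example, to detect a backslash escape), the guard looks like this hypothetical helper: test the index before reading `buf[i - 1]`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not the actual cli_ldbtokenize code: a delimiter at
 * buf[i] counts as escaped only if the preceding byte is a backslash.
 * Checking i > 0 first means buf[-1] is never read when the delimiter is
 * the first byte of the buffer. (Escaped backslashes are not handled.) */
static bool delim_is_escaped(const char *buf, size_t i)
{
    return i > 0 && buf[i - 1] == '\\';
}

/* Count delimiters that are not preceded by a backslash. */
static size_t count_unescaped_delims(const char *buf, size_t len, char delim)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == delim && !delim_is_escaped(buf, i)) {
            n++;
        }
    }
    return n;
}
```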
* Added loglevel parameter to logg()
* Fix logg and mprintf internals with new loglevels
* Update all logg calls to set loglevel
* Update all mprintf calls to set loglevel
* Fix hidden logg calls
* Executed clam-format
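A minimal sketch of a logging function that takes the level as its first argument, in the spirit of the `logg()` change. The enum names, the global threshold, and `log_at()` are hypothetical; ClamAV's real loglevel identifiers may differ.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log levels, ordered from least to most verbose. */
enum log_level { LOG_LVL_ERROR, LOG_LVL_WARNING, LOG_LVL_INFO, LOG_LVL_DEBUG };

/* Hypothetical global verbosity threshold. */
static enum log_level g_threshold = LOG_LVL_INFO;

/* Hypothetical logg()-style function: the level comes first, and messages
 * more verbose than the threshold are dropped. Returns the number of
 * characters written to stderr, or 0 if the message was filtered out. */
static int log_at(enum log_level level, const char *fmt, ...)
{
    va_list ap;
    int written;

    if (level > g_threshold) {
        return 0;
    }
    va_start(ap, fmt);
    written = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return written;
}
```

Putting the level in the signature forces every call site to state its severity explicitly, which is why all existing calls had to be updated.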
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.
This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.
The code is based on Didier Stevens' plugin_biff for oletools.py.
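For background, here is a rough sketch of how XLM macro sheets can be spotted in a BIFF8 Workbook stream, based on the public [MS-XLS] record layout rather than on ClamAV's or plugin_biff's code. It assumes the stream has already been extracted from the OLE2 container, and it only detects a macro sheet; it does not extract the macro formulas.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per [MS-XLS]: each BIFF8 record starts with a 2-byte little-endian type
 * and a 2-byte little-endian data length. A BoundSheet8 record (0x0085)
 * describes one sheet: lbPlyPos (4 bytes), hsState (1 byte), dt (1 byte),
 * then the name. dt == 0x01 marks an Excel 4.0 macro sheet. */
#define BIFF_BOUNDSHEET8    0x0085
#define BIFF_DT_MACRO_SHEET 0x01

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns true if the stream declares at least one XLM macro sheet. */
static bool biff_has_xlm_sheet(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    while (len - off >= 4) {
        uint16_t type       = read_le16(buf + off);
        uint16_t size       = read_le16(buf + off + 2);
        const uint8_t *data = buf + off + 4;

        if (size > len - off - 4) {
            break; /* truncated record: stop rather than over-read */
        }
        if (type == BIFF_BOUNDSHEET8 && size >= 6 &&
            data[5] == BIFF_DT_MACRO_SHEET) {
            return true;
        }
        off += 4 + (size_t)size;
    }
    return false;
}
```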
New features added to freshclam:
- Update signature definitions over HTTPS.
- Support for HTTP protocol v1.1 (formerly v1.0).
- New libfreshclam library with an all-new API and versioning separate from libclamav (v2.0.0). This library is now built and installed alongside libclamav as a hard dependency of freshclam.
- The ability to opt in to and opt out of standard and optional official ClamAV databases (ExtraDatabase, ExcludeDatabase).
- The option to specify the protocol and port number of official and private mirror servers.
- Support for additional types of proxy servers beyond plain HTTP (SOCKS 4, SOCKS 5).
Features removed from freshclam:
- The mirror management file (mirrors.dat). This feature is no longer needed as official signature databases are distributed using a paid content delivery network (Cloudflare).
This commit also adds the following features for Windows users:
- The clamsubmit tool.
- The json-c library dependency, which will enable the --gen-json option in clamscan.
- Third-party libraries under the win32/3rdparty directory have been removed. Developers will need to build the libraries separately from ClamAV and provide the headers and lib/dll library files the same way they do for OpenSSL. This includes libxml2, pthread-win32, bzip2, zlib, pcre2 as well as new dependencies: curl, json-c. Developers are encouraged to use the build tool Mussels to simplify this task.
The allowed sector size is between 2048 and 2448 bytes (2352 raw + 96 subchannel).
Right now, the only file system supported is plain ISO 9660 with
optional Joliet extensions.
Additionally, multi-extent files and interleaved files are not
supported.
Finally, due to the multiple possible ways to interpret the content
of a CD/DVD, I cannot guarantee that we scan the "right" files.
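A minimal sketch of the sector-size check and of locating the first ISO 9660 volume descriptor, which starts at sector 16 and carries the standard identifier "CD001" in bytes 1 to 5. The function names are hypothetical, and the caller is assumed to know where the 2048 bytes of user data sit inside each raw sector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ISO_MIN_SECTOR      2048u /* cooked sectors: user data only */
#define ISO_MAX_SECTOR      2448u /* 2352 raw + 96 subchannel bytes */
#define ISO_FIRST_VD_SECTOR 16u   /* sectors 0-15 are the system area */

static bool iso_sector_size_ok(size_t sector_size)
{
    return sector_size >= ISO_MIN_SECTOR && sector_size <= ISO_MAX_SECTOR;
}

/* Hypothetical sketch: check for "CD001" in the first volume descriptor.
 * `data_offset` is where the 2048 user-data bytes start inside each sector
 * (0 for cooked sectors; for raw sectors it depends on the sector mode and
 * is an assumption the caller must supply). */
static bool iso_has_volume_descriptor(const unsigned char *img, size_t len,
                                      size_t sector_size, size_t data_offset)
{
    size_t off;

    if (!iso_sector_size_ok(sector_size) || data_offset + 2048 > sector_size) {
        return false;
    }
    off = ISO_FIRST_VD_SECTOR * sector_size + data_offset;
    if (off + 6 > len) {
        return false; /* image too short to hold the descriptor header */
    }
    /* Byte 0 is the descriptor type; bytes 1-5 hold "CD001". */
    return memcmp(img + off + 1, "CD001", 5) == 0;
}
```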