Allocations for bytecode signatures to work need not check against the
memory allocation limit, as bytecode signatures are considered trusted
user input.
You may note that I did not remove allocation limits from the bytecode
API functions that may be called by the signatures such as adding json
objects, hashsets, lzma and bz2 decompressors, etc. This is because it
is likely that a bytecode signature may call them more times based on
the structure of the file being scanned - particularly for the json objects.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.
This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC
I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.
I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.
There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
Based on changes observed in the commits:
- 08fef61fea
- 1aa8768db2
... I believe that the iptr variable was intended to be used and was
accidentally not used.
In addition, adding the `ULL` suffix in the second location to match the
first, because it appears to have been accidentally omited.
The bytecode source files largely use `int` instead of the appropriate
`cl_errot_t` for clamav status codes, as well for boolean variables.
This hides warnings that would indicate bugs, and makes it harder to
read the code.
I haven't gone as in depth as with some other code cleanups. This
largely just replaces function interfactes and ret variables that use
`int` with `cl_error_t`. I also swapped a couple of `int`s to `bool`s.
While doing so I found that the `cli_bytecode_context_setpdf()` function
was incorrectly placed in the `bytecode_api.c` file instead of the next
to similar functions (`cli_bytecode_context_setpe`, etc.) in bytecode.c.
It's not an API function, so I moved it to the correct location.
I also eliminated a couple of compiler warnings:
- LLVM's CFG.h header emits a warning about a multi-line comment, so
that crops up with using LLVM for the bytecode runtime.
I disabled the warning through CMake.
- C doesn't like using the `inline` keyword on cli_dbgmsg in the
declaration in `bytecode2llvm.c` because we're compiling the bytecode
runtimes as a separate object file from the rest of libclamav.
It doesn't appear to be a functional issue, but I swapped that file
over to use `cli_dbgmsg_no_inline()` instead, just in case.
I would hope link-time-optimization will inline it anyways.
They were not working on the interpreter, interpreter was seeing bounds errors
where there weren't any.
The 2nd parameter is a count, not a pointer size, the pointer's size
is a constant. Override the size for now.
This allows pointers to local stack variables to work correctly after a call.
Previously stackid was not restored, which caused bytecode_vm to stop bytecodes,
claiming it overrun the stack, when in fact it didn't.
(stackid stores stack size).