62 commits

Author SHA1 Message Date
Val Snyder
7ff29b8c37
Bump copyright dates for 2025 2025-02-14 10:24:30 -05:00
Micah Snyder
8915bd2257
Fix possible out of bounds read in PDF parser
The `find_length()` function in the PDF parser incorrectly assumes that
objects found are located in the main PDF file map, and fails to take
into account whether the objects were in fact found in extracted PDF
object streams. The resulting pointer is then invalid and may be an out
of bounds read.

This issue was found by OSS-Fuzz.

This fix checks if the object is from an object stream, and then
calculates the pointer based on the start of the object stream instead
of based on the start of the PDF.

I've also added extra checks to verify the calculated pointer and object
size are within the stream (or PDF file map). I'm not entirely sure this
is necessary, but better safe than sorry.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=69617
2024-09-04 13:12:50 -04:00
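The pointer calculation described in 8915bd2257 can be sketched roughly as follows. This is a minimal illustration, not the actual `find_length()` code: `struct obj_sketch`, its fields, and `obj_ptr()` are hypothetical names, and the real parser's object structure is assumed to differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a parsed PDF object. */
struct obj_sketch {
    size_t start;          /* offset of the object inside its container */
    size_t size;           /* object length in bytes */
    const uint8_t *stream; /* non-NULL if found in an extracted object stream */
    size_t stream_len;     /* length of that object stream */
};

/*
 * Returns a pointer to the object's first byte, or NULL if the object
 * does not fit inside its container. The base is the object stream when
 * the object came from one, otherwise the main PDF file map.
 */
const uint8_t *obj_ptr(const struct obj_sketch *obj,
                       const uint8_t *map, size_t map_len)
{
    const uint8_t *base = obj->stream ? obj->stream : map;
    size_t len          = obj->stream ? obj->stream_len : map_len;

    /* Written to avoid overflow: compare size against the remaining room. */
    if (base == NULL || obj->start > len || obj->size > len - obj->start)
        return NULL;

    return base + obj->start;
}
```

A caller would treat a NULL result as "object out of bounds" and skip the object rather than dereference the pointer.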
Micah Snyder
1e5ddefcee Clang-format touchup 2024-03-15 13:18:47 -04:00
Micah Snyder
8e04c25fec Rename clamav memory allocation functions
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2

... and these names do not convey their purpose

This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2

The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-frees.

So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2

These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
2024-03-15 13:18:47 -04:00
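The two behaviors described in 8e04c25fec (an allocation ceiling, and a realloc that never frees on a zero-size request) might look roughly like this. The `sketch_` names and the 1 GiB limit are assumptions made for illustration; they are not the real `cli_max_malloc` or `cli_safer_realloc` implementations.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed ceiling for this sketch only; the real limit may differ. */
#define SKETCH_MAX_ALLOCATION ((size_t)1 << 30)

/* Allocation for sizes derived from untrusted input: refuse absurd sizes. */
void *sketch_max_malloc(size_t size)
{
    if (size == 0 || size > SKETCH_MAX_ALLOCATION)
        return NULL;
    return malloc(size);
}

/*
 * Realloc without a ceiling, but with stable zero-size behavior:
 * a request for 0 bytes fails and leaves ptr untouched, so the caller
 * still owns ptr and frees it exactly once.
 */
void *sketch_safer_realloc(void *ptr, size_t size)
{
    if (size == 0)
        return NULL;
    return realloc(ptr, size);
}
```

With this shape, a caller that receives NULL from the realloc wrapper can always free the original pointer, whichever libc is in use.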
Micah Snyder
6d6e04ddf8 Optimization: replace limited allocation calls
There are a large number of allocations for fixed-size buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be lower than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.

This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC

I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.

I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-frees.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocations for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.

There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
2024-03-15 13:18:47 -04:00
Micah Snyder
82491dabaa PDF: Fix 1-byte overread
An overread may occur if attempting to decrypt an empty string.
Issue introduced during 1.3 development.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=66281
2024-02-02 13:43:43 -05:00
Micah Snyder
35f277c8cb PDF: Add support for checking empty owner password
Specifically for algorithm 6 (/R 6).

Use the O and OE strings to test if an empty owner password will decrypt the file.
2024-01-22 11:29:52 -05:00
Micah Snyder
d114e3fc66 PDF: Fix PDF metadata decryption issues
The encrypted metadata may be stored in a <> block containing hex bytes.

Strip off the <> and decode the hex to binary.
2024-01-22 11:29:52 -05:00
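The "strip off the <> and decode the hex" step from d114e3fc66 could be sketched as below. `decode_hex_block()` is a hypothetical helper, not a function from the PDF parser. It assumes the usual PDF hex-string rules: whitespace between digits is ignored and a missing final digit counts as 0.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decodes "<hex digits>" into out. Returns the number of bytes written,
 * or -1 if the delimiters are missing, a non-hex character is found,
 * or out is too small.
 */
long decode_hex_block(const char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    size_t i, n = 0;
    int hi = -1; /* pending high nibble, -1 when none */

    if (in_len < 2 || in[0] != '<' || in[in_len - 1] != '>')
        return -1;

    for (i = 1; i + 1 < in_len; i++) {
        int v;
        if (isspace((unsigned char)in[i]))
            continue;
        v = hex_val((unsigned char)in[i]);
        if (v < 0)
            return -1;
        if (hi < 0) {
            hi = v;
        } else {
            if (n >= out_cap)
                return -1;
            out[n++] = (unsigned char)((hi << 4) | v);
            hi = -1;
        }
    }

    if (hi >= 0) { /* odd digit count: pad the last byte with a 0 nibble */
        if (n >= out_cap)
            return -1;
        out[n++] = (unsigned char)(hi << 4);
    }
    return (long)n;
}
```

The decoded bytes, not the hex text, would then be passed to the decryption routine.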
Micah Snyder
9cb28e51e6 Bump copyright dates for 2024 2024-01-22 11:27:17 -05:00
Micah Snyder
6eebecc303 Bump copyright for 2023 2023-02-12 11:20:22 -08:00
Micah Snyder
e9f7fe2a80 Add PDF object extraction recursion limits
Adds recursion limits for object extraction as well as for parsing
strings, arrays, and dictionaries during extraction.
The limit is set to 25.

Places recursion-depth variable in pdf parse context structure.
2022-10-21 15:51:54 -07:00
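A depth counter kept in the parse context, as e9f7fe2a80 describes, might be used along these lines. The struct, the function, and the bracket-only grammar are all simplifications invented for this sketch. Only the limit of 25 comes from the commit message, and whether the real check is inclusive or exclusive is an assumption.

```c
#include <stddef.h>

#define SKETCH_MAX_RECURSION 25 /* limit named in the commit message */

/* Hypothetical parse context carrying the current recursion depth. */
struct parse_ctx_sketch {
    unsigned depth;
};

/*
 * Parses a nested array such as "[[ ] [ ]]" starting at s[*pos] == '['.
 * Returns 0 on success, -1 on malformed input or when nesting would
 * exceed SKETCH_MAX_RECURSION. The depth is restored on every exit path.
 */
int parse_array(struct parse_ctx_sketch *ctx,
                const char *s, size_t len, size_t *pos)
{
    int rc = 0;

    if (*pos >= len || s[*pos] != '[')
        return -1;
    if (ctx->depth >= SKETCH_MAX_RECURSION)
        return -1; /* refuse to go deeper */

    ctx->depth++;
    (*pos)++; /* consume '[' */

    while (*pos < len && s[*pos] != ']') {
        if (s[*pos] == '[') {
            rc = parse_array(ctx, s, len, pos);
            if (rc != 0)
                break;
        } else {
            (*pos)++; /* skip ordinary element bytes */
        }
    }

    if (rc == 0) {
        if (*pos < len)
            (*pos)++; /* consume ']' */
        else
            rc = -1;  /* unterminated array */
    }

    ctx->depth--;
    return rc;
}
```

Keeping the counter in the context rather than in a function argument lets the string, array, and dictionary parsers share a single budget.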
Micah Snyder
eadee86ec2 PDF: Remove all-match checks 2022-10-19 13:13:57 -07:00
mko-x
a21cc6dcd7
Add explicit log level parameter to application logging API
* Added loglevel parameter to logg()

* Fix logg and mprintf internals with new loglevels

* Update all logg calls to set loglevel

* Update all mprintf calls to set loglevel

* Fix hidden logg calls

* Executed clam-format
2022-02-15 15:13:55 -08:00
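For readers unfamiliar with the pattern in a21cc6dcd7, a logging function that takes the level as an explicit first parameter could look like this. Everything here is hypothetical: the enum values, the prefixes, and the filtering threshold are assumptions for illustration and do not reproduce the real `logg()` signature or behavior.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log levels, ordered from least to most severe. */
typedef enum {
    SKETCH_LOG_DEBUG,
    SKETCH_LOG_INFO,
    SKETCH_LOG_WARNING,
    SKETCH_LOG_ERROR
} sketch_loglevel_t;

/* Assumed threshold: messages below this level are dropped. */
static sketch_loglevel_t sketch_min_level = SKETCH_LOG_INFO;

/*
 * Logs a printf-style message to stderr at the given level.
 * Returns the number of characters written for the message body
 * (excluding the prefix), or 0 if the message was filtered out.
 */
int sketch_logg(sketch_loglevel_t level, const char *fmt, ...)
{
    static const char *const prefix[] = {"DEBUG: ", "", "WARNING: ", "ERROR: "};
    va_list ap;
    int n;

    if (level < sketch_min_level)
        return 0;

    fputs(prefix[level], stderr);
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

Passing the level as a typed argument means the compiler can catch a missing or misspelled level at every call site.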
micasnyd
140c88aa4e Bump copyright for 2022
Includes minor format corrections.
2022-01-09 14:23:25 -07:00
Micah Snyder
afbf0b6180 Fix Windows text file EOL conversion issues
On Windows, files open()'ed without the O_BINARY flag will have line
endings translated automatically: CRLF (aka \r\n) becomes LF (aka \n)
when read, and LF becomes CRLF when written. This is undesirable for all
scan targets AND temp files because it affects pattern matching and hashing.

This commit converts a handful of instances throughout the codebase
where it appears that O_BINARY was mistakenly omitted and could result
in unexpected behavior on Windows.

Git on Windows also converts LF -> CRLF for "text" files, for editing
purposes.
This is problematic for scan files and test files that should match
verbatim.
We can prevent this issue by marking .ref test files as "binary" in the
.gitattributes file and by always opening scan files and temp files as
binary.

In this commit I've also removed the `ChangeLog merge=cl-merge` line
that was once used to reduce ChangeLog merge conflicts by using the
gnulib git-merge-changelog tool. This project now categorizes changes in
the NEWS.md.
For finer detail, git commit history is fully accessible on github.com.
2021-02-25 11:41:28 -08:00
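The O_BINARY fix in afbf0b6180 follows a common portability idiom, sketched here. `open_for_scan()` is a hypothetical wrapper name, and defining `O_BINARY` as 0 where the platform lacks it is an assumption about how a codebase might handle this, not a quote from the patch.

```c
#include <fcntl.h>
#ifdef _WIN32
#include <io.h> /* open() lives here on Windows */
#endif

/* Platforms without text-mode translation do not define O_BINARY. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/*
 * Opens a scan target or temp file read-only with no EOL translation,
 * so the bytes seen by pattern matching and hashing match the bytes
 * on disk. Returns a file descriptor, or -1 on failure.
 */
int open_for_scan(const char *path)
{
    return open(path, O_RDONLY | O_BINARY);
}
```

On POSIX systems the flag is a no-op, so the same call site works unchanged on every platform.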
Andy Ragusa (aragusa)
3c556dc1a2 Fixed pdf timeout.
The function to parse the object dictionaries fails if the dictionary
ends during a comment string.  Set the end in that condition.
2020-07-30 10:41:33 -07:00
Jonas Zaddach (jzaddach)
d5a733ef90 XLM (Excel 4.0) macro detection and extraction
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.

This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.

The code is based on Didier Stevens' plugin_biff for oledump.py.
2020-04-29 14:19:41 -07:00
Micah Snyder
97a0647e88 Additional variable type changes for correctness and to silence warnings. A handful of other minor changes to silence warnings. Corrected a number of function definitions so they return cl_error_t rather than int. 2019-10-02 16:08:25 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder (micasnyd)
89d5207b31 Added new pdf object stream parsing capability. 2018-12-02 23:06:58 -05:00
Mickey Sola
992de2e2c0 bb12031 - 0.100.1 - resolving pdf parsing DoS; patch by aCaB 2018-07-30 09:14:36 -04:00
Micah Snyder
3690396877 bb11980 & bb12008 (again). Change to pdf_parse_string after evaluating function usage and the pdf format. 2018-04-04 13:25:34 -04:00
Micah Snyder
8dc0817ee9 bb12008: correction to object length check when parsing strings in pdf dictionaries. 2018-04-04 13:25:28 -04:00
Mickey Sola
99eadf7a9a 0.100.x - bb11980 - fixing length check exit condition for pdf string parsing 2018-03-01 16:55:50 -05:00
Mickey Sola
495fce9174 0.100.x - removing checklen constraint based on key string size when parsing pdf string objects 2018-03-01 16:55:49 -05:00
Mickey Sola
0df2fedf28 0.100.x - tightening up off-by-one error in pdf string parsing. 2018-03-01 16:55:49 -05:00
Mickey Sola
700ed96af5 [PATCH] 0.99.x - bb11980 - fixing oob read in pdf parsing 2018-03-01 16:55:49 -05:00
Mickey Sola
87aaa10b29 [PATCH] 0.100.x - bb11973 - fixing pdf oob read - suggested solution by Suleman Ali 2018-03-01 16:55:49 -05:00
Micah Snyder
e09d884341 Eliminated a large number of warnings, many of which had to do with mixing types. I switched some types to size_t and a couple to ptrdiff_t to make things more consistent, but there is a huge amount of work to be done to make types consistent. int, unsigned int, unsigned, off_t, and other types are ill-suited to storing buffer lengths or memory addresses. 2017-08-16 17:31:45 -04:00
Steven Morgan
b2e3350bc2 bb11803 - Fix pdf out of bound reference. 2017-03-16 15:06:09 -04:00
Steven Morgan
22cb38ed24 pull request #53(2/4): Spelling fix by klemens(ka7). 2016-10-19 15:57:45 -04:00
Kevin Lin
1f2949190f pdf: fix octal code conversion resolution 2015-08-31 14:58:13 -04:00
Kevin Lin
f4cbcd53a1 cid 12123 - fix error state in allocating memory in pdf finalize 2015-08-19 11:14:52 -04:00
Kevin Lin
76c9c9ddb5 cid 12135/12134 - fix error state for allocating memory in pdf parsing 2015-08-19 11:14:52 -04:00
Kevin Lin
218e1626ee bb#11372 - finalize encrypted hex strings correctly 2015-08-14 12:22:49 -04:00
Steven Morgan
c00baa37c6 Beef up iconv_open error messages to show the source encoding and strerror. 2015-06-17 11:59:35 -04:00
Kevin Lin
9d33052fe7 pdf: correctly handle encryption objects to decrypt 2015-04-01 17:41:59 -04:00
Kevin Lin
3a925de01f fixed major issue in UTF conversion for pdf preclass 2015-03-27 13:20:03 -04:00
Kevin Lin
ddc1421955 pdf: clang and general compiler fixes 2015-03-20 17:36:33 -04:00
Kevin Lin
7bbb67ea84 pdfng: fixed small memory leak 2015-03-20 16:44:41 -04:00
Kevin Lin
24db616f5b pdf: base64 encode strings that fail to finalize 2015-03-20 16:36:41 -04:00
Kevin Lin
00daf527e6 pdf: removed debugging messages 2015-03-20 15:49:05 -04:00
Kevin Lin
d7effb639a pdf: decryption does not NULL terminate 2015-03-20 15:42:56 -04:00
Kevin Lin
e2b1880fa6 pdf: string decryption and code clean-up
pdfng: fixed a bug in escape sequence handling
2015-03-20 15:38:47 -04:00
Kevin Lin
ba1f4d0186 removed excessive debugging in escape conversion 2015-03-19 14:59:18 -04:00
Kevin Lin
ab9611d4c1 fixed Coverity issue ID 12109
buffer was not freed on rare error case
2015-03-16 13:11:56 -04:00
Kevin Lin
e098bf4bd9 Revert "pdf strings are now base64 encoded if utf conversion fails"
This reverts commit 6c3cc09415.
2015-03-02 19:05:09 -05:00
Kevin Lin
0a185b8253 pdf string UTF-16 conversion no longer solely depends on ICONV
reason: no ICONV meant no conversion even though conversion function existed
2015-03-02 18:55:02 -05:00
Kevin Lin
6c3cc09415 pdf strings are now base64 encoded if utf conversion fails
reason: non-converted strings aren't printable (invalid UTF-8)
2015-03-02 14:40:59 -05:00
Kevin Lin
571d834910 bb#11238 - added missing PDF preclass operations
> added whitespace fix for indirect references strings
> added PDF escape sequence handling (including octal)
2015-01-12 13:45:36 -08:00
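The escape sequence handling mentioned in 571d834910 can be illustrated with a small decoder for PDF literal strings. `unescape_sketch()` is a hypothetical function, not the parser's own code. It covers the single-character escapes and one-to-three-digit octal codes, and deliberately leaves out line continuations (a backslash followed by an end-of-line).

```c
#include <stddef.h>

/*
 * Decodes backslash escapes from a PDF literal string body (without the
 * surrounding parentheses) into out. Returns the number of bytes written;
 * output is truncated silently once cap bytes have been written.
 */
size_t unescape_sketch(const char *in, size_t len,
                       unsigned char *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len && n < cap) {
        unsigned char c = (unsigned char)in[i++];

        if (c != '\\' || i >= len) {
            out[n++] = c; /* ordinary byte, or a lone trailing backslash */
            continue;
        }

        c = (unsigned char)in[i++];
        switch (c) {
            case 'n': out[n++] = '\n'; break;
            case 'r': out[n++] = '\r'; break;
            case 't': out[n++] = '\t'; break;
            case 'b': out[n++] = '\b'; break;
            case 'f': out[n++] = '\f'; break;
            default:
                if (c >= '0' && c <= '7') {
                    /* Octal code: up to three digits, high bits dropped. */
                    unsigned v = (unsigned)(c - '0');
                    int digits = 1;
                    while (digits < 3 && i < len &&
                           in[i] >= '0' && in[i] <= '7') {
                        v = v * 8 + (unsigned)(in[i++] - '0');
                        digits++;
                    }
                    out[n++] = (unsigned char)(v & 0xFF);
                } else {
                    /* \( \) \\ and unknown escapes: keep the character. */
                    out[n++] = c;
                }
                break;
        }
    }
    return n;
}
```

For example, the ten input characters `a\101\n\(` minus the leading `a` decode as `A`, a newline, and `(`, so the full input yields four bytes.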