/*
 *  Copyright (C) 2013-2025 Cisco Systems, Inc. and/or its affiliates. All rights reserved.
 *  Copyright (C) 2007-2013 Sourcefire, Inc.
 *
 *  Authors: Alberto Wu
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License version 2 as
 *  published by the Free Software Foundation.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 *  MA 02110-1301, USA.
 */

ZIP: Fix infinite loop + significant code cleanup

An infinite loop may occur when scanning some malformed ZIP files.
I introduced this issue in 96c00b6d80a4cb16cb2d39111614733e4a62221d
with this line:
```c
// decrement coff by 1 to account for the increment at the end of the loop
coff -= 1;
```
The problem is that the function may return 0, which should
indicate that there are no more files. The result was that
`coff` would stay the same and the loop would repeat.

This issue is in 1.5 development and affects the 1.5.0 beta but
does not affect any production versions.

Fixes: https://github.com/Cisco-Talos/clamav/issues/1534

Special thanks to Sophie0x2E for an initial fix, proposed in
https://github.com/Cisco-Talos/clamav/pull/1539

In review, I was uncomfortable with other existing code and
decided to do a more significant overhaul of the error handling
in the ZIP module.

In addition to cleanup, this commit has some functional changes:

- When parsing a central directory file header inside of
  `parse_central_directory_file_header()`, it will now fail out if the
  "extra length" or "comment length" fields would exceed the length of
  the archive. That doesn't mean the associated local file header won't
  be parsed later, but it won't use the central directory file header
  to find it. Instead, the ZIP module will have to find the local file
  header by searching for extra records not listed in the central directory.
  This change was mostly to tidy up complex error handling.

- Add two new FTM signatures to identify split ZIP archives.

  This signature identifies the first segment (first file) in a split or
  spanned ZIP archive. It may also be found on a single-segment "split"
  archive, depending on the ZIP archiver.
  ```
  0:0:504b0708504b0304:ZIP (First segment split/spanned):CL_TYPE_ANY:CL_TYPE_ZIP
  ```
  Practically speaking, this new signature makes it so ClamAV identifies
  the file as a ZIP right away without having to rely on SFX_ZIP detection.
  Extraction is then handled by the ZIP `cli_unzip` function rather than
  extracting each with `cli_unzip_single` which handles SFX_ZIP entries.

  Note: ClamAV isn't capable of finding additional files on disk to support
  handling the additional segments. So it doesn't make any difference with
  handling those other files.

  This signature is for single-segment split/spanned archives, depending
  on the ZIP archiver.
  ```
  0:0:504b0303504b0304:ZIP (Single-segment split/spanned):CL_TYPE_ANY:CL_TYPE_ZIP
  ```
  Like the first one, this also means we won't rely on SFX_ZIP detection
  and will treat these files as regular ZIPs.

- Added a test file to verify that ClamAV can extract a single-file
  "split" ZIP.

- Added a clamscan test with test files to verify that scanning a split
  archive across two segments correctly extracts the properly formed zip
  file entries. Sadly, we can't join the segments to extract everything.

2025-08-04 22:50:48 -04:00
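
The failure mode in the commit message above (a parser returns 0 for "no more files", the caller compensates for a loop increment, and the offset never advances) can be illustrated with a small sketch. This is not the ZIP module's code: the toy record format and the names `parse_at` and `walk_records` are assumptions made up for the example. The point is only that a loop driven by a parser's return value should treat "zero bytes consumed" as a stop condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a header parser. In this toy format the first
 * byte of a record is the record length. Returns the number of bytes
 * consumed at `off`, or 0 when there is nothing more to parse. */
static size_t parse_at(const unsigned char *buf, size_t len, size_t off)
{
    if (off >= len || buf[off] == 0) {
        return 0;
    }
    return buf[off];
}

/* Walk the records and count them. The loop stops as soon as the parser
 * reports no progress, so the same offset is never parsed twice. */
size_t walk_records(const unsigned char *buf, size_t len)
{
    size_t off   = 0;
    size_t count = 0;

    assert(buf != NULL);

    while (off < len) {
        size_t consumed = parse_at(buf, len, off);
        if (consumed == 0) {
            break; /* 0 means "no more files": do not retry this offset */
        }
        if (consumed > len - off) {
            break; /* record claims to run past the end of the buffer */
        }
        off += consumed;
        count++;
    }
    return count;
}
```

Checking `consumed == 0` before touching the offset keeps the termination argument simple: every iteration either advances `off` or leaves the loop.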
/*
 * General Structure for PKZIP files:
 * +---------------------------------------------------------------+
 * |                      Local file header 1                      |
 * +---------------------------------------------------------------+
 * |                          File data 1                          |
 * +---------------------------------------------------------------+
 * |                 Data descriptor 1 (optional)                  |
 * +---------------------------------------------------------------+
 * |                      Local file header 2                      |
 * +---------------------------------------------------------------+
 * |                          File data 2                          |
 * +---------------------------------------------------------------+
 * |                 Data descriptor 2 (optional)                  |
 * +---------------------------------------------------------------+
 * |                              ...                              |
 * +---------------------------------------------------------------+
 * |                      Local file header N                      |
 * +---------------------------------------------------------------+
 * |                          File data N                          |
 * +---------------------------------------------------------------+
 * |                 Data descriptor N (optional)                  |
 * +---------------------------------------------------------------+
 * |          Archive Decryption Header (optional, v6.2+)          |
 * +---------------------------------------------------------------+
 * |          Archive Extra Data Record (optional, v6.2+)          |
 * +---------------------------------------------------------------+
 * |                       Central directory                       |
 * +---------------------------------------------------------------+
 *
 * This and additional diagrams courtesy of:
 * https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html
 *
 * See also:
 * https://www.pkware.com/documents/casestudies/APPNOTE.TXT
 *
 * And see also: unzip.h
 *
 * Note: the diagrams and the current implementation do not cover all features.
 */
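
Each block in the diagram above starts with a 4-byte little-endian signature defined in APPNOTE.TXT (the data descriptor signature is optional). The sketch below shows how a reader might tell the blocks apart. The enum and the function names `read_le32` and `classify_record` are hypothetical and are not part of libclamav.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kinds, named after the blocks in the diagram. */
enum zip_record {
    ZIP_REC_UNKNOWN,
    ZIP_REC_LOCAL_HEADER,
    ZIP_REC_DATA_DESCRIPTOR,
    ZIP_REC_CENTRAL_HEADER,
    ZIP_REC_END_OF_CENTRAL_DIR
};

/* Read a 32-bit little-endian value without assuming host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify the record that starts at `buf` by its signature. */
enum zip_record classify_record(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 4) {
        return ZIP_REC_UNKNOWN;
    }
    switch (read_le32(buf)) {
        case 0x04034b50: /* bytes 50 4b 03 04 */
            return ZIP_REC_LOCAL_HEADER;
        case 0x08074b50: /* bytes 50 4b 07 08, optional on data descriptors */
            return ZIP_REC_DATA_DESCRIPTOR;
        case 0x02014b50: /* bytes 50 4b 01 02 */
            return ZIP_REC_CENTRAL_HEADER;
        case 0x06054b50: /* bytes 50 4b 05 06 */
            return ZIP_REC_END_OF_CENTRAL_DIR;
        default:
            return ZIP_REC_UNKNOWN;
    }
}
```

Reading the signature byte by byte avoids unaligned loads and gives the same result on big-endian hosts.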

#ifndef __UNZIP_H
#define __UNZIP_H

#if HAVE_CONFIG_H
#include "clamav-config.h"
#endif

#include "others.h"

/**
 * @brief Callback function type for handling extracted files.
 *
 * The `unzip_single_internal` function lets you specify a callback function to handle the extracted file.
 * Other functions like `cli_unzip` and `cli_unzip_single` use `zip_scan_cb` as the default callback and
 * thus just scan the file.
 *
 * Note: The callback function must match the signature of `cli_magic_scan_desc`.
 */
typedef cl_error_t (*zip_cb)(int fd, const char *filepath, cli_ctx *ctx, const char *name, uint32_t attributes);
#define zip_scan_cb cli_magic_scan_desc
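
As an illustration of the callback contract described above, here is a self-contained sketch of a callback that records what was extracted instead of scanning it. Everything prefixed `demo_` or `DEMO_` is a stand-in invented for this example so that it compiles without libclamav: `demo_error_t` substitutes for `cl_error_t`, `demo_ctx` for `cli_ctx`, and `demo_extract_one` for an extractor that would invoke the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the libclamav types (assumptions, not the real definitions). */
typedef int demo_error_t;
#define DEMO_SUCCESS 0

typedef struct {
    size_t files_seen;
    char last_name[64];
} demo_ctx;

/* Same parameter list as the zip_cb typedef, using the stand-in types. */
typedef demo_error_t (*demo_zip_cb)(int fd, const char *filepath, demo_ctx *ctx,
                                    const char *name, uint32_t attributes);

/* A callback that records the extracted entry instead of scanning it. */
demo_error_t record_cb(int fd, const char *filepath, demo_ctx *ctx,
                       const char *name, uint32_t attributes)
{
    (void)fd;
    (void)filepath;
    (void)attributes;

    assert(ctx != NULL);

    ctx->files_seen++;
    if (name != NULL) {
        /* Copy at most 63 bytes and always terminate the string. */
        strncpy(ctx->last_name, name, sizeof(ctx->last_name) - 1);
        ctx->last_name[sizeof(ctx->last_name) - 1] = '\0';
    }
    return DEMO_SUCCESS;
}

/* Hypothetical extractor: pretends one entry was extracted to a temp file,
 * then hands it to whichever callback the caller supplied. */
demo_error_t demo_extract_one(demo_ctx *ctx, demo_zip_cb zcb)
{
    return zcb(-1, "/tmp/demo-entry", ctx, "entry.txt", 0);
}
```

Passing the callback as a parameter is what lets one extraction routine serve both the "scan it now" and the "hand it to me" cases.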

#define MAX_ZIP_REQUESTS 10

/**
 * @brief Structure to hold zip file search requests.
 *
 * This structure is used to hold multiple file names that we want to search for in a zip archive.
 * It is used by the `unzip_search` function.
 */
struct zip_requests {
    const char *names[MAX_ZIP_REQUESTS];
    size_t namelens[MAX_ZIP_REQUESTS];
    int namecnt;

    uint32_t loff;
    int found, match;
};

/**
 * @brief Unzip a zip file.
 *
 * Scan each extracted file.
 *
 * @param ctx           The scan context containing the file map and other scan parameters.
 * @return cl_error_t   Returns CL_SUCCESS on success, or an error code on failure.
 */
cl_error_t cli_unzip(cli_ctx *ctx);

/**
 * @brief Unzip a single file from a zip archive.
 *
 * Scan the file after extracting it.
 *
 * @param ctx                   The scan context containing the file map and other scan parameters.
 * @param local_header_offset   The offset of the local file header in the zip archive.
 * @return cl_error_t           Returns CL_SUCCESS on success, or an error code on failure.
 */
cl_error_t cli_unzip_single(cli_ctx *ctx, size_t local_header_offset);

Reduce unnecessary scanning of embedded file FPs (#1571)

When embedded file type recognition finds a possible embedded file, it
is being scanned as a new embedded file even if it turns out it was a
false positive and parsing fails. My solution is to pre-parse the file
headers as little as possible to determine whether the file is valid.
If possible, also determine the file size based on the headers. That will
make it so we don't have to scan additional data when the embedded file
is not at the very end.

This commit adds header checks prior to embedded ZIP, ARJ, and CAB
scanning. For these types I was also able to use the header checks to
determine the object size so as to prevent excessive pattern matching.

TODO: Add the same for RAR, EGG, 7Z, NULSFT, AUTOIT, IShield, and PDF.

This commit also removes duplicate matching for embedded MSEXE.
The embedded MSEXE detection and scanning logic was accidentally
creating an extra duplicate layer in between scanning and detection
because of the logic within the `cli_scanembpe()` function.
That function was effectively doing the header check which this commit
adds for ZIP, ARJ, and CAB but minus the size check.
Note: It is unfortunately not possible to get an accurate size from PE
file headers.

The `cli_scanembpe()` function also used to dump to a temp file for no
reason since FMAPs were extended to support windows into other FMAPs.
So this commit removes the intermediate layer as well as dropping a temp
file for each embedded PE file.

Further, this commit adds configuration and DCONF safeguards around all
embedded file type scanning.

Finally, this commit adds a set of tests to validate proper extraction
of embedded ZIP, ARJ, CAB, and MSEXE files.

CLAM-2862

Co-authored-by: TheRaynMan <draynor@sourcefire.com>

2025-09-23 15:57:28 -04:00
/**
 * @brief Verify a single local file header.
 *
 * Does not extract or scan the file.
 *
 * @param[in,out] ctx   Scan context
 * @param offset        Offset of the local file header
 * @param[out] size     Will be set to the size of the file header + file data.
 * @return cl_error_t   CL_SUCCESS on success, or an error code on failure.
 */
cl_error_t cli_unzip_single_header_check(cli_ctx *ctx, uint32_t offset, size_t *size);
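
To make "size of the file header + file data" concrete, the sketch below validates a local file header in a plain memory buffer and computes that size from the fixed-layout fields given in APPNOTE.TXT (30-byte fixed header, compressed size at byte 18, file name length at byte 26, extra field length at byte 28). It is a simplified illustration and not the libclamav implementation: `demo_header_check` and its helpers are hypothetical names, and the sketch ignores ZIP64 and entries whose sizes are deferred to a data descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LFH_FIXED_SIZE 30u         /* fixed part of a local file header */
#define LFH_SIGNATURE  0x04034b50u /* bytes 50 4b 03 04 on disk */

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check the local file header at `offset` inside buf[0..len).
 * On success, store header + name + extra + compressed data size in *size
 * and return 0. Return -1 if the header is missing or does not fit. */
int demo_header_check(const unsigned char *buf, size_t len, size_t offset, size_t *size)
{
    const unsigned char *h;
    uint64_t total;

    assert(buf != NULL && size != NULL);

    if (offset > len || len - offset < LFH_FIXED_SIZE) {
        return -1; /* not enough room for the fixed header */
    }
    h = buf + offset;
    if (rd32(h) != LFH_SIGNATURE) {
        return -1;
    }

    /* 64-bit arithmetic so the sum cannot wrap where size_t is 32 bits. */
    total = (uint64_t)LFH_FIXED_SIZE + rd16(h + 26) + rd16(h + 28) + rd32(h + 18);
    if (total > (uint64_t)(len - offset)) {
        return -1; /* the entry claims to extend past the buffer */
    }

    *size = (size_t)total;
    return 0;
}
```

Knowing the size up front is what allows a caller to limit later work to the entry itself rather than everything up to the end of the buffer.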

/**
 * @brief Unzip a single file from a zip archive.
 *
 * Different from `cli_unzip_single`, this function allows for a custom callback to be used after extraction.
 * In other words, it can be used to extract a file without scanning it immediately.
 * This is useful for cases where you want to handle the file differently.
 *
 * @param ctx                   The scan context containing the file map and other scan parameters.
 * @param local_header_offset   The offset of the local file header in the zip archive.
 * @param zcb                   The callback function to invoke after extraction. See `zip_scan_cb`.
 * @return cl_error_t           Returns CL_SUCCESS on success, or an error code on failure.
 */
cl_error_t unzip_single_internal(cli_ctx *ctx, size_t local_header_offset, zip_cb zcb);

/**
 * @brief Add a file to a bundle of files to search for in a zip archive.
 *
 * @param requests      The `zip_requests` structure to modify.
 * @param name          The name of the file to add.
 * @param nlen          The length of the file name.
 * @return cl_error_t   Returns CL_SUCCESS on success, or an error code on failure.
 */
cl_error_t unzip_search_add(struct zip_requests *requests, const char *name, size_t nlen);
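
The request bundle is a fixed-capacity array, so adding a name has to check the remaining capacity. The sketch below shows one way such an "add" could behave. It is an assumption-laden illustration, not the libclamav implementation: `demo_requests` mirrors only the name-related fields of `zip_requests`, and `demo_search_add` returns 0 or -1 rather than a `cl_error_t`.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_REQUESTS 10 /* same capacity as MAX_ZIP_REQUESTS */

/* Hypothetical cut-down version of the request bundle. */
struct demo_requests {
    const char *names[DEMO_MAX_REQUESTS];
    size_t namelens[DEMO_MAX_REQUESTS];
    int namecnt;
};

/* Append one name to the bundle. Returns 0 on success, -1 if the arguments
 * are unusable or the bundle is already full. */
int demo_search_add(struct demo_requests *r, const char *name, size_t nlen)
{
    assert(r != NULL);

    if (name == NULL || nlen == 0) {
        return -1;
    }
    if (r->namecnt < 0 || r->namecnt >= DEMO_MAX_REQUESTS) {
        return -1; /* no free slot left */
    }

    /* The pointer is borrowed, not copied: the caller must keep the
     * string alive for as long as the bundle is in use. */
    r->names[r->namecnt]    = name;
    r->namelens[r->namecnt] = nlen;
    r->namecnt++;
    return 0;
}
```

Storing the length next to each pointer lets a later search compare lengths first and only then compare bytes.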
|
2014-10-14 17:15:18 -04:00
|
|
|
|
ZIP: Fix infinite loop + significant code cleanup
An infinite loop may occur when scanning some malformed ZIP files.
I introduced this issue in 96c00b6d80a4cb16cb2d39111614733e4a62221d
with this line:
```c
// decrement coff by 1 to account for the increment at the end of the loop
coff -= 1;
```
The problem is that the function may return 0, which should
indicate that there are no more files. The result was that
`coff` would stay the same and the loop would repeat.
This issue is in 1.5 development and affects the 1.5.0 beta but
does not affect any production versions.
Fixes: https://github.com/Cisco-Talos/clamav/issues/1534
Special thanks to Sophie0x2E for an initial fix, proposed in
https://github.com/Cisco-Talos/clamav/pull/1539
In review, I was uncomfortable with other existing code and
decided to to a more significant overhaul of the error handling
in the ZIP module.
In addition to cleanup, this commit has some functional changes:
- When parsing a central directory file header inside of
`parse_central_directory_file_header()`, it will now fail out if the
"extra length" or "comment length" fields would exceced the length of
the archive. That doesn't mean the associated local file header won't
be parsed later, but it won't use the central directory file header
to find it. Instead, the ZIP module will have to find the local file
header by searching for extra records not listed in the central directory.
This change was mostly to tidy up complex error handling.
- Add two FTM new signatures to identify split ZIP archives.
This signature identifies the first segment (first file) in a split or
spanned ZIP archive. It may also be found on a single-segment "split"
archive, depending on the ZIP archiver.
```
0:0:504b0708504b0304:ZIP (First segment split/spanned):CL_TYPE_ANY:CL_TYPE_ZIP
```
Practically speaking, this new signature makes it so ClamAV identifies
the file as a ZIP right away without having to rely on SFX_ZIP detection.
Extraction is then handled by the ZIP `cli_unzip` function rather than
extracting each with `cli_unzip_single` which handles SFX_ZIP entries.
Note: ClamAV isn't capable of finding additional files on disk to support
handling the additional segments. So it doesn't make any difference with
handling those other files.
This signature is for single-segment split/spanned archives, depending
on the ZIP archiver.
```
0:0:504b0303504b0304:ZIP (Single-segment split/spanned):CL_TYPE_ANY:CL_TYPE_ZIP
```
Like the first one, this also means we won't rely on SFX_ZIP detection
and will treat this files as regular ZIPs.
- Added a test file to verify that ClamAV can extract a single-file
"split" ZIP.
- Added a clamscan test with test files to verify that scanning a split
archive across two segments correctly extracts the properly formed zip
file entries. Sadly, we can't join the segments to extract everything.
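
The first two functional changes above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from this commit: the `sketch_*` names are hypothetical, the 46-byte fixed header size and the length-field offsets (28, 30, 32) come from the ZIP format rather than from this message, and the real `parse_central_directory_file_header()` may structure its checks differently. The magic bytes are the ones listed in the two FTM signatures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size part of a central directory file header (ZIP format). */
#define SKETCH_CENTRAL_HEADER_SIZE 46u

/* Hypothetical helper: read a little-endian 16-bit value. */
static uint16_t sketch_read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: return 1 if the central directory file header at
 * `offset` fits inside an archive of `archive_len` bytes, including its
 * variable-length file name, extra field, and comment. Return 0 otherwise.
 */
static int sketch_central_header_fits(const uint8_t *archive, size_t archive_len, size_t offset)
{
    const uint8_t *hdr;
    size_t remaining;
    size_t variable_len;

    /* The fixed-size part must fit before any length field is read. */
    if (offset > archive_len || archive_len - offset < SKETCH_CENTRAL_HEADER_SIZE) {
        return 0;
    }

    hdr       = archive + offset;
    remaining = archive_len - offset - SKETCH_CENTRAL_HEADER_SIZE;

    /* File name, extra field, and comment lengths sit at offsets 28, 30, 32. */
    variable_len = (size_t)sketch_read_le16(hdr + 28) +
                   (size_t)sketch_read_le16(hdr + 30) +
                   (size_t)sketch_read_le16(hdr + 32);

    return variable_len <= remaining;
}

/*
 * Hypothetical helper: return 1 if the buffer starts with either of the
 * 8-byte prefixes matched by the two new FTM signatures, 0 otherwise.
 */
static int sketch_is_split_zip_prefix(const uint8_t *buf, size_t len)
{
    /* 504b0708504b0304: first segment of a split/spanned archive. */
    static const uint8_t first_segment[8] = {0x50, 0x4b, 0x07, 0x08, 0x50, 0x4b, 0x03, 0x04};
    /* 504b0303504b0304: single-segment split/spanned archive. */
    static const uint8_t single_segment[8] = {0x50, 0x4b, 0x03, 0x03, 0x50, 0x4b, 0x03, 0x04};

    if (len < sizeof(first_segment)) {
        return 0;
    }
    return memcmp(buf, first_segment, sizeof(first_segment)) == 0 ||
           memcmp(buf, single_segment, sizeof(single_segment)) == 0;
}
```

Both helpers take a plain buffer and length so they can be tried without a scan context. In the sketch, a header that fails the length check is simply reported as not fitting; what the caller does next (for example, falling back to a search for local file headers) is left out.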

/**
 * @brief Search for files in a zip archive.
 *
 * This function searches for one or more files in a zip archive and scans them.
 *
 * Disclaimer: As compared with `cli_unzip`, this function depends on the central directory header.
 * It will not work correctly if the zip archive does not have a central directory header
 * or the file you're looking for is not listed in the central directory.
 *
 * @param ctx           The scan context containing the file map and other scan parameters.
 * @param requests      The `zip_requests` structure containing the files to search for.
 * @return cl_error_t   Returns CL_SUCCESS if nothing was found.
 *                      Returns CL_VIRUS if a match was found.
 *                      Returns a CL_E* error code on failure.
 */
cl_error_t unzip_search(cli_ctx *ctx, struct zip_requests *requests);

/**
 * @brief Search for a single file in a zip archive.
 *
 * This function searches for a single file in a zip archive.
 *
 * Disclaimer: As compared with `cli_unzip`, this function depends on the central directory header.
 * It will not work correctly if the zip archive does not have a central directory header
 * or the file you're looking for is not listed in the central directory.
 *
 * @param ctx           The scan context containing the file map and other scan parameters.
 * @param name          The name of the file to search for.
 * @param nlen          The length of the file name.
 * @param loff          The offset of the file in the zip archive.
 * @return cl_error_t   Returns CL_SUCCESS if nothing was found.
 *                      Returns CL_VIRUS if a match was found.
 *                      Returns a CL_E* error code on failure.
 */
cl_error_t unzip_search_single(cli_ctx *ctx, const char *name, size_t nlen, uint32_t *loff);

// clang-format off
#ifdef UNZIP_PRIVATE
#define F_ENCR  (1<<0)
#define F_ALGO1 (1<<1)
#define F_ALGO2 (1<<2)
#define F_USEDD (1<<3)
#define F_RSVD1 (1<<4)
#define F_PATCH (1<<5)
#define F_STRNG (1<<6)
#define F_UNUS1 (1<<7)
#define F_UNUS2 (1<<8)
#define F_UNUS3 (1<<9)
#define F_UNUS4 (1<<10)
#define F_UTF8  (1<<11)
#define F_RSVD2 (1<<12)
#define F_MSKED (1<<13)
#define F_RSVD3 (1<<14)
#define F_RSVD4 (1<<15)
// clang-format on

enum ALGO {
    ALG_STORED,
    ALG_SHRUNK,
    ALG_REDUCE1,
    ALG_REDUCE2,
    ALG_REDUCE3,
    ALG_REDUCE4,
    ALG_IMPLODE,
    ALG_TOKENZD,
    ALG_DEFLATE,
    ALG_DEFLATE64,
    ALG_OLDTERSE,
    ALG_RSVD1,
    ALG_BZIP2,
    ALG_RSVD2,
    ALG_LZMA,
    ALG_RSVD3,
    ALG_RSVD4,
    ALG_RSVD5,
    ALG_NEWTERSE,
    ALG_LZ77,
    ALG_WAVPACK = 97,
    ALG_PPMD
};

/*
 * Local File Header Structure:
 *
 *  0x0 0x1 0x2 0x3 0x4 0x5 0x6 0x7 0x8 0x9 0xa 0xb 0xc 0xd 0xe 0xf
 * +---------------+-------+-------+-------+-------+-------+-------+
 * | P K 0x03 0x04 | Vers  | Flags | Compr |Mod Tm |Mod Dt | CRC 32|
 * +-------+-------+-------+-------+-------+-------+-------+-------+
 * | CRC 32|  Compr Size   | Uncompr Size  |FName L|Extra L|       |
 * +-------+---------------+---------------+-------+-------+       +
 * |                     File Name (variable)                      |
 * +---------------------------------------------------------------+
 * |                    Extra field (variable)                     |
 * +---------------------------------------------------------------+
 */

// struct LH {
//      uint32_t magic;
//      uint16_t version;
//      uint16_t flags;
//      uint16_t method;
//      uint32_t mtime;
//      uint32_t crc32;
//      uint32_t csize;
//      uint32_t usize;
//      uint16_t flen;
//      uint16_t elen;
//      char fname[flen]
//      char extra[elen]
// } __attribute__((packed));

/*
 * Local File Header convenience macros:
 */
// clang-format off
#define LOCAL_HEADER_magic   ((uint32_t)cli_readint32((uint8_t *)(local_header)+0))
#define LOCAL_HEADER_version ((uint16_t)cli_readint16((uint8_t *)(local_header)+4))
#define LOCAL_HEADER_flags   ((uint16_t)cli_readint16((uint8_t *)(local_header)+6))
#define LOCAL_HEADER_method  ((uint16_t)cli_readint16((uint8_t *)(local_header)+8))
#define LOCAL_HEADER_mtime   ((uint32_t)cli_readint32((uint8_t *)(local_header)+10))
#define LOCAL_HEADER_crc32   ((uint32_t)cli_readint32((uint8_t *)(local_header)+14))
#define LOCAL_HEADER_csize   ((uint32_t)cli_readint32((uint8_t *)(local_header)+18))
#define LOCAL_HEADER_usize   ((uint32_t)cli_readint32((uint8_t *)(local_header)+22))
#define LOCAL_HEADER_flen    ((uint16_t)cli_readint16((uint8_t *)(local_header)+26))
#define LOCAL_HEADER_elen    ((uint16_t)cli_readint16((uint8_t *)(local_header)+28))
#define SIZEOF_LOCAL_HEADER  30
// clang-format on

/*
 * Central Directory Structure:
 *
 * +---------------------------------------------------------------+
 * |                Central directory file header 1                |
 * +---------------------------------------------------------------+
 * |                Central directory file header 2                |
 * +---------------------------------------------------------------+
 * |                              ...                              |
 * +---------------------------------------------------------------+
 * |                Central directory file header N                |
 * +---------------------------------------------------------------+
 * |                       Digital Signature                       |
 * +---------------------------------------------------------------+
 * |                 Data descriptor N (optional)                  |
 * +---------------------------------------------------------------+
 * |             Zip64 end of central directory Record             |
 * +---------------------------------------------------------------+
 * |            Zip64 end of central directory locator             |
 * +---------------------------------------------------------------+
 * |                End of central directory record                |
 * +---------------------------------------------------------------+
 *
 * Central Directory File Header structure:
 *
 *  0x0 0x1 0x2 0x3 0x4 0x5 0x6 0x7 0x8 0x9 0xa 0xb 0xc 0xd 0xe 0xf
 * +---------------+-------+-------+-------+-------+-------+-------+
 * | P K 0x01 0x02 | Vers  |Vers Nd| Flags | Compr |Mod Tm |Mod Dt |
 * +---------------+-------+-------+-------+-------+-------+-------+
 * |    CRC 32     |  Compr Size   | Uncompr Size  |FName L|Extra L|
 * +---------------+-------+-------+-------+-------+-------+-------+
 * |F Com L|D #strt|IntAttr|Ext Attributes |Offset L Header|       |
 * +---------------+-----------------------+---------------+       +
 * |                     File Name (variable)                      |
 * +---------------------------------------------------------------+
 * |                    Extra field (variable)                     |
 * +---------------------------------------------------------------+
 * |                    File comment (variable)                    |
 * +---------------------------------------------------------------+
 *
 * End of central directory record structure:
 *
 *  0x0 0x1 0x2 0x3 0x4 0x5 0x6 0x7 0x8 0x9 0xa 0xb 0xc 0xd 0xe 0xf
 * +---------------+-------+-------+-------+-------+---------------+
 * | P K 0x05 0x06 |Disk # |Dsk#cd |DskEnts|T.Entrs|Central Dir Sz |
 * +---------------+-------+-------+-------+-------+---------------+
 * | Offset of CD  |CommLen|      ZIP file comment (variable)      |
 * +---------------+-------+-------+-------+-------+-------+-------+
 */

// struct CH {
//      uint32_t magic;
//      uint16_t vermade;
//      uint16_t verneed;
//      uint16_t flags;
//      uint16_t method;
//      uint32_t mtime;
//      uint32_t crc32;
//      uint32_t csize;
//      uint32_t usize;
//      uint16_t flen;
//      uint16_t elen;
//      uint16_t clen;
//      uint16_t dsk;
//      uint16_t iattrib;
//      uint32_t eattrib;
//      uint32_t off;
//      char fname[flen]
//      char extra[elen]
//      char comment[clen]
// } __attribute__((packed));

/*
 * Central Directory File Header convenience macros:
 */
// clang-format off
#define CENTRAL_HEADER_magic       ((uint32_t)cli_readint32((uint8_t *)(central_header)+0))
#define CENTRAL_HEADER_vermade     ((uint16_t)cli_readint16((uint8_t *)(central_header)+4))
#define CENTRAL_HEADER_verneed     ((uint16_t)cli_readint16((uint8_t *)(central_header)+6))
#define CENTRAL_HEADER_flags       ((uint16_t)cli_readint16((uint8_t *)(central_header)+8))
#define CENTRAL_HEADER_method      ((uint16_t)cli_readint16((uint8_t *)(central_header)+10))
#define CENTRAL_HEADER_mtime       ((uint32_t)cli_readint32((uint8_t *)(central_header)+12))
#define CENTRAL_HEADER_crc32       ((uint32_t)cli_readint32((uint8_t *)(central_header)+16))
#define CENTRAL_HEADER_csize       ((uint32_t)cli_readint32((uint8_t *)(central_header)+20))
#define CENTRAL_HEADER_usize       ((uint32_t)cli_readint32((uint8_t *)(central_header)+24))
#define CENTRAL_HEADER_flen        ((uint16_t)cli_readint16((uint8_t *)(central_header)+28))
#define CENTRAL_HEADER_extra_len   ((uint16_t)cli_readint16((uint8_t *)(central_header)+30))
#define CENTRAL_HEADER_comment_len ((uint16_t)cli_readint16((uint8_t *)(central_header)+32))
#define CENTRAL_HEADER_disk_num    ((uint16_t)cli_readint16((uint8_t *)(central_header)+34))
#define CENTRAL_HEADER_iattrib     ((uint16_t)cli_readint16((uint8_t *)(central_header)+36))
#define CENTRAL_HEADER_eattrib     ((uint32_t)cli_readint32((uint8_t *)(central_header)+38))
#define CENTRAL_HEADER_off         ((uint32_t)cli_readint32((uint8_t *)(central_header)+42))
// clang-format on
2025-08-04 22:50:48 -04:00
|
|
|
#define SIZEOF_CENTRAL_HEADER 46 // Excluding variable size fields
|
|
|
|
#define SIZEOF_ENCRYPTION_HEADER 12 // Excluding variable size fields
|
|
|
|
#define SIZEOF_END_OF_CENTRAL 22 // Excluding variable size fields
|
|
|
|
|
2007-12-13 19:47:07 +00:00
|
|
|
#endif /* UNZIP_PRIVATE */
|
|
|
|
|
|
|
|
#endif /* __UNZIP_H */
|