220 commits

Author SHA1 Message Date
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan APIs that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat to reduce the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted TDB-type macros and LSIG-type macros to enums for improved
  type safety.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use gotos for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and makes it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
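The goto-based error handling that commit 005cbf5a37 describes for magic_scandesc() follows a common C idiom: every failure path jumps to one label that releases whatever was acquired. The sketch below shows only the idiom. `demo_status_t` and `demo_read_first_byte()` are hypothetical names invented for this example and are not libclamav APIs; the enum return type stands in for the `cl_error_t` conversions mentioned in the same commit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes standing in for cl_error_t values. */
typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_EARG,
    DEMO_EMEM,
    DEMO_EOPEN,
    DEMO_EREAD
} demo_status_t;

/* Hypothetical helper: copies the first byte of the file at `path`
 * into `*first`. All exits go through the single `done` label. */
demo_status_t demo_read_first_byte(const char *path, unsigned char *first)
{
    demo_status_t status = DEMO_EARG;
    FILE *fp             = NULL;
    unsigned char *buf   = NULL;

    if (path == NULL || first == NULL)
        goto done;

    buf = malloc(1);
    if (buf == NULL) {
        status = DEMO_EMEM;
        goto done;
    }

    fp = fopen(path, "rb");
    if (fp == NULL) {
        status = DEMO_EOPEN;
        goto done;
    }

    if (fread(buf, 1, 1, fp) != 1) {
        status = DEMO_EREAD;
        goto done;
    }

    *first = buf[0];
    status = DEMO_SUCCESS;

done:
    /* Cleanup runs once, regardless of which step failed. */
    if (fp != NULL)
        fclose(fp);
    free(buf); /* free(NULL) is a no-op */
    return status;
}
```

Because cleanup lives in one place, adding another resource (such as a temporary directory to remove) means adding one acquisition and one release, not touching every early return.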
Jonas Zaddach (jzaddach)
cd977727f0 Add LZMA & BZip2 decompression to bytecode API
Adds LZMA and BZip2 decompression routines to the bytecode API.
The ability to decompress LZMA and BZip2 streams is particularly
useful for bytecode signatures that extend clamav executable
unpacking capabilities.

Of note, the LZMA format is not well standardized. This API
expects the stream to start with the LZMA_Alone header.

Also fixed a bug in LZMA dictionary size setting.
2020-04-29 09:26:07 -07:00
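For context on the LZMA_Alone header mentioned in commit cd977727f0: in the legacy .lzma container, the stream begins with a 13-byte header consisting of one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size. The sketch below decodes that layout. It is an illustration only, not the bytecode API; `struct lzma_alone_header` and `parse_lzma_alone_header()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZMA_ALONE_HEADER_SIZE 13 /* 1 props + 4 dict size + 8 uncompressed size */

/* Hypothetical struct for illustration; not a libclamav type. */
struct lzma_alone_header {
    uint8_t lc;                 /* literal context bits */
    uint8_t lp;                 /* literal position bits */
    uint8_t pb;                 /* position bits */
    uint32_t dict_size;         /* dictionary size in bytes */
    uint64_t uncompressed_size; /* UINT64_MAX means "unknown" */
};

/* Returns 0 on success, -1 if the buffer is too short or the
 * properties byte is out of range. */
int parse_lzma_alone_header(const uint8_t *buf, size_t len,
                            struct lzma_alone_header *out)
{
    size_t i;
    uint8_t props;

    assert(buf != NULL && out != NULL);

    if (len < LZMA_ALONE_HEADER_SIZE)
        return -1;

    /* props = (pb * 5 + lp) * 9 + lc, so valid values are 0..224 */
    props = buf[0];
    if (props >= 9 * 5 * 5)
        return -1;

    out->lc = props % 9;
    props /= 9;
    out->lp = props % 5;
    out->pb = props / 5;

    /* Both multi-byte fields are stored little-endian. */
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

    out->uncompressed_size = 0;
    for (i = 0; i < 8; i++)
        out->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);

    return 0;
}
```

A raw LZMA stream without this header carries no properties or dictionary size, which is why a decoder has to know which framing to expect.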
Micah Snyder
50455664a7 libclamav: Fix fmap leak in bytecode runtime
Fixes an fmap leak in the bytecode switch_input() API.  The
switch_input() API provides a way to read from an extracted file instead
of reading from the current file.  The issue is that the current
implementation fails to free the fmap created to read from the extracted
file on cleanup or when switching back to the original fmap.  In
addition, it fails to use the cli_bytecode_context_setfile() function
to restore the file_size in the context for the current fmap.

Fixes a couple fmap leaks in the unit tests.
2020-04-20 11:26:43 -07:00
Jonas Zaddach (jzaddach)
a627d09689 Correct disassembling of bytecode ICMP_ULT instruction 2020-02-12 19:59:47 -08:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
97a0647e88 Additional variable type changes for correctness and to silence warnings. A handful of other minor changes to silence warnings. Corrected a number of function definitions so they return cl_error_t rather than int. 2019-10-02 16:08:25 -04:00
Andrew
4de072327a Rename MAX_BC to MAX_TRACKED_BC for consistency 2019-10-02 16:08:23 -04:00
Mickey Sola
1b5a59c416 bytecode - J867 - fix memory leak that occurs within the bytecode interpreter while libjson is enabled 2019-10-02 16:08:21 -04:00
Jonas Zaddach
c84683f2f4 Mach-O bytecode unpackers 2019-10-02 16:08:21 -04:00
Jonas Zaddach
2b776e4b89 Linux bytecode unpackers 2019-10-02 16:08:21 -04:00
Andrew
df8dfda9cd Address code-review comments, fix several memleaks
Changes include:
 - Fixing several memory leaks noticed when running with ASan
 - Adds documentation for several functions and structs
 - Simplifies the interface for using cli_targetinfo_init/destroy
   and cli_exe_info_init/destroy
 - A few other minor changes
2019-10-02 16:08:20 -04:00
Andrew
7ba310e605 PE parsing code improvements, db loading bug fixes
Consolidate the PE parsing code into one function.  I tried to preserve all existing functionality from the previous, distinct implementations to a large extent (with the exceptions mentioned below).  If I noticed potential bugs/improvements, I added a TODO statement about those so that they can be fixed in a smaller commit later.  Also, there are more TODOs in places where I'm not entirely sure why certain actions are performed - more research is needed for these.

I'm submitting a pull request now so that regression testing can be done, and because merging what I have thus far now will likely have fewer conflicts than if I try to merge later.

PE parsing code improvements:
- PEs without all 16 data directories are parsed more appropriately now
- Added lots more debug statements

Also:
 - Allow MAX_BC and MAX_TRACKED_PCRE to be specified via CFLAGS

    When doing performance testing with the latest CVD, MAX_BC and
    MAX_TRACKED_PCRE need to be raised to track all the events.
    Allow these to be specified via CFLAGS by not redefining them
    if they are already defined

- Fix an issue preventing wildcard sizes in .MDB/.MSB rules

    I'm not sure what the original intent of the check I removed was,
    but it prevents using wildcard sizes in .MDB/.MSB rules.  AFAICT
    these wildcard sizes should be handled appropriately by the MD5
    section hash computation code, so I don't think a check on that
    is needed.

- Fix several issues related to db loading
     - .imp files will now get loaded if they exist in a directory passed
       via clamscan's '-d' flag
     - .pwdb files will now get loaded if they exist in a directory passed
       via clamscan's '-d' flag even when compiling without yara support
     - Changes to .imp, .ign, and .ign2 files will now be reflected in calls
       to cl_statinidir and cl_statchkdir (and also .pwdb files, even when
       compiling without yara support)
     - The contents of .sfp files won't be included in some of the signature
       counts, and the contents of .cud files will be
     - Any local.gdb files will no longer be loaded twice

- For .imp files, you are no longer required to specify a minimum flevel for wildcard rules, since this isn't needed
2019-10-02 16:08:20 -04:00
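Commit 7ba310e605 allows MAX_BC and MAX_TRACKED_PCRE to be set via CFLAGS "by not redefining them if they are already defined". A minimal sketch of that guard pattern follows. The macro `MAX_TRACKED_EVENTS`, its default of 64, and the tracker type are hypothetical and chosen only for the example; they are not the values or names used in libclamav.

```c
#include <assert.h>
#include <stddef.h>

/* Only supply a default when the build did not pass one,
 * e.g. via -DMAX_TRACKED_EVENTS=256 in CFLAGS.
 * Name and default value are hypothetical. */
#ifndef MAX_TRACKED_EVENTS
#define MAX_TRACKED_EVENTS 64
#endif

/* Hypothetical fixed-capacity tracker sized by the macro. */
struct event_tracker {
    unsigned ids[MAX_TRACKED_EVENTS];
    size_t count;
};

/* Returns 0 on success, -1 when the compile-time capacity is exhausted. */
int tracker_add(struct event_tracker *t, unsigned id)
{
    assert(t != NULL);

    if (t->count >= MAX_TRACKED_EVENTS)
        return -1;

    t->ids[t->count++] = id;
    return 0;
}
```

Built with `cc -DMAX_TRACKED_EVENTS=256 ...`, the same source tracks 256 entries without any edit to the file; built without the flag, it falls back to the default.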
Jonas Zaddach
d1f7ff12a3 Prettify printing of bytecode arguments 2019-10-02 16:08:19 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renamed references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
2018-12-02 23:06:59 -05:00
Micah Snyder
964a1e7321 Converting http urls to https urls. Primary focus was on clamav.net urls. I updated a couple of others and fixed a few broken links as well. There are many (non-clamav.net) urls I didn't address, especially in 3rd party or contrib code. 2018-04-02 07:58:33 -04:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Micah Snyder
c9a070c9d3 More cleanup re: variables possibly used before initialized. 2018-02-08 16:00:24 -05:00
Micah Snyder
653b471b5b eliminating format-string related warnings that appear on ubuntu 16.04 x64. 2017-10-11 15:03:33 -04:00
Micah Snyder
7e64560ce5 eliminating warnings that cropped up in 32bit ubuntu (16.04) 2017-08-31 11:00:34 -04:00
Micah Snyder
d18d72219f Eliminating warnings, converting iterator variables to size_t when used to compare against sizeof(). added a couple of missing #includes. 2017-08-11 16:01:50 -04:00
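The size_t conversion in commit d18d72219f addresses a familiar warning: `sizeof` yields a `size_t`, so comparing it against a signed `int` loop counter triggers sign-comparison diagnostics on many compilers. A small sketch of the corrected form, using a hypothetical table and function name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup table for the example. */
static const unsigned char demo_table[] = {0, 3, 0, 7, 9};

/* Counts non-zero entries. The index is size_t so that the comparison
 * with the sizeof-derived bound is unsigned on both sides. */
size_t demo_count_nonzero(void)
{
    size_t i;
    size_t n = 0;

    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (demo_table[i] != 0)
            n++;
    }

    return n;
}
```

With `int i` the loop would behave the same for this table, but the comparison would mix signed and unsigned operands, which is what the compiler warns about.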
Mickey Sola
60aad52faf bc - adding bc_idx sanity check when running bc lsigs 2017-06-21 15:53:14 -04:00
Steven Morgan
d32e039654 fix cli_bcapi_extract_new() return code path virus reporting. 2017-06-20 17:15:15 -04:00
Steven Morgan
cbf5017a7d bb11805 fix multiple results. Refactor false positive and heuristic precedence logic. 2017-04-18 12:07:06 -04:00
Steven Morgan
fd43d6103c bb11742 - fix compiler warnings. Patch contributed by Ruga. 2017-02-10 12:53:24 -05:00
Steven Morgan
22cb38ed24 pull request #53(2/4): Spelling fix by klemens(ka7). 2016-10-19 15:57:45 -04:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Steven Morgan
a80453e6e9 Merge master to features/yara. 2015-05-01 18:36:48 -04:00
Kevin Lin
fe54f710fc clambc info option updated for new hook type 2015-03-03 16:12:22 -05:00
Kevin Lin
47c2d618cd added BC_PRECLASS hook support; replaces target type 13 2015-03-03 15:00:55 -05:00
Kevin Lin
c648e6b490 Merge branch 'master' into klin/pcre_support
Conflicts:
	clamconf/clamconf.c
	clamscan/manager.c
	docs/signatures.tex
	sigtool/sigtool.c
2014-10-31 11:10:41 -04:00
Kevin Lin
90379a9e98 fixed formatting for short names in perf tracking 2014-10-22 17:56:49 -04:00
Kevin Lin
5c2c723361 added pcre execution time and match performance tracking
fixed an issue with statistics reporting with no signatures loaded
2014-09-16 15:56:56 -04:00
Kevin Lin
99e22630f4 opts: converted bytecode-statistics to generic statistics w/ strarg 2014-09-16 15:26:19 -04:00
Kevin Lin
032ec2192e fixed issue in bytecode statistics avg time reporting 2014-09-09 17:48:35 -04:00
Kevin Lin
7c9c4fab22 bytecode: various changes from code review 2014-08-26 17:02:41 -04:00
David Raynor
0b28c74878 Assign the right type in cli_bytetype_helper 2014-08-26 11:03:17 -04:00
Kevin Lin
5b5be2a65d win32: fixed additional OS specific build issues
bc2llvm: removed redundant macro causing issues in win32
2014-07-25 20:10:23 -04:00
Kevin Lin
0ff13b3138 clambc: added diagnostic tools for bytecode IR
clambc: added option to print bytecode IR
TODO: add diagnostic functions to win32 project

Conflicts:

	shared/optparser.c
2014-07-25 12:06:13 -04:00
Shawn Webb
cd94be7a52 Silence a bunch of compiler warnings in libclamav 2014-07-10 18:11:49 -04:00
Shawn Webb
60d8d2c352 Move all the crypto API to clamav.h 2014-07-01 19:38:01 -04:00
Kevin Lin
c6a3b294a9 bytecode: fixed a compiler issue and warnings 2014-06-03 11:47:57 -04:00
Kevin Lin
3107a6c24f bytecode: fixed issue with older versions of g++ 2014-06-03 11:19:01 -04:00
Kevin Lin
f3575db23c bytecode: added json-specific ctx members 2014-05-06 13:43:02 -04:00
Shawn Webb
b2e7c931d0 Use OpenSSL for hashing. 2014-02-08 00:31:12 -05:00
David Raynor
dac4e48755 libclamav: non-LLVM interpreter, fix edge check, cid #10432 & #10446 2013-08-09 17:07:10 -04:00
David Raynor
6a9086d240 libclamav: cli_bytecode_prepare_interpreter() free in error case, cid #10504 & #10505 2013-08-09 15:48:13 -04:00