mirror of
				https://github.com/Cisco-Talos/clamav.git
				synced 2025-10-31 16:10:54 +00:00 
			
		
		
		
	
		
			
	
	
		
			401 lines
		
	
	
	
		
			17 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
		
		
			
		
	
	
			401 lines
		
	
	
	
		
			17 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
|   | # ClamAV Development
 | ||
|  | This page aims to provide information useful when developing, debugging, or | ||
|  | profiling ClamAV. | ||
|  | 
 | ||
|  | ## Building ClamAV for Development
 | ||
|  | Below are some recommendations for building ClamAV so that it's easy to debug: | ||
|  | 
 | ||
|  | ### Running ./configure
 | ||
|  | Suggestions: | ||
|  | 
 | ||
|  | - Modify the CFLAGS variable as follows (assuming you're build with gcc): | ||
|  | 
 | ||
|  |     - Include gdb debugging information (`-ggdb`).  This will make it easier | ||
|  |       to debug with gdb | ||
|  | 
 | ||
|  |     - Disable optimizations (`-O0`).  This will ensure the line numbers you | ||
|  |       see in gdb match up with what is actually being executed. | ||
|  | 
 | ||
|  | - Run configure with the following options: | ||
|  | 
 | ||
|  |     - ``--prefix=`pwd`/build``: This will cause `make install` to install | ||
|  |       into the specified directory to avoid potentially tainting a release | ||
|  |       install of ClamAV that you may have. | ||
|  | 
 | ||
|  |     - `--enable-debug`: This will define *CL_DEBUG*, which mostly just enables | ||
|  |       additional print statements that are useful for debugging | ||
|  | 
 | ||
|  |     - `--enable-check`: Enables the unit tests, which can be run with 'make | ||
|  |       check' | ||
|  | 
 | ||
|  |     - `--enable-coverage`: If using gcc, sets `-fprofile-arcs -ftest-coverage` | ||
|  |       so that code coverage metrics will get generated when the program is run. | ||
|  |       Note that the code inserted to store program flow data may show up in | ||
|  |       any generated flame graphs or profiling output, so if you don't care | ||
|  |       about code coverage, omit this | ||
|  | 
 | ||
|  |     - `--enable-libjson`: Enables libjson, which enables the `--gen-json` option. | ||
|  |       The json output contains additional metadata that might be helpful when | ||
|  |       debugging. | ||
|  | 
 | ||
|  |     - `--enable-static --disable-shared`: This will only build libclamav and | ||
|  |       the supporting libraries as static libraries, and will result in the | ||
|  |       clamscan that is built having this code embedded.  This is useful for | ||
|  |       running programs like gprof which don't handle profiling code in shared | ||
|  |       objects. | ||
|  | 
 | ||
|  |     - `--with-systemdsystemunitdir=no`: Don't try to register clamd as a | ||
|  |       systemd service | ||
|  | 
 | ||
|  |     - You might want to include the following flags also so that the optional | ||
|  |       functionality is enabled: `--enable-experimental --enable-clamdtop | ||
|  |       --enable-libjson --enable-milter --enable-xml --enable-pcre`. | ||
|  |       Note that this may require you to install additional development | ||
|  |       libraries. | ||
|  | 
 | ||
|  |     - I ran into problems building with llvm on Ubuntu 18.04, so add | ||
|  |       `--disable-llvm` | ||
|  | 
 | ||
|  | Altogether, the following configure command can be used: | ||
|  | 
 | ||
|  | ``` | ||
|  | CFLAGS="-ggdb -O0" ./configure --prefix=`pwd`/built --enable-debug --enable-check --enable-coverage --enable-libjson --enable-static --disable-shared --with-systemdsystemunitdir=no --enable-experimental --enable-clamdtop --enable-libjson --enable-xml --enable-pcre --disable-llvm | ||
|  | ``` | ||
|  | To satisify all library dependencies, something like this should work | ||
|  | (from Ubuntu 18.04): | ||
|  | ``` | ||
|  | sudo apt-get install git gcc libxml2-dev libssl-dev make libmilter-dev libcurl4-openssl-dev libjson-c-dev check pkgconf libncurses5-dev libpcre3-dev g++ libtool libbz2-dev | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Running make
 | ||
|  | Run the following to finishing building.  `-j2` in the code below is used to | ||
|  | indicate that the build process should use 2 cores.  Increase this if your | ||
|  | machine is more powerful. | ||
|  | ``` | ||
|  | make -j2 | ||
|  | make install | ||
|  | ``` | ||
|  | Also, you can run 'make check' to run the unit tests | ||
|  | 
 | ||
|  | ### Downloading the Official Ruleset
 | ||
|  | If you plan to use custom rules for testing, you can invoke clamscan via | ||
|  | `./built/bin/clamscan`, specifying your custom rule files via `-d` parameters. | ||
|  | 
 | ||
|  | If you want to download the official ruleset to use with clamscan, do the | ||
|  | following: | ||
|  | 1. Run `mkdir -p built/share/clamav` | ||
|  | 2. Comment out line 8 of etc/freshclam.conf.sample | ||
|  | 3. Run `./built/bin/freshclam --config-file etc/freshclam.conf.sample` | ||
|  | 
 | ||
|  | ## General Debugging
 | ||
|  | ### Useful clamscan Flags
 | ||
|  | The following are useful flags to include when debugging clamscan: | ||
|  | 
 | ||
|  | - `--debug --verbose`: Print lots of helpful debug information | ||
|  | 
 | ||
|  | - `--gen-json`: Print some additional debug information in a JSON format | ||
|  | 
 | ||
|  | - `--statistics=pcre --statistics=bytecode`:  Print execution statistics on any | ||
|  |   PCRE and bytecode rules that were evaluated | ||
|  | 
 | ||
|  | - `--dev-performance`: Print per-file statistics regarding how long scanning | ||
|  |   took and the times spent in various scanning stages | ||
|  | 
 | ||
|  | - `--detect-broken`: This will attempt to detect broken executable files.  If | ||
|  |   an executable is determined to be broken, some functionality might not get | ||
|  |   invoked for the sample, and this could be an indication of an issue parsing | ||
|  |   the PE header or file.  This causes those binary to generate an alert instead | ||
|  |   of just continuing on. | ||
|  | 
 | ||
|  | - `--max-filesize=2000M --max-scansize=2000M --max-files=2000000 | ||
|  |    --max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000 | ||
|  |    --max-htmlnotags=2000000 --max-scriptnormalize=2000000 | ||
|  |    --max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000 | ||
|  |    --max-rechwp3=2000000 --pcre-match-limit=2000000 | ||
|  |    --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M`: | ||
|  |   Effectively disables all file limits and maximums for scanning.  This is | ||
|  |   useful if you'd like to ensure that all files in a set get scanned, and would | ||
|  |   prefer clam to just run slowly or crash rather than skip a file because it | ||
|  |   encounters one of these thresholds | ||
|  | 
 | ||
|  | The following are useful flags to include when debugging rules that you're | ||
|  | writing: | ||
|  | 
 | ||
|  | - `-d`: Allows you to specify a custom ClamAV rule file from the command line | ||
|  | 
 | ||
|  | - `--bytecode-unsigned`: If you are testing custom bytecode rules, you'll need | ||
|  |   this flag so that clamscan actually runs the bytecode signature | ||
|  | 
 | ||
|  | - `--all-match`: Allows multiple signatures to match on a file being scanned | ||
|  | 
 | ||
|  | - `--leave-temps --tmpdir=/tmp`: By default, ClamAV will attempt to extract | ||
|  |   embedded files that it finds, normalize certain text files before looking | ||
|  |   for matches, and unpack packed executables that it has unpacking support for. | ||
|  |   These flags tell ClamAV to write these intermediate files out to the | ||
|  |   directory specified.  Usually when a file is written, it will mention the | ||
|  |   file name in the --debug output, so you can have some idea at what stage in | ||
|  |   the scanning process a tmp file was created. | ||
|  | 
 | ||
|  | - `--dump-certs`: For signed PE files that match a rule, display information | ||
|  |   about the certificates stored within the binary.  Note - sigtool has this | ||
|  |   functionality as well and doesn't require a rule match to view the cert data | ||
|  | 
 | ||
|  | ### Useful sigtool Flags
 | ||
|  | sigtool pulls in libclamav and provides shortcuts to doing tasks that clamscan | ||
|  | does behind the scenes.  These can be really useful when writing a signature or | ||
|  | trying to get information about a signature that might be causing FPs or | ||
|  | performance problems. | ||
|  | 
 | ||
|  | The following sigtool flags can be useful when debugging: | ||
|  | 
 | ||
|  | - `--unpack`: Unpack the specified CVD/CLD file | ||
|  | 
 | ||
|  | - `--decode`: Given a ClamAV signature from STDIN, show a more user-friendly | ||
|  |   representation of it | ||
|  | 
 | ||
|  | - `--hex-dump`: Given a sequence of bytes from STDIN, print the hex equivalent | ||
|  | 
 | ||
|  | - `--mdb`: Generate section hashes of the specified file | ||
|  | 
 | ||
|  | - `--imp`: Generate import hashes of the specified file | ||
|  | 
 | ||
|  | - `--html-normalise`: Normalize the specified HTML file in the way that | ||
|  |   clamscan will before looking for rule matches.  This makes it either to write | ||
|  |   rules that will actually match. | ||
|  | 
 | ||
|  | - `--ascii-normalise`: Normalized the specified ASCII text file in the way that | ||
|  |   clamscan will before looking for rule matches | ||
|  | 
 | ||
|  | - `--print-certs`: Print the Authenticode signatures of any PE files specified. | ||
|  |   This is useful when writing signature-based .crb rule files. | ||
|  | 
 | ||
|  | - `--vba`: Extract VBA/Word6 macro code | ||
|  | 
 | ||
|  | ### Using gdb
 | ||
|  | Given that you might want to pass a lot of arguments to gdb, consider taking | ||
|  | advantage of the `--args` parameter.  For example: | ||
|  | ``` | ||
|  | gdb --args ./built/bin/clamscan -d /tmp/test.ldb -d /tmp/blacklist.crb -d --dumpcerts --debug --verbose --max-filesize=2000M --max-scansize=2000M --max-files=2000000 --max-recursion=2000000 --max-embeddedpe=2000M --max-iconspe=2000000 f8f101166fec5785b4e240e4b9e748fb6c14fdc3cd7815d74205fc59ce121515 | ||
|  | ``` | ||
|  | 
 | ||
|  | When using ClamAV without libclamav statically linked, if you set breakpoints | ||
|  | on libclamav functions by name, you'll need to make sure to indicate that | ||
|  | the breakpoints should be resolved after libraries have been loaded. | ||
|  | 
 | ||
|  | For other documentation about how to use gdb, check out the following | ||
|  | resources: | ||
|  |  - [A Guide to gdb](http://www.cabrillo.edu/~shodges/cs19/progs/guide_to_gdb_1.1.pdf) | ||
|  |  - [gdb Quick Reference](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf) | ||
|  | 
 | ||
|  | ## Hunting for Memory Leaks
 | ||
|  | You can easily hunt for memory leaks with valgrind.  Check out this guide to | ||
|  | get started: | ||
|  |  - [Valgrind Quick Start](http://valgrind.org/docs/manual/quick-start.html) | ||
|  | If checking for leaks, be sure to run clamscan with samples that will hit as | ||
|  | many of the unique code paths in the code you are testing.  An example | ||
|  | invocation is as follows: | ||
|  | ``` | ||
|  | valgrind --leak-check=full ./built/bin/clamscan -d /tmp/test.ldb --leave-temps --tempdir /tmp/test --debug --verbose /tmp/upx-samples/ > /tmp/upx-results-2.txt 2>&1 | ||
|  | ``` | ||
|  | Alternatively, on Linux, you can use glibc's built-in leak checking | ||
|  | functionality: | ||
|  | ``` | ||
|  | MALLOC_CHECK_=7 ./built/bin/clamscan | ||
|  | ``` | ||
|  | See the [mallopt man page](http://manpages.ubuntu.com/manpages/trusty/man3/mallopt.3.html) for more details | ||
|  | 
 | ||
|  | ## Computing Code Coverage
 | ||
|  | gcov/lcov can be used to produce a code coverage report indicating which lines | ||
|  | of code were executed on a single run or by multiple runs of clamscan.  NOTE: | ||
|  | for these metrics to be collected, ClamAV needs to have been configured with | ||
|  | the `--enable-coverage` option. | ||
|  | 
 | ||
|  | First, run the following to zero out all of the performance metrics: | ||
|  | ``` | ||
|  | lcov -z --directory . --output-file coverage.lcov.data | ||
|  | ``` | ||
|  | Next, run ClamAV through whatever test cases you have.  Then, run lcov again | ||
|  | to collect the coverage data as follows: | ||
|  | ``` | ||
|  | lcov -c --directory . --output-file coverage.lcov.data | ||
|  | ``` | ||
|  | Finally, run the genhtml tool that ships with lcov to produce the code coverage | ||
|  | report: | ||
|  | ``` | ||
|  | genhtml coverage.lcov.data --output-directory report | ||
|  | ``` | ||
|  | The report directory will have an index.html page which can be loaded into any | ||
|  | web browser. | ||
|  | 
 | ||
|  | For more information, visit the [lcov webpage](http://ltp.sourceforge.net/coverage/lcov.php) | ||
|  | 
 | ||
|  | ## Profiling - Flame Graphs
 | ||
|  | [FlameGraph](https://github.com/brendangregg/FlameGraph) is a great tool for | ||
|  | generating interactive flame graphs based collected profiling data.  The github | ||
|  | page has thorough documentation on how to use the tool, but an overview is | ||
|  | presented below: | ||
|  | 
 | ||
|  | First, install perf, which on Linux can be done via: | ||
|  | ``` | ||
|  | apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r` | ||
|  | ``` | ||
|  | 
 | ||
|  | Modify the system settings to allow perf record to be run by a standard user: | ||
|  | ``` | ||
|  | $ sudo su | ||
|  | # cat /proc/sys/kernel/perf_event_paranoid 
 | ||
|  | # echo "1" > /proc/sys/kernel/perf_event_paranoid 
 | ||
|  | # exit
 | ||
|  | ``` | ||
|  | 
 | ||
|  | Invoke clamscan via perf record as follows, and run perf script to collect the | ||
|  | profiling data: | ||
|  | ``` | ||
|  | perf record -F 100 -g -- ./built/bin/clamscan -d /tmp/test.ldb /tmp/2aa6b18d509090c60c3e4ecdd8aeb16e5f149807e3404c86892112710eab576d | ||
|  | perf script > out.perf | ||
|  | ``` | ||
|  | The '-F' parameter indicates how many samples should be collected during | ||
|  | program execution.  If your scan will take a long time to run, a lower value | ||
|  | should be sufficient.  Otherwise, consider choosing a higher value (on Ubuntu | ||
|  | 18.04, 7250 is the max frequency, but it can be increased via | ||
|  | /proc/sys/kernel/perf_event_max_sample_rate. | ||
|  | 
 | ||
|  | Check out the FlameGraph project and run the following commands to generate | ||
|  | the flame graph: | ||
|  | ``` | ||
|  | perl stackcollapse-perf.pl ../clamav-devel/out.perf > /tmp/out.folded | ||
|  | perl flamegraph.pl /tmp/out.folded > /tmp/test.svg | ||
|  | ``` | ||
|  | 
 | ||
|  | The SVG that is generated is interactive, but some viewers don't support this. | ||
|  | Be sure to open it in a web browser like Chrome to be able to take full | ||
|  | advantage of it. | ||
|  | 
 | ||
|  | ## Profiling - Callgrind
 | ||
|  | Callgrind is a profiling tool included with valgrind.  This can be done by | ||
|  | prepending `valgrind --tool=callgrind ` to the clamscan command. | ||
|  | [kcachegrind](https://kcachegrind.github.io/html/Home.html) | ||
|  | is a follow-on tool that will graphically present the | ||
|  | profiling data and allow you to explore it visually, although if you don't | ||
|  | already use KDE you'll have to install lots of extra packages to use it. | ||
|  | 
 | ||
|  | ## System Call Tracing / Fault Injection
 | ||
|  | strace can be used to track the system calls that are performed and provide the | ||
|  | number of calls / time spent in each system call.  This can be done by | ||
|  | prepending `strace -c ` to a clamscan command.  Results will look something | ||
|  | like this: | ||
|  | ``` | ||
|  | % time     seconds  usecs/call     calls    errors syscall | ||
|  | ------ ----------- ----------- --------- --------- ---------------- | ||
|  |  95.04    0.831430          13     62518           read | ||
|  |   3.22    0.028172          14      2053           munmap | ||
|  |   0.69    0.006005           3      2102           mmap | ||
|  |   0.28    0.002420           7       344           pread64 | ||
|  |   0.16    0.001415           5       305         1 openat | ||
|  |   0.13    0.001108           3       405           write | ||
|  |   0.11    0.000932          23        40           mprotect | ||
|  |   0.07    0.000632           2       310           close | ||
|  |   0.07    0.000583           9        67        30 access | ||
|  |   0.05    0.000395           1       444           lseek | ||
|  |   0.04    0.000344           2       162           fstat | ||
|  |   0.04    0.000338           1       253           brk | ||
|  |   0.03    0.000262           1       422           fcntl | ||
|  |   0.02    0.000218          16        14           futex | ||
|  |   0.01    0.000119           1       212           getpid | ||
|  |   0.01    0.000086          14         6           getdents | ||
|  |   0.00    0.000043           7         6           dup | ||
|  |   0.00    0.000040           1        31           unlink | ||
|  |   0.00    0.000038          19         2           rt_sigaction | ||
|  |   0.00    0.000037          19         2           rt_sigprocmask | ||
|  |   0.00    0.000029           1        37           stat | ||
|  |   0.00    0.000022          11         2           prlimit64 | ||
|  |   0.00    0.000021          21         1           sysinfo | ||
|  |   0.00    0.000020           1        33           clock_gettime | ||
|  |   0.00    0.000019          19         1           arch_prctl | ||
|  |   0.00    0.000018          18         1           set_tid_address | ||
|  |   0.00    0.000018          18         1           set_robust_list | ||
|  |   0.00    0.000013           0        60           lstat | ||
|  |   0.00    0.000011           0        65           madvise | ||
|  |   0.00    0.000002           0        68           geteuid | ||
|  |   0.00    0.000000           0         1           execve | ||
|  |   0.00    0.000000           0         1           uname | ||
|  |   0.00    0.000000           0         1           getcwd | ||
|  | ------ ----------- ----------- --------- --------- ---------------- | ||
|  | 100.00    0.874790                 69970        31 total | ||
|  | ``` | ||
|  | 
 | ||
|  | strace can also be used for cool things like system call fault injection.  For | ||
|  | instance, I was curious whether the 'read' bytecode API call was implemented | ||
|  | in such a way that the underlying read system call could handle EINTR being | ||
|  | returned (which can happen periodically).  To test this, I wrote the following | ||
|  | bytecode rule: | ||
|  | ``` | ||
|  | VIRUSNAME_PREFIX("BC.Heuristic.Test.Read.Passed") | ||
|  | VIRUSNAMES("") | ||
|  | TARGET(0) | ||
|  | 
 | ||
|  | SIGNATURES_DECL_BEGIN | ||
|  | DECLARE_SIGNATURE(zeroes) | ||
|  | SIGNATURES_DECL_END | ||
|  | 
 | ||
|  | SIGNATURES_DEF_BEGIN | ||
|  | DEFINE_SIGNATURE(zeroes, "0:0000") | ||
|  | SIGNATURES_DEF_END | ||
|  | 
 | ||
|  | bool logical_trigger() | ||
|  | { | ||
|  |     return matches(Signatures.zeroes); | ||
|  | } | ||
|  | 
 | ||
|  | #define READ_S(value, size) if (read(value, size) != size) return 0;
 | ||
|  | 
 | ||
|  | int entrypoint(void) | ||
|  | { | ||
|  |     char buffer[65536]; | ||
|  |     int i; | ||
|  | 
 | ||
|  |     for (i = 0; i < 256; i++) | ||
|  |     { | ||
|  |         debug(i); | ||
|  |         debug("\n"); | ||
|  |         READ_S(buffer, sizeof(buffer)); | ||
|  |     } | ||
|  | 
 | ||
|  |     foundVirus(""); | ||
|  |     return 0; | ||
|  | } | ||
|  | ``` | ||
|  | I compiled the rule, made a test file to match against, and ran it under | ||
|  | strace to determine what underlying read system call was used for the bytecode | ||
|  | read function: | ||
|  | ``` | ||
|  | clambc-compiler read_test.bc | ||
|  | dd if=/dev/zero of=/tmp/zeroes bs=65535 count=256 | ||
|  | strace clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes | ||
|  | ``` | ||
|  | It uses pread64 under the hood, so the following command could be used for fault | ||
|  | injection: | ||
|  | ``` | ||
|  | strace -e fault=pread64:error=EINTR:when=20+10 clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes  | ||
|  | ``` | ||
|  | This command tells strace to skip the first 20 pread64 calls (these appear to | ||
|  | be used by the loader, which didn't seem to handle EINTR correctly) but to | ||
|  | inject EINTR for every 10th call afterward.  We can see the injection in action | ||
|  | and that the system call is retried successfully: | ||
|  | ``` | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15007744) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15073280) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15138816) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15204352) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15269888) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15335424) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15400960) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15466496) = 65536 | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15532032) = 65536 | ||
|  | pread64(3, 0x7f6a7ff43000, 65536, 15597568) = -1 EINTR (Interrupted system call) (INJECTED) | ||
|  | pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15597568) = 65536 | ||
|  | ``` | ||
|  | More documentation on this feature can be found in | ||
|  | [this presentation](https://archive.fosdem.org/2017/schedule/event/failing_strace/attachments/slides/1630/export/events/attachments/failing_strace/slides/1630/strace_fosdem2017_ta_slides.pdf) | ||
|  | from FOSDEM 2017. |