mirror of
https://github.com/Cisco-Talos/clamav.git
synced 2025-10-22 20:03:17 +00:00
Removed mirror docs (they are in github’s faq, remove them from tar ball), and corrected links in clamdoc and signatures.
800 lines
36 KiB
TeX
800 lines
36 KiB
TeX
\documentclass[a4paper,titlepage,12pt]{article}
|
|
\usepackage{amssymb}
|
|
\usepackage{pslatex}
|
|
\usepackage[dvips]{graphicx}
|
|
\usepackage{wrapfig}
|
|
\usepackage{url}
|
|
\date{}
|
|
|
|
\begin{document}
|
|
|
|
\begin{center}
|
|
\huge Creating signatures for ClamAV\\
|
|
\vspace{2cm}
|
|
\end{center}
|
|
|
|
\noindent
|
|
\section{Introduction}
|
|
CVD (ClamAV Virus Database) is a digitally signed container that
|
|
includes signature databases in various text formats. The header
|
|
of the container is a 512 bytes long string with colon separated fields:
|
|
\begin{verbatim}
|
|
ClamAV-VDB:build time:version:number of signatures:functionality
|
|
level required:MD5 checksum:digital signature:builder name:build
|
|
time (sec)
|
|
\end{verbatim}
|
|
\verb+sigtool --info+ displays detailed information about a given CVD file:
|
|
\begin{verbatim}
|
|
zolw@localhost:/usr/local/share/clamav$ sigtool -i main.cvd
|
|
File: main.cvd
|
|
Build time: 09 Dec 2007 15:50 +0000
|
|
Version: 45
|
|
Signatures: 169676
|
|
Functionality level: 21
|
|
Builder: sven
|
|
MD5: b35429d8d5d60368eea9630062f7c75a
|
|
Digital signature: dxsusO/HWP3/GAA7VuZpxYwVsE9b+tCk+tPN6OyjVF/U8
|
|
JVh4vYmW8mZ62ZHYMlM903TMZFg5hZIxcjQB3SX0TapdF1SFNzoWjsyH53eXvMDY
|
|
eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh
|
|
Verification OK.
|
|
\end{verbatim}
|
|
The ClamAV project distributes a number of CVD files, including
|
|
\emph{main.cvd} and \emph{daily.cvd}.
|
|
|
|
\section{Debug information from libclamav}
|
|
In order to create efficient signatures for ClamAV it's important
|
|
to understand how the engine handles input files. The best way
|
|
to see how it works is having a look at the debug information from
|
|
libclamav. You can do it by calling \verb+clamscan+ with the
|
|
\verb+--debug+ and \verb+--leave-temps+ flags. The first switch
|
|
makes clamscan display all the interesting information from
|
|
libclamav and the second one avoids deleting temporary files so
|
|
they can be analyzed further. The now important part of the info
|
|
is:
|
|
\begin{verbatim}
|
|
$ clamscan --debug attachment.exe
|
|
[...]
|
|
LibClamAV debug: Recognized MS-EXE/DLL file
|
|
LibClamAV debug: Matched signature for file type PE
|
|
LibClamAV debug: File type: Executable
|
|
\end{verbatim}
|
|
The engine recognized a windows executable.
|
|
\begin{verbatim}
|
|
LibClamAV debug: Machine type: 80386
|
|
LibClamAV debug: NumberOfSections: 3
|
|
LibClamAV debug: TimeDateStamp: Fri Jan 10 04:57:55 2003
|
|
LibClamAV debug: SizeOfOptionalHeader: e0
|
|
LibClamAV debug: File format: PE
|
|
LibClamAV debug: MajorLinkerVersion: 6
|
|
LibClamAV debug: MinorLinkerVersion: 0
|
|
LibClamAV debug: SizeOfCode: 0x9000
|
|
LibClamAV debug: SizeOfInitializedData: 0x1000
|
|
LibClamAV debug: SizeOfUninitializedData: 0x1e000
|
|
LibClamAV debug: AddressOfEntryPoint: 0x27070
|
|
LibClamAV debug: BaseOfCode: 0x1f000
|
|
LibClamAV debug: SectionAlignment: 0x1000
|
|
LibClamAV debug: FileAlignment: 0x200
|
|
LibClamAV debug: MajorSubsystemVersion: 4
|
|
LibClamAV debug: MinorSubsystemVersion: 0
|
|
LibClamAV debug: SizeOfImage: 0x29000
|
|
LibClamAV debug: SizeOfHeaders: 0x400
|
|
LibClamAV debug: NumberOfRvaAndSizes: 16
|
|
LibClamAV debug: Subsystem: Win32 GUI
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: Section 0
|
|
LibClamAV debug: Section name: UPX0
|
|
LibClamAV debug: Section data (from headers - in memory)
|
|
LibClamAV debug: VirtualSize: 0x1e000 0x1e000
|
|
LibClamAV debug: VirtualAddress: 0x1000 0x1000
|
|
LibClamAV debug: SizeOfRawData: 0x0 0x0
|
|
LibClamAV debug: PointerToRawData: 0x400 0x400
|
|
LibClamAV debug: Section's memory is executable
|
|
LibClamAV debug: Section's memory is writeable
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: Section 1
|
|
LibClamAV debug: Section name: UPX1
|
|
LibClamAV debug: Section data (from headers - in memory)
|
|
LibClamAV debug: VirtualSize: 0x9000 0x9000
|
|
LibClamAV debug: VirtualAddress: 0x1f000 0x1f000
|
|
LibClamAV debug: SizeOfRawData: 0x8200 0x8200
|
|
LibClamAV debug: PointerToRawData: 0x400 0x400
|
|
LibClamAV debug: Section's memory is executable
|
|
LibClamAV debug: Section's memory is writeable
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: Section 2
|
|
LibClamAV debug: Section name: UPX2
|
|
LibClamAV debug: Section data (from headers - in memory)
|
|
LibClamAV debug: VirtualSize: 0x1000 0x1000
|
|
LibClamAV debug: VirtualAddress: 0x28000 0x28000
|
|
LibClamAV debug: SizeOfRawData: 0x200 0x1ff
|
|
LibClamAV debug: PointerToRawData: 0x8600 0x8600
|
|
LibClamAV debug: Section's memory is writeable
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: EntryPoint offset: 0x8470 (33904)
|
|
\end{verbatim}
|
|
The section structure displayed above suggests the executable is
|
|
packed with UPX.
|
|
\begin{verbatim}
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: EntryPoint offset: 0x8470 (33904)
|
|
LibClamAV debug: UPX/FSG/MEW: empty section found - assuming
|
|
compression
|
|
LibClamAV debug: UPX: bad magic - scanning for imports
|
|
LibClamAV debug: UPX: PE structure rebuilt from compressed file
|
|
LibClamAV debug: UPX: Successfully decompressed with NRV2B
|
|
LibClamAV debug: UPX/FSG: Decompressed data saved in
|
|
/tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede
|
|
LibClamAV debug: ***** Scanning decompressed file *****
|
|
LibClamAV debug: Recognized MS-EXE/DLL file
|
|
LibClamAV debug: Matched signature for file type PE
|
|
\end{verbatim}
|
|
Indeed, libclamav recognizes the UPX data and saves the decompressed
|
|
(and rebuilt) executable into \verb+/tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede+.
|
|
Then it continues by scanning this new file:
|
|
\begin{verbatim}
|
|
LibClamAV debug: File type: Executable
|
|
LibClamAV debug: Machine type: 80386
|
|
LibClamAV debug: NumberOfSections: 3
|
|
LibClamAV debug: TimeDateStamp: Thu Jan 27 11:43:15 2011
|
|
LibClamAV debug: SizeOfOptionalHeader: e0
|
|
LibClamAV debug: File format: PE
|
|
LibClamAV debug: MajorLinkerVersion: 6
|
|
LibClamAV debug: MinorLinkerVersion: 0
|
|
LibClamAV debug: SizeOfCode: 0xc000
|
|
LibClamAV debug: SizeOfInitializedData: 0x19000
|
|
LibClamAV debug: SizeOfUninitializedData: 0x0
|
|
LibClamAV debug: AddressOfEntryPoint: 0x7b9f
|
|
LibClamAV debug: BaseOfCode: 0x1000
|
|
LibClamAV debug: SectionAlignment: 0x1000
|
|
LibClamAV debug: FileAlignment: 0x1000
|
|
LibClamAV debug: MajorSubsystemVersion: 4
|
|
LibClamAV debug: MinorSubsystemVersion: 0
|
|
LibClamAV debug: SizeOfImage: 0x26000
|
|
LibClamAV debug: SizeOfHeaders: 0x1000
|
|
LibClamAV debug: NumberOfRvaAndSizes: 16
|
|
LibClamAV debug: Subsystem: Win32 GUI
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: Section 0
|
|
LibClamAV debug: Section name: .text
|
|
LibClamAV debug: Section data (from headers - in memory)
|
|
LibClamAV debug: VirtualSize: 0xc000 0xc000
|
|
LibClamAV debug: VirtualAddress: 0x1000 0x1000
|
|
LibClamAV debug: SizeOfRawData: 0xc000 0xc000
|
|
LibClamAV debug: PointerToRawData: 0x1000 0x1000
|
|
LibClamAV debug: Section contains executable code
|
|
LibClamAV debug: Section's memory is executable
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: Section 1
|
|
LibClamAV debug: Section name: .rdata
|
|
LibClamAV debug: Section data (from headers - in memory)
|
|
LibClamAV debug: VirtualSize: 0x2000 0x2000
|
|
LibClamAV debug: VirtualAddress: 0xd000 0xd000
|
|
LibClamAV debug: SizeOfRawData: 0x2000 0x2000
|
|
LibClamAV debug: PointerToRawData: 0xd000 0xd000
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: Section 2
|
|
LibClamAV debug: Section name: .data
|
|
LibClamAV debug: Section data (from headers - in memory)
|
|
LibClamAV debug: VirtualSize: 0x17000 0x17000
|
|
LibClamAV debug: VirtualAddress: 0xf000 0xf000
|
|
LibClamAV debug: SizeOfRawData: 0x17000 0x17000
|
|
LibClamAV debug: PointerToRawData: 0xf000 0xf000
|
|
LibClamAV debug: Section's memory is writeable
|
|
LibClamAV debug: ------------------------------------
|
|
LibClamAV debug: EntryPoint offset: 0x7b9f (31647)
|
|
LibClamAV debug: Bytecode executing hook id 257 (0 hooks)
|
|
attachment.exe: OK
|
|
[...]
|
|
\end{verbatim}
|
|
No additional files get created by libclamav. By writing
|
|
a signature for the decompressed file you have more chances
|
|
that the engine will detect the target data when it gets
|
|
compressed with another packer.
|
|
|
|
This method should be applied to all files for which you want
|
|
to create signatures. By analyzing the debug information you
|
|
can quickly see how the engine recognizes and preprocesses
|
|
the data and what additional files get created. Signatures
|
|
created for bottom-level temporary files are usually more
|
|
generic and should help detecting the same malware in
|
|
different forms.
|
|
|
|
\section{Signature formats}
|
|
|
|
\subsection{Hash-based signatures}
|
|
The easiest way to create signatures for ClamAV is to use filehash checksums,
|
|
however this method can be only used against static malware.
|
|
\subsubsection{MD5 hash-based signatures}
|
|
To create a
|
|
MD5 signature for \verb+test.exe+ use the \verb+--md5+ option of sigtool:
|
|
\begin{verbatim}
|
|
zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
|
|
zolw@localhost:/tmp/test$ cat test.hdb
|
|
48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
|
|
\end{verbatim}
|
|
That's it! The signature is ready for use:
|
|
\begin{verbatim}
|
|
zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
|
|
test.exe: test.exe FOUND
|
|
|
|
----------- SCAN SUMMARY -----------
|
|
Known viruses: 1
|
|
Scanned directories: 0
|
|
Engine version: 0.92.1
|
|
Scanned files: 1
|
|
Infected files: 1
|
|
Data scanned: 0.02 MB
|
|
Time: 0.024 sec (0 m 0 s)
|
|
\end{verbatim}
|
|
You can change the name (by default sigtool uses the name of the file)
|
|
and place it inside a \verb+*.hdb+ file. A single database file can
|
|
include any number of signatures. To get them automatically loaded
|
|
each time clamscan/clamd starts just copy the database file(s) into
|
|
the local virus database directory (eg. /usr/local/share/clamav).
|
|
|
|
\emph{The hash-based signatures shall not be used for text files,
|
|
HTML and any other data that gets internally preprocessed before
|
|
pattern matching. If you really want to use a hash signature in
|
|
such a case, run clamscan with --debug and --leave-temps flags
|
|
as described above and create a signature for a preprocessed file
|
|
left in /tmp. Please keep in mind that a hash signature will stop
|
|
matching as soon as a single byte changes in the target file.}
|
|
|
|
\subsubsection{SHA1 and SHA256 hash-based signatures}
|
|
ClamAV 0.98 has also added support for SHA1 and SHA256 file checksums.
|
|
The format is the same as for MD5 file checksum.
|
|
It can differentiate between them based on the length of the hash string
|
|
in the signature. For best backwards compatibility, these should be
|
|
placed inside a \verb+*.hsb+ file. The format is:
|
|
\begin{verbatim}
|
|
HashString:FileSize:MalwareName
|
|
\end{verbatim}
|
|
|
|
\subsubsection{PE section based hash signatures}
|
|
You can create a hash signature for a specific section in a PE file.
|
|
Such signatures shall be stored inside \verb+.mdb+ files in the
|
|
following format:
|
|
\begin{verbatim}
|
|
PESectionSize:PESectionHash:MalwareName
|
|
\end{verbatim}
|
|
The easiest way to generate MD5 based section signatures is to extract
|
|
target PE sections into separate files and then run sigtool with the
|
|
option \verb+--mdb+
|
|
|
|
ClamAV 0.98 has also added support for SHA1 and SHA256 section based
|
|
signatures. The format is the same as for MD5 PE section based signatures.
|
|
It can differentiate between them based on the length of the hash string
|
|
in the signature. For best backwards compatibility, these should be
|
|
placed inside a \verb+*.msb+ file.
|
|
|
|
\subsubsection{Hash signatures with unknown size}
|
|
ClamAV 0.98 has also added support for hash signatures where the size
|
|
is not known but the hash is. It is much more performance-efficient to
|
|
use signatures with specific sizes, so be cautious when using this
|
|
feature. For these cases, the '*' character can be used in the size
|
|
field. To ensure proper backwards compatibility with older versions of
|
|
ClamAV, these signatures must have a minimum functional level of 73 or
|
|
higher. Signatures that use the wildcard size without this level set
|
|
will be rejected as malformed.
|
|
\begin{verbatim}
|
|
Sample .hsb signature matching any size
|
|
HashString:*:MalwareName:73
|
|
|
|
Sample .msb signature matching any size
|
|
*:PESectionHash:MalwareName:73
|
|
\end{verbatim}
|
|
|
|
\subsection{Body-based signatures}
|
|
ClamAV stores all body-based signatures in a hexadecimal format. In this
|
|
section by a hex-signature we mean a fragment of malware's body converted
|
|
into a hexadecimal string which can be additionally extended using various
|
|
wildcards.
|
|
|
|
\subsubsection{Hexadecimal format}
|
|
You can use \verb+sigtool --hex-dump+ to convert any data into a hex-string:
|
|
\begin{verbatim}
|
|
zolw@localhost:/tmp/test$ sigtool --hex-dump
|
|
How do I look in hex?
|
|
486f7720646f2049206c6f6f6b20696e206865783f0a
|
|
\end{verbatim}
|
|
|
|
\subsubsection{Wildcards}
|
|
ClamAV supports the following extensions for hex-signatures:
|
|
\begin{itemize}
|
|
\item \verb+??+\\
|
|
Match any byte.
|
|
\item \verb+a?+\\
|
|
Match a high nibble (the four high bits).\\ \textbf{IMPORTANT NOTE:}
|
|
The nibble matching is only available in libclamav with the
|
|
functionality level 17 and higher therefore please only use it with
|
|
.ndb signatures followed by ":17" (MinEngineFunctionalityLevel,
|
|
see \ref{ndb}).
|
|
\item \verb+?a+\\
|
|
Match a low nibble (the four low bits).
|
|
\item \verb+*+\\
|
|
Match any number of bytes.
|
|
\item \verb+{n}+\\
|
|
Match $n$ bytes.
|
|
\item \verb+{-n}+\\
|
|
Match $n$ or less bytes.
|
|
\item \verb+{n-}+\\
|
|
Match $n$ or more bytes.
|
|
\item \verb+{n-m}+\\
|
|
Match between $n$ and $m$ bytes ($m > n$).
|
|
\item \verb+(aa|bb|cc|..)+\\
|
|
Match aa or bb or cc..
|
|
\item \verb+!(aa|bb|cc|..)+\\
|
|
Match any byte except aa and bb and cc.. (ClamAV$\ge$0.96)
|
|
\item \verb+(aaaa|bbbb|cccc|..)+\\
|
|
Match alternative strings aaaa or bbbb or cccc. Alternative strings must have identical lengths.
|
|
\item \verb+!(aaaa|bbbb|cccc|..)+\\
|
|
Match any string except aaaa and bbbb and cccc. Alternative strings must have identical lengths.
|
|
(ClamAV$\ge$0.98.2)
|
|
\item \verb+HEXSIG[x-y]aa+ or \verb+aa[x-y]HEXSIG+\\
|
|
Match aa anchored to a hex-signature, see
|
|
\url{https://bugzilla.clamav.net/show_bug.cgi?id=776} for
|
|
discussion and examples.
|
|
\item \verb+(B)+\\
|
|
Match word boundary (including file boundaries).
|
|
\item \verb+(L)+\\
|
|
Match CR, CRLF or file boundaries.
|
|
\end{itemize}
|
|
The range signatures \verb+*+ and \verb+{}+ virtually separate
|
|
a hex-signature into two parts, eg. \verb+aabbcc*bbaacc+ is treated
|
|
as two sub-signatures \verb+aabbcc+ and \verb+bbaacc+ with any number
|
|
of bytes between them. It's a requirement that each sub-signature
|
|
includes a block of two static characters somewhere in its body.
|
|
|
|
\subsubsection{Basic signature format}
|
|
The simplest (and now deprecated) signature format is:
|
|
\begin{verbatim}
|
|
MalwareName=HexSignature
|
|
\end{verbatim}
|
|
ClamAV will scan the entire file looking for HexSignature. All
|
|
signatures of this type must be placed inside \verb+*.db+ files.
|
|
|
|
\subsubsection{Extended signature format}\label{ndb}
|
|
The extended signature format allows for specification of additional
|
|
information such as a target file type, virus offset or engine version,
|
|
making the detection more reliable. The format is:
|
|
\begin{verbatim}
|
|
MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]
|
|
\end{verbatim}
|
|
where \verb+TargetType+ is one of the following numbers specifying
|
|
the type of the target file:
|
|
\begin{itemize}
|
|
\item 0 = any file
|
|
\item 1 = Portable Executable, both 32- and 64-bit.
|
|
\item 2 = file inside OLE2 container (e.g. image, embedded executable,
|
|
VBA script). The OLE2 format is primarily used by MS Office and MSI
|
|
installation files.
|
|
\item 3 = HTML (normalized: whitespace transformed to spaces, tags/tag
|
|
attributes normalized, all lowercase), Javascript is normalized too:
|
|
all strings are normalized (hex encoding is decoded), numbers are
|
|
parsed and normalized, local variables/function names are normalized
|
|
to 'n001' format, argument to eval() is parsed as JS again,
|
|
unescape() is handled, some simple JS packers are handled,
|
|
output is whitespace normalized.
|
|
\item 4 = Mail file
|
|
\item 5 = Graphics
|
|
\item 6 = ELF
|
|
\item 7 = ASCII text file (normalized)
|
|
\item 8 = Unused
|
|
\item 9 = Mach-O files
|
|
\item 10 = PDF files
|
|
\item 11 = Flash files
|
|
\item 12 = Java class files
|
|
\end{itemize}
|
|
And \verb+Offset+ is an asterisk or a decimal number \verb+n+ possibly
|
|
combined with a special modifier:
|
|
\begin{itemize}
|
|
\item \verb+*+ = any
|
|
\item \verb+n+ = absolute offset
|
|
\item \verb+EOF-n+ = end of file minus \verb+n+ bytes
|
|
\end{itemize}
|
|
Signatures for PE, ELF and Mach-O files additionally support:
|
|
\begin{itemize}
|
|
\item \verb#EP+n# = entry point plus n bytes (\verb#EP+0# for \verb+EP+)
|
|
\item \verb#EP-n# = entry point minus n bytes
|
|
\item \verb#Sx+n# = start of section \verb+x+'s (counted from 0)
|
|
data plus \verb+n+ bytes
|
|
\item \verb#SEx# = entire section \verb+x+ (offset must lie within section
|
|
boundaries)
|
|
\item \verb#SL+n# = start of last section plus \verb+n+ bytes
|
|
\end{itemize}
|
|
All the above offsets except \verb+*+ can be turned into
|
|
\textbf{floating offsets} and represented as \verb+Offset,MaxShift+ where
|
|
\verb+MaxShift+ is an unsigned integer. A floating offset will match every
|
|
offset between \verb+Offset+ and \verb#Offset+MaxShift#, eg. \verb+10,5+
|
|
will match all offsets from 10 to 15 and \verb#EP+n,y# will match all
|
|
offsets from \verb#EP+n# to \verb#EP+n+y#. Versions of ClamAV older than
|
|
0.91 will silently ignore the \verb+MaxShift+ extension and only use
|
|
\verb+Offset+.\\
|
|
|
|
\noindent
|
|
Optional \verb+MinFL+ and \verb+MaxFL+ parameters can restrict the signature
|
|
to specific engine releases. All signatures in the extended format must be
|
|
placed inside \verb+*.ndb+ files.
|
|
|
|
\subsubsection{Logical signatures}\label{ndb}
|
|
Logical signatures allow combining of multiple signatures in extended
|
|
format using logical operators. They can provide both more detailed and
|
|
flexible pattern matching. The logical sigs are stored inside \verb+*.ldb+
|
|
files in the following format:
|
|
\begin{verbatim}
|
|
SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;
|
|
Subsig1;Subsig2;...
|
|
\end{verbatim}
|
|
where:
|
|
\begin{itemize}
|
|
\item \verb+TargetDescriptionBlock+ provides information about the
|
|
engine and target file with comma separated \verb+Arg:Val+ pairs,
|
|
currently (as of 0.95.1) only \verb+Target:X+ and \verb+Engine:X-Y+
|
|
are supported.
|
|
\item \verb+LogicalExpression+ specifies the logical expression
|
|
describing the relationship between \verb+Subsig0...SubsigN+.\\
|
|
\textbf{Basis clause:} 0,1,...,N decimal indexes are SUB-EXPRESSIONS
|
|
representing \verb+Subsig0, Subsig1,...,SubsigN+ respectively.\\
|
|
\textbf{Inductive clause:} if \verb+A+ and \verb+B+ are
|
|
SUB-EXPRESSIONS and \verb+X, Y+ are decimal numbers then
|
|
\verb+(A&B)+, \verb+(A|B)+, \verb+A=X+, \verb+A=X,Y+, \verb+A>X+,
|
|
\verb+A>X,Y+, \verb+A<X+ and \verb+A<X,Y+ are SUB-EXPRESSIONS
|
|
\item \verb+SubsigN+ is n-th subsignature in extended format possibly
|
|
preceded with an offset. There can be specified up to 64 subsigs.
|
|
\end{itemize}
|
|
Keywords used in \verb+TargetDescriptionBlock+:
|
|
\begin{itemize}
|
|
\item \verb+Target:X+: Target file type
|
|
\item \verb+Engine:X-Y+: Required engine functionality (range; 0.96)
|
|
\item \verb+FileSize:X-Y+: Required file size (range in bytes; 0.96)
|
|
\item \verb+EntryPoint+: Entry point offset (range in bytes; 0.96)
|
|
\item \verb+NumberOfSections+: Required number of sections in executable (range; 0.96)
|
|
\item \verb+Container:CL_TYPE_*+: File type of the container which stores the scanned file
|
|
\item \verb+IconGroup1+: Icon group name 1 from .idb signature Required engine functionality (range; 0.96)
|
|
\item \verb+IconGroup2+: Icon group name 2 from .idb signature Required engine functionality (range; 0.96)
|
|
\end{itemize}
|
|
Modifiers for subexpressions:
|
|
\begin{itemize}
|
|
\item \verb+A=X+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must get matched exactly X times; if it refers to
|
|
a (logical) block of signatures then this block must generate exactly
|
|
X matches (with any of its sigs).
|
|
\item \verb+A=0+ specifies negation (signature or block of signatures
|
|
cannot be matched)
|
|
\item \verb+A=X,Y+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must be matched exactly X times; if it refers to
|
|
a (logical) block of signatures then this block must generate X matches
|
|
and at least Y different signatures must get matched.
|
|
\item \verb+A>X+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must get matched more than X times; if it refers to
|
|
a (logical) block of signatures then this block must generate more
|
|
than X matches (with any of its sigs).
|
|
\item \verb+A>X,Y+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must get matched more than X times; if it refers to
|
|
a (logical) block of signatures then this block must generate more than
|
|
X matches and at least Y different signatures must be matched.
|
|
\item \verb+A<X+ and \verb+A<X,Y+ as above with the change of "more"
|
|
to "less".
|
|
\end{itemize}
|
|
Examples:
|
|
\begin{verbatim}
|
|
Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;7374656
|
|
6616e;deadbeef
|
|
|
|
Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737
|
|
46566616e
|
|
|
|
Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737
|
|
46566616e;deadbeef
|
|
|
|
Sig4;Target:1,Engine:18-20;((0|1)&(2|3))&4;EP+123:33c06834f04100
|
|
f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573
|
|
(63|64)61706528;S+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58d
|
|
cf43987e4f519d629b103375;SL+550:6300680065005c0046006900
|
|
\end{verbatim}
|
|
ClamAV 0.96 introduced support for special macro subsignatures in
|
|
the following format: \verb+${min-max}MACROID$+, where \verb+MACROID+
|
|
points to a group of signatures and \verb+{min-max}+ specifies the
|
|
offset range at which one of the group signatures should match.
|
|
The range is calculated against the match offset of the previous
|
|
subsignature. The macro subsignature makes its preceding subsignature
|
|
considered a match only if both of them get matched. For more
|
|
information and examples please see
|
|
\url{https://bugzilla.clamav.net/show_bug.cgi?id=164}.
|
|
|
|
\subsection{Icon signatures for PE files}
|
|
ClamAV 0.96 includes an approximate/fuzzy icon matcher to help
|
|
detecting malicious executables disguising themselves as innocent
|
|
looking image files, office documents and the like.
|
|
|
|
Icon matching is only triggered via .ldb signatures using the special
|
|
attribute tokens \verb+IconGroup1+ or \verb+IconGroup2+. These identify
|
|
two (optional) groups of icons defined in a .idb database file. The
|
|
format of the .idb file is:
|
|
\begin{verbatim}
|
|
ICONNAME:GROUP1:GROUP2:ICON_HASH
|
|
\end{verbatim}
|
|
where:
|
|
\begin{itemize}
|
|
\item \verb+ICON_NAME+ is a unique string identifier for a specific
|
|
icon,
|
|
\item \verb+GROUP1+ is a string identifier for the first group of
|
|
icons (\verb+IconGroup1+)
|
|
\item \verb+GROUP2+ is a string identifier for the second group of
|
|
icons (\verb+IconGroup2+),
|
|
\item \verb+ICON_HASH+ is a fuzzy hash of the icon image
|
|
\end{itemize}
|
|
The \verb+ICON_HASH+ field can be obtained from the debug output of
|
|
libclamav. For example:
|
|
\begin{verbatim}
|
|
LibClamAV debug: ICO SIGNATURE:
|
|
ICON_NAME:GROUP1:GROUP2:18e2e0304ce60a0cc3a09053a30000414100057e
|
|
000afe0000e 80006e510078b0a08910d11ad04105e0811510f084e01040c080
|
|
a1d0b0021000a39002a41
|
|
\end{verbatim}
|
|
|
|
\subsection{Signatures for Version Information metadata in PE files}
|
|
Starting with ClamAV 0.96 it is possible to easily match certain
|
|
information built into PE files (executables and dynamic link libraries).
|
|
Whenever you lookup the properties of a PE executable file in windows,
|
|
you are presented with a bunch of details about the file itself.
|
|
|
|
These info are stored in a special area of the file resources which goes
|
|
under the name of \verb+VS_VERSION_INFORMATION+ (or versioninfo for short).
|
|
It is divided into 2 parts. The first part (which is rather uninteresting)
|
|
is really a bunch of numbers and flags indicating the product and file
|
|
version. It was originally intended for use with installers which, after
|
|
parsing it, should be able to determine whether a certain executable or
|
|
library are to be upgraded/overwritten or are already up to date. Suffice
|
|
to say, this approach never really worked and is generally never used.
|
|
|
|
The second block is much more interesting: it is a simple list of key/value
|
|
strings, intended for user information and completely ignored by the OS.
|
|
For example, if you look at ping.exe you can see the company being \emph{"Microsoft
|
|
Corporation"}, the description \emph{"TCP/IP Ping command"}, the internal name
|
|
\emph{"ping.exe"} and so on... Depending on the OS version, some keys may be given
|
|
peculiar visibility in the file properties dialog, however they are internally
|
|
all the same.
|
|
|
|
To match a versioninfo key/value pair, the special file offset anchor \verb+VI+ was
|
|
introduced. This is similar to the other anchors (like \verb+EP+ and \verb+SL+)
|
|
except that, instead of matching the hex pattern against a single offset, it checks
|
|
it against each and every key/value pair in the file. The \verb+VI+ token doesn't
|
|
need nor accept a \verb#+/-# offset like e.g. \verb#EP+1#. As for the hex signature
|
|
itself, it's just the utf16 dump of the key and value. Only the \verb+??+ and
|
|
\verb+(aa|bb)+ wildcards are allowed in the signature. Usually, you don't need to
|
|
bother figuring it out: each key/value pair together with the corresponding VI-based
|
|
signature is printed by \verb+clamscan+ when the \verb+--debug+ option is given.
|
|
|
|
For example \verb+clamscan --debug freecell.exe+ produces:
|
|
\begin{verbatim}
|
|
[...]
|
|
Recognized MS-EXE/DLL file
|
|
in cli_peheader
|
|
versioninfo_cb: type: 10, name: 1, lang: 410, rva: 9608
|
|
cli_peheader: parsing version info @ rva 9608 (1/1)
|
|
VersionInfo (d2de): 'CompanyName'='Microsoft Corporation' -
|
|
VI:43006f006d00700061006e0079004e0061006d006500000000004d006900
|
|
630072006f0073006f0066007400200043006f00720070006f0072006100740
|
|
069006f006e000000
|
|
VersionInfo (d32a): 'FileDescription'='Entertainment Pack
|
|
FreeCell Game' - VI:460069006c006500440065007300630072006900700
|
|
0740069006f006e000000000045006e007400650072007400610069006e006d
|
|
0065006e00740020005000610063006b0020004600720065006500430065006
|
|
c006c002000470061006d0065000000
|
|
VersionInfo (d396): 'FileVersion'='5.1.2600.0 (xpclient.010817
|
|
-1148)' - VI:460069006c006500560065007200730069006f006e00000000
|
|
0035002e0031002e0032003600300030002e003000200028007800700063006
|
|
c00690065006e0074002e003000310030003800310037002d00310031003400
|
|
380029000000
|
|
VersionInfo (d3fa): 'InternalName'='freecell' - VI:49006e007400
|
|
650072006e0061006c004e0061006d006500000066007200650065006300650
|
|
06c006c000000
|
|
VersionInfo (d4ba): 'OriginalFilename'='freecell' - VI:4f007200
|
|
6900670069006e0061006c00460069006c0065006e0061006d0065000000660
|
|
0720065006500630065006c006c000000
|
|
VersionInfo (d4f6): 'ProductName'='Sistema operativo Microsoft
|
|
Windows' - VI:500072006f0064007500630074004e0061006d00650000000
|
|
000530069007300740065006d00610020006f00700065007200610074006900
|
|
76006f0020004d006900630072006f0073006f0066007400ae0020005700690
|
|
06e0064006f0077007300ae000000
|
|
VersionInfo (d562): 'ProductVersion'='5.1.2600.0' - VI:50007200
|
|
6f006400750063007400560065007200730069006f006e00000035002e00310
|
|
02e0032003600300030002e0030000000
|
|
[...]
|
|
\end{verbatim}
|
|
Although VI-based signatures are intended for use in logical signatures you can test them
|
|
using ordinary \verb+.ndb+ files. For example:
|
|
\begin{verbatim}
|
|
my_test_vi_sig:1:VI:paste_your_hex_sig_here
|
|
\end{verbatim}
|
|
Final note. If you want to decode a VI-based signature into a human readable form you can use:
|
|
\begin{verbatim}
|
|
echo hex_string | xxd -r -p | strings -el
|
|
\end{verbatim}
|
|
For example:
|
|
\begin{verbatim}
|
|
$ echo 460069006c0065004400650073006300720069007000740069006f006e
|
|
000000000045006e007400650072007400610069006e006d0065006e007400200
|
|
05000610063006b0020004600720065006500430065006c006c00200047006100
|
|
6d0065000000 | xxd -r -p | strings -el
|
|
FileDescription
|
|
Entertainment Pack FreeCell Game
|
|
\end{verbatim}
|
|
|
|
\subsection{Trusted and Revoked Certificates}
|
|
Clamav 0.98 checks signed PE files for certificates and verifies each
|
|
certificate in the chain against a database of trusted and revoked
|
|
certificates. The sinagure format is
|
|
\begin{verbatim}
|
|
Name;Trusted;Subject;Serial;Pubkey;Exponent;CodeSign;TimeSign;CertSign;
|
|
NotBefore;Comment[;minFL[;maxFL]]
|
|
\end{verbatim}
|
|
where the corresponding fields are:
|
|
\begin{itemize}
|
|
\item \verb+Name:+ name of the entry
|
|
\item \verb+Trusted:+ bit field, specifying whether the cert is
|
|
trusted. 1 for trusted. 0 for revoked
|
|
\item \verb+Subject:+ sha1 of the Subject field in hex
|
|
\item \verb+Serial:+ the serial number as clamscan --debug --verbose
|
|
reports
|
|
\item \verb+Pubkey:+ the public key in hex
|
|
\item \verb+Exponent:+ the exponent in hex. Currently ignored and
|
|
hardcoded to 010001 (in hex)
|
|
\item \verb+CodeSign:+ bit field, specifying whether this cert
|
|
can sign code. 1 for true, 0 for false
|
|
\item \verb+TimeSign:+ bit field. 1 for true, 0 for false
|
|
\item \verb+CertSign:+ bit field, specifying whether this cert
|
|
can sign other certs. 1 for true, 0 for false
|
|
\item \verb+NotBefore:+ integer, cert should not be added before
|
|
this variable. Defaults to 0 if left empty
|
|
\item \verb+Comment:+ comments for this entry
|
|
\end{itemize}
|
|
The signatures for certs are stored inside \verb+.crb+ files.
|
|
|
|
\subsection{Signatures based on container metadata}
|
|
ClamAV 0.96 allows creating generic signatures matching files stored
|
|
inside different container types which meet specific conditions.
|
|
The signature format is
|
|
\begin{verbatim}
|
|
VirusName:ContainerType:ContainerSize:FileNameREGEX:
|
|
FileSizeInContainer:FileSizeReal:IsEncrypted:FilePos:
|
|
Res1:Res2[:MinFL[:MaxFL]]
|
|
\end{verbatim}
|
|
where the corresponding fields are:
|
|
\begin{itemize}
|
|
\item \verb+VirusName:+ Virus name to be displayed when signature matches
|
|
\item \verb+ContainerType:+ one of \verb+CL_TYPE_ZIP+, \verb+CL_TYPE_RAR+,
|
|
\verb+CL_TYPE_ARJ+,\\
|
|
\verb+CL_TYPE_CAB+, \verb+CL_TYPE_7Z+, \verb+CL_TYPE_MAIL+, \verb+CL_TYPE_(POSIX|OLD)_TAR+,\\
|
|
\verb+CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)+ or \verb+*+ to match
|
|
any of the container types listed here
|
|
\item \verb+ContainerSize:+ size of the container file itself (eg. size of
|
|
the zip archive) specified in bytes as absolute value or range \verb+x-y+
|
|
\item \verb+FileNameREGEX:+ regular expression describing name of the target file
|
|
\item \verb+FileSizeInContainer:+ usually compressed size; for MAIL, TAR and CPIO ==
|
|
\verb+FileSizeReal+; specified in bytes as absolute value or range
|
|
\item \verb+FileSizeReal:+ usually uncompressed size; for MAIL, TAR and CPIO ==
|
|
\verb+FileSizeInContainer+; absolute value or range
|
|
\item \verb+IsEncrypted+: 1 if the target file is encrypted, 0 if it's not and
|
|
\verb+*+ to ignore
|
|
\item \verb+FilePos+: file position in container (counting from 1); absolute value
|
|
or range
|
|
\item \verb+Res1+: when \verb+ContainerType+ is \verb+CL_TYPE_ZIP+ or
|
|
\verb+CL_TYPE_RAR+ this field is treated as a CRC sum of the target file
|
|
specified in hexadecimal format; for other container types it's ignored
|
|
\item \verb+Res2+: not used as of ClamAV 0.96
|
|
\end{itemize}
|
|
The signatures for container files are stored inside \verb+.cdb+ files.
|
|
|
|
\subsection{Signatures based on ZIP/RAR metadata (obsolete)}
|
|
The (now obsolete) archive metadata signatures can be only applied
|
|
to ZIP and RAR files and have the following format:
|
|
\begin{verbatim}
|
|
virname:encrypted:filename:normal size:csize:crc32:cmethod:
|
|
fileno:max depth
|
|
\end{verbatim}
|
|
where the corresponding fields are:
|
|
\begin{itemize}
|
|
\item Virus name
|
|
\item Encryption flag (1 -- encrypted, 0 -- not encrypted)
|
|
\item File name (this is a regular expression - * to ignore)
|
|
\item Normal (uncompressed) size (* to ignore)
|
|
\item Compressed size (* to ignore)
|
|
\item CRC32 (* to ignore)
|
|
\item Compression method (* to ignore)
|
|
\item File position in archive (* to ignore)
|
|
\item Maximum number of nested archives (* to ignore)
|
|
\end{itemize}
|
|
The database file should have the extension of \verb+.zmd+ or
|
|
\verb+.rmd+ for zip or rar metadata respectively.
|
|
|
|
\subsection{Whitelist databases}
|
|
To whitelist a specific file use the MD5 signature format and place
|
|
it inside a database file with the extension of \verb+.fp+.
|
|
To whitelist a specific file with the SHA1 or SHA256 file hash signature
|
|
format, place the signature inside a database file with the extension
|
|
of \verb+.sfp+.\\
|
|
|
|
\noindent
|
|
To whitelist a specific signature from the database you just add
|
|
its name into a local file called local.ign2 stored inside the
|
|
database directory. You can additionally follow the signature name
|
|
with the MD5 of the entire database entry for this signature, eg:
|
|
\begin{verbatim}
|
|
Eicar-Test-Signature:bc356bae4c42f19a3de16e333ba3569c
|
|
\end{verbatim}
|
|
In such a case, the signature will no longer be whitelisted when
|
|
its entry in the database gets modified (eg. the signature gets
|
|
updated to avoid false alerts).
|
|
|
|
\subsection{Signature names}
|
|
ClamAV uses the following prefixes for signature names:
|
|
\begin{itemize}
|
|
\item \emph{Worm} for Internet worms
|
|
\item \emph{Trojan} for backdoor programs
|
|
\item \emph{Adware} for adware
|
|
\item \emph{Flooder} for flooders
|
|
\item \emph{HTML} for HTML files
|
|
\item \emph{Email} for email messages
|
|
\item \emph{IRC} for IRC trojans
|
|
\item \emph{JS} for Java Script malware
|
|
\item \emph{PHP} for PHP malware
|
|
\item \emph{ASP} for ASP malware
|
|
\item \emph{VBS} for VBS malware
|
|
\item \emph{BAT} for BAT malware
|
|
\item \emph{W97M}, \emph{W2000M} for Word macro viruses
|
|
\item \emph{X97M}, \emph{X2000M} for Excel macro viruses
|
|
\item \emph{O97M}, \emph{O2000M} for generic Office macro viruses
|
|
\item \emph{DoS} for Denial of Service attack software
|
|
\item \emph{DOS} for old DOS malware
|
|
\item \emph{Exploit} for popular exploits
|
|
\item \emph{VirTool} for virus construction kits
|
|
\item \emph{Dialer} for dialers
|
|
\item \emph{Joke} for hoaxes
|
|
\end{itemize}
|
|
Important rules of the naming convention:
|
|
\begin{itemize}
|
|
\item always use a -zippwd suffix in the malware name for signatures
|
|
of type zmd,
|
|
\item always use a -rarpwd suffix in the malware name for signatures
|
|
of type rmd,
|
|
\item only use alphanumeric characters, dash (-), dot (.), underscores
|
|
(\_) in malware names, never use space, apostrophe or quote mark.
|
|
\end{itemize}
|
|
|
|
\section{Special files}
|
|
|
|
\subsection{HTML}
|
|
ClamAV contains a special HTML normalisation code which helps to detect
|
|
HTML exploits. Running \verb+sigtool --html-normalise+ on a HTML file
|
|
should generate the following files:
|
|
\begin{itemize}
|
|
\item nocomment.html - the file is normalized, lower-case, with all
|
|
comments and superflous white space removed
|
|
\item notags.html - as above but with all HTML tags removed
|
|
\end{itemize}
|
|
The code automatically decodes JScript.encode parts and char ref's (e.g.
|
|
\verb+f+). You need to create a signature against one of the created
|
|
files. To eliminate potential false positive alerts the target type should
|
|
be set to 3.
|
|
|
|
\subsection{Text files}
|
|
Similarly to HTML all ASCII text files get normalized (converted
|
|
to lower-case, all superflous white space and control characters removed,
|
|
etc.) before scanning. Use \verb+clamscan --leave-temps+ to obtain
|
|
a normalized file then create a signature with the target type 7.
|
|
|
|
\subsection{Compressed Portable Executable files}
|
|
If the file is compressed with UPX, FSG, Petite or other PE packer
|
|
supported by libclamav, run \verb+clamscan+ with
|
|
\verb+--debug --leave-temps+. Example output for a FSG compressed file:
|
|
\begin{verbatim}
|
|
LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression
|
|
LibClamAV debug: FSG: found old EP @119e0
|
|
LibClamAV debug: FSG: Unpacked and rebuilt executable saved in
|
|
/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c
|
|
\end{verbatim}
|
|
Next create a type 1 signature for \verb+/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c+
|
|
|
|
\end{document}
|