Hello, it's me again, if you haven't got bored of me already. Originally this part was supposed to be written by another famous coder, but he didn't manage it because he was short of time. Unfortunately, I'm much, much shorter of time than he is. But never mind: this short article focuses on the principles of heuristics, so be sure to read the main articles about antiviruses as well.
The main reason for heuristics is, as I already mentioned,
to detect unknown viruses, since many new viruses appear every month and it becomes
difficult to keep track of them. It was first introduced by F-Prot (well,
sort of; it is a bit hard to say whether we can call it heuristics), and the first
real implementation came in the well-known TBAV.
First we should define what heuristics exactly is. I'll try it in my own words: a heuristic scanner is a program (an anti-virus, to be more exact) that is able to detect viruses by analysis of their code, that is, by what they do. But deciding whether a given piece of code is a virus is not as easy as it may look, and it is difficult to make it reliable. If you look at viruses and the code they use, and if you look at many, many viruses as the avers did, you can easily spot the things that are common to all of them. These are the beginnings of heuristics: F-Prot used something that the av-community called "heuristic scan-strings". These are short scan-strings, searched for in the whole body, that match typical constructions such as a write command, which typically is mov ah, 40h; mov cx, 1234h (size); int 21h. This is of course only an illustration; it can be coded in many ways, but not so many that most of them could not be covered (using wild-card scan-strings). If several of these scan-strings are found, you can say there is probably a virus. But many regular programs written in assembler look this way too, so to avoid false positives it is necessary to hit many of these strings before reporting a possible virus infection.
Some avers trusted F-Prot's reports of possible viruses, but in the form presented it wasn't very reliable, and moreover it could not detect more complicated pieces, as it was set up for low sensitivity.
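To make the scan-string idea concrete, here is a minimal Python sketch. The three patterns and their names are only illustrative assumptions (they are not F-Prot's real strings), None stands for a wild-card byte, and the threshold of two hits is arbitrary.

```python
# Illustrative "heuristic scan-strings"; None is a wild-card byte.
# These patterns are assumptions for the example, not any real scanner's data.
PATTERNS = {
    # mov ah,40h; mov cx,imm16; int 21h
    "write_file": [0xB4, 0x40, 0xB9, None, None, 0xCD, 0x21],
    # mov ax,3D02h; int 21h
    "open_rw": [0xB8, 0x02, 0x3D, 0xCD, 0x21],
    # mov ah,4Eh; mov dx,imm16; int 21h
    "find_first": [0xB4, 0x4E, 0xBA, None, None, 0xCD, 0x21],
}


def matches_at(data, pos, pattern):
    """True if the pattern matches data starting at pos (None matches any byte)."""
    if pos + len(pattern) > len(data):
        return False
    return all(p is None or data[pos + i] == p for i, p in enumerate(pattern))


def find_flags(data):
    """Return the names of all patterns found anywhere in the body."""
    return {
        name
        for name, pattern in PATTERNS.items()
        if any(matches_at(data, pos, pattern) for pos in range(len(data)))
    }


def looks_viral(data, threshold=2):
    """Report only when several different scan-strings hit (fewer false positives)."""
    return len(find_flags(data)) >= threshold
```

A single hit is deliberately not enough here, which mirrors the point above: an ordinary assembler program may well contain one write call, so only a combination of hits is reported.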
TBAV ruled the world
for a short time at least. Franz Heldman presented a brand new technology called heuristics, and it looked excellent (but only at first sight). At first sight everybody stared in amazement: avers because they had never even thought about such things (many of them only do their job without any real inventions), and vx-ers because it was able to detect even viruses they were only going to write. But reality was a bit different: for a long time avers didn't count TBAV's heuristics among scanning methods at all and reported heuristics as unreliable (mostly because they weren't able to replicate this technology even in as simple a form as TBAV had). Virus writers started to look for ways to fool TBAV (as soon as they stopped being afraid of it).
Let's see how Tbscan works: it uses passive heuristics (structured disassembly) to analyze instructions. The main aim was the same as before - to detect the usual code sequences found in viruses. Tbscan marked them with letter flags for each file, and over the years of TBAV's development there were so many flags that they covered the whole alphabet plus some other characters. Starting from the entry point, Tbscan checks instruction by instruction, judging them and marking known code sequences. But that's not enough, for sure. Jumps are also followed, and on conditional jumps both paths are disassembled. However, as the disassembly is done in a single pass, simple tricks that break up instructions and the like can make Tbscan lose track of the code. It is also easy to fool Tbscan by doing things in an unusual way or indirectly. As it only disassembles the code, even a simple mov ah, 3fh; inc ah was enough to do so.
Tbscan also has many false positives due to its not fully reliable technology - when Tbscan loses track of the code flow (which happens quite often), it raises many flags on the garbage code it finds. There was quite a long database of files known to be false positives - a kind of anti-scan-strings: if one is found, heuristics is not performed on that file.
TBAV's main weak point is that it is so transparent to everyone - even virus writers can easily guess how it works, and how to avoid being caught. As soon as TBAV became popular, nearly everyone started to list being tbav-proof among their features. All that is needed is to run Tbscan periodically during programming to see when it displays its flags. Well, the main key points for keeping TBAV away from you are to use good encryption (that can't be passed by Tbscan's decryptor), or to do things less plainly than usual. Tbscan detects only the usual schemes, so simple tricks like and-s or add/sub in comparisons will work. However, Tbscan is out of the game today. There were also some imitations, such as the German 'Suspicious' (as I remember), but they all ended up as unsuccessful as TBAV.
To fix these disadvantages, it is possible to partially find out the values of registers by semi-emulating the piece of code before a key instruction (e.g. int 21h). Only registers are emulated, and memory is accessed only for reading (so as not to damage memory). This is also used, for example, by active heuristic scanners to analyze code they can't reach (we can call it local semi-emulation). At this stage doing mov/inc will not help, but using rot-s or and-s instead of a comparison will surely fool these algorithms.
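Local semi-emulation can be sketched in a few lines of Python. This is only a toy under my own assumptions: it understands just four instruction forms (mov r8,imm8; mov r16,imm16 for ax/cx/dx/bx; inc r8; int imm8), tracks byte registers only, and gives up on anything else. The function name is made up for the example.

```python
def ah_before_int21(code: bytes):
    """Walk a short code fragment and return the AH value seen at each
    int 21h (None when AH is not known at that point)."""
    known = {}   # 8-bit register number (0=AL,1=CL,2=DL,3=BL,4=AH,5=CH,6=DH,7=BH) -> value
    found = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB0 <= op <= 0xB7 and i + 1 < len(code):        # mov r8, imm8
            known[op - 0xB0] = code[i + 1]
            i += 2
        elif 0xB8 <= op <= 0xBB and i + 2 < len(code):      # mov ax/cx/dx/bx, imm16
            reg = op - 0xB8
            known[reg] = code[i + 1]        # low byte (AL, CL, DL, BL)
            known[reg + 4] = code[i + 2]    # high byte (AH, CH, DH, BH)
            i += 3
        elif op == 0xFE and i + 1 < len(code) and 0xC0 <= code[i + 1] <= 0xC7:
            reg = code[i + 1] - 0xC0                        # inc r8
            if reg in known:
                known[reg] = (known[reg] + 1) & 0xFF
            i += 2
        elif op == 0xCD and i + 1 < len(code):              # int imm8
            if code[i + 1] == 0x21:
                found.append(known.get(4))
            # the interrupt handler may change AX, so forget AL and AH
            known.pop(0, None)
            known.pop(4, None)
            i += 2
        else:
            break   # unknown instruction: we lost track, stop here
    return found
```

With this, the mov ah, 3fh; inc ah sequence (bytes B4 3F FE C4 CD 21) is resolved to AH = 40h at the int 21h, which is exactly what pure disassembly misses.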
Improving heuristics - emulation
There were a lot of big words about how to do heuristics the real way - to see things as they really happen in the file, without actually running it. Someone might guess that single-stepping could be used, but in reality it was never used for this. It is equivalent to running each file while checking what you are executing. But such automatic debugging (it is something like that) can't be used because of the many protective envelopes that are designed to crash debuggers. In other words, single-stepping was never used for active heuristics, as it could crash several times per scan on the files of one hard disk. I remember, for example, a dedicated scanner for EMM1:Level_3 that used single-stepping. It hung several times in my utilities directory, ran many files (even pkzip), etc.
In fact, only emulation can be used for active heuristics - that is, to check what a file is really doing and to decide whether it is viral code or not. From this point of view there are two primary objectives. The first is a bit like before - to find suspicious code constructions - but it is less important now. The more important one is to monitor the activities that are really performed. Let's imagine what a virus usually does - it tests something, becomes resident (if it is a resident virus), and infects files on some certain activity in the system. Becoming resident, for example, can easily be caught by active heuristics even if it is done in an unreadable way - because the scanner detects direct modifications (let's talk about DOS now) of 0: or of MCBs. But what really defines a virus is the infection of other files - if the emulated program searches for executables and modifies them (in order to replicate), it is a virus nearly for sure. If the virus only installs itself into memory, simple tests are run in the virtual machine - a file is run (and checked for infection), or opened for r/w, or opened on a removable drive (likely copied to a floppy). This usually reveals any virus. Now you can surely guess some tips for how to fool them. But we have to continue: because active heuristics is not 100% stable, there is usually still an engine that searches for typical constructions with local semi-emulation (mentioned before). The capabilities of a virus can be detected this way too - even if they do not show at first sight (or in emulation ;-)
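The "monitor what is really done" part can be pictured as a small judge that reads the activity log of an emulator. The following Python sketch is only an assumption of how such a log might look: the event names (write_mem, modify_mcb, find_file, open_rw, write_file) and the verdict rules are invented for the illustration and are not taken from any real scanner.

```python
EXECUTABLE = (".com", ".exe")


def judge(trace):
    """trace is a list of (event, argument) pairs logged by a hypothetical emulator.
    Returns (verdict, flags)."""
    flags = set()
    opened_rw = set()
    for event, arg in trace:
        if event == "write_mem" and arg < 0x400:
            # linear address inside the interrupt vector table at 0:
            flags.add("hooks_interrupt")
        elif event == "modify_mcb":
            flags.add("resident")
        elif event == "find_file" and arg.lower().endswith(EXECUTABLE):
            flags.add("searches_executables")
        elif event == "open_rw":
            opened_rw.add(arg.lower())
        elif (event == "write_file" and arg.lower() in opened_rw
              and arg.lower().endswith(EXECUTABLE)):
            flags.add("modifies_executable")
    if "modifies_executable" in flags:
        verdict = "virus"        # modifying other executables is what defines a virus
    elif flags:
        verdict = "suspicious"   # e.g. only went resident: more tests are needed
    else:
        verdict = "clean"
    return verdict, flags
```

A "suspicious" result corresponds to the case above where the program only installs itself in memory, so the scanner would go on to run or open files in the virtual machine and judge the new trace.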
Now let's have a look at the limits of emulators - this is the primary subject if you want to be undetectable by active heuristics: there is a virtual machine. It is their main advantage and disadvantage at the same time. Of course, it is not the V86 virtual machine you are probably thinking of, but an emulated computer:
Currently the leading heuristic scanners are NOD/iCE32 and Dr.Web - both of
them use the technologies mentioned (with the limitations mentioned as well), but
only for DOS executables (how lucky). At the present time, neither of them has
a 32bit emulator (I mean not one written in 32bit, but one fully emulating 32bit), and that's why
they can't perform active heuristics on Windows executables (PE/LX), and
viruses for Windows are not much affected by their heuristic power.
For 32bit Win executables they use only the passive part - i.e. disassembly
and searching for suspicious code constructions (typical viral sequences).
These two antiviruses have far fewer false positives, as they need exact actions to judge a file as infected (but they use anti-scan-strings as well, because there are always false positives). But the most amazing thing about them is the quite detailed description they report for an infected file (especially NOD) - they can find out whether the virus infects boot sectors as well as com/exe files, whether it infects sys files, whether it is resident, stealth, polymorphic, etc. You can see it nicely in NOD/iCE, in which the scan-strings can be turned off to use heuristics only. The hit-rates when running heuristics-only are quite impressive.
Other heuristic scanners
There are of course some other heuristic scanners in the world, but they are less important. AVP has a kind of active heuristics too, as part of its generic decryption engine. But as AVP is the highest-standard antivirus in the world and has a really large number of scan-strings, its heuristics can be set up for lower sensitivity, which also brings fewer false positives. The heuristics is also less visible, because it reports an unknown virus only rarely.
The situation is similar for Dr.Solomon's Toolkit (not Solomon's any more, of course). In our tests we modified the toolkit's virus database to contain no scan-strings, in order to test heuristics only. The result was as expected: less than 70% (it varies slightly). You really don't need to be afraid of this heuristic engine if you can beat those mentioned above. Solomon added only a very simple one (like AVP) to get at least some hit-rate on unknown viruses as well. But Solomon's policy on scan-strings was to add anything, no matter whether it is a virus - so they have the biggest hit-rates without any thinking about it (this is why I don't like it).
The worst heuristic scanner I know is AVG. Some time ago it had the same weak point as Tbscan, and more - it showed the emulation process (with optional step-by-step confirmation), so you could see the code and registers - and easily test the bugs in it :) It was shown only to impress the audience, because it was really buggy and useless. It was improved several times, but without any reasonable result - the first versions were extremely slow and buggy. Afterwards they switched to a new scanning/heuristic core (developed by someone else who joined their team), which is a bit faster and better, but still pretty weak.
To finish the overview of the others I have to mention NAI as well. But that is rather easy, because NAI has no technology of its own (or really very little - only some programmers who downgrade bought technology by putting it together with others). NAI buys anything that can be bought; currently, as far as I know, they are using the Dr.Solomon engine, with roughly the same capabilities as Dr.Solomon. Maybe they'll try to buy another heuristic scanner... Who knows...
Now we come to the second, more or less important chapter of this article. Heuristic cleaning was first presented by TBAV, in a program named Tbclean. But first we have to explain what heuristic cleaning is: cleaning a virus from a file without knowing the virus exactly, just by tracing it or by more complex automated analysis. Heuristic cleaning is less important than scanning, because it is much less reliable and also much less used. Moreover, it is the hit-rate that is measured in test tables, not these high-tech features.
Tbclean does it in the simplest way. As TBAV lacked an emulator engine, Tbclean uses single-stepping to trace the program. You can surely guess that it will not work in many cases. Of course - it crashes (the whole computer) on protective envelopes, and sometimes on ordinary programs too. But sometimes it works. The principle was simply to trace the virus, because a virus, when it does the usual things, reconstructs the host body and passes control to it. The idea is to allow the reconstruction (but disallow installation, if possible), and on the jump to the host body to make a snapshot of the reconstructed file. Passing control back to the host was detected by a jump (or ret or whatever) to offset 100h (for com files), or by a far jump (or retf) for exe files. Nearly every virus ends this way. All that is needed then is to write the image back to disk and the work is done (it is very similar to exe-unpackers that use tracing).
To prevent installation, Tbclean for example returns version 2 for the GetDosVersion call, which most viruses refuse. Simple and effective. But there were (are) also many other tricks. Some of them you may guess if you want to write a similar cleaner: preventing some instructions (like cli, i/o port access, hard stack modifications) and filtering interrupts (they are redirected so that nothing gets accidentally infected - int 21h, for example, usually returns an error (carry set)). A big problem is to find out where to cut the file. The file can be easily reconstructed, but the cleaner doesn't know where to cut it. There are several possibilities:
Tbclean was the first, and it was really simple and buggy. It crashed very often, in many cases the reconstructed file was corrupted, and moreover - as it became pretty famous - there were tricks like the one in the Varicella virus, which was really executed when it was being cleaned by Tbclean (this virus made Franz Heldman really angry ;) But these days it is forgotten, as is the whole of TBAV.
Emulation makes it reliable
Yes, that's right. With the new generation of heuristic scanners (Dr.Web and NOD/iCE) there are better possibilities to clean a file - and not to crash. The principle remains the same - emulate the virus as far as possible to find out as much information as possible. This is some deep, heavy voodoo magic of the avers - that's why only very few of them can do it. Of course, exact disinfection (like AVP has) is much better and much more valuable, but to impress us (vx-ers) and other avers, this magic is here ;-)
At the present time, however, it is functional enough only in NOD/iCE (pretty impressive, but due to the mentioned limitation of its emulator it doesn't work for 32bit (Win) files). Heuristic cleaning is also used in AVG, but I would say the same as about their heuristic scanner: it was really poor some time ago, and after the core upgrade it is still not good enough. Finally, there is Dr.Solomon (which is now also used as the core engine in NAI's products), but it doesn't perform generic cleaning at all (I mean cleaning of unknown viruses). As far as I know, it is only used to disinfect viruses that are positively known to be cleanable. That's why you will not notice any heuristic cleaning at all. Given the above, I will focus primarily on NOD/iCE, as it uses imho the best technologies of all those mentioned.
The generic idea is the same - to emulate the virus to let it reconstruct the host file,
and to use the retrieved information to repair the host. All is done in an emulated virtual
PC, with emulated disks. The simplest way of disinfection is to save the
reconstructed file and to cut out the virus. As there are several ways of
attaching a virus to a file, the technique used to
cut it out may differ slightly. A clever technique, if the virus works fine in the emulator, is to virtually
infect several virtual files (goats) to find out the size of the virus and where
the virus is stored (by diffing with the original file). This way they can guess
the infection methods of the virus nearly exactly. For combined boot/exe/com viruses,
as they usually install themselves into the mbr, a virtual reboot is done to
activate them in the virtual pc after they install into the virtual mbr. (oops, so
virtually ;) After guessing the size and knowing where the virus stores itself
in the file, the real infected file can be run (virtually, again) to repair the
host, find out the entry-point, and cut it using the information retrieved before
(if it is not available, some alchemy is done or virus body is left
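The goat-diffing step described above can be sketched in Python like this. It is a simplified guess at the idea, not the method of any particular scanner: the function name, the returned fields and the assumption that at most the first 16 bytes of the host are patched are all mine, and only plain appending and prepending infections are recognized.

```python
def locate_virus(goat: bytes, infected: bytes, patched: int = 16):
    """Compare a clean goat file with its (virtually) infected copy.
    Returns the guessed infection type, virus offset and virus size,
    or None when the layout is not one of the two simple cases."""
    growth = len(infected) - len(goat)
    if growth <= 0:
        return None                      # nothing was added (e.g. overwriting)
    body = goat[patched:]                # host body, skipping a possibly patched header
    at = infected.find(body)
    if at == patched:
        # host stayed in place, extra bytes were added behind it
        return {"type": "appending", "virus_offset": len(goat), "virus_size": growth}
    if at == patched + growth:
        # host was shifted by exactly the added size
        return {"type": "prepending", "virus_offset": 0, "virus_size": growth}
    return None


def same_everywhere(results):
    """Trust the guess only if all goats gave the same type and size."""
    if not results or any(r is None for r in results):
        return False
    first = (results[0]["type"], results[0]["virus_size"])
    return all((r["type"], r["virus_size"]) == first for r in results)
```

Running the comparison on several goats of different sizes, as the text suggests, is what separates a constant-size virus from one whose size varies between infections.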
Put this way it looks simple, but it isn't. I guess heuristic scanning and heuristic cleaning are the top high-tech technologies avers are using now (also refer to my overview of antivirus methods). Even if the principles and ideas are simple, there are lots of things to be done to make a virus work in a virtual pc, so the complications you might prepare for heuristics (cleaning especially) might be rewarded by your success. Good luck!