Tie your seatbelts, prepare for long adventure through virus history. I will list basic principles of war between viruses and antiviruses to show you how the story was going on. Most probably I will not be able to keep it in chronological order but I try to use logical order, to show main technologies and counteractions on both sides.
The story begins long long time ago (sounds like a fairtale, isn't it?) when first viruses were written. Doesn't matter which one exactly it was, the more important is that some of them appears on user's computers. At that time this war begins and it is continuing and growing up to now.
No matter how big invetion were first self-replicating algorhitms, viruses are not first programs that were able to do so. It started with worms and other hardly-clasifieable pieces of code a time before virii. But viruses make change and normal people having computers becomes infected.
The very first viruses all follows one of two basic schemes. File and boot viruses and some of them survives up to now. Old boot viruses are quite simple: they are spread in boot sector of floppy discs, and on booting from such a floppy it copies itself into partition table and becomes resident (useful if there is no hard disc in computer). Once it is resident, it infects bootsectors of all floppies beeing writen. Thats all folks, all it fits into one sector. Michelangello or Stoned fits into this class.
File viruses like Jerusalem uses simple appending parasitic infection, infects com or exe files (or both of them). When infected file is executed, it usualy becomes memory resident and infects all executed files since. Some of them even don't have double-infection check (like Jerusalem) and often runned programs can become quite long. I think all you know basic principles so I'm not going to explain such a trivial things.
At that time situation was quite easy. May be some of you seen, for example scan19 - yes it detects 19 viruses! There were really few viruses at that time. How to deal with tivial viruses? Well, first antiviruses were really stupid and slow. Any program is and unique sequence of instructions - that something what every programmer understands. But what one (aver) can do if he (usualy) doesn't understand file structures? The result were simple algorithms very simmilar to searching for text in text editor - a whole(!) file is checked for specified string. This is origin of name "scan-string" which is a fixed sequence of bytes choosed from virus body. Moreover, some of first antiviruses scans file as many times as many strings they have. One may guess it is quite unefficient and slow. Sure! But at that time disks were really small (and computers slow as well). This technology was biggest invetion in order to fight viruses ever, I can say. It survives up to today but in modified forms - as viruses are still using fixed code (plain or encrypted or whatever) and they can be easily identified this way.
Antiviruses are bussiness. A big bussiness if one have a look at NAI. Beginigs
were quite different, as many independent (free) antiviruses were available
just to help people. But one can't stay competition with big money - look
at Microsoft to see why. Today, to keep track of a big number of new viruses
a many peoples are needed to work on antivirus for a full-time, and everyone
needs money. And people have to buy (or support) antiviruses as they affraid
of virus. Many people around the world things that viruses have to
destroy something - thats why they don't like viruses. But noone cares
that Windows crashes caused much more destruction than viruses. Because
it is normal. Weird, isn't it?
Well, this fear of viruses was started with biggest computer virus hoax ever, initiated by McAffee - in order to make money, of course. It was Michelangello couple years ago, may be some of you remember it: McAffee informed about upcomming big computer dissaster caused by extremly dangerouse virus Michelangello. They estimated 20 milions of destroyed computers at activation date. 20 milions were too big number even in those days as there weren't as many computers around the world as today. This hoax comes from publisher to publisher and it grew bigger and bigger - and information about this computer apocalypse appears in many countries. I remember dady of my schoolfellow forbid him to turn on his computer (Sinclair ZX Spectrum with 8 bit Z80 cpu!) because a virus can came to is through network (power network of 220V!) and it can be destructed. Wow! Unbelieveable, isn't it? Even more that repair disc destroyed by Michelangello tooks few seconds with diskedit. But noone mentioned it in this hoax, of course. As activation day passed, everyone understoods I hope, too few computers were destructed (comparing to 20M) but this hoax succeed: people starts really affraid of viruses, and antiviruses are sold worldwide - they become a big bussiness.
Old techniques of scanning (scan-strings)
I already mentioned first scanning methods, based on scan-strings (sequence of bytes selected from virus body). If they are found in file, it is marked as infected. Some of first antiviruses scans whole file for such a string, but later on they scanned only some specified area usualy used by viruses: begining of the file, end of file, and/or around exe's entry-point or com's first jump target. Usualy aproximately 6kB were (or are) scanned - it is quite little to load it fastly and quite enought for most of viruses - at least part of body should be there. Scan-strings are checked at every position in loaded buffer, scanning is at suitable speed.
Here should I put little discussion about scan-strings and how to choose them: at first, I will mention other forms of scan strings later on. Choosing scan-string is not as trivial as one may guess. At first, such a string have to be in loaded buffer in any case. Scan string should be as short as possible (in order to save space and scanning time), but as long as possible at the same time (in order to detect only this virus with no false possitives). This sequence should be typical for virus (preferable this virus only), and not to be found in any other regular file. If does, it is called false possitive identification. It is rather difficult to have no false possitives with many short strings - as there are many programs and one simply can't have them all.
An example of really bad scan-string is e.g. E800005x, it is short, really typical for viruses. All you know basic opcodes I assume from head, but I'll translate it to: call $+2, pop xx. But it can be found nearly in any virus, and in many regular programs written in assembler. Hope you got the point.
Another discussion is if this string should identify more viruses at once, or one-and-only. If it identifies for example huge part of Jerusalem family, it is advantage that it may identify also new mutations. But it is not suitable identification for cleaning, as they partialy differ from version to version. Today's trend is to have as exact identification as possible. But even today it is not possible. This leads to another extrema, typical for Dr.Solomon's Toolkit: to identify versions even it there are not. An example: virus named Z (pure fiction), there is only one version in real, but in aver's collection it is separated into Z.A and Z.B, and solomon identifies them as two versions but most of others not. But if you take Z.B infected file and replicate it, it is caught by solomon as Z.A. What's going on? Well, they have selected scan-string also from host file. Usualy noone takes care as avers in many cases are not replicating files and only some selected samples are travelling round the world - so they may get 100% hit rate on some virus - but only in virus collections avers have and not in real life. Remember this: they have only few samples (usualy), and whey are not active (not executed). The Tremor story later on tells you why.
Early battles - fooling simple scanners
Situation stabilizies: there were some viruses, but avers weren't able to beat them completly. Moreover, once it is bussines, they don't want to win this battle totaly as there will be no war anymore and no bussines anymore. Think with me: scanners were available and finding viruses at suitable rate. Of course there are still peoples not using updated scanners all the time so viruses can survive, but new viruses once they are found are added to scanners and can be easily identified. Too bad for virus-writer spending days or weeks to create nice piece of code to be breaked in a minutes. You have to invent something. There were two answers - stealth and encryption.
Now let's think how scanners works in that time: scanner runned on computer infected with virus opens each file and checks it for some id string. How you can hide? You can become "invisible" once you have total control over computer (elementary under DOS) and hide files beeing scanned. This is called stealth (due to U.S. Bombers B-2 called "Steath" - invisible for radars) and we may talk about two implementations for files: disinfection on-fly (each opened file is disinfected and again infected on closing) and true stealth (all file operations are checked and modified). And for boot viruses a sector redirecting is used.
Computer is infected with stealth virus. Virus is active in memory, user runs his favorite scanner and it is searching for strings in files - but as it opens files with viruses, it can't find anything as virus hides itself. Nice, isn't it?
Imagine you are an aver (as you have think for both sides, otherwise you can't rule this war) - what would you do with stealth viruses? Simplest answer is to scan memory as well, and if virus is found, ask user to boot from clean floppy and run scanner this way - then there is no virus in memory and all is as before with regular viruses. Easy easy.
Memory scanning in old times was simmilar to file scanning. All memory is checked for same strings as files, if found - a virus is reported in memory. To speedup the things some antiviruses doesn't scan whole memory but only possible locations - they may skip ROMs, antivirus itself, etc. But it differs from one implementation to another. Memory scanning is not a big technical mirracle.
Once virus is found, some antiviruses were able to patch virus to be inactive and to continue without need to boot from clean floppy. But due to many viruses appearing later it is not usual to do so today as there are too many viruses and you can't write such a routines for every of them. AVP, for example performs such a activity even now, but only a for few most common viruses. However it is quite userful for lazy users. Inactivating can be done easily by replacing virus handlers with jump to original entry-point of hooked interrupt. Also usualy a virus body is erased (except jumps of hooked interrupts in order to keep interrupt chain functional) not to report virus again. It must be done in interrupt-shield (cli) of course to protect for asynchronouse break-downs.
Another idea how to partialy inactivate virus in memory presented by some antiviruses is known-entry-point methhod. There are two basic interrupts under dos: int 21h for files and int 13h for sectors (boot viruses). If you know the original entry point (you know this version of dos or you have stored this entry-point at installation process) you may find out if some virus is in memory and you can access functions without virus' influence. Of course, for int 13h you must check not for real interrupt pointer as it points to DOS, but for internal pointer in DOS that points to ROM as boot viruses are loaded (and hooks int13) before DOS does it. But this technology in general has many weak points and it is forgotten today. As even legal programs may redirect those interrupts because Microsoft designed its "OS" this way. For example caches, networks, etc redirects this interrupt. Novell netware for example uses redirecting int21 instead of MS's recomended network redirector facility for network implementation (because it is implemented in versions 4+). If you call int21 entrypoint directly you can't scan Novell's disks. This technology caused many crashes and is unusable in generic case, you may check my another article about this: why not to use direct disk access which deals with these things.
Once stealth viruses can be found in memory, another tryies comes with encryption expetiments. It started with first encrypted viruses that had main virus body encrypted but there must be at least short decryption routine. And this routine is still a fixed sequence of bytes - and it can be identified with a scan-string. One may guess there is no improovement. Actually, not a big one, but it starts development in this direction.
Situation complicates a bit. Avers are forced to you one scan string, for example only fixed 16 bytes of decryptor. Btw: some stupid avers choosed scan-strings from virus body - e.g. xored each time with another value, so they were able to catch only samples they have, but nothing else :-) Well, let's think about simple xor routine, quite fixed, however there are several variable bytes: encryption constant (let's talk about one byte) and starting offset. As they are not at the same place, the 16 bytes of decryptor (pure example) is broken into 3 chunks of fixed bytes, biggest of them let's say 6 bytes long. And avers have a problem: 6 bytes are really not enought for scan-string, as they are not absolutely unique - part of unvaluable loop can be found in other programs (see discussion on scan-strings above). Oops, how to deal with it? (Think once again as aver) What would you do? Once you have some technology implemented, functional and tested it is best for you to use it at maximum. Scan-strings ... well, how about wildcards? Thats it: all you need is to have one-byte substitution like '?' in shell patterns. In this case you can have still 16 bytes long scan-string with 3 variable bytes. It fits the requirements and all is as before - you have a scan-string to identify virus, all is okay. The most important is they were able to deal with it, but it tooks some time - and it gives viruses possibility to be spread. This is first implementation of wildcards in avir's scan-string history, but not last change in scan-string methodology of course...
Another problem that appears here is encryptor vs. body dilema. Once it identifies virus by encryptor only, it can't make a difference between versions, moreover it can't make difference between different viruses with same (or roughly same) decryptor. Well, cleaning problem can be solved by easy de-xoring by cleaning routine - you must to do so if you want to clean encrypted virus - and you can check the difference after decrypting. But this is important change in methodology - as there no exact identification before cleaning and identification must be done once again at cleaning process in different conditions (a cleaning routine or scanner executed once again can do it). This problem still remains and I will return to it later on with MtE.
Avers handled encryption with wild-cards, you have to think about something new again, unless you want to be caught in a days. Once virus have some simple encryptor, you can improove it a bit: you can increase variability not to be handled by '?' wildcards by inserting of nop's or any simple junk instructions. Then your decryption instructions are not at fixed distances and simple wildcards will but be able to handle them. For example, if you have two fixed instructions together, a scan-string can be choosed from both of them. But if you insert 1-5 nops, scan-string with '?' will not deal with it (unless there are 5 scan-strings ;-) Simple, and it can't be handled by current methods.
How avers can find such encryptors? They implemented another type of wildcard for it - '*' equivalent for variable number of random bytes. It tooks some aditional work but it as handled. Depends on implementation how many random bytes they allow - if it is fixed, or a limits are included in scan-string, or whatever. Scan-strings becomes differ from avir to avir. They were still able to handle all viruses with scan-strings but there become a big number of strings to be used that slows down scanning itself (today it looks like a kids game but viruses were at much the lower level than now too). Some avirs starting using some hierarchy in strings, methods of strings and substrings (smaller set for generic identification and if found, more detailed set), pre-sorting of strings into radix-tables, etc. It depends but all of them follows basic principles and fulfills the requirements.
Interesting idea to speed-up scanning process is single-point scan-string, checked at fixed relative offset to some important file position (e.g. entrypoint, file start or file end). Such a string can be shorter, as it is checked only once at fixed offset (comparing to strings checked in whole loaded part of file) that decrease possibility of false attacks (and it saves memory as well). It is much faster to scan for such a strings, and it is easier to distinguish between versions. If a single-point string is well choosen it can be only 4-6 bytes long, comparing to 12-18 bytes of regular string.
Way from variable encryption to metamorphism
Once you are modifing a decryptor with some trivial junk instructions (there is no reason to put there some harder one, as all was needed to beat fixed strings, and nops can do that as well as other instructions) you can do even something more. Scan string is fixed sequence of bytes, but if you change and indexing register, it becomes different sequence of bytes. Decryptors started to change in every infection, changine indexing register, decrypting instruction, loop method, etc. Encryption scheme is pretty visible, however it slightly increases byte-level variability up to level when even wild-card scan-strings can't be used at all, or they can't be used at suitable reliability.
This is something we can call variable encryptors or metamorphism - everyone call it in different way, avers clasify it even as a low polymorphical engines in order to show how clever they are. However, now there were no matter of junk instructions (are there or not) once valueable bytes of decryptor instructions can't be checked. It was presented in many forms in viruses and it requests new answer from avers.
is the name of technology they present. As tries with mask for scan-string (to filter-out part of byte beeing variable) doesn't show suitable results, a something new had to be found. Scanners started to use (parallely) short routines to distinguish if piece of code is a known decryptor or not. It checks for some code sequences or forms, if it fits hard-coded requirements, file is reported as infected by virus. Usualy they had as many algorithm routines as many decryptors they want to recognize. As scheme of encryptors they are checking for follows really easy rules, it can be tested with satisfieable results for positive infection.
Simple encrypted viruses were checked this way. But most of top avirs are not using this for trivial virii now (some of them does, i.e. Avast! which always (ok, usualy) rates in VB's 100% award group - but you can see tests - it can't identify even simply encrypted viruses exactly). So most of top avirs are using some kind of tracing (i.e. emulating) because it is required today to handle many of complicated viruses, in some sort of generic decryptor - routine which is able to decrypt simply encrypted viruses (or more complicated, it again depends on implementation).
It was another interesting things antiviruses offers in old times - may be some of you remember for example TNT Antivirus (it is gone) that does it. Functionality is simple - viruses usualy uses some marks to tag which file was already infected, not to infect it again. (all this is nearly same for boots/mbrs). All you are using some variable set in file, or virus body (some bytes) already found in file, or changing time/date of file. By inoculating those atributes are set and virus will not infect it again. Sounds nice, but unfunctional in general :) In that time there were not as many viruses, but it becomes imposible too - you simply can't inoculate files agains all viruses. If they are checking for seconds of modification there can be two different viruses that set it to two different values, so you can't cover both of them. Yes, viruses aren't testing files only for their flags, but for some limits too. But some of them you can't fake - for example some values in exe header, or overlays (program might become unfunctional).
These are the reasons why it can't be used for large number of viruses or for all viruses - it can be done for one or some small number of viruses. Moreover, noone todays spends a lot of time with analyzis of viruses today - most of them are analysed in a short time, and you have to know them completely to do inoculation, otherwise it may damage inoculated files. Well, in other words I don't think there are any reasons to take care about inoculaton today.
"The Final solution"
This way some av companies called their antivirus systems another time ago. They presented "the final antivirus that can deal with every virus without knowing it". Sounds good, isn't it? And it is more-less true. Do you have any idea what is it? Well, checksumming, thats it. Idea is simple, and it works in many cases: all files that can be infected are chekcsummed (some kind of crc is calculated), plus filesize, some bytes from header, entrypoint, etc are backuped. Then, if virus infects some file (it must not be stealth) a change in lenght or contens is detected. First checksummers were really slow, as they checked crc of whole file, and it takes some time to load it. But it can be speeded-up rapidly by checksumming only important areas (header, entrypoint, fileend) with same success. Well, once some change is detected, file can be repaired by trying some of available repairing schemas (typicaly there are only few of them how viruses inserts itself into file) and if result of some of them matches original crc-s, file is successfully repaired.
Sounds nice, but it has several problems (lucky, lucky): at first it can't even detect stealth viruses. But if they are not in memory, they are still valueable. Another big problem are lazy users - because most of them are using (and downloading) antivirus only when they have some suspect or if they are really infected - and there is no sense to make crc snapshot of infected files ;) And finally, there are still viruses (and this is how you can avoid checksummer's success) that are not infecting files standardly, or modifies some bytes deep in host code, or whatever that doesn't match implemented schemas.
Checksummers didn't get a big success for these reasons, but they are still useable in many cases and even more, with combination of heuristical cleaner they can be more efficient. But there are still lazy users which are not using antivirus until they are infected. Because of it this can't be a really big weapon against viruses in global. But there are still antiviruses are using it, and can reach a big efficiency of detecting and cleaning.
MtE breakthrough - polymorphism
Dark Avenger, world most famouse virus writer from Bulgaria, become famouse mostly because of a 3kB long object file he released. It is known as MtE 0.9 beta (short name of Self-Mutating Engine) which made many avers not to sleep for many nights. This smashing breakthrought was many times plagiated by some virus coders, but I think it was never (or nearly never) as good as MtE. What it was? Imagine situation of scanners before: all were based on scan-strings with wildcards, at most some easy checking routines (usualy not). But Dark Avenger informed the world in FidoNet message-group (I don't think that some of current guys on scene remembers fido) about his library that that can encrypt virus in 4.2 bilion different ways (4G you should understand) that can beat all scanners. Moreover it was real.
MtE started new era called polymorphism. It was able to generate a decryptor containing many instructions withou visible schematics in it. Random maps of registers, several accessing modes, fake codeflow alternatives, all was so unusual. Only schematic thing was end of loop - usualy dec/dec/jnz sequence, as avers decided there is always this sequence. Most of them thinks it also now because Bontchev said so, but there isn't :) I got this result on thousands of generated samples I made - with some probability it creates other loop instructions sequence.
MtE 0.9 was distributes with sample virus - non-resident com/exe infector, which was many time patched by lamers that are not able to write its own virus (with MtE library or not) and many very-simmilar viruses with MtE appears. A usual name of sample virus is MtE:Dedicated, because it contains a string: "This virus is dedicated to Sarah Gordon who wanted to have a virus named after her." (hope I remember it right). Here we have - famouse and hated (and foolish) Sarah - even guys from av scene doesn't like her theoretical stuff, but they can't say it clearly as we can :) She became famouse (except it is a woman ;-) due to her investigation about virus writers and their origins. Funny but unusable and she sure hopes it is forgotten now ;) Well, but back to MtE: Also library MtE 1.0 appears in the world, but it was nearly forgotten as it doesn't bring new features, and most viruses are using original version 0.9.
Algorithmic scanners once again
MtE goes above the limits of old antiviruses and presents some completely new idea they have to fight with. Some unreliable detectors appears that checks for some secondary flags, like entry-points, or some code-sequence, or file-tagging, but they weren't quite functional. It tooks rather long time (months!) until a good detector was written and build into antiviruses. Partialy at that time it tooks many days until virus can get from country to country - unlikely today. But no matter of that there were a big compentition between antiviruses to catch all samples of MtE. A many independent test were made (it was never before or after) testing antiviruses on thousands of samples if it can find all MtE samples. It tooks lots of time to all antiviruses to reach 100% hit-rate.
Again a question of exact detection appears. Recomentations of CARO suggested the sollution (and some antiviruses follows it, like TBAV in that time) that polymorphic library should be part of virus name, separated by semicolon. For example MtE:Pogue.A - rest of hiererachical virus name should be dot-separated as before to display versions/revisions. However, it was quite difficult for avers to decide if there is a MtE encryptor at all, they weren't able to go under this generated encryptor.
How they were detecting MtE? Well, a algorithmical scanners were this solution
once again. But withou visible schema it wasn't so easy. Most of antiviruses
used (and mostly they are using also now) an acceptage-disassembler. The idea
is simple: MtE generates only some instructions, loop is always terminated
by dec/dec/jnz (well, not always, but no matter now) all you need is to know
is given instruction can be generated by MtE and the size of instruction (to
know where the next instruction is). If a jnz is found, you need to check
if it is in backward direction and there are two dec-s before it. Well, and
to solve conditional paths - just try to pass both of the using recursion.
If test is passed and backward jnz is found, a MtE virus is reported.
Such a test is fast enought, hits all infected samples and has no (or really
really little) false positives. And it can be as little as a bit more than
300 bytes as it is illustrated in TBAV.
Thats why some of antiviruses can't report exactly what virus is encrypted by some polymorphical library - they are checking (usualy) if decryptor can be generated by coresponded poly engine. This technology is intended to be non-destructive analyzis (not to load this code once again) in comparison with emulation.
Plagiating MtE - polymorphic era
Success of MtE was never replicated. At first I think none of routines were as good as MtE, they were different - as usualy noone understoods Darkie's code. But to be as successful as MtE it is not enought to write a good polymorphic engine - this is something I want you to understand - as current avir technologies can handle polymopric viruses mostly without problems. To be as successful as MtE one have to make simmilar breaktrough - something that is incompatible with current thinking of antivirus guys.
When MtE kicks all the avers pretty hard, a many virus coders started to write simmilar engines (as once MtE was already detectable). First one I remember was TPE - Trident Polymorphic Engine (released by Trident group). There was a big fear from AV side of it because all they sill remembers MtE's fear. However, TPE wasn't as successful as MtE in world becase most of viruses weren't spread enought to be important. TPE technology was a bit different than MtE's - it uses several schemes of main encryptor, picking one of them plus some number of introduction schemes placed before encryption loop. It was rather schematic, but there were many schemes so it wasn't visible for the first view. Hovever, some detecting routines used simmilar algorithm as for MtE, some detected each scheme in encryptor and checks it. In general, TPE as handled much easier by avers - as they knew how to deal with it already.
I will not make big differences between each polymorphical
engine as they are principialy unimportant. Some engines were really easy
piece of cake for avers, some made them a lot of problems. Some noticeable
poly engines usualy only reach some limits of antiviruses but never goes
above them - thats why other poly engines weren't so successful - because
MtE settuped a rather high limit. I can show you it on SMEG - all why
it was so dangerouse for avers is because it can generate a really long
Well, a big fear of avers can be, if many polymorphical viruses (or engines) appears in a short time, each of them non-trivial (on some of limits of scanners), it will be really hard to implement specialised scanning routines for all of them, if they are reported in-wild.
Well, this is a another big chapter, developed together with other technologies. A time ago, heuristic was only a experiment how one can catch unknown virii. But wasn't quite relieable, widely it was introduced by TBAV (sure everyone knows). As we have completely dedicated article to this topic I will not describe it here - only to describe its reasons and influence in history.
Finally, with more and more viruses comming each month, some avers tried to find out something that can detect even those they are not able to add so fast - to detect unknown viruses, in general. In fact it has same proposal as checksumming already mentioned. For a long time heuristics was some kind of avers alchemy to improove their hit-rates. It was magic that everyone admire (avers, virus-writes, gurus, coders and regular lamers), but noone trust. Funny, isn't it? First for wide public, surely not best, and mostly fooled by virus-writers is TBAV. TBAV puts all its power into fast heuristics but it has primary weak point - it was passive instead of active (disassembling instead of emulation) and it wasn't able to go through encryptors. Another bad thing for TBAV were displayed flags so anyone can see what internal flags were found on given file. And using documentation you can find out what TBAV suspects on your virus - and you can tune up not to be detected by TBAV easily. Soon many viruses started to be anti-tbav that means not detected by tbav's heuristic by default (today it is some sort of standard). It is too bad for heuristic - as it is designed to catch new viruses, but if they are all designed not to be detected by such a heuristic, there is no way to do so. TBAV's heuristic finds its death in these things.
TBAV (followed by some plagiats) uses, as I already mentioned, a passive method or disassembly (in other words) that analyses code (instructions) and detects some suspecting schemas - like setting registers and calling interrupts, etc. There were a lot of flags (nearly for every letter of alphabet) for many things and they are detected in different ways. But it was rather easy to fool, simply if it looks for mov ah,40 int 21, all you need is to do mov ah, 3f inc ah int 21 and TBAV will not complain. For this reason anviruses that still uses passive analyzis as main weapon combines it with register emulation (tbav as well) that can (a bit) keep a track of values in registers. When int 21 is found, for example, a 10 instructions before are likely analyzed to find out values of registers. It works in many cases and do not work in many cases as well.
Most funny thing was decryptor detection. It didn't work in many cases, and then tbav runs to detect instructions from encrypted area - and usualy it founds many suspected instructions there of course. Well, I'm not here to judge TBAV or other avir, for this proposal we have another article.
Another more powerful heuristic is presented by AVP (but
it is usualy hidden as avp displays regulary detected viruses at first),
by DrWeb and Nod-iCE. They are using active heuristics (emulating as much
as possible) and are able to detect much more suspected activities. Also,
you don't see any flags there, so it is harder to fool them. But AVP's heurstic
as well as Dr.Solomon's are setuped to be less-sensitive as they can detect
plenty of viruses by scan-strings and they do not need to be as successful
on uknown viruses as others. For this reason of course they have less
false-positives as well (our experiments some time ago shows that hit-rate
of Dr.Solomon's heuristic for example is round about 70%).
Active heuristic (emulation) is destructive to code, as it emulates as much as possible, and it must be trickily combined with scanning. But it simplifies scanning as emulation can simply go through decryptors and then av can detect virus exactly as it is already in decrypted state. For this reason it is also called as generic-decryptor in some antiviruses - if they are using emulation only for this. But heuristics finally after years of beeing unsure becomes a standard, and as it is showed by Nod-iCE and DrWeb, it can be really relieable. This what emulation gives us. However top antiviruses today uses combination of both methods.
Weak point of passive heuristic (or disassembly) is disassembly itself:
there is difficult to find out values of registers even in simple cases.
Of course it depends on implementation of heuristic. Also any encryption,
or data-depended or highly-structured code can't be understood by
disassembly-based heuristic scanner. As heuristic scanner looks for typical
structure of instructions of viruses (searching for executable files,
accessing and modifying them, becoming resident, etc) do this things in
some tricky way, not clearly and visible.
To fool emulation is much more difficult. Emulation typical executes code of virus, like in regular computer, establishing some circumstances and testing if code is performing usual virii activity. At first, emulators are limitied by its definition - they are much slower than regular machine, so long decryptors or routines jumping long time each to other are aborted on a timeout - because heuristic can't hang for a long time on one file. Then there are limits of processor - only one type of processor can be emulated (more-less) perfectly. You can test processor if it works in the way it should: undocumented (but mostly unknown!) instructions, may be some badly implemented instructions in their emulator (its hard to find). However, it is just work for couple of minutes for them to implement another instruction. But there are also other limits - machine can't be emulated completly: entire of file can't be loaded (imagine loading 500k exe file), virtual machine doesn't work like it should - many of interrupts may not work, things doing by other parts of system are not also completly emulated, i/o ports usualy doesn't work (may be some easies of them are emulated, but they can't work with all of them), etc. Hardest for avers should be reaching limits of emulator, because they can't extend their limits every time: memory length, file loading, emulation speed.
Have you ever crack a window? Just take a rock, and throw it to the window. Easy, isn't it? All right, I'm not going to write about it, but about real Windows - Microsoft's revange to the rest of the world. Time ago, with Win3.x world was devided between ones that doesn't like Windows (or even hate) and to the ones that likes Windows. (who of them used it, it doesn't matter now). Simmilar it was at the virus scene - most of them stayed at DOS level for three main reasons - there were no need to write for Win and DOS was good enought, it was less documented and finally many of coders weren't able to code something for Windows. Now it is a bit changed. Microsoft rocks the world with Windows 9x and turned everything to be PE-ized. Well, history repeats:
First Windows viruses were simple examples. At first, a file format is a bit changed - NE has extra fields, there are different circumstaces in protected mode, but interrupts still works and things are more-less similar. So the first viruses were simple non-resident infectors. And all avers needed to do is to implement scanning of secondary entrypoint in NE. Virus was pretty visible, simple scan-string can be used. Later I remember a big rumours about first resident viruses in Windows. A many discussions started if viruses will stay under DOS or will move under Windows. Today I think all you know whats true. But Windows are more complicated - there are more files beeing target of infection - DLL for example, and more things to infect in them. Interesting example was virus that infects exported labels in DLL, for example exported function XyzA, instead of regular entry-point was infected. What scanner must search for in this case? It has to go through all the exports! And there can be a lot of them that will decrese speed of scanning rapidly. It is still interesting idea. Only way it was handled by avers was scanning file-end (what they usualy do) for string, and oops - virus is there. But if the things are more complicated to scan, some encryption for example or not to be located on some easy determinable place - it will be really bad for avers (they'll have to emulate code at every export label).
Memory scanning is also not possible in a way it was for scanners. Under Windows 3.xx one can do what he wants - some viruses for example for this reason goes to Ring 0, but antivirus can do the same to scan all the memory. But there is of course more memory to be scanned and it is rather slower. Today memory scanning in Windows is not that prefered. Instead of it, a resident scanners are used more widely, as they are more consistent now with operating system (Win 9x) and there is not as big hunting for memory as it were under DOS with Bill's world famouse 640k limit.
was a real Microsoft's smash. Some of users tried to ignore it, but time shows that Win95 changed the world - everyone start using it. Today virus writers must to focus on this platform, because there are lot of users. First tries for Windows 95 started at the time when only beta version was available. VLAD promptly prepared first virus for Win95, and they spend a lot of time with exploring the details - it was Bizatch (by Quantum/VLAD). But their virus did not work under final Win95. The reason why it didn't work is simple - may be some of you know book called "Inside Windows 95". A lot of userful things regarding windows internals is published there - it was also published before Win95 was released. For this reasons, programmers at Microsft got order to changed some important things in Win95 to be incompatible with the book already written (to deny access to internal things). Also magic numbers of imports were changed there, and imported label for example FileOpenA was no longer correctly linked at load time.
Another interesting is a Bizatch story. Because avers has access to beta version of this virus (well there are some guys at virus scene that can trade internal things with avers without remorse) and they firstly assumed it is not functional. Of course, a real version they also got later on. But they named it Boza (well, all-around-the-world-hated Vesselin Bontchev (even avers hate him because of his ego)) - because he doesn't want to please a virus author. But - CARO rules (setuped by Vesselin!) says that if virus calls itself in some way, this name should be choosed primary. And Bizatch is: "Please note: name of this virus is [Bizatch] written by Quantum of VLAD". Instead of it Vesselin find a name Boza (from bulgarian alcoholic drink) - with no connection to original virus (this is the worst case suggested by CARO naming rules). Everyone at the scene was angry about avers, of course.
Forget about flame wars now. Scanning - that what is interesting. Windows 95
were 32-bit, but format they used - PE, was used even time ago. Windows NT 3.x
used it as well as win32s extension to 3.xx versions. At first what one
can expect from 32bit file format - all offsets and pointers are 32bit, of
course. Other principles are more-less simmilar to NE - there is primary
entry point, several segments can be defined, many exported functions
(for 32bit DLLs), etc. But things are same as before - to scan for simple
virus (non-encrypted) all is needed to load entry point of file and
scan for some bytes - all is as before, only PE loader is needed.
Now let's see what weapons avers have against the Windows viruses. At first we have a look at oldies scanning methods: scan-string scanning can be used in a same way as before. Checksummers may also do its work but a PE (or NE) schemes must be implemented there. The hardest part is heuristics and generic decryption (well, or both at the same time). For PE a 32bit emulator must be programmed and at the present time I don't know about any antivirus having it fully functional DrWeb is preparing it, but not yet... For this reason current heuristic engines uses for 32bit PE only passive heuristics (some kind of disassmbly). And thats why there aren't generic decryptors and each polymorphical virus for Win9x must be handled separately. But all Win9x viruses can be detected by its decryptor and - there are not many polymorphical viruses for Win9x that are principialy different so at the present time a generic decryption is not as urgent as it was for DOS.
Microsoft offers many virus-friendly enviroments. During all the history it was this way and another powerfull macro system becomes a new platform for viruses. Yeah, MS Office did it again. First tries to write virus for Word were something like jokes. Most of people hassitated to call it virus, "self-spreading macro" was most obviouse definition. But today everyone call it a virus, and there are realy many of them now. One may guess there are even more macro viruses (or other script viruses) appears monthly than "regular" viruses.
Scanners have their life more complicated once again. Microsoft keeps "structured document format" documentation for themsefs claiming: it is a internal format, you don't need to know about it, just use our programming interface. However, Microsoft's interface doesn't allow enought to scan for viruses in macro area. Avers had to find out document format by themselfs. Many of them weren't able to do it for a long time (until third-party documentation appears). Because this format is up to twice fragmentized, and moreover - fragmentation definition can be fragmented as well. Scanning macros was so unusual for avers, so some antiviruses scanned whole files (funny, if virus body scanned for can be fragmented too - and they are not able to catch fragmented pieces), or even specialized antiviruses appears and were rather successful at market - like F-Win or HMVS.
From virus-maker's point of view there is no more needed than to understood macro commands. But avers has to do much more. Microsoft's document format can be encapsulated in other formats and it is needed to scan them all (like MS Excnage's folders, etc). Once they have reading routines to access macro area, a regular scan-strings can be used. Some of first macro scanners just scanned from names of macros, but it is outdated today. Scan-strings are really relieable. However, or polymorphic macro viruses things are more complicated. For these reasons and again - to catch new viruses appearing every day, a heuristic scanners appears. They are based on dissassembly of macro code (accessing macro area and walking through instructions, finding unusual and/or suspected instructions or combinations). For macros heuristic is much more reliable as instruction set is much more limitied, there are no registers or widely accessible memory, etc.
Congratulation if you read all the things above. Hope it was not boring, and it helps you some way. The main thing I tried to present here is you have to think, not plagiating other viruses, not doing all viruses same way one right like another - but to show you that you have to understand scanning methods in order to write better viruses. Because the more your virus complicates life to the avers, the more it is successful. If you can write something that completly beats currently used methods, thats the best. I can give an example of slovak viruses like Dark Paranoid, or TMC:Level_42, or let's start with german virus Tremor: its nothing unusual except after it was detected by avers and added into scanners - it permutates (changed usual schema) and old samples weren't caught by antiviruses again. Or Dark Paranoid: as they weak point of stealth viruses is their presence in memory (and they can be detected there easily), Dark Paranoid is encrypted memory, having polymorphical handler of single-step interrupt to encrypt only one instruction beeing executed. In this way Dark Paranoid can't be caught in memory by simple scan-string, it can't be caught in files once it is stealth. Or TMC, that stands for Tiny Mutation Compiler (well, linker actually) is able to permutates its own instructions placing them in random order, connecting them with jumps and contitional jums and finally relocates all memory access instructions and jumps. Scan string for it can't be choosed as it can be broken after every single instruction. Moreover, in files it has only permutator and linker stored with data used to constructuct and link whole body (not a instructions) - and it takes really long even to emulation heuristic to construct whole virus and to test it.
These are examples of non-traditional thinking. Find your own way, break the limits of current point of view - this way you can efectively beat avers - that they affraids most of all: they can't change principles of their scanners every day. Think of it...