Register Initialising Using Only
Arithmetic Instructions
by roy g biv
                            Register Initialising
                                  Using Only
                           Arithmetic Instructions
                              roy g biv / defjam

                                 -= defjam =-
                                  since 1992
                     bringing you the viruses of tomorrow
                                    today!


Former  DOS/Win16  virus writer, author of several virus  families,  including
Ginger  (see Coderz #1 zine for terrible buggy example, contact me for  better
sources  ;),  and Virus Bulletin 9/95 for a description of what   they  called
Rainbow.   Co-author  of  world's first virus using circular  partition  trick
(Orsam, coded with Prototype in 1993).  Designer of world's first XMS swapping
virus  (John Galt, coded by RT Fishel in 1995, only 30 bytes stub, the rest is
swapped  out).   Author of world's first virus using Thread Local Storage  for
replication  (Shrug, see Virus Bulletin 6/02 for a description, but they  call
it Chiton), world's first virus using Visual Basic 5/6 language extensions for
replication  (OU812), world's first Native executable virus (Chthon),  world's
first  virus  using process co-operation to prevent termination  (Gemini,  see
Virus  Bulletin 9/02 for a description), world's first virus using polymorphic
SMTP  headers (JunkMail, see Virus Bulletin 11/02 for a description),  world's
first viruses that can convert any data files to infectable objects (Pretext),
world's  first  32/64-bit  parasitic  EPO .NET  virus  (Croissant,  see  Virus
Bulletin  11/04  for a description, but they call it Impanate), world's  first
virus  using  self-executing HTML (JunkHTMaiL, see Virus Bulletin 7/03  for  a
description), world's first virus for Win64 on Intel Itanium (Shrug, see Virus
Bulletin 6/04 for a description, but they call it Rugrat), world's first virus
for  Win64 on AMD AMD64 (Shrug), world's first cross-infecting virus for Intel
IA32  and  AMD  AMD64  (Shrug),  world's  first  viruses  that  infect  Office
applications  and  script  files  using the same  code  (Macaroni,  see  Virus
Bulletin  11/05  for  a description, but they call it Macar),  world's   first
viruses  that  can infect both VBS and JScript using the same code (ACDC,  see
Virus  Bulletin 11/05 for a description, but they call it Cada), world's first
IDA  plugin virus (Hidan), world's first viruses that use the Microsoft Script
Encoder  to  dynamically encrypt the virus body (Screed), world's first  virus
for  StarOffice and OpenOffice (Starbucks), and world's first virus IDC  virus
(ID10TiC).   Author  of various retrovirus articles (eg  see Vlad #7  for  the
strings  that make your code invisible to TBScan).  This is my third virus for
Win64.  It is the world's first polymorphic virus for Win64 on AMD AMD64.


What is it?

Probably all polymorphic engines use explicit register initialising.  By this,
I mean one of these instructions:

    xor reg, reg
    sub reg, reg
    mov reg, value
    push value / pop reg

It  means  that  anyone can see the start of the decryptor  because  of  these
instructions.  We can try to hide the decryptor by using lots of fake routines
and similar tricks, but we can't completely avoid this problem.  Or can we?

What if we did not use any of those instructions?  What if we use only AND and
OR to initialise instructions?  Then we can have many fake instructions before
the  real  ones, and it is hard to see where the decryptor really starts.   We
can  also add other instructions, like ADD/SUB/XOR.  Before the registers  are
initialised, we can use these instructions to temporarily alter the value, but
we  must  restore  the original value before we attempt to update any  of  the
bits.   After the registers are initialised, we can use these instructions  to
select the next value to use.


How does it work?

We  initialise  the registers by keeping an array of "unknown" bits  for  each
register,  and an array of values for each register.  We start by setting  all
of  the  "unknown"  bits to 1, to say that we don't know any  of  the  values.
Whenever  we  control  some  of  the  bits using  AND  or  OR,  we  clear  the
corresponding "unknown" bits, and update the same bits in the register values.
Once  all of the "unknown" bits are cleared, we can begin to use the register,
and the whole value is known.

This  method  works  for any size of register, so it's even  possible  on  the
64-bit  platform.  However, only Itanium supports 64-bit immediates.  For  the
AMD64 platform, all 32-bit immediates are treated as signed values, so we must
be  very  careful to use only 31-bit immediates, otherwise the  sign-extension
can cause unexpected behaviour if we access memory using registers.

We  use AND and OR because they are the simplest method.  For any clear bit in
the  AND mask, the same "unknown" bits can be cleared, and the register  value
bits  can be cleared.  For any set bit in the OR mask, the same "unknown" bits
can  be  set, and the register value bits can be set.  Here is an  example  of
that.  Let ABCDEFGH be eight unknown bits.  Let us perform some operations and
see what happens:

    value        unknown
    ABCDEFGH     11111111
    AND
    11001100
    =
    AB00EF00     11001100

Four unknown bits left.

    AB00EF00     11001100
    OR
    10101010
    =
    1B001F10     01000100

Two unknown bits left.

    1B001F10     01000100
    AND
    10111011
    =
    10001010     00000000

No unknown bits left, and our value is fully known!

In  a  more complex case, we can also use ADD, but it can initialise only  one
bit  per round.  In that case, an unknown bit that is one position to the left
of  a known set bit can be cleared by adding the value of the known bit.   The
problem  is that once that is done, every bit to the left of the newly cleared
bit  becomes  unknown, until we reach a known clear bit.  At that  point,  the
known  clear bit becomes unknown, but any other known bits remain known.  Here
is an example of that.  Let us assume that B and C are 0, and G is 1.

    value        unknown
    A00DEF1H     10011101
    ADD
    00000010
    =
    A0CDE10H     10111001

Now  we see that B is unchanged, C becomes unknown, and F becomes known.   The
use  of  ADD is better when all of the bits to the left of the known  bit  are
unknown, to avoid "losing" bits like the C bit in the example.

In  some cases, we can also use unexpected instructions like ADC and SBB, even
CMP.   For example, we know that AND/OR/XOR always clear the carry, so ADC and
SBB behave just like ADD and SUB in those cases.  Also, if we know the top bit
of  our  register value, even if no other bits are known, then we can use  CMP
and  "guess" the result if the top bit of our register value is different from
the  top  bit of our operation value.  Of course, if we know the whole  value,
then we can use CMP directly to know the result.


Entry Point Obscuring

Now  let us talk about hiding the start of the decryptor.  It seems like  just
using  lots  of fake instructions would be enough, but it is not  so,  because
when  we reach the real decryptor, we will initialise the registers  properly,
no  matter what value they have.  There is no way to defend against that,  but
we  can  interfere a little bit by altering sensitive registers (like ESP)  in
the  fake  instructions.  As before, we must restore those registers to  their
original value if we want to use them, but in the fake instructions before the
decryptor, we just "forget" to restore them.  This allows us to attack the CPU
emulators.   A good emulator will see an exception for the bad memory  access,
and  a  bad  emulator will allow the memory access but sometimes  corrupt  the
decryptor instead.

For  a  register like ESP, where we can never know its true value,  there  are
still  some bits that are known.  For the 32-bit platform, we know that ESP is
always  dword-aligned,  so  we can play with the low two bits.   I  wanted  to
completely  destroy  the ESP value before the decryptor, by using AND  and  OR
with  large values, but then I realised that it could be used to know when the
decryptor  starts, since at that time, such instructions cannot appear.   That
means that in a sequence like this one:

    or  esp, 12345678
    and esp, 87654321
    adc esp, 12345678
    sub esp, 12345678
    or  esp, 00000001 <- decryptor starts here
    and esp, fffffffe

the  first two instructions can be safely skipped, because they are  obviously
fake.  Instead, I use the allowed values, but I don't restore the register:

    adc esp, 12345678
    or  esp, 00000001 <- forget to sub first
    add esp, 12345678 <- make it worse
    and esp, ffffffff <- forget to and bit 0

This is harder to detect, and now ESP points randomly in memory.


Greets to friendly people (A-Z):

Active - Benny - Obleak - Prototype - Ratter - Ronin - RT Fishel -
sars - SPTH - The Gingerbread Man - Ultras - uNdErX - Vallez - Vecna -
VirusBuster - Whitehead


rgb/defjam jul 2006
iam_rgb@hotmail.com