The following document is for educational purposes only. The author isn't responsible for any misuse of the things written in this document.

Threads are a relatively new, but very useful and very promising feature used by some new Win9X/NT viruses. This article describes everything important about threads. Because I have written many multithreaded viruses and am currently coding a new one, I decided to write this. Everything described here comes from my own research, so be tolerant. This article isn't for lamers, and I expect you to do some research of your own rather than just rewrite existing code.
This is my third article about threads. If you would like to read my first articles, you will have to wait for the release of 29A#4 (they explain more details with more examples). Be patient and promise me you will read them :)

Introduction - why threads?
In my opinion, knowing threads is a must. If someone doesn't know threads, then he doesn't know Win32. It's a shame that many VXers who code for Win32 don't know threads, although they have many advantages. I think this is as cool and useful a technology as polymorphism was.
Well, here comes the main question - what is a thread? It is hard to explain to someone who doesn't know what a process is. OK then, what is a process?

Processes - definition
A process is defined as one instance of a running program. Example: you have one program - a calculator. If you have three calculators running, you still have one program, but three running processes. Unlike in Win16, a process in Win32 is inert. A process can only own things - a 4GB private address space, code, data, handles, allocated memory, kernel objects and so on. Everything allocated by the process or by the system on its behalf is automatically deallocated when the process quits. A process itself can't execute code.

Threads - definition
A thread is a kernel object and is owned by a process. A thread is what executes code. Where the Win16 operating system switched (assigned processor time) between tasks (processes), the Win32 operating system switches between threads. A process can create as many threads as it wants (well, the number is limited by memory and DWORD capacity :D). Imagine this situation: you are running one instance of Calculator and one of WinWord. Calculator has only one thread and WinWord has five threads. In that case the operating system will assign processor time "in parallel" to six threads (depending on their priorities) - Win16 could switch only between processes and there were no threads.
Threads are used very often in Win32. For example, if you want to print something from an editor, the editor will create a new thread to service the printing and you will still be able to edit text - one thread for editing, a second thread for printing. The big advantage is that all threads are scheduled by the operating system. All you need to do is synchronize them - and that's the most difficult part.
When a new process is created, the system creates its first thread by default, also called the "primary thread". Remember it!

Using threads under Win32 environment
Here I will talk about the Win32 compatible way of using threads.

1. Theory
Coding threads for Win32 seems easy. In fact it is easy, but you must know some system structures and characteristics.
If you create threads with the following API, your code should work on all Win32 platforms. Before we talk about creating threads, you should know some important things.

A thread owns its own context structure and its own stack.

The context structure contains all registers. Every time the system switches to another thread, it saves the current registers into the outgoing thread's structure and restores them from the incoming thread's structure. The context structure is the only processor-dependent structure in all of Win32. Every thread also has its own stack allocated in the 4GB address space. The standard stack size is 1MB.

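A thread can be created with the CreateThread API function:

HANDLE CreateThread (LPSECURITY_ATTRIBUTES lpsa, DWORD cbStack, LPTHREAD_START_ROUTINE lpStartAddr, LPVOID lpvTParam, DWORD fdwCreate, LPDWORD lpThreadID);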

a) lpsa: Pointer to a SECURITY_ATTRIBUTES structure. For default security attributes pass NULL.
b) cbStack: Size of the stack. Pass 0 for the default stack size (1MB).
c) lpStartAddr: Address of the thread function.
d) lpvTParam: 32bit parameter which will be passed to the thread.
e) fdwCreate: If you want to create the thread but keep it from receiving CPU time, pass CREATE_SUSPENDED. The thread will stay suspended until you call the ResumeThread API. Pass 0 to let it run immediately.
f) lpThreadID: Must be a valid address of a DWORD variable. The ID of the created thread will be stored there.

This API creates a new Win32 compatible thread. As output you get a handle, which is relative to the current process, and an ID number, which is system-wide. A thread handle, like any other handle, is valid only within the current process; the ID number is valid throughout the system until the thread terminates.

A thread can be terminated with the ExitThread or TerminateThread APIs:

void ExitThread (UINT fuExitCode);
This API terminates the current thread and sets its exit code to fuExitCode.
BOOL TerminateThread (HANDLE hThread, UINT fuExitCode);
This API terminates the thread identified by the hThread handle. Unlike the previous API, it can terminate any thread, not only the current one. But be careful! If you terminate a thread which is writing to disk, it can damage the system!

There are many other APIs for working with threads, but this article is more theoretical than practical. If you are really interested in threads and want to know more, download the Win32 SDK or contact me.

2. Synchronization
As I said a moment ago, thread synchronization is something really difficult, if not the most difficult thing of all. It is REALLY important to take special care with it. Because threads run separately and independently from each other, and you want to control all the threads you created, you have to SYNCHRONIZE them.
Remember this:

A thread can suspend itself, consuming no CPU time, until a kernel object becomes signaled (in our case, until a thread terminates). For that purpose the Win32 interface has an API called WaitForSingleObject:

DWORD WaitForSingleObject (HANDLE hHandle, DWORD dwMilliseconds);
a) hHandle: Handle to a kernel object.
b) dwMilliseconds: Number of milliseconds to wait. If you want to wait without a time limit until the object is signaled, pass -1 (INFINITE).
Return values:
-1 (WAIT_FAILED) The function failed; you can get the extended error code by calling the GetLastError API.
0 (WAIT_OBJECT_0) The object has been signaled.
80h (WAIT_ABANDONED) The object is a mutex that was abandoned: the thread that owned it terminated without releasing it.
102h (WAIT_TIMEOUT) The object wasn't signaled within the time specified by the dwMilliseconds parameter.

3. Practice
OK, I hope you understood everything, and if not, you will after some examples. The easiest, but least efficient, idea is:

  1. Create a new thread and perform all viral actions inside it (see the CreateThread API)
  2. Put the primary thread to sleep until the new thread has finished (see the WaitForSingleObject API)

This way we create a new thread, wait for its termination and close its handle. The thread stores all registers, gets its parameter into the EAX register, restores all registers (needed by RET) and terminates (and the primary thread is then able to continue).

Another, more efficient idea is:

  1. Create a new thread, wait some seconds and perform all viral actions inside it
  2. Jump to the host without waiting for the thread to terminate

This way we create a new thread, close its handle and jump to the host program. In the background the thread suspends itself for 10 seconds (so it receives no CPU time), performs some viral actions and terminates itself. We suspend our thread because that is less suspicious to the user (the virus won't slow down the system immediately).

All these algorithms are very simple and give the AVs a chance to trace them (only one virus thread runs at any time). It also isn't very good to create many threads when only one of them is ever alive (such as in my Win32.Leviathan). A much better idea is to run two or more threads "at the same" time, where one thread cannot run without the second one (and that second one cannot run without the third one, and so on...). It makes the analysis much harder, if not impossible. The main idea is: let all threads run and let the operating system synchronize them (do not synchronize them manually, because AVs will be able to emulate that - they aren't able to do so now, but I think they will be within some months).

Here is an algorithm:
  1. Create two threads; the primary thread passes execution to the host program
  2. Let the first thread do 50% of some action and the second thread do the other 50%, without synchronization
  3. After everything is complete (each thread sets a flag when it terminates itself; once both flags are set ...), create another two threads, which perform the two halves of the next action, and terminate the previous two threads.
  4. Repeat all of this recursively until all viral actions are done.

This won't be so easy to trace (not for you either :D)... this is my idea of how viruses will work in the future. I have never coded a virus using this algorithm, but I will. For now, it's only an idea...

Using Ring-X threads under Ring-0
Now you should know all the important things about threads. All the previous examples can work on all Win32 platforms - it's the most compatible way.

I also wrote an article about Ring-0 and Ring-3 threads under Ring-0, which will be published in 29A#4. Since then I haven't found anything new, so I have nothing new to show you here. I don't think it would be good to copy the whole article here, because then that article would be pointless. Please wait for the release of 29A#4 and you will find that article there too. Thank you.

Advantages of threads created from Ring-0:
  1. Anti-Debug
  2. Anti-Heuristic
  3. Residency
  4. And much more...

I hope that you enjoyed threads and that you will implement them in your next virus. If there is anything you didn't understand, contact me at benny@post.cz and I will help you.
Last greetings go to the *-zine staff, especially to Flush, Mgl and Navrhar, for letting me publish this article. Good luck with your coding, guys!

Benny / 29A, 1999