pe.gif (3302 bytes)

- A word from the author
- Preface
- PE EXE structure overview
- PE EXE structure (description)
- Standard sections
- Kernel32 problem
- Attach

A word from the author
I take NO responsibility for any damage that is caused by using this project.

"In no event shall I, the author, be liable for any consequential, special, incidental or indirect damages of any kind arising out of the delivery, performance or use of this project."

This project is unique and some of material here can not be found elsewhere! You got here some things that even a good viruses forgot!

The purpose of this project is not only to lead the reader into Win95 virus world. There are a lot of things that can be 'born' from this project: an always good idea is an EXE file crypter for programs protection.

Windows NT is based on VAX, VMS and UNIX roots. Many of Windows NT creators are also been involved in development of those OSes before they join Micro$oft. So it is logic that they build inside Windows NT some properties that has been already tested. Format of executable and object files which they used was COFF (Common Object File Format). That was a very good starting point, but it was necessary to change it a bit to corespondent the need of 'modern' OS. So PE (Portable Executable) format was born. It was called 'Portable' cause all Windows NT ports on different platforms (x86, MIPS, Alpha...) use this same format. Of course, there are differences, for ex.: different CPU instruction coding etc. but the main goal is the fact that OS loader and OS tools do not have to be completely rewritten from the beginning for every platform.

PE file structure is described in WINNT.H header (for Borlands IDE it is: NTIMAGE.H). There are defined all PE file structures. Unfortunately, structure members have very long names, and often have different meaning, structures are big, nested etc. But we should expect all this, cause we know where it came from, right?

Little note for future OS creators: executable structure also shows OS characteristics.

PE EXE structure overview
Here you will not find the whole structure of PE EXE, but only important part of it. Complete descriptions can be found on the net. However I didn't found any project that will satisfy me completly. Anyway, Matt Pietrek leads.

PE EXE structure looks like this:

|       MS-DOS MZ header        |
| MS-DOS Real-Mode Stub program |
|        PE EXE Header          |
|    PE EXE Optional Header     |
|     section header #1         |
|     section header #2

|        section #1            |
|        section #2

First important thing you need to know is that PE EXE on disk looks very much the same like its memory image, when Loader (part of Windows KERNEL that load and run files) load it there. Loader map files into virtual address space. So, all you need is to know where and how Loader maps parts of PE file in memory reserved for that.

Now is the time to explain what 'Relative Virtual Address' (RVA) means. RVA is simply an offset from the beginning of mapped PE file. For example: if Loader maps some file from address: 0x40000, and if RVA of some data is 0x464 than you can find it in memory on virtual address: 0x40000 + 0x00464 = 0x40464. Also note that there is a relation between RVA of data and its PE EXE file offset!

One more term is need to be explained: PE file section. Sections are continuous parts of PE file with variable length which stays continuous in memory, after file loading. Sections hold all raw data: code, data, resource info etc. When Loader load PE file and map it into reserved memory then Loader put sections in memory in specific order. But it does not touch sections and do not break them. Every section has its own purpose: some hold code or data, but some are created by linker just for OS use.

PE EXE structure (description)
Here comes detailed description of PE EXE file structure from 'attaching files' perspective: trying to explain how to join two files for good or bad reason:) This description refer on structures defined in standard WINNT.H file. All referred structures are written in different font. Structure elements that are important to us are marked with red star.

MS-DOS MZ Header
First 64 bytes of every EXE file contains standard MS-DOS header. It has to stay because of compatibility with previous versions of Micro$oft OS. This structure is defined in: IMAGE_DOS_HEADER. From all elements of this structure here is what we need:

WORD e_magic
So called magic number: first two bytes of every EXE file must be letters 'M' and 'Z' (those are initials of programmers who worked on first DOS versions). But, not many people knows that this letters can be given in a different order: 'Z' and 'M'.

LONG e_lfanew
File offset (and RVA, too) of 'PE EXE Header'. To be more precisely, it points 4 bytes before that structure, on the PE 'signature' dword.

MS-DOS Real-Mode Stub program
This is normal DOS program with one purpose: to write a error message when somebody starts PE exe file from old DOS version. But, this can be any kind of MS-DOS program so you can create a program that work on two different OSes: DOS and Windows9x!

PE EXE Header
Here starts PE EXE. Before this structure in file (and in memory) is 'signature' dword on which points e_lfanew, like I said before. This 'signature' dword tell the OS what kind of EXE is this: is it PE, LE, NE etc. For PE files value of this 'signature' dword must be 0x00004550 (PE00). After 'signature' dword goes PE EXE header. It is defined in: IMAGE_FILE_HEADER. Elements of this structure are:

WORD Machine
CPU identification (ID number): Intel i860 (0x14D), Intel i386 (0x14C), MIPS R3000 (0x162), MIPS R4000 (0x166), DEC Alpha AXP (0x183).

WORD NumberOfSections
Total number of sections.

DWORD TimeDateStamp
Time of file creation in seconds since 31-dec-1969, 4:00 PM.

DWORD PointerToSymbolTable
DWORD NumberOfSymbols
Not important.

WORD SizeOfOptionalHeader
Size of 'PE File Optional Header' structure that comes right after this one.

WORD Characteristics
Not important.

PE EXE Optional Header
This structure come right after last one. Note that this structure must exist, regardless of its name: 'Optional'! It has this name because on different platforms this structure looks different. Anyway, this structure is defined in: IMAGE_OPTIONAL_HEADER. Here are elements of this structure (for Intel processors):

WORD Magic
Some magic number, again. (not important)

BYTE MajorLinkerVersion
BYTE MinorLinkerVersion
Version of linker that build this exe. Not important.

DWORD SizeOfCode
DWORD SizeOfInitializedData
DWORD SizeOfUninitializedData
Size of code,data, bss section (for uninitialized data). Not important.

DWORD AddressOfEntryPoint
RVA address where PE EXE is started by Loader. On this address program begins.

DWORD BaseOfCode
DWORD BaseOfData
RVA of code and data section start. Not important.

DWORD ImageBase
Virtual address where whole PE EXE is loaded (usually is 0x400000). All virtual address are calculated from this one: just add their RVA on this base address.

DWORD SectionAlignment
This is alignment of sections virtual addresses (usually 0x1000).

DWORD FileAlignment
File size alignment (usually 0x200). Note that this is a section alignment, too.

WORD MajorOperatingSystemVersion
WORD MinorOperatingSystemVersion
WORD MajorImageVersion
WORD MinorImageVersion
WORD MajorSubsystemVersion
WORD MinorSubsystemVersion
DWORD Win32VersionValue
Unimportant things.

DWORD SizeOfImage
This is total size of memory that Loader have to take care of. If it is not adjusted right Loader will show error message that there is not enough memory and that you should close other running programs.

DWORD SizeOfHeaders
Size of all PE headers and tables. After all of it there comes raw data.

DWORD CheckSum
This should be CRC checksum, if is set by linker. Not important.

WORD Subsystem
Interface type needed for PE EXE: NATIVE (1) - does not need subsystem, like device driver; WINDOWS_GUI (2) - it needs Windows GUI; WINDOWS_CUI (3) - it is console application; OS2_CUI (5) - OS/2 console application; POSIX_CUI (7) - Posix console application.

WORD DllCharacteristics
DWORD SizeOfStackReserve
DWORD SizeOfStackCommit
DWORD SizeOfHeapReserve
DWORD SizeOfHeapCommit
DWORD LoaderFlags
Not important.

DWORD NumberOfRvaAndSizes
Number of fields in DataDirectory. Always set to 16.

This is array of 16 entries. Every entry specify position and size of some important part of PE file: section, table etc. Every entry is, in fact, a structure defined as: IMAGE_DATA_DIRECTORY that have only 2 members:
    DWORD VirtualAddress - entry RVA, and
    DWORD Size - entry size.
Today only 11 entries are used from total 16, but lately there are 2 more entries that is used - but we can't expect that they exist in every file, of course. From all this entries the most important for us is second entry in the row, (that is DataDirectory[1]) because it specify position and size of IMPORT section. What are other entries for - you can find that in WINNT.H in IMAGE_DIRECTORY_ENTRY_XXX definitions.
One of two new entries is DataDirectory[12] and it is important because it specify where IAT (Import Address Table) is, but, unfortunately, like I said before, this entry exist only in EXEs builded with newer linker versions and we can't use it.
When some part of PE file is not present in a EXE its entry values are set to zero.
Purpose of DataDirectory is to help Loader to fast find some important parts of PE file.

Section Table
Right after PE headers, and before raw data, there is an array of Section Headers - array of header structures for every section that exist in PE EXE. Every section that exist in EXE must have its own 40bytes long Section header listed here. Total number of sections is given before. Section header is defined in: IMAGE_SECTION_HEADER. This structure have this elements:

8bytes for ANSI section name. Usually, the section name starts with a dot ('.'). Also, some section names have very informative names (.text, .data etc) but we can't count on that. Also note that if name of section is 8 bytes long there will be no zero string terminator on the end of the name.

union {
    DWORD PhysicalAddress
    DWORD VirtualSize
} Misc;
This is one of most interesting part of PE file. This is union so for PE EXE manipulation we use VirtualSize (PhysicalAddress is used for OBJ files). This element specify length of raw section data. This is non-alignment section size!
But, this value can be set to 0!!! In this case Loader assume that raw section data is equal to alignment section length (which is given further in structure). Watcom linker generate PE EXE files with VirtualSize filed set to 0. Viruses almost always use this data, and not many people knows that this size can be 0 - so virus don't attach itself good on the file! If somebody runs this badly infected file Windows will usually give error message and program will not work (virus either)! So here is an idea how to put some 'anti-viral' shield inside your programs, without any piece of the code. Some sophisticated anti-viral shield could also check if this value is zero or not. Of course, all this is actual only until is un-discovered:)

DWORD VirtualAdress
Section RVA address. Add this address to ImageBase (given before) and you got virtual memory address of section. Note: here RVA is not equal with section PE file offset, but there is still a relation between them.

DWORD SizeOfRawData
Alignment section length. Alignment step is defined before, in FileAlignment.

DWORD PointerToRawData
File offset of raw section data. Like it was said before, all sections (it raw data) are after this Section headers array. During loading PE EXE file in memory Loader read section raw data from file offset that is specified here and put it on address specified in VirtualAdress.

DWORD PointerToRelocations
DWORD PointerToLineNumbers
DWORD NumberOfRelocations
DWORD NumberOfLineNumbers

DWORD Characteristics
Contain section characteristic (attributes). Every section has it own characteristics. Code section usually have this attributes set: executable, readable, code. Data section set also this attributes: readable and writeable.
Here is important code section 'writeable' attribute cause it says to KERNEL is it possible to write in code section or not.

Finally, after PE headers and Section Header array there comes program itself, divided into sections. Every section usually contain unique program functionality: data, code, import data, export data etc. But, there can exist more than one section with same functionality (but it is not recommended), for example: 2 code sections.

Standard sections
Here goes the descriptions of standard sections, with special care of code, import and export sections.

Code section
Linkers usually collect all code from OBJ files and put it in one section. This section has often named as '.text'. In this section is program code.
Here is important to see how program call a function that is imported from a DLL. Those function are, for example, all WinAPI functions and they are imported from system DLLs. Because DLLs are not always loaded on the same address in memory, or even are not existing in memory before program start, Loader must use a efficient way to call all imported functions. So what Loader (and previously linker) does is that all those calls from program code redirect to the special part of Code section where are instructions like:
where XXXXXXXX is address of somewhere inside of program Import section. During loading PE EXE Loader fills Import sections with proper DLL imported function memory offsets, for every imported function. All this performance is called 'importing function'. So Loader does not have to patch every CALL instruction in the code, but it has to set memory offset of every imported function in Import section. Drawback of this method is that you can't get imported function address directly from the code, but, do not worry, you can get them in other way.
Lets say, for example, that somewhere in program is a call to GetMessage WinAPI function that belongs to system USER32.DLL. So here is a little picture that explains this mechanism:

                   Program                                     USER32.DLL

           |                        |
           +-Import-----------------+                      +-----------------+
           |          :             |          0x77879426: |        :        |
0x401042:  |      0x77879426        | -------------------> | GetMessage code |
           |          :             |                      |        :        |
           +------------------------+                      +-----------------+
           |                        |

           |                        |
           |           :            |
0x404408:  | JMP DWORD PTR [401042] |
           |           :            |
           |           :            |
           |    CALL GetMessage     |
                 (call 404408)
           |           :            |
           |                        |

Data section
Data section hold all global and static variables data which are initialized during code compiling. Here goes strings, too. Linker usually join all data section into one and usually name it as '.data'. Local variables are stored in local thread stack and they are not in data or bss section.

BSS section
Here are all uninitialized static and global variables. This section does not take any physic file space, so it PointerToRawData and/or SizeOfRawData is equal to 0. Borland linkers usually do not have this section - it is, in fact, inside their data section. Name of this section is usually '.bss'.

Resource section
Usually is named as '.rsrc' and here are all resource data.

Import sekction
Import section (usually named as '.idata') hold all informations that helps Loader to find addresses of all imported functions which are used in program so that Loader can patch part of Import section so that program can work properly. Import section starts with an array of IMAGE_IMPORT_DESCRIPTOR structures. Every DLL that contain imported function is represented in this array. One DLL also can be represented more than once: each time with different set of imported functions. This array have, of course, variable length so at the end of the array there is an empty structure (all elements are NULL) terminator. Here goes structure elements:

DWORD Characteristic
Well, this is not characteristic at all: it is RVA address of special pointer array. This pointer array is called HintName Array (HNA) and have similarly function as IAT (Import Address Table). But, when Loader run a program it does not change this table at all, like it changes IAT.

DWORD TimeDateStamp
Not important.

DWORD ForwarderChain
This element is here for special function importing technic: 'forwarding'. Forwarding is when functions are not imported directly from requested DLL, but from forwarded DLL. For example, NTDLL.DLL have exports of some functions that are, in fact, from KERNEL32.DLL. When they are called from program, and if they are imported from NTDLL.DLL, Windows will not search for them in NTDLL.DLL code but in KERNEL32.DLL code, where those functions really are. This structure element can be important in some cases.

RVA address of ASCIIZ string that contains the name of imported DLL.

RVA address of Import Address Table (IAT). IAT is an array of pointers that, in the PE EXE file, points to the imported function names and theirs ordinal numbers. Ordinal number is unique DLL function number. Also this number is known as Hint. So, every pointer in IAT points to structure defined in IMAGE_IMPORT_BY_NAME which has 2 elements:
    WORD Hint - ordinal number, and
    BYTE Name[?] - ASCIIZ name of imported function.
Here exist one undocumented thing: there are special cases when PE EXE imports some special functions only by their ordinal number and not by its name, so name is not given. This is case with some system programs (like EXPLORER.EXE) which use some OSes undocumented and non public functions. Those cases can and must be detected when a pointer from IAT points to signed int that is less than 0. To be precisely, pointer points to DWORD which contain special function ordinal number with most significant bit set (for example: value 0x80000012 is for ordinal number 0x12).
During loading and running program, Loader use all this information like this: for every pointer from IAT Loader reads the name of function which is imported and finds out its address in current DLL. Loader than change IAT so pointers from there now points directly to the imported function code!

Next diagram shows one standard import structure in PE EXE file.

                            HNA                              IAT
                           +---+    +------------------+    +---+
                        /->|   |--->|        44        |<---|   |<---\
                        |  |   |    |   "GetMessage"   |    |   |    |
                        |  +---+    +------------------+    +---+    |
IMAGE_IMPORT_DESCRIPTOR |  |   |--->|        72        |<---|   |    |
  +-----------------+   |  |   |    |    "LoadIcon"    |    |   |    |
  | Characteristics |---/  +---+    +------------------+    +---+    |
  +-----------------+      |   |--->|        19        |<---|   |    |
  |  TimeDateStamp  |      |   |    |"TranslateMessage"|    |   |    |
  +-----------------+      +---+    +------------------+    +---+    |
  | ForwarederChain |        :               :                :      |
  +-----------------+                                                |
  |      Name       |---> "USER32.DLL"                               |
  +-----------------+                                                |
  |   FirstThunk    |------------------------------------------------/

Like I said before, this is how PE exe file looks like. When Loader loads this PE EXE it changes IAT pointers so that they point directly to the code of imported function GetMessage, LoadIcon etc. which are in USER32.DLL.

Unfortunately there are two exceptions, and there is not much documentation about them. First exception came from Borland: programs that are linked with TLINK32 have no HNA (it points to NULL). Like this they can save some file space. The reason why Borland discard HNA table is because HNA and IAT have almost the same function. Still, there is the difference between IAT and HNA, because Loader changes IAT at load-time, so after program run HNA still points to the names of imported functions and their hints. Anyway, Borland decide to discard HNA information.
Second exception came from Micro$oft. It is about IAT: they wanted to speed up things so some programs have IAT with pointers that already points to the imported functions (like explorer.exe from Win95 OSR2). IAT here already contain imported function addresses! This is opposite from the fact that DLL are always on different addresses! Anyway, if you wanna work with functions imports you should use both HNA and IAT.

Here is interesting another thing: Import section have usually read-write attributes set so it is easy to intercept imported function calls! All you have to do is to patch IAT so that it points on intercepting function. Later, from intercepting function you just call original imported function. This is nice method especially cause there is no need to modify code!

Export section
This section (usually called as '.edata') is standard section for DLLs. It contains all information about export functions. This functions in DLL are than ready to use them like imported functions from other DLL or EXE. At the beginning of this section is structure: IMAGE_EXPORT_DIRECTORY and right after goes raw data. Here are elements of this structure.

DWORD Characteristics
DWORD TimeDateStamp
DWORD MajorVersion
DWORD MinorVersion
Not important.

RVA address of ASCIIZ string with DLL name in it.

This is start ordinal number. To get real ordinal number you have to add this value to the numbers you get from AddressOfNameOrdinals.

DWORD NumberOfFunctions
Number of function you get via AddressOfFunctions. This number theoretically can be different from NumberOfNames value, but this two elements are usually the same.

DWORD NumberOfNames
Number of names that you got via AddressOfNames.

PDWORD *AddressOfFunctions
RVA address of an array that contain function addresses. This addresses are RVA for each exported function of current module (DLL).

PDWORD *AddressOfNames
RVA of pointer array that points to the strings. Strings contain names of exported functions.

PWORD *AddressOfNameOrdinals
RVA of pointer array that points to WORD ordinal numbers of exported functions. You have to add Base ordinal value on ordinal that you get from this array to get real function ordinal number.

You can see here that Export section contain 3 different array of pointers that points to function name, its address and its ordinal number. This 3 arrays are 'parallel'. When you search for function and you have only its name you will use array with pointers to the names to find out where in the array is pointer that points to the same function name. So if you wanna get other function info, usually its address, you just look into other table at the same array position (array index).

Kernel32 problem
It is time to analyze problems of attaching some code on a PE EXE file, which is called 'host'. Code that you wanna attach to the host must do something useful so it have to call WinAPIs. And here we got some problems.

It was said before how Loader loads and run a PE EXE file. It reads IAT that points to the names of imported functions, like WinAPI functions. Than Loader get the addresses of all those functions and patches the IAT so now IAT points directly to imported functions code, somewhere in DLL memory. When host code call imported function, it will be called indirectly, via IAT content. Programmer does not have to take care about IAT - it is created by linker, so programmer in his own code call imported function normally: call .

Problem is in the code that has to be attached to the host: how to call WinAPI from it? Attached code wasnt there at link-time so there is no redirections via IAT!

The solution is easy: first thing attached code must do is to find addresses of all function that it will use. All this has to be done before attached code call any of WinAPIs. So, first you need to locate KERNEL32.DLL because it export all system WinAPIs. But KERNEL32.DLL and other system DLLs are not on the same addresses. This was done by purpose so it could be not possible to write a Windows virus. But, there is always a backdoor...

...and here its name is GetModuleHandle. This WinAPI simply returns address of any module you need. Here we need KERNEL32.DLL. But, wait, how can we call GetModuleHandle() if KERNEL32.DLL is not already located and what is address of this function???

Solution is simply: you get GetModuleHandle from hosts IAT. When Loader finish its job and patch IAT and give control to the program (attached code) all you have to do is to read from IAT data about imported GetModuleHandle. So once you get GetModuleHandle address after you call it in EAX there will be address (handle) of KERNEL32.DLL.

Here you can spot a week point of this method: host must import GetModuleHandle. Well, this is not so week point cause almost every PE EXE import this function.

Lets go on. KERNEL32.DLL is located, but how to find all other function addresses that is used from attached code? Simply: use GetProcAddress WinAPI. This function for its params has DLL handle and pointer to function name. As result you got function address. But here I need the address of GetProcAddress before I call it, right?

Yes and no. GetProcAddress is also very often imported so you get its address from IAT, the same way you did it for GetModuleHandle. But there are many programs that do not import this function. The best solution here is to use your own function that work identically like GetProcAddress.

Once when you got addresses of GetModuleHandle and GetProcAddress it is possible to find out addresses of all other WinAPIs that attached code use. Usually all that is done on the very beginning of the attached code.

Down on this page you can download 'Getproc.exe' - program that gives you addresses of all functions. There is source included. Program is written by jqwerty from the famous group: 29A which created many very good viruses.

Attached code must not destroy current PE EXE file structure. There are many solutions for attaching code. Here goes description of common two. Others are only variants of this solutions.

First solution that man can think of is to attach your code as new section at the end of PE EXE. But, if we look a bit inside we will see that this solution works, but there is so many things that have to be changed. You need, for example, to add new Section Header in the middle of the PE EXE file: it is not very pleasant thing to do, especially for viruses. So there is another way to do it, more sophisticated.

Second solution will attach code at the last PE EXE file section. Like this original PE EXE file structure remains more and less the same. So here are details for this solution.

Note: regardless solution you will use, you have to take care of EXE files that have Overlays. Overlay is some data added at the end of the file and its size is not included in structures of PE EXE file. Typical example for this are apps installations: installer have an overlay at the end of its code that contain program that will be installed. Attached code always goes to the end of PE EXE file so if there is an overlay there you must to take care about it, too.

Here is explained how to attach code to the host PE EXE file. Variable declarations, error treatment and other 'non-important:)' stuff will not be discussed here so reader can keep attention on real stuff.

At the beggaring you should open host and map it into memory. Lets say that filemap is char* pointer to the host file memory mapped block.

[***] Important note: The rest of the project is just descriptive, for now. If there is a interest and support, this project will be finished with full virus source explanation.

Check out if file is valid PE EXE: first check for MZ (or ZM) signature at the beginning of the file and for PE00 signature before PE headers.

Get number of section. Go to the PE Optional Header.

Check if app is for GUI. Store EntryPoint, ImageBase, FileAlignment (it is section alignment) and get pointer to the DirectoryData and SizeOfImage.

If there is Import section in DirectoryData (and usually is) calculate its position and get from it RVA address for KERNEL32.DLL functions: for GetModuleHandleA and, if exist, for GetProcAddress.

Find the last section in the host. But note that this does not have to be the last section in Section Table!!! Sometimes can happened that the last section in Section Table is BSS section which does not take physic host file size! This section have zeroed PointerToRawData and/or SizeOfRawData.

Find out where to attach code. Like it was said before, VirtualSize can be 0, so take care of it.

Finally, time for changing PE file. You should change:
- RVA address of EntryPoint os it points to the attached code;
- size of last section (SizeOfRawData);
- size of the last section in DirectoryData, if exist there;
- characteristics of last section: need to set to EXECUTE, READ and WRITE.
- size of last section (VirtualSize) if not equal to 0;
- SizeOfImage;
- size of whole file

Before writing attached code to the host usually you store inside the attached code some data that it will use during execution. This will be founded addresses GetModuleHandleA and GetProcAddress from KERNEL32.DLL, and also need to store original EntryPoint.

Attach code
And the last thing we need to discuss here is how attached code looks like. All code is in one section, so data and code will be together, so there is no separate data and code segment. Because of that you need to set READ & WRITE attributes of section where code will be added.

[***] Important note: The rest of the project is just descriptive, for now. If there is a interest and support, this project will be finished with full virus source explanation.

Attach code general works like this: first it must to find out where is data part of the code. Then it reads from there all stored KERNEL32 function addresses. Than it finds addresses of all other WinAPI functions that will be needed.

After this code do whatever it should do: if it was a virus it should start to infect other files. Anyway, when it finish its job it has to jump to original stored EntryPoint so that host run normally.