- A word from the author
- PE EXE structure overview
- PE EXE structure (description)
- Standard sections
- Kernel32 problem
A word from the author
I take NO responsibility for any damage that is caused by using this project.
"In no event shall I, the author, be liable for any consequential, special,
incidental or indirect damages of any kind arising out of the delivery, performance or use
of this project."
This project is unique, and some of the material here cannot be found elsewhere! You will
find some things here that even good viruses overlooked!
The purpose of this project is not only to lead the reader into the Win95 virus world. Many
things can be 'born' from this project: one idea that is always good is an EXE file crypter
for program protection.
Windows NT has VAX, VMS and UNIX roots. Many of the Windows NT creators were involved in the
development of those OSes before they joined Micro$oft, so it is logical that they built
into Windows NT some properties that had already been tested. The format of executable and
object files they used was COFF (Common Object File Format). That was a very good starting
point, but it had to be changed a bit to meet the needs of a 'modern' OS. So the PE
(Portable Executable) format was born. It is called 'Portable' because all Windows NT ports
to different platforms (x86, MIPS, Alpha...) use this same format. Of course, there are
differences, e.g. different CPU instruction encoding, but the main point is that the OS
loader and OS tools do not have to be completely rewritten from scratch for every platform.
The PE file structure is described in the WINNT.H header (for Borland's IDE it is
NTIMAGE.H). All PE file structures are defined there. Unfortunately, structure members have
very long names and often have different meanings, and the structures are big, nested etc.
But we should expect all this, because we know where it came from, right?
A little note for future OS creators: the structure of an executable also reveals the
characteristics of the OS.
PE EXE structure overview
Here you will not find the whole structure of a PE EXE, only the important parts of it.
Complete descriptions can be found on the net. However, I didn't find any description that
satisfied me completely. Anyway, Matt Pietrek leads.
The PE EXE structure looks like this:
| MS-DOS MZ header              |
| MS-DOS Real-Mode Stub program |
| PE EXE Header                 |
| PE EXE Optional Header        |
| section header #1             |
| section header #2             |
| section #1                    |
| section #2                    |
The first important thing you need to know is that a PE EXE on disk looks very much like
its memory image once the Loader (the part of the Windows KERNEL that loads and runs files)
has loaded it. The Loader maps files into the virtual address space. So all you need to
know is where and how the Loader maps the parts of a PE file into the memory reserved for
it.
PE EXE structure (description)
Now is the time to explain what 'Relative Virtual Address' (RVA) means. An RVA is simply an
offset from the beginning of the mapped PE file. For example: if the Loader maps a file at
address 0x40000, and the RVA of some data is 0x464, then you can find it in memory at
virtual address 0x40000 + 0x00464 = 0x40464. Also note that there is a relation between the
RVA of data and its PE EXE file offset!
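The arithmetic above can be written down as a tiny helper. This is only a sketch of the
RVA-to-virtual-address rule; the function name rva_to_va is made up for this illustration
and is not taken from WINNT.H.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Return the virtual address of data that lies `rva` bytes into the mapped image."""
    return image_base + rva


# The example from the text: file mapped at 0x40000, data at RVA 0x464.
print(hex(rva_to_va(0x40000, 0x464)))  # prints 0x40464
```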
One more term needs to be explained: the PE file section. Sections are contiguous parts of
the PE file, of variable length, which stay contiguous in memory after the file is loaded.
Sections hold all raw data: code, data, resource info etc. When the Loader loads a PE file
and maps it into the reserved memory, it puts the sections in memory in a specific order,
but it does not modify the sections and does not break them up. Every section has its own
purpose: some hold code or data, but some are created by the linker just for OS use.
Here comes a detailed description of the PE EXE file structure from the 'attaching files'
perspective: it tries to explain how to join two files, for a good or bad reason:) This
description refers to the structures defined in the standard WINNT.H file. Structure
elements that are important to us are marked with [*].
MS-DOS MZ Header
The first 64 bytes of every EXE file contain the standard MS-DOS header. It has to stay for
compatibility with previous versions of Micro$oft OSes. This structure is defined in
IMAGE_DOS_HEADER. Of all the elements of this structure, here is what we need:
WORD e_magic
The so-called magic number: the first two bytes of every EXE file must be the letters 'M'
and 'Z' (the initials of Mark Zbikowski, one of the programmers who worked on the first DOS
versions). But not many people know that these letters can also be given in the opposite
order: 'Z' and 'M'.
LONG e_lfanew [*]
File offset (and RVA, too) of the 'PE EXE Header'. To be more precise, it points 4 bytes
before that structure, to the PE 'signature' dword.
MS-DOS Real-Mode Stub program
This is a normal DOS program with one purpose: to write an error message when somebody
starts the PE EXE file from an old DOS version. But it can be any kind of MS-DOS program, so
you can create a program that works on two different OSes: DOS and Windows9x!
PE EXE Header
Here the PE EXE starts. Before this structure in the file (and in memory) is the 'signature'
dword that e_lfanew points to, as I said before. This 'signature' dword tells the OS what
kind of EXE this is: PE, LE, NE etc. For PE files the value of this 'signature' dword must
be 0x00004550 (PE00). After the 'signature' dword comes the PE EXE header. It is defined in
IMAGE_FILE_HEADER. The elements of this structure are:
WORD Machine
CPU identification (ID number): Intel i860 (0x14D), Intel i386 (0x14C), MIPS R3000
(0x162), MIPS R4000 (0x166), DEC Alpha AXP (0x184).
WORD NumberOfSections [*]
Total number of sections.
DWORD TimeDateStamp
Time of file creation, in seconds since 31-dec-1969, 4:00 PM (Pacific time; that is
1-jan-1970, 00:00 UTC).
WORD SizeOfOptionalHeader
Size of the 'PE EXE Optional Header' structure that comes right after this one.
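As an illustration of the headers described so far, here is a minimal read-only sketch
that walks from the MZ header through e_lfanew to the 'signature' dword and the
IMAGE_FILE_HEADER. The helper names (parse_pe_headers, make_sample) and the returned
dictionary keys are made up for this example; the field order follows IMAGE_FILE_HEADER,
including the symbol-table fields that are not discussed in this text. The sample bytes are
synthetic and are not a real executable.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Check the MZ and PE signatures and decode IMAGE_FILE_HEADER (read-only)."""
    if data[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    # e_lfanew is the LONG at offset 0x3C of the 64-byte DOS header.
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # IMAGE_FILE_HEADER (20 bytes) starts right after the 4-byte signature.
    (machine, number_of_sections, time_date_stamp, _symbol_table,
     _symbol_count, size_of_optional_header, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        "time_date_stamp": time_date_stamp,
        "size_of_optional_header": size_of_optional_header,
    }


def make_sample() -> bytes:
    """Build a tiny synthetic header blob: DOS header, signature, file header."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<l", dos, 0x3C, 64)  # PE signature follows the DOS header
    file_header = struct.pack("<HHIIIHH", 0x14C, 3, 0, 0, 0, 224, 0)
    return bytes(dos) + b"PE\0\0" + file_header


info = parse_pe_headers(make_sample())
print(hex(info["machine"]), info["number_of_sections"])
```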
PE EXE Optional Header
This structure comes right after the previous one. Note that this structure must exist,
despite its name: 'Optional'! It has this name because on different platforms this
structure looks different. Anyway, this structure is defined in IMAGE_OPTIONAL_HEADER. Here
are the elements of this structure (for Intel processors):
WORD Magic
Some magic number, again. Not important.
BYTE MajorLinkerVersion, MinorLinkerVersion
Version of the linker that built this EXE. Not important.
DWORD SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData
Sizes of the code, data and bss (uninitialized data) sections. Not important.
DWORD AddressOfEntryPoint [*]
RVA at which the Loader starts the PE EXE. The program begins at this address.
DWORD BaseOfCode, BaseOfData
RVAs where the code and data sections start. Not important.
DWORD ImageBase [*]
Virtual address at which the whole PE EXE is loaded (usually 0x400000). All virtual
addresses are calculated from this one: just add the RVA to this base address.
DWORD SectionAlignment
Alignment of section virtual addresses (usually 0x1000).
DWORD FileAlignment [*]
File alignment (usually 0x200). Note that the raw data of each section in the file is
aligned to this value, too.
DWORD SizeOfImage [*]
This is the total size of memory that the Loader has to take care of. If it is not set
correctly, the Loader will show an error message saying that there is not enough memory and
that you should close other running programs.
DWORD SizeOfHeaders [*]
Size of all PE headers and tables. The raw data comes after all of them.
DWORD CheckSum
This should be a CRC checksum, if it is set by the linker. Not important.
WORD Subsystem [*]
Interface type needed by the PE EXE: NATIVE (1) - does not need a subsystem, like a device
driver; WINDOWS_GUI (2) - needs the Windows GUI; WINDOWS_CUI (3) - a console application;
OS2_CUI (5) - an OS/2 console application; POSIX_CUI (7) - a Posix console application.
DWORD NumberOfRvaAndSizes
Number of fields in DataDirectory. Always set to 16.
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES] [*]
This is an array of 16 entries. Every entry specifies the position and size of some
important part of the PE file: a section, a table etc. Every entry is, in fact, a structure
defined as IMAGE_DATA_DIRECTORY that has only 2 members:
DWORD VirtualAddress - entry RVA, and
DWORD Size - entry size.
Today only 11 of the 16 entries are used, although lately 2 more entries have come into
use; we can't expect them to exist in every file, of course. Of all these entries, the most
important for us is the second one (that is, DataDirectory[1]), because it specifies the
position and size of the IMPORT section. What the other entries are for you can find in
WINNT.H, in the IMAGE_DIRECTORY_ENTRY_XXX definitions.
One of the two new entries is DataDirectory[12]. It is important because it specifies where
the IAT (Import Address Table) is but, unfortunately, as I said, this entry exists only in
EXEs built with newer linker versions, so we can't rely on it.
When some part of the PE file is not present in an EXE, its entry values are set to zero.
The purpose of DataDirectory is to help the Loader quickly find important parts of the PE
file.
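To make the DataDirectory lookup concrete, here is a small read-only sketch that fetches
one entry from the raw bytes of a 32-bit optional header. The offsets 92
(NumberOfRvaAndSizes) and 96 (DataDirectory) are those of the 32-bit
IMAGE_OPTIONAL_HEADER layout; the function name read_data_directory and the sample values
are made up for this illustration.

```python
import struct

NUMBER_OF_RVA_AND_SIZES_OFFSET = 92  # offset inside the 32-bit optional header
DATA_DIRECTORY_OFFSET = 96           # first IMAGE_DATA_DIRECTORY entry
IMAGE_DIRECTORY_ENTRY_IMPORT = 1     # second entry: the import section


def read_data_directory(optional_header: bytes, index: int):
    """Return (rva, size) for a DataDirectory entry, or None if it is absent."""
    (count,) = struct.unpack_from("<I", optional_header, NUMBER_OF_RVA_AND_SIZES_OFFSET)
    if index >= count:
        return None
    rva, size = struct.unpack_from("<II", optional_header, DATA_DIRECTORY_OFFSET + index * 8)
    if rva == 0 and size == 0:  # zeroed entry: that part is not present in the EXE
        return None
    return rva, size


# Synthetic 224-byte optional header with only the import entry filled in.
sample = bytearray(224)
struct.pack_into("<I", sample, NUMBER_OF_RVA_AND_SIZES_OFFSET, 16)
struct.pack_into("<II", sample, DATA_DIRECTORY_OFFSET + 8, 0x3000, 0x50)
print(read_data_directory(bytes(sample), IMAGE_DIRECTORY_ENTRY_IMPORT))
```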
Right after the PE headers, and before the raw data, there is an array of Section Headers:
one header structure for every section that exists in the PE EXE. Every section in the EXE
must have its own 40-byte Section Header listed here. The total number of sections was
given earlier (NumberOfSections). The section header is defined in IMAGE_SECTION_HEADER.
This structure has these elements:
BYTE Name[IMAGE_SIZEOF_SHORT_NAME]
8 bytes for the ANSI section name. Usually the section name starts with a dot ('.'). Some
sections have very informative names (.text, .data etc.) but we can't count on that. Also
note that if the section name is 8 bytes long there is no zero terminator at the end of the
name.
DWORD VirtualSize [*]
This is one of the most interesting parts of the PE file. This member is a union, so for PE
EXE manipulation we use VirtualSize (PhysicalAddress is used for OBJ files). It specifies
the length of the raw section data. This is the unaligned section size!
But this value can be set to 0!!! In that case the Loader assumes that the raw section data
length equals the aligned section length (SizeOfRawData, given further on in the
structure). The Watcom linker generates PE EXE files with the VirtualSize field set to 0.
Viruses almost always use this value, and not many people know that it can be 0, so the
virus does not attach itself to the file properly! If somebody runs such a badly infected
file, Windows will usually give an error message and the program will not work (nor will
the virus)! So here is an idea for putting an 'anti-viral' shield inside your programs
without a single line of code. A more sophisticated anti-viral shield could also check
whether this value is zero or not. Of course, all this works only until it is
discovered:)
DWORD VirtualAddress [*]
Section RVA. Add this value to ImageBase (given before) and you get the virtual memory
address of the section. Note: here the RVA is not equal to the section's PE file offset,
but there is still a relation between them.
DWORD SizeOfRawData [*]
Aligned section length. The alignment step was defined earlier, in FileAlignment.
DWORD PointerToRawData [*]
File offset of the raw section data. As was said before, all sections (their raw data) come
after this Section Header array. While loading the PE EXE file into memory, the Loader
reads the section raw data from the file offset specified here and puts it at the address
specified in VirtualAddress.
DWORD Characteristics [*]
Contains the section characteristics (attributes). Every section has its own
characteristics. A code section usually has these attributes set: executable, readable,
code. A data section usually has these attributes set: readable and writeable.
What matters here is the 'writeable' attribute of the code section, because it tells the
KERNEL whether writing into the code section is allowed or not.
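The relation between an RVA and a file offset, mentioned several times above, follows from
the section header fields. The sketch below shows one common way to do the conversion; the
Section tuple and the function name are made up for this illustration, and the fallback for
VirtualSize equal to 0 follows the behaviour described above for Watcom-linked files.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int


def rva_to_file_offset(sections: Sequence[Section], rva: int) -> Optional[int]:
    """Map an RVA to a file offset, or None if no raw data backs that RVA."""
    for s in sections:
        # VirtualSize may be 0; in that case fall back to SizeOfRawData.
        span = s.virtual_size or s.size_of_raw_data
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                return None  # exists in memory only (e.g. uninitialized data)
            return s.pointer_to_raw_data + delta
    return None


# Hypothetical section table used only for this example.
sections = [
    Section(".text", 0x1234, 0x1000, 0x1400, 0x400),
    Section(".data", 0x0000, 0x3000, 0x0200, 0x1800),  # VirtualSize set to 0
]
print(hex(rva_to_file_offset(sections, 0x1010)))  # prints 0x410
```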
Finally, after the PE headers and the Section Header array comes the program itself,
divided into sections. Every section usually contains one kind of program content: data,
code, import data, export data etc. But there can be more than one section with the same
purpose (although it is not recommended), for example 2 code sections.
Standard sections
Here are the descriptions of the standard sections, with special attention to the code,
import and export sections.
Linkers usually collect all code from the OBJ files and put it in one section. This section
is often named '.text'. It holds the program code.
Here it is important to see how a program calls a function that is imported from a DLL.
Such functions are, for example, all WinAPI functions, which are imported from system DLLs.
Because DLLs are not always loaded at the same address in memory, or are not in memory at
all before the program starts, the Loader needs an efficient way to make all imported
functions callable. So what the Loader (and, before it, the linker) does is redirect all
those calls from the program code to a special part of the code section that contains
instructions like:
JMP DWORD PTR [XXXXXXXX]
where XXXXXXXX is an address somewhere inside the program's import section. While loading
the PE EXE, the Loader fills the import section with the proper memory address of every
imported DLL function. This whole procedure is called 'importing a function'. So the Loader
does not have to patch every CALL instruction in the code; it only has to store the memory
address of every imported function in the import section. The drawback of this method is
that you can't get an imported function's address directly from the code, but do not worry,
you can get it another way.
Let's say, for example, that somewhere in the program there is a call to the GetMessage
WinAPI function, which belongs to the system's USER32.DLL. Here is a little picture that
explains this:
                |        :        |      0x77879426: |        :        |
      0x401042: |   0x77879426    | ---------------> | GetMessage code |
                |        :        |                  |        :        |

                |            :             |
      0x404408: | JMP DWORD PTR [0x401042] |
                |            :             |
                |            :             |
                |      CALL GetMessage     |
                |            :             |
The data section holds all global and static variables that are initialized at compile
time. Strings go here, too. The linker usually joins all data sections into one and usually
names it '.data'. Local variables are stored on the thread's stack; they are not in the
data or bss section.
The BSS section holds all uninitialized static and global variables. This section does not
take any physical file space, so its PointerToRawData and/or SizeOfRawData is equal to 0.
Borland linkers usually do not produce this section: its contents are, in fact, inside
their data section. The name of this section is usually '.bss'.
The resource section is usually named '.rsrc' and holds all resource data.
The import section (usually named '.idata') holds all the information that helps the Loader
find the addresses of all imported functions used in the program, so that the Loader can
patch part of the import section and the program can work properly. The import section
starts with an array of IMAGE_IMPORT_DESCRIPTOR structures. Every DLL that contains an
imported function is represented in this array. One DLL can also be represented more than
once, each time with a different set of imported functions. This array has, of course, a
variable length, so it ends with an empty terminator structure (all elements are NULL).
Here are the elements of this structure:
DWORD Characteristics [*]
Well, this is not a characteristic at all: it is the RVA of a special pointer array. This
pointer array is called the HintName Array (HNA) and has a similar function to the IAT
(Import Address Table). But when the Loader runs a program it does not change this table at
all, the way it changes the IAT.
DWORD ForwarderChain
This element is here for a special function importing technique: 'forwarding'. Forwarding
is when functions are not imported directly from the requested DLL, but from a DLL it
forwards to. For example, on Windows NT KERNEL32.DLL exports some functions that are, in
fact, in NTDLL.DLL. When they are called from a program that imports them from
KERNEL32.DLL, Windows will not look for them in the KERNEL32.DLL code but in the NTDLL.DLL
code, where those functions really are. This structure element can be important in some
cases.
DWORD Name [*]
RVA of the ASCIIZ string that contains the name of the imported DLL.
PIMAGE_THUNK_DATA FirstThunk [*]
RVA of the Import Address Table (IAT). The IAT is an array of pointers that, in the PE EXE
file, point to the imported function names and their ordinal numbers. An ordinal number is
a unique DLL function number. This number is also known as the Hint. So every pointer in
the IAT points to a structure defined as IMAGE_IMPORT_BY_NAME, which has 2 elements:
WORD Hint - ordinal number, and
BYTE Name[?] - ASCIIZ name of imported function.
There is one undocumented thing here: in special cases a PE EXE imports some functions only
by their ordinal number and not by name, so no name is given. This is the case with some
system programs (like EXPLORER.EXE) which use undocumented and non-public OS functions.
Such cases can and must be detected: the IAT entry, read as a signed int, is less than 0.
To be precise, the entry is then not a pointer at all but a DWORD that contains the
function's ordinal number with the most significant bit set (for example, the value
0x80000012 stands for ordinal number 0x12).
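The ordinal check described above comes down to testing one bit. Here is a minimal sketch;
the function name decode_thunk is made up for this illustration, while the flag value
0x80000000 and the 16-bit ordinal mask correspond to IMAGE_ORDINAL_FLAG32 and the
IMAGE_ORDINAL32 macro in WINNT.H.

```python
IMAGE_ORDINAL_FLAG32 = 0x80000000  # most significant bit of the 32-bit entry


def decode_thunk(value: int):
    """Classify a 32-bit import entry as an ordinal import or an RVA of a name."""
    if value & IMAGE_ORDINAL_FLAG32:
        return ("ordinal", value & 0xFFFF)  # low 16 bits hold the ordinal number
    return ("name_rva", value)              # RVA of an IMAGE_IMPORT_BY_NAME


print(decode_thunk(0x80000012))  # prints ('ordinal', 18)
```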
While loading and running a program, the Loader uses all this information like this: for
every pointer in the IAT, the Loader reads the name of the imported function and finds its
address in the current DLL. The Loader then changes the IAT so that its pointers now point
directly to the code of the imported functions!
The next diagram shows one standard import structure in a PE EXE file.
                              HNA                              IAT
                             +---+    +------------------+    +---+
                         /-->|   |--->| 44               |<---|   |<--\
                         |   |   |    | "GetMessage"     |    |   |   |
                         |   +---+    +------------------+    +---+   |
IMAGE_IMPORT_DESCRIPTOR  |   |   |--->| 72               |<---|   |   |
+-----------------+      |   |   |    | "LoadIcon"       |    |   |   |
| Characteristics |------/   +---+    +------------------+    +---+   |
+-----------------+          |   |--->| 19               |<---|   |   |
| TimeDateStamp   |          |   |    |"TranslateMessage"|    |   |   |
+-----------------+          +---+    +------------------+    +---+   |
| ForwarderChain  |            :               :                :     |
+-----------------+                                                   |
| Name            |---> "USER32.DLL"                                  |
+-----------------+                                                   |
| FirstThunk      |---------------------------------------------------/
+-----------------+
As I said before, this is how it looks in the PE EXE file. When the Loader loads this PE
EXE, it changes the IAT pointers so that they point directly to the code of the imported
functions GetMessage, LoadIcon etc., which are in USER32.DLL.
Unfortunately there are two exceptions, and there is not much documentation about them.
The first exception comes from Borland: programs linked with TLINK32 have no HNA (the
pointer to it is NULL). This saves some file space. Borland discards the HNA table because
the HNA and the IAT have almost the same function. Still, there is a difference between the
IAT and the HNA: the Loader changes the IAT at load time, so while the program runs only
the HNA still points to the names of the imported functions and their hints. Anyway,
Borland decided to discard the HNA information.
The second exception comes from Micro$oft. It concerns the IAT: they wanted to speed things
up, so some programs (like explorer.exe from Win95 OSR2) have an IAT whose pointers already
point to the imported functions, a technique known as binding. Here the IAT already
contains the imported function addresses in the file! This contradicts the fact that DLLs
are not always loaded at the same addresses! Anyway, if you want to work with function
imports you should use both the HNA and the IAT.
Another interesting thing here: the import section usually has the read and write
attributes set, so it is easy to intercept calls to imported functions! All you have to do
is patch the IAT so that it points to the intercepting function. Later, from the
intercepting function, you just call the original imported function. This is a nice method,
especially because there is no need to
The export section (usually called '.edata') is a standard section for DLLs. It contains
all the information about exported functions. These functions in the DLL are then ready to
be imported by another DLL or EXE. At the beginning of this section is the
IMAGE_EXPORT_DIRECTORY structure, and right after it comes the raw data. Here are the
elements of this structure.
DWORD Name
RVA of the ASCIIZ string with the DLL name in it.
DWORD Base
This is the starting ordinal number. To get the real ordinal number you have to add this
value to the numbers you get from AddressOfNameOrdinals.
DWORD NumberOfFunctions [*]
Number of functions you get via AddressOfFunctions. This number can theoretically differ
from the NumberOfNames value, but the two are usually the same.
DWORD NumberOfNames
Number of names that you get via AddressOfNames.
PDWORD *AddressOfFunctions [*]
RVA of an array that contains function addresses. These addresses are the RVAs of each
exported function of the current module (DLL).
PDWORD *AddressOfNames [*]
RVA of a pointer array that points to strings. The strings contain the names of the
exported functions.
PWORD *AddressOfNameOrdinals
RVA of an array of WORD ordinal numbers of the exported functions. You have to add the Base
ordinal value to the ordinal that you get from this array to get the real function ordinal.
You can see that the export section contains 3 different arrays that give a function's
name, its address and its ordinal number. The name array and the ordinal array are
'parallel'. When you search for a function and you have only its name, you scan the array
of name pointers to find the position of the pointer to that name. Then you read the WORD
at the same position (array index) in the ordinal array; that WORD is the index of the
function's entry in the address array (and, with Base added, its ordinal number).
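The lookup through the three export arrays can be sketched on plain Python lists. This is
an illustration on made-up data, not a parser for a real DLL: the function name find_export
and all sample values are hypothetical. It assumes the usual PE convention that the WORD
from the ordinal array is an index into the address array.

```python
def find_export(names, name_ordinals, functions, base, wanted):
    """Return (ordinal, function_rva) for `wanted`, or None if it is not exported by name."""
    for i, name in enumerate(names):
        if name == wanted:
            index = name_ordinals[i]  # same position in the parallel ordinal array
            return base + index, functions[index]
    return None


# Made-up export data: two functions whose names are sorted differently from their slots.
names = ["Alpha", "Beta"]
name_ordinals = [1, 0]
functions = [0x1100, 0x1000]
print(find_export(names, name_ordinals, functions, 1, "Alpha"))  # prints (2, 4096)
```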
Kernel32 problem
It is time to analyze the problems of attaching some code to a PE EXE file, which is called
the 'host'. The code that you want to attach to the host must do something useful, so it
has to call WinAPIs. And here we have some problems.
It was said before how the Loader loads and runs a PE EXE file. It reads the IAT, which
points to the names of imported functions, such as WinAPI functions. Then the Loader gets
the addresses of all those functions and patches the IAT, so that the IAT now points
directly to the code of the imported functions, somewhere in DLL memory. When the host code
calls an imported function, it is called indirectly, via the IAT content. The programmer
does not have to take care of the IAT, because it is created by the linker; in his own code
he simply calls an imported function in the normal way, with a plain CALL instruction.
The problem is in the code that has to be attached to the host: how can it call a WinAPI?
The attached code wasn't there at link time, so there are no redirections via the IAT!
The solution is easy: the first thing the attached code must do is find the addresses of
all the functions it will use. All this has to be done before the attached code calls any
WinAPI. So first you need to locate KERNEL32.DLL, because it exports the core system
WinAPIs. But KERNEL32.DLL and the other system DLLs are not always at the same addresses.
This was done on purpose, so that it would not be possible to write a Windows virus. But
there is always a way...
...and here its name is GetModuleHandle. This WinAPI simply returns the address of any
module you need. Here we need KERNEL32.DLL. But wait, how can we call GetModuleHandle() if
KERNEL32.DLL has not been located yet, and what is the address of this function???
The solution is simple: you get GetModuleHandle from the host's IAT. When the Loader
finishes its job, patches the IAT and gives control to the program (the attached code), all
you have to do is read the entry for the imported GetModuleHandle from the IAT. Once you
have the GetModuleHandle address and call it, EAX will hold the address (handle) of
KERNEL32.DLL.
Here you can spot a weak point of this method: the host must import GetModuleHandle. Well,
it is not such a weak point, because almost every PE EXE imports this function.
Let's go on. KERNEL32.DLL is located, but how do we find the addresses of all the other
functions used by the attached code? Simple: use the GetProcAddress WinAPI. This function
takes a DLL handle and a pointer to a function name as its parameters and returns the
function's address. But here I need the address of GetProcAddress before I call it, right?
Yes and no. GetProcAddress is also very often imported, so you can get its address from the
IAT, the same way you did for GetModuleHandle. But there are many programs that do not
import this function. The best solution here is to use your own function that works
identically to GetProcAddress.
Once you have the addresses of GetModuleHandle and GetProcAddress, it is possible to find
the addresses of all the other WinAPIs that the attached code uses. Usually all of that is
done at the very beginning of the attached code.
Further down this page you can download 'Getproc.exe', a program that gives you the
addresses of all functions. The source is included. The program was written by jqwerty from
the famous group 29A, which created many very good viruses.
The attached code must not destroy the existing PE EXE file structure. There are many
solutions for attaching code. Here is a description of the two common ones. The others are
only variants of these two.
The first solution one can think of is to attach the code as a new section at the end of
the PE EXE. But if we look a bit deeper we will see that this solution works, yet there are
many things that have to be changed. You need, for example, to add a new Section Header in
the middle of the PE EXE file: that is not a very pleasant thing to do, especially for
viruses. So there is another, more sophisticated way to do it.
The second solution attaches the code to the last section of the PE EXE file. This way the
original PE EXE file structure remains more or less the same. Here are the details of this
solution.
Note: regardless of the solution you use, you have to take care of EXE files that have
overlays. An overlay is data added at the end of the file whose size is not included in the
structures of the PE EXE file. A typical example is an application installer: the installer
has an overlay after the end of its code that contains the program to be installed.
Attached code always goes to the end of the PE EXE file, so if there is an overlay there
you must take care of it, too.
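Whether a file carries an overlay can be checked from the section table alone. The sketch
below is a read-only illustration of that check; the function name overlay_range and the
sample numbers are made up, and sections are given simply as (PointerToRawData,
SizeOfRawData) pairs.

```python
def overlay_range(sections, size_of_headers, file_size):
    """Return (offset, size) of data past the last section's raw data, or None."""
    # End of the file content that the PE structures account for.
    end = max([ptr + size for ptr, size in sections if ptr and size] + [size_of_headers])
    if file_size > end:
        return end, file_size - end
    return None


# Hypothetical file: two sections with raw data, one BSS-like section without any.
sample_sections = [(0x400, 0x1000), (0x1400, 0x200), (0, 0)]
print(overlay_range(sample_sections, 0x400, 0x2000))  # prints (5632, 2560)
```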
Here is an explanation of how to attach code to the host PE EXE file. Variable
declarations, error handling and other 'non-important:)' stuff will not be discussed here,
so the reader can keep his attention on the real stuff.
At the beginning you should open the host and map it into memory. Let's say that filemap is
a char* pointer to the memory-mapped block of the host file.
[***] Important note: The rest of the project is just descriptive, for now. If there is
interest and support, this project will be finished with a full virus source explanation.
Check whether the file is a valid PE EXE: first check for the MZ (or ZM) signature at the
beginning of the file and for the PE00 signature before the PE headers.
Get the number of sections. Go to the PE Optional Header.
Check whether the app is a GUI app. Store EntryPoint, ImageBase and FileAlignment (it is
the section alignment in the file) and get a pointer to the DataDirectory and SizeOfImage.
If there is an import section in the DataDirectory (and there usually is), calculate its
position and get from it the RVAs of the KERNEL32.DLL functions: GetModuleHandleA and, if
it exists, GetProcAddress.
Find the last section in the host. But note that this does not have to be the last section
in the Section Table!!! Sometimes the last section in the Section Table is a BSS section,
which does not take any physical space in the host file! Such a section has a zeroed
PointerToRawData and/or SizeOfRawData.
Find out where to attach the code. As was said before, VirtualSize can be 0, so take care
of that.
Finally, it is time to change the PE file. You should change:
- the RVA of EntryPoint, so it points to the attached code;
- the size of the last section (SizeOfRawData);
- the size of the last section in the DataDirectory, if it exists there;
- the characteristics of the last section: they need to be set to EXECUTE, READ and WRITE;
- the size of the last section (VirtualSize), if it is not equal to 0;
- the size of the whole file.
Before writing the attached code to the host, you usually store inside the attached code
some data that it will use during execution. This will be the addresses found for
GetModuleHandleA and GetProcAddress from KERNEL32.DLL, and you also need to store the
original EntryPoint.
The last thing we need to discuss here is what the attached code looks like. All of it is
in one section, so data and code are together and there are no separate data and code
segments. Because of that you need to set the READ and WRITE attributes of the section
where the code will be added.
The attached code generally works like this: first it must find out where the data part of
the code is. Then it reads all the stored KERNEL32 function addresses from there. Then it
finds the addresses of all the other WinAPI functions that will be needed.
After this the code does whatever it should do: if it were a virus, it would start to
infect other files. Anyway, when it finishes its job it has to jump to the stored original
EntryPoint so that the host runs normally.