Wednesday, November 12, 2014

Resource Mess: Parsing resource directory in PE files

Before I start the blog (well, I already have), I want to say that I'm a lousy coder: my code is a mess and only I can read it, but the important thing is that it works :D.
Recently I was working on a project involving PE files, where I had to parse the PE file to extract some information. As I had previous experience with PE files, it was supposed to be a smooth ride. But I hit a bump in the road when I reached the resource section. I don't know why Microsoft had to make it so messy. I searched the internet for information about its structure, but what I found was too vague, hard to understand or, in most cases, incomplete (or I guess I don't know how to google stuff).
But somehow I was able to parse the section and extract all the resources, so I decided to blog about it in the hope that it helps some other confused person like me. Here are my findings about the resource section and how to parse it easily in any language. It may help you if you are trying to learn the file structure or write a program that deals with resources in a PE file, so... let's go.
We'll go step by step through parsing the resource section.
Step 1. To parse the resources, we first need to locate them. It may seem easier to look for the .rsrc section of the file, but some files have no section with that name, and if malware has tampered with the file you may not find anything there. To avoid this, we'll go the legit way, i.e. by looking at the Data Directories in the PE optional header. These provide an easy way to locate the areas of a PE file that are important in one way or another. A Data Directory has the following structure:
typedef struct _IMAGE_DATA_DIRECTORY {
     DWORD   VirtualAddress;
     DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
We're interested in the VirtualAddress field, which locates the contents of the data directory. Remember that VirtualAddress actually holds a relative virtual address (RVA), so you may need to convert it into a virtual address (by adding the image base) or into a raw file offset (using the section headers). The Data Directory we are looking for is IMAGE_DIRECTORY_ENTRY_RESOURCE, which is defined as 2, i.e. DataDirectory[2]. Once we have the address, we go to the next step, which was a real pain in the back for me to understand.
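Step 1 can be sketched in C as follows, assuming the whole file has already been read into a byte buffer. The names find_resource_directory and rva_to_offset are hypothetical helpers, not part of any Windows API; the field offsets (0x3C for e_lfanew, 96 or 112 bytes into the optional header for DataDirectory, depending on PE32 or PE32+) come from the PE format itself. The RVA conversion is deliberately simplified and ignores alignment quirks that the real loader tolerates.

```c
#include <stddef.h>
#include <stdint.h>

/* PE fields are little-endian, so read them byte by byte (host-independent). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define RESOURCE_DIR_INDEX 2 /* value of IMAGE_DIRECTORY_ENTRY_RESOURCE */

/* Reads DataDirectory[2] from a raw PE file in memory. Returns 0 on success. */
int find_resource_directory(const uint8_t *buf, size_t len,
                            uint32_t *rva, uint32_t *size) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return -1;
    uint32_t pe = rd32(buf + 0x3C);                  /* e_lfanew */
    if ((size_t)pe + 24 > len || rd32(buf + pe) != 0x00004550u) return -1; /* "PE\0\0" */
    size_t opt = (size_t)pe + 24;                    /* after signature + IMAGE_FILE_HEADER */
    if (opt + 2 > len) return -1;
    size_t count_off, dir_off;                       /* NumberOfRvaAndSizes, DataDirectory */
    uint16_t magic = rd16(buf + opt);
    if (magic == 0x10B)      { count_off = 92;  dir_off = 96;  } /* PE32 */
    else if (magic == 0x20B) { count_off = 108; dir_off = 112; } /* PE32+ */
    else return -1;
    if (opt + dir_off + 8 * (RESOURCE_DIR_INDEX + 1) > len) return -1;
    if (rd32(buf + opt + count_off) <= RESOURCE_DIR_INDEX) return -1;
    *rva  = rd32(buf + opt + dir_off + 8 * RESOURCE_DIR_INDEX);
    *size = rd32(buf + opt + dir_off + 8 * RESOURCE_DIR_INDEX + 4);
    return 0;
}

/* Maps an RVA to a file offset via the section table. Expects a buffer that
   find_resource_directory() already accepted. Returns 0 on success. */
int rva_to_offset(const uint8_t *buf, size_t len, uint32_t rva, uint32_t *off) {
    uint32_t pe = rd32(buf + 0x3C);
    uint16_t nsec  = rd16(buf + pe + 6);             /* FileHeader.NumberOfSections */
    uint16_t optsz = rd16(buf + pe + 20);            /* FileHeader.SizeOfOptionalHeader */
    size_t sec = (size_t)pe + 24 + optsz;            /* first IMAGE_SECTION_HEADER */
    for (uint16_t i = 0; i < nsec; i++, sec += 40) {
        if (sec + 40 > len) return -1;
        uint32_t vsize = rd32(buf + sec + 8);         /* Misc.VirtualSize */
        uint32_t va    = rd32(buf + sec + 12);        /* VirtualAddress */
        uint32_t rawsz = rd32(buf + sec + 16);        /* SizeOfRawData */
        uint32_t raw   = rd32(buf + sec + 20);        /* PointerToRawData */
        uint32_t span  = vsize ? vsize : rawsz;
        if (rva >= va && rva - va < span) {
            if (rva - va >= rawsz) return -1;         /* not backed by file data */
            *off = raw + (rva - va);
            return 0;
        }
    }
    return -1;
}
```

With the resource RVA in hand, rva_to_offset gives the position of the resource directory inside the file buffer.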
Step 2. Now that we have the address of the PE file's resources, we can parse them. As easy as that may sound, it was not easy for me. To do it, we have to understand how the resources are laid out in the PE file. The resource data is built from two primary structures, which together form a tree-like structure. The structures are:
typedef struct _IMAGE_RESOURCE_DIRECTORY {
     DWORD   Characteristics;
     DWORD   TimeDateStamp;
     WORD    MajorVersion;
     WORD    MinorVersion;
     WORD    NumberOfNamedEntries;
     WORD    NumberOfIdEntries;
} IMAGE_RESOURCE_DIRECTORY, *PIMAGE_RESOURCE_DIRECTORY;
Each _IMAGE_RESOURCE_DIRECTORY is immediately followed by an array of entries of this second structure:
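typedef struct _IMAGE_RESOURCE_DIRECTORY_ENTRY {
     union {
         struct {
             DWORD NameOffset:31;
             DWORD NameIsString:1;
         };
         DWORD   Name;
         WORD    Id;
     };
     union {
         DWORD   OffsetToData;
         struct {
             DWORD   OffsetToDirectory:31;
             DWORD   DataIsDirectory:1;
         };
     };
} IMAGE_RESOURCE_DIRECTORY_ENTRY, *PIMAGE_RESOURCE_DIRECTORY_ENTRY;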
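Remember, the entries immediately follow the directory header: there is no pointer linking the two, they are simply adjacent to each other (I don't know why they are not linked). The first entry starts 16 bytes after the start of the _IMAGE_RESOURCE_DIRECTORY, and each entry is 8 bytes long.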
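The layout of C bit-fields is implementation-defined, so a portable parser may prefer to decode the two DWORDs of each entry with masks instead of relying on the structs above. A minimal sketch, with hypothetical names (res_entry, read_entry, entry_count) and assuming dir points at an _IMAGE_RESOURCE_DIRECTORY inside an in-memory copy of the resource section:

```c
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define DIR_HEADER_SIZE 16 /* sizeof(IMAGE_RESOURCE_DIRECTORY) */
#define ENTRY_SIZE       8 /* sizeof(IMAGE_RESOURCE_DIRECTORY_ENTRY) */

typedef struct {
    int      name_is_string;    /* bit 31 of Name */
    uint32_t name_offset;       /* low 31 bits of Name: offset of the name string */
    uint16_t id;                /* low 16 bits of Name: numeric ID */
    int      data_is_directory; /* bit 31 of OffsetToData */
    uint32_t offset;            /* low 31 bits: child directory or data entry */
} res_entry;

/* NumberOfNamedEntries (offset 12) + NumberOfIdEntries (offset 14). */
unsigned entry_count(const uint8_t *dir) {
    return (unsigned)rd16(dir + 12) + rd16(dir + 14);
}

/* Decodes entry i; the caller must check that it lies inside the buffer. */
res_entry read_entry(const uint8_t *dir, unsigned i) {
    const uint8_t *e = dir + DIR_HEADER_SIZE + ENTRY_SIZE * i; /* entries follow the header */
    uint32_t name = rd32(e), data = rd32(e + 4);
    res_entry r;
    r.name_is_string    = (int)((name >> 31) & 1u);
    r.name_offset       = name & 0x7FFFFFFFu;
    r.id                = (uint16_t)(name & 0xFFFFu);
    r.data_is_directory = (int)((data >> 31) & 1u);
    r.offset            = data & 0x7FFFFFFFu;
    return r;
}
```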
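So how many entries does a directory have? It's simple: NumberOfNamedEntries + NumberOfIdEntries from _IMAGE_RESOURCE_DIRECTORY gives the exact number of entries that follow it. The named entries (identified by a string) come first, followed by the entries identified by a numeric ID. Note that at the root directory this count is the number of resource types in the file, not the total number of resources.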
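When NameIsString is set, NameOffset points (relative to the start of the resource section) to a length-prefixed UTF-16LE string, IMAGE_RESOURCE_DIR_STRING_U in winnt.h; otherwise the low 16 bits hold the numeric ID. The hypothetical helper below turns either form into printable text. It squashes non-ASCII characters to '?' and prints IDs as "#id"; both are display choices for this sketch, not part of the format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* rsrc/rsrc_len: in-memory resource section; raw_name: the entry's Name DWORD. */
void entry_name(const uint8_t *rsrc, size_t rsrc_len, uint32_t raw_name,
                char *out, size_t out_len) {
    if (out_len == 0) return;
    if (!(raw_name & 0x80000000u)) {                  /* numeric ID */
        snprintf(out, out_len, "#%u", (unsigned)(raw_name & 0xFFFFu));
        return;
    }
    size_t off = raw_name & 0x7FFFFFFFu;              /* IMAGE_RESOURCE_DIR_STRING_U */
    if (off + 2 > rsrc_len) { snprintf(out, out_len, "<bad name>"); return; }
    size_t n = rd16(rsrc + off);                      /* Length in UTF-16 units, no terminator */
    size_t i = 0;
    for (; i < n && i + 1 < out_len && off + 2 + 2 * i + 2 <= rsrc_len; i++) {
        uint16_t ch = rd16(rsrc + off + 2 + 2 * i);
        out[i] = (ch < 0x80) ? (char)ch : '?';        /* simplification: non-ASCII -> '?' */
    }
    out[i] = '\0';
}
```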
Now some concentration (cup of 0xcoffee) is required.
Resources in PE files come in various types. At the first level of the tree, an entry's numeric ID identifies the type; here are some of the predefined ones:
  • Cursor
  • Bitmap
  • Icon
  • Menu
  • Dialog
  • String
  • Font directory
  • Font
  • Accelerator
  • RCData
  • Message table
  • Version
  • Dialog include
  • Plug and Play
  • VXD
  • Animated Cursor
  • Animated Icon
  • HTML
  • Manifest
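These types correspond to the predefined RT_* IDs in the Windows headers, and they only apply to numeric IDs at the first level of the tree (a type can also be a custom string name). A small lookup sketch, where resource_type_name is a hypothetical helper; it also covers the group cursor (12) and group icon (14) types, which the list above leaves out:

```c
#include <stdint.h>

/* Maps a first-level numeric ID to a readable type name (RT_* values). */
const char *resource_type_name(uint16_t id) {
    switch (id) {
    case 1:  return "Cursor";
    case 2:  return "Bitmap";
    case 3:  return "Icon";
    case 4:  return "Menu";
    case 5:  return "Dialog";
    case 6:  return "String";
    case 7:  return "Font directory";
    case 8:  return "Font";
    case 9:  return "Accelerator";
    case 10: return "RCData";
    case 11: return "Message table";
    case 12: return "Group cursor";
    case 14: return "Group icon";
    case 16: return "Version";
    case 17: return "Dialog include";
    case 19: return "Plug and Play";
    case 20: return "VXD";
    case 21: return "Animated Cursor";
    case 22: return "Animated Icon";
    case 23: return "HTML";
    case 24: return "Manifest";
    default: return "Unknown";
    }
}
```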
Usually, the resource directory is a tree with a maximum depth of 3: the first level holds the resource types, the second the individual resources (by name or ID), and the third their languages. The root node is an _IMAGE_RESOURCE_DIRECTORY, and it branches out according to the number of entries it has. For each branch we have to decide whether it leads to another directory or to a leaf node (the resource data). For this, we check the DataIsDirectory bit of the _IMAGE_RESOURCE_DIRECTORY_ENTRY (the entries adjacent to the _IMAGE_RESOURCE_DIRECTORY). If it is 1, OffsetToDirectory points to another _IMAGE_RESOURCE_DIRECTORY, which we parse in the same way; if it is 0, OffsetToData points to a leaf node, which describes the resource data. Both offsets are relative to the start of the resource section (the address found in Step 1), not RVAs.
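A leaf node has the following structure:
typedef struct _IMAGE_RESOURCE_DATA_ENTRY {
    DWORD   OffsetToData;
    DWORD   Size;
    DWORD   CodePage;
    DWORD   Reserved;
} IMAGE_RESOURCE_DATA_ENTRY, *PIMAGE_RESOURCE_DATA_ENTRY;
Unlike the offsets in the directory entries, OffsetToData here is the RVA of the resource bytes, so it has to be converted just like the address in Step 1.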
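Reading a leaf and locating the bytes it describes might look like the sketch below. The function read_data_entry and its parameters are hypothetical; in particular it assumes the data lives in the same section as the resource directory (common, but not guaranteed), so a general parser would use a full section-table lookup instead.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t rva;        /* OffsetToData: an RVA, unlike the directory offsets */
    uint32_t size;
    uint32_t code_page;
} res_data;

/* entry_off is relative to the resource section start (rsrc); rsrc_rva and
   rsrc_raw are that section's RVA and file offset. Returns 0 on success. */
int read_data_entry(const uint8_t *rsrc, size_t rsrc_len, uint32_t entry_off,
                    uint32_t rsrc_rva, uint32_t rsrc_raw,
                    res_data *out, uint32_t *file_off) {
    if ((size_t)entry_off + 16 > rsrc_len) return -1;
    const uint8_t *p = rsrc + entry_off;
    out->rva       = rd32(p);
    out->size      = rd32(p + 4);
    out->code_page = rd32(p + 8);                     /* Reserved (p + 12) is ignored */
    if (out->rva < rsrc_rva) return -1;
    uint32_t rel = out->rva - rsrc_rva;
    if (rel > rsrc_len || out->size > rsrc_len - rel) return -1; /* data must fit */
    *file_off = rsrc_raw + rel;
    return 0;
}
```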
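In pseudocode:
function parse_resource_directory(directory){

  for each of the (NumberOfNamedEntries + NumberOfIdEntries) entries after directory{

    if entry.DataIsDirectory is 1{

      //leaf node not reached yet, parse the subdirectory
      call parse_resource_directory(resource_start + entry.OffsetToDirectory)

    }
    else{

      //we reached the leaf node
      map resource_start + entry.OffsetToData to _IMAGE_RESOURCE_DATA_ENTRY

    }
  }
}

call parse_resource_directory(resource_start)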
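The pseudocode translates into C roughly as follows. This is a minimal sketch: walk_resources and the leaf_fn callback type are hypothetical names, rsrc is assumed to point at an in-memory copy of the resource section, and the walker refuses to go deeper than the usual three levels, so a malformed file with looping offsets cannot make it recurse forever. It returns the number of leaves visited, or -1 if an offset points outside the buffer.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* names[] holds the raw Name DWORD per level: type, name/ID, language. */
typedef void (*leaf_fn)(const uint32_t names[3], uint32_t data_rva,
                        uint32_t size, void *ctx);

static long walk(const uint8_t *rsrc, size_t len, uint32_t dir_off, int depth,
                 uint32_t names[3], leaf_fn fn, void *ctx) {
    if (depth >= 3 || (size_t)dir_off + 16 > len) return -1;
    unsigned n = (unsigned)rd16(rsrc + dir_off + 12) + rd16(rsrc + dir_off + 14);
    long leaves = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t e = (size_t)dir_off + 16 + 8 * (size_t)i;  /* entries follow the header */
        if (e + 8 > len) return -1;
        uint32_t data = rd32(rsrc + e + 4);
        names[depth] = rd32(rsrc + e);
        if (data & 0x80000000u) {                         /* DataIsDirectory: recurse */
            long sub = walk(rsrc, len, data & 0x7FFFFFFFu, depth + 1, names, fn, ctx);
            if (sub < 0) return -1;
            leaves += sub;
        } else {                                          /* leaf: IMAGE_RESOURCE_DATA_ENTRY */
            if ((size_t)data + 16 > len) return -1;
            if (fn) fn(names, rd32(rsrc + data), rd32(rsrc + data + 4), ctx);
            leaves++;
        }
    }
    return leaves;
}

long walk_resources(const uint8_t *rsrc, size_t len, leaf_fn fn, void *ctx) {
    uint32_t names[3] = {0, 0, 0};
    return walk(rsrc, len, 0, 0, names, fn, ctx);
}
```

A callback could, for example, turn names[0] into a type name when its high bit is clear, and convert the data RVA into a file offset with the section table from Step 1.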

This part is a bit complicated; I actually had to look back at my code to understand it again. But I think this is pretty much it. Once you have the address of a particular resource's data, you can figure out what to do with it. In my case, for example, I wanted to check whether there was any executable file among the resources.
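For that kind of check, a simple heuristic is to look for the MZ header and the "PE\0\0" signature that e_lfanew points to. The sketch below (looks_like_pe is a hypothetical name) only inspects the start of the resource data, so an executable that is compressed, encrypted, or stored at an offset inside the resource would slip through.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffer starts with an MZ header whose e_lfanew points
   at a "PE\0\0" signature inside the buffer, 0 otherwise. */
int looks_like_pe(const uint8_t *data, size_t len) {
    if (len < 0x40 || data[0] != 'M' || data[1] != 'Z') return 0;
    uint32_t pe = (uint32_t)data[0x3C] | ((uint32_t)data[0x3D] << 8) |
                  ((uint32_t)data[0x3E] << 16) | ((uint32_t)data[0x3F] << 24);
    if ((size_t)pe > len - 4) return 0;               /* len >= 0x40, so no underflow */
    return data[pe] == 'P' && data[pe + 1] == 'E' &&
           data[pe + 2] == 0 && data[pe + 3] == 0;
}
```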
Please leave your feedback or comments. I might be wrong about something in this post, so any corrections or suggestions are welcome.