Introduction

During my holidays, I had plenty of time to study and reverse a program, which was completely coded in C++. This was the first time I seriously studied a C++ codebase, using IDA as the only source of information, and found it quite hard.

Here’s a sample of what you get with Hex-rays when you start up digging into an interesting function:

v81 = 9;
v63 = *(_DWORD *)(v62 + 88);
if ( v63 )
{
   v64 = *(int (__cdecl **)(_DWORD, _DWORD, _DWORD,
   _DWORD, _DWORD))(v63 + 24);
   if ( v64 )
     v62 = v64(v62, v1, *(_DWORD *)(v3 + 16), *(_DWORD
     *)(v3 + 40), bstrString);
}

It’s our job to add symbol names, identify classes and set up all the information to help hex-rays in giving us a reliable and certainly understandable output:

padding = *Dst;
if ( padding < 4 )
  return -1;
buffer_skip_bytes(this2->decrypted_input_buffer, 5u);
buffer_skip_end(this2->decrypted_input_buffer, padding);
if ( this2->encrypt_in != null )
{
  if ( this2->compression_in != null )
  {
    buffer_reinit(this2->compression_buffer_in);
    packet_decompress(this2,
      this2->decrypted_input_buffer,
      this2->compression_buffer_in);
    buffer_reinit(this2->decrypted_input_buffer);
    avail_len = buffer_avail_bytes(this2->compression_buffer_in);
    ptr = buffer_get_data_ptr(this2->compression_buffer_in);
    buffer_add_data_and_alloc(this2->decrypted_input_buffer, ptr, avail_len);
  }
}
packet_type = buffer_get_u8(this2->decrypted_input_buffer);
*len = buffer_avail_bytes(this2->decrypted_input_buffer);
this2->packet_len = 0;
return packet_type;

Of course, Hex-rays is not going to invent the names for you, you’ll still have to make sense of the code and what it means to you, but at least, being able to give a name to the classes will certainly help.

All my samples here have been compiled either with visual studio or Gnu C++. I have found the results to be similar, even if they may not be compatible. Fix it for your compiler of interest.

Structure of a C++ program

It is not my goal to teach you how OOP works, you already know that. We’ll just see how it works (and is implemented) in the big lines.

Class = data structure + code (methods).

The data structure can only be seen in the source code, when the methods will appear in your favorite disassembler.

Object = memory allocation + data + virtual functions.

The object is an instantiation of a class, and something you can observe in IDA. An object needs memory, so you will see a call to new() (or a stack allocation), a call to a constructor and a destructor. You will see accesses to its member variables (embedded objects) and maybe calls to virtual functions.

Virtual functions are silly: it is hard to know, without running the program with breakpoints, what code is going to be executed at runtime (and disassemble it).

Member variables are a bit easier: they work like their counterpart in C (structs), and IDA has a very handy tool to declare structures, and hex-rays handles them very well in the disassembly. Let’s go back to the bits and bytes.

Object creation

int __cdecl sub_80486E4()
{
  void *v0; // ebx@1
  v0 = (void *)operator new(8);
  sub_8048846(v0);
  (**(void (__cdecl ***)(void *))v0)(v0);
  if ( v0 )
    (*(void (__cdecl **)(void *))(*(_DWORD *)v0 + 8))(v0);
  return 0;
}

Here’s the decompilation of a small test program I compiled with G++. We can see the new(8), which means our object is 8 bytes long, even if that doesn’t mean we have 8 bytes of variables.

The function sub_8048846 called just after the new() takes the pointer as parameter, and certainly is the constructor.

The next function call is a little cryptic. It’s doing two pointer deferences on v0 before calling it. It’s a virtual function call.

All polymorphic objects have a special pointer in their variables, called the vtable. This table contains addresses of all the virtual methods, so the C++ program can call them when needed. In the compilers I could test, this vtable is always the first element of an object, and always stays at the same place, even in subclasses. (This could no stay true for multiple inheritance. I did not test).

Let’s do some IDA magic:

Rename the symbols

Just click on a name, press « n » and give a meaningful name. Since we don’t know yet what our class do, I suggest we name the class « class1 », and use this convention until we’ve understood what our class do. It’s very possible that we’re going to discover other classes before we finished digging class1, so I suggest we simply continue naming classes as we find them.

int __cdecl main()
{
  void *v0; // ebx@1
  v0 = (void *)operator new(8);
  class1::ctor(v0);
  (**(void (__cdecl ***)(void *))v0)(v0);
  if ( v0 )
    (*(void (__cdecl **)(void *))(*(_DWORD *)v0 + 8))(v0);
  return 0;
}

Create structures

The « structures » window of IDA is very useful. Type Shift-F9 to make it appear. I suggest you pull it off (in the QT IDA version) and put it on the right of the IDA window, so you can see both the decompile window and the structures.

Press « ins » and create a new structure « class1 ». Since we know that this structure is 8 bytes long, add fields (using key « d ») until we have two « dd » fields. Rename the first to vtable, since yes, that’s what we got here !

Now, we’re going to add typing information in our function. Right-click on v0, « Convert to struct * », select « class1 ». Alternatively, pressing « y » and typing in « class1 * » will give you the same result.

Create a new structure, of 12 bytes, and call it « class1_vtable ». At this state, we cannot really know how big that vtable is, but changing the structure size is very easy. Click on « vtable » in class1’s declaration, and type « y ». Now, declare it as a « class1_vtable * » object. Refresh the pseudocode view, and watch the magic.

We can rename the few methods to « method1 » to « method3 ». Method3 is certainly the destructor. Depending on the programming convention and the compiler used, the first method often is the destructor, but here’s a counterexample. It is time to analyze the constructor.

Analysis of the constructor

int __cdecl class1::ctor(void *a1)
{
  sub_80487B8(a1);
  *(_DWORD *)a1 = &off_8048A38;
  return puts("B::B()");
}

You can start by setting the typing information we already know on « a1 ». The puts() call confirms our thoughts that we are in a constructor, but here we even learn the name of the class.

« sub_80487B8() » is called directly in the constructor. This can be a static method of class1, but it can also be a constructor of a parent-class.

« off_8048A38 » is the vtable of class1. By looking there, you will be able to find out how big is our vtable (just watch the next pointer that has an Xref), and a list of the virtual methods of « class1 ». You can rename them to « class1_mXX », but beware that some of these methods may be shared with other classes.

It is possible to set typing information on the vtable itself (click on it, « y », « class1_vtable »), but I do not recommend it since you lose the classic view in IDA, and it doesn’t provide anything you can’t see in the classic view.

The strange call in the constructor

int __cdecl sub_80487B8(int a1)
{
  int result; // eax@1
  *(_DWORD *)a1 = &off_8048A50;
  puts("A::A()");
  result = a1;
  *(_DWORD *)(a1 + 4) = 42;
  return result;
}

The call to the « sub_80487b8() » function in the constructor reveals us the same type of function: a virtual function table pointer is put in the vtable member, and a puts() tells us we’re in yet another constructor.

Don’t retype the type « class1 » for argument « a1 », since we’re not dealing with class1. We found a new class, that we will call « class2 ». This class is a superclass of class1. Let’s do the same work as in class1. The only difference it that we do not know exactly the size of its member. There are two ways of figuring it out:

Look at the xrefs of class2 ::ctor. If we find a straight call to it after a new (i.e. an instantiation), we know the size of its members.
Look at the methods in the vtable, and try to guess what’s the highest member ever accessed.

In our case, « class2 ::ctor » accesses the 4 bytes after the 4 first ones and set it to 42. Since its child-class « class1 » is 8 bytes long, so is « class2 ».

Do the same procedure with all the subclasses, and give names to the virtual functions, starting from the parent classes to the children.

Study of the destructors

Let’s go back to our main function. We can see that the last call, before our v0 object becomes a memory leak, is a call to the third virtual method of class2. Let’s study it.

if ( v0 )
  ((void (__cdecl *)(class1 *))
    v0->vtable->method_3)(v0);
…

void __cdecl class1::m3(class1 *a1)
{
  class1::m2(a1);
  operator delete(a1);
}
…

void __cdecl class1::m2(class1 *a1)
{
  a1->vtable = (class1_vtable *)&class1__vtable;
  puts("B::~B()");
  class2::m2((class2 *)a1);
}
…

void __cdecl class2::m2(class2 *a1)
{
  a1->vtable = (class2_vtable *)&class2__vtable;
  puts("A::~A()");
}

What we can see here is the following: class1 ::m3 is a destructor, which calls class1 ::m2 which is the main destructor of class1. What this destructor do is ensure that we’re well in « class1 » context, by setting back the vtable to is « class1 » state. It then calls the destructor of « class2 », which also sets the vtable to « class2 » context. This method can also be used to walk through the whole class hierarchy, since the virtual destructors must always be called for all the classes in the way.

Hey, what are all these casts? Why do I have two structures defining the same fields?

What we have here is exactly the same problem that you get when doing OOP with C : You end up with several fields declared in all the subclasses. Here is what I do to avoid redefinition of fields:

For each class, define a classXX_members, classXX_vtable, classXX structure.
classXX contains
- +++ vtable (typed to classXX_vtable *)
- +++ classXX-1_members (members of the superclass)
- +++ classXX_members, if any

Ideally, you should start from the main class to the children, until you end up in an edge class. In our exemple, here’s the « solution » of our sample:

00000000 class1          struc ; (sizeof=0x8)
00000000 vtable          dd ?                    ; offset
00000004 class2_members  class2_members ?
00000008 class1          ends
00000008
00000000 ; ----------------------------------------------00000000
00000000 class1_members  struc ; (sizeof=0x0)
00000000 class1_members  ends
00000000
00000000 ; ----------------------------------------------00000000
00000000 class1_vtable   struc ; (sizeof=0xC)
00000000 class2_vtable   class2_vtable ?
0000000C class1_vtable   ends
0000000C
00000000 ; ----------------------------------------------00000000
00000000 class2          struc ; (sizeof=0x8)
00000000 vtable          dd ?                    ; offset
00000004 members         class2_members ?
00000008 class2          ends
00000008
00000000 ; ----------------------------------------------00000000
00000000 class2_vtable   struc ; (sizeof=0xC)
00000000 method_1        dd ?                    ; offset
00000004 dtor            dd ?                    ; offset
00000008 delete          dd ?                    ; offset
0000000C class2_vtable   ends
0000000C
00000000 ; ----------------------------------------------00000000
00000000 class2_members  struc ; (sizeof=0x4)
00000000 field_0         dd ?
00000004 class2_members  ends
00000004

int __cdecl main()
{
  class1 *v0; // ebx@1
  v0 = (class1 *)operator new(8);
  class1::ctor(v0);
  ((void (__cdecl *)(class1 *)) v0->vtable->class2_vtable.method_1)(v0);
  if ( v0 )
    ((void (__cdecl *)(class1 *)) v0->vtable->class2_vtable.delete)(v0);
  return 0;
}

int __cdecl class1::ctor(class1 *a1)
{
  class2::ctor((class2 *)a1);
  a1->vtable = (class1_vtable *)&class1__vtable;
  return puts("B::B()");
}

class2 *__cdecl class2::ctor(class2 *a1)
{
  class2 *result; // eax@1
  a1->vtable = (class2_vtable *)&class2__vtable;
  puts("A::A()");
  result = a1;
  a1->members.field_0 = 42;
  return result;
}

In brief

When you find a new class, give a symbolic name, and resolve the whole tree before figuring out what should be its real name
Start from the ancestor and go up to the children
Look at the constructors and destructors first, check out the references to new() and static methods.
Often, the methods of a same class are located close to each other in the compiled file. Related classes (inheritance) may be far away from each other. Sometimes, the constructors are inlined in childclasses constructors, or even at the place of the instantiation.
If you want to spare time when reversing huge inherited structures, use the struct inclusion trick to name variable only once.
Use and abuse Hex-rays’ typing system, it’s very powerful.
Pure virtual classes are hell : you can find several classes having similar vtables, but no code in common. Beware of them.

Sources

Try this at home !
The binary (elf32 stripped)
The source file. Don’t open it too fast !

18 Replies to “Reversing C++ programs with IDA pro and Hex-rays”

hellman says:

03/09/2011 at 07:42

Hey! Great article!
Can you post a binary+source of the example for some practicing? 🙂
Aris Adamantiadis says:

03/09/2011 at 09:27

Hi,

I intended this but forgot it while posting. Now it’s repaired !
Thanks for your feedback.

Aris
g says:

04/09/2011 at 01:00

have u played with any plugins that claim to make understanding C++ easier and were any of them useful
Aris Adamantiadis says:

04/09/2011 at 12:21

Nope, sorry. Where I was, I couldn’t search anything on the net to ease my task. I’ll get back with more info if I find anything useful.
Progman says:

05/09/2011 at 04:18

The member variable length discussed is good for a start but not definitive.

Now I would say that you would need to correct this as classes can theoretically be variable length if the last member of the class is a variable length array for example as is commonly seen in C structs.

Thus to find the length of any class whether the child or parent in this case, you must look at all references to the constructor to find the maximal length which is probably fixed though could possibly be variable. Also you could do the same by looking at all the member functions of the class in question and its parents to analyze maximal length (if you understand the code you don’t have to guess).
Pingback: Flame, Duqu and Stuxnet: in-depth code analysis of mssecmgr.ocx | ESET ThreatBlog
Pingback: IT Secure Site » Blog Archive » Flame, Duqu and Stuxnet: in-depth code analysis of mssecmgr.ocx
none says:

02/08/2012 at 15:58

What if i have done all that but, some function
gets &Class1_Struct+4 for some reason…
how can i link that to the structure I’ve made from the constructor,
Do I have to create a new structure just for that?

example:
Function A(int arg1,int arg2,int &Class1_plus->2nd_vftable)
{
…. code to analyze
….
}

please help
Aris Adamantiadis says:

02/08/2012 at 16:08

Hi,

I find very odd that you get a function that’s called with a pointer to the vtable… Maybe it’s an object that’s multiple inherited ? I see no real solution except set the argument type as int * and work with it.
When changing the types, I generaly work on whatever gives the best results inside the decompilation rather than in the function prototypes.
Pingback: Reversing C++ programs with IDA pro and Hex-rays | 0day in {REA_TEAM}
adam says:

01/10/2014 at 21:07

How can i connect between the actual vtable found in the .rodata section and the structs i use for function names like v0->vtbl->func1 (which has an address and i know that).

thanks!
Pingback: Flame, Duqu and Stuxnet | caglar's space
Pingback: Static analysis of C binaries | CL-UAT
Pingback: Static analysis of C binaries | XL-UAT
Pingback: Static analysis of C++ binaries | DL-UAT
Pingback: Самообразования тред — tboolean1408
Walter says:

13/12/2018 at 14:43

a very interesting blog! I am a embedded system designer, and like to do some reversing, some c project is ok, but c++ project is very hard to get its meaning. Not for business, only my weekend work.
Pingback: Reverse Engineering Resources | Beginners To Intermediate Guide/Links | ZeroOne.in