Binary code analysis: benefits of C++ virtual function tables detection

Monday, May 15, 2006 at 9:14am by Rafal Wojtczuk

Introduction

We should start with a description of how C++ virtual functions are implemented; fortunately, there are many articles (this one in particular) which explain it well. Some advanced issues, for instance the implementation of multiple inheritance, are described here.
Short summary: if a C++ class contains at least one virtual function, then for each object of this class, the memory chunk allocated for the object contains a pointer to the class's virtual function table (vftable for short). On the x86 architecture, if the ecx register points to the object (so ecx equals the "this" pointer), then a call to the object's third virtual function can be implemented like this:
mov eax, [ecx] ; load eax with a pointer to vftable
call [eax+8] ; call the third function in the table
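
The two instructions above can be mimicked on a toy memory image. The following is only an illustrative sketch: the addresses, the `resolve_virtual_call` helper and the memory layout are made up for this example, and it assumes 32-bit little-endian pointers with the vftable pointer stored at offset 0 of the object.

```python
import struct

POINTER_SIZE = 4  # 32-bit x86: each vftable slot is one DWORD


def read_dword(memory, addr):
    """Read a little-endian 32-bit value from the toy memory image."""
    return struct.unpack_from("<I", memory, addr)[0]


def resolve_virtual_call(memory, this_ptr, index):
    """Return the address that a call to virtual function number `index` would reach."""
    vftable = read_dword(memory, this_ptr)                       # mov eax, [ecx]
    return read_dword(memory, vftable + index * POINTER_SIZE)    # call [eax + 4*index]


# Hypothetical layout: a vftable with three entries at 0x10, an object at 0x30.
memory = bytearray(0x40)
struct.pack_into("<III", memory, 0x10, 0x100, 0x200, 0x300)  # three function addresses
struct.pack_into("<I", memory, 0x30, 0x10)                   # object's first DWORD: vftable pointer

target = resolve_virtual_call(memory, 0x30, 2)  # third function, i.e. offset 8
```

Index 2 corresponds to the `[eax+8]` operand in the listing, because each table entry occupies four bytes.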

Why bother to detect vftables?

There are a couple of reasons why detection of vftables can be useful for binary analysis:

  • Because vftables can be stored within the .text segment, a disassembler may try to treat them as code. IDA in particular sometimes does this; as a result, it produces functions containing weird opcodes, for instance:
    sbb (byte_7D3939FF-7D393A7Dh)[ebp], bh
    arpl [edx-79D682D4h], ax
    If we knew which regions are occupied by vftables, we could instruct IDA not to disassemble them.
  • Another use is the binary matching of different versions of the same code (here you can learn more about what binary matching/binary diffing is). From now on, we assume that debugging symbols are not available. Suppose we have already matched a certain number of functions from binary A with functions from binary B (say, functions with identical bodies, or with identical sets of called imported functions). If
    • a certain function funcA from binary A is present in only one vftable vftA,
    • a certain function funcB from binary B is present in only one vftable vftB,
    • we have already matched funcA with funcB

    then we may safely assume that vftA and vftB refer to the same class; therefore, we may match all members of vftA with the respective members of vftB. Similarly, if we have matched class constructors, we can match all members of the respective vftables (those referenced in the constructors). This method has some advantages over other matching algorithms. In particular, it can reliably match functions which have few or no distinguishing features: all we need is a function's offset in the vftable.
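
The matching rule from the second bullet can be sketched as follows. This is a minimal illustration, not the funcmatch implementation: the function names, the dictionary-based representation of vftables and the decision to pair members by index up to the shorter table are all assumptions made for the example.

```python
from collections import defaultdict


def owners(vftables):
    """Map each function to the list of vftables that contain it."""
    result = defaultdict(list)
    for vft, members in vftables.items():
        for func in set(members):
            result[func].append(vft)
    return result


def propagate_matches(vfts_a, vfts_b, matches):
    """Extend function matches A -> B using vftables that a matched pair identifies uniquely."""
    owners_a, owners_b = owners(vfts_a), owners(vfts_b)
    new_matches = dict(matches)
    for func_a, func_b in matches.items():
        in_a = owners_a.get(func_a, [])
        in_b = owners_b.get(func_b, [])
        # Both functions must be present in exactly one vftable each.
        if len(in_a) != 1 or len(in_b) != 1:
            continue
        # Pair members slot by slot; zip stops at the shorter table (an assumption).
        for member_a, member_b in zip(vfts_a[in_a[0]], vfts_b[in_b[0]]):
            new_matches.setdefault(member_a, member_b)  # keep earlier matches intact
    return new_matches


# Hypothetical data: "A_shared" / "B_shared" sit in two vftables, so they identify none.
vfts_a = {"vftA": ["A_shared", "A_f1", "A_f2"], "vftA2": ["A_shared", "A_g1"]}
vfts_b = {"vftB": ["B_shared", "B_f1", "B_f2"], "vftB2": ["B_shared", "B_g1"]}

extended = propagate_matches(vfts_a, vfts_b, {"A_f1": "B_f1"})
```

Here the single known pair A_f1/B_f1 is enough to pair the remaining slots of vftA and vftB, even though those functions carry no distinguishing features of their own.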

How to locate vftables?

In order to locate a vftable, we may use the fact that the vftable address is explicitly used in a constructor: as part of object initialization, a constructor stores the vftable address within the memory chunk allocated for the object. Therefore, the algorithm looks like this:
simple_vft_loc:

  • find all occurrences of "mov [reg+small_const_offset], some_const_val"
  • for each "some_const_val",
    • check whether it is a valid address within the binary's boundaries
    • if so, extract the DWORD pointed to by some_const_val; let's name it FPTR
    • check whether FPTR is a valid pointer into an executable segment, and whether it points to something resembling code rather than data

    If all the above steps succeed, then assume that "some_const_val" is the beginning of a vftable, and that the "mov" instruction referencing it belongs to a constructor.
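
The steps above can be sketched on a toy image. Everything specific here is an assumption made for illustration: the `Section` type, the byte-pattern scan for the `C7 /0` encoding of "mov dword ptr [reg(+disp8)], imm32" (a real tool would rather use a disassembler), and the prologue-based test for "something resembling code".

```python
import struct
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    base: int          # virtual address of the first byte
    data: bytes
    executable: bool


def find_section(sections, addr, size):
    """Return the section that fully contains [addr, addr + size), if any."""
    for s in sections:
        if s.base <= addr and addr + size <= s.base + len(s.data):
            return s
    return None


def read_dword(sections, addr):
    s = find_section(sections, addr, 4)
    if s is None:
        return None
    return struct.unpack_from("<I", s.data, addr - s.base)[0]


# Assumption: two common 32-bit function starts count as "resembling code".
PROLOGUES = (b"\x55\x8b\xec",   # push ebp / mov ebp, esp
             b"\x8b\xff\x55")   # mov edi, edi / push ebp


def looks_like_code(sections, addr):
    s = find_section(sections, addr, 3)
    if s is None or not s.executable:
        return False
    off = addr - s.base
    return bytes(s.data[off:off + 3]) in PROLOGUES


def scan_mov_imm32(section):
    """Yield (instruction address, imm32) for C7 /0 with [reg] or [reg+disp8] operands."""
    d = section.data
    for i in range(len(d) - 5):
        if d[i] != 0xC7:
            continue
        modrm = d[i + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if reg != 0 or rm == 4:          # not a mov, or a SIB byte follows (skipped here)
            continue
        if mod == 0 and rm != 5:         # [reg]
            imm_off = i + 2
        elif mod == 1:                   # [reg + disp8], the "small constant offset"
            imm_off = i + 3
        else:
            continue
        if imm_off + 4 <= len(d):
            yield section.base + i, struct.unpack_from("<I", d, imm_off)[0]


def simple_vft_loc(sections):
    """Map each suspected vftable address to the mov instructions that store it."""
    found = {}
    for s in sections:
        if not s.executable:
            continue
        for instr, const in scan_mov_imm32(s):
            fptr = read_dword(sections, const)   # None if const is outside the image
            if fptr is not None and looks_like_code(sections, fptr):
                found.setdefault(const, []).append(instr)
    return found


# Toy image: one function at 0x1000, a "constructor" at 0x1010, a vftable at 0x2000.
text = bytearray(0x20)
text[0x00:0x04] = b"\x55\x8b\xec\xc3"                            # push ebp / mov ebp, esp / ret
text[0x10:0x16] = b"\xc7\x01" + struct.pack("<I", 0x2000)       # mov dword ptr [ecx], 0x2000
text[0x16:0x1d] = b"\xc7\x41\x04" + struct.pack("<I", 0x3000)   # mov dword ptr [ecx+4], 0x3000
image = [
    Section(".text", 0x1000, bytes(text), True),
    Section(".rdata", 0x2000, struct.pack("<I", 0x1000), False),
    Section(".data", 0x3000, struct.pack("<I", 0x12345678), False),
]
vftables = simple_vft_loc(image)
```

In the toy image only the constant 0x2000 survives all three checks; 0x3000 is a valid address, but the DWORD stored there does not point into the image, so it is rejected.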

Does it really work?

In order to test the above algorithm, let's run it on a binary for which debugging symbols are available; this way, we will be able to compare the algorithm's results with the contents of the .pdb file. With VC compilers, the C++ mangled names of vftables start with the "??_7" prefix, so we can easily extract all vftable entries from the output of any .pdb parser.

We have chosen mshtml.dll for our test drive (I bet some of you share the idea that it makes sense to examine this particular binary in some detail). For mshtml.dll version 6.0.3790.2577, mshtml.pdb contains 886 vftable names; they point to 763 different vftables. Simple_vft_loc outputs 768 addresses which are supposed to be vftables. It turned out that 28 vftables were not detected ("false negatives"), mostly because some static object variables contain a preinitialized vftable pointer (so the vftable pointer is set by the linker, not by a constructor). On the other hand, 33 addresses were "false positives": they pointed to variables which were not actually vftables but just happened to start with a function pointer.
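
The comparison against the symbol file can be sketched like this. The input format (a list of address/name pairs from some .pdb parser), the helper names and the sample symbols are hypothetical; only the "??_7" prefix comes from the text above.

```python
def vftables_from_symbols(symbols):
    """Group '??_7'-prefixed names by address; several names may share one address."""
    tables = {}
    for addr, name in symbols:
        if name.startswith("??_7"):
            tables.setdefault(addr, []).append(name)
    return tables


def compare(detected, reference):
    """Split the disagreement between detected and reference addresses into two sets."""
    detected, reference = set(detected), set(reference)
    return {
        "false_negatives": reference - detected,   # real vftables that were missed
        "false_positives": detected - reference,   # reported addresses that are not vftables
    }


# Hypothetical parser output: (address, mangled name).
symbols = [
    (0x5000, "??_7CFoo@@6B@"),
    (0x5000, "??_7CFooAlias@@6B@"),     # a second name for the same table
    (0x5040, "??_7CBar@@6B@"),
    (0x6000, "?Method@CFoo@@UAEXXZ"),   # an ordinary function, not a vftable
]
reference = vftables_from_symbols(symbols)
report = compare({0x5000, 0x7000}, reference)
```

Counting distinct addresses rather than names matters here, since the test above found 886 names for only 763 tables.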

As we see, the false negative ratio is below 4% (28 of 763 vftables). Moreover, it is very probable that the corresponding vftable would go undetected in the binary we match our mshtml.dll against as well. Therefore, false negatives in vftable detection should not impair the matching algorithm.

The false positive ratio is similarly low. Again, this should not lead to errors in binary matching: instead of matching vftable entries, we will match entries in other structures containing function pointers.
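
The figures quoted above can be cross-checked with a few lines of arithmetic. The choice of denominators is an assumption on my part: false negatives are taken relative to the 763 vftables named in the .pdb, and false positives relative to the 768 reported addresses.

```python
reference_vftables = 763   # distinct vftables named in mshtml.pdb
reported = 768             # addresses output by simple_vft_loc
false_negatives = 28
false_positives = 33

# Correctly detected tables, derived from the reference side.
true_positives = reference_vftables - false_negatives

# The same count must also explain the number of reported addresses.
consistent = (true_positives + false_positives == reported)

fn_ratio = false_negatives / reference_vftables
fp_ratio = false_positives / reported

print(f"true positives: {true_positives}, consistent: {consistent}")
print(f"false negative ratio: {fn_ratio:.1%}")
print(f"false positive ratio: {fp_ratio:.1%}")
```

Under these assumptions the numbers are mutually consistent: 735 tables are found correctly, and both error ratios stay at a few percent.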

The simple_vft_loc algorithm was integrated into "funcmatch", a binary matching tool, and so far its performance has been very satisfactory.

Other tables of functions?

Another common construct containing function pointers is an RPC dispatch table. An approach very similar to the above, using dispatch table detection, was implemented in the funcmatch tool as well.
