On the Legitimacy of Obfuscated Code

Chris Wysopal has written an article about the different uses of obfuscation inside executables. Malicious or not, obfuscation is a useful tool for hiding code, or at least for raising the effort required to reverse engineer it. It's a good article and I recommend you read it. It got me thinking about a couple of things in reverse engineering.

One thing that Chris mentions is that users should be able to decide whether or not they want obfuscated code on their system. In many ways this is similar to the open vs. closed source debate. I have long argued that, for a skilled reverse engineer, having the assembly for a program is equivalent to having the source code. Look at enough assembly, and work with enough compiler variations, and one can work out what the original code looked like.

Regarding the question of whether obfuscation is a bad thing, Rolf Rolles recently commented that Bitdefender decided wholesale that the VMProtect packer is malware and that anything obfuscated with it should be removed. Now the Bitdefender developers are smart guys, and maybe they decided that legitimate software has no need to use it. Other anti-virus software takes a similar tack. During the Race To Zero contest at Defcon last year, the winning team noticed that removing all the imports from an executable caused multiple AV vendors to automatically flag it as suspicious.
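To make the "no imports" observation concrete, here is a minimal sketch of how a scanner could notice that a Windows executable declares no import table. It is an illustration only: the function name `has_import_directory` is made up for this example, the check reads only the PE header's data directory, and nothing here is known to match what any AV vendor actually does.

```python
import struct


def has_import_directory(image: bytes) -> bool:
    """Return True if the PE header declares a non-empty import directory.

    Hypothetical helper for illustration; real scanners are far more involved.
    """
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not an MZ executable")

    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:        # PE32
        dirs = opt + 96
    elif magic == 0x20B:      # PE32+
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")

    # NumberOfRvaAndSizes sits immediately before the data directory array.
    (count,) = struct.unpack_from("<I", image, dirs - 4)
    if count < 2:
        return False

    # Data directory entry 1 is the import table: (RVA, size), 8 bytes per entry.
    rva, size = struct.unpack_from("<II", image, dirs + 8)
    return rva != 0 and size != 0
```

A heuristic built on this would treat `has_import_directory(data) == False` as one suspicious signal among many, which is exactly why a legitimately packed program can trip it.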

The choice about the legitimacy of packers and obfuscation has already been made for us by the AV community: it's bad. This may be short-sighted, but hey, that's what the industry is all about.

BitDefender

I can only speculate as to the internal workings of BitDefender's AV, but I can assert that it did in fact flag my non-malicious VMProtect samples as malware and delete them. I assume this means VMProtect itself is blacklisted when it can be detected, e.g. in its "packer" mode or when OEP is virtualized.

obfuscation? maybe... or not

One problem with asserting that "users should be able to decide whether or not they want obfuscated code on their system" is that sometimes it can be hard to tell what's obfuscation and what isn't. A lot of code transformations that look like obfuscation to a non-expert may have nothing at all to do with DRM or anti-reverse-engineering, and may simply be the result of very aggressive optimization. For example:

  • A long time ago Henry Massalin did some really neat work on the "superoptimizer" [1], which would come up with really short and unintuitive instruction sequences for common operations. The fact that these were unintuitive and hard to understand doesn't make it obfuscation.

  • Conversely, a lousy compiler can generate really crappy code with all sorts of unnecessary and redundant operations. This isn't obfuscation, even though malware also uses unnecessary/redundant code ("semantic NOPs") to disguise itself from scanners.

  • JIT compilation (and other such dynamic optimization techniques) is a form of self-modifying code. This isn't obfuscation either, even though packers and malware rely on self-modifying code too.

  • There has been work on using selective virtualization to reduce a program's code footprint, for use in memory-limited embedded systems [2]. Again, this isn't obfuscation even though some code obfuscators use virtualization.
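The first two points can be sketched in a few lines. The example below is not taken from Massalin's paper; it uses a well-known branchless absolute-value trick in the same spirit, next to a deliberately padded version of the obvious code. The function names are made up for this illustration, and inputs are assumed to fit in a signed 32-bit range.

```python
def abs_plain(x: int) -> int:
    """The obvious version, with a branch."""
    return -x if x < 0 else x


def abs_branchless(x: int) -> int:
    """Optimizer-style version: short, unintuitive, and not obfuscation.

    For a signed 32-bit input, x >> 31 is 0 when x >= 0 and -1 when x < 0.
    """
    mask = x >> 31
    return (x ^ mask) - mask


def abs_padded(x: int) -> int:
    """Lousy-compiler (or malware) style: same result, padded with semantic NOPs."""
    t = x + 0          # no effect
    t = t ^ 0          # no effect
    t = (t + 7) - 7    # cancels out
    return -t if t < 0 else t
```

All three compute the same function. Looking at the code alone, there is no way to tell whether `abs_branchless` came from an aggressive optimizer or `abs_padded` from a bad compiler rather than from someone trying to hide something.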

References:

[1] Henry Massalin. "Superoptimizer: A Look at the Smallest Program". Proc. ACM ASPLOS '87, SIGPLAN Notices 22(10), October 1987, pp. 122-126.

[2] Jan Hoogerbrugge, Lex Augusteijn, Jeroen Trum and Rik van de Wiel. "A code compression system based on pipelined interpreters". Software: Practice and Experience 29(11), September 1999.