Paper: Code Normalization for Self-Mutating Malware

Code normalization is a popular topic among anti-virus researchers, especially for classifying the phylogeny of a particular sample. The essence of the idea is that each assembly instruction is translated into an intermediate language that makes explicit every state modification the instruction performs. The consummate example is the dec instruction, which modifies a register and also updates five status flags (OF, SF, ZF, AF and PF; the carry flag is left unchanged). The expanded code is then run through a series of optimizations that remove and reorder statements to produce the normalized form.
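To make the expand-then-optimize idea concrete, here is a minimal Python sketch. It is not the paper's intermediate language or optimizer. The tuple encoding, the symbolic operator names (`zf_sub`, `cf_add`, ...), the four-register machine and the assumption that flags are dead at the end of the fragment are all hypothetical choices for illustration. The sketch expands `dec`, `sub`, `add` and `mov` into explicit register and flag updates, then applies a single optimization: dead-code elimination via a backward liveness pass, which drops flag writes that are overwritten before anything reads them.

```python
# Hypothetical sketch: expand x86-like instructions into explicit state updates,
# then drop updates (mostly flag writes) whose results are never read.

ARITH_FLAGS = ("CF", "OF", "SF", "ZF", "AF", "PF")
REGISTERS = frozenset({"eax", "ebx", "ecx", "edx"})  # assumed: live at exit


def expand(instr):
    """Translate one instruction into (dest, op, args) statements.

    Flag statements come before the register update, so every statement
    reads operand values as they were before the instruction ran.
    Operator names such as "zf_sub" are symbolic placeholders.
    """
    op, *operands = instr
    dst = operands[0]
    if op == "mov":
        return [(dst, "mov", (operands[1],))]
    if op == "dec":
        # DEC behaves like SUB dst, 1 except that CF is left unchanged.
        op, args = "sub", (dst, 1)
        flags = [f for f in ARITH_FLAGS if f != "CF"]
    elif op in ("add", "sub"):
        args = (dst, operands[1])
        flags = list(ARITH_FLAGS)
    else:
        raise ValueError(f"unsupported instruction: {op}")
    stmts = [(flag, f"{flag.lower()}_{op}", args) for flag in flags]
    stmts.append((dst, op, args))
    return stmts


def eliminate_dead(stmts, live_out):
    """Backward liveness pass: keep a statement only if its dest is read later."""
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(stmts):
        if dest not in live:
            continue  # overwritten before being read, or never read: dead
        kept.append((dest, op, args))
        live.discard(dest)
        # Locations (strings) read by this statement become live; ints are constants.
        live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept


def normalize(program, live_out=REGISTERS):
    """Expand every instruction, then remove dead state updates."""
    stmts = [s for instr in program for s in expand(instr)]
    return eliminate_dead(stmts, live_out)


if __name__ == "__main__":
    variant_a = [("dec", "eax"), ("add", "ebx", "eax")]
    variant_b = [("sub", "eax", 1), ("add", "ebx", "eax")]
    print(normalize(variant_a))  # [('eax', 'sub', ('eax', 1)), ('ebx', 'add', ('ebx', 'eax'))]
    print(normalize(variant_a) == normalize(variant_b))  # True
```

Because `add` overwrites every arithmetic flag, the extra carry update that `sub eax, 1` performs is dead, and the two mutated variants collapse to the same normalized code. If CF is instead declared live at the end, e.g. `normalize([("sub", "eax", 1)], REGISTERS | {"CF"})`, the carry update survives and no longer matches the `dec` version, which is exactly the difference the expansion exposes.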

What strikes me about this intermediate form is that it is simply an expansion of the assembly language. I have trouble seeing what this intermediate step buys over optimizing the unexpanded assembly code directly. It seems completely impractical to implement in a real-time defense system (such as a virus-scanning engine) and better suited to closed research systems.

The article appears in Volume 5, Issue 2 of IEEE Security & Privacy. Getting to it costs a ridiculous fee, so if you have access, check it out.

Update: The papers are available on the author's website.

Code normalization articles

One of the authors has several PDF files about this subject freely available from