Paper: Static Analyzer of Vicious Executables (SAVE)
SAVE seeks to classify closely related pieces of malicious software for the purpose of identifying future variants. The core idea is a good one: the executable's code is modified to obfuscate the signature of a piece of malware. The stated goal is to modify it in such a way that it can fool antivirus scanners. This is done in five different ways.
- NOP insertion. Null operations (NOPs) are inserted at various places in the section of code being modified, including dead code.
- Data modification. Data is changed to a different value while retaining the same functionality. The example given in the paper changes a je instruction to a jge: the functionality is maintained, but the bytes are altered.
- Control flow modification. Jumps and NOPs are inserted at various locations.
- Data and control flow modification. The second and third techniques are applied together.
- Pointer aliasing. Variables are replaced with global pointers and functions are referred to by arrays of function pointers.
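A toy sketch of the first two techniques, assuming a naive scanner that matches a fixed byte substring. The helper names, the sample bytes, and the "signature" are hypothetical, not taken from the paper. It only shows why the signature stops matching; whether swapping je for jge preserves behavior depends on the surrounding code, and the sketch ignores jump and data offsets entirely.

```python
# Toy illustration of two SAVE-style obfuscations on raw bytes (hypothetical
# helpers; real samples would be PE files, not a short byte string).
NOP = 0x90        # one-byte x86 NOP
JE_REL8 = 0x74    # x86 "je rel8" opcode
JGE_REL8 = 0x7D   # x86 "jge rel8" opcode


def insert_nops(code: bytes, positions: list[int]) -> bytes:
    """Insert one NOP before each given offset of the original code.

    Jump displacements and data offsets are NOT adjusted.
    """
    wanted = set(positions)
    out = bytearray()
    for i, b in enumerate(code):
        if i in wanted:
            out.append(NOP)
        out.append(b)
    return bytes(out)


def swap_je_for_jge(code: bytes) -> bytes:
    """Replace every 0x74 byte with 0x7D.

    Naive: it also rewrites 0x74 bytes that are operands rather than opcodes;
    telling the two apart requires a disassembler.
    """
    return code.replace(bytes([JE_REL8]), bytes([JGE_REL8]))


def matches_signature(code: bytes, signature: bytes) -> bool:
    """A simplistic scanner: the signature must appear verbatim."""
    return signature in code


# cmp eax, 0 ; je +2 ; xor eax, eax ; ret
original = bytes([0x83, 0xF8, 0x00, 0x74, 0x02, 0x31, 0xC0, 0xC3])
signature = original[:5]  # hypothetical signature: the first five bytes

print(matches_signature(original, signature))                   # True
print(matches_signature(insert_nops(original, [3]), signature))   # False
print(matches_signature(swap_je_for_jge(original), signature))    # False
```

A single inserted byte or a one-byte opcode change is enough to defeat an exact-match signature, which is the effect the modified samples rely on.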
All these changes are performed on various viruses, most notably MyDoom, for which assembly source code is available. The problem with the above-mentioned techniques is that the data offsets will all be off. Unless the binary is disassembled so that the relocations can be fixed up, the above modifications are largely invalid.
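The offset problem can be seen in its simplest form with a two-byte x86 short jump, whose displacement is relative to the next instruction. The following is a minimal sketch with hypothetical helper names: inserting a NOP between a jump and its target without patching sends the jump into the middle of an instruction, while the patched version re-targets it. It handles exactly one known jump; fixing a real PE would also mean rewriting every other branch, call, data reference, and relocation.

```python
# Hypothetical helpers illustrating why inserting bytes breaks x86 short jumps.
NOP = 0x90       # one-byte x86 NOP
JE_REL8 = 0x74   # x86 "je rel8": opcode byte followed by a signed 8-bit displacement


def jump_target(code: bytes, jmp_at: int) -> int:
    """Absolute offset that a 2-byte 'je rel8' located at jmp_at jumps to."""
    if code[jmp_at] != JE_REL8:
        raise ValueError("no je rel8 at this offset")
    rel = int.from_bytes(code[jmp_at + 1:jmp_at + 2], "little", signed=True)
    return jmp_at + 2 + rel  # displacement is relative to the next instruction


def insert_nop_and_patch(code: bytes, insert_at: int, jmp_at: int) -> bytes:
    """Insert one NOP at insert_at and re-patch ONE known je rel8 at jmp_at.

    Only handles an insertion after the jump; a real rewriter would have to
    disassemble the binary and fix every branch, call, and relocation.
    """
    if insert_at < jmp_at + 2:
        raise ValueError("this sketch only handles insertions after the jump")
    old_target = jump_target(code, jmp_at)
    out = bytearray(code[:insert_at]) + bytes([NOP]) + code[insert_at:]
    # Targets at or beyond the insertion point move forward by one byte.
    new_target = old_target + 1 if old_target >= insert_at else old_target
    new_rel = new_target - (jmp_at + 2)
    if not -128 <= new_rel <= 127:
        raise ValueError("displacement no longer fits in rel8")
    out[jmp_at + 1] = new_rel & 0xFF  # store as a two's-complement byte
    return bytes(out)


# cmp eax, 0 ; je +2 ; xor eax, eax ; ret   (the je skips the xor)
code = bytes([0x83, 0xF8, 0x00, 0x74, 0x02, 0x31, 0xC0, 0xC3])

naive = code[:5] + bytes([NOP]) + code[5:]    # NOP inserted, displacement untouched
print(hex(naive[jump_target(naive, 3)]))        # 0xc0: lands mid-instruction in the xor
patched = insert_nop_and_patch(code, 5, 3)
print(hex(patched[jump_target(patched, 3)]))    # 0xc3: still lands on ret
```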
A sequence-alignment-based Euclidean distance is calculated for each code segment, which gives an idea of how similar the binary values are. Read more about it in the paper if you care. The technique is an interesting one, but approaches such as Scott Miller's binblast technique are certainly better suited.
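This review does not spell out how SAVE turns an alignment into a Euclidean distance, so the following sketch stops at a standard Needleman-Wunsch global alignment score over opcode sequences. It is a stand-in for the general idea, not SAVE's exact metric; the scoring values (+1 match, -1 mismatch, -1 gap) and the opcode lists are assumptions chosen for illustration.

```python
# Needleman-Wunsch global alignment score (a stand-in, not SAVE's exact metric).
from typing import Sequence


def global_alignment_score(a: Sequence[str], b: Sequence[str],
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning all of a against all of b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned against nothing but gaps
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


original = ["cmp", "je", "xor", "ret"]
nop_padded = ["cmp", "nop", "je", "nop", "xor", "ret"]  # variant with two NOPs
unrelated = ["push", "mov", "call", "pop"]

print(global_alignment_score(original, nop_padded))  # 2: four matches, two gaps
print(global_alignment_score(original, unrelated))   # -4: nothing lines up
```

Each inserted NOP costs only one gap penalty, which is why alignment-based measures hold up better against NOP insertion than exact-substring signatures do.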
The antivirus scanners fared poorly against the modified samples, while SAVE performed admirably.
Problems With The Approach
The first is the aforementioned offset problem. Modifying real-life PEs in this manner is unlikely to produce working malware without significant code patching. It is not impossible that the modified code still produces a working example, but that cannot simply be assumed.
The next problem, which could easily have been answered, is that there was no investigation into whether the modified malware was still valid. Did it run? Did it reproduce? These are all factors that could cause an AV vendor to discount a binary or not flag it as malicious.
The packer problem was also largely ignored, save for a mention in the paper.
The last problem is that the paper does not report a false-positive rate. Measuring it is a critical step in showing that the method is valid.
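For reference, the false-positive rate is the fraction of benign samples that a detector flags as malicious, FP / (FP + TN). A minimal sketch; the labels below are hypothetical and are not data from the paper.

```python
def false_positive_rate(labels: list[bool], predictions: list[bool]) -> float:
    """FP / (FP + TN), where True means 'malicious'."""
    pairs = list(zip(labels, predictions))
    fp = sum(1 for truth, flagged in pairs if not truth and flagged)
    tn = sum(1 for truth, flagged in pairs if not truth and not flagged)
    if fp + tn == 0:
        raise ValueError("no benign samples, so the FPR is undefined")
    return fp / (fp + tn)


# Hypothetical evaluation: two malicious files, four benign ones, one benign file flagged.
labels = [True, True, False, False, False, False]
predictions = [True, True, True, False, False, False]
print(false_positive_rate(labels, predictions))  # 0.25
```

Measuring this requires a corpus of known-benign binaries alongside the malware variants, which is exactly the part of the evaluation that was missing.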
With a little more work, the method could be a valid one. The binary modification techniques were certainly not new. The Euclidean distance is one that should prove useful in future iterations.