Automating Tools

How should we automate malware fingerprinting and feature extraction?

For example:

Should we automatically run things like strings and make the output searchable?
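One way this could look, as a minimal sketch rather than a finished tool: pull printable ASCII runs out of each sample (similar in spirit to the `strings` utility) and keep an inverted index from string to sample names. The function names `extract_strings`, `build_index`, and `search`, and the minimum length of 4, are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Runs of 4 or more printable ASCII bytes. The minimum length of 4 is an
# assumption chosen for this sketch.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII runs found in a binary blob, in file order."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]

def build_index(samples: dict[str, bytes]) -> dict[str, set[str]]:
    """Map each extracted string to the set of sample names that contain it."""
    index = defaultdict(set)
    for name, data in samples.items():
        for s in extract_strings(data):
            index[s].add(name)
    return index

def search(index: dict[str, set[str]], needle: str) -> set[str]:
    """Return names of samples with a string containing needle (case-insensitive)."""
    needle = needle.lower()
    hits = set()
    for s, names in index.items():
        if needle in s.lower():
            hits |= names
    return hits
```

A linear scan over the index is fine for a handful of samples; a real repository would probably want a full-text search backend instead. This sketch also ignores UTF-16 strings, which are common in Windows binaries.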

Should we automatically look for data like URLs, domain names, and IP addresses that make up the network fingerprint? Should we do more detailed analysis of connect() calls and the like?
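As a rough illustration of the first half of that question, the sketch below scans already-extracted text for URLs, IPv4 addresses, and domain-like tokens. The regular expressions and the name `extract_indicators` are assumptions made for this example. They are heuristics, not a complete grammar for any of these formats, and they only see indicators stored in plain text.

```python
import ipaddress
import re

# Heuristic patterns, assumed for this sketch. They will miss obfuscated
# indicators and will produce some false positives.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)

def extract_indicators(text: str) -> dict[str, set[str]]:
    """Collect candidate network indicators from a block of text."""
    urls = set(URL_RE.findall(text))

    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            # Rejects out-of-range octets such as 999.1.1.1.
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            pass

    # Domain-like tokens. File names such as "gate.php" also match this
    # pattern, so the result needs filtering before it is trusted.
    domains = {d.lower() for d in DOMAIN_RE.findall(text)}

    return {"urls": urls, "ips": ips, "domains": domains}
```

Feeding this the output of a strings pass gives a first-cut network fingerprint. Anything the sample builds or decodes at runtime would need dynamic analysis instead, which is where looking at connect() calls comes in.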

How can we automate some of the static analysis like call-graph extraction?

cool ideas

Halvar has some interesting Python scripts for call-graph manipulation. I bet that stuff could be automated pretty easily.

Anyone else have ideas?

Interesting

What is Halvar trying to do with his Python scripts? Are they public somewhere? Can I have them?
thx
Tone Victor

structural information for fingerprints

I think the best way to find parental relations between pieces of code, and to automatically build a family tree, is to use structural information, as Halvar's (Sabre Security's) BinDiff does.

I don't know how much from BinDiff could be reused, or if it would have to be reimplemented, but IMHO that's the best way to go.

We all know that a signature that changes completely when a single bit of the binary changes is not good at all (this applies to any and all hashes).

We need a "continuous" distance from binary to binary, rather than a binary (match or no match) distance. Now the question is, who has time to work on it? That's always the question. Ideas don't count for much lately; what really counts is the power to make them real. So you can just ignore this comment, or add it to the wishlist :-)
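To make the "continuous distance" idea concrete, here is a minimal sketch that compares two binaries by the multiset of their byte 4-grams, using a Jaccard distance. This is not BinDiff's method. Raw byte n-grams stand in for the structural features discussed above, and the names `byte_ngrams` and `jaccard_distance` are made up for the example.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window in the input."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def jaccard_distance(a: Counter, b: Counter) -> float:
    """Multiset Jaccard distance: 0.0 for identical inputs, 1.0 for disjoint ones."""
    union = sum((a | b).values())         # element-wise maximum of counts
    if union == 0:
        return 0.0                        # two empty feature sets count as identical
    intersection = sum((a & b).values())  # element-wise minimum of counts
    return 1.0 - intersection / union

# Usage: a one-bit change moves the distance only slightly, whereas any
# cryptographic hash of the two inputs would differ completely.
original = bytes(range(256)) * 4
patched = bytearray(original)
patched[100] ^= 1
small = jaccard_distance(byte_ngrams(original), byte_ngrams(bytes(patched)))
```

The same distance function would work unchanged if the counters held structural features per function instead of byte windows, which should hold up better when a sample is recompiled.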

Automating Tools

Considering the history of various forms of viruses, it seems that a method based on call-graph analysis could be easily overcome. Not that the idea is bad, but many factors can affect a call graph beyond a typical program entry point. Simply adding logic that dispatches through a jump table would make this kind of analysis very difficult. Tracking of API calls can of course be defeated with encoded strings that are decoded at runtime and passed to repeated calls to GetProcAddress(). I'm a bit surprised that most malware at this point isn't using many of the techniques that viruses use: encryption, polymorphism, etc.