For the past few months I have been researching PDF analysis and how it could be improved. Along the way I found myself writing tools and scripts to get the job done, and decided it was time to put something more useful together. PDF X-RAY is a static analysis tool that lets you analyze PDF files through a web interface or an API. It uses multiple open source tools and custom code to take a PDF and turn it into a shareable format. The goal is to centralize PDF analysis and to begin sharing comments on the files that are seen.
PDF X-RAY differs from other tools in that it doesn't focus on a single file. Instead, it compares the file you upload against thousands of malicious PDF files in our repository. These checks look for data structures in your upload that are similar to ones in files already reviewed by analysts. Using this feature we can begin to see code shared among malicious samples, or trends that stem from a malicious author's coding style. The tool is still in beta, but I wanted to release it to the public to see what users thought. In my opinion the API is the most useful part, as you can begin to integrate rich PDF analysis into other tools and services at little or no cost.
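One way to picture the "similar data structures" comparison is as a set overlap between the objects in an upload and the objects in each known sample. The sketch below is only an illustration under that assumption: the names `object_hashes`, `jaccard`, and `rank_matches` are hypothetical, the regex-based object splitting is deliberately naive, and nothing here reflects how PDF X-RAY actually performs its checks.

```python
import hashlib
import re

# Naive split on "N G obj ... endobj"; a real parser must handle streams,
# object streams, and malformed files. This is an assumption for illustration.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)


def object_hashes(data: bytes) -> set:
    """Return the set of SHA-256 digests of each object body in the file."""
    return {hashlib.sha256(body.strip()).hexdigest() for body in OBJ_RE.findall(data)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared elements divided by total distinct elements."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(upload: bytes, corpus: dict) -> list:
    """Score the upload against every sample in a {name: bytes} corpus,
    returning (name, score) pairs with the most similar sample first."""
    target = object_hashes(upload)
    scored = [(name, jaccard(target, object_hashes(sample)))
              for name, sample in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Hashing object bodies rather than whole files means two samples that reuse the same embedded script still score as related even when the rest of the file differs, which is the kind of overlap the paragraph above describes.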
Earlier this year I put together an outline for a talk on how XMPP could be used for botnet command and control. I have only just gotten around to experimenting with it, and wanted to share what I have so far and hear what people think. I see XMPP as a more modern and flexible IRC when it comes to botnets. Features like federation, transports, p2p and client/server communication all make it a good fit for this area. Rather than waiting until it actually gets implemented, maybe we should think now about what we could do to stop it or detect it.
About a month ago I posted a blog entry describing research I was doing on malicious PDF files. As part of this research I needed a way to represent a malicious PDF file in a queryable form. I ultimately decided on MongoDB as my backend, and therefore wanted to get the malicious file into JSON form so I could store it.
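To make the idea of a queryable JSON form concrete, here is a minimal sketch that reduces a PDF to a flat, JSON-serializable dictionary. The field names, the keyword list, and the function `pdf_to_document` are all assumptions made for illustration; the keyword counting only loosely imitates the style of name counting found in PDF triage tools and is not the schema the released tool produces.

```python
import hashlib
import json
import re

# Hypothetical list of PDF names often checked during triage (an assumption,
# not the author's actual field set).
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def pdf_to_document(data: bytes) -> dict:
    """Summarize raw PDF bytes as a dict that json.dumps can serialize."""
    counts = {}
    for kw in KEYWORDS:
        # Count the name only when it is not a prefix of a longer name.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode().lstrip("/")] = len(re.findall(pattern, data))
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        # Naive count of "N G obj" headers; not a real PDF parse.
        "object_count": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        "keywords": counts,
    }


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
    print(json.dumps(pdf_to_document(sample), indent=2))
```

Because the result is a plain dictionary with string keys, it could be handed to a document store as-is, and fields such as `keywords.JS` would then be directly queryable.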
The tool I released today is a composite of tools written by Didier Stevens and me. Didier's PDF tools do a lot of the heavy lifting, while my glue code brings multiple pieces of data together into a single object. As of right now the object contains the following details:
For the past few days I have been completely immersing myself in PDF research in the hope of finding better ways to detect malicious PDF files. I have collected a pretty good random sample set (15K) of PDF data and have a bunch of malicious files with the same statistics. I have written some basic tools to aid in my research, and it would be nice to get some input on the results I have found so far.
The outline of the project can be found here:
The blog with all the research, data and tools that have been released can be found here: