
PEFile: A Portable Executable Parser for Python


Ero Carrera created an excellent portable executable parser for Python called PEFile. We've taken his parser and run it across our entire malware collection, for use in a future version of our malware analyzer. Attached is a collection of all the bug fixes we've made. If anyone has comments on the modifications, I would very much appreciate hearing them.

Read the full article for all the bugs that have been fixed.

Here's a list of bugs fixed:

  • Handle incorrectly sized header entries, which caused the parser to read too much or too little data
  • DOS-based malware has no OPTIONAL_HEADERS entry; the missing field is now ignored
  • Handle a missing or mangled NT_HEADERS field
  • Properly recover from a missing FILE header
  • Handle files that declare too many RVAs (relative virtual addresses) in order to cause resource exhaustion
  • Handle a missing directory entry
  • Recover when raw data sections referred to in the directories have been omitted
  • Recover from mangled directory shenanigans
  • Handle a BOUND_FORWARDER reference that is missing, which can happen for an unused section of the PE file
  • Skip bad RVAs that have no use in the code but cause an analysis program to die
  • Handle recursive directory entries in a PE
  • Die gracefully when the import loader table or the import address table is missing or mangled. These are the two references to the functions that need to be imported from DLLs.
  • Mis-sized strings are now handled correctly
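
To give a rough idea of what a few of these fixes amount to, here is a minimal sketch in Python 3 using only the standard library. It is not the attached patch and not PEFile's code. The helper names (read_at, locate_nt_headers, clamp_directory_count) are made up for illustration. The only format facts it relies on are that the DOS header starts with "MZ", that the 4-byte e_lfanew field at offset 0x3C points at the "PE\0\0" signature, and that the format defines 16 data directory slots.

```python
import struct

DOS_MAGIC = b"MZ"
NT_SIGNATURE = b"PE\x00\x00"
MAX_DIRECTORIES = 16  # number of data directory slots defined by the PE format


def read_at(data, offset, fmt):
    """Unpack fmt at offset, or return None if it would run past the buffer."""
    size = struct.calcsize(fmt)
    if offset < 0 or offset + size > len(data):
        return None
    return struct.unpack_from(fmt, data, offset)


def locate_nt_headers(data):
    """Return the offset of the NT headers, or None for DOS-only or mangled files."""
    if data[:2] != DOS_MAGIC:
        return None
    field = read_at(data, 0x3C, "<I")  # e_lfanew
    if field is None:
        return None  # file is too short to even hold the pointer
    offset = field[0]
    # Slicing past the end yields a short (or empty) bytes object, so a bogus
    # pointer simply fails the comparison instead of raising.
    if data[offset:offset + 4] != NT_SIGNATURE:
        return None
    return offset


def clamp_directory_count(declared):
    """Cap a declared directory count (hypothetical helper) to what the format allows."""
    return min(declared, MAX_DIRECTORIES)
```

The point is that every read is checked against the buffer length first, and a missing or nonsensical header becomes a None the caller can skip over, rather than an exception that kills the whole analysis run.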

great work

as usual



In using the get_string_u_at_rva method, I encountered two problems. Whenever I used it to get the Unicode string at a specified RVA offset, 1.) the string always started at the second Unicode character, and 2.) the value I got back contained other unwanted data.

My workaround is to modify the method so that whenever I use it, I get just the Unicode string I wanted. It looks like this:

def get_string_u_at_rva(self, rva):
    """Get an Unicode string located at the given address."""

    try:
        # If the RVA is invalid all would blow up. Some EXEs seem to be
        # specially nasty and have an invalid RVA.
        data = self.get_data(rva, 2)
    except PEFormatError:
        return None

    # The first 16-bit word is used as an upper bound on the length.
    length = struct.unpack('<H', data)[0]

    s = ''
    for idx in range(length):
        try:
            # Start reading at the RVA itself, two bytes per character.
            uchr = struct.unpack('<H', self.get_data(rva+2*(idx), 2))[0]
        except struct.error:
            break
        # Stop at the NUL terminator so trailing data is not included.
        if uchr == 0:
            break
        s += unichr(uchr)

    return s

It helped me; I hope it can help others too.
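
For anyone who wants to try the same idea outside of PEFile, here is a small standalone sketch in Python 3 (the snippet above is Python 2). It is only an illustration under a few assumptions: the function name read_utf16_string and the max_chars parameter are made up, it works on a plain bytes buffer and a file offset rather than an RVA, and it assumes the string is UTF-16LE and NUL-terminated.

```python
def read_utf16_string(data, offset, max_chars=256):
    """Read a NUL-terminated UTF-16LE string from data, starting at offset.

    Reading stops at the first 16-bit zero, at max_chars characters, or at
    the end of the buffer, whichever comes first.
    """
    if offset < 0:
        return ""
    # Never look past the buffer or past the caller's character limit.
    limit = min(len(data), offset + 2 * max_chars)
    end = offset
    while end + 2 <= limit and data[end:end + 2] != b"\x00\x00":
        end += 2
    # errors="replace" keeps malformed surrogates from raising.
    return data[offset:end].decode("utf-16-le", errors="replace")
```

Stopping at the terminator is what keeps the unwanted trailing data out of the result, and the max_chars bound keeps an unterminated string in a mangled file from being read all the way to the end of the buffer.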

Full article

How can I get the full article? I don't see any links to an article or code?