Some Win32 API usage statistics

Yesterday I saw a talk given by Frank Boldewin where he mentioned the FreeIconList trick to fool code emulators. That made me wonder which other Win32 API functions are basically unused. Using Ero Carrera's Python library pefile to parse PE files, I wrote a small Python script that tries to find out which Win32 API functions are basically unused.

The modus operandi was simple. I read the exported functions of all DLL files in WindowsDir and WindowsDir/system32 and compared them to the functions imported by all EXE/DLL files in WindowsDir, WindowsDir/system32 and my entire Program Files directory.

The first result is that most exported functions are apparently almost never used. My script found 127569 exported functions in 1225 DLL files. 104608 of those (about 82%) are never used by the 6615 EXE/DLL files which import functions ("used" is liberally defined as "imported through the import directory" here, of course). That leaves 22961 functions which are actually used.

Here are some output files which show the exported DLL functions sorted by their usage. The numeric column contains the number of PE files which import the function statically. That means, for example, that 3475 of the 6615 files I tested import GetLastError.
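A usage table of this kind can be built with a simple per-file count. The following is only a sketch: the function name `usage_table` and the sample data are invented for illustration, and the real output files may have been produced differently.

```python
from collections import Counter


def usage_table(exported, imports_by_file):
    """Return (file_count, function) rows, most-imported functions first.

    exported: iterable of exported function names.
    imports_by_file: mapping of file path -> names that file imports.
    Each file counts at most once per function.
    """
    exported = set(exported)
    counts = Counter()
    for imported in imports_by_file.values():
        counts.update(set(imported) & exported)
    # Functions that are never imported get a count of 0.
    rows = [(counts[name], name) for name in exported]
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows


# Made-up sample data, not taken from the real scan.
sample_exports = {"GetLastError", "CloseHandle", "FreeIconList"}
sample_imports = {
    "a.exe": {"GetLastError", "CloseHandle"},
    "b.dll": {"GetLastError"},
    "c.exe": {"printf"},
}

for count, name in usage_table(sample_exports, sample_imports):
    print(f"{count:6d}  {name}")
```

Counting files rather than import entries matches the meaning of the numeric column described above: a file that lists a function twice still contributes only one to its count.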

  • Click here to see the Top 2000 most used API functions
  • Click here to see the usage statistics of all advapi32.dll functions
  • Click here to see the usage statistics of all gdi32.dll functions
  • Click here to see the usage statistics of all kernel32.dll functions
  • Click here to see the usage statistics of all msvcrt.dll functions
  • Click here to see the usage statistics of all ole32.dll functions
  • Click here to see the usage statistics of all oleaut32.dll functions
  • Click here to see the usage statistics of all shell32.dll functions
  • Click here to see the usage statistics of all user32.dll functions

Random notes

  • kernel32.dll is surprisingly dominant while gdi32.dll is surprisingly "unused"
  • pefile is extremely awesome and easy to use
  • Don't be confused that API functions like lstrlen are imported 0 times; check lstrlenA and lstrlenW instead
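To make the last note easier to act on, the ANSI and wide-character variants could be folded onto their base name when reading the statistics. This is a hypothetical post-processing step, not part of the original script. The helper name `fold_ansi_wide` and all sample counts except the GetLastError figure are invented.

```python
def fold_ansi_wide(counts):
    """Add the counts of FooA / FooW onto Foo when Foo itself is listed.

    counts: mapping of function name -> number of importing files.
    The folded value is an upper bound, because a file importing both
    the A and the W variant is counted twice.
    """
    folded = dict(counts)
    for name, count in counts.items():
        base = name[:-1]
        if name[-1:] in ("A", "W") and base in counts:
            folded[base] += count
    return folded


# Sample counts; only the GetLastError figure comes from the post.
sample = {"lstrlen": 0, "lstrlenA": 120, "lstrlenW": 340, "GetLastError": 3475}
print(fold_ansi_wide(sample)["lstrlen"])
```

Folding only when the base name is itself a key avoids merging unrelated functions whose names merely happen to end in A or W.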

Click here to download the Python script.

Comments

Joe on :

Hi

Your work and data are really interesting. But you have to keep in mind that a lot of API functions are resolved dynamically via GetProcAddress, so it's not strange that it ranks third.

Another thing is ntdll. A lot of packers and crypters have an API stub which calls ntdll directly to bypass certain analysis tools.

If the purpose of the analysis is to get a picture of API statistics in malware, it's really necessary to extend the analysis to cover the two points described above.

sp on :

Hi,

you are obviously right here. In fact GetProcAddress is one of the most-used API functions. That means there are a lot of API calls my script misses.

Nevertheless, even considering this problem, we might be on the safe side because of the law of large numbers. It might be reasonable to assume that the distribution of statically imported API functions is roughly similar to the distribution of dynamically resolved API functions.

Joe on :

Hi again,

I think for malware analysis it is not only interesting which APIs are called most often, but even more so which APIs are called only once or twice. NtQueueApcThread is one of them. Those functions are very often resolved via GetProcAddress :-). I don't think it's necessary to tell you what it has been used for in the past.

You cannot assume that calls which are not statically imported are distributed like the imported ones, simply because GetProcAddress and a few other techniques aren't the standard way to obtain a function address from another PE file. They are often used to get the addresses of undocumented functions. So you cannot apply the law of large numbers, because the "random variables" are not identically distributed. NtQueueApcThread is the evidence for it.

sp on :

Yep, you're right. The thing is, the results of the script are kinda useless for anything but a short "well, that's kinda interesting" moment. In particular, guessing which unused API functions could be used by malware is not really possible, for a much simpler reason.

Nearly all API functions are never statically imported by the files in my Windows/Program Files directories. Narrowing things down from roughly 120,000 API functions to 100,000 isn't really narrowing things down at all.

In light of this, I'd actually be more interested in how the FreeIconList API function was found as a suitable candidate for anti-emulation tricks. Maybe the inventor of this trick had access to one or more of the emulators and picked out all API functions which are emulated by nothing more than "return 0/return 1" or something.
