Saturday, June 13. 2009
So I finally got around to writing a review for Justin Seitz's new No Starch Press book Gray Hat Python (Official Website / Amazon). Unlike the other No Starch Press books I reviewed in the last months my copy of Gray Hat Python is not a free review copy. I actually bought Gray Hat Python because I wanted to support Justin Seitz who I met at this year's CanSecWest conference for the first time. And because Justin seems to be a pretty nice guy I will punch a bit harder in this review than I usually do (unless the reviewed book really sucks) by giving unsolicited advice on how to improve the book for the second edition.
What is Gray Hat Python all about? The back cover of the book describes it like this: "Gray Hat Python explains the concepts behind hacking tools and techniques like debuggers, trojans, fuzzers, and emulators." And all of that using Python code and popular Python libraries. How awesome is that? Pretty awesome I thought when I first heard about the book. So awesome in fact that several months before the book was published I actually sent Justin an email asking him if everything's fine because I was concerned that the publisher is imposing stuff on him which could lead to a shitty book (see: Reverse Engineering Code with IDA Pro; if you ever meet any of the authors of that book ask them to tell you just how much Syngress sucks; it's an entertaining story).
Gray Hat Python is just about 190 pages long and thank god for that because I am tired of getting giant tomes about hacking/reverse engineering where you can easily cut 50% of the content without any real loss. The 12 different chapters of Gray Hat Python have a significantly higher signal-to-noise ratio than most RE books I have read so far.
The first chapter is called Setting up your development environment. It tells you how to install all the software packages required to follow the examples in the book. The chapter starts with installing Python and Eclipse (with PyDev) both in Windows and Linux. How to set up and use the Python package ctypes is explained in detail too. Now don't be fooled into believing that Gray Hat Python deals with platform-independent hacking stuff. Pretty much the whole book is about hacking Windows software (on x86 hardware) and Justin actually states this right at the beginning. In the second edition I would cut the Linux setup from the first chapter to define the book better.
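For readers who have never touched ctypes, here is a minimal, platform-neutral sketch of the kind of thing it lets you do. This is not a listing from the book; the type names `RegisterPair` and `Register32` are made up for illustration. It declares a C-style struct and union in Python and looks at the resulting layout.

```python
import ctypes

# Hypothetical C-style struct: two 16-bit halves of a 32-bit value.
class RegisterPair(ctypes.Structure):
    _fields_ = [
        ("first", ctypes.c_uint16),
        ("second", ctypes.c_uint16),
    ]

# Hypothetical union: the same 4 bytes seen as one integer or as two halves.
class Register32(ctypes.Union):
    _fields_ = [
        ("value", ctypes.c_uint32),
        ("halves", RegisterPair),
    ]

reg = Register32()
reg.value = 0xDEADBEEF

# The union occupies 4 bytes; which half holds 0xBEEF depends on byte order.
print(ctypes.sizeof(reg))
print(hex(reg.halves.first), hex(reg.halves.second))
```

The same `Structure` and `Union` classes are what you would use to describe Windows API structures before passing them to functions loaded through ctypes.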
The second chapter is where the interesting stuff starts. It is called Debuggers and Debugger Design (the chapter title does not really fit the content). In this chapter Justin explains how the stack works on x86 computers and what the difference between hardware breakpoints and software breakpoints is. Additionally he explains the general-purpose registers of x86 CPUs. This does not really make much sense because everywhere else the book requires prior knowledge of x86 assembly. And since everybody who knows enough x86 assembly to understand the book also knows the x86 general-purpose registers, section 2.1 can be cut from the second edition.
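To illustrate the software breakpoint idea without needing a live process, here is a small simulation. It is a sketch, not code from the book: the class name `SoftBreakpoints` is invented, and a `bytearray` stands in for the debuggee's memory. The principle is the real one though: save the original byte, overwrite it with INT3 (0xCC), and put the original back when the breakpoint is removed.

```python
INT3 = 0xCC  # single-byte x86 breakpoint instruction


class SoftBreakpoints:
    """Tracks software breakpoints in a simulated memory buffer."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> original byte

    def set(self, address: int) -> None:
        # Remember the original byte only once, then patch in INT3.
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def clear(self, address: int) -> None:
        # Restore the original byte so the instruction can execute normally.
        self.memory[address] = self.saved.pop(address)


code = bytearray(b"\x8b\xc3\x90\xc3")  # mov eax, ebx; nop; ret
bps = SoftBreakpoints(code)

bps.set(0)
patched = bytes(code)   # first byte is now 0xCC

bps.clear(0)
restored = bytes(code)  # original bytes are back
```

A hardware breakpoint, by contrast, leaves the code bytes alone and uses the CPU's debug registers instead.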
There was another thing which amused me in chapter two. Justin has a special talent. Whenever there is a concept which has a popular name and a lesser-known name Justin uses the lesser-known name. Examples I recognized are using the term FILO (~8,000 Google hits) instead of LIFO (~60,000 Google hits) to describe the stack, soft breakpoint (~750 Google hits) instead of software breakpoint (~7,000 Google hits) even though he uses hardware breakpoint instead of hard breakpoint, white-box debugger (0 relevant Google hits) instead of source-level debugger (~55,000 hits), and black-box debugger (0 relevant Google hits) instead of assembly-level debugger (~1,000 Google hits; and assembly debugger has even more hits).
While we're at it, there are a few more technical problems with the second chapter. The sentence "Each of the eight general-purpose registers is designed for a specific use" is cute. First you could argue that general-purpose registers obviously do not have a specific use. At least not anymore, so maybe the "is" should be replaced by "was" in that sentence. Either way, even if you think the sentence is fine, a few paragraphs further down Justin writes "The EBX register is the only register that was not designed for anything specific". So yeah, these two sentences do not fit together.
Alright, now for my next complaint. Later in the second chapter, Justin calls 8BC3 the opcode of the x86 instruction "MOV EAX, EBX". In fact he always refers to the hexadecimal encoding of an x86 instruction as the instruction's opcode. I was actually pedantic enough to look this up in the Intel Reference Manual, specifically Volume 2A: Instruction Set Reference, A-M. On the x86 architecture, an opcode is the 1-3 byte (plus 3 optional bits) hexadecimal identifier of an instruction mnemonic. Instruction arguments are not part of the opcode. This means the opcode of "MOV EAX, EBX" is just 8B (identifying MOV) while the C3 is - but don't trust me on this; I am not an x86 instruction encoding wizard - the so-called ModR/M byte which identifies the register arguments EAX and EBX.
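The split between opcode and ModR/M byte can be checked with a few lines of Python. This is a minimal sketch that only handles the register-to-register form of opcode 8B (MOV r32, r/m32); the function names are made up for illustration.

```python
# 32-bit register names in the order used by the 3-bit register fields.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_modrm(byte: int):
    """Split a ModR/M byte into its mod (2 bits), reg (3 bits), r/m (3 bits) fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def decode_mov_r32_rm32(encoding: bytes) -> str:
    """Decode 'MOV r32, r/m32' (opcode 8B), register-to-register form only."""
    opcode, modrm = encoding[0], encoding[1]
    if opcode != 0x8B:
        raise ValueError("not opcode 8B")
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    # For opcode 8B the reg field is the destination, r/m the source.
    return f"MOV {REG32[reg]}, {REG32[rm]}"


print(decode_mov_r32_rm32(bytes.fromhex("8BC3")))  # MOV EAX, EBX
```

C3 is 11 000 011 in binary: mod 11 (register operand), reg 000 (EAX), r/m 011 (EBX). So 8B alone is the opcode and C3 only selects the operands.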
One last thing. Book publishers should find a better way to deal with URLs printed in books. There are some links to MSDN mentioned in the second chapter. I don't think anybody actually copies long URLs manually from books when googling for "MSDN CreateProcess" is much easier. The easiest way would be to mention the link by something I can google for. Like "the method described in $foo's article $bar which you can find at $url" instead of "the method described at $url".
Let's move on to the third chapter now. It's called Building a Windows Debugger. This chapter is really useful for me. Since approximately 1999 I used Iczelion's sample code whenever I needed to have some quick sample code for using the Windows Debug API. By now Iczelion's stuff is pretty old and incomplete though and it became less and less useful over the years. The third chapter of Gray Hat Python easily serves as my new reference from now on when I quickly need to see some Windows Debug API code. It provides the whole scoop. Debugging processes, getting information about threads and register values, setting breakpoints (hardware, software, memory access) and handling events sent to the debugger. All of this information is accompanied by Python code which is really easy to read and understand.
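The least obvious part of setting a hardware breakpoint is the bit fiddling in the DR7 debug control register. The sketch below is not the book's listing; the function and constant names are assumptions chosen for illustration, and it only computes the new DR7 value. Actually writing it into a thread requires the Windows Debug API (GetThreadContext / SetThreadContext), which is left out here.

```python
# Breakpoint conditions as encoded in the R/W fields of DR7.
HW_EXECUTE = 0b00
HW_WRITE = 0b01
HW_ACCESS = 0b11  # break on read or write

# Breakpoint lengths in bytes mapped to the LEN field encoding.
LENGTH_BITS = {1: 0b00, 2: 0b01, 4: 0b11}


def dr7_with_breakpoint(dr7: int, slot: int, condition: int, length: int) -> int:
    """Return dr7 with hardware breakpoint `slot` (0-3, i.e. DR0-DR3) enabled."""
    if slot not in range(4):
        raise ValueError("x86 has only four debug address registers")
    dr7 |= 1 << (slot * 2)           # local enable bit L0..L3
    shift = 16 + slot * 4            # start of this slot's R/W + LEN nibble
    dr7 &= ~(0b1111 << shift)        # clear any stale settings for the slot
    dr7 |= condition << shift        # R/W field: when to trigger
    dr7 |= LENGTH_BITS[length] << (shift + 2)  # LEN field: how many bytes
    return dr7


# 4-byte write breakpoint in slot 0.
print(hex(dr7_with_breakpoint(0, 0, HW_WRITE, 4)))  # 0xd0001
```

The address to watch goes into DR0-DR3 separately; DR7 only says which slots are active and under which condition they fire.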
The fourth chapter (PyDbg - A pure Python Windows debugger) is another chapter about debugging processes with Python. The raw Windows Debug API is not used anymore though. This chapter focuses on Pedram Amini's Python library PyDbg instead. This library is a wrapper around the Windows Debug API which should make it easier to write debuggers in Python. More advanced debugger topics like dealing with exceptions (specifically access violations) are discussed as well as taking process snapshots and restoring the process to saved snapshots later.
The fifth chapter is the final chapter about debuggers. Immunity Debugger - The best of both worlds is covered this time. This means instead of building your own debuggers using Python, this chapter is all about using an existing debugger. This chapter is also the one where the hacking part begins. Justin introduces the concepts behind Immunity Debugger and how to use these concepts in your Python scripts for Immunity Debugger. As a concrete example he presents scripts that can be used for finding "exploit-friendly instructions" in loaded processes and filtering out instructions that contain bytes which are not allowed in shellcodes. This information is combined into a script for bypassing Data Execution Prevention as described here. The chapter ends with two smaller example scripts for disabling two anti-debugging tricks.
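The bad-byte filtering step is easy to picture outside of Immunity Debugger. The following is a standalone sketch, not one of the book's scripts and not the Immunity Debugger API; the helper names and the default bad-byte set are assumptions. It keeps only those candidate instructions whose encoding contains none of the forbidden bytes.

```python
def contains_bad_bytes(encoding: bytes, bad_bytes: bytes) -> bool:
    """True if any byte of the instruction encoding is in the forbidden set."""
    return any(b in bad_bytes for b in encoding)


def usable_instructions(candidates: dict, bad_bytes: bytes = b"\x00\x0a\x0d") -> dict:
    """Filter a {mnemonic: encoding} mapping down to shellcode-safe entries."""
    return {
        name: enc
        for name, enc in candidates.items()
        if not contains_bad_bytes(enc, bad_bytes)
    }


candidates = {
    "jmp esp": bytes.fromhex("ffe4"),
    "call esp": bytes.fromhex("ffd4"),
    "mov eax, 0": bytes.fromhex("b800000000"),  # contains NUL bytes
}

print(sorted(usable_instructions(candidates)))  # ['call esp', 'jmp esp']
```

Which bytes count as bad depends on the target: NUL, line feed and carriage return are typical for string-based overflows, but the set has to be determined per vulnerability.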
For those who don't know, Justin actually works for Immunity and - if my memory does not fail me - he did in fact join Immunity after winning the Immunity Debugger contest in 2008 which was all about coding the most awesome Immunity Debugger plugin the contestants could think of. For this reason I expected much more Immunity Debugger stuff in the book. Nevertheless except for this chapter, the sixth chapter and the tenth chapter, Immunity Debugger is not really used in the book. I am ambivalent about this. On the one hand, Immunity Debugger is a cool tool which deserves to play a prominent role in the book. On the other hand I do not want the book to degenerate into an Immunity advertisement brochure.
What else can you do with debugging techniques? Hooking stuff. That's what the very short sixth chapter (Hooking) is about. Justin presents Python scripts for PyDbg and Immunity Debugger that can be used for hooking stuff. The first example is a PyDbg script for hooking Firefox (using software breakpoints) and dumping all information the user enters into web forms before the information is SSL-encrypted. The second example is an Immunity Debugger script which hooks calls to RtlAllocateHeap and RtlFreeHeap and dumps information about allocated and freed memory. To improve performance the second script uses trampoline functions instead of software breakpoints.
DLL and Code Injection is the seventh chapter of Gray Hat Python. The first half of the chapter shows how to inject code or even whole DLL files into other processes using CreateRemoteThread. The second half of the chapter is called Getting Evil and it's really cute. It shows how to backdoor a system by hiding a file (specifically a TCP/IP remote shell) in the alternate data streams of the NTFS filesystem. The backdoor then replaces calc.exe and every time the user starts calc.exe he starts the remote shell in addition to the real calc.exe.
The structure of the next three chapters about fuzzing is comparable to that of the chapters about debuggers. I won't write too much about these chapters because fuzzing is not really my topic and I don't really have anything to say about them.
The eighth chapter, Fuzzing, explains what fuzzing is and how to use it, why it is useful for finding vulnerabilities and what bug classes can be found using fuzzers. In the second half of the chapter, a file format fuzzer is implemented in Python. This fuzzer can mutate data and it provides a way to monitor fuzzed processes for crashes. If the process crashes an email is sent to a pre-configured email address.
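The mutation part of a file format fuzzer can be very small. The sketch below is a generic byte-flipping mutator, not the strategy used in the book; the function name `mutate` and its parameters are assumptions. It overwrites a handful of random positions in the input and leaves the length untouched.

```python
import random


def mutate(data: bytes, rng: random.Random, max_mutations: int = 4) -> bytes:
    """Return a copy of data with up to max_mutations random bytes overwritten."""
    buf = bytearray(data)
    if not buf:
        return bytes(buf)  # nothing to mutate in an empty input
    for _ in range(rng.randint(1, max_mutations)):
        pos = rng.randrange(len(buf))   # pick a random offset
        buf[pos] = rng.randrange(256)   # replace it with a random byte
    return bytes(buf)


rng = random.Random(1234)  # fixed seed so a crashing test case can be reproduced
sample = b"GIF89a" + bytes(32)
fuzzed = mutate(sample, rng)
```

A real fuzzer would write `fuzzed` to a file, launch the target application on it under a debugger, and record the input whenever the target crashes.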
The ninth chapter, Sulley, shows how to use the Python fuzzing framework Sulley for fuzzing a known bug in a real-world application. The fuzzing target is a previous version of WarFTPD which contained a bug that led to a stack overflow when processing malformed USER and PASS messages.
The tenth chapter is the final chapter about fuzzing. It is titled Fuzzing Windows Drivers and yeah, that's what Justin is doing here. Immunity Debugger is used as the debugger of choice and a small Python script is presented that shows how to send fuzz data to the IOCTL handlers of Windows drivers.
The final two chapters are about using Python in combination with IDA Pro.
The eleventh chapter is called IDAPython - Scripting IDA Pro. This chapter introduces the extremely popular IDAPython scripting interface that can be used from inside IDA Pro. This interface can help to automate work during the analysis of disassembled files and it can be used to control the IDA Pro debugger in an automated way. There are three example scripts presented in this chapter. The first script is used to detect and list cross-references to potentially security-relevant functions (think strcpy et al.). The second example is a small code coverage script for recording which functions are hit during a debug run of the target. The third script calculates stack sizes and the sizes of variables on the stack.
I was a bit disappointed by the IDA Pro chapter because of the example scripts Justin used. Since pretty much the whole book is about dynamic code analysis he could have focused on static analysis here. After all IDA Pro is probably the most popular software for doing static code analysis and there are many cool things you can do statically in about as many lines of code as you can reasonably present in a book. Maybe the second edition of Gray Hat Python can have some more static code analysis.
The final chapter PyEmu - The scriptable emulator is about Cody Pierce's PyEmu tool. PyEmu is an x86 emulator fully implemented in Python. Using this emulator you can emulate x86 code if you do not want to or do not have the means to run x86 code natively. The first example presented in this chapter shows how to emulate a given function and how to determine information about the value returned from the function. The chapter closes with an example of running PyEmu inside IDA Pro for unpacking EXE files packed with the UPX packer.
That's it. The book stops rather abruptly. There are no appendices, no epilogue, and not even a small paragraph that summarizes what the reader should have learned from the book.
So, what did I like about the book? I really liked that Justin makes no moral judgments about the techniques he presents. So many other hacking books are full of "only for educational use" disclaimers or "we're the good guys but you should know about evil stuff too, wink, wink" sections. Justin presents technical aspects of reverse engineering and vulnerability development and that's it.
I also liked the compactness of the sample scripts (which are also available for download on the official website of the book). With very few exceptions all the scripts presented in the book fit into one page. Justin skillfully avoids boring the reader with giant source listings of 20 pages or more which sometimes exist in other books. I guess the expressiveness of Python compared to, say, C also played a part there.
Furthermore I liked the broad range of topics covered in the book. There is debugging, code injection, shellcode development, fuzzing, and IDA Pro. These topics probably account for a huge part of what a typical reverse engineer uses for his job. For the second edition I hope to see more static code analysis examples though. The book could easily be extended to 250 or maybe even 300 pages while keeping its current style.
Like all No Starch Press books, this book was once again laid out and edited very nicely. I think I will actually stop mentioning this in the future because I keep repeating myself here. No Starch Press has found their design for books and it's working well.
I can recommend Gray Hat Python to all people who want to get an overview of hacking tools and hacking techniques that make use of Python. It is a no-nonsense book which follows a simple recipe: give a brief overview of a hacking technique and then dive straight into a real-world example. This structure is very useful for people who want to get to the practical aspects of reverse engineering software as fast as possible without reading much about the theory behind things. If you want deeper insights about reverse engineering techniques this book is not for you. Due to the limited length of Gray Hat Python only the minimal information necessary for the example scripts is given. If you want to learn more about individual techniques you should read dedicated books about these topics.
Finally, here's a random list of Python RE/hacking tools that are not covered in Gray Hat Python yet could be added to the book in the second edition:
I read this book too and thought it was fantastic. It was a little short, and I craved more Immunity Debugger examples, but even if the book consisted only of PyDbg, Immunity, Hooking, and DLL Injection, it still would be worth buying and keeping on your shelf.