It's quine time

Quines: In computing, a quine is a program (a form of metaprogram) that produces its complete source code as its only output. For amusement, hackers sometimes attempt to develop the shortest possible quine in any given programming language.

Recently I've had the pleasure of writing two of them.

Here's my Ruby quine. I swear it's a total coincidence that it nearly looks like one I found on this website.

t="t=%c%s%c;printf t,34,t,34";printf t,34,t,34

And I quickly ported it to Java too. Thanks to the new powers of Java 1.5 this one's shorter than any Java quine you can find on this website.

class Q { public static void main(String[] a) { String t = "class Q { public static void main(String[] a) { String t = %c%s%c; System.out.printf(t, 34, t, 34); } }"; System.out.printf(t, 34, t, 34); } }

Both of these quines only work on systems that use ASCII (or an ASCII-compatible encoding) because I (ab-)use the fact that 34 is the ASCII code for the character ".
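
The same trick carries over to other languages. What follows is a minimal Python sketch for illustration, not one of the two quines above: SOURCE holds a one-line program built around the same %c%s%c template, and run_and_capture is a hypothetical helper that executes it and collects what it prints, so the output can be compared with the source.

```python
import contextlib
import io

# One-line Python program using the same %c%s%c template as the quines above.
# 34 is the code for the double quote, so the template string itself
# never has to contain a quote character.
SOURCE = 't="t=%c%s%c;print(t%%(34,t,34))";print(t%(34,t,34))'


def run_and_capture(source):
    """Execute `source` and return everything it writes to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


if __name__ == "__main__":
    output = run_and_capture(SOURCE)
    # print() appends a newline, so a file holding the program is SOURCE + "\n".
    print(output == SOURCE + "\n")
```

The doubled %% inside the template is the only extra step compared to the Ruby version: Python's % operator would otherwise try to interpret the % that is needed in the printed program.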

InstructionCounter plugin for IDA Pro

The InstructionCounter plugin for IDA Pro is actually the first plugin I've ever successfully finished. The main reason for this is probably that it's not exactly a very complicated plugin but basically a copy / paste of the template code from Steve Micallef's awesome IDA plugin tutorial with just a few extra lines added.

The plugin counts all instructions used in a file, orders them by their frequency of occurrence and prints all that information to a text file of your choice.

Here's what the output looks like:

Opcode distribution of file: D:\Coding\nes\new\Page_15.idb
Total opcodes: 6390

0001. 001337    20.92%      LDA
0002. 001022    15.99%      STA
0003. 000589     9.22%      JSR
0004. 000260     4.07%      RTS
0005. 000205     3.21%      AND
0006. 000198     3.10%      LDX
0007. 000186     2.91%      CMP
0008. 000178     2.79%      ADC
0009. 000177     2.77%      BNE
0010. 000171     2.68%      BEQ
...
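
The counting and ranking step can be sketched in a few lines. This is a Python illustration under assumptions, not the plugin's Visual C++ source: opcode_distribution is a hypothetical function that takes a plain list of mnemonics (the real plugin gets them from IDA), and the alphabetical tie-break is an arbitrary choice. The line format mimics the output shown above.

```python
from collections import Counter


def opcode_distribution(mnemonics):
    """Return report lines: rank, count, share of total, mnemonic."""
    counts = Counter(mnemonics)
    total = sum(counts.values())
    # Most frequent first; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    lines = [f"Total opcodes: {total}", ""]
    for rank, (mnemonic, count) in enumerate(ranked, start=1):
        share = 100.0 * count / total
        lines.append(f"{rank:04d}. {count:06d}    {share:5.2f}%      {mnemonic}")
    return lines


if __name__ == "__main__":
    # Tiny made-up instruction stream, only to show the format.
    sample = ["LDA", "STA", "LDA", "JSR", "LDA", "STA"]
    print("\n".join(opcode_distribution(sample)))
```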

The ZIP file contains the complete Visual C++ source code and the compiled plugin for IDA 4.8 and IDA 4.9.

There are two minor problems I'm having which are mentioned in the source file. Unfortunately none of the people I bug about my IDA problems have been online in the last few days.

Furthermore I was convinced that a plugin like that already exists but I didn't find anything at the IDA Palace when I needed it a few days ago. If there's really already a plugin like that let me know.

Two new F4I license infringements found

Third update for today! I swear I'm not making this stuff up but we've found two additional potential license infringements.

Rolf from Sabre Security was kind enough to point out that we had missed a giant copyright string.

000C48C0 4641 4143 202D 2046 7265 6577 6172 6520 FAAC - Freeware 
000C48D0 4164 7661 6E63 6564 2041 7564 696F 2043 Advanced Audio C
000C48E0 6F64 6572 2028 6874 7470 3A2F 2F77 7777 oder (http://www
000C48F0 2E61 7564 696F 636F 6469 6E67 2E63 6F6D .audiocoding.com
000C4900 2F29 0A20 436F 7079 7269 6768 7420 2843 /). Copyright (C
000C4910 2920 3139 3939 2C32 3030 302C 3230 3031 ) 1999,2000,2001
000C4920 2020 4D65 6E6E 6F20 4261 6B6B 6572 0A20   Menno Bakker. 
000C4930 436F 7079 7269 6768 7420 2843 2920 3230 Copyright (C) 20
000C4940 3032 2C32 3030 3320 204B 727A 7973 7A74 02,2003  Krzyszt
000C4950 6F66 204E 696B 6965 6C0A 5468 6973 2073 of Nikiel.This s
000C4960 6F66 7477 6172 6520 6973 2062 6173 6564 oftware is based
000C4970 206F 6E20 7468 6520 4953 4F20 4D50 4547  on the ISO MPEG
000C4980 2D34 2072 6566 6572 656E 6365 2073 6F75 -4 reference sou
000C4990 7263 6520 636F 6465 2E0A 0000 312E 3234 rce code....1.24
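
A string like the one in this dump can be found by scanning a binary for runs of printable ASCII. Here is a minimal Python sketch of such a scan; printable_strings and the minimum length of 8 are assumptions for illustration, not the tool that was actually used. The demo feeds it the first 16 bytes of the dump above.

```python
import re


def printable_strings(data, min_len=8):
    """Return (offset, text) for every run of printable ASCII in `data`."""
    # 0x20..0x7e covers space through tilde; min_len filters out noise.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # First row of the hex dump above, padded with zero bytes on both sides.
    row = bytes.fromhex("4641 4143 202D 2046 7265 6577 6172 6520")
    for offset, text in printable_strings(b"\x00\x00" + row + b"\x00\x00"):
        print(f"{offset:08X} {text}")
```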

Yeah. Apparently FAAC code was used too. I positively identified several functions myself. For starters: The function at virtual offset 0x1007BA80 is known as WriteFAACStr in the file bitstream.c of the FAAC project. You can work your way through other FAAC functions from there. I don't know for sure if that's GPL or LGPL. I think it's LGPL though.

And while we're at it: Matti found mpg123 references. In his opinion this is how the mpglib code made it into the OCX. It still needs to be determined if there's more mpg123 code in the OCX besides the mpglib stuff. If that's the case another GPL infringement can be added to the list.

Proof that F4I violates the GPL

Due to the importance of the latest discoveries, here's another update. For the first time I'm updating twice in one day. I'm sure you've already been waiting for some proof about the GPL infringement by F4I. This post contains it in the already well-known form of a comparison between the original C code and an annotated disassembly of the F4I binary. All C code is from the function DoShuffle from the file drms.c which is part of the VideoLAN project.

I want to mention though that I'm not going to explain all code because the function is pretty long. I've picked two parts of the function where it's easily recognizable that the two functions are basically the same (there's one tiny difference explained later). Nevertheless I'm going to provide the other parts of the function too, I just won't comment them. Here's the full disassembly.


Breakthrough after breakthrough in the F4I case

Ladies and gentlemen, muzzy and I made what's maybe the most significant progress since we began our little examination of the F4I binaries a few days ago. Thanks to Halvar Flake of Sabre Security who provided us with results from newer versions of BinDiff than those that were available to us, I was able to positively identify several functions from the mpglib library in the F4I code. What's significantly more important is that muzzy found actual GPL code in the files too! Yes, GPL, not LGPL! This opens up a completely different can of worms.

Let's start with the LGPL code from mpglib. I'm only listing the found functions briefly because I think there's already enough disassembly on my website.

The function decodeMP3 from interface.c can be found at virtual offset 0x10059850 in ECDPlayerControl.ocx
The function decodeMP3_clipchoice from interface.c can be found at virtual offset 0x10059440 in ECDPlayerControl.ocx
The function addbuf from interface.c can be found at virtual offset 0x10059020 in ECDPlayerControl.ocx
The function sync_buffer from interface.c can be found at virtual offset 0x10059310 in ECDPlayerControl.ocx
...

I could go on like this for quite a while as the functions mentioned already contain so many function calls themselves that it's probably possible to reconstruct large parts of mpglib from there (if not the complete library).

I'm sure you're more interested in the GPL code than in a large list of LGPL functions and where they can be found in the F4I code though.

I just want to mention that the function that can be found at virtual offset 0x10089E00 in ECDPlayerControl.ocx is the function DoShuffle from a GPL-ed file called drms.c written by Jon Lech Johansen and Sam Hocevar (Google for it). I'll leave the rest of the explanation to muzzy for now.

Is F4I in violation of the LGPL? - Part III

"Frank" posted the following comment to my last update.

"What does this code fragment do? It accesses data that is well-defined by an open specification. There is little leeway for a software developer to do things differently. So far, this may be coincidence, or a case of developers being "inspired" by looking at other source code -- like the famous "stolen" Unix code fragments in Linux. This is an invitation for more research, yes, but I don't see a smoking gun just yet."

This is a valid concern and I wanted to address this anyway, so let's do it right here. I think it's important enough to write a new update instead of just replying to Frank.

I've produced a complete annotated disassembly of the function in question. It matches the function from the LAME code 99%. I say 99% because there are differences which can be reasonably explained by common compiler optimization techniques. I've mentioned these techniques in the annotated disassembly where appropriate.

If the 99%/100% match of a 90-line C function is a coincidence, it goes beyond what I'm capable of detecting using my tools.

Click here for the LAME source code.
Click here for the annotated disassembly.

Edit: I want to ask the people familiar with the LAME function in question something. Look at the following four lines from the LAME source code:

	if( buf[0] != VBRTag[0] && buf[0] != VBRTag2[0] ) return 0;    /* fail */
	if( buf[1] != VBRTag[1] && buf[1] != VBRTag2[1]) return 0;    /* header not found*/
	if( buf[2] != VBRTag[2] && buf[2] != VBRTag2[2]) return 0;
	if( buf[3] != VBRTag[3] && buf[3] != VBRTag2[3]) return 0;

This piece of code attempts to check if buf contains 'Xing' or 'Info'. I don't know about the underlying data structures but the way it checks looks wrong to me. This piece of code passes if buf is a combination of 'Xing' and 'Info', like 'Iing' or 'Xnfo' which is probably not the desired functionality. Can anybody confirm that this is a bug or is not a bug? Because if it's a bug it's also part of the F4I code. This would solidify the assumption that the code was stolen from LAME even more.
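
The behaviour described above is easy to reproduce. This is a minimal Python sketch, assuming VBRTag is 'Xing' and VBRTag2 is 'Info' as the check suggests; lame_style_check mirrors the four per-byte comparisons, and strict_check is a hypothetical alternative that only accepts the two whole tags.

```python
VBR_TAG = b"Xing"   # assumed content of VBRTag
VBR_TAG2 = b"Info"  # assumed content of VBRTag2


def lame_style_check(buf):
    """Per-byte check as in the four quoted lines: each byte may match either tag."""
    for i in range(4):
        if buf[i] != VBR_TAG[i] and buf[i] != VBR_TAG2[i]:
            return False
    return True


def strict_check(buf):
    """Accept only the complete tag 'Xing' or the complete tag 'Info'."""
    return bytes(buf[:4]) in (VBR_TAG, VBR_TAG2)


if __name__ == "__main__":
    for candidate in (b"Xing", b"Info", b"Iing", b"Xnfo", b"Xyzz"):
        print(candidate, lame_style_check(candidate), strict_check(candidate))
```

The mixed tags 'Iing' and 'Xnfo' pass the per-byte check but fail the strict one, which is exactly the difference in question.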

Edit: OK guys, I'm going to proclaim victory now as I've found undeniable proof that this match is not a coincidence. I just took the functions GetVbrTag and ExtractI4 from the LAME code and compiled them myself using the freely available Visual C++ 2003 command-line tools. The only compiler parameter I used is /Ox to turn on maximum optimizations. The resulting code is byte for byte the code from the F4I OCX file. Including all my correctly predicted compiler optimizations (function inlining, if-clause merging, operation re-ordering, ...).

Click here to see the disassembly of my own compiled version of the LAME code. It matches the disassembly posted earlier perfectly. Note that I didn't bother to re-name variables or to insert comments.

Is Sony in violation of the LGPL? - Part II

Hey,

I'm sure you've been waiting for updates that prove what we're talking about. Here it comes. I want to talk about the file ECDPlayerControl.ocx which the fantastic muzzy found yesterday while I had nothing better to do than to listen to my pillow. It uses LAME code (and code from at least one other LGPL library).

At virtual offset 0x100607D0 you can find a function that's called GetVbrTag in the LAME source code (it can be found in the file VbrTag.c). Here's some code straight from the LAME source code (it's only the first part of the function, I don't want this post to get too long):

int GetVbrTag(VBRTAGDATA *pTagData,  unsigned char *buf)
{
	int			i, head_flags;
	int			h_bitrate,h_id, h_mode, h_sr_index;
        int enc_delay,enc_padding; 

	/* get Vbr header data */
	pTagData->flags = 0;

	/* get selected MPEG header data */
	h_id       = (buf[1] >> 3) & 1;
	h_sr_index = (buf[2] >> 2) & 3;
	h_mode     = (buf[3] >> 6) & 3;
        h_bitrate  = ((buf[2]>>4)&0xf);
	h_bitrate = bitrate_table[h_id][h_bitrate];

        /* check for FFE syncword */
        if ((buf[1]>>4)==0xE) 
            pTagData->samprate = samplerate_table[2][h_sr_index];
        else
            pTagData->samprate = samplerate_table[h_id][h_sr_index];
    ...

Now compare this to the disassembly. You can easily spot "pTagData->flags = 0", the three shifts, the array access and the if-comparison (although the code was a bit optimized by the compiler). To make it easier, here's the flow-chart diagram of the function too.

.text:100607D0 GetVbrTag       proc near               ; CODE XREF: sub_10059240+77p
.text:100607D0
.text:100607D0 arg_0           = dword ptr  4
.text:100607D0 arg_4           = dword ptr  8
.text:100607D0
.text:100607D0                 mov     ecx, [esp+arg_4]
.text:100607D4                 push    ebx
.text:100607D5                 push    ebp
.text:100607D6                 push    esi
.text:100607D7                 xor     eax, eax
.text:100607D9                 push    edi
.text:100607DA                 mov     edi, [esp+10h+arg_0]
.text:100607DE                 mov     dword ptr [edi+8], 0
.text:100607E5                 mov     dl, [ecx+1]
.text:100607E8                 movzx   ebx, byte ptr [ecx+3]
.text:100607EC                 mov     al, dl
.text:100607EE                 and     dl, 0F0h
.text:100607F1                 shr     ebx, 6
.text:100607F4                 shr     eax, 3
.text:100607F7                 and     eax, 1
.text:100607FA                 mov     ebp, eax
.text:100607FC                 movzx   eax, byte ptr [ecx+2]
.text:10060800                 mov     esi, eax
.text:10060802                 mov     [esp+10h+arg_0], ebp
.text:10060806                 shr     eax, 4
.text:10060809                 shl     ebp, 4
.text:1006080C                 add     eax, ebp
.text:1006080E                 mov     eax, ds:bitrate_table[eax*4]
.text:10060815                 mov     ebp, [esp+10h+arg_0]
.text:10060819                 shr     esi, 2
.text:1006081C                 and     esi, 3
.text:1006081F                 cmp     dl, 0E0h
.text:10060822                 mov     [esp+10h+arg_4], eax
.text:10060826                 jnz     short loc_10060831
.text:10060828                 mov     edx, ds:samplerate_table2[esi*4]
.text:1006082F                 jmp     short loc_1006083B

I think you agree with me that this is a clear case.
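
For readers who want to follow the mapping between the C code and the disassembly, here is a minimal Python sketch of the same field extraction. header_fields is a hypothetical helper, and the row size of 16 entries is an assumption read off the "shl ebp, 4" instruction, not taken from the LAME headers.

```python
def header_fields(buf):
    """Extract the values GetVbrTag reads from the four MPEG header bytes."""
    h_id = (buf[1] >> 3) & 1
    h_sr_index = (buf[2] >> 2) & 3
    h_mode = (buf[3] >> 6) & 3
    h_bitrate_index = (buf[2] >> 4) & 0xF
    return {
        "h_id": h_id,
        "h_sr_index": h_sr_index,
        "h_mode": h_mode,
        "h_bitrate_index": h_bitrate_index,
        # "shl ebp, 4" followed by "add eax, ebp" is row * 16 + column, the
        # flat index for bitrate_table[h_id][h_bitrate] (16 per row assumed).
        "bitrate_flat_index": (h_id << 4) + h_bitrate_index,
        # "and dl, 0F0h" / "cmp dl, 0E0h" tests the same thing as
        # (buf[1] >> 4) == 0xE without performing the shift.
        "ffe_syncword": (buf[1] & 0xF0) == 0xE0,
    }


if __name__ == "__main__":
    # Example header bytes chosen for illustration only.
    print(header_fields(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

The masked comparison and the shifted comparison agree for every possible byte value, which is why the compiler is free to pick the cheaper form.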

I also want to mention that the entire id3lib library (also LGPL software) is in the file too. Thankfully id3lib is written in C++ and not in C and therefore finding matches is significantly faster as the original function names are part of the binary files (thanks for the debug build too). Just click this link to see some of the id3lib functions in the file.

I want to summarize what we have and raise a few questions at this point:
- The LGPL is not mentioned on the CD.
- That means there's no copyright notice either, even though the LGPL demands one.
- Does an OCX qualify as a linked library? Probably. But I am not able to re-create the OCX file because it contains at least two LGPL libraries and additional (probably proprietary) code. Is this necessary to be LGPL-compliant? (see the end of the article).
- Are the files part of this LGPL-licensed software by Sony? Does that have any effect on the legality of the OCX? The first two points would still stand.

All this legalese is killing me, I can only report on the code. I think we've reached a certain point where it's time to take a break. We've definitely found LGPL code in the software. Now it's time for the license gurus to find out if that constitutes a license violation. Until this is cleared up I think I'm going to do something else.

Edit: There are differences in opinion about what constitutes an LGPL infringement. Wikipedia says "Essentially, it must be possible for the software to be linked with a newer version of the LGPL-covered program. The most commonly used method for doing so is to use "a suitable shared library mechanism for linking". Alternatively, static linking is allowed if either source code or linkable object files are provided."

Does that mean everybody must be able to recreate the OCX file? Or what does that mean?

Is Sony in violation of the LGPL?

Update: Click here

I'm sure you've already heard about the Sony rootkit that was first revealed by Mark Russinovich of Sysinternals. After the Finnish hacker Matti Nikki (aka muzzy) found some revealing strings in one of the files (go.exe) that are part of the copy protection software, the rootkit is also suspected to be in violation of the open-source license LGPL. The strings indicate that code from the open-source project LAME was used in the copy protection software in a way that's not compatible with the LGPL license which is used by LAME.

On Slashdot muzzy mentioned that he doesn't have access to Sabre BinDiff, a tool that can be used to compare binary files. I was in the opposite position as I have BinDiff but I didn't have the file in question (go.exe). I mailed muzzy and he hooked me up with the file.

I compared go.exe with a VC++-compiled version of lame_enc.dll but unfortunately BinDiff didn't find a single relevant matched function. A quick manual check didn't reveal any LAME functions in go.exe either.

Even though go.exe apparently does not contain any LAME code, a considerable amount of tables and constants from the LAME source files can be found in the go.exe file. Here's a list of the LAME tables I've been able to locate. The first column shows the hex address where the table can be found in the go.exe file, the second column shows the name of the table as it appears in the LAME source code and the third column shows the LAME source file where the table can be found.

I have to add though, that not a single table actually seems to be used by the go.exe code. What does that mean? I've asked random people and I've heard speculation ranging between "accidentally linked" and "encrypted code in go.exe that uses the tables and can't be found in the disassembler". Further analysis needs to be made but at this point I'm leaning towards more or less accidental inclusion.
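
Locating a known table in a binary can be sketched as a plain byte search. This is a Python illustration, not the method actually used here: find_table is a hypothetical helper, the table values in the demo are made up, and the 32-bit little-endian integer layout is an assumption (a table of floats would need a different format string).

```python
import struct


def find_table(data, values, fmt="<i"):
    """Return every offset in `data` where `values` appear packed back to back."""
    # "<i" packs each value as a 32-bit little-endian signed integer (assumed).
    needle = b"".join(struct.pack(fmt, value) for value in values)
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


if __name__ == "__main__":
    # Made-up table and file contents, only to show the call.
    table = [0, 8, 16, 24]
    packed = b"".join(struct.pack("<i", value) for value in table)
    fake_file = b"\xAA" * 5 + packed + b"\xBB" * 3
    print([hex(offset) for offset in find_table(fake_file, table)])
```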

Generating a word list from Wikipedia

Holy shit, a site update! And after only 6 weeks too! Great.

This update is mainly a small program that shows how to parse huge XML files (about 3.5 GB) with C#. Recently I needed a giant word list and all word lists I found on the internet were very unsatisfactory. Therefore I decided to make my own one and the best source for words right now is probably Wikipedia (which you can thankfully download in XML format).
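
The key to handling a file of that size is to stream it instead of loading it as one document. The program itself is written in C#; what follows is only a minimal Python sketch of the streaming idea. The element names page and text are assumptions about the dump layout, and treating a word as a run of ASCII letters is a simplification.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

WORD = re.compile(r"[A-Za-z]+")  # simplistic definition of a word (assumption)


def count_words(xml_stream):
    """Stream an XML dump and count the words found in <text> elements."""
    counts = Counter()
    context = ET.iterparse(xml_stream, events=("start", "end"))
    _event, root = next(context)  # the first event delivers the root element
    for event, elem in context:
        if event != "end":
            continue
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace if present
        if tag == "text" and elem.text:
            counts.update(word.lower() for word in WORD.findall(elem.text))
        elif tag == "page":
            root.clear()  # discard finished pages so memory use stays flat
    return counts


if __name__ == "__main__":
    sample = (b"<mediawiki><page><title>A</title><revision>"
              b"<text>Quine quine program</text></revision></page>"
              b"<page><title>B</title><revision>"
              b"<text>Program output</text></revision></page></mediawiki>")
    print(count_words(io.BytesIO(sample)).most_common())
```

Clearing the root after each page is what keeps the memory footprint independent of the file size; without it the parser would keep every page it has already seen.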

No, I didn't need that word list for a dictionary attack on some unsuspecting victim. Let's just pretend I was inspired by this flash movie and I wanted to find out what the highest scoring Scrabble words are.

Unfortunately that Scrabble program is on hold right now because I realized that I've never actually played Scrabble (except for like 10 games with a Shareware game while I developed my program) and there were some discrepancies between Scrabble score lists available online and the results I calculated. Now I don't know if I'm wrong or if they are wrong as I'm not really familiar with Scrabble rules at this point.
