Thursday, November 1. 2007
Over the last few days a group released the passwords to nearly 45000 MySpace accounts and announced that it would release another 30000 passwords in the next few days. I used a few hours before Saturday's lunch to write a small program that analyzes the passwords released so far.
At worst the results are a useless time-filler; at best they are a case study of what happens when a website forces its users to choose passwords that meet certain minimum requirements. MySpace demands that every password contain at least one non-alphabetical character (like 0, 1, 2, or !, ?, @). How the users adhered to this requirement can be seen in the tables below.
It is my understanding that the 43713 passwords leaked so far come from phishing sites that tricked people into entering their passwords. This makes the passwords less reliable than a password list taken straight from the MySpace servers. People could have misspelled their MySpace passwords, or they could have entered fake information after they noticed that someone was trying to steal their password. A quick analysis has shown that probably less than 1% of the leaked passwords suffer from these problems.
Let's start with the Top 10 passwords that can be found in the leaked password list. That Password1 is the most popular password is no surprise. Password itself is a very popular password, and the MySpace password requirements force people to enter at least one non-alphabetical character. Adding a single 1 at the end of the password seems to be the most popular choice, as seen in a later table.
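A Top 10 list like this can be produced with a simple frequency count. The following is only a minimal sketch, not the program used for this post: the function name `top_passwords` and the sample list are hypothetical, and it is an assumption that passwords are compared exactly as typed (whether the original analysis folded upper and lower case is not stated).

```python
from collections import Counter

def top_passwords(passwords, n=10):
    """Return the n most frequent passwords as (password, count) pairs."""
    return Counter(passwords).most_common(n)

# Hypothetical sample for illustration only, not taken from the leaked list.
sample = ["password1", "abc123", "password1", "iloveyou1", "abc123", "password1"]

for password, count in top_passwords(sample, n=2):
    print(password, count)
```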
The length distribution of the passwords is as follows.
The longest passwords were the 32-character "dawn2222222222222222222222222222", followed by the 29-character "jim33333333333333333333333333" and the 27-character "jim333333333333333333333333". An honorable mention goes to the 18-character password "masonĂ¢™Â¥14".
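For readers who want to repeat the length analysis on their own data, here is a rough sketch. The helper names `length_distribution` and `longest` are made up for this example, and the sample list is hypothetical apart from one entry rebuilt from the 32-character length quoted above.

```python
from collections import Counter

def length_distribution(passwords):
    """Map each password length to the number of passwords of that length."""
    counts = Counter(len(p) for p in passwords)
    return dict(sorted(counts.items()))

def longest(passwords, n=3):
    """Return the n longest passwords, longest first."""
    return sorted(passwords, key=len, reverse=True)[:n]

# Hypothetical sample; the last entry is "dawn" plus 28 twos (32 characters).
sample = ["abc1", "hello12", "qwerty1", "dawn" + "2" * 28]

print(length_distribution(sample))
print(longest(sample, n=1))
```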
I used this SourceForge project to evaluate the strengths of the passwords. It is not a very good program, so the results are not too useful. Basically the program produces a percentage value of how secure it believes a password is but the methods it uses to find this number are questionable. 100% is the best possible value.
Next I analyzed the general formats of the passwords. The column Wordlist gives the number of passwords I could look up in a word list after the non-alphabetical suffixes or prefixes were stripped from the password. It is curious that not a single password was a characters-only password.
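The format analysis can be approximated with a few regular expressions. This is a sketch under assumptions: the category names, the helpers `classify`, `strip_affixes` and `in_wordlist`, and the lower-casing before the word list lookup are all my own choices for illustration, and "alphabetical" is taken to mean the ASCII letters A to Z.

```python
import re

def classify(password):
    """Assign a password to one of a few coarse format categories."""
    if re.fullmatch(r"[A-Za-z]+", password):
        return "letters only"
    if re.fullmatch(r"[0-9]+", password):
        return "digits only"
    if re.fullmatch(r"[^A-Za-z]+", password):
        return "non-letters only"
    if re.fullmatch(r"[A-Za-z]+[^A-Za-z]+", password):
        return "letters + suffix"
    if re.fullmatch(r"[^A-Za-z]+[A-Za-z]+", password):
        return "prefix + letters"
    return "mixed"

def strip_affixes(password):
    """Remove leading and trailing runs of non-alphabetical characters."""
    return re.sub(r"^[^A-Za-z]+|[^A-Za-z]+$", "", password)

def in_wordlist(password, words):
    """Check whether the stripped, lower-cased password is a known word."""
    return strip_affixes(password).lower() in words

# Hypothetical word list and passwords for illustration only.
words = {"monkey", "password"}
for p in ["password1", "!monkey", "a1b2"]:
    print(p, classify(p), in_wordlist(p, words))
```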
Now we get to the effect of the MySpace requirement of at least one non-alphabetical character. The next table shows the most popular numeric suffixes that are attached to alphabetical passwords in order to pass the MySpace password requirement. Most people simply add a "1" to their password.
So what are the most popular passwords after the non-alphabetical suffixes are stripped off?
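Both the suffix counts and the stripped base words can be derived in one pass. The sketch below is an illustration rather than the original program: `split_suffix` and `suffix_and_base_counts` are hypothetical names, the sample passwords are invented, and lower-casing the base word before counting is an assumption.

```python
import re
from collections import Counter

# Letters followed by one run of non-alphabetical characters.
SUFFIXED = re.compile(r"([A-Za-z]+)([^A-Za-z]+)")

def split_suffix(password):
    """Return (base, suffix) for letters-plus-suffix passwords, else None."""
    match = SUFFIXED.fullmatch(password)
    return (match.group(1), match.group(2)) if match else None

def suffix_and_base_counts(passwords):
    """Count purely numeric suffixes and lower-cased base words."""
    parts = [s for s in map(split_suffix, passwords) if s is not None]
    numeric = Counter(suf for _, suf in parts if re.fullmatch(r"[0-9]+", suf))
    bases = Counter(base.lower() for base, _ in parts)
    return numeric, bases

# Hypothetical sample for illustration only.
sample = ["password1", "monkey1", "password123", "soccer!", "a1b2", "password1"]
numeric, bases = suffix_and_base_counts(sample)
print(numeric.most_common(3))
print(bases.most_common(3))
```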
Some people choose to add a character instead of a digit to fulfill the MySpace password requirement. The most popular character suffixes of alphabetical passwords can be seen in the following table. Character prefixes to alphabetical passwords are so rare that I did not create a special table for them. The most popular character prefix is ! with 16 occurrences.
Last but not least, here are the Top 10 email providers that were used by the users of the leaked passwords. Google's mail service GMail is surprisingly low on the list.
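Counting providers comes down to tallying the domain part of each address. This is a minimal sketch with a hypothetical helper `provider_counts` and invented example addresses; it treats every distinct domain as its own provider, so regional domains of the same company would be counted separately, which may differ from how the table here was grouped.

```python
from collections import Counter

def provider_counts(emails):
    """Count email addresses per lower-cased domain, skipping malformed ones."""
    domains = (
        e.rsplit("@", 1)[1].strip().lower()
        for e in emails
        if "@" in e
    )
    return Counter(domains)

# Invented addresses for illustration only.
sample = ["alice@hotmail.com", "bob@yahoo.com", "carol@HOTMAIL.com", "not-an-email"]
print(provider_counts(sample).most_common(10))
```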
nice job. can you provide a link to the raw data for others to analyze?
I don't think it's a surprise at all that Gmail users are not that high up. Just compare the constituent demographics. I don't mean any offense to MySpace users or Yahoo or Hotmail, etc. I have an account with most of these services. What I mean is that, generally speaking, Gmail users would be more likely to be on Facebook, and MySpace users by their very nature are more likely to be on Hotmail or Yahoo (older, more mainstream). I don't have numbers to support this, but I think it makes sense.
Actually you do have numbers to back your theory now (above). However, I wonder if they're a bit skewed, as a lot of people use their old hotmail and yahoo accounts for spam fodder.
I was about to post almost the same thing but was afraid it'd sound mean...It's not just the age demographic. It has something to do with internet-savvy and what you use the 'net for.
Very interesting, although I've never had a myspace account. Certainly shows how "aware" people are about their privacy and security.
A couple of notes:
If the list of passwords is indeed from phishing attacks, the email host distribution does not necessarily say anything about the market share of email hosts, or how "internet-savvy" users of different hosts are. Instead, one internet company might be better than others at identifying phishing emails and warning its users about them. (For instance, Gmail might be very good at identifying phishing attempts, which causes Gmail users to fall into these traps less often, which causes the Gmail rank to drop.)
How is it possible that there were 475 passwords with only letters? You mention that MySpace requires at least one non-alpha character.
Additionally, it is sad to see how common very weak passwords are. 99 occurrences of "password1" means that by just testing different accounts, you've got a 0.2% chance of any given account having that password. How many such logins can a bot attempt in an hour, I wonder? How many cracked accounts?
Um... this is ridiculous.
OK, so out of 45,000 passwords... 99 of them were 'password1'. That is statistically insignificant, as are the rest of the 'observations'.
This article makes no mention of proportions and suffers for it. Too bad.
What do you mean that 99/45000 is insignificant? A sample size of 45000 is quite good.
Let's assume that the real distribution should be, for instance, 50/45000, and that the extra 49 were just by chance and don't reflect reality.
A simple chi-square test of significance will then tell you that the probability of seeing 99 such passwords (if it really should be 50) is 0.000000000004. That's pretty damn significant.
Assuming a real probability of 80/45000 gives a probability of seeing 99/45000 of 0.033. That's still less than 5%, and would typically be considered significant.
It's kinda silly to say that something isn't significant when not relating it to an assumed real distribution, but I'm still curious about what real basis you have for saying that 99/45000 isn't significant.
(From the above we can say with really good certainty that for every 45000 MySpace users, there will be more than 50 with "password1". We can also say that it's very likely that the number will even be higher than 80.)
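The two probabilities quoted above can be reproduced with a short calculation. This is a sketch under an assumption: it treats the test as a two-category chi-square goodness-of-fit test ("password1" versus everything else) with one degree of freedom, which may or may not be exactly what the commenter computed. The function name `chi_square_p_value` is made up for this example.

```python
import math

def chi_square_p_value(observed, expected, total):
    """P-value of a two-category chi-square test with one degree of freedom."""
    other_observed = total - observed
    other_expected = total - expected
    chi2 = ((observed - expected) ** 2 / expected
            + (other_observed - other_expected) ** 2 / other_expected)
    # With one degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

print(chi_square_p_value(99, 50, 45000))  # roughly 4e-12
print(chi_square_p_value(99, 80, 45000))  # roughly 0.033
```

Only the standard library is used here; the `erfc` identity holds for one degree of freedom only, so a test with more categories would need a general chi-square distribution function instead.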
Out of 45,000 passwords, 99 of them were 'password1'. That is statistically insignificant.
This article makes no mention of proportions and suffers for it. Too bad.
For the low number of gmail accounts: Gmail-users are pros. Myspace users are noobs. conclusion: gmail users don't use myspace!
I wouldn't say that G-mail users don't use myspace. As has been mentioned before, a lot of people use their hotmail addresses as spam fodder. So that skews the results.
Also, most of this data has been collected from phishing websites. If we assume that the average Gmail user is more technically proficient than the average Hotmail user, then more than likely the Gmail user is either going to be able to recognise the phishing site or have a program that can detect it (e.g. Firefox, IE7), whereas the Hotmail users are probably using earlier versions of IE which don't have the phishing filter.
Because of this, it does skew the whole data a bit, because it only shows part of the audience, and not the more technically proficient users.
As for the data, I also would've liked to have seen (if possible) a comparison of passwords that use a dictionary-based word (e.g. password1) and passwords that don't (e.g. J5_83VmN1!B8).
Letters followed by Digits or Characters
36763 out of 45000
I found that interesting. Most people just have a word followed by a number. So if you know a damn thing about them, start picking words they use to break it.
The reason gmail is so low is because myspace users are generally too stupid to realise that it is the better email provider. At least 30% must be chavs, another 30% emos, another 30% retarded teens and 10% mildly intelligent people.
this whole post is worthless, meaningless, undocumented, and most likely fiction
Considering that we have nothing better to work with, and for that matter, that not even a sociologist conducting research could obtain the actual password lists, this is a pretty good attempt. If you don't wish to take part in the discussion, feel free to exclude yourself--quietly.
This is a very old article, but I have my doubts about it. How did you write that program when the passwords are MD5 hashes??? Any proof of this?
No proof. I think the reason why the passwords were available in plain text is that they were phished from the individual users, not taken from the MySpace servers.