The Guessability of Passwords

September 1, 2016

Recently, over a family dinner, my aunt asked me how she could choose passwords that are secure. I responded with the usual advice: no words, especially not names; use a long passphrase, length really does matter; and so on. Until yesterday, though, I was unfamiliar with a formal metric for password “guessability”. In the course of my research I happened to stumble across a fascinating Google study on account recovery through security questions, and more broadly the work of Joseph Bonneau, a postdoctoral scholar in the Applied Cryptography group at Stanford who wrote his dissertation, “Guessing human-chosen secrets,” on password authentication. Several findings are extremely interesting, and have strong implications for security.

The guessing of a password is modeled as a random draw from some password distribution. If you are familiar with Shannon entropy (if not, check out my guide here), you might consider it a useful measure of uncertainty, but it actually measures a slightly different quantity: the number of subset queries needed to identify the password in question, as opposed to how likely an attacker is to guess the correct password. The two are not directly related. Another idea is guessing entropy, the expected number of guesses until the correct password is found when candidates are tried from most to least likely. This is closer to the metric we seek, but it is disproportionately affected by a few users with very strong passwords (such as 128-bit pseudorandom hexadecimal strings). In Bonneau’s research, 20 Yahoo! users in a sample of nearly 70 million who used such passwords drove the guessing entropy up to 2^106, which is clearly not representative of most of the dataset. Instead, Bonneau describes and uses several “partial guessing metrics”: β-success-rate, the expected fraction of accounts broken given β guesses per account; α-work-factor, the number of guesses per account needed to break a fraction α of accounts; and, combining the previous two, α-guesswork, the expected number of guesses per account needed to achieve a success rate of α. I will omit the mathematical details here and merely present what I find to be the most salient results.
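For readers who want to see these metrics in action, here is a minimal Python sketch. It follows the definitions as I understand them from Bonneau's work, applied to a made-up toy distribution; the function names and the sample counts are my own illustrative assumptions, not anything taken from the Yahoo! data.

```python
import math

def normalize(counts):
    """Turn raw password counts into probabilities, most likely first."""
    total = sum(counts)
    return sorted((c / total for c in counts), reverse=True)

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def guessing_entropy(probs):
    """Expected number of guesses when trying passwords most likely first."""
    return sum(i * p for i, p in enumerate(probs, start=1))

def success_rate(probs, beta):
    """beta-success-rate: fraction of accounts broken with beta guesses each."""
    return sum(probs[:beta])

def work_factor(probs, alpha):
    """alpha-work-factor: fewest guesses per account that break a fraction alpha."""
    cumulative = 0.0
    for i, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= alpha - 1e-12:  # small tolerance for float rounding
            return i
    raise ValueError("alpha exceeds the total probability mass")

def guesswork(probs, alpha):
    """alpha-guesswork: expected guesses per account to reach success rate alpha."""
    mu = work_factor(probs, alpha)
    reached = success_rate(probs, mu)
    # Accounts that are never broken still cost the full mu guesses.
    cost_of_failures = (1 - reached) * mu
    cost_of_successes = sum(i * p for i, p in enumerate(probs[:mu], start=1))
    return cost_of_failures + cost_of_successes

# Hypothetical toy population: 100 accounts, three popular passwords,
# plus ten passwords used by one account each.
probs = normalize([50, 30, 10] + [1] * 10)

print("Shannon entropy (bits):", shannon_entropy(probs))
print("Guessing entropy:", guessing_entropy(probs))
print("1-success-rate:", success_rate(probs, 1))
print("0.8-work-factor:", work_factor(probs, 0.8))
print("0.8-guesswork:", guesswork(probs, 0.8))
```

Even in this toy case, the long tail of rare passwords pulls the guessing entropy above the 0.8-guesswork. That is the same effect, in miniature, that makes the partial metrics more representative of what an attacker actually faces.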

From the Yahoo! corpus, Bonneau looked at relative password strength across the population and several sub-populations. “There is a general trend towards better password selection with users’ age,” he writes, but age did not have nearly the effect of another factor: language. German- and Korean-speaking users had the strongest passwords, while Indonesian-speaking users had the weakest in general. Users who actively change their passwords, who log in from multiple locations, and who store a lot of data with Yahoo! all selected stronger passwords. Users who had their accounts compromised did not choose significantly stronger passwords after a manual reset. Nor did those who enrolled with a form that showed a “graphical indicator of password strength,” as compared to those who enrolled with a form without password guidance or a minimum length requirement.

In the Google study, nearly 20% of English-speaking users’ answers to “favorite food” were guessable with a single try. Naturally, the answer distributions themselves are not included in the literature, but I’m willing to bet “pizza” was a pretty successful attempt for that question. The smaller the answer space, the less secure the security question; in fact, roughly 40% of questions used in practice have “trivially small” answer spaces, meaning a limited number of potential answers. Even more problematic are strategies like United’s new system, where the user must choose from a discrete list of options. Outside of those cases, much of the information requested is widely available through social media profiles. For example, it is easy to imagine discovering someone’s mother’s maiden name by scrolling through their Facebook friends. Even when we consider untargeted guessing on a large scale, adversaries can often achieve a high rate of success if the true distribution is readily available. Responses to “best friend’s name” and “first teacher’s name,” for example, occur with frequencies similar to those of first names and surnames in the population. Furthermore, some questions are less secure in a particular cultural context. The authors noted that they were able to correctly answer “place of birth” for 12% of Korean-speaking users within one try (as compared to 1.3% of English-speaking users) and almost 90% within 1000 tries (as compared to about 60% for English-speaking users). They attributed this phenomenon to the fact that the Korean population is highly concentrated in cities. The vendors performing the authentication might not take these factors into account, but persistent attackers certainly will.
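To make the untargeted-guessing scenario concrete, here is a small Python sketch of how one might estimate an attacker's success within a fixed number of tries from a sample of answers. The sample list and the function name `success_within` are invented for illustration and have nothing to do with the actual Google data. The estimate also assumes the attacker knows the answer distribution exactly, which is the best case for the attacker.

```python
from collections import Counter

def success_within(answers, guesses_allowed):
    """Fraction of users whose answer is among the most common answers.

    Answers are lowercased and stripped of surrounding whitespace first,
    on the assumption that the recovery form compares them that loosely.
    """
    counts = Counter(a.strip().lower() for a in answers)
    top = counts.most_common(guesses_allowed)
    return sum(n for _, n in top) / len(answers)

# Hypothetical answers to "favorite food" from ten users.
sample_answers = [
    "pizza", "Pizza", "sushi", "pasta", "pizza",
    "tacos", "sushi", "pizza ", "curry", "pasta",
]

for tries in (1, 3, 5):
    rate = success_within(sample_answers, tries)
    print(f"{tries} guess(es) per account: {rate:.0%} of accounts")
```

The same calculation run per language group is the kind of comparison behind figures like “12% within one try” versus “1.3% within one try” above: the more concentrated the answers, the faster the curve climbs.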

In what the authors described as their most surprising observation, even for questions where everyone should have a unique answer, such as “frequent flyer number” and “phone number,” the answer distribution was not uniform. Some 4.2% of users claimed to have the same frequent-flyer number. Now, either the airlines have failed to notice a rather large mistake on their part, or people are lying. Users who admitted to being dishonest gave a number of reasons, most commonly to make their response harder to guess or easier to remember. Unfortunately, untruthful answers achieved neither. The people who altered their responses tended to do so in the same way (e.g. a frequent-flyer number of “123456”), which made their answers less secure, and they also had a harder time remembering them. Only a few days after entering a fake answer, dishonest users had trouble figuring out what false response they might have given.

It is an axiom among security experts that the weakest part of any system is the human component, and human-chosen passwords are no exception. And although some alternatives are favored over security questions, such as SMS and email recovery, it seems that security questions will not be totally removed from authentication for quite some time, so it’s important to understand the limitations of these approaches.

  1. What explains why German- and Korean-speaking users have stronger passwords than others?

  2. I once had a colleague who set up a user. He set their password as the Latin for badger. The user had no chance!

  3. One strategy to improve security is to use password hint questions as passwords themselves.
    As one example – Mother’s Maiden Name: “Last3.14159First”
    This only works if you have a system and can remember it, of course.
    What are your thoughts on Randall Munroe’s password system using multiple random words? (

    • @Samuel I really like the general concept outlined in the xkcd comic, but something I’ve observed has been an ever-so-subtle improvement in the passwords being used in the environments I’ve worked in over the last few years.
      I’ve taken to using passphrases taken as snippets from larger quotes that I like, complete with spaces, caps and punctuation where the system allows it, and I’ve found it wonderfully effective.

      • I have my users use full, correctly written sentences, preferably with a few symbols in there and in our own language (not English). The passphrases are easy to write and remember for anyone who’s finished at least elementary school. Example passphrase one could use (not after I hit submit comment, of course): There are 24 trains at platform 99! (but then not in English). They’re not perfect, but it’s a good length-over-anything policy with phrases people can remember.

        • That’s a great example; the current thinking about passwords needs to change. For starters, the term “password” needs to be dumped and “passphrase” used more.
