Abstract

A probabilistic password model assigns a probability value to each string. Such models are useful for research into what makes users choose more (or less) secure passwords, for password strength meters, and for password cracking. Guess-number graphs are a widely used evaluation method in password research. In this paper, we show that probability-threshold graphs have important advantages over guess-number graphs: they are much faster to compute and, at the same time, provide information beyond what is feasible with guess-number graphs, potentially leading to new findings. Curves in probability-threshold graphs correspond naturally to a numerical metric, the Average Negative Log-Likelihood (ANLL), which is closely related to other widely used metrics in statistics and language modeling. Using this improved methodology, we conduct a systematic evaluation of a large number of probabilistic password models, including Markov models with different normalization and smoothing methods, and find that, among other things, Markov models, when done correctly, significantly outperform the Probabilistic Context-Free Grammar (PCFG) model proposed by Weir et al., which has been considered the state of the art.
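To make the ANLL metric concrete, the sketch below computes it for a password sample under a toy model. This is only an illustration of the metric's definition, not the paper's implementation: the order-0 (unigram) character model, its end-of-string marker, and the crude handling of unseen characters are all assumptions made here for brevity; the paper evaluates higher-order Markov models with proper normalization and smoothing.

```python
import math
from collections import Counter

def anll(passwords, prob):
    """Average Negative Log-Likelihood, in bits, of a password sample
    under a probabilistic model `prob` (a function string -> probability).
    Lower is better: the model assigns higher probability to real passwords."""
    return -sum(math.log2(prob(pw)) for pw in passwords) / len(passwords)

def train_unigram(corpus):
    """Hypothetical order-0 character-level Markov model: each character
    is drawn independently; "\0" serves as an end-of-string marker so the
    probabilities over all strings sum to 1."""
    counts = Counter(ch for pw in corpus for ch in pw + "\0")
    total = sum(counts.values())
    def prob(pw):
        p = 1.0
        for ch in pw + "\0":
            p *= counts.get(ch, 0.5) / total  # crude fallback for unseen chars
        return p
    return prob

model = train_unigram(["password", "123456", "letmein"])
print(anll(["password", "abc123"], model))
```

Note that ANLL needs only the model's probability for each test password, whereas a guess number requires (conceptually) ranking the password among all strings the model can generate; this is why probability-threshold curves and ANLL are much cheaper to compute.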