This run_test helper sets up a convention that allows
us to give really short tracebacks for errors, and
eventually we can have more control over running
individual tests. (The latter goal has some
complications, since we often intentionally leak
setup in tests.)
The original "quality score" was invented purely for populating
our password-strength progress bar, and isn't expressed in terms
that are particularly meaningful. For configuration and the core
accept/reject logic, it's better to use units that are readily
understood. Switch to those.
I considered using "bits of entropy", defined loosely as the log
of this number, but both the zxcvbn paper and the linked CACM
article (which I recommend!) are written in terms of the number
of guesses. And reading (most of) those two papers made me
less happy about referring to "entropy" in our terminology.
I already knew that notion was a little fuzzy if looked at
too closely, and I gained a better appreciation of how it's
contributed to confusion in discussing password policies and
to adoption of perverse policies that favor "Password1!" over
"derived unusual ravioli raft". So, "guesses" it is.
And although the log is handy for some analysis purposes
(certainly for a graph like those in the zxcvbn paper), it adds
a layer of abstraction, and I think makes it harder to think
clearly about attacks, especially in the online setting. So
just use the actual number, and if someone wants to set a
gigantic value, they will have the pleasure of seeing just
how many digits are involved.
(Thanks to @YJDave for a prototype that the code changes in this
commit are based on.)