Zombie Tokens and the Usability of 2-Factor Authentication

It turns out that ~20% of the time users spend on traditional 2-factor authentication apps is wasted on dead tokens. I call these tokens “Zombie Tokens” and I have a clever solution for them.

Two-factor authentication (2FA) has become very popular because stolen tokens cannot be used to authenticate the user at a later date. So if the password is leaked or easily guessed, the user is still safe.

However, I’ve noticed a serious UX deficiency common to time-based (TOTP) 2FA client applications such as Google Authenticator and Authy: they force users to spend a lot of time on useless tokens.  This has gone unnoticed by implementers because they do not experience the interaction in the way a typical user would.

Authentication servers actually accept late tokens, typically within one “time-step” – which is about 30 seconds.  So the server won’t reject a token even if the user submits the token at the moment that it “expires” locally.  However, users are ignorant of the backend and the vast majority of them will assume that the token expires shortly after it disappears from their screen.
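To make that server-side tolerance concrete, here is a minimal sketch of TOTP verification that also accepts a token from the previous time-step. It follows the standard HOTP/TOTP construction (HMAC-SHA1 with dynamic truncation). The function names, the `late_steps` parameter and the demo secret are illustrative assumptions, not taken from any particular server.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password for a given counter value."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: the low nibble picks the offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based token: HOTP over the number of elapsed time-steps."""
    return hotp(secret, int(unix_time) // step, digits)


def verify(secret: bytes, submitted: str, unix_time: float,
           step: int = 30, digits: int = 6, late_steps: int = 1) -> bool:
    """Accept the current token and, by assumption, up to `late_steps` older ones."""
    current = int(unix_time) // step
    return any(
        hmac.compare_digest(hotp(secret, current - back, digits), submitted)
        for back in range(late_steps + 1)
        if current - back >= 0
    )


secret = b"12345678901234567890"       # demo secret, not a real key
zombie = totp(secret, 59)              # token shown during the step that ends at t=60
accepted_late = verify(secret, zombie, 60)   # one step late: still accepted
rejected_later = verify(secret, zombie, 90)  # two steps late: rejected
```

With `late_steps=1`, a token that has just vanished from the phone's screen still verifies for another full time-step, which is exactly what makes it a Zombie Token rather than a dead one.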

Users also have to type the token in before it disappears from the screen.  People of average intelligence can simply read the 6 or 7 digit code and reproduce it from memory.  However, roughly 50% of the population would be unable to store the token in short-term memory, so they would have to refer back to the application to re-read the token.

Crafting an interface for a 2FA application should thus take into account that users will not submit a token unless there is enough time for them to type it in and submit it before the token expires locally.  Some modeling and time trials (see Appendix A: Timing) suggest that it takes about 5 seconds to reliably type in a security token and about 10 seconds to comfortably type one in.

Since tokens are time-based, a user may get a token that has 30 seconds or 5 seconds remaining before it expires [1].  Since it takes ~5 seconds to type in the token, it would appear that users are (on average) waiting for the next token to appear ~20% of the time.
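The estimate can be checked with a little arithmetic. The sketch below assumes (this is an assumption, not a measurement) that the user opens the app at a uniformly random moment within a 30-second time-step and refuses to start typing when fewer than `entry_seconds` remain. The function names are made up for illustration.

```python
def zombie_fraction(step_seconds: float = 30.0, entry_seconds: float = 5.0) -> float:
    """Share of each time-step in which too little time is left to enter the token."""
    return min(entry_seconds, step_seconds) / step_seconds


def mean_wait_when_stuck(entry_seconds: float = 5.0) -> float:
    """Average wait for the next token, given the app was opened inside that dead zone."""
    return entry_seconds / 2


rushed = zombie_fraction(30, 5)        # 5 / 30, about 17%
comfortable = zombie_fraction(30, 10)  # 10 / 30, about 33%
```

Under these assumptions the 5-second entry time gives one sixth of the time-step, in the neighborhood of the ~20% quoted above, while the more comfortable 10 seconds pushes the share to one third.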

A few seconds might not sound like much, but waiting is irritating.  Think about it: ~20% of time people spend using a 2-factor authentication app is time spent being irritated.  So what can we do? Show the upcoming token! You don’t even need to label it, just use an animation to replace the expired token with the upcoming token:

Screenshot of Authy, a popular 2-factor app with the standard 2-factor UI.

Mockup of change with upcoming token displayed below the current token.

I suggested this to the Authy team and I got a response that is sadly familiar to anyone doing security-focused usability research:

May I ask why you would need to enter in the code so quickly and frequently? We don’t share the next security code until the prior has expired for security reasons. Having both up at the same time would take away from the security that is provided by the codes.

Translation: you are lazy and I’m going to point to a minor security issue to justify not making this change.

Well, it’s not quite that ignorant: I received a similar response from other programmers and security researchers.  The problem is that these are smart people who can take in a token in a single glance and know that the backend servers will accept it even if the client application has “expired” it.  Expert users understand that a Zombie Token will still work while regular users will assume that such tokens are simply dead.

I say that the security issue is minor because taking advantage of the additional token being displayed on a phone’s screen would require an attacker that can view the phone’s screen and enter in the token in real time.  An adversary with the resources to carry out a dedicated attack and view the user can typically gain physical access to the computing device.  The only scenario in which showing the additional token would matter is if the attacker can see the user through a window or was able to plant a camera in the workplace but unable to access the computer directly.

However, it’s fairly trivial to accommodate this threat model while still eliminating Zombie Tokens.  The fix is pretty simple: five seconds before the current token expires, display an obfuscated preview of the upcoming token that the user can manually swipe in:

The upcoming token peeks in on the side 5 seconds before the current token expires.  Users can manually swipe it in and replace the current token.

If the user finishes typing in the current token and closes the app before it expires, the upcoming token won’t be displayed in its entirety and the attacker won’t gain access to an unused token.  If the user would have waited for a new token anyway, they can skip ahead and start typing immediately.  This additional interaction ensures that the two interfaces are equivalent in terms of security (see Appendix B: Threat Modeling for an in-depth explanation).

This fix isn’t quite as usable as simply displaying the next token as users have to figure out that they can swipe the new token into place.  If you are willing to relax the security requirements a bit, you could show the full upcoming token but limit its display to ten seconds before the current token expires.
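The two variants can be sketched as a small display policy. Everything here is a hypothetical illustration: the `Preview` states, the function name and the parameters are assumptions, and only the 5-second and 10-second windows come from the text above.

```python
from enum import Enum


class Preview(Enum):
    HIDDEN = "hidden"          # upcoming token is not shown at all
    OBFUSCATED = "obfuscated"  # upcoming token peeks in, digits not readable
    FULL = "full"              # upcoming token is fully readable


def preview_state(seconds_remaining: float, swiped: bool = False,
                  relaxed: bool = False, strict_window: float = 5.0,
                  relaxed_window: float = 10.0) -> Preview:
    """Decide how much of the upcoming token to show."""
    if relaxed:
        # Relaxed variant: show the full upcoming token near the end of the step.
        return Preview.FULL if seconds_remaining <= relaxed_window else Preview.HIDDEN
    if seconds_remaining > strict_window:
        return Preview.HIDDEN
    # Strict variant: reveal the digits only after an explicit swipe.
    return Preview.FULL if swiped else Preview.OBFUSCATED
```

In the strict variant the upcoming token never becomes readable without a swipe, so a user who finishes with the current token and closes the app exposes nothing extra.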

Usability is security: the more usable a security measure is, the more likely people are to use it.  Two-factor authentication is a great way to beef up the security of traditional login systems.  Sadly, Zombie Tokens make the user experience irritating 20% of the time.  But we can eliminate Zombie Tokens without sacrificing security.  Authy may or may not take my advice, but I’ve done my part in trying to wipe out the Zombie horde.

Appendix A: Timing

I arrived at the 5-10 second time through a combination of timing trials in which I entered in a 6 digit token and modeling user input using GOMS.

GOMS is a framework for calculating the efficiency of user interfaces: you build a model by inputting each task and the framework calculates the total time required to complete the task based on time and motion studies of actual users.  It quite literally calculates the time required to type in text, move a hand to the mouse, click a button on a screen, etc.  The model I used to calculate the 5 second lower bound looks like this:

Look at phone
Store 544438
Look at computer
Hands to keyboard
Type 544438
Hands to mouse
Point to submit button
Click submit button

The above task is estimated to take 5.0 seconds to complete.  It is consistent with a few time trials in which I entered in a 6 digit token. However, the 5 second time felt very rushed and both the above GOMS model and the time trials made a few assumptions:

  • The token is 6 digits long.
  • The text field for entering the token has the keyboard focus.
  • The user can store the entire token in short term memory.
  • The user does not attempt to verify the token’s correctness.

Lengthening the token to seven digits and adding additional mousing and verification tasks to the GOMS model boosts the estimated completion time to 9.4 seconds.  Correcting a typo within the GOMS model resulted in an estimated time of 11.1 seconds.  When balanced with the subjective observation that typing in a 6 digit token in under 5 seconds feels rushed, the average time to comfortably complete the task probably lies somewhere between 5 and 10 seconds.
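For readers who have not used GOMS, the simplest member of the family, the Keystroke-Level Model (KLM), just sums a fixed time per primitive operator. The sketch below uses the commonly cited KLM operator times. Mapping the steps of the model above onto those operators (in particular, treating each "look" and "store" step as a mental operator) is an assumption, and this is not the tool used for the figures above, so its total need not match the 5.0 seconds reported.

```python
# Commonly cited Keystroke-Level Model operator times, in seconds.
KLM_SECONDS = {
    "K": 0.28,  # one keystroke, average non-secretarial typist
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # press or release a mouse button
    "H": 0.40,  # move the hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

# Assumed mapping of the token-entry steps onto KLM operators.
TOKEN_ENTRY_STEPS = [
    ("Look at phone", "M"),
    ("Store token", "M"),
    ("Look at computer", "M"),
    ("Hands to keyboard", "H"),
    ("Type 6 digits", "K" * 6),
    ("Hands to mouse", "H"),
    ("Point to submit button", "P"),
    ("Click submit button", "BB"),  # press + release
]


def estimate_seconds(steps, times=KLM_SECONDS) -> float:
    """Sum the operator times over every step in the model."""
    return sum(times[op] for _, ops in steps for op in ops)


total = estimate_seconds(TOKEN_ENTRY_STEPS)
```

Changing the mapping (for example, dropping a mental operator for a user who reads the token in a single glance) changes the total noticeably, which is one reason to treat any single GOMS number as a rough bound rather than a measurement.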

Appendix B: Threat Modeling

Matching the security parameters of the existing system entails not leaking any tokens that wouldn’t be leaked by the existing UI under an equivalent attack scenario.  We’ll start by defining the threat model and examining what tokens the existing UI would leak.

To take advantage of the preview token, an attacker would need to be able to view a phone’s screen and submit the token in real time.  For the sake of argument, we will assume that the surveillance is limited to a camera that has been planted in the workplace or that the target user is visible through a window.

An attacker with this level of access could steal the current token and submit it before the legitimate user is able to. I simulated this attack by switching Firefox and Chrome into incognito mode, logging into a web service and reaching the 2FA input screen in both browsers, and then submitting the same 2FA token in both browsers in rapid succession.  (It would be fun to see what happens when the IP addresses differ, but my SOCKS proxy failed me.  However, if an attacker can plant a camera in your workplace, they can probably plant a WiFi AP.  The IP address checks are often based on geo-location anyway, so the attacker would probably be okay as long as they are using an IP within the same area.)  I performed this test with GitHub, CloudFlare and Coinbase.  GitHub allowed the login whereas CloudFlare and Coinbase simply reloaded the page.  So for many of these systems, the attacker could enter in the current token and the user wouldn’t be alerted to the fact that their token was hijacked.

The traditional UI would also leak unused tokens.  The user wouldn’t necessarily lock their phone’s screen right after entering in a token on their computer.  Indeed, I would suspect that most users would get distracted with the task at hand and forget about the phone sitting on their desk, broadcasting fresh security tokens to any overhead camera or onlooker.

If a user is presented with a token that will only last for 5-10 seconds, they will probably just wait for the next token.  An attacker could submit the unexpired token while the user waits for a fresh one.  They would have to be quick, but the threat model assumes an attacker performing real-time surveillance.  I’m sure someone could whip up a login script to speed up the process while the target is on lunch break.

To review, the current UI would allow an attacker to steal:

  1. the current token, although some systems might display a security warning;
  2. any tokens that are displayed after the user has entered in the current token unless the user turns off the screen immediately after entering in the token;
  3. any token that the user allows to expire while waiting for a new token.

Preventing a token from leaking under #3 requires preventing the upcoming token from being displayed unless we are positive that the user won’t use the current token.  So if the user is still typing in the current token <10 seconds before it expires, we cannot display the upcoming token.  Instead of simply displaying the upcoming token, we obfuscate it and require user interaction to display it in its entirety.

[1] For some configurations, the Authy app displays a fresh token when the app opens but reduces the valid time to 20 seconds.  This is an improvement, but the user may not be ready to type in the token when they open the app.
