End-To-End Web Crypto: A Broken Security Model

End-to-end encryption of web services is increasingly popular: Mailvelope aims to bolt a PGP client onto webmail and both Yahoo and Google are working to add support directly. However, the fundamental nature of the web and the limits of human cognition make web-based E2E encryption susceptible to MITM attacks.  While still potentially useful, such systems should not be used by high-risk populations such as journalists and human rights workers.

The dynamic nature of the web gives service providers the ability to target individual users with a backdoored version of their web client every time the site is loaded, an attack that Hushmail demonstrated in practice back in 2007.  Mailvelope and similar browser add-ons can move message decryption into iframes or new windows and rely on the same-origin policy to keep the service provider from reading that content.  Unfortunately, as long as a service provider can spoof the UI, they can copy the plaintext of new messages and send an encrypted version to the recipient.
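
As a rough, hypothetical sketch (not Mailvelope’s actual code), a WebExtension content script might swap an armored PGP block in the page for an iframe served from the extension’s own origin, along these lines (the decrypt.html page name is made up):

```typescript
// Hypothetical WebExtension content script (illustrative only; real code
// would use @types/chrome rather than this loose declaration).
declare const chrome: { runtime: { getURL(path: string): string } };

// Replace an encrypted PGP block in the webmail page with an iframe whose
// document is served from the extension's origin. The same-origin policy
// then keeps the surrounding page from reading the decrypted contents.
function replaceWithDecryptFrame(armoredBlock: HTMLElement): void {
  const frame = document.createElement("iframe");

  // e.g. chrome-extension://<id>/decrypt.html -- the page would need to be
  // listed under web_accessible_resources for the site to embed it.
  frame.src = chrome.runtime.getURL("decrypt.html");
  frame.style.width = "100%";
  frame.style.height = "20em";
  frame.style.border = "none";

  // The surrounding page still controls layout, styling, and stacking order
  // around the iframe -- which is what makes the spoofing attacks described
  // below possible.
  armoredBlock.replaceWith(frame);
}
```

Note that the same-origin policy only prevents the page from reading the iframe’s contents; nothing prevents the page from drawing its own elements on top of it.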

Screenshot of Mailvelope UI overlay in Gmail.

Mailvelope attempts to mitigate spoofing attacks through the use of security iconography, or “watermarks” as Mailvelope calls them.  Mailvelope randomly generates a security icon during installation, which is incorporated into Mailvelope’s UI elements.  If the icon is different, users are not supposed to proceed, akin to the site-authentication images used in some bank logins.  However, security icons cannot effectively mitigate a UI spoofing attack, because security icons simply do not work.

Mailvelope watermark overlay of decrypted message.
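
For concreteness, here is a rough sketch of how such a watermark might be generated and stored; this is a hypothetical illustration of the general idea, not Mailvelope’s actual implementation:

```typescript
// Hypothetical sketch of the "watermark" idea (not Mailvelope's actual code).
declare const chrome: {
  storage: {
    local: {
      get(key: string): Promise<Record<string, unknown>>;
      set(items: Record<string, unknown>): Promise<void>;
    };
  };
};

// Generate a random token once, at install time, and reuse it to render a
// per-user pattern into every extension-controlled UI element.
async function ensureWatermark(): Promise<string> {
  const stored = await chrome.storage.local.get("watermark");
  if (typeof stored.watermark === "string") {
    return stored.watermark;
  }
  const bytes = crypto.getRandomValues(new Uint8Array(16));
  const watermark = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  await chrome.storage.local.set({ watermark });
  return watermark;
}
```

Generating the watermark is the easy part; the argument below is that users cannot be relied upon to notice when it is absent.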

Researchers have been testing the efficacy of security iconography for over a decade, and the results are dismal.  The most dramatic “experiment” was performed by Moxie Marlinspike in 2009.  Marlinspike removed encryption from connections using a malicious Tor exit node, which also removed the browser encryption icons.  Despite drawing his sample from a population with above-average technical acumen and paranoia, he achieved a 100% “success” rate: every user who visited a login page logged into their account.  Marlinspike collected over 400 logins and 16 credit card numbers in 24 hours.

Of course, the encryption icons for browsers are smaller and somewhat different from what Mailvelope uses.  The closest thing to Mailvelope’s “watermark” is the personalized site-authentication image displayed by many banks during the login process.  In The Emperor’s New Security Indicators, researchers asked users to log in to their bank and surreptitiously removed the site-authentication images; 23 of the 25 participants sent their login information anyway – a 92% failure rate.  Some will question the validity of this statistic due to the small sample size; however, the results are in line with a decade of research, and the lab setting, if anything, boosts user awareness.

Increasing the size and prominence of the security indicator will not decrease the failure rate to acceptable levels.  One study devoted the entire browser skin to conveying encryption information and saw only modest improvements in user behavior.  The takeaway is that most users don’t understand what the indicator is telling them and that the software, not the user, must determine whether something is safe.

Screenshots of the browser skins used in one research study. A was displayed for an EV SSL certificate, B for traditional SSL certificates, and C for unencrypted connections.

The fundamental issue is that human cognition has limits: we cannot process unlimited amounts of information.  The assumptions made by the security model underpinning security iconography ignore a decade of behavioral studies and run counter to 50 years of cognitive-psychology research.  Just try to accurately count the number of times a player passes a basketball in the following video:

The 50% failure rate for the above video is artificially high, as the laboratory environment heightens user awareness.  The task is also very different from that of checking a security icon.  The tricks employed by pickpockets are a better real-world analogy for spoofing a security icon.  Watch as Apollo Robbins carefully manages the mark’s “cognitive spotlight” using misdirection to control the information that the mark is consciously aware of:

At 3:10 you can see how Robbins applies pressure to the wrist near the watch clasp and then draws the mark’s attention elsewhere.  The initial stimulus is registered, deemed irrelevant, and then Robbins uses misdirection to remove it from the cognitive spotlight.  The stimulus is still present, but the nervous system and the brain filter the incoming signals down to the relevant stimuli.

A user composing messages is in an even more precarious situation, as habituation conditions us to preemptively ignore information.  Unless something is integral to the task itself, we will filter it out of our cognition.  Even if the user is asked to confirm that the icon is valid, the check will become habituated and they will complete it automatically1.  While a pickpocket must create new stimuli to control the cognitive spotlight, habituation suppresses the stimulus before it ever reaches the cognitive spotlight.

I don’t want to get too deep into the neurological details, but the inevitability of filtering out stimuli that are irrelevant to the task workflow is fairly obvious if you think about the amount of information your body processes: temperature and pressure from your skin, taste and smell from your nose and tongue, your complete field of vision, and all of the sounds present in your local environment. There are even filters for information that has been processed abstractly, which is why you can pick up on someone saying your name at a cocktail party but ignore ambient conversations once you start talking with your friend. Without these filters, we would be overwhelmed with irrelevant information.

The inability to effectively mitigate user interface spoofing attacks cripples the usability of these bolted-on E2E interfaces. They must lift the new-message and reply UI elements out of the page entirely and into separate windows. They must also create a distinct contact manager to handle public keys. The only thing left is detecting encrypted messages, which Mailvelope decrypts and displays in an iframe. I’m not sure that even this is safe, since the service provider could display a “Reply Securely” button over the decrypted message.
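
To make the concern concrete, here is a hypothetical sketch of what a malicious page script could do: absolutely position its own “Reply Securely” button over the add-on’s decryption iframe (the openProviderComposeUI hook is made up):

```typescript
// Hypothetical page script under the service provider's control (illustrative
// only). It paints a fake "Reply Securely" button over the extension's
// decryption iframe; clicking it routes the user into the provider's own
// compose UI, which sees the plaintext.
function overlayFakeReplyButton(decryptFrame: HTMLIFrameElement): void {
  const rect = decryptFrame.getBoundingClientRect();

  const fake = document.createElement("button");
  fake.textContent = "Reply Securely";
  fake.style.position = "absolute";
  fake.style.left = `${rect.left + window.scrollX + 16}px`;
  fake.style.top = `${rect.bottom + window.scrollY - 48}px`;
  fake.style.zIndex = "2147483647"; // stack above everything else on the page

  fake.addEventListener("click", () => {
    // Hypothetical hook into the provider's (backdoored) compose flow.
    openProviderComposeUI();
  });

  document.body.appendChild(fake);
}

// Placeholder for the provider's own compose flow.
declare function openProviderComposeUI(): void;
```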

A website is a very hostile environment to be operating in. The URL bar is a remote function call interface which retrieves a Turing-complete programming environment in the form of a website.  The service provider can target individual users, deliver new exploits at any time, and has total control over the messaging system.  I’m just not sure that we should be relying on the same-origin security policy of browsers to protect our encrypted communications.

On the web, the best we can do is ensure a secure connection and valid DNS information; trust in the service provider should be assumed.  With traditional software systems, we can use reproducible build systems to distribute trust and security audits to increase the cost of backdooring software. But without a clear separation between the messaging system and the software used to retrieve messages, we cannot build usable messaging systems that deploy end-to-end encryption.  Any user interface that is secure against UI spoofing will only be a step above manually copying and pasting in the ciphertext.
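
As a toy illustration of what “distributing trust” through reproducible builds means: independent parties build the same source, and a release is only accepted if every artifact digest matches. The file paths below are made up:

```typescript
// Toy sketch: trust a release only if independently produced builds of the
// same source yield byte-identical artifacts (illustrative, Node.js runtime).
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

function sha256(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

function buildIsReproducible(artifactPaths: string[]): boolean {
  const digests = artifactPaths.map(sha256);
  return digests.every((digest) => digest === digests[0]);
}

// e.g. buildIsReproducible(["builder-a/app.tar.gz", "builder-b/app.tar.gz"])
```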

Mailvelope and service-provider-based end-to-end encryption are still potentially useful. They raise the cost of an attack, force service providers to actively participate in serving backdoored versions of their sites, and may add additional legal hurdles.  It *may* be possible to bolt a usable PGP client onto a website in a way that can defend against malicious service providers.

But one of the many lessons Snowden has taught us is that the only thing worse than bad security is the illusion of good security. Such solutions should not be used by high-risk groups until they can prove that they can reliably defend against malicious service providers.  Until then, vendors of such software have a moral duty to try and prevent users from high risk groups from using their software.

Update: Some comments on my blog assert that the security situation is very similar to what can be accomplished through ordinary software updates.  I’m aware of this, and I have a few thoughts.

First of all, journalists and human rights workers really should be using TAILS – which is at least publicly auditable and in a position to refuse to comply with US court orders.  Furthermore, I believe that we should treat development of software for these users as if lives are on the line.  In that regard, it is at least possible to make attacks against the operating system and applications more expensive, which isn’t true of the attacks against E2E web crypto.

For example, we can create an operating system that uses reproducible builds for everything.  We can also define a software subset that is required for journalists or human rights workers (an email client, a chat client, a basic document editor, and a web browser) and use security audits, sandboxing, and (eventually) formal verification to drastically increase the cost of an attack.

I’m also concerned about projects like okTurtles, which want to make it easy to build a Mailvelope-like E2E client for any software platform, such as Facebook and Twitter.  okTurtles claims to be “MITM-proof” and mentions journalists and human rights workers in its marketing.  I’m afraid that someone will take them at their word and build an okTurtles front end for VKontakte or Weibo.  I hate the NSA and the US has a shitty human rights record, but this would make it easy for Russia and China to precisely target users.


  1. Browser add-ons cannot fall back to blocking the user’s task flow, as they depend on the service provider delivering accurate hooks into its interface.  For example, no one would notice a silent forward from mail.google.com to www.mail.google.com (see the sketch below).
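
To illustrate the footnote with a simplified, hypothetical example: a content script registered only for mail.google.com never runs after a silent forward to www.mail.google.com, so the add-on’s UI vanishes without any warning:

```typescript
// Hypothetical, simplified illustration of why host-based hooks are fragile.
// An extension whose content script is registered only for
// "https://mail.google.com/*" never runs on "www.mail.google.com".
const registeredMatches = ["https://mail.google.com/*"];

function contentScriptWouldRun(url: string): boolean {
  const { protocol, hostname } = new URL(url);
  return registeredMatches.some((pattern) => {
    const [scheme, rest] = pattern.split("://");
    const host = rest.split("/")[0];
    // Simplified match-pattern semantics: exact scheme, exact host, any path.
    return protocol === `${scheme}:` && hostname === host;
  });
}

contentScriptWouldRun("https://mail.google.com/mail/u/0/");     // true
contentScriptWouldRun("https://www.mail.google.com/mail/u/0/"); // false: UI silently gone
```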

5 Responses to “End-To-End Web Crypto: A Broken Security Model”

  1. Clilve April 7, 2015 at 8:30 am #

    Interesting analysis. Just lifting a couple of the many great points you make, we get the following:-

    “However, the fundamental nature of the web and the limits of human cognition make web-based E2E encryption susceptible to MITM attacks.”

    and

    “On the web, the best we can do is ensure a secure connection and valid DNS information; trust in the service provider should be assumed.”

    and

    “But one of the many lessons Snowden has taught us is that the only thing worse than bad security is the illusion of good security. Such solutions should not be used by high-risk groups until they can prove that they can reliably defend against malicious service providers. Until then, vendors of such software have a moral duty to try and prevent users from high risk groups from using their software.”

    Another way of looking at your analysis and drawing conclusions would lead me to suggest that:-

    1. Any attempt at end-to-end security via the web is inherently flawed.
    2. Worse than a flawed web is the suggestion of a secure web when we know it to be flawed.
    3. It doesn’t matter what we try and introduce into *this* paradigm, what we have today is inherently broken at the *paradigm* level. To be secure, we need to develop a new paradigm, to go back to basics.

    As the Google/Mozilla/CNNIC story shows, the “chain of trust” with CAs is inherently broken – or, at minimum, vulnerable to abuse.

    So whilst taking all your analysis and building on it a little bit, perhaps another approach would be to cast an even more critical eye over the current web/HTTP model and ask if this meets our current requirements? [ Spoiler alert: no, it doesn’t ].

    What’s missing? Well, a few things…

    1. End-to-End Risk Assessment
    When you connect your browser to a remote web site and get the “padlock”, all you really know is that there is a certificate in use that has been signed by a CA on your browser’s well-known-CA list. What you *don’t* get includes:-

    1.1. Confidence in the DNS service that brought you to this host
    1.2. Any assessment of the local security that the host provides
    1.3. Any form of comparative analysis of the access path – did any new or different hops get introduced since your last visit?
    1.4. Real-time side-channel analysis – is your local endpoint reacting in response to the remote endpoint you’re in dialogue with?

    [ I’m certain that some of the above may be irrelevant and that other, better checks are missing. But you get the point].

    The issue here is that we’re discussing a redesign of the house, but our problem is with the foundations. Unless/until we’re willing to dig those flawed foundations up and re-build them, our house is vulnerable. This is likely not just a requirement for new protocols, but entirely new models.

    Bad example: if we can’t trust a CA to retain integrity, then should we trust the CA (“root of trust”) model at all? If our root is compromised, we’re blown. Now the GPG approach to this is the idea of self-signed and then individually-trusted keys. For this to work and for you and I to exchange data securely, we’d have to swap [signed] certificates. Then we’d check out each other’s certificates and look at all the people who had signed them. A bit of software would then try and “map back” to see if I trust A, who trusts B, who trusts C, who trusts D, who trusts you…. In this model, each connection would establish a real-time security profile, with a more nuanced view. Upside is that it’s likely pretty elegant; downside is that unless we can make it trivially simple it causes as many problems as it solves. Upshot is: this will be hard.

    But my point stands.

    Your article correctly points out that the underlying principles of the web are flawed. The solutions currently on the table appear to be bolt-on additions to the existing problem. To quote Einstein,

    “We can’t solve problems by using the same kind of thinking we used when we created them.”

  2. Stuart Gathman April 7, 2015 at 12:19 pm #

    This article is basically making the distinction between “method failure” and “usage failure” – ala birth control.

  3. Chris Palmer April 7, 2015 at 1:19 pm #

    Thank you for this interesting article! Some thoughts.

    “””The dynamic nature of the web gives service providers the ability to target individual users with a backdoored version of their web client every time the site is loaded”””

    This is true — the user must trust the web application and its author(s) and operator(s).

    However, the same is true for non-web applications: We must trust Ubuntu and GnuPG just as fully. You might imagine that we can get Ubuntu and GnuPG just once, and hence trust them just once, but it is still full trust. And, as attacks against a given set of software packages become known, better known, and cheaper, we must get updates. Updates work best when automatic. Is not an auto-updating Ubuntu essentially as fluid as a web application?

    “””The 50% failure rate for the above video is artificially high, as the laboratory environment heightens user awareness.”””

    Do you mean “the 50% success rate”? I.e. outside a laboratory environment, the success rate would be even lower.

    “””A website is a very hostile environment to be operating in.”””

    Do you mean “a web browser”?

    “””On the web, the best we can do is ensure a secure connection and valid DNS information; trust in the service provider should be assumed. With traditional software systems, we can use reproducible build systems to distribute trust and security audits to increase the cost of backdooring software.”””

    I would argue that key pinning and/or Certificate Transparency provide an audit capability similar to binary transparency and reproducible builds. For both web apps and native apps, the user must still fully trust the application provider; but at least they can have some assurance they have the provider’s intended blob of code.

    (Also, I consider reproducible builds to be nice but not mandatory; binary software distribution is an optimization that might not be necessary. What if we had source code transparency (based, e.g., on GPG-signed Git) and a fast compiler and build system? The GNU toolchain is unnecessarily slow; C compilation can be quite fast. See e.g. http://bellard.org/tcc/.)

    “””It *may* be possible to bolt a usable PGP client onto a website in a way that can defend against malicious service providers.”””

    I definitely believe that if the user is willing to trust the web application provider, the web app provider can provide a meaningful PGP client. Did you mean that it may be possible to integrate PGP functionality into the browser? In such a case, the user’s agent could encrypt and decrypt in an environment (the browser process) separate from the web content (which runs in isolated renderer processes in e.g. Chrome). Then a user could choose to trust their user agent, without needing to trust a particular web application.

    (Also, FWIW, your site has mixed scripting content. You should fix that. :) )

  4. Ignacio Agulló April 9, 2015 at 5:03 am #

    I sent a comment two days ago, but it wasn’t published. Censorship?

    • indolering April 9, 2015 at 10:28 pm #

      Your comment didn’t make much sense. The specific attack I’m talking about doesn’t rely on checking SSL indicators, but on a similar mechanism. However, even if it did require checking SSL indicators, I explain in the article that users do not check those indicators.
