Categories
Security

Gravatar Advisory: How to Protect Your Email Address and Identity

Update: We’ve added comments at the end of the post pointing out that the National Institute of Standards and Technology (NIST) considers an email address to be personally identifiable information or PII.

Gravatar is a service that provides users with a profile image that can appear on many sites across the Net. It is integrated with WordPress.com (the version of WordPress hosted by Automattic) and with WordPress.org, the self-hosted version of WordPress. Many other popular services on the web, such as StackOverflow.com, also use Gravatar.

If you sign up for a website on WordPress.com and publish a blog post, a Gravatar icon appears on your site as your profile photo, indicated by the red arrow below. You can visit gravatar.com to customize that icon and upload a photo of your own.

[Screenshot: a WordPress.com post with the Gravatar profile icon indicated by a red arrow]

If you use WordPress.org, Gravatars are an option you can enable for your users, and they are widely used. WordPress will show a user's profile photo if they have gone to Gravatar.com to create one, or a default image if they have not. You can select from several kinds of default images.

[Screenshot: WordPress avatar options]

Other services like StackOverflow, one of the most popular sites on the web, also use Gravatar for profile images.

In the HTML source code of your website, Gravatar loads images using a hash of your email address. If you read our post earlier this week where we discuss the problem of malware scanners using weak hashing algorithms, you will have a basic understanding of how a hashing algorithm works. In short, a hash algorithm turns some value into a long number and in theory it is difficult to turn that number back into the original value.

Even if you haven’t signed up for a custom profile image at Gravatar.com, a hash of your email address still appears in the source code of any website that integrates this service.

You can see in the screenshot below how Gravatar loads your profile image using a hash of your email address:

[Screenshot: Gravatar in HTML source]

The value that appears after /avatar/ above is: fe967ccdc7b3caa33e0480bb95ae6588
That value is a hexadecimal number: the MD5 hash of the email address that I used to create a WordPress.com website. The email I used is gravhashtest@wordfence.com.

I can run a PHP instruction to verify that. If I run the following PHP code, it produces the above hash:


<?php

echo md5('gravhashtest@wordfence.com');

This prints the value: fe967ccdc7b3caa33e0480bb95ae6588
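
The same calculation can be sketched in Python. This is a minimal example, assuming Gravatar's documented convention of trimming whitespace and lower-casing the address before hashing; the function names and the default size are choices made for this illustration only.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return the MD5 hex digest of the trimmed, lower-cased address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str, size: int = 80) -> str:
    # The /avatar/<hash> path is the part that is visible in page source.
    return f"https://www.gravatar.com/avatar/{gravatar_hash(email)}?s={size}"


print(gravatar_url("gravhashtest@wordfence.com"))
```

For the test address, the hash in the printed URL should match the value shown in the PHP example above. Anyone who knows or guesses an address can compute the same value, which is what makes the attacks described next possible.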

Using Gravatar and GPU cracking to steal email addresses

If I want to steal a lot of email addresses, I need to turn those hashes back into email addresses somehow. If I can figure out a way to do that, I can crawl WordPress.com, all the self-hosted WordPress.org websites and a lot of other services like StackOverflow and harvest a huge number of email addresses for spamming. I may also be able to reveal the email addresses of people who want to remain anonymous.

It turns out that someone already thought of this. In 2009 a researcher proved that he could reverse engineer about 10% of Gravatar hashes into email addresses.

Then in 2013 Dominique Bongard presented a talk at PasswordsCon in Las Vegas where he demonstrated that he could reverse engineer 45% of Gravatar hashes into email addresses. He targeted a well known political forum in France which uses Gravatar for user profile pictures.

The big difference in Dominique’s approach is that he used Hashcat, a password cracking tool, and repurposed it to reverse engineer Gravatar hashes into email addresses. This matters because Hashcat runs on consumer graphics processing units, or GPUs, the same hardware gamers use to accelerate game graphics. Cracking hashes with GPU acceleration increases performance by a factor of several thousand.
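
To make the idea concrete, here is a toy dictionary attack in Python. It is not Hashcat and it runs on the CPU; it only illustrates the principle of hashing candidate addresses and comparing them with harvested hashes. The guess lists and function names are hypothetical.

```python
import hashlib
from itertools import product


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crack(target_hashes, local_parts, domains):
    """Return {hash: email} for every candidate that matches a target hash."""
    targets = set(target_hashes)
    found = {}
    for local, domain in product(local_parts, domains):
        candidate = f"{local}@{domain}"
        digest = md5_hex(candidate)
        if digest in targets:
            found[digest] = candidate
    return found


# Hypothetical harvested hash and guess lists, for illustration only.
harvested = [md5_hex("gravhashtest@wordfence.com")]
names = ["admin", "info", "gravhashtest", "mark"]
domains = ["gmail.com", "wordfence.com", "example.com"]
print(crack(harvested, names, domains))
```

A real attack uses far larger lists of names and domains. The comparison itself stays the same, and a GPU simply performs many more of these guesses per second.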

At Wordfence we have done a significant amount of experimentation with GPUs and hash cracking, and we even provide a commercial service as part of Wordfence Premium that uses a GPU cluster to perform a password audit on your WordPress website. We launched this service over a year ago. The photo below shows the password cracking cluster we designed for this service. Those are liquid-cooled chrome GPU pipes in the photo. They look even better in real life.

[Photo: GPUs for password auditing]

When Dominique gave his talk in 2013 on using Hashcat to turn Gravatar profile hashes back into email addresses, the recently released Nvidia GeForce GTX Titan GPU provided 5045 gigaflops of processing power.

In May of this year Nvidia launched the GeForce GTX 1080, which provides 8873 gigaflops of processing power. In just three years the processing power available from a single card has grown by roughly 76%.

When you consider that three years ago a single researcher reverse engineered 45% of Gravatar hashes into email addresses, it’s quite possible that a criminal group armed with a modern GPU cluster, as shown above, could reverse engineer a far higher percentage today. The problem will only get worse.

Email hashes may expose your identity across the Web

The use of email address hashes has a further problem. If you view the source of a website using Gravatar profile photos, extract the hash and then Google that hash in quotes, you can find other websites and services that are used by the individual you are researching.

For example: A user may be comfortable having their full name and profile photo appear on a website about skiing. But they may not want their name or identity exposed to the public on a website specializing in a medical condition. Someone researching this individual could extract their Gravatar hash from the skiing website along with their full name. They could then Google the hash and determine that the individual suffers from a medical condition they wanted to keep private.

To demonstrate this issue, we have created the form below, which you can use to do a Google search of the MD5 hash of your own email address. We don’t log anything. This simply uses pure JavaScript to open a new window or tab with a Google search of the hash of your email in quotes. Enter your email in the text field below and click the link to do the search. You should note that Google doesn’t index all Gravatar hashes because they appear in page source. But you may find a few interesting results that help illustrate the problem.

[Interactive form: an email field and a link labeled “Click to Google an MD5 of your email”]

The above can be used to Google an MD5 hash of anything. Try entering in your domain name or common passwords (not passwords you actually use). Let us know what you find in the comments.
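
The same lookup can be reproduced offline. The sketch below builds the search URL in Python; it is an approximation of what the form does, not its actual source, and the function name is made up for this example.

```python
import hashlib
from urllib.parse import urlencode


def google_md5_search_url(value: str) -> str:
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    # Wrapping the hash in double quotes asks Google for an exact match.
    return "https://www.google.com/search?" + urlencode({"q": f'"{digest}"'})


print(google_md5_search_url("gravhashtest@wordfence.com"))
```

Paste the printed URL into a browser to run the search. Nothing is sent anywhere until you do.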

What to do to protect your email address and identity

To solve the identity and spam problem that Gravatar presents, the most effective option is to use a unique email address to register on each website you are a member of. The email address should be hard to reverse engineer.

If you use an @gmail.com address, Gmail provides a feature whereby you can append a plus sign to your email address and anything after it is ignored. If your email address is yourname@gmail.com, you can change it to yourname+junkGoesHere@gmail.com and you will still receive the email.

What we suggest is that you register with a unique Gmail address on any Gravatar-enabled website. For example, yourname@gmail.com would become yourname+2h4J1q9ZuU9@gmail.com. Gmail has documented this feature. The feature also works with hosted Gmail addresses where you use your own domain. Outlook.com also provides this feature.

Using this technique makes it much harder for a spammer to reverse engineer your email address from a Gravatar hash. Try to make your email address at least 20 characters long and include upper and lower-case letters and numbers in the suffix after the plus sign. If you have uploaded a custom Gravatar profile image, you should note that this has the side effect of not displaying that image on the websites where you make this change. Instead you will get a default profile image.
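
If you would rather not invent suffixes by hand, a few lines of Python can generate them. This is a minimal sketch: the function name and the 11-character default (the length of the example suffix above) are choices made for this illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def plus_address(email: str, suffix_length: int = 11) -> str:
    """Insert a random +suffix before the @ of an address."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        raise ValueError("expected an address of the form name@domain")
    # secrets draws from the operating system's secure random source.
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    return f"{local}+{suffix}@{domain}"


print(plus_address("yourname@gmail.com"))
```

Each call produces a different address, so keep a record of which address you used on which site.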

Receiving extra spam is an inconvenience. It can be a minor inconvenience if you have an excellent spam filter in place. However, having your identity exposed on a website where you assumed your identity was private can be embarrassing at best and have far worse consequences at worst. We therefore suggest that you switch to using a plus-suffix on any website where it is important to maintain your personal privacy.

What should Gravatar do?

This presents a significant challenge for a service that is as widely used as Gravatar. They can’t simply upgrade their own systems. Web applications that have integrated Gravatar rely on the fact that they can request an image with an MD5 hash of a user email address and get a profile photo in return. These applications all need to be updated too, and there are thousands, quite possibly tens of thousands, of them.

Even if Gravatar switch to SHA-2 or a longer and stronger hashing algorithm, they are still vulnerable to GPU accelerated email cracking attacks. The identity problem will also still exist.

They could consider switching to a more computationally intensive hashing algorithm like bcrypt. That would provide significant resistance to reverse engineering. But it comes with the obvious cost that it is computationally intensive. Gravatar need to generate a lot of hashes to provide the service they do. Developers who integrate Gravatar into their products also need to generate hashes from email addresses. Both will suffer from increased resource usage if they start using bcrypt. It also doesn’t solve the identity problem.
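
The trade-off can be measured directly. bcrypt is not part of Python's standard library, so this sketch uses PBKDF2 from hashlib as a stand-in for a deliberately slow hash. The salt and iteration count are hypothetical values chosen for the demonstration.

```python
import hashlib
import time

EMAIL = b"gravhashtest@wordfence.com"
SALT = b"fixed-demo-salt"   # hypothetical; a fixed salt keeps the output repeatable
ITERATIONS = 200_000        # hypothetical work factor


def timed(fn):
    """Return how many seconds a single call to fn takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


fast = timed(lambda: hashlib.md5(EMAIL).hexdigest())
slow = timed(lambda: hashlib.pbkdf2_hmac("sha256", EMAIL, SALT, ITERATIONS))

print(f"MD5:    {fast:.6f} s per hash")
print(f"PBKDF2: {slow:.6f} s per hash")
```

The slow hash makes every attacker guess more expensive, but every site that displays an avatar pays the same cost.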

There are other options available like using a shared secret between developers and the Gravatar servers to generate hashes. These come with their own implementation challenges and performance implications. This option may solve the identity issue because it could generate unique hashes across websites that are also hard to reverse engineer.
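
One way such a scheme might look is a keyed hash (HMAC) with a different secret per website. This is a sketch of the general idea only, not something Gravatar offers; the secrets and the function name are hypothetical.

```python
import hashlib
import hmac


def site_scoped_id(site_secret: bytes, email: str) -> str:
    """Derive an identifier that depends on both the site secret and the address."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(site_secret, normalized, hashlib.sha256).hexdigest()


ski_site = site_scoped_id(b"secret-for-ski-forum", "yourname@gmail.com")
medical_site = site_scoped_id(b"secret-for-medical-forum", "yourname@gmail.com")
print(ski_site)
print(medical_site)
```

The two identifiers differ, so a Google search for one will not lead to the other site. An attacker who does not know a site's secret also cannot test guessed addresses against that site's identifiers.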

A final option is to switch to locally hosted images and move away from hashes or global unique identifiers of any kind. This will introduce more complexity for developers who want to integrate Gravatar into websites, but has the benefit of doing a better job of protecting user privacy and avoids disclosing email addresses.

Further comments on privacy

This is a complex problem and there is unfortunately not an easy fix for Gravatar. In my opinion, the most important issue here is the potential exposure of user identities. I think the medical example that I provided above illustrates how much damage can be done if a user identity is exposed under certain conditions.

That is why the privacy implications of this problem cause the most concern. If you aren’t particularly technical you may simply trust a website owner who says that your full name and personal information won’t be exposed. With the current way Gravatar works, you run the risk of having that information exposed.

As always I welcome your comments below and will respond as time permits.

Update: After publication, one of our senior staff pointed out that the National Institute of Standards and Technology (NIST) considers an email address to be PII, or personally identifiable information. Please see the NIST publication 800-122 “Guide to Protecting the Confidentiality of Personally Identifiable Information (PII)”. PII has a legal meaning in many jurisdictions and is used in the definition of privacy law.

The post Gravatar Advisory: How to Protect Your Email Address and Identity appeared first on Wordfence.

Avoid Malware Scanners That Use Insecure Hashing

In this post I’m going to discuss a major problem that exists with several WordPress malware scanners: the use of weak hashing algorithms to identify good and bad files. Some malware and antivirus scanners outside of WordPress suffer from this same issue.

For brevity, I’m going to refer to this as the “weak hash scanner” issue.

This issue may allow an attacker to hide malware so that it is undetectable to scanners using the MD5 hashing algorithm. Below I will explain how hashes are used in the security industry, what the problem is and how to solve it. I’ll also point you to research demonstrating this issue and to further reading, and describe how Wordfence uses a secure hashing algorithm for our malware scanner.

How we use hashes in the security industry to find bad things

In the security world we have a commonly used process of running a file through a piece of logic, called an algorithm, and generating a unique number. That number is used to uniquely identify files. The logic is called a hashing algorithm and the unique number is called a ‘hash’.

We use hashing algorithms for all kinds of really cool and useful stuff. We can take a piece of malware, create a hash for it and then store that hash. Later, we can create a hash of a file we’re scanning to check if it contains malware. If that hash matches the hash of the malware we created earlier, then we know the file is that malware.

We can also use hashes to identify “known good” files. At Wordfence we have created hashes of every file we know is safe in the WordPress universe: every file in every version of WordPress core, and in every version of every theme and plugin ever released.

Right now Wordfence tracks hashes for:

  • 205,146 WordPress core files that Wordfence knows are safe. 

  • 5,967,361 WordPress theme files that Wordfence knows are safe.

  • 23,527,261 (yes, that’s 23 million) WordPress plugin files that Wordfence knows are safe. This is every version of every file in every plugin ever released.

Hashes are a way for security companies like us to store a small piece of data that uniquely identifies known bad or good files, and then use that data to check if those files exist on a system we’re scanning. Then we can make a decision about whether to preserve the file or get rid of it.
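
In code, the decision looks roughly like this. The sketch is a simplified illustration and not Wordfence's scanner: the two sets stand in for real signature databases, the file contents are made up, and SHA-256 (one of the SHA-2 functions) is used as the hash.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def classify(data: bytes, known_good: set, known_bad: set) -> str:
    """Look the file's hash up in the bad list first, then the good list."""
    digest = sha256_hex(data)
    if digest in known_bad:
        return "malware"
    if digest in known_good:
        return "safe"
    return "unknown"


# Hypothetical file contents standing in for real signature databases.
core_file = b"<?php // wp-settings.php contents"
backdoor = b"<?php eval($_POST['x']);"
known_good = {sha256_hex(core_file)}
known_bad = {sha256_hex(backdoor)}

for sample in (core_file, backdoor, core_file + b" "):
    print(classify(sample, known_good, known_bad))
```

This prints safe, malware and unknown. The third sample differs from the core file by a single space, which is enough to produce a completely different hash.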

The diagram below illustrates how malware scanners use hashing to identify good and bad files.

[Diagram: How scanners use hashing]

Not all hashing algorithms are equal: MD5 vs SHA-2

There are various ways to create a hash. When you run a file through one of these hashing ‘algorithms’, it creates a unique number of a fixed length. MD5 is a hashing algorithm that was created in 1991 by Professor Ron Rivest at MIT. It was incredibly useful but is now quite old and has some problems.

Another newer and much more secure hashing algorithm called SHA-2 was developed by the National Security Agency and released by the National Institute of Standards and Technology in 2001. Today SHA-2 is widely used and considered secure enough for commercial use.
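
The fixed-length property is easy to check with Python's hashlib. The sketch below hashes inputs of very different sizes and prints the input length in bytes followed by the MD5 and SHA-256 output lengths in bits. SHA-256 is used here as one member of the SHA-2 family.

```python
import hashlib

for data in (b"", b"a", b"a" * 1_000_000):
    md5 = hashlib.md5(data).hexdigest()
    sha256 = hashlib.sha256(data).hexdigest()
    # Each hexadecimal character encodes 4 bits of the hash.
    print(len(data), len(md5) * 4, len(sha256) * 4)
```

Whatever the input size, MD5 always produces 128 bits and SHA-256 always produces 256 bits.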

The problem with MD5 is something called ‘collisions’. The issue is easy to understand: with MD5, it’s possible to create two different files that have the same MD5 hash, or unique signature. This could be used, for example, to fool a malware scanner into thinking a malware file is actually a known-good file.
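
To show what a collision looks like without reproducing a real attack, the toy below keeps only the first 16 bits of an MD5 hash, so that two different inputs with the same value can be found by brute force in well under a second. This is an illustration of the concept only. Real MD5 collisions are produced with cryptanalytic tools, not with this search, and the names used here are made up.

```python
import hashlib
from itertools import count


def weak_hash(data: bytes, hex_chars: int = 4) -> str:
    """Toy hash: only the first 16 bits of MD5, so collisions are cheap."""
    return hashlib.md5(data).hexdigest()[:hex_chars]


def find_collision():
    """Hash generated inputs until two of them share a weak hash."""
    seen = {}
    for i in count():
        data = f"file-{i}".encode()
        digest = weak_hash(data)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data


first, second = find_collision()
print(first, second, weak_hash(first) == weak_hash(second))
```

A scanner that trusted this toy hash would treat both inputs as the same file. That is exactly the mistake an MD5-based scanner can be led into once an attacker can produce colliding files.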

That is why we use SHA-2 in Wordfence to track known good files. It prevents an attacker from creating a bad file that has the same hash as a known good file and avoiding detection.

The weak hash scanner problem

Unfortunately not all security products do this. In the WordPress space, some malware scanners use plain old MD5 to hash files when searching for malware. Sucuri’s WordPress plugin and “Shield WordPress Security”, for example, use MD5 to detect core file changes. They do this by fetching the newest MD5 hashes from api.wordpress.org.

The API these products use was not designed to be used for malware scanning. It was originally created for the WordPress upgrade process back in 2013 to help determine which files need to be upgraded. The MD5 algorithm used by this API is not cryptographically strong enough to be used to detect malicious or safe files.

At Wordfence we use SHA-2 and this is one of the reasons we have created our own API endpoint that we use for malware scanning. Doing this allows us to use a cryptographically strong hash function to ensure that malware can’t evade detection by exploiting weak hash algorithms. We have been using SHA-2 since 2012, when the very first version of Wordfence was released as version 1.1.

Last week a security researcher demonstrated how it’s possible to create two Windows executables that both have the same MD5 hash. This allows an attacker to create one friendly executable and another malicious file that will later replace the friendly file and avoid detection.

In 2014 Nat McHugh showed how to create two different PHP files and two different image files with the same MD5 hash. This demonstrates the same concept in PHP – that an attacker can create a friendly file which becomes trusted and later replace it with a malicious file that avoids detection by MD5 scanners.

This research has actually been around for some time now. The technique is known in the security industry as a ‘chosen prefix’ collision attack on MD5. The underlying weakness first came to light in a 2005 paper by Xiaoyun Wang and Hongbo Yu at Shandong University in China, in which they refer to it as a modular differential attack on MD5.

In 2007, Marc Stevens created an open source toolkit as part of his master’s thesis which actually exploited this weakness in MD5. These tools are what were used by the researchers above to create different files with identical MD5 hashes.

This research demonstrates that it’s already possible for an attacker to exploit MD5 to provide a safe file and later replace it with a malicious file that will avoid detection by scanners using MD5. It may soon be possible to create a malicious file that shares the same MD5 hash as a legitimate WordPress core file. For this reason it is important that malware scanners avoid MD5 and use strong cryptographic hash functions to verify file integrity.

What to do about this

The goal of today’s blog post is to encourage two things:

  1. If you are a customer of a security product, make sure your product is using SHA-2 or another secure hashing algorithm for malware scanning and other checks. If a product uses MD5, it risks being fooled into thinking a file is safe when it is dangerous.
  2. If you are a security vendor and have not already switched to SHA-2 or a secure hashing algorithm, it’s time to do so now in the interests of your customers’ security.

As always I’m happy to respond to questions and comments below.

Mark Maunder – Wordfence Founder/CEO.

Footnotes:

You can learn more about hashing, how it is used for passwords and how to crack passwords using consumer GPU hardware by visiting our Password Authentication and Password Cracking article in the Wordfence Learning Center.

The post Avoid Malware Scanners That Use Insecure Hashing appeared first on Wordfence.