AWS S3 problems - don't point the finger at Amazon, blame users


Just before everyone disappeared off the face of the working planet for a long Easter weekend, Cloud Pro reported how the Amazon Simple Storage Service (S3) was potentially exposing private and confidential documents to the public at the rate of one in six data depositories, or 'buckets' as they are known.

At first glance, this appears to be a dramatic security vulnerability, especially when you take a closer look at the type of data being exposed to public view. That’s according to the security researchers at Rapid7, who disclosed the problem in a blog posting provocatively entitled 'There's a Hole in 1,951 Amazon S3 Buckets'.

After reviewing the permissions of 12,328 Amazon S3 buckets, the Rapid7 team revealed that, across the 1,951 'public' ones, there were some 126 billion files exposed in all, around 60 percent of which were images. However, there were also 28,000 PHP source files (including database usernames, passwords and API keys) and 218,000 CSV files (including personal data such as email addresses and telephone numbers). Oh, and something like 5 million text files, large numbers of which were marked as private or confidential and contained sensitive personal credentials, as well as details about the organisations concerned and their customers.

Getting even more specific on the information that was exposed in these buckets, Rapid7 cites examples such as sales records and accounts from a large car dealership, source code and development tools from a mobile gaming outfit, sales 'battlecards' for a large software vendor and assorted cases of employee personal information across various spreadsheets.

The fact that some rather large and well-known companies were apparently represented in the list of publicly exposed data buckets is a cause for concern. So is the finding that the most common exposure by far was through log backups (many containing site data along with encrypted user passwords) left globally accessible.

No doubt about it, that represents a security risk that any CISO would quite rightly throw a fit at hearing about. A fit that may well be aimed in the general direction of Amazon S3, but that would be missing the point by a country mile. Amazon is not at fault here; users are, plain and simple (and simple is the operative word).

In our original news story, we quoted an Amazon spokesperson who, quite rightly, pointed out that these data buckets are not set to 'public' by default. "There are many legitimate reasons a customer may wish to share information with others, and S3 provides mechanisms to do so, which the customer has complete control of" was the precise comment in question.

Indeed, the security researchers at Rapid7 state that "it should be emphasised that a public bucket is not a risk created by Amazon but rather a misconfiguration caused by the owner of the bucket" and go on to admit that even if a file is listed in a public bucket that doesn't automatically mean that it can be downloaded as "buckets and objects have their own access control lists".

So while I don't wish to dismiss this whole thing as a fuss about nothing, nor suggest that the researchers themselves have blown the risk out of all proportion, I do think there is a danger that some media reporting is likely to grasp the headline and run with it without delving too far into the detail when it comes to cases such as this.

The real story is one of asking why the companies concerned would have weakened their security posture by misconfiguring the service in the first place.

Amazon has stated that there are safeguards in place to prevent misconfiguration, yet the evidence is clear that people who really should know better have chosen to make data public rather than leave it at the private default. It's not even a difficult thing to check, as the bucket URLs are hardly obfuscated. Quite the reverse: they all take a pretty standard format of s3.amazonaws.com/bucket_name or bucket_name.s3.amazonaws.com, and typing that into a browser will either pop up the 'Access Denied' message if the bucket is private or list the first 1,000 stored objects if it is public. It’s made clear that these objects are downloadable, provided that no access controls have been put into place to lock them down.
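For anyone who would rather script that browser check against their own buckets, here is a minimal Python sketch. It assumes that S3 answers an anonymous request with an XML document whose root element is either ListBucketResult (a public listing) or Error with a Code of AccessDenied (a private bucket), and that the https scheme is used. The function names bucket_urls, classify_response and check_bucket are hypothetical, chosen purely for illustration.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def bucket_urls(bucket_name):
    """Return the two URL styles described above for a given bucket name."""
    return [
        f"https://s3.amazonaws.com/{bucket_name}",
        f"https://{bucket_name}.s3.amazonaws.com",
    ]


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def classify_response(body):
    """Classify the XML body S3 returned for an anonymous bucket request.

    Returns a (status, keys) tuple, where status is 'public', 'private',
    'error:<Code>' or 'unknown', and keys lists any object names found.
    """
    root = ET.fromstring(body)
    root_name = local_name(root.tag)
    if root_name == "ListBucketResult":
        keys = [el.text for el in root.iter() if local_name(el.tag) == "Key"]
        return "public", keys
    if root_name == "Error":
        code = next(
            (el.text for el in root if local_name(el.tag) == "Code"), None
        )
        status = "private" if code == "AccessDenied" else f"error:{code}"
        return status, []
    return "unknown", []


def check_bucket(bucket_name, timeout=10):
    """Fetch the bucket URL anonymously and classify what comes back."""
    url = bucket_urls(bucket_name)[1]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # S3 sends the 'Access Denied' XML with an error status code.
        body = err.read()
    return classify_response(body)
```

Calling check_bucket("your-bucket-name") on a bucket you own should come back as private unless you have deliberately opened it up. A 'public' result only says the listing is readable; as noted above, the individual objects may still be locked down by their own access controls.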

My worry is that if someone has gone to the trouble of fiddling with the access options to make something public that really shouldn't be, as the research suggests has happened thousands of times, then these same fiddlers are unlikely to be setting access controls at the same time. They are options fettlers, compulsive configurers, yet without the depth of knowledge that enables them to make the safe configuration choices.

That knowledge is oh so easy to come by, not least as Amazon has plenty of helpful information on the subject of data protection controls to prevent such misconfiguration mishaps.

If you use Amazon S3 and are unsure about the status of your buckets, I'd suggest you check them now, this instant, and put things right.

By which I mean ensure that you have the appropriate permissions and access controls in place whether your data stores are private or public. I'd also recommend that you ensure your public buckets include a robots.txt file to prevent Google indexing.
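As a rough illustration of the robots.txt suggestion, the Python sketch below holds a blanket 'disallow everything' file and uses the standard library's urllib.robotparser to confirm that a well-behaved crawler would skip a given URL. The bucket name example-bucket, the file name report.xls and the helper is_crawl_blocked are all hypothetical. Bear in mind that robots.txt is only a polite request to crawlers, not an access control, and that crawlers look for it at the root of the host, so this assumes the bucket_name.s3.amazonaws.com style of URL.

```python
from urllib.robotparser import RobotFileParser

# A blanket rule asking every crawler to stay away from every path.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text asks user_agent to skip url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example-bucket.s3.amazonaws.com/report.xls"
    print(is_crawl_blocked(ROBOTS_TXT, url))
```

Uploading the contents of ROBOTS_TXT as an object named robots.txt at the top level of a public bucket is the step being described. It does nothing to stop someone who already knows or guesses the bucket URL.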

A simple Google Dork hacking query such as 'site:s3.amazonaws.com filetype:xls password' will throw up a list of Excel spreadsheets that contain the word password (see the redacted screenshot), for example. Experienced hackers will be able to come up with much more interesting and valuable Google Dork queries than that, I can assure you. And even if you became aware of the misconfiguration problem and have already changed from public to private, that might not help much. Rapid7 researchers used a Metasploit module and the Wayback Machine to identify buckets that were previously public and could potentially expose important data in the available cached archive copy.

Davey Winder

Davey is a three-decade veteran technology journalist specialising in cybersecurity and privacy matters and has been a Contributing Editor at PC Pro magazine since the first issue was published in 1994. He's also a Senior Contributor at Forbes, and co-founder of the Forbes Straight Talking Cyber video project that won the ‘Most Educational Content’ category at the 2021 European Cybersecurity Blogger Awards.

Davey has also picked up many other awards over the years, including the Security Serious ‘Cyber Writer of the Year’ title in 2020. As well as being the only three-time winner of the BT Security Journalist of the Year award (2006, 2008, 2010) Davey was also named BT Technology Journalist of the Year in 1996 for a forward-looking feature in PC Pro Magazine called ‘Threats to the Internet.’ In 2011 he was honoured with the Enigma Award for a lifetime contribution to IT security journalism which, thankfully, didn’t end his ongoing contributions - or his life for that matter.

You can follow Davey on Twitter @happygeek, or email him at davey@happygeek.com.