Setting up and securing Amazon S3 storage

Amazon Web Services' S3 buckets provide flexible, highly scalable, cost-effective, high-availability cloud-based bulk data storage, suitable for everything from backups and big data to hosting media, files, and web apps.

S3 – short for Simple Storage Service – is hugely popular for data storage. Everything you upload is stored as a key-value pair, with a unique name as the key and the uploaded data as its value. This effectively makes it a lightweight NoSQL store that can hold a vast amount of structured data, which can then be easily queried for analysis via S3 Select API calls.
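
To give a flavor of what that querying looks like in practice, here's a minimal sketch using Python's boto3 SDK – the bucket name, object key, and column names are all hypothetical placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Run a SQL query against a CSV object in place with S3 Select;
# "example-bucket" and "records.csv" are placeholder names.
response = s3.select_object_content(
    Bucket="example-bucket",
    Key="records.csv",
    ExpressionType="SQL",
    Expression="SELECT s.name, s.city FROM s3object s WHERE s.country = 'UK'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# Results arrive as an event stream; pull out the record payloads.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```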

Critically, S3 doesn't care what kind of data you're storing – you can stuff anything into an S3 bucket, with a maximum individual object size of 5TB and unlimited total storage, as long as you're prepared to pay for it.

There are several different S3 storage classes. This guide primarily addresses S3 Standard storage, intended for "hot" data that's going to be regularly accessed. But it's worth being aware of your options, particularly if you anticipate eventually archiving your data to cold storage in Amazon's cloud.

These include S3 Intelligent-Tiering, which can save money by automatically moving your least-accessed data to cheaper storage; Infrequent Access (IA) tiers, which cost less than hot storage but still allow data to be retrieved within milliseconds; and the Glacier tiers, which cost far less but have increasingly slow retrieval times – up to 12 hours in the case of S3 Glacier Deep Archive.
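
As a quick illustration, an object's storage class can also be chosen at upload time rather than left to lifecycle rules. A boto3 sketch, with placeholder file, bucket, and key names:

```python
import boto3

s3 = boto3.client("s3")

# Upload straight into the cheaper Standard-IA tier; other values
# include "GLACIER", "DEEP_ARCHIVE" and "INTELLIGENT_TIERING".
s3.upload_file(
    "2024-q1-report.pdf",
    "example-bucket",
    "archives/2024-q1-report.pdf",
    ExtraArgs={"StorageClass": "STANDARD_IA"},
)
```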

While costing your deployment is beyond the scope of this article, we recommend using the AWS Pricing Calculator and billing alarms to help you avoid any unpleasant surprises. Amazon provides a helpful step-by-step deployment guide, but the density of the reference documentation for S3 means that it can be challenging to winnow out exactly which of S3's many options you need.

Setting up and using S3 storage can seem deceptively simple, particularly if your data set is never intended to be made available to the public. But with both hackers and white-hat cybersecurity professionals routinely scanning for unsecured S3 buckets using a variety of effective free tools, security through obscurity isn't going to cut it.

S3's default options have been beefed up in recent years to minimize the danger of accidentally exposing the personal data of your staff, customers, or citizens, but it's still important to ensure that your deployment is appropriately planned, configured, and secured for your use case.

In this guide, we'll take you through the key considerations you should apply to every S3 deployment, from planning to access and version control, through logging and multi-factor authentication.

Amazon S3 setup

Before setting up your S3 bucket, you should decide what it is – and isn't – going to be used for. Different use cases require different options and settings, and some, such as S3 Object Lock, can only be set when the bucket is deployed and cannot be changed later. This is also a good time to work out who or what will be accessing the bucket, how they will be doing that, and from where.

Document all of this for clarity, and to prevent function creep in the future. You don't want to find that people have been storing critical documents in your low-priority off-site backup store, or that access to these documents is too slow because you created the bucket in the wrong AWS region.

Local legal requirements should also be taken into consideration when choosing the AWS region in which your bucket will reside, such as EU requirements for data protection. If you're unsure about this, choose the AWS region in your country if there is one, or – if your country doesn't have its own AWS region – consult someone who knows the legal requirements and can advise on which regions are suitable.
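
Once those decisions are made, creating the bucket itself is a single call. A minimal boto3 sketch, assuming a hypothetical bucket name and the eu-west-1 (Ireland) region:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# The region is fixed at creation via LocationConstraint, and Object
# Lock can only be switched on here – it can't be retrofitted later.
s3.create_bucket(
    Bucket="example-compliance-archive",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    ObjectLockEnabledForBucket=True,
)
```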

Amazon S3 retention and deletion protection

Depending on the use case for your bucket, you may wish to enable S3 Object Lock, a WORM (Write Once Read Many) mode. Once a file has been uploaded to a bucket with this enabled, it cannot be deleted or modified in any way until the end of its retention period. This is useful in cases such as storing copies of quarterly backups, or accounting files needed for compliance.
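
Putting a concrete retention period on an Object Lock-enabled bucket looks something like the following boto3 sketch – the bucket name and one-year period are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Default WORM retention for new objects: in COMPLIANCE mode, no one
# (including the root account) can delete or overwrite an object
# until its retention period expires; GOVERNANCE mode is less strict.
s3.put_object_lock_configuration(
    Bucket="example-compliance-archive",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},
    },
)
```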

Alternatively, you may wish to enable versioning for objects in this bucket if it is to be used to store critical files, or leave it disabled if, for example, this is a backup store, as the backup software should take care of version history itself.

Lifecycle settings

Once you've created your bucket, you'll want to set some lifecycle rules to keep it in check. Rules can be set up to move files to less expensive storage classes after a specified period of time, delete older versions of files in buckets with versioning enabled, and perform basic housekeeping tasks. At the very least, you'll probably want a set of rules to move objects from primary, instant-availability storage to a lower-cost tier such as Glacier after a time, and then eventually purge outdated files, to keep costs down.
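
Here's a sketch of one such housekeeping policy in boto3, with placeholder periods – 90 days before archiving, five years before deletion:

```python
import boto3

s3 = boto3.client("s3")

# Note: this call replaces any existing lifecycle configuration on
# the bucket, so include all of your rules in one request.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-purge",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 1825},
            }
        ]
    },
)
```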

S3 Default Encryption

Since January 2023, S3 has automatically encrypted all new objects uploaded to it using AES-256 encryption with server-side, S3-managed (SSE-S3) keys, at no additional cost to users. Previously an opt-in setting – which contributed to unnecessarily poor security on some users' buckets – this default now has to be deliberately opted out of if you require an alternative approach.

Note that objects uploaded before the new default took effect, in buckets that hadn't opted into S3 default encryption, can be encrypted to the same standard using a batch operation.

While SSE-S3 is the default, you can also enable SSE-KMS, another approach to server-side encryption, which uses the AWS Key Management Service. A dual-layer version, DSSE-KMS, is also available, and customers can alternatively supply their own keys using SSE-C – server-side encryption with customer-provided keys. While different objects in a bucket can use different encryption methods, you can't apply multiple types of encryption to a single object.
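
Switching a bucket's default from SSE-S3 to SSE-KMS might look like this in boto3 – the KMS key ARN is a placeholder for one of your own keys:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="example-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:eu-west-1:111122223333:key/example-key-id",
                },
                # S3 Bucket Keys cut the number of KMS requests, and
                # therefore KMS costs, on busy buckets.
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```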

Access management

To control access to your new bucket for either people or processes, you will need access keys. While you can generate these as your AWS admin user, you very much should not. It's far better for your security to create a dedicated user in IAM (AWS Identity and Access Management) for each process or person requiring access, granting them the minimum rights required – for example read-only, read-write, or full S3 access. Access keys can then be created for these users from their IAM details page.
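
The same steps can be scripted. A boto3 sketch, where the user name is a placeholder and AmazonS3ReadOnlyAccess stands in for whatever least-privilege policy fits your case:

```python
import boto3

iam = boto3.client("iam")

# A dedicated user per person or process, never the admin account.
iam.create_user(UserName="s3-reports-reader")

# Attach the minimum rights required; a custom policy scoped to a
# single bucket is tighter still than this AWS managed policy.
iam.attach_user_policy(
    UserName="s3-reports-reader",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# Generate the key pair the user or process will authenticate with.
key = iam.create_access_key(UserName="s3-reports-reader")["AccessKey"]
print(key["AccessKeyId"])  # the secret is only shown at creation time
```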

For those who require more granular access control, access policies can be used. These are written in JSON, and if you wish, you can write that JSON directly. For the majority of users who'd prefer not to, there's a policy generator: set the policy type to S3 Bucket Policy, then configure the required access conditions. Example policies are also available at this stage to assist with the process.
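
For a feel of what such a policy looks like, here's a hypothetical example granting a single IAM user read-only access to one bucket, applied via boto3 (the account ID, user, and bucket names are placeholders):

```python
import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:user/s3-reports-reader"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",    # for ListBucket
                "arn:aws:s3:::example-bucket/*",  # for GetObject
            ],
        }
    ],
}

s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```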

Version control

If your bucket is to be used for storing regularly accessed files, rather than as a backup or archive of some form, you may wish to enable versioning. This will allow you to revert any object stored in the bucket to any previous version, limited only by the retention policies set in your lifecycle rules for this bucket. Lifecycle policies can be used to move older versions to cheaper storage tiers or delete them completely after a given time. Care should be taken when setting these rules to balance the utility of being able to revert files to previous versions against the increased storage costs this incurs.
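
Enabling versioning and capping its cost might look like this in boto3, with a placeholder 30-day window for keeping old versions:

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on (it can later be suspended, but never removed).
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Keep storage costs in check by expiring noncurrent versions after
# 30 days; this call replaces any existing lifecycle configuration.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```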

Replication

If the objects to be stored in your bucket are critical, then you may wish to enable replication. This feature automatically – but asynchronously – copies objects from one S3 bucket to another, usually within 15 minutes of the initial object's creation. This ensures that there's never a single point of failure for your most critical data.

You can configure replication rules to replicate certain classes of objects in the bucket, or its entire contents. The replication target will be another S3 bucket, either in the same region as the original or in a different region. Note that data transfer charges will apply to data moving between regions, so bear this in mind when deciding on your replication setup. Likewise, the target bucket will also need its own lifecycle policy.
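
A boto3 sketch of a whole-bucket replication rule – the IAM role and both bucket names are placeholders, and versioning must already be enabled on source and destination alike:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-bucket",
    ReplicationConfiguration={
        # An IAM role granting S3 permission to read the source
        # bucket and write to the destination.
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    # Same region or different; cross-region copies
                    # incur data transfer charges.
                    "Bucket": "arn:aws:s3:::example-bucket-replica",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)
```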

Logging, monitoring and auditing

If required, you can enable basic access logging, event auditing, or both for your bucket. This will allow you to keep track of who accessed what data and when, making it easier to spot any unauthorized access to these objects. It's highly recommended.
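
Server access logging, for instance, takes a single call once a log bucket exists. A boto3 sketch with placeholder bucket names:

```python
import boto3

s3 = boto3.client("s3")

# The target bucket must be in the same region as the source bucket
# and must grant the S3 log delivery service permission to write.
s3.put_bucket_logging(
    Bucket="example-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-logging-bucket",
            "TargetPrefix": "access-logs/example-bucket/",
        }
    },
)
```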

AWS's monitoring tools for S3 include alarms that will alert you if a selected metric (such as storage in use) passes a specified threshold, detailed user activity and server access logs, and AWS Trusted Advisor, an automated tool that inspects your S3 configuration and makes security and configuration recommendations.
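
A storage-threshold alarm could be sketched like this, assuming a hypothetical SNS topic for notifications and a 1TB limit:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# BucketSizeBytes is reported daily per storage class, hence the
# one-day period; the SNS topic ARN is a placeholder.
cloudwatch.put_metric_alarm(
    AlarmName="example-bucket-size",
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "example-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    Statistic="Average",
    Period=86400,
    EvaluationPeriods=1,
    Threshold=1_000_000_000_000,  # ~1TB, in bytes
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:111122223333:example-alerts"],
)
```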

Security

In most cases, S3 buckets are accessed in two ways: interactive user access and programmatic, key-based access. For interactive users, multi-factor authentication (MFA) should always be enabled, and each user's rights should be limited to the minimum required. You should not be using your AWS admin account to access bucket contents, but rather a lower-privileged account set up for the purpose.

For programmatic key-based access – such as your backup software storing its nightly backups in the S3 bucket – the access keys should be rotated periodically. This helps to avoid unauthorized access resulting from keys that may have leaked. It also familiarizes you, as the admin, with AWS's key rotation procedure, which is important, as this will need to be done promptly when staff leave the organization or in response to security incidents.
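
The rotation itself follows a create-verify-retire pattern. A boto3 sketch with a placeholder user and old key ID:

```python
import boto3

iam = boto3.client("iam")
USER = "s3-reports-reader"        # hypothetical user from earlier
OLD_KEY_ID = "AKIAOLDKEYEXAMPLE"  # placeholder

# 1. Issue a replacement key and deploy it to the consuming software.
new_key = iam.create_access_key(UserName=USER)["AccessKey"]

# 2. Once the new key is confirmed working, deactivate the old one
# (this step is reversible if something still depends on it)...
iam.update_access_key(UserName=USER, AccessKeyId=OLD_KEY_ID,
                      Status="Inactive")

# 3. ...then delete it for good after a safe observation period.
iam.delete_access_key(UserName=USER, AccessKeyId=OLD_KEY_ID)
```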

For those developing software that uses an S3 bucket as storage, be especially careful not to commit access keys to any public (or even internal) code repository. The press has reported many security incidents caused by someone carelessly committing an access key to GitHub or similar.
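
The simplest defense is to keep credentials out of source code entirely. boto3, for example, resolves credentials from environment variables or the shared credentials file, so nothing secret needs to appear in the repository:

```python
import boto3

# Credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in
# the environment, or from ~/.aws/credentials – never hardcode them.
s3 = boto3.client("s3")
s3.upload_file("backup.tar.gz", "example-bucket", "backups/backup.tar.gz")
```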

Common S3 error codes

When it comes to configuration, the AWS Shared Responsibility Model warns users that they are responsible for security 'in' the cloud. AWS takes care of security 'of' the cloud itself, so the user is liable for any changes they make that publicly expose their own data. Thankfully, the most common misconfigurations are avoidable.

Arguably the most common S3 error code is 403 'AccessDenied'. This is usually a matter of bucket and object ownership. If the error comes from 'GetObject' or 'HeadObject' requests, check whether the object is owned by the bucket's owner – if you are the bucket owner, check the access control list (ACL) permissions. An S3 object is, by default, owned by the AWS account that uploaded it, so if other accounts have permission to upload to your bucket, you'll need to verify which account owns the object and who can access it.
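
Those checks can be run from boto3 as a quick diagnostic, with placeholder names throughout:

```python
import boto3

s3 = boto3.client("s3")

# Who owns the object, and what does its ACL grant?
acl = s3.get_object_acl(Bucket="example-bucket", Key="reports/q1.csv")
print(acl["Owner"], acl["Grants"])

# Buckets set to BucketOwnerEnforced disable ACLs entirely and make
# the bucket owner own every object (this call raises an error if no
# ownership controls are configured).
controls = s3.get_bucket_ownership_controls(Bucket="example-bucket")
print(controls["OwnershipControls"]["Rules"])
```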

Another common error is 'MultiRegionAccessPointModifiedByAnotherRequest', error code '200'. These can crop up for a range of reasons, but all relate to Multi-Region Access Points: for example, an action failed because another request was modifying the same resource, or you tried to create a Multi-Region Access Point with a name that already exists.

For a complete list of error codes, see the AWS Error Responses page in the S3 documentation.

S3 pricing

You can start using S3 for free, though you will be charged for what you use. Essentially, you don't pay to have a bucket, but you do pay for putting objects in it, with charges depending on the size of the objects you store and how long you store them. You will also be charged according to the tier of S3 service you use – Standard, Intelligent-Tiering, Standard-Infrequent Access, One Zone-Infrequent Access, Glacier Instant Retrieval, Glacier Flexible Retrieval, or Glacier Deep Archive.

AWS provides an online calculator to work out your fees. You'll need to factor in ingest and transfer costs too, but at its simplest you estimate your region, your storage needs, and your preferred tiering. For example, if you were to set up S3 in the US West (Northern California) region and store 10TB per month in S3 Standard, the monthly fee would be around $267.
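
That figure is consistent with the S3 Standard rate published for that region at the time of writing – roughly $0.026 per GB-month for the first 50TB – since 10TB is 10,240GB, and 10,240 × $0.026 comes to about $266 a month, before any request or transfer charges.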