Amazon Web Services S3 buckets provide flexible, highly scalable, and cost-effective high-availability cloud storage, suitable for everything from backups and big data to hosting media, files, and web apps.
S3 – short for Simple Storage Service – is popularly used for data storage: everything you upload is stored as a key-value pair, with a unique name as the key and the stored data as its value. This effectively makes it a lightweight NoSQL store that can hold a vast amount of structured data, which can then be easily queried for analysis via S3 Select API calls.
Critically, S3 doesn't care what the data you're storing is – you can stuff anything into an S3 bucket, with a maximum individual object size of 5TB and unlimited total storage, as long as you're prepared to pay for it.
Setting up and securing Amazon S3 storage
There are several different S3 storage classes. This guide primarily addresses S3 Standard storage, intended for "hot" data that's going to be regularly accessed. But it's worth being aware of your options, particularly if you anticipate eventually archiving your data to cold storage in Amazon's cloud.
These include S3 Intelligent-Tiering, which can potentially save money by automatically moving your least-accessed data to cheaper storage; Infrequent Access (IA) tiers, which cost less than hot storage but still allow data to be accessed within milliseconds; and the Glacier tiers, which cost far less but have increasingly slow retrieval times – up to 12 hours in the case of S3 Glacier Deep Archive storage.
While costing your deployment is beyond the scope of this article, we recommend using the AWS Pricing Calculator and billing alarms to help you avoid any unpleasant surprises. Amazon provides a helpful step-by-step deployment guide, but the density of the reference documentation for S3 means that it can be challenging to winnow out exactly which of S3's many options you need.
Setting up and using S3 storage is deceptively simple, particularly if your data set is never intended to be made available to the public. With both hackers and white hat cybersecurity professionals routinely scanning for unsecured S3 buckets using a variety of effective free tools, security through obscurity isn't going to cut it here.
S3's default options have been beefed up in recent years to minimize the danger of accidentally exposing the personal data of your staff, customers, or citizens, but it's still important to ensure that your S3 deployment is appropriately planned, configured, and secured for your use case.
In this guide, we'll take you through the key considerations you should apply to every S3 deployment, from planning to access and version control, through logging and multi-factor authentication.
Amazon S3 setup
Before setting up your S3 bucket, you should decide what it is – and isn't – going to be used for. Different use cases require different options and settings on the bucket, and some settings, such as S3 Object Lock, can only be set when the bucket is deployed, and cannot be changed later. This is also a good time to work out who or what will be accessing it, how they will be doing that, and from where. Document all of this for clarity, and to prevent function creep in the future. You don't want to find that people have been storing critical documents in your low-priority off-site backup store, or that access to these documents is too slow because you created the bucket in the wrong AWS region.
Local legal requirements should also be taken into consideration when choosing the AWS region in which your bucket will reside, such as EU requirements for data protection. If you're unsure about this, choose the AWS region in your country if there is one, or – if your country doesn't have its own AWS region – consult someone who knows the legal requirements and can advise on which regions are suitable.
Amazon S3 retention and deletion protection
Depending on the use case for your bucket, you may wish to enable S3 Object Lock, also known as WORM (Write Once Read Many) mode. Once a file has been uploaded to a bucket with Object Lock enabled, it cannot be deleted or modified until the end of its retention period. This is useful in cases such as storing copies of quarterly backups, or accounting files needed for compliance.
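As a sketch, a default retention rule for an Object Lock bucket might look like the following; the one-year compliance-mode retention is an illustrative choice, and it can only be applied to a bucket that was created with the --object-lock-enabled-for-bucket flag:

```json
{
  "ObjectLockEnabled": "Enabled",
  "Rule": {
    "DefaultRetention": {
      "Mode": "COMPLIANCE",
      "Days": 365
    }
  }
}
```

Saved to a file, this can be applied with `aws s3api put-object-lock-configuration --bucket <your-bucket> --object-lock-configuration file://lock.json`.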
Alternatively, you may wish to enable versioning for objects in the bucket if it will store critical files, or leave it disabled if, for example, the bucket is a backup store whose backup software handles version history itself.
Once you've created your bucket, you'll want to set some lifecycle rules to keep it in check. Rules can be set up to move files to less expensive storage classes after a specified period of time, delete older versions of files in buckets with versioning enabled, and perform basic housekeeping tasks. At the very least, you would probably want a set of rules to move objects from primary instant-availability storage to a lower cost tier such as Glacier after a time, and then eventually purge outdated files, so as to keep costs down.
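As an illustration of such rules, the following lifecycle configuration moves every object to Glacier after 90 days and deletes it after two years – both figures are example choices, not recommendations – and can be applied with `aws s3api put-bucket-lifecycle-configuration`:

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 730 }
    }
  ]
}
```

An empty prefix in the filter applies the rule to the whole bucket; a prefix such as "logs/" would scope it to one part of your key space.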
S3 Default Encryption
Since January 2023, S3 has automatically encrypted all new objects uploaded to it using AES-256 encryption with server-side, S3-managed (SSE-S3) keys, at no additional cost to users. This default now has to be deliberately opted out of if you require an alternative approach; it was previously an opt-in setting, which contributed to unnecessarily poor security on some users' buckets.
Note that objects uploaded before the new default took effect, in buckets that hadn't opted into S3 default encryption, can be encrypted to the same standard using a batch operation.
While SSE-S3 encryption is the default, you can also enable SSE-KMS, another approach to server-side encryption which uses the AWS Key Management Service. A dual-layer version, DSSE-KMS, is also available, and customers can alternatively supply their own keys using SSE-C, server-side encryption with customer-provided keys. While different objects in your bucket can use different encryption methods, you can't apply multiple types of encryption to a single object.
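If you opt for SSE-KMS, the bucket's default encryption can be switched with `aws s3api put-bucket-encryption`. A minimal sketch of the configuration follows; the KMS key alias is a placeholder for your own key, and enabling the bucket key reduces the number of billable KMS requests:

```json
{
  "Rules": [
    {
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "alias/example-bucket-key"
      },
      "BucketKeyEnabled": true
    }
  ]
}
```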
To control access to your new bucket for either people or processes, you will need access keys. While you can generate these as your AWS admin user, you very much should not. It's better for your security to create a dedicated user in IAM (AWS Identity and Access Management) for each process or person requiring access, granting them the minimum rights required, for example read-only, read-write, or full S3 access. Access keys can then be created for these users from their IAM details page.
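As an example of minimum rights, an IAM policy granting read-only access to a single bucket might look like the following; the bucket name is a placeholder for your own. Note that listing applies to the bucket itself, while reading applies to the objects within it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::example-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```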
For those who require more granular access control, access policies can be used. These are written in JSON, which you can enter directly if you wish. For the majority of users who'd rather not, there is a policy generator: set the policy type to S3 Bucket Policy, then configure the required access conditions. Example policies are also available at this stage to assist with the process.
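A common example of such a policy is one that denies any request made over unencrypted HTTP; the bucket name here is, again, a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```

Because a Deny statement overrides any Allow, this blocks plain-HTTP access even for users who otherwise have full rights to the bucket.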
If your bucket is to be used for storing regularly accessed files, rather than as a backup or archive of some form, you may wish to enable versioning. This will allow you to revert any object stored in the bucket to any previous version, limited only by the retention policies set in your lifecycle rules for this bucket. Lifecycle policies can be used to move older versions to cheaper storage tiers or delete them completely after a given time. Care should be taken when setting these rules to balance the utility of being able to revert files to previous versions against the increased storage costs this incurs.
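Versioning itself is switched on with `aws s3api put-bucket-versioning --bucket <your-bucket> --versioning-configuration Status=Enabled`, and a lifecycle rule along these lines (the 30-day and one-year figures are illustrative) keeps superseded versions from accumulating:

```json
{
  "Rules": [
    {
      "ID": "tidy-old-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 365 }
    }
  ]
}
```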
If the objects to be stored in your bucket are critical, then you may wish to enable replication. This feature automatically – but asynchronously – copies objects from one S3 bucket to another, usually within 15 minutes of the initial object's creation. This ensures that there's never a single point of failure for your most critical data.
You can configure replication rules to replicate certain classes of objects in the bucket, or its entire contents. The replication target will be another S3 bucket, either in the same region as the original or in a different region. Note that data transfer charges will apply to data moving between regions, so bear this in mind when deciding on your replication setup. Likewise, the target bucket will also need its own lifecycle policy.
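Replication requires versioning to be enabled on both buckets, plus an IAM role that S3 can assume to perform the copying. A sketch of a rule replicating the entire bucket, for `aws s3api put-bucket-replication` – the role ARN, account ID, and destination bucket are all placeholders:

```json
{
  "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::example-replica-bucket",
        "StorageClass": "STANDARD_IA"
      }
    }
  ]
}
```

The empty filter matches every object; setting a cheaper storage class on the destination, as here, is one way to offset the cost of keeping a second copy.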
Logging, monitoring and auditing
If required, you can enable either basic access logging, event auditing, or both for your bucket. This will allow you to keep track of who accessed what data and when, and make it easier to spot any unauthorized access to these objects. It's thus highly recommended.
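Server access logging is enabled by pointing the bucket at a separate log bucket in the same region; a sketch of the configuration for `aws s3api put-bucket-logging`, with placeholder bucket name and prefix:

```json
{
  "LoggingEnabled": {
    "TargetBucket": "example-log-bucket",
    "TargetPrefix": "access-logs/"
  }
}
```

Keep the log bucket separate from the bucket being logged, both to avoid recursive logging and so that an attacker who gains access to the data bucket can't also tamper with its audit trail.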
AWS's monitoring tools for S3 include alarms that alert you when a selected metric (such as storage in use) passes a specified threshold, detailed user activity and server access logs, and AWS Trusted Advisor, an automated tool that inspects your S3 configuration and makes security and cost-optimization recommendations.
In most cases, two forms of access are used with S3 buckets: interactive user access and programmatic key-based access. For interactive users, multi-factor authentication (MFA) should always be enabled, and users' rights should be limited to the minimum required. You should not be using your AWS admin account to access bucket contents, but rather a lower-privileged account set up for the purpose.
For programmatic key-based access, such as your backup software storing its nightly backups in the S3 bucket, the access keys should be rotated periodically. This helps to avoid unauthorized access resulting from keys that may have leaked. It also familiarizes you, as the admin, with AWS's key rotation procedure, which is important because keys will need to be rotated promptly when staff leave the organization, or in response to a security incident.
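The rotation itself is a three-step process with the AWS CLI: create a replacement key, switch the application over, then deactivate and delete the old one. A sketch follows, using a hypothetical user name and AWS's documented example key ID; it defaults to a dry run that only prints the commands, so nothing is changed until you opt in:

```shell
#!/bin/sh
# Sketch of IAM access key rotation. The user name and key ID are
# hypothetical examples. By default this is a dry run that echoes the
# commands; set AWS_CMD=aws to execute them against your account.
AWS="${AWS_CMD:-echo aws}"
USER=backup-writer
OLD_KEY=AKIAIOSFODNN7EXAMPLE

# 1. Issue the replacement key (IAM allows at most two keys per user)
$AWS iam create-access-key --user-name "$USER"

# 2. ...deploy the new key to the backup software and confirm it works...

# 3. Deactivate, then delete, the old key
$AWS iam update-access-key --user-name "$USER" \
  --access-key-id "$OLD_KEY" --status Inactive
$AWS iam delete-access-key --user-name "$USER" \
  --access-key-id "$OLD_KEY"
```

Deactivating before deleting gives you a window to roll back if some forgotten process was still using the old key.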
For those developing software that uses an S3 bucket as storage, be especially careful not to commit the access keys to any public (or even internal) code repository. The press has reported many security incidents caused by someone carelessly committing an access key to GitHub or a similar service.