According to a post in the AWS forum, this issue was fixed sometime in 2016. You can safely ignore the below! (but you should check out Instrumental’s application and server monitoring while you’re here)
At Instrumental, we use a lot of AWS’s services, and sometimes we find their dark corners. S3 had some downtime today, limited to the us-standard region. There hasn’t been a public post-mortem yet, but we’d like to share why we switched away from us-standard and why you should, too.
Many teams use us-standard as their default region. It’s geographically spread across a large market. It looks just like any other region at first glance, but it’s actually the odd one, saddled with some interesting restrictions.
Because us-standard is spread over such a wide physical space, it is a special case, and many of S3’s guarantees differ for it compared with any other region. The biggest of these differences concerns what AWS calls read-after-write consistency, and it applies when the code using S3 is calling from outside the us-east region.
Simply put, if you’re requesting an object from S3’s us-standard region using anything other than the s3-external-1.amazonaws.com endpoint, you may get a “file does not exist” error on something you just created in S3.
Here’s a bit from the S3 FAQ:
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region only provide read-after-write consistency when accessed through the Northern Virginia endpoint (
Read-after-write consistency for new objects means that when you create a new S3 object, it’s immediately available for reads. Without this consistency, there can be a delay on the order of seconds before a new object becomes readable. A few seconds of latency is often fine, especially when human reaction time is involved, but it’s annoying when applications are dealing with S3 directly. Here’s a simplified version of what that code can look like:
    # process 1
    s3.write("file.csv")

    # process 2
    while !(csv = s3.get("file.csv"))
      sleep 0.1
    end
It gets even weirder if you’re sharing S3 URLs with other applications, because you can’t simply wait for the file to appear and then fire off a job telling the other service that the file is available. Because these reads are inconsistent, your read may succeed, then later ones may fail, then succeed again shortly thereafter. This happens because multiple servers are in charge of the metadata for which files exist, and it takes a while for this metadata to propagate to all the servers.
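Since a single successful read doesn’t prove the object is visible everywhere, one possible hedge is to require several successful reads in a row, with a deadline, before telling anyone else about the file. The following is only a minimal sketch of that idea, not a guarantee of consistency. It assumes a hypothetical store object whose get(key) returns nil when the object is missing, like the simplified s3.get above. ScriptedStore and wait_until_stable are names made up for this illustration and are not part of any AWS SDK.

```ruby
# Hypothetical stand-in for an eventually consistent store: each call to
# `get` returns the next scripted response (nil means "does not exist"),
# and the last response repeats forever.
class ScriptedStore
  attr_reader :calls

  def initialize(responses)
    @responses = responses
    @calls = 0
  end

  def get(_key)
    response = @responses[[@calls, @responses.length - 1].min]
    @calls += 1
    response
  end
end

# Polls `store.get(key)` until it succeeds `required` times in a row, or
# until `timeout` seconds pass. Returns the object, or nil on timeout.
def wait_until_stable(store, key, required: 3, timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  streak = 0
  loop do
    object = store.get(key)
    streak = object ? streak + 1 : 0 # any miss resets the streak
    return object if streak >= required
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# A read that succeeds, fails, then succeeds again only counts once it
# has been stable for three reads in a row.
store = ScriptedStore.new([nil, "data", nil, "data", "data", "data"])
csv = wait_until_stable(store, "file.csv", interval: 0.01)
```

A streak of successful reads only lowers the odds of a later miss; it can’t rule one out, so the receiving service should still tolerate a “does not exist” response.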
Why might you continue using us-standard?
If you serve S3 URLs directly to customers mostly in the United States and you don’t mind eventual consistency. Even then, strongly consider CloudFront as a faster, global solution to this problem.
If you already handle eventual consistency in your application and it runs in multiple US regions. This gives you the bandwidth benefits of being close to S3, and you’ve already worked around the edge cases described above.
Another big warning in case you missed it:
All operations on existing S3 objects (regardless of region) are eventually consistent, so updating or deleting an object can have a delay before you see the effect and your code should be prepared for that.
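One way code can “be prepared for that” after an overwrite is to remember a digest of what it wrote and poll until a read returns matching content. This is a rough sketch under stated assumptions: store is a hypothetical object whose get(key) returns the object body as a String (or nil when missing), and wait_for_digest is a name invented for this example rather than an S3 or SDK call.

```ruby
require "digest"

# Polls `store.get(key)` until the body's SHA-256 digest equals
# `expected_digest`. Returns true once the new version is visible, or
# false if `timeout` seconds pass first.
def wait_for_digest(store, key, expected_digest, timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    body = store.get(key)
    return true if body && Digest::SHA256.hexdigest(body) == expected_digest
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

The caller would compute Digest::SHA256.hexdigest(new_body) at write time and pass it in. As with new objects, one matching read doesn’t guarantee that every later read sees the new version, so treat a true result as “probably visible” rather than “done”.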