Author: Nathan Acuff

What to Expect When You’re Expecting Failure

Background

Instrumental is a key piece of infrastructure for many businesses, including Instrumental!  We put significant effort into making sure that Instrumental customers can rely on us to be accurate, available, and consistent, but no system is perfect.  There are two key components of our approach to reliability:

  • Make it hard to do the wrong thing
  • Assume that everything is going to fail

Example Incident

With that approach in mind, let’s talk about what happened on the 16th of November.

How to Monitor a Background Job Queue: Age vs Depth

Many developers analyze background job queues too simply: queues are either too full (bad) or mostly empty (good). This is a dangerous perspective, especially at scale. To get the full picture of your queue’s health, you need to measure both queue depth and job age.

Queue depth is the most common metric for measuring a queue’s health.
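To make the distinction between depth and age concrete, here is a minimal sketch in Python. It assumes an in-memory queue of jobs that each record when they were enqueued; the `Job` class and both helper functions are hypothetical names chosen for illustration, not part of Instrumental or any particular job library.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; a real queue would carry a payload too.
    name: str
    enqueued_at: float  # Unix timestamp in seconds


def queue_depth(queue):
    """Depth: how many jobs are currently waiting."""
    return len(queue)


def oldest_job_age(queue, now=None):
    """Age: how long, in seconds, the oldest waiting job has been queued."""
    if not queue:
        return 0.0
    now = time.time() if now is None else now
    return now - min(job.enqueued_at for job in queue)


# A short queue can still be unhealthy if its jobs have waited a long time.
queue = deque([Job("send_email", 100.0), Job("resize_image", 130.0)])
depth = queue_depth(queue)
age = oldest_job_age(queue, now=160.0)
```

Reporting both numbers side by side shows the cases that depth alone hides: a shallow queue with an old job usually points to a stuck or starved worker, while a deep queue with young jobs may just be a burst that is draining normally.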