Amazon Kinesis: the best event queue you’re not using

Instrumental receives a lot of raw data, upwards of 1,000,000 metrics per second. Because of this, we’ve always used an event queue to aggregate the data before we permanently store it.

Before switching to AWS Kinesis, this aggregation was based on many processes writing to AWS Simple Queue Service (SQS) with a one-at-a-time reader that would aggregate data, then push it into another SQS queue, where multiple readers would store the data in MongoDB.


What to Expect When You’re Expecting Failure

Background

Instrumental is a key piece of infrastructure for many businesses, including Instrumental!  We put significant effort into making sure that Instrumental customers can rely on us to be accurate, available, and consistent, but no system is perfect.  There are two key components of our approach to reliability:

  • Make it hard to do the wrong thing
  • Assume that everything is going to fail

Example Incident

With that approach in mind, let’s talk about what happened on the 16th of November.

Server & Application Monitoring Pricing Comparison: Instrumental, New Relic, Datadog, Librato, and SignalFx

Confused about the price of an application and server monitoring tool? So were we! Every tool is priced differently and there are a lot of nuances. We’ll walk you through important terminology differences, the pros and cons of different plans, and then discuss the pricing details for Librato, New Relic, SignalFx, Datadog, and of course, Instrumental.

Best Practices for Deprecating and Removing an API

Why removing an API the right way is important

A few weeks ago, our alert klaxons started blaring (alert notifications – we don’t really have klaxons, but maybe we should). We had a massive spike in failed background jobs. With the scale of data processing at Instrumental, a few minutes of background jobs is potentially hundreds of millions of data points.

Monitoring for Docker, MongoDB, Redis and more!

Today, we’re launching InstrumentalD as a major upgrade and replacement of Instrumental Tools. Since 2011, Instrumental Tools has provided a system metrics daemon and a powerful plugin framework to write custom scripts for service monitoring.

While the ability to write fully custom service monitoring in a language of your choice is an important feature (and we’re keeping the plugin framework!), InstrumentalD includes out-of-the-box service monitoring for the following:

  1. Docker
  2. MySQL
  3. Memcached
  4. MongoDB
  5. Nginx
  6. PostgreSQL
  7. Redis
  8. (and more to come)

For each service, we’ve selected the critical metrics everyone should be monitoring, and we list each metric sent in the service documentation page.

Upgraded Feature: Time Navigation

Today we’re launching a big upgrade to Instrumental’s time navigation. Before, Instrumental always showed data relative to now, and graphs would always update with new data. That’s great for everyday use. But sometimes, like during incident investigation and root cause analysis, it’s really useful to look at specific start/end dates and times.

How to Monitor a Background Job Queue: Age vs Depth

Many developers analyze background job queues too simply: queues are either too full (bad) or mostly empty (good). This is a dangerous perspective, especially at scale. To get the full picture of your queue’s health, you need to measure both queue depth and job age.

Queue depth is the common metric for measuring a queue’s health.
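To make the depth-versus-age distinction concrete, here is a minimal sketch in Python. It is not Instrumental’s implementation: the `QueueHealth` type, the `queue_health` function, and the idea of passing enqueue timestamps as plain numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueueHealth:
    depth: int                 # how many jobs are waiting
    oldest_age_seconds: float  # how long the oldest job has waited


def queue_health(enqueued_at: Sequence[float], now: float) -> QueueHealth:
    """Summarize a queue from the enqueue timestamps of its pending jobs.

    `enqueued_at` and `now` are seconds on the same clock (hypothetical
    inputs; a real queue would supply these from its own job metadata).
    """
    depth = len(enqueued_at)
    # Age of the oldest pending job; an empty queue has age 0.
    oldest_age = max((now - t for t in enqueued_at), default=0.0)
    # Clamp at zero in case of clock skew between writers and the reader.
    return QueueHealth(depth=depth, oldest_age_seconds=max(oldest_age, 0.0))


# A shallow queue can still be unhealthy if its one job is very old.
stuck = queue_health([100.0], now=4000.0)
busy = queue_health([3990.0, 3995.0, 3999.0], now=4000.0)
```

In this sketch `stuck` has a depth of 1 but an oldest-job age of 3900 seconds, while `busy` has a depth of 3 and an oldest-job age of only 10 seconds. Looking at depth alone would rank them the wrong way around.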

New Features: Read-Only Users + Multiple Admins

We’ve recently added two features that will make it easier for you to manage user roles across your organization.

Read-only Users

You can now invite users with read-only permissions. They’ll be able to see all graphs, metrics, and alerts, but not create, edit or delete them. This role is perfect for trusted partners and less-technical team members (like your CEO).