Category: Infrastructure & Engineering

Amazon Kinesis: the best event queue you’re not using

Instrumental receives a lot of raw data: upwards of 1,000,000 metrics per second. Because of that volume, we’ve always aggregated the data in an event queue before storing it permanently.

Before we switched to Amazon Kinesis, many processes wrote raw data to an Amazon Simple Queue Service (SQS) queue. A single reader consumed that queue one message at a time, aggregated the data, and pushed the results into a second SQS queue, from which multiple readers stored the data in MongoDB.
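As a rough illustration of that earlier design, here is a minimal Python sketch in which standard-library `queue.Queue` objects stand in for the two SQS queues and a plain dict stands in for MongoDB. The message shape `(name, timestamp, value)`, the 60-second buckets, the sum-and-count aggregation, and the function names `aggregate` and `store` are all hypothetical, chosen for illustration rather than taken from Instrumental’s code.

```python
from collections import defaultdict
from queue import Empty, Queue


def aggregate(raw: Queue, out: Queue, bucket_seconds: int = 60) -> None:
    """The single reader: drain `raw` one message at a time and roll each
    (name, timestamp, value) message into a per-metric, per-bucket sum and count."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    while True:
        try:
            name, ts, value = raw.get_nowait()
        except Empty:
            break  # first queue drained; emit what we have
        key = (name, ts - ts % bucket_seconds)  # start of the time bucket
        sums[key] += value
        counts[key] += 1
    for key in sums:
        out.put((*key, sums[key], counts[key]))  # push into the second queue


def store(aggregated: Queue, db: dict) -> None:
    """Stand-in for the readers that wrote to MongoDB: merge each aggregate
    into an in-memory dict keyed by (name, bucket)."""
    while True:
        try:
            name, bucket, total, count = aggregated.get_nowait()
        except Empty:
            break
        prev_total, prev_count = db.get((name, bucket), (0.0, 0))
        db[(name, bucket)] = (prev_total + total, prev_count + count)


raw, aggregated, db = Queue(), Queue(), {}
for msg in [("cpu", 5, 1.0), ("cpu", 30, 3.0), ("cpu", 61, 5.0), ("mem", 10, 2.0)]:
    raw.put(msg)  # in production, many producer processes would do this
aggregate(raw, aggregated)
store(aggregated, db)
print(db)  # {('cpu', 0): (4.0, 2), ('cpu', 60): (5.0, 1), ('mem', 0): (2.0, 1)}
```

In this sketch, every raw message funnels through the single `aggregate` reader before anything reaches storage.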


Best Practices for Deprecating and Removing an API

Why removing an API the right way is important

A few weeks ago, our alert klaxons started blaring (well, our alert notifications; we don’t really have klaxons, but maybe we should). We had a massive spike in failed background jobs. At Instrumental’s scale of data processing, a few minutes of failed background jobs can represent hundreds of millions of data points.

How Time Series Data Can Serve More Than One Purpose

Here’s a pro tip for recording metrics in Instrumental: anything that can be a gauge should be a gauge.

Why is that? Because we record the number of gauge calls we receive, you can also use the Instrumental Query Language to get increment data from a gauge.

Let’s say you’re recording how long your server takes to complete a request when a user signs up, but you also want to know how often sign-ups happen.
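A minimal sketch of why a gauge can double as an increment, assuming (for illustration only) that each gauge is rolled up into per-minute buckets holding a running total and a call count. The `Bucket` class, the `record_gauge` function, and the `signup.duration` metric name are hypothetical; this is not how Instrumental actually stores data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bucket:
    """One time bucket for one gauge: the summed values plus how many calls arrived."""
    total: float = 0.0
    count: int = 0

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def record_gauge(buckets, name, value, timestamp, bucket_seconds=60):
    """Record one gauge call, keeping both the running total and the call count."""
    bucket = buckets[(name, timestamp - timestamp % bucket_seconds)]
    bucket.total += value
    bucket.count += 1


buckets = defaultdict(Bucket)
# Three sign-up requests in the same minute; durations are in seconds.
for ts, duration in [(5, 0.25), (20, 0.5), (50, 0.75)]:
    record_gauge(buckets, "signup.duration", duration, ts)

signup = buckets[("signup.duration", 0)]
print(signup.average)  # 0.5  (typical request time)
print(signup.count)    # 3    (how many sign-ups happened)
```

The same bucket answers both questions from the example: the average gives the typical request time, and the call count gives how often sign-ups happened.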

Toku vs. Mongo: Fight!

We’ve done a lot of infrastructure work on Instrumental’s monitoring pipeline over the last few weeks, with two main goals: making Instrumental’s data collection and writing process more resilient, and making it easier to scale. As an application monitoring service, we process billions of data points every day, and we would love to write more data in less time.

Application Monitoring Is The New Unit Testing

Once upon a time, automated testing was not a popular idea. It was too expensive. It was too time-consuming. At best, it was a nice-to-have.

The prevailing idea was that if you were a good and careful software developer, regressions weren’t a problem. When a regression did happen (rarely, of course), the good and careful software developer that you are would carefully consider why it had happened and make a correction to prevent it from happening again.

When One Second Isn’t: Tracking EventMachine Latency

Ruby developers love EventMachine. It drives gems like Thin, network protocol libraries (em-http-request, em-mongo, em-zeromq, and others), and a host of in-house network servers and clients, and it’s the Ruby community’s go-to library for the Reactor pattern.

For Those New To EventMachine

EventMachine is a gem that provides an easy interface to an event reactor capable of handling thousands of concurrent clients in a single thread.
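EventMachine is Ruby, but the core hazard of a single-threaded reactor can be sketched with Python’s asyncio event loop, swapped in here purely as a stand-in for EventMachine’s reactor. A timer that asks to wake up every 100 ms and records how late it actually runs will expose any handler that blocks the thread. The function names and the 100 ms / 300 ms figures are illustrative assumptions, not the method from the full post.

```python
import asyncio
import time


async def measure_lag(interval: float, samples: int, lags: list) -> None:
    """Ask the event loop to wake us every `interval` seconds and record how
    late each wake-up actually was."""
    loop = asyncio.get_running_loop()
    for _ in range(samples):
        expected = loop.time() + interval
        await asyncio.sleep(interval)
        # Positive lag: the loop was busy and ran this timer late.
        lags.append(loop.time() - expected)


async def blocking_handler(seconds: float) -> None:
    """A misbehaving handler that blocks the single event-loop thread."""
    time.sleep(seconds)  # synchronous sleep: no other callback can run meanwhile


async def main() -> list:
    lags: list = []
    monitor = asyncio.create_task(measure_lag(0.1, 5, lags))
    await asyncio.sleep(0.15)    # yield so the monitor starts ticking
    await blocking_handler(0.3)  # stall the loop for roughly 300 ms
    await monitor
    return lags


lags = asyncio.run(main())
print([round(lag, 3) for lag in lags])
```

One sample should come back roughly 250 ms late even though the monitor itself never changed, while the others stay close to zero. That gap between the requested interval and the observed one is easy to miss in a single-threaded reactor unless something measures it.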