Do you remember that time one user with a rogue script saturated your workers, causing a ton of problems for your system (and your other users!) without even realizing he’d caused a problem?
We ran into a similar issue a while back with our HTML to PDF product, DocRaptor. One of our users was generating way more documents than normal – his test document creation alone accounted for 75% of all documents being generated!
This problem was easy to spot, as we’d built a graph around enqueued documents by user id. We’ve used Instrumental for application monitoring for years now, and this graph is usually pretty boring. We like this graph to be boring, because it means nobody’s abusing our system.
Once in a while, though, it gets pretty exciting. Here’s what our enqueued documents graph looked like when we spotted the problem:
Pretty gross, right? Luckily, we were able to catch this bug before it actually became a problem, and a quick chat with the user resolved the issue.
This user had added DocRaptor to several applications and part of his test suite for one implementation had a bug that generated far too many test documents every time he ran the suite. Let me clarify: his test suite didn’t stop generating documents. You can see why we found this slightly concerning.
Creating graphs to monitor application load with Instrumental is easy. We’ve used the series_top_n function to track users who have enqueued more than 25 documents at once. The syntax looks like this:
series_top_n(amount, metric_pattern, ...)
And here’s how we’ve implemented this function to track users who have enqueued documents in increments greater than 25:
We’re using this method to monitor application load because we only care about large numbers of enqueued documents per user. This counter is incremented upon each document creation request, and then we can graph that data.
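To make the idea concrete, here is a rough sketch in plain Python of a per-user counter and a "top n above a threshold" selection. It is not DocRaptor's production code and it does not use Instrumental's agent or query API. The names `record_enqueue`, `top_enqueuers`, and `ENQUEUE_THRESHOLD`, along with the sample data, are hypothetical and exist only to illustrate the logic described above.

```python
from collections import Counter

# Hypothetical threshold mirroring the "more than 25 documents" rule above.
ENQUEUE_THRESHOLD = 25


def record_enqueue(counts: Counter, user_id: int, documents: int = 1) -> None:
    """Increment the per-user counter on each document creation request."""
    counts[user_id] += documents


def top_enqueuers(counts: Counter, n: int,
                  threshold: int = ENQUEUE_THRESHOLD) -> list[tuple[int, int]]:
    """Return up to n (user_id, count) pairs above the threshold, largest first."""
    heavy = [(user, total) for user, total in counts.most_common()
             if total > threshold]
    return heavy[:n]


# Made-up sample traffic: (user_id, documents enqueued in one request).
counts = Counter()
for user_id, documents in [(1, 3), (2, 40), (3, 12), (2, 35), (4, 30)]:
    record_enqueue(counts, user_id, documents)

print(top_enqueuers(counts, n=2))
```

In this made-up data set, users 1 and 3 stay under the threshold and never show up, which is the point: the graph stays boring until somebody starts enqueuing far more than usual.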
Thanks to Instrumental monitoring, we were able to give this user enough information to quickly debug and solve the issue on his end.