Proving What You Know

It happened again. You were loading the front page of your app, and it took 27 seconds. You've seen this before, you think: on every second Tuesday, on Arbor Day, and right after every new deploy. You've looked at the web server logs, the application server logs, and your database's slow query log. No luck. In the absence of any facts, you pick the nearest black box that might be causing the problem.

  # ApplicationController
  def index
    @greeting = WelcomeMessage.random
    @messages = current_user.messages if logged_in?
  end

  # User < ActiveRecord::Base
  has_many :messages

  # WelcomeMessage
  def self.random
    if Rails.env.production?
      path = "/nfs/shared/marketing/welcome_messages.yml"
    else
      path = File.join(Rails.root, "marketing", "welcome_messages.yml")
    end
    @messages = YAML.load(File.read(path))
    @messages.sample
  end

You know reading files off the filesystem should be ridiculously fast, so it can't be WelcomeMessage::random. It's got to be User#messages: the messages table sees moderate write traffic, so you must occasionally be hitting some sort of transaction lock. You don't personally get that many messages, which is why you see the slowdown so rarely, and you've never received a user complaint about it. You talk it over with the other developers, who agree that your theory seems reasonable, and you all decide that someone should get around to fixing it once the critical stuff is done.

You're on vacation when you get the email. A journalist wrote an article about the app, and traffic went through the roof. And that 27-second load time? Yeah, that's happening on nearly every request now. One of the other developers quickly added some caching code to take load off the messages table, but it didn't fix the page load time, and it has introduced bugs of its own: users are seeing stale data in their messages. What the hell?

Reasonable Explanations Are Not Facts

What you needed was a way to test your theory and prove to yourself and your team that:

  • WelcomeMessage::random was always fast
  • User#messages was occasionally slow
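Proving "always fast" versus "occasionally slow" means recording every duration rather than trusting an impression, then comparing the typical case with the worst case. A minimal sketch of that idea using only the Ruby standard library; the `TimingRecorder` class and its `summary` method are hypothetical illustrations, not part of Instrumental's agent. It times a block with a monotonic clock and returns the block's own value, so wrapping a call doesn't change the code around it.

```ruby
# Hypothetical stdlib-only timing recorder; NOT Instrumental's agent API.
class TimingRecorder
  def initialize
    # metric name => array of durations in seconds
    @samples = Hash.new { |hash, name| hash[name] = [] }
  end

  # Runs the block, records how long it took under +name+, and returns
  # the block's own value so existing assignments keep working.
  def time(name)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    @samples[name] << Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    result
  end

  # Count, upper median, and max for one metric; nil if nothing was recorded.
  def summary(name)
    sorted = @samples.fetch(name, []).sort
    return nil if sorted.empty?

    { count: sorted.size, median: sorted[sorted.size / 2], max: sorted.last }
  end
end

# Usage: wrap the suspect calls exactly where they happen.
recorder = TimingRecorder.new
greeting = recorder.time("timing.welcome_message") { "Welcome back!" }
3.times { recorder.time("timing.user_messages") { sleep 0.01 } }
p greeting
p recorder.summary("timing.user_messages")
```

A low median paired with a high max is the signature of an "occasionally slow" call; a consistently low max is what "always fast" looks like. Samples kept in one process vanish on restart and can't be compared across servers, which is one reason to ship them to a metrics service instead.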

We've run into this problem many times before: applications with many moving parts, applications that behave in ways that run counter to how everyone believes they should. We built Instrumental to give ourselves a tool that makes it easy to see the facts in complex systems.

Using Instrumental to figure out what's going on in this example is easy; you just need to install and initialize our Ruby agent:

  # Gemfile
  gem 'instrumental_agent'

  # config/initializers/instrumental.rb
  I = Instrumental::Agent.new('Your Account Token')

and then measure the specific things you want to examine.

  # ApplicationController
  def index
    @greeting = I.time("timing.welcome_message") { WelcomeMessage.random }
    if logged_in?
      @messages = I.time("timing.user_messages") { current_user.messages }
    end
  end

The agent sends the data to Instrumental; once it arrives, you can view the graphs for these metrics on your project page and smack yourself in the forehead:

[Figure: timing graph]

You may not yet know why WelcomeMessage::random is slow, but at least you know the culprit. The short-term fix?

  # WelcomeMessage (before)
  @messages = YAML.load(File.read(path))

  # WelcomeMessage (after)
  @messages ||= YAML.load(File.read(path))
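The `||=` operator only evaluates its right-hand side while `@messages` is `nil` or `false`, so the file is read once per process and reused afterwards. A minimal, runnable sketch of that behavior, assuming a hypothetical `MemoizedMessages` class whose loader block stands in for `YAML.load(File.read(path))` so it runs without Rails or the NFS mount:

```ruby
require "yaml"

# Hypothetical stand-in for WelcomeMessage: the loader block replaces
# File.read + YAML.load on the real path so this runs anywhere.
class MemoizedMessages
  attr_reader :load_count

  def initialize(&loader)
    @loader = loader
    @messages = nil
    @load_count = 0
  end

  # Mirrors `@messages ||= YAML.load(...)`: the loader runs only while
  # @messages is nil, so later calls reuse the cached array.
  def random
    @messages ||= begin
      @load_count += 1
      @loader.call
    end
    @messages.sample
  end
end

messages = MemoizedMessages.new { YAML.load("- Welcome back!\n- Hello again!") }
5.times { messages.random }
puts messages.load_count # => 1
```

The trade-off is that each process keeps its cached copy until it restarts, so edits to welcome_messages.yml won't appear until the next deploy or restart.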

You deploy the code, and your application resumes its normal snappy behavior. And the imaginary culprit in this slightly contrived example? You guessed it: the "/nfs/" at the start of that path pointed to network-attached storage, every page view was reading the file over the network, and the storage couldn't handle the unexpected load. Many apps have behavior like this, seemingly innocuous code that only misbehaves in production, and without a measurement tool like Instrumental it can be frustratingly difficult to find out what your application is actually doing there.

If you’ve got a Ruby app in production, you can start measuring things right now. Create an account and install the Instrumental Agent gem in your app. Check out the documentation for advanced topics like the query language, our API and more.

Instrumental Free Trial

Understanding what's happening with your software is only possible if you monitor it at the code layer. From agents to our metric-based pricing, we’re focused on making it easy to measure your code in real-time. Try Instrumental free for 30 days.