Alerts provide intelligent notifications of events in your application or infrastructure. For example, you can use alerts to notify your team when:

  • Queues are close to backing up
  • Database servers are low on memory
  • Application latency is outside of normal ranges
  • A customer is abusing your APIs (and which customer it is)

Alerts are defined by a conditional query language expression combined with notification settings. When the expression matches (returns true), you will be notified via your selected channels.

Alert Expressions

Alert expressions must contain a conditional operator: <=, >=, ==, !=, <, or >. The expression can be simple, or more powerful with the use of query language wildcards to alert across multiple metrics, functions to transform data, and time shifting for alerts based on historical data. See the query language expression documentation for full details.

Example Expressions

  • system.?.disk.?.?.used_percent > 50
  • external.intercom.message_failed >= 1
  • ts_sum(track_processor.?.record) < ts_sum(track_processor.?.record) @ 3 days

Evaluation Timing

To avoid evaluating expressions on incomplete datasets, expression queries are delayed for 5 minutes.

Alert Playbooks

Alert playbooks are the place to describe why an alert exists and to list suggested steps for handling it when it fires. Consider including:

  • The expected severity of the alert
  • Related Instrumental graphs and how to interpret them
  • Specific steps to resolution, if known

The full text of the playbook is included in the alert notifications (e.g. email, callback) when the alert opens. If you need to provide more than a brief description, consider linking to one or more documents.

Alert Notifications

Notifications are available via email, SMS, or an HTTP callback. External services such as Slack and PagerDuty are easily added via the email channel.

Integrating with Slack, Flowdock, and other chat services

Email is the best method of integrating Instrumental alerts into Slack and other chat services. First, set up an email integration:

  • For Slack, create an email app for your desired channel. We suggest defining the app with the name "Instrumental Alerts" and using this icon. You'll end up with an email address that looks like
  • For Flowdock, go to the Settings page for your desired flow, then Integrations and create a new email integration. You'll end up with an email address that looks like
  • For HipChat, use the Mailroom add-on. After setup, you'll have an email address that looks like:

To complete the integration, add the email address you created to the Email Notification List in the alert configuration.

Integrating with PagerDuty, OpsGenie, and other incident management services

Email is the best method of integrating Instrumental alerts into PagerDuty and other incident response tools. Just create a new email integration within your preferred service and add that email address to your Instrumental alert configuration (alert notifications can be sent to multiple email addresses).

If your service supports regex email parsing, it's easy to open and close issues as Instrumental alerts open and close.

An example PagerDuty configuration would be:

  • Trigger an Alert if...
    • Subject contains "Alert Opened"
    • Incident Key matches this regular expression: Opened: (.*) at \d\d?:\d\d
  • Resolve an Alert if...
    • Subject contains "Alert Closed"
    • Incident Key matches this regular expression: Closed: (.*) at \d\d?:\d\d
  • Ignore emails that don't match the above conditions. We send a test email each time you create a new alert. If you don't ignore unmatched emails, these test emails will open an issue in PagerDuty every time you create a new alert in Instrumental. Alternatively, you can create the alert and add the PagerDuty email address to the alert configuration after we send the initial test email.

An example OpsGenie configuration (based on regex extraction) would be:

  • Create Alert
    • Filter: Subject contains "Alert Opened"
    • Alert Fields: "Alias" set to {{ subject.extract(/Opened: (.*) at \d\d?:\d\d/) }}
  • Close Alert
    • Filter: Subject contains "Alert Closed"
    • Alert Fields: "Alias" set to {{ subject.extract(/Closed: (.*) at \d\d?:\d\d/) }}

For reference, our alert notification subject line format is:
[Project Name] Alert Opened: [Alert Name] at 8:31 AM Wed Dec 7
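As a concrete illustration of the regex-based matching above, here is a minimal Python sketch that extracts an event type and alert name from that subject line format. The helper name and sample subject are hypothetical; the two regular expressions are the same ones used in the PagerDuty and OpsGenie examples.

```python
import re

# Patterns from the PagerDuty/OpsGenie examples above, applied to the
# subject format: [Project Name] Alert Opened: [Alert Name] at 8:31 AM Wed Dec 7
OPENED = re.compile(r"Opened: (.*) at \d\d?:\d\d")
CLOSED = re.compile(r"Closed: (.*) at \d\d?:\d\d")

def incident_key(subject):
    """Return (event, alert_name), or None if the subject doesn't match.

    Returning None for unmatched subjects mirrors the "ignore emails that
    don't match" rule, so test emails don't open spurious incidents.
    """
    for event, pattern in (("open", OPENED), ("close", CLOSED)):
        match = pattern.search(subject)
        if match:
            return event, match.group(1)
    return None

subject = "[My Project] Alert Opened: High Latency at 8:31 AM Wed Dec 7"
print(incident_key(subject))  # → ('open', 'High Latency')
```

Because the alert name is captured into the incident key, the open and close notifications for the same alert resolve to the same incident.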

HTTP Callback Specification

You can specify an endpoint URL when you create an alert; when an alert event occurs, an HTTP POST request will be issued to that URL.

The parameters are sent as application/x-www-form-urlencoded key/value pairs. The cause field contains JSON-encoded data that you may decode as necessary. A request is made to your URL whenever an OPEN, CLOSE, or TEST alert event occurs.


NOTE: If your server cannot be reached in 10 seconds, the request will time out and be abandoned.
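A callback receiver following the spec above could be sketched with Python's standard library as follows. Only the cause field is documented here, so the port and any other field names are assumptions; adapt the parsing to the actual payload you receive.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

def parse_callback(body):
    """Parse an application/x-www-form-urlencoded callback body.

    The cause field contains JSON-encoded data, so it is decoded here;
    all other fields are left as strings.
    """
    fields = {k: v[0] for k, v in parse_qs(body).items()}
    if "cause" in fields:
        fields["cause"] = json.loads(fields["cause"])
    return fields

class AlertCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_callback(self.rfile.read(length).decode("utf-8"))
        print("alert callback received:", fields)

        # Respond promptly: the request is abandoned after 10 seconds.
        self.send_response(200)
        self.end_headers()

# To run locally (port is an arbitrary choice):
#   HTTPServer(("", 8000), AlertCallbackHandler).serve_forever()
```

Keep the handler fast; do any slow processing after returning the 200 response so the request doesn't hit the 10 second timeout.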

CloudWatch Alerts

If you're using metrics from our CloudWatch integration in an alert, it's important to take the fetch interval settings into account. A longer fetch interval means a longer delay before complete metric data is available, which can produce false positives in alerts that use the standard 1 minute averaging window. For example, a metric with a 10 minute fetch interval used in an alert with a 1 minute averaging window will have no data 9 out of every 10 minutes, and is therefore likely to misfire.

To avoid problems with CloudWatch metrics in alerts:

  1. Use metrics with a 1 minute fetch interval in alerts.
  2. If that's not an option, you can set a larger expression averaging window. However, this can be tricky due to averages including many data points with a value of 0. If you need help, just contact support. We're happy to help!
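The arithmetic behind the false-positive scenario is simple enough to sketch; the numbers below mirror the 10 minute fetch interval example above.

```python
# A metric fetched from CloudWatch every 10 minutes, evaluated with a
# 1-minute averaging window, leaves most windows without any data.
fetch_interval_min = 10  # minutes between CloudWatch data points
window_min = 1           # alert averaging window, in minutes

empty_windows = fetch_interval_min - window_min
print(f"{empty_windows} of every {fetch_interval_min} windows have no data")
```

Widening the averaging window to at least the fetch interval guarantees each evaluation sees a data point, at the cost of the zero-value averaging caveat noted above.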

Questions? We can help!