Alerts

Alerts provide intelligent notifications of events in your application or infrastructure. For example, you could use alerts to notify your team when:

  • Queues are close to backing up
  • Database servers are low on memory
  • Application latency is outside of normal ranges
  • APIs are being abused by a customer (and which customer it is)

Alerts are defined by a conditional query language expression combined with notification settings. When the expression matches (returns true), you are notified via your selected channels.

Alert Expressions

Alert expressions must contain a conditional operator: <=, >=, ==, !=, <, or >. An expression can be a simple comparison, or it can use query language wildcards to alert across multiple metrics, functions to transform data, and time shifting to alert based on historical data. See the query language expression documentation for full details.

Example Expressions

  • system.?.disk.?.used_percent > 50
  • external.intercom.message_failed >= 1
  • ts_sum(track_processor.?.record) < ts_sum(track_processor.?.record) @ 3 days
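
The conditional-operator rule can be illustrated with a small Python sketch that splits an expression into its left side, operator, and right side. `split_condition` is a hypothetical helper, not part of Instrumental; it assumes an expression holds exactly one operator and does not attempt to interpret wildcards, functions, or time shifts.

```python
import re

# The six conditional operators listed above. Two-character operators come
# first so that "<=" is matched whole instead of as "<".
CONDITION = re.compile(r"(<=|>=|==|!=|<|>)")


def split_condition(expression: str) -> tuple[str, str, str]:
    """Split an alert expression into (left side, operator, right side).

    Hypothetical helper: raises ValueError unless the expression contains
    exactly one conditional operator (an assumption, not a documented rule).
    """
    operators = CONDITION.findall(expression)
    if len(operators) != 1:
        raise ValueError(f"expected one conditional operator, found {operators!r}")
    # Splitting on a capturing group keeps the operator in the result.
    left, op, right = CONDITION.split(expression)
    return left.strip(), op, right.strip()


for expr in [
    "system.?.disk.?.used_percent > 50",
    "external.intercom.message_failed >= 1",
    "ts_sum(track_processor.?.record) < ts_sum(track_processor.?.record) @ 3 days",
]:
    print(split_condition(expr))
```

Note that the time shift (@ 3 days) simply stays part of the right-hand side; the sketch only locates the comparison.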

Evaluation Timing

To avoid evaluating expressions against incomplete data, expression queries run on a 5-minute delay.
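
As a rough illustration, and assuming the delay is applied as a fixed offset from the current time (the exact windowing is not documented here), an evaluation that runs at 8:31 only considers data up to 8:26. `latest_evaluated_time` is a hypothetical helper, not part of Instrumental.

```python
from datetime import datetime, timedelta

# Delay applied before an expression query runs, per the text above.
EVALUATION_DELAY = timedelta(minutes=5)


def latest_evaluated_time(now: datetime) -> datetime:
    """Return the newest point in time an evaluation run at `now` looks at.

    Hypothetical helper: assumes the delay is a fixed offset from `now`.
    """
    return now - EVALUATION_DELAY


print(latest_evaluated_time(datetime(2016, 12, 7, 8, 31)))  # 2016-12-07 08:26:00
```

In practice, expect a notification to trail the underlying event by roughly five minutes or more.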

Alert Notifications

Notifications are available via email, SMS, or an HTTP callback. External services such as Slack and PagerDuty are easily added via the email channel.

Integrating with Slack, Flowdock, and other chat services

Email is the best method of integrating Instrumental alerts into Slack and other chat services. First, set up an email integration:

  • For Slack, create an email app for your desired channel. We suggest defining the app with the name "Instrumental Alerts" and using this icon. You'll end up with an email address that looks like random-text@your-org.slack.com.
  • For Flowdock, go to the Settings page for your desired flow, then Integrations and create a new email integration. You'll end up with an email address that looks like instrumental@your-org.flowdock.com.
  • For HipChat, use the Mailroom add-on. After setup, you'll have an email address that looks like key-text+your-room@in.mailroom.hipch.at.

To complete the integration, add the email address you created to the Email Notification List in the alert configuration.

Integrating with PagerDuty, OpsGenie, and other incident management services

Email is the best method of integrating Instrumental alerts into PagerDuty and other incident response tools. Just create a new email integration within your preferred service and add that email address to your Instrumental alert configuration (alert notifications can be sent to multiple email addresses).

If your service supports regex email parsing, it's easy to open and close issues as Instrumental alerts open and close. An example PagerDuty configuration would be:

  • Trigger an Alert if...
    • Subject contains "Alert Opened"
    • Incident Key matches this regular expression: "Opened: (.*) at \d\d?:\d\d"
  • Resolve an Alert if...
    • Subject contains "Alert Closed"
    • Incident Key matches this regular expression: "Closed: (.*) at \d\d?:\d\d"
  • Ignore emails that don't match the above conditions. We send a test email each time you create a new alert. If you don't ignore unmatched emails, these test emails will open an issue in PagerDuty every time you create a new alert in Instrumental. Alternatively, you can create the alert and add the PagerDuty email address to the alert configuration after we send the initial test email.

For reference, our alert notification subject line format is:
[Project Name] Alert Opened: [Alert Name] at 8:31 AM Wed Dec 7
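
The two incident-key patterns can be checked locally before you enter them in PagerDuty. This Python sketch assumes closing emails use the same subject format as opening ones, with "Opened" replaced by "Closed"; the project and alert names in the sample subjects are made up, and `incident_key` is a hypothetical helper.

```python
import re
from typing import Optional

# The incident-key patterns from the PagerDuty example above.
OPENED = re.compile(r"Opened: (.*) at \d\d?:\d\d")
CLOSED = re.compile(r"Closed: (.*) at \d\d?:\d\d")


def incident_key(subject: str) -> Optional[str]:
    """Return the captured alert name, or None if neither pattern matches."""
    for pattern in (OPENED, CLOSED):
        match = pattern.search(subject)
        if match:
            return match.group(1)
    return None


# Hypothetical subjects following the documented format.
opened = "[Web] Alert Opened: Disk usage high at 8:31 AM Wed Dec 7"
closed = "[Web] Alert Closed: Disk usage high at 9:02 AM Wed Dec 7"
print(incident_key(opened), "|", incident_key(closed))
```

Because both patterns capture only the alert name, an open email and its matching close email produce the same incident key, which is what lets PagerDuty resolve the incident it opened.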

HTTP Callback Specification

When you create an alert, you can specify an endpoint URL. When the alert fires, an HTTP POST request is sent to that URL.

The parameters are sent as application/x-www-form-urlencoded key-value pairs. The cause field contains JSON-encoded data that you can decode as needed. The following requests are made to your URL when an OPEN, CLOSE, or TEST alert event occurs.

Open

alert_id=ALERT_ID&
alert_config_id=ALERT_CONFIG_ID&
project_id=PROJECT_ID&
project_name=PROJECT_NAME&
name=ALERT_CONFIG_NAME&
opened_at=WHEN_ALERT_OPENED_UNIX_TIMESTAMP&
cause=JSON_DICTIONARY_OF_METRICS_CAUSING_ALERT&
state=open

Close

alert_id=ALERT_ID&
alert_config_id=ALERT_CONFIG_ID&
project_id=PROJECT_ID&
project_name=PROJECT_NAME&
name=ALERT_CONFIG_NAME&
opened_at=WHEN_ALERT_OPENED_UNIX_TIMESTAMP&
closed_at=WHEN_ALERT_CLOSED_UNIX_TIMESTAMP&
cause=JSON_DICTIONARY_OF_METRICS_CAUSING_ALERT&
state=closed

Test

alert_id=ALERT_ID&
alert_config_id=ALERT_CONFIG_ID&
project_id=PROJECT_ID&
project_name=PROJECT_NAME&
name=ALERT_CONFIG_NAME&
opened_at=WHEN_ALERT_OPENED_UNIX_TIMESTAMP&
closed_at=WHEN_ALERT_CLOSED_UNIX_TIMESTAMP&
cause=EMPTY_JSON_DICTIONARY&
state=test
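
A receiving endpoint needs to URL-decode the body and then JSON-decode the cause field. The sketch below does both with the Python standard library and converts the timestamp fields to UTC datetimes. The helper name `parse_alert_callback`, the sample values, and the shape of the cause dictionary (metric name to value) are illustrative assumptions.

```python
import json
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode


def parse_alert_callback(body: str) -> dict:
    """Decode an application/x-www-form-urlencoded alert callback body."""
    # parse_qs maps each key to a list of values; every alert field appears once.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    # The cause field carries JSON-encoded data.
    fields["cause"] = json.loads(fields.get("cause", "{}"))
    # opened_at / closed_at are Unix timestamps; turn them into UTC datetimes.
    for key in ("opened_at", "closed_at"):
        if key in fields:
            fields[key] = datetime.fromtimestamp(float(fields[key]), tz=timezone.utc)
    return fields


# Hypothetical OPEN request body; IDs, names, and cause contents are made up.
body = urlencode({
    "alert_id": "101",
    "alert_config_id": "7",
    "project_id": "3",
    "project_name": "Web",
    "name": "Disk usage high",
    "opened_at": "1481099460",
    "cause": json.dumps({"system.web1.disk.sda.used_percent": 87.5}),
    "state": "open",
})
alert = parse_alert_callback(body)
print(alert["state"], alert["name"], alert["opened_at"], alert["cause"])
```

Branching on the state field (open, closed, or test) lets one endpoint handle all three request types.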

NOTE: If your server cannot be reached in 10 seconds, the request will time out and be abandoned.
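
Because slow endpoints are abandoned, a receiver should acknowledge the request before doing any heavy processing. Here is a minimal sketch using Python's built-in `http.server`, assuming the 10-second limit covers the whole request including your response. The handler class, the `serve` function, and port 8080 are hypothetical choices; a production receiver would hand the alert to a queue or background worker rather than just log it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


class AlertCallbackHandler(BaseHTTPRequestHandler):
    """Accept alert callbacks and acknowledge them immediately."""

    def do_POST(self) -> None:
        # Read the form-encoded body; each alert field appears once.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        fields = {key: values[0] for key, values in parse_qs(body).items()}
        cause = json.loads(fields.get("cause", "{}"))

        # Send the 200 response first so the caller is not kept waiting.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

        # Placeholder for real work (e.g. enqueueing); this sketch only logs to stderr.
        self.log_message(
            "alert %s is %s, cause: %s",
            fields.get("alert_id"), fields.get("state"), cause,
        )


def serve(port: int = 8080) -> None:
    """Listen on all interfaces until interrupted (port 8080 is arbitrary)."""
    HTTPServer(("", port), AlertCallbackHandler).serve_forever()
```

Calling `serve()` listens on port 8080 until interrupted. Python's `http.server` is intended for development, so for anything beyond testing, the same logic belongs behind a production web server or framework.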