Alerts provide intelligent notifications of events in your application or infrastructure. For example, you could use alerts to notify your team when:
- Queues are close to backing up
- Database servers are low on memory
- Application latency is outside of normal ranges
- APIs are being abused by a customer (and which customer it is)
Alerts are defined by a conditional query language expression combined with notification settings. When the expression matches (returns true), you are notified through your selected channels.
Table of Contents
- Alert Expressions
- Alert Notifications
Alert expressions must contain a conditional operator such as >, >=, or <. An expression can be a simple comparison, or it can use query language wildcards to alert across multiple metrics, functions to transform data, and time shifting to alert based on historical data. See the query language expression documentation for full details.
system.?.disk.?.used_percent > 50
external.intercom.message_failed >= 1
ts_sum(track_processor.?.record) < ts_sum(track_processor.?.record) @ 3 days
To avoid evaluating expressions on incomplete datasets, expression queries are delayed for 5 minutes.
Email is the best method of integrating Instrumental alerts into Slack and other chat services. First, set up an email integration:
- For Slack, create an email app for your desired channel. We suggest defining the app with the name "Instrumental Alerts" and using this icon. You'll end up with an email address that looks like firstname.lastname@example.org.
- For Flowdock, go to the Settings page for your desired flow, then Integrations and create a new email integration. You'll end up with an email address that looks like email@example.com.
- For HipChat, use the Mailroom add-on. After setup, you'll have an email address that looks like: firstname.lastname@example.org.
To complete the integration, add the email address you created to the Email Notification List in the alert configuration.
Email is the best method of integrating Instrumental alerts into PagerDuty and other incident response tools. Just create a new email integration within your preferred service and add that email address to your Instrumental alert configuration (alert notifications can be sent to multiple email addresses).
If your service supports regex email parsing, it's easy to open and close issues as Instrumental alerts open and close. An example PagerDuty configuration would be:
Trigger an Alert if...
- Subject contains "Alert Opened"
- Incident Key matches this regular expression: "Opened: (.*) at \d\d?:\d\d"
Resolve an Alert if...
- Subject contains "Alert Closed"
- Incident Key matches this regular expression: "Closed: (.*) at \d\d?:\d\d"
- Ignore emails that don't match the above conditions. We send a test email each time you create a new alert. If you don't ignore unmatched emails, these test emails will open an issue in PagerDuty every time you create a new alert in Instrumental. Alternatively, you can create the alert and add the PagerDuty email address to the alert configuration after we send the initial test email.
For reference, our alert notification subject line format is:
[Project Name] Alert Opened: [Alert Name] at 8:31 AM Wed Dec 7
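To check the two regular expressions above before entering them into your incident response tool, you can run them against sample subject lines. The following is a minimal Python sketch, not part of Instrumental or PagerDuty. The helper name `incident_key` is hypothetical, the "Alert Closed" subject is assumed to mirror the documented "Alert Opened" format, and the use of a substring search is an assumption about how your service applies the pattern.

```python
import re

# Patterns copied from the example PagerDuty configuration above.
OPENED = re.compile(r"Opened: (.*) at \d\d?:\d\d")
CLOSED = re.compile(r"Closed: (.*) at \d\d?:\d\d")


def incident_key(subject):
    """Return (event, key) for an alert subject line, or None if neither pattern matches."""
    for event, pattern in (("opened", OPENED), ("closed", CLOSED)):
        match = pattern.search(subject)
        if match:
            # The captured group is the alert name, used as the incident key.
            return event, match.group(1)
    return None


# Subject taken from the documented format.
opened_subject = "[Project Name] Alert Opened: [Alert Name] at 8:31 AM Wed Dec 7"
# Assumption: the closed subject has the same shape with "Closed" in place of "Opened".
closed_subject = "[Project Name] Alert Closed: [Alert Name] at 9:02 AM Wed Dec 7"

print(incident_key(opened_subject))
print(incident_key(closed_subject))
```

Because both patterns capture the same alert name, the open and close emails produce the same incident key, which is what lets the service pair them.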
You can specify an endpoint URL when you create an alert. When an alert event occurs, an HTTP POST request will be issued to that URL.
The parameters will be sent as application/x-www-form-urlencoded string value pairs. The cause field will contain JSON encoded data you may decode as necessary. The following requests will be made to your URL when an open, close, or test alert event occurs, in that order:
alert_id=ALERT_ID& alert_config_id=ALERT_CONFIG_ID& project_id=PROJECT_ID& project_name=PROJECT_NAME& name=ALERT_CONFIG_NAME& opened_at=WHEN_ALERT_OPENED_UNIX_TIMESTAMP& cause=JSON_DICTIONARY_OF_METRICS_CAUSING_ALERT& state=open
alert_id=ALERT_ID& alert_config_id=ALERT_CONFIG_ID& project_id=PROJECT_ID& project_name=PROJECT_NAME& name=ALERT_CONFIG_NAME& opened_at=WHEN_ALERT_OPENED_UNIX_TIMESTAMP& closed_at=WHEN_ALERT_CLOSED_UNIX_TIMESTAMP& cause=JSON_DICTIONARY_OF_METRICS_CAUSING_ALERT& state=closed
alert_id=ALERT_ID& alert_config_id=ALERT_CONFIG_ID& project_id=PROJECT_ID& project_name=PROJECT_NAME& name=ALERT_CONFIG_NAME& opened_at=WHEN_ALERT_OPENED_UNIX_TIMESTAMP& closed_at=WHEN_ALERT_CLOSED_UNIX_TIMESTAMP& cause=EMPTY_JSON_DICTIONARY& state=test
NOTE: If your server cannot be reached in 10 seconds, the request will time out and be abandoned.
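On the receiving side, the request body can be decoded with any standard form parser, and the cause field then decoded as JSON. The following is a minimal Python sketch using only the standard library. The function name `parse_alert_payload` and all values in `example_body` are hypothetical, chosen to resemble a test event with an empty cause dictionary.

```python
import json
from urllib.parse import parse_qs


def parse_alert_payload(body):
    """Decode a form-encoded webhook body into a flat dict, with `cause` parsed as JSON."""
    parsed = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; each alert parameter appears once.
    fields = {key: values[0] for key, values in parsed.items()}
    # The cause field carries JSON encoded data; fall back to an empty dict if absent.
    fields["cause"] = json.loads(fields.get("cause") or "{}")
    return fields


# Hypothetical test-event body; the IDs, names, and timestamp are made up.
example_body = (
    "alert_id=12&alert_config_id=3&project_id=7&project_name=Demo"
    "&name=Disk+usage&opened_at=1481099460&cause=%7B%7D&state=test"
)

alert = parse_alert_payload(example_body)
print(alert["state"], alert["name"], alert["cause"])
```

Given the 10 second timeout, it is reasonable for a handler to acknowledge the request quickly and do any slow processing afterwards.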