DigitalOcean Monitoring is a free, opt-in service that gathers metrics about Droplet-level resource utilization. It provides additional Droplet graphs and supports configurable metrics alert policies with integrated email and Slack notifications to help you track the operational health of your infrastructure.
Like other DigitalOcean Monitoring features, alert policies and notifications rely on information provided by the DigitalOcean metrics agent, a lightweight, open-source program that gathers metrics.
Before setting up alert policies, you must install the metrics agent on each of the participating Droplets. Kubernetes worker nodes, a kind of Droplet, already have the metrics agent by default.
Once the metrics agent is running, you can begin creating alert policies. In the control panel, click the Create button in the top menu and select Alert Policies.
This opens the policy creation page.
The pattern for defining an alert policy is the same for all metrics:
The Select Droplets or Tags section includes a field where you apply the alert policy to specific Droplets or groups of Droplets.
Adding Droplets by name allows you to target individual resources unambiguously. Adding tags to an alert policy provides flexibility in deciding which Droplets are covered by the policy by adding or removing tags from Droplets.
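The difference between targeting by name and targeting by tag can be sketched in a few lines of Python. This is only an illustration of the idea described above, not how DigitalOcean implements it: the `Droplet` class and the `covered_droplets` function are hypothetical names introduced here, and the sketch assumes a Droplet is covered if it is listed by name or carries at least one of the policy's tags.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Droplet:
    """Hypothetical stand-in for a Droplet: a name plus a set of tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def covered_droplets(droplets, names=(), tags=()):
    """Return the sorted names of Droplets matched by name or by any tag."""
    names = set(names)
    tags = set(tags)
    return sorted(d.name for d in droplets if d.name in names or d.tags & tags)


fleet = [
    Droplet("web-01", frozenset({"frontend"})),
    Droplet("db-01", frozenset({"database"})),
    Droplet("db-02", frozenset({"database"})),
]

# One Droplet is targeted by name, the others through a shared tag.
print(covered_droplets(fleet, names=["web-01"], tags=["database"]))
# → ['db-01', 'db-02', 'web-01']
```

Under this model, adding the `database` tag to another Droplet brings it under the policy without editing the policy itself, which is the flexibility that tags provide.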
To create a policy, you must select at least one of the two possible notification methods: email or Slack. The first choice, checked by default, is the verified email address of the account you’re using when you create the policy.
When you create an alert policy from a Team account, you’ll have the option to select any of your teammates as email recipients for an alert. When you click Add more recipients, you’ll be given a list of your team’s email addresses. Select each address individually or use the checkbox by the Team members header to select or deselect all the addresses on the list.
If you are part of a Slack organization, you can choose to connect your Slack account to receive notifications in Slack. Click the Connect Slack button to authorize DigitalOcean to create notifications within your Slack organization.
On the authorization page that follows, you can select any Slack teams you are authenticated to or log in to a different team.
You can then choose to notify Slackbot (which will send messages only to you), notify a channel, or notify any person or group through direct messages.
Once you’ve authorized the link between DigitalOcean Monitoring and a Slack team, that connection will be available and enabled by default the next time you create an alert policy. If you choose to unlink in a new alert policy, you’ll be able to select a different channel or a different team without affecting any previous connections.
Finally, choose a unique and descriptive name for the alert policy. This name will be used to identify this specific alert policy when notifications are sent.
The name you choose will:
When everything is configured, clicking the Create alert policy button will create the policy and kick off the evaluation of incoming data.
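The two requirements stated above (at least one notification method, and a name for the policy) can be expressed as a small pre-flight check. The sketch below is illustrative only: `AlertPolicyForm` and `validation_errors` are hypothetical names, and the control panel may enforce additional rules that are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlertPolicyForm:
    """Hypothetical model of the notification and name fields of the form."""
    name: str
    email_recipients: List[str] = field(default_factory=list)
    slack_channel: Optional[str] = None


def validation_errors(form):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    # At least one of the two notification methods must be selected.
    if not form.email_recipients and not form.slack_channel:
        errors.append("select at least one notification method: email or Slack")
    # The name identifies the policy in notifications, so it cannot be blank.
    if not form.name.strip():
        errors.append("choose a descriptive name for the alert policy")
    return errors


ok = AlertPolicyForm(name="Disk above 70%", email_recipients=["sammy@example.com"])
print(validation_errors(ok))  # → []
```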
The new policy will appear on the Monitoring page under a section called Alert Policies.
When a policy is first created, it may take a few minutes before the evaluation of incoming data begins. After that slight delay, data will be evaluated at regular intervals.
If the average of the data points in the alert interval exceeds the threshold, an alert is triggered. In our example, once monitoring begins, after 1440 minutes (one day), the monitoring will average out the data points over that period to determine the percentage of disk usage. If the average indicates that disk usage is above 70%, we would receive a notification.
This same data point evaluation process is used to determine when an alert has been resolved. Data points continue to be collected at regular intervals. Each time a new data point is received, the oldest point drops off, the newest is added, and the average over the alert interval is re-evaluated. This means that if a threshold was barely exceeded and a new data point brings the average below the threshold, a resolution notification could be triggered without much delay.
With our disk example, let’s say that a log rotation policy deletes an old and particularly large log file, causing disk usage to drop dramatically. Once the average falls below the threshold, we will receive the resolution notification in the same channels where we received the alert notification (unless, of course, we’ve edited the policy in the interim).
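The sliding-window evaluation described above can be sketched as follows. This is a minimal model under stated assumptions, not DigitalOcean's implementation: the `RollingAlert` class is a hypothetical name, the window is measured in a fixed number of data points rather than minutes, and no evaluation happens until the window is full.

```python
from collections import deque


class RollingAlert:
    """Track the average of the last `window_size` points against a threshold."""

    def __init__(self, threshold, window_size):
        self.threshold = threshold
        # A bounded deque drops the oldest point when a new one is appended.
        self.window = deque(maxlen=window_size)
        self.triggered = False

    def add(self, value):
        """Add a data point; return 'triggered', 'resolved', or None."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None  # assumption: wait until a full interval is collected
        average = sum(self.window) / len(self.window)
        if average > self.threshold and not self.triggered:
            self.triggered = True
            return "triggered"
        if average <= self.threshold and self.triggered:
            self.triggered = False
            return "resolved"
        return None  # no change in state, so nothing to report


alert = RollingAlert(threshold=70.0, window_size=3)
print([alert.add(v) for v in [68, 70, 75, 66, 60]])
# → [None, None, 'triggered', None, 'resolved']
```

In this run the third point raises the average to 71, which triggers the alert. The fourth point leaves the average just above 70, so nothing changes, and the fifth brings it down to 67, which resolves the alert.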
At this time, it is not possible to manually resolve or acknowledge an alert. Alerts are automatically resolved when resource usage falls back to an acceptable level according to the alert policy.
When an alert is triggered according to the process outlined above, a notification is sent using the chosen mediums. You will be notified once per configured medium when an alert has been triggered. A second notification is sent when the alert has been resolved.
Each notification includes the name of the alert, the name and IP address of the triggering Droplet, and a link to the triggering Droplet’s page in the control panel. Additionally, notifications about triggered alerts include the alert policy parameters and the average resource usage at the time the alert was triggered. Resolution notifications include the length of the alert event and the current average resource usage.
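The contents of the two kinds of notification can be summarized in a short sketch. The function name `build_notifications` and the dictionary keys are hypothetical, chosen only to mirror the fields listed above; the real messages are formatted by DigitalOcean. The sketch also reflects the rule that one notification is produced per configured medium.

```python
def build_notifications(event, policy, droplet, average, duration=None):
    """Return one notification dict per medium configured on the policy."""
    # Fields that every notification carries.
    common = {
        "alert": policy["name"],
        "droplet": droplet["name"],
        "ip": droplet["ip"],
        "link": f"https://cloud.digitalocean.com/droplets/{droplet['id']}",
    }
    if event == "triggered":
        # Triggered alerts add the policy parameters and the average usage.
        detail = {
            "threshold": policy["threshold"],
            "window": policy["window"],
            "average": average,
        }
    elif event == "resolved":
        # Resolutions add the length of the event and the current average.
        detail = {"duration": duration, "average": average}
    else:
        raise ValueError(f"unknown event: {event!r}")
    return [
        {"medium": medium, "event": event, **common, **detail}
        for medium in policy["mediums"]
    ]


policy = {
    "name": "CPU is running high",
    "threshold": 70.0,
    "window": "5m",
    "mediums": ["email", "slack"],
}
droplet = {"id": 12345, "name": "example_droplet", "ip": "203.0.113.1"}

for note in build_notifications("triggered", policy, droplet, average=71.56):
    print(note["medium"], note["alert"], note["average"])
```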
If an alert is triggered, a new section called Triggered Alerts is displayed in the Monitoring interface. This section is only visible when there are active alerts.
This section of the page displays the active alerts, including each of the Droplets that are currently above the usage threshold. Once the alert has been resolved, the entry will drop out of the Triggered Alerts section. If there are no longer any active alerts, the Triggered Alerts section will be hidden.
If you’ve selected email notifications, you will receive a notification email when an alert is triggered:
Subject: DigitalOcean monitoring triggered: CPU is running high - example_droplet

CPU Utilization Percent is currently at 71.56%, above setting of 70.00% for the last 5m
View droplet: https://cloud.digitalocean.com/droplets/12345
IP: 203.0.113.1
Edit monitor: https://cloud.digitalocean.com/monitors/b0fa6de7-00ex-ampl-e920-e52eeb35a903
Once the alert has been resolved, a similar resolution email will be sent:
Subject: DigitalOcean monitoring resolved: Disk Utilization is high on a server tagged 'Database' - Database-01

The monitor was triggered for more than 1 hour.
Disk Utilization is currently at 69.70%.
View droplet Database-01: https://cloud.digitalocean.com/droplets/12345678
IP: 203.0.113.1
This indicates that the alert has been resolved.
If you’ve enabled Slack notifications, you will receive a notification in Slack in the team and channel selected in the alert policy.
Once the average resource consumption has dipped below the threshold again, a similar Slack notification will be sent indicating that the alert has been resolved.
Again, this message indicates that the alert has been resolved.