Monitoring your clusters

When it comes to performance tuning, the right troubleshooting tools can save valuable time and money and prevent monitoring headaches. Without them, you could spend days or even weeks diagnosing complex issues.
There is no shortage of application performance management tools to choose from, but here at ReleaseHub, we use Datadog internally to monitor our Kubernetes clusters and applications, and we think you should use it too.
Datadog provides a 360-degree view into your infrastructure and applications and can quickly identify bottlenecks. You can even send yourself alerts via email, Slack, or other Datadog-supported channels.

Using Datadog to monitor ReleaseHub clusters

This guide walks through creating a Datadog monitor and alert so you can gain insight into your clusters.


You'll need a Datadog account to follow along.
You might like to familiarize yourself with Datadog and the Datadog interface by browsing through the Datadog documentation pages.
If you do not see the metrics described here, verify that the integrations are installed in your Datadog account and that your cluster is reporting to that account correctly. If the metrics are still missing, contact us and we'll be more than happy to help.

Select a metric

Your first step is to identify a metric that you would like to monitor. For example, you may notice that your application has been performing poorly, and you've traced this back to pod health checks failing. You investigate the metrics in the Kubernetes Pods dashboard and find an interesting rise in "Pods in Bad State (Not Ready) by Namespace".
This seems like a good metric to set an alert for.
Note that this metric is reported per pod and per namespace. Namespaces are how you separate your application environments and services in the ReleaseHub UI.

Create a monitor

Next you'll want to zero in on the metric. In the left-hand pane, select Monitor then New Monitor as shown:
Create a new monitor in Datadog
Select Metric to create a monitor based on the metric you are interested in.
Select a Metric type to create the monitor
For this example, we'll select a Threshold Alert, and then define the metric. If you're not sure which metric to choose, you can start typing metric names you see in the dashboard and experiment. We select kubernetes_state.pod.status_phase to monitor the "Pods in Bad State (Not Ready) by Namespace" metric we identified earlier.
We've marked the steps in red to show you our example.
Choose a threshold alert and a metric to alert on
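
The choices made in the form can also be written as a Datadog monitor query string, which is handy if you later want to keep monitor definitions in version control. The sketch below is illustrative only and is not ReleaseHub tooling: `build_monitor_query` is a hypothetical helper, and the tag names `phase` and `kube_namespace` are assumptions that vary with your Datadog Agent version, so compare them with the tags shown on your dashboard. The 15-minute window and the threshold of 0 are placeholder values that the next step sets.

```python
def build_monitor_query(metric, window="last_15m", time_aggr="max",
                        space_aggr="sum", filters=None, group_by=None,
                        comparator=">", threshold=0):
    """Assemble a Datadog metric monitor query string (hypothetical helper).

    Layout: time_aggr(window):space_aggr:metric{filters} by {groups} comparator threshold
    """
    # An empty filter list means "everything", which Datadog writes as {*}.
    scope = ",".join(filters) if filters else "*"
    query = f"{time_aggr}({window}):{space_aggr}:{metric}{{{scope}}}"
    if group_by:
        # Grouping produces one alert per distinct tag value (a multi alert).
        query += " by {" + ",".join(group_by) + "}"
    return f"{query} {comparator} {threshold}"


query = build_monitor_query(
    "kubernetes_state.pod.status_phase",
    filters=["phase:failed"],        # assumed tag name, check your dashboard
    group_by=["kube_namespace"],     # assumed tag name, check your dashboard
)
print(query)
```

Grouping by namespace means each application environment raises its own alert instead of all environments sharing a single one.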

Set the threshold

Fill in the rest of the fields as below. Here we're specifying that we'd like to be alerted when pod phase status is "failed" for more than 15 minutes. You can adjust the threshold and exclude clusters or namespaces you are not interested in.
Enter fields for your monitor trigger, threshold, and metrics
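
To make the 15-minute condition concrete, the sketch below mimics how a threshold alert evaluates one window of data points. It is a simplified model written for illustration, not Datadog's implementation, and the function name and sample values are made up. The four aggregations correspond to the "on average", "at least once", "at all times", and "in total" choices in the monitor form.

```python
from statistics import mean

# Map each time aggregation to the function applied over the window.
AGGREGATORS = {
    "avg": mean,  # "on average"
    "max": max,   # "at least once"
    "min": min,   # "at all times"
    "sum": sum,   # "in total"
}


def breaches(points, time_aggr, threshold):
    """Return True if the aggregated window is strictly above the threshold."""
    return AGGREGATORS[time_aggr](points) > threshold


# Hypothetical count of failed pods, one sample per minute for 15 minutes.
window = [0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

for name in AGGREGATORS:
    print(name, breaches(window, name, threshold=0))
```

With this sample, only `min` stays quiet, because the failed-pod count dropped to zero at some point in the window. Choose "at all times" if you only want an alert when pods stay failed for the whole 15 minutes, and "at least once" if a single failed sample is enough.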

Set the notification

You can choose any of the notification integrations Datadog supports.
Now complete the remaining fields to write the notification message, choose who is notified when the alert is triggered, and send a test notification.
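
For reference, here is a rough sketch of what the finished monitor might look like as a JSON payload, including a notification message that uses Datadog template variables. The helper names, the monitor name, the Slack channel, the email address, and the `phase` and `kube_namespace` tag names are all assumptions made for illustration; substitute the values from your own account.

```python
import json


def build_message(handles, group_tag="kube_namespace"):
    """Compose a monitor message using Datadog template variables."""
    # {{<tag>.name}} expands to the value of the tag the monitor is grouped by.
    namespace_var = "{{" + group_tag + ".name}}"
    lines = [
        "{{#is_alert}}Pods in namespace " + namespace_var
        + " are in a failed state.{{/is_alert}}",
        "{{#is_recovery}}Pods in namespace " + namespace_var
        + " have recovered.{{/is_recovery}}",
        # @-handles tell Datadog where to deliver the notification.
        " ".join(handles),
    ]
    return "\n".join(lines)


def build_monitor_payload(query, message, critical=0):
    """Build a monitor definition as a plain dictionary (hypothetical helper)."""
    return {
        "name": "Pods in failed phase by namespace",  # hypothetical name
        "type": "metric alert",
        "query": query,
        "message": message,
        "options": {"thresholds": {"critical": critical}},
    }


payload = build_monitor_payload(
    # Tag names in the query are assumptions; check your dashboard.
    query="max(last_15m):sum:kubernetes_state.pod.status_phase"
          "{phase:failed} by {kube_namespace} > 0",
    message=build_message(["@slack-alerts", "@oncall@example.com"]),
)
print(json.dumps(payload, indent=2))
```

A payload like this can be submitted to Datadog's create-monitor API endpoint (`POST /api/v1/monitor`) with your API and application keys. The form in the UI remains the simplest way to test a notification before saving.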