Understanding the System Metrics for Monitoring (AWS)
Qubole clusters support Datadog monitoring, which you can enable at the QDS account level. For more information on enabling Datadog in Control Panel > Account Settings, see Datadog Settings.
The following table lists the system metrics that are published to the Datadog account.
System Metrics | Metrics Definition |
---|---|
disk_free | Total free disk space |
disk_total | Total disk space |
part_max_used | Maximum percent used on any single disk partition. |
load_one | Load Average over 1 minute |
load_five | Load Average over 5 minutes |
load_fifteen | Load Average over 15 minutes |
cpu_user | Percentage of CPU utilization while executing at the user level. |
cpu_system | Percentage of CPU utilization while executing at the system level. |
cpu_wio | Percentage of CPU time spent waiting for I/O to complete. |
cpu_nice | Percentage of CPU cycles spent on nice processes. |
cpu_steal | Stolen time, which is the time spent in other operating systems when running in a virtualized environment. |
cpu_aidle | Percentage of CPU cycles spent idle since last boot. |
cpu_idle | Percentage of CPU idle time. |
cpu_report | Aggregate report of CPU utilization percentage. |
mem_report | Aggregate report of memory usage in bytes. |
load_report | Aggregate report with current load, number of running processes, nodes, and CPU count. |
network_report | Aggregate report with network traffic in and out of the cluster nodes. |
cluster-addnodefailure | The node addition metric to monitor the autoscaling feature. |
cluster-removenodefailure | The node removal metric to monitor the downscaling/autoscaling events in a cluster. |
qubole.cluster_size | The metric displays the minimum size of a cluster. |
qubole.max_cluster_size | The metric displays the maximum size of a cluster. |
system-rootdiskfullmaster | The metric displays the disk space in the coordinator node’s root partition. |
system-ephemeral0fullmaster | The metric displays the disk space in the coordinator node’s ephemeral0 partition. |
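
As an illustration of how these metrics might be combined once they have been read from Datadog, the following minimal Python sketch derives a disk usage percentage from `disk_free` and `disk_total` and compares `load_five` against the node's CPU count. The function names, the shape of the `sample` dictionary, and the threshold values are hypothetical and are not part of Qubole or Datadog. The sketch also assumes that `disk_free` and `disk_total` are reported in the same unit.

```python
def disk_used_percent(disk_free: float, disk_total: float) -> float:
    """Return the percentage of disk space in use.

    Assumes disk_free and disk_total are expressed in the same unit.
    """
    if disk_total <= 0:
        raise ValueError("disk_total must be positive")
    return 100.0 * (disk_total - disk_free) / disk_total


def check_node(sample: dict, cpu_count: int,
               disk_limit: float = 90.0, load_limit: float = 1.0) -> list:
    """Return a list of warnings for one sample of node metrics.

    `sample` is a hypothetical dict keyed by metric name. The default
    thresholds are illustrative assumptions, not recommended values.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")

    warnings = []

    # Disk pressure: share of total disk space that is no longer free.
    used = disk_used_percent(sample["disk_free"], sample["disk_total"])
    if used > disk_limit:
        warnings.append(f"disk usage {used:.1f}% exceeds {disk_limit:.1f}%")

    # CPU pressure: 5-minute load average normalized by the CPU count.
    load_per_cpu = sample["load_five"] / cpu_count
    if load_per_cpu > load_limit:
        warnings.append(
            f"load_five per CPU {load_per_cpu:.2f} exceeds {load_limit:.2f}"
        )

    return warnings


if __name__ == "__main__":
    sample = {"disk_free": 5.0, "disk_total": 100.0, "load_five": 2.0}
    for warning in check_node(sample, cpu_count=4):
        print(warning)
```

Normalizing the load average by CPU count makes the same threshold usable across instance types of different sizes. In practice, such checks would more likely be configured as Datadog monitors than run as a separate script.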