Monitoring
DKP exports availability metrics for each node group (NodeGroup resource).
What information does Prometheus collect?
All node group metrics have the prefix d8_node_group_ in their names and carry a node_group_name label containing the name of the node group.
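The naming convention above can be illustrated with a minimal sketch that selects metrics by the d8_node_group_ prefix and groups them by the node_group_name label. It assumes the samples are available as text in the Prometheus exposition format. The sample values, the node group names ("worker", "system"), and the helper group_by_node_group are hypothetical and not part of DKP.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format; the values and
# the node group names ("worker", "system") are made up for illustration.
SAMPLE = """\
d8_node_group_ready{node_group_name="worker"} 3
d8_node_group_nodes{node_group_name="worker"} 4
d8_node_group_ready{node_group_name="system"} 2
some_other_metric{job="x"} 1
"""

PREFIX = "d8_node_group_"

# Matches lines of the form: metric_name{label="value",...} number
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def group_by_node_group(text):
    """Return {node_group_name: {metric_suffix: value}} for prefixed metrics."""
    result = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        # Skip lines that do not parse or do not carry the node group prefix.
        if not match or not match["name"].startswith(PREFIX):
            continue
        labels = dict(LABEL_RE.findall(match["labels"]))
        group = labels.get("node_group_name")
        if group is None:
            continue
        suffix = match["name"][len(PREFIX):]
        result[group][suffix] = float(match["value"])
    return dict(result)
```

Calling group_by_node_group(SAMPLE) keeps only the three prefixed samples and keys them by node group, so the unrelated metric is ignored.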
The following metrics are collected for each node group:
d8_node_group_ready: the number of nodes in the group that are in Ready status;
d8_node_group_nodes: the total number of nodes in the group (in any status);
d8_node_group_instances: the total number of instances in the group (in any status);
d8_node_group_desired: the desired (target) number of Machines objects in the group;
d8_node_group_min: the minimum number of instances in the group;
d8_node_group_max: the maximum number of instances in the group;
d8_node_group_up_to_date: the number of nodes in the group in up-to-date state;
d8_node_group_standby: the number of standby nodes in the group (see the standby parameter);
d8_node_group_has_errors: 1 if there are any errors in the node group.
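As a rough sketch of how the metrics listed above might be combined, the following function derives a simple summary for one node group from its metric values. The input shape (a dictionary keyed by metric name), the function name summarize_node_group, and the choice of the ready-to-total ratio as an indicator are assumptions made for illustration; DKP does not define this summary.

```python
def summarize_node_group(metrics):
    """Summarize one node group from a {metric_name: value} dictionary.

    The dictionary shape is a hypothetical convention for this example.
    Missing metrics are treated as 0.
    """
    nodes = metrics.get("d8_node_group_nodes", 0)
    ready = metrics.get("d8_node_group_ready", 0)
    return {
        # Share of nodes in Ready status; None when the group has no nodes.
        "ready_ratio": ready / nodes if nodes else None,
        # True only when the group is non-empty and every node is Ready.
        "all_ready": nodes > 0 and ready == nodes,
        # d8_node_group_has_errors is 1 when the group has any errors.
        "has_errors": metrics.get("d8_node_group_has_errors", 0) == 1,
    }
```

For example, a group reporting 3 ready nodes out of 4 with the error flag set would yield a ready ratio of 0.75, all_ready as False, and has_errors as True.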