Monitoring

For node groups (NodeGroup resource), DKP exports availability metrics for the group.

What information does Prometheus collect?

All node group metrics have the prefix d8_node_group_ in their name, and a label with the node group’s name node_group_name.

The following metrics are collected for each node group:

  • d8_node_group_ready — the number of nodes in the group that are in Ready status;
  • d8_node_group_nodes — the total number of nodes in the group (in any status);
  • d8_node_group_instances — the total number of instances in the group (in any status);
  • d8_node_group_desired — the desired (target) number of Machines objects in the group;
  • d8_node_group_min — the minimum number of instances in the group;
  • d8_node_group_max — the maximum number of instances in the group;
  • d8_node_group_up_to_date — the number of nodes in the group in up-to-date state;
  • d8_node_group_standby — the number of standby nodes in the group (see the standby parameter);
  • d8_node_group_has_errors — one if there are any errors in the node group.