High reliability and availability

Reliability and resilience are the characteristics that determine how stable a Kubernetes cluster's infrastructure is. Deckhouse Kubernetes Platform (DKP) provides high availability (HA) and fault tolerance through built-in mechanisms and modules.

When HA mode is enabled, critical cluster components run with enough redundancy to keep operating continuously: if a single instance fails, the remaining instances keep serving without downtime. For more information on enabling HA mode, refer to Managing HA mode.
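
As a rough illustration of what enabling HA mode explicitly might look like, the sketch below builds a manifest and prints it as JSON. The resource kind (ModuleConfig), the name `global`, the `highAvailability` setting, and the settings version are assumptions that this page does not state; verify them against Managing HA mode before use.

```python
import json


def ha_module_config(enabled: bool = True) -> dict:
    """Build a manifest that sets the cluster-wide HA switch.

    Assumption: the switch lives in the `global` ModuleConfig under
    spec.settings.highAvailability. Verify against your DKP release.
    """
    return {
        "apiVersion": "deckhouse.io/v1alpha1",
        "kind": "ModuleConfig",
        "metadata": {"name": "global"},
        "spec": {
            "version": 2,  # assumed settings schema version
            "settings": {"highAvailability": enabled},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(ha_module_config(), indent=2))
```

If these assumptions hold for your release, the printed manifest could be piped to `kubectl apply -f -`.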

If the cluster has more than one master node, HA mode is enabled automatically, both during the initial deployment and when master nodes are added later. For the recommended roles and number of nodes, refer to Recommendations for configuring cluster nodes and preventing overload.

DKP provides chaos engineering tools to test cluster resilience. These tools let you deliberately or randomly disrupt individual components and observe the infrastructure’s response. For information on configuring these tools, refer to Chaos engineering.
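
The idea behind such a test can be sketched as a small simulation: terminate one randomly chosen replica and check whether anything is left to serve requests. This is a conceptual model only, not the DKP tooling; the function and replica names are hypothetical.

```python
import random


def kill_random_replica(replicas: set, rng: random.Random) -> str:
    """Remove one randomly chosen replica from the set and return its name."""
    victim = rng.choice(sorted(replicas))  # sort for reproducible choice
    replicas.discard(victim)
    return victim


def survives_single_failure(replica_count: int, seed: int = 0) -> bool:
    """Return True if at least one replica remains after one random failure."""
    rng = random.Random(seed)
    replicas = {f"replica-{i}" for i in range(replica_count)}
    kill_random_replica(replicas, rng)
    return len(replicas) > 0


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "replica(s):", survives_single_failure(count))
```

A component with a single replica fails this check, while any component with two or more replicas passes it, which is the property HA mode is meant to provide.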

You can further increase cluster fault tolerance by enabling inter-cluster communication based on the Service Mesh mode of the istio module. In this mode, federation is configured between multiple clusters. If one cluster fails, its load is automatically redistributed to the others. For configuration details, refer to Federation.
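
The redistribution described above can be modeled in a few lines: drop the failed clusters and rescale the remaining traffic shares so they sum to 1 again. This is a simplified illustration, not the way the istio module actually computes routing; the cluster names and the `redistribute` helper are hypothetical.

```python
def redistribute(weights: dict, healthy: set) -> dict:
    """Rescale traffic shares across the clusters that are still healthy.

    weights maps a cluster name to its share of traffic;
    healthy is the set of cluster names currently able to serve.
    """
    alive = {name: w for name, w in weights.items() if name in healthy}
    total = sum(alive.values())
    if total == 0:
        raise RuntimeError("no healthy cluster left to take the load")
    return {name: w / total for name, w in alive.items()}


if __name__ == "__main__":
    shares = {"cluster-a": 0.5, "cluster-b": 0.25, "cluster-c": 0.25}
    # cluster-a fails; its share is split between the two survivors.
    print(redistribute(shares, healthy={"cluster-b", "cluster-c"}))
```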

Additional resources

Module istio documentation
