kubernetes-reliability.md

    Overview

    Distributed systems such as Kubernetes are designed to be resilient to failures. More details about Kubernetes High-Availability (HA) may be found at Building High-Availability Clusters.

    To keep the picture simple, most parts of HA are skipped here, and only Kubelet<->Controller Manager communication is described.

    By default the normal behavior looks like:

    1. Kubelet updates its status to the apiserver periodically, as specified by --node-status-update-frequency. The default value is 10s.

    2. Kubernetes controller manager checks the statuses of Kubelet every --node-monitor-period. The default value is 5s.

    3. If the status was updated within --node-monitor-grace-period, Kubernetes controller manager considers the Kubelet healthy. The default value is 40s.
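
    The three steps above can be sketched as a simple timing check. This is only an illustration, not the controller manager's code: the helper isNodeHealthy and the constant names are hypothetical, and the values are the defaults quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults quoted in the text; the constant names are illustrative only.
const (
	nodeStatusUpdateFrequency = 10 * time.Second // --node-status-update-frequency
	nodeMonitorPeriod         = 5 * time.Second  // --node-monitor-period
	nodeMonitorGracePeriod    = 40 * time.Second // --node-monitor-grace-period
)

// isNodeHealthy is a hypothetical helper: the node counts as healthy while
// the age of its last status update does not exceed the grace period.
func isNodeHealthy(sinceLastUpdate, grace time.Duration) bool {
	return sinceLastUpdate <= grace
}

func main() {
	// How many Kubelet updates fit into one grace period with the defaults.
	fmt.Println("status updates expected per grace period:",
		int(nodeMonitorGracePeriod/nodeStatusUpdateFrequency))

	// Simulate a Kubelet that went silent: the controller manager looks at
	// the age of the last update once per monitor period.
	for age := nodeMonitorPeriod; age <= 50*time.Second; age += nodeMonitorPeriod {
		fmt.Printf("age=%v healthy=%v\n", age, isNodeHealthy(age, nodeMonitorGracePeriod))
	}
}
```

    With the defaults, a silent Kubelet is still treated as healthy on every check until the age of its last update exceeds 40s.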

    Kubernetes controller manager and Kubelet work asynchronously. This means that the delay may include network latency, API Server latency, etcd latency, latency caused by load on the master nodes, and so on. So if --node-status-update-frequency is set to 5s, in reality the update may appear in etcd after 6-7 seconds, or even longer when etcd cannot commit data to quorum nodes.

    Failure

    Kubelet will try to make nodeStatusUpdateRetry post attempts. Currently nodeStatusUpdateRetry is a constant set to 5 in kubelet.go.

    Kubelet tries to update the status in the tryUpdateNodeStatus function. Kubelet uses Go's http.Client, but no timeout is specified. Thus there may be some glitches when the API Server is overloaded while the TCP connection is being established.
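
    To illustrate why a missing timeout matters, the sketch below sets an explicit Timeout on Go's http.Client and calls a deliberately slow server. It is not Kubelet code: requestWithTimeout and newSlowServer are hypothetical helpers, the slow test server stands in for an overloaded API Server, and the durations are arbitrary assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Arbitrary values chosen for the illustration.
const (
	serverDelay  = 500 * time.Millisecond // how long the "overloaded" server takes
	shortTimeout = 100 * time.Millisecond
	longTimeout  = 2 * time.Second
)

// requestWithTimeout is a hypothetical stand-in for a status update request.
// It returns an error if the server does not answer within timeout.
func requestWithTimeout(url string, timeout time.Duration) error {
	// A zero Timeout on http.Client means no timeout at all.
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// newSlowServer simulates an overloaded API Server that answers after delay.
func newSlowServer(delay time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newSlowServer(serverDelay)
	defer srv.Close()

	fmt.Println("short timeout:", requestWithTimeout(srv.URL, shortTimeout))
	fmt.Println("long timeout: ", requestWithTimeout(srv.URL, longTimeout))
}
```

    With the short timeout the request fails fast instead of hanging; with no timeout, the caller would simply wait for as long as the server takes.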

    So, in every --node-status-update-frequency interval, Kubelet makes up to nodeStatusUpdateRetry attempts to set the status of the node.
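
    The retry behavior described in this section can be sketched as a bounded loop. This is a simplified illustration under assumptions, not a copy of kubelet.go: the try callback is a hypothetical stand-in for tryUpdateNodeStatus, and errUnavailable is an invented error used only for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeStatusUpdateRetry mirrors the constant described in the text.
const nodeStatusUpdateRetry = 5

// errUnavailable is an invented error simulating an unreachable API Server.
var errUnavailable = errors.New("apiserver unavailable")

// updateNodeStatus calls try up to nodeStatusUpdateRetry times and stops at
// the first success. It reports how many attempts were made.
func updateNodeStatus(try func(attempt int) error) (attempts int, err error) {
	for i := 0; i < nodeStatusUpdateRetry; i++ {
		attempts++
		if err = try(i); err == nil {
			return attempts, nil
		}
	}
	return attempts, fmt.Errorf("update node status exceeds retry count: %w", err)
}

func main() {
	// The first two attempts fail, the third succeeds.
	flaky := func(attempt int) error {
		if attempt < 2 {
			return errUnavailable
		}
		return nil
	}
	attempts, err := updateNodeStatus(flaky)
	fmt.Println("flaky:", attempts, err)

	// Every attempt fails, so the retry budget is exhausted.
	alwaysDown := func(int) error { return errUnavailable }
	attempts, err = updateNodeStatus(alwaysDown)
	fmt.Println("down: ", attempts, err)
}
```

    Because each attempt may itself block on a request without a timeout, the time spent in one round of retries is not bounded by the retry count alone.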