From 0f5fd1edc0a91efea659c20f94295a1557accb2f Mon Sep 17 00:00:00 2001
From: Alexander Petermann <5159138+lexxxel@users.noreply.github.com>
Date: Mon, 18 May 2020 11:35:37 +0200
Subject: [PATCH] update documentation to add and remove nodes (#6095)

* update documentation to add and remove nodes

* add information about parameters to change when adding multiple etcd nodes

* add information about reset_nodes

* add documentation about adding existing nodes to etcd masters.
---
 docs/nodes.md | 162 +++++++++++++++++++++++++++++---------------------
 1 file changed, 93 insertions(+), 69 deletions(-)

diff --git a/docs/nodes.md b/docs/nodes.md
index 6eb987428..c8fe5bf93 100644
--- a/docs/nodes.md
+++ b/docs/nodes.md
@@ -2,6 +2,58 @@
 
 Modified from [comments in #3471](https://github.com/kubernetes-sigs/kubespray/issues/3471#issuecomment-530036084)
 
+## Limitation: Removal of first kube-master and etcd-master
+
+Currently you can't remove the first node in your kube-master and etcd-master list. If you still want to remove this node, you have to:
+
+### 1) Change order of current masters
+
+Modify the order of your master list by pushing your first entry to any other position, e.g. if you want to remove `node-1` from the following example:
+
+```yaml
+  children:
+    kube-master:
+      hosts:
+        node-1:
+        node-2:
+        node-3:
+    kube-node:
+      hosts:
+        node-1:
+        node-2:
+        node-3:
+    etcd:
+      hosts:
+        node-1:
+        node-2:
+        node-3:
+```
+
+Change your inventory to:
+
+```yaml
+  children:
+    kube-master:
+      hosts:
+        node-2:
+        node-3:
+        node-1:
+    kube-node:
+      hosts:
+        node-2:
+        node-3:
+        node-1:
+    etcd:
+      hosts:
+        node-2:
+        node-3:
+        node-1:
+```
+
+### 2) Upgrade the cluster
+
+Run `upgrade-cluster.yml` or `cluster.yml`. Now you are good to go on with the removal.
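+
+For example, assuming the sample inventory at `inventory/mycluster/hosts.yaml` (a placeholder, adjust to your own inventory):
+
+```sh
+ansible-playbook -i inventory/mycluster/hosts.yaml -b upgrade-cluster.yml
+```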
+
 ## Adding/replacing a worker node
 
 This should be the easiest.
@@ -10,19 +62,16 @@ This should be the easiest.
 
 ### 2) Run `scale.yml`
 
-You can use `--limit=node1` to limit Kubespray to avoid disturbing other nodes in the cluster.
+You can use `--limit=NODE_NAME` to limit Kubespray so it does not disturb other nodes in the cluster.
 
 Before using `--limit` run playbook `facts.yml` without the limit to refresh facts cache for all nodes.
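+
+For example, assuming the sample inventory at `inventory/mycluster/hosts.yaml` and a new node named `node-4` (both are placeholders):
+
+```sh
+# refresh the facts cache for all nodes first, without a limit
+ansible-playbook -i inventory/mycluster/hosts.yaml -b facts.yml
+# then limit the scale run to the new node
+ansible-playbook -i inventory/mycluster/hosts.yaml -b scale.yml --limit=node-4
+```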
 
-### 3) Drain the node that will be removed
-
-```sh
-kubectl drain NODE_NAME
-```
-
-### 4) Run the remove-node.yml playbook
+### 3) Remove an old node with remove-node.yml
 
 With the old node still in the inventory, run `remove-node.yml`. You need to pass `-e node=NODE_NAME` to the playbook to limit the execution to the node being removed.
+
+If the node you want to remove is not online, you should add `reset_nodes=false` to your extra-vars: `-e node=NODE_NAME -e reset_nodes=false`.
+Use this even when you remove other types of nodes, such as a master or etcd node.
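+
+For example (inventory path and node name are placeholders):
+
+```sh
+# drop the reset_nodes part if the node is still reachable
+ansible-playbook -i inventory/mycluster/hosts.yaml -b remove-node.yml \
+  -e node=node-4 -e reset_nodes=false
+```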
 
-### 5) Remove the node from the inventory
+### 4) Remove the node from the inventory
 
@@ -30,32 +79,9 @@ That's it.
 
 ## Adding/replacing a master node
 
-### 1) Recreate apiserver certs manually to include the new master node in the cert SAN field
+### 1) Run `cluster.yml`
 
-For some reason, Kubespray will not update the apiserver certificate.
-
-Edit `/etc/kubernetes/kubeadm-config.yaml`, include new host in `certSANs` list.
-
-Use kubeadm to recreate the certs.
-
-```sh
-cd /etc/kubernetes/ssl
-mv apiserver.crt apiserver.crt.old
-mv apiserver.key apiserver.key.old
-
-cd /etc/kubernetes
-kubeadm init phase certs apiserver --config kubeadm-config.yaml
-```
-
-Check the certificate, new host needs to be there.
-
-```sh
-openssl x509 -text -noout -in /etc/kubernetes/ssl/apiserver.crt
-```
-
-### 2) Run `cluster.yml`
-
-Add the new host to the inventory and run cluster.yml.
+Append the new host to the inventory and run `cluster.yml`. You can NOT use `scale.yml` for that.
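+
+For example, again assuming the sample inventory path (a placeholder):
+
+```sh
+ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml
+```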
 
-### 3) Restart kube-system/nginx-proxy
+### 2) Restart kube-system/nginx-proxy
 
@@ -68,64 +94,62 @@ docker ps | grep k8s_nginx-proxy_nginx-proxy | awk '{print $1}' | xargs docker r
 
-### 4) Remove old master nodes
+### 3) Remove old master nodes
 
-If you are replacing a node, remove the old one from the inventory, and remove from the cluster runtime.
-
-```sh
-kubectl drain NODE_NAME
-kubectl delete node NODE_NAME
-```
-
-After that, the old node can be safely shutdown. Also, make sure to restart nginx-proxy in all remaining nodes (step 3)
-
-From any active master that remains in the cluster, re-upload `kubeadm-config.yaml`
-
-```sh
-kubeadm config upload from-file --config /etc/kubernetes/kubeadm-config.yaml
-```
+With the old node still in the inventory, run `remove-node.yml`. You need to pass `-e node=NODE_NAME` to the playbook to limit the execution to the node being removed.
+If the node you want to remove is not online, you should add `reset_nodes=false` to your extra-vars.
 
-## Adding/Replacing an etcd node
+## Adding an etcd node
 
-You need to make sure there are always an odd number of etcd nodes in the cluster. In such a way, this is always a replace or scale up operation. Either add two new nodes or remove an old one.
+You need to make sure there are always an odd number of etcd nodes in the cluster. Because of this, it is always a replace or scale-up operation: either add two new nodes or remove an old one.
 
 ### 1) Add the new node running cluster.yml
 
 Update the inventory and run `cluster.yml` passing `--limit=etcd,kube-master -e ignore_assert_errors=yes`.
+If the node you want to add as an etcd node is already a worker or master node in your cluster, you have to remove it first using `remove-node.yml`.
 
 Run `upgrade-cluster.yml` also passing `--limit=etcd,kube-master -e ignore_assert_errors=yes`. This is necessary to update all etcd configuration in the cluster.
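+
+For example, assuming the sample inventory path (a placeholder):
+
+```sh
+ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml \
+  --limit=etcd,kube-master -e ignore_assert_errors=yes
+ansible-playbook -i inventory/mycluster/hosts.yaml -b upgrade-cluster.yml \
+  --limit=etcd,kube-master -e ignore_assert_errors=yes
+```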
 
-At this point, you will have an even number of nodes. Everything should still be working, and you should only have problems if the cluster decides to elect a new etcd leader before you remove a node. Even so, running applications should continue to be available.
+At this point, you will have an even number of nodes.
+Everything should still be working, and you should only have problems if the cluster decides to elect a new etcd leader before you remove a node.
+Even so, running applications should continue to be available.
 
-### 2) Remove an old etcd node
+If you add multiple etcd nodes with one run, you might want to append `-e etcd_retries=10` to increase the number of retries between each etcd node join.
+Otherwise the etcd cluster might still be processing the first join and fail on subsequent nodes. `etcd_retries=10` might work to join 3 new nodes.
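+
+For example, when joining three new etcd nodes in one run (paths and the retry value are illustrative):
+
+```sh
+ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml \
+  --limit=etcd,kube-master -e ignore_assert_errors=yes -e etcd_retries=10
+```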
 
-With the node still in the inventory, run `remove-node.yml` passing `-e node=NODE_NAME` as the name of the node that should be removed.
+## Removing an etcd node
 
-### 3) Make sure the remaining etcd members have their config updated
+### 1) Remove old etcd members from the cluster runtime
 
-In each etcd host that remains in the cluster:
-
-```sh
-cat /etc/etcd.env | grep ETCD_INITIAL_CLUSTER
-```
-
-Only active etcd members should be in that list.
-
-### 4) Remove old etcd members from the cluster runtime
-
-Acquire a shell prompt into one of the etcd containers and use etcdctl to remove the old member.
+Acquire a shell prompt into one of the etcd containers and use etcdctl to remove the old member. Do this on an etcd master that will not be removed.
 
 ```sh
 # list all members
 etcdctl member list
 
-# remove old member
+# run the removal for each member you want to pass to remove-node.yml in step 2
 etcdctl member remove MEMBER_ID
 # careful!!! if you remove a wrong member you will be in trouble
 
-# note: these command lines are actually much bigger, since you need to pass all certificates to etcdctl.
+# wait until you no longer get a 'Failed' output from
+etcdctl member list
+
+# note: if you are not inside an etcd container, these command lines are much longer, since you need to pass all certificates to etcdctl.
 ```
 
-### 5) Make sure the apiserver config is correctly updated
+You can get into an etcd container by running `docker exec -it $(docker ps --filter "name=etcd" --format "{{.ID}}") sh` on one of the etcd masters.
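+
+If you run etcdctl from the host instead of from inside a container, the commands need the full TLS options. A rough sketch, assuming the etcd v3 API and kubespray's default certificate directory `/etc/ssl/etcd/ssl/` (the exact file names depend on your node names and may differ):
+
+```sh
+ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/etc/ssl/etcd/ssl/ca.pem \
+  --cert=/etc/ssl/etcd/ssl/admin-node-2.pem \
+  --key=/etc/ssl/etcd/ssl/admin-node-2-key.pem \
+  member list
+```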
+
+### 2) Remove an old etcd node
+
+With the node still in the inventory, run `remove-node.yml` passing `-e node=NODE_NAME` as the name of the node that should be removed.
+If the node you want to remove is not online, you should add `reset_nodes=false` to your extra-vars.
+
+### 3) Make sure only remaining nodes are in your inventory
+
+Remove `NODE_NAME` from your inventory file.
 
-In every master node, edit `/etc/kubernetes/manifests/kube-apiserver.yaml`. Make sure only active etcd nodes are still present in the apiserver command line parameter `--etcd-servers=...`.
+### 4) Update Kubernetes and network configuration files with the valid list of etcd members
 
-### 6) Shutdown the old instance
+Run `cluster.yml` to regenerate the configuration files on all remaining nodes.
+
+### 5) Shutdown the old instance
+
+That's it.
-- 
GitLab