Having a single master in a Kubernetes cluster is all very well and good, but if that master goes down the cluster cannot schedule new work. Existing pods keep running, but no new ones can be scheduled and any pods that die will not be rescheduled.
Running multiple masters gives you resiliency: the remaining masters pick up when one goes down. However, as I found out, setting up multi-master was quite problematic. The guide here only got me part of the way, so after trashing both my own and my company's test cluster, I have expanded on the linked guide.
First, add the subnet details for the new zone into your cluster definition: the CIDR, the subnet id, and a name you will remember. For simplicity, I called mine eu-west-2c. If you have utility subnets defined (and you will if you use a bastion), make sure you also define a utility subnet for the new AZ.
kops edit cluster --state s3://bucket
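The new entries in the subnets section end up looking something like this (the CIDRs are illustrative, the id line only applies if you are pointing kops at an existing subnet, and the Private/Utility types assume a private topology with a bastion):
subnets:
- cidr: 10.10.96.0/19
  id: subnet-xxxxxxxx # only if reusing an existing subnet
  name: eu-west-2c
  type: Private
  zone: eu-west-2c
- cidr: 10.10.112.0/22
  name: utility-eu-west-2c
  type: Utility
  zone: eu-west-2c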
Now, create your master instance groups. You need an odd number of masters to maintain quorum and avoid split brain (I'm not saying prevent; there are edge cases where it is still possible even with quorum). I'm going to add eu-west-2b and eu-west-2c. AWS recently introduced the third London zone, so I'm going to use that.
kops create instancegroup master-eu-west-2b --subnet eu-west-2b --role Master
Make this one have a max/min of 1
kops create instancegroup master-eu-west-2c --subnet eu-west-2c --role Master
Make this one have a max/min of 0 (yes, zero) for now
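For reference, each new instance group spec ends up looking roughly like this (the machineType is a placeholder, and minSize/maxSize should be 1 for master-eu-west-2b and 0 for master-eu-west-2c for now):
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: clustername
  name: master-eu-west-2b
spec:
  machineType: m4.large
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-2b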
Reference these in your cluster config
kops edit cluster --state=s3://bucket
etcdClusters:
- etcdMembers:
  - instanceGroup: master-eu-west-2a
    name: a
  - instanceGroup: master-eu-west-2b
    name: b
  - instanceGroup: master-eu-west-2c
    name: c
  name: main
- etcdMembers:
  - instanceGroup: master-eu-west-2a
    name: a
  - instanceGroup: master-eu-west-2b
    name: b
  - instanceGroup: master-eu-west-2c
    name: c
  name: events
Start the new master
kops update cluster --state s3://bucket --yes
Find the etcd and etcd-events pods and add their names to this script. Change "clustername" to the name of your cluster, then run it. Confirm that both member lists now show two members (in my case etcd-a and etcd-b).
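If you need to look up the pod names first, they follow the etcd-server naming, so something like this will list them:
kubectl --namespace=kube-system get pods | grep etcd-server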
ETCPOD=etcd-server-ip-10-10-10-226.eu-west-2.compute.internal
ETCEVENTSPOD=etcd-server-events-ip-10-10-10-226.eu-west-2.compute.internal
AZ=b
CLUSTER=clustername

kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member add etcd-$AZ http://etcd-$AZ.internal.$CLUSTER:2380
kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-$AZ http://etcd-events-$AZ.internal.$CLUSTER:2381

echo Member Lists
kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member list
kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member list
(NOTE: the cluster will break at this point, because etcd now expects a second member that is not running yet)
Wait for the new master to show as initialised. Find the instance id of the master and put it into this script, and change AWSSWITCHES to whatever switches you need to pass to the AWS CLI (in my case, a profile and region).
The script will run and output the status of the instance until it shows “ok”
AWSSWITCHES="--profile personal --region eu-west-2"
INSTANCEID=master2instanceid

while [ "$(aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text | grep SYSTEMSTATUS | cut -f 2)" != "ok" ]
do
  sleep 5s
  aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text | grep SYSTEMSTATUS | cut -f 2
done
aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text | grep SYSTEMSTATUS | cut -f 2
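If you are not sure how to find the instance id, something like this should return it, assuming kops's usual Name tag of master-<az>.masters.<clustername>:
aws $AWSSWITCHES ec2 describe-instances \
  --filters "Name=tag:Name,Values=master-eu-west-2b.masters.clustername" "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" --output text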
ssh into the new master (or via bastion if needed)
sudo -i
systemctl stop kubelet
systemctl stop protokube
Edit /etc/kubernetes/manifests/etcd.manifest and /etc/kubernetes/manifests/etcd-events.manifest. In both files, change the ETCD_INITIAL_CLUSTER_STATE value from new to existing, and under ETCD_INITIAL_CLUSTER remove the third master definition.
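After editing, the relevant environment variables in etcd.manifest should look roughly like this (hostnames are illustrative; etcd-events.manifest has the equivalent etcd-events names on port 2381):
- name: ETCD_INITIAL_CLUSTER_STATE
  value: existing
- name: ETCD_INITIAL_CLUSTER
  value: etcd-a=http://etcd-a.internal.clustername:2380,etcd-b=http://etcd-b.internal.clustername:2380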
Stop the etcd docker containers
docker stop $(docker ps | grep "etcd" | awk '{print $1}')
Run this a few times until docker returns an error complaining that it needs at least one container name, which means no etcd containers are left running.
There are two volumes mounted under /mnt/master-vol-xxxxxxxx: one contains /var/etcd/data-events/member/ and the other /var/etcd/data/member/, with the exact paths varying because of the volume ids. Remove the member directory from both:
rm -r /mnt/master-vol-xxxxxx/var/etcd/data-events/member/
rm -r /mnt/master-vol-xxxxxx/var/etcd/data/member/
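If you want to double-check which mount holds which data directory before deleting anything, a quick listing helps:
ls -d /mnt/master-vol-*/var/etcd/*/member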
Now start kubelet
systemctl start kubelet
Wait until the master shows up in the cluster validation output, then start protokube
systemctl start protokube
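To watch for the master appearing, I just keep running the usual validate command (substitute your own cluster name and state bucket):
kops validate cluster --name clustername --state s3://bucket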
Now do the same with the third master
Edit the third master instance group and set its min/max to 1
kops edit ig master-eu-west-2c --name=clustername --state s3://bucket
Add it to both etcd clusters (the etcd pods should still be running)
ETCPOD=etcd-server-ip-10-10-10-226.eu-west-2.compute.internal
ETCEVENTSPOD=etcd-server-events-ip-10-10-10-226.eu-west-2.compute.internal
AZ=c
CLUSTER=clustername

kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member add etcd-$AZ http://etcd-$AZ.internal.$CLUSTER:2380
kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-$AZ http://etcd-events-$AZ.internal.$CLUSTER:2381

echo Member Lists
kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member list
kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member list
Start the third master
kops update cluster --name=clustername --state=s3://bucket --yes
Wait for the new master to show as initialised. Find the instance id of the master and put it into this script, and change AWSSWITCHES to whatever switches you need to pass to the AWS CLI (in my case, a profile and region).
The script will run and output the status of the instance until it shows “ok”
AWSSWITCHES="--profile personal --region eu-west-2"
INSTANCEID=master3instanceid

while [ "$(aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text | grep SYSTEMSTATUS | cut -f 2)" != "ok" ]
do
  sleep 5s
  aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text | grep SYSTEMSTATUS | cut -f 2
done
aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text | grep SYSTEMSTATUS | cut -f 2
ssh into the new master (or via bastion if needed)
sudo -i
systemctl stop kubelet
systemctl stop protokube
Edit /etc/kubernetes/manifests/etcd.manifest and /etc/kubernetes/manifests/etcd-events.manifest again, and change the ETCD_INITIAL_CLUSTER_STATE value from new to existing. We DON'T need to remove anything from ETCD_INITIAL_CLUSTER this time, since this is the third master.
Stop the etcd docker containers
docker stop $(docker ps | grep "etcd" | awk '{print $1}')
Run this a few times until docker returns an error complaining that it needs at least one container name, which means no etcd containers are left running.
Again there are two volumes mounted under /mnt/master-vol-xxxxxxxx: one contains /var/etcd/data-events/member/ and the other /var/etcd/data/member/, with the exact paths varying because of the volume ids. Remove the member directory from both:
rm -r /mnt/master-vol-xxxxxx/var/etcd/data-events/member/
rm -r /mnt/master-vol-xxxxxx/var/etcd/data/member/
Now start kubelet
systemctl start kubelet
Wait until the master shows up in the cluster validation output, then start protokube
systemctl start protokube
If the cluster validates, do a full respin
kops rolling-update cluster --name clustername --state s3://bucket --force --yes