How to move from single master to multi-master in an AWS kops Kubernetes cluster

Having a master in a Kubernetes cluster is all very well and good, but if that master goes down, the entire cluster cannot schedule new work. Pods will continue to run, but new ones cannot be scheduled and any pods that die will not get rescheduled.

Having multiple masters provides resiliency: when one goes down, the others can pick up the load. However, as I found out, setting up multi-master was quite problematic. The guide here only provided some help, so after trashing my own and my company's test cluster, I have expanded on the linked guide.

First, add the subnet details for the new zone (CIDR and subnet id) into your cluster definition, and give it a name you will remember. For simplicity, I called mine eu-west-2c. If you have a utility definition (and you will if you use a bastion), make sure you also define a utility subnet for the new AZ.

kops edit cluster --state s3://bucket
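
As a rough sketch, the new entries under spec.subnets look something like this (the CIDRs and subnet ids are illustrative, the type depends on your topology, and the utility entry is only needed if your other AZs have one):

subnets:
- cidr: 10.10.14.0/23
  id: subnet-xxxxxxxx
  name: eu-west-2c
  type: Private
  zone: eu-west-2c
- cidr: 10.10.16.0/24
  id: subnet-yyyyyyyy
  name: utility-eu-west-2c
  type: Utility
  zone: eu-west-2c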

Now create your master instance groups. You need an odd number of masters to enable quorum and avoid split brain (I'm not saying prevent; there are edge cases where this could still happen even with quorum). I'm going to add eu-west-2b and eu-west-2c; AWS recently introduced the third London zone, so I'm going to use that.

kops create instancegroup master-eu-west-2b --subnet eu-west-2b --role Master

Make this one have a max/min of 1

kops create instancegroup master-eu-west-2c --subnet eu-west-2c --role Master

Make this one have a max/min of 0 (yes, zero) for now
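
For reference, after these edits the new instance group specs should look roughly like this (a sketch; the machine type is illustrative, and the 2b group gets minSize/maxSize 1 instead of 0):

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: master-eu-west-2c
spec:
  role: Master
  machineType: m4.large
  minSize: 0
  maxSize: 0
  subnets:
  - eu-west-2c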

Reference these in your cluster config

kops edit cluster --state=s3://bucket
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-2a
      name: a
    - instanceGroup: master-eu-west-2b
      name: b
    - instanceGroup: master-eu-west-2c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-2a
      name: a
    - instanceGroup: master-eu-west-2b
      name: b
    - instanceGroup: master-eu-west-2c
      name: c
    name: events

Start the new master

kops update cluster --state s3://bucket --yes

Find the etcd and etcd-events pods and put their names into this script. Change "clustername" to the name of your cluster, then run it. Confirm both member lists include two members (in my case, etcd-a and etcd-b).

ETCPOD=etcd-server-ip-10-10-10-226.eu-west-2.compute.internal
ETCEVENTSPOD=etcd-server-events-ip-10-10-10-226.eu-west-2.compute.internal
AZ=b
CLUSTER=clustername

kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member add etcd-$AZ http://etcd-$AZ.internal.$CLUSTER:2380

kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-$AZ http://etcd-events-$AZ.internal.$CLUSTER:2381

echo Member Lists
kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member list

kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member list

(NOTE: the cluster will break at this point, because etcd now expects a second member that is not yet running)

Wait for the master to show as initialised. Find the instance id of the new master and put it into this script. Change AWSSWITCHES to match any switches you need to pass to the awscli; in my case, I specify my profile and region.

The script loops, printing the instance's system status until it shows "ok".

AWSSWITCHES="--profile personal --region eu-west-2"
INSTANCEID=master2instanceid
while [ "$(aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text  | grep SYSTEMSTATUS | cut -f 2)" != "ok" ]
do
  sleep 5s
  aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text  | grep SYSTEMSTATUS | cut -f 2
done
aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text  | grep SYSTEMSTATUS | cut -f 2

ssh into the new master (or via bastion if needed)

sudo -i
systemctl stop kubelet
systemctl stop protokube

Edit /etc/kubernetes/manifests/etcd.manifest and /etc/kubernetes/manifests/etcd-events.manifest.
In both files, change the ETCD_INITIAL_CLUSTER_STATE value from new to existing.
Under ETCD_INITIAL_CLUSTER, remove the third master definition.
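
After the edit, the relevant entries in etcd.manifest should look roughly like this (a sketch with an illustrative cluster name; etcd-events.manifest is the same idea, using the etcd-events names and port 2381):

- name: ETCD_INITIAL_CLUSTER_STATE
  value: existing
- name: ETCD_INITIAL_CLUSTER
  value: etcd-a=http://etcd-a.internal.clustername:2380,etcd-b=http://etcd-b.internal.clustername:2380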

Stop the etcd docker containers

docker stop $(docker ps | grep "etcd" | awk '{print $1}')

Run this a few times until docker returns an error saying it requires at least one container name, which means there are no etcd containers left running.
There are two volumes mounted under /mnt/master-vol-xxxxxxxx: one contains /var/etcd/data-events/member/ and the other contains /var/etcd/data/member/, but which is which varies because of the volume id.
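
To see which mount holds which data directory, something like this should work (based on the paths above):

ls -d /mnt/master-vol-*/var/etcd/data*/member/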

rm -r /mnt/master-vol-xxxxxx/var/etcd/data-events/member/
rm -r /mnt/master-vol-yyyyyy/var/etcd/data/member/

Now start kubelet

systemctl start kubelet

Wait until the master shows in the kops validate output, then start protokube.
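
From your workstation, a crude way to keep an eye on this is to re-run validate in a loop, for example:

watch -n 30 kops validate cluster --state s3://bucket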

systemctl start protokube

Now do the same with the third master

Edit the third master instance group to give it a min/max of 1.

kops edit ig master-eu-west-2c --name=clustername --state s3://bucket

Add it to the clusters (the etcd pods should still be running)

ETCPOD=etcd-server-ip-10-10-10-226.eu-west-2.compute.internal
ETCEVENTSPOD=etcd-server-events-ip-10-10-10-226.eu-west-2.compute.internal
AZ=c
CLUSTER=clustername

kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member add etcd-$AZ http://etcd-$AZ.internal.$CLUSTER:2380
kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-$AZ http://etcd-events-$AZ.internal.$CLUSTER:2381

echo Member Lists
kubectl --namespace=kube-system exec $ETCPOD -- etcdctl member list
kubectl --namespace=kube-system exec $ETCEVENTSPOD -- etcdctl --endpoint http://127.0.0.1:4002 member list

Start the third master

kops update cluster --name=clustername --state=s3://bucket --yes

Wait for the master to show as initialised. Find the instance id of the new master and put it into this script. Change AWSSWITCHES to match any switches you need to pass to the awscli; in my case, I specify my profile and region.

The script loops, printing the instance's system status until it shows "ok".

AWSSWITCHES="--profile personal --region eu-west-2"
INSTANCEID=master3instanceid
while [ "$(aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text  | grep SYSTEMSTATUS | cut -f 2)" != "ok" ]
do
  sleep 5s
  aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text  | grep SYSTEMSTATUS | cut -f 2
done
aws $AWSSWITCHES ec2 describe-instance-status --instance-id=$INSTANCEID --output text  | grep SYSTEMSTATUS | cut -f 2

ssh into the new master (or via bastion if needed)

sudo -i
systemctl stop kubelet
systemctl stop protokube

Edit /etc/kubernetes/manifests/etcd.manifest and /etc/kubernetes/manifests/etcd-events.manifest.
In both files, change the ETCD_INITIAL_CLUSTER_STATE value from new to existing.

We DON'T need to remove the third master definition this time, since this is the third master.

Stop the etcd docker containers

docker stop $(docker ps | grep "etcd" | awk '{print $1}')

Run this a few times until docker returns an error saying it requires at least one container name, which means there are no etcd containers left running.
There are two volumes mounted under /mnt/master-vol-xxxxxxxx: one contains /var/etcd/data-events/member/ and the other contains /var/etcd/data/member/, but which is which varies because of the volume id.

rm -r /mnt/master-vol-xxxxxx/var/etcd/data-events/member/
rm -r /mnt/master-vol-yyyyyy/var/etcd/data/member/

Now start kubelet

systemctl start kubelet

Wait until the master shows in the kops validate output, then start protokube.

systemctl start protokube

If the cluster validates, do a full respin

kops rolling-update cluster --name clustername --state s3://bucket  --force --yes

Enabling and using Let’s Encrypt SSL Certificates on Kubernetes

Kubernetes is an awesome piece of kit: you can run applications within the cluster, make them visible only to other apps within the cluster, and/or expose them to applications outside of the cluster.

As part of my tinkering, I wanted to set up a Docker Registry to store my own images without having to make them public via Docker Hub. Doing this proved a bit more complicated than expected, since by default the registry requires SSL, which normally means buying and installing a certificate.

Enter Let's Encrypt, which allows you to get SSL certificates for free; and by using their API, you can have them renewed regularly. Kubernetes has the kube-lego project, which handles this integration. So here I'll go through enabling it for an application (in this case a Docker registry, but it can be anything).

First, let's ignore the lego project and set up the application so that it is accessible normally. As mentioned above, this is the Docker registry.

I'm tying the registry storage to a PV claim, though you can modify this to use S3 or another backend instead.
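
The deployment below references a claim called registry-data. A minimal sketch of that claim (size and storage class are up to you) would be something like:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: registry-data
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi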

---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: registry
  namespace: default
  labels:
    name: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      name: registry
  template:
    metadata:
      labels:
        name: registry
    spec:
      volumes:
      - name: registry-data
        persistentVolumeClaim:
          claimName: registry-data
      containers:
      - name: registry
        image: registry:2
        resources: {}
        volumeMounts:
        - name: registry-data
          mountPath: "/var/lib/registry"
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
        imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: Recreate
---
kind: Service
apiVersion: v1
metadata:
  name: registry
  namespace: default
  labels:
    name: registry
spec:
  ports:
  - protocol: TCP
    port: 9000
    targetPort: 5000
  selector:
    name: registry
  type: LoadBalancer
  sessionAffinity: None
  externalTrafficPolicy: Cluster

Once you've applied this, verify your config is correct by checking that the service has an external endpoint (use kubectl describe service registry | grep "LoadBalancer Ingress"). On AWS this will be an ELB; on other clouds you might get an IP. If you get an ELB, CNAME a friendly name to it. If you get an IP, create an A record for it. I'm going to use registry.blenderfox.com for this test.

Verify by doing this. Bear in mind it can take a while before the DNS records update, so be patient.

host $SERVICE_DNS

So if I had set the service to be registry.blenderfox.com, I would do

host registry.blenderfox.com

If done correctly, this should resolve to the ELB name and then to the ELB's IP addresses.

Next, tag a docker image in the format registry-host:port/imagename, so, for example, registry.blenderfox.com:9000/my-image.
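
For example (the image name is illustrative):

docker tag my-image registry.blenderfox.com:9000/my-image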

Next try to push it.

docker push registry.blenderfox.com:9000/my-image

It will fail because the registry isn't serving HTTPS yet:

docker push registry.blenderfox.com:9000/my-image
The push refers to repository [registry.blenderfox.com:9000/my-image]
Get https://registry.blenderfox.com:9000/v2/: http: server gave HTTP response to HTTPS client

So let’s now fix that.

Now let’s start setting up kube-lego

Check out the code
git clone git@github.com:jetstack/kube-lego.git

cd into the relevant folder
cd kube-lego/examples/nginx

Start applying the code base

kubectl apply -f lego/00-namespace.yaml
kubectl apply -f nginx/00-namespace.yaml
kubectl apply -f nginx/default-deployment.yaml
kubectl apply -f nginx/default-service.yaml

Open up nginx/configmap.yaml and change the body-size: "64m" line to a bigger value. This is the maximum size you can upload through nginx; you'll see why this is an important change later.
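
After the change, the data section of nginx/configmap.yaml ends up something like this (only the changed key shown; "1g" is the value I discuss at the end of this post):

data:
  body-size: "1g"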

kubectl apply -f nginx/configmap.yaml
kubectl apply -f nginx/service.yaml
kubectl apply -f nginx/deployment.yaml

Now, look for the external endpoint for the nginx service
kubectl describe service nginx -n nginx-ingress | grep "LoadBalancer Ingress"

Look for the value next to LoadBalancer Ingress. On AWS, this will be the ELB address.

CNAME your domain for your service (e.g. registry.blenderfox.com in this example) to that ELB. If you’re not on AWS, this may be an IP, in which case, just create an A record instead.

Open up lego/configmap.yaml and change the email address in there to be the one you want to use to request the certs.

kubectl apply -f lego/configmap.yaml
kubectl apply -f lego/deployment.yaml

Wait for the DNS to update before proceeding to the next step.

host registry.blenderfox.com

When the DNS is updated, finally create and add an ingress rule for your service:

---
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: registry
  namespace: default
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: 'true'
spec:
  tls:
  - hosts:
    - registry.blenderfox.com
    secretName: docker-tls
  rules:
  - host: registry.blenderfox.com
    http:
      paths:
      - path: "/"
        backend:
          serviceName: registry
          servicePort: 9000

Look at the logs in the nginx-ingress/nginx pod and you'll see the Let's Encrypt server come in to validate:

100.124.0.0 - [100.124.0.0] - - [19/Jan/2018:09:50:19 +0000] "GET /.well-known/acme-challenge/[REDACTED] HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" 277 0.044 100.96.0.3:8080 87 0.044 200

And look in the logs of the kube-lego/kube-lego pod and you'll see the successful authorization and the saving of the secret:

time="2018-01-19T09:49:45Z" level=info msg="requesting certificate for registry.blenderfox.com" context="ingress_tls" name=registry namespace=default 
time="2018-01-19T09:50:21Z" level=info msg="authorization successful" context=acme domain=registry.blenderfox.com 
time="2018-01-19T09:50:47Z" level=info msg="successfully got certificate: domains=[registry.blenderfox.com] url=https://acme-v01.api.letsencrypt.org/acme/cert/[REDACTED]" context=acme 
time="2018-01-19T09:50:47Z" level=info msg="Attempting to create new secret" context=secret name=registry-tls namespace=default 
time="2018-01-19T09:50:47Z" level=info msg="Secret successfully stored" context=secret name=registry-tls namespace=default 

Now let’s do a quick verify:

curl -ILv https://registry.blenderfox.com
...
* Server certificate:
*  subject: CN=registry.blenderfox.com
*  start date: Jan 19 08:50:46 2018 GMT
*  expire date: Apr 19 08:50:46 2018 GMT
*  subjectAltName: host "registry.blenderfox.com" matched cert's "registry.blenderfox.com"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
...

That looks good.

Now let’s re-tag and try to push our image

docker tag registry.blenderfox.com:9000/my-image registry.blenderfox.com/my-image
docker push registry.blenderfox.com/my-image

Note we are not using a port this time, as the registry is now served over standard HTTPS (port 443) through the nginx ingress.

BOOM! Success.

The tls section indicates the host to request the cert for, and the backend section indicates which backend to pass the request on to. The body-size config is at the nginx level, so if you don't change it you can only upload a maximum of 64MB, even if the backend service (the Docker registry in this case) supports more. I have it set here at "1g" so I can upload up to 1GB (some docker images can be pretty large).

Kubernetes V1.9 released

From the Kubernetes blog, the next version of Kubernetes has been released. And one feature has definitely caught my eye:

Windows Support (beta)

Kubernetes was originally developed for Linux systems, but as our users are realizing the benefits of container orchestration at scale, we are seeing demand for Kubernetes to run Windows workloads. Work to support Windows Server in Kubernetes began in earnest about 12 months ago. SIG-Windows has now promoted this feature to beta status, which means that we can evaluate it for usage.

So users can now hook Windows boxes up to their cluster, which leads to an interesting case of mixed-OS clusters. Strictly speaking, that's already possible now with a mix of Linux distributions able to run Kubernetes.

http://blog.kubernetes.io/2017/12/kubernetes-19-workloads-expanded-ecosystem.html

Training

Tried a different route today. Still sore from the hour run yesterday. This new route turns out to be just under 5k, though I’m not sure it’s right, since my Fitbit seemed to lose communication with my phone so didn’t track the route properly. Guess I’ll try again tomorrow maybe.

Still, I got two achievements on the run: two PRs on segments (which were tracked properly).

Guide to creating a Kubernetes Cluster in existing subnets & VPC on AWS with kops

This article is a guide on how to setup a Kubernetes cluster in AWS using kops and plugging it into your own subnets and VPC. We attempt to minimise the external IPs used in this method.

Export your AWS API keys into environment variables

export AWS_ACCESS_KEY_ID='YOUR_KEY'
export AWS_SECRET_ACCESS_KEY='YOUR_ACCESS_KEY'
export CLUSTER_NAME="my-cluster-name"
export VPC="vpc-xxxxxx"
export K8SSTATE="s3-k8sstate"

Create the cluster. (You can change some of these switches to match your requirements. I would suggest using only one worker node and one master node to begin with, then increasing them once you have confirmed the config is good. The more worker and master nodes you have, the longer a rolling-update will take.)

kops create cluster --cloud aws --name $CLUSTER_NAME --state s3://$K8SSTATE --node-count 1 --zones eu-west-1a,eu-west-1b,eu-west-1c --node-size t2.micro --master-size t2.micro --master-zones eu-west-1a,eu-west-1b,eu-west-1c --ssh-public-key ~/.ssh/id_rsa.pub --topology=private --networking=weave --associate-public-ip=false --vpc $VPC

Important note: There must be an ODD number of master zones. If you tell kops to use an even number zones for master, it will complain.

If you want to use additional security groups, don’t add them yet — add them after you have confirmed the cluster is working.

Internal IPs: You must have a VPN connection into your VPC or you will not be able to ssh into the instances. The alternative is to use the bastion functionality via the --bastion flag on the create command, and then connecting like this:

ssh -i ~/.ssh/id_rsa -o ProxyCommand="ssh -W %h:%p admin@bastion.$CLUSTER_NAME" admin@INTERNAL_MASTER_IP

However, if you do this method, you MUST then use public IP addressing on the api load balancer, as you will not be able to do kops validate otherwise.

Edit the cluster

kops edit cluster $CLUSTER_NAME --state=s3://$K8SSTATE

Make the following changes:

If you have a VPN connection into the VPC, change spec.api.loadBalancer.type to "Internal"; otherwise, leave it as "Public".
Change spec.subnets to match your private subnets. To use existing private subnets, the entries should also include the id of the subnet and match its CIDR range, e.g.:

subnets:
- cidr: 10.10.10.0/23
  id: subnet-xxxxxxx
  name: eu-west-1a
  type: Private
  zone: eu-west-1a

The utility subnets are where the bastion hosts will be placed, and they should be public subnets, since they are the inbound route into the cluster from the internet.
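
A utility subnet entry follows the same pattern, just with type Utility (again, the CIDR and id are illustrative):

- cidr: 10.10.20.0/24
  id: subnet-yyyyyyy
  name: utility-eu-west-1a
  type: Utility
  zone: eu-west-1a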

If you need to change or add specific IAM permissions, add them under spec.additionalPolicies like this to attach additional policies to the node IAM role:

additionalPolicies:
  node: |
    [
      {
        "Effect": "Allow",
        "Action": ["dynamodb:*"],
        "Resource": ["*"]
      },
      {
        "Effect": "Allow",
        "Action": ["es:*"],
        "Resource": ["*"]
      }
    ]

Edit the bastion, nodes, and master configs (MASTER_REGION is the zone where you placed the master; if you are running masters in multiple zones, you'll have to do this for each master instance group):

kops edit ig master-{MASTER_REGION} --name=$CLUSTER_NAME --state s3://$K8SSTATE

kops edit ig nodes --name=$CLUSTER_NAME --state s3://$K8SSTATE
kops edit ig bastions --name=$CLUSTER_NAME --state s3://$K8SSTATE

Check and make any updates.

If you want a mixture of instance types (e.g. t2.medium and r3.large), you'll need to separate these into new instance groups ($SUBNETS is the list of subnets where you want the nodes to appear, for example "eu-west-2a,eu-west-2b"):

kops create ig anothernodegroup --name=$CLUSTER_NAME --state s3://$K8SSTATE --subnets $SUBNETS

You can later delete this with

kops delete ig anothernodegroup --name=$CLUSTER_NAME --state s3://$K8SSTATE

If you want to use spot prices, add this under the spec section (x.xx is the price you want to bid):

maxPrice: "x.xx"

Check the instance size and count if you want to change them (I would recommend not changing the node count just yet)
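
As a sketch, the parts of an instance group spec you would typically touch look like this (values illustrative):

spec:
  machineType: t2.medium
  minSize: 3
  maxSize: 3
  maxPrice: "0.05"   # only if you want spot instances, as above
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c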

If you want to add tags to the instances (for example for billing), add something like this to the spec section:

cloudLabels:
  Billing: product-team

If you want to run some script(s) at node startup (cloud-init), add them to spec.additionalUserData:

spec:
  additionalUserData:
  - name: myscript.sh
    type: text/x-shellscript
    content: |
      #!/bin/sh
      echo "Hello World.  The time is now $(date -R)!" | tee /root/output.txt

Apply the update:

kops update cluster $CLUSTER_NAME --state s3://$K8SSTATE --yes

Wait for DNS to propagate and then validate

kops validate cluster --state s3://$K8SSTATE

Once the cluster returns ready, apply the Kubernetes dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/alternative/kubernetes-dashboard.yaml

Access the dashboard via

https://api.$CLUSTER_NAME/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/

If the first URL doesn't work, also try:

https://api.$CLUSTER_NAME/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

(Ignore the cert error.)

The username is "admin" and the password can be found in your local ~/.kube/config.
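
For example, to pull it out quickly (assuming kops stored basic-auth credentials in your kubeconfig):

grep password ~/.kube/config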

Add the external-dns deployment to allow you to give friendly names to your externally exposed services rather than the horrible ELB names.

See here: https://github.com/kubernetes-incubator/external-dns/blob/master/docs/tutorials/aws.md

(You can apply the yaml directly onto the cluster via the dashboard. Make sure you change the domain filter to match your domain or subdomain.)

Note that if you use this, you'll need to change the node IAM policy in the cluster config, as the default IAM policy won't allow the external-dns container to modify Route 53 entries. You'll also need to annotate your service (kubectl annotate service $service_name key=value) with something like:

external-dns.alpha.kubernetes.io/hostname: $SERVICE_NAME.$CLUSTERNAME

You might also need this annotation to make the ELB internal rather than public; otherwise Kubernetes will complain "Error creating load balancer (will retry): Failed to ensure load balancer for service namespace/service: could not find any suitable subnets for creating the ELB":

service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
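
Put together, the service metadata ends up with something like this (the hostname is illustrative):

metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: myservice.my-cluster-name
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0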

(Optional) Add the Cockpit pod to your cluster as described here

http://cockpit-project.org/guide/133/feature-kubernetes.html

It allows you to visually see the topology of your cluster at a glance and also provides some management features. For example, here's my cluster. It contains 5 nodes (1 master, 4 workers) and is running 4 services (Kubernetes, external-dns, cockpit, and dashboard). Cockpit creates a replication controller so it knows about the changes.

(Screenshot: Cockpit showing the cluster topology)

Add any additional security groups by putting this under the spec section of the node/master/bastion configs, then do a rolling-update (you might need to use the --force switch). Do this as soon as you can after creating the cluster and verifying that updates work.

additionalSecurityGroups:
- sg-xxxxxxxx
- sg-xxxxxxxx
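
The respin after adding the groups is the same rolling-update as before, for example:

kops rolling-update cluster --name $CLUSTER_NAME --state s3://$K8SSTATE --force --yes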

If the cluster breaks after this (i.e. the nodes haven't shown up on the master), reboot the affected instances (don't terminate, use the reboot option from the AWS console) and see if that helps. If they still don't show up, there's something wrong with the security groups attached, i.e. they're conflicting somehow with the Kubernetes access. Remove those groups and then do another rolling-update using both the --force and --cloudonly switches to force a "dirty" respin.

If the cluster comes up good, then you can change the node counts on the configs and apply the update.

Note that if you change the node count and then apply the update, kops makes the change without a rolling-update. For example, if you change the node count from 1 to 3, it simply brings up the 2 additional nodes.

Other things you can look at:

Kompose – which converts a docker-compose configuration into Kubernetes resources

Finally, have fun!
