The “Snowball” Effect In Kubernetes

So, a weird thing occurred in Kubernetes on the GKE cluster we have at the office. I figured I would do a write-up here before I forget everything, and maybe allow the Kubernetes devs to read over this as an issue (https://github.com/kubernetes/kubernetes/issues/93783).

We noticed some weirdness occurring on our cluster when Jobs and CronJobs started behaving strangely.

Jobs were being created but didn't seem to spawn any pods to go with them; even over an hour later, they were sitting there without a pod.

Investigating other jobs, I found a crazily large number of pods in one of our namespaces: over 900 of them. These pods were all completed pods from a CronJob.

The CronJob was scheduled to run every minute, and its definition had sensible values set for .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit. Even if they weren't set, the defaults (3 and 1 respectively) would, or should, apply.
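For illustration, here's a minimal sketch of what such a CronJob definition looks like; the name, image, and workload here are placeholders rather than our actual manifest:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cron              # hypothetical name, not our real one
spec:
  schedule: "* * * * *"           # run every minute
  concurrencyPolicy: Forbid       # relevant later in this post
  successfulJobsHistoryLimit: 3   # keep only the last 3 successful Jobs
  failedJobsHistoryLimit: 1       # keep only the last failed Job
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: task
            image: busybox        # placeholder workload
            args: ["/bin/sh", "-c", "date"]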

So why did we have over 900 cron pods, and why weren’t they being cleaned up upon completion?

Just in case the sheer number of pods was causing problems, I cleared out the completed pods:

kubectl delete pods -n {namespace} $(kubectl get pods -n {namespace} | grep Completed | awk '{print $1}' | xargs)

But even after that, new Jobs weren't spawning pods, and in fact more CronJob pods were appearing in this namespace. So I disabled the CronJob:

kubectl patch cronjobs -n {namespace} {cronjob-name} -p '{"spec" : {"suspend" : true }}'

But that also didn't help; pods were still being generated. Which is weird: why is a CronJob still spawning pods even when it's suspended?

So then I remembered that CronJobs actually generate Job objects. So I checked the Job objects and found over 3000 of them. Okay, something was seriously wrong here; there shouldn't be 3000 Job objects for something that only runs once a minute.
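One quick way to count them, assuming the Jobs share the CronJob's name as a prefix:

kubectl get jobs -n {namespace} --no-headers | grep {cronjob-name} | wc -l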

So I went and deleted all the CronJob related Job objects:

kubectl delete job -n {namespace} $(kubectl get jobs -n {namespace} | grep {cronjob-name} | awk '{print $1}' | xargs)

This reduced the pod count, but did not help us determine why the Job objects were not spawning pods.

I decided to get Google onto the case and raised a support ticket.

Their first investigation turned up something interesting. They sent me this snippet from the master logs (redacted):

2020-08-05 10:05:06.555 CEST - Job is created
2020-08-05 11:21:16.546 CEST - Pod is created
2020-08-05 11:21:16.569 CEST - Pod (XXXXXXX) is bound to node
2020-08-05 11:24:11.069 CEST - Pod is deleted

2020-08-05 12:45:47.940 CEST - Job is created
2020-08-05 12:57:22.386 CEST - Pod is created
2020-08-05 12:57:22.401 CEST - Pod (XXXXXXX) is bound to node

Spot the problem?

The time between "Job is created" and "Pod is created" was around 80 minutes in the first case, and 12 minutes in the second. That's right: it took 80 minutes for the pod to be spawned.

And this is where it dawned on me what was possibly going on.

  • The CronJob spawns a Job object. The Job tries to spawn a pod, and that takes a significant amount of time, far more than the 1 minute between runs.
  • On the next cycle, the CronJob checks whether it has a running pod, due to its .spec.concurrencyPolicy value.
  • The CronJob does not find a running pod, so it generates another Job object, which also gets stuck waiting for pod generation.
  • And so on, and so on.

Each cycle, a new Job gets added and gets stuck waiting for pod generation for an abnormally long time, which causes another Job to be added to the namespace, which also gets stuck…

Eventually, each pod does spawn, but by then there's a backlog of Jobs, meaning that even suspending the CronJob has no effect until the Jobs in the backlog are cleared or deleted (I had deleted them).

Google investigated further, and found the culprit:

Failed calling webhook, failing open www.up9.com: failed calling webhook "www.up9.com": Post https://up9-sidecar-injector-prod.up9.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

We were testing up9, which had installed this webhook, so it looks like a misbehaving webhook was causing the problem. We removed the webhook and everything started working again.
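For anyone hitting something similar: admission webhooks are registered at the cluster level, so something like this will list and remove the offending registration ({webhook-name} being whatever name the webhook was installed under):

kubectl get mutatingwebhookconfigurations
kubectl delete mutatingwebhookconfiguration {webhook-name}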

So where does this leave us? Well, a few thoughts:

  • A misbehaving/misconfigured webhook can cause a snowball effect in the cluster, causing multiple runs of a single CronJob without cleanup; the successfulJobsHistoryLimit and failedJobsHistoryLimit values are seemingly ignored.
  • This could break systems where the CronJob is supposed to run mutually exclusively, since the delay in pod generation could allow two cron pods to spawn together, even though the CronJob has a concurrencyPolicy set to Forbid.
  • If someone managed (whether intentionally or maliciously) to install a webhook that causes this pod-spawning delay, then added a CronJob that runs once a minute and crafted its job to never finish, this snowball effect would cause the cluster to run out of resources and/or scale up nodes forever, or until it hits the maximum allowed by your configuration.

Twitter’s Security Screwup and New Privacy Concerns

There is a new story doing the rounds about how Twitter found that it had stored users' passwords in the clear in an internal log. Whilst reading it, I got an email from Twitter about the incident.

While this isn't the first time a big company has done this (GitHub, for one, also did), it seems unbelievable that a company as big as Twitter would get caught out failing such a basic, common-sense security practice. Pretty much every YouTube video and article about correctly handling passwords will tell you not to store them in the clear and to store only hashes (salted, preferably). Hashing algorithms are meant to be really difficult or impossible to reverse, meaning you can't (easily) use the hashes to determine the original passwords.

Some examples come up from a quick YouTube search. Tom Scott's video is really good, by the way :), although his comment about "using login with Twitter and letting them store your password for you" is a bit ironic :P

The fact that Twitter has our unencrypted passwords on disk… does this mean Twitter has been saving our original passwords before hashing them?

More to the point: whilst Twitter are quick to point out that no-one at the company can see your password once masked, they don't mention who has (or had) access to the unmasked passwords in the internal log. Or for how long…

Twitter users who had their accounts on private may not have been as private as they initially thought….


Facebook Privacy (or lack of)

Facebook have been getting a lot of bad publicity lately (and I would personally say it's long overdue), much of it over privacy. Now there's talk about Facebook lifting SMS and phone call information from Android phones, with consent. Yes, Facebook asks for it, but you can (and should) refuse it access.

Later versions of Android allow you to revoke and change the permissions given to an app, and also prompt you again if the app asks for it.

My Facebook app has very few permissions on my device because I don't trust it a single bit.

I also have Privacy Guard enabled and restricted. Whenever it wants to know my location, I can refuse it.

Hack the USAF [Engadget]

Whilst having vulnerabilities is a bad thing, having them found by white-hat hackers is a good thing. Hackathons like this one prove that it can be constructive to get a group of them in to find, and help fix, vulnerabilities in your system before they are found publicly and exploited to death before you have a chance to fix them.

The US Air Force’s second security hackathon has paid dividends… both for the military and the people finding holes in its defenses. HackerOne has revealed the results of the Hack the Air Force 2.0 challenge from the end of 2017, and it led to volunteers discovering 106 vulnerabilities across roughly 300 of the USAF’s public websites. Those discoveries proved costly, however. The Air Force paid out a total of $103,883, including $12,500 for one bug — the most money any federal bounty program has paid to date.


https://www.engadget.com/2018/02/19/hack-the-air-force-2/


Enabling and using Let’s Encrypt SSL Certificates on Kubernetes

Kubernetes is an awesome piece of kit: you can set applications to run within the cluster, make them visible only to apps within the cluster, and/or expose them to applications outside the cluster.

As part of my tinkering, I wanted to set up a Docker Registry to store my own images without having to make them public via Docker Hub. Doing this proved a bit more complicated than expected since, by default, Docker requires SSL, which normally means purchasing and installing a certificate.

Enter Let's Encrypt, which allows you to get SSL certificates for free; and by using their API, you can have certificates renewed automatically. For Kubernetes, the kube-lego project provides this integration. So here, I'll go through enabling it for an application (in this case a Docker registry, but it can be anything).

First, let's ignore the lego project and set up the application so that it is accessible normally. As mentioned above, this is the Docker registry.

I'm tying the registry storage to a PV claim, though you can modify this to use S3 instead, etc.

---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: registry
  namespace: default
  labels:
    name: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      name: registry
  template:
    metadata:
      labels:
        name: registry
    spec:
      volumes:
      - name: registry-data
        persistentVolumeClaim:
          claimName: registry-data
      containers:
      - name: registry
        image: registry:2
        resources: {}
        volumeMounts:
        - name: registry-data
          mountPath: "/var/lib/registry"
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
        imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: Recreate
---
kind: Service
apiVersion: v1
metadata:
  name: registry
  namespace: default
  labels:
    name: registry
spec:
  ports:
  - protocol: TCP
    port: 9000
    targetPort: 5000
  selector:
    name: registry
  type: LoadBalancer
  sessionAffinity: None
  externalTrafficPolicy: Cluster

Once you've applied this, verify your config is correct by ensuring the service has an external endpoint (use kubectl describe service registry | grep "LoadBalancer Ingress"). On AWS this will be an ELB; on other clouds you might get an IP. If you get an ELB, CNAME a friendly name to it; if you get an IP, create an A record for it. I'm going to use registry.blenderfox.com for this test.

Verify by doing this (bear in mind it can take a while before DNS records update, so be patient):

host ${SERVICE_DNS}

So if I had set the service to be registry.blenderfox.com, I would do

host registry.blenderfox.com

If done correctly, this should resolve to the ELB name, which then resolves to the ELB's IP addresses.

Next, tag a docker image in the format registry-host:port/imagename, for example registry.blenderfox.com:9000/my-image.
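Assuming a local image called my-image (a hypothetical name), that would be:

docker tag my-image registry.blenderfox.com:9000/my-image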

Next try to push it.

docker push registry.blenderfox.com:9000/my-image

It will fail because the registry isn't yet serving HTTPS:

docker push registry.blenderfox.com:9000/my-image
The push refers to repository [registry.blenderfox.com:9000/my-image]
Get https://registry.blenderfox.com:9000/v2/: http: server gave HTTP response to HTTPS client

So let’s now fix that.

Now let’s start setting up kube-lego

Check out the code
git clone git@github.com:jetstack/kube-lego.git

cd into the relevant folder
cd kube-lego/examples/nginx

Start applying the code base

kubectl apply -f lego/00-namespace.yaml
kubectl apply -f nginx/00-namespace.yaml
kubectl apply -f nginx/default-deployment.yaml
kubectl apply -f nginx/default-service.yaml

Open up nginx/configmap.yaml and change the body-size: "64m" line to a bigger value. This is the maximum size you can upload through nginx. You’ll see why this is an important change later.
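After the edit, the relevant part of the configmap should look roughly like this (trimmed to the key we care about; I'm assuming the metadata from the kube-lego example, and using "1g", the value I refer to at the end of this post):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx
  namespace: nginx-ingress
data:
  body-size: "1g"    # raised from the example's default of "64m"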

kubectl apply -f nginx/configmap.yaml
kubectl apply -f nginx/service.yaml
kubectl apply -f nginx/deployment.yaml

Now, look for the external endpoint for the nginx service
kubectl describe service nginx -n nginx-ingress | grep "LoadBalancer Ingress"

Look for the value next to LoadBalancer Ingress. On AWS, this will be the ELB address.

CNAME your domain for your service (e.g. registry.blenderfox.com in this example) to that ELB. If you’re not on AWS, this may be an IP, in which case, just create an A record instead.

Open up lego/configmap.yaml and change the email address in there to be the one you want to use to request the certs.
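From memory, that file looks roughly like this (the email address here is a placeholder, use your own):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-lego
  namespace: kube-lego
data:
  lego.email: "you@example.com"    # placeholder: the address to register with Let's Encrypt
  lego.url: "https://acme-v01.api.letsencrypt.org/directory"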

kubectl apply -f lego/configmap.yaml
kubectl apply -f lego/deployment.yaml

Wait for the DNS to update before proceeding to the next step.

host registry.blenderfox.com

When the DNS is updated, finally create and add an ingress rule for your service:

---
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: registry
  namespace: default
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: 'true'
spec:
  tls:
  - hosts:
    - registry.blenderfox.com
    secretName: registry-tls
  rules:
  - host: registry.blenderfox.com
    http:
      paths:
      - path: "/"
        backend:
          serviceName: registry
          servicePort: 9000

Look at the logs in nginx-ingress/nginx and you'll see the Let's Encrypt server come in to validate:

100.124.0.0 - [100.124.0.0] - - [19/Jan/2018:09:50:19 +0000] "GET /.well-known/acme-challenge/[REDACTED] HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" 277 0.044 100.96.0.3:8080 87 0.044 200

And look in the logs of the kube-lego/kube-lego pod and you'll see the successful issuance and the saving of the secret:

time="2018-01-19T09:49:45Z" level=info msg="requesting certificate for registry.blenderfox.com" context="ingress_tls" name=registry namespace=default 
time="2018-01-19T09:50:21Z" level=info msg="authorization successful" context=acme domain=registry.blenderfox.com 
time="2018-01-19T09:50:47Z" level=info msg="successfully got certificate: domains=[registry.blenderfox.com] url=https://acme-v01.api.letsencrypt.org/acme/cert/[REDACTED]" context=acme 
time="2018-01-19T09:50:47Z" level=info msg="Attempting to create new secret" context=secret name=registry-tls namespace=default 
time="2018-01-19T09:50:47Z" level=info msg="Secret successfully stored" context=secret name=registry-tls namespace=default 

Now let’s do a quick verify:

curl -ILv https://registry.blenderfox.com
...
* Server certificate:
*  subject: CN=registry.blenderfox.com
*  start date: Jan 19 08:50:46 2018 GMT
*  expire date: Apr 19 08:50:46 2018 GMT
*  subjectAltName: host "registry.blenderfox.com" matched cert's "registry.blenderfox.com"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
...

That looks good.

Now let’s re-tag and try to push our image

docker tag registry.blenderfox.com:9000/my-image registry.blenderfox.com/my-image
docker push registry.blenderfox.com/my-image

Note we are not using a port this time: with the ingress in place, the registry is served over standard HTTPS (port 443).

BOOM! Success.

The tls section indicates the host to request the cert for, and the backend section indicates which service to pass requests on to. The body-size config is at the nginx level, so if you don't change it you can only upload a maximum of 64MB even if the backend service (the Docker registry in this case) supports more. I have it set here at "1g" so I can upload 1GB images (some Docker images can be pretty large).

Massive Intel Chip Security Flaw Threatens Computers

An Intel flaw that has been sitting hidden for a decade has finally surfaced.

Being on the chip rather than in the OS, the flaw isn't limited to a single operating system; Linux, Windows and macOS are all mentioned in the article.

https://www.linuxinsider.com/story/85039.html

Why everyone is so convinced Facebook is spying on their conversations

Bipul Lama believes Facebook is spying on him.

And he’s got proof, sort of. Lama performed a test. For two days, all he talked about was Kit-Kats.

“The next day, all I saw on my Instagram and Facebook were Kit-Kat ads,” Lama said.

After his Kit-Kat experiment, he successfully repeated it with chatter about Lysol. The 23-year-old musician is now more convinced than ever that Facebook is listening to his conversations through his phone’s microphone.

“It listens to key words. If you say a word enough times, the algorithm catches those words and it sets off targeted ads,” Lama theorized.

Lama is far from alone. The belief that Facebook is actively listening to people through their phones has become a full-on phenomenon. Facebook has, of course, denied it does this. That has done little to dampen the ongoing paranoia around the theory.

Because it is just a theory… right?

Source: Why everyone is so convinced Facebook is spying on their conversations

CCleaner malware outbreak is much worse than it first appeared | Ars Technica

The malware backdoor in this story is quite intriguing. It targets specific companies (Samsung, Akamai, Cisco and Microsoft amongst them) and only attempts the second-level attack if it detects it has been installed at one of them.

The advice mentioned in the article is that anyone who installed the software on their system should REFORMAT THEIR DRIVE. Quite an extreme recommendation. My suggestion – stop using Windows.

Source: CCleaner malware outbreak is much worse than it first appeared | Ars Technica

Linus Torvalds Invites Attackers to Join the Ke… » Linux Magazine

Torvalds is not a huge fan of the ‘security community’ as he doesn’t see it as black and white. He maintains that bugs are part of the software development process and they cannot be avoided, no matter how hard you try. “constant absolute security does not exist, even if we do a perfect job,” said Torvalds in a conversation with Jim Zemlin, the executive director of the Linux Foundation.

“As a technical person, I’m always very impressed by some of the people who are attacking our code,” Torvalds said. “I get the feeling that these smart people are doing really bad things that I wish they were on our side because they are so smart and they could help us.”

Source: Linus Torvalds Invites Attackers to Join the Ke… » Linux Magazine
