Training in Quarantine - Day 127
#Mid 20s but feels a lot more comfortable. Did my walk, but got caught in the rain. Yes, it finally RAINED.
Still needed the desk fan at home though.
Mid 20s but feels a lot more comfortable. Did my walk, but got caught in the rain. Yes, it finally RAINED.
Still needed the desk fan at home though.
High 20s again today. Went into the office to pick up my phone charger which I left there when I went into the office last time. Even at 5am, the temperature was reading 22degC. Comfortable to walk in at that time since the sun hadn't come out yet, but the Victoria Line was still baking hot and the stations were pretty dead...
Came back and did my walk around lunchtime, the temperature at that time was around 28degC and my shorts were getting soaked with sweat so I had to change them when I got home.
So, a weird thing occurred in Kubernetes on the GKE cluster we have at the office. I figured I would do a write up here, before I forget everything and maybe allow the Kubernetes devs to read over this as an issue (https://github.com/kubernetes/kubernetes/issues/93783)
We noticed some weirdness occurring on our cluster when Jobs
and CronJobs
started behaving strangely.
Jobs
were spawning but seemed to not spawn any pods to go with it, even over an hour later, they were sitting there without a pod to go with it.
Investigating other jobs, I found a crazy large number of pods in one of our namespaces, over 900 to be exact. These pods were all completed pods from a CronJob
.
The CronJob
was scheduled to run every minute, and the definition of the CronJob
had valid values for the history -- sensible values for .spec.successfulJobsHistoryLimit
and .spec.failedJobsHistoryLimit
were set. And even if they weren't, the defaults would (or should) be used.
So why did we have over 900 cron pods, and why weren't they being cleaned up upon completion?
Just in case the number of pods were causing problems, I cleared out the completed pods:
kubectl delete pods -n {namespace} $(kubectl get pods -n {namespace} | grep Completed | awk '{print $1}' | xargs)
But even after that, new jobs weren't spawning pods. And in fact, more CronJob
pods were appearing in this namespace. So I disabled the CronJob
kubectl patch cronjobs -n {namespace} {cronjob-name} -p '{"spec" : {"suspend" : true }}'
But that also didn't help, pods were still being generated. Which is weird -- why is a CronJob
still spawning pods even when it's suspended?
So then I remembered that CronJobs
actually generate Job
objects. So I checked the Job
objects and found over 3000 Job
objects. Okay, something is seriously wrong here, there shouldn't be 3000 Job
objects for something that only runs once a minute.
So I went and deleted all the CronJob
related Job
objects:
kubectl delete job -n {namespace} $(kubectl get jobs -n {namespace} | grep {cronjob-name} | awk '{print $1}' | xargs)
This reduced the pods down, but did not help us determine why the Job
objects were not spawning pods.
I decided to get Google onto the case and raised a support ticket.
Their first investigation brought up something interesting. They sent me this snippet from the Master logs (redacted)
2020-08-05 10:05:06.555 CEST - Job is created 2020-08-05 11:21:16.546 CEST - Pod is created 2020-08-05 11:21:16.569 CEST - Pod (XXXXXXX) is bound to node 2020-08-05 11:24:11.069 CEST - Pod is deleted 2020-08-05 12:45:47.940 CEST - Job is created 2020-08-05 12:57:22.386 CEST - Pod is created 2020-08-05 12:57:22.401 CEST - Pod (XXXXXXX) is bound to node
Spot the problem?
The time between "Job is created" and "Pod is created" around 80 minutes in the first case, and 12 minutes in the second one. That's right, it took 80 minutes for the Pod to be spawned.
And this is where it dawned on me about what was possibly going on.
CronJob
spawned a Job
object. It tried to spawn a pod, and that took a significant amount of time, far more than the 1 minute between runsCronJob
looks to see if it has a running pod due to the .spec.concurrencyPolicy
value.CronJob
does not find a running pod so generates another Job
object, which also gets stuck waiting for pod generationEach time, a new Job
gets added, gets stuck waiting for pod generation for an abnormally long time, which causes another Job
to be added to the namespace which also gets stuck...
Eventually, the pod will generate but by then there's now a backlog of Jobs
, meaning even if I suspended the CronJob
, it won't have any effect until the Jobs
in the backlog are cleared or deleted (I had deleted them).
Google investigated further, and found the culprit:
Failed calling webhook, failing open www.up9.com: failed calling webhook "www.up9.com": Post https://up9-sidecar-injector-prod.up9.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
We were testing up9 and this was using a webhook, so it looks like a misbehaving webhook was causing this problem. We removed the webhook and everything started working again.
So where does this leave us? Well, a few thoughts:
CronJob
without cleanup -- successfulJobsHistoryLimit
and failedJobsHistoryLimit
values are seemingly ignored.CronJob
is supposed to be run mutually exclusively, since the delay in pod generation could allow two cron pods to spawn together, even though the CronJob
has a concurrencyPolicy
set as Forbid.CronJob
that runs once a minute -- and then maliciously crafts the job to never finish, this snowball effect will cause the cluster to run out of resource and/or scale up nodes forever or until it hits the max allowed by your configuration.High 20s today, so a bit of a slog getting my walk done. 3km and I'm covered in sweat.... 🥵
Another cool afternoon to walk. Enjoying the Kubernetes Podcast (@kubernetespod) as I went.
20degC so cooler than it's been for a while. Nice comfortable walk, but as I was walking past Sainsbury's there seemed to be some kind of argument going on, and escalating. Didn't hang around, but possible that this could get to a punch-up level. And it wouldn't surprise me, Covid has certainly brought out the stupidity in people.
Cooler day, still warm, but not overly hot and no rain.
But my walk was a bit distracted by this little guy:
He was abnormally tame, and didn't run off when I approached and seemed happy to look at me and even lay down. One of the residents told me they sometimes fed him. Probably wasn't a good idea since he now sees humans as a food source.
A distinctly cooler day today. Still warm, but bearable.
Boy, oh boy, yesterday was a crazy hot day. I went back into the office as they are going to do a deep clean so we needed to make sure all our desks are clear of any personal belongings.
They blocked out every other desk:
And the air conditioning was reduced since there weren't that many people in the office.
The temperature in the office was a toasty 26.8degC
But the temperature outside soared to 36degC, and nearly tipped the 38degC/100degF mark.
As expected, the heat destroyed Network Rail's ability to run a reliable service and services out of King's Cross were delayed or cancelled.
I ended up using the Piccadilly Line to get back home.
On the way, spotted Blackfriars station had a vending machine for alcohold wipes, hand sanitisers and face masks:
The heat in the trains was horrendous. Eventually made it home and decided not to walk, I was sweating enough as it was just taking the train.
Back to today. Temperatures are back to mid-20s so walking is still hot weather, but not as uncomfortable as yesterday.
As you probably are aware, I now own a Yaris. And today, I got surprised when Toyota UK on Twitter linked in a tribute to their Corolla AE86 on their Twitter feed, namely the Corolla GT86, although disappointingly, there was no Eurobeat on the trailer.
For those of you who don't know, the AE86 is the car used by Fujiwara Takumi in the Initial D anime series, a young man doing early morning tofu deliveries down a winding mountain pass, learning how to drift as he went, and minimising the jerky movements of the car, since tofu is an extremely fragile food.
Toyota made reference to this also in my Yaris by adding an app to the car called "A Glass of Water" which simulates having a glass of water in the car, and when you finish your journey, it tells you how much water you would have spilt had you had a real glass of water in your car.
Takumi's father used this same technique while training him. Putting a glass of water in his cup holder and Takumi would try to minimise the amount of spillage during his delivery run.
I am impressed how Toyota have been embracing the references to their cars in the anime, and when I mentioned this on my office's slack, surprisingly some people also got the reference ^_^
Today's forecast was 35degC, and it's currently 28degC according to fitbit. A 3km walk has me sweating from the heat. If you're running or cycling out there today, you are much braver than I am.
No rain today, but a windier day today. Had to do my walk with a running coat on for the first time in a long while.
No walks or runs today, the weather has been pretty miserable and raining most of the day. That being said, I did see some people running even in the pouring rain.....
Runners can be a weird class of people sometimes.
Warm day with a few heavy downpours of rain. Still managed to get a walk done, and realised this postbox I've been walking past for years, doesn't have the "EIIR" logo on it, but has a "GVIR" logo instead.
Which I _believe_ means it's a box erected during George VI's reign (the EIIR stands for Elizabeth II, Regina (queen), and the GVIR stands for George VI, Rex (king)).
If so, this box was erected between 1936-1952, meaning it's at least 68 years old.
Been a warm day, but on-off raining.
My FitBit premium subscription has now expired and I won't be renewing it for now.
25degC, but a bit cloudy and humid. Muggy as I did my walk.
Spent today's walk going up to Sainsbury's shopping and then walking back.
My office contacted me saying there's a Summer clean up and if I wanted to come in to pick up things or stuff, I can.
I've picked a day next week.
23degC today, and a nice hot day. Did my long walk and now hot and sweaty -- but feeling good.
Another nice warm day. Not too hot, and a comfortable walk.
Another nice warm day. Cooler than yesterday, but still quite warm.
Mid 20s today and a scorcher of a day for a walk.
So I went to the local McDonalds to pick up a coffee and light breakfast and they were doing takeaway only, and only providing seating for delivery partners, like Uber Eats and Deliveroo.
My normal local cafe has turned into a KFC-like chicken shop and now opens at 10am instead of the usual 6am. Disappointed, but I can understand why.
At McDs, I placed my order just as the manager was asking a customer to leave. As he was, I noticed he had a bottle in his hand. Originally I thought it was a Maker's Mark bottle, but actually looked more like a Courvoisier 1L bottle, which retails for around £35-£40.
This guy was nearing the end of the bottle when I arrived. Had he finished an entire 1L, 40% ABV bottle of Cognac at 6am in the morning? No wonder the boss asked him to leave. The boss was then on the phone, not sure who to, though. I waited for my breakfast outside, monitoring the display screen.
I've decided to put the McDs app on my phone in case that helps for the next time I'm ordering.
After my breakfast, I went to Tesco for the weekly shop.
I parked up and then noticed nearby (like three spaces away) was another Yaris. It was a black Yaris Hybrid, like mine, with a 68 plate (mine is a 67 plate). I thought "that's ironic", and went to do my shopping.
Came back after shopping, loaded up my boot, then sat in the car and tried to get phone paired with my car's Bluetooth so I can listen to my Spotify music on the journey back.
Just then I felt my car's back shaking. Looked in the rear view mirror and found someone trying to get into the boot. I'd already central locked the car, which means the boot won't open. So I opened the window and said "Wrong Car". The guy realised immediately what had happened, and so did I. He's the driver of the _other_ black Yaris Hybrid, parked but three spaces away!
A much brighter, sunnier day today. Great for a walk. Low 20s degC.
A cloudy day with ominous clouds in the sky. Going to take my chances and head out for a walk....
A cloudy day, but still warm enough to do a walk, without any unwanted attention from flies.