Managing Kubernetes without losing your cool

DevOps Notts 29th March 2022

Hi,

I’m Marcus Noble, a platform engineer at Giant Swarm.

You can find me around the web as AverageMarcus in most places, and as @Marcus_Noble_ on Twitter.

I have about 5 years of experience running Kubernetes in production environments.

I also like home automation, IoT and 3D printing.

Summary

My 10 tips for working with Kubernetes

#1 → #5 Anyone can start using these today
#6 → #7 Good to know a little old-skool ops first
#8 → #10 Good to have some programming knowledge

#0 - Pay someone else to deal with it

OK, this one is kinda tongue in cheek but worth mentioning. If you have dozens or hundreds of clusters on top of other development work, you’re going to be stretched thin. Getting someone else to manage things while you focus on what makes your business money can often be the right choice.

#1 - Love your terminal

Bash? ZSH? Fish? - Doesn’t matter as long as you’re comfortable with it.
“rc” files (.bashrc, .zshrc) - these set the runtime configuration for each terminal window you open.
alias - easily create your own terminal commands
Look for “dotfiles” on GitHub - e.g. https://github.com/averagemarcus/dotfiles

Create your own workflow for tasks you perform often. Avoid typos and “fat fingering” by replacing long, complex commands with short aliases (bonus points for adding help text to remind you later).
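As a minimal sketch of the “help text” idea, the snippet below pairs a plain alias with a shell function that prints a reminder when called with -h or --help. The names kgp and kpods are hypothetical; pick whatever fits your own workflow.

```shell
# Hypothetical snippet for ~/.bashrc or ~/.zshrc

# A plain alias: `kgp` is expanded to the full command before it runs
alias kgp='kubectl get pods'

# A function can do things an alias can't, such as printing its own help text
kpods() {
  if [ "${1:-}" = "-h" ] || [ "${1:-}" = "--help" ]; then
    echo "kpods [NAMESPACE] - list pods in NAMESPACE, or in all namespaces if omitted"
    return 0
  fi
  if [ -n "${1:-}" ]; then
    kubectl get pods --namespace "$1"
  else
    kubectl get pods --all-namespaces
  fi
}
```

Running kpods -h later is a quick way to remember what the shortcut does without opening your rc file.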

#2 - Learn to love `kubectl`

The official documentation offers a single-page view of all built-in commands: kubernetes.io/docs/reference/generated/kubectl/kubectl-commands
Add alias k='kubectl' to your .bashrc / .zshrc / .whateverrc

k get pods -A

kubectl explain is your friend! Find out what any property of any Kubernetes resource is for.

k explain pods.spec.containers 
KIND: Pod
VERSION: v1

RESOURCE: containers <[]Object> 

DESCRIPTION:

Save time by only typing k.

Kubectl explain for digging into resources and their properties (useful when you can’t access the official docs or know exactly what you’re looking for)
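One catch with alias k='kubectl' in bash is that tab completion is registered against the kubectl command name, so the alias doesn't pick it up automatically. The kubectl documentation suggests registering the same completion function for the alias. A sketch for ~/.bashrc, guarded so it does nothing on machines without kubectl and assuming the bash-completion package is installed, could look like this (zsh users would use kubectl completion zsh instead):

```shell
# Sketch for ~/.bashrc (bash only)
alias k='kubectl'

if command -v kubectl >/dev/null 2>&1; then
  # Load kubectl's generated bash completion script into this shell
  eval "$(kubectl completion bash)"
  # Reuse kubectl's completion function for the `k` alias
  complete -o default -F __start_kubectl k
fi
```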

#3 - Multiple kubeconfigs

Quickly switch between different Kubernetes contexts (clusters) and between different namespaces.
kubectx and kubens - https://github.com/ahmetb/kubectx
kubeswitch - https://github.com/danielfoehrKn/kubeswitch
kubie - https://github.com/sbstp/kubie

kubeswitch is my fave as it supports a directory of kubeconfigs, which makes organising them easier.
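If you'd rather not install anything, kubectl itself can merge several kubeconfig files: the KUBECONFIG environment variable accepts a colon-separated list of paths. The function below is a hypothetical helper (not how kubeswitch works internally) for POSIX-style shells such as bash, assuming one kubeconfig per file under ~/.kube/configs by default.

```shell
# Hypothetical helper: build KUBECONFIG from every file in a directory
kube_load_dir() {
  local dir="${1:-$HOME/.kube/configs}"
  local f
  KUBECONFIG=""
  for f in "$dir"/*; do
    # Skip sub-directories, and the literal pattern if the directory is empty
    [ -f "$f" ] || continue
    # KUBECONFIG is colon-separated, like $PATH
    KUBECONFIG="${KUBECONFIG:+$KUBECONFIG:}$f"
  done
  export KUBECONFIG
}
```

After running kube_load_dir, kubectl config get-contexts should list the contexts from all of the files, and kubectl config use-context switches between them.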

#4 - k9s

Interactive terminal UI. Supports all resource types and actions, with lots of keybindings to quickly work with a cluster: find, view, edit, port-forward, view logs, delete, etc.

#5 - kubectl plugins

Krew - package manager for kubectl plugins https://github.com/kubernetes-sigs/krew
Any command in your $PATH that is prefixed with kubectl- becomes a kubectl plugin
Install plugins with kubectl krew install <PLUGIN NAME>
Some of my fave plugins:
- stern - Multi-pod/container log tailing
- tree - Show hierarchy of resources based on ownerReferences
- outdated - Find containers with outdated images
- gs - Giant Swarm’s plugin for working with our managed clusters

Plugins can be in any language. You can easily add your own by creating Bash scripts with a kubectl- prefixed name.
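As a minimal sketch, the following creates a hypothetical kubectl-hello plugin in the current directory. Once it is moved somewhere on your $PATH, kubectl hello should run it, passing along any extra arguments.

```shell
# Write a tiny plugin script; the `kubectl-` prefix is what makes it a plugin
cat > kubectl-hello <<'EOF'
#!/bin/sh
# Everything after `kubectl hello` arrives as normal positional arguments
echo "hello plugin called with: $*"
EOF
chmod +x kubectl-hello

# Move it onto your $PATH to enable it, for example:
#   mv kubectl-hello /usr/local/bin/
```

kubectl plugin list shows every plugin kubectl can find on your $PATH, which is a quick way to check that yours was picked up.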

Note: autocomplete is a bit trickier here. Some plugins support it, but generally expect your tab completion to only recommend core kubectl features.

Summary

My 10 tips for working with Kubernetes

#1 → #5 Anyone can start using these today - Done
#6 → #7 Good to know a little old-skool ops first
#8 → #10 Good to have some programming knowledge

Not so scary so far, right? Now on to a little more hands-on techniques.

#6 - kshell / kubectl debug

Launch a temporary pod running a bash shell for cluster debugging

alias kshell='kubectl run -it --image bash --restart Never --rm shell'

Need more tools? Replace bash with ubuntu

Great for more general debugging of a cluster, especially with networking issues or similar.
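If you switch images often, a function is a little more flexible than the alias. The sketch below keeps the same kubectl run flags but takes the image as an optional argument, defaulting to bash. Use it instead of the kshell alias rather than alongside it, since bash would expand an existing alias inside the function definition.

```shell
# Function variant of the kshell alias: `kshell` for bash, `kshell ubuntu` for more tools
kshell() {
  # -it: interactive TTY; --rm: delete the pod on exit;
  # --restart Never: the container isn't restarted after you exit the shell
  kubectl run -it --rm --restart Never --image "${1:-bash}" shell
}
```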

# kshell
If you don't see a command prompt, try pressing enter.
bash-5.1# nslookup google.com
Server: 1.1.1.1
Address: 1.1.1.1:53

Non-authoritative answer:
Name: google.com
Address: 142.250.187.206

#6 - kshell / kubectl debug

Debugging a running pod - kubectl exec

# kubectl exec my-broken-pod -it -- sh
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec……

Debugging a running pod - kubectl debug (Requires Kubernetes 1.23)

# kubectl debug -it --image bash my-broken-pod
Defaulting debug container name to debugger-gprmk.
If you don't see a command prompt, try pressing enter.
bash-5.1#

kubectl exec is great for debugging misconfigured pods that aren’t crashing and have enough OS to exec into. But… if the pod is CrashLooping, you’ll get kicked out of the session when it crashes, and if the pod doesn’t have a shell (e.g. a container that only has a Golang binary), you won’t be able to exec into it at all.

kubectl debug is great for pods that either don’t have any OS or are CrashLooping.

#6 - kshell / kubectl debug

Example - investigate a CrashLooping pod

# kubectl run debug-demo --image=bash -- exit 1

# kubectl get pods debug-demo
NAME         READY   STATUS             RESTARTS      AGE
debug-demo   0/1     CrashLoopBackOff   2 (20s ago)   44s

(This prevents us from using kubectl exec to get into the pod.)

# kubectl debug -it --image bash debug-demo 
Defaulting debug container name to debugger-5mkjj. 
If you don't see a command prompt, try pressing enter. 
bash-5.1#

kubectl debug has a few different modes:

launch an “ephemeral container” within the pod you’re debugging - kubectl debug
create a copy of the pod with some values replaced (e.g. the image used) - kubectl debug --copy-to
launch a pod in the node’s host namespaces to debug the node - kubectl debug node/my-node

This has some limitations:

you can’t access the whole filesystem of the failing container, only volumes that are shared

#6 - kshell / kubectl debug

When to use what:

Multiple workloads experiencing network issues - kshell

Workload not running as expected but not CrashLooping and isn’t a stripped down image (e.g. not Scratch / Distroless) - kubectl exec

Workload not running as expected but not CrashLooping and has an image based on Scratch / Distroless or similar - kubectl debug

Workload is CrashLooping - kubectl debug

#7 - kube-ssh

https://github.com/AverageMarcus/kube-ssh
Gives SSH-like access to a node host; great for instances where nodes are provisioned without SSH or any other direct access

sh -c "$(curl -sSL https://raw.githubusercontent.com/AverageMarcus/kube-ssh/master/ssh.sh)" 
[0] - ip-10-18-21-146.eu-west-1.compute.internal
[1] - ip-10-18-21-234.eu-west-1.compute.internal
[2] - ip-10-18-21-96.eu-west-1.compute.internal
Which node would you like to connect to? 1

If you don't see a command prompt, try pressing enter. 
[root@ip-10-18-21-234 ~]#

Why? - I prefer to use ephemeral instances with only the minimum needed to run Kubernetes (no sshd, no port 22 open, etc.), but there are times when you just need to check what’s actually going on with the underlying host machine.

Always verify a shell script before you run it! Ideally, download it first and run that instead.

Why? Smaller potential attack surface. Less chance of “hotfixes” or “tweaks” being forgotten about.

#7 - kube-ssh

Some caveats - the underlying host needs a shell
You need enough permissions to launch pods with a privileged securityContext - RBAC, PSPs and admission controllers could all potentially block this. (This could also be considered a benefit of this approach over traditional SSH.)
Not a real SSH session
nsenter - “The nsenter command executes program in the namespace(s) that are specified in the command-line options.” (Man page)
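To show roughly how this kind of access works, here is a hypothetical helper (a sketch, not kube-ssh’s actual script) that prints a Pod manifest pinned to one node. It runs privileged with hostPID: true, so nsenter can target PID 1 (the host’s init process) and join its namespaces. It assumes the ubuntu image provides nsenter (part of util-linux) and that the host has /bin/sh.

```shell
# Hypothetical helper: print a privileged Pod manifest that opens a host shell on NODE
node_shell_manifest() {
  local node="$1"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-shell-${node}
spec:
  nodeName: ${node}        # schedule directly onto the chosen node
  hostPID: true            # share the host's PID namespace so PID 1 is the host's init
  restartPolicy: Never
  containers:
  - name: shell
    image: ubuntu
    # Join the mount, UTS, IPC, network and PID namespaces of host PID 1, then start a shell
    command: ["nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "/bin/sh"]
    stdin: true
    tty: true
    securityContext:
      privileged: true
EOF
}
```

Something like node_shell_manifest my-node | kubectl apply -f - followed by kubectl attach -it node-shell-my-node should drop you into a shell on the host; delete the pod when you're done. The same RBAC, PSP and admission controller caveats apply.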

Summary

#1 → #5 Anyone can start using these today - Done
#6 → #7 Good to know a little old-skool ops first - Done
#8 → #10 Good to have some programming knowledge

#8 - Webhooks

Two types of webhooks:
- ValidatingWebhook - Ability to block actions against the API server if they fail to meet given criteria.
- MutatingWebhook - Modify requests before they are persisted by the API server.
Implement more advanced access control than is possible with RBAC.
Add default labels to resources as they’re created.
Enforce policies such as not using latest as an image tag or ensuring all workloads have resource requests/limits specified.
“Hotfix” for security issues (e.g. mutating all pods to include a LOG4J_FORMAT_MSG_NO_LOOKUPS env var to prevent Log4Shell exploit).

Allows for subtractive access control (taking away a user’s ability to perform a certain action against a certain resource) - something not possible with RBAC.

See our blog post about how we avoided a nasty bug in our CLI tool with a ValidatingWebhook: https://www.giantswarm.io/blog/restricting-cluster-admin-permissions

#8 - Webhooks

Build your own operator to implement custom logic
Kyverno - Kubernetes native policy management. Create Policy and ClusterPolicy resources to define rules in YAML
OPA Gatekeeper - Policy management built on top of Open Policy Agent

#8 - Webhooks

Notes:

Where possible, avoid applying webhooks to resources in kube-system. This can cause a deadlock if those pods try to come up before the webhook service is available.
Be aware of the failurePolicy property - it defaults to Fail, which can cause trouble if the service handling the webhook goes down.
Set the reinvocationPolicy property to IfNeeded if a MutatingWebhook needs to be called again when other mutating webhooks change the object after it has run.
Ordering - first MutatingWebhooks then ValidatingWebhooks. No guaranteed control of order within these two phases.
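These notes map onto a handful of fields in the webhook configuration. As a sketch, assuming a hypothetical label-webhook Service in a hypothetical webhooks namespace, the helper below prints a MutatingWebhookConfiguration that skips kube-system and its own namespace (using the kubernetes.io/metadata.name label that Kubernetes sets on every namespace from 1.21 onwards), fails open and asks to be re-invoked. The caBundle field, which normally holds the CA used to verify the webhook’s TLS certificate, is left out for brevity, so this won’t work as-is.

```shell
# Hypothetical helper: print a MutatingWebhookConfiguration reflecting the notes above
webhook_config() {
  cat <<'EOF'
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: add-default-labels
webhooks:
- name: add-default-labels.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  # Fail open: if the webhook service is down, requests are let through unmodified
  failurePolicy: Ignore
  # Call this webhook again if a later mutating webhook changes the object
  reinvocationPolicy: IfNeeded
  timeoutSeconds: 5
  # Never intercept kube-system or the webhook's own namespace, to avoid a deadlock
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system", "webhooks"]
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: webhooks
      name: label-webhook
      path: /mutate
EOF
}
```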

Webhooks can break a cluster. Make sure your service is resilient and that your webhooks don’t block critical workloads.

Webhooks can be backed either by services within the cluster or by a URL outside of the cluster.

#9 - Kubernetes API

All Kubernetes operations are done via the API - kubectl uses it, in-cluster controllers use it, the scheduler uses it and you can use it too!

The API is currently described with OpenAPI v2 (OpenAPI v3 is available as an alpha feature in v1.23).

The API can be extended either by Custom Resource Definitions (CRDs) or by implementing an Aggregation Layer (such as what metrics-server implements).

#9 - Kubernetes API

You can easily try out the API using kubectl with the --raw argument.

# kubectl get --raw /api/v1/namespaces/default/pods
{"kind":"PodList","apiVersion":"v1","metadata":{"selfLink":...

If no host is provided kubectl will use the API of the current context.

HTTP method to kubectl command mappings:
GET - kubectl get --raw
POST - kubectl create --raw
DELETE - kubectl delete --raw
PUT - kubectl replace --raw

To target another cluster not set as your current kubeconfig context you can specify the full URL of the endpoint.

Be aware that not all kubectl commands map to a single API call. Lots do several API calls under the hood.

#9 - Kubernetes API

Not sure what APIs are available?

# kubectl api-resources
NAME
bindings
componentstatuses
configmaps
endpoints
deployments

API Endpoint format:

/{api|apis}/{API_VERSION}/namespaces/{NAMESPACE_NAME}/{RESOURCE}/{NAME}

RESOURCE is the plural resource name from the NAME column (e.g. deployments), not the Kind.

#9 - Kubernetes API

If APIVERSION is just v1 the endpoint starts with /api/v1/

E.g. /api/v1/componentstatuses

The “core” API is accessible on /api/v1

#9 - Kubernetes API

Otherwise, the endpoint starts with /apis/{APIVERSION}/ (Note the extra ‘s’)

E.g. /apis/apps/v1/

APIs added to Kubernetes in later versions are available on the /apis endpoint. This includes built-in ones like Deployments as well as Custom Resources.

#9 - Kubernetes API

The NAMESPACED column indicates if the resource is bound to a namespace.

If false: /api/v1/componentstatuses
If true: /apis/apps/v1/namespaces/default/deployments
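To make these path rules concrete, here is a hypothetical helper that assembles an endpoint from the APIVERSION, NAME and NAMESPACED information that kubectl api-resources shows. It is only a sketch: it covers the core /api/v1 group versus named /apis/{GROUP}/{VERSION} groups and namespaced versus cluster-scoped resources, but not subresources such as status or log.

```shell
# Hypothetical helper: build a REST path from `kubectl api-resources` columns
# Usage: k8s_api_path APIVERSION RESOURCE [NAMESPACE] [NAME]
k8s_api_path() {
  local api_version="$1" resource="$2" namespace="${3:-}" name="${4:-}"
  local path
  if [ "$api_version" = "v1" ]; then
    # The "core" group: APIVERSION is just v1
    path="/api/v1"
  else
    # Named groups such as apps/v1, including Custom Resources
    path="/apis/${api_version}"
  fi
  # Namespaced resources (NAMESPACED=true) sit under /namespaces/{NAMESPACE}
  if [ -n "$namespace" ]; then
    path="${path}/namespaces/${namespace}"
  fi
  path="${path}/${resource}"
  # Optionally address a single object by name
  if [ -n "$name" ]; then
    path="${path}/${name}"
  fi
  printf '%s\n' "$path"
}
```

For example, kubectl get --raw "$(k8s_api_path apps/v1 deployments default)" should list the Deployments in the default namespace.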

#9 - Kubernetes API

Resources:

kubernetes/client-go - the official Golang module for interacting with the Kubernetes API
Kubernetes Provider for Terraform (actually uses the above Go module under the hood)
kubernetes-client/python - the official Python library for interacting with the Kubernetes API

Where is this useful?

Building our own CLI / desktop tooling (e.g. k9s, Lens)
Cluster automation - resources managed by CI, CronJobs, etc.
Building our own operators to extend Kubernetes.

Make use of one of the many client libraries available rather than interacting with the REST endpoint directly. Plenty more official clients available at https://github.com/kubernetes-client

#10 - CRDs & Operators

Extend Kubernetes’ built-in API and functionality with your own Custom Resource Definitions (CRDs) and business logic (operators).
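For a feel of what a CRD looks like, the helper below prints a minimal CustomResourceDefinition for a hypothetical Backup resource in a hypothetical example.com group. Once applied, the new resource is served by the API server like any other, for example under /apis/example.com/v1alpha1/namespaces/default/backups; an operator would then watch those resources and act on them.

```shell
# Hypothetical helper: print a minimal CRD adding a namespaced `Backup` kind
backup_crd() {
  cat <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Must be <plural>.<group>
  name: backups.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1alpha1
    served: true    # expose this version via the API
    storage: true   # persist objects using this version
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string
EOF
}
```

Piping it into kubectl apply -f - registers the type, after which kubectl get backups should work.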

#10 - CRDs & Operators

Frameworks

Kubebuilder
Operator Framework
KUDO
Metacontroller

References

https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
https://blog.container-solutions.com/kubernetes-operators-explained
https://operatorhub.io/ - Directory of existing operators

This topic is too large to cover within this talk; there are already plenty of better resources available.

Kubebuilder tends to be the most popular framework and is used by all of the cluster-api projects.

Summary

#1 → #5 Anyone can start using these today - Done
#6 → #7 Good to know a little old-skool ops first - Done
#8 → #10 Good to have some programming knowledge - Done

Recap

#1 - Love your terminal - Shell aliases and helpers
#2 - Learn to love kubectl - Alias k, kubectl explain
#3 - Multiple kubeconfigs - kubeswitch
#4 - k9s - Interactively work with clusters
#5 - kubectl plugins - Krew. Build your own with Bash, using a kubectl- prefixed name
#6 - kshell / kubectl debug - Pod debugging
#7 - kube-ssh - Node debugging
#8 - Webhooks - Validating and mutating requests to the Kubernetes API
#9 - Kubernetes API - Working directly with the API to build our own logic
#10 - CRDs & Operators - Extending Kubernetes with our own resources and logic

Thank You

🧡
