Pod Deep Dive: Everything You Didn't Know You Needed to Know
Cloud Native Rejekts, March 30th 2025

Hi 👋, I'm Marcus Noble! I'm a platform engineer working on release engineering, CI/CD and general Kubernetes development. I run a monthly newsletter, CloudNative.Now, and have 7+ years' experience running Kubernetes in production environments.
🐘 @Marcus@k8s.social | 🌍 MarcusNoble.com | 🦋 @averagemarcus.bsky.social


Pod Deep Dive: The Interesting Bits

I have this as stickers if anyone wants one!

So what is a "Pod"?

"Pods are the smallest deployable units of computing that you can create and manage in Kubernetes."
https://kubernetes.io/docs/concepts/workloads/pods/

So what is a โ€œPodโ€? Our containers / WASM โ— Smallest deployable unit of computing โ— A wrapper around one or more containers (or WASM functions) โ— Managed by the Kubernetes scheduler and assigned to nodes at runtime โ— Workloads within a pod all share the same runtime context (Linux namespaces, cgroups, network, etc) โ— Designed to be relatively ephemeral and disposable โ— Mostly immutable (only changes to image, activeDeadlineSeconds and additions to tolerations are allowed) ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

So what is a โ€œPodโ€? But really, itโ€™s much more than that! Too much to cover in fact! Interesting, weird or surprising bits: โ— Containers - Sidecar & Ephemeral, Pause Container โ— Images - Tags and Pull Policies โ— RuntimeClass - Different runtimes & WASM โ— Static Pods & Mirror Pods โ— Lifecycle Hooks โ— Pod Conditions & Readiness Gates โ— Config - Env Vars & Volumes โ— Node Scheduling โ— Networking - DNS Policy and Config ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Containers ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Sidecar Containers
● Enabled by default from v1.29 (feature gate SidecarContainers)
● "Disguised" as initContainers 🤷
● Launched when the Pod is scheduled; continue running until the main application containers have fully stopped, then the kubelet terminates all sidecars
● Support readinessProbe (unlike normal initContainers), which is used to determine the ready state of the Pod
● Termination is handled more harshly than for app containers - SIGTERM followed quickly by a force kill, so a graceful exit is less likely

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod
spec:
  initContainers:
    - name: logshipper
      image: alpine
      restartPolicy: Always   # on the container, not the Pod; Always is the only allowed value
  # (rest of spec omitted)
```
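A fuller sketch of the sidecar pattern with a readiness probe on the sidecar; the container names, image tags, log path and probe command are illustrative assumptions, not from the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod
spec:
  volumes:
    - name: logs
      emptyDir: {}
  initContainers:
    - name: logshipper
      image: alpine:3.19
      command: ["sh", "-c", "touch /logs/app.log && tail -F /logs/app.log"]
      restartPolicy: Always          # marks this initContainer as a sidecar
      readinessProbe:                # allowed on sidecars, unlike regular initContainers
        exec:
          command: ["test", "-f", "/logs/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /logs
  containers:
    - name: app
      image: nginx:1.25.1
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
```

The sidecar's readiness probe feeds into the Pod's overall Ready condition, which a plain initContainer cannot do.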

Ephemeral Containers
● Designed for debugging
● Must be added via the special ephemeralcontainers handler, not via the Pod spec (e.g. with kubectl debug)
● Not supported on static Pods
● Can target a specific container's process namespaces with the optional targetContainerName property
● No support for ports, probes, resources or lifecycle on the container spec
● Can re-attach to a running ephemeral container with kubectl attach

```shell
kubectl debug -it tiny-pod \
  --image=alpine \
  --target=nginx
```

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod
spec:
  ephemeralContainers:   # read-only on the Pod spec itself
    - name: debugger-67t9x
      image: alpine
      targetContainerName: nginx
```

Pause Container
● Every Pod includes an empty "pause" container which bootstraps the Pod with the cgroups, reservations and namespaces before the defined containers are created
● This container can be thought of as a "parent container" for all the containers within your Pod and will remain even if workload containers crash, ensuring namespaces and networking remain available
● The pause container is always present but not visible via the Kubernetes API
● It can be seen if you query the container runtime directly on the node, e.g. with containerd:

```shell
~ # ctr -n k8s.io containers list | grep pause
03bfc9fa4bd0aebb0a9f84b1aad680f4b7 gsoci.azurecr.io/giantswarm/pause:3.9 io.containerd.runc.v2
05df0356a344c23d02afbff797742c67bd gsoci.azurecr.io/giantswarm/pause:3.9 io.containerd.runc.v2
066e0c8ee2962f276c4b7bb7d505e63f5b gsoci.azurecr.io/giantswarm/pause:3.9 io.containerd.runc.v2
0a6685e4d54e94c4acc36dbbb1a2b356de gsoci.azurecr.io/giantswarm/pause:3.9 io.containerd.runc.v2
```

Images ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Image Pull Policy
● IfNotPresent - fetches the image only if it is not already present on the node (default if you specify a tag/SHA other than "latest")
● Always - always fetches the image (default if you specify the tag as "latest" or omit the tag)
  ○ The caching mechanism is efficient enough that in this mode it first compares the image layers with the remote registry and only downloads them if not already present on the node
● Never - never attempts to fetch the image; it must be loaded onto the node by some other means

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod
spec:
  containers:
    - name: nginx
      image: nginx:v1.2.3
      imagePullPolicy: Always
```

Image Tags - SHA
● Recommended best practice
● SHA-based image references ensure exactly the same image is used each time, even if the tag is overwritten
● If a SHA is used, the tag is completely ignored and may no longer match the SHA! ⚠
  ○ Be careful with automated dependency updaters - make sure the SHA is also updated!

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod
spec:
  containers:
    - name: nginx
      # the "1.25.1" tag is meaningless / ignored once the digest is present
      image: nginx:1.25.1@sha256:9d6b58feebd2db…2072c9496
```

RuntimeClass ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

RuntimeClass
● Allows for multiple runtimes in a single cluster
● If unset, uses the default Container Runtime Interface (CRI) runtime configured on the node
● If set, it must point to a RuntimeClass resource name and the CRI handler must be configured on the node
● The scheduling property of the RuntimeClass ensures Pods are scheduled onto nodes with that runtime available (based on label selectors)

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod
spec:
  runtimeClassName: crio-runtime
  containers:
    - name: demo
      image: nginx:latest
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: crio-runtime
scheduling:
  nodeSelector:
    runtime: crio
handler: crio
```

RuntimeClass
● Can also be used for WASM (WebAssembly) runtimes, not just containers

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wasm-pod
spec:
  runtimeClassName: wasmedge
  containers:
    - name: demo
      image: my-wasm-demo:latest
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmedge
scheduling:
  nodeSelector:
    runtime: wasmedge
handler: wasmedge
```

Static Pods ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Static Pods
● Managed directly by the kubelet, not the API server
● Always bound to one kubelet on a specific node
● Defined as static manifests, either:
  ○ on the node's disk, in the directory defined by --pod-manifest-path
  ○ or referenced from a URL using the --manifest-url flag
● The kubelet automatically tries to create a "mirror Pod" on the API server for each static Pod so that they are visible when querying the API server, but they cannot be modified via the API
● Pod names get the node name as a suffix (e.g. kube-scheduler-control-plane-1)
● Cannot refer to other resources (e.g. ConfigMaps)
● The kubelet watches the static directory and reconciles when files are changed/added/removed

/etc/kubernetes/manifests/kube-scheduler.yaml (a file stored on the host node's disk):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
    - name: kube-scheduler
      image: kube-scheduler:v1.32.0
      command:
        - kube-scheduler
```

Lifecycle Hooks ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Lifecycle Hooks
● Guaranteed to trigger at least once, but may be called multiple times
● postStart
  ○ Runs immediately after the container is created, but with no guarantee that it executes before the container's ENTRYPOINT
  ○ The container isn't marked as "running" until this completes
● preStop
  ○ Runs immediately before the container is terminated
● Hook mechanisms available:
  ○ exec - run a command in the container
  ○ httpGet - perform an HTTP GET request against the container
  ○ sleep - pause for a given duration (if the PodLifecycleSleepAction feature gate is enabled)

cilium-agent.yaml (example based on the Cilium chart provided by Bitnami):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cilium-agent
spec:
  containers:
    - name: cilium-agent
      image: cilium:latest
      lifecycle:
        postStart:
          exec:
            command:
              - /bin/bash
              - -ec
              - |
                if [[ "$(iptables-save | grep -E -c 'AWS-SNAT-CHAIN')" != "0" ]]; then
                  iptables-save | grep -E -v 'AWS-SNAT-CHAIN' | iptables-restore
                fi
        preStop:
          exec:
            command:
              - /opt/bitnami/scripts/cilium/uninstall-cni-plugin.sh
              - /host
```

Conditions & Readiness Gates

Conditions
The kubelet manages the following Pod conditions:
● PodScheduled - the Pod has been scheduled to a node
● PodReadyToStartContainers - (beta feature) the Pod sandbox has been created and networking configured
● ContainersReady - all containers in the Pod are ready
● Initialized - all the initContainers have completed
● Ready - all containers are ready and probes are successfully passing

Each status condition may also contain…
● a 🤖 machine-readable reason property, and
● a 🧑 human-readable message property
…that can be used for debugging.

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  …
status:
  conditions:
    - type: Ready
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
    - type: PodScheduled
      status: "True"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
```

Readiness Gates
When container statuses and probes aren't enough to determine whether a Pod really is ready, there are readinessGates! These must be handled by some external application that patches the status of the Pod once the readiness gate condition is met.

Example usage: the AWS Load Balancer Controller supports readiness gates to indicate a Pod is registered with the ALB/NLB.

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-alb-example
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/k8s-readines-perf1000-7848e5026b
status:
  conditions:
    - type: target-health.elbv2.k8s.aws/k8s-readines-perf1000-7848e5026b
      status: "False"
      message: Initial health checks in progress
      reason: Elb.InitialHealthChecking
      lastTransitionTime: 2021-11-01T00:00:00Z
```

Config ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Config: Environment Variables

Environment Variables โ— Hardcoded & Dynamic, leveraging other environment variables with the $(ENV) syntax pod.yaml apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx env: - name: NAME value: โ€œWorldโ€ - name: GREETING value: โ€œHello, $(NAME)โ€ ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Environment Variables โ— Hardcoded & Dynamic, leveraging other environment variables with the $(ENV) syntax โ— The Downward API allows exposing properties from the Pod fields as env vars. Not all fields are valid but you can use fields from the Podโ€™s metadata, spec, limits and status. pod.yaml apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: POD_IP valueFrom: fieldRef: fieldPath: status.podIP - name: CONTAINER_MEM_LIMIT valueFrom: resourceFieldRef: containerName: demo resource: limits.memory ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Config: Volumes

Volumes โ— pod.yaml ConfigMaps vs. Secrets - name vs. secretName apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx volumes: - name: config-vol configMap: name: sensitive-html - name: secret-vol secret: secretName: demo-html ๏ฟฝ๏ฟฝ ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Volumes pod.yaml โ— ConfigMaps vs. Secrets - name vs. secretName โ— Downward API apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx volumeMounts: - name: podinfo mountPath: /etc/podinfo volumes: - name: podinfo downwardAPI: Becomes the filename items: - path: โ€œlabelsโ€ fieldRef: fieldPath: metadata.labels - path: โ€œannotationsโ€ fieldRef: fieldPath: metadata.annotations ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Volumes pod.yaml โ— ConfigMaps vs. Secrets - name vs. secretName โ— Downward API โ— EmptyDir โ—‹ The medium property can be set to Memory to use RAM-based storage. โ—‹ Use sizeLimit to avoid runaway storage usage to avoid impacting other workloads on the same node. โ–  If limit hit with default storage medium the Pod will be evicted โ–  If the medium is set to Memory the user gets a โ€œNo space left on deviceโ€ error instead. apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx volumes: - name: cache-volume emptyDir: medium: Memory sizeLimit: 500Mi Recommended to avoid filling host node disk ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Volumes pod.yaml โ— ConfigMaps vs. Secrets - name vs. secretName โ— Downward API โ— EmptyDir โ— Projected โ—‹ Combine multiple volume sources into a single volume mount directory โ—‹ Supports: Secret, ConfigMap, DownwardAPI, ServiceAccountToken and ClusterTrustBundle apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx volumeMounts: - name: web-content mountPath: /usr/share/nginx/html readOnly: true volumes: - name: web-content projected: sources: - configMap: name: web-index items: - key: index.html path: index.html - configMap: name: error-pages Entire contents of ConfigMap data ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Volumes pod.yaml โ— ConfigMaps vs. Secrets - name vs. secretName โ— Downward API โ— EmptyDir โ— Projected โ— Image (KEP #4639) โ—‹ Alpha in v1.31, disabled by default โ—‹ Allows mounting an OCI image as a volume โ—‹ Pull secrets handled the same as container images apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: containers: - name: demo image: nginx volumeMounts: - name: oci-content mountPath: /usr/share/nginx/html readOnly: true volumes: - name: oci-content image: reference: quay.io/crio/artifact:v1 pullPolicy: IfNotPresent ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Scheduling ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Scheduling: Resource Allocation

Resource Requests & Limits (a very brief overview!)
● Requests - how much of a resource must at least be available on a node for the Pod to be scheduled there
● Limits - the enforced amount of a resource a container can use
  ○ CPU - enforced by CPU throttling
  ○ Memory - enforced by kernel out-of-memory (OOM) terminations

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo
      image: nginx
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
```

Resource Requests & Limits
● Custom resource types can be managed by third-party controllers (e.g. nvidia.com/gpu)
  ○ Requests and limits must be the same
  ○ You cannot specify requests without limits

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo
      image: nginx
      resources:
        requests:
          nvidia.com/gpu: 1   # for GPU resources, requests must equal limits
        limits:
          nvidia.com/gpu: 1
```

Resource Requests & Limits โ— Requests - How much resources are at least needed on a node to be scheduled there โ— Limits - enforced amount of resources a container can use โ—‹ CPU - enforced by CPU throttling โ—‹ Memory - enforced by kernel out of memory (OOM) terminations โ— Custom resource types can be managed by 3rd party controllers (e.g. nvidia.com/gpu) โ— โ—‹ Requests & limits must be the same โ—‹ You cannot specify requests without limits Pod limit and requests are calculated from the sum of all the containers โ—‹ pod.yaml apiVersion: v1 kind: Pod metadata: name: โ€œdemo-podโ€ spec: resources: requests: memory: โ€œ100Miโ€ limits: memory: โ€œ200Miโ€ containers: - name: demo image: nginx v1.32 introduces a new alpha (disabled by default) feature that supports pod-level resource specification ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

Scheduling: Node Assignment

Node Assignment
● topologySpreadConstraints - more control over the spread of Pods across a cluster when scaling replicas
  ○ skew = number of Pods in a topology - minimum number of Pods in any topology
  ○ topologyKey - the node label to use for the groupings (usually failure domains)
  ○ whenUnsatisfiable - either DoNotSchedule or ScheduleAnyway
  ○ No guarantee the constraints remain satisfied when Pods are removed (e.g. scaling down)
  ○ Can be combined with other node assignment strategies (e.g. affinity)

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  labels:
    app: nginx
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: nginx
  containers:
    - name: demo
      image: nginx
```
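As a sketch of combining spread constraints with affinity, the spec below adds a required node affinity alongside a spread constraint; the node-type label key and its value are illustrative assumptions, not from the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  labels:
    app: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type            # hypothetical label for illustration
                operator: In
                values: ["general-purpose"]
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: zone
      whenUnsatisfiable: ScheduleAnyway   # soft: prefer spreading, but still schedule
      labelSelector:
        matchLabels:
          app: nginx
  containers:
    - name: demo
      image: nginx
```

The affinity narrows the candidate nodes first; the spread constraint is then evaluated across whatever zones remain.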

Scheduling: Scheduler Logic

Priority & Preemption
● Pods can be given a priority to indicate their importance compared to other Pods. If a Pod cannot be scheduled and has a higher priority than already scheduled Pods, the scheduler will evict (preempt) the lower-priority Pods to make room
● The PriorityClass resource is used to define the possible priorities in a cluster
● PodDisruptionBudgets are handled on a best-effort basis and not guaranteed to be honoured
● You can avoid preempting lower-priority Pods by setting preemptionPolicy: Never on the PriorityClass
  ○ This affects the scheduler queue but doesn't cause Pods to be evicted

pod.yaml:
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: These pods are important
---
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  priorityClassName: high-priority
  containers:
    - name: demo
      image: nginx
```
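The non-preempting variant mentioned in the last bullet can be sketched as below; the class name and value are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # hypothetical name for illustration
value: 1000000
preemptionPolicy: Never               # jump the scheduling queue, but never evict others
globalDefault: false
description: High priority, but will not preempt running Pods
```

Pods using this class still sort ahead of lower-priority Pods in the scheduling queue; they simply wait for capacity instead of evicting anything.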

Multiple / Alternative Schedulers
● schedulerName - indicates which scheduler should manage a Pod
● If not set, or set to default-scheduler, the built-in Kubernetes scheduler is used

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  schedulerName: custom-scheduler
  containers:
    - name: demo
      image: nginx
```

Networking ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

DNS โ— pod.yaml Usually, depending on the DNS mechanism used in the cluster (e.g. CoreDNS), each Pod also gets an A record โ—‹ โ— E.g. 172-17-0-3.default.pod.cluster.local A Pods hostname is set to its metadata.name by default but can be overridden with the spec.hostname property and an additional subdomain set with spec.subdomain โ— โ—‹ E.g. my-demo.example.default.svc.cluster.local โ—‹ (This doesnโ€™t mean other Pods can resolve that hostname) Add extra entries to the Pods /etc/hosts file with spec.hostAliases apiVersion: v1 kind: Pod metadata: name: demo spec: hostname: my-demo subdomain: example setHostnameAsFQDN: true hostAliases: - ip: โ€œ127.0.0.1โ€ hostnames: - โ€œdemo.localโ€ containers: - name: demo image: nginx ๐Ÿ˜ @Marcus@k8s.social | ๐ŸŒ MarcusNoble.com | ๐Ÿฆ‹ @averagemarcus.bsky.social

DNS Policy
● Defines how the Pod's DNS configuration defaults should be specified:
  ○ Default - inherits the node's DNS resolution config
  ○ ClusterFirst - matches against in-cluster resources first before forwarding to an upstream nameserver (actually the default value 🤷)
  ○ ClusterFirstWithHostNet - should be used when using the host network, otherwise the Pod falls back to Default
  ○ None - ignores all DNS config from the cluster and expects everything to be set via dnsConfig

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  dnsPolicy: ClusterFirst
  containers:
    - name: demo
      image: nginx
```
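A sketch of the host-network case from the third bullet, pairing hostNetwork: true with ClusterFirstWithHostNet so in-cluster names still resolve (the Pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-demo   # hypothetical name for illustration
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # without this, a hostNetwork Pod behaves as Default
  containers:
    - name: demo
      image: nginx
```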

DNS Config
● More control over the DNS settings used within a Pod
● nameservers - a list of IPs to use as the DNS servers (max 3)
● searches - a list of DNS search domains to use for hostname lookup, merged into the generated base search domains (max 32)
● options - a list of name/value pairs to define extra DNS configuration options

pod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: busybox
      image: nginx
  dnsPolicy: None   # only use the dnsConfig below
  dnsConfig:
    nameservers:
      - 192.0.2.1
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
```

Resulting /etc/resolv.conf:
```
nameserver 192.0.2.1
search ns1.svc.cluster-domain.example my.dns.search.suffix
options ndots:2 edns0
```

Wrap-up
(Before / After comparison slide)

Wrap-up
Slides and resources available at: https://go-get.link/rejekts25
Thoughts, comments and feedback: feedback@marcusnoble.co.uk
https://k8s.social/@Marcus
https://bsky.app/profile/averagemarcus.bsky.social
Thank you