Or: How I Learned to Stop Worrying and Love the YAML
π± The Existential Crisis
Picture this: It's 2015. jQuery is still cool. Docker is "that whale thing." You deploy code by SSHing into a server named after your cat. Life has meaning.
Then someone in a conference room with too many whiteboards utters the cursed phrase: "We need to modernize our infrastructure."
Fast forward to today, and you're staring at 47 YAML files, questioning every decision that led you to this moment, while a colleague enthusiastically explains that "a Pod is just an abstraction over containers, which are themselves abstractions over processes, wrapped in cgroups and namespaces."
You nod. You understand nothing. You are not alone. π€
Welcome to Kubernetes, or as I like to call it: "The answer to a question you didn't know you were asking, to a problem you didn't know you had, using terminology invented by a committee of philosophers who really hate whitespace."
But fear not, brave developer. By the end of this article, you'll understand Kubernetes well enough to either deploy your applications with confidence or at least nod more convincingly in meetings while internally screaming.
π Chapter 1: What Even IS Kubernetes?
π― The Honest Explanation
Kubernetes (abbreviated K8s, because apparently typing eight letters was too much effort) is a container orchestration platform.
"But what does that mean?" I hear you cry into the void.
Let me explain with an analogy that will haunt your dreams:
π³ Imagine you're running a restaurant empire.
| Real World | Kubernetes World |
|---|---|
| Your recipe | Your code |
| A chef with their own portable kitchen | A container |
| The company building portable kitchens | containerd, CRI-O* |
| The RESTAURANT MANAGER FROM HELL | Kubernetes |
Note: Kubernetes removed its built-in Docker runtime support (dockershim) in version 1.24. Modern clusters use containerd or CRI-O as the container runtime. You can still build images with Docker; the cluster just runs them with a different runtime now.
The Manager (Kubernetes):
- π Decides how many chefs you need at any moment
- π₯ Fires chefs who look tired (health check failed)
- π Hires identical replacement chefs automatically (self-healing)
- π Brings in extra chefs when the lunch rush hits (horizontal scaling)
- π Redirects customers to available chefs (load balancing)
- π Moves chefs to a different location if one restaurant catches fire (node failure)
- π Keeps track of the secret recipes (secrets management)
- β Doesn't actually know how to cook anything
That last point is crucial. Kubernetes doesn't run your codeβit makes sure your code is always running somewhere, somehow, despite the universe's best attempts to stop it.
π€ Why Does This Exist? (The Problem It Solves)
Before Kubernetes, scaling applications meant:
- Manual server provisioning - "Hey ops team, we need 3 more servers by Friday"
- Snowflake servers - Each server configured slightly differently, documented in someone's head
- Deployment fear - "If we deploy on Friday, we might not go home until Monday"
- No self-healing - Server dies at 3 AM? Hope you like being on-call!
- Resource waste - One app per server, even if it only uses 10% of resources
Kubernetes solves these by:
- π€ Automating everything - Declare what you want, K8s makes it happen
- π¦ Standardizing deployments - Same process everywhere, every time
- π‘οΈ Self-healing - Dead containers get replaced automatically
- π Efficient resource usage - Many apps per server, bin-packing optimization
- π Zero-downtime deployments - Rolling updates are the default
ποΈ The Object Hierarchy (a.k.a. "The Circle of Life")
Before we dive deeper, let's understand how Kubernetes objects relate to each other. This hierarchy is fundamental to understanding why things work the way they do:
At the top you interact with a Deployment ("I want 3 copies of my app running"). The Deployment creates and manages a ReplicaSet ("I ensure exactly 3 Pods exist"), the ReplicaSet creates and manages the Pods (the actual workers), and each Pod runs your containers (the atomic level: your code).
π Why this hierarchy?
| Level | Object | Why It Exists |
|---|---|---|
| You interact with | Deployment | Provides update strategies, rollback history, declarative scaling |
| Auto-managed | ReplicaSet | Maintains exact Pod count. New one created per Deployment version (enables rollback!) |
| Worker | Pod | Scheduling unit. Shares network/storage between containers |
| Actual process | Container | Your application code running |
π‘ Key insight: You never touch ReplicaSets directly. They're an implementation detail. When you update a Deployment, it creates a new ReplicaSet and gradually shifts trafficβthat's how rollbacks work! Old ReplicaSets are kept (with 0 replicas) so you can roll back instantly.
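You can see this chain on a live cluster. A quick sketch, assuming a Deployment named my-app labelled as in the Chapter 2 example:
# One ReplicaSet per Deployment revision; old ones linger at 0 replicas for rollback
kubectl get deployment my-app
kubectl get replicasets -l app.kubernetes.io/name=my-app
kubectl get pods -l app.kubernetes.io/name=my-app
kubectl rollout history deployment/my-app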
The Architecture: A Map of the Kingdom
A cluster has two halves: the control plane (API Server, etcd, Scheduler, Controller Manager) and the worker nodes. Each node runs a kubelet agent and hosts the Pods that the control plane schedules onto it.
π§© Control Plane Components Explained
| Component | What It Does | Why It's Designed This Way | If It Dies... |
|---|---|---|---|
| π API Server | Front door for ALL communication. RESTful API that everything talks to. | Single point of entry = security boundary, audit logging, authentication. In HA, multiple API servers behind a load balancer. | You're locked out. Run 3+ for HA. |
| π etcd | Distributed key-value store using Raft consensus. THE source of truth for cluster state. | Raft protocol = consistent even with node failures. Separate from API for modularity. | π Total cluster loss. Backup religiously. |
| π Scheduler | Watches for unassigned Pods, picks optimal Node based on resources, affinity, taints. | Decoupled from API = can be replaced/customized. Pluggable scoring algorithms. | New Pods stay Pending. Existing keep running. |
| π Controller Manager | Runs control loops: Deployment controller, ReplicaSet controller, Node controller, etc. | Each controller is single-purpose = easier to understand, debug, extend. | Cluster stops self-healing. Drift not corrected. |
π Why is it designed this way?
Kubernetes follows a declarative, reconciliation-based architecture:
- You declare desired state: "I want 3 replicas of my app"
- Controllers constantly compare desired vs actual state
- Controllers take action to reconcile differences
- This loop runs forever, every few seconds
This is fundamentally different from imperative systems ("start 3 servers"). If something drifts, Kubernetes fixes it automatically.
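You can watch a reconciliation loop work. A minimal sketch, assuming a Deployment named my-app with 3 replicas already exists:
# Delete one Pod on purpose...
kubectl get pods -l app.kubernetes.io/name=my-app
kubectl delete pod <one-of-those-pod-names>
# ...then watch the ReplicaSet controller notice the drift (2 running, 3 desired) and create a replacement
kubectl get pods -l app.kubernetes.io/name=my-app -w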
π¦ Chapter 2: Pods and Deployments β The Core Building Blocks
π¦ Pods: The Atomic Unit of Kubernetes
A Pod is the smallest deployable unit. It's one or more containers that share:
- Network namespace (they communicate via localhost and share an IP address)
- Storage volumes (optional shared filesystems)
- Lifecycle (scheduled together, start together, die together)
π€ Why Pods instead of just Containers?
Sometimes you need tightly coupled containers:
- Sidecar pattern: Main app + log shipper in same Pod
- Ambassador pattern: Main app + proxy in same Pod
- Adapter pattern: Main app + format converter in same Pod
These containers MUST be on the same node, share network, and scale together. That's what Pods provide.
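As a sketch of the sidecar pattern (the container names, images, and paths here are illustrative, and it's shown as a bare Pod only to keep it short; in practice this would be the Pod template inside a Deployment):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
  - name: app                          # main application
    image: myregistry/myapp:v1.2.3
    volumeMounts:
    - name: logs
      mountPath: /var/log/app          # app writes its logs here
  - name: log-shipper                  # sidecar: reads the same logs and ships them
    image: fluent/fluent-bit:2.2
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}                       # shared by both containers, same lifecycle as the Pod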
β οΈ Important truth bomb: You almost never create Pods directly. That's amateur hour.
# pod.yaml - FOR EDUCATIONAL PURPOSES ONLY π
# Creating this in production is a cry for help
apiVersion: v1
kind: Pod
metadata:
name: my-lonely-pod
labels:
shame: "yes" # π
manually-created: "true"
spec:
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
Why not create Pods directly?
- π If a Pod dies, it stays dead. No resurrection.
- π No rolling updatesβyou'd have to delete and recreate
- π No scalingβyou'd create each Pod manually
- βͺ No rollbackβhope you saved that old YAML!
π Deployments: The Proper Wayβ’
A Deployment is the standard way to run stateless applications. It provides:
| Feature | What It Does | Why You Need It |
|---|---|---|
| Declarative updates | You say "version 2.0", K8s figures out how | No manual coordination needed |
| Rolling updates | Gradual replacement of Pods | Zero downtime during deploys |
| Rollback | Undo to any previous version | Fix that 3 AM mistake in seconds |
| Self-healing | Dead Pods get replaced | Sleep through the night |
| Scaling | Change replica count anytime | Handle traffic spikes |
# deployment.yaml - This is what production systems use β
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
labels:
# π·οΈ Use standard Kubernetes labels for consistency
app.kubernetes.io/name: my-app
app.kubernetes.io/version: "1.2.3"
app.kubernetes.io/component: backend
spec:
replicas: 3 # π― "I want 3 copies running at all times"
# π Update strategy - how to replace old Pods with new ones
strategy:
type: RollingUpdate # Default and recommended for stateless apps
rollingUpdate:
maxSurge: 1 # π Allow 1 extra Pod during update (4 total briefly)
maxUnavailable: 0 # π‘οΈ Never have fewer than 3 running
# Why these values? Prioritizes availability over speed.
# For faster updates: maxSurge: 25%, maxUnavailable: 25%
# π― Selector: "Which Pods belong to this Deployment?"
selector:
matchLabels:
app.kubernetes.io/name: my-app # Must match template.metadata.labels!
template: # π Pod template - blueprint for each Pod
metadata:
labels:
app.kubernetes.io/name: my-app # β οΈ MUST MATCH selector above!
app.kubernetes.io/version: "1.2.3"
spec:
# π Graceful shutdown configuration
terminationGracePeriodSeconds: 60 # Give app 60s to finish requests
containers:
- name: app
image: myregistry/myapp:v1.2.3 # π·οΈ Always use specific tags, NEVER :latest
imagePullPolicy: IfNotPresent # π₯ Don't re-pull if image exists locally
# πͺ Port declaration (documentation + service discovery)
ports:
- containerPort: 8080
name: http # Named ports are clearer
# π° Resource Management - ALWAYS SET THESE
resources:
requests: # "I need at least this much to function"
memory: "128Mi" # Scheduler uses this for placement decisions
cpu: "100m" # 100 millicores = 0.1 CPU core
limits: # "Never let me exceed this"
memory: "256Mi" # Exceeding = OOMKilled π
cpu: "500m" # Exceeding = throttled (not killed)
# π‘ Note: Some teams omit CPU limits to avoid throttling.
# If you set them, monitor for latency impacts.
# π Environment variables (non-sensitive config)
env:
- name: LOG_LEVEL
value: "info"
- name: APP_ENV
value: "production"
Resource Units Explained
Understanding resource units is crucial for proper capacity planning:
| Unit | Meaning | Example | Notes |
|---|---|---|---|
| 100m | 100 millicores | 10% of 1 CPU core | 1000m = 1 full core |
| 0.1 | Same as 100m | 10% of 1 CPU core | Decimal notation works too |
| 128Mi | 128 mebibytes | ~134 MB | Binary units (1024-based) |
| 128M | 128 megabytes | 128 MB | Decimal units (1000-based) |
| 1Gi | 1 gibibyte | ~1.07 GB | Use for memory typically |
π‘ Best Practice for Setting Resources:
- Start with low requests, monitor actual usage with kubectl top
- Set memory limits ~1.5-2x requests initially
- Use Vertical Pod Autoscaler (VPA) for recommendations
- Memory limit = hard ceiling (OOMKill if exceeded)
- CPU limit = soft ceiling (throttling, not death) - some teams omit this
π The Rolling Update Dance
When you update a Deployment (new image, config change, etc.), here's what happens with the strategy above: Kubernetes creates one new Pod (maxSurge: 1), waits for it to pass its readiness probe, terminates one old Pod (maxUnavailable: 0 keeps three serving at all times), and repeats until every Pod runs the new version.
π Key insights:
- At no point did we have zero running instances
- Old ReplicaSet is kept (with 0 replicas) for instant rollback
- Each new Pod must pass readiness probe before old Pod is terminated
- Your users experienced zero downtime
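To trigger the dance yourself, changing the image is enough (the v1.2.4 tag is made up for illustration; the Deployment and container names match the Chapter 2 example):
# New image tag = new ReplicaSet = rolling update
kubectl set image deployment/my-app app=myregistry/myapp:v1.2.4
# Watch it happen
kubectl rollout status deployment/my-app
kubectl get pods -l app.kubernetes.io/name=my-app -w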
βͺ Rollback: Your Safety Net
# π± Something's wrong! Roll back immediately!
kubectl rollout undo deployment/my-app
# π See rollout history
kubectl rollout history deployment/my-app
# βͺ Roll back to specific revision
kubectl rollout undo deployment/my-app --to-revision=2
# π Watch rollout progress
kubectl rollout status deployment/my-app
Why rollback is instant: Remember those old ReplicaSets? Kubernetes just scales up the old one and scales down the new one. No image pulling, no waiting: the old Pods were ready to go!
π Chapter 3: Services β Stable Endpoints in a Chaotic World
π€ The Problem Services Solve
Here's the challenge: Pods are ephemeral. They get IP addresses when created. When they die and are replaced, they get new IP addresses.
Pod my-app-abc123: IP 10.244.1.5 (dies)
Pod my-app-xyz789: IP 10.244.2.8 (created with a different IP!)
Imagine if every time your favorite restaurant hired a new chef, you had to learn their home address to order food. That's Pods without Services.
Services provide:
- π Stable IP address that never changes
- π€ DNS name for easy discovery
- βοΈ Load balancing across healthy Pods
- π Service discovery via environment variables and DNS
π― Service Types: Choose Your Adventure
Service types fall into three groups: internal access (ClusterIP, the default, internal traffic only), external access (NodePort for development/testing on ports 30000-32767, LoadBalancer for production behind a cloud provider LB), and special purpose (ExternalName as a DNS alias for external services, and Headless with clusterIP: None for direct Pod access).
| Type | Who Can Access | Use Case | Cost | Example |
|---|---|---|---|---|
| π ClusterIP | Internal Pods only | Service-to-service communication | Free | API calling database |
| πͺ NodePort | External via NodeIP:30000-32767 | Development, on-prem | Free | Testing externally |
| βοΈ LoadBalancer | External via cloud LB | Production external access | πΈπΈπΈ | Public website |
| π ExternalName | DNS CNAME record | Abstracting external deps | Free | db.example.com β RDS |
| π Headless | Direct Pod IPs | StatefulSets, custom discovery | Free | Database clusters |
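For comparison, a LoadBalancer Service for external traffic is only a small step away from the ClusterIP example below; the Service name here is made up, and the cloud provider provisions the actual load balancer:
apiVersion: v1
kind: Service
metadata:
  name: my-app-public
spec:
  type: LoadBalancer                 # cloud provider creates an external LB + IP
  selector:
    app.kubernetes.io/name: my-app   # same label selection as any other Service
  ports:
  - name: http
    port: 80                         # port exposed on the load balancer
    targetPort: http                 # named container port on the Pods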
π ClusterIP Service Example (The Default)
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: my-app-service
labels:
app.kubernetes.io/name: my-app
spec:
type: ClusterIP # π Default, can omit
# π― Selector: "Route traffic to Pods with these labels"
selector:
app.kubernetes.io/name: my-app # Must match Pod labels EXACTLY!
ports:
- name: http # π Named ports are best practice
port: 80 # πͺ Port the Service listens on
targetPort: http # π― Use named port from Pod spec!
protocol: TCP # TCP is default, can omit
# π‘ Why separate port and targetPort?
# - Service presents a standard interface (port 80)
# - Pods can use any port internally (8080)
# - You can change Pod port without affecting clients
How Service Discovery Works
When a Service is created, Kubernetes does two magical things:
1. It registers a DNS name with CoreDNS that resolves to the Service's ClusterIP (my-svc → 10.96.45.12 in this example).
2. It maintains an Endpoints list of the ready Pod IPs behind that ClusterIP (10.244.1.5:8080, 10.244.2.8:8080, 10.244.3.3:8080).
When your app calls my-svc, it does a DNS lookup, gets back the ClusterIP, opens a TCP connection to it, and that connection is load-balanced to one of the Pods.
π DNS Name Formats:
| Format | When to Use | Example |
|---|---|---|
| my-svc | Same namespace | http://my-svc/api |
| my-svc.other-ns | Different namespace | http://my-svc.payments/charge |
| my-svc.other-ns.svc.cluster.local | Full FQDN (rarely needed) | Cross-cluster scenarios |
π‘ Pro tip: Always use the short form within the same namespace. It's cleaner and Kubernetes adds the suffix automatically.
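A quick way to check resolution from inside the cluster is a throwaway Pod (busybox used here as an example image):
# Resolve the Service name from a temporary debug Pod
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup my-app-service
# The answer should be the Service's ClusterIP, not a Pod IP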
π Endpoints: The Magic Behind Services
Services don't magically know where Pods are. They maintain an Endpoints object:
# π See which Pods a Service routes to
kubectl get endpoints my-app-service
NAME ENDPOINTS AGE
my-app-service 10.244.1.5:8080,10.244.2.8:8080,10.244.3.3:8080 5m
If you see <none> for endpoints:
- Your selector doesn't match any Pod labels
- No Pods are passing readiness probes
- Pods exist but in wrong namespace
This is the #1 debugging step when Services don't work!
πͺ Chapter 4: Ingress β The Fancy Front Door
π€ Why Ingress Exists
Services are great, but:
- LoadBalancers cost money (one per service = budget death)
- NodePorts are ugly (nobody wants myapp.com:31847)
- No path-based routing (can't route /api and /web separately)
- No SSL termination at the Service level
Ingress provides:
- Path-based routing (/api → API service, /web → Web service)
- Host-based routing (multiple domains, one IP)
- TLS/SSL termination (HTTPS handled at the edge)
- Cost efficiency (one LoadBalancer for many services)
βοΈ How Ingress Works (Two Components)
An Ingress resource is just configuration: the routing rules. An Ingress Controller (nginx-ingress, Traefik, the AWS ALB Controller, etc.) must be installed separately; it reads those rules and configures the actual load balancer that routes traffic to your Services.
β οΈ Important: The Ingress resource alone does nothing! You need an Ingress Controller (nginx-ingress, traefik, etc.) actually installed in your cluster.
π Ingress Example
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-ingress
annotations:
# π§ Controller-specific settings (these are for nginx)
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
ingressClassName: nginx # π― Which controller handles this (not annotation!)
# π TLS Configuration
tls:
- hosts:
- myapp.example.com
- api.example.com
secretName: tls-secret # Certificate stored as K8s Secret
# π£οΈ Routing Rules
rules:
# Rule 1: myapp.example.com
- host: myapp.example.com
http:
paths:
- path: /api # π― /api/* goes to api-service
pathType: Prefix # Prefix = /api, /api/, /api/users all match
backend:
service:
name: api-service
port:
number: 80
- path: / # π― Everything else goes to web-service
pathType: Prefix
backend:
service:
name: web-service
port:
number: 80
# Rule 2: api.example.com (different domain)
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
Traffic Flow Visualization
A request to https://myapp.example.com/api/users hits the Ingress Controller, which has read the Ingress rules (myapp.example.com: /api → api-service, / → web-service). The controller forwards the request to the matching Service, and the Service load-balances it across that app's Pods.
π‘ Path Types Explained:
| Type | Matches | Use Case |
|---|---|---|
| Prefix | /api, /api/, /api/users | Most common, REST APIs |
| Exact | Only /api exactly | Specific endpoints |
| ImplementationSpecific | Controller decides | Legacy, avoid |
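The tls-secret referenced in the example has to exist before the Ingress can terminate TLS. Assuming you already have a certificate and key on disk (the paths are placeholders; many teams let cert-manager create this Secret instead):
# Create the TLS Secret the Ingress points at
kubectl create secret tls tls-secret --cert=path/to/tls.crt --key=path/to/tls.key
kubectl get secret tls-secret   # verify it exists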
βοΈ Chapter 5: ConfigMaps and Secrets β Externalizing Configuration
π― Why Externalize Configuration?
The 12-Factor App methodology says: Store config in the environment, not in code.
Why?
- π Same image, different environments (dev/staging/prod)
- π Secrets stay secret (not in Git history!)
- β‘ Change config without rebuilding (faster deployments)
- π₯ Separation of concerns (devs write code, ops manage config)
π ConfigMaps: For Non-Sensitive Data
ConfigMaps store configuration data as key-value pairs or entire files:
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
# π Simple key-value pairs
LOG_LEVEL: "info"
DATABASE_HOST: "db.internal.svc.cluster.local"
FEATURE_NEW_UI: "true"
MAX_CONNECTIONS: "100"
# π Entire configuration files
nginx.conf: |
server {
listen 80;
server_name localhost;
location / {
proxy_pass http://backend:8080;
proxy_set_header Host $host;
}
}
application.yaml: |
spring:
profiles:
active: production
datasource:
url: jdbc:postgresql://db:5432/myapp
logging:
level:
root: INFO
Secrets: For Sensitive Data
β οΈ Critical Warning: Kubernetes Secrets are base64 encoded, NOT encrypted by default. Anyone with cluster access can decode them!
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
type: Opaque # Generic secret type
stringData: # π‘ Use stringData, K8s encodes automatically
DB_PASSWORD: "super-secret-password-123"
API_KEY: "sk-abc123def456"
JWT_SECRET: "my-jwt-signing-key"
# β οΈ The 'data' field requires base64 encoding:
# data:
#   DB_PASSWORD: c3VwZXItc2VjcmV0LXBhc3N3b3JkLTEyMw==
For Real Security:
- Enable encryption at rest for etcd
- Use external secret managers: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault
- Use the External Secrets Operator to sync external secrets
- Implement RBAC to restrict Secret access
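For the RBAC point, a minimal sketch of a Role that lets one ServiceAccount read one specific Secret and nothing else (all names here are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-app-secrets
rules:
- apiGroups: [""]                  # core API group
  resources: ["secrets"]
  resourceNames: ["app-secrets"]   # only this Secret
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-app-secrets-binding
subjects:
- kind: ServiceAccount
  name: my-app-sa                  # illustrative ServiceAccount
  namespace: default               # adjust to your namespace
roleRef:
  kind: Role
  name: read-app-secrets
  apiGroup: rbac.authorization.k8s.io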
π Injecting Configuration into Pods
There are three ways to use ConfigMaps and Secrets in Pods:
apiVersion: apps/v1
kind: Deployment
metadata:
name: configured-app
spec:
replicas: 2
selector:
matchLabels:
app: configured-app
template:
metadata:
labels:
app: configured-app
spec:
containers:
- name: app
image: myapp:latest
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π METHOD 1: Load ALL keys as environment variables
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
envFrom:
- configMapRef:
name: app-config # All keys become env vars
- secretRef:
name: app-secrets # β οΈ Be careful, exposes all secrets
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π― METHOD 2: Cherry-pick specific values (RECOMMENDED)
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
env:
- name: DATABASE_PASSWORD # Env var name in container
valueFrom:
secretKeyRef:
name: app-secrets # Secret name
key: DB_PASSWORD # Key within Secret
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: app-config
key: LOG_LEVEL
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π METHOD 3: Mount as files (great for config files)
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
volumeMounts:
- name: config-volume
mountPath: /etc/config # ConfigMap keys become files here
readOnly: true
- name: secret-volume
mountPath: /etc/secrets
readOnly: true
volumes:
- name: config-volume
configMap:
name: app-config
items: # π‘ Optional: mount specific keys only
- key: nginx.conf
path: nginx.conf # /etc/config/nginx.conf
- name: secret-volume
secret:
secretName: app-secrets
defaultMode: 0400 # Restrictive permissions
Which Method to Use?
| Method | Best For | Pros | Cons |
|---|---|---|---|
| envFrom | Simple apps, all config needed | Easy, automatic | Exposes everything, naming conflicts |
| env + valueFrom | Production apps | Explicit, documented | More YAML |
| Volume mounts | Config files (nginx.conf, etc.) | Files stay files | App must read files |
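Whichever method you pick, it's worth confirming the injection actually landed (the Pod name is a placeholder):
# Methods 1 & 2: environment variables inside the container
kubectl exec <pod> -- env | grep LOG_LEVEL
# Method 3: mounted files
kubectl exec <pod> -- ls -l /etc/config /etc/secrets
kubectl exec <pod> -- cat /etc/config/nginx.conf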
π₯ Chapter 6: Health Checks β Keeping Your Pods Honest
π€ Why Health Checks Matter
Without health checks:
- π§ Zombie Pods - Process running but not responding
- π Cascading failures - Bad Pod gets traffic, fails, repeats
- π΄ Slow startup issues - Pod not ready but getting traffic
With health checks:
- π Automatic recovery - Unhealthy containers restarted
- π¦ Traffic control - Only ready Pods receive traffic
- β° Startup tolerance - Slow apps given time to initialize
π The Three Probe Types
Each probe asks a different question, and each failure has a different consequence: a failed liveness probe ("Are you alive?") gets the container killed and restarted, a failed readiness probe ("Can you serve traffic?") removes the Pod from Service endpoints, and an unfinished startup probe ("Are you done starting?") keeps the other probes disabled until it passes.
| Probe | Question It Answers | On Failure | When to Use |
|---|---|---|---|
| π Liveness | "Is the process stuck/deadlocked?" | Container killed & restarted | Always. Catches hung processes. |
| β Readiness | "Can you handle requests right now?" | Removed from Service (no traffic) | Always. Prevents traffic to unready Pods. |
| π Startup | "Have you finished initializing?" | Liveness/Readiness probes paused | Slow-starting apps (Java, legacy) |
π Complete Health Check Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: healthy-app
spec:
replicas: 3
selector:
matchLabels:
app: healthy-app
template:
metadata:
labels:
app: healthy-app
spec:
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
name: http
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π STARTUP PROBE (for slow-starting applications)
# "Disable liveness/readiness until startup completes"
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
startupProbe:
httpGet:
path: /healthz
port: http
failureThreshold: 30 # 30 Γ 10s = 5 min max startup
periodSeconds: 10
# π‘ Once this passes, liveness/readiness probes take over
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π LIVENESS PROBE
# "If this fails, kill and restart the container"
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
livenessProbe:
httpGet:
path: /healthz # π‘ Lightweight endpoint
port: http
initialDelaySeconds: 0 # Startup probe handles delay
periodSeconds: 10 # π Check every 10 seconds
timeoutSeconds: 5 # β±οΈ Timeout per check
failureThreshold: 3 # Fail 3 times = restart
successThreshold: 1 # 1 success = healthy
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# READINESS PROBE
# "If this fails, stop sending traffic to this Pod"
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
readinessProbe:
httpGet:
path: /ready # π‘ Can be different from liveness!
port: http
initialDelaySeconds: 0 # Startup probe handles delay
periodSeconds: 5 # Check more frequently
timeoutSeconds: 3
failureThreshold: 3 # Fail 3 times = remove from LB
successThreshold: 1
Probe Types Available
# π HTTP GET (most common)
httpGet:
path: /healthz
port: 8080
httpHeaders: # Optional custom headers
- name: Custom-Header
value: Awesome
# π TCP Socket (for non-HTTP services)
tcpSocket:
port: 3306 # Just checks if port is open
# π» Exec (run a command)
exec:
command:
- cat
- /tmp/healthy
# Exit code 0 = healthy
# π gRPC (for gRPC services, K8s 1.24+)
grpc:
port: 50051
Health Check Best Practices
| Practice | Why |
|---|---|
| Liveness ≠ Readiness endpoints | Liveness: "am I broken?" Readiness: "am I ready for MORE traffic?" |
| Don't check dependencies in liveness | If DB is down, restarting your app won't fix it! |
| DO check dependencies in readiness | Don't send traffic if you can't serve it |
| Set appropriate timeouts | Too short = false positives. Too long = slow recovery. |
| Use startup probes for slow apps | Prevents liveness probe killing during startup |
| Keep probes lightweight | Heavy probes can cause issues under load |
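When a probe misbehaves, the evidence lands in events; the kubelet typically reports failed liveness/readiness checks with reason Unhealthy:
# Probe failures show up in the Pod's events
kubectl describe pod <pod>   # scroll to the Events section
# Or query events across the namespace
kubectl get events --field-selector reason=Unhealthy --sort-by='.lastTimestamp'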
π Chapter 7: Graceful Shutdown β The Art of Dying Well
π€ Why Graceful Shutdown Matters
When Kubernetes terminates a Pod (during updates, scaling down, node drain), what happens to in-flight requests?
Without graceful shutdown:
- π₯ Requests get dropped mid-response
- π Users see 502/503 errors
- π Retries create thundering herd
With graceful shutdown:
- β Current requests complete
- π« New requests go elsewhere
- π Users notice nothing
β° The Termination Sequence
1. The Pod is marked Terminating and removed from Service endpoints, so no new traffic arrives.
2. The preStop hook (if any) runs, then the container receives SIGTERM.
3. The app should stop accepting new requests, finish in-flight requests, close DB connections, and flush buffers.
4. Kubernetes waits up to terminationGracePeriodSeconds.
5. If the app exits cleanly (exit 0), that's a clean shutdown; if the timeout is exceeded, SIGKILL force-kills it.
π Graceful Shutdown Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: graceful-app
spec:
replicas: 3
selector:
matchLabels:
app: graceful-app
template:
metadata:
labels:
app: graceful-app
spec:
# β° How long to wait before SIGKILL (default: 30s)
terminationGracePeriodSeconds: 60
containers:
- name: app
image: myapp:latest
# π§ Lifecycle hooks
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- |
# π‘ Why sleep? Give load balancer time to update!
# Endpoints update is async - traffic may still
# be routing here for a few seconds
sleep 5
Application-Side SIGTERM Handling
Your application MUST handle SIGTERM properly:
Node.js:
process.on('SIGTERM', async () => {
console.log('SIGTERM received, shutting down gracefully');
// Stop accepting new connections
server.close(async () => {
await database.disconnect();
process.exit(0);
});
// Force exit after timeout
setTimeout(() => process.exit(1), 25000);
});
Java Spring Boot:
# application.yaml
server:
shutdown: graceful
spring:
lifecycle:
timeout-per-shutdown-phase: 30s
Python:
import signal, sys
def handle_sigterm(signum, frame):
print("SIGTERM received, shutting down...")
# Cleanup code here
sys.exit(0)
signal.signal(signal.SIGTERM, handle_sigterm)
Chapter 8: Production Best Practices – The Checklist That Saves Careers
The Production Readiness Checklist
RESOURCES
- Resource requests AND limits set for all containers
- Requests based on actual observed usage
- Memory limit = hard ceiling (OOMKill risk understood)
HEALTH & AVAILABILITY
- Startup probe configured (for slow-starting apps)
- Liveness probe configured
- Readiness probe configured (different endpoint from liveness!)
- Multiple replicas (minimum 2, recommend 3+)
- PodDisruptionBudget defined
- Pod anti-affinity (spread across nodes)
- TopologySpreadConstraints (spread across zones)
LIFECYCLE
- Application handles SIGTERM gracefully
- terminationGracePeriodSeconds set appropriately
- preStop hook if needed (for LB drain time)
SECURITY
- Container runs as non-root user
- Read-only root filesystem (if possible)
- No privilege escalation
- Drop all capabilities
- Secrets in Secret objects (not ConfigMaps!)
- Network policies restrict Pod communication
- Service account explicitly set (not default)
IMAGES
- Specific image tag (NEVER :latest in production)
- imagePullPolicy: IfNotPresent
- Image from trusted registry
- Image scanned for vulnerabilities
OBSERVABILITY
- Logging to stdout/stderr
- Metrics exposed (/metrics endpoint)
- Alerts configured for key metrics
SCALING
- HorizontalPodAutoscaler configured (if applicable)
High Availability: Spreading Pods Across Failure Domains
Don't put all your eggs in one basket (node):
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-app
spec:
replicas: 3
selector:
matchLabels:
app: ha-app
template:
metadata:
labels:
app: ha-app
spec:
# π Spread Pods across nodes
affinity:
podAntiAffinity:
# π― "Prefer" = best effort, won't block scheduling
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: ha-app
topologyKey: kubernetes.io/hostname # Different nodes
# π Spread across availability zones
topologySpreadConstraints:
- maxSkew: 1 # Max difference between zones
topologyKey: topology.kubernetes.io/zone # Spread across AZs
whenUnsatisfiable: ScheduleAnyway # Don't block if can't satisfy
labelSelector:
matchLabels:
app: ha-app
containers:
- name: app
image: myapp:latest
Pod Disruption Budgets: Maintaining Availability During Disruptions
PDBs prevent cluster operations (node drains, upgrades) from killing too many Pods:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ha-app-pdb
spec:
minAvailable: 2 # Always keep at least 2 running
# OR: maxUnavailable: 1 # Never have more than 1 down
selector:
matchLabels:
app: ha-app
Horizontal Pod Autoscaler: Automatic Scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale up when CPU > 70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Security: Running Securely
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
spec:
replicas: 2
selector:
matchLabels:
app: secure-app
template:
metadata:
labels:
app: secure-app
spec:
# π Use dedicated service account
serviceAccountName: secure-app-sa
automountServiceAccountToken: false # Don't mount token unless needed
# π Pod-level security context
securityContext:
runAsNonRoot: true # π« Containers cannot run as root
runAsUser: 1000 # π€ Run as UID 1000
runAsGroup: 1000 # π₯ Run as GID 1000
fsGroup: 1000 # π Volume ownership
seccompProfile:
type: RuntimeDefault # π‘οΈ Apply default seccomp profile
containers:
- name: app
image: myapp:v1.0.0
# π Container-level security context
securityContext:
allowPrivilegeEscalation: false # π« Can't gain privileges
readOnlyRootFilesystem: true # π Can't write to filesystem
capabilities:
drop:
- ALL # Drop all Linux capabilities
Network Policy: Zero-Trust Networking
By default, all Pods can talk to all other Pods. Lock it down:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: my-app-netpol
spec:
podSelector:
matchLabels:
app: my-app
policyTypes:
- Ingress
- Egress
# π₯ Who can talk TO my pods?
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx # Only from ingress namespace
ports:
- port: 8080
# π€ Who can my pods talk TO?
egress:
- to:
- namespaceSelector:
matchLabels:
name: database # Only to database namespace
ports:
- port: 5432
- to:
- namespaceSelector: {} # Allow DNS
ports:
- port: 53
protocol: UDP
Chapter 9: The kubectl Survival Guide
π Commands Organized by Task
π Viewing Resources
# π List resources
kubectl get pods # Pods in current namespace
kubectl get pods -A # ALL namespaces
kubectl get pods -o wide # Extra columns (node, IP)
kubectl get pods -w # Watch mode (live updates)
kubectl get all # Common resources (not actually all!)
# π Detailed information
kubectl describe pod <n> # Full details + events
kubectl describe deployment <n> # Deployment details
# π Resource usage (requires metrics-server)
kubectl top pods # CPU/memory usage
kubectl top nodes # Node resource usage
Debugging
# π Logs
kubectl logs <pod> # Container logs
kubectl logs <pod> -f # Stream logs (like tail -f)
kubectl logs <pod> --previous # Logs from crashed container
kubectl logs <pod> -c <container> # Specific container in Pod
kubectl logs -l app=myapp # Logs from all Pods with label
# π Shell access
kubectl exec -it <pod> -- /bin/sh # Shell into container
kubectl exec -it <pod> -- /bin/bash # If bash available
kubectl exec <pod> -- cat /etc/config # Run single command
# π Port forwarding
kubectl port-forward <pod> 8080:80 # Local:Pod
kubectl port-forward svc/<svc> 8080:80 # Via Service
# π Events (crucial for debugging!)
kubectl get events --sort-by='.lastTimestamp'
kubectl get events --field-selector type=Warning
Making Changes
# π Apply configuration
kubectl apply -f manifest.yaml # Create or update
kubectl apply -f ./manifests/ # Apply all files in directory
kubectl apply -k ./kustomize/ # Apply with Kustomize
# ποΈ Delete resources
kubectl delete -f manifest.yaml # Delete by file
kubectl delete pod <n> # Delete specific Pod
kubectl delete pods -l app=myapp # Delete by label
# π Scaling
kubectl scale deployment <n> --replicas=5
kubectl autoscale deployment <n> --min=2 --max=10 --cpu-percent=80
Rollout Management
# π Status
kubectl rollout status deployment <n> # Watch rollout
kubectl rollout history deployment <n> # View history
# βͺ Rollback
kubectl rollout undo deployment <n> # Previous version
kubectl rollout undo deployment <n> --to-revision=2 # Specific revision
# π Restart
kubectl rollout restart deployment <n> # Trigger rolling restart
Context & Namespace
# π Context (cluster) management
kubectl config get-contexts # List clusters
kubectl config use-context <n> # Switch cluster
kubectl config current-context # Show current
# π Namespace management
kubectl get namespaces
kubectl config set-context --current --namespace=<ns> # Set default!
Chapter 10: Troubleshooting – A Chronicle of Preventable Suffering
πΊοΈ The Troubleshooting Flowchart
Start with kubectl get pods and branch on the STATUS column:
- ImagePullBackOff → Is the image name spelled right? Does the tag exist? Private registry auth? Can the node reach the registry?
- CrashLoopBackOff → kubectl logs --previous. Application error: fix your code. OOMKilled or exit code 137: increase memory limits (or fix slow shutdown). Exit code 1: check env vars and config.
- Running but broken → kubectl logs -f, then check the Service: kubectl get endpoints. No endpoints: check your labels (the selector must match the Pod labels). Endpoints present: check app logs and exec into the Pod.
π¨ The Classic Failures
β³ Pending - "Waiting in Limbo"
NAME READY STATUS RESTARTS AGE
my-app-abc123 0/1 Pending 0 10m
Translation: The scheduler can't find a home for your Pod.
Debug:
kubectl describe pod my-app-abc123
# Look at the "Events" section at the bottom!
Common causes & fixes:
| Event Message | Cause | Fix |
|---|---|---|
| Insufficient cpu | No node has enough CPU | Reduce requests or add nodes |
| Insufficient memory | No node has enough memory | Reduce requests or add nodes |
| node(s) had taint | Taints blocking | Add tolerations or remove taints |
| didn't match Pod's node affinity | Affinity mismatch | Fix nodeSelector/affinity rules |
| persistentvolumeclaim not found | PVC missing | Create the PVC |
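For the resource-related causes, it helps to see what each node actually has left, i.e. allocatable capacity minus what's already requested:
# How much CPU/memory is already requested per node?
kubectl describe nodes | grep -A 8 "Allocated resources"
# Live usage (requires metrics-server)
kubectl top nodes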
πΌοΈ ImagePullBackOff - "Can't Get Your Container"
NAME READY STATUS RESTARTS AGE
my-app-abc123 0/1 ImagePullBackOff 0 5m
Translation: Kubernetes can't download your container image.
Checklist:
- π€ Image name spelled correctly? (typos are #1 cause!)
- π·οΈ Tag exists? Did you push it?
- Private registry? Add imagePullSecrets
- Can nodes reach the registry? (network/firewall)
- β° Registry rate limiting? (Docker Hub!)
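For the private-registry case, the usual fix is an image pull Secret referenced from the Pod template (registry URL and credentials below are placeholders):
# Create registry credentials...
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>
# ...then reference them in the Deployment's Pod spec:
#   spec:
#     imagePullSecrets:
#     - name: regcred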
π₯ CrashLoopBackOff - "Repeatedly Dying"
NAME READY STATUS RESTARTS AGE
my-app-abc123 0/1 CrashLoopBackOff 5 3mTranslation: Your container starts, crashes, restarts... forever.
Debug:
kubectl logs my-app-abc123 --previous
kubectl describe pod my-app-abc123 # Check "Last State" section
Exit codes:
| Exit Code | Meaning | Common Cause |
|---|---|---|
| 1 | Application error | Check logs! |
| 137 | SIGKILL (128+9) | OOMKilled |
| 143 | SIGTERM (128+15) | Graceful shutdown |
| 126 | Command not executable | Bad entrypoint |
| 127 | Command not found | Typo in command |
π No Endpoints - "Service Can't Find Pods"
$ kubectl get endpoints my-service
NAME ENDPOINTS AGE
my-service <none> 5m # No Pods found!
Translation: Your Service selector doesn't match any Pod labels.
Debug:
# What is the Service looking for?
kubectl get service my-service -o yaml | grep -A5 selector
# What labels do Pods have?
kubectl get pods --show-labels
# Compare them! They must match EXACTLY.
Chapter 11: The Complete Production Example
Here's everything we've learned, combined into a production-ready deployment:
# ποΈ Complete Production-Ready Kubernetes Application
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
---
# π Namespace
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
environment: production
---
# π ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
data:
LOG_LEVEL: "info"
MAX_CONNECTIONS: "100"
---
# π Secret
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
namespace: production
type: Opaque
stringData:
API_KEY: "your-api-key-here"
DB_PASSWORD: "super-secret-password"
---
# π Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
name: production-app
namespace: production
automountServiceAccountToken: false
---
# π Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: production-app
namespace: production
labels:
app.kubernetes.io/name: production-app
app.kubernetes.io/version: "1.0.0"
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app.kubernetes.io/name: production-app
template:
metadata:
labels:
app.kubernetes.io/name: production-app
app.kubernetes.io/version: "1.0.0"
spec:
serviceAccountName: production-app
terminationGracePeriodSeconds: 60
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/name: production-app
topologyKey: kubernetes.io/hostname
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app.kubernetes.io/name: production-app
containers:
- name: app
image: nginx:1.25-alpine
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: http
envFrom:
- configMapRef:
name: app-config
env:
- name: API_KEY
valueFrom:
secretKeyRef:
name: app-secrets
key: API_KEY
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "200m"
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
startupProbe:
httpGet:
path: /
port: http
failureThreshold: 30
periodSeconds: 10
livenessProbe:
httpGet:
path: /
port: http
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /
port: http
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
---
# β‘ Service
apiVersion: v1
kind: Service
metadata:
name: production-app-service
namespace: production
spec:
type: ClusterIP
selector:
app.kubernetes.io/name: production-app
ports:
- name: http
port: 80
targetPort: http
---
# π‘οΈ PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: production-app-pdb
namespace: production
spec:
minAvailable: 2
selector:
matchLabels:
app.kubernetes.io/name: production-app
---
# π HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: production-app-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: production-app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Try It Yourself!
# π₯οΈ Start a local cluster
minikube start
# OR
kind create cluster
# π¦ Deploy
kubectl apply -f complete-app.yaml
# π Watch
kubectl get pods -n production -w
# π Check status
kubectl rollout status deployment/production-app -n production
# π§ͺ Test
kubectl port-forward -n production svc/production-app-service 8080:80
curl http://localhost:8080
# ποΈ Cleanup
kubectl delete -f complete-app.yaml
TL;DR – The One-Page Survival Guide
| π¦ Resource | π― Purpose | π§ Key Command |
|---|---|---|
| Pod | Runs containers | kubectl get pods |
| Deployment | Manages Pods | kubectl get deployments |
| Service | Stable endpoint | kubectl get services |
| Ingress | HTTP routing | kubectl get ingress |
| ConfigMap | Config | kubectl get configmaps |
| Secret | Sensitive config | kubectl get secrets |
| PDB | Availability | kubectl get pdb |
| HPA | Auto-scaling | kubectl get hpa |
π Debug Flow
kubectl get pods # 1οΈβ£ What's running?
kubectl describe pod <n> # 2οΈβ£ What's wrong? (check Events!)
kubectl logs <n> # 3οΈβ£ What did it say?
kubectl logs <n> --previous # 4οΈβ£ Why did it die?
kubectl exec -it <n> -- sh # 5οΈβ£ Let me look inside
kubectl get endpoints <svc> # 6. Can Service find Pods?
Golden Rules
- Always set resource requests AND limits
- Configure startup, liveness AND readiness probes
- Labels must match between selector and Pod template
- Handle SIGTERM for graceful shutdown
- Never use the :latest tag in production
- When in doubt: kubectl describe and check Events
- Check endpoints when Services don't work
- Run as non-root with minimal capabilities
π Glossary
| Term | Definition |
|---|---|
| Pod | Smallest deployable unit; 1+ containers sharing network/storage |
| Deployment | Controller managing ReplicaSets; handles updates, rollbacks |
| ReplicaSet | Maintains specified number of Pod replicas |
| Service | Stable network endpoint routing to Pods |
| Ingress | HTTP/HTTPS routing, TLS termination |
| ConfigMap | Non-sensitive configuration data |
| Secret | Sensitive data (base64 encoded by default) |
| Namespace | Virtual cluster for resource isolation |
| Probe | Health check (startup, liveness, readiness) |
| PDB | Pod Disruption Budget; protects availability |
| HPA | Horizontal Pod Autoscaler; automatic scaling |
| etcd | Distributed key-value store; cluster state |
| Kubelet | Node agent; manages Pods on each node |
| SIGTERM | Termination signal; graceful shutdown |
May your Pods be healthy, your rollouts smooth, and your YAML forever valid. βΈοΈ
Now go forth and orchestrate! π