Or: How I Learned to Stop Worrying and Love the YAML
π± The Existential Crisis
Picture this: It's 2015. jQuery is still cool. Docker is "that whale thing." You deploy code by SSHing into a server named after your cat. Life has meaning.
Then someone in a conference room with too many whiteboards utters the cursed phrase: "We need to modernize our infrastructure."
Fast forward to today, and you're staring at 47 YAML files, questioning every decision that led you to this moment, while a colleague enthusiastically explains that "a Pod is just an abstraction over containers, which are themselves abstractions over processes, wrapped in cgroups and namespaces."
You nod. You understand nothing. You are not alone. π€
Welcome to Kubernetes, or as I like to call it: "The answer to a question you didn't know you were asking, to a problem you didn't know you had, using terminology invented by a committee of philosophers who really hate whitespace."
But fear not, brave developer. By the end of this article, you'll understand Kubernetes well enough to either deploy your applications with confidence or at least nod more convincingly in meetings while internally screaming.
π Chapter 1: What Even IS Kubernetes?
π― The Honest Explanation
Kubernetes (abbreviated K8s, because apparently typing eight letters was too much effort) is a container orchestration platform.
"But what does that mean?" I hear you cry into the void.
Let me explain with an analogy that will haunt your dreams:
π³ Imagine you're running a restaurant empire.
| Real World | Kubernetes World |
|---|---|
| Your recipe | Your code |
| A chef with their own portable kitchen | A container |
| The company building portable kitchens | containerd, CRI-O* |
| The RESTAURANT MANAGER FROM HELL | Kubernetes |
Note: Kubernetes removed its built-in Docker runtime support (dockershim) in version 1.24. Modern clusters use containerd or CRI-O as the container runtime. You can still build images with Docker; the cluster just runs them with a different runtime now.
The Manager (Kubernetes):
- π Decides how many chefs you need at any moment
- π₯ Fires chefs who look tired (health check failed)
- π Hires identical replacement chefs automatically (self-healing)
- π Brings in extra chefs when the lunch rush hits (horizontal scaling)
- π Redirects customers to available chefs (load balancing)
- π Moves chefs to a different location if one restaurant catches fire (node failure)
- π Keeps track of the secret recipes (secrets management)
- β Doesn't actually know how to cook anything
That last point is crucial. Kubernetes doesn't run your codeβit makes sure your code is always running somewhere, somehow, despite the universe's best attempts to stop it.
π€ Why Does This Exist? (The Problem It Solves)
Before Kubernetes, scaling applications meant:
- Manual server provisioning - "Hey ops team, we need 3 more servers by Friday"
- Snowflake servers - Each server configured slightly differently, documented in someone's head
- Deployment fear - "If we deploy on Friday, we might not go home until Monday"
- No self-healing - Server dies at 3 AM? Hope you like being on-call!
- Resource waste - One app per server, even if it only uses 10% of resources
Kubernetes solves these by:
- π€ Automating everything - Declare what you want, K8s makes it happen
- π¦ Standardizing deployments - Same process everywhere, every time
- π‘οΈ Self-healing - Dead containers get replaced automatically
- π Efficient resource usage - Many apps per server, bin-packing optimization
- π Zero-downtime deployments - Rolling updates are the default
ποΈ The Object Hierarchy (a.k.a. "The Circle of Life")
Before we dive deeper, let's understand how Kubernetes objects relate to each other. This hierarchy is fundamental to understanding why things work the way they do:
At the top you interact with a Deployment ("I want 3 copies of my app running"). The Deployment creates and manages a ReplicaSet ("I ensure exactly 3 Pods exist"), the ReplicaSet creates and manages the Pods (the actual workers), and each Pod runs your containers (the atomic level: your code).
π Why this hierarchy?
| Level | Object | Why It Exists |
|---|---|---|
| You interact with | Deployment | Provides update strategies, rollback history, declarative scaling |
| Auto-managed | ReplicaSet | Maintains exact Pod count. New one created per Deployment version (enables rollback!) |
| Worker | Pod | Scheduling unit. Shares network/storage between containers |
| Actual process | Container | Your application code running |
π‘ Key insight: You never touch ReplicaSets directly. They're an implementation detail. When you update a Deployment, it creates a new ReplicaSet and gradually shifts trafficβthat's how rollbacks work! Old ReplicaSets are kept (with 0 replicas) so you can roll back instantly.
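You can see this chain on a live cluster. A quick sketch, assuming a Deployment named my-app labelled as in the Chapter 2 example:
# One ReplicaSet per Deployment revision; old ones linger at 0 replicas for rollback
kubectl get deployment my-app
kubectl get replicasets -l app.kubernetes.io/name=my-app
kubectl get pods -l app.kubernetes.io/name=my-app
kubectl rollout history deployment/my-app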
The Architecture: A Map of the Kingdom
A cluster has two halves: the control plane (API Server, etcd, Scheduler, Controller Manager) and the worker nodes. Each node runs a kubelet agent and hosts the Pods that the control plane schedules onto it.
π§© Control Plane Components Explained
| Component | What It Does | Why It's Designed This Way | If It Dies... |
|---|---|---|---|
| π API Server | Front door for ALL communication. RESTful API that everything talks to. | Single point of entry = security boundary, audit logging, authentication. In HA, multiple API servers behind a load balancer. | You're locked out. Run 3+ for HA. |
| π etcd | Distributed key-value store using Raft consensus. THE source of truth for cluster state. | Raft protocol = consistent even with node failures. Separate from API for modularity. | π Total cluster loss. Backup religiously. |
| π Scheduler | Watches for unassigned Pods, picks optimal Node based on resources, affinity, taints. | Decoupled from API = can be replaced/customized. Pluggable scoring algorithms. | New Pods stay Pending. Existing keep running. |
| π Controller Manager | Runs control loops: Deployment controller, ReplicaSet controller, Node controller, etc. | Each controller is single-purpose = easier to understand, debug, extend. | Cluster stops self-healing. Drift not corrected. |
π Why is it designed this way?
Kubernetes follows a declarative, reconciliation-based architecture:
- You declare desired state: "I want 3 replicas of my app"
- Controllers constantly compare desired vs actual state
- Controllers take action to reconcile differences
- This loop runs forever, every few seconds
This is fundamentally different from imperative systems ("start 3 servers"). If something drifts, Kubernetes fixes it automatically.
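You can watch a reconciliation loop work. A minimal sketch, assuming a Deployment named my-app with 3 replicas already exists:
# Delete one Pod on purpose...
kubectl get pods -l app.kubernetes.io/name=my-app
kubectl delete pod <one-of-those-pod-names>
# ...then watch the ReplicaSet controller notice the drift (2 running, 3 desired) and create a replacement
kubectl get pods -l app.kubernetes.io/name=my-app -w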
π¦ Chapter 2: Pods and Deployments β The Core Building Blocks
π¦ Pods: The Atomic Unit of Kubernetes
A Pod is the smallest deployable unit. It's one or more containers that share:
- Network namespace (they communicate via localhost and share an IP address)
- Storage volumes (optional shared filesystems)
- Lifecycle (scheduled together, start together, die together)
π€ Why Pods instead of just Containers?
Sometimes you need tightly coupled containers:
- Sidecar pattern: Main app + log shipper in same Pod
- Ambassador pattern: Main app + proxy in same Pod
- Adapter pattern: Main app + format converter in same Pod
These containers MUST be on the same node, share network, and scale together. That's what Pods provide.
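As a sketch of the sidecar pattern (the container names, images, and paths here are illustrative, and it's shown as a bare Pod only to keep it short; in practice this would be the Pod template inside a Deployment):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
  - name: app                          # main application
    image: myregistry/myapp:v1.2.3
    volumeMounts:
    - name: logs
      mountPath: /var/log/app          # app writes its logs here
  - name: log-shipper                  # sidecar: reads the same logs and ships them
    image: fluent/fluent-bit:2.2
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}                       # shared by both containers, same lifecycle as the Pod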
β οΈ Important truth bomb: You almost never create Pods directly. That's amateur hour.
# pod.yaml - FOR EDUCATIONAL PURPOSES ONLY π
# Creating this in production is a cry for help
apiVersion: v1
kind: Pod
metadata:
name: my-lonely-pod
labels:
shame: "yes" # π
manually-created: "true"
spec:
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
Why not create Pods directly?
- π If a Pod dies, it stays dead. No resurrection.
- π No rolling updatesβyou'd have to delete and recreate
- π No scalingβyou'd create each Pod manually
- βͺ No rollbackβhope you saved that old YAML!
π Deployments: The Proper Wayβ’
A Deployment is the standard way to run stateless applications. It provides:
| Feature | What It Does | Why You Need It |
|---|---|---|
| Declarative updates | You say "version 2.0", K8s figures out how | No manual coordination needed |
| Rolling updates | Gradual replacement of Pods | Zero downtime during deploys |
| Rollback | Undo to any previous version | Fix that 3 AM mistake in seconds |
| Self-healing | Dead Pods get replaced | Sleep through the night |
| Scaling | Change replica count anytime | Handle traffic spikes |
# deployment.yaml - This is what production systems use β
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
labels:
# π·οΈ Use standard Kubernetes labels for consistency
app.kubernetes.io/name: my-app
app.kubernetes.io/version: "1.2.3"
app.kubernetes.io/component: backend
spec:
replicas: 3 # π― "I want 3 copies running at all times"
# π Update strategy - how to replace old Pods with new ones
strategy:
type: RollingUpdate # Default and recommended for stateless apps
rollingUpdate:
maxSurge: 1 # π Allow 1 extra Pod during update (4 total briefly)
maxUnavailable: 0 # π‘οΈ Never have fewer than 3 running
# Why these values? Prioritizes availability over speed.
# For faster updates: maxSurge: 25%, maxUnavailable: 25%
# π― Selector: "Which Pods belong to this Deployment?"
selector:
matchLabels:
app.kubernetes.io/name: my-app # Must match template.metadata.labels!
template: # π Pod template - blueprint for each Pod
metadata:
labels:
app.kubernetes.io/name: my-app # β οΈ MUST MATCH selector above!
app.kubernetes.io/version: "1.2.3"
spec:
# π Graceful shutdown configuration
terminationGracePeriodSeconds: 60 # Give app 60s to finish requests
containers:
- name: app
image: myregistry/myapp:v1.2.3 # π·οΈ Always use specific tags, NEVER :latest
imagePullPolicy: IfNotPresent # π₯ Don't re-pull if image exists locally
# πͺ Port declaration (documentation + service discovery)
ports:
- containerPort: 8080
name: http # Named ports are clearer
# π° Resource Management - ALWAYS SET THESE
resources:
requests: # "I need at least this much to function"
memory: "128Mi" # Scheduler uses this for placement decisions
cpu: "100m" # 100 millicores = 0.1 CPU core
limits: # "Never let me exceed this"
memory: "256Mi" # Exceeding = OOMKilled π
cpu: "500m" # Exceeding = throttled (not killed)
# π‘ Note: Some teams omit CPU limits to avoid throttling.
# If you set them, monitor for latency impacts.
# π Environment variables (non-sensitive config)
env:
- name: LOG_LEVEL
value: "info"
- name: APP_ENV
value: "production"
Resource Units Explained
Understanding resource units is crucial for proper capacity planning:
| Unit | Meaning | Example | Notes |
|---|---|---|---|
| 100m | 100 millicores | 10% of 1 CPU core | 1000m = 1 full core |
| 0.1 | Same as 100m | 10% of 1 CPU core | Decimal notation works too |
| 128Mi | 128 mebibytes | ~134 MB | Binary units (1024-based) |
| 128M | 128 megabytes | 128 MB | Decimal units (1000-based) |
| 1Gi | 1 gibibyte | ~1.07 GB | Use for memory typically |
π‘ Best Practice for Setting Resources:
- Start with low requests, monitor actual usage with kubectl top
- Set memory limits ~1.5-2x requests initially
- Use Vertical Pod Autoscaler (VPA) for recommendations
- Memory limit = hard ceiling (OOMKill if exceeded)
- CPU limit = soft ceiling (throttling, not death) - some teams omit this
π The Rolling Update Dance
When you update a Deployment (new image, config change, etc.), here's what happens with the strategy above: Kubernetes creates one new Pod (maxSurge: 1), waits for it to pass its readiness probe, terminates one old Pod (maxUnavailable: 0 keeps three serving at all times), and repeats until every Pod runs the new version.
π Key insights:
- At no point did we have zero running instances
- Old ReplicaSet is kept (with 0 replicas) for instant rollback
- Each new Pod must pass readiness probe before old Pod is terminated
- Your users experienced zero downtime
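To trigger the dance yourself, changing the image is enough (the v1.2.4 tag is made up for illustration; the Deployment and container names match the Chapter 2 example):
# New image tag = new ReplicaSet = rolling update
kubectl set image deployment/my-app app=myregistry/myapp:v1.2.4
# Watch it happen
kubectl rollout status deployment/my-app
kubectl get pods -l app.kubernetes.io/name=my-app -w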
βͺ Rollback: Your Safety Net
# π± Something's wrong! Roll back immediately!
kubectl rollout undo deployment/my-app
# π See rollout history
kubectl rollout history deployment/my-app
# βͺ Roll back to specific revision
kubectl rollout undo deployment/my-app --to-revision=2
# π Watch rollout progress
kubectl rollout status deployment/my-app
Why rollback is instant: Remember those old ReplicaSets? Kubernetes just scales up the old one and scales down the new one. No image pulling, no waiting: the old Pods were ready to go!
π Chapter 3: Services β Stable Endpoints in a Chaotic World
π€ The Problem Services Solve
Here's the challenge: Pods are ephemeral. They get IP addresses when created. When they die and are replaced, they get new IP addresses.
Pod my-app-abc123: IP 10.244.1.5 (dies)
Pod my-app-xyz789: IP 10.244.2.8 (created with a different IP!)
Imagine if every time your favorite restaurant hired a new chef, you had to learn their home address to order food. That's Pods without Services.
Services provide:
- π Stable IP address that never changes
- π€ DNS name for easy discovery
- βοΈ Load balancing across healthy Pods
- π Service discovery via environment variables and DNS
π― Service Types: Choose Your Adventure
Service types fall into three groups: internal access (ClusterIP, the default, internal traffic only), external access (NodePort for development/testing on ports 30000-32767, LoadBalancer for production behind a cloud provider LB), and special purpose (ExternalName as a DNS alias for external services, and Headless with clusterIP: None for direct Pod access).
| Type | Who Can Access | Use Case | Cost | Example |
|---|---|---|---|---|
| π ClusterIP | Internal Pods only | Service-to-service communication | Free | API calling database |
| πͺ NodePort | External via NodeIP:30000-32767 | Development, on-prem | Free | Testing externally |
| βοΈ LoadBalancer | External via cloud LB | Production external access | πΈπΈπΈ | Public website |
| π ExternalName | DNS CNAME record | Abstracting external deps | Free | db.example.com β RDS |
| π Headless | Direct Pod IPs | StatefulSets, custom discovery | Free | Database clusters |
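For comparison, a LoadBalancer Service for external traffic is only a small step away from the ClusterIP example below; the Service name here is made up, and the cloud provider provisions the actual load balancer:
apiVersion: v1
kind: Service
metadata:
  name: my-app-public
spec:
  type: LoadBalancer                 # cloud provider creates an external LB + IP
  selector:
    app.kubernetes.io/name: my-app   # same label selection as any other Service
  ports:
  - name: http
    port: 80                         # port exposed on the load balancer
    targetPort: http                 # named container port on the Pods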
π ClusterIP Service Example (The Default)
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: my-app-service
labels:
app.kubernetes.io/name: my-app
spec:
type: ClusterIP # π Default, can omit
# π― Selector: "Route traffic to Pods with these labels"
selector:
app.kubernetes.io/name: my-app # Must match Pod labels EXACTLY!
ports:
- name: http # π Named ports are best practice
port: 80 # πͺ Port the Service listens on
targetPort: http # π― Use named port from Pod spec!
protocol: TCP # TCP is default, can omit
# π‘ Why separate port and targetPort?
# - Service presents a standard interface (port 80)
# - Pods can use any port internally (8080)
# - You can change Pod port without affecting clients
How Service Discovery Works
When a Service is created, Kubernetes does two magical things:
1. It registers a DNS name with CoreDNS that resolves to the Service's ClusterIP (my-svc → 10.96.45.12 in this example).
2. It maintains an Endpoints list of the ready Pod IPs behind that ClusterIP (10.244.1.5:8080, 10.244.2.8:8080, 10.244.3.3:8080).
When your app calls my-svc, it does a DNS lookup, gets back the ClusterIP, opens a TCP connection to it, and that connection is load-balanced to one of the Pods.
π DNS Name Formats:
| Format | When to Use | Example |
|---|---|---|
| my-svc | Same namespace | http://my-svc/api |
| my-svc.other-ns | Different namespace | http://my-svc.payments/charge |
| my-svc.other-ns.svc.cluster.local | Full FQDN (rarely needed) | Cross-cluster scenarios |
π‘ Pro tip: Always use the short form within the same namespace. It's cleaner and Kubernetes adds the suffix automatically.
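A quick way to check resolution from inside the cluster is a throwaway Pod (busybox used here as an example image):
# Resolve the Service name from a temporary debug Pod
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup my-app-service
# The answer should be the Service's ClusterIP, not a Pod IP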
π Endpoints: The Magic Behind Services
Services don't magically know where Pods are. They maintain an Endpoints object:
# π See which Pods a Service routes to
kubectl get endpoints my-app-service
NAME ENDPOINTS AGE
my-app-service 10.244.1.5:8080,10.244.2.8:8080,10.244.3.3:8080 5m
If you see <none> for endpoints:
- Your selector doesn't match any Pod labels
- No Pods are passing readiness probes
- Pods exist but in wrong namespace
This is the #1 debugging step when Services don't work!
πͺ Chapter 4: Ingress β The Fancy Front Door
π€ Why Ingress Exists
Services are great, but:
- LoadBalancers cost money (one per service = budget death)
- NodePorts are ugly (nobody wants myapp.com:31847)
- No path-based routing (can't route /api and /web separately)
- No SSL termination at the Service level
Ingress provides:
- Path-based routing (/api → API service, /web → Web service)
- Host-based routing (multiple domains, one IP)
- TLS/SSL termination (HTTPS handled at the edge)
- Cost efficiency (one LoadBalancer for many services)
βοΈ How Ingress Works (Two Components)
An Ingress resource is just configuration: the routing rules. An Ingress Controller (nginx-ingress, Traefik, the AWS ALB Controller, etc.) must be installed separately; it reads those rules and configures the actual load balancer that routes traffic to your Services.
β οΈ Important: The Ingress resource alone does nothing! You need an Ingress Controller (nginx-ingress, traefik, etc.) actually installed in your cluster.
π Ingress Example
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-ingress
annotations:
# π§ Controller-specific settings (these are for nginx)
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
ingressClassName: nginx # π― Which controller handles this (not annotation!)
# π TLS Configuration
tls:
- hosts:
- myapp.example.com
- api.example.com
secretName: tls-secret # Certificate stored as K8s Secret
# π£οΈ Routing Rules
rules:
# Rule 1: myapp.example.com
- host: myapp.example.com
http:
paths:
- path: /api # π― /api/* goes to api-service
pathType: Prefix # Prefix = /api, /api/, /api/users all match
backend:
service:
name: api-service
port:
number: 80
- path: / # π― Everything else goes to web-service
pathType: Prefix
backend:
service:
name: web-service
port:
number: 80
# Rule 2: api.example.com (different domain)
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
Traffic Flow Visualization
A request to https://myapp.example.com/api/users hits the Ingress Controller, which has read the Ingress rules (myapp.example.com: /api → api-service, / → web-service). The controller forwards the request to the matching Service, and the Service load-balances it across that app's Pods.
π‘ Path Types Explained:
| Type | Matches | Use Case |
|---|---|---|
| Prefix | /api, /api/, /api/users | Most common, REST APIs |
| Exact | Only /api exactly | Specific endpoints |
| ImplementationSpecific | Controller decides | Legacy, avoid |
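The tls-secret referenced in the example has to exist before the Ingress can terminate TLS. Assuming you already have a certificate and key on disk (the paths are placeholders; many teams let cert-manager create this Secret instead):
# Create the TLS Secret the Ingress points at
kubectl create secret tls tls-secret --cert=path/to/tls.crt --key=path/to/tls.key
kubectl get secret tls-secret   # verify it exists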
βοΈ Chapter 5: ConfigMaps and Secrets β Externalizing Configuration
π― Why Externalize Configuration?
The 12-Factor App methodology says: Store config in the environment, not in code.
Why?
- π Same image, different environments (dev/staging/prod)
- π Secrets stay secret (not in Git history!)
- β‘ Change config without rebuilding (faster deployments)
- π₯ Separation of concerns (devs write code, ops manage config)
π ConfigMaps: For Non-Sensitive Data
ConfigMaps store configuration data as key-value pairs or entire files:
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
# π Simple key-value pairs
LOG_LEVEL: "info"
DATABASE_HOST: "db.internal.svc.cluster.local"
FEATURE_NEW_UI: "true"
MAX_CONNECTIONS: "100"
# π Entire configuration files
nginx.conf: |
server {
listen 80;
server_name localhost;
location / {
proxy_pass http://backend:8080;
proxy_set_header Host $host;
}
}
application.yaml: |
spring:
profiles:
active: production
datasource:
url: jdbc:postgresql://db:5432/myapp
logging:
level:
root: INFO
Secrets: For Sensitive Data
β οΈ Critical Warning: Kubernetes Secrets are base64 encoded, NOT encrypted by default. Anyone with cluster access can decode them!
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
type: Opaque # Generic secret type
stringData: # π‘ Use stringData, K8s encodes automatically
DB_PASSWORD: "super-secret-password-123"
API_KEY: "sk-abc123def456"
JWT_SECRET: "my-jwt-signing-key"
# β οΈ The 'data' field requires base64 encoding:
# data:
#   DB_PASSWORD: c3VwZXItc2VjcmV0LXBhc3N3b3JkLTEyMw==
For Real Security:
- Enable encryption at rest for etcd
- Use external secret managers: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault
- Use the External Secrets Operator to sync external secrets
- Implement RBAC to restrict Secret access
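For the RBAC point, a minimal sketch of a Role that lets one ServiceAccount read one specific Secret and nothing else (all names here are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-app-secrets
rules:
- apiGroups: [""]                  # core API group
  resources: ["secrets"]
  resourceNames: ["app-secrets"]   # only this Secret
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-app-secrets-binding
subjects:
- kind: ServiceAccount
  name: my-app-sa                  # illustrative ServiceAccount
  namespace: default               # adjust to your namespace
roleRef:
  kind: Role
  name: read-app-secrets
  apiGroup: rbac.authorization.k8s.io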
π Injecting Configuration into Pods
There are three ways to use ConfigMaps and Secrets in Pods:
apiVersion: apps/v1
kind: Deployment
metadata:
name: configured-app
spec:
replicas: 2
selector:
matchLabels:
app: configured-app
template:
metadata:
labels:
app: configured-app
spec:
containers:
- name: app
image: myapp:latest
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π METHOD 1: Load ALL keys as environment variables
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
envFrom:
- configMapRef:
name: app-config # All keys become env vars
- secretRef:
name: app-secrets # β οΈ Be careful, exposes all secrets
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π― METHOD 2: Cherry-pick specific values (RECOMMENDED)
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
env:
- name: DATABASE_PASSWORD # Env var name in container
valueFrom:
secretKeyRef:
name: app-secrets # Secret name
key: DB_PASSWORD # Key within Secret
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: app-config
key: LOG_LEVEL
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π METHOD 3: Mount as files (great for config files)
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
volumeMounts:
- name: config-volume
mountPath: /etc/config # ConfigMap keys become files here
readOnly: true
- name: secret-volume
mountPath: /etc/secrets
readOnly: true
volumes:
- name: config-volume
configMap:
name: app-config
items: # π‘ Optional: mount specific keys only
- key: nginx.conf
path: nginx.conf # /etc/config/nginx.conf
- name: secret-volume
secret:
secretName: app-secrets
defaultMode: 0400 # Restrictive permissions
Which Method to Use?
| Method | Best For | Pros | Cons |
|---|---|---|---|
| envFrom | Simple apps, all config needed | Easy, automatic | Exposes everything, naming conflicts |
| env + valueFrom | Production apps | Explicit, documented | More YAML |
| Volume mounts | Config files (nginx.conf, etc.) | Files stay files | App must read files |
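Whichever method you pick, it's worth confirming the injection actually landed (the Pod name is a placeholder):
# Methods 1 & 2: environment variables inside the container
kubectl exec <pod> -- env | grep LOG_LEVEL
# Method 3: mounted files
kubectl exec <pod> -- ls -l /etc/config /etc/secrets
kubectl exec <pod> -- cat /etc/config/nginx.conf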
π₯ Chapter 6: Health Checks β Keeping Your Pods Honest
π€ Why Health Checks Matter
Without health checks:
- π§ Zombie Pods - Process running but not responding
- π Cascading failures - Bad Pod gets traffic, fails, repeats
- π΄ Slow startup issues - Pod not ready but getting traffic
With health checks:
- π Automatic recovery - Unhealthy containers restarted
- π¦ Traffic control - Only ready Pods receive traffic
- β° Startup tolerance - Slow apps given time to initialize
π The Three Probe Types
Each probe asks a different question, and each failure has a different consequence: a failed liveness probe ("Are you alive?") gets the container killed and restarted, a failed readiness probe ("Can you serve traffic?") removes the Pod from Service endpoints, and an unfinished startup probe ("Are you done starting?") keeps the other probes disabled until it passes.
| Probe | Question It Answers | On Failure | When to Use |
|---|---|---|---|
| π Liveness | "Is the process stuck/deadlocked?" | Container killed & restarted | Always. Catches hung processes. |
| β Readiness | "Can you handle requests right now?" | Removed from Service (no traffic) | Always. Prevents traffic to unready Pods. |
| π Startup | "Have you finished initializing?" | Liveness/Readiness probes paused | Slow-starting apps (Java, legacy) |
π Complete Health Check Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: healthy-app
spec:
replicas: 3
selector:
matchLabels:
app: healthy-app
template:
metadata:
labels:
app: healthy-app
spec:
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
name: http
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π STARTUP PROBE (for slow-starting applications)
# "Disable liveness/readiness until startup completes"
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
startupProbe:
httpGet:
path: /healthz
port: http
failureThreshold: 30 # 30 Γ 10s = 5 min max startup
periodSeconds: 10
# π‘ Once this passes, liveness/readiness probes take over
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# π LIVENESS PROBE
# "If this fails, kill and restart the container"
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
livenessProbe:
httpGet:
path: /healthz # π‘ Lightweight endpoint
port: http
initialDelaySeconds: 0 # Startup probe handles delay
periodSeconds: 10 # π Check every 10 seconds
timeoutSeconds: 5 # β±οΈ Timeout per check
failureThreshold: 3 # Fail 3 times = restart
successThreshold: 1 # 1 success = healthy
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# READINESS PROBE
# "If this fails, stop sending traffic to this Pod"
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
readinessProbe:
httpGet:
path: /ready # π‘ Can be different from liveness!
port: http
initialDelaySeconds: 0 # Startup probe handles delay
periodSeconds: 5 # Check more frequently
timeoutSeconds: 3
failureThreshold: 3 # Fail 3 times = remove from LB
successThreshold: 1
Probe Types Available
# π HTTP GET (most common)
httpGet:
path: /healthz
port: 8080
httpHeaders: # Optional custom headers
- name: Custom-Header
value: Awesome
# π TCP Socket (for non-HTTP services)
tcpSocket:
port: 3306 # Just checks if port is open
# π» Exec (run a command)
exec:
command:
- cat
- /tmp/healthy
# Exit code 0 = healthy
# π gRPC (for gRPC services, K8s 1.24+)
grpc:
port: 50051
Health Check Best Practices
| Practice | Why |
|---|---|
| Liveness ≠ Readiness endpoints | Liveness: "am I broken?" Readiness: "am I ready for MORE traffic?" |
| Don't check dependencies in liveness | If DB is down, restarting your app won't fix it! |
| DO check dependencies in readiness | Don't send traffic if you can't serve it |
| Set appropriate timeouts | Too short = false positives. Too long = slow recovery. |
| Use startup probes for slow apps | Prevents liveness probe killing during startup |
| Keep probes lightweight | Heavy probes can cause issues under load |
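When a probe misbehaves, the evidence lands in events; the kubelet typically reports failed liveness/readiness checks with reason Unhealthy:
# Probe failures show up in the Pod's events
kubectl describe pod <pod>   # scroll to the Events section
# Or query events across the namespace
kubectl get events --field-selector reason=Unhealthy --sort-by='.lastTimestamp'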
π Chapter 7: Graceful Shutdown β The Art of Dying Well
π€ Why Graceful Shutdown Matters
When Kubernetes terminates a Pod (during updates, scaling down, node drain), what happens to in-flight requests?
Without graceful shutdown:
- π₯ Requests get dropped mid-response
- π Users see 502/503 errors
- π Retries create thundering herd
With graceful shutdown:
- β Current requests complete
- π« New requests go elsewhere
- π Users notice nothing
β° The Termination Sequence
1. The Pod is marked Terminating and removed from Service endpoints, so no new traffic arrives.
2. The preStop hook (if any) runs, then the container receives SIGTERM.
3. The app should stop accepting new requests, finish in-flight requests, close DB connections, and flush buffers.
4. Kubernetes waits up to terminationGracePeriodSeconds.
5. If the app exits cleanly (exit 0), that's a clean shutdown; if the timeout is exceeded, SIGKILL force-kills it.
π Graceful Shutdown Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: graceful-app
spec:
replicas: 3
selector:
matchLabels:
app: graceful-app
template:
metadata:
labels:
app: graceful-app
spec:
# β° How long to wait before SIGKILL (default: 30s)
terminationGracePeriodSeconds: 60
containers:
- name: app
image: myapp:latest
# π§ Lifecycle hooks
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- |
# π‘ Why sleep? Give load balancer time to update!
# Endpoints update is async - traffic may still
# be routing here for a few seconds
sleep 5
Application-Side SIGTERM Handling
Your application MUST handle SIGTERM properly:
Node.js:
process.on('SIGTERM', async () => {
console.log('SIGTERM received, shutting down gracefully');
// Stop accepting new connections
server.close(async () => {
await database.disconnect();
process.exit(0);
});
// Force exit after timeout
setTimeout(() => process.exit(1), 25000);
});
Java Spring Boot:
# application.yaml
server:
shutdown: graceful
spring:
lifecycle:
timeout-per-shutdown-phase: 30s
Python:
import signal, sys
def handle_sigterm(signum, frame):
print("SIGTERM received, shutting down...")
# Cleanup code here
sys.exit(0)
signal.signal(signal.SIGTERM, handle_sigterm)
Chapter 8: Production Best Practices – The Checklist That Saves Careers
The Production Readiness Checklist
RESOURCES
- Resource requests AND limits set for all containers
- Requests based on actual observed usage
- Memory limit = hard ceiling (OOMKill risk understood)
HEALTH & AVAILABILITY
- Startup probe configured (for slow-starting apps)
- Liveness probe configured
- Readiness probe configured (different endpoint from liveness!)
- Multiple replicas (minimum 2, recommend 3+)
- PodDisruptionBudget defined
- Pod anti-affinity (spread across nodes)
- TopologySpreadConstraints (spread across zones)
LIFECYCLE
- Application handles SIGTERM gracefully
- terminationGracePeriodSeconds set appropriately
- preStop hook if needed (for LB drain time)
SECURITY
- Container runs as non-root user
- Read-only root filesystem (if possible)
- No privilege escalation
- Drop all capabilities
- Secrets in Secret objects (not ConfigMaps!)
- Network policies restrict Pod communication
- Service account explicitly set (not default)
IMAGES
- Specific image tag (NEVER :latest in production)
- imagePullPolicy: IfNotPresent
- Image from trusted registry
- Image scanned for vulnerabilities
OBSERVABILITY
- Logging to stdout/stderr
- Metrics exposed (/metrics endpoint)
- Alerts configured for key metrics
SCALING
- HorizontalPodAutoscaler configured (if applicable)
High Availability: Spreading Pods Across Failure Domains
Don't put all your eggs in one basket (node):
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-app
spec:
replicas: 3
selector:
matchLabels:
app: ha-app
template:
metadata:
labels:
app: ha-app
spec:
# π Spread Pods across nodes
affinity:
podAntiAffinity:
# π― "Prefer" = best effort, won't block scheduling
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: ha-app
topologyKey: kubernetes.io/hostname # Different nodes
# π Spread across availability zones
topologySpreadConstraints:
- maxSkew: 1 # Max difference between zones
topologyKey: topology.kubernetes.io/zone # Spread across AZs
whenUnsatisfiable: ScheduleAnyway # Don't block if can't satisfy
labelSelector:
matchLabels:
app: ha-app
containers:
- name: app
image: myapp:latest
Pod Disruption Budgets: Maintaining Availability During Disruptions
PDBs prevent cluster operations (node drains, upgrades) from killing too many Pods:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ha-app-pdb
spec:
minAvailable: 2 # Always keep at least 2 running
# OR: maxUnavailable: 1 # Never have more than 1 down
selector:
matchLabels:
app: ha-app
Horizontal Pod Autoscaler: Automatic Scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale up when CPU > 70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Security: Running Securely
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
spec:
replicas: 2
selector:
matchLabels:
app: secure-app
template:
metadata:
labels:
app: secure-app
spec:
# π Use dedicated service account
serviceAccountName: secure-app-sa
automountServiceAccountToken: false # Don't mount token unless needed
# π Pod-level security context
securityContext:
runAsNonRoot: true # π« Containers cannot run as root
runAsUser: 1000 # π€ Run as UID 1000
runAsGroup: 1000 # π₯ Run as GID 1000
fsGroup: 1000 # π Volume ownership
seccompProfile:
type: RuntimeDefault # π‘οΈ Apply default seccomp profile
containers:
- name: app
image: myapp:v1.0.0
# π Container-level security context
securityContext:
allowPrivilegeEscalation: false # π« Can't gain privileges
readOnlyRootFilesystem: true # π Can't write to filesystem
capabilities:
drop:
- ALL # Drop all Linux capabilities
Network Policy: Zero-Trust Networking
By default, all Pods can talk to all other Pods. Lock it down:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: my-app-netpol
spec:
podSelector:
matchLabels:
app: my-app
policyTypes:
- Ingress
- Egress
# π₯ Who can talk TO my pods?
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx # Only from ingress namespace
ports:
- port: 8080
# π€ Who can my pods talk TO?
egress:
- to:
- namespaceSelector:
matchLabels:
name: database # Only to database namespace
ports:
- port: 5432
- to:
- namespaceSelector: {} # Allow DNS
ports:
- port: 53
protocol: UDP
Chapter 9: The kubectl Survival Guide
π Commands Organized by Task
π Viewing Resources
# π List resources
kubectl get pods # Pods in current namespace
kubectl get pods -A # ALL namespaces
kubectl get pods -o wide # Extra columns (node, IP)
kubectl get pods -w # Watch mode (live updates)
kubectl get all # Common resources (not actually all!)
# π Detailed information
kubectl describe pod <n> # Full details + events
kubectl describe deployment <n> # Deployment details
# π Resource usage (requires metrics-server)
kubectl top pods # CPU/memory usage
kubectl top nodes # Node resource usage
Debugging
# π Logs
kubectl logs <pod> # Container logs
kubectl logs <pod> -f # Stream logs (like tail -f)
kubectl logs <pod> --previous # Logs from crashed container
kubectl logs <pod> -c <container> # Specific container in Pod
kubectl logs -l app=myapp # Logs from all Pods with label
# π Shell access
kubectl exec -it <pod> -- /bin/sh # Shell into container
kubectl exec -it <pod> -- /bin/bash # If bash available
kubectl exec <pod> -- cat /etc/config # Run single command
# π Port forwarding
kubectl port-forward <pod> 8080:80 # Local:Pod
kubectl port-forward svc/<svc> 8080:80 # Via Service
# π Events (crucial for debugging!)
kubectl get events --sort-by='.lastTimestamp'
kubectl get events --field-selector type=Warning
Making Changes
# π Apply configuration
kubectl apply -f manifest.yaml # Create or update
kubectl apply -f ./manifests/ # Apply all files in directory
kubectl apply -k ./kustomize/ # Apply with Kustomize
# ποΈ Delete resources
kubectl delete -f manifest.yaml # Delete by file
kubectl delete pod <n> # Delete specific Pod
kubectl delete pods -l app=myapp # Delete by label
# π Scaling
kubectl scale deployment <n> --replicas=5
kubectl autoscale deployment <n> --min=2 --max=10 --cpu-percent=80
Rollout Management
# π Status
kubectl rollout status deployment <n> # Watch rollout
kubectl rollout history deployment <n> # View history
# βͺ Rollback
kubectl rollout undo deployment <n> # Previous version
kubectl rollout undo deployment <n> --to-revision=2 # Specific revision
# π Restart
kubectl rollout restart deployment <n> # Trigger rolling restart
Context & Namespace
# π Context (cluster) management
kubectl config get-contexts # List clusters
kubectl config use-context <n> # Switch cluster
kubectl config current-context # Show current
# π Namespace management
kubectl get namespaces
kubectl config set-context --current --namespace=<ns> # Set default!
Chapter 10: Troubleshooting – A Chronicle of Preventable Suffering
πΊοΈ The Troubleshooting Flowchart
Start with kubectl get pods and branch on the STATUS column:
- ImagePullBackOff → Is the image name spelled right? Does the tag exist? Private registry auth? Can the node reach the registry?
- CrashLoopBackOff → kubectl logs --previous. Application error: fix your code. OOMKilled or exit code 137: increase memory limits (or fix slow shutdown). Exit code 1: check env vars and config.
- Running but broken → kubectl logs -f, then check the Service: kubectl get endpoints. No endpoints: check your labels (the selector must match the Pod labels). Endpoints present: check app logs and exec into the Pod.
π¨ The Classic Failures
β³ Pending - "Waiting in Limbo"
NAME READY STATUS RESTARTS AGE
my-app-abc123 0/1 Pending 0 10m
Translation: The scheduler can't find a home for your Pod.
Debug:
kubectl describe pod my-app-abc123
# Look at the "Events" section at the bottom!
Common causes & fixes:
| Event Message | Cause | Fix |
|---|---|---|
| Insufficient cpu | No node has enough CPU | Reduce requests or add nodes |
| Insufficient memory | No node has enough memory | Reduce requests or add nodes |
| node(s) had taint | Taints blocking | Add tolerations or remove taints |
| didn't match Pod's node affinity | Affinity mismatch | Fix nodeSelector/affinity rules |
| persistentvolumeclaim not found | PVC missing | Create the PVC |
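For the resource-related causes, it helps to see what each node actually has left, i.e. allocatable capacity minus what's already requested:
# How much CPU/memory is already requested per node?
kubectl describe nodes | grep -A 8 "Allocated resources"
# Live usage (requires metrics-server)
kubectl top nodes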
πΌοΈ ImagePullBackOff - "Can't Get Your Container"
NAME READY STATUS RESTARTS AGE
my-app-abc123 0/1 ImagePullBackOff 0 5m
Translation: Kubernetes can't download your container image.
Checklist:
- π€ Image name spelled correctly? (typos are #1 cause!)
- π·οΈ Tag exists? Did you push it?
- Private registry? Add imagePullSecrets
- Can nodes reach the registry? (network/firewall)
- β° Registry rate limiting? (Docker Hub!)
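For the private-registry case, the usual fix is an image pull Secret referenced from the Pod template (registry URL and credentials below are placeholders):
# Create registry credentials...
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>
# ...then reference them in the Deployment's Pod spec:
#   spec:
#     imagePullSecrets:
#     - name: regcred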
π₯ CrashLoopBackOff - "Repeatedly Dying"
NAME READY STATUS RESTARTS AGE
my-app-abc123 0/1 CrashLoopBackOff 5 3mTranslation: Your container starts, crashes, restarts... forever.
Debug:
kubectl logs my-app-abc123 --previous
kubectl describe pod my-app-abc123 # Check "Last State" section
Exit codes:
| Exit Code | Meaning | Common Cause |
|---|---|---|
| 1 | Application error | Check logs! |
| 137 | SIGKILL (128+9) | OOMKilled |
| 143 | SIGTERM (128+15) | Graceful shutdown |
| 126 | Command not executable | Bad entrypoint |
| 127 | Command not found | Typo in command |
π No Endpoints - "Service Can't Find Pods"
$ kubectl get endpoints my-service
NAME ENDPOINTS AGE
my-service <none> 5m # No Pods found!
Translation: Your Service selector doesn't match any Pod labels.
Debug:
# What is the Service looking for?
kubectl get service my-service -o yaml | grep -A5 selector
# What labels do Pods have?
kubectl get pods --show-labels
# Compare them! They must match EXACTLY.
Chapter 11: The Complete Production Example
Here's everything we've learned, combined into a production-ready deployment:
# ποΈ Complete Production-Ready Kubernetes Application
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
---
# π Namespace
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
environment: production
---
# π ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
data:
LOG_LEVEL: "info"
MAX_CONNECTIONS: "100"
---
# π Secret
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
namespace: production
type: Opaque
stringData:
API_KEY: "your-api-key-here"
DB_PASSWORD: "super-secret-password"
---
# π Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
name: production-app
namespace: production
automountServiceAccountToken: false
---
# π Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: production-app
namespace: production
labels:
app.kubernetes.io/name: production-app
app.kubernetes.io/version: "1.0.0"
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app.kubernetes.io/name: production-app
template:
metadata:
labels:
app.kubernetes.io/name: production-app
app.kubernetes.io/version: "1.0.0"
spec:
serviceAccountName: production-app
terminationGracePeriodSeconds: 60
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/name: production-app
topologyKey: kubernetes.io/hostname
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app.kubernetes.io/name: production-app
containers:
- name: app
image: nginx:1.25-alpine
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: http
envFrom:
- configMapRef:
name: app-config
env:
- name: API_KEY
valueFrom:
secretKeyRef:
name: app-secrets
key: API_KEY
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "200m"
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
startupProbe:
httpGet:
path: /
port: http
failureThreshold: 30
periodSeconds: 10
livenessProbe:
httpGet:
path: /
port: http
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /
port: http
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
---
# β‘ Service
apiVersion: v1
kind: Service
metadata:
name: production-app-service
namespace: production
spec:
type: ClusterIP
selector:
app.kubernetes.io/name: production-app
ports:
- name: http
port: 80
targetPort: http
---
# π‘οΈ PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: production-app-pdb
namespace: production
spec:
minAvailable: 2
selector:
matchLabels:
app.kubernetes.io/name: production-app
---
# π HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: production-app-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: production-app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Try It Yourself!
# π₯οΈ Start a local cluster
minikube start
# OR
kind create cluster
# π¦ Deploy
kubectl apply -f complete-app.yaml
# π Watch
kubectl get pods -n production -w
# π Check status
kubectl rollout status deployment/production-app -n production
# π§ͺ Test
kubectl port-forward -n production svc/production-app-service 8080:80
curl http://localhost:8080
# ποΈ Cleanup
kubectl delete -f complete-app.yaml
TL;DR – The One-Page Survival Guide
| π¦ Resource | π― Purpose | π§ Key Command |
|---|---|---|
| Pod | Runs containers | kubectl get pods |
| Deployment | Manages Pods | kubectl get deployments |
| Service | Stable endpoint | kubectl get services |
| Ingress | HTTP routing | kubectl get ingress |
| ConfigMap | Config | kubectl get configmaps |
| Secret | Sensitive config | kubectl get secrets |
| PDB | Availability | kubectl get pdb |
| HPA | Auto-scaling | kubectl get hpa |
π Debug Flow
kubectl get pods # 1οΈβ£ What's running?
kubectl describe pod <n> # 2οΈβ£ What's wrong? (check Events!)
kubectl logs <n> # 3οΈβ£ What did it say?
kubectl logs <n> --previous # 4οΈβ£ Why did it die?
kubectl exec -it <n> -- sh # 5οΈβ£ Let me look inside
kubectl get endpoints <svc> # 6. Can Service find Pods?
Golden Rules
- Always set resource requests AND limits
- Configure startup, liveness AND readiness probes
- Labels must match between selector and Pod template
- Handle SIGTERM for graceful shutdown
- Never use the :latest tag in production
- When in doubt: kubectl describe and check Events
- Check endpoints when Services don't work
- Run as non-root with minimal capabilities
π Glossary
| Term | Definition |
|---|---|
| Pod | Smallest deployable unit; 1+ containers sharing network/storage |
| Deployment | Controller managing ReplicaSets; handles updates, rollbacks |
| ReplicaSet | Maintains specified number of Pod replicas |
| Service | Stable network endpoint routing to Pods |
| Ingress | HTTP/HTTPS routing, TLS termination |
| ConfigMap | Non-sensitive configuration data |
| Secret | Sensitive data (base64 encoded by default) |
| Namespace | Virtual cluster for resource isolation |
| Probe | Health check (startup, liveness, readiness) |
| PDB | Pod Disruption Budget; protects availability |
| HPA | Horizontal Pod Autoscaler; automatic scaling |
| etcd | Distributed key-value store; cluster state |
| Kubelet | Node agent; manages Pods on each node |
| SIGTERM | Termination signal; graceful shutdown |
May your Pods be healthy, your rollouts smooth, and your YAML forever valid. βΈοΈ
Now go forth and orchestrate! π