Or: How I Learned to Stop Worrying and Love the YAML πŸ“œ


😱 The Existential Crisis

Picture this: It's 2015. jQuery is still cool. Docker is "that whale thing." You deploy code by SSHing into a server named after your cat. Life has meaning.

Then someone in a conference room with too many whiteboards utters the cursed phrase: "We need to modernize our infrastructure."

Fast forward to today, and you're staring at 47 YAML files, questioning every decision that led you to this moment, while a colleague enthusiastically explains that "a Pod is just an abstraction over containers, which are themselves abstractions over processes, wrapped in cgroups and namespaces."

You nod. You understand nothing. You are not alone. 🀝

Welcome to Kubernetes, or as I like to call it: "The answer to a question you didn't know you were asking, to a problem you didn't know you had, using terminology invented by a committee of philosophers who really hate whitespace."

But fear not, brave developer. By the end of this article, you'll understand Kubernetes well enough to either deploy your applications with confidence or at least nod more convincingly in meetings while internally screaming.


πŸ“– Chapter 1: What Even IS Kubernetes?

🎯 The Honest Explanation

Kubernetes (abbreviated K8s, because apparently typing eight letters was too much effort) is a container orchestration platform.

"But what does that mean?" I hear you cry into the void.

Let me explain with an analogy that will haunt your dreams:

🍳 Imagine you're running a restaurant empire.

| Real World | Kubernetes World |
| --- | --- |
| Your recipe | Your code |
| A chef with their own portable kitchen | A container |
| The company building portable kitchens | containerd, CRI-O* |
| The RESTAURANT MANAGER FROM HELL | Kubernetes |

πŸ“ Note: Docker as a container runtime was deprecated in Kubernetes 1.24. Modern clusters use containerd or CRI-O. You can still build images with Dockerβ€”the runtime just runs differently now.

The Manager (Kubernetes):

  • πŸ“Š Decides how many chefs you need at any moment
  • πŸ”₯ Fires chefs who look tired (health check failed)
  • πŸ”„ Hires identical replacement chefs automatically (self-healing)
  • πŸ“ˆ Brings in extra chefs when the lunch rush hits (horizontal scaling)
  • πŸ”€ Redirects customers to available chefs (load balancing)
  • 🚚 Moves chefs to a different location if one restaurant catches fire (node failure)
  • πŸ” Keeps track of the secret recipes (secrets management)
  • ❓ Doesn't actually know how to cook anything

That last point is crucial. Kubernetes doesn't run your codeβ€”it makes sure your code is always running somewhere, somehow, despite the universe's best attempts to stop it.

πŸ€” Why Does This Exist? (The Problem It Solves)

Before Kubernetes, scaling applications meant:

  1. Manual server provisioning - "Hey ops team, we need 3 more servers by Friday"
  2. Snowflake servers - Each server configured slightly differently, documented in someone's head
  3. Deployment fear - "If we deploy on Friday, we might not go home until Monday"
  4. No self-healing - Server dies at 3 AM? Hope you like being on-call!
  5. Resource waste - One app per server, even if it only uses 10% of resources

Kubernetes solves these by:

  • πŸ€– Automating everything - Declare what you want, K8s makes it happen
  • πŸ“¦ Standardizing deployments - Same process everywhere, every time
  • πŸ›‘οΈ Self-healing - Dead containers get replaced automatically
  • πŸ“Š Efficient resource usage - Many apps per server, bin-packing optimization
  • πŸ”„ Zero-downtime deployments - Rolling updates are the default

πŸ—οΈ The Object Hierarchy (a.k.a. "The Circle of Life")

Before we dive deeper, let's understand how Kubernetes objects relate to each other. This hierarchy is fundamental to understanding why things work the way they do:

flowchart TB
    subgraph High["🎯 High-Level (What You Create)"]
        DEP["Deployment<br/>'I want 3 copies of my app running'"]
    end
    subgraph Mid["πŸ“‹ Mid-Level (Auto-Managed)"]
        RS["ReplicaSet<br/>'I ensure exactly 3 Pods exist'"]
    end
    subgraph Low["πŸ“¦ Low-Level (The Actual Workers)"]
        P1["Pod 1"]
        P2["Pod 2"]
        P3["Pod 3"]
    end
    subgraph Atomic["🐳 Atomic Level"]
        C1["Container"]
        C2["Container"]
        C3["Container"]
    end
    DEP -->|"creates & manages"| RS
    RS -->|"creates & manages"| P1
    RS -->|"creates & manages"| P2
    RS -->|"creates & manages"| P3
    P1 -->|"runs"| C1
    P2 -->|"runs"| C2
    P3 -->|"runs"| C3

πŸ”‘ Why this hierarchy?

| Level | Object | Why It Exists |
| --- | --- | --- |
| You interact with | Deployment | Provides update strategies, rollback history, declarative scaling |
| Auto-managed | ReplicaSet | Maintains exact Pod count. New one created per Deployment version (enables rollback!) |
| Worker | Pod | Scheduling unit. Shares network/storage between containers |
| Actual process | Container | Your application code running |

πŸ’‘ Key insight: You never touch ReplicaSets directly. They're an implementation detail. When you update a Deployment, it creates a new ReplicaSet and gradually scales it up while scaling the old one downβ€”that's how rolling updates and rollbacks work! Old ReplicaSets are kept (with 0 replicas) so you can roll back instantly.
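
You can see this machinery for yourself once a Deployment is running. A minimal sketch, assuming the my-app Deployment from Chapter 2 (labelled app.kubernetes.io/name=my-app) has been applied and updated at least once:

# List the ReplicaSets the Deployment owns β€” after an update you'll see the new one
# scaled to 3 and the previous one kept at 0 replicas as the rollback target
kubectl get replicasets -l app.kubernetes.io/name=my-app

# The revision history behind kubectl rollout undo
kubectl rollout history deployment/my-app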

πŸ›οΈ The Architecture: A Map of the Kingdom

block-beta
    columns 3
    block:control:3
        columns 4
        API["πŸ“ž API Server"]
        ETCD["πŸ“š etcd"]
        SCHED["πŸ“‹ Scheduler"]
        CM["πŸ‘” Controller"]
    end
    space:3
    N1["πŸͺ Node 1<br/>πŸ€– Kubelet<br/>πŸ“¦ Pod πŸ“¦ Pod"]:1
    N2["πŸͺ Node 2<br/>πŸ€– Kubelet<br/>πŸ“¦ Pod πŸ“¦ Pod"]:1
    N3["πŸͺ Node 3<br/>πŸ€– Kubelet<br/>πŸ“¦ Pod πŸ“¦ Pod"]:1
    control --> N1
    control --> N2
    control --> N3

🧩 Control Plane Components Explained

| Component | What It Does | Why It's Designed This Way | If It Dies... |
| --- | --- | --- | --- |
| πŸ“ž API Server | Front door for ALL communication. RESTful API that everything talks to. | Single point of entry = security boundary, audit logging, authentication. In HA, multiple API servers behind a load balancer. | You're locked out. Run 3+ for HA. |
| πŸ“š etcd | Distributed key-value store using Raft consensus. THE source of truth for cluster state. | Raft protocol = consistent even with node failures. Separate from API for modularity. | πŸ’€ Total cluster loss. Backup religiously. |
| πŸ“‹ Scheduler | Watches for unassigned Pods, picks optimal Node based on resources, affinity, taints. | Decoupled from API = can be replaced/customized. Pluggable scoring algorithms. | New Pods stay Pending. Existing keep running. |
| πŸ‘” Controller Manager | Runs control loops: Deployment controller, ReplicaSet controller, Node controller, etc. | Each controller is single-purpose = easier to understand, debug, extend. | Cluster stops self-healing. Drift not corrected. |

πŸ”‘ Why is it designed this way?

Kubernetes follows a declarative, reconciliation-based architecture:

  1. You declare desired state: "I want 3 replicas of my app"
  2. Controllers constantly compare desired vs actual state
  3. Controllers take action to reconcile differences
  4. This loop runs forever, every few seconds

This is fundamentally different from imperative systems ("start 3 servers"). If something drifts, Kubernetes fixes it automatically.
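
To watch the reconciliation loop in action, here's a quick sketch using the my-app Deployment and deployment.yaml from Chapter 2 (names are just the examples from this article):

# Declaring the same desired state twice is a no-op β€” K8s only acts on differences
kubectl apply -f deployment.yaml
kubectl apply -f deployment.yaml      # typically reports the Deployment as unchanged

# Introduce drift by deleting one of the managed Pods...
kubectl delete $(kubectl get pods -l app.kubernetes.io/name=my-app -o name | head -n 1)

# ...then watch the control loop notice the missing replica and start a replacement
kubectl get pods -l app.kubernetes.io/name=my-app -w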


πŸ“¦ Chapter 2: Pods and Deployments β€” The Core Building Blocks

πŸ“¦ Pods: The Atomic Unit of Kubernetes

A Pod is the smallest deployable unit. It's one or more containers that share:

  • 🌐 Network namespace (they communicate via localhost, share IP address)
  • πŸ’Ύ Storage volumes (optional shared filesystems)
  • ⏰ Lifecycle (scheduled together, start together, die together)

πŸ€” Why Pods instead of just Containers?

Sometimes you need tightly coupled containers:

  • Sidecar pattern: Main app + log shipper in same Pod
  • Ambassador pattern: Main app + proxy in same Pod
  • Adapter pattern: Main app + format converter in same Pod

These containers MUST be on the same node, share network, and scale together. That's what Pods provide.

⚠️ Important truth bomb: You almost never create Pods directly. That's amateur hour.

# pod.yaml - FOR EDUCATIONAL PURPOSES ONLY πŸ“š
# Creating this in production is a cry for help
apiVersion: v1
kind: Pod
metadata:
  name: my-lonely-pod
  labels:
    shame: "yes"           # πŸ˜…
    manually-created: "true"
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80

❓ Why not create Pods directly?

  • πŸ’€ If a Pod dies, it stays dead. No resurrection.
  • πŸ”„ No rolling updatesβ€”you'd have to delete and recreate
  • πŸ“Š No scalingβ€”you'd create each Pod manually
  • βͺ No rollbackβ€”hope you saved that old YAML!

πŸš€ Deployments: The Proper Wayβ„’

A Deployment is the standard way to run stateless applications. It provides:

| Feature | What It Does | Why You Need It |
| --- | --- | --- |
| Declarative updates | You say "version 2.0", K8s figures out how | No manual coordination needed |
| Rolling updates | Gradual replacement of Pods | Zero downtime during deploys |
| Rollback | Undo to any previous version | Fix that 3 AM mistake in seconds |
| Self-healing | Dead Pods get replaced | Sleep through the night |
| Scaling | Change replica count anytime | Handle traffic spikes |

# deployment.yaml - This is what production systems use βœ…
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    # 🏷️ Use standard Kubernetes labels for consistency
    app.kubernetes.io/name: my-app
    app.kubernetes.io/version: "1.2.3"
    app.kubernetes.io/component: backend
spec:
  replicas: 3                    # 🎯 "I want 3 copies running at all times"
  
  # πŸ”„ Update strategy - how to replace old Pods with new ones
  strategy:
    type: RollingUpdate          # Default and recommended for stateless apps
    rollingUpdate:
      maxSurge: 1                # πŸ“ˆ Allow 1 extra Pod during update (4 total briefly)
      maxUnavailable: 0          # πŸ›‘οΈ Never have fewer than 3 running
      # Why these values? Prioritizes availability over speed.
      # For faster updates: maxSurge: 25%, maxUnavailable: 25%
  
  # 🎯 Selector: "Which Pods belong to this Deployment?"
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # Must match template.metadata.labels!
      
  template:                      # πŸ“‹ Pod template - blueprint for each Pod
    metadata:
      labels:
        app.kubernetes.io/name: my-app  # ⚠️ MUST MATCH selector above!
        app.kubernetes.io/version: "1.2.3"
    spec:
      # πŸ›‘ Graceful shutdown configuration
      terminationGracePeriodSeconds: 60  # Give app 60s to finish requests
      
      containers:
      - name: app
        image: myregistry/myapp:v1.2.3   # 🏷️ Always use specific tags, NEVER :latest
        imagePullPolicy: IfNotPresent    # πŸ“₯ Don't re-pull if image exists locally
        
        # πŸšͺ Port declaration (documentation + service discovery)
        ports:
        - containerPort: 8080
          name: http                      # Named ports are clearer
        
        # πŸ’° Resource Management - ALWAYS SET THESE
        resources:
          requests:              # "I need at least this much to function"
            memory: "128Mi"      # Scheduler uses this for placement decisions
            cpu: "100m"          # 100 millicores = 0.1 CPU core
          limits:                # "Never let me exceed this"
            memory: "256Mi"      # Exceeding = OOMKilled πŸ’€
            cpu: "500m"          # Exceeding = throttled (not killed)
            # πŸ’‘ Note: Some teams omit CPU limits to avoid throttling.
            # If you set them, monitor for latency impacts.
        
        # πŸ” Environment variables (non-sensitive config)
        env:
        - name: LOG_LEVEL
          value: "info"
        - name: APP_ENV
          value: "production"

πŸ“Š Resource Units Explained

Understanding resource units is crucial for proper capacity planning:

| Unit | Meaning | Example | Notes |
| --- | --- | --- | --- |
| 100m | 100 millicores | 10% of 1 CPU core | 1000m = 1 full core |
| 0.1 | Same as 100m | 10% of 1 CPU core | Decimal notation works too |
| 128Mi | 128 mebibytes | ~134 MB | Binary units (1024-based) |
| 128M | 128 megabytes | 128 MB | Decimal units (1000-based) |
| 1Gi | 1 gibibyte | ~1.07 GB | Use for memory typically |

πŸ’‘ Best Practice for Setting Resources:

  1. Start with low requests, monitor actual usage with kubectl top
  2. Set memory limits ~1.5-2x requests initially
  3. Use Vertical Pod Autoscaler (VPA) for recommendations
  4. Memory limit = hard ceiling (OOMKill if exceeded)
  5. CPU limit = soft ceiling (throttling, not death) - some teams omit this
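
Putting that checklist into practice, a rough workflow might look like this (a sketch; the deployment name and numbers are placeholders):

# 1. Observe real usage (requires metrics-server)
kubectl top pods -l app.kubernetes.io/name=my-app

# 2. Nudge requests/limits from observed usage β€” or better, edit the manifest and re-apply
kubectl set resources deployment my-app \
  --requests=cpu=100m,memory=128Mi \
  --limits=memory=256Mi

# 3. Confirm what is now configured
kubectl describe deployment my-app | grep -iA4 limits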

πŸ”„ The Rolling Update Dance

When you update a Deployment (new image, config change, etc.), here's exactly what happens:

sequenceDiagram
    participant You as πŸ‘€ You
    participant API as πŸ“ž API Server
    participant DC as πŸ‘” Deployment Controller
    participant Old as πŸ“¦ Old ReplicaSet (v1)
    participant New as πŸ“¦ New ReplicaSet (v2)
    You->>API: kubectl apply (image: v2)
    API->>DC: Deployment updated!
    DC->>New: Create new ReplicaSet
    Note over New: replicas: 0 β†’ 1
    DC->>New: Start Pod v2 #1
    Note over New: Pod starting...
    New-->>DC: Pod Ready! βœ…
    DC->>Old: Scale down
    Note over Old: replicas: 3 β†’ 2
    DC->>New: Scale up
    Note over New: replicas: 1 β†’ 2
    New-->>DC: Pod #2 Ready! βœ…
    DC->>Old: Scale down
    Note over Old: replicas: 2 β†’ 1
    DC->>New: Scale up
    Note over New: replicas: 2 β†’ 3
    New-->>DC: Pod #3 Ready! βœ…
    DC->>Old: Scale down
    Note over Old: replicas: 1 β†’ 0
    Note over DC: πŸŽ‰ Rollout Complete!
    Note over Old: Kept for rollback!

πŸ”‘ Key insights:

  • At no point did we have zero running instances
  • Old ReplicaSet is kept (with 0 replicas) for instant rollback
  • Each new Pod must pass readiness probe before old Pod is terminated
  • Your users experienced zero downtime

βͺ Rollback: Your Safety Net

# 😱 Something's wrong! Roll back immediately!
kubectl rollout undo deployment/my-app

# πŸ“œ See rollout history
kubectl rollout history deployment/my-app

# βͺ Roll back to specific revision
kubectl rollout undo deployment/my-app --to-revision=2

# πŸ‘€ Watch rollout progress
kubectl rollout status deployment/my-app

πŸ’‘ Why rollback is instant: Remember those old ReplicaSets? Kubernetes just scales up the old one and scales down the new one. No image pulling, no waitingβ€”the old Pods were ready to go!


🌐 Chapter 3: Services β€” Stable Endpoints in a Chaotic World

πŸ€” The Problem Services Solve

Here's the challenge: Pods are ephemeral. They get IP addresses when created. When they die and are replaced, they get new IP addresses.

Pod my-app-abc123: IP 10.244.1.5   β†’   πŸ’€ Dies
Pod my-app-xyz789: IP 10.244.2.8   β†’   πŸ†• Created (different IP!)

Imagine if every time your favorite restaurant hired a new chef, you had to learn their home address to order food. That's Pods without Services.

Services provide:

  • 🏠 Stable IP address that never changes
  • πŸ”€ DNS name for easy discovery
  • βš–οΈ Load balancing across healthy Pods
  • πŸ” Service discovery via environment variables and DNS

🎯 Service Types: Choose Your Adventure

flowchart TB
    subgraph Internal["πŸ”’ Cluster Internal"]
        CIP["ClusterIP<br/>Default type<br/>Internal traffic only"]
    end
    subgraph External["🌍 External Access"]
        NP["NodePort<br/>Development/Testing<br/>Port 30000-32767"]
        LB["LoadBalancer<br/>Production<br/>Cloud provider LB"]
    end
    subgraph Special["πŸ”— Special Purpose"]
        EN["ExternalName<br/>DNS alias<br/>External services"]
        HL["Headless<br/>clusterIP: None<br/>Direct Pod access"]
    end

| Type | Who Can Access | Use Case | Cost | Example |
| --- | --- | --- | --- | --- |
| πŸ”’ ClusterIP | Internal Pods only | Service-to-service communication | Free | API calling database |
| πŸšͺ NodePort | External via NodeIP:30000-32767 | Development, on-prem | Free | Testing externally |
| ☁️ LoadBalancer | External via cloud LB | Production external access | πŸ’ΈπŸ’ΈπŸ’Έ | Public website |
| πŸ”— ExternalName | DNS CNAME record | Abstracting external deps | Free | db.example.com β†’ RDS |
| πŸ“ Headless | Direct Pod IPs | StatefulSets, custom discovery | Free | Database clusters |

πŸ“ ClusterIP Service Example (The Default)

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
  labels:
    app.kubernetes.io/name: my-app
spec:
  type: ClusterIP              # πŸ”’ Default, can omit
  
  # 🎯 Selector: "Route traffic to Pods with these labels"
  selector:
    app.kubernetes.io/name: my-app   # Must match Pod labels EXACTLY!
  
  ports:
  - name: http                 # πŸ“› Named ports are best practice
    port: 80                   # πŸšͺ Port the Service listens on
    targetPort: http           # 🎯 Use named port from Pod spec!
    protocol: TCP              # TCP is default, can omit

# πŸ’‘ Why separate port and targetPort?
# - Service presents a standard interface (port 80)
# - Pods can use any port internally (8080)
# - You can change Pod port without affecting clients

πŸ” How Service Discovery Works

When a Service is created, Kubernetes does two magical things:

flowchart LR
    subgraph PodA["πŸ“¦ Client Pod"]
        App["Your App<br/>wants to call my-svc"]
    end
    DNS["🌐 CoreDNS<br/>my-svc β†’ 10.96.45.12"]
    subgraph SVC["⚑ Service: my-svc<br/>ClusterIP: 10.96.45.12"]
        EP["Endpoints:<br/>10.244.1.5:8080<br/>10.244.2.8:8080<br/>10.244.3.3:8080"]
    end
    P1["πŸ“¦ Pod 1<br/>10.244.1.5"]
    P2["πŸ“¦ Pod 2<br/>10.244.2.8"]
    P3["πŸ“¦ Pod 3<br/>10.244.3.3"]
    App -->|"1️⃣ DNS lookup"| DNS
    DNS -->|"2️⃣ Returns IP"| App
    App -->|"3️⃣ TCP connection"| SVC
    SVC -->|"4️⃣ Load balance"| P1
    SVC -.->|"or"| P2
    SVC -.->|"or"| P3

πŸ“› DNS Name Formats:

| Format | When to Use | Example |
| --- | --- | --- |
| my-svc | Same namespace | http://my-svc/api |
| my-svc.other-ns | Different namespace | http://my-svc.payments/charge |
| my-svc.other-ns.svc.cluster.local | Full FQDN (rarely needed) | Cross-cluster scenarios |

πŸ’‘ Pro tip: Always use the short form within the same namespace. It's cleaner and Kubernetes adds the suffix automatically.
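
You can watch service discovery happen from inside the cluster by resolving the name from a throwaway Pod (a sketch; my-app-service is the Service defined above):

# Run a temporary busybox Pod and look up the Service by name
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup my-app-service

# From a different namespace you'd use the qualified form instead, e.g.
#   nslookup my-app-service.<namespace>.svc.cluster.local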

πŸ”Œ Endpoints: The Magic Behind Services

Services don't magically know where Pods are. They maintain an Endpoints object:

# πŸ‘€ See which Pods a Service routes to
kubectl get endpoints my-app-service

NAME              ENDPOINTS                                         AGE
my-app-service    10.244.1.5:8080,10.244.2.8:8080,10.244.3.3:8080   5m

⚠️ If you see <none> for endpoints:

  • Your selector doesn't match any Pod labels
  • No Pods are passing readiness probes
  • Pods exist but in wrong namespace

This is the #1 debugging step when Services don't work!


πŸšͺ Chapter 4: Ingress β€” The Fancy Front Door

πŸ€” Why Ingress Exists

Services are great, but:

  • LoadBalancers cost money πŸ’Έ (one per service = budget death)
  • NodePorts are ugly 😬 (nobody wants myapp.com:31847)
  • No path-based routing (can't route /api and /web separately)
  • No SSL termination at Service level

Ingress provides:

  • πŸ›£οΈ Path-based routing (/api β†’ API service, /web β†’ Web service)
  • 🏠 Host-based routing (multiple domains, one IP)
  • πŸ” TLS/SSL termination (HTTPS handled at the edge)
  • πŸ’° Cost efficiency (one LoadBalancer for many services)

βš™οΈ How Ingress Works (Two Components)

flowchart TB
    subgraph You["πŸ‘€ You Create"]
        ING["πŸ“œ Ingress Resource<br/>Just configuration/rules"]
    end
    subgraph Controller["πŸ€– Ingress Controller<br/>(Must be installed separately!)"]
        IC["nginx-ingress<br/>traefik<br/>AWS ALB Controller<br/>etc."]
    end
    subgraph Result["🌐 Actual Routing"]
        LB["☁️ Load Balancer"]
        SVC1["⚑ Service 1"]
        SVC2["⚑ Service 2"]
    end
    ING -->|"Controller reads"| IC
    IC -->|"Configures"| LB
    LB --> SVC1
    LB --> SVC2

⚠️ Important: The Ingress resource alone does nothing! You need an Ingress Controller (nginx-ingress, traefik, etc.) actually installed in your cluster.
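
For example, one common way to install a controller is the ingress-nginx Helm chart (a sketch, assuming you have Helm; on minikube the bundled addon is even simpler):

# Install ingress-nginx via Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# Or, on minikube, enable the bundled addon
minikube addons enable ingress

# Verify the controller is running and note the IngressClass it registers
kubectl get pods -n ingress-nginx
kubectl get ingressclass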

πŸ“ Ingress Example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    # πŸ”§ Controller-specific settings (these are for nginx)
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  ingressClassName: nginx      # 🎯 Which controller handles this (not annotation!)
  
  # πŸ” TLS Configuration
  tls:
  - hosts:
    - myapp.example.com
    - api.example.com
    secretName: tls-secret     # Certificate stored as K8s Secret
  
  # πŸ›£οΈ Routing Rules
  rules:
  # Rule 1: myapp.example.com
  - host: myapp.example.com
    http:
      paths:
      - path: /api             # 🎯 /api/* goes to api-service
        pathType: Prefix       # Prefix = /api, /api/, /api/users all match
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /                # 🎯 Everything else goes to web-service
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
              
  # Rule 2: api.example.com (different domain)
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80

πŸ”€ Traffic Flow Visualization

flowchart TB
    Internet((🌍 Internet))
    subgraph Cluster["☸️ Kubernetes Cluster"]
        subgraph IC["🚦 Ingress Controller"]
            NGINX["nginx Pod<br/>Reads Ingress rules<br/>Routes traffic"]
        end
        ING["πŸ“œ Ingress Resource<br/>myapp.example.com:<br/>/api β†’ api-svc<br/>/ β†’ web-svc"]
        subgraph Services["⚑ Services"]
            API["api-service"]
            WEB["web-service"]
        end
        subgraph Pods["πŸ“¦ Pods"]
            AP1["API Pod 1"]
            AP2["API Pod 2"]
            WP1["Web Pod 1"]
            WP2["Web Pod 2"]
        end
    end
    Internet -->|"https://myapp.example.com/api/users"| IC
    IC -.->|"reads config"| ING
    IC -->|"/api/*"| API
    IC -->|"/*"| WEB
    API --> AP1
    API --> AP2
    WEB --> WP1
    WEB --> WP2

πŸ’‘ Path Types Explained:

| Type | Matches | Use Case |
| --- | --- | --- |
| Prefix | /api, /api/, /api/users | Most common, REST APIs |
| Exact | Only /api exactly | Specific endpoints |
| ImplementationSpecific | Controller decides | Legacy, avoid |

βš™οΈ Chapter 5: ConfigMaps and Secrets β€” Externalizing Configuration

🎯 Why Externalize Configuration?

The 12-Factor App methodology says: Store config in the environment, not in code.

Why?

  • πŸ”„ Same image, different environments (dev/staging/prod)
  • πŸ”’ Secrets stay secret (not in Git history!)
  • ⚑ Change config without rebuilding (faster deployments)
  • πŸ‘₯ Separation of concerns (devs write code, ops manage config)

πŸ“„ ConfigMaps: For Non-Sensitive Data

ConfigMaps store configuration data as key-value pairs or entire files:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # πŸ”‘ Simple key-value pairs
  LOG_LEVEL: "info"
  DATABASE_HOST: "db.internal.svc.cluster.local"
  FEATURE_NEW_UI: "true"
  MAX_CONNECTIONS: "100"
  
  # πŸ“„ Entire configuration files
  nginx.conf: |
    server {
        listen 80;
        server_name localhost;
        
        location / {
            proxy_pass http://backend:8080;
            proxy_set_header Host $host;
        }
    }
    
  application.yaml: |
    spring:
      profiles:
        active: production
      datasource:
        url: jdbc:postgresql://db:5432/myapp
    logging:
      level:
        root: INFO
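
You don't have to hand-write that YAML: kubectl can generate ConfigMaps from literals or files (a sketch; the names and the local nginx.conf file are just illustrative):

# From literal key/value pairs
kubectl create configmap app-config \
  --from-literal=LOG_LEVEL=info \
  --from-literal=MAX_CONNECTIONS=100

# From a file (the file name becomes the key)
kubectl create configmap nginx-config --from-file=nginx.conf

# Inspect the result, or add --dry-run=client -o yaml to generate YAML without creating anything
kubectl get configmap app-config -o yaml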

πŸ” Secrets: For Sensitive Data

⚠️ Critical Warning: Kubernetes Secrets are base64 encoded, NOT encrypted by default. Anyone with cluster access can decode them!

apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque                    # Generic secret type
stringData:                     # πŸ’‘ Use stringData, K8s encodes automatically
  DB_PASSWORD: "super-secret-password-123"
  API_KEY: "sk-abc123def456"
  JWT_SECRET: "my-jwt-signing-key"

# ⚠️ The 'data' field requires base64 encoding:
# data:
#   DB_PASSWORD: c3VwZXItc2VjcmV0LXBhc3N3b3JkLTEyMw==
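
To drive the point home: anyone with read access to the Secret can recover the value in one line (a sketch against the app-secrets example above):

# Pull a single key back out and decode it β€” no special tooling required
kubectl get secret app-secrets -o jsonpath='{.data.DB_PASSWORD}' | base64 -d
# β†’ super-secret-password-123

# Secrets can also be created imperatively (the value is still only base64 in etcd)
kubectl create secret generic app-secrets --from-literal=DB_PASSWORD='super-secret-password-123'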

πŸ”’ For Real Security:

  • Enable encryption at rest for etcd
  • Use external secret managers: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault
  • Use the External Secrets Operator to sync external secrets
  • Implement RBAC to restrict Secret access

πŸ’‰ Injecting Configuration into Pods

There are three ways to use ConfigMaps and Secrets in Pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: configured-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: configured-app
  template:
    metadata:
      labels:
        app: configured-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        
        # ═══════════════════════════════════════════════════════════
        # πŸ“‹ METHOD 1: Load ALL keys as environment variables
        # ═══════════════════════════════════════════════════════════
        envFrom:
        - configMapRef:
            name: app-config        # All keys become env vars
        - secretRef:
            name: app-secrets       # ⚠️ Be careful, exposes all secrets
        
        # ═══════════════════════════════════════════════════════════
        # 🎯 METHOD 2: Cherry-pick specific values (RECOMMENDED)
        # ═══════════════════════════════════════════════════════════
        env:
        - name: DATABASE_PASSWORD   # Env var name in container
          valueFrom:
            secretKeyRef:
              name: app-secrets     # Secret name
              key: DB_PASSWORD      # Key within Secret
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: LOG_LEVEL
        
        # ═══════════════════════════════════════════════════════════
        # πŸ“ METHOD 3: Mount as files (great for config files)
        # ═══════════════════════════════════════════════════════════
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config    # ConfigMap keys become files here
          readOnly: true
        - name: secret-volume
          mountPath: /etc/secrets
          readOnly: true
          
      volumes:
      - name: config-volume
        configMap:
          name: app-config
          items:                    # πŸ’‘ Optional: mount specific keys only
          - key: nginx.conf
            path: nginx.conf        # /etc/config/nginx.conf
      - name: secret-volume
        secret:
          secretName: app-secrets
          defaultMode: 0400         # πŸ”’ Restrictive permissions

πŸ’‘ Which Method to Use?

| Method | Best For | Pros | Cons |
| --- | --- | --- | --- |
| envFrom | Simple apps, all config needed | Easy, automatic | Exposes everything, naming conflicts |
| env + valueFrom | Production apps | Explicit, documented | More YAML |
| Volume mounts | Config files (nginx.conf, etc.) | Files stay files | App must read files |
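
Once the Deployment above is running, it's worth checking that the values actually landed where you expect. A sketch, assuming the configured-app Deployment from this section and an image with basic shell tools:

# Env vars injected via env / envFrom (the grep runs locally, not in the container)
kubectl exec deploy/configured-app -- env | grep -E 'LOG_LEVEL|DATABASE_PASSWORD'

# Files mounted from the ConfigMap and Secret volumes
kubectl exec deploy/configured-app -- ls /etc/config /etc/secrets
kubectl exec deploy/configured-app -- cat /etc/config/nginx.conf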

πŸ₯ Chapter 6: Health Checks β€” Keeping Your Pods Honest

πŸ€” Why Health Checks Matter

Without health checks:

  • 🧟 Zombie Pods - Process running but not responding
  • πŸ’€ Cascading failures - Bad Pod gets traffic, fails, repeats
  • 😴 Slow startup issues - Pod not ready but getting traffic

With health checks:

  • πŸ”„ Automatic recovery - Unhealthy containers restarted
  • 🚦 Traffic control - Only ready Pods receive traffic
  • ⏰ Startup tolerance - Slow apps given time to initialize

πŸ” The Three Probe Types

flowchart TB
    subgraph Probes["πŸ₯ Health Probe Types"]
        LP["πŸ’“ Liveness Probe<br/>'Are you alive?'"]
        RP["βœ… Readiness Probe<br/>'Can you serve traffic?'"]
        SP["πŸš€ Startup Probe<br/>'Are you done starting?'"]
    end
    subgraph Actions["πŸ“‹ On Failure"]
        LPA["πŸ”„ Container KILLED<br/>and restarted"]
        RPA["🚫 Removed from<br/>Service endpoints"]
        SPA["⏸️ Other probes<br/>disabled until pass"]
    end
    LP --> LPA
    RP --> RPA
    SP --> SPA

| Probe | Question It Answers | On Failure | When to Use |
| --- | --- | --- | --- |
| πŸ’“ Liveness | "Is the process stuck/deadlocked?" | Container killed & restarted | Always. Catches hung processes. |
| βœ… Readiness | "Can you handle requests right now?" | Removed from Service (no traffic) | Always. Prevents traffic to unready Pods. |
| πŸš€ Startup | "Have you finished initializing?" | Liveness/Readiness probes paused | Slow-starting apps (Java, legacy) |

πŸ“ Complete Health Check Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthy-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: healthy-app
  template:
    metadata:
      labels:
        app: healthy-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 8080
          name: http
        
        # ═══════════════════════════════════════════════════════════
        # πŸš€ STARTUP PROBE (for slow-starting applications)
        # "Disable liveness/readiness until startup completes"
        # ═══════════════════════════════════════════════════════════
        startupProbe:
          httpGet:
            path: /healthz
            port: http
          failureThreshold: 30      # 30 Γ— 10s = 5 min max startup
          periodSeconds: 10
          # πŸ’‘ Once this passes, liveness/readiness probes take over
        
        # ═══════════════════════════════════════════════════════════
        # πŸ’“ LIVENESS PROBE
        # "If this fails, kill and restart the container"
        # ═══════════════════════════════════════════════════════════
        livenessProbe:
          httpGet:
            path: /healthz          # πŸ’‘ Lightweight endpoint
            port: http
          initialDelaySeconds: 0    # Startup probe handles delay
          periodSeconds: 10         # πŸ”„ Check every 10 seconds
          timeoutSeconds: 5         # ⏱️ Timeout per check
          failureThreshold: 3       # ❌ Fail 3 times = restart
          successThreshold: 1       # βœ… 1 success = healthy
        
        # ═══════════════════════════════════════════════════════════
        # βœ… READINESS PROBE
        # "If this fails, stop sending traffic to this Pod"
        # ═══════════════════════════════════════════════════════════
        readinessProbe:
          httpGet:
            path: /ready            # πŸ’‘ Can be different from liveness!
            port: http
          initialDelaySeconds: 0    # Startup probe handles delay
          periodSeconds: 5          # Check more frequently
          timeoutSeconds: 3
          failureThreshold: 3       # Fail 3 times = remove from LB
          successThreshold: 1

πŸ”§ Probe Types Available

# 🌐 HTTP GET (most common)
httpGet:
  path: /healthz
  port: 8080
  httpHeaders:                      # Optional custom headers
  - name: Custom-Header
    value: Awesome

# πŸ”Œ TCP Socket (for non-HTTP services)
tcpSocket:
  port: 3306                        # Just checks if port is open

# πŸ’» Exec (run a command)
exec:
  command:
  - cat
  - /tmp/healthy
  # Exit code 0 = healthy

# 🌐 gRPC (for gRPC services, K8s 1.24+)
grpc:
  port: 50051

πŸ’‘ Health Check Best Practices

| Practice | Why |
| --- | --- |
| Liveness β‰  Readiness endpoints | Liveness: "am I broken?" Readiness: "am I ready for MORE traffic?" |
| Don't check dependencies in liveness | If DB is down, restarting your app won't fix it! |
| DO check dependencies in readiness | Don't send traffic if you can't serve it |
| Set appropriate timeouts | Too short = false positives. Too long = slow recovery. |
| Use startup probes for slow apps | Prevents liveness probe killing during startup |
| Keep probes lightweight | Heavy probes can cause issues under load |

πŸ›‘ Chapter 7: Graceful Shutdown β€” The Art of Dying Well

πŸ€” Why Graceful Shutdown Matters

When Kubernetes terminates a Pod (during updates, scaling down, node drain), what happens to in-flight requests?

Without graceful shutdown:

  • πŸ’₯ Requests get dropped mid-response
  • 😠 Users see 502/503 errors
  • πŸ”„ Retries create thundering herd

With graceful shutdown:

  • βœ… Current requests complete
  • 🚫 New requests go elsewhere
  • 😊 Users notice nothing

⏰ The Termination Sequence

sequenceDiagram
    participant K8s as ☸️ Kubernetes
    participant EP as πŸ”Œ Endpoints Controller
    participant Pod as πŸ“¦ Pod
    participant App as πŸ’» Your Application
    K8s->>Pod: 1️⃣ Pod marked for termination
    K8s->>EP: 2️⃣ Remove Pod from Service endpoints
    Note over EP: Traffic stops flowing to Pod
    par Parallel execution
        K8s->>Pod: 3️⃣ Run preStop hook (if defined)
        Note over Pod: e.g., sleep 5
    and
        K8s->>App: 4️⃣ Send SIGTERM
        Note over App: Your app should:<br/>β€’ Stop accepting new requests<br/>β€’ Finish in-flight requests<br/>β€’ Close DB connections<br/>β€’ Flush buffers
    end
    Note over K8s: ⏰ Wait terminationGracePeriodSeconds
    alt App exits cleanly
        App-->>K8s: Exit 0 βœ…
        Note over K8s: Clean shutdown!
    else Timeout exceeded
        K8s->>App: 5️⃣ SIGKILL (force kill) πŸ’€
        Note over K8s: Hard termination
    end

πŸ“ Graceful Shutdown Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: graceful-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: graceful-app
  template:
    metadata:
      labels:
        app: graceful-app
    spec:
      # ⏰ How long to wait before SIGKILL (default: 30s)
      terminationGracePeriodSeconds: 60
      
      containers:
      - name: app
        image: myapp:latest
        
        # πŸ”§ Lifecycle hooks
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # πŸ’‘ Why sleep? Give load balancer time to update!
                # Endpoints update is async - traffic may still
                # be routing here for a few seconds
                sleep 5

πŸ’» Application-Side SIGTERM Handling

Your application MUST handle SIGTERM properly:

Node.js:

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, shutting down gracefully');
  
  // Stop accepting new connections
  server.close(async () => {
    await database.disconnect();
    process.exit(0);
  });
  
  // Force exit after timeout
  setTimeout(() => process.exit(1), 25000);
});

Java Spring Boot:

# application.yaml
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

Python:

import signal, sys

def handle_sigterm(signum, frame):
    print("SIGTERM received, shutting down...")
    # Cleanup code here
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

πŸ† Chapter 8: Production Best Practices β€” The Checklist That Saves Careers

βœ… The Production Readiness Checklist

πŸ”² RESOURCES
  βœ… Resource requests AND limits set for all containers
  βœ… Requests based on actual observed usage
  βœ… Memory limit = hard ceiling (OOMKill risk understood)
  
πŸ”² HEALTH & AVAILABILITY  
  βœ… Startup probe configured (for slow-starting apps)
  βœ… Liveness probe configured
  βœ… Readiness probe configured (different endpoint from liveness!)
  βœ… Multiple replicas (minimum 2, recommend 3+)
  βœ… PodDisruptionBudget defined
  βœ… Pod anti-affinity (spread across nodes)
  βœ… TopologySpreadConstraints (spread across zones)
  
πŸ”² LIFECYCLE
  βœ… Application handles SIGTERM gracefully
  βœ… terminationGracePeriodSeconds set appropriately
  βœ… preStop hook if needed (for LB drain time)
  
πŸ”² SECURITY
  βœ… Container runs as non-root user
  βœ… Read-only root filesystem (if possible)
  βœ… No privilege escalation
  βœ… Drop all capabilities
  βœ… Secrets in Secret objects (not ConfigMaps!)
  βœ… Network policies restrict Pod communication
  βœ… Service account explicitly set (not default)
  
πŸ”² IMAGES
  βœ… Specific image tag (NEVER :latest in production)
  βœ… imagePullPolicy: IfNotPresent
  βœ… Image from trusted registry
  βœ… Image scanned for vulnerabilities
  
πŸ”² OBSERVABILITY
  βœ… Logging to stdout/stderr
  βœ… Metrics exposed (/metrics endpoint)
  βœ… Alerts configured for key metrics

πŸ”² SCALING
  βœ… HorizontalPodAutoscaler configured (if applicable)

🌍 High Availability: Spreading Pods Across Failure Domains

Don't put all your eggs in one basket (node):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ha-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ha-app
  template:
    metadata:
      labels:
        app: ha-app
    spec:
      # 🌍 Spread Pods across nodes
      affinity:
        podAntiAffinity:
          # 🎯 "Prefer" = best effort, won't block scheduling
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: ha-app
              topologyKey: kubernetes.io/hostname    # Different nodes
      
      # 🌐 Spread across availability zones
      topologySpreadConstraints:
      - maxSkew: 1                                   # Max difference between zones
        topologyKey: topology.kubernetes.io/zone    # Spread across AZs
        whenUnsatisfiable: ScheduleAnyway           # Don't block if can't satisfy
        labelSelector:
          matchLabels:
            app: ha-app
      
      containers:
      - name: app
        image: myapp:latest

πŸ›‘οΈ Pod Disruption Budgets: Maintaining Availability During Disruptions

PDBs prevent cluster operations (node drains, upgrades) from killing too many Pods:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ha-app-pdb
spec:
  minAvailable: 2                 # Always keep at least 2 running
  # OR: maxUnavailable: 1         # Never have more than 1 down
  selector:
    matchLabels:
      app: ha-app
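
You can watch a PDB earn its keep during a node drain (a sketch; the node name is a placeholder):

# Current budget status: how many Pods may be disrupted right now
kubectl get pdb ha-app-pdb

# Drain a node β€” evictions that would drop ha-app below 2 ready Pods are refused
# until replacement Pods become Ready elsewhere
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Put the node back into service afterwards
kubectl uncordon <node-name>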

πŸ“ˆ Horizontal Pod Autoscaler: Automatic Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # Scale up when CPU > 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
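
Note that the HPA needs metrics to act on, so metrics-server (or another metrics provider) must be installed in the cluster. A couple of sanity checks (a sketch):

# Current vs target utilization and the replica count the HPA has chosen
kubectl get hpa my-app-hpa --watch

# Scaling events and any trouble fetching metrics show up here
kubectl describe hpa my-app-hpa

# If TARGETS shows <unknown>, metrics-server is probably missing
kubectl top pods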

πŸ”’ Security: Running Securely

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      # πŸ” Use dedicated service account
      serviceAccountName: secure-app-sa
      automountServiceAccountToken: false  # Don't mount token unless needed
      
      # πŸ”’ Pod-level security context
      securityContext:
        runAsNonRoot: true           # 🚫 Containers cannot run as root
        runAsUser: 1000              # πŸ‘€ Run as UID 1000
        runAsGroup: 1000             # πŸ‘₯ Run as GID 1000
        fsGroup: 1000                # πŸ“ Volume ownership
        seccompProfile:
          type: RuntimeDefault       # πŸ›‘οΈ Apply default seccomp profile
      
      containers:
      - name: app
        image: myapp:v1.0.0
        
        # πŸ”’ Container-level security context
        securityContext:
          allowPrivilegeEscalation: false    # 🚫 Can't gain privileges
          readOnlyRootFilesystem: true       # πŸ“ Can't write to filesystem
          capabilities:
            drop:
            - ALL                             # 🚫 Drop all Linux capabilities

🌐 Network Policy: Zero-Trust Networking

By default, all Pods can talk to all other Pods. Lock it down:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-app-netpol
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  - Egress
  
  # πŸ“₯ Who can talk TO my pods?
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx      # Only from ingress namespace
    ports:
    - port: 8080
  
  # πŸ“€ Who can my pods talk TO?
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: database           # Only to database namespace
    ports:
    - port: 5432
  - to:
    - namespaceSelector: {}        # Allow DNS
    ports:
    - port: 53
      protocol: UDP
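
After applying a policy, prove that it blocks what you think it blocks. A sketch with several assumptions: your CNI plugin enforces NetworkPolicy, the my-app Service from earlier lives in the default namespace, and the ingress-nginx namespace carries the name: ingress-nginx label used above:

# From a namespace NOT allowed by the ingress rule, this should time out
kubectl run -it --rm netpol-test --image=busybox:1.36 --restart=Never -- \
  wget -qO- -T 3 http://my-app-service.default.svc.cluster.local

# From the allowed namespace, the same request should succeed
kubectl run -it --rm netpol-test -n ingress-nginx --image=busybox:1.36 --restart=Never -- \
  wget -qO- -T 3 http://my-app-service.default.svc.cluster.local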

πŸ”§ Chapter 9: The kubectl Survival Guide

πŸ“‹ Commands Organized by Task

πŸ‘€ Viewing Resources

# πŸ“‹ List resources
kubectl get pods                        # Pods in current namespace
kubectl get pods -A                     # ALL namespaces
kubectl get pods -o wide                # Extra columns (node, IP)
kubectl get pods -w                     # Watch mode (live updates)
kubectl get all                         # Common resources (not actually all!)

# πŸ” Detailed information
kubectl describe pod <name>             # Full details + events
kubectl describe deployment <name>      # Deployment details

# πŸ“Š Resource usage (requires metrics-server)
kubectl top pods                        # CPU/memory usage
kubectl top nodes                       # Node resource usage

πŸ” Debugging

# πŸ“œ Logs
kubectl logs <pod>                      # Container logs
kubectl logs <pod> -f                   # Stream logs (like tail -f)
kubectl logs <pod> --previous           # Logs from crashed container
kubectl logs <pod> -c <container>       # Specific container in Pod
kubectl logs -l app=myapp               # Logs from all Pods with label

# 🐚 Shell access
kubectl exec -it <pod> -- /bin/sh       # Shell into container
kubectl exec -it <pod> -- /bin/bash     # If bash available
kubectl exec <pod> -- cat /etc/config   # Run single command

# πŸ”Œ Port forwarding
kubectl port-forward <pod> 8080:80      # Local:Pod
kubectl port-forward svc/<svc> 8080:80  # Via Service

# πŸ“‹ Events (crucial for debugging!)
kubectl get events --sort-by='.lastTimestamp'
kubectl get events --field-selector type=Warning

✏️ Making Changes

# πŸ“ Apply configuration
kubectl apply -f manifest.yaml          # Create or update
kubectl apply -f ./manifests/           # Apply all files in directory
kubectl apply -k ./kustomize/           # Apply with Kustomize

# πŸ—‘οΈ Delete resources
kubectl delete -f manifest.yaml         # Delete by file
kubectl delete pod <name>               # Delete specific Pod
kubectl delete pods -l app=myapp        # Delete by label

# πŸ“Š Scaling
kubectl scale deployment <name> --replicas=5
kubectl autoscale deployment <name> --min=2 --max=10 --cpu-percent=80

πŸ”„ Rollout Management

# πŸ‘€ Status
kubectl rollout status deployment <name>    # Watch rollout
kubectl rollout history deployment <name>   # View history

# βͺ Rollback
kubectl rollout undo deployment <name>                   # Previous version
kubectl rollout undo deployment <name> --to-revision=2   # Specific revision

# πŸ”„ Restart
kubectl rollout restart deployment <name>   # Trigger rolling restart

πŸ”§ Context & Namespace

# πŸ“ Context (cluster) management
kubectl config get-contexts             # List clusters
kubectl config use-context <name>       # Switch cluster
kubectl config current-context          # Show current

# πŸ“ Namespace management
kubectl get namespaces
kubectl config set-context --current --namespace=<ns>  # Set default! 🎯

πŸ”₯ Chapter 10: Troubleshooting β€” A Chronicle of Preventable Suffering

πŸ—ΊοΈ The Troubleshooting Flowchart

flowchart TD
    Start["πŸ”₯ Something is broken!"]
    Start --> GetPods["kubectl get pods"]
    GetPods --> Status{"What's the Status?"}
    Status -->|"⏳ Pending"| Pending["kubectl describe pod"]
    Pending --> PendingCause{"Check Events section"}
    PendingCause -->|"Insufficient CPU/memory"| Resources["Scale cluster or reduce requests"]
    PendingCause -->|"No nodes match"| Affinity["Check nodeSelector/affinity"]
    PendingCause -->|"PVC not bound"| PVC["kubectl get pvc"]
    PendingCause -->|"Taint not tolerated"| Taint["Add toleration or remove taint"]
    Status -->|"πŸ–ΌοΈ ImagePullBackOff"| Image["Check describe pod Events"]
    Image --> ImageFix["β€’ Typo in image name?<br/>β€’ Tag exists?<br/>β€’ Private registry auth?<br/>β€’ Network to registry?"]
    Status -->|"πŸ’₯ CrashLoopBackOff"| Crash["kubectl logs --previous"]
    Crash --> CrashCause{"What killed it?"}
    CrashCause -->|"Application error"| AppFix["Fix your code πŸ˜…"]
    CrashCause -->|"OOMKilled"| OOM["Increase memory limits"]
    CrashCause -->|"Exit code 1"| Config["Check env vars & config"]
    CrashCause -->|"Exit code 137"| SIGKILL["OOMKilled or slow shutdown"]
    Status -->|"βœ… Running but broken"| Running["kubectl logs -f"]
    Running --> SvcCheck{"Is Service working?"}
    SvcCheck --> Endpoints["kubectl get endpoints"]
    Endpoints -->|"No endpoints"| Labels["🏷️ CHECK YOUR LABELS!<br/>Selector β‰  Pod labels"]
    Endpoints -->|"Has endpoints"| AppDebug["Check app logs<br/>exec into Pod"]

🚨 The Classic Failures

⏳ Pending - "Waiting in Limbo"

NAME                     READY   STATUS    RESTARTS   AGE
my-app-abc123            0/1     Pending   0          10m

Translation: Scheduler can't find a home for your Pod.

Debug:

kubectl describe pod my-app-abc123
# Look at the "Events" section at the bottom!

Common causes & fixes:

| Event Message | Cause | Fix |
| --- | --- | --- |
| Insufficient cpu | No node has enough CPU | Reduce requests or add nodes |
| Insufficient memory | No node has enough memory | Reduce requests or add nodes |
| node(s) had taint | Taints blocking | Add tolerations or remove taints |
| didn't match Pod's node affinity | Affinity mismatch | Fix nodeSelector/affinity rules |
| persistentvolumeclaim not found | PVC missing | Create the PVC |

πŸ–ΌοΈ ImagePullBackOff - "Can't Get Your Container"

NAME                     READY   STATUS             RESTARTS   AGE
my-app-abc123            0/1     ImagePullBackOff   0          5m

Translation: Kubernetes can't download your container image.

Checklist:

  • πŸ”€ Image name spelled correctly? (typos are #1 cause!)
  • 🏷️ Tag exists? Did you push it?
  • πŸ” Private registry? Add imagePullSecrets
  • 🌐 Can nodes reach the registry? (network/firewall)
  • ⏰ Registry rate limiting? (Docker Hub!)

πŸ’₯ CrashLoopBackOff - "Repeatedly Dying"

NAME                     READY   STATUS             RESTARTS   AGE
my-app-abc123            0/1     CrashLoopBackOff   5          3m

Translation: Your container starts, crashes, restarts... forever.

Debug:

kubectl logs my-app-abc123 --previous
kubectl describe pod my-app-abc123  # Check "Last State" section

Exit codes:

| Exit Code | Meaning | Common Cause |
| --- | --- | --- |
| 1 | Application error | Check logs! |
| 137 | SIGKILL (128+9) | OOMKilled |
| 143 | SIGTERM (128+15) | Graceful shutdown |
| 126 | Command not executable | Bad entrypoint |
| 127 | Command not found | Typo in command |

πŸ”Œ No Endpoints - "Service Can't Find Pods"

$ kubectl get endpoints my-service
NAME         ENDPOINTS   AGE
my-service   <none>      5m    # 😱 No Pods found!

Translation: Your Service selector doesn't match any Pod labels.

Debug:

# What is the Service looking for?
kubectl get service my-service -o yaml | grep -A5 selector

# What labels do Pods have?
kubectl get pods --show-labels

# Compare them! They must match EXACTLY.

πŸ“¦ Chapter 11: The Complete Production Example

Here's everything we've learned, combined into a production-ready deployment:

# πŸ—οΈ Complete Production-Ready Kubernetes Application
# ═══════════════════════════════════════════════════════════════════════════

---
# πŸ“ Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production

---
# πŸ“„ ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"

---
# πŸ” Secret
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: production
type: Opaque
stringData:
  API_KEY: "your-api-key-here"
  DB_PASSWORD: "super-secret-password"

---
# πŸ” Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: production-app
  namespace: production
automountServiceAccountToken: false

---
# πŸš€ Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-app
  namespace: production
  labels:
    app.kubernetes.io/name: production-app
    app.kubernetes.io/version: "1.0.0"
spec:
  replicas: 3
  
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  
  selector:
    matchLabels:
      app.kubernetes.io/name: production-app
      
  template:
    metadata:
      labels:
        app.kubernetes.io/name: production-app
        app.kubernetes.io/version: "1.0.0"
    spec:
      serviceAccountName: production-app
      terminationGracePeriodSeconds: 60
      
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: production-app
              topologyKey: kubernetes.io/hostname
      
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: production-app
      
      containers:
      - name: app
        image: nginx:1.25-alpine
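        # ⚠️ Stand-in image for the demo. Stock nginx wants to run as root, bind port 80,
        # and write to /var/cache/nginx β€” which clashes with the runAsNonRoot and
        # readOnlyRootFilesystem settings below. Swap in your own non-root app image (or an
        # unprivileged nginx variant plus writable emptyDir mounts) before using this as-is.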
        imagePullPolicy: IfNotPresent
        
        ports:
        - containerPort: 80
          name: http
        
        envFrom:
        - configMapRef:
            name: app-config
        env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: API_KEY
        
        resources:
          requests:
            memory: "64Mi"
            cpu: "50m"
          limits:
            memory: "128Mi"
            cpu: "200m"
        
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        
        startupProbe:
          httpGet:
            path: /
            port: http
          failureThreshold: 30
          periodSeconds: 10
        
        livenessProbe:
          httpGet:
            path: /
            port: http
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        
        readinessProbe:
          httpGet:
            path: /
            port: http
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]

---
# ⚑ Service
apiVersion: v1
kind: Service
metadata:
  name: production-app-service
  namespace: production
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: production-app
  ports:
  - name: http
    port: 80
    targetPort: http

---
# πŸ›‘οΈ PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: production-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: production-app

---
# πŸ“ˆ HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: production-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

πŸ§ͺ Try It Yourself!

# πŸ–₯️ Start a local cluster
minikube start
# OR
kind create cluster

# πŸ“¦ Deploy
kubectl apply -f complete-app.yaml

# πŸ‘€ Watch
kubectl get pods -n production -w

# πŸ“Š Check status
kubectl rollout status deployment/production-app -n production

# πŸ§ͺ Test
kubectl port-forward -n production svc/production-app-service 8080:80
curl http://localhost:8080

# πŸ—‘οΈ Cleanup
kubectl delete -f complete-app.yaml

πŸ“š TL;DR β€” The One-Page Survival Guide

| πŸ“¦ Resource | 🎯 Purpose | πŸ”§ Key Command |
| --- | --- | --- |
| Pod | Runs containers | kubectl get pods |
| Deployment | Manages Pods | kubectl get deployments |
| Service | Stable endpoint | kubectl get services |
| Ingress | HTTP routing | kubectl get ingress |
| ConfigMap | Config | kubectl get configmaps |
| Secret | Sensitive config | kubectl get secrets |
| PDB | Availability | kubectl get pdb |
| HPA | Auto-scaling | kubectl get hpa |

πŸ” Debug Flow

kubectl get pods              # 1️⃣ What's running?
kubectl describe pod <pod>    # 2️⃣ What's wrong? (check Events!)
kubectl logs <pod>            # 3️⃣ What did it say?
kubectl logs <pod> --previous # 4️⃣ Why did it die?
kubectl exec -it <pod> -- sh  # 5️⃣ Let me look inside
kubectl get endpoints <svc>   # 6️⃣ Can Service find Pods?

πŸ† Golden Rules

  1. βœ… Always set resource requests AND limits
  2. βœ… Configure startup, liveness AND readiness probes
  3. βœ… Labels must match between selector and Pod template
  4. βœ… Handle SIGTERM for graceful shutdown
  5. βœ… Never use :latest tag in production
  6. βœ… When in doubt: kubectl describe and check Events
  7. βœ… Check endpoints when Services don't work
  8. βœ… Run as non-root with minimal capabilities

πŸ“– Glossary

| Term | Definition |
| --- | --- |
| Pod | Smallest deployable unit; 1+ containers sharing network/storage |
| Deployment | Controller managing ReplicaSets; handles updates, rollbacks |
| ReplicaSet | Maintains specified number of Pod replicas |
| Service | Stable network endpoint routing to Pods |
| Ingress | HTTP/HTTPS routing, TLS termination |
| ConfigMap | Non-sensitive configuration data |
| Secret | Sensitive data (base64 encoded by default) |
| Namespace | Virtual cluster for resource isolation |
| Probe | Health check (startup, liveness, readiness) |
| PDB | Pod Disruption Budget; protects availability |
| HPA | Horizontal Pod Autoscaler; automatic scaling |
| etcd | Distributed key-value store; cluster state |
| Kubelet | Node agent; manages Pods on each node |
| SIGTERM | Termination signal; graceful shutdown |

May your Pods be healthy, your rollouts smooth, and your YAML forever valid. ☸️

Now go forth and orchestrate! πŸš€


πŸ“š Further Reading