Overview
This recipe pairs a Deployment with an HPA — the autoscaler scales replicas between 2 and 10 based on CPU and memory pressure.
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web
labels:
app: web
spec:
replicas: 2
selector:
matchLabels:
app: web
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0 # zero-downtime rollouts
template:
metadata:
labels:
app: web
spec:
terminationGracePeriodSeconds: 60
containers:
- name: web
image: my-registry/web:latest
ports:
- containerPort: 3000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
readinessProbe:
httpGet:
path: /api/healthz
port: 3000
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
path: /api/healthz
port: 3000
initialDelaySeconds: 20
periodSeconds: 15
failureThreshold: 3
hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # avoid flapping
Tips
- Always set
requests— HPA uses them to compute utilisation maxUnavailable: 0ensures zero-downtime rolling deploys- Pair with a
PodDisruptionBudget(minAvailable: 1) for cluster upgrades