Kubernetes Best Practices for Production Deployments
Kubernetes has become the de facto standard for container orchestration. However, running Kubernetes in production requires careful planning and adherence to best practices.
Resource Management
Resource Requests and Limits
Always define resource requests and limits for your containers:
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:1.0.0   # illustrative tag; pin a specific version and avoid :latest in production
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
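Containers that omit these fields can still be given sensible values through a namespace-level LimitRange. The following is a minimal sketch, not a recommendation: the name `default-resources`, the `production` namespace, and the numbers are illustrative assumptions.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources   # hypothetical name
  namespace: production     # assumed namespace
spec:
  limits:
  - type: Container
    defaultRequest:         # applied when a container sets no requests
      memory: "128Mi"
      cpu: "100m"
    default:                # applied when a container sets no limits
      memory: "256Mi"
      cpu: "200m"
```

Defaults of this kind are a safety net; explicit values in each workload manifest remain easier to review and tune.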
Quality of Service Classes
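Kubernetes assigns each pod a QoS class based on its resource configuration and uses that class to decide which pods to evict first when a node runs short of resources:
- Guaranteed: Every container sets CPU and memory requests equal to its limits
- Burstable: At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria
- BestEffort: No container defines any requests or limits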
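As an illustration of the Guaranteed class, the sketch below sets requests equal to limits for both CPU and memory. The pod name and image tag are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-guaranteed   # hypothetical name
spec:
  containers:
  - name: app
    image: myapp:1.0.0     # hypothetical image tag
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"    # equal to the request
        cpu: "500m"        # equal to the request
```

The class Kubernetes assigned can be read back from the pod's `status.qosClass` field, for example with `kubectl get pod myapp-guaranteed -o jsonpath='{.status.qosClass}'`.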
High Availability
Pod Disruption Budgets
Protect your applications during voluntary disruptions, such as node drains and cluster upgrades, by declaring how many pods must stay available:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
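When the replica count changes over time, for instance under autoscaling, a fixed `minAvailable` can end up too strict or too loose. One possible alternative is to express the budget as `maxUnavailable` with a percentage. This is a sketch with a hypothetical name and an arbitrary percentage; a PodDisruptionBudget accepts either `minAvailable` or `maxUnavailable`, not both, so it would replace the budget above rather than sit alongside it.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb-percent   # hypothetical name
spec:
  maxUnavailable: 25%       # illustrative value: share of matching pods that may be disrupted at once
  selector:
    matchLabels:
      app: myapp
```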
Multi-Zone Deployments
Distribute pods across availability zones:
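apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:              # required; must match the pod template labels
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: myapp
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: app
        image: myapp:1.0.0   # illustrative tag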
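Topology spread constraints are a related mechanism that expresses the same goal more directly, by bounding the difference in pod count between zones. The fragment below is a sketch that assumes the same `app: myapp` label; the `maxSkew` value and the soft `ScheduleAnyway` setting are illustrative choices.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec)
topologySpreadConstraints:
- maxSkew: 1                                 # zones may differ by at most one matching pod
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway          # soft constraint; DoNotSchedule makes it a hard requirement
  labelSelector:
    matchLabels:
      app: myapp
```

A soft constraint keeps pods schedulable when a zone is full or unavailable, at the cost of a possibly uneven spread.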
Security Best Practices
Network Policies
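Segment network traffic with NetworkPolicy resources, which take effect only when the cluster's network plugin enforces them. The policy below lets only frontend pods reach the API pods on TCP port 8080:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress     # listed without egress rules: all outbound traffic from these pods, including DNS, is denied
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080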
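A common companion to a per-application policy is a namespace-wide default-deny policy plus narrow exceptions. The sketch below shows one possible shape. The policy names are hypothetical, and the DNS rule assumes the cluster DNS service runs in the `kube-system` namespace and that namespaces carry the automatic `kubernetes.io/metadata.name` label.

```yaml
# Deny all ingress and egress for every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all   # hypothetical name
spec:
  podSelector: {}          # empty selector matches all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
# Allow app=api pods to send DNS queries to kube-system (assumed DNS location)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-dns      # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

NetworkPolicies are additive, so a pod's allowed traffic is the union of every policy that selects it.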
Pod Security Standards
Use Pod Security Admission:
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
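Pods created in a namespace that enforces the restricted level are rejected unless they declare a hardened security context. The sketch below shows the settings most commonly needed to pass. The pod name and image are hypothetical, and the image is assumed to run as a non-root user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp              # hypothetical name
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0.0     # hypothetical image; assumed to run as a non-root user
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```

Setting the `warn` and `audit` labels before `enforce` is one way to discover non-compliant workloads without blocking them.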
Monitoring and Observability
Health Checks
Implement proper liveness and readiness probes:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
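For applications that start slowly, a startup probe can take the place of a long `initialDelaySeconds`: liveness and readiness checks are held back until the startup probe succeeds. The fragment below is a sketch that reuses the `/healthz` endpoint above; the thresholds are illustrative assumptions.

```yaml
# Container-level fragment, alongside livenessProbe and readinessProbe
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to 30 x 10s = 300s for the application to start
```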
Logging
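Have applications emit structured logs with consistent log levels, then collect them centrally. The Fluentd configuration below tails container logs, parses them as JSON, and forwards them to Elasticsearch:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      # The json parser assumes Docker's json-file log format;
      # CRI runtimes such as containerd write a different line format
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    <match **>
      @type elasticsearch
      host elasticsearch
      port 9200
      logstash_format true
    </match>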
Backup and Disaster Recovery
Velero Backup
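# Install Velero (recent releases also need --plugins with the AWS plugin image,
# plus --secret-file or --no-secret for credentials)
velero install --provider aws --bucket k8s-backups --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1
# Create a daily backup schedule at 02:00 and keep each backup for 30 days (720h)
velero schedule create daily-backup --schedule="0 2 * * *" --include-namespaces production --ttl 720h
# Restore from a backup (scheduled backups are named <schedule>-<timestamp>; list them with "velero backup get")
velero restore create --from-backup daily-backup-20250115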
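The same schedule can also be kept in version control as a Velero `Schedule` resource instead of being created from the CLI. The manifest below is a sketch of the equivalent declaration; it assumes Velero is installed in its default `velero` namespace.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero          # assumes Velero's default install namespace
spec:
  schedule: "0 2 * * *"      # daily at 02:00
  template:
    includedNamespaces:
    - production
    ttl: 720h0m0s            # keep each backup for 30 days
```

Whichever form is used, a restore into a scratch cluster or namespace is the only reliable check that the backups are usable.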
Configuration Management
ConfigMaps and Secrets
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    server.port=8080
    logging.level=INFO
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  # Placeholder values; never commit real credentials to version control
  database-url: postgresql://db:5432/myapp
  api-key: super-secret-key
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:              # required; must match the pod template labels
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: myapp:1.0.0   # illustrative image; a container must specify one
        envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: app-secrets
        volumeMounts:
        - name: config
          mountPath: /config
      volumes:
      - name: config
        configMap:
          name: app-config
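`envFrom` exposes every key under its own name, and keys such as `database-url` are not conventional environment variable names and are awkward to reference from a shell. One option is to map individual keys to explicit names with `secretKeyRef`. The fragment below is a sketch; the variable names `DATABASE_URL` and `API_KEY` are assumptions chosen for illustration.

```yaml
# Container-level fragment: map selected Secret keys to explicit variable names
env:
- name: DATABASE_URL       # hypothetical variable name
  valueFrom:
    secretKeyRef:
      name: app-secrets
      key: database-url
- name: API_KEY            # hypothetical variable name
  valueFrom:
    secretKeyRef:
      name: app-secrets
      key: api-key
```

Explicit mappings also make it clear from the manifest which secrets a container actually consumes.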
Cost Optimization
Resource Requests and Limits
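Base requests and limits on observed usage rather than guesses. Oversized requests reserve node capacity that sits idle, while undersized ones lead to throttling or out-of-memory kills:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"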
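One way to gather data for these values is the Vertical Pod Autoscaler in recommendation-only mode. This is a sketch under the assumption that the VPA add-on is installed in the cluster (it is not part of core Kubernetes); the resource name is hypothetical and the target is the `myapp` Deployment used elsewhere in this article.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"        # recommend only; pods are not changed automatically
```

With `updateMode: "Off"`, the recommendations appear in the object's status and can be reviewed before any manifest is changed.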
Horizontal Pod Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
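CPU utilization targets are computed relative to each container's CPU request, so the autoscaler depends on requests being set. To reduce replica flapping when load fluctuates, the `autoscaling/v2` API also accepts a `behavior` section. The fragment below is a sketch with illustrative values, not tuned recommendations.

```yaml
# Fragment of the HorizontalPodAutoscaler spec (spec.behavior)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # consider the last 5 minutes of recommendations before scaling down
    policies:
    - type: Percent
      value: 50                       # remove at most 50% of current replicas
      periodSeconds: 60               # within any 60-second period
```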
Conclusion
Following these Kubernetes best practices will help you build reliable, secure, and cost-effective production deployments. Remember to continuously monitor and optimize your clusters based on real-world usage patterns.



