# Kubernetes Security Hardening: From Cluster Setup to Runtime Protection
Complete guide to securing Kubernetes clusters, covering RBAC, network policies, pod security, and monitoring.
By ibrahimsql
Kubernetes has become the de facto standard for container orchestration, but its complexity introduces numerous security challenges. This comprehensive guide covers essential security practices for hardening Kubernetes clusters.
## Kubernetes Security Architecture
### The 4C's of Cloud Native Security
- Cloud: Infrastructure and cloud provider security
- Cluster: Kubernetes cluster security
- Container: Container image and runtime security
- Code: Application code security
### Attack Vectors in Kubernetes
- API Server: Unauthorized access to Kubernetes API
- etcd: Direct access to cluster state
- Kubelet: Node agent vulnerabilities
- Container Runtime: Docker/containerd security issues
- Network: Pod-to-pod and external communications
- RBAC: Privilege escalation through misconfigurations (a quick probe for this vector is sketched after this list)
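Several of these vectors can be probed cheaply from a workstation before any hardening work begins. A minimal sketch using `kubectl auth can-i`, assuming kubectl access to the target cluster and permission to impersonate:

```bash
# What can an unauthenticated user do? (anything listed here is a red flag)
kubectl auth can-i --list --as=system:anonymous

# Can the default service account in the default namespace escalate?
# (it should not be able to do either of these)
kubectl auth can-i create pods \
  --as=system:serviceaccount:default:default
kubectl auth can-i '*' '*' \
  --as=system:serviceaccount:default:default
```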
## Cluster Hardening
### API Server Security
```yaml
# kube-apiserver static pod configuration (/etc/kubernetes/manifests/kube-apiserver.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Enable audit logging
    - --audit-log-path=/var/log/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=3
    - --audit-log-maxsize=100
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    # Disable anonymous auth
    - --anonymous-auth=false
    # Enable RBAC
    - --authorization-mode=Node,RBAC
    # Secure communication
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    # Disable the insecure port (the flag was removed entirely in v1.24; omit it there)
    - --insecure-port=0
    # Enable admission controllers (PodSecurityPolicy was removed in v1.25;
    # use Pod Security Admission on newer clusters)
    - --enable-admission-plugins=NodeRestriction,PodSecurityPolicy,ResourceQuota,LimitRanger
```
### Audit Policy Configuration
```yaml
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log secret and configmap access at the RequestResponse level
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
# Log all requests made by anonymous users
- level: Request
  users: ["system:anonymous"]
# Log changes to RBAC objects (potential privilege escalation)
- level: RequestResponse
  verbs: ["create", "update", "patch"]
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "clusterroles", "rolebindings", "clusterrolebindings"]
# Log pod creation and deletion
- level: Request
  verbs: ["create", "delete"]
  resources:
  - group: ""
    resources: ["pods"]
```
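Once audit logging is on, the log is newline-delimited JSON, so it can be mined with standard tools. A minimal sketch, assuming `jq` is available on the control-plane node and the log path configured above:

```bash
# Show who touched secrets: one line per audit event on the secrets resource
sudo jq -r 'select(.objectRef.resource == "secrets")
  | "\(.requestReceivedTimestamp) \(.user.username) \(.verb) \(.objectRef.namespace)/\(.objectRef.name)"' \
  /var/log/audit.log
```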
### etcd Security
```bash
# etcd secure configuration
etcd \
  --cert-file=/etc/kubernetes/pki/etcd/server.crt \
  --key-file=/etc/kubernetes/pki/etcd/server.key \
  --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
  --peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
  --client-cert-auth=true \
  --peer-client-cert-auth=true
```
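To confirm client-certificate auth is actually enforced, query the endpoint with and without certificates. A quick check, assuming the standard kubeadm certificate paths used above:

```bash
# Should succeed: mutual TLS with a valid client certificate
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health

# Should fail: no client certificate presented
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint health
```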
## RBAC (Role-Based Access Control)
### Principle of Least Privilege
```yaml
# Service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
---
# Role with minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: app-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
---
# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-rolebinding
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-service-account
  namespace: production
roleRef:
  kind: Role
  name: app-role
  apiGroup: rbac.authorization.k8s.io
```
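It is worth verifying the binding behaves as intended rather than trusting the YAML. `kubectl auth can-i` supports service-account impersonation, so you can confirm both what the account can and cannot do:

```bash
# Expect "yes": permitted by app-role
kubectl auth can-i list pods -n production \
  --as=system:serviceaccount:production:app-service-account

# Expect "no": not granted, so least privilege is holding
kubectl auth can-i create deployments -n production \
  --as=system:serviceaccount:production:app-service-account
```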
### RBAC Audit Script
```python
#!/usr/bin/env python3
"""Kubernetes RBAC Audit Tool"""
from kubernetes import client, config


def audit_rbac():
    """Audit RBAC configurations for security issues."""
    config.load_kube_config()
    rbac_v1 = client.RbacAuthorizationV1Api()
    issues = []

    # Check ClusterRoleBindings for overly permissive grants
    cluster_role_bindings = rbac_v1.list_cluster_role_binding()
    for binding in cluster_role_bindings.items:
        if binding.role_ref.name in ('cluster-admin', 'admin'):
            for subject in binding.subjects or []:
                if subject.kind == 'User' and subject.name != 'kubernetes-admin':
                    issues.append(
                        f"User {subject.name} has {binding.role_ref.name} access")
                elif subject.kind == 'ServiceAccount':
                    issues.append(
                        f"ServiceAccount {subject.namespace}/{subject.name} "
                        f"has {binding.role_ref.name} access")

    # Check ClusterRoles for wildcard permissions
    cluster_roles = rbac_v1.list_cluster_role()
    for role in cluster_roles.items:
        for rule in role.rules or []:
            if '*' in (rule.verbs or []):
                issues.append(f"ClusterRole {role.metadata.name} has wildcard verbs")
            if '*' in (rule.resources or []):
                issues.append(f"ClusterRole {role.metadata.name} has wildcard resources")

    return issues


def generate_rbac_report():
    """Generate a summary RBAC report."""
    issues = audit_rbac()
    print("=== Kubernetes RBAC Security Audit ===")
    print(f"Found {len(issues)} potential security issues:\n")
    for i, issue in enumerate(issues, 1):
        print(f"{i}. {issue}")
    if not issues:
        print("No obvious RBAC security issues found.")


if __name__ == "__main__":
    generate_rbac_report()
```
## Pod Security
### Pod Security Standards
```yaml
# Namespace with Pod Security Standards labels
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Secure pod specification
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: secure-namespace
spec:
  serviceAccountName: app-service-account
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
        ephemeral-storage: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
        ephemeral-storage: 100Mi
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
```
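Before enforcing `restricted` on a namespace that already runs workloads, you can ask the admission controller what would break. A server-side dry run of the label change reports violations as warnings without changing anything:

```bash
# Preview which existing pods would violate the restricted profile
kubectl label --dry-run=server --overwrite namespace secure-namespace \
  pod-security.kubernetes.io/enforce=restricted
```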
### Pod Security Policy (Removed in v1.25; Legacy Clusters Only)
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
```
## Network Security
### Network Policies
```yaml
# Default deny all traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow frontend-to-backend communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
---
# Allow backend-to-database communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 5432
---
# Allow backend egress to external HTTPS APIs and cluster DNS
# (matching on name=kube-system requires that label on the namespace;
# on v1.21+ you can match kubernetes.io/metadata.name instead)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  - to: []
    ports:
    - protocol: TCP
      port: 443
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
```
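Network policies fail silently, so verify them with a throwaway pod. One subtlety in the set above: `default-deny-all` also denies egress for frontend pods, so the first probe below will time out until a matching egress policy (plus DNS) is added for `app=frontend`. A sketch, assuming a Service named `backend` exists in `production`:

```bash
# Probe backend:8080 as a frontend-labeled pod
# (requires an egress rule for app=frontend under default-deny)
kubectl run np-test --rm -it --restart=Never -n production \
  --labels=app=frontend --image=busybox -- wget -qO- -T 2 http://backend:8080/

# Expect a timeout from an unlabeled pod: ingress to backend only allows app=frontend
kubectl run np-test --rm -it --restart=Never -n production \
  --image=busybox -- wget -qO- -T 2 http://backend:8080/
```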
### Calico Network Policies (Advanced)
```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-all-non-system-traffic
spec:
  order: 1000
  selector: projectcalico.org/namespace != "kube-system"
  types:
  - Ingress
  - Egress
---
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-app-traffic
  namespace: production
spec:
  order: 100
  selector: app == "web"
  types:
  - Ingress
  - Egress
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: app == "frontend"
    destination:
      ports:
      - 8080
  egress:
  - action: Allow
    protocol: TCP
    destination:
      selector: app == "database"
      ports:
      - 5432
```
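Calico's `projectcalico.org/v3` resources are applied with `calicoctl` (or through the Calico API server, if installed). A minimal usage sketch, assuming the policies above are saved as `calico-policies.yaml` (a hypothetical filename):

```bash
# Apply and inspect Calico-specific policies
calicoctl apply -f calico-policies.yaml
calicoctl get globalnetworkpolicy -o wide
calicoctl get networkpolicy -n production -o yaml
```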
## Secrets Management
### External Secrets Operator
```yaml
# SecretStore for AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
# ExternalSecret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: prod/database
      property: password
  - secretKey: api-key
    remoteRef:
      key: prod/api
      property: key
```
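After applying, confirm the operator actually synced before wiring the secret into workloads; missing IAM permissions are the usual failure mode. The ExternalSecret's status conditions carry the error detail:

```bash
# The READY/STATUS columns show whether the last sync succeeded
kubectl get externalsecret app-secrets -n production
# Conditions carry the error detail when a sync fails
kubectl describe externalsecret app-secrets -n production
# The synced Kubernetes Secret itself
kubectl get secret app-secrets -n production
```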
### Sealed Secrets
```bash
# Install the Sealed Secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Create a secret manifest locally (never applied to the cluster)
echo -n mypassword | kubectl create secret generic mysecret \
  --dry-run=client --from-file=password=/dev/stdin -o yaml > mysecret.yaml

# Seal the secret with the controller's public key
kubeseal -f mysecret.yaml -w mysealedsecret.yaml

# Apply the sealed secret; only the in-cluster controller can decrypt it
kubectl apply -f mysealedsecret.yaml
```
## Image Security
### Admission Controllers
```yaml
# OPA Gatekeeper constraint template
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredSecurityContext
      validation:
        openAPIV3Schema:
          type: object
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredsecuritycontext

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.securityContext.runAsNonRoot
        msg := "Container must run as non-root user"
      }

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.securityContext.readOnlyRootFilesystem
        msg := "Container must have read-only root filesystem"
      }
---
# Constraint
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredSecurityContext
metadata:
  name: must-have-security-context
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
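A server-side dry run is a cheap way to confirm the constraint fires without creating anything. Assuming Gatekeeper and the resources above are installed:

```bash
# Expect an admission denial: nginx sets no securityContext
kubectl run bad-pod --image=nginx -n production --dry-run=server

# Inspect violations Gatekeeper has recorded against existing resources
kubectl get k8srequiredsecuritycontext must-have-security-context -o yaml
```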
### Image Scanning with Trivy Operator
```yaml
# Install Trivy Operator (simplified; the Helm chart, which also sets up
# the ServiceAccount and RBAC the operator needs, is the usual install path)
apiVersion: v1
kind: Namespace
metadata:
  name: trivy-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trivy-operator
  namespace: trivy-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trivy-operator
  template:
    metadata:
      labels:
        app: trivy-operator
    spec:
      containers:
      - name: trivy-operator
        image: aquasec/trivy-operator:0.15.1
        env:
        - name: OPERATOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: OPERATOR_TARGET_NAMESPACES
          value: ""
        - name: TRIVY_SERVER_INSECURE
          value: "false"
```
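The operator watches workloads and writes its findings to CRDs, so once it has scanned, the reports can be pulled with plain kubectl:

```bash
# Vulnerability findings per workload image
kubectl get vulnerabilityreports -A -o wide

# Misconfiguration audits (security contexts, probes, etc.)
kubectl get configauditreports -A -o wide
```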
## Runtime Security
### Falco Rules for Kubernetes
```yaml
# Custom Falco rules for Kubernetes
- rule: Kubernetes Privilege Escalation
  desc: Detect privilege escalation in Kubernetes
  condition: >
    spawned_process and container and
    (proc.name in (su, sudo, doas) or
     proc.args contains "--privileged" or
     proc.args contains "--cap-add")
  output: >
    Privilege escalation detected (user=%user.name command=%proc.cmdline
    container=%container.name image=%container.image.repository)
  priority: CRITICAL
  tags: [kubernetes, privilege_escalation]

- rule: Kubernetes Secret Access
  desc: Detect access to mounted Kubernetes service account secrets
  condition: >
    open_read and container and
    fd.name contains "/var/run/secrets/kubernetes.io"
  output: >
    Kubernetes secret accessed (user=%user.name file=%fd.name
    container=%container.name image=%container.image.repository)
  priority: WARNING
  tags: [kubernetes, secrets]

# 10.96.0.1 is the default kubernetes Service ClusterIP; adjust for your cluster
- rule: Kubernetes API Server Access
  desc: Detect direct access to the Kubernetes API server
  condition: >
    outbound and container and
    fd.sip="10.96.0.1" and fd.sport=443
  output: >
    Direct API server access (user=%user.name container=%container.name
    image=%container.image.repository)
  priority: INFORMATIONAL
  tags: [kubernetes, api_access]
```
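Falco can validate a rules file without loading it into the running daemon, which catches syntax and priority mistakes early. Assuming the rules above are saved as `k8s-rules.yaml` (a hypothetical filename):

```bash
# Validate the rules file, then run Falco with it
falco -V k8s-rules.yaml
falco -r k8s-rules.yaml
```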
### Runtime Security Monitoring
```python
#!/usr/bin/env python3
"""Kubernetes Runtime Security Monitor"""
import json
from datetime import datetime

from kubernetes import client, config, watch


def monitor_pod_events():
    """Monitor pod events for security anomalies."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    w = watch.Watch()

    print("Starting Kubernetes security monitoring...")
    for event in w.stream(v1.list_pod_for_all_namespaces):
        event_type = event['type']
        pod = event['object']
        if event_type == 'ADDED':
            check_pod_security(pod)
        elif event_type == 'MODIFIED':
            check_pod_modifications(pod)


def check_pod_security(pod):
    """Check a pod for security violations."""
    violations = []

    # Pod-level security context: may the pod run as root?
    if pod.spec.security_context:
        if not pod.spec.security_context.run_as_non_root:
            violations.append("Pod may be running as root")

    for container in pod.spec.containers:
        # Privileged containers
        if container.security_context and container.security_context.privileged:
            violations.append(f"Container {container.name} is privileged")

        # Dangerous added capabilities
        if container.security_context and container.security_context.capabilities:
            dangerous_caps = ['SYS_ADMIN', 'NET_ADMIN', 'SYS_PTRACE']
            for cap in container.security_context.capabilities.add or []:
                if cap in dangerous_caps:
                    violations.append(
                        f"Container {container.name} has dangerous capability {cap}")

    # Host namespace sharing
    if pod.spec.host_network:
        violations.append("Pod is using host network")
    if pod.spec.host_pid:
        violations.append("Pod is using host PID namespace")

    if violations:
        alert = {
            'timestamp': datetime.now().isoformat(),
            'pod_name': pod.metadata.name,
            'namespace': pod.metadata.namespace,
            'violations': violations,
        }
        print(f"SECURITY ALERT: {json.dumps(alert, indent=2)}")


def check_pod_modifications(pod):
    """Check for suspicious pod modifications."""
    # This would typically compare against a baseline; for now, just log
    # modifications to pods containing a privileged container.
    # (Note: privileged is a container-level field, not a pod-level one.)
    if any(c.security_context and c.security_context.privileged
           for c in pod.spec.containers):
        print(f"Privileged pod modified: {pod.metadata.namespace}/{pod.metadata.name}")


if __name__ == "__main__":
    monitor_pod_events()
```
## Compliance and Benchmarks
### CIS Kubernetes Benchmark
```bash
#!/bin/bash
# CIS Kubernetes Benchmark automation script

echo "=== CIS Kubernetes Benchmark Check ==="

# 1.1.1 Ensure the API server pod specification file permissions are 644 or more restrictive
echo "Checking API server pod specification file permissions..."
stat -c %a /etc/kubernetes/manifests/kube-apiserver.yaml

# 1.1.2 Ensure the API server pod specification file ownership is root:root
echo "Checking API server pod specification file ownership..."
stat -c %U:%G /etc/kubernetes/manifests/kube-apiserver.yaml

# 1.2.1 Ensure the --anonymous-auth argument is set to false
echo "Checking anonymous authentication..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "anonymous-auth=[^[:space:]]*"

# 1.2.2 Ensure the --basic-auth-file argument is not set
echo "Checking basic authentication..."
ps -ef | grep kube-apiserver | grep -v grep | grep "basic-auth-file" \
  && echo "FAIL: Basic auth enabled" || echo "PASS: Basic auth disabled"

# 1.2.3 Ensure the --token-auth-file parameter is not set
echo "Checking token authentication..."
ps -ef | grep kube-apiserver | grep -v grep | grep "token-auth-file" \
  && echo "FAIL: Token auth enabled" || echo "PASS: Token auth disabled"

# 1.2.5 Ensure the --kubelet-https argument is set to true
echo "Checking kubelet HTTPS..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-https=[^[:space:]]*"

# 1.2.6 Ensure --kubelet-client-certificate and --kubelet-client-key are set
echo "Checking kubelet client certificates..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-client-certificate=[^[:space:]]*"
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-client-key=[^[:space:]]*"

# 1.2.7 Ensure the --kubelet-certificate-authority argument is set
echo "Checking kubelet certificate authority..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-certificate-authority=[^[:space:]]*"

# 1.2.8 Ensure the --authorization-mode argument is not set to AlwaysAllow
echo "Checking authorization mode..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "authorization-mode=[^[:space:]]*"

echo "=== Benchmark check completed ==="
```
### Kube-bench Integration
```yaml
# Job to run kube-bench
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      # v1.24+ control planes use this taint instead of .../master
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: kube-bench
        image: aquasec/kube-bench:latest
        command: ["kube-bench"]
        args: ["--json"]
        volumeMounts:
        - name: var-lib-etcd
          mountPath: /var/lib/etcd
          readOnly: true
        - name: var-lib-kubelet
          mountPath: /var/lib/kubelet
          readOnly: true
        - name: etc-systemd
          mountPath: /etc/systemd
          readOnly: true
        - name: etc-kubernetes
          mountPath: /etc/kubernetes
          readOnly: true
        - name: usr-bin
          mountPath: /usr/local/mount-from-host/bin
          readOnly: true
      restartPolicy: Never
      volumes:
      - name: var-lib-etcd
        hostPath:
          path: "/var/lib/etcd"
      - name: var-lib-kubelet
        hostPath:
          path: "/var/lib/kubelet"
      - name: etc-systemd
        hostPath:
          path: "/etc/systemd"
      - name: etc-kubernetes
        hostPath:
          path: "/etc/kubernetes"
      - name: usr-bin
        hostPath:
          path: "/usr/bin"
```
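The Job writes its findings to stdout, so the report is retrieved from the pod logs once the run completes:

```bash
# Wait for the run to finish, then pull the JSON report
kubectl wait --for=condition=complete job/kube-bench --timeout=120s
kubectl logs job/kube-bench > kube-bench-report.json
```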
## Incident Response
### Kubernetes Forensics
```bash
#!/bin/bash
# Kubernetes incident response script
# Usage: ./k8s-forensics.sh <namespace> <pod-name>

NAMESPACE=$1
POD_NAME=$2
OUTPUT_DIR="/tmp/k8s-forensics-$(date +%Y%m%d-%H%M%S)"

if [ -z "$NAMESPACE" ] || [ -z "$POD_NAME" ]; then
    echo "Usage: $0 <namespace> <pod-name>" >&2
    exit 1
fi

mkdir -p "$OUTPUT_DIR"
echo "Collecting Kubernetes forensics data..."

# Collect pod information
kubectl describe pod "$POD_NAME" -n "$NAMESPACE" > "$OUTPUT_DIR/pod-describe.txt"
kubectl get pod "$POD_NAME" -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/pod-manifest.yaml"
kubectl logs "$POD_NAME" -n "$NAMESPACE" --previous > "$OUTPUT_DIR/pod-logs-previous.txt" 2>/dev/null
kubectl logs "$POD_NAME" -n "$NAMESPACE" > "$OUTPUT_DIR/pod-logs-current.txt"

# Collect events
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' > "$OUTPUT_DIR/namespace-events.txt"

# Collect network policies
kubectl get networkpolicies -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/network-policies.yaml"

# Collect RBAC information
kubectl get rolebindings -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/rolebindings.yaml"
kubectl get roles -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/roles.yaml"

# Collect secret names and types only -- never dump secret data into evidence
# (grep -v "data:" would still leak the base64 values on the following lines)
kubectl get secrets -n "$NAMESPACE" \
  -o custom-columns=NAME:.metadata.name,TYPE:.type,CREATED:.metadata.creationTimestamp \
  > "$OUTPUT_DIR/secrets-metadata.txt"

# Collect node information
NODE=$(kubectl get pod "$POD_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}')
kubectl describe node "$NODE" > "$OUTPUT_DIR/node-describe.txt"

# Collect cluster-wide information
kubectl get clusterroles -o yaml > "$OUTPUT_DIR/clusterroles.yaml"
kubectl get clusterrolebindings -o yaml > "$OUTPUT_DIR/clusterrolebindings.yaml"

echo "Forensics data collected in $OUTPUT_DIR"
```
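A typical invocation during triage, assuming a suspect pod named `suspicious-app` (a hypothetical name) in the `production` namespace:

```bash
./k8s-forensics.sh production suspicious-app
ls /tmp/k8s-forensics-*/
```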
## Conclusion
Kubernetes security requires a comprehensive approach covering:
- Cluster Hardening: Secure API server, etcd, and node configurations
- RBAC: Implement least privilege access controls
- Pod Security: Use security contexts and admission controllers
- Network Security: Implement network policies and segmentation
- Secrets Management: Use external secret management solutions
- Image Security: Scan images and use admission controllers
- Runtime Security: Monitor for anomalies and threats
- Compliance: Regular benchmarking and auditing
Security is not a one-time setup but an ongoing process. Stay updated with the latest Kubernetes security best practices and continuously monitor your clusters for threats.
Remember: Security is everyone's responsibility in a Kubernetes environment!