# Kubernetes Security Hardening: From Cluster Setup to Runtime Protection
Complete guide to securing Kubernetes clusters, covering RBAC, network policies, pod security, and monitoring.
By ibrahimsql
Kubernetes has become the de facto standard for container orchestration, but its complexity introduces numerous security challenges. This comprehensive guide covers essential security practices for hardening Kubernetes clusters.
## Kubernetes Security Architecture
### The 4C's of Cloud Native Security
- Cloud: Infrastructure and cloud provider security
- Cluster: Kubernetes cluster security
- Container: Container image and runtime security
- Code: Application code security
### Attack Vectors in Kubernetes
- API Server: Unauthorized access to Kubernetes API
- etcd: Direct access to cluster state
- Kubelet: Node agent vulnerabilities
- Container Runtime: Docker/containerd security issues
- Network: Pod-to-pod and external communications
- RBAC: Privilege escalation through misconfigurations (a quick probe for this vector is sketched after this list)
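Several of these vectors can be probed cheaply from a workstation before any hardening work begins. A minimal sketch using `kubectl auth can-i`, assuming kubectl access to the target cluster and permission to impersonate:

```bash
# What can an unauthenticated user do? (anything listed here is a red flag)
kubectl auth can-i --list --as=system:anonymous

# Can the default service account in the default namespace escalate?
# (it should not be able to do either of these)
kubectl auth can-i create pods \
  --as=system:serviceaccount:default:default
kubectl auth can-i '*' '*' \
  --as=system:serviceaccount:default:default
```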
## Cluster Hardening
### API Server Security
```yaml
# kube-apiserver static pod configuration (/etc/kubernetes/manifests/kube-apiserver.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Enable audit logging
    - --audit-log-path=/var/log/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=3
    - --audit-log-maxsize=100
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    # Disable anonymous auth
    - --anonymous-auth=false
    # Enable RBAC
    - --authorization-mode=Node,RBAC
    # Secure communication
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    # Disable the insecure port (the flag was removed entirely in v1.24; omit it there)
    - --insecure-port=0
    # Enable admission controllers (PodSecurityPolicy was removed in v1.25;
    # use Pod Security Admission on newer clusters)
    - --enable-admission-plugins=NodeRestriction,PodSecurityPolicy,ResourceQuota,LimitRanger
```
### Audit Policy Configuration
```yaml
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log secret and configmap access at the RequestResponse level
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
# Log all requests made by anonymous users
- level: Request
  users: ["system:anonymous"]
# Log changes to RBAC objects (potential privilege escalation)
- level: RequestResponse
  verbs: ["create", "update", "patch"]
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "clusterroles", "rolebindings", "clusterrolebindings"]
# Log pod creation and deletion
- level: Request
  verbs: ["create", "delete"]
  resources:
  - group: ""
    resources: ["pods"]
```
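Once audit logging is on, the log is newline-delimited JSON, so it can be mined with standard tools. A minimal sketch, assuming `jq` is available on the control-plane node and the log path configured above:

```bash
# Show who touched secrets: one line per audit event on the secrets resource
sudo jq -r 'select(.objectRef.resource == "secrets")
  | "\(.requestReceivedTimestamp) \(.user.username) \(.verb) \(.objectRef.namespace)/\(.objectRef.name)"' \
  /var/log/audit.log
```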
### etcd Security
```bash
# etcd secure configuration
etcd \
  --cert-file=/etc/kubernetes/pki/etcd/server.crt \
  --key-file=/etc/kubernetes/pki/etcd/server.key \
  --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
  --peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
  --client-cert-auth=true \
  --peer-client-cert-auth=true
```
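To confirm client-certificate auth is actually enforced, query the endpoint with and without certificates. A quick check, assuming the standard kubeadm certificate paths used above:

```bash
# Should succeed: mutual TLS with a valid client certificate
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health

# Should fail: no client certificate presented
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint health
```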
## RBAC (Role-Based Access Control)
### Principle of Least Privilege
```yaml
# Service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
---
# Role with minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: app-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
---
# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-rolebinding
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-service-account
  namespace: production
roleRef:
  kind: Role
  name: app-role
  apiGroup: rbac.authorization.k8s.io
```
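It is worth verifying the binding behaves as intended rather than trusting the YAML. `kubectl auth can-i` supports service-account impersonation, so you can confirm both what the account can and cannot do:

```bash
# Expect "yes": permitted by app-role
kubectl auth can-i list pods -n production \
  --as=system:serviceaccount:production:app-service-account

# Expect "no": not granted, so least privilege is holding
kubectl auth can-i create deployments -n production \
  --as=system:serviceaccount:production:app-service-account
```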
### RBAC Audit Script
```python
#!/usr/bin/env python3
"""Kubernetes RBAC Audit Tool"""
from kubernetes import client, config


def audit_rbac():
    """Audit RBAC configurations for security issues."""
    config.load_kube_config()
    rbac_v1 = client.RbacAuthorizationV1Api()
    issues = []

    # Check ClusterRoleBindings for overly permissive grants
    cluster_role_bindings = rbac_v1.list_cluster_role_binding()
    for binding in cluster_role_bindings.items:
        if binding.role_ref.name in ('cluster-admin', 'admin'):
            for subject in binding.subjects or []:
                if subject.kind == 'User' and subject.name != 'kubernetes-admin':
                    issues.append(
                        f"User {subject.name} has {binding.role_ref.name} access")
                elif subject.kind == 'ServiceAccount':
                    issues.append(
                        f"ServiceAccount {subject.namespace}/{subject.name} "
                        f"has {binding.role_ref.name} access")

    # Check ClusterRoles for wildcard permissions
    cluster_roles = rbac_v1.list_cluster_role()
    for role in cluster_roles.items:
        for rule in role.rules or []:
            if '*' in (rule.verbs or []):
                issues.append(f"ClusterRole {role.metadata.name} has wildcard verbs")
            if '*' in (rule.resources or []):
                issues.append(f"ClusterRole {role.metadata.name} has wildcard resources")

    return issues


def generate_rbac_report():
    """Generate a summary RBAC report."""
    issues = audit_rbac()
    print("=== Kubernetes RBAC Security Audit ===")
    print(f"Found {len(issues)} potential security issues:\n")
    for i, issue in enumerate(issues, 1):
        print(f"{i}. {issue}")
    if not issues:
        print("No obvious RBAC security issues found.")


if __name__ == "__main__":
    generate_rbac_report()
```
## Pod Security
### Pod Security Standards
```yaml
# Namespace with Pod Security Standards labels
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Secure pod specification
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: secure-namespace
spec:
  serviceAccountName: app-service-account
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
        ephemeral-storage: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
        ephemeral-storage: 100Mi
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
```
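Before enforcing `restricted` on a namespace that already runs workloads, you can ask the admission controller what would break. A server-side dry run of the label change reports violations as warnings without changing anything:

```bash
# Preview which existing pods would violate the restricted profile
kubectl label --dry-run=server --overwrite namespace secure-namespace \
  pod-security.kubernetes.io/enforce=restricted
```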
### Pod Security Policy (Removed in v1.25; Legacy Clusters Only)
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
```
## Network Security
### Network Policies
```yaml
# Default deny all traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow frontend-to-backend communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
---
# Allow backend-to-database communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 5432
---
# Allow backend egress to external HTTPS APIs and cluster DNS
# (matching on name=kube-system requires that label on the namespace;
# on v1.21+ you can match kubernetes.io/metadata.name instead)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  - to: []
    ports:
    - protocol: TCP
      port: 443
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
```
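Network policies fail silently, so verify them with a throwaway pod. One subtlety in the set above: `default-deny-all` also denies egress for frontend pods, so the first probe below will time out until a matching egress policy (plus DNS) is added for `app=frontend`. A sketch, assuming a Service named `backend` exists in `production`:

```bash
# Probe backend:8080 as a frontend-labeled pod
# (requires an egress rule for app=frontend under default-deny)
kubectl run np-test --rm -it --restart=Never -n production \
  --labels=app=frontend --image=busybox -- wget -qO- -T 2 http://backend:8080/

# Expect a timeout from an unlabeled pod: ingress to backend only allows app=frontend
kubectl run np-test --rm -it --restart=Never -n production \
  --image=busybox -- wget -qO- -T 2 http://backend:8080/
```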
### Calico Network Policies (Advanced)
```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-all-non-system-traffic
spec:
  order: 1000
  selector: projectcalico.org/namespace != "kube-system"
  types:
  - Ingress
  - Egress
---
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-app-traffic
  namespace: production
spec:
  order: 100
  selector: app == "web"
  types:
  - Ingress
  - Egress
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: app == "frontend"
    destination:
      ports:
      - 8080
  egress:
  - action: Allow
    protocol: TCP
    destination:
      selector: app == "database"
      ports:
      - 5432
```
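Calico's `projectcalico.org/v3` resources are applied with `calicoctl` (or through the Calico API server, if installed). A minimal usage sketch, assuming the policies above are saved as `calico-policies.yaml` (a hypothetical filename):

```bash
# Apply and inspect Calico-specific policies
calicoctl apply -f calico-policies.yaml
calicoctl get globalnetworkpolicy -o wide
calicoctl get networkpolicy -n production -o yaml
```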
## Secrets Management
### External Secrets Operator
```yaml
# SecretStore for AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
# ExternalSecret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: prod/database
      property: password
  - secretKey: api-key
    remoteRef:
      key: prod/api
      property: key
```
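After applying, confirm the operator actually synced before wiring the secret into workloads; missing IAM permissions are the usual failure mode. The ExternalSecret's status conditions carry the error detail:

```bash
# The READY/STATUS columns show whether the last sync succeeded
kubectl get externalsecret app-secrets -n production
# Conditions carry the error detail when a sync fails
kubectl describe externalsecret app-secrets -n production
# The synced Kubernetes Secret itself
kubectl get secret app-secrets -n production
```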
### Sealed Secrets
```bash
# Install the Sealed Secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Create a secret manifest locally (never applied to the cluster)
echo -n mypassword | kubectl create secret generic mysecret \
  --dry-run=client --from-file=password=/dev/stdin -o yaml > mysecret.yaml

# Seal the secret with the controller's public key
kubeseal -f mysecret.yaml -w mysealedsecret.yaml

# Apply the sealed secret; only the in-cluster controller can decrypt it
kubectl apply -f mysealedsecret.yaml
```
## Image Security
### Admission Controllers
```yaml
# OPA Gatekeeper constraint template
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredSecurityContext
      validation:
        openAPIV3Schema:
          type: object
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredsecuritycontext

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.securityContext.runAsNonRoot
        msg := "Container must run as non-root user"
      }

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.securityContext.readOnlyRootFilesystem
        msg := "Container must have read-only root filesystem"
      }
---
# Constraint
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredSecurityContext
metadata:
  name: must-have-security-context
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
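A server-side dry run is a cheap way to confirm the constraint fires without creating anything. Assuming Gatekeeper and the resources above are installed:

```bash
# Expect an admission denial: nginx sets no securityContext
kubectl run bad-pod --image=nginx -n production --dry-run=server

# Inspect violations Gatekeeper has recorded against existing resources
kubectl get k8srequiredsecuritycontext must-have-security-context -o yaml
```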
### Image Scanning with Trivy Operator
```yaml
# Install Trivy Operator (simplified; the Helm chart, which also sets up
# the ServiceAccount and RBAC the operator needs, is the usual install path)
apiVersion: v1
kind: Namespace
metadata:
  name: trivy-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trivy-operator
  namespace: trivy-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trivy-operator
  template:
    metadata:
      labels:
        app: trivy-operator
    spec:
      containers:
      - name: trivy-operator
        image: aquasec/trivy-operator:0.15.1
        env:
        - name: OPERATOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: OPERATOR_TARGET_NAMESPACES
          value: ""
        - name: TRIVY_SERVER_INSECURE
          value: "false"
```
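The operator watches workloads and writes its findings to CRDs, so once it has scanned, the reports can be pulled with plain kubectl:

```bash
# Vulnerability findings per workload image
kubectl get vulnerabilityreports -A -o wide

# Misconfiguration audits (security contexts, probes, etc.)
kubectl get configauditreports -A -o wide
```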
## Runtime Security
### Falco Rules for Kubernetes
```yaml
# Custom Falco rules for Kubernetes
- rule: Kubernetes Privilege Escalation
  desc: Detect privilege escalation in Kubernetes
  condition: >
    spawned_process and container and
    (proc.name in (su, sudo, doas) or
     proc.args contains "--privileged" or
     proc.args contains "--cap-add")
  output: >
    Privilege escalation detected (user=%user.name command=%proc.cmdline
    container=%container.name image=%container.image.repository)
  priority: CRITICAL
  tags: [kubernetes, privilege_escalation]

- rule: Kubernetes Secret Access
  desc: Detect access to mounted Kubernetes service account secrets
  condition: >
    open_read and container and
    fd.name contains "/var/run/secrets/kubernetes.io"
  output: >
    Kubernetes secret accessed (user=%user.name file=%fd.name
    container=%container.name image=%container.image.repository)
  priority: WARNING
  tags: [kubernetes, secrets]

# 10.96.0.1 is the default kubernetes Service ClusterIP; adjust for your cluster
- rule: Kubernetes API Server Access
  desc: Detect direct access to the Kubernetes API server
  condition: >
    outbound and container and
    fd.sip="10.96.0.1" and fd.sport=443
  output: >
    Direct API server access (user=%user.name container=%container.name
    image=%container.image.repository)
  priority: INFORMATIONAL
  tags: [kubernetes, api_access]
```
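Falco can validate a rules file without loading it into the running daemon, which catches syntax and priority mistakes early. Assuming the rules above are saved as `k8s-rules.yaml` (a hypothetical filename):

```bash
# Validate the rules file, then run Falco with it
falco -V k8s-rules.yaml
falco -r k8s-rules.yaml
```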
### Runtime Security Monitoring
```python
#!/usr/bin/env python3
"""Kubernetes Runtime Security Monitor"""
import json
from datetime import datetime

from kubernetes import client, config, watch


def monitor_pod_events():
    """Monitor pod events for security anomalies."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    w = watch.Watch()

    print("Starting Kubernetes security monitoring...")
    for event in w.stream(v1.list_pod_for_all_namespaces):
        event_type = event['type']
        pod = event['object']
        if event_type == 'ADDED':
            check_pod_security(pod)
        elif event_type == 'MODIFIED':
            check_pod_modifications(pod)


def check_pod_security(pod):
    """Check a pod for security violations."""
    violations = []

    # Pod-level security context: may the pod run as root?
    if pod.spec.security_context:
        if not pod.spec.security_context.run_as_non_root:
            violations.append("Pod may be running as root")

    for container in pod.spec.containers:
        # Privileged containers
        if container.security_context and container.security_context.privileged:
            violations.append(f"Container {container.name} is privileged")

        # Dangerous added capabilities
        if container.security_context and container.security_context.capabilities:
            dangerous_caps = ['SYS_ADMIN', 'NET_ADMIN', 'SYS_PTRACE']
            for cap in container.security_context.capabilities.add or []:
                if cap in dangerous_caps:
                    violations.append(
                        f"Container {container.name} has dangerous capability {cap}")

    # Host namespace sharing
    if pod.spec.host_network:
        violations.append("Pod is using host network")
    if pod.spec.host_pid:
        violations.append("Pod is using host PID namespace")

    if violations:
        alert = {
            'timestamp': datetime.now().isoformat(),
            'pod_name': pod.metadata.name,
            'namespace': pod.metadata.namespace,
            'violations': violations,
        }
        print(f"SECURITY ALERT: {json.dumps(alert, indent=2)}")


def check_pod_modifications(pod):
    """Check for suspicious pod modifications."""
    # This would typically compare against a baseline; for now, just log
    # modifications to pods containing a privileged container.
    # (Note: privileged is a container-level field, not a pod-level one.)
    if any(c.security_context and c.security_context.privileged
           for c in pod.spec.containers):
        print(f"Privileged pod modified: {pod.metadata.namespace}/{pod.metadata.name}")


if __name__ == "__main__":
    monitor_pod_events()
```
## Compliance and Benchmarks
### CIS Kubernetes Benchmark
```bash
#!/bin/bash
# CIS Kubernetes Benchmark automation script

echo "=== CIS Kubernetes Benchmark Check ==="

# 1.1.1 Ensure the API server pod specification file permissions are 644 or more restrictive
echo "Checking API server pod specification file permissions..."
stat -c %a /etc/kubernetes/manifests/kube-apiserver.yaml

# 1.1.2 Ensure the API server pod specification file ownership is root:root
echo "Checking API server pod specification file ownership..."
stat -c %U:%G /etc/kubernetes/manifests/kube-apiserver.yaml

# 1.2.1 Ensure the --anonymous-auth argument is set to false
echo "Checking anonymous authentication..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "anonymous-auth=[^[:space:]]*"

# 1.2.2 Ensure the --basic-auth-file argument is not set
echo "Checking basic authentication..."
ps -ef | grep kube-apiserver | grep -v grep | grep "basic-auth-file" \
  && echo "FAIL: Basic auth enabled" || echo "PASS: Basic auth disabled"

# 1.2.3 Ensure the --token-auth-file parameter is not set
echo "Checking token authentication..."
ps -ef | grep kube-apiserver | grep -v grep | grep "token-auth-file" \
  && echo "FAIL: Token auth enabled" || echo "PASS: Token auth disabled"

# 1.2.5 Ensure the --kubelet-https argument is set to true
echo "Checking kubelet HTTPS..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-https=[^[:space:]]*"

# 1.2.6 Ensure --kubelet-client-certificate and --kubelet-client-key are set
echo "Checking kubelet client certificates..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-client-certificate=[^[:space:]]*"
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-client-key=[^[:space:]]*"

# 1.2.7 Ensure the --kubelet-certificate-authority argument is set
echo "Checking kubelet certificate authority..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "kubelet-certificate-authority=[^[:space:]]*"

# 1.2.8 Ensure the --authorization-mode argument is not set to AlwaysAllow
echo "Checking authorization mode..."
ps -ef | grep kube-apiserver | grep -v grep | grep -o "authorization-mode=[^[:space:]]*"

echo "=== Benchmark check completed ==="
```
### Kube-bench Integration
```yaml
# Job to run kube-bench
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      # v1.24+ control planes use this taint instead of .../master
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: kube-bench
        image: aquasec/kube-bench:latest
        command: ["kube-bench"]
        args: ["--json"]
        volumeMounts:
        - name: var-lib-etcd
          mountPath: /var/lib/etcd
          readOnly: true
        - name: var-lib-kubelet
          mountPath: /var/lib/kubelet
          readOnly: true
        - name: etc-systemd
          mountPath: /etc/systemd
          readOnly: true
        - name: etc-kubernetes
          mountPath: /etc/kubernetes
          readOnly: true
        - name: usr-bin
          mountPath: /usr/local/mount-from-host/bin
          readOnly: true
      restartPolicy: Never
      volumes:
      - name: var-lib-etcd
        hostPath:
          path: "/var/lib/etcd"
      - name: var-lib-kubelet
        hostPath:
          path: "/var/lib/kubelet"
      - name: etc-systemd
        hostPath:
          path: "/etc/systemd"
      - name: etc-kubernetes
        hostPath:
          path: "/etc/kubernetes"
      - name: usr-bin
        hostPath:
          path: "/usr/bin"
```
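The Job writes its findings to stdout, so the report is retrieved from the pod logs once the run completes:

```bash
# Wait for the run to finish, then pull the JSON report
kubectl wait --for=condition=complete job/kube-bench --timeout=120s
kubectl logs job/kube-bench > kube-bench-report.json
```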
## Incident Response
### Kubernetes Forensics
```bash
#!/bin/bash
# Kubernetes incident response script
# Usage: ./k8s-forensics.sh <namespace> <pod-name>

NAMESPACE=$1
POD_NAME=$2
OUTPUT_DIR="/tmp/k8s-forensics-$(date +%Y%m%d-%H%M%S)"

if [ -z "$NAMESPACE" ] || [ -z "$POD_NAME" ]; then
    echo "Usage: $0 <namespace> <pod-name>" >&2
    exit 1
fi

mkdir -p "$OUTPUT_DIR"
echo "Collecting Kubernetes forensics data..."

# Collect pod information
kubectl describe pod "$POD_NAME" -n "$NAMESPACE" > "$OUTPUT_DIR/pod-describe.txt"
kubectl get pod "$POD_NAME" -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/pod-manifest.yaml"
kubectl logs "$POD_NAME" -n "$NAMESPACE" --previous > "$OUTPUT_DIR/pod-logs-previous.txt" 2>/dev/null
kubectl logs "$POD_NAME" -n "$NAMESPACE" > "$OUTPUT_DIR/pod-logs-current.txt"

# Collect events
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' > "$OUTPUT_DIR/namespace-events.txt"

# Collect network policies
kubectl get networkpolicies -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/network-policies.yaml"

# Collect RBAC information
kubectl get rolebindings -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/rolebindings.yaml"
kubectl get roles -n "$NAMESPACE" -o yaml > "$OUTPUT_DIR/roles.yaml"

# Collect secret names and types only -- never dump secret data into evidence
# (grep -v "data:" would still leak the base64 values on the following lines)
kubectl get secrets -n "$NAMESPACE" \
  -o custom-columns=NAME:.metadata.name,TYPE:.type,CREATED:.metadata.creationTimestamp \
  > "$OUTPUT_DIR/secrets-metadata.txt"

# Collect node information
NODE=$(kubectl get pod "$POD_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}')
kubectl describe node "$NODE" > "$OUTPUT_DIR/node-describe.txt"

# Collect cluster-wide information
kubectl get clusterroles -o yaml > "$OUTPUT_DIR/clusterroles.yaml"
kubectl get clusterrolebindings -o yaml > "$OUTPUT_DIR/clusterrolebindings.yaml"

echo "Forensics data collected in $OUTPUT_DIR"
```
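A typical invocation during triage, assuming a suspect pod named `suspicious-app` (a hypothetical name) in the `production` namespace:

```bash
./k8s-forensics.sh production suspicious-app
ls /tmp/k8s-forensics-*/
```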
## Conclusion
Kubernetes security requires a comprehensive approach covering:
- Cluster Hardening: Secure API server, etcd, and node configurations
- RBAC: Implement least privilege access controls
- Pod Security: Use security contexts and admission controllers
- Network Security: Implement network policies and segmentation
- Secrets Management: Use external secret management solutions
- Image Security: Scan images and use admission controllers
- Runtime Security: Monitor for anomalies and threats
- Compliance: Regular benchmarking and auditing
Security is not a one-time setup but an ongoing process. Stay updated with the latest Kubernetes security best practices and continuously monitor your clusters for threats.
Remember: Security is everyone's responsibility in a Kubernetes environment!