Skip to content

GoogleCloudPlatform/k8s-node-tools

Repository files navigation

Node Configuration in GKE

The below outlines a strategy for performing custom node configuration in Google Kubernetes Engine (GKE) using init containers. The goal is to apply node-level settings (like sysctl adjustments, software installations, or kernel parameter checks) before regular application workloads are scheduled onto those nodes. To achieve this isolation temporarily, we'll mark nodes as unshcedulable with taint until configuration is completed.

Supporting Tools for Node Configuration

A collection of tools, potentially packaged within a container image used by an init container, can help manage and configure various aspects of a Kubernetes node:

  • Kubelet: These tools can assist in managing and interacting with the Kubelet, the primary node agent in Kubernetes. This could include tools for:
    • Configuration management: Scripts or utilities to configure Kubelet settings.
    • Status monitoring: Tools to check the health and status of the Kubelet service.
    • Log analysis: Scripts to collect and analyze Kubelet logs for debugging.
  • Containerd: Tools related to containerd, the container runtime used by Kubelet to manage containers. This might include:
    • Container image management: Tools for pulling,cl inspecting, or managing container images on nodes.
    • Container lifecycle management: Utilities to interact with containerd for container creation, deletion, and status checks.
    • Snapshotter management: Tools to manage containerd snapshotter for efficient storage.
  • Sysctl: Tools for managing and configuring sysctl parameters on Kubernetes nodes. Sysctl allows you to modify kernel parameters at runtime. These tools could help with:
    • Parameter modification: Tools to apply specific sysctl settings to nodes, potentially for performance tuning or Node Specific node hardening required by your organization.
    • Configuration persistence: Mechanisms to ensure sysctl settings are applied consistently across node reboots.
  • Kernel: Tools for interacting with or gathering information about the node's kernel. This might include:
    • Kernel module management: Tools for loading, unloading, or checking the status of kernel modules.
    • Kernel parameter inspection: Utilities to examine kernel parameters beyond sysctl.
  • Software Installations: Tools to automate or simplify software installations on Kubernetes nodes. This could involve:
    • Package management: Scripts to install packages using package managers like apt or yum.
    • Binary deployments: Tools to deploy pre-compiled binaries to nodes.
    • Configuration management: Scripts to configure installed software.
  • Troubleshooting Tooling: A suite of tools to aid in diagnosing and resolving issues on Kubernetes nodes. This is likely to encompass:
    • Log collection and analysis: Tools to gather logs from various node components and facilitate analysis. to encompass:
    • Log collection and analysis: Tools to gather logs from various node components and facilitate analysis.

Part 1: Scheduling an Init Container for Node Modification

This example demonstrates how to use a DaemonSet with an init container to apply a sysctl configuration change to Kubernetes nodes upon pod startup. The DaemonSet will attempt to run on all available nodes.

Concept:

We deploy a DaemonSet ensuring one pod replica runs on each eligible node. This pod has an init container that runs first. The init container executes a command to modify a kernel parameter (sysctl) on the host node. Because modifying host kernel parameters requires high privileges, the init container runs in privileged mode and mounts the host's root filesystem. After the init container completes successfully, a minimal main container (pause) starts just to keep the pod running.

Example sysctl-init-daemonset.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sysctl-init-daemonset
  labels:
    app: sysctl-modifier
spec:
  selector:
    matchLabels:
      app: sysctl-modifier
  template:
    metadata:
      labels:
        app: sysctl-modifier
    spec:
      # Allow access to host PID namespace if needed for specific tools
      hostPID: true
      volumes:
      - name: host-root-fs
        hostPath:
          path: /
          type: Directory # Mount the node's root filesystem
      initContainers:
      - name: apply-sysctl-value
        image: gcr.io/gke-release/debian-base # Small image with shell tools
        # *** Requires privilege to modify host kernel settings ***
        securityContext:
          privileged: true
        volumeMounts:
        - name: host-root-fs
          mountPath: /host # Access the host filesystem at /host
          readOnly: false # Needs write access to modify sysctl typically
        command: ["/bin/sh", "-c"]
        args:
        - |
          echo "Attempting to set net.ipv4.ip_forward=1 on the host..."
          # Use chroot to execute the command in the host's root filesystem context
          if chroot /host sysctl -w net.ipv4.ip_forward=1; then
            echo "Successfully set net.ipv4.ip_forward."
          else
            echo "Failed to set net.ipv4.ip_forward." >&2
            exit 1 # Fail the init container if command fails
          fi
          # Add other setup commands here if needed
      containers:
      - name: pause-container
        # Minimal container just to keep the Pod running after init succeeds
        image: gcr.io/gke-release/pause:latest
  updateStrategy:
    type: RollingUpdate

Walkthrough:

  1. Save the YAML: Save the content above as sysctl-init-daemonset.yaml.
  2. Apply the DaemonSet:
    kubectl apply -f sysctl-init-daemonset.yaml
  3. Verify Pod Creation: Check that the DaemonSet pods are being created on your nodes:
    kubectl get pods -l app=sysctl-modifier -o wide
    • (Wait for pods to reach 'Running' state).*
  4. Check Init Container Logs: View the logs of the init container on one of the pods to see its output:
    # Get a pod name first from the command above
    POD_NAME=$(kubectl get pods -l app=sysctl-modifier -o jsonpath='{.items[0].metadata.name}')
    kubectl logs $POD_NAME -c apply-sysctl-value
    You should see the "Attempting..." and "Successfully set..." messages.
  5. Verify on Node (Optional): You can confirm the change by SSHing into a node where the pod ran and executing sysctl net.ipv4.ip_forward, or by running a privileged debug pod on that node.

Part 2: Taint -> Configure with Init Container -> Untaint Workflow

This section outlines a more advanced and automated method for node configuration. The workflow uses a single, intelligent DaemonSet that not only applies the configuration to tainted nodes but also automatically removes the taint from the node once its job is complete. This approach is ideal for streamlining configuration changes across a node pool without manual intervention to make the nodes schedulable again.

The process leverages the kubectl binary available on GKE nodes and requires RBAC permissions for the DaemonSet's ServiceAccount to modify its own node object.

Concept:

  1. Taint Node Pool: A taint is applied to a node pool to prevent regular workloads from being scheduled, effectively reserving it for configuration.
  2. Deploy Smart DaemonSet: A DaemonSet is deployed with three key characteristics:
    • Toleration: It has a toleration to allow it to run on the tainted nodes.
    • Configuration Logic: An init container runs privileged commands to configure the node (e.g., setting sysctl values).
    • Untainting Logic: After applying the configuration, the same container uses the node's kubectl tool to remove the taint from itself, making the node available for general use.
  3. Verification: Once the DaemonSet pod completes its init container, the node is both configured and fully schedulable.

Walkthrough:

  1. Taint the Node Pool:

    • Identify your target GKE node pool, cluster, and location (zone/region).

    • Apply a specific taint using gcloud. Let's use node.config.status/stage=configuring:NoSchedule.

      # Replace placeholders with your actual values
      GKE_CLUSTER="your-cluster-name"
      NODE_POOL="your-node-pool-name"
      GKE_ZONE="your-zone" # Or GKE_REGION="your-region"
      
      gcloud container node-pools update $NODE_POOL \
        --cluster=$GKE_CLUSTER \
        --node-taints=node.config.status/stage=configuring:NoSchedule \
        --zone=$GKE_ZONE # Or --region=$GKE_REGION
    • Verify the taint is applied to nodes in the pool:

      kubectl describe node <node-name-in-pool> | grep Taints

      You should see node.config/status=initializing:NoSchedule.

  2. Create and Deploy the Self-Untainting DaemonSet:

    • The following YAML creates all the necessary components: a ServiceAccount for permissions, a ClusterRole granting node-patching rights, a ClusterRoleBinding to link them, and the DaemonSet itself. Save the following content as auto-untaint-daemonset.yaml.
# WARNING: THIS MAKES YOUR NODES LESS SECURE. This DaemonSet runs as privileged.
# WARNING: This DaemonSet runs as privileged, which has significant
# security implications. Only use this on clusters where you have
# strict controls over what is deployed.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-config-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-patcher-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  # Permissions needed to read and remove a taint from the node.
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-config-binding
subjects:
- kind: ServiceAccount
  name: node-config-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: node-patcher-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: auto-untaint-daemonset
  labels:
    app: auto-untaint-configurator
spec:
  selector:
    matchLabels:
      app: auto-untaint-configurator
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: auto-untaint-configurator
    spec:
      serviceAccountName: node-config-sa
      hostPID: true
      # Toleration now matches the taint on your node.
      tolerations:
      - key: "node.config.status/stage"
        operator: "Equal"
        value: "configuring"           
        effect: "NoSchedule"
      volumes:
      - name: host-root-fs
        hostPath:
          path: /
      initContainers:
      - name: configure-and-untaint
        image: ubuntu:22.04 # Using a standard container image.
        securityContext:
          privileged: true # Required for chroot and sysctl.
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: host-root-fs
          mountPath: /host
        command: ["/bin/bash", "-c"]
        args:
        - |
          # Using explicit error checking for each critical command.
          
          # Define the configuration and taint details.
          SYSCTL_PARAM="vm.max_map_count"
          SYSCTL_VALUE="262144"
          TAINT_KEY="node.config.status/stage" 

          echo "Running configuration on node: ${NODE_NAME}"

          # 1. APPLY CONFIGURATION
          echo "--> Applying ${SYSCTL_PARAM}=${SYSCTL_VALUE}..."
          if ! chroot /host sysctl -w "${SYSCTL_PARAM}=${SYSCTL_VALUE}"; then
            echo "ERROR: Failed to apply sysctl parameter." >&2
            exit 1
          fi
          echo "--> Configuration applied successfully."

          # 2. UNTAINT THE NODE
          # This command removes the taint from the node this pod is running on.
          echo "--> Untainting node ${NODE_NAME} by removing taint ${TAINT_KEY}..."
          if ! /host/home/kubernetes/bin/kubectl taint node "${NODE_NAME}" "${TAINT_KEY}:NoSchedule-"; then
            echo "ERROR: Failed to untaint the node." >&2
            exit 1
          fi
          echo "--> Node has been untainted and is now schedulable."
      # The main container is minimal; it just keeps the pod running.
      containers:
      - name: pause-container
        image: registry.k8s.io/pause:3.9
  • Apply the DaemonSet manifest.
    kubectl apply -f auto-untaint-daemonset.yaml
  1. Validate the DaemonSet:
  • Verify the pods are running on the tainted nodes. You should see the pod in a Running state after the init container completes.

    kubectl get pods -l app=auto-untaint-configurator -o wide
  • Check the logs to confirm execution. View the initContainer logs to ensure the script ran and untainted the node successfully.

    # Get a pod name from the command above
    POD_NAME=$(kubectl get pods -l app=auto-untaint-configurator -o jsonpath='{.items[0].metadata.name}')
    
    # Check the logs for that pod's init container
    kubectl logs $POD_NAME -c configure-and-untaint

    The output will confirm that the sysctl command ran and the node was untainted.


Part 3: Privileged DaemonSet Tradeoffs and Security Restrictions

Using securityContext: privileged: true in a DaemonSet (or any pod) is powerful but comes with significant security implications. It essentially disables most container isolation boundaries for that pod.

The Tradeoff:

  • Benefit: Grants the container capabilities necessary for deep host system interactions, such as:
    • Modifying kernel parameters (sysctl).
    • Loading/unloading kernel modules (modprobe).
    • Accessing host devices (/dev/*).
    • Modifying protected host filesystems.
    • Full network stack manipulation (beyond standard Kubernetes networking).
    • Running tools that require raw socket access or specific hardware interactions.
  • Cost: Massively increased security risk and potential for node/cluster instability.

Security Restrictions and Risks:

  • Container Escape/Host Compromise: A vulnerability within the privileged container's application or image can directly lead to root access on the host node. The attacker bypasses standard container defenses.
  • Violation of Least Privilege: Privileged mode grants all capabilities, likely far more than needed for a specific task. This broad access increases the potential damage if the container is compromised.
  • Node Destabilization: Accidental or malicious commands run within the privileged container (e.g., incorrect sysctl values, rm -rf /host/boot) can crash or corrupt the host node operating system.
  • Lateral Movement: Compromising one node via a privileged DaemonSet gives an attacker a strong foothold to attack other nodes, the Kubernetes control plane, or connected systems.
  • Data Exposure: Unrestricted access to the host filesystem (/) can expose sensitive data stored on the node, including credentials, keys, or data belonging to other pods (if accessible via host paths).
  • Increased Attack Surface: Exposes more of the host kernel's system calls and features to potential exploits from within the container.

Best Practices / Mitigations:

  • Avoid If Possible: The most secure approach is to avoid privileged: true entirely.
  • Use Linux Capabilities: If elevated rights are needed, grant specific Linux capabilities (e.g., NET_ADMIN, SYS_ADMIN, SYS_MODULE) in the securityContext.capabilities.add field instead of full privilege. This follows the principle of least privilege.
  • Limit Scope: Run privileged DaemonSets only on dedicated, possibly tainted, node pools to contain the potential blast radius.
  • Policy Enforcement: Use GKE Policy Controller (or OPA Gatekeeper) to create policies that restrict, audit, or require justification for deploying privileged containers.
  • Image Scanning & Trust: Use GKE Binary Authorization and rigorous image scanning to ensure only vetted, trusted container images are run with privilege.
  • Minimize Host Mounts: Only mount the specific host paths needed, and use readOnly: true whenever possible. Avoid mounting the entire root filesystem (/) unless absolutely necessary.
  • Regular Audits: Periodically review all workloads running with privileged: true.

About

K8S Node Tools

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages