The following outlines a strategy for performing custom node configuration in Google Kubernetes Engine (GKE) using init containers. The goal is to apply node-level settings (such as sysctl adjustments, software installations, or kernel parameter checks) before regular application workloads are scheduled onto those nodes. To achieve this isolation temporarily, we mark nodes as unschedulable with a taint until configuration is complete.
A collection of tools, potentially packaged within a container image used by an init container, can help manage and configure various aspects of a Kubernetes node:
- Kubelet: Tools for managing and interacting with the kubelet, the primary node agent in Kubernetes. This could include:
  - Configuration management: Scripts or utilities to configure kubelet settings.
  - Status monitoring: Tools to check the health and status of the kubelet service.
  - Log analysis: Scripts to collect and analyze kubelet logs for debugging.
- Containerd: Tools related to containerd, the container runtime used by the kubelet to manage containers. This might include:
  - Container image management: Tools for pulling, inspecting, or managing container images on nodes.
  - Container lifecycle management: Utilities to interact with containerd for container creation, deletion, and status checks.
  - Snapshotter management: Tools to manage containerd snapshotters for efficient storage.
- Sysctl: Tools for managing and configuring sysctl parameters on Kubernetes nodes. Sysctl allows you to modify kernel parameters at runtime. These tools could help with:
  - Parameter modification: Tools to apply specific sysctl settings to nodes, for example for performance tuning or node hardening required by your organization.
  - Configuration persistence: Mechanisms to ensure sysctl settings are applied consistently across node reboots.
- Kernel: Tools for interacting with or gathering information about the node's kernel. This might include:
  - Kernel module management: Tools for loading, unloading, or checking the status of kernel modules.
  - Kernel parameter inspection: Utilities to examine kernel parameters beyond sysctl.
- Software Installations: Tools to automate or simplify software installations on Kubernetes nodes. This could involve:
  - Package management: Scripts to install packages using package managers like `apt` or `yum`.
  - Binary deployments: Tools to deploy pre-compiled binaries to nodes.
  - Configuration management: Scripts to configure installed software.
- Troubleshooting Tooling: A suite of tools to aid in diagnosing and resolving issues on Kubernetes nodes. This is likely to encompass:
  - Log collection and analysis: Tools to gather logs from various node components and facilitate analysis.
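As an illustration of the configuration-persistence point above, sysctl settings are conventionally persisted on a Linux node with a drop-in file under `/etc/sysctl.d/`, which `systemd-sysctl` reapplies at boot. The filename and values below are illustrative, not part of any GKE default:

```
# /etc/sysctl.d/99-custom-node-config.conf (illustrative)
# Files in /etc/sysctl.d/ are applied at boot by systemd-sysctl,
# so these settings survive node reboots.
net.ipv4.ip_forward = 1
vm.max_map_count = 262144
```

Note that on GKE, node root filesystems are largely immutable and nodes can be recreated at any time, which is why the DaemonSet approach below reapplies settings on startup rather than relying solely on files written to the node.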
This example demonstrates how to use a DaemonSet with an init container to apply a sysctl configuration change to Kubernetes nodes upon pod startup. The DaemonSet will attempt to run on all available nodes.
We deploy a DaemonSet ensuring one pod replica runs on each eligible node. This pod has an init container that runs first. The init container executes a command to modify a kernel parameter (sysctl) on the host node. Because modifying host kernel parameters requires high privileges, the init container runs in privileged mode and mounts the host's root filesystem. After the init container completes successfully, a minimal main container (pause) starts just to keep the pod running.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sysctl-init-daemonset
  labels:
    app: sysctl-modifier
spec:
  selector:
    matchLabels:
      app: sysctl-modifier
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: sysctl-modifier
    spec:
      # Allow access to the host PID namespace if needed for specific tools
      hostPID: true
      volumes:
      - name: host-root-fs
        hostPath:
          path: /           # Mount the node's root filesystem
          type: Directory
      initContainers:
      - name: apply-sysctl-value
        image: gcr.io/gke-release/debian-base  # Small image with shell tools
        # *** Requires privilege to modify host kernel settings ***
        securityContext:
          privileged: true
        volumeMounts:
        - name: host-root-fs
          mountPath: /host  # Access the host filesystem at /host
          readOnly: false   # Typically needs write access to modify sysctl
        command: ["/bin/sh", "-c"]
        args:
        - |
          echo "Attempting to set net.ipv4.ip_forward=1 on the host..."
          # Use chroot to execute the command in the host's root filesystem context
          if chroot /host sysctl -w net.ipv4.ip_forward=1; then
            echo "Successfully set net.ipv4.ip_forward."
          else
            echo "Failed to set net.ipv4.ip_forward." >&2
            exit 1  # Fail the init container if the command fails
          fi
          # Add other setup commands here if needed
      containers:
      - name: pause-container
        # Minimal container just to keep the Pod running after init succeeds
        image: gcr.io/gke-release/pause:latest
```

- Save the YAML: Save the content above as `sysctl-init-daemonset.yaml`.
- Apply the DaemonSet:

  ```shell
  kubectl apply -f sysctl-init-daemonset.yaml
  ```
- Verify Pod Creation: Check that the DaemonSet pods are being created on your nodes, and wait for them to reach the `Running` state:

  ```shell
  kubectl get pods -l app=sysctl-modifier -o wide
  ```

- Check Init Container Logs: View the logs of the init container on one of the pods. You should see the "Attempting..." and "Successfully set..." messages:

  ```shell
  # Get a pod name first from the command above
  POD_NAME=$(kubectl get pods -l app=sysctl-modifier -o jsonpath='{.items[0].metadata.name}')
  kubectl logs $POD_NAME -c apply-sysctl-value
  ```

- Verify on Node (Optional): Confirm the change by SSHing into a node where the pod ran and executing `sysctl net.ipv4.ip_forward`, or by running a privileged debug pod on that node.
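The on-node check can also be done without the `sysctl` binary, since sysctl parameters are exposed as files under `/proc/sys`. A minimal sketch, intended to be run on the node itself (over SSH or from a privileged debug pod):

```shell
# net.ipv4.ip_forward maps to the file /proc/sys/net/ipv4/ip_forward.
# A value of 1 indicates the init container's change took effect.
cat /proc/sys/net/ipv4/ip_forward
```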
This section outlines a more advanced and automated method for node configuration. The workflow uses a single, intelligent DaemonSet that not only applies the configuration to tainted nodes but also automatically removes the taint from the node once its job is complete. This approach is ideal for streamlining configuration changes across a node pool without manual intervention to make the nodes schedulable again.
The process leverages the kubectl binary available on GKE nodes and requires RBAC permissions for the DaemonSet's ServiceAccount to modify its own node object.
- Taint Node Pool: A taint is applied to a node pool to prevent regular workloads from being scheduled, effectively reserving it for configuration.
- Deploy Smart DaemonSet: A DaemonSet is deployed with three key characteristics:
  - Toleration: It has a `toleration` that allows it to run on the tainted nodes.
  - Configuration Logic: An init container runs privileged commands to configure the node (e.g., setting `sysctl` values).
  - Untainting Logic: After applying the configuration, the same container uses the node's `kubectl` binary to remove the taint from itself, making the node available for general use.
- Verification: Once the DaemonSet pod completes its init container, the node is both configured and fully schedulable.
- Taint the Node Pool:
  - Identify your target GKE node pool, cluster, and location (zone/region).
  - Apply a specific taint using `gcloud`. Let's use `node.config.status/stage=configuring:NoSchedule`:

    ```shell
    # Replace placeholders with your actual values
    GKE_CLUSTER="your-cluster-name"
    NODE_POOL="your-node-pool-name"
    GKE_ZONE="your-zone"  # Or GKE_REGION="your-region"
    gcloud container node-pools update $NODE_POOL \
      --cluster=$GKE_CLUSTER \
      --node-taints=node.config.status/stage=configuring:NoSchedule \
      --zone=$GKE_ZONE  # Or --region=$GKE_REGION
    ```

  - Verify the taint is applied to nodes in the pool:

    ```shell
    kubectl describe node <node-name-in-pool> | grep Taints
    ```

    You should see `node.config.status/stage=configuring:NoSchedule`.

- Create and Deploy the Self-Untainting DaemonSet:
  - The following YAML creates all the necessary components: a ServiceAccount for permissions, a ClusterRole granting node-patching rights, a ClusterRoleBinding to link them, and the DaemonSet itself. Save the following content as `auto-untaint-daemonset.yaml`.
```yaml
# WARNING: This DaemonSet runs as privileged, which has significant
# security implications and makes your nodes less secure. Only use this
# on clusters where you have strict controls over what is deployed.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-config-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-patcher-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  # Permissions needed to read and remove a taint from the node.
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-config-binding
subjects:
- kind: ServiceAccount
  name: node-config-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: node-patcher-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: auto-untaint-daemonset
  labels:
    app: auto-untaint-configurator
spec:
  selector:
    matchLabels:
      app: auto-untaint-configurator
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: auto-untaint-configurator
    spec:
      serviceAccountName: node-config-sa
      hostPID: true
      # Toleration matches the taint applied to your node pool.
      tolerations:
      - key: "node.config.status/stage"
        operator: "Equal"
        value: "configuring"
        effect: "NoSchedule"
      volumes:
      - name: host-root-fs
        hostPath:
          path: /
      initContainers:
      - name: configure-and-untaint
        image: ubuntu:22.04  # Using a standard container image.
        securityContext:
          privileged: true   # Required for chroot and sysctl.
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: host-root-fs
          mountPath: /host
        command: ["/bin/bash", "-c"]
        args:
        - |
          # Using explicit error checking for each critical command.
          # Define the configuration and taint details.
          SYSCTL_PARAM="vm.max_map_count"
          SYSCTL_VALUE="262144"
          TAINT_KEY="node.config.status/stage"

          echo "Running configuration on node: ${NODE_NAME}"

          # 1. APPLY CONFIGURATION
          echo "--> Applying ${SYSCTL_PARAM}=${SYSCTL_VALUE}..."
          if ! chroot /host sysctl -w "${SYSCTL_PARAM}=${SYSCTL_VALUE}"; then
            echo "ERROR: Failed to apply sysctl parameter." >&2
            exit 1
          fi
          echo "--> Configuration applied successfully."

          # 2. UNTAINT THE NODE
          # This command removes the taint from the node this pod is running on.
          echo "--> Untainting node ${NODE_NAME} by removing taint ${TAINT_KEY}..."
          if ! /host/home/kubernetes/bin/kubectl taint node "${NODE_NAME}" "${TAINT_KEY}:NoSchedule-"; then
            echo "ERROR: Failed to untaint the node." >&2
            exit 1
          fi
          echo "--> Node has been untainted and is now schedulable."
      # The main container is minimal; it just keeps the pod running.
      containers:
      - name: pause-container
        image: registry.k8s.io/pause:3.9
```
- Apply the `DaemonSet` manifest:

  ```shell
  kubectl apply -f auto-untaint-daemonset.yaml
  ```

- Validate the DaemonSet:
  - Verify the pods are running on the tainted nodes. You should see the pod in a `Running` state after the init container completes:

    ```shell
    kubectl get pods -l app=auto-untaint-configurator -o wide
    ```

  - Check the logs to confirm execution. View the `initContainer` logs to ensure the script ran and untainted the node successfully:

    ```shell
    # Get a pod name from the command above
    POD_NAME=$(kubectl get pods -l app=auto-untaint-configurator -o jsonpath='{.items[0].metadata.name}')
    # Check the logs for that pod's init container
    kubectl logs $POD_NAME -c configure-and-untaint
    ```

    The output will confirm that the `sysctl` command ran and the node was untainted.
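The init script above applies the sysctl value unconditionally each time a pod starts. A minimal sketch of an idempotent variant, which reads the current value directly from `/proc/sys` before writing (parameter and value are the example's `vm.max_map_count=262144`; the write itself is commented out since it requires root on the host):

```shell
#!/bin/sh
# Sketch: skip the sysctl write when the kernel already has the desired value.
# Dots in the sysctl name map to slashes under /proc/sys.
PARAM_PATH="/proc/sys/vm/max_map_count"
WANT="262144"

HAVE=$(cat "$PARAM_PATH")
if [ "$HAVE" = "$WANT" ]; then
  echo "already set ($HAVE), skipping write"
else
  echo "current value is $HAVE, would apply $WANT"
  # chroot /host sysctl -w vm.max_map_count=$WANT  # requires root on the node
fi
```

This keeps repeated DaemonSet restarts (e.g., after pod eviction) from logging spurious "applied" messages and makes the script safe to re-run.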
Using `privileged: true` in a pod's `securityContext` (in a DaemonSet or any other workload) is powerful but comes with significant security implications. It essentially disables most container isolation boundaries for that pod.
- Benefit: Grants the container capabilities necessary for deep host system interactions, such as:
  - Modifying kernel parameters (`sysctl`).
  - Loading/unloading kernel modules (`modprobe`).
  - Accessing host devices (`/dev/*`).
  - Modifying protected host filesystems.
  - Full network stack manipulation (beyond standard Kubernetes networking).
  - Running tools that require raw socket access or specific hardware interactions.
- Cost: Massively increased security risk and potential for node/cluster instability.
  - Container escape/host compromise: A vulnerability within the privileged container's application or image can directly lead to root access on the host node. The attacker bypasses standard container defenses.
  - Violation of least privilege: Privileged mode grants all capabilities, likely far more than needed for a specific task. This broad access increases the potential damage if the container is compromised.
  - Node destabilization: Accidental or malicious commands run within the privileged container (e.g., incorrect `sysctl` values, `rm -rf /host/boot`) can crash or corrupt the host node's operating system.
  - Lateral movement: Compromising one node via a privileged DaemonSet gives an attacker a strong foothold to attack other nodes, the Kubernetes control plane, or connected systems.
  - Data exposure: Unrestricted access to the host filesystem (`/`) can expose sensitive data stored on the node, including credentials, keys, or data belonging to other pods (if accessible via host paths).
  - Increased attack surface: Exposes more of the host kernel's system calls and features to potential exploits from within the container.
- Avoid If Possible: The most secure approach is to avoid `privileged: true` entirely.
- Use Linux Capabilities: If elevated rights are needed, grant specific Linux capabilities (e.g., `NET_ADMIN`, `SYS_ADMIN`, `SYS_MODULE`) in the `securityContext.capabilities.add` field instead of full privilege. This follows the principle of least privilege.
- Limit Scope: Run privileged DaemonSets only on dedicated, possibly tainted, node pools to contain the potential blast radius.
- Policy Enforcement: Use GKE Policy Controller (or OPA Gatekeeper) to create policies that restrict, audit, or require justification for deploying privileged containers.
- Image Scanning & Trust: Use GKE Binary Authorization and rigorous image scanning to ensure only vetted, trusted container images are run with privilege.
- Minimize Host Mounts: Only mount the specific host paths needed, and use `readOnly: true` whenever possible. Avoid mounting the entire root filesystem (`/`) unless absolutely necessary.
- Regular Audits: Periodically review all workloads running with `privileged: true`.
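As a sketch of the capabilities-and-narrow-mounts alternative, the fragment below shows a container granting one specific capability and mounting only a single host directory read-only, instead of `privileged: true` with `/` mounted read-write. The capability chosen and the paths are illustrative; the right set depends on what your tooling actually needs:

```yaml
# Illustrative pod-spec fragment (not a complete manifest).
containers:
- name: node-tool
  image: ubuntu:22.04
  securityContext:
    capabilities:
      add: ["NET_ADMIN"]  # only the specific capability the tool requires
  volumeMounts:
  - name: host-sysctl-conf
    mountPath: /host/etc/sysctl.d
    readOnly: true        # read-only unless a write is genuinely needed
volumes:
- name: host-sysctl-conf
  hostPath:
    path: /etc/sysctl.d
```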