Why GPU workloads need a custom Pod Security Admission baseline

The Pod Security Admission controller rejects privileged pods in any namespace that enforces the baseline or restricted profile, but the NVIDIA device plugin, as deployed by the GPU Operator, requires privileged access to the host's GPU devices.

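Pod Security Admission (PSA) is a built-in Kubernetes admission controller that enforces the Pod Security Standards at the namespace level. It replaces the older PodSecurityPolicy resource, which was removed in Kubernetes 1.25, and is configured by labeling namespaces with pod-security.kubernetes.io/enforce set to privileged, baseline, or restricted. The privileged profile is unrestricted. The baseline profile already disallows hostNetwork, hostPID, hostPath volumes, and privileged: true containers. The restricted profile is the strictest: on top of the baseline rules it requires containers to run as non-root, drop all capabilities, set allowPrivilegeEscalation: false, and use a seccomp profile. Upstream Kubernetes treats unlabeled namespaces as privileged, but many hardened clusters make restricted the default for new namespaces, and that default conflicts with infrastructure components that require elevated kernel access.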
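
A cluster-wide default of this kind is usually set in the API server's admission configuration rather than on each namespace. The following is a minimal sketch, assuming a self-managed control plane where the file can be passed to kube-apiserver through --admission-control-config-file; managed offerings may not expose this setting, and the chosen levels are illustrative only.

```yaml
# Sketch of a cluster-wide PSA default (assumes Kubernetes v1.25+ and access
# to the kube-apiserver flags; the levels below are illustrative choices).
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    # Applied to any namespace that carries no pod-security labels of its own.
    defaults:
      enforce: "restricted"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    # Left empty in this sketch. Namespaces that run privileged system pods
    # then need an explicit privileged label of their own.
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: []
```

With a default like this in place, a namespace label always takes precedence over the default, which is what makes a per-namespace exception for infrastructure possible.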

The NVIDIA GPU Operator manages the lifecycle of GPU drivers and device plugins on Kubernetes. Its device-plugin component runs as a DaemonSet that must access host device files like /dev/nvidia0 and /dev/nvidiactl. To do this, the pod specification must request privileged: true or specific capabilities that exceed the baseline profile. Without this access, the container cannot enumerate the GPU hardware, the node never advertises nvidia.com/gpu resources, and GPU workloads cannot be scheduled onto it. The conflict arises when a cluster enforces restricted globally, causing the infrastructure pods to fail admission while compliant user workloads are admitted.

This tension is not a configuration error but a design constraint. The solution requires isolating the privilege requirements of infrastructure from the security posture of user workloads.

The mechanism of enforcement

Pod Security Admission evaluates incoming pod specifications against the profile defined on the target namespace. It is a validating admission plugin, so it runs in the validating phase of the admission chain, after any mutating webhooks have modified the pod. If the pod violates the enforce profile, the API server rejects the request with a 403 Forbidden error. Custom webhooks cannot override this rejection: admission is conjunctive, so a request is denied as soon as any admission controller denies it, and a webhook that allows the pod does not cancel a denial from PSA.

The enforcement profile is set via namespace labels. A namespace labeled pod-security.kubernetes.io/enforce: restricted will reject any pod requesting privileged: true. A namespace labeled pod-security.kubernetes.io/enforce: privileged will allow it. The NVIDIA GPU Operator installs its components into a specific namespace, typically gpu-operator. This namespace must be labeled to permit the device plugin’s security requirements, while user namespaces remain restricted.

The following YAML block shows the namespace configuration required to allow the GPU Operator to function without disabling security for the entire cluster.

apiVersion: v1
kind: Namespace
metadata:
  name: gpu-operator
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted

The enforce label permits the device plugin to start. The warn and audit labels block nothing: any pod created in this namespace that violates the restricted profile, including the operator's own pods and a privileged pod that a user deploys here by accident, produces a client-side warning and an annotation on the audit log entry. This maintains visibility into what runs with elevated privileges. The separation allows the infrastructure to operate with necessary privileges while keeping the user-facing surface area secure.
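
For contrast, a user namespace carries the strict profile in all three modes. This is a minimal sketch; the namespace name team-training is a hypothetical placeholder, and pinning the version to latest is one possible choice rather than a recommendation from the operator's documentation.

```yaml
# Sketch of a user-facing namespace (the name is hypothetical).
apiVersion: v1
kind: Namespace
metadata:
  name: team-training
  labels:
    # Reject any pod that violates the restricted profile.
    pod-security.kubernetes.io/enforce: restricted
    # Evaluate against the profile definition of the running Kubernetes
    # version; a specific version such as v1.29 can be pinned instead.
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```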

The device plugin pod itself requests the necessary privileges. A simplified excerpt from the nvidia-device-plugin DaemonSet shows the security context.

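spec:
  containers:
  - name: nvidia-device-plugin
    image: nvcr.io/nvidia/k8s-device-plugin:v0.14.0
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]
      # privileged: true implies privilege escalation, so it cannot be combined
      # with allowPrivilegeEscalation: false; the API server rejects that pair.
      privileged: true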

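The privileged: true field is the setting that fails PSA's privileged-container check, and the added SYS_ADMIN capability would fail the baseline capabilities check on its own. If the namespace enforces baseline or restricted, the API server rejects this pod creation request. If the namespace enforces privileged, the pod is admitted and persisted, and the scheduler and kubelet take over from there.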

Failure modes

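A pod rejected by PSA is never created, so it does not show up as Pending, CreateContainerConfigError, or FailedScheduling; those states belong to pods that have already passed admission. The most common symptom of a PSA conflict is a DaemonSet that reports desired pods but has none running. The rejection is recorded as a FailedCreate event on the DaemonSet, and the message explicitly cites PodSecurity. Running kubectl describe daemonset <daemonset-name> -n gpu-operator reveals the rejection reason in the Events section, in a form similar to the following. The real message lists every violated check and is abbreviated here to the first one.

Type     Reason        Age   From                  Message
----     ------        ----  ----                  -------
Warning  FailedCreate  2m    daemonset-controller  Error creating: pods "<pod-name>" is forbidden: violates PodSecurity "restricted:latest": privileged (container "nvidia-device-plugin" must not set securityContext.privileged=true)

The phrase “violates PodSecurity” identifies the policy violation and names the profile, its version, and the offending field. The kubectl get events -n gpu-operator command aggregates these failures. Without this visibility, operators often misdiagnose the issue as a node resource shortage or a driver mismatch.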

A second failure mode occurs when the enforce label is tightened after the operator is installed. If an administrator changes the namespace label from privileged to restricted, running pods are not terminated, because PSA evaluates pods only at admission. Every pod the DaemonSet controller tries to create afterwards is rejected, whether it belongs to a rollout, a newly joined node, or a replacement for a deleted pod. This creates a silent drift where the DaemonSet cannot update to a new version, leaving the cluster with stale drivers. The kubectl rollout status daemonset nvidia-device-plugin -n gpu-operator command will wait indefinitely, unless a --timeout is given, if the new pods are rejected by PSA.

The audit and warn modes help anticipate this. Setting pod-security.kubernetes.io/audit: restricted on the gpu-operator namespace annotates violations in the audit log even while enforce: privileged is active. This allows operators to see which pods would be rejected under a stricter label before anyone applies it, and to detect when a new version of the GPU Operator introduces additional violations.

The scope of the exception

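The namespace label strategy isolates privilege to the infrastructure layer. It is not a cluster-wide exemption. A user deploying a training workload into a restricted namespace cannot request privileged: true, even with cluster-admin permissions, unless the cluster's PSA configuration explicitly exempts that user. The PSA controller checks the namespace label, not the user’s RBAC role. A cluster-admin can still relabel the namespace, so permission to modify namespace labels needs to be controlled as well.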

This distinction matters for multi-tenant clusters. In a shared environment, the gpu-operator namespace is owned by the platform team. User namespaces are owned by data science teams. The platform team labels the infrastructure namespace privileged. The data science teams label their namespaces restricted. This ensures that GPU drivers are loaded securely without exposing the user workloads to host-level access.
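
A user workload does not need any of the device plugin's privileges to consume a GPU; it only requests the extended resource that the plugin advertises. The following is a minimal sketch of a pod intended to pass the restricted profile. The pod name, namespace, and image are hypothetical placeholders, and the sketch assumes the image can run as UID 1000 and that the GPU device files on the node are accessible to a non-root user.

```yaml
# Sketch of a restricted-compliant GPU workload (names and image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job
  namespace: team-training        # assumed to be labeled enforce: restricted
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true            # required by the restricted profile
    runAsUser: 1000               # assumes the image works as this UID
    seccompProfile:
      type: RuntimeDefault        # required by the restricted profile
  containers:
  - name: trainer
    image: registry.example.com/team/trainer:1.0
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    resources:
      limits:
        nvidia.com/gpu: 1         # extended resource advertised by the device plugin
```

The pod contains no hostPath volumes, host namespaces, or added capabilities, so the privilege stays confined to the gpu-operator namespace.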

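Some operators attempt to use custom admission webhooks to grant exceptions. This approach does not work for PSA enforcement. A validating webhook can only add denials: if PSA rejects the pod, the request fails no matter what the webhook returns. A mutating webhook does run before PSA, but it can only change the pod, and stripping privileged: true from the device plugin would break it. There is no mechanism to configure a webhook to override a restricted profile enforcement. Exceptions have to be expressed in PSA’s own terms: the namespace label, or, on clusters where the API server configuration is accessible, the exemptions list (usernames, runtime classes, namespaces) in the PodSecurity admission configuration.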

Decision frame

The question the next time a GPU device plugin fails to start is not “does the webhook allow this.” It is “does the namespace’s enforce label permit the pod’s security context.” Pod Security Admission is a built-in controller whose denial no custom webhook can overturn. If the gpu-operator namespace is labeled enforce: restricted, the device plugin will never start, regardless of admission webhooks. The operator should verify the namespace labels using kubectl get namespace gpu-operator --show-labels before investigating driver logs. The tradeoff is isolation versus complexity: one privileged namespace for infrastructure is safer than a global enforce: privileged policy.