Fix GCP GKE Node Image Errors
When you run GPU workloads on GKE (Google Kubernetes Engine), the wrong node image can leave the nodes without a usable NVIDIA driver. This guide explains a common node image mistake and shows the fix.
A Common Mistake
A common mistake is creating a GPU node pool with the default node image (Container-Optimized OS with containerd) when the workload requires a specific driver version that is available only on Ubuntu. GPU workloads scheduled on that pool then fail.
The incorrect command:
gcloud container node-pools create gpu-pool --cluster=my-cluster --zone=us-central1-a --accelerator=type=nvidia-tesla-t4,count=1 --image-type=COS_CONTAINERD
What happens:
The GPU node pool is created with COS, but the required driver does not load. Running nvidia-smi on a node reports:
nvidia-smi
NVIDIA-SMI has failed because it could not communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.
COS supports a narrower set of GPU driver versions than Ubuntu for some GPU types, so a workload pinned to a specific version may not find it on COS.
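Before recreating a pool, it can help to confirm which image it is actually running. The following is a minimal sketch, not an official procedure: pool_image_type is a hypothetical helper name, and the GCLOUD variable is an assumption added here so the command can be previewed (GCLOUD=echo) without touching a real cluster. The pool, cluster, and zone names reuse the ones from the command above.

```shell
#!/bin/sh
# GCLOUD defaults to the real CLI; set GCLOUD=echo to preview the command only.
GCLOUD="${GCLOUD:-gcloud}"

# Hypothetical helper: print the image type of an existing node pool.
# Arguments: pool name, cluster name, zone.
pool_image_type() {
  "$GCLOUD" container node-pools describe "$1" \
    --cluster="$2" \
    --zone="$3" \
    --format="value(config.imageType)"
}

# Example call (commented out so that sourcing this file has no side effects):
# pool_image_type gpu-pool my-cluster us-central1-a
```

If the helper prints COS_CONTAINERD for a pool that needs an Ubuntu-only driver, that pool is a candidate for the fix described in the next section.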
The Correct Approach
Create the GPU node pool with the Ubuntu image by setting --image-type=UBUNTU_CONTAINERD:
gcloud container node-pools create gpu-pool --cluster=my-cluster --zone=us-central1-a --accelerator=type=nvidia-tesla-t4,count=1 --image-type=UBUNTU_CONTAINERD
Result:
The GPU node pool is created with Ubuntu, and the driver loads. Running nvidia-smi on a node reports:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
+-----------------------------------------------------------------------------+
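Because the original problem is a workload that needs a specific driver version, it may be worth checking the reported version rather than only checking that nvidia-smi runs. The sketch below parses the banner text shown above. The function name driver_version and the REQUIRED value are assumptions chosen for illustration, and the sample line is copied from the output in this guide rather than read from a live node.

```shell
#!/bin/sh
# Extract the driver version from nvidia-smi banner text read on stdin.
# Prints nothing if no "Driver Version:" field is present.
driver_version() {
  sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# REQUIRED is a hypothetical value; set it to the version your workload needs.
REQUIRED="525.85.12"

# Sample banner line taken from the output above; on a node, pipe nvidia-smi instead.
sample='| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |'
found=$(printf '%s\n' "$sample" | driver_version)

if [ "$found" = "$REQUIRED" ]; then
  echo "driver OK: $found"
else
  echo "driver mismatch: found '$found', need '$REQUIRED'"
fi
```

On a real node the same function can be fed live output with nvidia-smi | driver_version. nvidia-smi also offers --query-gpu=driver_version --format=csv,noheader, which avoids parsing the banner at all.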
How to Prevent This
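Choose the image type that matches the workload before creating the node pool:
COS_CONTAINERD is the default and the most hardened option. It is the recommended choice for workloads that do not depend on a particular GPU driver version.
UBUNTU_CONTAINERD offers wider driver support and is the better fit for GPU workloads that require specific drivers.
WINDOWS_LTSC_CONTAINERD is for Windows workloads.
COS carries no additional image cost. Ubuntu license costs may apply for some use cases, so check current pricing before switching.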
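The selection rule above can be captured in a small script so that the image type is chosen deliberately instead of left to the default. This is a sketch under stated assumptions: the workload labels (general, gpu-custom-driver, windows) and both function names are invented for this example, and WINDOWS_LTSC_CONTAINERD is used as the containerd-based Windows image type. The script only prints the gcloud command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: map a workload kind to a GKE node image type.
# The workload labels are made up for this example.
pick_image_type() {
  case "$1" in
    gpu-custom-driver) echo "UBUNTU_CONTAINERD" ;;
    windows)           echo "WINDOWS_LTSC_CONTAINERD" ;;
    *)                 echo "COS_CONTAINERD" ;;  # default for everything else
  esac
}

# Print (do not run) the node pool command for review.
# Arguments: pool name, workload kind. Cluster and zone reuse this guide's values.
print_create_command() {
  pool="$1"
  kind="$2"
  echo "gcloud container node-pools create $pool --cluster=my-cluster --zone=us-central1-a --image-type=$(pick_image_type "$kind")"
}

print_create_command gpu-pool gpu-custom-driver
```

For a GPU pool, the --accelerator flag from the commands earlier in this guide still has to be added to the printed command. Keeping the mapping in one function makes the choice easy to review when node pools are created from scripts.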
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Secure your cloud with DodaTech.