Unable to access nvidia v100 gpu inside pod container #11520
-
(Screenshot: node description showing allocatable GPU in capacity.)
-
Please read the k3s docs on use of the NVIDIA container runtime. Your pod spec does not specify use of the nvidia runtime class.
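For reference, the relevant piece from the k3s docs is a RuntimeClass object that maps the name `nvidia` to the nvidia containerd handler; a minimal sketch, assuming the NVIDIA container runtime is already installed and detected on the GPU node:

```yaml
# Sketch of the RuntimeClass described in the k3s NVIDIA runtime docs;
# the handler name assumes containerd on the node has the "nvidia" runtime configured.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```

Pods then opt in with `runtimeClassName: nvidia` in their spec, as in the example in the next reply.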
-
Haha, I remember this being a giant pain in the ass. I'll save you (and anyone else) some trouble.

nvidia-smi.yaml:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"
  runtimeClassName: nvidia
  restartPolicy: OnFailure
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.1.0-base-ubuntu22.04
      command: ['sh', '-c', "nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"
```

This will print the generic nvidia-smi output to the pod's logs. If you see the GPU show up in that pod, it's usually a success. I have it pinned to a specific node that has the GPU installed, so it should succeed for every pod that lands there (all of them -- unless something goes very, very wrong).

FWIW -- the above response is in relation to the original question.
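To run the check (assuming kubectl access to the cluster), apply the manifest and read the pod logs: `kubectl apply -f nvidia-smi.yaml`, then `kubectl logs nvidia-smi` once the pod has completed. If the runtime class and drivers are wired up correctly, the log contains the usual nvidia-smi table with the V100 listed.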
-
I have created a multi-node k3s cluster.
The control plane node runs Ubuntu 20.04 and has no GPU.
The worker node is an NVIDIA DGX with V100 GPUs.
I deployed the NVIDIA GPU Operator, which includes the NVIDIA device plugin DaemonSet.
I created a pod using the .yaml below.
After logging in to the pod's shell I cannot access the GPU; nvidia-smi reports "command not found".
On the worker node everything works fine: the drivers are installed correctly and nvidia-smi works.
Please help with this issue.
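A hypothetical sketch (illustrative only, not the original manifest) of the kind of pod spec that produces this symptom -- it requests nvidia.com/gpu but omits runtimeClassName, which is the gap the replies above point at:

```yaml
# Hypothetical example for illustration only (not the manifest from the question):
# the pod requests a GPU resource but does not set runtimeClassName, so the
# container typically runs under the default runtime and the NVIDIA driver
# binaries (including nvidia-smi) are never mounted into it.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.1.0-base-ubuntu22.04
      command: ['sleep', 'infinity']
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Adding `runtimeClassName: nvidia` at the spec level, as in the working example above, is the fix the replies describe.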