Description
What happened:
worker:
  replicas: 1
  vgpu:
    enabled: true  # <- added this line to enable HAMi vGPU
  resources:
    requests:
      cpu: "2"
      memory: "8Gi"
    limits:
      nvidia.com/gpu: "4"
      nvidia.com/gpucores: 10          # use 10% of each GPU's cores
      nvidia.com/gpumem-percentage: 5  # use 5% of each GPU's memory
  ports:
    - containerPort: 30001
      protocol: TCP
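For reference, the same HAMi vGPU request expressed on a minimal standalone Pod (one way to check whether the problem is in the scheduler itself rather than in the cluster spec above) would look roughly like the sketch below; the pod name and image are placeholders, not taken from the actual deployment:

apiVersion: v1
kind: Pod
metadata:
  name: vgpu-test                        # hypothetical name, only for isolating the scheduling issue
spec:
  containers:
    - name: worker
      image: xprobe/xinference:latest    # placeholder image
      resources:
        limits:
          nvidia.com/gpu: "4"                # same whole-vGPU count as the worker above
          nvidia.com/gpucores: "10"          # 10% of GPU cores per vGPU
          nvidia.com/gpumem-percentage: "5"  # 5% of GPU memory per vGPU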
Command:
  xinference-worker
Args:
  -e
  http://service-supervisor:9997
  --host
  $(POD_IP)
  --worker-port
  30001
  --log-level
  debug
Limits:
  nvidia.com/gpu:               4
  nvidia.com/gpucores:          10
  nvidia.com/gpumem-percentage: 5
Requests:
  cpu:                          2
  memory:                       8Gi
  nvidia.com/gpu:               4
  nvidia.com/gpucores:          10
  nvidia.com/gpumem-percentage: 5
Environment:
  POD_IP:                           (v1:status.podIP)
  XINFERENCE_PROCESS_START_METHOD:  <set to the key 'XINFERENCE_PROCESS_START_METHOD' of config map 'xinferencecluster-distributed-config'>  Optional: false
  XINFERENCE_MODEL_SRC:             <set to the key 'XINFERENCE_MODEL_SRC' of config map 'xinferencecluster-distributed-config'>  Optional: false
  XINFERENCE_HOME:                  <set to the key 'XINFERENCE_HOME' of config map 'xinferencecluster-distributed-config'>  Optional: false
Mounts:
  /dev/shm from shm (rw)
  /opt/xinference from xinference-home (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-flwq4 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  xinference-home:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  xinferencecluster-distributed-home-pvc
    ReadOnly:   false
  shm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  128Gi
  kube-api-access-flwq4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/arch=amd64
Tolerations:     node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                 nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason            Age  From            Message
  Warning  FailedScheduling  67s  hami-scheduler  0/4 nodes are available: 1 NodeUnfitPod, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 2 node unregistered. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for incoming pod..
  Warning  FilteringFailed   67s  hami-scheduler  no available node, 3 nodes do not meet
With nvidia.com/gpu: "4" the pod fails to schedule; whenever more than two GPUs are requested, this error is reported (a reduced request is sketched below).
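For comparison, a limits block that stays at or below two GPUs (the other vGPU values unchanged) does not trigger the error, per the observation above. This is only a sketch of the reduced request, not a fix:

        limits:
          nvidia.com/gpu: "2"              # at or below two GPUs schedules, per this report
          nvidia.com/gpucores: 10          # unchanged
          nvidia.com/gpumem-percentage: 5  # unchanged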
What you expected to happen: vGPU is enabled and allocated normally.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
- The output of nvidia-smi -a on your host
- Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
- Any relevant kernel output lines from dmesg
Environment:
- HAMi version:
- nvidia driver or other AI device driver version:
- Docker version from docker version
- Docker command, image and tag used
- Kernel version from uname -a
- Others: