calling nvidia-smi in udev rule is a gap to the driver from Nvidia
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
HWE Next | Fix Released | Undecided | Unassigned | |
NVIDIA Drivers Ubuntu | Fix Released | Undecided | Alberto Milone | |
OEM Priority Project | Fix Released | Critical | Cyrus Lien | |
Bug Description
There is a timing issue (LP: #1812784) caused by nvidia-driver-430 relying on a udev rule in 71-nvidia.rules that calls nvidia-smi to create the device nodes "nvidia0" and "nvidiactl".
This behavior differs from the .run driver released by Nvidia, which relies on nvidia-modprobe (a setuid root utility that nvidia-installer installs by default). The issue does not occur on a system that has the same version of the .run driver installed directly.
According to Nvidia, they do not expect the nodes to be created by udev rules, so the solution should be either to follow Nvidia's approach of calling nvidia-modprobe, or to find some other way to close this gap without calling nvidia-smi.
# Target:
- remove nvidia-smi from 71-nvidia.rules, because it is a workaround and it affects some platforms (LP: #1812784).
- either follow Nvidia's approach of calling nvidia-modprobe, or find some other way to close this gap without calling nvidia-smi
# Concern:
- nvidia-modprobe is a setuid root utility; is that a security concern?
# Machine environment:
- on an Ice Lake machine (BIOS ID: 097B)
- 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d11] (rev a1)
- kernel: 5.0.0-1016-oem-osp1
- nvidia driver: nvidia-driver-430
# caseA : install nvidia-driver-430:
1. install nvidia-driver-430 on machine.
2. remove nvidia-smi from 71-nvidia.rules (same as https:/
3. reboot machine to text mode.
4. $ systemctl start gdm <= "nvidia0" and "nvidiactl" will not be created under /dev
5. login fails with an Xorg failure
"gdm-x-
# caseB : install .run from source code of nvidia-driver-430:
1. execute NVIDIA-
2. reboot machine to text mode.
3. $ systemctl start gdm <= "nvidia0" and "nvidiactl" will be created automatically.
4. login works well.
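Whether the "systemctl start gdm" step in either case left the nodes in place can be checked with a short script. This is a minimal sketch and not part of either driver package: the helper name `describe_node` is hypothetical, the 195/255 pair for /dev/nvidiactl is taken from the `makedev(195, 255)` value in the strace attached to this report, and 195/0 for /dev/nvidia0 is an assumption (first GPU).

```python
import os
import stat

# Expected (major, minor) pairs. 195/255 for nvidiactl comes from the
# strace in this report; 195/0 for nvidia0 is an assumption (first GPU).
EXPECTED_NODES = {
    "/dev/nvidiactl": (195, 255),
    "/dev/nvidia0": (195, 0),
}


def describe_node(path):
    """Return 'missing', 'not-chardev', or a (major, minor) tuple for path."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISCHR(st.st_mode):
        return "not-chardev"
    return (os.major(st.st_rdev), os.minor(st.st_rdev))


if __name__ == "__main__":
    for node, expected in EXPECTED_NODES.items():
        found = describe_node(node)
        state = "ok" if found == expected else "unexpected"
        print(f"{node}: found={found} expected={expected} -> {state}")
```

Under these assumptions, caseA would be expected to report both nodes as "missing" and caseB to report them as "ok".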
# Analysis:
In caseB, the strace of gnome-shell, which is started by the gdm systemd service, shows that it was using libnvidia-
In caseA, the strace of gnome-shell shows that it follows the same path as caseB, but nvidia-modprobe is not present, so /dev/nvidiactl is never created.
On caseA, after copying nvidia-modprobe to /usr/bin and making it setuid root, it works as well as caseB.
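The difference between the two cases comes down to whether a setuid root nvidia-modprobe is present in /usr/bin. A minimal sketch of that check follows; the function name `setuid_root_status` is hypothetical, and the script only inspects file metadata without running the utility.

```python
import os
import stat


def setuid_root_status(path):
    """Report whether path exists, has the setuid bit, and is owned by root."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"exists": False, "setuid": False, "root_owned": False}
    return {
        "exists": True,
        "setuid": bool(st.st_mode & stat.S_ISUID),
        "root_owned": st.st_uid == 0,
    }


if __name__ == "__main__":
    # /usr/bin/nvidia-modprobe is the location used in caseA of this report.
    print(setuid_root_status("/usr/bin/nvidia-modprobe"))
```

On a caseA system before the manual copy, this would be expected to report `exists` as False; after the copy and setuid step, all three fields should be True.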
related issues:
https:/
- on Precision-7730 NVIDIA Corporation GP104GLM [Quadro P5200 Mobile][10de:1bb5]
https:/
https:/
https:/
https:/
https:/
https:/
nvidia-modprobe was rejected due to security concerns: https:/
Changed in oem-priority: | |
importance: | Undecided → Critical |
assignee: | nobody → Alex Tu (alextu) |
description: | updated |
Changed in nvidia-drivers-ubuntu: | |
assignee: | nobody → Alberto Milone (albertomilone) |
Changed in oem-priority: | |
status: | New → Triaged |
tags: | added: nvidia-smi |
description: | updated |
tags: | added: oem-priority originate-from-1831013 somerville |
Changed in hwe-next: | |
status: | New → Triaged |
Changed in oem-priority: | |
assignee: | Alex Tu (alextu) → Leon Liao (lihow731) |
Changed in oem-priority: | |
assignee: | Leon Liao (lihow731) → Cyrus Lien (cyruslien) |
Changed in oem-priority: | |
status: | Triaged → Fix Released |
Changed in hwe-next: | |
status: | Triaged → Fix Released |
the strace of gnome-shell of the successful caseB.
check line 137927
mknod("/dev/nvidiactl", S_IFCHR|0666, makedev(195, 255)) = -1 EACCES (Permission denied)
then it calls /usr/bin/nvidia-modprobe to create it.
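To find such failed mknod attempts in a large strace log without scrolling to a specific line, the log can be filtered with a small parser. This is a rough sketch that assumes the log uses the same line format as the mknod call quoted in this comment; the names `parse_mknod` and `failed_nvidia_mknods` are hypothetical.

```python
import re

# Matches strace lines such as:
#   mknod("/dev/nvidiactl", S_IFCHR|0666, makedev(195, 255)) = -1 EACCES (Permission denied)
# Device numbers may be decimal or 0x-prefixed hex, depending on strace version.
MKNOD_RE = re.compile(
    r'mknod\("(?P<path>[^"]+)",\s*(?P<mode>[^,]+),\s*'
    r'makedev\((?P<major>0x[0-9a-fA-F]+|\d+),\s*(?P<minor>0x[0-9a-fA-F]+|\d+)\)\)'
    r'\s*=\s*(?P<ret>-?\d+)(?:\s+(?P<errno>[A-Z]+))?'
)


def parse_mknod(line):
    """Parse one strace line; return a dict for mknod calls, else None."""
    m = MKNOD_RE.search(line)
    if m is None:
        return None
    return {
        "path": m["path"],
        "mode": m["mode"].strip(),
        "major": int(m["major"], 0),
        "minor": int(m["minor"], 0),
        "ret": int(m["ret"]),
        "errno": m["errno"],
    }


def failed_nvidia_mknods(lines):
    """Yield parsed mknod calls on /dev/nvidia* that returned an error."""
    for line in lines:
        call = parse_mknod(line)
        if call and call["path"].startswith("/dev/nvidia") and call["ret"] < 0:
            yield call
```

Passing an open log file to `failed_nvidia_mknods` yields one dict per failed attempt, so the EACCES on /dev/nvidiactl shows up with its major and minor numbers.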