GPU device disable/enable test failure
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-azure (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
We found a failure in the GPU device disable/enable test, and it is related to the call trace below. The call trace appears at the device disable step, when the GPU device is being disabled.
The system does not panic, but the driver is not loaded again afterwards.
%echo 1 > /sys/bus/
Note: after this command, the PCI bus is not removed; only the ‘remove’ file disappears, and the call trace below is logged. All other PCI devices are removed successfully.
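For readers who want to reproduce a disable/enable cycle, here is a minimal sketch. The command in this report is truncated, so the sketch assumes the standard Linux sysfs PCI interface (a per-device `remove` node and the bus-level `rescan` node) and may not match the exact path the test used. The function names, the `SYSFS_PCI` variable and the PCI address passed as an argument are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: SYSFS_PCI and the function names are illustrative assumptions,
# not taken from the report. Writing to these nodes requires root.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Print the path of the sysfs 'remove' node for a PCI address (domain:bus:dev.fn).
pci_remove_node() {
    printf '%s/devices/%s/remove\n' "$SYSFS_PCI" "$1"
}

# Disable step: write 1 to the device's 'remove' node.
# Returns non-zero if the node does not exist.
pci_disable() {
    node=$(pci_remove_node "$1")
    if [ ! -e "$node" ]; then
        echo "no remove node: $node" >&2
        return 1
    fi
    echo 1 > "$node"
}

# Enable step: ask the kernel to rescan the PCI bus.
pci_enable() {
    echo 1 > "$SYSFS_PCI/rescan"
}

# Print the name of the driver bound to the device, or "none" if unbound.
pci_driver_name() {
    link="$SYSFS_PCI/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

After sourcing the functions, a cycle would look like `pci_disable <address>; pci_enable; pci_driver_name <address>`. A result of `none` after the rescan would correspond to the symptom described here, where the driver does not come back.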
<Call trace>
[ 56.649648] hv_balloon: Max. dynamic memory size: 57344 MB
[ 457.438303] NVRM: Attempting to remove minor device 0 with non-zero usage count!
[ 457.438305] ------------[ cut here ]------------
[ 457.438465] WARNING: CPU: 4 PID: 5026 at /var/lib/
[ 457.438466] Modules linked in: xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_security bpfilter nvidia_uvm(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nls_iso8859_1 drm drm_panel_
[ 457.438493] CPU: 4 PID: 5026 Comm: bash Tainted: P OE 5.0.0-1025-azure #27~18.04.1-Ubuntu
[ 457.438494] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 457.438564] RIP: 0010:nvidia_
[ 457.438565] Code: ff e8 17 c5 9a f3 41 8b 95 68 04 00 00 48 c7 c6 f8 97 8e c1 bf 04 00 00 00 e8 cf 9c 00 00 48 c7 c7 b0 82 8e c1 e8 b6 8b a1 f3 <0f> 0b e8 cc a2 00 00 eb f9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[ 457.438566] RSP: 0018:ffffb1578b
[ 457.438567] RAX: 0000000000000024 RBX: ffff8ec43bdf0000 RCX: 0000000000000006
[ 457.438568] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff8ec445d15580
[ 457.438568] RBP: ffffb1578bcfbd40 R08: 0000000000000001 R09: 000000000000023c
[ 457.438569] R10: ffffb1578bcfba38 R11: 0000000000000000 R12: ffff8ec43d3b2000
[ 457.438569] R13: ffff8ec4388b3000 R14: ffffffffc19411b0 R15: 0000000000000060
[ 457.438570] FS: 00007f92d726374
[ 457.438573] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 457.438573] CR2: 0000560d3d973f60 CR3: 0000000e4aeca004 CR4: 00000000001606e0
[ 457.438574] Call Trace:
[ 457.438579] pci_device_
[ 457.438582] device_
[ 457.438583] device_
[ 457.438585] pci_stop_
[ 457.438586] pci_stop_
[ 457.438588] remove_
[ 457.438590] dev_attr_
[ 457.438592] sysfs_kf_
[ 457.438593] kernfs_
[ 457.438596] __vfs_write+
[ 457.438598] vfs_write+
[ 457.438599] ksys_write+
[ 457.438601] __x64_sys_
[ 457.438603] do_syscall_
[ 457.438607] entry_SYSCALL_
[ 457.438608] RIP: 0033:0x7f92d6947154
[ 457.438609] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[ 457.438610] RSP: 002b:00007ffe5a
[ 457.438611] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f92d6947154
[ 457.438612] RDX: 0000000000000002 RSI: 0000560d3d7bd8c0 RDI: 0000000000000001
[ 457.438612] RBP: 0000560d3d7bd8c0 R08: 000000000000000a R09: 0000000000000001
[ 457.438613] R10: 000000000000000a R11: 0000000000000246 R12: 00007f92d6c23760
[ 457.438613] R13: 0000000000000002 R14: 00007f92d6c1f2a0 R15: 00007f92d6c1e760
[ 457.438615] ---[ end trace 64ddc7a9a2dd8bd8 ]---
Kernel: 5.0.0-1025-azure
This issue occurs on Ubuntu 18.04 but not on 16.04.