Bionic/linux-azure: Call trace on Ubuntu 18.04 VM with Standard NV24
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-azure (Ubuntu) | Invalid | Medium | Unassigned |
Bionic | Invalid | Undecided | Unassigned |
Focal | Fix Released | Undecided | Tim Gardner |
linux-azure-5.4 (Ubuntu) | New | Undecided | Unassigned |
Bionic | Fix Released | Medium | Tim Gardner |
Focal | Invalid | Undecided | Unassigned |
Bug Description
SRU Justification
[Impact]
During large-scale deployment testing, we found the call trace below when provisioning an Ubuntu 18.04 VM with size Standard_NV24. An engineer deployed the instance 10 times and encountered the trace once.
It looks like a race condition during device probing, but all devices are eventually probed successfully.
[ 4.938162] sysfs: cannot create duplicate filename '/devices/
[ 4.944816] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
[ 4.951818] CPU: 0 PID: 135 Comm: kworker/0:2 Not tainted 5.4.0-1061-azure #64~18.04.1-Ubuntu
[ 4.951820] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 4.958943] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 4.955812] Workqueue: hv_pri_chan vmbus_add_
[ 4.955812] Call Trace:
[ 4.955812] dump_stack+
[ 4.955812] sysfs_warn_
[ 4.955812] sysfs_add_
[ 4.955812] sysfs_create_
[ 4.955812] pci_create_
[ 4.955812] pci_bus_
[ 4.955812] pci_bus_
[ 4.955812] hv_pci_
[ 4.955812] vmbus_probe+
[ 4.955812] really_
[ 4.955812] driver_
[ 4.955812] __device_
[ 4.955812] ? driver_
[ 4.955812] bus_for_
[ 4.955812] __device_
[ 4.955812] device_
[ 4.955812] bus_probe_
[ 4.955812] device_
[ 4.955812] device_
[ 4.955812] vmbus_device_
[ 4.955812] vmbus_add_
[ 4.955812] process_
[ 4.955812] worker_
[ 4.955812] kthread+0x121/0x140
[ 4.955812] ? process_
[ 4.955812] ? kthread_
[ 4.955812] ret_from_
[ 5.043612] hv_pci 47505500-
[ 5.260563] hv_pci 47505500-
Dexuan did some research and it looks like this is a longstanding race condition bug in the generic PCI subsystem (due to the timing, there can be more than 1 place where the PCI code tries to create the same ‘config’ sysfs file):
https:/
The bug was reported on 7/16/2020, and the last reply was on 6/25/2021. It looks like this has not been fixed after 1+ year…
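As a rough user-space analogy only (this is not the kernel code path), the sketch below shows why two callers that both try to create the same named entry produce exactly one success and one "duplicate" failure. The file name `config` mirrors the sysfs file mentioned above; `create_exclusive` and `run_race` are hypothetical helper names, and an exclusive file create stands in for the sysfs creation step.

```python
import os
import tempfile
import threading


def create_exclusive(path, results, barrier):
    """Try to create `path`; record whether this caller won or hit a duplicate."""
    barrier.wait()  # line the callers up so they attempt creation together
    try:
        # O_EXCL makes creation atomic: only one caller can succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        results.append("created")
    except FileExistsError:
        # Loosely analogous to "cannot create duplicate filename".
        results.append("duplicate")


def run_race(n_creators=2):
    """Have several threads create the same 'config' file and return the outcomes."""
    results = []
    barrier = threading.Barrier(n_creators)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "config")
        threads = [
            threading.Thread(target=create_exclusive, args=(path, results, barrier))
            for _ in range(n_creators)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_race())
```

Whichever thread arrives second sees the entry already present. In the report, the equivalent failure only produces a warning and the devices are still probed, which matches the loser here simply recording a duplicate.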
Business Impact
[Test Case]
Repeated deployment on a Standard_NV24 instance. Microsoft reported a reproduction rate of 3/551 before the patch and 0/838 with the patch.
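A minimal sketch of how the test case could be scored, assuming each deployment's kernel log is available as text and that deployments are independent trials. The helper names are hypothetical; the signature string is taken from the first line of the trace above. With the reported figures, the chance of seeing 0 hits in 838 runs if the pre-patch rate still applied comes out at roughly 1%.

```python
SIGNATURE = "sysfs: cannot create duplicate filename"


def has_duplicate_sysfs_warning(dmesg_text):
    """Return True if any kernel log line contains the duplicate-filename warning."""
    return any(SIGNATURE in line for line in dmesg_text.splitlines())


def reproduction_rate(hits, runs):
    """Fraction of deployments in which the warning appeared."""
    return hits / runs


def prob_zero_hits(rate, runs):
    """Probability of no hits in `runs` independent deployments at a given rate."""
    return (1.0 - rate) ** runs


if __name__ == "__main__":
    before = reproduction_rate(3, 551)   # reported pre-patch figures
    after = reproduction_rate(0, 838)    # reported post-patch figures
    chance = prob_zero_hits(before, 838)
    print(f"before: {before:.4%}, after: {after:.4%}")
    print(f"P(0/838 by chance at the pre-patch rate): {chance:.4f}")
```

The independence assumption is a simplification: the race depends on timing, so the true rate may vary between hosts and deployments.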
[Where things could go wrong]
Deployments could fail for other reasons.
[Other info]
SF: #00321027
affects: | linux (Ubuntu) → linux-azure (Ubuntu) |
Changed in linux-azure (Ubuntu): | |
assignee: | nobody → Tim Gardner (timg-tpi) |
importance: | Undecided → Medium |
Changed in linux-azure-5.4 (Ubuntu Focal): | |
status: | New → Invalid |
Changed in linux-azure-5.4 (Ubuntu Bionic): | |
status: | New → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Tim Gardner (timg-tpi) |
Changed in linux-azure (Ubuntu Bionic): | |
status: | New → Invalid |
Changed in linux-azure (Ubuntu): | |
assignee: | Tim Gardner (timg-tpi) → nobody |
status: | New → Invalid |
Changed in linux-azure (Ubuntu Focal): | |
status: | New → In Progress |
assignee: | nobody → Tim Gardner (timg-tpi) |
Changed in linux-azure (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Changed in linux-azure-5.4 (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
This bug is awaiting verification that the linux-azure/5.4.0-1065.68 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!