2018-09-12 08:42:05 |
Andy Whitcroft |
bug |
|
|
added bug |
2018-09-12 08:42:13 |
Andy Whitcroft |
linux (Ubuntu): status |
New |
Confirmed |
|
2018-09-12 08:42:20 |
Andy Whitcroft |
linux (Ubuntu): importance |
Undecided |
Critical |
|
2018-09-12 08:42:23 |
Andy Whitcroft |
linux (Ubuntu): assignee |
|
Andy Whitcroft (apw) |
|
2018-09-12 08:42:28 |
Andy Whitcroft |
nominated for series |
|
Ubuntu Bionic |
|
2018-09-12 08:42:28 |
Andy Whitcroft |
bug task added |
|
linux (Ubuntu Bionic) |
|
2018-09-12 08:42:41 |
Andy Whitcroft |
linux (Ubuntu Bionic): importance |
Undecided |
Critical |
|
2018-09-12 08:42:43 |
Andy Whitcroft |
linux (Ubuntu Bionic): assignee |
|
Andy Whitcroft (apw) |
|
2018-09-12 08:42:49 |
Andy Whitcroft |
linux (Ubuntu Bionic): status |
New |
In Progress |
|
2018-09-12 09:04:38 |
Andy Whitcroft |
description |
We are seeing deadlocks during hotplug of devices under vfio.
As per the Linux kernel source code, there is a deadlock between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug events. This issue can be avoided either by skipping the PCIe reset functionality or by calling device_unlock() in vfio_pci_remove() before calling vfio_del_group_dev().
Code flow on PCIe hotplug event:
Execution flow 1:
device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )
Execution flow 2 triggered by above step "send event request to user":
vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
DEADLOCK here, since the PCI device lock is already held by the device remove code path in dd.c |
[Impact]
Attempts to hotplug devices shared with userspace (qemu) via vfio trigger a deadlock in the kernel. A reboot is required to recover.
[Test Case]
Set up a KVM instance with attached devices, then attempt to hotplug them using ipmitool.
[Regression Potential]
The change is to an uncommonly used driver. There are common code changes, but these are a no-op in the normal case, and basic operation should be easy to confirm.
[Other Info]
This fix has been verified by the reporter as fixing the deadlock.
===
We are seeing deadlocks during hotplug of devices under vfio.
As per the Linux kernel source code, there is a deadlock between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug events. This issue can be avoided either by skipping the PCIe reset functionality or by calling device_unlock() in vfio_pci_remove() before calling vfio_del_group_dev().
Code flow on PCIe hotplug event:
Execution flow 1:
device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )
Execution flow 2 triggered by above step "send event request to user":
vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
DEADLOCK here, since the PCI device lock is already held by the device remove code path in dd.c |
|
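The lock ordering described in the two execution flows can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `simulate_hotplug`, `device_lock`, `request_sent`, and `released` are hypothetical names standing in for the remove path, device_lock(dev), the event request sent to the user, and the device release. The release path uses a timed acquire so the demonstration reports the would-be deadlock instead of hanging, which the real blocking lock in the kernel does not do.

```python
import threading


def simulate_hotplug(unlock_before_wait: bool, timeout: float = 0.5) -> bool:
    """Return True if the remove path completes, False if it would deadlock.

    Hypothetical model: flow 1 (remove) takes the device lock and waits for
    the device to be released; flow 2 (release) needs the same lock.
    """
    device_lock = threading.Lock()    # stands in for device_lock(dev)
    request_sent = threading.Event()  # "send event request to user"
    released = threading.Event()      # device released by the user

    def release_path() -> None:
        # Flow 2: runs once the user has been asked to release the device.
        request_sent.wait()
        # The bus reset needs the device lock. A timed acquire is used here
        # (an assumption of this sketch) so the model cannot hang forever.
        if device_lock.acquire(timeout=timeout):
            device_lock.release()
            released.set()

    worker = threading.Thread(target=release_path)
    worker.start()

    # Flow 1: the driver core takes the device lock before calling remove.
    device_lock.acquire()
    if unlock_before_wait:
        # Proposed avoidance: drop the lock before waiting for the release.
        device_lock.release()
    request_sent.set()
    completed = released.wait(timeout=2 * timeout)
    if not unlock_before_wait:
        device_lock.release()
    worker.join()
    return completed


if __name__ == "__main__":
    print("lock held while waiting:", simulate_hotplug(False, timeout=0.1))
    print("lock dropped before waiting:", simulate_hotplug(True, timeout=0.1))
```

With the lock held across the wait, the release path can never take it and the remove path never sees the release; dropping the lock first lets both paths finish.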
2018-09-12 09:11:25 |
Andy Whitcroft |
summary |
vfio_pci_release hotplug deadlock |
device hotplug of vfio devices can lead to deadlock in vfio_pci_release |
|
2018-09-12 09:22:14 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2018-09-12 12:11:41 |
Seth Forshee |
linux (Ubuntu): status |
Confirmed |
Fix Committed |
|
2018-09-14 17:02:56 |
Brad Figg |
tags |
|
verification-needed-bionic |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2017-5715 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-14633 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-15572 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-15594 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-17182 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-3639 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-6554 |
|
2018-10-01 17:15:35 |
Launchpad Janitor |
cve linked |
|
2018-6555 |
|
2018-10-03 14:26:54 |
Joseph Salisbury |
linux (Ubuntu): status |
Fix Committed |
Fix Released |
|
2019-02-14 15:52:48 |
Andy Whitcroft |
tags |
verification-needed-bionic |
kernel-fixup-verification-needed-bionic |
|
2019-02-14 16:17:22 |
Andy Whitcroft |
tags |
kernel-fixup-verification-needed-bionic |
kernel-fixup-verification-needed-bionic verification-done-bionic |
|
2019-07-24 20:53:56 |
Brad Figg |
tags |
kernel-fixup-verification-needed-bionic verification-done-bionic |
cscc kernel-fixup-verification-needed-bionic verification-done-bionic |
|