Activity log for bug #1792099

Date Who What changed Old value New value Message
2018-09-12 08:42:05 Andy Whitcroft bug added bug
2018-09-12 08:42:13 Andy Whitcroft linux (Ubuntu): status New Confirmed
2018-09-12 08:42:20 Andy Whitcroft linux (Ubuntu): importance Undecided Critical
2018-09-12 08:42:23 Andy Whitcroft linux (Ubuntu): assignee Andy Whitcroft (apw)
2018-09-12 08:42:28 Andy Whitcroft nominated for series Ubuntu Bionic
2018-09-12 08:42:28 Andy Whitcroft bug task added linux (Ubuntu Bionic)
2018-09-12 08:42:41 Andy Whitcroft linux (Ubuntu Bionic): importance Undecided Critical
2018-09-12 08:42:43 Andy Whitcroft linux (Ubuntu Bionic): assignee Andy Whitcroft (apw)
2018-09-12 08:42:49 Andy Whitcroft linux (Ubuntu Bionic): status New In Progress
2018-09-12 09:04:38 Andy Whitcroft description
Old value:
We are seeing deadlocks during hotplug of devices under vfio. As per the Linux kernel source code, there is a deadlock situation between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug events. This issue can be avoided either by skipping the PCIe reset functionality or by calling device_unlock() in vfio_pci_remove() before calling the function vfio_del_group_dev().
Code flow on PCIe hotplug event:
Execution flow 1:
  device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
    device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
    device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
    vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
      vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
        send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )
Execution flow 2, triggered by the above step "send event request to user":
  vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
    vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
      vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
        pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
          pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
            pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
              DEADLOCK here, since PCI_DEVICE_LOCK is held by the PCI_DEVICE remove code path in dd.c
New value:
[Impact]
Attempts to hotplug devices shared to userspace (qemu) via vfio trigger a deadlock in the kernel. A reboot is required to resolve this.
[Test Case]
Set up a KVM instance with attached devices, attempt to hotplug those using ipmitool.
[Regression Potential]
The change is to an uncommonly used driver. There are common code changes, but these are a no-op in the normal case and should be easy to confirm basic operation.
[Other Info]
This fix has been verified by the reporter as fixing the deadlock.
===
We are seeing deadlocks during hotplug of devices under vfio. As per the Linux kernel source code, there is a deadlock situation between vfio_pci_remove() and vfio_pci_release() on PCIe hotplug events. This issue can be avoided either by skipping the PCIe reset functionality or by calling device_unlock() in vfio_pci_remove() before calling the function vfio_del_group_dev().
Code flow on PCIe hotplug event:
Execution flow 1:
  device_release_driver() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L935 )
    device_release_driver_internal() ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L908 )
    device_lock(dev); ( https://elixir.bootlin.com/linux/latest/source/drivers/base/dd.c#L915 )
    vfio_pci_remove() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
      vfio_del_group_dev() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L923 )
        send event request to user and wait for VFIO_PCI_DEVICE release in vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/vfio.c#L967 )
Execution flow 2, triggered by the above step "send event request to user":
  vfio_pci_release() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L392 )
    vfio_pci_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L302 )
      vfio_pci_try_bus_reset() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/vfio/pci/vfio_pci.c#L1346 )
        pci_try_reset_bus() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4981 )
          pci_bus_save_and_disable() ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4760 )
            pci_dev_lock(dev); ( https://elixir.bootlin.com/linux/v4.18.5/source/drivers/pci/pci.c#L4765 )
              DEADLOCK here, since PCI_DEVICE_LOCK is held by the PCI_DEVICE remove code path in dd.c
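The lock ordering in the description above can be approximated in user space. The following is a minimal sketch, not kernel code: simulate_hotplug(), release_path(), and the timeout values are hypothetical names and numbers chosen for illustration, a threading.Lock stands in for the device lock taken in execution flow 1, and timeouts are used so that the "deadlock" is reported instead of hanging the program. The use_trylock=True variant loosely models the "skip the PCIe reset" workaround mentioned in the description; it is not claimed to be the change that was actually committed.

```python
import threading


def simulate_hotplug(use_trylock, timeout=0.2):
    """Model the two execution flows with one lock (illustrative only)."""
    device_lock = threading.Lock()   # stands in for device_lock(dev)
    released = threading.Event()     # set once the release path has finished
    result = {"release_blocked": False, "reset_performed": False}

    def release_path():
        # Flow 2: vfio_pci_release() eventually needs the same device lock.
        if use_trylock:
            # Workaround model: give up on the reset if the lock is busy.
            got = device_lock.acquire(blocking=False)
        else:
            # Reported behaviour: block on the lock (bounded here by a
            # timeout so the sketch terminates instead of hanging).
            got = device_lock.acquire(timeout=timeout)
        if got:
            result["reset_performed"] = True
            device_lock.release()
        elif not use_trylock:
            # Never got the lock, so the release never completes.
            result["release_blocked"] = True
            return
        released.set()

    # Flow 1: the remove path takes the device lock, asks "userspace" to
    # release the device, then waits for that release while holding the lock.
    with device_lock:
        worker = threading.Thread(target=release_path)
        worker.start()
        result["remove_completed"] = released.wait(timeout=2 * timeout)
        worker.join()
    return result


if __name__ == "__main__":
    print("blocking lock:", simulate_hotplug(use_trylock=False))
    print("trylock:      ", simulate_hotplug(use_trylock=True))
```

With the blocking variant, the release path cannot take the lock while the remove path is waiting on it, so neither side makes progress; with the trylock variant, the release path skips the reset and the remove path completes.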
2018-09-12 09:11:25 Andy Whitcroft summary vfio_pci_release hotplug deadlock device hotplug of vfio devices can lead to deadlock in vfio_pci_release
2018-09-12 09:22:14 Kleber Sacilotto de Souza linux (Ubuntu Bionic): status In Progress Fix Committed
2018-09-12 12:11:41 Seth Forshee linux (Ubuntu): status Confirmed Fix Committed
2018-09-14 17:02:56 Brad Figg tags verification-needed-bionic
2018-10-01 17:15:35 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2018-10-01 17:15:35 Launchpad Janitor cve linked 2017-5715
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-14633
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-15572
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-15594
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-17182
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-3639
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-6554
2018-10-01 17:15:35 Launchpad Janitor cve linked 2018-6555
2018-10-03 14:26:54 Joseph Salisbury linux (Ubuntu): status Fix Committed Fix Released
2019-02-14 15:52:48 Andy Whitcroft tags verification-needed-bionic kernel-fixup-verification-needed-bionic
2019-02-14 16:17:22 Andy Whitcroft tags kernel-fixup-verification-needed-bionic kernel-fixup-verification-needed-bionic verification-done-bionic
2019-07-24 20:53:56 Brad Figg tags kernel-fixup-verification-needed-bionic verification-done-bionic cscc kernel-fixup-verification-needed-bionic verification-done-bionic