Activity log for bug #1807393

Date Who What changed Old value New value Message
2018-12-07 13:43:46 Guilherme G. Piccoli bug added bug
2018-12-07 15:01:45 Guilherme G. Piccoli attachment added TEST patch for qemu nvme virtual device https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1807393/+attachment/5220068/+files/0001-hw-block-nvme-NVMe-hack-to-forcibly-miss-an-interrup.patch
2018-12-07 18:49:48 Guilherme G. Piccoli description
Old value: Description to be updated [Impact] * 1 [Test Case] * 2 [Regression Potential] * 3
New value:
[Impact]
* NVMe controllers can fail to send an interrupt, especially due to bugs in virtual devices (which are common these days: QEMU has its own NVMe virtual device, and so does AWS). Such a situation is difficult to debug, because the NVMe driver reports only the request timeout, not its cause.
* The upstream patch proposed for SRU here, 7776db1ccc12 ("NVMe/pci: Poll CQ on timeout"), was designed to provide more information in these cases by proactively polling the CQEs on request timeouts. This checks whether the specific request was completed and some issue (probably a missed interrupt) prevented the driver from noticing, or whether the request really was not completed, which indicates a more severe problem.
* Besides being quite useful for debugging, this patch could help mitigate issues in cloud environments like AWS, where there may be jitter in request completion and the I/O timeout is set to a low value, or where the virtual NVMe controller has atypical bugs. With this patch, if polling succeeds the NVMe driver continues working instead of attempting a controller reset, which may lead to failures in the rootfs; refer to https://launchpad.net/bugs/1788035.
[Test Case]
* It is a bit tricky to artificially create a missed interrupt; one idea was to implement a small hack in the QEMU NVMe virtual device that, given a trigger from the guest kernel, induces the virtual device to skip an interrupt. The hack patch is in comment #1 below.
* To trigger the hack from the guest kernel, all that is needed is to issue a raw admin command from nvme-cli: "nvme admin-passthru -o 0xff /dev/nvme0". After that, perform some I/O to see one of the requests aborting; dd can be used for this, e.g. "dd if=/dev/zero of=/dev/nvme0n1 count=5".
[Regression Potential]
* There are no clear risks in adding this polling mechanism to the NVMe driver. One problem that was never reported but could happen with this patch: the device could be in a bad state IRQ-wise that a reset would fix, but the patch could cause all requests to be completed via polling, which prevents the adapter reset. This is, however, a very hypothetical situation, and it would also happen in the mainline kernel (since it has the patch).
2018-12-07 18:49:59 Guilherme G. Piccoli nominated for series Ubuntu Xenial
2018-12-07 18:51:00 Eric Desrochers bug task added linux (Ubuntu Xenial)
2018-12-07 18:52:19 Guilherme G. Piccoli linux (Ubuntu Xenial): status New Confirmed
2018-12-07 18:52:21 Guilherme G. Piccoli linux (Ubuntu Xenial): assignee Guilherme G. Piccoli (gpiccoli)
2018-12-07 22:02:55 Guilherme G. Piccoli linux (Ubuntu): status Confirmed In Progress
2018-12-07 22:02:57 Guilherme G. Piccoli linux (Ubuntu Xenial): status Confirmed In Progress
2018-12-07 22:03:01 Guilherme G. Piccoli linux (Ubuntu Xenial): importance Undecided High
2018-12-07 22:29:02 Gabriel Muñoz bug added subscriber Gabriel Muñoz
2018-12-10 04:48:32 Dominique Poulain bug added subscriber Dominique Poulain
2019-01-02 11:18:32 Dominik Grzywaczewski bug added subscriber Dominik Grzywaczewski
2019-01-09 09:27:28 Khaled El Mously linux (Ubuntu Xenial): status In Progress Fix Committed
2019-01-17 14:21:42 Brad Figg tags sts sts verification-needed-xenial
2019-01-17 17:50:51 Guilherme G. Piccoli tags sts verification-needed-xenial sts verification-done-xenial
2019-01-23 16:42:49 Rok Zlender bug added subscriber Rok Zlender
2019-01-24 18:55:21 Olaf Doemer bug added subscriber Olaf Doemer
2019-02-04 08:47:55 Launchpad Janitor linux (Ubuntu Xenial): status Fix Committed Fix Released
2019-02-04 08:47:55 Launchpad Janitor cve linked 2000-1134
2019-02-04 08:47:55 Launchpad Janitor cve linked 2007-3852
2019-02-04 08:47:55 Launchpad Janitor cve linked 2008-0525
2019-02-04 08:47:55 Launchpad Janitor cve linked 2009-0416
2019-02-04 08:47:55 Launchpad Janitor cve linked 2011-4834
2019-02-04 08:47:55 Launchpad Janitor cve linked 2015-1838
2019-02-04 08:47:55 Launchpad Janitor cve linked 2015-7442
2019-02-04 08:47:55 Launchpad Janitor cve linked 2016-7489
2019-02-04 08:47:55 Launchpad Janitor cve linked 2017-5715
2019-02-04 08:47:55 Launchpad Janitor cve linked 2018-19407
2019-07-24 20:57:08 Brad Figg tags sts verification-done-xenial cscc sts verification-done-xenial
2020-07-14 14:54:12 Guilherme G. Piccoli linux (Ubuntu): status In Progress Fix Released
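
The [Test Case] steps from the 2018-12-07 description update could be scripted roughly as follows. This is only a sketch, assuming a guest running on a QEMU build that carries the attached TEST patch. The CTRL, NS and DRY_RUN variables and the run helper are hypothetical names introduced here, not part of the bug report. DRY_RUN defaults to 1 so the commands are only printed, since the dd step overwrites the start of the namespace.

```shell
#!/bin/sh
# Sketch of the test case from the bug description (not an official script).
# Assumption: the guest runs on QEMU built with the attached TEST patch.
# CTRL, NS, DRY_RUN and run() are hypothetical helpers for this example.

CTRL="${CTRL:-/dev/nvme0}"    # NVMe controller device named in the description
NS="${NS:-/dev/nvme0n1}"      # NVMe namespace device named in the description
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: raw admin command that arms the hack in the virtual device.
run nvme admin-passthru -o 0xff "$CTRL"

# Step 2: some I/O; one request should hit the missed interrupt.
# WARNING: when executed for real, this overwrites data on the namespace.
run dd if=/dev/zero of="$NS" count=5

# Step 3: inspect the kernel log for the NVMe timeout handling.
run dmesg
```

To execute the commands instead of printing them, run the script as root with DRY_RUN=0 inside a disposable test guest.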