2018-12-07 13:43:46 |
Guilherme G. Piccoli |
bug |
|
|
added bug |
2018-12-07 15:01:45 |
Guilherme G. Piccoli |
attachment added |
|
TEST patch for qemu nvme virtual device https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1807393/+attachment/5220068/+files/0001-hw-block-nvme-NVMe-hack-to-forcibly-miss-an-interrup.patch |
|
2018-12-07 18:49:48 |
Guilherme G. Piccoli |
description |
Description to be updated
[Impact]
* 1
[Test Case]
* 2
[Regression Potential]
* 3 |
[Impact]
* NVMe controllers could potentially fail to send an interrupt, especially
due to bugs in virtual devices (which are common these days: qemu has its
own NVMe virtual device, and so does AWS). This situation would be
difficult to debug, because the NVMe driver only reports the request
timeout, not the reason.
* The upstream patch proposed for SRU here, 7776db1ccc12
("NVMe/pci: Poll CQ on timeout"), was designed to provide more information
in these cases by proactively polling the CQEs (completion queue entries)
on request timeouts. This checks whether the specific request was
completed but some issue (probably a missed interrupt) prevented the
driver from noticing, or whether the request really wasn't completed,
which indicates more severe issues.
* Besides being quite useful for debugging, this patch could help mitigate
issues in cloud environments like AWS, where there may be jitter in
request completion and the I/O timeout is set to low values, or even in
case of atypical bugs in the virtual NVMe controller. With this patch, if
polling succeeds, the NVMe driver keeps working instead of attempting a
controller reset, which may lead to failures in the rootfs; refer to
https://launchpad.net/bugs/1788035.
[Test Case]
* It's a bit tricky to artificially create a missed-interrupt situation;
one idea was to implement a small hack in the NVMe qemu virtual device
that, given a trigger from the guest kernel, induces the virtual device to
skip an interrupt. The hack patch is present in comment #1 below.
* To trigger the hack from the guest kernel, all that is needed is to
issue a raw admin command from nvme-cli:
"nvme admin-passthru -o 0xff /dev/nvme0".
After that, just perform some I/O and observe one of the requests timing
out; dd can be used for this, e.g. "dd if=/dev/zero of=/dev/nvme0n1 count=5".
[Regression Potential]
* There are no clear risks in adding such a polling mechanism to the NVMe
driver. One bad outcome, never reported but possible with this patch, is
that the device could be in a bad IRQ-related state that a reset would
fix, while the patch causes all requests to be completed via polling,
which prevents the adapter reset. This is, however, a very hypothetical
situation, which would also happen in the mainline kernel (since it has
the patch). |
|
2018-12-07 18:49:59 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Xenial |
|
2018-12-07 18:51:00 |
Eric Desrochers |
bug task added |
|
linux (Ubuntu Xenial) |
|
2018-12-07 18:52:19 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): status |
New |
Confirmed |
|
2018-12-07 18:52:21 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2018-12-07 22:02:55 |
Guilherme G. Piccoli |
linux (Ubuntu): status |
Confirmed |
In Progress |
|
2018-12-07 22:02:57 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): status |
Confirmed |
In Progress |
|
2018-12-07 22:03:01 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): importance |
Undecided |
High |
|
2018-12-07 22:29:02 |
Gabriel Muñoz |
bug |
|
|
added subscriber Gabriel Muñoz |
2018-12-10 04:48:32 |
Dominique Poulain |
bug |
|
|
added subscriber Dominique Poulain |
2019-01-02 11:18:32 |
Dominik Grzywaczewski |
bug |
|
|
added subscriber Dominik Grzywaczewski |
2019-01-09 09:27:28 |
Khaled El Mously |
linux (Ubuntu Xenial): status |
In Progress |
Fix Committed |
|
2019-01-17 14:21:42 |
Brad Figg |
tags |
sts |
sts verification-needed-xenial |
|
2019-01-17 17:50:51 |
Guilherme G. Piccoli |
tags |
sts verification-needed-xenial |
sts verification-done-xenial |
|
2019-01-23 16:42:49 |
Rok Zlender |
bug |
|
|
added subscriber Rok Zlender |
2019-01-24 18:55:21 |
Olaf Doemer |
bug |
|
|
added subscriber Olaf Doemer |
2019-02-04 08:47:55 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2000-1134 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2007-3852 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2008-0525 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2009-0416 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2011-4834 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2015-1838 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2015-7442 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2016-7489 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2017-5715 |
|
2019-02-04 08:47:55 |
Launchpad Janitor |
cve linked |
|
2018-19407 |
|
2019-07-24 20:57:08 |
Brad Figg |
tags |
sts verification-done-xenial |
cscc sts verification-done-xenial |
|
2020-07-14 14:54:12 |
Guilherme G. Piccoli |
linux (Ubuntu): status |
In Progress |
Fix Released |
|