Comment 14 for bug 1788035

Guilherme G. Piccoli (gpiccoli) wrote :

I'm investigating this issue, and built a kernel with the following two patches:

a) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7776db1ccc1
b) A debug patch present in http://lists.infradead.org/pipermail/linux-nvme/2017-February/008498.html

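The idea of the first patch, which was merged upstream in Linux 4.12, is to poll the completion
queue of the device in the event of a timeout. If the poll finds the completion, the device did post it and the driver missed the interrupt; if it finds nothing, the device never posted a completion. Either result helps tell whether this could be an adapter issue.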

The idea of the second patch is just to provide debug information in case of a mismatch in the choice
of the blk-mq hw queue in the nvme driver. It's a debug patch proposed on the mailing list to address a similar bug report in the past.

The kernel with the debug patches is available in a PPA. To install it, follow the instructions below:

a) sudo add-apt-repository ppa:gpiccoli/test-nvme-182638
b) sudo apt-get update
c) sudo apt-get install linux-image-4.4.0-1073-aws

After the installation is complete, please reboot the instance and, once it has restarted,
check the "uname -rv" output, which should be:

"4.4.0-1073-aws #83+hf182638v20181129b1-Ubuntu SMP Fri Nov 30 17:09:30 UTC 2018"
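
For anyone who prefers to script this check, here is a minimal sketch. The helper name check_kernel is hypothetical, chosen just for illustration; the expected string is copied from the line above.

```shell
#!/bin/sh
# Expected "uname -rv" output for the test kernel, as quoted in this comment.
expected='4.4.0-1073-aws #83+hf182638v20181129b1-Ubuntu SMP Fri Nov 30 17:09:30 UTC 2018'

# check_kernel (hypothetical helper): compares a "uname -rv" string with the
# expected one. With no argument it queries the running kernel.
check_kernel() {
    actual="${1-$(uname -rv)}"
    if [ "$actual" = "$expected" ]; then
        echo "OK: test kernel is running"
    else
        echo "MISMATCH: running '$actual'"
        return 1
    fi
}
```

After sourcing the snippet, running check_kernel with no arguments checks the booted kernel. A mismatch most likely means the instance booted a different installed kernel.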

Please note that this is a test kernel: it shouldn't be used in any production environment, nor is it
officially supported in any form.

If anybody can test this, it would be much appreciated. Please post the complete dmesg after/if the issue is triggered.
Thanks,

Guilherme