We encountered an instance today that had an NVMe failure very early in boot. I've updated our internal Canonical case as well as our Amazon case, and I'm posting the relevant details here as well for consistency:

# uname -a
Linux XXX 4.4.0-1069-aws #79-Ubuntu SMP Mon Sep 24 15:01:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"

# echo type $EC2_INSTANCE_TYPE
type m5.xlarge

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 10G 0 disk /

# ls -al /dev/nvme* /dev/xvd* /dev/sd*
ls: cannot access '/dev/xvd*': No such file or directory
crw------- 1 root root 248, 0 Oct 31 15:02 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Oct 31 15:02 /dev/nvme0n1
lrwxrwxrwx 1 root root 7 Oct 31 15:02 /dev/sda1 -> nvme0n1

# dmesg | grep '63\.'
[ 63.401466] nvme 0000:00:1f.0: I/O 0 QID 0 timeout, disable controller
[ 63.505790] nvme 0000:00:1f.0: Cancelling I/O 0 QID 0
[ 63.505812] nvme 0000:00:1f.0: Identify Controller failed (-4)
[ 63.507536] nvme 0000:00:1f.0: Removing after probe failure
[ 63.507604] iounmap: bad address ffffc90001b40000
[ 63.508941] CPU: 1 PID: 351 Comm: kworker/1:3 Tainted: P O 4.4.0-1069-aws #79-Ubuntu
[ 63.508943] Hardware name: Amazon EC2 m5.xlarge/, BIOS 1.0 10/16/2017
[ 63.508948] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[ 63.508950] 0000000000000286 3501e2639044a4d2 ffff8800372bfce0 ffffffff923ffe03
[ 63.508952] ffff88040dd878f0 ffffc90001b40000 ffff8800372bfd00 ffffffff9206d3af
[ 63.508954] ffff88040dd878f0 ffff88040dd87a58 ffff8800372bfd10 ffffffff9206d3ec
[ 63.508956] Call Trace:
[ 63.508961] [] dump_stack+0x63/0x90
[ 63.508965] [] iounmap.part.1+0x7f/0x90
[ 63.508967] [] iounmap+0x2c/0x30
[ 63.508969] [] nvme_dev_unmap.isra.35+0x1a/0x30 [nvme]
[ 63.508972] [] nvme_remove+0xce/0xe0 [nvme]
[ 63.508976] [] pci_device_remove+0x3e/0xc0
[ 63.508980] [] __device_release_driver+0xa4/0x150
[ 63.508982] [] device_release_driver+0x23/0x30
[ 63.508986] [] pci_stop_bus_device+0x7a/0xa0
[ 63.508988] [] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 63.508990] [] nvme_remove_dead_ctrl_work+0x3c/0x50 [nvme]
[ 63.508994] [] process_one_work+0x16b/0x490
[ 63.508996] [] worker_thread+0x4b/0x4d0
[ 63.508998] [] ? process_one_work+0x490/0x490
[ 63.509001] [] kthread+0xe7/0x100
[ 63.509005] [] ? __schedule+0x301/0x7f0
[ 63.509007] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 63.509009] [] ret_from_fork+0x55/0x80
[ 63.509011] [] ? kthread_create_on_node+0x1e0/0x1e0
[ 63.509013] Trying to free nonexistent resource <00000000febf8000-00000000febfbfff>

# modinfo nvme
filename: /lib/modules/4.4.0-1069-aws/kernel/drivers/nvme/host/nvme.ko
version: 1.0
license: GPL
author: Matthew Wilcox
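
In case it helps anyone check other instances for the same signature, here is a minimal sketch that scans dmesg text for the three nvme messages quoted above. The function name and the regular expressions are illustrative assumptions based only on this one excerpt, so the wording may differ on other kernel versions.

```python
import re

# Patterns derived from the dmesg excerpt in this report (assumption:
# other affected instances log the same wording).
FAILURE_PATTERNS = [
    re.compile(r"nvme \S+: I/O \d+ QID \d+ timeout, disable controller"),
    re.compile(r"nvme \S+: Identify Controller failed \(-?\d+\)"),
    re.compile(r"nvme \S+: Removing after probe failure"),
]


def find_nvme_probe_failures(dmesg_text):
    """Return the dmesg lines that match any of the failure patterns."""
    hits = []
    for line in dmesg_text.splitlines():
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits
```

To use it, capture the output of dmesg on the instance into a string (for example from a saved log file) and pass it to find_nvme_probe_failures; an empty list means none of the three messages were found.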