Servers with two controllers. The second one disappears (with a kernel trace).

```
> cat /proc/version
Linux version 4.4.0-47-generic (buildd@lcy01-03) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
```

After upgrading the kernel, my ZFS pool becomes DEGRADED:

```
> zpool status
  pool: zp0
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        zp0                      DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            nvme0n1              ONLINE       0     0     0
            9486952355712335023  UNAVAIL      0     0     0  was /dev/nvme1n1
```

Only ONE controller is listed!

```
> nvme list
Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     CVMD4391006B800GGN   INTEL SSDPE2ME800G4                      1.0      1         800,17 GB / 800,17 GB      512 B + 0 B      8DV10102
```

The bug isn't fixed for me.
```
[   68.950042] nvme 0000:82:00.0: I/O 0 QID 0 timeout, disable controller
[   69.054149] nvme 0000:82:00.0: Cancelling I/O 0 QID 0
[   69.054182] nvme 0000:82:00.0: Identify Controller failed (-4)
[   69.060132] nvme 0000:82:00.0: Removing after probe failure
[   69.060284] iounmap: bad address ffffc9000cf34000
[   69.065020] CPU: 14 PID: 247 Comm: kworker/14:1 Tainted: P           OE   4.4.0-47-generic #68-Ubuntu
[   69.065034] Hardware name: Supermicro SYS-F618R2-RC1+/X10DRFR-N, BIOS 2.0 01/27/2016
[   69.065040] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[   69.065050]  0000000000000286 00000000e10d6171 ffff8820340efce0 ffffffff813f5aa3
[   69.065052]  ffff88203454b4f0 ffffc9000cf34000 ffff8820340efd00 ffffffff8106bdff
[   69.065054]  ffff88203454b4f0 ffff88203454b658 ffff8820340efd10 ffffffff8106be3c
[   69.065056] Call Trace:
[   69.065068]  [] dump_stack+0x63/0x90
[   69.065089]  [] iounmap.part.1+0x7f/0x90
[   69.065093]  [] iounmap+0x2c/0x30
[   69.065097]  [] nvme_dev_unmap.isra.35+0x1a/0x30 [nvme]
[   69.065099]  [] nvme_remove+0xce/0xe0 [nvme]
[   69.065108]  [] pci_device_remove+0x39/0xc0
[   69.065117]  [] __device_release_driver+0xa1/0x150
[   69.065119]  [] device_release_driver+0x23/0x30
[   69.065123]  [] pci_stop_bus_device+0x8a/0xa0
[   69.065125]  [] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[   69.065129]  [] nvme_remove_dead_ctrl_work+0x3c/0x50 [nvme]
[   69.065136]  [] process_one_work+0x165/0x480
[   69.065138]  [] worker_thread+0x4b/0x4c0
[   69.065141]  [] ? process_one_work+0x480/0x480
[   69.065143]  [] ? process_one_work+0x480/0x480
[   69.065147]  [] kthread+0xd8/0xf0
[   69.065150]  [] ? kthread_create_on_node+0x1e0/0x1e0
[   69.065157]  [] ret_from_fork+0x3f/0x70
[   69.065158]  [] ? kthread_create_on_node+0x1e0/0x1e0
[   69.065161] Trying to free nonexistent resource <00000000fbd10000-00000000fbd13fff>
```
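Until the driver bug is fixed, a cron job can at least alert when a controller drops out of the pool again. A minimal sketch: extract the UNAVAIL vdevs from `zpool status` output. The `status` text below is a copy of the config block from this report; in practice you would pipe in live `zpool status zp0` output instead.

```shell
# Sample `zpool status` config section (copied from the report above).
status='NAME                     STATE     READ WRITE CKSUM
zp0                      DEGRADED     0     0     0
  mirror-0               DEGRADED     0     0     0
    nvme0n1              ONLINE       0     0     0
    9486952355712335023  UNAVAIL      0     0     0  was /dev/nvme1n1'

# Print the name (GUID) of every vdev whose state is UNAVAIL;
# a non-empty result would be the trigger for an alert.
printf '%s\n' "$status" | awk '$2 == "UNAVAIL" { print $1 }'
```

This prints the vdev GUID `9486952355712335023`, which is also the identifier `zpool online` or `zpool replace` accepts once the device comes back.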