Servers with two NVMe controllers. The second one disappears (with a kernel trace).
> cat /proc/version
Linux version 4.4.0-47-generic (buildd@lcy01-03) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
After upgrading the kernel, my ZFS pool became DEGRADED:
> zpool status
pool: zp0
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME                     STATE     READ WRITE CKSUM
zp0                      DEGRADED     0     0     0
  mirror-0               DEGRADED     0     0     0
    nvme0n1              ONLINE       0     0     0
    9486952355712335023  UNAVAIL      0     0     0  was /dev/nvme1n1
Only ONE controller is listed:
> nvme list
Node SN Model Version Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 CVMD4391006B800GGN INTEL SSDPE2ME800G4 1.0 1 800,17 GB / 800,17 GB 512 B + 0 B 8DV10102
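For anyone who wants to detect this condition from a script, here is a minimal sketch that scans `zpool status` text for vdevs that are not ONLINE. The function name `find_unhealthy_vdevs` is hypothetical, and the parsing assumes the NAME/STATE column layout shown in the output above; it is not an official ZFS interface.

```python
# Vdev states that `zpool status` can report in the STATE column.
VDEV_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def find_unhealthy_vdevs(status_text):
    """Return (name, state) pairs for config rows whose state is not ONLINE.

    Assumption: rows after the "config:" line look like
    "<name> <state> <read> <write> <cksum> [note]".
    """
    unhealthy = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "config:":
            in_config = True
            continue
        if not in_config or fields[0] == "NAME":
            # Skip the pool header lines and the column header row.
            continue
        if len(fields) >= 2 and fields[1] in VDEV_STATES and fields[1] != "ONLINE":
            unhealthy.append((fields[0], fields[1]))
    return unhealthy
```

Fed the output above, this would report the pool, the mirror, and the missing device (the one that was /dev/nvme1n1) as not ONLINE.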
The bug isn't fixed for me.
[ 68.950042] nvme 0000:82:00.0: I/O 0 QID 0 timeout, disable controller
[ 69.054149] nvme 0000:82:00.0: Cancelling I/O 0 QID 0
[ 69.054182] nvme 0000:82:00.0: Identify Controller failed (-4)
[ 69.060132] nvme 0000:82:00.0: Removing after probe failure
[ 69.060284] iounmap: bad address ffffc9000cf34000
[ 69.065020] CPU: 14 PID: 247 Comm: kworker/14:1 Tainted: P OE 4.4.0-47-generic #68-Ubuntu
[ 69.065034] Hardware name: Supermicro SYS-F618R2-RC1+/X10DRFR-N, BIOS 2.0 01/27/2016
[ 69.065040] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[ 69.065050] 0000000000000286 00000000e10d6171 ffff8820340efce0 ffffffff813f5aa3
[ 69.065052] ffff88203454b4f0 ffffc9000cf34000 ffff8820340efd00 ffffffff8106bdff
[ 69.065054] ffff88203454b4f0 ffff88203454b658 ffff8820340efd10 ffffffff8106be3c
[ 69.065056] Call Trace:
[ 69.065068] [<ffffffff813f5aa3>] dump_stack+0x63/0x90
[ 69.065089] [<ffffffff8106bdff>] iounmap.part.1+0x7f/0x90
[ 69.065093] [<ffffffff8106be3c>] iounmap+0x2c/0x30
[ 69.065097] [<ffffffffc01c364a>] nvme_dev_unmap.isra.35+0x1a/0x30 [nvme]
[ 69.065099] [<ffffffffc01c475e>] nvme_remove+0xce/0xe0 [nvme]
[ 69.065108] [<ffffffff81447009>] pci_device_remove+0x39/0xc0
[ 69.065117] [<ffffffff815585e1>] __device_release_driver+0xa1/0x150
[ 69.065119] [<ffffffff815586b3>] device_release_driver+0x23/0x30
[ 69.065123] [<ffffffff8143fa7a>] pci_stop_bus_device+0x8a/0xa0
[ 69.065125] [<ffffffff8143fbca>] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 69.065129] [<ffffffffc01c309c>] nvme_remove_dead_ctrl_work+0x3c/0x50 [nvme]
[ 69.065136] [<ffffffff8109a4a5>] process_one_work+0x165/0x480
[ 69.065138] [<ffffffff8109a80b>] worker_thread+0x4b/0x4c0
[ 69.065141] [<ffffffff8109a7c0>] ? process_one_work+0x480/0x480
[ 69.065143] [<ffffffff8109a7c0>] ? process_one_work+0x480/0x480
[ 69.065147] [<ffffffff810a09e8>] kthread+0xd8/0xf0
[ 69.065150] [<ffffffff810a0910>] ? kthread_create_on_node+0x1e0/0x1e0
[ 69.065157] [<ffffffff8183538f>] ret_from_fork+0x3f/0x70
[ 69.065158] [<ffffffff810a0910>] ? kthread_create_on_node+0x1e0/0x1e0
[ 69.065161] Trying to free nonexistent resource <00000000fbd10000-00000000fbd13fff>
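To check other machines for the same symptom, here is a rough sketch that pulls the PCI address of every NVMe controller the kernel dropped with "Removing after probe failure". The function name `failed_nvme_controllers` is hypothetical, and the pattern is an assumption based only on the message format in this log; other kernel versions may word it differently.

```python
import re

# Matches kernel log lines of the form seen in this report, e.g.
# "[ 69.060132] nvme 0000:82:00.0: Removing after probe failure"
PROBE_FAILURE = re.compile(
    r"nvme (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Removing after probe failure"
)


def failed_nvme_controllers(log_text):
    """Return the PCI addresses of controllers removed after a probe failure.

    Addresses are returned in order of first appearance, without duplicates.
    """
    found = []
    for line in log_text.splitlines():
        match = PROBE_FAILURE.search(line)
        if match and match.group("pci") not in found:
            found.append(match.group("pci"))
    return found
```

The input can be the saved output of `dmesg`; for the log above, the only address reported would be 0000:82:00.0.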