2016-04-20 15:22:12 |
Dale Hamel |
bug |
|
|
added bug |
2016-04-20 15:22:12 |
Dale Hamel |
attachment added |
|
loopcrash https://bugs.launchpad.net/bugs/1572630/+attachment/4640780/+files/loopcrash |
|
2016-04-20 15:22:36 |
Dale Hamel |
attachment added |
|
dmidecode https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/+attachment/4640781/+files/dmidecode |
|
2016-04-20 15:24:10 |
Dale Hamel |
attachment added |
|
lspci https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/+attachment/4640782/+files/lspci |
|
2016-04-20 15:27:04 |
Dale Hamel |
attachment added |
|
lshw https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/+attachment/4640783/+files/lshw |
|
2016-04-21 17:31:48 |
Launchpad Janitor |
linux-lts-xenial (Ubuntu): status |
New |
Confirmed |
|
2016-05-19 05:38:03 |
Alberto Salvia Novella |
linux-lts-xenial (Ubuntu): importance |
Undecided |
Critical |
|
2016-06-28 06:11:31 |
Proton |
attachment added |
|
Screenshot https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/+attachment/4691400/+files/Snip20160628_3.png |
|
2016-07-06 16:45:52 |
Eric Desrochers |
bug |
|
|
added subscriber Eric Desrochers |
2016-07-13 14:11:33 |
Eric Desrochers |
linux-lts-xenial (Ubuntu): assignee |
|
Eric Desrochers (slashd) |
|
2016-07-13 14:12:04 |
Eric Desrochers |
linux-lts-xenial (Ubuntu): assignee |
Eric Desrochers (slashd) |
|
|
2016-07-13 14:12:29 |
Eric Desrochers |
linux-lts-xenial (Ubuntu): assignee |
|
Eric Desrochers (slashd) |
|
2016-07-21 13:37:01 |
Eric Desrochers |
linux-lts-xenial (Ubuntu): status |
Confirmed |
In Progress |
|
2016-07-21 13:39:15 |
Eric Desrochers |
linux-lts-xenial (Ubuntu): importance |
Critical |
Medium |
|
2016-08-22 08:57:30 |
Arne de Bruijn |
bug |
|
|
added subscriber Arne de Bruijn |
2016-08-29 17:16:01 |
Eric Desrochers |
description |
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace and the upstream fix.
* The fix is hard to verify, but user "Proton" was able to confirm that the test kernel including the fix solves this particular problem.
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
|
2016-08-29 17:16:45 |
Eric Desrochers |
description |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace and the upstream fix.
* The fix is hard to verify, but user "Proton" was able to confirm that the test kernel including the fix solves this particular problem.
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace and the upstream fix.
* The fix is hard to verify, but user "Proton" was able to confirm that the test kernel including the fix solves this particular problem:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
|
2016-08-29 17:23:21 |
Eric Desrochers |
description |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace and the upstream fix.
* The fix is hard to verify, but user "Proton" was able to confirm that the test kernel including the fix solves this particular problem:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace and the upstream fix.
* The fix is hard to verify, but user "Proton" was able to confirm that upstream mainline 4.6-rc1 resolves the issue and that the test kernel I provided, which includes the fix, solves this particular problem as well.
Confirmation by Proton:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
|
2016-08-29 17:25:52 |
Eric Desrochers |
description |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace and the upstream fix.
* The fix is hard to verify, but user "Proton" was able to confirm that upstream mainline 4.6-rc1 resolves the issue and that the test kernel I provided, which includes the fix, solves this particular problem as well.
Confirmation by Proton:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is hard to verify, but user "Proton" was able to confirm that upstream mainline 4.6-rc1 resolves the issue and that the test kernel I provided, which includes the fix, solves this particular problem as well.
Confirmation by Proton:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
|
2016-08-29 17:26:10 |
Eric Desrochers |
description |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is hard to verify, but user "Proton" was able to confirm that upstream mainline 4.6-rc1 resolves the issue and that the test kernel I provided, which includes the fix, solves this particular problem as well.
Confirmation by Proton:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
[Impact]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
[Test Case]
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached.
[Regression Potential]
* Fix implemented upstream starting with v4.6-rc1
* The fix is fairly straightforward given the stack trace.
* The fix is hard to verify, but user "Proton" was able to confirm that upstream mainline 4.6-rc1 resolves the issue and that the test kernel I provided, which includes the fix, solves this particular problem as well.
Confirmation by Proton:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
[Other Info]
* https://lkml.org/lkml/2016/3/16/40
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
[Original Description]
We discovered a pretty serious regression introduced in 4.4.0-18.
At boot time, the kernel will panic somewhere in 'blk_mq_register_disk'; a snippet of the trace is below, and the full panic dump is attached. The panic dump was collected via serial console, as the kernel panics so early that we cannot kdump it.
[ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
[ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
[ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
[ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but the trace is not identical.
We discovered this issue when we were experimenting with linux-generic-lts-xenial from trusty-updates on a 14.04 installation. When we installed it, 4.4.0-15 was the current package, and it worked fine and provided a large number of improvements for us. Background security updates installed 4.4.0-18, which updated grub and became the default kernel. On reboot, the node panics about 2 seconds in, leaving the machine in a dead state. We were able to boot a rescue image and roll back to 4.4.0-15, which works nicely. We currently have pinning on 4.4.0-15 to prevent this problem from coming back, but would prefer to see the problem fixed.
I'll attach lspci, lshw, and dmidecode for our hardware as well, but this is happening on pretty vanilla supermicro nodes. We are able to consistently reproduce it on our hardware. It is not reproducible in EC2, only on metal. |
|
2016-08-31 16:52:43 |
Tim Gardner |
bug task added |
|
linux (Ubuntu) |
|
2016-08-31 16:53:03 |
Tim Gardner |
nominated for series |
|
Ubuntu Xenial |
|
2016-08-31 16:53:03 |
Tim Gardner |
bug task added |
|
linux (Ubuntu Xenial) |
|
2016-08-31 16:53:03 |
Tim Gardner |
bug task added |
|
linux-lts-xenial (Ubuntu Xenial) |
|
2016-08-31 16:53:03 |
Tim Gardner |
nominated for series |
|
Ubuntu Yakkety |
|
2016-08-31 16:53:03 |
Tim Gardner |
bug task added |
|
linux (Ubuntu Yakkety) |
|
2016-08-31 16:53:03 |
Tim Gardner |
bug task added |
|
linux-lts-xenial (Ubuntu Yakkety) |
|
2016-08-31 16:53:54 |
Tim Gardner |
linux (Ubuntu Xenial): status |
New |
Fix Committed |
|
2016-08-31 16:54:06 |
Tim Gardner |
linux (Ubuntu Yakkety): status |
New |
Fix Released |
|
2016-08-31 16:54:24 |
Tim Gardner |
linux-lts-xenial (Ubuntu Xenial): status |
New |
In Progress |
|
2016-08-31 16:54:47 |
Tim Gardner |
linux-lts-xenial (Ubuntu Xenial): status |
In Progress |
Invalid |
|
2016-08-31 16:55:05 |
Tim Gardner |
linux-lts-xenial (Ubuntu Yakkety): status |
In Progress |
Invalid |
|
2016-08-31 16:55:18 |
Tim Gardner |
nominated for series |
|
Ubuntu Trusty |
|
2016-08-31 16:55:18 |
Tim Gardner |
bug task added |
|
linux (Ubuntu Trusty) |
|
2016-08-31 16:55:18 |
Tim Gardner |
bug task added |
|
linux-lts-xenial (Ubuntu Trusty) |
|
2016-08-31 16:55:29 |
Tim Gardner |
linux-lts-xenial (Ubuntu Trusty): status |
New |
In Progress |
|
2016-08-31 16:55:38 |
Tim Gardner |
linux (Ubuntu Trusty): status |
New |
Invalid |
|
2016-08-31 16:55:55 |
Tim Gardner |
linux-lts-xenial (Ubuntu Yakkety): assignee |
Eric Desrochers (slashd) |
|
|
2016-08-31 16:56:39 |
Tim Gardner |
linux (Ubuntu Xenial): assignee |
|
Eric Desrochers (slashd) |
|
2016-08-31 16:57:08 |
Tim Gardner |
linux-lts-xenial (Ubuntu Trusty): assignee |
|
Eric Desrochers (slashd) |
|
2016-09-13 20:48:55 |
Eric Desrochers |
linux (Ubuntu Xenial): importance |
Undecided |
Medium |
|
2016-09-13 20:49:02 |
Eric Desrochers |
linux-lts-xenial (Ubuntu Trusty): importance |
Undecided |
Medium |
|
2016-09-26 19:17:28 |
Brad Figg |
tags |
kernel kernel-bug panic xenial |
kernel kernel-bug panic verification-needed-xenial xenial |
|
2016-09-28 14:43:08 |
Eric Desrochers |
linux-lts-xenial (Ubuntu Xenial): assignee |
|
Eric Desrochers (slashd) |
|
2016-09-28 14:43:25 |
Eric Desrochers |
linux-lts-xenial (Ubuntu Xenial): status |
Invalid |
In Progress |
|
2016-09-28 14:43:42 |
Eric Desrochers |
linux-lts-xenial (Ubuntu Xenial): importance |
Undecided |
Medium |
|
2016-09-28 17:01:44 |
Arne de Bruijn |
removed subscriber Arne de Bruijn |
|
|
|
2016-10-09 15:29:47 |
Eric Desrochers |
tags |
kernel kernel-bug panic verification-needed-xenial xenial |
kernel kernel-bug panic verification-done-xenial xenial |
|
2016-10-10 17:37:52 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2016-10-10 17:37:52 |
Launchpad Janitor |
cve linked |
|
2016-6828 |
|
2016-10-10 17:37:52 |
Launchpad Janitor |
cve linked |
|
2016-7039 |
|
2016-10-10 17:37:53 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2016-10-10 17:51:13 |
Launchpad Janitor |
linux-lts-xenial (Ubuntu Trusty): status |
In Progress |
Fix Released |
|
2016-10-10 17:51:14 |
Launchpad Janitor |
linux-lts-xenial (Ubuntu Trusty): status |
In Progress |
Fix Released |
|
2016-10-11 03:25:29 |
Eric Desrochers |
linux-lts-xenial (Ubuntu Xenial): assignee |
Eric Desrochers (slashd) |
|
|