3.13.0-88-generic:amd64 fails to boot on Dell PowerEdge R630

Bug #1592597 reported by Jill Rouleau on 2016-06-14
This bug affects 3 people
Affects         | Status | Importance | Assigned to | Milestone
linux (Ubuntu)  |        | High       | Unassigned  |
Trusty          |        | High       | Unassigned  |

Bug Description

Following the recent update to 3.13.0-88, an R630 running Trusty panics on boot, and the recovery boot just hangs. 3.13.0-87 booted normally once, then panicked on subsequent attempts. Booting succeeds with the previously running kernel, 3.13.0-61.

Will upload an sosreport.

Jill Rouleau (jillrouleau) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1592597

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream stable kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.13 stable kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance!

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11.11-trusty/

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
status: New → Incomplete
importance: Undecided → High
tags: added: kernel-da-key
James Troup (elmo) wrote :

Joe, this is a production customer environment, so no, we can't test newer kernels. I'm also not sure why a new upstream version would be relevant, since this appears to be a regression in released kernels?

Changed in linux (Ubuntu):
status: Incomplete → New
Changed in linux (Ubuntu Trusty):
status: Incomplete → New
Brad Figg (brad-figg) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1592597

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Trusty):
status: New → Incomplete
Jill Rouleau (jillrouleau) wrote :

The system is headless, and I'm getting a 403 with lynx (lp#997020). What logs can I provide?

Joseph Salisbury (jsalisbury) wrote :

Testing the latest upstream kernel will tell us if this bug is already fixed in mainline. If it is fixed, we can then work to identify the commit that fixes the bug.

An alternative would be to perform a kernel bisect, which will identify the commit that introduced the regression. However, that will require testing about 7 - 10 test kernels.
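
As a rough illustration of where an estimate like "7 - 10 test kernels" comes from: a bisect halves the candidate range on each test, so the number of kernels to build and boot grows with the base-2 logarithm of the number of commits between the last good and first bad version. The sketch below is only that arithmetic; the commit counts are hypothetical, not taken from this bug, and a real `git bisect` may take a step more or fewer depending on merge history.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of test builds to isolate one bad commit
    among num_commits candidates, halving the range each time."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integer arithmetic only
    return (num_commits - 1).bit_length()


# Hypothetical commit counts between a good and a bad kernel
for n in (128, 512, 1024):
    print(f"{n} commits -> about {bisect_steps(n)} test kernels")
```

Under this model, 7 to 10 test kernels corresponds to a range of roughly 128 to 1024 candidate commits.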

Is it not possible to test any test kernels on the affected system?

tags: added: needs-bisect regression-update
vik (askvictor) wrote :

I'm also getting kernel panic with this kernel on Fujitsu T730. I can successfully boot 3.13.0-87.

Joseph Salisbury (jsalisbury) wrote :

@vik, would it be possible for you to test the latest upstream kernel mentioned in comment #4?

Andy Whitcroft (apw) wrote :

We look to have had a hard recurrence of this issue on the system. After much messing about we appear to have gotten some real stack traces. We see similar ones on a range of kernel versions in the 3.13.0-* series:

NULL pointer dereference at 0000000000000010
[ 15.316900] IP: [<ffffffff813700a1>] rb_next+0x1/0x50
[ 15.316910] PGD 0
[ 15.316913] Oops: 0000 [#1] SMP
[ 15.316917] Modules linked in: btrfs xor raid6_pq libcrc32c hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ixgbe igb aesni_intel aes_x86_64 lrw dca gf128mul glue_helper ablk_helper ptp cryptd ahci pps_core i2c_algo_bit libahci mdio megaraid_sas wmi
[ 15.316940] CPU: 13 PID: 883 Comm: upstart-udev-br Not tainted 3.13.0-88-generic #135-Ubuntu
[ 15.316942] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[ 15.316943] task: ffff883fcd8d1800 ti: ffff883fcd870000 task.ti: ffff883fcd870000
[ 15.316947] RIP: 0010:[<ffffffff813700a1>] [<ffffffff813700a1>] rb_next+0x1/0x50
[ 15.316948] RSP: 0018:ffff883fcd871870 EFLAGS: 00010046
[ 15.316950] RAX: 0000000000000000 RBX: ffff883fd063f600 RCX: 0000000000000000
[ 15.316951] RDX: ffff883ffecd3180 RSI: ffff883ffecd3228 RDI: 0000000000000010
[ 15.316952] RBP: ffff883fcd8718b8 R08: 0000000000000000 R09: 0000000000000001
[ 15.316953] R10: 0000000000000001 R11: 0000000000000304 R12: 0000000000000000
[ 15.316954] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 15.316956] FS: 00007f63bc298740(0000) GS:ffff883ffecc0000(0000) knlGS:0000000000000000
[ 15.316957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.316958] CR2: 0000000000000010 CR3: 0000003fd0604000 CR4: 00000000001407e0
[ 15.316959] Stack:
[ 15.316965] ffff883fcd8718b8 ffffffff810a2c72 0000000dcd8718a8 ffff883ffecd3180
[ 15.316970] ffff883fcd8d1c28 ffff883ffecd3180 000000000000000d 0000000000000007
[ 15.316974] 0000000000000007 ffff883fcd871918 ffffffff8172d712 ffff883fcd8d1800
[ 15.316975] Call Trace:
[ 15.316986] [<ffffffff810a2c72>] ? pick_next_task_fair+0x102/0x1b0
[ 15.316994] [<ffffffff8172d712>] __schedule+0x142/0x7f0
[ 15.316997] [<ffffffff8172dde9>] schedule+0x29/0x70
[ 15.317001] [<ffffffff8172d42d>] schedule_hrtimeout_range_clock+0x14d/0x170
[ 15.317007] [<ffffffff811d50df>] ? __pollwait+0x7f/0xf0
[ 15.317014] [<ffffffff8173190b>] ? _raw_spin_unlock_bh+0x1b/0x40
[ 15.317019] [<ffffffff81657cce>] ? netlink_poll+0x13e/0x1e0
[ 15.317022] [<ffffffff8172d463>] schedule_hrtimeout_range+0x13/0x20
[ 15.317026] [<ffffffff811d5199>] poll_schedule_timeout+0x49/0x70
[ 15.317029] [<ffffffff811d5b86>] do_select+0x5b6/0x780
[ 15.317037] [<ffffffff811a54ad>] ? kfree+0x11d/0x160
[ 15.317041] [<ffffffff811d5320>] ? poll_select_copy_remaining+0x130/0x130
[ 15.317044] [<ffffffff811d5320>] ? poll_select_copy_remaining+0x130/0x130
[ 15.317047] [<ffffffff811d5320>] ? poll_select_copy_remaining+0x130/0x130
[ 15.317055] [<ffffffff8115bac3>] ? __alloc_pages_nodemask+0x1a3/0xb90
[ 15.317061] [<ffffffff8161402a>] ? sock_recvmsg+0x9a/0xd0
[ 15.317070] [<ffffffff811524f3>] ? unlock_page+0x23/0x30
[ 15.317078] [<ffffffff811794b9>] ? __do_fault+0x429/0x530
[ 1...
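
For anyone comparing several of these traces across kernel versions, a small script can pull out the faulting address, the function at the instruction pointer, and the call-trace frames. This is only a sketch: the regular expressions are written against the line layout seen in the trace above, `parse_oops` and `SAMPLE` are names made up for illustration, and the sample holds just a few lines copied from that trace. Frames printed with a leading `?` are ones the kernel's stack dumper could not confirm as part of the call chain, so they are flagged as unreliable here.

```python
import re

# Patterns matched against the 3.13-era oops layout shown in this report
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x[0-9a-f]+")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_oops(text):
    """Return fault address, faulting symbol and call-trace frames from oops text."""
    fault = FAULT_RE.search(text)
    ip = IP_RE.search(text)
    frames = []
    in_trace = False
    for line in text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match:
            # (symbol, reliable): reliable is False when the frame has a '?'
            frames.append((match.group(2), match.group(1) is None))
    return {
        "fault_address": int(fault.group(1), 16) if fault else None,
        "ip_symbol": ip.group(1) if ip else None,
        "ip_offset": int(ip.group(2), 16) if ip else None,
        "frames": frames,
    }


# A few lines copied from the trace above
SAMPLE = """\
NULL pointer dereference at 0000000000000010
[ 15.316900] IP: [<ffffffff813700a1>] rb_next+0x1/0x50
[ 15.316975] Call Trace:
[ 15.316986] [<ffffffff810a2c72>] ? pick_next_task_fair+0x102/0x1b0
[ 15.316994] [<ffffffff8172d712>] __schedule+0x142/0x7f0
[ 15.316997] [<ffffffff8172dde9>] schedule+0x29/0x70
"""

result = parse_oops(SAMPLE)
print(hex(result["fault_address"]), result["ip_symbol"])
for symbol, reliable in result["frames"]:
    print(("  " if reliable else "? ") + symbol)
```

Running the same function over each captured trace makes it easier to see whether the panics share a faulting function or only a general code area such as the scheduler.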


Stefan Bader (smb) wrote :

Moving to a Xenial kernel seemed to cure some affected systems, so we assume the issue is fixed from that kernel onward and mark it Fix Released for the development series.

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Stefan Bader (smb) wrote :

There was also a case of a different panic, which looked like this:

[ 16.191583] BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
[ 16.201096] IP: [<ffffffff810a69f7>] check_preempt_wakeup+0x137/0x270
[ 16.208349] PGD 0
[ 16.211201] Oops: 0000 [#1] SMP
[ 16.214877] Modules linked in: btrfs xor raid6_pq libcrc32c crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd igb ixgbe i2c_algo_bit ahci libahci dca ptp pps_core mdio megaraid_sas wmi
[ 16.238385] CPU: 0 PID: 1 Comm: init Not tainted 3.13.0-88-generic #135-Ubuntu
[ 16.246509] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[ 16.254918] task: ffff883fd26b8000 ti: ffff881fd2aac000 task.ti: ffff881fd2aac000
[ 16.263327] RIP: 0010:[<ffffffff810a69f7>] [<ffffffff810a69f7>] check_preempt_wakeup+0x137/0x270
[ 16.273317] RSP: 0018:ffff881fd2aad9e8 EFLAGS: 00010087
[ 16.279280] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000000
[ 16.287281] RDX: 0000000000000002 RSI: ffff881fce08b000 RDI: ffff881fff413200
[ 16.295283] RBP: ffff881fd2aada20 R08: 0000000000000000 R09: 0000000000000000
[ 16.303284] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 16.311285] R13: ffff883fd26b8000 R14: ffff881fff413180 R15: 0000000000000001
[ 16.319292] FS: 00007fce8b255840(0000) GS:ffff881fff400000(0000) knlGS:0000000000000000
[ 16.328379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.334827] CR2: 0000000000000148 CR3: 0000001fd0fb7000 CR4: 00000000001407f0
[ 16.342833] Stack:
[ 16.345099] ffff881fd2aada10 ffffffff8109fde5 ffff881fff413180 ffff881fce08b000
[ 16.353487] ffff881fff413180 0000000000000001 0000000000000000 ffff881fd2aada38
[ 16.361884] ffffffff8109aa25 ffff881fce08b000 ffff881fd2aada60 ffffffff8109aa59
[ 16.370276] Call Trace:
[ 16.373045] [<ffffffff8109fde5>] ? sched_clock_cpu+0xb5/0x100
[ 16.379606] [<ffffffff8109aa25>] check_preempt_curr+0x85/0xa0
[ 16.386154] [<ffffffff8109aa59>] ttwu_do_wakeup+0x19/0xf0
[ 16.392317] [<ffffffff8109abdd>] ttwu_do_activate.constprop.75+0x5d/0x70
[ 16.399935] [<ffffffff8109d172>] try_to_wake_up+0x1d2/0x2c0
[ 16.406287] [<ffffffff8109d2b2>] default_wake_function+0x12/0x20
[ 16.413135] [<ffffffff811d5386>] pollwake+0x66/0x70
[ 16.418715] [<ffffffff8109d2a0>] ? wake_up_state+0x20/0x20
[ 16.424982] [<ffffffff810ad398>] __wake_up_common+0x58/0x90
[ 16.431335] [<ffffffff810ad945>] __wake_up_sync_key+0x45/0x60
[ 16.437890] [<ffffffff816165fa>] sock_def_readable+0x3a/0x70
[ 16.444346] [<ffffffff816c42e1>] unix_stream_sendmsg+0x1f1/0x400
[ 16.451189] [<ffffffff816139ab>] sock_sendmsg+0x8b/0xc0
[ 16.457163] [<ffffffff811524f3>] ? unlock_page+0x23/0x30
[ 16.463237] [<ffffffff8117afd2>] ? do_wp_page+0x392/0x7b0
[ 16.469399] [<ffffffff81613db9>] ___sys_sendmsg+0x389/0x3a0
[ 16.475763] [<ffffffff81735b44>] ? __do_page_fault+0x204/0x560
[ 16.482411] [<ffffffff811d7eec>] ? __dentry_kill+0x15c/0x1e0
[ 16.488866] [<ffffffff811d801d>] ? dput+0xad/0x180
[ 16.494358] [<ffffffff811e13e4>] ? mntput+0x24/0x40
[ 16.499942] [<ffffffff811c2cfe>] ? __...


Andy Whitcroft (apw) wrote :

This second trace looks related to the fix below, which was in v3.16; that would support at least that issue being fixed in the lts-x kernel:

  commit eb7a59b2c888c2518ba2c9d0020343ca71aa9dee
  Author: Michael wang <email address hidden>
  Date: Thu Feb 20 11:14:53 2014 +0800

    sched/fair: Reset se-depth when task switched to FAIR

Andy Whitcroft (apw) on 2016-07-01
Changed in linux (Ubuntu Trusty):
status: Incomplete → Triaged