3.13.0-155.205 Kernel Panic - divide by zero

Bug #1787258 reported by Ryan Smith
This bug affects 6 people
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid        Critical     Unassigned
Trusty           Fix Released   Critical     Tyler Hicks

Bug Description

[Impact]

Booting the 3.13.0-155.205 generic kernel on an m3 AWS EC2 instance results in a kernel panic during boot.

[Test Case]

Boot with the 3.13.0-155.205 kernel on an m3 instance and verify that it panics on boot.

Boot a patched kernel on an m3 instance and verify that it boots, without a panic, and that the following warning is present in the kernel logs:

  smpboot: x86_max_cores == zero !?!?
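
The two checks above can be scripted against a captured console log. This is a minimal sketch, assuming the log has already been saved as text; the function name and the three result labels are illustrative and not part of any Ubuntu tooling. The marker strings are taken from the log output quoted in this report.

```python
# Strings copied from the kernel output quoted in this bug report.
WARNING = "smpboot: x86_max_cores == zero !?!?"
PANIC_MARKERS = ("divide error:", "Kernel panic - not syncing")


def classify_boot_log(log_text: str) -> str:
    """Classify a captured console log as 'panic', 'patched', or 'unknown'.

    'panic'   : the log contains one of the panic markers.
    'patched' : no panic marker, and the expected warning is present.
    'unknown' : neither was found (for example, an unaffected instance type).
    """
    if any(marker in log_text for marker in PANIC_MARKERS):
        return "panic"
    if WARNING in log_text:
        return "patched"
    return "unknown"


if __name__ == "__main__":
    sample = "[ 0.156060] smpboot: x86_max_cores == zero !?!?"
    print(classify_boot_log(sample))
```

A log that shows neither marker is reported as 'unknown' rather than as a pass, since the test case requires the warning to be present.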

[Regression Potential]

The only potential for regressions is in systems that panic while booting.

[Original Report]

We have updated our 14.04 AWS EC2 instances from 3.13.0-153.204 to 3.13.0-155.205, and upon reboot they all kernel panic. Full log attached.

[ 0.064081] FEATURE SPEC_CTRL Not Present
[ 0.068730] mce: CPU supports 2 MCE banks
[ 0.072027] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0
[ 0.072027] Last level dTLB entries: 4KB 512, 2MB 0, 4MB 0
[ 0.080004] Spectre V2 mitigation: Mitigation: Full generic retpoline
[ 0.084004] Spectre V2 mitigation: Speculation control IBPB not-supported IBRS not-supported
[ 0.088005] Speculative Store Bypass: Vulnerable
[ 0.092402] Freeing SMP alternatives memory: 32K (ffffffff81e7a000 - ffffffff81e82000)
[ 0.104581] ACPI: Core revision 20131115
[ 0.111088] ACPI: All ACPI Tables successfully acquired
[ 0.114991] ftrace: allocating 28746 entries in 113 pages
[ 0.160066] divide error: 0000 [#1] SMP
[ 0.163922] Modules linked in:
[ 0.164000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-155-generic #205-Ubuntu
[ 0.164000] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 0.164000] task: ffff8801e4828000 ti: ffff8801e4826000 task.ti: ffff8801e4826000
[ 0.164000] RIP: 0010:[<ffffffff81d4b9f2>] [<ffffffff81d4b9f2>] smp_store_boot_cpu_info+0x58/0x191
[ 0.164000] RSP: 0000:ffff8801e4827e98 EFLAGS: 00010286
[ 0.164000] RAX: 000000000000000e RBX: ffffffff81d18980 RCX: 0000000000000000
[ 0.164000] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: ffff8801efc13380
[ 0.164000] RBP: ffff8801e4827ec0 R08: ffffffff81d18988 R09: 0000000000000004
[ 0.164000] R10: ffffffff8180b6c0 R11: 0001f8ecf7bca282 R12: 0000000000013280
[ 0.164000] R13: 00000000ffffffff R14: 0000000000000100 R15: 000000000000d088
[ 0.164000] FS: 0000000000000000(0000) GS:ffff8801efc00000(0000) knlGS:0000000000000000
[ 0.164000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.164000] CR2: ffff8801effff000 CR3: 0000000001c0e000 CR4: 0000000000160670
[ 0.164000] Stack:
[ 0.164000] ffffffff81d18980 0000000000013280 0000000000000246 0000000000000100
[ 0.164000] 0000000000000000 ffff8801e4827ef0 ffffffff81d4bb82 ffffffff81e5df18
[ 0.164000] ffff8801e4828650 0000000000000246 0000000000000001 ffff8801e4827f00
[ 0.164000] Call Trace:
[ 0.164000] [<ffffffff81d4bb82>] native_smp_prepare_cpus+0x57/0x3e0
[ 0.164000] [<ffffffff81d404e1>] xen_hvm_smp_prepare_cpus+0x9/0x2e
[ 0.164000] [<ffffffff81d3a01b>] kernel_init_freeable+0xa7/0x1eb
[ 0.164000] [<ffffffff81727500>] ? rest_init+0x80/0x80
[ 0.164000] [<ffffffff8172750e>] kernel_init+0xe/0x130
[ 0.164000] [<ffffffff8174a88e>] ret_from_fork+0x6e/0xa0
[ 0.164000] [<ffffffff81727500>] ? rest_init+0x80/0x80
[ 0.164000] Code: 48 89 c7 41 83 cd ff 41 54 53 f3 a5 66 c7 80 da 00 00 00 00 00 be d0 00 00 00 0f b7 0d 20 a9 fc ff 8b 05 e2 c5 28 00 8d 44 01 ff <f7> f1 31 d2 89 05 c8 c2 fc ff 8d 81 ff 7f 00 00 f7 f1 89 c3 89
[ 0.164000] RIP [<ffffffff81d4b9f2>] smp_store_boot_cpu_info+0x58/0x191
[ 0.164000] RSP <ffff8801e4827e98>
[ 0.324006] ---[ end trace 8671c9f8a4dc811d ]---
[ 0.328017] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.328017]
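
The Code: line of the oops marks the faulting byte with angle brackets, so the failing instruction can be decoded by hand. The sketch below does that for the one case needed here: a register-direct instruction from the x86 F7 opcode group with no prefix bytes. The function names are illustrative, and this is not a general disassembler.

```python
import re
from typing import List

# 32-bit register names indexed by the ModRM r/m field.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# Operations selected by the ModRM reg field for opcode 0xF7.
F7_GROUP = {0: "test", 2: "not", 3: "neg", 4: "mul", 5: "imul", 6: "div", 7: "idiv"}


def faulting_bytes(code_line: str) -> List[int]:
    """Return the bytes from the faulting one (written <xx> in the oops) onward."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if re.fullmatch(r"<[0-9a-f]{2}>", t))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_f7_register_form(opcode: int, modrm: int) -> str:
    """Decode 'F7 /r' when ModRM selects a register operand (mod == 0b11)."""
    if opcode != 0xF7:
        raise ValueError("only opcode 0xF7 is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("only register-direct operands are handled")
    if reg not in F7_GROUP:
        raise ValueError("unhandled F7 sub-opcode")
    return f"{F7_GROUP[reg]} {REG32[rm]}"


if __name__ == "__main__":
    # Excerpt of the Code: line from the trace above.
    excerpt = "Code: 8d 44 01 ff <f7> f1 31 d2"
    opcode, modrm = faulting_bytes(excerpt)[:2]
    print(decode_f7_register_form(opcode, modrm))
```

For the bytes `<f7> f1` this gives `div ecx`, and the register dump shows RCX as 0000000000000000, which matches the reported divide error.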

Tags: cscc trusty
Ryan Smith (homebrewsky) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1787258

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Ryan Smith (homebrewsky) wrote :

I am unable to run 'apport-collect' due to the kernel panic.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Ryan Smith (homebrewsky)
description: updated
Matt Wilson (msw-amazon) wrote :

What instance type saw this kernel panic?

Dave Compton (sircompo) wrote :

One of my servers died with a Kernel Panic on reboot after the update to 3.13.0-155-generic.
It was an m3.large on AMI ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 (ami-1711732d).

Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu Trusty):
importance: Undecided → Critical
status: New → Confirmed
Ryan Smith (homebrewsky) wrote :

m3.large for me as well.

Tyler Hicks (tyhicks)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Tyler Hicks (tyhicks)
Tyler Hicks (tyhicks) wrote :

In comparing the Xenial and Trusty backports for L1TF, I noticed that Trusty is missing this patch:

  https://git.kernel.org/linus/56402d63eefe22179f7311a51ff2094731420406

I've cherry-picked the commit and built a test kernel:

  https://people.canonical.com/~tyhicks/lp1787258.1/

Please give it a shot (I will myself, shortly) and report back results. Thanks!
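
As an illustration of the kind of guard the test kernel is expected to provide, here is a minimal sketch in Python rather than kernel C. The thread only establishes that a patched kernel logs "smpboot: x86_max_cores == zero !?!?" and boots. The fallback value of 1, the round-up division, and all function names below are assumptions made for the example, not the contents of the cherry-picked commit.

```python
import logging

log = logging.getLogger("smpboot")


def effective_max_cores(x86_max_cores: int) -> int:
    """Return a divisor that is safe to use even if the reported core count is 0."""
    if x86_max_cores == 0:
        # Same warning text that the patched kernel prints in this report.
        log.warning("x86_max_cores == zero !?!?")
        return 1  # assumed fallback; the thread does not state the value used
    return x86_max_cores


def div_round_up(n: int, d: int) -> int:
    """Integer division rounded up; raises ZeroDivisionError when d == 0."""
    return (n + d - 1) // d


def packages_for(nr_cpus: int, x86_max_cores: int) -> int:
    """Hypothetical topology calculation that divides by the core count."""
    return div_round_up(nr_cpus, effective_max_cores(x86_max_cores))


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(packages_for(2, 0))  # warns, then divides by the fallback value
```

Without the guard, `div_round_up(2, 0)` raises ZeroDivisionError, which is the Python analogue of the divide error in the trace.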

Dave Compton (sircompo) wrote :

Boots OK after changing instance type to m4.large.
Poorly tested Spectre fix?

Ryan Smith (homebrewsky) wrote :

I feel this bug is not a duplicate of #1787127. Bug #1787127 is not a kernel panic due to division by zero.

Tyler Hicks (tyhicks) wrote :

Agreed, it is not a dupe of bug #1787127.

Robert C Jennings (rcj) wrote :

@tyhicks, I've tested your kernel from comment #7.

1. Launch 2 VMs in us-west-2 with ami-4218403a (20180722, the serial prior to the latest)
2. Upgrade the first VM to the kernel in -updates, reboot, and observe the panic in the console log
3. On the 2nd VM, install the linux-image and linux-headers packages from the link in comment #7 and reboot: SUCCESS
  * Observed "[ 0.156060] smpboot: x86_max_cores == zero !?!?" in dmesg
  * I rebooted a few times just to satisfy myself.
4. Ensure this VM does panic by removing tyhicks' kernel, upgrading the stock kernel, and rebooting. VM console shows panic.

Tyler Hicks (tyhicks)
description: updated
description: updated
mig5 (mig5) wrote :

This affected me on 3 t2.medium EC2 instances in eu-west-1.

A fourth machine, updated the previous day to the same kernel (3.13.0-155.205) and also running Ubuntu 14.04, was fine and somehow unaffected.

Stefan Bader (smb)
Changed in linux (Ubuntu Trusty):
status: Confirmed → Fix Committed
Ryan Smith (homebrewsky) wrote :

Any ETA on releasing the fix into the wild?

information type: Public → Public Security
information type: Public Security → Public
Brad Figg (brad-figg) wrote :

We're actively working on this problem. I believe a fixed kernel will be out tomorrow.

Ryan Smith (homebrewsky) wrote :

Great! Thanks for the quick fix!

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Ryan Smith (homebrewsky) wrote :

I checked https://packages.ubuntu.com/trusty-updates/kernel/ and don't see a new kernel image released yet. Any ETA on it being available?

Brad Figg (brad-figg) wrote :

We have just released a Trusty kernel (3.13.0-156.206) which should address this issue.
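
For anyone auditing a fleet, the version strings in this thread can be compared programmatically. This is a minimal sketch, assuming versions are written in the "3.13.0-155.205" form used in this report (not the "3.13.0-155-generic" form printed by uname). The helper names are illustrative.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.13.0-156.206' into (3, 13, 0, 156, 206) for ordered comparison."""
    base, abi = version.split("-", 1)
    return tuple(int(part) for part in base.split(".") + abi.split("."))


# Versions named in this bug report.
PANICKING = parse_version("3.13.0-155.205")
FIXED = parse_version("3.13.0-156.206")


def is_panicking_release(installed: str) -> bool:
    """True only for the release reported to panic on affected instances."""
    return parse_version(installed) == PANICKING


def has_fix(installed: str) -> bool:
    """True for the fixed release or anything newer in the same series."""
    return parse_version(installed) >= FIXED


if __name__ == "__main__":
    for v in ("3.13.0-153.204", "3.13.0-155.205", "3.13.0-156.206"):
        print(v, is_panicking_release(v), has_fix(v))
```

Note that 3.13.0-153.204 is neither the panicking release nor a fixed one: it predates the regression, so both helpers return False for it.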

Pascal Ouellet (pas.ouellet) wrote :

Tested on a few AWS instances that wouldn't boot on 3.13.0-155.205 and they are now back to normal after upgrading to 3.13.0-156.206.

Thanks for the quick fix!

Tyler Hicks (tyhicks)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Brad Figg (brad-figg)
tags: added: cscc