Ubuntu 16.04 4.8.0 kernel crashing on EC2 instances at boot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
After switching to the linux-hwe kernel on 16.04.2, we started observing kernel crashes at boot on some of our EC2 instances (so far, it only seems to happen on the newer M4 types). The instance becomes unresponsive when this happens. It looks like a RAPL issue; as a workaround, we have blacklisted the intel_rapl and intel_rapl_perf modules for now. Here is the trace:
general protection fault: 0000 [#1] SMP
Modules linked in: intel_rapl_perf(+) i2c_piix4 input_leds parport_pc serio_raw mac_hid parport sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
c raid1 raid0 multipath linear cirrus crct10dif_pclmul ttm crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt aesni_intel fb_sys_fops aes_x86_64 lrw glue_helper ablk_helper cryptd drm ixgbevf psmouse pata_acpi floppy fjes
CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 4.8.0-39-generic #42~16.04.1-Ubuntu
Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
task: ffff8bee465a1d80 task.stack: ffff8bee465ac000
RIP: 0010:[<
RSP: 0000:ffff8bee46
RAX: 0000000000000200 RBX: ffffffffc0728730 RCX: 0000000000000000
RDX: 0000000000000200 RSI: 0000000000000200 RDI: 0000000000000200
RBP: ffff8bee465afe30 R08: 0000000000000000 R09: 0000000000000001
R10: ffff8bee45ec2600 R11: ffff8bec41fbce00 R12: 6401b4899ff8202c
R13: 0000000000000002 R14: ffff8bee4fc0daa0 R15: 0000000000000000
FS: 000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563b0d9f9dc8 CR3: 000000020608e000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffffffffc0728730 0000000000000002 000000000000004e ffff8bee465afe70
ffffffff95883d86 ffff8bee4fc0daa0 ffff8bee4fc0daa0 0000000000000002
ffffffff9663df60 ffff8bee464b85f0 ffff8bec47c19300 ffff8bee465afe90
Call Trace:
[<ffffffffc072
[<ffffffff9588
[<ffffffff9588
[<ffffffff958a
[<ffffffff958a
[<ffffffff958a
[<ffffffff9609
[<ffffffff958a
Code: 23 00 00 4c 8b a4 ca 10 01 00 00 48 c7 c2 80 a0 00 00 48 01 c2 e8 6e 56 50 d5 3b 05 fc 67 03 d6 7c 0e f0 4c 0f ab 2d 4d 23 00 00 <45> 89 6c 24 08 5b 31 c0 41 5c 41 5d 5d c3 0f 1f 44 00 00 55 48
RIP [<ffffffffc0728
RSP <ffff8bee465afe18>
---[ end trace cd71880c1b07dfa5 ]---
BUG: unable to handle kernel paging request at 000000007957b4e8
IP: [<ffffffff958c6
PGD 0
Oops: 0000 [#2] SMP
Modules linked in: intel_rapl_perf(+) i2c_piix4 input_leds parport_pc serio_raw mac_hid parport sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
c raid1 raid0 multipath linear cirrus crct10dif_pclmul ttm crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt aesni_intel fb_sys_fops aes_x86_64 lrw glue_helper ablk_helper cryptd drm ixgbevf psmouse pata_acpi floppy fjes
CPU: 2 PID: 20 Comm: cpuhp/2 Tainted: G D 4.8.0-39-generic #42~16.04.1-Ubuntu
Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
task: ffff8bee465a1d80 task.stack: ffff8bee465ac000
RIP: 0010:[<
RSP: 0000:ffff8bee46
RAX: 0000000000000282 RBX: ffff8bee465aff10 RCX: 0000000000000000
RDX: 000000007957b4e8 RSI: 0000000000000003 RDI: ffff8bee465aff10
RBP: ffff8bee465afe70 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8bee45ec2600 R11: 000000000000022f R12: ffff8bee465aff18
R13: 0000000000000282 R14: 0000000000000000 R15: 0000000000000003
FS: 000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000007957b4e8 CR3: 000000005fc06000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
00000001465a1d80 0000000000000000 ffff8bee465aff10 ffff8bee465aff08
0000000000000282 0000000000000000 0000000000000000 ffff8bee465afe80
ffffffff958c6e43 ffff8bee465afea8 ffffffff958c78c7 ffff8bee465a24d8
Call Trace:
[<ffffffff958c
[<ffffffff958c
[<ffffffff9588
[<ffffffff9588
[<ffffffff9609
[<ffffffff958a
Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 4c 8d 67 08 53 41 89 f7 48 83 ec 10 89 55 cc 48 8b 57 08 4c 89 45 d0 49 39 d4 <48> 8b 32 74 45 41 89 ce 48 8d 42 e8 4c 8d 6e e8 eb 03 49 89 d5
RIP [<ffffffff958c6
RSP <ffff8bee465afe38>
CR2: 000000007957b4e8
---[ end trace cd71880c1b07dfa6 ]---
Fixing recursive fault but reboot is needed!
# uname -a
Linux ip-10-50-244-48 4.8.0-39-generic #42~16.04.1-Ubuntu SMP Mon Feb 20 15:06:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# lsb_release -rd
Description: Ubuntu 16.04.2 LTS
Release: 16.04
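For anyone wanting to reproduce the workaround mentioned in the description (blacklisting intel_rapl and intel_rapl_perf), here is a minimal sketch. It assumes the standard modprobe.d mechanism is what was used; the report does not say exactly how the modules were blacklisted. The helper name write_rapl_blacklist and the file name rapl-blacklist.conf are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): write a modprobe
# blacklist file for the two RAPL modules named in the bug description.
# The target directory defaults to /etc/modprobe.d; the file name
# rapl-blacklist.conf is an assumption.
write_rapl_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"
    conf_file="$conf_dir/rapl-blacklist.conf"
    # printf reuses the format string for each argument, so this emits
    # one "blacklist <module>" line per module.
    printf 'blacklist %s\n' intel_rapl intel_rapl_perf > "$conf_file"
    # Print the path that was written so the caller can inspect it.
    echo "$conf_file"
}
```

Run as root, calling write_rapl_blacklist with no argument writes to /etc/modprobe.d. Since the crash happens during boot, it is probably also worth regenerating the initramfs afterwards (update-initramfs -u on Ubuntu) and rebooting, then checking with lsmod that neither module is loaded.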
affects: linux-hwe (Ubuntu) → linux (Ubuntu)
Status changed to 'Confirmed' because the bug affects multiple users.