Dear Sirs,
We are using this AMI for our 9 identical d2.2xlarge instances deployed in us-east-1:
ubuntu/images/hvm/ubuntu-trusty-14.04-amd64-server-20160314-2016-03-29_1459261902 (ami-93cfd9f9)
All of them are running fine, but last week we experienced two reboots for no evident reason.
We asked Amazon support for help, and they provided the following kernel panic messages, which caused the restarts:
-------
[4777272.889262] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[4777272.893249] IP: [<ffffffff81370d61>] rb_next+0x1/0x50
[4777272.893249] PGD a1d20b067 PUD 14dfb6067 PMD 0
[4777272.893249] Oops: 0000 [#1] SMP
[4777272.893249] Modules linked in: btrfs raid6_pq xor ufs msdos xfs libcrc32c iptable_filter ip_tables x_tables dm_crypt syscopyarea sysfillrect sysimgblt fb_sys_fops serio_raw isofs crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse ixgbevf floppy
[4777272.893249] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.13.0-85-generic #129-Ubuntu
[4777272.893249] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[4777272.893249] task: ffff880efc478000 ti: ffff880efc46e000 task.ti: ffff880efc46e000
[4777272.893249] RIP: 0010:[<ffffffff81370d61>] [<ffffffff81370d61>] rb_next+0x1/0x50
[4777272.893249] RSP: 0018:ffff880efc46fe20 EFLAGS: 00010046
[4777272.893249] RAX: 0000000000000000 RBX: ffff880efae26a00 RCX: 0000000000003d8a
[4777272.893249] RDX: 00000000148106c1 RSI: ffff880efae26e00 RDI: 0000000000000010
[4777272.893249] RBP: ffff880efc46fe68 R08: 000000000091c6d0 R09: 0000000000000000
[4777272.893249] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[4777272.893249] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffff4b48ce
[4777272.893249] FS: 0000000000000000(0000) GS:ffff880f4fc20000(0000) knlGS:0000000000000000
[4777272.893249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4777272.893249] CR2: 0000000000000010 CR3: 000000014dad5000 CR4: 00000000001406e0
[4777272.893249] Stack:
[4777272.893249] ffff880efc46fe68 ffffffff810a2f52 000000000000d160 ffff880f4fc33180
[4777272.893249] ffff880efc478430 ffff880f4fc33180 0000000000000001 0000000000000000
[4777272.893249] ffff880efc46ffd8 ffff880efc46fec8 ffffffff8172f48f ffff880efc478000
[4777272.893249] Call Trace:
[4777272.893249] [<ffffffff810a2f52>] ? pick_next_task_fair+0x102/0x1b0
[4777272.893249] [<ffffffff8172f48f>] __schedule+0x13f/0x7f0
[4777272.893249] [<ffffffff81730089>] schedule_preempt_disabled+0x29/0x70
[4777272.893249] [<ffffffff810c2058>] cpu_startup_entry+0x268/0x2b0
[4777272.893249] [<ffffffff8104288d>] start_secondary+0x21d/0x2d0
[4777272.893249] Code: e5 48 85 c0 75 07 eb 19 66 90 48 89 d0 48 8b 50 10 48 85 d2 75 f4 48 8b 50 08 48 85 d2 75 eb 5d c3 31 c0 5d c3 0f 1f 44 00 00 55 <48> 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb
[4777272.893249] RIP [<ffffffff81370d61>] rb_next+0x1/0x50
[4777272.893249] RSP <ffff880efc46fe20>
[4777272.893249] CR2: 0000000000000010
[4777272.893249] ---[ end trace c9f72935ef221890 ]---
[4777272.893249] Kernel panic - not syncing: Attempted to kill the idle task!
[4777272.893249] Shutting down cpus with NMI
-------
[72906.872064] IP: [<ffffffff81370721>] rb_next+0x1/0x50
[72906.872064] PGD 99f26f067 PUD ea4964067 PMD 0
[72906.872064] Oops: 0000 [#1] SMP
[72906.872064] Modules linked in: iptable_filter ip_tables x_tables dm_crypt syscopyarea sysfillrect sysimgblt fb_sys_fops serio_raw isofs xfs libcrc32c crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy ixgbevf
[72906.872064] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.13.0-86-generic #131-Ubuntu
[72906.872064] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
[72906.872064] task: ffff880efc47c800 ti: ffff880efc484000 task.ti: ffff880efc484000
[72906.872064] RIP: 0010:[<ffffffff81370721>] [<ffffffff81370721>] rb_next+0x1/0x50
[72906.872064] RSP: 0018:ffff880efc485e20 EFLAGS: 00010046
[72906.872064] RAX: 0000000000000000 RBX: ffff880eefdb8600 RCX: 0000000000005bfc
[72906.872064] RDX: 000000000309e7aa RSI: ffff880eefdb9600 RDI: 0000000000000010
[72906.872064] RBP: ffff880efc485e68 R08: 000000000002398c R09: 0000000000000000
[72906.872064] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000000
[72906.872064] R13: 0000000000000000 R14: 0000000000000000 R15: fffffffffff02262
[72906.872064] FS: 0000000000000000(0000) GS:ffff880f4fc80000(0000) knlGS:0000000000000000
[72906.872064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[72906.872064] CR2: 0000000000000010 CR3: 0000000b0488a000 CR4: 00000000001406e0
[72906.872064] Stack:
[72906.872064] ffff880efc485e68 ffffffff810a2f32 000000000000d160 ffff880f4fc93180
[72906.872064] ffff880efc47cc30 ffff880f4fc93180 0000000000000004 0000000000000000
[72906.872064] ffff880efc485fd8 ffff880efc485ec8 ffffffff8172e17f ffff880efc47c800
[72906.872064] Call Trace:
[72906.872064] [<ffffffff810a2f32>] ? pick_next_task_fair+0x102/0x1b0
[72906.872064] [<ffffffff8172e17f>] __schedule+0x13f/0x7f0
[72906.872064] [<ffffffff8172ed79>] schedule_preempt_disabled+0x29/0x70
[72906.872064] [<ffffffff810c2008>] cpu_startup_entry+0x268/0x2b0
[72906.872064] [<ffffffff810428bd>] start_secondary+0x21d/0x2d0
[72906.872064] Code: e5 48 85 c0 75 07 eb 19 66 90 48 89 d0 48 8b 50 10 48 85 d2 75 f4 48 8b 50 08 48 85 d2 75 eb 5d c3 31 c0 5d c3 0f 1f 44 00 00 55 <48> 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb
[72906.872064] RIP [<ffffffff81370721>] rb_next+0x1/0x50
[72906.872064] RSP <ffff880efc485e20>
[72906.872064] CR2: 0000000000000010
[72906.872064] ---[ end trace e35916ef54f0c31a ]---
[72906.872064] Kernel panic - not syncing: Attempted to kill the idle task!
[72906.872064] Shutting down cpus with NMI
-------
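To compare the two panics side by side, a short script along these lines can pull the CPU, kernel version, faulting symbol and fault address out of each trace. This is only a sketch: the helper name summarize_oops and the regular expressions are our own assumptions about the log layout shown above, not an official tool, and they have only been matched against lines like the ones in this report.

```python
import re

# Assumed patterns, written for the oops layout seen in this report only.
# "CPU: <n> ... <kernel version> #<build>" line.
CPU_KERNEL_RE = re.compile(r"CPU: (\d+) .*?(\d+\.\d+\.\d+-\d+-\w+) #")
# "IP: [<address>] symbol+offset/size" line (the \b skips the "RIP:" lines).
IP_RE = re.compile(r"\bIP: \[<([0-9a-f]+)>\] (\w+)\+(0x[0-9a-f]+)/")
# "CR2: <faulting address>" register line.
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16})")


def summarize_oops(text):
    """Return the key fields of one oops trace (hypothetical helper)."""
    cpu = CPU_KERNEL_RE.search(text)
    ip = IP_RE.search(text)
    cr2 = CR2_RE.search(text)
    return {
        "cpu": int(cpu.group(1)) if cpu else None,
        "kernel": cpu.group(2) if cpu else None,
        "symbol": ip.group(2) if ip else None,
        "offset": ip.group(3) if ip else None,
        "fault_address": int(cr2.group(1), 16) if cr2 else None,
    }


# A few lines taken from the first trace in this report.
SAMPLE = """\
[4777272.889262] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[4777272.893249] IP: [<ffffffff81370d61>] rb_next+0x1/0x50
[4777272.893249] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.13.0-85-generic #129-Ubuntu
[4777272.893249] CR2: 0000000000000010 CR3: 000000014dad5000 CR4: 00000000001406e0
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Both traces report the same symbol and fault address (rb_next+0x1, CR2 0x10); they differ in CPU (1 vs 4) and kernel build (3.13.0-85-generic vs 3.13.0-86-generic).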
The instances run the AMI mentioned above, updated with aptitude safe-upgrade.
Thank you for any help,
Ivan