Activity log for bug #1687512

Date Who What changed Old value New value Message
2017-05-02 01:01:32 Daniel Axtens bug added bug
2017-05-02 15:38:40 Joseph Salisbury tags kernel-da-key
2017-05-02 15:38:51 Joseph Salisbury linux (Ubuntu): importance Undecided High
2017-05-02 15:39:02 Joseph Salisbury nominated for series Ubuntu Xenial
2017-05-02 15:39:02 Joseph Salisbury bug task added linux (Ubuntu Xenial)
2017-05-02 15:39:13 Joseph Salisbury linux (Ubuntu Xenial): status New Triaged
2017-05-02 15:39:16 Joseph Salisbury linux (Ubuntu Xenial): importance Undecided High
2017-05-02 15:39:25 Joseph Salisbury linux (Ubuntu): status Confirmed Triaged
2017-05-03 23:49:02 Daniel Axtens description We see a number of kernel panics on servers running Apache Mesos using cgroups with small (0.1-0.2) cpu limits. These all appear as NULL pointer dereferences in and around pick_next_entity and pick_next_task_fair, for example: [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0 [24334.512806] Oops: 0000 [#1] SMP [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [24334.594188] task: ffff8803ee671c00 ti: ffff8803ee67c000 task.ti: ffff8803ee67c000 [24334.601799] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [24334.610490] RSP: 0018:ffff8803ee67fdd8 EFLAGS: 00010086 [24334.615924] RAX: ffff8803ebed4c00 RBX: ffff880036529800 RCX: 0000000000000000 [24334.623190] RDX: 000000000225341f RSI: 0000000000000000 RDI: 0000000000000000 [24334.630479] RBP: ffff8803ee67fe00 R08: 0000000000000004 R09: 0000000000000000 [24334.637758] R10: ffff8803e7ed7600 R11: 0000000000000001 R12: 0000000000000000 [24334.645153] R13: 0000000000000000 R14: 00000009067729c4 R15: ffff8803ee672178 [24334.652512] FS: 0000000000000000(0000) 
GS:ffff8803ffd00000(0000) knlGS:0000000000000000 [24334.660721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0 [24334.673851] Stack: [24334.675980] ffff8803ffd16e00 ffff8803ffd16e00 ffff8803e855a200 ffff880036529800 [24334.683995] 0000000000000002 ffff8803ee67fe68 ffffffff810b98a6 ffff8803ffd16e70 [24334.692024] 0000000000016e00 ffff8803e7ed7600 ffff8803ee671c00 0000000000000000 [24334.700172] Call Trace: [24334.702750] [<ffffffff810b98a6>] pick_next_task_fair+0x66/0x4b0 [24334.708886] [<ffffffff818043c4>] __schedule+0x7f4/0x980 [24334.714349] [<ffffffff81804585>] schedule+0x35/0x80 [24334.719445] [<ffffffff8180481e>] schedule_preempt_disabled+0xe/0x10 [24334.725962] [<ffffffff810bf9fa>] cpu_startup_entry+0x18a/0x350 [24334.732012] [<ffffffff8104f3d9>] start_secondary+0x149/0x170 [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6 [24334.765124] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [24334.771473] RSP <ffff8803ee67fdd8> [24334.775077] CR2: 0000000000000050 [24334.779121] ---[ end trace 05d941efb97b7bae ]--- and [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [155852.036931] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0 [155852.048550] Oops: 0000 [#1] SMP [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [155852.127661] task: ffff8803ed29aa00 ti: ffff8800bbb10000 task.ti: ffff8800bbb10000 [155852.135347] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [155852.144120] RSP: 0018:ffff8800bbb13ce0 EFLAGS: 00010086 [155852.149631] RAX: ffff8801725b5c00 RBX: ffff8800bb777600 RCX: ffff8800bb777400 [155852.156970] RDX: ffff8803ffc96e70 RSI: 0000000000000000 RDI: 0000000000000000 [155852.164384] RBP: ffff8800bbb13d08 R08: ffff8803eb92e800 R09: ffff8803ed29aa00 [155852.171718] R10: 0000000000000001 R11: 00000000000003cb R12: 0000000000000000 [155852.179052] R13: 0000000000000000 R14: 000009ad6846ff10 R15: 0000000000000001 [155852.186387] FS: 00007f387d1c9700(0000) GS:ffff8803ffc80000(0000) knlGS:0000000000000000 [155852.194677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 00000000001406e0 [155852.207967] Stack: [155852.210180] ffffffff810369c9 ffff8803ffc96e00 ffff8800bb777600 0000000000000000 [155852.218278] 00000000000012a4 ffff8800bbb13d70 ffffffff810b9b65 ffff8803ffc96e70 [155852.226402] 0000000000016e00 00008dbf20ccb260 ffff8803ed29aa00 0000000000000001 [155852.234506] Call Trace: [155852.237156] [<ffffffff810369c9>] ? sched_clock+0x9/0x10 [155852.242673] [<ffffffff810b9b65>] pick_next_task_fair+0x325/0x4b0 [155852.248968] [<ffffffff81803cd9>] __schedule+0x109/0x980 [155852.254491] [<ffffffff81804585>] schedule+0x35/0x80 [155852.259667] [<ffffffff8180727c>] schedule_hrtimeout_range_clock+0xac/0x130 [155852.266838] [<ffffffff810e9fb0>] ? hrtimer_init+0x180/0x180 [155852.272712] [<ffffffff81807270>] ? 
schedule_hrtimeout_range_clock+0xa0/0x130 [155852.280052] [<ffffffff81807313>] schedule_hrtimeout_range+0x13/0x20 [155852.288558] [<ffffffff812479b9>] ep_poll+0x249/0x310 [155852.293817] [<ffffffff810a8c30>] ? wake_up_q+0x80/0x80 [155852.299271] [<ffffffff81248efc>] SyS_epoll_wait+0xbc/0xe0 [155852.304967] [<ffffffff81807df6>] entry_SYSCALL_64_fastpath+0x16/0x75 [155852.311618] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6 [155852.338852] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [155852.345270] RSP <ffff8800bbb13ce0> [155852.348958] CR2: 0000000000000050 [155852.353086] ---[ end trace 8ce693b2314611c4 ]---

Similar issues have been reported in the community for kernels based on 4.4: https://github.com/kubernetes/kops/issues/874

These panics occur in the CFS code when a next buddy is set on an entity that is not on a run-queue. This causes pick_next_entity to end up with curr == left == NULL, so it calls wakeup_preempt_entity() with a valid next buddy and a NULL left; the NULL left is then dereferenced, causing the panic.

This was confirmed by placing a WARN_ON_ONCE in set_next_buddy to catch when a sched_entity in the hierarchy was not on_rq, as per https://marc.info/?l=linux-kernel&m=146651668921468&w=2 The stack-trace for the WARN is quite involved: Apr 25 14:14:48 (none) kernel: [ 5339.764597] ------------[ cut here ]------------ Apr 25 14:14:48 (none) kernel: [ 5339.764606] WARNING: CPU: 1 PID: 13121 at /build/linux-PwPelj/linux-4.4.0/kernel/sched/fair.c:5170 set_next_buddy+0x55/0x70() Apr 25 14:14:48 (none) kernel: [ 5339.764608] Modules linked in: xt_nat xt_tcpudp ipvlan ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs dm_crypt lockd grace sunrpc fscache ppdev input_leds serio_raw parport_pc 8250_fintek parport pvpanic mac_hid i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi Apr 25 14:14:48 (none) kernel: [ 5339.764644] CPU: 1 PID: 13121 Comm: executor Not tainted 4.4.0-72-generic #93+hf135461v20170420b2-Ubuntu Apr 25 14:14:48 (none) kernel: [ 5339.764646] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Apr 25 14:14:48 (none) kernel: [ 5339.764647] 0000000000000086 00000000d5fbe9e0 ffff8803ed947608 ffffffff813f83c3 Apr 25 14:14:48 (none) kernel: [ 5339.764650] 0000000000000000 ffffffff81cbae20 ffff8803ed947640 ffffffff81081302 Apr 25 14:14:48 (none) kernel: [ 5339.764652] ffff8800bb5fc800 ffff8803e7c9f000 0000000000000008 ffff8800ba1bd400 Apr 25 14:14:48 (none) kernel: [ 5339.764655] Call Trace: Apr 25 14:14:48 (none) kernel: [ 5339.764665] [<ffffffff813f83c3>] dump_stack+0x63/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764669] [<ffffffff81081302>] warn_slowpath_common+0x82/0xc0 Apr 25 14:14:48 (none) kernel: [ 5339.764672] 
[<ffffffff8108144a>] warn_slowpath_null+0x1a/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764674] [<ffffffff810b52b5>] set_next_buddy+0x55/0x70 Apr 25 14:14:48 (none) kernel: [ 5339.764676] [<ffffffff810b59a4>] check_preempt_wakeup+0x244/0x250 Apr 25 14:14:48 (none) kernel: [ 5339.764679] [<ffffffff810ab580>] check_preempt_curr+0x80/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764682] [<ffffffff810b42eb>] attach_task+0x4b/0x60 Apr 25 14:14:48 (none) kernel: [ 5339.764685] [<ffffffff810be067>] load_balance+0x5b7/0x980 Apr 25 14:14:48 (none) kernel: [ 5339.764688] [<ffffffff810be6e1>] pick_next_task_fair+0x2b1/0x4f0 Apr 25 14:14:48 (none) kernel: [ 5339.764692] [<ffffffff81837c5f>] __schedule+0x15f/0xa30 Apr 25 14:14:48 (none) kernel: [ 5339.764694] [<ffffffff81838565>] schedule+0x35/0x80 Apr 25 14:14:48 (none) kernel: [ 5339.764697] [<ffffffff8183ba85>] schedule_hrtimeout_range_clock+0xc5/0x1b0 Apr 25 14:14:48 (none) kernel: [ 5339.764700] [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764703] [<ffffffff8183ba79>] ? schedule_hrtimeout_range_clock+0xb9/0x1b0 Apr 25 14:14:48 (none) kernel: [ 5339.764705] [<ffffffff8183bb83>] schedule_hrtimeout_range+0x13/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764709] [<ffffffff81223914>] poll_schedule_timeout+0x44/0x70 Apr 25 14:14:48 (none) kernel: [ 5339.764711] [<ffffffff81224407>] do_select+0x727/0x810 Apr 25 14:14:48 (none) kernel: [ 5339.764715] [<ffffffff811fb932>] ? page_counter_uncharge+0x22/0x40 Apr 25 14:14:48 (none) kernel: [ 5339.764718] [<ffffffff811fdb1c>] ? drain_stock.isra.33+0x6c/0xa0 Apr 25 14:14:48 (none) kernel: [ 5339.764720] [<ffffffff810b5349>] ? update_curr+0x79/0x160 Apr 25 14:14:48 (none) kernel: [ 5339.764722] [<ffffffff810b550c>] ? update_cfs_shares+0xbc/0x100 Apr 25 14:14:48 (none) kernel: [ 5339.764724] [<ffffffff810b742b>] ? dequeue_entity+0x41b/0xa80 Apr 25 14:14:48 (none) kernel: [ 5339.764729] [<ffffffff810719f7>] ? 
gup_pud_range+0x127/0x220 Apr 25 14:14:48 (none) kernel: [ 5339.764731] [<ffffffff810baa9c>] ? set_next_entity+0x9c/0xb0 Apr 25 14:14:48 (none) kernel: [ 5339.764736] [<ffffffff8102d66c>] ? __switch_to+0x1dc/0x5c0 Apr 25 14:14:48 (none) kernel: [ 5339.764740] [<ffffffff81401304>] ? timerqueue_del+0x24/0x70 Apr 25 14:14:48 (none) kernel: [ 5339.764742] [<ffffffff810efa3c>] ? __remove_hrtimer+0x3c/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764744] [<ffffffff810efb61>] ? hrtimer_try_to_cancel+0xd1/0x130 Apr 25 14:14:48 (none) kernel: [ 5339.764746] [<ffffffff810efbd9>] ? hrtimer_cancel+0x19/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764751] [<ffffffff81101166>] ? futex_wait+0x206/0x280 Apr 25 14:14:48 (none) kernel: [ 5339.764753] [<ffffffff810ab5a9>] ? ttwu_do_wakeup+0x19/0xe0 Apr 25 14:14:48 (none) kernel: [ 5339.764756] [<ffffffff812246bf>] core_sys_select+0x1cf/0x2f0 Apr 25 14:14:48 (none) kernel: [ 5339.764758] [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764762] [<ffffffff81128447>] ? audit_filter_rules+0x217/0xe30 Apr 25 14:14:48 (none) kernel: [ 5339.764764] [<ffffffff81103860>] ? do_futex+0x120/0x540 Apr 25 14:14:48 (none) kernel: [ 5339.764768] [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764772] [<ffffffff810f53aa>] ? ktime_get_ts64+0x4a/0xf0 Apr 25 14:14:48 (none) kernel: [ 5339.764774] [<ffffffff8122489a>] SyS_select+0xba/0x110 Apr 25 14:14:48 (none) kernel: [ 5339.764777] [<ffffffff8183c672>] entry_SYSCALL_64_fastpath+0x16/0x71 Apr 25 14:14:48 (none) kernel: [ 5339.764779] ---[ end trace ace97b626b47e1f9 ]--- Cherry-picking both 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7 (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz) and 094f469172e00d6ab0a3130b0e01c83b3cf3a98d (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz) fix the crash. They appear to be intended as a series. 
We are just waiting on final confirmation that the fix works before beginning the SRU process.

SRU Justification
-----------------

[Impact]
Apache Mesos and Kubernetes workloads on Xenial cause a panic (NULL pointer dereference) in the completely fair scheduler. These panics are in pick_next_entity and include pick_next_task_fair in the call stack.

[Fix]
Cherry-picking both 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7 (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz) and 094f469172e00d6ab0a3130b0e01c83b3cf3a98d (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz) fix the crash. They appear to be intended as a series - they were posted to LKML at the same time.

[Testcase]
The fix has been validated by the user who reported the bug.

Bug description
---------------

We see a number of kernel panics on servers running Apache Mesos using cgroups with small (0.1-0.2) cpu limits. These all appear as NULL pointer dereferences in and around pick_next_entity and pick_next_task_fair, for example:

[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0 [24334.512806] Oops: 0000 [#1] SMP [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu [24334.584748] Hardware
name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [24334.594188] task: ffff8803ee671c00 ti: ffff8803ee67c000 task.ti: ffff8803ee67c000 [24334.601799] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [24334.610490] RSP: 0018:ffff8803ee67fdd8 EFLAGS: 00010086 [24334.615924] RAX: ffff8803ebed4c00 RBX: ffff880036529800 RCX: 0000000000000000 [24334.623190] RDX: 000000000225341f RSI: 0000000000000000 RDI: 0000000000000000 [24334.630479] RBP: ffff8803ee67fe00 R08: 0000000000000004 R09: 0000000000000000 [24334.637758] R10: ffff8803e7ed7600 R11: 0000000000000001 R12: 0000000000000000 [24334.645153] R13: 0000000000000000 R14: 00000009067729c4 R15: ffff8803ee672178 [24334.652512] FS: 0000000000000000(0000) GS:ffff8803ffd00000(0000) knlGS:0000000000000000 [24334.660721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0 [24334.673851] Stack: [24334.675980] ffff8803ffd16e00 ffff8803ffd16e00 ffff8803e855a200 ffff880036529800 [24334.683995] 0000000000000002 ffff8803ee67fe68 ffffffff810b98a6 ffff8803ffd16e70 [24334.692024] 0000000000016e00 ffff8803e7ed7600 ffff8803ee671c00 0000000000000000 [24334.700172] Call Trace: [24334.702750] [<ffffffff810b98a6>] pick_next_task_fair+0x66/0x4b0 [24334.708886] [<ffffffff818043c4>] __schedule+0x7f4/0x980 [24334.714349] [<ffffffff81804585>] schedule+0x35/0x80 [24334.719445] [<ffffffff8180481e>] schedule_preempt_disabled+0xe/0x10 [24334.725962] [<ffffffff810bf9fa>] cpu_startup_entry+0x18a/0x350 [24334.732012] [<ffffffff8104f3d9>] start_secondary+0x149/0x170 [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6 [24334.765124] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [24334.771473] RSP <ffff8803ee67fdd8> [24334.775077] CR2: 0000000000000050 
[24334.779121] ---[ end trace 05d941efb97b7bae ]--- and [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [155852.036931] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0 [155852.048550] Oops: 0000 [#1] SMP [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [155852.127661] task: ffff8803ed29aa00 ti: ffff8800bbb10000 task.ti: ffff8800bbb10000 [155852.135347] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [155852.144120] RSP: 0018:ffff8800bbb13ce0 EFLAGS: 00010086 [155852.149631] RAX: ffff8801725b5c00 RBX: ffff8800bb777600 RCX: ffff8800bb777400 [155852.156970] RDX: ffff8803ffc96e70 RSI: 0000000000000000 RDI: 0000000000000000 [155852.164384] RBP: ffff8800bbb13d08 R08: ffff8803eb92e800 R09: ffff8803ed29aa00 [155852.171718] R10: 0000000000000001 R11: 00000000000003cb R12: 0000000000000000 [155852.179052] R13: 0000000000000000 R14: 000009ad6846ff10 R15: 0000000000000001 [155852.186387] FS: 00007f387d1c9700(0000) GS:ffff8803ffc80000(0000) knlGS:0000000000000000 [155852.194677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 00000000001406e0 [155852.207967] Stack: 
[155852.210180] ffffffff810369c9 ffff8803ffc96e00 ffff8800bb777600 0000000000000000 [155852.218278] 00000000000012a4 ffff8800bbb13d70 ffffffff810b9b65 ffff8803ffc96e70 [155852.226402] 0000000000016e00 00008dbf20ccb260 ffff8803ed29aa00 0000000000000001 [155852.234506] Call Trace: [155852.237156] [<ffffffff810369c9>] ? sched_clock+0x9/0x10 [155852.242673] [<ffffffff810b9b65>] pick_next_task_fair+0x325/0x4b0 [155852.248968] [<ffffffff81803cd9>] __schedule+0x109/0x980 [155852.254491] [<ffffffff81804585>] schedule+0x35/0x80 [155852.259667] [<ffffffff8180727c>] schedule_hrtimeout_range_clock+0xac/0x130 [155852.266838] [<ffffffff810e9fb0>] ? hrtimer_init+0x180/0x180 [155852.272712] [<ffffffff81807270>] ? schedule_hrtimeout_range_clock+0xa0/0x130 [155852.280052] [<ffffffff81807313>] schedule_hrtimeout_range+0x13/0x20 [155852.288558] [<ffffffff812479b9>] ep_poll+0x249/0x310 [155852.293817] [<ffffffff810a8c30>] ? wake_up_q+0x80/0x80 [155852.299271] [<ffffffff81248efc>] SyS_epoll_wait+0xbc/0xe0 [155852.304967] [<ffffffff81807df6>] entry_SYSCALL_64_fastpath+0x16/0x75 [155852.311618] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6 [155852.338852] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160 [155852.345270] RSP <ffff8800bbb13ce0> [155852.348958] CR2: 0000000000000050 [155852.353086] ---[ end trace 8ce693b2314611c4 ]---

Similar issues have been reported in the community for kernels based on 4.4: https://github.com/kubernetes/kops/issues/874

These panics occur in the CFS code when a next buddy is set on an entity that is not on a run-queue. This causes pick_next_entity to end up with curr == left == NULL, so it calls wakeup_preempt_entity() with a valid next buddy and a NULL left; the NULL left is then dereferenced, causing the panic.

This was confirmed by placing a WARN_ON_ONCE in set_next_buddy to catch when a sched_entity in the hierarchy was not on_rq, as per https://marc.info/?l=linux-kernel&m=146651668921468&w=2 The stack-trace for the WARN is quite involved: Apr 25 14:14:48 (none) kernel: [ 5339.764597] ------------[ cut here ]------------ Apr 25 14:14:48 (none) kernel: [ 5339.764606] WARNING: CPU: 1 PID: 13121 at /build/linux-PwPelj/linux-4.4.0/kernel/sched/fair.c:5170 set_next_buddy+0x55/0x70() Apr 25 14:14:48 (none) kernel: [ 5339.764608] Modules linked in: xt_nat xt_tcpudp ipvlan ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs dm_crypt lockd grace sunrpc fscache ppdev input_leds serio_raw parport_pc 8250_fintek parport pvpanic mac_hid i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi Apr 25 14:14:48 (none) kernel: [ 5339.764644] CPU: 1 PID: 13121 Comm: executor Not tainted 4.4.0-72-generic #93+hf135461v20170420b2-Ubuntu Apr 25 14:14:48 (none) kernel: [ 5339.764646] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Apr 25 14:14:48 (none) kernel: [ 5339.764647] 0000000000000086 00000000d5fbe9e0 ffff8803ed947608 ffffffff813f83c3 Apr 25 14:14:48 (none) kernel: [ 5339.764650] 0000000000000000 ffffffff81cbae20 ffff8803ed947640 ffffffff81081302 Apr 25 14:14:48 (none) kernel: [ 5339.764652] ffff8800bb5fc800 ffff8803e7c9f000 0000000000000008 ffff8800ba1bd400 Apr 25 14:14:48 (none) kernel: [ 5339.764655] Call Trace: Apr 25 14:14:48 (none) kernel: [ 5339.764665] [<ffffffff813f83c3>] dump_stack+0x63/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764669] [<ffffffff81081302>] warn_slowpath_common+0x82/0xc0 Apr 25 14:14:48 (none) kernel: [ 5339.764672] 
[<ffffffff8108144a>] warn_slowpath_null+0x1a/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764674] [<ffffffff810b52b5>] set_next_buddy+0x55/0x70 Apr 25 14:14:48 (none) kernel: [ 5339.764676] [<ffffffff810b59a4>] check_preempt_wakeup+0x244/0x250 Apr 25 14:14:48 (none) kernel: [ 5339.764679] [<ffffffff810ab580>] check_preempt_curr+0x80/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764682] [<ffffffff810b42eb>] attach_task+0x4b/0x60 Apr 25 14:14:48 (none) kernel: [ 5339.764685] [<ffffffff810be067>] load_balance+0x5b7/0x980 Apr 25 14:14:48 (none) kernel: [ 5339.764688] [<ffffffff810be6e1>] pick_next_task_fair+0x2b1/0x4f0 Apr 25 14:14:48 (none) kernel: [ 5339.764692] [<ffffffff81837c5f>] __schedule+0x15f/0xa30 Apr 25 14:14:48 (none) kernel: [ 5339.764694] [<ffffffff81838565>] schedule+0x35/0x80 Apr 25 14:14:48 (none) kernel: [ 5339.764697] [<ffffffff8183ba85>] schedule_hrtimeout_range_clock+0xc5/0x1b0 Apr 25 14:14:48 (none) kernel: [ 5339.764700] [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764703] [<ffffffff8183ba79>] ? schedule_hrtimeout_range_clock+0xb9/0x1b0 Apr 25 14:14:48 (none) kernel: [ 5339.764705] [<ffffffff8183bb83>] schedule_hrtimeout_range+0x13/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764709] [<ffffffff81223914>] poll_schedule_timeout+0x44/0x70 Apr 25 14:14:48 (none) kernel: [ 5339.764711] [<ffffffff81224407>] do_select+0x727/0x810 Apr 25 14:14:48 (none) kernel: [ 5339.764715] [<ffffffff811fb932>] ? page_counter_uncharge+0x22/0x40 Apr 25 14:14:48 (none) kernel: [ 5339.764718] [<ffffffff811fdb1c>] ? drain_stock.isra.33+0x6c/0xa0 Apr 25 14:14:48 (none) kernel: [ 5339.764720] [<ffffffff810b5349>] ? update_curr+0x79/0x160 Apr 25 14:14:48 (none) kernel: [ 5339.764722] [<ffffffff810b550c>] ? update_cfs_shares+0xbc/0x100 Apr 25 14:14:48 (none) kernel: [ 5339.764724] [<ffffffff810b742b>] ? dequeue_entity+0x41b/0xa80 Apr 25 14:14:48 (none) kernel: [ 5339.764729] [<ffffffff810719f7>] ? 
gup_pud_range+0x127/0x220 Apr 25 14:14:48 (none) kernel: [ 5339.764731] [<ffffffff810baa9c>] ? set_next_entity+0x9c/0xb0 Apr 25 14:14:48 (none) kernel: [ 5339.764736] [<ffffffff8102d66c>] ? __switch_to+0x1dc/0x5c0 Apr 25 14:14:48 (none) kernel: [ 5339.764740] [<ffffffff81401304>] ? timerqueue_del+0x24/0x70 Apr 25 14:14:48 (none) kernel: [ 5339.764742] [<ffffffff810efa3c>] ? __remove_hrtimer+0x3c/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764744] [<ffffffff810efb61>] ? hrtimer_try_to_cancel+0xd1/0x130 Apr 25 14:14:48 (none) kernel: [ 5339.764746] [<ffffffff810efbd9>] ? hrtimer_cancel+0x19/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764751] [<ffffffff81101166>] ? futex_wait+0x206/0x280 Apr 25 14:14:48 (none) kernel: [ 5339.764753] [<ffffffff810ab5a9>] ? ttwu_do_wakeup+0x19/0xe0 Apr 25 14:14:48 (none) kernel: [ 5339.764756] [<ffffffff812246bf>] core_sys_select+0x1cf/0x2f0 Apr 25 14:14:48 (none) kernel: [ 5339.764758] [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90 Apr 25 14:14:48 (none) kernel: [ 5339.764762] [<ffffffff81128447>] ? audit_filter_rules+0x217/0xe30 Apr 25 14:14:48 (none) kernel: [ 5339.764764] [<ffffffff81103860>] ? do_futex+0x120/0x540 Apr 25 14:14:48 (none) kernel: [ 5339.764768] [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20 Apr 25 14:14:48 (none) kernel: [ 5339.764772] [<ffffffff810f53aa>] ? ktime_get_ts64+0x4a/0xf0 Apr 25 14:14:48 (none) kernel: [ 5339.764774] [<ffffffff8122489a>] SyS_select+0xba/0x110 Apr 25 14:14:48 (none) kernel: [ 5339.764777] [<ffffffff8183c672>] entry_SYSCALL_64_fastpath+0x16/0x71 Apr 25 14:14:48 (none) kernel: [ 5339.764779] ---[ end trace ace97b626b47e1f9 ]--- Cherry-picking both 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7 (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz) and 094f469172e00d6ab0a3130b0e01c83b3cf3a98d (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz) fix the crash. They appear to be intended as a series.
2017-05-17 12:10:25 Kleber Sacilotto de Souza linux (Ubuntu Xenial): status Triaged Fix Committed
2017-05-25 23:09:43 Jay Vosburgh bug added subscriber Jay Vosburgh
2017-05-26 02:56:19 Thadeu Lima de Souza Cascardo tags kernel-da-key kernel-da-key verification-needed-xenial
2017-05-26 20:55:24 Jay Vosburgh tags kernel-da-key verification-needed-xenial kernel-da-key verification-done-xenial
2017-06-06 14:51:25 Launchpad Janitor linux (Ubuntu Xenial): status Fix Committed Fix Released
2017-06-06 14:51:25 Launchpad Janitor cve linked 2017-0605
2017-08-21 01:01:35 Daniel Axtens linux (Ubuntu): status Triaged Fix Released