Activity log for bug #1763454

Date Who What changed Old value New value Message
2018-04-12 16:55:53 schu bug added bug
2018-04-12 16:58:06 Alban Crequy bug added subscriber Alban Crequy
2018-04-12 17:00:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2018-04-12 17:00:06 Ubuntu Kernel Bot tags xenial
2018-04-12 17:46:11 Joseph Salisbury linux (Ubuntu): importance Undecided Medium
2018-04-12 17:46:18 Joseph Salisbury nominated for series Ubuntu Xenial
2018-04-12 17:46:18 Joseph Salisbury bug task added linux (Ubuntu Xenial)
2018-04-12 17:46:26 Joseph Salisbury linux (Ubuntu Xenial): status New Incomplete
2018-04-12 17:46:29 Joseph Salisbury linux (Ubuntu Xenial): importance Undecided Medium
2018-04-12 17:47:15 Joseph Salisbury linux (Ubuntu Xenial): status Incomplete Triaged
2018-04-12 17:47:18 Joseph Salisbury linux (Ubuntu): status Incomplete Triaged
2018-04-12 18:12:25 Alban Crequy bug added subscriber Seth Forshee
2018-04-12 19:33:45 Seth Forshee linux (Ubuntu Xenial): assignee Seth Forshee (sforshee)
2018-04-13 13:46:31 Seth Forshee description
Old value:
Hey,

we are currently debugging an issue with Scope [1] where the initialization of the used tcptracer-bpf [2] leads to a kernel oops at the first call of `bpf_map_lookup_elem`. The OS is Ubuntu Xenial with kernel version `Ubuntu 4.4.0-119.143-generic 4.4.114`. `4.4.0-116.140` does not show the problem.

Example:

```
[ 58.763045] BUG: unable to handle kernel paging request at 000000003c0c41a8
[ 58.846450] IP: [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 58.909436] PGD 800000003be04067 PUD 3bea1067 PMD 0
[ 58.914876] Oops: 0000 [#1] SMP
[ 58.915581] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc overlay vboxsf isofs ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vboxguest input_leds serio_raw parport_pc parport video ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mptspi aesni_intel scsi_transport_spi mptscsih aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd mptbase psmouse e1000
[ 59.678145] CPU: 1 PID: 1810 Comm: scope Not tainted 4.4.0-119-generic #143-Ubuntu
[ 59.790501] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 59.846405] task: ffff88003ae23800 ti: ffff880022c84000 task.ti: ffff880022c84000
[ 60.000524] RIP: 0010:[<ffffffff8117cd76>] [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 60.178029] RSP: 0018:ffff880022c87960 EFLAGS: 00010082
[ 60.257957] RAX: ffffffff8117cd70 RBX: ffffc9000022f090 RCX: 0000000000000000
[ 60.350704] RDX: 0000000000000000 RSI: ffff880022c87ba8 RDI: 000000003c0c4180
[ 60.449182] RBP: ffff880022c87be8 R08: 0000000000000000 R09: 0000000000000800
[ 60.547638] R10: ffff88003ae23800 R11: ffff88003ca12e10 R12: 0000000000000000
[ 60.570757] R13: ffff88003c601200 R14: ffff88003fd10020 R15: ffff880022c87d10
[ 60.678811] FS: 00007f95ba372700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 60.778636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.866380] CR2: 000000003c0c41a8 CR3: 000000003aeae000 CR4: 0000000000060670
[ 60.963736] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 61.069195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 61.187006] Stack:
[ 61.189256] ffff880022c87be8 ffffffff81177411 0000000000000000 0000000000000001
[ 61.253133] 000000003c0c4180 ffff880022c87ba8 0000000000000000 0000000000000000
[ 61.345334] 0000000000000000 ffff880022c87d10 0000000000000000 0000000000000001
[ 61.459069] Call Trace:
[ 61.505273] [<ffffffff81177411>] ? __bpf_prog_run+0x7a1/0x1360
[ 61.625511] [<ffffffff810b7939>] ? update_curr+0x79/0x170
[ 61.741423] [<ffffffff810b7b0c>] ? update_cfs_shares+0xbc/0x100
[ 61.837892] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 61.941349] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.073874] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.185260] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.186239] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.305193] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.399854] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.406219] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.407994] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.410491] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.431220] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.497078] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.559245] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.661493] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.712927] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.799216] [<ffffffff8116c4c7>] trace_call_bpf+0x37/0x50
[ 62.881570] [<ffffffff8116ca57>] kprobe_perf_func+0x37/0x250
[ 62.977365] [<ffffffff810ac186>] ? finish_task_switch+0x76/0x230
[ 62.981405] [<ffffffff810cd7b1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 63.092978] [<ffffffff8116e271>] kprobe_dispatcher+0x31/0x50
[ 63.184696] [<ffffffff817921d1>] ? tcp_close+0x1/0x440
[ 63.260350] [<ffffffff81061976>] kprobe_ftrace_handler+0xb6/0x120
[ 63.275694] [<ffffffff817921d5>] ? tcp_close+0x5/0x440
[ 63.278202] [<ffffffff81145108>] ftrace_ops_recurs_func+0x58/0xb0
[ 63.289826] [<ffffffffc00050d5>] 0xffffffffc00050d5
[ 63.291573] [<ffffffff817921d0>] ? tcp_check_oom+0x150/0x150
[ 63.299743] [<ffffffff817921d1>] ? tcp_close+0x1/0x440
[ 63.301658] [<ffffffff817921d5>] tcp_close+0x5/0x440
[ 63.340651] [<ffffffff817bbee2>] inet_release+0x42/0x70
[ 63.440655] [<ffffffff817921d5>] ? tcp_close+0x5/0x440
[ 63.549368] [<ffffffff817bbee2>] ? inet_release+0x42/0x70
[ 63.655199] [<ffffffff817ec580>] inet6_release+0x30/0x40
[ 63.657005] [<ffffffff817235c5>] sock_release+0x25/0x80
[ 63.658693] [<ffffffff81723632>] sock_close+0x12/0x20
[ 63.660735] [<ffffffff812162c7>] __fput+0xe7/0x230
[ 63.662210] [<ffffffff8121644e>] ____fput+0xe/0x10
[ 63.664371] [<ffffffff810a14c6>] task_work_run+0x86/0xb0
[ 63.667217] [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
[ 63.669889] [<ffffffff81003c7e>] syscall_return_slowpath+0x4e/0x60
[ 63.673627] [<ffffffff8184f8b0>] int_ret_from_sys_call+0x25/0x9f
[ 63.704763] Code: 41 be 01 00 00 00 e8 fa bd ff ff 49 89 c5 eb 94 e8 f0 14 0a 00 4c 89 eb e9 e2 fe ff ff e8 a3 60 f0 ff 0f 1f 00 0f 1f 44 00 00 55 <48> 8b 47 28 48 89 e5 48 8b 40 18 e8 8a 83 6d 00 5d c3 0f 1f 84
[ 63.900088] RIP [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 63.903014] RSP <ffff880022c87960>
[ 63.905151] CR2: 000000003c0c41a8
[ 63.906757] ---[ end trace dc24e8c214caa65b ]---
```

git bisect points to commit 68dd63b26223880d1b431b6bf54e45d93d04361a bpf: fix branch pruning logic

We tested with a simple kprobe that counts read syscalls and can reproduce the bug.

```
struct bpf_map_def SEC("maps/count") count = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(__u32),
        .value_size = sizeof(__u64),
        .max_entries = 1,
        .map_flags = 0,
};

SEC("kprobe/SyS_read")
int kprobe(struct pt_regs *ctx)
{
        u64 *count_ptr = NULL;
        u64 zero = 0, one = 1, current_count = 1;

        count_ptr = bpf_map_lookup_elem(&count, &zero);
        if (count_ptr != NULL) {
                (*count_ptr)++;
                current_count = *count_ptr;
        } else {
                bpf_map_update_elem(&count, &zero, &one, BPF_ANY);
        }
        printt("current count %lu\n", current_count);
        return 0;
}
```

You can find our test program here: https://files.schu.io/tmp/oops

It should either trigger the oops or exit after 5 seconds and return the number of calls.

```
while true; do echo hello; sleep 1; done & # make sure there are read syscalls done
./oops
```

[1] https://github.com/weaveworks/scope/issues/3131
[2] https://github.com/weaveworks/tcptracer-bpf
New value:
SRU Justification

Impact: Some unfortunate timing between the fix for CVE-2017-17862 being backported and some updates from upstream stable resulted in us not having some hunks from the CVE patch. This is causing oopses (see below).

Fix: Add in the missing hunks from the CVE patch.

Test case: See test results in comment #4.

Regression potential: This just updates the code to match the upstream patch, which has been upstream for months, so regression potential should be low.

---

Hey,

we are currently debugging an issue with Scope [1] where the initialization of the used tcptracer-bpf [2] leads to a kernel oops at the first call of `bpf_map_lookup_elem`. The OS is Ubuntu Xenial with kernel version `Ubuntu 4.4.0-119.143-generic 4.4.114`. `4.4.0-116.140` does not show the problem.

Example:

```
[ 58.763045] BUG: unable to handle kernel paging request at 000000003c0c41a8
[ 58.846450] IP: [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 58.909436] PGD 800000003be04067 PUD 3bea1067 PMD 0
[ 58.914876] Oops: 0000 [#1] SMP
[ 58.915581] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc overlay vboxsf isofs ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vboxguest input_leds serio_raw parport_pc parport video ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mptspi aesni_intel scsi_transport_spi mptscsih aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd mptbase psmouse e1000
[ 59.678145] CPU: 1 PID: 1810 Comm: scope Not tainted 4.4.0-119-generic #143-Ubuntu
[ 59.790501] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 59.846405] task: ffff88003ae23800 ti: ffff880022c84000 task.ti: ffff880022c84000
[ 60.000524] RIP: 0010:[<ffffffff8117cd76>] [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 60.178029] RSP: 0018:ffff880022c87960 EFLAGS: 00010082
[ 60.257957] RAX: ffffffff8117cd70 RBX: ffffc9000022f090 RCX: 0000000000000000
[ 60.350704] RDX: 0000000000000000 RSI: ffff880022c87ba8 RDI: 000000003c0c4180
[ 60.449182] RBP: ffff880022c87be8 R08: 0000000000000000 R09: 0000000000000800
[ 60.547638] R10: ffff88003ae23800 R11: ffff88003ca12e10 R12: 0000000000000000
[ 60.570757] R13: ffff88003c601200 R14: ffff88003fd10020 R15: ffff880022c87d10
[ 60.678811] FS: 00007f95ba372700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 60.778636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.866380] CR2: 000000003c0c41a8 CR3: 000000003aeae000 CR4: 0000000000060670
[ 60.963736] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 61.069195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 61.187006] Stack:
[ 61.189256] ffff880022c87be8 ffffffff81177411 0000000000000000 0000000000000001
[ 61.253133] 000000003c0c4180 ffff880022c87ba8 0000000000000000 0000000000000000
[ 61.345334] 0000000000000000 ffff880022c87d10 0000000000000000 0000000000000001
[ 61.459069] Call Trace:
[ 61.505273] [<ffffffff81177411>] ? __bpf_prog_run+0x7a1/0x1360
[ 61.625511] [<ffffffff810b7939>] ? update_curr+0x79/0x170
[ 61.741423] [<ffffffff810b7b0c>] ? update_cfs_shares+0xbc/0x100
[ 61.837892] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 61.941349] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.073874] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.185260] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.186239] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.305193] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.399854] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.406219] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.407994] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.410491] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.431220] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.497078] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.559245] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.661493] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.712927] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.799216] [<ffffffff8116c4c7>] trace_call_bpf+0x37/0x50
[ 62.881570] [<ffffffff8116ca57>] kprobe_perf_func+0x37/0x250
[ 62.977365] [<ffffffff810ac186>] ? finish_task_switch+0x76/0x230
[ 62.981405] [<ffffffff810cd7b1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 63.092978] [<ffffffff8116e271>] kprobe_dispatcher+0x31/0x50
[ 63.184696] [<ffffffff817921d1>] ? tcp_close+0x1/0x440
[ 63.260350] [<ffffffff81061976>] kprobe_ftrace_handler+0xb6/0x120
[ 63.275694] [<ffffffff817921d5>] ? tcp_close+0x5/0x440
[ 63.278202] [<ffffffff81145108>] ftrace_ops_recurs_func+0x58/0xb0
[ 63.289826] [<ffffffffc00050d5>] 0xffffffffc00050d5
[ 63.291573] [<ffffffff817921d0>] ? tcp_check_oom+0x150/0x150
[ 63.299743] [<ffffffff817921d1>] ? tcp_close+0x1/0x440
[ 63.301658] [<ffffffff817921d5>] tcp_close+0x5/0x440
[ 63.340651] [<ffffffff817bbee2>] inet_release+0x42/0x70
[ 63.440655] [<ffffffff817921d5>] ? tcp_close+0x5/0x440
[ 63.549368] [<ffffffff817bbee2>] ? inet_release+0x42/0x70
[ 63.655199] [<ffffffff817ec580>] inet6_release+0x30/0x40
[ 63.657005] [<ffffffff817235c5>] sock_release+0x25/0x80
[ 63.658693] [<ffffffff81723632>] sock_close+0x12/0x20
[ 63.660735] [<ffffffff812162c7>] __fput+0xe7/0x230
[ 63.662210] [<ffffffff8121644e>] ____fput+0xe/0x10
[ 63.664371] [<ffffffff810a14c6>] task_work_run+0x86/0xb0
[ 63.667217] [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
[ 63.669889] [<ffffffff81003c7e>] syscall_return_slowpath+0x4e/0x60
[ 63.673627] [<ffffffff8184f8b0>] int_ret_from_sys_call+0x25/0x9f
[ 63.704763] Code: 41 be 01 00 00 00 e8 fa bd ff ff 49 89 c5 eb 94 e8 f0 14 0a 00 4c 89 eb e9 e2 fe ff ff e8 a3 60 f0 ff 0f 1f 00 0f 1f 44 00 00 55 <48> 8b 47 28 48 89 e5 48 8b 40 18 e8 8a 83 6d 00 5d c3 0f 1f 84
[ 63.900088] RIP [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 63.903014] RSP <ffff880022c87960>
[ 63.905151] CR2: 000000003c0c41a8
[ 63.906757] ---[ end trace dc24e8c214caa65b ]---
```

git bisect points to commit 68dd63b26223880d1b431b6bf54e45d93d04361a bpf: fix branch pruning logic

We tested with a simple kprobe that counts read syscalls and can reproduce the bug.

```
struct bpf_map_def SEC("maps/count") count = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(__u32),
        .value_size = sizeof(__u64),
        .max_entries = 1,
        .map_flags = 0,
};

SEC("kprobe/SyS_read")
int kprobe(struct pt_regs *ctx)
{
        u64 *count_ptr = NULL;
        u64 zero = 0, one = 1, current_count = 1;

        count_ptr = bpf_map_lookup_elem(&count, &zero);
        if (count_ptr != NULL) {
                (*count_ptr)++;
                current_count = *count_ptr;
        } else {
                bpf_map_update_elem(&count, &zero, &one, BPF_ANY);
        }
        printt("current count %lu\n", current_count);
        return 0;
}
```

You can find our test program here: https://files.schu.io/tmp/oops

It should either trigger the oops or exit after 5 seconds and return the number of calls.

```
while true; do echo hello; sleep 1; done & # make sure there are read syscalls done
./oops
```

[1] https://github.com/weaveworks/scope/issues/3131
[2] https://github.com/weaveworks/tcptracer-bpf
2018-04-18 08:58:04 Olaf Seibert bug added subscriber Olaf Seibert
2018-04-19 17:03:18 Seth Forshee linux (Ubuntu Xenial): importance Medium High
2018-04-19 17:03:22 Seth Forshee linux (Ubuntu): status Triaged Invalid
2018-04-19 17:14:29 Steffen Neubauer bug added subscriber Steffen Neubauer
2018-04-20 12:12:09 Stefan Bader linux (Ubuntu Xenial): status Triaged Fix Committed
2018-04-20 20:20:36 Jason Sievert bug added subscriber Jason Sievert
2018-04-27 19:12:20 Brad Figg tags xenial verification-needed-xenial xenial
2018-04-30 10:44:30 Bodo Petermann tags verification-needed-xenial xenial verification-done-xenial xenial
2018-05-22 00:00:38 Launchpad Janitor linux (Ubuntu Xenial): status Fix Committed Fix Released
2018-05-22 00:00:38 Launchpad Janitor cve linked 2017-16995
2018-05-22 00:00:38 Launchpad Janitor cve linked 2017-17862
2018-05-22 00:00:38 Launchpad Janitor cve linked 2018-1000004
2018-05-22 00:00:38 Launchpad Janitor cve linked 2018-3639