bpf_map_lookup_elem: BUG: unable to handle kernel paging request

Bug #1763454 reported by schu on 2018-04-12
38
This bug affects 6 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Unassigned
Xenial
High
Seth Forshee

Bug Description

SRU Justification

Impact: Some unfortunate timing between the fix for CVE-2017-17862 being backported and some updates from upstream stable resulted in us not having some hunks from the CVE patch. This is causing oopses (see below).

Fix: Add in the missing hunks from the CVE patch.

Test case: See test results in comment #4.

Regression potential: This just updates the code to match the upstream patch, which has been upstream for months, so regression potential should be low.

---

Hey,

we are currently debugging an issue with Scope [1] where the initialization of the used tcptracer-bpf [2] leads to a kernel oops at the first call of `bpf_map_lookup_elem`. The OS is Ubuntu Xenial with kernel version `Ubuntu 4.4.0-119.143-generic 4.4.114`. `4.4.0-116.140` does not show the problem.

Example:

```
[ 58.763045] BUG: unable to handle kernel paging request at 000000003c0c41a8
[ 58.846450] IP: [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 58.909436] PGD 800000003be04067 PUD 3bea1067 PMD 0
[ 58.914876] Oops: 0000 [#1] SMP
[ 58.915581] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc overlay vboxsf isofs ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vboxguest input_leds serio_raw parport_pc parport video ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mptspi aesni_intel scsi_transport_spi mptscsih aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd mptbase psmouse e1000
[ 59.678145] CPU: 1 PID: 1810 Comm: scope Not tainted 4.4.0-119-generic #143-Ubuntu
[ 59.790501] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 59.846405] task: ffff88003ae23800 ti: ffff880022c84000 task.ti: ffff880022c84000
[ 60.000524] RIP: 0010:[<ffffffff8117cd76>] [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 60.178029] RSP: 0018:ffff880022c87960 EFLAGS: 00010082
[ 60.257957] RAX: ffffffff8117cd70 RBX: ffffc9000022f090 RCX: 0000000000000000
[ 60.350704] RDX: 0000000000000000 RSI: ffff880022c87ba8 RDI: 000000003c0c4180
[ 60.449182] RBP: ffff880022c87be8 R08: 0000000000000000 R09: 0000000000000800
[ 60.547638] R10: ffff88003ae23800 R11: ffff88003ca12e10 R12: 0000000000000000
[ 60.570757] R13: ffff88003c601200 R14: ffff88003fd10020 R15: ffff880022c87d10
[ 60.678811] FS: 00007f95ba372700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 60.778636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.866380] CR2: 000000003c0c41a8 CR3: 000000003aeae000 CR4: 0000000000060670
[ 60.963736] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 61.069195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 61.187006] Stack:
[ 61.189256] ffff880022c87be8 ffffffff81177411 0000000000000000 0000000000000001
[ 61.253133] 000000003c0c4180 ffff880022c87ba8 0000000000000000 0000000000000000
[ 61.345334] 0000000000000000 ffff880022c87d10 0000000000000000 0000000000000001
[ 61.459069] Call Trace:
[ 61.505273] [<ffffffff81177411>] ? __bpf_prog_run+0x7a1/0x1360
[ 61.625511] [<ffffffff810b7939>] ? update_curr+0x79/0x170
[ 61.741423] [<ffffffff810b7b0c>] ? update_cfs_shares+0xbc/0x100
[ 61.837892] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 61.941349] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.073874] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.185260] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.186239] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.305193] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.399854] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.406219] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.407994] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.410491] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.431220] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.497078] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.559245] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.661493] [<ffffffff8184b04d>] ? __schedule+0x30d/0x7f0
[ 62.712927] [<ffffffff8184b041>] ? __schedule+0x301/0x7f0
[ 62.799216] [<ffffffff8116c4c7>] trace_call_bpf+0x37/0x50
[ 62.881570] [<ffffffff8116ca57>] kprobe_perf_func+0x37/0x250
[ 62.977365] [<ffffffff810ac186>] ? finish_task_switch+0x76/0x230
[ 62.981405] [<ffffffff810cd7b1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 63.092978] [<ffffffff8116e271>] kprobe_dispatcher+0x31/0x50
[ 63.184696] [<ffffffff817921d1>] ? tcp_close+0x1/0x440
[ 63.260350] [<ffffffff81061976>] kprobe_ftrace_handler+0xb6/0x120
[ 63.275694] [<ffffffff817921d5>] ? tcp_close+0x5/0x440
[ 63.278202] [<ffffffff81145108>] ftrace_ops_recurs_func+0x58/0xb0
[ 63.289826] [<ffffffffc00050d5>] 0xffffffffc00050d5
[ 63.291573] [<ffffffff817921d0>] ? tcp_check_oom+0x150/0x150
[ 63.299743] [<ffffffff817921d1>] ? tcp_close+0x1/0x440
[ 63.301658] [<ffffffff817921d5>] tcp_close+0x5/0x440
[ 63.340651] [<ffffffff817bbee2>] inet_release+0x42/0x70
[ 63.440655] [<ffffffff817921d5>] ? tcp_close+0x5/0x440
[ 63.549368] [<ffffffff817bbee2>] ? inet_release+0x42/0x70
[ 63.655199] [<ffffffff817ec580>] inet6_release+0x30/0x40
[ 63.657005] [<ffffffff817235c5>] sock_release+0x25/0x80
[ 63.658693] [<ffffffff81723632>] sock_close+0x12/0x20
[ 63.660735] [<ffffffff812162c7>] __fput+0xe7/0x230
[ 63.662210] [<ffffffff8121644e>] ____fput+0xe/0x10
[ 63.664371] [<ffffffff810a14c6>] task_work_run+0x86/0xb0
[ 63.667217] [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
[ 63.669889] [<ffffffff81003c7e>] syscall_return_slowpath+0x4e/0x60
[ 63.673627] [<ffffffff8184f8b0>] int_ret_from_sys_call+0x25/0x9f
[ 63.704763] Code: 41 be 01 00 00 00 e8 fa bd ff ff 49 89 c5 eb 94 e8 f0 14 0a 00 4c 89 eb e9 e2 fe ff ff e8 a3 60 f0 ff 0f 1f 00 0f 1f 44 00 00 55 <48> 8b 47 28 48 89 e5 48 8b 40 18 e8 8a 83 6d 00 5d c3 0f 1f 84
[ 63.900088] RIP [<ffffffff8117cd76>] bpf_map_lookup_elem+0x6/0x20
[ 63.903014] RSP <ffff880022c87960>
[ 63.905151] CR2: 000000003c0c41a8
[ 63.906757] ---[ end trace dc24e8c214caa65b ]---
```

git bisect points to commit

   68dd63b26223880d1b431b6bf54e45d93d04361a bpf: fix branch pruning logic

We tested with a simple kprobe that counts read syscalls and can reproduce the bug.

```
struct bpf_map_def SEC("maps/count") count = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(__u32),
        .value_size = sizeof(__u64),
        .max_entries = 1,
        .map_flags = 0,
};

SEC("kprobe/SyS_read")
int kprobe(struct pt_regs *ctx)
{
        u64 *count_ptr = NULL;
        u64 zero = 0, one = 1, current_count = 1;

        count_ptr = bpf_map_lookup_elem(&count, &zero);
        if (count_ptr != NULL) {
                (*count_ptr)++;
                current_count = *count_ptr;
        } else {
                bpf_map_update_elem(&count, &zero, &one, BPF_ANY);
        }
        printt("current count %lu\n", current_count);

        return 0;
}
```

You can find our test program here: https://files.schu.io/tmp/oops It should either trigger the oops or exit after 5 seconds and return the number of calls.

```
while true; do echo hello; sleep 1; done & # make sure there are read syscalls done
./oops
```

[1] https://github.com/weaveworks/scope/issues/3131
[2] https://github.com/weaveworks/tcptracer-bpf

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1763454

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
importance: Undecided → Medium
status: Incomplete → Triaged
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Alban Crequy (muadda) wrote :

I'm taking a guess at the cause of this kernel panic:

commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 was backported from upstream kernel into the Ubuntu kernel.

But this part was not backported:

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4213,6 +4216,8 @@ static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len,
        memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off);
        memcpy(new_data + off + cnt - 1, old_data + off,
               sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1));
+ for (i = off; i < off + cnt - 1; i++)
+ new_data[i].seen = true;
        env->insn_aux_data = new_data;
        vfree(old_data);
        return 0;

The likely reason for omission is that the Ubuntu kernel does not have this function in kernel/bpf/verifier.c, so there was no place to apply the patch snippet above. In upstream kernel, adjust_insn_aux_data() is called from fixup_bpf_calls() and that function was not in kernel/bpf/verifier.c yet in Ubuntu kernel.

However, semantically, the patch should have been applied in kernel/bpf/syscall.c, which is the file where fixup_bpf_calls() was located before it got refactored by commit e245c5c6a5656.

As a result, the BPF_CALL instruction is mistakenly considered not seen by the verifier so the BPF instructions for array_map_lookup_elem() are not emitted.

Seth Forshee (sforshee) on 2018-04-12
Changed in linux (Ubuntu Xenial):
assignee: nobody → Seth Forshee (sforshee)
Seth Forshee (sforshee) wrote :

Test build is at the link below, please let me know if it fixes the issue. Thanks!

http://people.canonical.com/~sforshee/lp1763454/

schu (schuio) wrote :

`Ubuntu 4.4.0-119.143+lp1763454v201804121433-generic 4.4.114` does fix the problem for us. Tested with Scope e2b4b3edf63a62836ca27024003cc38aa6b9c0b5

Thanks Seth!

schu (schuio) wrote :

Seth, can you give an ETA for when the update with the fix will be published?

If it's only a matter of a few days, adding a workaround might not be necessary..
(https://github.com/weaveworks/scope/pull/3141#discussion_r181340479)

Thanks again.

On Fri, Apr 13, 2018 at 12:19:26PM -0000, schu wrote:
> Seth, can you give an ETA for when the update with the fix will be
> published?
>
> If it's only a matter of a few days, adding a workaround might not be necessary..
> (https://github.com/weaveworks/scope/pull/3141#discussion_r181340479)

It will certainly be more than a few days. We release kernels on a
3-week cycle to allow time for testing and verifications, only in
extreme cases to we rush fixes out faster than that (and just a few days
is difficult in any case).

It looks like we're currently on week 2 of the current cycle. The patch
will get included in the next cycle, which would release about 4 weeks
from now.

description: updated
Seth Forshee (sforshee) wrote :

Note though that it should be in xenial-proposed within two weeks (at which point you'll be prompted to verify the fix there).

Alban Crequy (muadda) wrote :

> to allow time for testing and verifications

Is the testing process documented somewhere? Ideally we should add a step in the process to test @schuio's reproducer (or an equivalent) to avoid future similar regressions for software using eBPF.

Seth Forshee (sforshee) wrote :

No it's not well documented, and it's a complicated mishmash of auto package tests (these do have documentation but not useful here) and autotests maintained by the kernel team. We do run some bpf autotests, we could see about adding them there.

http://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/

Seth, I reported the same issue in LP#1763352 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763352).
My patch there does essentially the same as yours, so I will mention in that ticket that it's a duplicate.

But your patch does not remove the wrong "env->insn_aux_data[insn_idx].seen = true" from kernel/bpf/verifier.c line 1844. I think that "seen" shouldn't be set there. The line was probably added there by mistake. It should have been added for the LD IMM64 case in the first place.

Seth Forshee (sforshee) wrote :

I've duped the other bug to this one.

I do agree that the "seen = true" you identified looks like a mistake, I will fix up the patch to remove that.

You also added some bounds checking. I see your point in adding that, I can't find anything which would guarantee that there is an additional instruction there. However the check_ld_imm() call before that can also assume that there's something after the current instruction, so either there is some guarantee that there's something after the current instruction or else that bounds check needs to be moved. As the upstream source still lacks a bounds check there too, it might be best to pursue this question upstream.

Bodo Petermann (bpetermann) wrote :

(I used my team account cloudstackers-7 before, now with my own one)

The bounds check may not be necessary, because replace_map_fd_with_map_ptr is called before do_check and the relevant check is already in replace_map_fd_with_map_ptr. But it's not obvious, so at least a comment in do_check may be a good idea.

Steffen Neubauer (sneubauer) wrote :

For us the importance of this issue would be High instead of Medium (not sure if there is an objective definition somewhere, could not find it). Reason is that we rely on BPF quite heavily in our infrastructure and servers just crash pretty much immediately once we install the current kernel version and reboot.

Alban Crequy (muadda) wrote :

I was wondering about the 'Importance' definition too. It's also a panic-reboot loop just after booting when using Weave Scope in the Kubernetes cluster because Scope installs the BPF probe during initialization.

Seth Forshee (sforshee) on 2018-04-19
Changed in linux (Ubuntu Xenial):
importance: Medium → High
Changed in linux (Ubuntu):
status: Triaged → Invalid
Stefan Bader (smb) on 2018-04-20
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Bodo Petermann (bpetermann) wrote :

Tested with kernel 4.4.0-123.147. Issue is fixed there.

tags: added: verification-done-xenial
removed: verification-needed-xenial
schu (schuio) wrote :

Ubuntu 4.4.0-123.147-generic 4.4.128 does fix it for us as well.

Launchpad Janitor (janitor) wrote :
Download full text (59.3 KiB)

This bug was fixed in the package linux - 4.4.0-127.153

---------------
linux (4.4.0-127.153) xenial; urgency=medium

  * CVE-2018-3639 (powerpc)
    - powerpc/pseries: Support firmware disable of RFI flush
    - powerpc/powernv: Support firmware disable of RFI flush
    - powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code
    - powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
    - powerpc/rfi-flush: Always enable fallback flush on pseries
    - powerpc/rfi-flush: Differentiate enabled and patched flush types
    - powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration
    - powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
    - powerpc: Add security feature flags for Spectre/Meltdown
    - powerpc/pseries: Set or clear security feature flags
    - powerpc/powernv: Set or clear security feature flags
    - powerpc/64s: Move cpu_show_meltdown()
    - powerpc/64s: Enhance the information in cpu_show_meltdown()
    - powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()
    - powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()
    - powerpc/64s: Wire up cpu_show_spectre_v1()
    - powerpc/64s: Wire up cpu_show_spectre_v2()
    - powerpc/pseries: Fix clearing of security feature flags
    - powerpc: Move default security feature flags
    - powerpc/pseries: Restore default security feature flags on setup
    - SAUCE: powerpc/64s: Add support for a store forwarding barrier at kernel
      entry/exit

  * CVE-2018-3639 (x86)
    - SAUCE: Clean up IBPB and IBRS control functions and macros
    - SAUCE: Fix up IBPB and IBRS kernel parameters documentation
    - SAUCE: Remove #define X86_FEATURE_PTI
    - x86/cpufeature: Move some of the scattered feature bits to x86_capability
    - x86/cpufeature: Cleanup get_cpu_cap()
    - x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
    - x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
    - x86/cpufeatures: Add Intel feature bits for Speculation Control
    - SAUCE: x86/kvm: Expose SPEC_CTRL from the leaf
    - x86/cpufeatures: Add AMD feature bits for Speculation Control
    - x86/msr: Add definitions for new speculation control MSRs
    - SAUCE: x86/msr: Rename MSR spec control feature bits
    - x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
    - x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
    - x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
    - x86/speculation: Add <asm/msr-index.h> dependency
    - x86/cpufeatures: Clean up Spectre v2 related CPUID flags
    - x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
    - SAUCE: x86/speculation: Move vendor specific IBRS/IBPB control code
    - SAUCE: x86: Add alternative_msr_write
    - SAUCE: x86/nospec: Simplify alternative_msr_write()
    - SAUCE: x86/bugs: Concentrate bug detection into a separate function
    - SAUCE: x86/bugs: Concentrate bug reporting into a separate function
    - arch: Introduce post-init read-only memory
    - SAUCE: x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
    - SAUCE: x86/bugs, KVM: Support the combination of guest a...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers