kernel panic in bpf_map_update_elem (with bcc tool ext4slower)

Bug #1763352 reported by Bodo Petermann
This bug affects 4 people
Affects           Status    Importance  Assigned to  Milestone
linux (Ubuntu)    Triaged   High        Unassigned
Xenial            Triaged   High        Unassigned

Bug Description

Using Ubuntu 16.04 LTS with the 4.4.0 kernel on x86_64.
After a kernel update from 4.4.0-116 to 4.4.0-119, running an eBPF tool that logs slow ext4 I/O (the bcc tool ext4slower) causes a kernel panic in bpf_map_update_elem.

Steps to reproduce:

* install bcc-tools 0.5.0-1 (https://github.com/iovisor/bcc), packages: bcc-tools, libbcc, python-bcc, libbcc-examples; from repo.iovisor.org
* install kernel 4.4.0-119 (linux-image-4.4.0-119-generic linux-image-extra-4.4.0-119 linux-headers-4.4.0-119 linux-headers-4.4.0-119-generic)
* reboot
* run ext4slower (sudo /usr/share/bcc/tools/ext4slower)
* system crashes

With kernel 4.4.0-116 ext4slower runs fine.

uname -a:
Linux <hostname> 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

BCC tools version: 0.5.0-1

Log of the kernel panic including backtrace:

[ 59.635900] BUG: unable to handle kernel paging request at 00000000b98aa6a8
[ 59.642345] IP: [<ffffffff8117cd96>] bpf_map_update_elem+0x6/0x20
[ 59.645782] PGD 0
[ 59.647150] Oops: 0000 [#1] SMP
[ 59.649808] Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd input_leds joydev serio_raw i2c_piix4 8250_fintek parport_pc parport mac_hid autofs4 hid_generic usbhid hid ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse drm pata_acpi floppy
[ 59.672373] CPU: 0 PID: 1 Comm: systemd Not tainted 4.4.0-119-generic #143-Ubuntu
[ 59.676407] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-2 04/01/2014
[ 59.680079] task: ffff88013a490000 ti: ffff88013a48c000 task.ti: ffff88013a48c000
[ 59.684124] RIP: 0010:[<ffffffff8117cd96>] [<ffffffff8117cd96>] bpf_map_update_elem+0x6/0x20
[ 59.689097] RSP: 0018:ffff88013a48f980 EFLAGS: 00010086
[ 59.691618] RAX: ffffffff8117cd90 RBX: ffffc90000818170 RCX: 0000000000000000
[ 59.695739] RDX: ffff88013a48fbd0 RSI: ffff88013a48fbe8 RDI: 00000000b98aa680
[ 59.699444] RBP: ffff88013a48fc08 R08: ffff880138d6b180 R09: 0000000000000800
[ 59.703264] R10: ffff88013a490000 R11: 0000000000000246 R12: 0000000000000000
[ 59.707164] R13: ffff8800ba0f4400 R14: ffff88013fc10020 R15: ffff88013a48fd58
[ 59.711112] FS: 00007f4039da58c0(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 59.715861] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.719025] CR2: 00000000b98aa6a8 CR3: 00000000bb65e000 CR4: 0000000000160670
[ 59.722937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 59.727210] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 59.731074] Stack:
[ 59.732149] ffff88013a48fc08 ffffffff81177411 ffff88013fc20500 0000000000000000
[ 59.735760] 00000000b98aa680 ffff88013a48fbe8 ffff88013a48fbd0 0000000000000000
[ 59.740058] ffff880138d6b180 ffff88013a48fe58 ffff8800bb5ab000 0000000000000000
[ 59.744100] Call Trace:
[ 59.745434] [<ffffffff81177411>] ? __bpf_prog_run+0x7a1/0x1360
[ 59.748289] [<ffffffff810b7939>] ? update_curr+0x79/0x170
[ 59.751044] [<ffffffff814e1566>] ? vp_notify+0x16/0x20
[ 59.753601] [<ffffffff810b9a8b>] ? dequeue_entity+0x41b/0xa80
[ 59.756354] [<ffffffff810bcb59>] ? update_load_avg+0x219/0x7d0
[ 59.759171] [<ffffffff810bd1ac>] ? set_next_entity+0x9c/0xb0
[ 59.761890] [<ffffffff8102d66c>] ? __switch_to+0x1dc/0x5c0
[ 59.764667] [<ffffffff8184b081>] ? __schedule+0x341/0x7f0
[ 59.767155] [<ffffffff813dd708>] ? blk_mq_flush_plug_list+0x128/0x150
[ 59.770253] [<ffffffff8184bd60>] ? bit_wait+0x60/0x60
[ 59.772782] [<ffffffff8116c4c7>] trace_call_bpf+0x37/0x50
[ 59.775336] [<ffffffff8116ca57>] kprobe_perf_func+0x37/0x250
[ 59.778264] [<ffffffff810f8e2e>] ? ktime_get+0x3e/0xb0
[ 59.780724] [<ffffffff81142dd6>] ? delayacct_end+0x56/0x60
[ 59.783589] [<ffffffff8116e271>] kprobe_dispatcher+0x31/0x50
[ 59.786490] [<ffffffff811313d5>] aggr_pre_handler+0x45/0x80
[ 59.789534] [<ffffffff81194ae1>] ? generic_file_read_iter+0x1/0x690
[ 59.792407] [<ffffffff81061976>] kprobe_ftrace_handler+0xb6/0x120
[ 59.795329] [<ffffffff81194ae5>] ? generic_file_read_iter+0x5/0x690
[ 59.798453] [<ffffffff81145108>] ftrace_ops_recurs_func+0x58/0xb0
[ 59.801948] [<ffffffffc00180d5>] 0xffffffffc00180d5
[ 59.804357] [<ffffffff81194ae0>] ? filemap_write_and_wait_range+0x70/0x70
[ 59.808022] [<ffffffff81194ae1>] ? generic_file_read_iter+0x1/0x690
[ 59.811225] [<ffffffff81194ae5>] generic_file_read_iter+0x5/0x690
[ 59.814332] [<ffffffff81213eee>] new_sync_read+0x9e/0xe0
[ 59.817079] [<ffffffff81194ae5>] ? generic_file_read_iter+0x5/0x690
[ 59.820120] [<ffffffff8105fc79>] ? kretprobe_trampoline_holder+0x9/0x9
[ 59.825007] [<ffffffff81213f59>] __vfs_read+0x29/0x40
[ 59.827528] [<ffffffff81214526>] vfs_read+0x86/0x130
[ 59.829931] [<ffffffff81215275>] SyS_read+0x55/0xc0
[ 59.832277] [<ffffffff8184f708>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[ 59.835220] Code: f0 ff 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 28 48 89 e5 48 8b 40 18 e8 8a 83 6d 00 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <48> 8b 47 28 48 89 e5 48 8b 40 20 e8 6a 83 6d 00 48 98 5d c3 66
[ 59.848944] RIP [<ffffffff8117cd96>] bpf_map_update_elem+0x6/0x20
[ 59.851995] RSP <ffff88013a48f980>
[ 59.853771] CR2: 00000000b98aa6a8
[ 59.855364] ---[ end trace a90b99e65b1f34d9 ]---
[ 59.867439] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 59.867439]
[ 59.872906] Kernel Offset: disabled
[ 59.874834] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 59.874834]
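
A note on reading the register dump above, as a minimal sketch: the bytes at the faulting position in the Code line (`<48> 8b 47 28`) decode to `mov 0x28(%rdi),%rax`, so the faulting address should be RDI plus 0x28. The constants are copied from the oops; the helper names are made up for illustration. RDI holds the first argument of bpf_map_update_elem (the map pointer), and its upper 32 bits are zero, which is not a valid kernel address.

```c
#include <stdint.h>

/* Values copied from the oops log above. */
#define OOPS_RDI 0x00000000b98aa680ULL /* first argument: the map pointer */
#define OOPS_CR2 0x00000000b98aa6a8ULL /* address that caused the page fault */

/* Displacement of the faulting load: "48 8b 47 28" is mov 0x28(%rdi),%rax. */
#define LOAD_DISPLACEMENT 0x28ULL

/* Address the faulting instruction tried to read (hypothetical helper). */
static uint64_t faulting_address(uint64_t rdi)
{
    return rdi + LOAD_DISPLACEMENT;
}

/* Upper 32 bits of a 64-bit value (hypothetical helper). */
static uint32_t upper_half(uint64_t value)
{
    return (uint32_t)(value >> 32);
}
```

Here `faulting_address(OOPS_RDI)` equals CR2 and `upper_half(OOPS_RDI)` is 0, which is consistent with a map pointer that lost its upper 32 bits.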

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1763352/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
s11-cloudstackers (cloudstackers-7) wrote :

Actual problematic package name: linux-image-4.4.0-119-generic

affects: ubuntu → linux-meta-lts-xenial (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-meta-lts-xenial (Ubuntu):
status: New → Confirmed
Bodo Petermann (bpetermann) wrote :

This issue is most likely caused by the BPF verifier changes from ubuntu-xenial commit 68dd63b26223880d1b431b6bf54e45d93d04361a.

The BPF program uses a map, and bpf_map_update_elem crashes because the map pointer is invalid.

The map pointer is invalid because the BPF instruction that loads it into BPF register R1 was corrupted. That instruction is opcode 0x18 (BPF_LD_IMM64), which occupies two instruction slots. The lower 32 bits of the 64-bit value are loaded correctly from the IMM field of the first slot. The second slot should have opcode 0, and its IMM field should carry the upper 32 bits of the value. But at the time the instruction is executed, the second slot has opcode 0xBF and IMM=0.

actual:
- BPF_LD_IMM64 instruction: 0x18 0x01 0x00 0x00 0x00 0x1b 0xbf 0x38
- following instruction: 0xbf 0x00 0x00 0x00 0x00 0x00 0x00 0x00
expected:
- BPF_LD_IMM64 instruction: 0x18 0x01 0x00 0x00 0x00 0x1b 0xbf 0x38
- following instruction: 0x00 0x00 0x00 0x00 <higher 32 bits of map pointer>
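
To illustrate how the 64-bit value is assembled from the instruction pair, here is a minimal sketch. `struct sketch_insn` is a local stand-in for the 8-byte eBPF instruction layout, and `ld_imm64_value` is a made-up helper name; the combining expression follows what the interpreter does for BPF_LD_IMM64 (low half from the first IMM, high half from the second).

```c
#include <stdint.h>

/* Local stand-in for the 8-byte eBPF instruction layout (hypothetical name). */
struct sketch_insn {
    uint8_t code; /* opcode */
    uint8_t regs; /* dst_reg in the low nibble, src_reg in the high nibble */
    int16_t off;  /* signed offset */
    int32_t imm;  /* signed 32-bit immediate */
};

#define OP_LD_IMM64  0x18 /* BPF_LD | BPF_IMM | BPF_DW */
#define OP_MOV64_REG 0xbf /* BPF_ALU64 | BPF_MOV | BPF_X */

/* Combine the IMM fields of a two-slot BPF_LD_IMM64 into one 64-bit value. */
static uint64_t ld_imm64_value(const struct sketch_insn *pair)
{
    uint64_t lo = (uint32_t)pair[0].imm; /* lower 32 bits: first slot */
    uint64_t hi = (uint32_t)pair[1].imm; /* upper 32 bits: second slot */
    return lo | (hi << 32);
}
```

With an intact second slot the full pointer comes back. Once the second slot has been overwritten with opcode 0xBF and IMM=0, the same function returns a value whose upper 32 bits are zero, matching the RDI value in the oops.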

In kernel/bpf/verifier.c the function replace_map_fd_with_map_ptr() prepares the IMM values of the instruction pair correctly.
But the verifier continues after setting up the map pointer. In do_check(), the instructions on reachable code paths are marked as "seen". After do_check(), the function sanitize_dead_code() is run to turn every "unseen" BPF instruction into a NOP. That is what happens to the second half of the BPF_LD_IMM64: the slot that carries the upper 32 bits of the value was never marked as "seen", so it is patched into a NOP, which resets the upper 32 bits of the map pointer to 0.
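
The interaction described above can be reproduced in a small stand-alone model. This is only a sketch under simplifying assumptions: the program is straight-line (no branches), `struct sketch_insn`, `mark_seen` and `sanitize_unseen` are made-up names, and the NOP is modelled as opcode 0xBF with all other fields zero, which is what was observed in the second slot. It is not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the 8-byte eBPF instruction layout (hypothetical name). */
struct sketch_insn {
    uint8_t code;
    uint8_t regs;
    int16_t off;
    int32_t imm;
};

#define OP_LD_IMM64  0x18 /* two-slot 64-bit immediate load */
#define OP_MOV64_REG 0xbf /* used as the NOP pattern */
#define OP_EXIT      0x95 /* BPF_JMP | BPF_EXIT */

/*
 * Simplified model of the do_check() walk over a straight-line program.
 * mark_second_slot selects whether the second half of BPF_LD_IMM64 is
 * also marked as seen (the proposed fix) or skipped (the reported bug).
 */
static void mark_seen(const struct sketch_insn *prog, size_t len,
                      bool *seen, bool mark_second_slot)
{
    for (size_t i = 0; i < len; i++) {
        seen[i] = true;
        if (prog[i].code == OP_LD_IMM64) {
            i++; /* step over the second slot */
            if (mark_second_slot && i < len)
                seen[i] = true; /* bounds-checked, as the fix requires */
        }
    }
}

/* Simplified model of sanitize_dead_code(): overwrite unseen slots with a NOP. */
static void sanitize_unseen(struct sketch_insn *prog, size_t len,
                            const bool *seen)
{
    const struct sketch_insn nop = { OP_MOV64_REG, 0, 0, 0 };

    for (size_t i = 0; i < len; i++) {
        if (!seen[i])
            prog[i] = nop;
    }
}
```

Running `mark_seen` with `mark_second_slot` set to false and then `sanitize_unseen` leaves the second slot as opcode 0xBF with IMM=0; with it set to true, the slot and its IMM value survive.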

The change to sanitize the dead code was introduced in commit 68dd63b26223880d1b431b6bf54e45d93d04361a "bpf: fix branch pruning logic".

The fix would be to add a line
env->insn_aux_data[insn_idx].seen = true;
in kernel/bpf/verifier.c (see below or in attached patch)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8a40719c6ae5..5cc8b3f406e5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2005,6 +2005,7 @@ process_bpf_exit:
                                        return err;

                                insn_idx++;
+                               env->insn_aux_data[insn_idx].seen = true;
                        } else {
                                verbose("invalid BPF_LD mode\n");
                                return -EINVAL;

tags: added: patch
Bodo Petermann (bpetermann) wrote :

I revised the patch a little. It needs an additional bounds check for insn_idx, and another seen=true line needs to be removed. See the newly attached patch.

affects: linux-meta-lts-xenial (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
importance: Undecided → High
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Xenial):
status: Incomplete → Triaged
Changed in linux (Ubuntu):
status: Incomplete → Triaged
tags: added: kernel-da-key xenial
Joseph Salisbury (jsalisbury) wrote :

You may want to submit your patch to the Ubuntu kernel team mailing list for review:
<email address hidden>

s11-cloudstackers (cloudstackers-7) wrote :

The same issue was reported in LP#1763454
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763454

Seth Forshee provided a patch there that solves the issue here as well. His patch has been sent to the mailing list already.

From my point of view, this ticket could be treated as a duplicate of LP#1763454.
