Kernel Oops - unable to handle kernel NULL pointer dereference at 0000000000000910 in update_blocked_averages

Bug #1567622 reported by Thomas Orozco
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: none listed

Bug Description

Running Ubuntu 16.04 (Linux 4.4.0-17-generic) as a VirtualBox guest on a MacBook Pro, I'm hitting a reliably reproducible kernel oops.

The stack trace isn't always the same, although the error always happens in the same `update_blocked_averages` function.

Here's one oops:

```
[ 56.972407] BUG: unable to handle kernel NULL pointer dereference at 0000000000000910
[ 57.076894] IP: [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 57.173751] PGD 0
[ 57.193537] Oops: 0000 [#1] SMP
[ 57.245472] Modules linked in: veth xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs ppdev crct10dif_pclmul crc32_pclmul snd_intel8x0 snd_ac97_codec aesni_intel ac97_bus aes_x86_64 snd_pcm lrw gf128mul snd_timer input_leds glue_helper ablk_helper snd cryptd joydev serio_raw soundcore i2c_piix4 8250_fintek mac_hid parport_pc lp parport autofs4 hid_generic usbhid ahci psmouse hid libahci pata_acpi e1000 video fjes
[ 57.937239] CPU: 1 PID: 269 Comm: systemd-journal Not tainted 4.4.0-17-generic #33-Ubuntu
[ 58.049743] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 58.189828] task: ffff8800d8f90dc0 ti: ffff8800da9a8000 task.ti: ffff8800da9a8000
[ 58.299087] RIP: 0010:[<ffffffff810b403d>] [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 58.320327] RSP: 0018:ffff8800da9abc70 EFLAGS: 00010046
[ 58.366603] RAX: 0000000000000000 RBX: ffff88020c3bae00 RCX: 0000000000000000
[ 58.514856] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ 58.612464] RBP: ffff8800da9abcc8 R08: afb504000afb5041 R09: 0000000000000000
[ 58.738629] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000045
[ 58.885338] R13: 0000000000000000 R14: ffff8802198975c0 R15: 0000000000000001
[ 59.001563] FS: 00007f7db7f6e840(0000) GS:ffff880219880000(0000) knlGS:0000000000000000
[ 59.160653] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.296218] CR2: 0000000000000910 CR3: 000000020c3fd000 CR4: 00000000000406e0
[ 59.418514] Stack:
[ 59.458306] 00000000000205f5 ffff880219896d00 0000000000000086 afb504000afb5041
[ 59.580604] 0000000000000000 0000ad60ffff01fe 0000000000000001 00000000ffff02f8
[ 59.720268] 0000000000016d00 ffff880219896d00 ffff8800d8f91310 ffff8800da9abd38
[ 59.848040] Call Trace:
[ 59.902635] [<ffffffff810bcbe7>] pick_next_task_fair+0x1e7/0x4f0
[ 59.992238] [<ffffffff8181e175>] __schedule+0x125/0xa10
[ 60.030794] [<ffffffff8181ea95>] schedule+0x35/0x80
[ 60.074193] [<ffffffff81822053>] schedule_hrtimeout_range_clock+0x193/0x1b0
[ 60.134427] [<ffffffff81254137>] ? ep_scan_ready_list+0x1e7/0x1f0
[ 60.195234] [<ffffffff810c9591>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 60.289354] [<ffffffff81822083>] schedule_hrtimeout_range+0x13/0x20
[ 60.325540] [<ffffffff81254430>] ep_poll+0x2c0/0x3d0
[ 60.328490] [<ffffffff810aaca0>] ? wake_up_q+0x70/0x70
[ 60.387888] [<ffffffff81255718>] SyS_epoll_wait+0xb8/0xd0
[ 60.502960] [<ffffffff81822b72>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 60.610213] Code: 24 08 49 b8 41 50 fb 0a 00 04 b5 af 4d 89 fe 0f 1f 44 00 00 44 8b 9b 24 01 00 00 45 85 db 0f 85 a0 03 00 00 48 8b 83 c8 00 00 00 <48> 8b 80 10 09 00 00 48 2b 83 18 01 00 00 48 8b 93 a0 00 00 00
[ 60.941287] RIP [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 61.111617] RSP <ffff8800da9abc70>
[ 61.201406] CR2: 0000000000000910
[ 61.258351] ---[ end trace fe3df8ee7b476828 ]---
```

Here's another one:

```
[ 154.454951] BUG: unable to handle kernel [ 154.455154] device vethe5e9554 entered promiscuous mode
[ 154.455202] IPv6: ADDRCONF(NETDEV_UP): vethe5e9554: link is not ready
[ 154.455203] docker0: port 1(vethe5e9554) entered forwarding state
[ 154.455210] docker0: port 1(vethe5e9554) entered forwarding state

[ 154.744305] NULL pointer dereference at 0000000000000910
[ 154.783574] IP: [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 154.784877] PGD 0
[ 154.785372] Oops: 0000 [#1] SMP
[ 154.790654] Modules linked in: veth xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs vboxsf(OE) ppdev crct10dif_pclmul crc32_pclmul aesni_intel snd_intel8x0 aes_x86_64 input_leds lrw snd_ac97_codec gf128mul glue_helper ac97_bus ablk_helper cryptd serio_raw vboxvideo(OE) joydev snd_pcm i2c_piix4 snd_timer drm 8250_fintek snd soundcore vboxguest(OE) parport_pc mac_hid lp parport autofs4 hid_generic usbhid ahci hid psmouse libahci e1000 pata_acpi video fjes
[ 155.387419] CPU: 0 PID: 2043 Comm: systemd-udevd Tainted: G OE 4.4.0-17-generic #33-Ubuntu
[ 155.546119] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 155.654775] task: ffff8800dbb3b700 ti: ffff88020da8c000 task.ti: ffff88020da8c000
[ 155.753906] RIP: 0010:[<ffffffff810b403d>] [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 155.775805] RSP: 0000:ffff880219803df0 EFLAGS: 00010046
[ 155.811224] RAX: 0000000000000000 RBX: ffff8800db891200 RCX: 0000000000000001
[ 155.857524] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: 0000000000000002
[ 155.861832] RBP: ffff880219803e50 R08: afb504000afb5041 R09: 0000000000000100
[ 155.980061] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000291
[ 156.059221] R13: 0000000000000000 R14: ffff8802198175c0 R15: 0000000000000001
[ 156.167527] FS: 00007f06cde8c8c0(0000) GS:ffff880219800000(0000) knlGS:0000000000000000
[ 156.302318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 156.386563] CR2: 0000000000000910 CR3: 000000020c6a7000 CR4: 00000000000406f0
[ 156.507153] Stack:
[ 156.559351] 0000000000445496 ffff880219816d00 0000000000000286 afb504000afb5041
[ 156.590109] 0000000000000000 0043430019816d00 0000000000016d00 0000000000000000
[ 156.684722] 00000000ffff61a3 0000000000000001 ffff880219816d00 ffff8800db5097f8
[ 156.798239] Call Trace:
[ 156.837119] <IRQ>
[ 156.863469] [<ffffffff810bcf3b>] rebalance_domains+0x4b/0x2d0
[ 156.997302] [<ffffffff810ed7b9>] ? update_process_times+0x59/0x60
[ 157.111072] [<ffffffff810bd399>] run_rebalance_domains+0x1d9/0x210
[ 157.183338] [<ffffffff81051a4d>] ? lapic_next_event+0x1d/0x30
[ 157.271222] [<ffffffff81084851>] __do_softirq+0x101/0x290
[ 157.359428] [<ffffffff81084b53>] irq_exit+0xa3/0xb0
[ 157.453079] [<ffffffff81825622>] smp_apic_timer_interrupt+0x42/0x50
[ 157.535290] [<ffffffff818238e2>] apic_timer_interrupt+0x82/0x90
[ 157.656699] <EOI>
[ 157.688002] [<ffffffff8118c88c>] ? filemap_map_pages+0x8c/0x230
[ 157.752691] [<ffffffff810c9591>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 157.806798] [<ffffffff811bec61>] handle_mm_fault+0x1131/0x1820
[ 157.860940] [<ffffffff8122be24>] ? mntput+0x24/0x40
[ 157.904327] [<ffffffff8106a537>] __do_page_fault+0x197/0x400
[ 157.984122] [<ffffffff8106a7c2>] do_page_fault+0x22/0x30
[ 158.076895] [<ffffffff81824cf8>] page_fault+0x28/0x30
[ 158.138672] Code: 24 08 49 b8 41 50 fb 0a 00 04 b5 af 4d 89 fe 0f 1f 44 00 00 44 8b 9b 24 01 00 00 45 85 db 0f 85 a0 03 00 00 48 8b 83 c8 00 00 00 <48> 8b 80 10 09 00 00 48 2b 83 18 01 00 00 48 8b 93 a0 00 00 00
[ 158.290783] RIP [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 158.383333] RSP <ffff880219803df0>
[ 158.414588] CR2: 0000000000000910
[ 158.467115] ---[ end trace 27ceb18bfd8d94ca ]---
[ 158.505862] Kernel panic - not syncing: Fatal exception in interrupt
[ 159.629144] Shutting down cpus with NMI
[ 159.745790] Kernel Offset: disabled
[ 159.859365] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
```

And a third one:

```
[ 139.661389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000910
[ 139.807547] IP: [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 139.946882] PGD 0
[ 139.984609] Oops: 0000 [#1] SMP
[ 140.048650] Modules linked in: veth xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs vboxsf ppdev crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_intel8x0 snd_ac97_codec vboxvideo ac97_bus input_leds joydev snd_pcm drm snd_timer serio_raw snd i2c_piix4 soundcore 8250_fintek vboxguest parport_pc mac_hid lp parport autofs4 hid_generic usbhid ahci psmouse libahci hid e1000 pata_acpi fjes video
[ 141.115576] CPU: 0 PID: 2038 Comm: systemd-udevd Not tainted 4.4.0-17-generic #33-Ubuntu
[ 141.310652] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 141.415336] task: ffff88020c2b9b80 ti: ffff8800dadf0000 task.ti: ffff8800dadf0000
[ 141.519708] RIP: 0010:[<ffffffff810b403d>] [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 141.707701] RSP: 0018:ffff8800dadf3c70 EFLAGS: 00010046
[ 141.824270] RAX: 0000000000000000 RBX: ffff88020c1a9000 RCX: 0000000000000001
[ 141.999630] RDX: 0000000000000485 RSI: 00000000cd4629c5 RDI: ffff880219bf1000
[ 142.180519] RBP: ffff8800dadf3cc8 R08: afb504000afb5041 R09: 0000000000000000
[ 142.369893] R10: 00000000ffff518b R11: 0000000000000000 R12: 00000000ffff5285
[ 142.525548] R13: 0000000000016d00 R14: ffff8802198175c0 R15: ffff8802198175c0
[ 142.675206] FS: 00007ff7980cf8c0(0000) GS:ffff880219800000(0000) knlGS:0000000000000000
[ 142.871842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 143.019994] CR2: 0000000000000910 CR3: 00000000d8e0e000 CR4: 00000000000406f0
[ 143.181043] Stack:
[ 143.225953] ffffffff813fcd45 ffff880219816d00 0000000000000086 ffffffff810c9591
[ 143.308932] ffff88020ee31c00 00000000ffff518b 0000000000000000 00000000ffff5285
[ 143.470616] 0000000000016d00 ffff880219816d00 ffff88020c2ba0d0 ffff8800dadf3d38
[ 143.593228] Call Trace:
[ 143.695835] [<ffffffff813fcd45>] ? find_next_bit+0x15/0x20
[ 143.812410] [<ffffffff810c9591>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 143.999872] [<ffffffff810bcbe7>] pick_next_task_fair+0x1e7/0x4f0
[ 144.104135] [<ffffffff8181e175>] __schedule+0x125/0xa10
[ 144.120508] [<ffffffff8181ea95>] schedule+0x35/0x80
[ 144.226139] [<ffffffff81821f85>] schedule_hrtimeout_range_clock+0xc5/0x1b0
[ 144.372190] [<ffffffff810edbf0>] ? __hrtimer_init+0x90/0x90
[ 144.491899] [<ffffffff81821f79>] ? schedule_hrtimeout_range_clock+0xb9/0x1b0
[ 144.646067] [<ffffffff81822083>] schedule_hrtimeout_range+0x13/0x20
[ 144.687266] [<ffffffff81254430>] ep_poll+0x2c0/0x3d0
[ 144.689511] [<ffffffff810aaca0>] ? wake_up_q+0x70/0x70
[ 144.785095] [<ffffffff81255718>] SyS_epoll_wait+0xb8/0xd0
[ 144.843879] [<ffffffff81822b72>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 144.859757] Code: 24 08 49 b8 41 50 fb 0a 00 04 b5 af 4d 89 fe 0f 1f 44 00 00 44 8b 9b 24 01 00 00 45 85 db 0f 85 a0 03 00 00 48 8b 83 c8 00 00 00 <48> 8b 80 10 09 00 00 48 2b 83 18 01 00 00 48 8b 93 a0 00 00 00
[ 144.982177] RIP [<ffffffff810b403d>] update_blocked_averages+0x8d/0x520
[ 145.024132] RSP <ffff8800dadf3c70>
[ 145.069952] CR2: 0000000000000910
[ 145.118873] ---[ end trace 5324671710cba237 ]---
```
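
All three traces fault at the same instruction. As a rough cross-check (not a root-cause analysis), the sketch below decodes the bytes marked `<48>` in the `Code:` line. Assuming the standard x86-64 encoding, `48 8b 80 10 09 00 00` is `mov 0x910(%rax),%rax`, and with RAX = 0 in every register dump this yields the reported fault address 0x910. The variable names are illustrative only.

```bash
#!/bin/bash
set -eu

# "Code:" bytes copied from the oops traces above (identical in all three).
code='24 08 49 b8 41 50 fb 0a 00 04 b5 af 4d 89 fe 0f 1f 44 00 00 44 8b 9b 24 01 00 00 45 85 db 0f 85 a0 03 00 00 48 8b 83 c8 00 00 00 <48> 8b 80 10 09 00 00 48 2b 83 18 01 00 00 48 8b 93 a0 00 00 00'

# Keep everything from the marked (faulting) byte onward, then drop the ">".
tail=${code#*<}
tail=${tail/>/}
read -r -a b <<< "$tail"

# First three bytes: REX.W prefix, opcode 8b (MOV r64, r/m64), ModRM 0x80
# (mod=10 disp32, reg=rax, rm=rax), i.e. mov disp32(%rax),%rax.
opcode="${b[0]}${b[1]}${b[2]}"

# The next four bytes are a little-endian 32-bit displacement.
disp=$(( 0x${b[6]}${b[5]}${b[4]}${b[3]} ))

# RAX as shown in the register dump of each oops.
rax=0x0
fault_addr=$(printf '0x%x' $(( rax + disp )))

echo "opcode=$opcode fault_addr=$fault_addr"
```

The preceding instruction, `48 8b 83 c8 00 00 00`, appears to be `mov 0xc8(%rbx),%rax`, so the NULL value seems to have been loaded from offset 0xc8 of whatever structure RBX points to. Which field that is cannot be determined from the oops alone.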

I can reproduce this reliably using the following script (which relies on Docker). This starts a container, then adds another task into its namespace via `docker exec`, and then kills PID 1 in the container, which tears down the PID namespace and should kill both tasks. If I remove the `docker exec` line, there is no crash.

```
#!/bin/bash
set -exu

cid=$(docker run -it -d ubuntu sleep 1000)
docker exec "$cid" sleep 1000 &
sleep 1
docker kill -s KILL "$cid"

docker run -it --rm ubuntu echo "Alive 1?"
docker run -it --rm ubuntu echo "Alive 2?"
docker run -it --rm ubuntu echo "Alive 3?"
```

In the vast majority of cases, the crash will happen before "Alive 1?" is printed to the screen. In some cases, it'll happen at "Alive 2?".

I tried reproducing the issue on an AWS instance, with no success so far.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-17-generic 4.4.0-17.33
ProcVersionSignature: Ubuntu 4.4.0-17.33-generic 4.4.6
Uname: Linux 4.4.0-17-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.4.0-17-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/pcmC0D1c', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Date: Thu Apr 7 20:53:43 2016
HibernationDevice: RESUME=/dev/mapper/ubuntu--build--vg-swap_1
InstallationDate: Installed on 2016-02-03 (63 days ago)
InstallationMedia: Ubuntu-Server 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: innotek GmbH VirtualBox
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-17-generic root=/dev/mapper/ubuntu--build--vg-root ro console=ttyS0,9600 console=tty0
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-17-generic N/A
 linux-backports-modules-4.4.0-17-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to xenial on 2016-04-06 (1 days ago)
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

Thomas Orozco (torozco) wrote :

I should mention that the VirtualBox kernel modules were installed when the latter two oopses were captured. Uninstalling them (which is why they are absent from the first oops) makes no difference.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Thomas Orozco (torozco) wrote :

Possibly relevant: the issue does *not* occur if I run my VM with a single core.

Thomas Orozco (torozco) wrote :

Good news: upgrading to the latest VirtualBox seems to resolve the issue!

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
penalvch (penalvch) wrote :

Thomas Orozco, to clarify, could you please advise which version of VirtualBox you were using when this problem happened, and which one you upgraded to that resolved it?

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired