Kernel oops and system lock up when invoking wg-quick up

Bug #1854225 reported by Neil McPhail
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Confirmed | Undecided | Unassigned |

Bug Description

On 2 occasions over the past week I have had full system crashes after running "wg-quick up wg0". On the terminal, the command does not complete (i.e. it does not return to the prompt), the laptop fans spin up, and the system gradually slows down until the desktop crashes and the machine becomes completely unresponsive. On both occasions I opened another window to run "top" to see which process was consuming resources, but "top" never actually started. On the second occasion I managed to run "dmesg" before the system locked up completely and saw multiple lines about a kernel oops, plus red-highlighted text about a NULL pointer dereference.

I could reboot with the magic SysRq sequence (Alt+PrtScr, then R-E-I-S-U-B). On reboot I was confronted with the "system problem detected" dialog, but selecting the "report" option didn't seem to do anything. I have 2 reports in /var/crash from the last oops, which I will attach to this report.

I cannot reproduce this on demand. Most of the time, wg-quick performs normally. On both occasions the laptop had recently woken from suspend, but invoking "wg-quick" after waking from suspend doesn't trigger it on demand. On the first occasion I was running with stock boot options. On the second, I was running with "mitigations=off" as an experiment.

$ lsb_release -rd
Description: Ubuntu 19.10
Release: 19.10

$ apt policy wireguard
wireguard:
  Installed: 0.0.20190913-1ubuntu1
  Candidate: 0.0.20190913-1ubuntu1
  Version table:
 *** 0.0.20190913-1ubuntu1 500
        500 http://gb.archive.ubuntu.com/ubuntu eoan/universe amd64 Packages
        500 http://gb.archive.ubuntu.com/ubuntu eoan/universe i386 Packages
        100 /var/lib/dpkg/status

$ apt policy wireguard-tools
wireguard-tools:
  Installed: 0.0.20190913-1ubuntu1
  Candidate: 0.0.20190913-1ubuntu1
  Version table:
 *** 0.0.20190913-1ubuntu1 500
        500 http://gb.archive.ubuntu.com/ubuntu eoan/universe amd64 Packages
        100 /var/lib/dpkg/status

$ uname -a
Linux padbeast 5.3.0-23-generic #25-Ubuntu SMP Tue Nov 12 09:22:33 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/wireguard/wg0.conf
[Interface]
PrivateKey = MyPrivateKey=
Address = 10.66.66.5/24,fd42:42:42::5/64
DNS = 8.8.8.8,1.1.1.1

[Peer]
PublicKey = MyPublicKey=
Endpoint = my.domain.com:1195
AllowedIPs = 0.0.0.0/0,::/0

I'm reporting this as a security bug due to the "Null pointer dereference" in the kernel, but don't know if that is relevant. I don't know how to access or send the old dmesg information, so please let me know how to access this or how to collect it if the crash recurs.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: wireguard 0.0.20190913-1ubuntu1
ProcVersionSignature: Ubuntu 5.3.0-23.25-generic 5.3.7
Uname: Linux 5.3.0-23-generic x86_64
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
CurrentDesktop: MATE
Date: Wed Nov 27 20:44:24 2019
InstallationDate: Installed on 2019-10-11 (47 days ago)
InstallationMedia: Ubuntu-MATE 19.10 "Eoan Ermine" - Beta amd64 (20190926.2)
PackageArchitecture: all
SourcePackage: wireguard
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: neil 3007 F.... pulseaudio
 /dev/snd/pcmC0D0p: neil 3007 F...m pulseaudio
CurrentDesktop: MATE
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2019-10-11 (118 days ago)
InstallationMedia: Ubuntu-MATE 19.10 "Eoan Ermine" - Beta amd64 (20190926.2)
MachineType: LENOVO 2325A39
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-12-generic root=/dev/mapper/vgubuntu--mate-root ro quiet splash mitigations=off vt.handoff=7
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-12-generic N/A
 linux-backports-modules-5.4.0-12-generic N/A
 linux-firmware 1.186
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: focal
Uname: Linux 5.4.0-12-generic x86_64
UpgradeStatus: Upgraded to focal on 2020-02-07 (0 days ago)
UserGroups: adm audio cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 06/19/2018
dmi.bios.vendor: LENOVO
dmi.bios.version: G2ETB3WW (2.73 )
dmi.board.asset.tag: Not Available
dmi.board.name: 2325A39
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrG2ETB3WW(2.73):bd06/19/2018:svnLENOVO:pn2325A39:pvrThinkPadX230:rvnLENOVO:rn2325A39:rvrNotDefined:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.family: ThinkPad X230
dmi.product.name: 2325A39
dmi.product.sku: LENOVO_MT_2325
dmi.product.version: ThinkPad X230
dmi.sys.vendor: LENOVO

Eduardo Barretto (ebarretto) wrote :

Thanks for taking the time to report this bug and helping to make Ubuntu better.

Could you please confirm if that issue only happens with IPv6 or also with IPv4?

There were some reports of an IPv6 issue on kernel 5.3 that affected WireGuard, and the following patch seems to fix it:
https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26

@Tyler, could you please verify that the reported issue and above fix are related?

Thanks!

Neil McPhail (njmcphail) wrote :

Thanks, @ebarretto, for the reply.

On both occasions when I experienced the lockup, I was connected to an ISP which only gives me an IPv4 address. Bringing up the VPN then lets me see the IPv6 world via my VPS. Is that the information you're looking for?

Cheers

NMP

Tyler Hicks (tyhicks) wrote :

Hi Neil - your .crash files are missing the Call Trace sections so we can't see how the native_queued_spin_lock_slowpath() function is being reached. Can you paste the relevant call traces from the /var/log/kern.log log file?
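
For anyone collecting the same data, here is a minimal sketch of pulling just the oops out of a large kern.log. The function name extract_oops is made up for illustration. It assumes each oops starts at a "BUG:" line and ends at the "---[ end trace ... ]---" marker that 5.x kernels print; if the machine locked up before that marker was written, it prints through to the end of the input.

```sh
# Print every oops block found in a kernel log.
# Reads the file(s) given as arguments, or standard input if none are given.
extract_oops() {
    awk '
        /kernel: .*BUG: /  { printing = 1 }   # first line of an oops
        printing           { print }          # copy everything inside the block
        /---\[ end trace/  { printing = 0 }   # last line of the block
    ' "$@"
}
```

Usage would look like "extract_oops /var/log/kern.log > oops.txt". On a system with a persistent journal, "journalctl -k -b -1 | extract_oops" should do the same for the previous boot.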

Tyler Hicks (tyhicks) wrote : Re: [Bug 1854225] Re: Kernel oops and system lock up when invoking wg-quick up

On 2019-11-29 18:59:07, Eduardo dos Santos Barretto wrote:
> Some reports were done regarding IPv6 issue on kernel 5.3, that
> affected wireguard and the following patch seems to fix it:
> https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26
>
> @Tyler, could you please verify that the reported issue and above fix
> are related?

I don't think that fix is related to this bug report because we've
already got that fix applied. It was released in our 5.3.0-19.20 kernel
and was tracked in bug 1847478.

Neil McPhail (njmcphail) wrote :

Hi Tyler.

I've attached my kern.log for that day. Looks as if the crash is at approx 2320h?

Sorry for the delay in replying - have been travelling. Give me a shout with anything else you need.

Many thanks

NMP

Tyler Hicks (tyhicks) wrote :

Thanks, Neil. The kern.log file is missing all of the newlines for some reason. Can you try attaching it again while preserving the newlines? Thanks!

Neil McPhail (njmcphail) wrote :

I've changed the "attachment type" to "text", which seems to fix the problem. Cheers.

Neil McPhail (njmcphail) wrote :

Update - I haven't had any further occurrences of this bug since filing the above.

Should we remove the private/security flag for better visibility?

Tyler Hicks (tyhicks) wrote :

Hi Neil - I think that's a good idea since we haven't seen any progress on this private bug report. I'm not sure of the cause here but I think that we would have received a lot more reports if this was a widespread issue when using wg-quick (as we have in the past).

information type: Private Security → Public
Jason A. Donenfeld (zx2c4) wrote :

Thanks for the bug report. That kern.log is useful. The relevant part is reproduced below in this comment. Looks like wg-quick(8) invokes sysctl(8), which then reads /proc/sys/ and somehow triggers a NULL pointer dereference while holding a spinlock. Other cores then spin on that lock, eventually locking up your system.

Nov 26 23:20:01 padbeast kernel: [16283.030060] BUG: kernel NULL pointer dereference, address: 0000000000000011
Nov 26 23:20:01 padbeast kernel: [16283.030064] #PF: supervisor read access in kernel mode
Nov 26 23:20:01 padbeast kernel: [16283.030065] #PF: error_code(0x0000) - not-present page
Nov 26 23:20:01 padbeast kernel: [16283.030067] PGD 0 P4D 0
Nov 26 23:20:01 padbeast kernel: [16283.030070] Oops: 0000 [#1] SMP NOPTI
Nov 26 23:20:01 padbeast kernel: [16283.030073] CPU: 1 PID: 6983 Comm: sysctl Tainted: G OE 5.3.0-23-generic #25-Ubuntu
Nov 26 23:20:01 padbeast kernel: [16283.030074] Hardware name: LENOVO 2325A39/2325A39, BIOS G2ETB3WW (2.73 ) 06/19/2018
Nov 26 23:20:01 padbeast kernel: [16283.030080] RIP: 0010:rb_first+0xb/0x20
Nov 26 23:20:01 padbeast kernel: [16283.030082] Code: fe ff ff 4c 89 e9 4c 89 f2 4d 89 ee 49 89 c5 e9 81 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 74 10 49 89 c0 <48> 8b 40 10 48 85 c0 75 f4 4c 89 c0 c3 45 31 c0 eb f7 0f 1f 00 48
Nov 26 23:20:01 padbeast kernel: [16283.030083] RSP: 0018:ffffb662c21efe18 EFLAGS: 00010202
Nov 26 23:20:01 padbeast kernel: [16283.030085] RAX: 0000000000000001 RBX: ffffb662c21efec0 RCX: 0000000000000000
Nov 26 23:20:01 padbeast kernel: [16283.030087] RDX: 0000000000000001 RSI: ffffffffb71e1b73 RDI: ffff9e25445eea50
Nov 26 23:20:01 padbeast kernel: [16283.030088] RBP: ffffb662c21efe70 R08: 0000000000000001 R09: 0000000000000004
Nov 26 23:20:01 padbeast kernel: [16283.030090] R10: ffffffffb71e1b71 R11: 0000000000000000 R12: ffff9e24f782ead8
Nov 26 23:20:01 padbeast kernel: [16283.030091] R13: ffff9e24f782ea80 R14: ffff9e24f75cb400 R15: ffffffffb60e2ba0
Nov 26 23:20:01 padbeast kernel: [16283.030093] FS: 00007f669f9d6580(0000) GS:ffff9e2556040000(0000) knlGS:0000000000000000
Nov 26 23:20:01 padbeast kernel: [16283.030095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 23:20:01 padbeast kernel: [16283.030096] CR2: 0000000000000011 CR3: 0000000147bb8006 CR4: 00000000001606e0
Nov 26 23:20:01 padbeast kernel: [16283.030098] Call Trace:
Nov 26 23:20:01 padbeast kernel: [16283.030104] ? proc_sys_readdir+0x11a/0x2c0
Nov 26 23:20:01 padbeast kernel: [16283.030109] iterate_dir+0x9a/0x1b0
Nov 26 23:20:01 padbeast kernel: [16283.030112] ksys_getdents64+0x9c/0x130
Nov 26 23:20:01 padbeast kernel: [16283.030114] ? iterate_dir+0x1b0/0x1b0
Nov 26 23:20:01 padbeast kernel: [16283.030117] __x64_sys_getdents64+0x1a/0x20
Nov 26 23:20:01 padbeast kernel: [16283.030120] do_syscall_64+0x5a/0x130
Nov 26 23:20:01 padbeast kernel: [16283.030124] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 26 23:20:01 padbeast kernel: [16283.030126] RIP: 0033:0x7f669f8c507b
Nov 26 23:20:01 padbeast kernel: [16283.030129] Code: 0f 1e fa 48 8b 47 20 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 81 fa ff ff ff 7f b8 ff ff ff 7f 48 0f 47 d0 b8 d9 00 00 00 0f 05 ...
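
One way to cross-check the dump above (an interpretation, not something the log itself states): the byte marked <48> in the first Code: line begins the sequence 48 8b 40 10, which decodes to "mov 0x10(%rax),%rax". RAX is 1, so the load targets 0x1 + 0x10 = 0x11, matching both CR2 and the address in the BUG: line. On x86_64, 0x10 is the offset of rb_left in struct rb_node, which fits rb_first() walking left from a root pointer that held 1 rather than NULL or a valid address. The helper name fault_address below is hypothetical.

```sh
# Effective address of a "mov disp(%base),%reg" load:
# the base register value plus the displacement.
fault_address() {
    printf '0x%016x\n' "$(( $1 + $2 ))"
}

# RAX = 0x1 from the register dump, displacement 0x10 from the opcode bytes.
fault_address 0x0000000000000001 0x10   # prints 0x0000000000000011
```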

Jason A. Donenfeld (zx2c4) wrote :

Doesn't look like a WireGuard bug.

affects: wireguard (Ubuntu) → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1854225

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Neil McPhail (njmcphail) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected focal
description: updated
Neil McPhail (njmcphail) wrote : CRDA.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, PulseList.txt, UdevDb.txt, WifiSyslog.txt

apport information

Neil McPhail (njmcphail) wrote :

Note that this bug is quite old now, and I upgraded this machine to 20.04 last night. I have no idea if the information collected by apport has any relevance any more.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed