Summary: Kernel bug (unhandled paging request) on "udisksctl power-off"

Bug #1803929 reported by Matt C
This bug affects 18 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

uname -a
Linux metabox 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Steps (1) to reproduce:
1) Plug in a USB3 hard drive.
2) Right click on launcher icon.
3) Select "Safely Remove".

Based on https://askubuntu.com/a/532691, the following steps also reproduce the problem:

Steps (2) to reproduce:
1) Plug in a USB3 hard drive.
2) udisksctl unmount -b /dev/sdXY
3) udisksctl power-off -b /dev/sdX
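For anyone scripting the reproduction, Steps (2) can be wrapped in a small POSIX shell sketch. The function names parent_disk and safe_remove and the DRY_RUN switch are hypothetical, not part of udisks. The partition-to-disk mapping simply strips the trailing partition number, so it assumes sdXY-style names and would give the wrong answer for names such as /dev/nvme0n1p1.

```shell
#!/bin/sh
# Hypothetical helper: /dev/sdXY -> /dev/sdX by stripping the trailing
# partition number. Only valid for sdXY-style names (not nvme0n1p1 etc.).
parent_disk() {
    printf '%s\n' "${1%%[0-9]*}"
}

# Hypothetical wrapper around "Steps (2) to reproduce".
# WARNING: on an affected kernel the power-off step can hard-lock the system.
# With DRY_RUN=1 the commands are only printed, not executed.
safe_remove() {
    part="$1"
    disk=$(parent_disk "$part")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "udisksctl unmount -b $part"
        echo "udisksctl power-off -b $disk"
        return 0
    fi
    # Power off the whole disk only if unmounting the partition succeeded.
    udisksctl unmount -b "$part" && udisksctl power-off -b "$disk"
}
```

Example: DRY_RUN=1 safe_remove /dev/sdb1 only prints the two udisksctl commands. Without DRY_RUN it runs them for real, which on an affected kernel (4.4.0-139 in this report) is exactly the sequence that locks up the machine.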

The system locks up completely and becomes unresponsive; even the Magic SysRq key combination does nothing.
Most of the time the fans on my laptop go crazy (so the CPU is presumably stuck in some loop), but once they did not and the system simply locked up without the fans spinning up.

I can't get a textual copy of the kernel bug log (because I have to do a hard reset to regain control of the system and the output doesn't get flushed to disk), but I did manage to get a photo of some of it (please refer to attachment).
---
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC1D0p: matt 2754 F...m pulseaudio
 /dev/snd/controlC1: matt 2754 F.... pulseaudio
 /dev/snd/controlC0: matt 2754 F.... pulseaudio
CurrentDesktop: Unity
DistroRelease: Ubuntu 16.04
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=f00235c0-ba8a-4403-91b1-64e9d87b76e0
InstallationDate: Installed on 2016-04-23 (940 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
MachineType: Notebook N150SD/N155SD
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-139-generic root=UUID=beb9b9ce-39a7-464a-978c-ac4fb82a5a81 ro quiet splash
ProcVersionSignature: Ubuntu 4.4.0-139.165-generic 4.4.160
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-139-generic N/A
 linux-backports-modules-4.4.0-139-generic N/A
 linux-firmware 1.157.20
Tags: xenial
Uname: Linux 4.4.0-139-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo vboxusers wireshark
_MarkForUpload: True
dmi.bios.date: 03/31/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.03.05
dmi.board.asset.tag: Tag 12345
dmi.board.name: N150SD/N155SD
dmi.board.vendor: Notebook
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: Notebook
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.03.05:bd03/31/2015:svnNotebook:pnN150SD/N155SD:pvrNotApplicable:rvnNotebook:rnN150SD/N155SD:rvrNotApplicable:cvnNotebook:ct9:cvrN/A:
dmi.product.name: N150SD/N155SD
dmi.product.version: Not Applicable
dmi.sys.vendor: Notebook

Matt C (playsilicon) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1803929

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial

Matt C (playsilicon) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected
description: updated

Matt C (playsilicon) wrote : CRDA.txt

apport information

Matt C (playsilicon) wrote : CurrentDmesg.txt

apport information

Matt C (playsilicon) wrote : IwConfig.txt

apport information

Matt C (playsilicon) wrote : Lspci.txt

apport information

Matt C (playsilicon) wrote : Lsusb.txt

apport information

Matt C (playsilicon) wrote : ProcCpuinfo.txt

apport information

Matt C (playsilicon) wrote : ProcCpuinfoMinimal.txt

apport information

Matt C (playsilicon) wrote : ProcEnviron.txt

apport information

Matt C (playsilicon) wrote : ProcInterrupts.txt

apport information

Matt C (playsilicon) wrote : ProcModules.txt

apport information

Matt C (playsilicon) wrote : PulseList.txt

apport information

Matt C (playsilicon) wrote : RfKill.txt

apport information

Matt C (playsilicon) wrote : UdevDb.txt

apport information

Matt C (playsilicon) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.20 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20-rc3

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete

Matt C (playsilicon) wrote :

Yes, it only started occurring within the last few weeks to a month.

I can confirm that booting the previous (still installed) kernel fixes the problem:

$ uname -a
Linux metabox 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

That kernel works for me, so the issue must have been introduced in the -139 version.

I did not try the latest upstream kernel because I have NVIDIA drivers and Virtualbox installed and I'd rather not go through the hassle of uninstalling all custom modules, if I can avoid it.

Do you still need me to try the latest upstream?

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Steven "Kreuvf" Koenig (kreuvf) wrote :

As I ran into this today as well, I might add that I managed to see the very first line in the kernel.log:
> BUG: unable to handle kernel NULL pointer dereference at [very low address]

Going back to 4.4.0-138 solved the problem here as well.

Jan Kees van Amerongen (jankees) wrote :

Unfortunately 4.4.0-140 does not solve this problem.

Going back to 4.4.0-138 solved the problem here too.

David Jao (djao) wrote :

I experienced the problem 100% of the time on 4.4.0-139. However, unlike Jan, I just tried 4.4.0-140 and the problem is gone. I will continue monitoring my system to see if it reappears.

Matt C (playsilicon) wrote :

4.4.0-140 does not solve the problem for me either (had to revert back to 4.4.0-138).

The first time I tried 4.4.0-140 the system locked up, but there was no kernel crash stack trace and I was able to use the Magic SysRq key combination to avoid a hard reboot.

The second time I tried 4.4.0-140 the system locked up and Magic SysRq did not help, so I had to do a hard reboot.
This time, though, the crash output was different: about three different stack traces appeared over roughly 30 seconds (before the logs froze).

Unfortunately the crash logs did not persist to disk.

Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible to do a kernel bisection between 4.4.0-138 and 4.4.0-139?

David Jao (djao) wrote :

OK, false alarm -- 4.4.0-140 is now crashing for me as well. As far as I know nothing has changed between now and yesterday. Same computer, same drives, everything. Back to 4.4.0-138 for me.

David Jao (djao) wrote :

Kai-Heng Feng: I did the bisection and the culprit is

[9021f2295af1af93416aea20c7b72390919f122d] xhci: Fix use-after-free in xhci_free_virt_device

This commit exists upstream, and the problem has already been fixed upstream:
https://www.spinics.net/lists/linux-usb/msg168861.html

Kai-Heng Feng (kaihengfeng) wrote :

Thanks.

Upstream stable v4.4.165 has this fix, so it will land in a later 16.04 kernel release.

Norbert (nrbrtx) wrote :

I ran into this problem on the 139 and 140 kernels.
Safely removing a USB3 drive freezes the system completely.
I was very lucky that I had saved all my documents before the freeze.

The temporary solution is to use 138 ( 4.4.0-138-generic ) kernel.

Please fix this bug as soon as possible.

Norbert (nrbrtx) wrote :

I temporarily fixed this on my system in a hacky way:

sudo rm /boot/*139*
sudo rm /boot/*140*
sudo update-grub
sudo reboot

But I am still waiting for an official update.

David Jao (djao) wrote :

I built 4.4.0-141.167 with the one-line patch from https://www.spinics.net/lists/linux-usb/msg168861.html and it does indeed fix this problem.

Guilherme Moura Paredes (guilherme-paredes) wrote :

I have the same problem with 4.4.0-140-generic #166-Ubuntu SMP Wed Nov 14 20:09:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux. This problem has already destroyed two of my external USB HDDs. Please fix this!

Norbert (nrbrtx) wrote :

Created Q&A on AskUbuntu ( https://askubuntu.com/q/1099414/66509 ) to help other users keep their data safe.

mintbug (mintbug) wrote :

Hi there, the problem persists in the same way in kernel 4.4.0-141 :-(

Matt C (playsilicon) wrote :

The same for me, 4.4.0-141 does not help.

Kai-Heng Feng (kaihengfeng) wrote :

4.4.0-141 is synced to upstream stable 4.4.162, not yet to 4.4.165.

Andreas Bouché (a-bouche) wrote :

I was also able to reproduce this in kernels 4.4.0-139, -140 and -141. Kernel 4.4.0-138 worked fine.
On kernel 4.4.0-140, the freeze didn't occur instantly but after about 20 seconds, so I was able to capture a stacktrace.

Andreas Bouché (a-bouche) wrote :

Kenneth (notgiven) wrote :

Simon Frettlöh (ubuntu-sf) wrote :

As I'm using 16.04 and had unfortunately already removed 4.4.0-138, I installed https://packages.ubuntu.com/xenial-updates/linux-image-4.4.0-138-generic with its dependencies, since I didn't want to use a PPA because that seemed too fishy and/or unofficial. This fixed the issue. However, non-technical users cannot be expected to perform this fix.

David Jao (djao) wrote :

If you're looking for a workaround, the current default Ubuntu 16.04.5 kernel doesn't have this bug, and doesn't require PPAs.

apt-get install --install-recommends linux-generic-hwe-16.04

j shupert (jshupert) wrote :

I tried
apt-get install --install-recommends linux-generic-hwe-16.04
(actually I ran: sudo apt-get install --install-recommends linux-generic-hwe-16.04)

and it did not work for me.

The terminal responded with:

unable to locate pkg ..couldn't find any package by regex

Gosh... really wishing for a good solution.

Andreas Bouché (a-bouche) wrote :

Seems to be fixed in 4.4.0-142-generic.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810958
"xhci: Fix USB3 NULL pointer dereference at logical disconnect."
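The version reports in this thread can be condensed into a quick check of a kernel release string. This is a sketch only: the function name xhci_poweroff_status and its output labels are hypothetical, and the ranges come purely from what was reported here (4.4.0-138 fine, 4.4.0-139 through 4.4.0-141 affected, 4.4.0-142 reported fixed). Only 16.04's 4.4.0-NNN kernels are recognised; anything else, such as the HWE kernel, is reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper; ranges taken only from the reports in this bug:
#   4.4.0-138 works, 4.4.0-139..141 lock up on power-off, 4.4.0-142 fixed.
xhci_poweroff_status() {
    rel="${1:-$(uname -r)}"      # e.g. 4.4.0-139-generic; defaults to running kernel
    case "$rel" in
        4.4.0-*) ;;              # Ubuntu 16.04 GA kernel series
        *) echo unknown; return 0 ;;
    esac
    abi="${rel#4.4.0-}"          # 139-generic
    abi="${abi%%-*}"             # 139
    abi="${abi%%.*}"             # also accepts 139.165
    case "$abi" in
        ''|*[!0-9]*) echo unknown; return 0 ;;   # not a plain number
    esac
    if [ "$abi" -le 138 ]; then
        echo unaffected          # predates the culprit commit found by bisection
    elif [ "$abi" -le 141 ]; then
        echo affected
    else
        echo fixed
    fi
}
```

For example, xhci_poweroff_status 4.4.0-139-generic prints affected; called with no argument it checks the running kernel via uname -r.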

Matt C (playsilicon) wrote :

Tested on my machine and yes it appears to be fixed.

Thank you to those who made it happen and thank you Andreas for letting us know.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc