Swap pagefault is hanging paged processes with kernel BUG assert in include/linux/swapops.h

Bug #1690796 reported by Bas Zoetekouw on 2017-05-15
This bug affects 4 people
Affects: linux (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

My machine just crashed. Looking at the journalctl output (after a reboot, as the machine was unresponsive), I got this:

mei 15 12:53:23 regan kernel: kernel BUG at /build/linux-2NWldV/linux-4.10.0/include/linux/swapops.h:129!

I'll attach the full log.
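
For anyone who wants to check their own logs for the same assertion, here is a minimal sketch that pulls "kernel BUG at" lines out of saved journal text. The function name find_kernel_bugs and the regular expression are illustrative assumptions, not part of any Ubuntu or kernel tooling; the sample line is the one quoted above.

```python
import re

# Matches lines such as:
#   ... kernel: kernel BUG at /build/.../include/linux/swapops.h:129!
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")


def find_kernel_bugs(log_text):
    """Return a (source_path, line_number) tuple for each 'kernel BUG at' line."""
    hits = []
    for line in log_text.splitlines():
        match = BUG_RE.search(line)
        if match:
            hits.append((match.group("path"), int(match.group("line"))))
    return hits


# The log line quoted in this report, used as sample input.
SAMPLE = (
    "mei 15 12:53:23 regan kernel: kernel BUG at "
    "/build/linux-2NWldV/linux-4.10.0/include/linux/swapops.h:129!"
)

for path, lineno in find_kernel_bugs(SAMPLE):
    print(f"{path}:{lineno}")
```

The input text could come from something like `journalctl -k -b -1`, which prints the kernel messages of the previous boot.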

I'm running Zesty with this kernel:
Linux regan 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
---
ApportVersion: 2.20.4-0ubuntu4.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: bas 2203 F.... pulseaudio
CurrentDesktop: GNOME
DistroRelease: Ubuntu 17.04
HibernationDevice: RESUME=UUID=ca0f5952-e241-4ec9-80c0-f7c596dbad03
InstallationDate: Installed on 2016-12-22 (143 days ago)
InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
MachineType: Dell Inc. Latitude E7470
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.10.0-20-generic root=/dev/mapper/hostname-root ro nosplash acpi_backlight=vendor
ProcVersionSignature: Ubuntu 4.10.0-20.22-generic 4.10.8
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-20-generic N/A
 linux-backports-modules-4.10.0-20-generic N/A
 linux-firmware 1.164.1
Tags: zesty
Uname: Linux 4.10.0-20-generic x86_64
UnreportableReason: The report belongs to a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lp lpadmin lxd plugdev sambashare src sudo vboxusers wireshark
_MarkForUpload: False
dmi.bios.date: 11/09/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.11.3
dmi.board.name: 0T6HHJ
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.11.3:bd11/09/2016:svnDellInc.:pnLatitudeE7470:pvr:rvnDellInc.:rn0T6HHJ:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E7470
dmi.sys.vendor: Dell Inc.

Bas Zoetekouw (baszoetekouw) wrote :
tags: added: apport-collected zesty
description: updated

apport information

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Do you have a way to reproduce this bug?

If so, would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.12 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc1/

Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete

Unfortunately, I can't really reproduce it. Journalctl shows the bug
having occurred twice since the beginning of March, on kernel 4.10.0.

I'll install the 4.12 kernel and see over the coming weeks whether it triggers again.

Thanks!
Bas.

On 15-05-17 22:03, Joseph Salisbury wrote:
> Do you have a way to reproduce this bug?
>
> If so, would it be possible for you to test the latest upstream kernel?
> Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the
> latest v4.12 kernel[0].
>
> If this bug is fixed in the mainline kernel, please add the following
> tag 'kernel-fixed-upstream'.
>
> If the mainline kernel does not fix this bug, please add the tag:
> 'kernel-bug-exists-upstream'.
>
> Once testing of the upstream kernel is complete, please mark this bug as
> "Confirmed".
>
>
> Thanks in advance.
>
> [0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc1/
>
> ** Changed in: linux (Ubuntu)
> Importance: Undecided => High
>
> ** Changed in: linux (Ubuntu)
> Status: Confirmed => Incomplete
>

I believe the same thing happened to me, except that the process involved was Firefox. I filed it as bug #1692482 and am marking that report as a duplicate.

I had the default swapfile as set up by the 17.04 installer. Suspecting that it might be the cause, I replaced it with a swap partition. So far so good: I haven't had this problem since.

@baszoetekouw, am I reading your UdevDB.txt correctly: you're using a swap partition in a LUKS container? Is there LVM in between? I don't use LVM.

Because if so, my suspicion is that extra layering would cause this. I had swapfile over ext4 over LUKS. You (if I'm reading correctly) have swap partition over LVM over LUKS.

Moving to a swap partition directly over LUKS appears to have solved my problem.
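
To check whether a machine is swapping to a file or to a partition, the contents of /proc/swaps can be parsed. The following is only a sketch: the helper name parse_proc_swaps is made up for illustration, and the sample text is a hypothetical example of the /proc/swaps layout, not output taken from any machine in this report.

```python
def parse_proc_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts.

    Each data row has the columns: Filename, Type, Size, Used, Priority.
    Size and Used are reported by the kernel in KiB.
    """
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        name, kind, size_kib, used_kib, priority = fields[:5]
        entries.append({
            "name": name,
            "type": kind,  # "file" or "partition"
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
        })
    return entries


# Hypothetical sample in the /proc/swaps layout (illustrative values only).
SAMPLE = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/swapfile\t\t\t\tfile\t\t2097148\t0\t-2\n"
)

for entry in parse_proc_swaps(SAMPLE):
    print(entry["name"], entry["type"])
```

On a live system the input would be the text read from /proc/swaps; a "file" entry corresponds to the swapfile setup, a "partition" entry to a block device (which may itself be a LUKS or LVM mapping).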

Bas Zoetekouw (baszoetekouw) wrote :

Yes, I have swap on LVM, and the PVs are on an encrypted (LUKS) device.

So:
- /dev/nvme0n1p3 --> cryptsetup(luks) --> /dev/mapper/nvme0n1p3_crypt
- /dev/mapper/nvme0n1p3_crypt is the physical volume in my LVM setup
- swap is a logical volume in the LVM

So that would be quite a curious bug, caused by encrypted swap.

I don't think it's encrypted swap per se, as I also have that. I think it's the additional layering. In my case:

* sw -> LUKS -> sda5 = OK
* swapfile -> ext4 (root fs) -> LUKS -> sda2 = problem

And from what I see in your case:

* sw -> LUKS -> LVM -> nvme0n1p3 = problem

Another thing we have in common: both devices are SSDs. My LUKS containers are opened with the 'discard' option.

Pardon the comment spam; I made a mistake. Your setup is

* sw -> LVM -> LUKS -> nvme

so both our failure cases have an extra layer between swap and LUKS
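
To compare setups without reading UdevDB.txt by hand, the layers under an active swap device can be listed from the JSON output of lsblk. This is a rough sketch under several assumptions: the function name swap_stacks is made up, the input is expected to come from `lsblk --json -o NAME,TYPE,MOUNTPOINT`, and the sample JSON below is hand-written to resemble the LVM-over-LUKS layout described above (the volume name vg-swap is hypothetical).

```python
import json


def swap_stacks(lsblk_json):
    """Return, for each active swap device, its layers from disk down to swap.

    Each layer is a (name, type) tuple, where type is what lsblk reports
    (for example "disk", "part", "crypt", "lvm").
    """
    tree = json.loads(lsblk_json)
    stacks = []

    def walk(node, path):
        path = path + [(node["name"], node["type"])]
        # Depending on the requested column, lsblk emits either a single
        # "mountpoint" value or a "mountpoints" list.
        mounts = node.get("mountpoints") or [node.get("mountpoint")]
        if "[SWAP]" in mounts:
            stacks.append(path)
        for child in node.get("children", []):
            walk(child, path)

    for device in tree.get("blockdevices", []):
        walk(device, [])
    return stacks


# Hand-written sample resembling swap on LVM on LUKS on an NVMe partition.
SAMPLE = json.dumps({"blockdevices": [{
    "name": "nvme0n1", "type": "disk", "mountpoint": None, "children": [{
        "name": "nvme0n1p3", "type": "part", "mountpoint": None, "children": [{
            "name": "nvme0n1p3_crypt", "type": "crypt", "mountpoint": None,
            "children": [
                {"name": "vg-swap", "type": "lvm", "mountpoint": "[SWAP]"},
            ]}]}]}]})

for stack in swap_stacks(SAMPLE):
    print(" -> ".join(f"{name} ({kind})" for name, kind in reversed(stack)))
```

A swapfile will not show up this way, since it lives inside a mounted filesystem rather than on its own block device.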

I returned to my computer to find Firefox frozen. I tried to kill Firefox, but it turned into a zombie, and after that the system froze. It still answered ping but nothing else. I rebooted with "sysrq + b".
I have swap on an LVM LV on a SATA SSD.
I don't run LUKS.

summary: - Kernel bug causes hang
+ Swap pagefault is hanging paged processes with kernel BUG assert in
+ include/linux/swapops.h

Colan Schwartz (colan) wrote :

Is this a duplicate of bug #1674838 ?

kralisec (kralisec) wrote :

Same for me: four crashes since the 17.04 upgrade, but this is the first one where I see the following in syslog:

kernel BUG at /build/linux-lz1RHE/linux-4.10.0/include/linux/swapops.h:129!

and 8 lines with:
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [JS Helper:3994]

uname -a
Linux labtop 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Firefox went dark; sudo killall -9 firefox (and Web Content) had no effect:
[firefox] <defunct> and [Web Content] <defunct>

CPU load rose quickly (though much faster in the first three crashes).

Like others: swap partition, SSD + LUKS, but no LVM.

I upgraded from 16.04/16.10; the hardware is a Dell E5410. Should I file a separate bug report?

Except that since I moved away from swapfile to a swap partition (atop LUKS), I haven't had the issue. I haven't checked so I can't say what the CPU load was, or whether there was activity with khugepaged as per bug #1674838.

Also had the NMI watchdog soft lockup after the swapops.h BUG assertion.
