kernel BUG at /build/linux-7LGLH_/linux-4.10.0/include/linux/swapops.h:129

Bug #1674838 reported by Mathieu Marquer on 2017-03-21
This bug affects 199 people
Affects | Importance | Assigned to
The Ubuntu-power-systems project | Undecided | Unassigned
linux (Ubuntu) | High | Joseph Salisbury
linux (Ubuntu Zesty) | High | Joseph Salisbury
linux-hwe-edge (Ubuntu) | Undecided | Unassigned
linux-hwe-edge (Ubuntu Zesty) | Undecided | Unassigned

Bug Description

At random intervals, the khugepaged process takes 100% CPU, and the only way to recover is to restart the computer.

Relevant dmesg attached (dmesg_crash.txt).

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-14-generic 4.10.0-14.16
ProcVersionSignature: Ubuntu 4.10.0-14.16-generic 4.10.3
Uname: Linux 4.10.0-14-generic x86_64
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mathieu 2221 F.... pulseaudio
 /dev/snd/pcmC1D0p: mathieu 2221 F...m pulseaudio
 /dev/snd/controlC1: mathieu 2221 F.... pulseaudio
CurrentDesktop: Unity:Unity7
Date: Tue Mar 21 23:03:23 2017
HibernationDevice: RESUME=UUID=67e78e4c-94ee-447c-ae60-4387dae296dd
InstallationDate: Installed on 2016-01-31 (415 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160131)
MachineType: LENOVO 20344
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-14-generic root=UUID=b982929e-11d0-4984-885c-6c9daba24836 ro noprompt quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-14-generic N/A
 linux-backports-modules-4.10.0-14-generic N/A
 linux-firmware 1.164
SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-03-02 (19 days ago)
dmi.bios.date: 10/16/2014
dmi.bios.vendor: LENOVO
dmi.bios.version: 96CN29WW(V1.15)
dmi.board.asset.tag: 31900058WIN
dmi.board.name: INVALID
dmi.board.vendor: LENOVO
dmi.board.version: 31900058WIN
dmi.chassis.asset.tag: 31900058WIN
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo Yoga 2 13
dmi.modalias: dmi:bvnLENOVO:bvr96CN29WW(V1.15):bd10/16/2014:svnLENOVO:pn20344:pvrLenovoYoga213:rvnLENOVO:rnINVALID:rvr31900058WIN:cvnLENOVO:ct10:cvrLenovoYoga213:
dmi.product.name: 20344
dmi.product.version: Lenovo Yoga 2 13
dmi.sys.vendor: LENOVO


Mathieu Marquer (slasher-fun) wrote :
description: updated
tags: added: kernel-bug
tags: added: regression-release

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc3

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Mathieu Marquer (slasher-fun) wrote :

Hi,

I don't remember encountering this bug with Linux 4.8. It happens about every 1-3 hours with Linux 4.10, although I couldn't figure out a way to reproduce it.

I'll try with Linux 4.11 RC3 and tell you how it goes.

Mathieu Marquer (slasher-fun) wrote :

So I *think* it's fixed in 4.11 RC3, although I'm not fully sure: I was encountering bug https://bugs.freedesktop.org/show_bug.cgi?id=100181 which made the display crash about every 30 minutes. Still, after a few hours of testing I didn't encounter this kernel bug, while it appeared after ~45 minutes once I was back on 4.10.

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
JockeTF (jocketf) wrote :

This started happening for me after upgrading to Zesty.

Mathieu Marquer (slasher-fun) wrote :

Update from me, the patch for my display crash has been included in 4.11 RC5, and I'm not encountering the kernel bug anymore there, so it's 99% definitely fixed in 4.11 branch.

tags: added: kernel-da-key needs-reverse-bisect
kiney (jannik-winkel) wrote :

This problem has made my system mostly unusable since upgrading to zesty (4.10.0-19). One crash every 1-4 hours.
Some workloads tend to trigger it faster (firefox + youtube) but the crash is unavoidable.
Desktop usage becomes completely impossible.
Normally I can still ssh in, but only certain things work:
- top works fine; I always see one kernel thread hogging the CPU, sometimes along with a userspace process (mostly firefox). htop hangs on exit.
- kill -9 on the CPU-hogging userspace process does not work, which seems weird
- trying to reboot gracefully hangs. SysRq works.

I just switched to 4.11-rc7 mainline, but it's too soon to draw any conclusions. I will report tomorrow.

Looking through the clones of this bug this seems to happen with quite different hardware.
My affected system is AMD x370 chipset with RyZen 7 1700X cpu.
The system was perfectly stable with yakkety (kernel 4.8.0-??)

Bryan Quigley (bryanquigley) wrote :

I thought it was related to my system being a brand new Ryzen with ZRam and 32GB of memory (no real swap) but apparently not.

A BIOS update bricked that motherboard so I'm back on my older Phenom(tm) II X4 945 with 12 GB of RAM (now no ZRAM). Just got the issue again. Now, I have *no* swap enabled at all and still got it.

kiney (jannik-winkel) wrote :

I also have no swap.

JockeTF (jocketf) wrote :

I'm on a laptop with Intel Ivy Bridge.

I had no swap enabled. I haven't experienced this issue since I created a small 1GB swap file. It may be too soon to tell for sure if it's related though.

Christian Sarrasin (sxc731) wrote :

My laptop (Kaby Lake) has 16 GB swap configured and I have encountered the issue 5 times so far; see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1682184/comments/27 for exact details, including which kernels were concerned.

Was on multiple i7 2600 with 2 gig swap. Took longer to occur, but still
occurred.


Andrea Bernabei (faenil) wrote :

same bug here, after upgrading to latest Zesty packages as of yesterday. (kernel 4.10.0-19-generic)

I use firefox-trunk, the nightly build.
At one point firefox-trunk goes 100% cpu and it can't even be killed.

After a couple of minutes, the whole system freezes.

Drascus (enchantedvisionsband) wrote :

I am also having this issue. Every time it occurs my whole system locks up and I have to hard reset to get things working again.

kiney (jannik-winkel) wrote :

ok. With 4.11-rc7 mainline _this_ problem seems to be fixed.
But I had another (probably) unrelated crash/reboot with no useful traces in the logs.

Oliver Egginger (lau6chpad) wrote :

Hi,

I came here from:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1682427

because Launchpad told me that 1682427 is a duplicate of 1674838.

I've seen the same behavior with Thunderbird. See the Bug Description of 1682427. But I think that is a coincidence. The problem seems to be more general. I have no idea at the moment what causes it, but I can give you the dmesg output of my system. See the attached file.

I have been using Ubuntu for a year on this system. First with 16.04, then 16.10, and for the last few days 17.04.

Before 17.04 I had never seen such a problem. The system was stable. It's a Skylake system with a 6700K CPU. I updated my board to BIOS version 2003 half a year ago. But as I said, I could not observe the error before Zesty.

I'm curious now what's going on here.

Regards
Oliver

Dennis Sheil (dennis-sheil) wrote :

I upgraded from 16.10 to 17.04 three days ago. I have been hit with this three times in three days. I am using my desktop, and then everything freezes.

This last time I had a little more freedom. I was using firefox when it became unresponsive. I opened up a terminal and ran "ps axu" and it hung halfway through. I did a dmesg and saw "kernel: [52312.170678] kernel BUG at /build/linux-Fk60NP/linux-4.10.0/include/linux/swapops.h:129!"

Then I tried to close Firefox by hitting the close button. It popped up a force quit button which I hit. This froze my desktop GUI, even the cursor.

Joseph Salisbury (jsalisbury) wrote :

Can you see if this bug also exists in the latest upstream stable 4.10 kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.11/

kiney (jannik-winkel) wrote :

That was already tested here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1682184/comments/21

 -> seems to be fixed/not present in upstream 4.10

Jeffery Painter (jeff-painter) wrote :

I can report that mainline seems fine. I have not tried 4.10.11 but stopped at 4.10.8 as it is working well for me the past couple days.

painter@merlin:~$ uname -a
Linux merlin 4.10.8-041008-generic #201703310531 SMP Fri Mar 31 09:33:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

painter@merlin:~$ uptime
 17:41:20 up 11:09, 1 user, load average: 0.20, 0.06, 0.02

No more crashes!

Chris Hermansen (c-hermansen) wrote :

Joseph, kiney;

I was having the problem yesterday with 4.10.0-19-generic whose vmlinuz is dated 8 April.

I installed 4.11.0-rc7 and ran it for awhile today with no problems, but I don't have a good feeling for how long is necessary to say "I think the problem is solved".

I have now installed 4.10.11 and am running it. We'll see...

Oliver Egginger (lau6chpad) wrote :

I also have the problem with 4.10.0-19-generic.

This is the actual kernel in Zesty.

Joseph Salisbury (jsalisbury) wrote :

Hmm, it sounds like this bug is only happening in Ubuntu kernels and not any of the upstream kernels. That indicates this is due to a SAUCE patch. We next need to identify the last Ubuntu kernel version that did not have the bug and the first that did.

Can those affected test the following early Zesty kernel:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12001523

Note with this test kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Changed in linux (Ubuntu):
importance: Medium → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
tags: added: performing-bisect
removed: needs-reverse-bisect
Mitchell Tasman (tasman) wrote :

I am also experiencing the problem with 4.10.0-19-generic.

Chris Hermansen (c-hermansen) wrote :

Joseph, for us rookies out here, can you confirm that these are the only files we need to install with the 4.10.0-8 kernel you would like tested?

linux-headers-4.10.0-8_4.10.0-8.10_all.deb
linux-headers-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb

Thanks in advance!

Joseph Salisbury (jsalisbury) wrote :

@Chris Hermansen, you only need to install the following two:
linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb

Jeffery Painter (jeff-painter) wrote :

Trying to test as recommended and installed linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb

Note to others, if you are using any proprietary drivers, you should also download and install both:

linux-headers-4.10.0-8_4.10.0-8.10_all.deb
linux-headers-4.10.0-8-generic_4.10.0-8.10_amd64.deb

Required to continue using VirtualBox but will need to run /sbin/vboxconfig after installing.

I will try this one out for today and post back my results this afternoon.

Chris Hermansen (c-hermansen) wrote :

Joseph, I'm running 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 now. I'll keep you posted.

BTW no problems yesterday with 4.10.11. My wife has been using the computer for a photo project on a web-based photo book service and that is what brought about this problem in the first place.

Christian Sarrasin (sxc731) wrote :

Hi Joseph,

As previously reported:

4.10.0-15: affected
4.10.0-14: issue not experienced in over a week

Obviously "not experienced" doesn't mean the bug isn't present. All I can say is that it was first experienced with 4.10.0-15.
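
The "last good / first bad" bookkeeping Joseph asks for can be sketched in a few lines of shell. This is only an illustration: `bisect_bounds` is a hypothetical helper name, the good/bad verdicts are the ones reported in this thread (which, as noted above, cannot prove a kernel is bug-free), and it assumes GNU `sort -V` for version ordering.

```shell
#!/bin/sh
# Hypothetical helper: reads "version verdict" pairs on stdin and prints the
# newest version reported good and the oldest version reported bad.
# Assumes GNU coreutils `sort -V` (version sort).
bisect_bounds() {
    results=$(cat)
    last_good=$(printf '%s\n' "$results" | awk '$2 == "good" { print $1 }' | sort -V | tail -n 1)
    first_bad=$(printf '%s\n' "$results" | awk '$2 == "bad" { print $1 }' | sort -V | head -n 1)
    printf 'last good: %s\nfirst bad: %s\n' "$last_good" "$first_bad"
}

# Verdicts as reported by commenters in this bug (not independently verified).
printf '%s\n' \
    '4.10.0-19 bad' \
    '4.10.0-14 good' \
    '4.10.0-15 bad' | bisect_bounds
```

The two printed versions are the range a bisect would then have to search.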

Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury my computer, running 4.10.0-8, hung just now (display frozen etc). I was running "stress". This is similar to what happened previously with respect to this bug, but there seems to be nothing but a bunch of nulls in syslog this time. Not sure how to determine whether the same bug was activated or not... any thoughts?

Thanks in advance.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Chris. Let's see if other folks hit this bug running 4.10.0-8. It might be that there are multiple bugs being hit here.

Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury,

Running the same test on 4.10.11-041011-generic did not wedge the system.

This was the test:

for t in 1 2 3 4 5 6 7 8 9 10; do echo iter $t; stress -c 2 -m 2 -t 60; dmesg; done

So I'm leaving this running for awhile and will report back.
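
A variant of the loop above that stops as soon as the BUG line shows up in the kernel log might look like the sketch below. The function names are made up for illustration, and it assumes `stress` is installed and `dmesg` is readable by the current user. Note that on an affected kernel the machine may wedge before the check runs, so this only helps when the system stays partly responsive.

```shell
#!/bin/sh
# Succeeds if the text on stdin contains the BUG line from this report.
has_swapops_bug() {
    grep -q 'kernel BUG at .*swapops\.h:129'
}

# Hypothetical wrapper around the stress loop above.
# $1 = maximum number of 60-second stress iterations.
run_until_bug() {
    iterations=$1
    t=1
    while [ "$t" -le "$iterations" ]; do
        echo "iter $t"
        stress -c 2 -m 2 -t 60
        if dmesg | has_swapops_bug; then
            echo "swapops.h:129 BUG seen after iteration $t"
            return 0
        fi
        t=$((t + 1))
    done
    echo "no swapops.h:129 BUG after $iterations iterations"
    return 1
}

# Example invocation (left commented out so that sourcing this file is harmless):
# run_until_bug 10
```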

Jeffery Painter (jeff-painter) wrote :

I've been working pretty heavily throughout the day (Eclipse, Chrome, Thunderbird, MySQL, etc) with the 4.10.0-8 and haven't hit the bug. I will run a couple more days on this version and see if it pops up.

painter@merlin:~$ uname -a
Linux merlin 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

painter@merlin:~$ uptime
 16:48:33 up 6:59, 1 user, load average: 0.36, 0.52, 0.64

Chris Hermansen (c-hermansen) wrote :

Still no problems with 4.10.11-041011.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing Chris. So this bug really only seems to be happening with Ubuntu kernels and not Upstream ones. We should test some earlier Zesty kernels, so we can get a last good version and first bad version. That will allow us to bisect. Can you next test the following last 4.9 based Zesty kernel:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/11948001

Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury, installed the 4.9.0-16 recommended and running now (continuing to use stress). More later...

Colin Ian King (colin-king) wrote :

I hit this same issue today and it seems like a hugepage scanning lockup to me.
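
For anyone who wants to check whether transparent hugepages (the feature khugepaged serves) are active on their machine, the current mode can be read from sysfs. The sketch below assumes the standard path `/sys/kernel/mm/transparent_hugepage/enabled`, whose content lists the available modes with the active one in square brackets; `active_thp_mode` is a helper name invented here. It only reports the setting and changes nothing.

```shell
#!/bin/sh
# Prints the bracketed (active) value from a line such as
# "always [madvise] never".
active_thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Assumption: standard sysfs location; the file is absent on kernels built
# without transparent hugepage support.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(active_thp_mode < "$thp_file")"
fi
```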

Thomas M Steenholdt (tmus) wrote :

There must be a better way to do this...

All of these issues seem to arise from a BUG event in swapops.h:129. That particular spot is in a section that is only active when the kernel is built with CONFIG_MIGRATION=y. So the first step is probably to verify that CONFIG_MIGRATION is even enabled for the mainline kernel (the configs are not the same, I'm told). For all we know, the bug could still be upstream.
If somebody running the mainline kernel could post the output of the following command, that would be useful:

cat /boot/config-$(uname -r) |grep CONFIG_MIGRATION

If CONFIG_MIGRATION is enabled on mainline (CONFIG_MIGRATION=y in the output above), the next step should be to check whether any of the Ubuntu modifications touch the source in any relevant places. The BUG event in swapops.h:129 seems to be hit if migration_entry_to_page() is called with an unlocked page. Grepping through the source, this function is only called from a handful of places, so it should be possible to cross-reference those call sites with the Ubuntu modifications.

Perhaps this will bring us closer to the problem a bit faster?
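
The config check above can also be written as a small yes/no test, which is handy when comparing several installed kernels. This is a sketch only: `config_enabled` is a hypothetical helper, and it assumes the kernel config is installed as /boot/config-<release>, as it is on the systems in this thread.

```shell
#!/bin/sh
# Succeeds if option $1 is set to "y" in kernel config file $2.
# -x matches whole lines only, -F treats the pattern as a fixed string.
config_enabled() {
    grep -qxF "$1=y" "$2"
}

cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    if config_enabled CONFIG_MIGRATION "$cfg"; then
        echo "CONFIG_MIGRATION is enabled in $cfg"
    else
        echo "CONFIG_MIGRATION is not set to y in $cfg"
    fi
fi
```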

Jeffery Painter (jeff-painter) wrote :

I have been running this kernel for 3-4 days now without any problems.

root@merlin:~# uname -a
Linux merlin 4.10.8-041008-generic #201703310531 SMP Fri Mar 31 09:33:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@merlin:~# cat /boot/config-$(uname -r) |grep CONFIG_MIGRATION
CONFIG_MIGRATION=y

Thanks!

Changed in linux-hwe-edge (Ubuntu):
status: New → In Progress
Changed in linux-hwe-edge (Ubuntu Zesty):
status: New → In Progress
Seth Forshee (sforshee) on 2017-05-13
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux-hwe-edge (Ubuntu Zesty):
status: In Progress → Fix Committed
Dan Streetman (ddstreet) wrote :

> Question: Given that this is quite a critical bug, is there any reason this
> hasn't been put into the main archives sooner?

I'm not the right person to ask that, but you can see Seth's comment 139 above.

"This is our normal SRU cycle. Out-of-cycle updates are for the most part
reserved for critical security issues."

Nazar Mokrynskyi (nazar-pc) wrote :

I'm wondering how it reached zesty-proposed earlier than artful-proposed. On Artful I still do not see an update.

Seth Forshee (sforshee) wrote :

> I'm wondering how it reached zesty-proposed earlier than
> artful-proposed. On Artful I still do not see an update.

In artful the kernels are just copied forward from zesty once they reach
-updates. Soon artful will switch to 4.11, at which point its kernels
will start going into proposed. You can manually download the deb
package files from zesty-proposed and install them in artful.

This may not be a security issue, but it certainly is critical to normal users who don't have access to information here, let alone temporary patches or proposed changes.

Maybe Ubuntu only wants techies using its systems, and would prefer that normal users went elsewhere? I'll certainly have to start looking for an alternative, more stable system for my partner's desktop.

I now have 23 hours of uptime thanks to 4.10.0-22 #24 from zesty proposed. That's a zesty record for me :)

thanks.

geez (geez) wrote :

@ddstreet, sforshee: Thanks for the replies. Given that this also affects LTS installations that use the HWE stack, shouldn't this have as high a priority as critical security issues? I'd consider this a critical usability issue; there are tons of people running LTS, particularly on servers, for exactly the reason that it is "always" stable. Needless to say this is a bit of an outlier considering Ubuntu's overall good track record.

I understand that testing 4.10.0-22 takes time; one could conceivably apply a hotfix to 4.10.0-21?

As far as my own testing is concerned, I've been using my personal laptop for work today (instead of my 16.04 work laptop) with the kernel from zesty-proposed, and no issues so far.

Daniel Holbert (dholbert) wrote :

I installed the kernel with the fix from zesty-proposed (4.10.0-22-generic #24-Ubuntu), but after ~4 hours of uptime on that kernel, I hit what felt like the same system lockup again. (Or perhaps a new version of this lockup that the patch introduces / leaves unfixed?)

Here's the kern.log from that lockup. Hope this is helpful; otherwise, sorry for adding noise.

Seth Forshee (sforshee) wrote :

I appreciate that this bug has a significant impact for many. However we
have a QA process to test kernels before they get pushed out to
everyone, and it is always risky to skip this testing which is why we
rarely do it. In the case of the fix for this bug the changes required
are fairly substantial and should go through testing.

The kernel in zesty-proposed is exactly the same kernel that will be
released to -updates in a couple of weeks (assuming it passes QA, etc.)
so please do not hesitate to run this kernel. There is also a signed
kernel available in -proposed.

Seth Forshee (sforshee) wrote :

> I installed the kernel with the fix from zesty-proposed
> (4.10.0-22-generic #24-Ubuntu), but after ~4 hours of uptime on that
> kernel, I hit what felt like the same system lockup again. (Or perhaps
> a new version of this lockup that the patch introduces / leaves
> unfixed?)
>
> Here's the kern.log from that lockup. Hope this is helpful; otherwise,
> sorry for adding noise.
>
> ** Attachment added: "kern.log snippet for lockup on 4.10.0-22-generic #24-Ubuntu"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838/+attachment/4882848/+files/kern-log-snippet.txt

That is a different issue, please file a new bug. Thanks!

Rocko (rockorequin) wrote :

@dholbert: your lockup looks like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1680904 (and the upstream bug is https://bugs.freedesktop.org/show_bug.cgi?id=100516). It's a bug in the Intel graphics drivers that unfortunately is present in both kernels 4.10 and 4.11, but should be fixed in 4.12.

Daniel Holbert (dholbert) wrote :

(Thanks @Rocko - I'd filed https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1693357 but I've now marked that as a duplicate of the bug you mentioned.)

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
flux242 (flux242) wrote :

> If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

never go full retard, dude

Christian Nassau (nassau) wrote :

I've been running "-proposed" for quite a while now without the kernel oops, hence changed the tag to "verification-done-zesty".

tags: added: verification-done-zesty
removed: verification-needed-zesty
flux242 (flux242) wrote :

for those of you like me who were reluctant to add testing into the sources list (because of course it could break something else) and couldn't find the deb packages because they didn't provide any direct link, here is a little script I had to write that downloads and installs the kernel 4.10.0-22.24. Adjust the architecture and the download directory

https://gist.github.com/635e1dad33c335fe9592bb1b7c28cd3c

Kwang Moo Yi (kwang-m-yi) wrote :

@Tim - To be fair, Ubuntu LTS without the hwe-edge is perfectly stable. However, it is true that all this process was a bit disappointing, and I ended up moving to debian sid, which seems to be a quite stable experience so far.

Frode Nordahl (fnordahl) wrote :

As much as I want the -22 to fix all problems, it does not.

However, my crashes currently do not leave a trace in the logs. Sticking to -13 keeps my workhorse running free of any crash or freeze problems.

The simplest way to describe what is going on is that all I/O gets stuck and that my displays get an interesting change/distortion to the background image. (I can provide screenshots upon request.)

Any advice on how to best collect useful data from crashed/frozen machine is welcome.

The issue is highly reproducible on my system under heavy load.

Example: Deploy some big software with your favorite deployment tool on LXD containers and libvirt virtual machines through MAAS, at the same time do some backups with duplicity and playback of (YouTube) video in your favorite browser (Firefox/Chrome).

I tried to install 4.10.0-22 on the one system we have that was regularly crashing, but it gets installation errors (being unable to access files in /usr/src/linux-headers-4.10.0-22), so I cannot verify if it would fix the run-time crashes. I was unable to work round this in a short enough time to make it worthwhile.

Given that this system is stable on 4.8, I'll have to wait for the normal release process rather than check the proposed version.

Axy (joshi-a) wrote :

Been affecting me as well -- will try the alternate kernels.

Kernel that's crashing:
axyjo@frost:~$ uname -a
Linux frost 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I'm getting a freeze with the same symptoms described in this bug report, but with this message

May 31 02:37:27 walter kernel: [410240.819651] NMI watchdog:
BUG: soft lockup - CPU#14 stuck for 22s! [JS Helper:23856]

I'm running 4.10.0-21-generic.

I will duplicate my bug report (bug #1677491) to this one, feel free to reverse if it is not the same bug.

It's strange that I hit this at least once every two days, but server has been working for weeks now.
Is there already a description of what common scenarios are when this bug is hit, i.e., is it already reproducible?

I've seen this bug in the scenario described in comment #218. In my case, there was always a frozen Firefox around, but all the other processes running in the system were also reacting slowly or frozen.

I just experienced this bug on 4.10.0-21-generic #23-Ubuntu.

My Firefox stopped responding, and when I tried to kill it, the Firefox process became un-killable, using 25% CPU time and showing N/A memory.

CPU: 3 PID: 2558 Comm: firefox Tainted: G OE 4.10.0-21-generic #23-Ubuntu

This is the first time I have seen the issue. I have been running 17.04 for two or three weeks.

I was also experiencing this issue with the official 4.10.0-21-generic kernel; I ran the ~lp1674838 kernel for several days, and have been running the -22-generic test kernel for a couple without problems.

Got bitten by this again today. Ubuntu 17.04, 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux. Firefox again.

I think this issue is critical enough to expedite the release of the fixed kernel, posthaste.

Colan Schwartz (colan) wrote :

Folks, just enable the Proposed channel for the next four days (until this is released into stable on the 5th). You can disable it again afterwards. This is what I've been doing, and haven't run into this issue again (or any other problems with Proposed).
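
For reference, enabling the Proposed channel comes down to adding one apt source line for the `-proposed` pocket. The sketch below only prints that line. `proposed_source_line` is a made-up helper, the component list and mirror URL are assumptions based on a default desktop install, and the file name in the comments is an arbitrary choice; the Testing/EnableProposed wiki page remains the authoritative procedure.

```shell
#!/bin/sh
# Prints the apt source line for the -proposed pocket of release codename $1.
# Assumption: default archive mirror and the four standard components.
proposed_source_line() {
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed main restricted universe multiverse\n' "$1"
}

proposed_source_line zesty

# To apply it (requires root):
#   proposed_source_line zesty | sudo tee /etc/apt/sources.list.d/proposed.list
#   sudo apt-get update
# To disable it again, delete that file and run `sudo apt-get update`.
```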

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-22.24

---------------
linux (4.10.0-22.24) zesty; urgency=low

  * linux: 4.10.0-22.24 -proposed tracker (LP: #1691146)

  * Fix NVLINK2 TCE route (LP: #1690155)
    - powerpc/powernv: Fix TCE kill on NVLink2

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * perf: qcom: Add L3 cache PMU driver (LP: #1689856)
    - [Config] CONFIG_QCOM_L3_PMU=y
    - perf: qcom: Add L3 cache PMU driver

  * No PMU support for ACPI-based arm64 systems (LP: #1689661)
    - drivers/perf: arm_pmu: rework per-cpu allocation
    - drivers/perf: arm_pmu: manage interrupts per-cpu
    - drivers/perf: arm_pmu: split irq request from enable
    - drivers/perf: arm_pmu: remove pointless PMU disabling
    - drivers/perf: arm_pmu: define armpmu_init_fn
    - drivers/perf: arm_pmu: fold init into alloc
    - drivers/perf: arm_pmu: factor out pmu registration
    - drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
    - drivers/perf: arm_pmu: handle no platform_device
    - drivers/perf: arm_pmu: rename irq request/free functions
    - drivers/perf: arm_pmu: split cpu-local irq request/free
    - drivers/perf: arm_pmu: move irq request/free into probe
    - drivers/perf: arm_pmu: split out platform device probe logic
    - arm64: add function to get a cpu's MADT GICC table
    - [Config] CONFIG_ARM_PMU_ACPI=y
    - drivers/perf: arm_pmu: add ACPI framework
    - arm64: pmuv3: handle !PMUv3 when probing
    - arm64: pmuv3: use arm_pmu ACPI framework

  * [SRU][Zesty]QDF2400 kernel oops on ipmitool fru write 0 fru.bin
    (LP: #1689886)
    - ipmi: Fix kernel panic at ipmi_ssif_thread()

  * tty: pl011: fix earlycon work-around for QDF2400 erratum 44 (LP: #1689818)
    - tty: pl011: fix earlycon work-around for QDF2400 erratum 44
    - tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44

  * kernel-wedge fails in artful due to leftover squashfs-modules d-i files
    (LP: #1688259)
    - Remove squashfs-modules files from d-i
    - [Config] as squashfs-modules is builtin kernel-image must Provides: it

  * arm64/ACPI support for SBSA watchdog (LP: #1688114)
    - clocksource: arm_arch_timer: clean up printk usage
    - clocksource: arm_arch_timer: rename type macros
    - clocksource: arm_arch_timer: rename the PPI enum
    - clocksource: arm_arch_timer: move enums and defines to header file
    - clocksource: arm_arch_timer: add a new enum for spi type
    - clocksource: arm_arch_timer: rework PPI selection
    - clocksource: arm_arch_timer: split dt-only rate handling
    - clocksource: arm_arch_timer: refactor arch_timer_needs_probing
    - clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init
      call
    - clocksource: arm_arch_timer: add structs to describe MMIO timer
    - clocksource: arm_arch_timer: split MMIO timer probing.
    - [Config] CONFIG_ACPI_GTDT=y
    - acpi/arm64: Add GTDT table parse driver
    - clocksource: arm_arch_timer: simplify ACPI support code.
    - acpi/arm64: Add memory-mapped timer support in GTDT driver
    - clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
    - acpi/arm64: Add SBS...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Vincas Dargis (talkless) wrote :

When should it be available for 16.04?

Is there a delay in getting 4.10.0-22 to stable release? I had previously understood it was expected yesterday (June 5th).

user722 (user722) on 2017-06-06
information type: Public → Private
information type: Private → Public
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-22.24

---------------
linux (4.10.0-22.24) zesty; urgency=low

  * linux: 4.10.0-22.24 -proposed tracker (LP: #1691146)

  * Fix NVLINK2 TCE route (LP: #1690155)
    - powerpc/powernv: Fix TCE kill on NVLink2

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * perf: qcom: Add L3 cache PMU driver (LP: #1689856)
    - [Config] CONFIG_QCOM_L3_PMU=y
    - perf: qcom: Add L3 cache PMU driver

  * No PMU support for ACPI-based arm64 systems (LP: #1689661)
    - drivers/perf: arm_pmu: rework per-cpu allocation
    - drivers/perf: arm_pmu: manage interrupts per-cpu
    - drivers/perf: arm_pmu: split irq request from enable
    - drivers/perf: arm_pmu: remove pointless PMU disabling
    - drivers/perf: arm_pmu: define armpmu_init_fn
    - drivers/perf: arm_pmu: fold init into alloc
    - drivers/perf: arm_pmu: factor out pmu registration
    - drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
    - drivers/perf: arm_pmu: handle no platform_device
    - drivers/perf: arm_pmu: rename irq request/free functions
    - drivers/perf: arm_pmu: split cpu-local irq request/free
    - drivers/perf: arm_pmu: move irq request/free into probe
    - drivers/perf: arm_pmu: split out platform device probe logic
    - arm64: add function to get a cpu's MADT GICC table
    - [Config] CONFIG_ARM_PMU_ACPI=y
    - drivers/perf: arm_pmu: add ACPI framework
    - arm64: pmuv3: handle !PMUv3 when probing
    - arm64: pmuv3: use arm_pmu ACPI framework

  * [SRU][Zesty]QDF2400 kernel oops on ipmitool fru write 0 fru.bin
    (LP: #1689886)
    - ipmi: Fix kernel panic at ipmi_ssif_thread()

  * tty: pl011: fix earlycon work-around for QDF2400 erratum 44 (LP: #1689818)
    - tty: pl011: fix earlycon work-around for QDF2400 erratum 44
    - tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44

  * kernel-wedge fails in artful due to leftover squashfs-modules d-i files
    (LP: #1688259)
    - Remove squashfs-modules files from d-i
    - [Config] as squashfs-modules is builtin kernel-image must Provides: it

  * arm64/ACPI support for SBSA watchdog (LP: #1688114)
    - clocksource: arm_arch_timer: clean up printk usage
    - clocksource: arm_arch_timer: rename type macros
    - clocksource: arm_arch_timer: rename the PPI enum
    - clocksource: arm_arch_timer: move enums and defines to header file
    - clocksource: arm_arch_timer: add a new enum for spi type
    - clocksource: arm_arch_timer: rework PPI selection
    - clocksource: arm_arch_timer: split dt-only rate handling
    - clocksource: arm_arch_timer: refactor arch_timer_needs_probing
    - clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init
      call
    - clocksource: arm_arch_timer: add structs to describe MMIO timer
    - clocksource: arm_arch_timer: split MMIO timer probing.
    - [Config] CONFIG_ACPI_GTDT=y
    - acpi/arm64: Add GTDT table parse driver
    - clocksource: arm_arch_timer: simplify ACPI support code.
    - acpi/arm64: Add memory-mapped timer support in GTDT driver
    - clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
    - acpi/arm64: Add SBS...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released

There's a problem with this new version - it won't install on the one system I have that is affected by the bug. The installation error is:

dpkg: error processing archive /var/cache/apt/archives/linux-headers-4.10.0-22_4.10.0-22.24_all.deb (--unpack):
 unable to open '/usr/src/linux-headers-4.10.0-22/arch/ia64/sn/Makefile.dpkg-new': Operation not permitted

The directory contains:
.../usr/src/linux-headers-4.10.0-22/arch/ia64$ cd sn
total 20
drwxr-xr-x 5 root root 4096 Jun 6 15:01 .
drwxr-xr-x 13 root root 4096 Jun 6 15:01 ..
drwxr-xr-x 3 root root 4096 Jun 6 15:01 include
drwxr-xr-x 3 root root 4096 Jun 6 15:01 kernel
drwxr-xr-x 3 root root 4096 Jun 6 15:01 pci
..../usr/src/linux-headers-4.10.0-22/arch/ia64/sn$

I've done a clean, update and install -f, but the headers still failed to install, giving a similar but different error code.

I have now reported the installation error using the automated report system - #1696132

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe-edge - 4.10.0-22.24~16.04.1

---------------
linux-hwe-edge (4.10.0-22.24~16.04.1) xenial; urgency=low

  * linux-hwe-edge: 4.10.0-22.24~16.04.1 -proposed tracker (LP: #1691149)

  * linux: 4.10.0-22.24 -proposed tracker (LP: #1691146)

  * Fix NVLINK2 TCE route (LP: #1690155)
    - powerpc/powernv: Fix TCE kill on NVLink2

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * perf: qcom: Add L3 cache PMU driver (LP: #1689856)
    - [Config] CONFIG_QCOM_L3_PMU=y
    - perf: qcom: Add L3 cache PMU driver

  * No PMU support for ACPI-based arm64 systems (LP: #1689661)
    - drivers/perf: arm_pmu: rework per-cpu allocation
    - drivers/perf: arm_pmu: manage interrupts per-cpu
    - drivers/perf: arm_pmu: split irq request from enable
    - drivers/perf: arm_pmu: remove pointless PMU disabling
    - drivers/perf: arm_pmu: define armpmu_init_fn
    - drivers/perf: arm_pmu: fold init into alloc
    - drivers/perf: arm_pmu: factor out pmu registration
    - drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
    - drivers/perf: arm_pmu: handle no platform_device
    - drivers/perf: arm_pmu: rename irq request/free functions
    - drivers/perf: arm_pmu: split cpu-local irq request/free
    - drivers/perf: arm_pmu: move irq request/free into probe
    - drivers/perf: arm_pmu: split out platform device probe logic
    - arm64: add function to get a cpu's MADT GICC table
    - [Config] CONFIG_ARM_PMU_ACPI=y
    - drivers/perf: arm_pmu: add ACPI framework
    - arm64: pmuv3: handle !PMUv3 when probing
    - arm64: pmuv3: use arm_pmu ACPI framework

  * [SRU][Zesty]QDF2400 kernel oops on ipmitool fru write 0 fru.bin
    (LP: #1689886)
    - ipmi: Fix kernel panic at ipmi_ssif_thread()

  * tty: pl011: fix earlycon work-around for QDF2400 erratum 44 (LP: #1689818)
    - tty: pl011: fix earlycon work-around for QDF2400 erratum 44
    - tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44

  * kernel-wedge fails in artful due to leftover squashfs-modules d-i files
    (LP: #1688259)
    - Remove squashfs-modules files from d-i
    - [Config] as squashfs-modules is builtin kernel-image must Provides: it

  * arm64/ACPI support for SBSA watchdog (LP: #1688114)
    - clocksource: arm_arch_timer: clean up printk usage
    - clocksource: arm_arch_timer: rename type macros
    - clocksource: arm_arch_timer: rename the PPI enum
    - clocksource: arm_arch_timer: move enums and defines to header file
    - clocksource: arm_arch_timer: add a new enum for spi type
    - clocksource: arm_arch_timer: rework PPI selection
    - clocksource: arm_arch_timer: split dt-only rate handling
    - clocksource: arm_arch_timer: refactor arch_timer_needs_probing
    - clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init
      call
    - clocksource: arm_arch_timer: add structs to describe MMIO timer
    - clocksource: arm_arch_timer: split MMIO timer probing.
    - [Config] CONFIG_ACPI_GTDT=y
    - acpi/arm64: Add GTDT table parse driver
    - clocksource: arm_arch_timer: simplify ACPI support code.
    - acpi/arm64: Add memory-mapped timer support in GTD...

Changed in linux-hwe-edge (Ubuntu):
status: In Progress → Fix Released
Changed in ubuntu-power-systems:
status: New → Fix Released
Christian Sarrasin (sxc731) wrote :

@tim-8aw3u04umo no installation issue here. I used: 'sudo apt update && sudo apt upgrade'.

uname -r reports "4.10.0-22-generic"
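For anyone who wants to script the same check across several machines, here is a minimal sketch, not an official tool. It assumes the release string has the usual Ubuntu shape (e.g. "4.10.0-22-generic") and takes 4.10.0-22 as the first fixed build, per the "This bug was fixed in the package linux - 4.10.0-22.24" message above. The names kernel_tuple and has_fix are made up for illustration, and the comparison is only meaningful for the Zesty 4.10.0 series.

```python
import re

# First fixed build, taken from the changelog message in this thread
# ("linux - 4.10.0-22.24"): upstream 4.10.0, Ubuntu ABI number 22.
FIXED = (4, 10, 0, 22)


def kernel_tuple(release):
    """Parse an Ubuntu-style release such as '4.10.0-22-generic' into (4, 10, 0, 22)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def has_fix(release, fixed=FIXED):
    """Return True if the release is at or past the fixed build.

    Assumption: only meaningful within the 4.10.0 series; other kernel
    series were fixed (or never affected) on their own schedules.
    """
    return kernel_tuple(release) >= fixed


print(has_fix("4.10.0-14-generic"))  # the kernel from the original report
print(has_fix("4.10.0-22-generic"))  # the kernel reported by uname -r above
```

On a live system, the string to pass in is whatever `uname -r` prints (or `platform.release()` from Python's standard library).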

I have four 17.04 systems. The one that had this bug is the only one that has an installation problem with 4.10.0-22. Maybe a coincidence, maybe not?

Seth has fixed my installation problem, so all four of my 17.04 systems are now in use on 4.10.0-22. I'll report if I get any further crashes (but I don't expect any).

See #1696132 if you are interested in what the problem was.

Changed in linux-hwe-edge (Ubuntu Zesty):
status: Fix Committed → Fix Released

I believe I may still be encountering this bug. It seems to happen any time the system is under significant load. Attached is my kern.log from right after the NMI watchdog soft lockups start. Please let me know whether I should submit additional logs the next time this happens, or file a new bug.

Kai-Heng Feng (kaihengfeng) wrote :

@Robbie

It's not the same, please file a new bug.

uname -a
Linux merlin 4.10.0-28-generic #32-Ubuntu SMP Fri Jun 30 05:32:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Same symptoms as reported above. Firefox freezes and causes the system to freeze.

Cannot get Firefox to work. I have had to install Chrome in order to access the web.

Have sent automated crash reports to Mozilla and Ubuntu.

If it is not the same problem as reported here, it looks the same from all the comments here.
