X1Carbon comes to a crawl during high CPU usage tasks

Bug #1627108 reported by Omer Akram
This bug affects 2 people
Affects         Status       Importance  Assigned to       Milestone
linux (Ubuntu)  In Progress  High        Joseph Salisbury
Yakkety         In Progress  High        Joseph Salisbury

Bug Description

My X1 Carbon becomes quite laggy under heavy CPU load: the cursor hangs for a few seconds and then resumes while the system is compiling code, or, let's say, while PyCharm is indexing or Android Studio is building. I was using kernel 4.4 on Xenial a few days ago and everything was working just fine. I installed Yakkety and now this issue happens.

TEST CASE:
`stress -c 4` <-- that results in system slowness.

I will downgrade the kernel and see if that mitigates the issue.
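
To put a number on the lag from the test case, a small probe like the following could be run alongside it. This is only a sketch: the names `burn`, `wakeup_overshoot` and `run_probe` are made up for illustration, and the busy loops merely approximate what `stress -c 4` does. It measures how late a short sleep wakes up in the probing process itself, so it may not capture lag in other processes such as the display server.

```python
import multiprocessing as mp
import time


def burn(stop_at):
    """Spin until the deadline, roughly what one CPU worker of `stress` does."""
    while time.monotonic() < stop_at:
        pass


def wakeup_overshoot(samples=50, interval=0.01):
    """Sleep `interval` seconds `samples` times.

    Returns a list with how late each wakeup was, in seconds.
    """
    late = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval)
        late.append(max(0.0, time.monotonic() - start - interval))
    return late


def run_probe(workers=4, duration=5.0):
    """Return (worst, mean) wakeup overshoot while `workers` busy loops run."""
    stop_at = time.monotonic() + duration
    procs = [mp.Process(target=burn, args=(stop_at,)) for _ in range(workers)]
    for p in procs:
        p.start()
    try:
        # About half of `duration` is spent sleeping; the rest is slack.
        late = wakeup_overshoot(samples=int(duration / 0.02), interval=0.01)
    finally:
        for p in procs:
            p.join()
    return max(late), sum(late) / len(late)
```

Calling `run_probe()` on a good and a bad kernel and comparing the two results would give a rough comparison. On platforms that spawn rather than fork worker processes, the call should sit under an `if __name__ == "__main__":` guard.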

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: linux-image-4.8.0-15-generic 4.8.0-15.16
ProcVersionSignature: Ubuntu 4.8.0-15.16-generic 4.8.0-rc7
Uname: Linux 4.8.0-15-generic x86_64
ApportVersion: 2.20.3-0ubuntu7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC1D0p: om26er 3390 F...m pulseaudio
 /dev/snd/controlC1: om26er 3390 F.... pulseaudio
 /dev/snd/controlC0: om26er 3390 F.... pulseaudio
CurrentDesktop: Unity
Date: Fri Sep 23 21:40:34 2016
HibernationDevice: RESUME=UUID=a92c85ab-cca3-4afc-abf1-3516f193129e
InstallationDate: Installed on 2016-09-21 (1 days ago)
InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Alpha amd64 (20160921)
MachineType: LENOVO 20BSCTO1WW
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.8.0-15-generic.efi.signed root=UUID=01b0a4a0-d791-46e8-a212-1f769cff3a4b ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.8.0-15-generic N/A
 linux-backports-modules-4.8.0-15-generic N/A
 linux-firmware 1.161
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/13/2015
dmi.bios.vendor: LENOVO
dmi.bios.version: N14ET32W (1.10 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20BSCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0E50510 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN14ET32W(1.10):bd08/13/2015:svnLENOVO:pn20BSCTO1WW:pvrThinkPadX1Carbon3rd:rvnLENOVO:rn20BSCTO1WW:rvrSDK0E50510WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.name: 20BSCTO1WW
dmi.product.version: ThinkPad X1 Carbon 3rd
dmi.sys.vendor: LENOVO

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Brad Figg (brad-figg)
tags: added: kernel-4.8
Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Omer Akram (om26er) wrote :

Confirming, downgrading to 4.4 fixes the issue.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.8 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-da-key needs-bisect
Revision history for this message
Omer Akram (om26er) wrote :

So I tested linux 4.8 final and the regression still exists. On another note I also tested linux 4.7 and the issue does not exist there.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu Yakkety):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you give the v4.8-rc1 kernel a try? It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8

We can perform a kernel bisect, once we know the last good kernel and first bad one.

Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
tags: added: perfomring-bisect
removed: needs-bisect
Revision history for this message
Omer Akram (om26er) wrote :

I tried linux 4.8-rc1 and it does have the said regression.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v4.7 final and v4.8-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
1c88e19b0f6a8471ee50d5062721ba30b8fd4ba9

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
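
For readers unfamiliar with the procedure, the narrowing done by a bisect can be sketched as a binary search. This is a simplified illustration, not how `git bisect` is implemented: it assumes a linear list of commits ordered oldest to newest, whereas real kernel history is a graph with merges. The names `first_bad` and `is_bad` are hypothetical; `is_bad` stands for "build this commit, boot it, run the test case, report the result".

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in `commits` (ordered oldest to newest).

    Assumes the last commit is bad, the parent of the first commit is good,
    and every commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit is in commits[lo..hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo], tests
```

Each test halves the remaining range, so the number of test kernels grows with the base-2 logarithm of the number of commits between the last good and the first bad kernel.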

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Omer reported on IRC that the first test kernel is bad.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
f7816ad0f878dacd5f0120476f9b836ccf8699ea

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

Tested, the latest kernel build that you provided is bad as well.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
55392c4c06204c8149dc333309cf474691f1cc3c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

Same result. This build has the regression as well.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
52770c37db2c0ee5585dae2de3d19c8453f1e8dc

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

Ok, that is a good kernel. My test is `stress -c 4` and it works fine i.e. system performance does not degrade.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
5048c2af078d5976895d521262a8802ea791f3b0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Martin Pitt (pitti) wrote :

I filed bug 1626436 which is similar; I wanted to try http://kernel.ubuntu.com/~jsalisbury/lp1627108 but that is empty?

Revision history for this message
Omer Akram (om26er) wrote :

To add the missing comment, I tried that last kernel last night and that is a good one as well. I notified Joseph on IRC.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
0f657262d5f99ad86b9a63fb5dcd29036c2ed916

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

Hi! That is a bad kernel, i.e. it contains the regression.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c86ad14d305d2429c3da19462440bac50c183def

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

That's a good kernel build. Seems there are multiple builds in there, so I tested 201610061038 which is newer.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
3ebfd81f7fb3e81a754e37283b7f38c62244641a

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/3ebfd81f7fb3e81a754e37283b7f38c62244641a

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
766fd5f6cdaf1d558afba19850493b2603c9625d

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/766fd5f6cdaf1d558afba19850493b2603c9625d

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

That is a bad kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
55e16d30bd99510900caec913c90f53bc2b35cba

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/766fd5f6cdaf1d558afba19850493b2603c9625d

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

Thanks, that's a bad kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I pointed you to the wrong kernel in comment #26. The URL should have been:

http://kernel.ubuntu.com/~jsalisbury/lp1627108/55e16d30bd99510900caec913c90f53bc2b35cba/

Can you test that kernel?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

If that kernel is also bad, then the next one is already ready and can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1627108/3d89e5478bf550a50c99e93adf659369798263b0/

However, if it is good, I'll have to build the next kernel based on that good result.

Revision history for this message
Omer Akram (om26er) wrote :

Ok, the one in comment #28 is bad, while the one in #29 is good.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
ea86cb4b7621e1298a37197005bf0abcc86348d4

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/ea86cb4b7621e1298a37197005bf0abcc86348d4

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Martin Pitt (pitti) wrote :

FTR, I tested #29 which works for Omer, but not for me, so bug 1626436 is not a duplicate.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
3d30544f02120b884bba2a9466c87dba980e3be5

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/3d30544f02120b884bba2a9466c87dba980e3be5

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Omer Akram (om26er) wrote :

As reported on IRC, that's a bad kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
7dc603c9028ea5d4354e0e317e8481df99b06d7e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/7dc603c9028ea5d4354e0e317e8481df99b06d7e

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The bisect reported this as the first bad commit:

commit 3d30544f02120b884bba2a9466c87dba980e3be5
Author: Peter Zijlstra <email address hidden>
Date: Tue Jun 21 14:27:50 2016 +0200

    sched/fair: Apply more PELT fixes

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Yakkety test kernel with a revert of commit 3d30544f02120b884bba2a9466c87dba980e3be5. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1627108/

Can you test that kernel and report back if it has the bug or not?

Note with this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Revision history for this message
Omer Akram (om26er) wrote :

As notified on IRC, that is a good kernel.

The issue is easy to reproduce on an X1 Carbon: just run `stress -c 4`, and within a few seconds the mouse starts to lag and everything becomes slow.

description: updated
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The patch author requested that we provide a /proc/sched_debug dump while the issue is happening. Can you collect that data?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Upstream also requested testing of the following repo:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

I built a test kernel with this tree. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/upstream/

Can you test this kernel and see if it exhibits the bug or not?

Revision history for this message
Omer Akram (om26er) wrote :

I tested the kernel tip from your debs and the issue is very much alive. Here is the requested dump: http://paste.ubuntu.com/23312351/

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Update and request from upstream:

"I have looked at the dump and there is something very odd for
system.slice task group where the display manager is running.
system.slice->tg_load_avg is around 381697, but tg_load_avg is
normally equal to the sum of system.slice[cpu]->tg_load_avg_contrib,
whereas the sum of system.slice[cpu]->tg_load_avg_contrib is 1013 in our
case. We can have some differences because the dump of
/proc/sched_debug is not atomic and some changes can happen, but nothing
like this difference.

The main effect of this quite high value is that the weight/prio of
the sched_entity that represents system.slice in the root cfs_rq is very
low (lower than a task with the smallest nice prio), so the system.slice
task group will not get the CPU very often compared to the user.slice
task group: less than 1% for system.slice, where lightDM and xorg
are running, compared to 99% for user.slice, where the stress tasks are
running. This is confirmed by the se->avg.util_avg value of the task
groups, which reflects how much time each task group is effectively
running on a CPU:
system.slice[CPU3].se->avg.util_avg = 8 whereas
user.slice[CPU3].se->avg.util_avg = 991

This difference of weight/priority explains why the system becomes
unresponsive. For now, what I can't explain is why
system.slice->tg_load_avg = 381697 whereas it should be around 1013,
and how the patch can generate this situation.

Is it possible to have a dump of /proc/sched_debug before starting
the stress command, to check whether the problem is there from the beginning
but not seen because the system is not overloaded, or whether the problem
appears only when the user starts to load the system?"
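
The consistency check described in the quote above can be sketched as follows. Treat this as an illustration under assumptions: it assumes the dump contains `cfs_rq[N]:/group` section headers followed by `.tg_load_avg_contrib` and `.tg_load_avg` lines, and the function names `group_load` and `group_weight` are made up. `group_weight` is a deliberately simplified model of how a group's weight on one CPU shrinks as `tg_load_avg` grows, not the kernel's actual code.

```python
import re
from collections import defaultdict

# Assumed layout of the relevant parts of a /proc/sched_debug dump.
HEADER = re.compile(r"^cfs_rq\[(\d+)\]:(\S+)")
FIELD = re.compile(r"^\s*\.(tg_load_avg_contrib|tg_load_avg)\s*:\s*(-?\d+)")


def group_load(dump_text):
    """Return {group: (tg_load_avg, sum of per-CPU tg_load_avg_contrib)}."""
    total = {}
    contrib = defaultdict(int)
    group = None
    for line in dump_text.splitlines():
        header = HEADER.match(line)
        if header:
            group = header.group(2)
            continue
        field = FIELD.match(line)
        if field and group is not None:
            value = int(field.group(2))
            if field.group(1) == "tg_load_avg":
                total[group] = value       # same value is repeated per CPU
            else:
                contrib[group] += value    # accumulate across CPUs
    return {g: (total[g], contrib[g]) for g in total}


def group_weight(shares, cfs_rq_load, tg_load_avg, min_shares=2):
    """Simplified model: weight = shares * local load / group-wide load,
    clamped to the range [min_shares, shares]."""
    raw = shares * cfs_rq_load // max(tg_load_avg, 1)
    return max(min_shares, min(shares, raw))
```

In this model, with `shares=1024` and a local load of 1024, a `tg_load_avg` of 1013 leaves the group at its full weight of 1024, while 381697 collapses it to the minimum of 2, which would be consistent with the starvation of system.slice described in the quote.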

Revision history for this message
Omer Akram (om26er) wrote :

I have attached two dumps. One with the buggy kernel and the other with the kernel that does not show the lag.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

From Upstream:

"> Here is the dump before stress is started:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy

This one is ok.
The dump indicates Sched Debug Version: v0.11, 4.8.0-11-generic
#12~lp1627108Commit3d30544Reverted
so this is without the culprit commit

>
> Here it is after:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy
>

This one has the exact same odd values for system.slice->tg_load_avg
as the first dump that you sent yesterday.
The dump indicates Sched Debug Version: v0.11, 4.8.0-22-generic #24-Ubuntu
So this dump was taken with a different kernel than the dump above.
As I can't find any stress task in the dump, I tend to believe that
the dump was taken before starting the stress tasks and not after
starting them. Can you confirm?

If I'm right, it means that the problem was already there before
starting the stress tasks."

Revision history for this message
Omer Akram (om26er) wrote :

That's correct, that dump_buggy is without stress running.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the proposed patch from upstream. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1627108/upstream/

Can you test this kernel and see if it resolves this bug?
