Not using all cores after upgrade to 14.10

Bug #1386473 reported by Stefan Freyr
This bug affects 3 people
Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   Medium       Joseph Salisbury
Utopic           Fix Released   Medium       Joseph Salisbury

Bug Description

I have a computer with two physical CPUs with 4 cores each, but htop is only showing activity on two of these 8 cores (number 1 and 3). Also, if I run "stress -c 8", top shows only 25% CPU usage. I've rebooted the machine and the problem persists. I also took a look at the BIOS settings and found nothing there that looked suspicious.
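
One way to cross-check what htop shows is to sample the per-CPU counters in /proc/stat twice and compare them. The following is a minimal sketch, assuming a Linux system with the usual /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are made up for this example and are not part of any tool mentioned in this report.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (idle_ticks, total_ticks)} for each per-CPU line."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and others.
        if not fields or not fields[0].startswith("cpu"):
            continue
        if not fields[0][3:].isdigit():
            continue
        # Only the first eight counters; guest time is already part of user.
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        times[fields[0]] = (idle, sum(ticks))
    return times


def busy_fraction(before, after):
    """Fraction of time each CPU was busy between two snapshots."""
    busy = {}
    for cpu, (idle1, total1) in after.items():
        if cpu not in before:
            continue
        idle0, total0 = before[cpu]
        elapsed = total1 - total0
        busy[cpu] = 0.0 if elapsed <= 0 else 1.0 - (idle1 - idle0) / elapsed
    return busy


def sample_busy(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return busy_fraction(before, after)
```

Calling `sample_busy()` while "stress -c 8" is running should, on a machine showing this problem, report only two CPUs close to 1.0 and the rest close to 0.0.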

If I run 'stress -c 8' only two cores go to 100% in htop as stated above (cores 1 and 3). If I run 'stress -c 1', one core goes to 100% (either core 1 or 3). If I run taskset -cp <coreid> <pid> I can control which core the process runs on. I can actually move it to any core that I want (1, 3 or any other one)! So the cores are there... the kernel just doesn't seem to use them unless I explicitly tell it to with taskset.
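
For reference, the pinning that taskset does can also be reproduced from a script. This is only a sketch, assuming Linux and Python 3, where `os.sched_getaffinity` and `os.sched_setaffinity` are available; the helper names are invented for this example. It shows the two things observed above: the affinity mask still lists every core, and a process can be forced onto any one of them.

```python
import os


def allowed_cpus(pid=0):
    """CPUs the given process may run on (pid 0 means the current process)."""
    return sorted(os.sched_getaffinity(pid))


def pin_to_cpu(cpu, pid=0):
    """Restrict the process to a single CPU, similar to `taskset -cp cpu pid`."""
    os.sched_setaffinity(pid, {cpu})
    return allowed_cpus(pid)
```

If `allowed_cpus()` lists all 8 cores but the load never spreads beyond two of them, the restriction is not coming from the affinity mask, which points at the scheduler's load balancing instead.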

I've verified that this is a regression when moving from [lxk]ubuntu 14.04 to 14.10. I used pristine lubuntu 14.10 and 14.04 USB sticks to test this and verified that a sysbench CPU test using 8 threads took ~15 sec on 14.04 but ~55 sec on 14.10. The computer this is happening on is a Dell Precision T5400 tower workstation.

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: linux-image-3.16.0-23-generic 3.16.0-23.31
ProcVersionSignature: Ubuntu 3.16.0-23.31-generic 3.16.4
Uname: Linux 3.16.0-23-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDesktop: KDE
Date: Tue Oct 28 00:46:25 2014
HibernationDevice: RESUME=UUID=20536d9f-ef05-46dd-a104-47367e464fcd
InstallationDate: Installed on 2014-02-05 (264 days ago)
InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Release amd64 (20131016)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: Dell Inc. Precision WorkStation T5400
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-23-generic root=UUID=28fc4c4c-cecd-4abe-a788-28f69e745845 ro splash quiet
RelatedPackageVersions:
 linux-restricted-modules-3.16.0-23-generic N/A
 linux-backports-modules-3.16.0-23-generic N/A
 linux-firmware 1.138
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to utopic on 2014-10-24 (3 days ago)
dmi.bios.date: 02/01/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A01
dmi.board.name: 0RW203
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA01:bd02/01/2008:svnDellInc.:pnPrecisionWorkStationT5400:pvr:rvnDellInc.:rn0RW203:rvrA00:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision WorkStation T5400
dmi.sys.vendor: Dell Inc.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

Just verified that downgrading the kernel to 3.13.0 fixed the problem. However, this is not a viable solution since many of the dkms modules (specifically nvidia and vboxhost) will not compile properly and therefore had to be disabled in order to get the system to boot.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc2-utopic/

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-unable-to-test-upstream
Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

I installed the 3.18-rc2 packages but when I tried to boot into it I got an error message saying that it couldn't find the root device. I compared the ID of the root device in the boot options for the 3.18 kernel to the ones in the 3.16 kernel and couldn't find any differences. It seems to be referencing the exact same UUID for the root device at least.

Is there something I can try to fix this?

Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

I managed to install the rc1 kernel and this problem seems to be fixed there! I will add the appropriate tag. Is there any ETA for the 3.18 kernel release in utopic?

tags: added: kernel-fixed-upstream
removed: kernel-unable-to-test-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you also give the latest 3.16 upstream stable kernel a test? This will tell us if the fix in mainline already made its way into stable updates:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16.6-utopic/

Changed in linux (Ubuntu Utopic):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

I installed 3.16.6 and it uses all cores. The bug is not present in that kernel version.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

That is good news. That means Utopic will get the fix through the normal stable update process.

Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

Great news! Thanks.

Any idea when a new stable kernel will be pushed to the repos? Also... do you have any idea what's causing this? I'm curious :-)

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The 3.16.5, 3.16.6 and 3.16.7 upstream updates are now in the Ubuntu 3.16.0-25.33 kernel which is available in the -proposed repository.

Would it be possible for you to test this latest kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Revision history for this message
PE Norris (o-baul-7) wrote :

I can't speak for the OP, but I was hit by this bug with an almost identical CPU setup (Xeon X5355) and the kernel in -proposed doesn't resolve the issue for me. The upstream utopic kernel link posted at #7, however, does fix it. My symptoms are identical.

Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

Sorry... I was out of commission for a while.

That kernel now seems to have made it into the official repository. I just installed it and I see the same results as PE Norris above: the latest official kernel does not fix this problem.

Revision history for this message
willmo (willmo) wrote :

I'm having the same problem on a 2008 Mac Pro with dual Xeon E5462s, and kernel 3.16.0-25.33. Was working fine on 14.04.

Very suspicious that we all have dual quad-core Clovertown or Harpertown Xeons.

Let me know if you need additional info.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you confirm that upstream 3.16.6 does in fact still fix this bug?

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16.6-utopic/

If it does, then I'll compare the Ubuntu kernel against the upstream kernel to find what may be causing this.

Revision history for this message
willmo (willmo) wrote :

I'm reasonably sure the problematic change is this one: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-utopic.git;a=commit;h=8ee41919feea10de065e3102de4cbf0c57fc60ea

It was fixed upstream: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/smpboot.c?id=728e5653e6fdb2a0892e94a600aef8c9a036c7eb . This needs to be cherry-picked into Utopic.

I'm building a Utopic kernel with that patch applied now, and will test it shortly.

Revision history for this message
willmo (willmo) wrote :

Yep, that fixed it. stress -c 8 is now using all 8 CPUs. Please cherry-pick upstream commit 728e5653e6fdb into Utopic. :-)

Just for posterity, I'm attaching some instructive dmesg output. This is obtained by booting with the sched_debug parameter, switching the printk level to DEBUG, and offlining/onlining a CPU. (I think this info can be obtained other ways, but this was easiest.)

Notice that in the broken output, there is no sched-domain that contains more than 2 of 8 CPUs.

Revision history for this message
willmo (willmo) wrote :

I'm attaching the patch for the cherry-pick of the upstream fix. I tried to do it in proper Ubuntu kernel format. :-)

Is there anything else you need to get this into Utopic?

tags: added: patch
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for the pointer to the patch, willmo. Now I understand why the bug existed in Utopic and not in upstream 3.16.6.

Utopic has the following SAUCE commit:

commit 8ee41919feea10de065e3102de4cbf0c57fc60ea
Author: Dave Hansen <email address hidden>
Date: Thu Sep 18 12:33:34 2014 -0700

    UBUNTU: SAUCE: x86, sched: Add new topology for multi-NUMA-node CPUs

Which is upstream commit: cebf15eb.

The SAUCE commit adds this line, which is what causes this bug:
 if (match_die(c, o) == !topology_same_node(c, o))

Commit 728e565 fixes this bug by changing the line to:
 if (match_die(c, o) && !topology_same_node(c, o))
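
To see why the one-character difference matters, the two conditions can be compared over all four input combinations. The sketch below is an illustration in Python, not kernel code; the function and parameter names are made up for this example. Reading the (different die, same node) case as "two sockets on a machine with a single memory node" is an assumption about the affected hardware.

```python
from itertools import product


def buggy_condition(same_die, same_node):
    # Mirrors: match_die(c, o) == !topology_same_node(c, o)
    return same_die == (not same_node)


def fixed_condition(same_die, same_node):
    # Mirrors: match_die(c, o) && !topology_same_node(c, o)
    return same_die and (not same_node)


def disagreements():
    """Input pairs for which the two conditions give different answers."""
    return [
        (same_die, same_node)
        for same_die, same_node in product([False, True], repeat=2)
        if buggy_condition(same_die, same_node)
        != fixed_condition(same_die, same_node)
    ]


for same_die, same_node in product([False, True], repeat=2):
    print(
        f"same_die={same_die!s:5} same_node={same_node!s:5} "
        f"buggy={buggy_condition(same_die, same_node)!s:5} "
        f"fixed={fixed_condition(same_die, same_node)}"
    )
```

The two versions differ only when the CPUs are on different dies but in the same node. The commit title suggests the check is meant for CPUs that share a die yet sit in different NUMA nodes, so the `==` form would also fire for that extra case.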

I built a Utopic test kernel with a cherry-pick of 728e565. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1386473/

Can you test this kernel and confirm if it fixes this bug? Note, you will need to install both the linux-image and linux-image-extra .deb packages.

Revision history for this message
Stefan Freyr (stefan-freyr) wrote :

I've downloaded the Utopic test kernel and verified that all cores work when I boot up on it. So it seems to fix the problem.

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Brad Figg (brad-figg)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
willmo (willmo)
tags: added: verification-done-utopic
removed: verification-needed-utopic
Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released