Yakkety failures on Xeon Phi (Dell Poweredge C6320p)

Bug #1643087 reported by Mark W Wenning
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   High         Unassigned
Yakkety          Won't Fix      High         Unassigned

Bug Description

Dell Poweredge C6320p machine with Intel Xeon Phi processor, running certification tests.
I see 2 segfaults in the logs; there are more errors in the cert tests, which I will post later.

Linux brief-badger 4.8.0-27-generic #29-Ubuntu SMP Thu Oct 20 21:03:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Mark W Wenning (mwenning) wrote :

Tar file with all the logs:
C6320p_info:
total 892
drwxrwxr-x 2 mwenning mwenning 4096 Nov 18 17:09 ./
drwxr-xr-x 102 mwenning mwenning 12288 Nov 18 17:24 ../
-rw-r--r-- 1 mwenning mwenning 5321 Nov 18 17:09 alternatives.log
-rw-r----- 1 mwenning mwenning 490 Nov 18 17:09 apport.log
-rw-r----- 1 mwenning mwenning 3187 Nov 18 17:09 auth.log
-rw-r--r-- 1 mwenning mwenning 157698 Nov 18 17:09 cloud-init.log
-rw-r--r-- 1 mwenning mwenning 4622 Nov 18 17:09 cloud-init-output.log
-rw-rw-r-- 1 mwenning mwenning 266308 Nov 18 17:09 cpuinfo.txt
-rw-rw-r-- 1 mwenning mwenning 82262 Nov 18 17:09 dmesg.txt
-rw-r--r-- 1 mwenning mwenning 217921 Nov 18 17:09 dpkg.log
-rw-r--r-- 1 mwenning mwenning 510 Nov 18 17:09 fontconfig.log
-rw-r----- 1 mwenning mwenning 124475 Nov 18 17:09 kern.log
-rw-rw-r-- 1 mwenning mwenning 111 Nov 18 17:09 uname.txt

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1643087

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Mark W Wenning (mwenning) wrote :

Got the following when I tried to run apport-collect:
ubuntu@brief-badger:~$ apport-collect 1643087
The authorization page:
 (https://launchpad.net/+authorize-token?oauth_token=0h42c3flxcPFlcTk6RqB&allow_permission=DESKTOP_INTEGRATION)
should be opening in your browser. Use your browser to authorize
this program to access Launchpad on your behalf.
Waiting to hear from Launchpad about your decision...
ERROR: connecting to Launchpad failed: local variable 'browser_obj' referenced before assignment
You can reset the credentials by removing the file "/home/ubuntu/.cache/apport/launchpad.credentials"
ubuntu@brief-badger:~$

This is a server (MAAS node), so no browser was available. I opened the URL on my laptop and authorized it, but nothing further happened.
Marking confirmed.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc6

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key yakkety
Changed in linux (Ubuntu Yakkety):
status: New → Confirmed
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

Also, do you happen to know if this is a regression and if a particular prior kernel version works?

Mark W Wenning (mwenning) wrote :

Joseph,
This is a pretty new machine; as far as I know, it's been tested with Xenial and Yakkety.
I will try to run the test kernel next week when I get back.

XiongZhang (xiong-y-zhang) wrote :

The microcode on the Dell KNL-D machine is 0xffff01a0, while our SDV has 0xffff01a3. Would it be possible to upgrade the BIOS first?

Another strange thing in cpuinfo is that the CPU frequency is 998.962 MHz. Why is it not 1.3 GHz?
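
For anyone wanting to check the same two fields on their own machine, here is a minimal sketch. It assumes the usual x86 /proc/cpuinfo layout with "microcode" and "cpu MHz" fields; the function name cpuinfo_summary is only illustrative. Note that "cpu MHz" is normally the current clock rather than the nominal one, so a value below the rated frequency may simply reflect frequency scaling on an idle system.

```shell
# cpuinfo_summary [CPUINFO_FILE]
# Print each distinct microcode revision and "cpu MHz" value, prefixed by the
# number of logical CPUs reporting it. Defaults to /proc/cpuinfo.
# (Hypothetical helper name; field names assume the x86 cpuinfo format.)
cpuinfo_summary() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    awk -F '[ \t]*: ' '
        # Tally only the two fields of interest, keyed by "name value".
        $1 == "microcode" || $1 == "cpu MHz" { count[$1 " " $2]++ }
        END { for (key in count) print count[key], key }
    ' "$cpuinfo_file" | sort
}
```

Calling cpuinfo_summary with no argument reads the live system; passing a saved file such as the cpuinfo.txt from the log tarball summarizes that capture instead.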

tags: added: kernel-key
removed: kernel-da-key
Mark W Wenning (mwenning) wrote :

Tried with Joe's kernel above:
ubuntu@brief-badger:~$ uname -a
Linux brief-badger 4.9.0-040900rc6-generic #201611201731 SMP Sun Nov 20 22:33:21 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Mark W Wenning (mwenning) wrote :

Xiong, the BIOS level on the machine is 1.0.0. Where would I get the latest BIOS?

Tim Gardner (timg-tpi) wrote :

Mark - how about trying the v4.9 kernel in ppa:canonical-kernel-team/unstable when it is done building? I've set NR_CPUS=8192 so that all of the cores are activated.

Mark W Wenning (mwenning) wrote :

Sure Tim, let me know when it's done building and a link to get to it.

Tim Gardner (timg-tpi) wrote :
Mark W Wenning (mwenning) wrote : Re: [Bug 1643087] Re: Yakkety failures on Xeon Phi (Dell Poweredge C6320p)

Ok Tim, trying it now.
How is this different than the one Joe set up in the bug?

On Thu, Dec 1, 2016 at 4:04 PM, Tim Gardner <email address hidden> wrote:

> Mark,
>
> mkdir 4.9.0.5.6
> cd 4.9.0.5.6
> wget http://kernel.ubuntu.com/~rtg/linux-4.9.0.5.6/linux-image-4.9.0-5-generic_4.9.0-5.6_amd64.deb
> wget http://kernel.ubuntu.com/~rtg/linux-4.9.0.5.6/linux-image-extra-4.9.0-5-generic_4.9.0-5.6_amd64.deb
> sudo dpkg -i *.deb

Tim Gardner (timg-tpi) wrote :

This build sets NR_CPUS=8192 to take full advantage of all of the cores. The current default is 256, which is insufficient.
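
To confirm which limit a given kernel was built with, something like the following sketch may help. It assumes the Ubuntu convention of installing the build configuration as /boot/config-<release>; the function name nr_cpus_limit is made up for illustration.

```shell
# nr_cpus_limit [CONFIG_FILE]
# Print the CONFIG_NR_CPUS value from a kernel build config.
# Defaults to the config of the running kernel, assuming it is installed
# as /boot/config-$(uname -r) (an Ubuntu packaging convention).
nr_cpus_limit() {
    config_file="${1:-/boot/config-$(uname -r)}"
    # Match only the exact option, not other options sharing the prefix.
    sed -n 's/^CONFIG_NR_CPUS=//p' "$config_file"
}
```

On a kernel built as described above, nr_cpus_limit would be expected to print 8192; a result of 256 would suggest a different kernel is installed or booted.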

Mark W Wenning (mwenning) wrote :

Tim, it came up OK with the new kernel, with no segfaults that I can see. cat /proc/cpuinfo | grep 'cpu id' | wc -l returns 256.
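
A slightly stricter way to count logical CPUs is sketched below. It assumes the standard /proc/cpuinfo layout, where each logical CPU starts with a "processor : N" line; the function name count_logical_cpus is illustrative. Anchoring the pattern to the start of the line avoids counting other fields that merely contain a similar word.

```shell
# count_logical_cpus [CPUINFO_FILE]
# Count the "processor : N" stanzas, one per logical CPU.
# Defaults to /proc/cpuinfo; a saved copy can be passed instead.
# (Hypothetical helper name; assumes the standard cpuinfo layout.)
count_logical_cpus() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    grep -c '^processor[[:space:]]*:' "$cpuinfo_file"
}
```

If the result matches the kernel's NR_CPUS limit exactly while the hardware is known to have more threads, the limit is the likely cause.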

Tim Gardner (timg-tpi) wrote :

Mark - I guess you should re-run some of the tests that were causing segfaults.

Tim Gardner (timg-tpi) wrote :

Mark - ugh, I just read your comment about 256 cpu id's. Are you sure you've booted the right kernel? Please check with 'uname -a'.

tags: added: kernel-da-key
removed: kernel-key
Mark W Wenning (mwenning) wrote :

Hi Tim, I'm sure I loaded the right kernel.
Unfortunately, I've lost access to the machine; when I can find another, I will update this bug.

Mark W Wenning (mwenning) wrote :

Hi Tim,
I have the C6320p back and running. Do you have a new 4.9 kernel to try, or should I install the same one as above?

Tim Gardner (timg-tpi) wrote :

Mark - there is a 4.9 kernel in the Zesty release pocket with more updates on the way. We should have a 4.10 kernel in the archive within a couple of weeks.

Mark W Wenning (mwenning) wrote :

Tim, good news, the upgraded machine returns the right number:

ubuntu@good-hawk:~$ cat /proc/cpuinfo | grep 'processor' | wc -l
272
ubuntu@good-hawk:~$ Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Yakkety):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Seth Forshee (sforshee) wrote :

To update - 4.10 is now in the archive for zesty. Please let us know if you continue to see these issues on a fully up-to-date zesty installation.
