[2.0b6] core count not updated during commissioning if MAAS previously stored a higher core count

Bug #1590144 reported by Jason Hobbs on 2016-06-07
This bug affects 1 person
Affects: MAAS
Importance: High
Assigned to: Unassigned

Bug Description

I commissioned some nodes with hyperthreading enabled, and the core count showed correctly as 20 (10 cores, with 2 HTs per core).

I then disabled hyperthreading and recommissioned. The core count did not change - it remained as 20.

Here's the cpuinfo output from the node in MAAS:
http://pastebin.ubuntu.com/17100667/

This is with 2.0 beta 6.

Andres Rodriguez (andreserl) wrote :

Jason,

Can you also please attach /proc/cpuinfo? That said, we grab this information from:

1. lshw
2. /proc/cpuinfo

(2) is used if lshw doesn't provide the correct information.

For example, there's a bug in lshw that would always return 1 CPU when it had 10 cores. So, if that's the case, /proc/cpuinfo reports 20.
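As a rough illustration of why /proc/cpuinfo reports 20 on a 10-core machine with hyperthreading: each logical CPU gets its own "processor" entry, while physical cores can be told apart by their ("physical id", "core id") pair. The sketch below is not the MAAS parser; the function name and the sample text are made up for this example, and the field names assume the x86 layout of /proc/cpuinfo.

```python
import re

# Synthetic sample: one socket, 2 physical cores, 2 threads per core.
# Not taken from the pastebin attached to this bug.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 1

processor\t: 2
physical id\t: 0
core id\t: 0

processor\t: 3
physical id\t: 0
core id\t: 1
"""


def count_from_cpuinfo(text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    # Entries for each logical CPU are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return logical, len(cores)


print(count_from_cpuinfo(SAMPLE))  # (4, 2)
```

Counting "processor" entries gives the thread count, which is the number that changes when hyperthreading is toggled.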

Lee Trager (ltrager) on 2016-06-07
Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Never mind on cpuinfo; I meant, please attach the lshw output.

We prefer lshw over cpuinfo, so if that's what lshw is reporting, that's what we will use.

Changed in maas:
milestone: none → 2.0.0
Jason Hobbs (jason-hobbs) wrote :

Here is the lshw output: http://pastebin.ubuntu.com/17101751/

Changed in maas:
status: Incomplete → New
Andres Rodriguez (andreserl) wrote :

Ok, I quickly looked over the lshw output and it seems that lshw reports 20 cores. As such, I'll mark this as Opinion, given that we get this information from lshw. If lshw is incorrectly detecting this information, then lshw would need to be fixed.

That said, there should have been an updated lshw that landed today in the archives (xenial-updates). Can you verify this was using it?

Thanks!

Changed in maas:
status: New → Opinion
Jason Hobbs (jason-hobbs) wrote :

Where does it say it's using 20 cores in lshw?

Changed in maas:
status: Opinion → New
Jason Hobbs (jason-hobbs) wrote :

Please let me know where lshw says it's using 20 cores. That one is a bit hard for me to believe, since we have other identical systems that originally had HT turned off and are correctly reporting 10 cores.

Andres Rodriguez (andreserl) wrote :

Hi Jason,

Can you retry with latest lshw? This was only promoted to -updates an hour
ago, and while the bug below refers to RAM, it is also affecting CPU
detection. https://bugs.launchpad.net/bugs/1039701

Andres Rodriguez (andreserl) wrote :

Also, the fact that it is reporting 10 may be due to a workaround we did to bypass the issues with lshw, but lshw will always take priority over cpuinfo.

Jason Hobbs (jason-hobbs) wrote :

I tested with the latest lshw and I still have this issue.

I looked at the code and added some debugging. Here's what's happening.

lshw is showing 1.0 as the core count, which obviously doesn't accurately represent the core count. The code is set up so that the higher core count takes precedence, so when the node was first commissioned, the cpuinfo value was used and the core count was set to 20.

The next time I commissioned the node, with HT turned off, the same code paths were hit: lshw says 1.0 and cpuinfo now says 10. But the code is set up to take the highest value, including the preexisting value of 20, so the core count isn't being updated.

Here's the code with the issue (from parse_cpuinfo) (with my log message in it):
    logger.error("current cpu count: %s\tnew cpu count: %s" % (node.cpu_count, cpu_count))
    if node.cpu_count is None or cpu_count > node.cpu_count:
        node.cpu_count = cpu_count
        node.save()

Example log message:
2016-06-09 15:15:32 [metadataserver.models.commissioningscript] ERROR: current cpu count: 20 new cpu count: 10

It's interesting that if I enabled HT on the nodes that never had it, cpuinfo would reflect that and the core count would properly go up in MAAS. The count would then be stuck there: if I turned HT off again, it wouldn't go back down.

So, if the approach of using the highest value out of lshw or cpuinfo is used, the code has to be changed to not also consider the preexisting cpu_count value for Node, which may be based on an old hardware configuration.
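A minimal sketch of that change, assuming the two readings are available as plain numbers. This is not the actual MAAS patch: the Node class is a hypothetical stand-in for the real model, and both function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for the MAAS Node model (cpu_count only)."""
    cpu_count: Optional[int] = None

    def save(self) -> None:
        pass  # a real model would persist the change here


def update_cpu_count_sticky(node, cpu_count):
    """Behaviour described in this bug: the stored value can only go up."""
    if node.cpu_count is None or cpu_count > node.cpu_count:
        node.cpu_count = cpu_count
        node.save()


def update_cpu_count(node, lshw_count, cpuinfo_count):
    """Take the higher of the two fresh readings; ignore the stored value."""
    readings = [c for c in (lshw_count, cpuinfo_count) if c is not None]
    if not readings:
        return  # nothing detected, leave the stored value alone
    node.cpu_count = int(max(readings))
    node.save()


# First commissioning with HT on, then recommissioning with HT off.
node = Node()
update_cpu_count_sticky(node, 20)
update_cpu_count_sticky(node, 10)
print(node.cpu_count)  # 20 (stale)

node = Node(cpu_count=20)
update_cpu_count(node, lshw_count=1.0, cpuinfo_count=10)
print(node.cpu_count)  # 10
```

The only difference is that the comparison is made between the values detected in the current commissioning run, so an old hardware configuration cannot pin the count.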

But I'm also curious why lshw and cpuinfo are used, instead of lscpu, which seems to do a superior job here.
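For comparison, here is a rough sketch of deriving both counts from lscpu's parsable output (`lscpu -p=CPU,CORE,SOCKET`). The sample text is made up to match the shape of that output and was not captured from the affected node; the function names are hypothetical and this is not how MAAS gathers the data.

```python
import subprocess

# Illustrative output in the shape of `lscpu -p=CPU,CORE,SOCKET`.
SAMPLE = """\
# CPU,Core,Socket
0,0,0
1,1,0
2,0,0
3,1,0
"""


def parse_lscpu(text):
    """Return (logical_cpus, physical_cores) from `lscpu -p` style text."""
    rows = [
        line.split(",")
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")  # skip header comments
    ]
    logical = len(rows)
    # A physical core is identified by its (core, socket) pair.
    cores = {(core, socket) for _cpu, core, socket in rows}
    return logical, len(cores)


def read_lscpu():
    """Run lscpu on the local machine (requires util-linux)."""
    out = subprocess.run(
        ["lscpu", "-p=CPU,CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    )
    return parse_lscpu(out.stdout)


print(parse_lscpu(SAMPLE))  # (4, 2)
```

With HT off, the two numbers would be equal, which makes it easy to tell threads from cores in a single pass.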

Jason Hobbs (jason-hobbs) wrote :

*To be clear, MAAS is deriving 1.0 as the core count from lshw. I don't think lshw is actually making an attempt to accurately represent the core count.

Changed in maas:
status: New → Confirmed
importance: Undecided → High
summary: - core count not updated during commissioning
+ [2.0b6] core count not updated during commissioning
summary: - [2.0b6] core count not updated during commissioning
+ [2.0b6] core count not updated during commissioning if MAAS stores a
+ higher core count
summary: - [2.0b6] core count not updated during commissioning if MAAS stores a
- higher core count
+ [2.0b6] core count not updated during commissioning if MAAS previously
+ stored a higher core count
Jason Hobbs (jason-hobbs) wrote :

There is also nproc, part of coreutils (in main), which is very simple and seems to work well in my limited testing.
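A small sketch of what calling nproc from a script could look like. The wrapper name is made up, and the fallback to os.cpu_count() when nproc is not on the PATH is an assumption added for this example. Note that nproc reports processing units (logical CPUs), so with HT enabled it returns the thread count rather than the physical core count.

```python
import os
import shutil
import subprocess


def nproc_count():
    """Return the number of installed processing units as reported by nproc."""
    path = shutil.which("nproc")
    if path is None:
        # Assumption for this sketch: fall back to Python's own count.
        return os.cpu_count()
    out = subprocess.run(
        [path, "--all"], capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


print(nproc_count())
```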

Changed in maas:
status: Confirmed → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released