Ubuntu 16.04.1: numactl failed when trying to specify node with netdev:dev

Bug #1638515 reported by bugproxy
This bug affects 1 person
Affects            Status   Importance  Assigned to       Milestone
numactl (Ubuntu)   Invalid  Undecided   Taco Screen team

Bug Description

== Comment: #0 - Ping Tian Han <email address hidden> - 2016-07-08 02:21:08 ==
---Problem Description---
When trying to run this command on roselp1:

% numactl -a --cpunodebind=netdev:enP2p1s0f3 numactl --show

I got the following warning message:

libnuma: Warning: Cannot read node mask for device `/devices/pci0002:01/0002:01:00.3/'
<netdev:enP2p1s0f3> is invalid
... ...

---uname output---
Linux roselp1 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = IBM,8286-42A lpar
---Debugger---
A debugger was configured, however the system did not enter into the debugger

---Steps to Reproduce---
1. Install Ubuntu 16.04.1 on roselp1
2. Run numactl -a --cpunodebind=netdev:enP2p1s0f3 numactl --show

Contact Information = Ping Tian <email address hidden>

*Additional Instructions for Ping Tian <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.

== Comment: #2 - VIPIN K. PARASHAR <email address hidden> - 2016-07-08 07:51:04 ==
root@roselp1:~# tail /proc/cpuinfo

processor : 159
cpu : POWER8 (architected), altivec supported
clock : 4157.000000MHz
revision : 2.1 (pvr 004b 0201)

timebase : 512000000
platform : pSeries
model : IBM,8286-42A
machine : CHRP IBM,8286-42A
root@roselp1:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
root@roselp1:~# uname -a
Linux roselp1 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@roselp1:~#

== Comment: #3 - VIPIN K. PARASHAR <email address hidden> - 2016-07-08 08:49:30 ==
root@roselp1:/# numactl -H
available: 2 nodes (0,2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus: 16 17 18 19 20 21 22 23 56 57 58 59 60 61 62 63 112 113 114 115 116 117 118 119
node 2 size: 9651 MB
node 2 free: 7065 MB
node distances:
node 0 2
  0: 10 40
  2: 40 10
root@roselp1:/#

== Comment: #4 - VIPIN K. PARASHAR <email address hidden> - 2016-07-08 08:55:07 ==

== Comment: #5 - VIPIN K. PARASHAR <email address hidden> - 2016-07-08 09:05:17 ==
root@roselp1:~# numactl -s
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
cpubind: 2
nodebind: 2
membind: 2
root@roselp1:~#

== Comment: #6 - VIPIN K. PARASHAR <email address hidden> - 2016-07-08 09:20:07 ==
Specifying the node number directly only partially works.
The system has two nodes, 0 and 2. It fails for node 0 but works for node 2.

root@roselp1:~# numactl --cpunodebind=0 echo hi
libnuma: Warning: node argument 0 is out of range

<0> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
               [--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
               [--membind= | -m <nodes>] [--localalloc | -l] command args ...
       numactl [--show | -s]

root@roselp1:~# numactl --cpunodebind=2 echo hi
hi
root@roselp1:~# numactl --cpunodebind=netdev:enP2p1s0f3 echo hi
libnuma: Warning: Cannot read node mask for device `/devices/pci0002:01/0002:01:00.3/'
<netdev:enP2p1s0f3> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
..
..
root@roselp1:~#

== Comment: #7 - VIPIN K. PARASHAR <email address hidden> - 2016-07-08 09:28:22 ==
Workstation:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid
Workstation:~$ head /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz
stepping : 7
microcode : 0x1a
cpu MHz : 834.234
cache size : 3072 KB
physical id : 0

Workstation:~$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 7738 MB
node 0 free: 3672 MB
node distances:
node 0
  0: 10

Workstation:~$ numactl --cpunodebind=0 echo hi
hi

Workstation:~$ ifconfig | grep eth
eth0 Link encap:Ethernet HWaddr 00:21:cc:6d:1e:77

Workstation:~$ numactl --cpunodebind=netdev:eth0 echo hi
libnuma: Warning: Kernel does not know node mask for device `/devices/pci0000:00/0000:00:19.0/'
<netdev:eth0> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
....
....
Workstation:~$

I see it fail on an x86 box, also running Ubuntu 16.04.

== Comment: #10 - Ping Tian Han <email address hidden> - 2016-10-30 21:38:07 ==

I can reproduce this problem on 16.10:

% numactl -a --cpunodebind=netdev:enP32770p1s0f0 echo hi
libnuma: Warning: Kernel does not know node mask for device `/devices/pci8002:01/8002:01:00.0/'
<netdev:enP32770p1s0f0> is invalid
... ...

bugproxy (bugproxy) wrote : sosreport - roselp1

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-143570 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → numactl (Ubuntu)
bugproxy (bugproxy)
tags: added: targetmilestone-inin16042
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-02 07:11 EDT-------
Hello Canonical,

Can you please have a look and advise on the numactl command
failures being seen with the '--cpubind' option? We could recreate
them on ppc and x86 platforms running 16.10 and 16.04.1. Refer to
the previous comments for a detailed description and our findings
on the issue.

Nish Aravamudan (nacc) wrote :

Unfortunately, numactl/libnuma has historically been quite bad at making these kinds of issues easy to debug.

Algorithmically, numactl does the following:

If a /sys/class/net/<device> symlink is found and its target matches '(/devices/pci[0-9a-fA-F:/]+\.[0-9]+)/', it then tries to parse /sys/<part that matches the above regex>/numa_node.
If the above symlink is not found (or its target does not match), it tries to parse /sys/class/net/<device>/device/numa_node.
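The lookup described above can be sketched in Python as follows. This is a hedged model, not numactl's C implementation: `numa_node_path` and `resolve_netdev` are hypothetical names, and the regular expression is modeled on the pattern quoted above, written with a balanced capture group.

```python
import os
import re

# Pattern modeled on the one quoted above; group 1 is the PCI device path.
PCI_DEVICE_RE = re.compile(r"(/devices/pci[0-9a-fA-F:/]+\.[0-9]+)/")


def numa_node_path(dev, link_target):
    """Return the numa_node file to read for network device `dev`.

    `link_target` is the result of readlink("/sys/class/net/<dev>"),
    or None if the symlink could not be read.
    """
    if link_target is not None:
        match = PCI_DEVICE_RE.search(link_target)
        if match:
            # Use the PCI device directory, e.g. /sys/devices/pci.../00.3/numa_node
            return "/sys" + match.group(1) + "/numa_node"
    # Fallback: the generic per-device attribute.
    return f"/sys/class/net/{dev}/device/numa_node"


def resolve_netdev(dev):
    """Read the symlink on a live system and pick the numa_node path."""
    try:
        target = os.readlink(f"/sys/class/net/{dev}")
    except OSError:
        target = None
    return numa_node_path(dev, target)
```

Given the symlink shown later in this report (enP28p96s0f3 -> ../../devices/pci001c:60/001c:60:00.3/net/enP28p96s0f3), `numa_node_path` picks /sys/devices/pci001c:60/001c:60:00.3/numa_node; a virtual device such as lo does not match the pattern and falls back to the /sys/class/net/<device>/device/numa_node path.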

My initial suspicion was the non-contiguous NUMA numbering (nodes 0 and 2), but if it also happens on x86, then that seems not to be the issue.

From the affected systems, can you provide:

ls -ahl /sys/class/net/<device> (which should be a symlink)
cat /sys/devices/<pci path to device>/numa_node
[ I believe in your case, that would be:
 cat /sys//devices/pci0002:01/0002:01:00.3/numa_node
]
cat /sys/class/net/<device>/device/numa_node

Thanks,
Nish

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-06 20:30 EDT-------
(In reply to comment #15)
> From the affected systems, can you provide:
>
> ls -ahl /sys/class/net/<device> (which should be a symlink)
> cat /sys/devices/<pci path to device>/numa_node
> [ I believe in your case, that would be:
> cat /sys//devices/pci0002:01/0002:01:00.3/numa_node
> ]
> cat /sys/class/net/<device>/device/numa_node
>

The original system on which I found this bug with 16.04.1 no longer reproduces it with 16.10. I found another system, roselp3, and added a NIC adapter (U78C9.001.WZS02T4-P1-C8) via DLPAR. Then:

% numactl -a --cpunodebind=netdev:enP28p96s0f3 numactl --show
libnuma: Warning: Cannot read node mask for device `/devices/pci001c:60/001c:60:00.3/'
<netdev:enP28p96s0f3> is invalid
... ...

% ls -ahl /sys/class/net/enP28p96s0f3
lrwxrwxrwx 1 root root 0 Nov 6 19:18 /sys/class/net/enP28p96s0f3 -> ../../devices/pci001c:60/001c:60:00.3/net/enP28p96s0f3
% cat /sys/devices/pci001c:60/001c:60:00.3/numa_node
1
% cat /sys/class/net/enP28p96s0f3/device/numa_node
1

Nish Aravamudan (nacc) wrote : Re: [Bug 1638515] Comment bridged from LTC Bugzilla

Can you please provide numactl -H output from the affected system?

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-07 00:49 EDT-------
(In reply to comment #18)
> Can you please provide numactl -H output from the affected system?

% sudo numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 17826 MB
node 1 free: 13472 MB
node distances:
node 0 1
0: 10 20
1: 20 10

Nish Aravamudan (nacc) wrote :

OK, so I think this is not exactly a bug; maybe a usability issue upstream?

Note also that the error message now reported ("Cannot read node mask for device", sysfs_node_read returning -1) differs from the one reported before ("Kernel does not know node mask for device", sysfs_node_read returning -2).

In any case, you've told numactl to use --cpunodebind=<node of a network device>, but that <node of a network device> has no CPUs, so numactl can't bind to it?

In the code, I believe numprocnode is 0 (as node 0 is the highest node with CPUs), and

  if (num >= numa_num_task_nodes())
   return -1;

is probably being tripped in sysfs_node_read. As with all things numactl, it needs some love to be more user-friendly.
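To make the two return codes concrete, here is a simplified Python model of the checks described in this comment. It is not libnuma's code: the function names are hypothetical, `num_task_nodes` stands in for `numa_num_task_nodes()`, and two details are assumptions drawn from this report rather than from the source: an unreadable file is treated as -1, and a negative numa_node value (the -1 seen for enP32770p1s0f0) as -2.

```python
def sysfs_node_read(numa_node_text, num_task_nodes):
    """Simplified model of the sysfs_node_read() return codes discussed above.

    numa_node_text: contents of the numa_node file, or None if unreadable.
    num_task_nodes: stand-in for numa_num_task_nodes().
    Returns the node number, or -1 / -2 on failure.
    """
    if numa_node_text is None:
        return -1                      # assumed: file could not be opened
    num = int(numa_node_text)
    if num < 0:
        return -2                      # kernel reports no node (numa_node == -1)
    if num >= num_task_nodes:
        return -1                      # the range check quoted above
    return num


def warning_for(ret, device):
    """Map a failure code to the warning text seen in this report."""
    if ret == -2:
        return f"Kernel does not know node mask for device `{device}'"
    return f"Cannot read node mask for device `{device}'"
```

Under Nish's assumption that only node 0 counts (so `num_task_nodes` is 1), a device on node 1 trips the range check and yields -1, i.e. "Cannot read node mask", while a numa_node of -1 yields -2, i.e. "Kernel does not know node mask".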

Nish Aravamudan (nacc) wrote :

Note that this LPAR is so broken as far as affinity goes that I don't think any invocation of numactl on it makes sense. You have one node with all CPUs and one node with all memory. By definition, you will need to fall back to node 0 to run and to node 1 to allocate memory, so don't bother using numactl on it.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-08 07:08 EDT-------
# for i in `ifconfig -a | awk -F : '/^e/ {print $1}'`; do echo -n "$i - " ; ethtool -i $i | grep bus; done
enP28p96s0f0 - bus-info: 001c:60:00.0
enP28p96s0f1 - bus-info: 001c:60:00.1
enP28p96s0f2 - bus-info: 001c:60:00.2
enP28p96s0f3 - bus-info: 001c:60:00.3
enP32770p1s0f0 - bus-info: 8002:01:00.0

# lspci
0018:01:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 30)
0018:01:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 30)
001c:60:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
001c:60:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
001c:60:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
001c:60:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
001c:60:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
001c:60:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
8002:01:00.0 Ethernet controller: Emulex Corporation Device e228 (rev 10)

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 - 159
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 17826 MB
node 1 free: 13171 MB
node distances:
....

# ls -l /sys/class/net | cut -d ' ' -f 10-
enP28p96s0f0 -> ../../devices/pci001c:60/001c:60:00.0/net/enP28p96s0f0
enP28p96s0f1 -> ../../devices/pci001c:60/001c:60:00.1/net/enP28p96s0f1
enP28p96s0f2 -> ../../devices/pci001c:60/001c:60:00.2/net/enP28p96s0f2
enP28p96s0f3 -> ../../devices/pci001c:60/001c:60:00.3/net/enP28p96s0f3
enP32770p1s0f0 -> ../../devices/pci8002:01/8002:01:00.0/net/enP32770p1s0f0
lo -> ../../devices/virtual/net/lo

# cat /sys/class/net/enP28p96s0f0/device/numa_node
1
# cat /sys/class/net/enP28p96s0f1/device/numa_node
1
#

# numactl -a --cpunodebind=netdev:enP28p96s0f0
libnuma: Warning: Cannot read node mask for device `/devices/pci001c:60/001c:60:00.0/'
<netdev:enP28p96s0f0> is invalid
..
..
# numactl -a --cpunodebind=netdev:enP28p96s0f1
libnuma: Warning: Cannot read node mask for device `/devices/pci001c:60/001c:60:00.1/'
<netdev:enP28p96s0f1> is invalid

There are 5 Ethernet controllers on this box.
The enP28p96s0fX devices fall under numa_node 1, and
since node 1 has no CPUs assigned,
numactl with --cpunodebind fails with the "Cannot read node mask" error.
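The diagnosis above (a device whose NUMA node has no CPUs) can be checked directly from sysfs. This is a minimal Python sketch, not part of numactl; the function names are hypothetical, and it assumes the standard /sys/devices/system/node/node<N>/cpulist files, which list CPUs in range form such as "0-3,8" and are empty for a CPU-less node.

```python
import os


def parse_cpulist(text):
    """Expand a sysfs cpulist such as "0-3,8" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # a CPU-less node has an empty cpulist
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def netdev_node_cpus(dev, sysfs="/sys"):
    """Return (numa_node, cpus) for a network device.

    numa_node is negative when the kernel does not know it; cpus is then None.
    """
    with open(os.path.join(sysfs, "class", "net", dev, "device", "numa_node")) as f:
        node = int(f.read())
    if node < 0:
        return node, None
    path = os.path.join(sysfs, "devices", "system", "node", f"node{node}", "cpulist")
    with open(path) as f:
        return node, parse_cpulist(f.read())
```

Given the numa_node values and the empty node 1 CPU list shown above, one would expect `netdev_node_cpus("enP28p96s0f0")` to return `(1, [])` on roselp3, and `(-1, None)` for enP32770p1s0f0.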

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-08 08:03 EDT-------
# ls -l /sys/class/net/enP32770p1s0f0
.. /sys/class/net/enP32770p1s0f0 -> ../../devices/pci8002:01/8002:01:00.0/net/enP32770p1s0f0

# cat /sys/class/net/enP32770p1s0f0/device/numa_node
-1
# cat /sys/devices/pci8002\:01/8002:01:00.0/numa_node
-1

# numactl -a --cpubind=netdev:enP32770p1s0f0 echo hi
libnuma: Warning: Kernel does not know node mask for device `/devices/pci8002:01/8002:01:00.0/'
<netdev:enP32770p1s0f0> is invalid

For the enP32770p1s0f0 device, numa_node contains -1, and thus
"numactl --cpubind" fails with "Kernel does not know node mask".

Hello Canonical,

Thanks for your findings on the numactl --cpubind option.
Here the enP28p96s0fX devices fall under numa_node 1,
which has no CPUs, and thus cpubind fails
with "Cannot read node mask". This is expected behaviour
given how the numactl code works.

enP32770p1s0f0 fails the numactl --cpubind option with
"Kernel does not know node mask". This failure is likely due
to the numa_node for this device having a value of -1.

Any suggestions as to why the enP32770p1s0f0 device has a
numa_node value of -1? Is this a kernel issue?

Nish Aravamudan (nacc) wrote :

I think the -1 means that we were unable to parse the Open Firmware (OF) representation of the NUMA node. I'd have to refresh my memory on the powerpc NUMA code; it might be faster to ask the Ozlabs folks.

-Nish

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-15 03:55 EDT-------
Thanks for the explanation. Since things are working as
designed in the numactl code, I am closing this bug.

Nish Aravamudan (nacc) wrote :

You're welcome! Note that there is an opportunity here (distro agnostic) for someone to write up an article explaining when numactl goes wrong (and why) in cases like this. I think they are seen more often on Power (due to LPARs).

Nish Aravamudan (nacc)
Changed in numactl (Ubuntu):
status: New → Invalid
bugproxy (bugproxy)
tags: added: targetmilestone-inin---
removed: targetmilestone-inin16042