config failing with unable to find interface (Mellanox family interface not visible in linux)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | High | Jim Somerville | | |
Bug Description
Brief Description
-----------------
Configuration of labs with Mellanox interfaces fails. The interfaces are not visible in the Linux output (ifconfig -a).
Severity
--------
Major
Steps to Reproduce
------------------
The Mellanox interfaces are not visible in the Linux output, even though they appear to be present.
This is preventing install/
The following error is reported during the configure controller step on these labs, indicating that the interface ens785f0 cannot be found, e.g.:
{"log":"E0117 20:18:30.697446 1 common.go:226] controller/host \"msg\"=\"user data error\" \"error\"=\"unable to find interface UUID for port: ens785f0\" \"request\"=
{\"Namespace\
\n","stream"
{"log":"E0117 20:19:31.405278 1 common.go:226] controller/host \"msg\"=\"user data error\" \"error\"=\"unable to find interface UUID for port: ens785f0\" \"request\"=
{\"Namespace\
\n","stream"
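To see which ports the controller is complaining about, error lines like the ones above can be filtered for the port name. This is a minimal sketch and not part of the original report: the function name `missing_ports` is hypothetical, and it assumes the message text matches the log excerpt above.

```shell
#!/bin/sh
# Print the unique port names found in
# "unable to find interface UUID for port: <name>" errors.
# Reads log text on stdin. The port name is taken to be the run of
# letters, digits, '.', '_' and '-' that follows "port: ".
missing_ports() {
    grep -o 'unable to find interface UUID for port: [A-Za-z0-9._-]*' |
        sed 's/.*port: //' |
        sort -u
}
```

Piping the saved controller log into `missing_ports` prints each affected port name once, however often the error repeats.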
The Mellanox interfaces are not visible. Only these 2 interfaces are reported in the Linux output.
$ system host-port-list controller-0
+------
| uuid | name | type | pci address | device | processor | accelerated | device type |
+------
| 2b181f2c-
| | | | | | | | [1521] |
| | | | | | | | |
| f02d98df-
| | | | | | | | [1521]
controller-0:~$ ifconfig -a
cali675b11b2b81: flags=4163<
inet6 fe80::ecee:
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
cali8c7155b8c25: flags=4163<
inet6 fe80::ecee:
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 40971661 bytes 14589139995 (13.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 40971661 bytes 14589139995 (13.5 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
calid2619c1ab5f: flags=4163<
inet6 fe80::ecee:
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 40971661 bytes 14589139995 (13.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 40971661 bytes 14589139995 (13.5 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:70:69:02:fc txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0f0: flags=4163<
inet 128.224.151.80 netmask 255.255.254.0 broadcast 128.224.151.255
inet6 2620:10a:
inet6 fe80::a6bf:
ether a4:bf:01:00:06:90 txqueuelen 1000 (Ethernet)
RX packets 11602842 bytes 3055925470 (2.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 423747 bytes 35184409 (33.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0x91920000-9193ffff
enp3s0f3: flags=4098<
ether a4:bf:01:00:06:91 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0x91900000-9191ffff
lo: flags=73<
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 40971661 bytes 14589139995 (13.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 40971661 bytes 14589139995 (13.5 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo:1: flags=73<
inet 192.168.204.2 netmask 255.255.255.0
loop txqueuelen 1000 (Local Loopback)
lo:5: flags=73<
inet 192.168.206.2 netmask 255.255.255.0
loop txqueuelen 1000 (Local Loopback)
tunl0: flags=193<
inet 172.16.192.64 netmask 255.255.255.255
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
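A quicker check than reading the full `ifconfig -a` output is to look the name up under `/sys/class/net`, where the kernel lists every registered network interface. The sketch below is illustrative only: the function names and the `NET_DIR` override are hypothetical and were not part of the original report.

```shell
#!/bin/sh
# Return 0 if the named interface is registered with the kernel.
# $1: interface name
# $2: sysfs network directory (defaults to /sys/class/net)
iface_visible() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# Print "<name> visible" or "<name> missing" for each argument.
# NET_DIR may be set to point at a different directory (used for testing);
# by default the live sysfs tree is consulted.
report_ifaces() {
    for name in "$@"; do
        if iface_visible "$name" "${NET_DIR:-/sys/class/net}"; then
            echo "$name visible"
        else
            echo "$name missing"
        fi
    done
}
```

For example, `report_ifaces ens785f0 enp3s0f0` on the affected controller would be expected to report `ens785f0` as missing, given the output above.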
Expected Behavior
------------------
Expect the lab install/
Actual Behavior
----------------
Configuration fails because the expected interface is not visible in Linux.
Reproducibility
---------------
yes
System Configuration
-------
Any system with Mellanox NICs
Lab-name:
wcp_61-62
wcp_63-66
Branch/Pull Time/Commit
-------
master as of 20200117T023005Z
(also seen with a master load from Jan 10)
Last Pass
---------
The same configuration for lab wcp_63-66 worked on the following build:
2019-12-13_19-03-42
Note: Output for the December load (for lab wcp_63-66) was as follows.
The two Mellanox ports are missing on the latest master load.
[sysadmin@
+------
| uuid | name | type | pci address | device | processor | accelerated | device type |
+------
| 6f857eaa-
| bf93c6d6-
| 12bc274e-
| 0c72b5e3-
+------
Timestamp/Logs
--------------
Test Activity
-------------
Install/
Changed in starlingx: | |
assignee: | nobody → Jim Somerville (jsomervi) |
description: | updated |
summary: |
- Config controller failing with unable to find interface (Mellanox family
- interface not visible in linux)
+ config failing with unable to find interface (Mellanox family interface
+ not visible in linux) |
tags: | added: in-r-stx20 in-r-stx30 |
Based on the load info, this may be related to the kernel upversion which merged on Jan 2: https://review.opendev.org/#/c/695355/
Perhaps there is a compatibility issue with the Mellanox drivers.
Assigning to Jim Somerville to investigate since he has access to the WR labs that include the mlx NICs.
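One way to start checking the driver theory is to see whether any Mellanox kernel module is loaded at all on the new kernel. The sketch below is only an illustration, not the method used in the investigation: it assumes the NICs use a driver whose module name starts with `mlx` (such as `mlx4_core` or `mlx5_core`), and the function name `loaded_mlx_modules` is hypothetical.

```shell
#!/bin/sh
# List loaded kernel modules whose names start with "mlx".
# $1: modules file to read (defaults to /proc/modules, where the first
#     field of each line is the module name).
loaded_mlx_modules() {
    awk '$1 ~ /^mlx/ { print $1 }' "${1:-/proc/modules}"
}
```

Comparing the output of `loaded_mlx_modules` and `uname -r` on a passing and a failing load would show whether the module stopped loading after the kernel change. `lspci -d 15b3:` (15b3 is the Mellanox PCI vendor ID) can confirm that the hardware is still enumerated on the bus.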