Comment 0 for bug 1860347

Wendy Mitchell (wmitchellwr) wrote : Config controller failing with unable to find interface (Mellanox family interface not visible in linux)

Brief Description
-----------------
Configuration of labs with Mellanox interfaces fails. The interfaces are not visible in the Linux output (ifconfig -a).

Severity
--------
Major

Steps to Reproduce
------------------
The Mellanox interfaces are not visible in the Linux output, although they do appear to be present.

This prevents install/configuration from completing successfully.

The errors reported during the configure controller step on these labs indicate that it is unable to find the interface ens785f0,
e.g.

{"log":"E0117 20:18:30.697446 1 common.go:226] controller/host \"msg\"=\"user data error\" \"error\"=\"unable to find interface UUID for port: ens785f0\" \"request\"={\"Namespace\":\"deployment\",\"Name\":\"controller-0\"}\n","stream":"stderr","time":"2020-01-17T20:18:30.704588677Z"}
{"log":"E0117 20:19:31.405278 1 common.go:226] controller/host \"msg\"=\"user data error\" \"error\"=\"unable to find interface UUID for port: ens785f0\" \"request\"={\"Namespace\":\"deployment\",\"Name\":\"controller-0\"}\n","stream":"stderr","time":"2020-01-17T20:19:31.405496902Z"}
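
When triaging a longer log, the failing port names can be pulled out mechanically. The following is a minimal sketch, assuming each record is a single-line JSON object with a "log" field (the records above look like Docker's json-file format). The function name missing_ports and the regular expression are illustrative and not part of any StarlingX tooling.

```python
import json
import re

# One record copied from the log above, written as a single line.
LOG_LINE = r'''{"log":"E0117 20:18:30.697446 1 common.go:226] controller/host \"msg\"=\"user data error\" \"error\"=\"unable to find interface UUID for port: ens785f0\" \"request\"={\"Namespace\":\"deployment\",\"Name\":\"controller-0\"}\n","stream":"stderr","time":"2020-01-17T20:18:30.704588677Z"}'''

# Error text taken from the log; the capture group is the port name.
PORT_ERROR = re.compile(r"unable to find interface UUID for port: ([\w.-]+)")


def missing_ports(lines):
    """Return the sorted port names reported in 'unable to find interface' errors."""
    ports = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a complete JSON record
        if not isinstance(record, dict):
            continue
        match = PORT_ERROR.search(str(record.get("log", "")))
        if match:
            ports.add(match.group(1))
    return sorted(ports)


if __name__ == "__main__":
    print(missing_ports([LOG_LINE]))
```

Repeated errors for the same port collapse into one entry, so the result lists each affected port once.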

The Mellanox interfaces are not visible. Only these two interfaces are reported in the Linux output.

$ system host-port-list controller-0
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+----------------------------------------+
| uuid                                 | name     | type     | pci address  | device | processor | accelerated | device type                            |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+----------------------------------------+
| 2b181f2c-f742-41f4-b718-c9d62da8067e | enp3s0f0 | ethernet | 0000:03:00.0 | 0      | 0         | True        | I350 Gigabit Network Connection [1521] |
| f02d98df-86dc-4386-9707-b12aa7dff50a | enp3s0f3 | ethernet | 0000:03:00.3 | 0      | 0         | True        | I350 Gigabit Network Connection [1521] |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+----------------------------------------+

controller-0:~$ ifconfig -a
cali675b11b2b81: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

cali8c7155b8c25: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
        RX packets 40971661 bytes 14589139995 (13.5 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 40971661 bytes 14589139995 (13.5 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

calid2619c1ab5f: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
        RX packets 40971661 bytes 14589139995 (13.5 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 40971661 bytes 14589139995 (13.5 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether 02:42:70:69:02:fc txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 128.224.151.80 netmask 255.255.254.0 broadcast 128.224.151.255
        inet6 2620:10a:a001:a103:a6bf:1ff:fe00:690 prefixlen 64 scopeid 0x0<global>
        inet6 fe80::a6bf:1ff:fe00:690 prefixlen 64 scopeid 0x20<link>
        ether a4:bf:01:00:06:90 txqueuelen 1000 (Ethernet)
        RX packets 11602842 bytes 3055925470 (2.8 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 423747 bytes 35184409 (33.5 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device memory 0x91920000-9193ffff

enp3s0f3: flags=4098<BROADCAST,MULTICAST> mtu 1500
        ether a4:bf:01:00:06:91 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device memory 0x91900000-9191ffff

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 40971661 bytes 14589139995 (13.5 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 40971661 bytes 14589139995 (13.5 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo:1: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 192.168.204.2 netmask 255.255.255.0
        loop txqueuelen 1000 (Local Loopback)

lo:5: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 192.168.206.2 netmask 255.255.255.0
        loop txqueuelen 1000 (Local Loopback)

tunl0: flags=193<UP,RUNNING,NOARP> mtu 1440
        inet 172.16.192.64 netmask 255.255.255.255
        tunnel txqueuelen 1000 (IPIP Tunnel)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
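
The absence of ens785f0 from output like the above can be confirmed without reading every stanza. The following is a minimal sketch that extracts interface names from captured `ifconfig -a` text; the function name interface_names and the SAMPLE excerpt are illustrative, and the pattern assumes the "name: flags=" stanza header format shown above.

```python
import re

# Matches the first line of each ifconfig stanza, e.g. "enp3s0f0: flags=4163<...>".
# Alias names such as "lo:1" are captured whole.
STANZA_HEADER = re.compile(r"^(\S+): flags=", re.MULTILINE)


def interface_names(ifconfig_output):
    """Return the interface names found in `ifconfig -a` output, in order."""
    return STANZA_HEADER.findall(ifconfig_output)


# Abbreviated copy of the output above: stanza headers plus one detail line.
SAMPLE = """\
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
enp3s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
enp3s0f3: flags=4098<BROADCAST,MULTICAST> mtu 1500
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
lo:1: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
"""

EXPECTED_PORT = "ens785f0"  # the port named in the configure controller error

if __name__ == "__main__":
    names = interface_names(SAMPLE)
    print(names)
    print(EXPECTED_PORT, "present" if EXPECTED_PORT in names else "missing")
```

On a live host, listing /sys/class/net gives the kernel's own view of network devices and avoids parsing text, though it does not include alias entries such as lo:1.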

Expected Behavior
------------------
The lab install/configuration should succeed, as it did on the previous load (2019-12-13_19-03-42).

Actual Behavior
----------------
Configuration fails because the expected interface is not visible in Linux.

Reproducibility
---------------
yes

System Configuration
--------------------
wcp_61-62
wcp_63-66

Lab-name:

Branch/Pull Time/Commit
-----------------------
20200117T023005Z

Last Pass
---------
The same configuration for lab wcp_63-66 worked on the following build:
2019-12-13_19-03-42

Note: The output for the December load (for lab wcp_63-66) was as follows.
The two Mellanox ports (ens785f0 and ens785f1) are missing on the latest master load.
[sysadmin@controller-0 ~(keystone_admin)]$ system host-port-list controller-0
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+----------------------------------------+
| uuid                                 | name     | type     | pci address  | device | processor | accelerated | device type                            |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+----------------------------------------+
| 6f857eaa-f2ca-4d38-9849-c0ca37d8184f | eno1     | ethernet | 0000:03:00.0 | 0      | 0         | False       | I350 Gigabit Network Connection [1521] |
| bf93c6d6-4127-44eb-bce8-460327d43082 | eno2     | ethernet | 0000:03:00.3 | 0      | 0         | False       | I350 Gigabit Network Connection [1521] |
| 12bc274e-3248-458b-ac6a-f051e7f8201a | ens785f0 | ethernet | 0000:05:00.0 | 0      | 0         | False       | MT27710 Family [ConnectX-4 Lx] [1015]  |
| 0c72b5e3-e8b2-4abb-bb2b-fe15e34e4d0b | ens785f1 | ethernet | 0000:05:00.1 | 0      | 0         | False       | MT27710 Family [ConnectX-4 Lx] [1015]  |
+--------------------------------------+----------+----------+--------------+--------+-----------+-------------+----------------------------------------+
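
The two loads can also be compared mechanically. The following is a minimal sketch that parses two `system host-port-list` tables and reports ports present in the older load but absent from the newer one. It matches on PCI address rather than name, because the I350 ports are named differently in the two outputs (eno1/eno2 versus enp3s0f0/enp3s0f3). The function names are illustrative, and the parser assumes only the pipe-delimited layout shown in this report.

```python
def parse_port_table(text):
    """Parse a pipe-delimited CLI table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders ("+---+") and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        if len(cells) != len(header):
            continue
        row = dict(zip(header, cells))
        if row.get("uuid"):  # wrapped continuation lines have an empty uuid cell
            rows.append(row)
    return rows


def missing_by_pci(old_rows, new_rows):
    """Return (pci address, name) for ports in old_rows whose address is not in new_rows."""
    new_addrs = {row["pci address"] for row in new_rows}
    return [(row["pci address"], row["name"])
            for row in old_rows if row["pci address"] not in new_addrs]


# Tables copied from this report (whitespace collapsed).
DECEMBER = """
| uuid | name | type | pci address | device | processor | accelerated | device type |
| 6f857eaa-f2ca-4d38-9849-c0ca37d8184f | eno1 | ethernet | 0000:03:00.0 | 0 | 0 | False | I350 Gigabit Network Connection [1521] |
| bf93c6d6-4127-44eb-bce8-460327d43082 | eno2 | ethernet | 0000:03:00.3 | 0 | 0 | False | I350 Gigabit Network Connection [1521] |
| 12bc274e-3248-458b-ac6a-f051e7f8201a | ens785f0 | ethernet | 0000:05:00.0 | 0 | 0 | False | MT27710 Family [ConnectX-4 Lx] [1015] |
| 0c72b5e3-e8b2-4abb-bb2b-fe15e34e4d0b | ens785f1 | ethernet | 0000:05:00.1 | 0 | 0 | False | MT27710 Family [ConnectX-4 Lx] [1015] |
"""

JANUARY = """
| uuid | name | type | pci address | device | processor | accelerated | device type |
| 2b181f2c-f742-41f4-b718-c9d62da8067e | enp3s0f0 | ethernet | 0000:03:00.0 | 0 | 0 | True | I350 Gigabit Network Connection |
| | | | | | | | [1521] |
| f02d98df-86dc-4386-9707-b12aa7dff50a | enp3s0f3 | ethernet | 0000:03:00.3 | 0 | 0 | True | I350 Gigabit Network Connection |
"""

if __name__ == "__main__":
    for addr, name in missing_by_pci(parse_port_table(DECEMBER), parse_port_table(JANUARY)):
        print(f"missing on new load: {name} at {addr}")
```

Keying on the PCI address keeps the renamed I350 ports from being reported as missing, so only the two ConnectX-4 Lx ports at 0000:05:00.0 and 0000:05:00.1 remain.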

Timestamp/Logs
--------------

Test Activity
-------------
Install/configuration