Mellanox MT27500 Family [ConnectX-3] 40G NIC not being detected by udev

Bug #1675091 reported by Mike Rushton
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
checkbox-support (Ubuntu)  Fix Released  Undecided   Rod Smith
  Xenial                   Invalid       Undecided   Unassigned
linux (Ubuntu)             Confirmed     Medium      Rod Smith
  Xenial                   Confirmed     Medium      Unassigned

Bug Description

During certification testing on Power8NVL, the network tests for the Mellanox 40/56G network cards are failing. According to the failure output pasted below, udev/dbus isn't detecting the network card properly, but NetworkManager is: udev lists only enP8p1s0d1 for the Mellanox card, while NetworkManager also lists enP8p1s0. My feeling is that this might be an issue with UdevadmParser.

ERROR: devices missing - udev showed 7 NETWORK devices, but NetworkManager saw 8 devices in ('Ethernet', 'Modem')
---------------------------[ Devices found by udev ]----------------------------
Category: NETWORK
Interface: enP1p1s0f0
Product: NetXtreme BCM5719 Gigabit Ethernet PCIe
Vendor: Broadcom Corporation
Driver: tg3 (ver: Unknown)
Path: /devices/pci0001:00/0001:00:00.0/0001:01:00.0
ID: [14e4:1657]
Subsystem ID: [1014:0420]

Category: NETWORK
Interface: enP1p1s0f1
Product: NetXtreme BCM5719 Gigabit Ethernet PCIe
Vendor: Broadcom Corporation
Driver: tg3 (ver: Unknown)
Path: /devices/pci0001:00/0001:00:00.0/0001:01:00.1
ID: [14e4:1657]
Subsystem ID: [1014:0420]

Category: NETWORK
Interface: enP1p1s0f2
Product: NetXtreme BCM5719 Gigabit Ethernet PCIe
Vendor: Broadcom Corporation
Driver: tg3 (ver: Unknown)
Path: /devices/pci0001:00/0001:00:00.0/0001:01:00.2
ID: [14e4:1657]
Subsystem ID: [1014:0420]

Category: NETWORK
Interface: enP1p1s0f3
Product: NetXtreme BCM5719 Gigabit Ethernet PCIe
Vendor: Broadcom Corporation
Driver: tg3 (ver: Unknown)
Path: /devices/pci0001:00/0001:00:00.0/0001:01:00.3
ID: [14e4:1657]
Subsystem ID: [1014:0420]

Category: NETWORK
Interface: enP8p1s0d1
Product: MT27500 Family [ConnectX-3]
Vendor: Mellanox Technologies
Driver: mlx4_core (ver: 2.2-1)
Path: /devices/pci0008:00/0008:00:00.0/0008:01:00.0

Category: NETWORK
Interface: enP9p7s0f0
Product: NetXtreme BCM5719 Gigabit Ethernet PCIe
Vendor: Broadcom Corporation
Driver: tg3 (ver: Unknown)
Path: /devices/pci0009:00/0009:00:00.0/0009:01:00.0/0009:02:04.0/0009:07:00.0
ID: [14e4:1657]
Subsystem ID: [14e4:1981]

Category: NETWORK
Interface: enP9p7s0f1
Product: NetXtreme BCM5719 Gigabit Ethernet PCIe
Vendor: Broadcom Corporation
Driver: tg3 (ver: Unknown)
Path: /devices/pci0009:00/0009:00:00.0/0009:01:00.0/0009:02:04.0/0009:07:00.1
ID: [14e4:1657]
Subsystem ID: [14e4:1657]

----------------------[ Devices found by Network Manager ]----------------------
Category: Ethernet
Interface: enP1p1s0f3
IP: 10.20.30.54
Driver: tg3 (ver: Unknown)
State: Unmanaged

Category: Ethernet
Interface: enP9p7s0f0
IP: 10.20.30.52
Driver: tg3 (ver: Unknown)
State: Unmanaged

Category: Ethernet
Interface: enP9p7s0f1
IP: 10.20.30.53
Driver: tg3 (ver: Unknown)
State: Unmanaged

Category: Ethernet
Interface: enP8p1s0
IP: 10.20.30.5
Driver: mlx4_core (ver: 2.2-1)
State: Unmanaged

Category: Ethernet
Interface: enP1p1s0f0
IP: 10.20.30.56
Driver: tg3 (ver: Unknown)
State: Unmanaged

Category: Ethernet
Interface: enP8p1s0d1
IP: 10.20.30.6
Driver: mlx4_core (ver: 2.2-1)
State: Unmanaged

Category: Ethernet
Interface: enP1p1s0f1
IP: 10.20.30.57
Driver: tg3 (ver: Unknown)
State: Unmanaged

Category: Ethernet
Interface: enP1p1s0f2
IP: 10.20.30.55
Driver: tg3 (ver: Unknown)
State: Unmanaged
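
To make the mismatch in the two listings above explicit, the interface names can be compared as sets. This is only a small sketch: the names are copied by hand from this report, and the helper `missing_from_udev` is a hypothetical name used for illustration, not part of checkbox.

```python
# Interface names copied from the "Devices found by udev" listing.
udev_interfaces = {
    "enP1p1s0f0", "enP1p1s0f1", "enP1p1s0f2", "enP1p1s0f3",
    "enP8p1s0d1",
    "enP9p7s0f0", "enP9p7s0f1",
}

# Interface names copied from the "Devices found by Network Manager" listing.
nm_interfaces = {
    "enP1p1s0f3", "enP9p7s0f0", "enP9p7s0f1", "enP8p1s0",
    "enP1p1s0f0", "enP8p1s0d1", "enP1p1s0f1", "enP1p1s0f2",
}


def missing_from_udev(udev, nm):
    """Return the interfaces NetworkManager reports but udev does not."""
    return sorted(nm - udev)


missing = missing_from_udev(udev_interfaces, nm_interfaces)
print(missing)  # ['enP8p1s0']
```

The only interface missing from the udev listing is the first port of the Mellanox card; the second port (enP8p1s0d1) is present in both.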

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-66-generic 4.4.0-66.87
ProcVersionSignature: Ubuntu 4.4.0-66.87-generic 4.4.44
Uname: Linux 4.4.0-66-generic ppc64le
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 21 17:00 seq
 crw-rw---- 1 root audio 116, 33 Mar 21 17:00 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Mar 22 15:59:40 2017
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
PciMultimedia:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=a287c2bd-7e5a-488c-8921-5cf86aed177a ro
ProcLoadAvg: 5.79 2.74 1.02 1/1261 881
ProcLocks:
 1: POSIX ADVISORY WRITE 4149 00:14:817 0 EOF
 2: POSIX ADVISORY WRITE 1701 00:14:477 0 EOF
 3: POSIX ADVISORY WRITE 4014 00:14:789 0 EOF
 4: POSIX ADVISORY WRITE 4033 00:14:802 0 EOF
 5: FLOCK ADVISORY WRITE 4012 00:14:798 0 EOF
ProcSwaps:
 Filename Type Size Used Priority
 /swap.img file 8388544 0 -1
ProcVersion: Linux version 4.4.0-66-generic (buildd@bos01-ppc64el-025) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #87-Ubuntu SMP Fri Mar 3 15:30:20 UTC 2017
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-66-generic N/A
 linux-backports-modules-4.4.0-66-generic N/A
 linux-firmware 1.157.8
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_dscr: DSCR is 0
cpu_freq:
 min: 3.953 GHz (cpu 79)
 max: 3.965 GHz (cpu 81)
 avg: 3.959 GHz
cpu_runmode:
 Could not retrieve current diagnostics mode,
 No kernel interface to firmware
cpu_smt: SMT=8

Mike Rushton (leftyfb) wrote :
tags: added: blocks-hwcert-server
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc3

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → Medium
tags: added: kernel-da-key
Mike Rushton (leftyfb) wrote :

kernel 4.11.0-041100rc3-generic failed with the same symptoms.

Jeff Lane  (bladernr)
Changed in checkbox (Ubuntu):
status: New → Invalid
Changed in checkbox (Ubuntu Xenial):
status: New → Invalid
Changed in checkbox (Ubuntu):
status: Invalid → New
status: New → Incomplete
Jeff Lane (bladernr) wrote :

The important bits of udev seem to be these:

P: /devices/pci0008:00/0008:00:00.0/0008:01:00.0
E: DEVPATH=/devices/pci0008:00/0008:00:00.0/0008:01:00.0
E: DRIVER=mlx4_core
E: ID_MODEL_FROM_DATABASE=MT27500 Family [ConnectX-3]
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: MODALIAS=pci:v000015B3d00001003sv00001014sd000004B5bc02sc00i00
E: OF_COMPATIBLE_N=0
E: OF_FULLNAME=/pciex@3fffe40400000/pci@0/ethernet@0
E: OF_NAME=ethernet
E: PCI_CLASS=20000
E: PCI_ID=15B3:1003
E: PCI_SLOT_NAME=0008:01:00.0
E: PCI_SUBSYS_ID=1014:04B5
E: SUBSYSTEM=pci
E: USEC_INITIALIZED=11882462

P: /devices/pci0008:00/0008:00:00.0/0008:01:00.0/net/enP8p1s0
E: DEVPATH=/devices/pci0008:00/0008:00:00.0/0008:01:00.0/net/enP8p1s0
E: ID_BUS=pci
E: ID_MM_CANDIDATE=1
E: ID_MODEL_FROM_DATABASE=MT27500 Family [ConnectX-3]
E: ID_MODEL_ID=0x1003
E: ID_NET_DRIVER=mlx4_en
E: ID_NET_LINK_FILE=/lib/systemd/network/99-default.link
E: ID_NET_NAME_MAC=enxe41d2d2590c0
E: ID_NET_NAME_PATH=enP8p1s0
E: ID_OUI_FROM_DATABASE=Mellanox Technologies, Inc.
E: ID_PATH=pci-0008:01:00.0
E: ID_PATH_TAG=pci-0008_01_00_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=8
E: INTERFACE=enP8p1s0
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/enP8p1s0
E: TAGS=:systemd:
E: USEC_INITIALIZED=8004370

P: /devices/pci0008:00/0008:00:00.0/0008:01:00.0/net/enP8p1s0d1
E: DEVPATH=/devices/pci0008:00/0008:00:00.0/0008:01:00.0/net/enP8p1s0d1
E: ID_BUS=pci
E: ID_MM_CANDIDATE=1
E: ID_MODEL_FROM_DATABASE=MT27500 Family [ConnectX-3]
E: ID_MODEL_ID=0x1003
E: ID_NET_DRIVER=mlx4_en
E: ID_NET_LINK_FILE=/lib/systemd/network/99-default.link
E: ID_NET_NAME_MAC=enxe41d2d2590c1
E: ID_NET_NAME_PATH=enP8p1s0d1
E: ID_OUI_FROM_DATABASE=Mellanox Technologies, Inc.
E: ID_PATH=pci-0008:01:00.0
E: ID_PATH_TAG=pci-0008_01_00_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=9
E: INTERFACE=enP8p1s0d1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/enP8p1s0d1
E: TAGS=:systemd:
E: USEC_INITIALIZED=7956109
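
The records above show two net interfaces hanging off a single PCI device path. A minimal sketch of how one might confirm that from udevadm-style output follows. It assumes the "P:"/"E:" record layout seen in this comment, uses an abbreviated copy of those records as sample data, and the function names are hypothetical; it is not the parsing code used by checkbox-support.

```python
from collections import defaultdict

# Abbreviated copy of the three records quoted above.
SAMPLE = """\
P: /devices/pci0008:00/0008:00:00.0/0008:01:00.0
E: DRIVER=mlx4_core
E: SUBSYSTEM=pci

P: /devices/pci0008:00/0008:00:00.0/0008:01:00.0/net/enP8p1s0
E: INTERFACE=enP8p1s0
E: SUBSYSTEM=net

P: /devices/pci0008:00/0008:00:00.0/0008:01:00.0/net/enP8p1s0d1
E: INTERFACE=enP8p1s0d1
E: SUBSYSTEM=net
"""


def parse_records(text):
    """Split blank-line-separated records into dicts of their properties."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            kind, _, value = line.partition(": ")
            if kind == "P":
                record["DEVPATH"] = value
            elif kind == "E":
                key, _, val = value.partition("=")
                record[key] = val
        records.append(record)
    return records


def interfaces_by_parent(records):
    """Group net interface names by the device path they sit under."""
    grouped = defaultdict(list)
    for rec in records:
        if rec.get("SUBSYSTEM") == "net":
            parent = rec["DEVPATH"].rsplit("/net/", 1)[0]
            grouped[parent].append(rec["INTERFACE"])
    return dict(grouped)


grouped = interfaces_by_parent(parse_records(SAMPLE))
for parent, names in grouped.items():
    print(parent, names)
```

With the sample data, both enP8p1s0 and enP8p1s0d1 are grouped under /devices/pci0008:00/0008:00:00.0/0008:01:00.0, unlike the Broadcom ports, which each have their own PCI function (.0, .1, and so on).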

Rod Smith (rodsmith) wrote :

I've been looking at this on and off. The data for the missing interface appears to be intact close to the end of UdevadmParser.run() (in udevadm.py), but it is lost either in the last line of that method or when control returns to the main() function of the udev_resource script.
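
Purely as an illustration of one way an entry could disappear late in such a pipeline, consider devices collected into a mapping keyed by PCI path. This is an assumption for the sake of example, not a description of the actual UdevadmParser code or of the fix that was released; all names below are hypothetical.

```python
# Two net devices sharing one PCI path, as in the udev records in this report.
PCI_PATH = "/devices/pci0008:00/0008:00:00.0/0008:01:00.0"
net_devices = [
    {"path": PCI_PATH, "interface": "enP8p1s0"},
    {"path": PCI_PATH, "interface": "enP8p1s0d1"},
]


def collect_by_path(devices):
    """Hypothetical collector keyed by path only: later entries overwrite."""
    result = {}
    for dev in devices:
        result[dev["path"]] = dev
    return result


def collect_by_path_and_interface(devices):
    """Hypothetical collector keyed by (path, interface): nothing is lost."""
    return {(dev["path"], dev["interface"]): dev for dev in devices}


by_path = collect_by_path(net_devices)
by_both = collect_by_path_and_interface(net_devices)

print(len(by_path), [d["interface"] for d in by_path.values()])  # 1 ['enP8p1s0d1']
print(len(by_both))  # 2
```

If something like the first collector were involved, a dual-port card exposing both ports under one PCI function would show up as a single device, while cards with one PCI function per port would be unaffected.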

Rod Smith (rodsmith)
Changed in checkbox (Ubuntu):
assignee: nobody → Rod Smith (rodsmith)
Changed in linux (Ubuntu):
assignee: nobody → Rod Smith (rodsmith)
Jeff Lane  (bladernr)
Changed in checkbox (Ubuntu):
status: Incomplete → In Progress
Jeff Lane  (bladernr)
affects: checkbox (Ubuntu) → checkbox-support (Ubuntu)
Jeff Lane  (bladernr)
tags: removed: blocks-hwcert-server
Rod Smith (rodsmith)
Changed in checkbox-support (Ubuntu):
status: In Progress → Fix Released