Comment 41 for bug 1794477

Chris Valean (cvalean) wrote:

This issue does look resolved on WS2016 and Azure.
However, SR-IOV is now broken in a different way on WS2019.

Affected proposed kernels:
- cosmic proposed
- bionic proposed edge (4.18-based)

Tested with the cosmic linux-azure 4.18.0.1006.7 kernel from proposed, using the same VHD:
- SR-IOV with Mellanox CX3 works fine on WS2016, all testing has passed.
- SR-IOV with Mellanox CX3/CX4 is broken on WS2019.

These are the relevant log excerpts showing the failure when the kernel attempts to load the mlx4_core driver:

dmesg:
[ 21.059766] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 21.059775] mlx4_core: Initializing 9488:00:02.0
[ 21.191481] mlx4_core 9488:00:02.0: Detected virtual function - running in slave mode
[ 21.191508] mlx4_core 9488:00:02.0: Sending reset
[ 21.191602] mlx4_core 9488:00:02.0: Sending vhcr0
[ 21.193338] mlx4_core 9488:00:02.0: HCA minimum page size:512
[ 21.193804] mlx4_core 9488:00:02.0: Timestamping is not supported in slave mode
[ 93.148028] mlx4_core 9488:00:02.0: communication channel command 0x5 (op=0x31) timed out
[ 93.148031] mlx4_core 9488:00:02.0: device is going to be reset
[ 93.171917] mlx4_core 9488:00:02.0: VF is sending reset request to Firmware
[ 93.172584] mlx4_core 9488:00:02.0: VF Reset succeed
[ 93.172585] mlx4_core 9488:00:02.0: device was reset successfully
[ 93.195311] mlx4_core 9488:00:02.0: NOP command failed to generate MSI-X interrupt IRQ 24)
[ 93.195312] mlx4_core 9488:00:02.0: Trying again without MSI-X
[ 93.196258] mlx4_core 9488:00:02.0: Failed to close slave function
[ 93.196866] mlx4_core: probe of 9488:00:02.0 failed with error -5
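
The key detail in the log above is the stall between the last initialization message (21.193804) and the command timeout (93.148028), roughly 72 seconds, followed by a probe failure. Below is a small Python sketch (a hypothetical helper, not part of any Ubuntu or Mellanox tooling) that parses these dmesg lines, reports the longest silence between consecutive messages, and decodes the probe error code. It assumes the standard "[ seconds ]" dmesg prefix and treats "-5" as a negated Linux errno value.

```python
import errno
import re

# The dmesg output quoted above, copied verbatim.
DMESG = """\
[ 21.059766] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 21.059775] mlx4_core: Initializing 9488:00:02.0
[ 21.191481] mlx4_core 9488:00:02.0: Detected virtual function - running in slave mode
[ 21.191508] mlx4_core 9488:00:02.0: Sending reset
[ 21.191602] mlx4_core 9488:00:02.0: Sending vhcr0
[ 21.193338] mlx4_core 9488:00:02.0: HCA minimum page size:512
[ 21.193804] mlx4_core 9488:00:02.0: Timestamping is not supported in slave mode
[ 93.148028] mlx4_core 9488:00:02.0: communication channel command 0x5 (op=0x31) timed out
[ 93.148031] mlx4_core 9488:00:02.0: device is going to be reset
[ 93.171917] mlx4_core 9488:00:02.0: VF is sending reset request to Firmware
[ 93.172584] mlx4_core 9488:00:02.0: VF Reset succeed
[ 93.172585] mlx4_core 9488:00:02.0: device was reset successfully
[ 93.195311] mlx4_core 9488:00:02.0: NOP command failed to generate MSI-X interrupt IRQ 24)
[ 93.195312] mlx4_core 9488:00:02.0: Trying again without MSI-X
[ 93.196258] mlx4_core 9488:00:02.0: Failed to close slave function
[ 93.196866] mlx4_core: probe of 9488:00:02.0 failed with error -5
"""

# "[   seconds.micro] message" -> (seconds, message)
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Driver core message for a failed probe, e.g. "probe of X failed with error -5"
PROBE_RE = re.compile(r"probe of (\S+) failed with error (-\d+)")


def parse(log):
    """Return a list of (seconds_since_boot, message) tuples."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def largest_gap(entries):
    """Return (gap_seconds, message_after_gap) for the longest silence."""
    best = (0.0, None)
    for (t0, _), (t1, msg) in zip(entries, entries[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, msg)
    return best


def probe_failures(entries):
    """Yield (pci_address, errno_name) for each failed driver probe."""
    for _, msg in entries:
        m = PROBE_RE.search(msg)
        if m:
            code = -int(m.group(2))  # kernel reports negated errno values
            yield m.group(1), errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    entries = parse(DMESG)
    gap, msg = largest_gap(entries)
    print(f"{gap:.3f} s of silence before: {msg}")
    for addr, name in probe_failures(entries):
        print(f"{addr}: probe failed with {name}")
```

On Linux, errno 5 is EIO, so the final line reads as an I/O error returned after the VF reset and the MSI-X fallback both failed to recover the device.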

----

syslog:

The same sixteen lines as the dmesg output above, each prefixed with "Dec 4 14:35:18 ubuntu kernel:".