Comment 0 for bug 1834322

Przemyslaw Hausman (phausman) wrote:

We are losing port channel aggregation on reboot.

After the reboot, /var/log/syslog contains the following entries:
[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
               Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
               Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
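
To check how often this message recurs and on which adapter, the log can be scanned with a small script. This is only a sketch: it assumes the message format shown above, and the names LOOPBACK_RE and count_loopbacks are made up for illustration.

```python
import re
from collections import Counter

# Matches "<bond>: An illegal loopback occurred on adapter (<iface>)".
LOOPBACK_RE = re.compile(
    r"(\S+): An illegal loopback occurred on adapter \(([^)]+)\)"
)


def count_loopbacks(log_text):
    """Count loopback warnings per (bond, adapter) pair."""
    counts = Counter()
    for match in LOOPBACK_RE.finditer(log_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts


# The two entries quoted in this report.
SAMPLE_LOG = """\
[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
"""

for (bond, adapter), n in count_loopbacks(SAMPLE_LOG).items():
    print(bond, adapter, n)
```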

Aggregator IDs of the slave interfaces are different:
ubuntu@node-6:~$ cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp24s0f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:51
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp24s0f0np0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:50
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1

The mismatch in "Aggregator ID" between the two ports is a symptom of the issue. If we run 'ip link set dev bond2 down' followed by 'ip link set dev bond2 up', the port with the mismatched ID appears to renegotiate with the port channel and becomes aggregated.
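
The mismatch can also be detected programmatically after a reboot. The following is a minimal sketch that assumes the /proc/net/bonding layout shown above; aggregator_ids and has_mismatch are hypothetical helper names, and the sample text is a trimmed copy of the output in this report.

```python
import re


def aggregator_ids(bonding_text):
    """Map each slave interface name to its Aggregator ID."""
    ids = {}
    current = None
    for raw in bonding_text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        # Ignore any Aggregator ID line that appears before the first slave.
        if m and current is not None:
            ids[current] = int(m.group(1))
    return ids


def has_mismatch(ids):
    """True when the slaves do not all share one aggregator."""
    return len(set(ids.values())) > 1


# Trimmed from the /proc/net/bonding/bond2 output above.
SAMPLE = """\
Slave Interface: enp24s0f1np1
MII Status: up
Aggregator ID: 1

Slave Interface: enp24s0f0np0
MII Status: up
Aggregator ID: 2
"""

ids = aggregator_ids(SAMPLE)
print(ids, "mismatch" if has_mismatch(ids) else "ok")
```

On a live system the same functions could be fed the contents of /proc/net/bonding/bond2 instead of the sample string.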

Another way to work around this issue is to bring both bond ports down, then bring up enp24s0f0np0 first and enp24s0f1np1 second.

When I bring the ports up in the opposite order (enp24s0f1np1 first, enp24s0f0np0 second), the issue is still there.
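
The ordered workaround can be written down as a command sequence. The sketch below only builds and prints the 'ip link set' commands (a dry run by default); the helper names reorder_commands and run are made up for illustration, and actually executing the commands would require root.

```python
import subprocess


def reorder_commands(first, second):
    """Build commands: take both ports down, then bring them up in order."""
    ports = [first, second]
    cmds = [["ip", "link", "set", "dev", p, "down"] for p in ports]
    cmds += [["ip", "link", "set", "dev", p, "up"] for p in ports]
    return cmds


def run(cmds, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# The order reported to restore aggregation: enp24s0f0np0 first.
run(reorder_commands("enp24s0f0np0", "enp24s0f1np1"))
```

Swapping the two arguments gives the order for which the issue was still observed.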

When the issue occurs, the switch port corresponding to interface enp24s0f0np0 is in the Suspended state. After applying the workaround, the port is no longer Suspended and the Aggregator IDs in /proc/net/bonding/bond2 are equal.

I also installed the 5.0.0 kernel; the issue is still there.

Operating System:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)

ubuntu@node-6:~$ uname -a
Linux node-6 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@node-6:~$ sudo lspci -vnvn
https://pastebin.ubuntu.com/p/Dy2CKDbySC/

Hardware: Dell PowerEdge R740xd
BIOS version: 2.1.7

sosreport: https://drive.google.com/open?id=1-eN7cZJIeu-AQBEU7Gw8a_AJTuq0AOZO

ubuntu@node-6:~$ lspci | grep Ethernet | grep 10G
https://pastebin.ubuntu.com/p/sqCx79vZWM/

ubuntu@node-6:~$ lspci -n | grep 18:00
18:00.0 0200: 14e4:16d8 (rev 01)
18:00.1 0200: 14e4:16d8 (rev 01)

ubuntu@node-6:~$ modinfo bnx2x
https://pastebin.ubuntu.com/p/pkmzsFjK8M/

ubuntu@node-6:~$ ip -o l
https://pastebin.ubuntu.com/p/QpW7TjnT2v/

ubuntu@node-6:~$ ip -o a
https://pastebin.ubuntu.com/p/MczKtrnmDR/

ubuntu@node-6:~$ cat /etc/netplan/98-juju.yaml
https://pastebin.ubuntu.com/p/9cZpPc7C6P/

ubuntu@node-6:~$ sudo lshw -c network
https://pastebin.ubuntu.com/p/gmfgZptzDT/