After the reboot, /var/log/syslog contains the entries:
[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1) Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1) Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
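A minimal sketch for counting these warnings after each reboot. It assumes Python 3 and read access to /var/log/syslog. The regular expression follows the message text quoted above. Depending on the logger, the "Check the configuration..." sentence may land on its own line, so the pattern matches only the first part. The helper names are hypothetical.

```python
import re
from collections import Counter

# Matches the bonding driver's 802.3ad warning quoted in the report, e.g.
# "[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)".
# re.search lets a syslog prefix (date, host, "kernel:") precede the bracket.
LOOPBACK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bond>\S+): "
    r"An illegal loopback occurred on adapter \((?P<slave>[^)]+)\)"
)


def find_loopback_events(lines):
    """Return (seconds_since_boot, bond, slave) for every matching line."""
    events = []
    for line in lines:
        match = LOOPBACK_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("bond"), match.group("slave"))
            )
    return events


def count_by_slave(events):
    """Count warnings per (bond, slave) pair."""
    return Counter((bond, slave) for _, bond, slave in events)


if __name__ == "__main__":
    # The two entries from the report, plus the trailing hint line, as sample input.
    sample = [
        "[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)",
        "[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)",
        "Check the configuration to verify that all adapters are connected "
        "to 802.3ad compliant switch ports",
    ]
    events = find_loopback_events(sample)
    print(events)
    print(count_by_slave(events))
```

To use it on the node, pass an open file handle for /var/log/syslog to find_loopback_events. Reading that file usually requires root or membership in the adm group.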
Aggregator IDs of the slave interfaces are different:
ubuntu@node-6:~$ cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp24s0f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:51
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp24s0f0np0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:50
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
The mismatched "Aggregator ID" is a symptom of the issue. If we run 'ip link set dev bond2 down' followed by 'ip link set dev bond2 up', the port with the mismatched ID appears to renegotiate with the port-channel and becomes aggregated.
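The bond-level workaround can be scripted. This is a sketch that assumes Python 3 and root privileges. It only wraps the two ip commands quoted in the report. By default it runs in dry-run mode and prints the commands. It executes them only when dry_run=False is passed. The function names are hypothetical.

```python
import subprocess


def bounce_commands(bond):
    """Commands from the report: take the bond down, then bring it back up."""
    return [
        ["ip", "link", "set", "dev", bond, "down"],
        ["ip", "link", "set", "dev", bond, "up"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(bounce_commands("bond2"))  # dry run: prints the commands only
```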
Another workaround is to bring both bond ports down, then bring up enp24s0f0np0 first and enp24s0f1np1 second.
If I reverse the order (enp24s0f1np1 first, then enp24s0f0np0), the issue is still there.
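A similar sketch for the port-ordering workaround, again assuming Python 3 and root privileges. WORKING_ORDER holds the order reported to restore aggregation. The script does not explain why the order matters. The function names are hypothetical, and it runs as a dry run by default.

```python
import subprocess

# Bring-up order reported to restore aggregation on bond2;
# the reverse order (enp24s0f1np1 first) did not help.
WORKING_ORDER = ["enp24s0f0np0", "enp24s0f1np1"]


def ordered_restart_commands(slaves):
    """Take every slave down, then bring the slaves up in the given order."""
    down = [["ip", "link", "set", "dev", s, "down"] for s in slaves]
    up = [["ip", "link", "set", "dev", s, "up"] for s in slaves]
    return down + up


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(ordered_restart_commands(WORKING_ORDER))  # dry run: prints only
```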
When the issue occurs, the switch port corresponding to interface enp24s0f0np0 is in the Suspended state. After applying either workaround, that port is no longer suspended and the Aggregator IDs in /proc/net/bonding/bond2 are equal.
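You can check whether aggregation has recovered, for example after a reboot or a workaround, by reading each slave's Aggregator ID from the bonding status file. This is a minimal sketch. It assumes Python 3 and the field names shown in the /proc/net/bonding/bond2 output above, and the helper names are hypothetical. Any "Aggregator ID:" line that appears before the first "Slave Interface:" line is skipped, because it does not belong to a slave.

```python
def aggregator_ids(status_text):
    """Map each slave to its Aggregator ID in /proc/net/bonding/<bond> text."""
    ids = {}
    current_slave = None
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Slave Interface:"):
            current_slave = line.split(":", 1)[1].strip()
        # Aggregator ID lines seen before any slave section are not per-slave.
        elif line.startswith("Aggregator ID:") and current_slave is not None:
            ids[current_slave] = int(line.split(":", 1)[1])
    return ids


def all_in_one_aggregator(ids):
    """True when at least one slave was found and all share one Aggregator ID."""
    return bool(ids) and len(set(ids.values())) == 1


# Excerpt of the output quoted in the report.
SAMPLE = """\
Slave Interface: enp24s0f1np1
MII Status: up
Aggregator ID: 1
Slave Interface: enp24s0f0np0
MII Status: up
Aggregator ID: 2
"""

if __name__ == "__main__":
    ids = aggregator_ids(SAMPLE)
    print(ids)
    print("aggregated" if all_in_one_aggregator(ids) else "aggregator mismatch")
```

On the node, pass the contents of /proc/net/bonding/bond2 to aggregator_ids. With the output quoted above, it returns {'enp24s0f1np1': 1, 'enp24s0f0np0': 2}, which all_in_one_aggregator reports as a mismatch.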
I installed the 5.0.0 kernel; the issue is still there.
In short, we are losing port-channel aggregation on reboot.
Operating System:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)
ubuntu@node-6:~$ uname -a
Linux node-6 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@node-6:~$ sudo lspci -vnvn
https://pastebin.ubuntu.com/p/Dy2CKDbySC/
Hardware: Dell PowerEdge R740xd
BIOS version: 2.1.7
sosreport: https://drive.google.com/open?id=1-eN7cZJIeu-AQBEU7Gw8a_AJTuq0AOZO
ubuntu@node-6:~$ lspci | grep Ethernet | grep 10G
https://pastebin.ubuntu.com/p/sqCx79vZWM/
ubuntu@node-6:~$ lspci -n | grep 18:00
18:00.0 0200: 14e4:16d8 (rev 01)
18:00.1 0200: 14e4:16d8 (rev 01)
ubuntu@node-6:~$ modinfo bnx2x
https://pastebin.ubuntu.com/p/pkmzsFjK8M/
ubuntu@node-6:~$ ip -o l
https://pastebin.ubuntu.com/p/QpW7TjnT2v/
ubuntu@node-6:~$ ip -o a
https://pastebin.ubuntu.com/p/MczKtrnmDR/
ubuntu@node-6:~$ cat /etc/netplan/98-juju.yaml
https://pastebin.ubuntu.com/p/9cZpPc7C6P/
ubuntu@node-6:~$ sudo lshw -c network
https://pastebin.ubuntu.com/p/gmfgZptzDT/