Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Unassigned |
Bionic | Fix Released | High | Unassigned |
Disco | Fix Released | High | Unassigned |
Eoan | Fix Released | High | Unassigned |
Focal | Fix Released | High | Unassigned |
Bug Description
We are losing port channel aggregation on reboot.
After the reboot, /var/log/syslog contains the entries:
[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
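To check whether a given boot hit this condition, the kernel log can be filtered for the bonding driver's loopback warning. A minimal Python sketch, assuming the messages have the format shown above (syslog may add a date/host/`kernel:` prefix, which the search tolerates); the function name `find_loopback_events` is hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# [ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
LOOPBACK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bond>\S+): "
    r"An illegal loopback occurred on adapter \((?P<iface>[^)]+)\)"
)


def find_loopback_events(lines: Iterable[str]) -> List[Tuple[float, str, str]]:
    """Return (seconds since boot, bond, adapter) for each illegal-loopback warning."""
    events = []
    for line in lines:
        m = LOOPBACK_RE.search(line)  # search, so a syslog prefix is allowed
        if m:
            events.append((float(m.group("ts")), m.group("bond"), m.group("iface")))
    return events


if __name__ == "__main__":
    sample = [
        "[ 250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)",
        "[ 282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)",
    ]
    for ts, bond, iface in find_loopback_events(sample):
        print(f"{ts:.6f}s {bond} {iface}")
```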
Aggregator IDs of the slave interfaces are different:
ubuntu@node-6:~$ cat /proc/net/
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp24s0f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:51
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp24s0f0np0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:50
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
The mismatched "Aggregator ID" on one of the ports is a symptom of the issue. If we run 'ip link set dev bond2 down' followed by 'ip link set dev bond2 up', the port with the mismatched ID appears to renegotiate with the port-channel and becomes aggregated.
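Because the visible symptom is a mismatched Aggregator ID, a post-boot check can compare the IDs across slaves. A minimal Python sketch, assuming the bonding status text has the layout shown above; the function names and the script name in the usage comment are hypothetical:

```python
import sys
from typing import Dict, Optional


def slave_aggregator_ids(status_text: str) -> Dict[str, int]:
    """Map each slave interface to the 'Aggregator ID' listed in its section."""
    ids: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in status_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # headings such as "802.3ad info" have no colon
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Aggregator ID" and current is not None:
            # IDs seen before the first slave section (e.g. an
            # "Active Aggregator Info" block) are skipped.
            ids[current] = int(value)
    return ids


def is_split(ids: Dict[str, int]) -> bool:
    """True when the slaves are not all attached to the same aggregator."""
    return len(set(ids.values())) > 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python3 check_aggregators.py <status file>
    with open(sys.argv[1]) as fh:
        ids = slave_aggregator_ids(fh.read())
    for slave, agg in sorted(ids.items()):
        print(f"{slave}: Aggregator ID {agg}")
    if is_split(ids):
        print("slaves are split across aggregators")
        sys.exit(1)
```

Fed the output above, it would report enp24s0f1np1 on aggregator 1 and enp24s0f0np0 on aggregator 2, and exit non-zero.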
Another way to work around this issue is to bring the bond ports down, then bring up enp24s0f0np0 first and enp24s0f1np1 second.
If I reverse the order (enp24s0f1np1 first, then enp24s0f0np0), the issue persists.
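The ordered workaround can be scripted so the sequence is repeatable. This is a sketch only: the helper names are hypothetical, it defaults to a dry run that prints the `ip link` commands, and the report does not say whether a delay between steps is needed, so none is inserted:

```python
import subprocess
from typing import List, Sequence


def reorder_commands(up_order: Sequence[str]) -> List[List[str]]:
    """Take every port down, then bring the ports up in up_order."""
    down = [["ip", "link", "set", "dev", port, "down"] for port in up_order]
    up = [["ip", "link", "set", "dev", port, "up"] for port in up_order]
    return down + up


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Needs root; raises CalledProcessError if ip(8) fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Order reported to work: enp24s0f0np0 first, then enp24s0f1np1.
    run(reorder_commands(["enp24s0f0np0", "enp24s0f1np1"]))
```

Run as-is, it only prints the four commands; passing `dry_run=False` executes them.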
When the issue occurs, the switch port corresponding to interface enp24s0f0np0 is in the Suspended state. After applying the workaround, the port is no longer suspended and Aggregator IDs in /proc/net/
I installed the 5.0.0 kernel and the issue is still there.
Operating System:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)
ubuntu@node-6:~$ uname -a
Linux node-6 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@node-6:~$ sudo lspci -vnvn
https:/
Hardware: Dell PowerEdge R740xd
BIOS version: 2.1.7
sosreport: https:/
ubuntu@node-6:~$ lspci | grep Ethernet | grep 10G
https:/
ubuntu@node-6:~$ lspci -n | grep 18:00
18:00.0 0200: 14e4:16d8 (rev 01)
18:00.1 0200: 14e4:16d8 (rev 01)
ubuntu@node-6:~$ modinfo bnx2x
https:/
ubuntu@node-6:~$ ip -o l
https:/
ubuntu@node-6:~$ ip -o a
https:/
ubuntu@node-6:~$ cat /etc/netplan/
https:/
ubuntu@node-6:~$ sudo lshw -c network
https:/
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jun 26 10:21 seq
crw-rw---- 1 root audio 116, 33 Jun 26 10:21 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 1604:10c0 Tascam
Bus 001 Device 003: ID 1604:10c0 Tascam
Bus 001 Device 002: ID 1604:10c0 Tascam
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R740xd
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.173.6
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic uec-images
Uname: Linux 4.15.0-52-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy libvirt netdev plugdev sudo video
_MarkForUpload: False
dmi.bios.date: 04/03/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.1.7
dmi.board.name: 0JMK61
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.family: PowerEdge
dmi.product.name: PowerEdge R740xd
dmi.sys.vendor: Dell Inc.
tags: added: ubuntu-certified
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Released
Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Subscribed ~field-critical.
The issue is critically impairing networking to the instance during the ongoing customer deployment.