Kernel oops + system freeze on network-bridge shutdown

Bug #1616107 reported by AW
This bug affects 4 people
Affects                  Status        Importance  Assigned to  Milestone
bridge-utils (Ubuntu)    Invalid       Critical    Unassigned
  Xenial                 Invalid       Critical    Unassigned
  Yakkety                Invalid       Critical    Unassigned
linux (Ubuntu)           Fix Released  Critical    Unassigned
  Xenial                 Fix Released  Critical    Tim Gardner

Bug Description

A kernel oops leaves Ubuntu 16.04 unusable when a network bridge is brought down on an HPE 530SFP+ 10 Gbit NIC that uses the bnx2x driver. The error does not occur on Ubuntu 14.04.

The error reproduces every time one of the commands "shutdown", "service networking stop" or "brctl delbr br0" is issued. Manually creating the bridge and subsequently bringing it down results in the same error.

/var/log/kern.log:
[...]
Aug 23 15:09:46 base1 kernel: [ 617.996677] device ens1f0 left promiscuous mode
Aug 23 15:09:46 base1 kernel: [ 617.996699] br0: port 1(ens1f0) entered disabled state
Aug 23 15:09:46 base1 kernel: [ 617.996730] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d2
Aug 23 15:09:46 base1 kernel: [ 618.008306] IP: [<ffffffffc0486d78>] __vlan_flush+0x18/0x60 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.020549] PGD 10374c0067 PUD 1033927067 PMD 0
Aug 23 15:09:46 base1 kernel: [ 618.032773] Oops: 0002 [#1] SMP
Aug 23 15:09:46 base1 kernel: [ 618.044434] Modules linked in: nls_iso8859_1 ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass sb_edac edac_core joydev bridge stp llc input_leds hpilo lpc_ich ioatdma ipmi_si ipmi_msghandler shpchp mac_hid acpi_power_meter ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid1 hid_generic crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd igb usbhid hid bnx2x dca ahci i2c_algo_bit vxlan libahci ip6_udp_tunnel udp_tunnel ptp pps_core mdio libcrc32c wmi fjes
Aug 23 15:09:46 base1 kernel: [ 618.058563] CPU: 3 PID: 4049 Comm: brctl Not tainted 4.4.0-34-generic #53-Ubuntu
Aug 23 15:09:46 base1 kernel: [ 618.058564] Hardware name: HP ProLiant DL120 Gen9/ProLiant DL120 Gen9, BIOS P86 05/05/2016
Aug 23 15:09:46 base1 kernel: [ 618.058574] task: ffff881030676040 ti: ffff8810341e4000 task.ti: ffff8810341e4000
Aug 23 15:09:46 base1 kernel: [ 618.058576] RIP: 0010:[<ffffffffc0486d78>] [<ffffffffc0486d78>] __vlan_flush+0x18/0x60 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.058754] RSP: 0018:ffff8810341e7d68 EFLAGS: 00010206
Aug 23 15:09:46 base1 kernel: [ 618.058769] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Aug 23 15:09:46 base1 kernel: [ 618.058774] RDX: ffff881038470848 RSI: 0000000000000000 RDI: 0000000000000000
Aug 23 15:09:46 base1 kernel: [ 618.058775] RBP: ffff8810341e7d78 R08: 0000000000000000 R09: ffffffff8170d949
Aug 23 15:09:46 base1 kernel: [ 618.058776] R10: ffffea0000d61340 R11: ffff8810329d2c00 R12: 00000000000000c0
Aug 23 15:09:46 base1 kernel: [ 618.058777] R13: ffff881030044000 R14: ffff881038470840 R15: 0000000000000000
Aug 23 15:09:46 base1 kernel: [ 618.058782] FS: 00007f9aebc94700(0000) GS:ffff88107fcc0000(0000) knlGS:0000000000000000
Aug 23 15:09:46 base1 kernel: [ 618.058789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 23 15:09:46 base1 kernel: [ 618.058790] CR2: 00000000000000d2 CR3: 000000102fe83000 CR4: 00000000001406e0
Aug 23 15:09:46 base1 kernel: [ 618.058802] Stack:
Aug 23 15:09:46 base1 kernel: [ 618.058806] 0000000000000000 ffff8810356a4c00 ffff8810341e7d98 ffffffffc0489258
Aug 23 15:09:46 base1 kernel: [ 618.058822] ffff8810356a4c00 ffff881038470840 ffff8810341e7dc0 ffffffffc0479bd8
Aug 23 15:09:46 base1 kernel: [ 618.058825] ffff881038470838 ffff881038470848 ffff881038470000 ffff8810341e7df8
Aug 23 15:09:46 base1 kernel: [ 618.058827] Call Trace:
Aug 23 15:09:46 base1 kernel: [ 618.058863] [<ffffffffc0489258>] nbp_vlan_flush+0x28/0x65 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.058870] [<ffffffffc0479bd8>] del_nbp+0x98/0x130 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.058889] [<ffffffffc0479cb2>] br_dev_delete+0x42/0xb0 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.058895] [<ffffffffc0479dea>] br_del_bridge+0x4a/0x70 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.058911] [<ffffffffc047b4e3>] br_ioctl_deviceless_stub+0x153/0x230 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.058984] [<ffffffff81345533>] ? security_file_alloc+0x33/0x50
Aug 23 15:09:46 base1 kernel: [ 618.059095] [<ffffffff81704625>] sock_ioctl+0x215/0x290
Aug 23 15:09:46 base1 kernel: [ 618.059121] [<ffffffff81220c3f>] do_vfs_ioctl+0x29f/0x490
Aug 23 15:09:46 base1 kernel: [ 618.059223] [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
Aug 23 15:09:46 base1 kernel: [ 618.059264] [<ffffffff8122b525>] ? fd_install+0x25/0x30
Aug 23 15:09:46 base1 kernel: [ 618.059266] [<ffffffff81220ea9>] SyS_ioctl+0x79/0x90
Aug 23 15:09:46 base1 kernel: [ 618.059359] [<ffffffff8182def2>] entry_SYSCALL_64_fastpath+0x16/0x71
Aug 23 15:09:46 base1 kernel: [ 618.059369] Code: 1b c1 c0 e9 30 ff ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 49 89 fc 31 c0 49 81 c4 c0 00 00 00 <66> 89 87 d2 00 00 00 48 8b 87 c0 00 00 00 49 39 c4 48 8b 08 74
Aug 23 15:09:46 base1 kernel: [ 618.059514] RIP [<ffffffffc0486d78>] __vlan_flush+0x18/0x60 [bridge]
Aug 23 15:09:46 base1 kernel: [ 618.059522] RSP <ffff8810341e7d68>
Aug 23 15:09:46 base1 kernel: [ 618.059523] CR2: 00000000000000d2
Aug 23 15:09:46 base1 kernel: [ 618.060149] ---[ end trace f915551f71712a3d ]---
[...]
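
For readers decoding the trace above: the value after "Oops:" is the x86 page-fault error code, and the faulting instruction bytes "<66> 89 87 d2 00 00 00" are a 16-bit store to offset 0xd2 from RDI. RDI is 0 in the register dump, so the store lands at 0x0 + 0xd2, which matches CR2. A minimal sketch of decoding the low error-code bits follows; the helper name decode_oops_code is illustrative and not an existing tool.

```shell
# Decode the low bits of an x86 page-fault error code as printed after "Oops:".
# decode_oops_code is a hypothetical helper name, not part of any package.
decode_oops_code() {
    code=$((0x$1))
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then a="write"; else a="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then m="user mode"; else m="kernel mode"; fi
    echo "$p, $a, $m"
}

decode_oops_code 0002                            # the code from the trace above
printf 'fault address: 0x%x\n' $((0x0 + 0xd2))   # RDI + displacement
```

For the code 0002 this reports a kernel-mode write to a page that is not present, which is what a store through a NULL structure pointer looks like.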

dmesg | grep bnx2x:
[ 3.523820] bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
[ 3.544982] bnx2x 0000:03:00.0: msix capability found
[ 3.549671] bnx2x 0000:03:00.0: part number 0-0-0-0
[ 3.682788] bnx2x 0000:03:00.1: msix capability found
[ 3.685014] bnx2x 0000:03:00.1: part number 0-0-0-0
[ 4.588510] bnx2x 0000:03:00.0 ens1f0: renamed from eth1
[ 4.688509] bnx2x 0000:03:00.1 ens1f1: renamed from eth2
[ 10.104242] bnx2x 0000:03:00.0 ens1f0: failed to initialize vlan filtering on this port
[ 10.713654] bnx2x 0000:03:00.0 ens1f0: using MSI-X IRQs: sp 47 fp[0] 50 ... fp[5] 55
[ 11.946057] bnx2x 0000:03:00.0 ens1f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
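
The "failed to initialize vlan filtering on this port" line in the dmesg output above appears, going by later comments in this report, to accompany the state that leads to the oops. A minimal sketch for listing the interfaces that logged it, assuming the message format shown above (the function name vlan_filter_failures is illustrative):

```shell
# Print the interfaces for which the kernel log reports a failed VLAN
# filtering setup. Reads log text on stdin. The function name is illustrative.
vlan_filter_failures() {
    sed -n 's/.* \([^ :]*\): failed to initialize vlan filtering on this port.*/\1/p' | sort -u
}

# Usage: dmesg | vlan_filter_failures
```

An empty result does not prove a host is unaffected, since the message may already have rotated out of the kernel ring buffer.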

/etc/network/interfaces:
# The loopback network interface
auto lo
iface lo inet loopback

# The primary 10GbE network interface
auto ens1f0
iface ens1f0 inet manual

# The secondary 10GbE network interface
# auto ens1f1
# iface ens1f1 inet manual

# Bridged network interface
auto br0
iface br0 inet static
  address 192.168.xxx.xxx
  netmask 255.255.255.0
  network 192.168.xxx.0
  broadcast 192.168.xxx.255
  gateway 192.168.xxx.xxx
  dns-nameservers 192.168.xxx.xxx 192.168.xxx.xxx
  dns-search xxx
  bridge_ports ens1f0
  bridge_hello 2
  bridge_fd 9
  bridge_maxage 12
  bridge_stp off

lsb_release -rd:
Description: Ubuntu 16.04.1 LTS
Release: 16.04

cat /proc/version_signature:
Ubuntu 4.4.0-34.53-generic 4.4.15

apt-cache policy bridge-utils:
bridge-utils:
  Installed: 1.5-9ubuntu1
  Candidate: 1.5-9ubuntu1
  Version table:
 *** 1.5-9ubuntu1 500
        500 http://de.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status
---
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
Dependencies:
 gcc-6-base 6.0.1-0ubuntu1
 libc6 2.23-0ubuntu3
 libgcc1 1:6.0.1-0ubuntu1
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-08-22 (1 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
Package: linux
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 4.4.0-34.53-generic 4.4.15
Tags: xenial
Uname: Linux 4.4.0-34-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True

AW (jnx)
description: updated
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1616107/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
AW (jnx)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.8 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8-rc3

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key needs-bisect xenial
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1616107

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
AW (jnx) wrote : JournalErrors.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
AW (jnx) wrote : ProcEnviron.txt

apport information

description: updated
AW (jnx)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-fixed-upstream
description: updated
Revision history for this message
AW (jnx) wrote :

Some more information:

On Ubuntu 16.04 LTS:
- The above error occurs in all Ubuntu 16.04 LTS versions since the beta and up to Ubuntu 16.04.1 LTS with kernel 4.4.0-36-generic #55-Ubuntu
- It does NOT occur in Ubuntu 16.04.1 LTS with manually installed kernel "v4.8-rc3 mainline build"
- bridge-utils: 1.5-9ubuntu1

On Ubuntu 14.04 LTS:
- The error does NOT occur on any Ubuntu 14 versions up to Ubuntu 14.04.5 LTS with kernel 4.4.0-36-generic #55~14.04.1-Ubuntu
- bridge-utils: 1.5-6ubuntu2

I tested with a clean install, only addition is "sudo apt install bridge-utils".

I have 8 HPE DL120 Gen9 in slightly different configurations with one HPE 530SFP+ NIC each. Firmware up to date. Ubuntu 14 always works, Ubuntu 16 always gives the error, except with the mainline kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you also see if this bug happens with the latest upstream stable 4.4 kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.19/

If it does still happen with 4.4.19, we can perform a reverse bisect to find the fix that is in 4.8-rc3.

Revision history for this message
AW (jnx) wrote :

So I tested a couple of them:

v4.4.19 mainline build -> Same oops

v4.6.4 mainline build -> No oops
Remark: Bridge is up after boot but without network connection. "sudo service networking restart" fixes that.

v4.6.7 mainline build -> No oops
Remark: Bridge is up after boot but without network connection. "sudo service networking restart" fixes that.

v4.7.2 mainline build -> No oops
Remark: Console freezes during boot, access only by ssh.

v4.8-rc3 mainline build -> No oops
Remark: Console freezes during boot, access only by ssh.

Revision history for this message
Lolo (8-l) wrote :

4.4.0-38.57 doesn't fix the issue
4.4.20-040420-generic doesn't fix it either
4.7.* are fine but there's a bug with xenbus that renders 4.7.* unusable with xen (https://patchwork.kernel.org/patch/9281193/)
4.8rcs are ok

Revision history for this message
Lolo (8-l) wrote :

and 4.4.20-040420-generic now includes the xenbus bug...

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Per comment #10, the bug was fixed somewhere between upstream 4.4 and 4.6. To narrow it down further so we can perform a reverse bisect, can you test the 4.5 final kernel:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/

If 4.5 is good, maybe test 4.5-rc1:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-rc1-wily/

To perform the reverse bisect, we need to identify the last bad kernel version and the first good one.

Revision history for this message
AW (jnx) wrote :

More test results:

v4.5 mainline build -> No oops, no problems

v4.5-rc1 mainline build -> No oops, no problems

v4.4.21 mainline build -> Same oops

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It looks like we can perform a reverse bisect between 4.4 final and 4.5-rc1. Can you give 4.4 final a test, just to confirm it exhibits the bug and that the bug wasn't introduced in one of the stable updates in 4.4? 4.4 final can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily/

Revision history for this message
AW (jnx) wrote :

4.4 final shows the same oops.

Changed in linux (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
no longer affects: linux (Ubuntu Yakkety)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I started a reverse kernel bisect between v4.4 final and v4.5-rc1. The bisect will require testing about 7-10 test kernels.

I built the first test kernel, up to the following commit:
1289ace5b4f70f1e68ce785735b82c7e483de863

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1616107

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
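
For anyone reproducing the reverse bisect locally, git can name the two bisect states so that the older revision is the one with the bug. The following is a sketch under assumptions: a kernel git tree with the upstream tags is available, git is 2.7 or later, and start_reverse_bisect is an illustrative helper name.

```shell
# Start a reverse bisect: the older revision shows the bug ("broken"),
# the newer one does not ("fixed"). Custom terms need git 2.7 or later.
# start_reverse_bisect is an illustrative helper, not an existing command.
start_reverse_bisect() {
    repo=$1 last_broken=$2 first_fixed=$3
    git -C "$repo" bisect start --term-old=broken --term-new=fixed &&
    git -C "$repo" bisect broken "$last_broken" &&
    git -C "$repo" bisect fixed "$first_fixed"
}

# Example (assumes a kernel tree in ./linux):
#   start_reverse_bisect ./linux v4.4 v4.5-rc1
# After building and testing each kernel that git checks out, report:
#   git -C ./linux bisect broken    # the oops still occurs
#   git -C ./linux bisect fixed     # the oops is gone
```

When the bisect finishes, git prints the first "fixed" commit, which is the candidate to backport.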

Revision history for this message
AW (jnx) wrote :

The #17 kernel has no oops. Network and bridge are brought up after boot, however there is no network connectivity on the bridge. "sudo service networking restart" fixes that and after that everything works.

Revision history for this message
AW (jnx) wrote :

I'll have the servers for another ~2 weeks for testing. After that they're going off to the customer with Ubuntu 14. Please tell me now if you need more testing done.

Cheers

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in bridge-utils (Ubuntu Xenial):
status: New → Confirmed
Changed in bridge-utils (Ubuntu):
status: New → Confirmed
Revision history for this message
Thomas Niedermeier (tniedermeier) wrote :

This bug affects me too. I'm working on a Tyan Power8 machine (PPC64el) with Ubuntu 16.04.1 and 10GB network cards using the bnx2x driver. A configured network bridge on my machine results in a kernel oops if I trigger a shutdown or reboot, see attachment.
I installed the 4.8.0-040800-generic kernel and now the shutdown or reboot works fine, so I would recommend using the LTS Enablement Stack (linux-generic-lts-yakkety) when it's available.

Changed in bridge-utils (Ubuntu Yakkety):
importance: Undecided → High
Changed in bridge-utils (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: High → Critical
Changed in bridge-utils (Ubuntu Yakkety):
importance: High → Critical
Changed in linux (Ubuntu):
importance: High → Critical
Changed in bridge-utils (Ubuntu Xenial):
importance: High → Critical
Revision history for this message
jwiegley (jeffw) wrote :

6 weeks later what's the fix for this? I think we can safely assume that this bug affects everyone using a bridged bnx2x device and 16.04.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

We started a reverse bisect in comment #17. However, it appears the hardware is no longer available.

As mentioned in comment #22, this is fixed in Yakkety. You could use that kernel, but if you can reproduce the bug, we can continue the reverse bisect. That would allow us to identify the commit that fixes the bug and have it SRU'd to Xenial.

Revision history for this message
AW (jnx) wrote :

Unfortunately, I have to confirm that I no longer have the hardware to test it on.

Revision history for this message
Breno Leitão (breno-leitao) wrote :

Kernel 4.8 just made it into the 16.04 proposed repository. Does this happen on Ubuntu 16.04 with kernel 4.8?

Revision history for this message
bugproxy (bugproxy) wrote : sosreport from the iaos1 host.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-148746 severity-high targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-16 13:27 EDT-------
Reversed mirrored as we can still replicate this with 4.4.0-47. I'll give the proposed 4.8 kernel a try later today.

Regardless of whether the bnx2x vlan filter init error is resolved, it seems that the nbp_vlan_flush() routine needs to be hardened a bit to handle a NULL vlan group (simply posting a warning and returning seems like a better response than panicking, since there isn't a group to flush).

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-16 13:36 EDT-------
cde00 (<email address hidden>) added native attachment /tmp/AIXOS06682471/JournalErrors.txt on 2016-11-16 12:33:12
cde00 (<email address hidden>) added native attachment /tmp/AIXOS06682471/shutdown-problem-power8 on 2016-11-16 12:33:12
cde00 (<email address hidden>) added native attachment /tmp/AIXOS06682471/ProcEnviron.txt on 2016-11-16 12:33:12

Revision history for this message
bugproxy (bugproxy) wrote : shutdown-problem-power8

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-16 16:02 EDT-------
Tested with 4.8.0-27 from proposed. With this kernel, the vlan filtering init appears to have been successful, so br->vlgrp was not NULL on shutdown.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-17 19:00 EDT-------
I can reproduce the problem using the following steps:

brctl addbr boom
brctl stp boom off
brctl addif boom enP2p1s0f3
brctl delif boom enP2p1s0f3

enP2p1s0f3 is a port on a bnx2x device.

The Oops will occur when "brctl delif" is run.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-18 20:02 EDT-------
I traced the problem to __vlan_vid_add(), specifically this code snippet:

	/* Try switchdev op first. In case it is not supported, fallback to
	 * 8021q add.
	 */
	err = switchdev_port_obj_add(dev, &v.obj);
	if (err == -EOPNOTSUPP)
		return vlan_vid_add(dev, br->vlan_proto, vid);
	return err;

switchdev_port_obj_add() returns -EOPNOTSUPP, so it calls vlan_vid_add(). That function is in the 8021q module, which was not loaded on my system. If I set the interface to be added to the bridge UP and then load 8021q before creating the bridge, the problem does not happen. Using the following steps to create the bridge prevents the problem:

rmmod bridge
rmmod 8021q
ifconfig enP2p1s0f3 up
modprobe 8021q
brctl addbr boom
brctl stp boom off
brctl addif boom enP2p1s0f3
brctl delif boom enP2p1s0f3
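
The two preconditions in the steps above (port administratively up, 8021q loaded) can be checked before running "brctl addif". This is a sketch, not a verified fix: the helper names and the SYSFS_NET and PROC_MODULES variables are illustrative, and they default to the real kernel paths so the helpers can also be pointed at sample files.

```shell
# Pre-flight checks before "brctl addif". SYSFS_NET and PROC_MODULES are
# illustrative variables; they default to the real kernel interfaces.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}
PROC_MODULES=${PROC_MODULES:-/proc/modules}

iface_is_up() {
    # /sys/class/net/<iface>/flags holds the interface flags in hex;
    # bit 0 is IFF_UP (administratively up).
    flags=$(cat "$SYSFS_NET/$1/flags") || return 2
    [ $((${flags} & 1)) -ne 0 ]
}

module_loaded() {
    # Each line of /proc/modules starts with the module name. A driver
    # built into the kernel image would not be listed here.
    grep -q "^$1 " "$PROC_MODULES"
}

# Usage:
#   iface_is_up enP2p1s0f3 || echo "port is down, bring it up first"
#   module_loaded 8021q    || echo "8021q is not loaded"
```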

I made a few attempts to alter the interfaces script to create a workaround, without much luck. However, we can prevent a panic by not shutting down the interface at system shutdown time. Using the no-auto-down directive in /etc/network/interfaces worked for me. My bridge definition is:

no-auto-down boom
iface boom inet static
address 192.168.10.1
netmask 255.255.0.0
bridge_ports enP2p1s0f3
bridge_stp off

Revision history for this message
Adam Boyhan (driver86) wrote :

I can confirm that I have this issue on an Ubuntu-based OS (Proxmox). I have the hardware; is there anything I can do to help resolve this issue?

Setting up /etc/network/interfaces with the "no-auto-down" option doesn't help; it prevents the interface from coming up altogether.

Revision history for this message
Robie Basak (racb) wrote :

Based on the investigation so far, it seems to me that this is entirely a kernel bug, and there is nothing required in the bridge-utils package in order to fix this issue. So I'm marking this bug as Invalid for bridge-utils. The kernel tasks remain open. If this is incorrect, please explain why you expect to need a fix in bridge-utils and reopen that task.

Changed in bridge-utils (Ubuntu):
status: Confirmed → Invalid
Changed in bridge-utils (Ubuntu Xenial):
status: Confirmed → Invalid
Changed in bridge-utils (Ubuntu Yakkety):
status: Confirmed → Invalid
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-02 12:48 EDT-------
I found the root of the problem in the bnx2x driver. If the interface is down when it is slaved to the bridge, the driver returns -EFAULT when attempting to add a vid.

static int bnx2x_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
{
	struct bnx2x *bp = netdev_priv(dev);
	struct bnx2x_vlan_entry *vlan;
	bool hw = false;
	int rc = 0;

	if (!netif_running(bp->dev)) {
		DP(NETIF_MSG_IFUP,
		   "Ignoring VLAN configuration the interface is down\n");
		return -EFAULT;
	}
	......

This has been corrected in the commit: a02cc9d3cc9f98905df214d4a57e5918473260ea

From a02cc9d3cc9f98905df214d4a57e5918473260ea Mon Sep 17 00:00:00 2001
From: Michal Schmidt <email address hidden>
Date: Fri, 3 Jun 2016 15:32:18 +0200
Subject: [PATCH] bnx2x: allow adding VLANs while interface is down

Since implementing VLAN filtering in commit 05cc5a39ddb74
("bnx2x: add vlan filtering offload") bnx2x refuses to add a VLAN while
the interface is down:

# ip link add link enp3s0f0 enp3s0f0_10 type vlan id 10
RTNETLINK answers: Bad address

and in dmesg (with bnx2x.debug=0x20):
bnx2x: [bnx2x_vlan_rx_add_vid:12941(enp3s0f0)]Ignoring VLAN
configuration the interface is down

Other drivers have no problem with this.
Fix this peculiar behavior in the following way:
- Accept requests to add/kill VID regardless of the device state.
Maintain the requested list of VIDs in the bp->vlan_reg list.
- If the device is up, try to configure the VID list into the hardware.
If we run out of VLAN credits or encounter a failure configuring an
entry, fall back to accepting all VLANs.
If we successfully configure all entries from the list, turn the
fallback off.
- Use the same code for reconfiguring VLANs during NIC load.

Signed-off-by: Michal Schmidt <email address hidden>
Acked-by: Yuval Mintz <email address hidden>
Signed-off-by: David S. Miller <email address hidden>

-------------------------------------------------------------------------------------------------------------------------
I applied the patch to a 4.4.24 kernel, and it corrected the issue.

As a workaround, simply bring the interface up (ifconfig <interface> up) before adding it to the bridge.

Revision history for this message
bugproxy (bugproxy) wrote : bnx2x-allow-adding-VLANs-while-interface-is-down

------- Comment on attachment From <email address hidden> 2016-12-02 12:50 EDT-------

Upstream patch

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-12-02 18:01 EDT-------
Hi Canonical

This issue with the bnx2x driver exists in all 4.4 kernels, including the current stable (4.4.36). If 16.04 will continue to support a 4.4 kernel, we need the following patch included in a kernel build. I am working on ppc64le; however, others have reported the issue on x86-64.

(The full patch is attached to this bugzilla)
From a02cc9d3cc9f98905df214d4a57e5918473260ea Mon Sep 17 00:00:00 2001
From: Michal Schmidt <email address hidden>
Date: Fri, 3 Jun 2016 15:32:18 +0200
Subject: [PATCH] bnx2x: allow adding VLANs while interface is down

Here are the steps to verify the fix:

# brctl addbr boom
# brctl stp boom off
# brctl addif boom <interface>
# brctl delif boom <interface>

<interface> is any network interface using the bnx2x network driver.

The oops happens after brctl delif is run unless the patch is included.

Note: the problem can be avoided if <interface> is "up" before brctl addif is run. This worked when I ran the commands by hand as above. I attempted to modify the /etc/network/interfaces to change the state of the interface before creating the bridge but was unsuccessful. Maybe someone else will have better luck.
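
The manual workaround described in the note above can be wrapped so the port is always brought up before it is added to the bridge. This is a sketch for hand-run commands only; it does not address the /etc/network/interfaces case, which the comment above reports as unsuccessful. safe_addif and the RUN dry-run hook are illustrative names.

```shell
# Bring a port up before enslaving it to a bridge.
# RUN is a hypothetical dry-run hook: leave it empty to execute the commands
# (needs root), or set RUN=echo to only print them.
RUN=${RUN:-}

safe_addif() {
    bridge=$1 port=$2
    $RUN ip link set dev "$port" up &&
    $RUN brctl addif "$bridge" "$port"
}

# Usage:
#   RUN=echo; safe_addif boom enP2p1s0f3     # print the commands only
#   RUN=;     safe_addif boom enP2p1s0f3     # run them
```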

Revision history for this message
Chris Hofstaedtler (zeha) wrote :

Given 4.4 is maintained by gregk-h, I'd suggest you just submit that patch to stable@k.o...

Revision history for this message
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Luis Henriques (henrix)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Luis Henriques (henrix)
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-06 17:50 EDT-------
Sorry about the long delay, but our bridge missed the last two LP status updates. Validated with 4.4.0-72 and closing on our end. Thanks.

tags: added: targetmilestone-inin16042
removed: targetmilestone-inin---