bnx2x driver crash

Bug #1643558 reported by zibort
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone:

Bug Description

On a server with two Broadcom NICs, the bnx2x driver crashes on boot. The full crash dump is attached.

# lspci | grep BCM57711
15:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe
15:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-47-generic 4.4.0-47.68
ProcVersionSignature: Ubuntu 4.4.0-47.68-generic 4.4.24
Uname: Linux 4.4.0-47-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 21 13:50 seq
 crw-rw---- 1 root audio 116, 33 Nov 21 13:50 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Nov 21 16:21:07 2016
InstallationDate: Installed on 2001-01-01 (5803 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: IBM BladeCenter HS22 -[7870H4G]-
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=ru_RU.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-47-generic.efi.signed root=UUID=8fae9581-bd38-4063-8871-cda77ac6c4ec ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-47-generic N/A
 linux-backports-modules-4.4.0-47-generic N/A
 linux-firmware 1.157.5
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/03/2012
dmi.bios.vendor: IBM
dmi.bios.version: -[P9E156CUS-1.17]-
dmi.board.asset.tag: (none)
dmi.board.name: 68Y8071
dmi.board.vendor: IBM
dmi.board.version: (none)
dmi.chassis.asset.tag: none
dmi.chassis.type: 17
dmi.chassis.vendor: IBM
dmi.chassis.version: none
dmi.modalias: dmi:bvnIBM:bvr-[P9E156CUS-1.17]-:bd02/03/2012:svnIBM:pnBladeCenterHS22-[7870H4G]-:pvr07:rvnIBM:rn68Y8071:rvr(none):cvnIBM:ct17:cvrnone:
dmi.product.name: BladeCenter HS22 -[7870H4G]-
dmi.product.version: 07
dmi.sys.vendor: IBM

zibort (zibort) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc6

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
zibort (zibort) wrote :

I don't remember the exact prior kernel versions.
The 4.4.0-21-generic, 4.4.0-45-generic, and 4.4.0-47-generic kernels are installed now, and the problem happens on all of them.

The 4.9.0-040900rc6-generic kernel doesn't fix the problem (see attachment).

zibort (zibort)
tags: added: kernel-bug-exists-upstream
zibort (zibort) wrote :

Hi,

there are some interesting details about this bug that may help with a fix. On a fresh Ubuntu 16.04 installation the bug does not occur at all.

The bnx2x driver crash occurs on VXLAN operations, for example:
# ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev enp21s0f1 dstport 4789
# ip li set vxlan0 up

or when using Open vSwitch with its VXLAN feature:
# ovs-vsctl add-port br-test vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip="172.29.3.102"
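
The first pair of commands can be wrapped in a small script for repeated testing. This is only a sketch: the `run` and `repro_vxlan` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and by default the script only prints what it would execute, because running the real commands on an affected machine may crash it.

```shell
#!/bin/sh
# Sketch only: wraps the VXLAN commands quoted in this comment.
# DRY_RUN defaults to 1, so nothing touches the network unless you opt in.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# repro_vxlan DEV: create a VXLAN link on DEV and bring it up.
# Needs root when DRY_RUN=0. The id, group and port match the report.
repro_vxlan() {
    dev="$1"
    run ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev "$dev" dstport 4789
    run ip link set vxlan0 up
}
```

Calling `repro_vxlan enp21s0f1` prints the two commands; setting `DRY_RUN=0` before sourcing the script runs them for real.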

Hope it's helpful.

Thx

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Seth Arnold (seth-arnold) wrote :

Kernel team, is there anything missing?

Thanks

Joseph Salisbury (jsalisbury) wrote :

The v4.9-rc8 upstream kernel is now available. It might be worthwhile to test it:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc8/

If the bug still exists in -rc8, would it be possible for you to open an upstream bug report[0]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

anna (anna.sgu) wrote :

The problem is not fixed in kernel 4.9.0-040900rc8-generic #201612051443 SMP Mon Dec 5 19:45:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux.

The problem is in the driver itself: if bnx2x is recompiled without udp_tunnel (kernel 4.9), everything works fine.

anna (anna.sgu) wrote :

The bug is fixed in the mainline kernel.
With kernel 4.9.0-040900-generic #201612111631 SMP Sun Dec 11 21:33:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux, bnx2x doesn't crash when VXLANs are enabled.

Would it be possible to backport the patch to the 4.4.x kernels?

tags: added: kernel-fixed-upstream
removed: kernel-bug-exists-upstream
zibort (zibort) wrote :

I confirm that with kernel 4.9.0-040900-generic #201612111631 this bug is fixed.

Ondrej Vasko (ondrej.vasko) wrote :

I also had this issue with the following combination while deploying OpenStack with VXLAN:

Ubuntu 16.04 4.4.0-79-generic
HP BL460c G7
HP VirtualConnect
HP NC 532m mezzanine (BCM57711)

After updating to 4.10.17 (I didn't try 4.9) it worked correctly.

Will the fix be backported to the Ubuntu 16.04 LTS kernel?

admgsic (j-3dmin-q) wrote :

Good afternoon. I do not know whether this is related to the same problem.

I have a network card with two 10 Gb ports.

05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
05:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

They use the bnx2x driver.

I also have an integrated network card with four ports, but I have not had a problem with it.
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
02:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
02:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

They use the tg3 driver.

The machine is an HP ProLiant DL360 Gen9.

I use it as a cloud driver in OpenStack, and at the same time it provides access to the cloud VXLAN networks.

I had never had problems until 21 days ago, when I lost connectivity on one of the 10 Gb ports. I restarted the server and it returned to normal.

Other data:

Ubuntu Server 16.04
Kernel: 4.4.0-104-generic
OpenStack: Pike release

The port that failed: ens2f0
driver: bnx2x
version: 1.712.30-0
firmware-version: bc 7.13.23 phy 1.34
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

I also use the other port of the 10 Gb card, ens2f1, but it has not given me any problems; it carries less traffic than ens2f0.

KERNEL LOGS:

May 2 02:50:22 hermes kernel: [7904456.095764] bnx2x: [bnx2x_stats_update:1232(ens2f0)]storm stats were not updated for 3 times
May 2 02:50:22 hermes kernel: [7904456.095772] bnx2x: [bnx2x_stats_update:1233(ens2f0)]driver assert
May 2 02:50:22 hermes kernel: [7904456.095776] bnx2x: [bnx2x_panic_dump:919(ens2f0)]begin crash dump -----------------
May 2 02:50:22 hermes kernel: [7904456.095780] bnx2x: [bnx2x_panic_dump:929(ens2f0)]def_idx(0x928c) def_att_idx(0x66d8) attn_state(0x0) spq_prod_idx(0xa5) next_stats_cnt(0x927a)
May 2 02:50:22 hermes kernel: [7904456.095783] bnx2x: [bnx2x_panic_dump:934(ens2f0)]DSB: attn bits(0x0) ack(0x10) id(0x0) idx(0x66d8)
May 2 02:50:22 hermes kernel: [7904456.095785] bnx2x: [bnx2x_panic_dump:935(ens2f0)] def (0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xb97 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0) igu_sb_id(0x0) igu_seg_id(0x1) pf_id(0x0) vnic_id(0x0) vf_id(0xff) vf_valid (0x0) state(0x1)
May 2 02:50:22 hermes kernel: [7904456.095807] bnx2x: [bnx2x_panic_dump:986(ens2f0)]fp0: rx_bd_prod(0x4278) rx_bd_cons(0xb1) rx_comp_prod(0xb72b) rx_comp_cons(0xb55f) *rx_cons_sb(0xb55f)
May 2 02:50:22 hermes kernel: [7904456.095809] bnx2x: [bnx2x_panic_dump:989(ens2f0)] rx_sge_prod(0x2b00) last_max_sge(0x273b) fp_hc_idx(0x7dbb)
May 2 02:50:22 hermes kernel: [7904456.095813] bnx2x: [bnx2x_panic_dump:1006(ens2f0)]fp0: tx_pkt_prod(0x79cb) tx_pkt_cons(0x79cb) tx_bd_prod(0xe2cc) tx_bd_cons(0xe2cb) *tx_cons_sb(0x79cb)
May 2 02:50:22 hermes kernel: [7904456.095816] bnx2x: ...
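
To check whether another machine has hit the same driver assert, the kernel log can be searched for the two markers visible above ("driver assert" and "begin crash dump"). The following is a minimal sketch: the function name `bnx2x_panic_count` is hypothetical, and the pattern is derived only from the log lines quoted in this comment, so other driver versions may word these messages differently.

```shell
#!/bin/sh
# Sketch only: count bnx2x assert / crash-dump markers in kernel log
# text read from stdin. The pattern is based on the lines quoted above.
bnx2x_panic_count() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches (grep returns 1 then).
    grep -c -E 'bnx2x: \[bnx2x_(stats_update|panic_dump):[0-9]+\([^)]*\)\](driver assert|begin crash dump)' || true
}
```

Typical use would be `dmesg | bnx2x_panic_count`, or feeding it a saved log file on stdin; a non-zero result means at least one marker was found.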

Jeffrey Zhang (jeffrey4l) wrote :

I have the same issue.

My environment is:

CentOS 7.5 + BCM57800

I am using OpenShift + VXLAN. When a VXLAN packet is sent to the node, the node restarts automatically.

I also tested the latest kernel (4.18.12), which still has the same issue.

I also tried disabling tso/gso/gro/tx-udp_tnl-segmentation/rx-udp_tunnel-port-offload/tx-udp_csum-segmentation, but that did not help either.
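
For anyone who wants to repeat that attempt, here is a sketch of how the listed offloads could be switched off with `ethtool -K`. It is not a fix (it made no difference in this case). The `disable_offloads` function, the `FEATURES` list and the `DRY_RUN` variable are hypothetical names introduced here; the feature names are copied verbatim from this comment and should be checked against the output of `ethtool -k` on the interface, since names differ between kernel versions.

```shell
#!/bin/sh
# Sketch only: print (or run) one "ethtool -K" call per offload feature.
# Feature names are copied verbatim from the comment above; verify them
# with "ethtool -k IFACE" before relying on this list.
FEATURES="tso gso gro tx-udp_tnl-segmentation rx-udp_tunnel-port-offload tx-udp_csum-segmentation"

# disable_offloads IFACE: dry-run by default; set DRY_RUN=0 to apply (needs root).
disable_offloads() {
    iface="$1"
    for feat in $FEATURES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ ethtool -K $iface $feat off"
        else
            # Report features the driver or ethtool does not accept, then continue.
            ethtool -K "$iface" "$feat" off || echo "could not change $feat" >&2
        fi
    done
}
```

Running `disable_offloads ens2f0` prints the six commands; with `DRY_RUN=0` they are executed one by one, and any feature that is rejected is reported on stderr without stopping the loop.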

Brad Figg (brad-figg)
tags: added: cscc