BNX2X firmware can hang on Trusty (3.13) and Utopic (3.16).

Bug #1454286 reported by Rafael David Tinoco on 2015-05-12
This bug affects 2 people
Affects          Importance   Assigned to
linux (Ubuntu)   Undecided    Dan Streetman
Trusty           Undecided    Dan Streetman

Bug Description

It was brought to my attention that the BNX2X firmware is causing HW hangs on kernels 3.13 and 3.16. Once hung, the HW stays hung until the next power cycle.

Messages like the following are likely to appear in syslog whenever this happens:

"""
Mar 30 13:42:05 host kernel: [3986360.755781] device em3 left promiscuous mode
Mar 30 13:42:05 host kernel: [3986360.755849] br1: port 1(em3) entered disabled state
Mar 30 13:42:05 host kernel: [3986360.756305] IPv6: ADDRCONF(NETDEV_UP): em3: link is not ready
Mar 30 13:42:08 host kernel: [3986363.354376] device em3 entered promiscuous mode
Mar 30 13:42:08 host kernel: [3986363.376093] bnx2x: [bnx2x_open:11791(em3)]Recovery flow hasn't been properly completed yet. Try again later.
Mar 30 13:42:08 host kernel: [3986363.376093] If you still see this message after a few retries then power cycle is required.
Mar 30 13:42:08 host kernel: [3986363.420113] bnx2x: [bnx2x_open:11791(em3)]Recovery flow hasn't been properly completed yet. Try again later.
Mar 30 13:42:08 host kernel: [3986363.420113] If you still see this message after a few retries then power cycle is required.
Mar 30 13:42:08 host kernel: [3986363.443840] IPv6: ADDRCONF(NETDEV_UP): br1: link is not ready
"""
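
To check a host for this symptom, the syslog excerpt above can be scanned for the bnx2x_open recovery message. The following is only a minimal sketch: the function name stuck_interfaces is made up for illustration, and the pattern is modelled on the lines quoted in this report, so the wording may differ on other driver versions.

```python
import re

# Pattern modelled on the bnx2x_open lines quoted in this report.
# Assumption: other driver versions may word the message differently.
RECOVERY_RE = re.compile(
    r"bnx2x: \[bnx2x_open:\d+\((?P<iface>[^)]+)\)\]"
    r"Recovery flow hasn't been properly completed yet"
)


def stuck_interfaces(lines):
    """Map each interface name to the number of recovery-flow messages seen."""
    counts = {}
    for line in lines:
        match = RECOVERY_RE.search(line)
        if match:
            iface = match.group("iface")
            counts[iface] = counts.get(iface, 0) + 1
    return counts
```

The function takes any iterable of log lines, so it can be fed an open log file (for example /var/log/syslog, if that is where kernel messages end up on the host in question).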

WORKAROUND: The user was given the "linux-lts-vivid" kernel and an updated linux-firmware package (https://launchpad.net/~inaddy/+archive/ubuntu/sf00080928/). The issue was fixed after installing the packages and power-cycling the machine. The power cycle is needed because the previous firmware hangs the HW until it is shut down.

My comments:

"""
Actually after reading the Changelog for FW 7.10.51 (specifically the part "Chip may stall in very rare cases under heavy traffic with FW GRO enabled.") I think we might need to upgrade not only the FIRMWARE but also the DRIVER.

commit 626041248d3fb5b2fca5c9af172f00fa3bb6dcfe
Author: Yuval Mintz <email address hidden>
Date: Sun Aug 17 16:47:46 2014 +0300

bnx2x: Update driver version to 1.710.51

commit e42780b66aab88d3a82b6087bcd6095b90eecde7
Author: Dmitry Kravkov <email address hidden>
Date: Sun Aug 17 16:47:43 2014 +0300

bnx2x: Utilize FW 7.10.51

Observations:

inaddy@alien:~/Bugs/customer/sf00080928/sources/trusty/linux-firmware-1.127.11$ find . | grep bnx2x
./bnx2x
./bnx2x/bnx2x-e1h-7.8.19.0.fw
./bnx2x/bnx2x-e1-7.8.17.0.fw
./bnx2x/bnx2x-e1-7.8.19.0.fw
./bnx2x/bnx2x-e2-7.8.17.0.fw
./bnx2x/bnx2x-e1h-7.8.17.0.fw
./bnx2x/bnx2x-e2-7.8.19.0.fw

inaddy@alien:~/Bugs/customer/sf00080928/sources/utopic/linux-firmware-1.138.1$ find . | grep bnx2x
./bnx2x
./bnx2x/bnx2x-e1h-7.8.19.0.fw
./bnx2x/bnx2x-e1-7.8.17.0.fw
./bnx2x/bnx2x-e1-7.8.19.0.fw
./bnx2x/bnx2x-e2-7.10.51.0.fw
./bnx2x/bnx2x-e1h-7.10.51.0.fw
./bnx2x/bnx2x-e2-7.8.17.0.fw
./bnx2x/bnx2x-e1h-7.8.17.0.fw
./bnx2x/bnx2x-e1-7.10.51.0.fw
./bnx2x/bnx2x-e2-7.8.19.0.fw

Firmware 7.10.51 is already present in Utopic but the commit to support it is only in 3.18. It might be feasible to backport it to 3.16 (and use HWE) but not sure about 3.13 (to be resolved if this new firmware/driver fixes the issue).
"""
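
The comparison of the two listings above can be automated. The sketch below assumes the file naming seen in those listings (bnx2x-<chip>-<version>.fw) and reports which chip variants lack a given firmware version. The helper names are hypothetical.

```python
import re

# File naming as seen in the listings: bnx2x-<chip>-<a.b.c.d>.fw
FW_NAME_RE = re.compile(
    r"bnx2x-(?P<chip>e1h|e1|e2)-(?P<version>\d+(?:\.\d+){3})\.fw$"
)


def firmware_by_chip(paths):
    """Group the firmware versions found in `paths` by chip variant."""
    found = {}
    for path in paths:
        match = FW_NAME_RE.search(path)
        if match:
            found.setdefault(match.group("chip"), set()).add(match.group("version"))
    return found


def chips_missing(paths, version, chips=("e1", "e1h", "e2")):
    """Return the chip variants that have no firmware file for `version`."""
    found = firmware_by_chip(paths)
    return [chip for chip in chips if version not in found.get(chip, set())]
```

Feeding it the output of `find . | grep bnx2x` from each source tree gives a quick answer to whether a package already ships 7.10.51.0 for every variant.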

I'll provide more comments right after my backport attempt (of this new version) is made.

Tags: sts
Changed in linux (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
status: New → In Progress

This is a HOTFIX for LP1454286:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1454286

In this PPA I am backporting the bnx2x driver from Vivid to Trusty and Utopic.

-> Please provide feedback regarding bnx2x new driver and firmware for Utopic (and lts-utopic HWE for Trusty).

Thank you

Rafael Tinoco

Observations from PPA:

#### Trusty

Not available yet

#### Utopic (and Trusty lts-utopic HWE kernel)

Kernel UBUNTU: Ubuntu-3.16.0-38.52 + the following upstream patches:

1)

Author: Dmitry Kravkov <email address hidden>
Date: Sun Aug 17 16:47:43 2014 +0300

    bnx2x: Utilize FW 7.10.51

     - (L2) In some multi-function configurations, inter-PF and inter-VF
       Tx switching is incorrectly enabled.

     - (L2) Wrong assert code in FLR final cleanup in case it is sent not
       after FLR.

     - (L2) Chip may stall in very rare cases under heavy traffic with FW GRO
       enabled.

     - (L2) VF malicious notification error fixes.

     - (L2) Default gre tunnel to IPGRE which allows proper RSS for IPGRE packets,
       L2GRE traffic will reach single queue.

     - (FCoE) Fix data being placed in wrong buffer when corrupt FCoE frame is
       received.

     - (FCoE) Burst of FIP packets with destination MAC of ALL-FCF_MACs
       causes FCoE traffic to stop.

    Signed-off-by: Dmitry Kravkov <email address hidden>
    Signed-off-by: Yuval Mintz <email address hidden>
    Signed-off-by: Ariel Elior <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

2)

Author: Yuval Mintz <email address hidden>
Date: Sun Aug 17 16:47:45 2014 +0300

    bnx2x: Code cleanup

    This patch does several semantic things:
      - Fixing typos.
      - Removing unnecessary prints.
      - Removing unused functions and definitions.
      - Change 'strange' usage of boolean variables.

    Signed-off-by: Yuval Mintz <email address hidden>
    Signed-off-by: Ariel Elior <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

    Conflicts:
        drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
        drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c

3)

Author: Yuval Mintz <email address hidden>
Date: Sun Aug 17 16:47:46 2014 +0300

    bnx2x: Update driver version to 1.710.51

    Signed-off-by: Yuval Mintz <email address hidden>
    Signed-off-by: Ariel Elior <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

tags: added: sts

Minor observation:

The linux-firmware package for Trusty (1.127.12) already includes FW 7.10.51, which this backport needs in order to work. It was added when the following bug was solved for Vivid (and also for Utopic and Trusty):

https://launchpad.net/bugs/1378491

So linux-firmware from Trusty needs no further changes.

Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
Dan Streetman (ddstreet) on 2015-12-22
Changed in linux (Ubuntu):
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet) wrote :

I've backported the bnx2x driver from lts-vivid (3.19) to trusty (3.13) and have it included in this ppa:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1454286

Anyone having problems with the bnx2x driver on trusty (3.13), please test with this kernel PPA.

Dan Streetman (ddstreet) wrote :

Also, if anyone is using bnx2x with trusty lts-utopic (3.16 kernel), please let me know, as I'd rather not spend the time backporting to a kernel that's scheduled for EOL later this year, unless there's a need for it.

Changed in linux (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
status: New → In Progress
Dan Streetman (ddstreet) wrote :

Update: after some discussion, I'm going to pull specific commits into the trusty 3.13 bnx2x driver, instead of pulling the entire 3.19 bnx2x driver back. I'll update this bug when I have something new to test.

Dan Streetman (ddstreet) wrote :

For reference, to repeat the commit hashes that Rafael mentioned above (in standard reverse chronological order):

ddstreet@toughbook:~/linux$ git show 6260412 | head -5
commit 626041248d3fb5b2fca5c9af172f00fa3bb6dcfe
Author: Yuval Mintz <email address hidden>
Date: Sun Aug 17 16:47:46 2014 +0300

    bnx2x: Update driver version to 1.710.51
ddstreet@toughbook:~/linux$ git show 0c23ad3 | head -5
commit 0c23ad37a220b6a58b90e36203fe915c80dbd403
Author: Yuval Mintz <email address hidden>
Date: Sun Aug 17 16:47:45 2014 +0300

    bnx2x: Code cleanup
ddstreet@toughbook:~/linux$ git show e42780b | head -5
commit e42780b66aab88d3a82b6087bcd6095b90eecde7
Author: Dmitry Kravkov <email address hidden>
Date: Sun Aug 17 16:47:43 2014 +0300

    bnx2x: Utilize FW 7.10.51

Dan Streetman (ddstreet) wrote :

Note: commit 6260412 is *definitely* not appropriate, as it simply updates the driver version, which is wrong to do without also updating the driver with all the preceding commits.

Dan Streetman (ddstreet) wrote :

Note: commit e42780b actually updates the driver to use the new firmware:

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/b
index 5ba8af5..3b6cbd2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -2876,8 +2876,8 @@ struct afex_stats {
 };

 #define BCM_5710_FW_MAJOR_VERSION 7
-#define BCM_5710_FW_MINOR_VERSION 8
-#define BCM_5710_FW_REVISION_VERSION 19
+#define BCM_5710_FW_MINOR_VERSION 10
+#define BCM_5710_FW_REVISION_VERSION 51
 #define BCM_5710_FW_ENGINEERING_VERSION 0
 #define BCM_5710_FW_COMPILE_FLAGS 1

however, this commit sits on top of 87 other commits:

ddstreet@toughbook:~/linux/drivers/net/ethernet/broadcom/bnx2x$ git log --oneline Ubuntu-3.13.0-78.122..e42780b . | wc -l
87

so, it's not clear yet if any of those 87 commits are also needed to work properly with the 7.10.51 firmware.
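
To make the effect of the version bump concrete: assuming the driver derives the name of the firmware file it requests from these BCM_5710_FW_* macros, following the naming seen in the linux-firmware listings earlier in this report, the change can be sketched as below. The function firmware_file_name is hypothetical and only mirrors that naming. It is not driver code.

```python
def firmware_file_name(chip, major, minor, revision, engineering):
    """Build a firmware path in the bnx2x/bnx2x-<chip>-<a.b.c.d>.fw style.

    Assumption: the driver composes the requested file name from the
    BCM_5710_FW_* macros in this way.
    """
    return f"bnx2x/bnx2x-{chip}-{major}.{minor}.{revision}.{engineering}.fw"


# Before commit e42780b: MINOR 8, REVISION 19
before = firmware_file_name("e2", 7, 8, 19, 0)

# After commit e42780b: MINOR 10, REVISION 51
after = firmware_file_name("e2", 7, 10, 51, 0)
```

Under that assumption, a kernel carrying the commit asks for the 7.10.51.0 files, which is why the backport also depends on a linux-firmware package that ships them.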

Dan Streetman (ddstreet) wrote :

I individually cherry-picked to the trusty kernel each relevant bnx2x driver commit leading up to the commit referenced above. A test PPA is here:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1454286

Please test with this PPA to verify that it fixes the firmware hangs.

Dan Streetman (ddstreet) wrote :

I've tested this backported driver for basic functionality - boots, can connect to the interface, ran iperf tests and several flent tests - but I can't reproduce this problem, so I can't test to verify this backport and updated firmware actually fixes it.

Anyone who can reproduce this problem, please test the kernel ppa and report success/failure. This backport will never make it into the trusty release without verification that it fixes the problem.

Dan Streetman (ddstreet) wrote :

Moving this to Won't Fix as I can't get a response from the original reporter.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Trusty):
status: In Progress → Won't Fix
Dan Streetman (ddstreet) wrote :

> Moving this to Won't Fix as I can't get a response from the original reporter.

By "original reporter" I mean the person who can reproduce this, not Rafael.
