BCM5719/tg3 loses connectivity due to missing heartbeats between fw and driver

Bug #1751337 reported by bugproxy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Fix Released
High
Canonical Kernel Team
linux (Ubuntu)
Fix Released
High
Canonical Kernel Team

Bug Description

User reported several nodes lost connectivity in several situations, for instance during the netboot, in which a flood of arp traffic happens due to multiple simultaneous boot across the cluster.

No stack trace or message is seen, the device just stop receiving packets.

In our attempts to reproduce the issue BCM5719 lost connectivity, always only under a heavy arp storm, in the follow situations:

- changing MTU
- interface configuration (with ifconfig or ip tool)
- netboot

In order to fix the issue we need to include the upstream patches:

1 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=748a240c5
which reads: "tg3: Fix rx hang on MTU change with 5717/5719 "

2 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=506b0a395
which reads: "tg3: APE heartbeat changes"

Considering that 18.04 is planned to use linux 4.15 we will need to backport only the second patch.

I'll submit it to the ml and post here a reference.

CVE References

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-165090 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
tags: added: triage-g
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-02-26 07:39 EDT-------
FYI I've submitted the patch to the mail list: https://lists.ubuntu.com/archives/kernel-team/2018-February/090372.html

Manoj Iyer (manjo)
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: New → Fix Committed
Changed in ubuntu-power-systems:
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (40.0 KiB)

This bug was fixed in the package linux - 4.15.0-12.13

---------------
linux (4.15.0-12.13) bionic; urgency=medium

  * linux: 4.15.0-12.13 -proposed tracker (LP: #1754059)

  * CONFIG_EFI=y on armhf (LP: #1726362)
    - [Config] CONFIG_EFI=y on armhf, reconcile secureboot EFI settings

  * ppc64el: Support firmware disable of RFI flush (LP: #1751994)
    - powerpc/pseries: Support firmware disable of RFI flush
    - powerpc/powernv: Support firmware disable of RFI flush

  * [Feature] CFL/CNL (PCH:CNP-H): New GPIO Commit added (GPIO Driver needed)
    (LP: #1751714)
    - gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation
    - pinctrl: intel: Allow custom GPIO base for pad groups
    - pinctrl: cannonlake: Align GPIO number space with Windows

  * [Feature] Add xHCI debug device support in the driver (LP: #1730832)
    - usb: xhci: Make some static functions global
    - usb: xhci: Add DbC support in xHCI driver
    - [Config] USB_XHCI_DBGCAP=y for commit mainline dfba2174dc42.

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * [Feature] [Graphics]Whiskey Lake (Coffelake-U 4+2) new PCI Device ID adds
    (LP: #1742561)
    - drm/i915/cfl: Adding more Coffee Lake PCI IDs.

  * [Bug] [USB Function][CFL-CNL PCH]Stall Error and USB Transaction Error in
    trace, Disable of device-initiated U1/U2 failed and rebind failed: -517
    during suspend/resume with usb storage. (LP: #1730599)
    - usb: Don't print a warning if interface driver rebind is deferred at resume

  * retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
    - [Packaging] retpoline -- elide %cs:0xNNNN constants on i386
    - [Config] retpoline -- clean up i386 retpoline files

  * hisilicon hibmc regression due to ea642c3216cb ("drm/ttm: add io_mem_pfn
    callback") (LP: #1738334)
    - drm/ttm: add ttm_bo_io_mem_pfn to check io_mem_pfn

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * [18.04 FEAT] Automatically detect layer2 setting in the qeth device driver
    (LP: #1747639)
    - s390/diag: add diag26c support for VNIC info
    - s390/qeth: support early setup for z/VM NICs

  * Bionic update to v4.15.7 stable release (LP: #1752317)
    - netfilter: drop outermost socket lock in getsockopt()
    - arm64: mm: don't write garbage into TTBR1_EL1 register
    - kconfig.h: Include compiler types to avoid missed struct attributes
    - MIPS: boot: Define __ASSEMBLY__ for its.S build
    - xtensa: fix high memory/reserved memory collision
    - scsi: ibmvfc: fix misde...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.