thunder nic: avoid link delays due to RX_PACKET_DIS

Bug #1630038 reported by dann frazier on 2016-10-03
This bug affects 2 people
Affects          Importance  Assigned to
linux (Ubuntu)   Medium      dann frazier
Xenial           Medium      dann frazier
Yakkety          Medium      dann frazier

Bug Description

[Impact]
Link establishment is delayed during initialization, possibly resulting in remote fault conditions that may cause the interface to fail to come up.

[Test Case]
Put the system in a reboot loop and watch for a remote fault condition, or a failure to bring up the link that can only be resolved by reloading the module.
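
The "failure to bring up the link" half of this test case can be checked automatically on each boot. The following is a minimal sketch, not part of the original report. It assumes the standard sysfs attribute /sys/class/net/<iface>/carrier, and the interface name and timeout in the usage note are hypothetical. It only checks for carrier and does not detect a remote fault condition by itself.

```python
import time
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on the given interface."""
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # The read fails if the interface does not exist, or while it is
        # administratively down. Treat both cases as "no link".
        return False


def wait_for_link(iface, timeout_s=60.0, poll_s=1.0, sysfs_root="/sys/class/net"):
    """Poll until carrier is reported or timeout_s elapses.

    Returns True if the link came up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if link_is_up(iface, sysfs_root):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

In a reboot loop, a boot-time job could call wait_for_link("eth0", timeout_s=120) (the name and timeout are placeholders) and reboot only when it returns True, leaving a failed system up for inspection.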

[Regression Risk]
The patch touches a single driver that is used only on Cavium ThunderX systems. It is already upstream, so any regressions will have upstream support.


dann frazier (dannf) on 2016-10-07
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → dann frazier (dannf)
Seth Forshee (sforshee) on 2016-10-11
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Seth Forshee (sforshee) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Seth Forshee (sforshee) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

dann frazier (dannf) on 2016-10-18
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
dann frazier (dannf) on 2016-10-18
tags: added: verification-done-xenial
removed: verification-needed-xenial
dann frazier (dannf) on 2016-10-18
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Seth Forshee (sforshee) on 2016-10-20
Changed in linux (Ubuntu Yakkety):
assignee: nobody → dann frazier (dannf)
importance: Undecided → Medium
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-47.68

---------------
linux (4.4.0-47.68) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1636941

  * Add a driver for Amazon Elastic Network Adapters (ENA) (LP: #1635721)
    - lib/bitmap.c: conversion routines to/from u32 array
    - net: ethtool: add new ETHTOOL_xLINKSETTINGS API
    - net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
    - [config] enable CONFIG_ENA_ETHERNET=m (Amazon ENA driver)

  * unexpectedly large memory usage of mounted snaps (LP: #1636847)
    - [Config] switch squashfs to single threaded decode

 -- Kamal Mostafa <email address hidden> Wed, 26 Oct 2016 10:47:55 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-27.29

---------------
linux (4.8.0-27.29) yakkety; urgency=low

  [ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1635377

  * proc_keys_show crash when reading /proc/keys (LP: #1634496)
    - SAUCE: KEYS: ensure xbuf is large enough to fix buffer overflow in
      proc_keys_show (LP: #1634496)

  * Revert "If zone is so small that watermarks are the same, stop zone balance"
    in yakkety (LP: #1632894)
    - Revert "UBUNTU: SAUCE: (no-up) If zone is so small that watermarks are the
      same, stop zone balance."

  * lts-yakkety 4.8 cannot mount lvm raid1 (LP: #1631298)
    - SAUCE: (no-up) dm raid: fix compat_features validation

  * kswapd0 100% CPU usage (LP: #1518457)
    - SAUCE: (no-up) If zone is so small that watermarks are the same, stop zone
      balance.

  * [Trusty->Yakkety] powerpc/64: Fix incorrect return value from
    __copy_tofrom_user (LP: #1632462)
    - SAUCE: (no-up) powerpc/64: Fix incorrect return value from
      __copy_tofrom_user

  * Ubuntu 16.10: Oops panic in move_page_tables/page_remove_rmap after running
    memory_stress_ng. (LP: #1628976)
    - SAUCE: (no-up) powerpc/pseries: Fix stack corruption in htpe code

  * Paths not failed properly when unmapping virtual FC ports in VIOS (using
    ibmvfc) (LP: #1632116)
    - scsi: ibmvfc: Fix I/O hang when port is not mapped

  * [Ubuntu16.10]KV4.8: kernel livepatch config options are not set
    (LP: #1626983)
    - [Config] Enable live patching on powerpc/ppc64el

  * CONFIG_AUFS_XATTR is not set (LP: #1557776)
    - [Config] CONFIG_AUFS_XATTR=y

  * Yakkety update to 4.8.1 stable release (LP: #1632445)
    - arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
    - Using BUG_ON() as an assert() is _never_ acceptable
    - usb: misc: legousbtower: Fix NULL pointer deference
    - Staging: fbtft: Fix bug in fbtft-core
    - usb: usbip: vudc: fix left shift overflow
    - USB: serial: cp210x: Add ID for a Juniper console
    - Revert "usbtmc: convert to devm_kzalloc"
    - ALSA: hda - Adding one more ALC255 pin definition for headset problem
    - ALSA: hda - Fix headset mic detection problem for several Dell laptops
    - ALSA: hda - Add the top speaker pin config for HP Spectre x360
    - Linux 4.8.1

  * PSL data cache should be flushed before resetting CAPI adapter
    (LP: #1632049)
    - cxl: Flush PSL cache before resetting the adapter

  * thunder nic: avoid link delays due to RX_PACKET_DIS (LP: #1630038)
    - net: thunderx: Don't set RX_PACKET_DIS while initializing

  * crypto/vmx/p8_ghash memory corruption (LP: #1630970)
    - crypto: ghash-generic - move common definitions to a new header file
    - crypto: vmx - Fix memory corruption caused by p8_ghash
    - crypto: vmx - Ensure ghash-generic is enabled

  * arm64: SPCR console not autodetected (LP: #1630311)
    - of/serial: move earlycon early_param handling to serial
    - [Config] CONFIG_ACPI_SPCR_TABLE=y
    - ACPI: parse SPCR and enable matching console
    - ARM64: ACPI: enable ACPI_SPCR_TABLE
    - serial: pl011: add console matching function

  * include/linux/security.h header syntax error with !CONFIG_SECURITYFS
...


Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-30.32

---------------
linux (4.8.0-30.32) yakkety; urgency=low

  * CVE-2016-8655 (LP: #1646318)
    - packet: fix race condition in packet_set_ring

 -- Brad Figg <email address hidden> Thu, 01 Dec 2016 08:02:53 -0800

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released

Hi,
This fix introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3).
We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45.
I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]).

BR,
Alex

[1] https://jira.opnfv.org/browse/ARMBAND-168
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17

Raghuram Kota (rkota) wrote :

Hi,

Regarding comment #6, could you please provide:

1) The model of the specific server in use?
2) A console log that helps determine the UEFI firmware version running on that model?

Thanks,
Raghu

Hi,

1) We tested different models (CRB-1S, CRB-2S) - all behave the same.
2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I don't think the firmware version makes a difference for this issue (we saw the same bug with firmware versions T22, T27 and T31).

All in all, this issue seems pretty tied to the switch we use, and all firmware/board model combinations behaved the same ...

Thanks,
Alex

On Tue, Mar 21, 2017 at 3:41 PM, Alexandru Avadanii
<email address hidden> wrote:
> Hi,
>
> 1) We tested different models (CRB-1S, CRB-2S) - all behave the same.
> 2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I don't think firmware version makes a difference for this issue (we saw the same bug with firmwares: T22, T27, T31).
>
> All in all, this issue seems pretty tied to the switch we use, and all
> firmware/board model combinations behaved the same ...

Hi Alex,

  Would you mind opening a new bug to track this regression and linking it here?

Hi, Dann,
I created a new bug and pasted the same info as above at [1].
Afaict, there is no useful information in the logs when link training fails.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837
