mlxbf-gige: autonegotiation fails to complete on BF2

Bug #2062384 reported by Asmaa Mnebhi
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux-bluefield (Ubuntu)
New
Undecided
Unassigned
Focal
Fix Released
Undecided
Unassigned
Jammy
Fix Committed
Undecided
Unassigned

Bug Description

SRU Justification:

[Impact]

During the reboot test, QA found an intermittent issue where the OOB link is down.
The link is down because the KSZ9031 PHY fails to complete autonegotiation.
Even under "normal" circumstances where autonegotiation completes,
it takes an abnormal time to do so (on average, at least 8 seconds).

Hence, the hardware team and Microchip are involved in this debug but the root cause is still unknown.
In the meantime, we need to provide a software workaround since customers are starting to see this issue as well.

[Fix]

* Restart autonegotiation when it fails the first time.

[Test Case]

* On BF2, Do the reboot test: 2000 loops.
* Check that the OOB link is up and ip is assigned.

[Regression Potential]

* no known regression.

[Other]
* Note that this issue is BF2 hardware specific. The same ethernet code is used for BF3 and we don't see any issues. In fact, the link up time on BF3 <= 1s. On BF2, the link up time is > 8s.
* we have been aware of this issue for 2 years and have shared this with the PHY vendor and the hardware team but there were not root causes identified.

Changed in linux-bluefield (Ubuntu Jammy):
status: New → Fix Committed
Changed in linux-bluefield (Ubuntu Focal):
status: New → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.4.0-1086.93 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-bluefield' to 'verification-done-focal-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-focal-linux-bluefield' to 'verification-failed-focal-linux-bluefield'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-bluefield-v2 verification-needed-focal-linux-bluefield
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1044.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy-linux-bluefield
Revision history for this message
Feysel Mohammed (feyselm) wrote :

Using 5.15.0-1044-bluefield, After 2000 reboots, OOB link is up and ip is assigned.

tags: added: verification-done-jammy-linux-bluefield
removed: verification-needed-jammy-linux-bluefield
Revision history for this message
Feysel Mohammed (feyselm) wrote :

Using 5.4.0-1086-bluefield, After 2000 reboots, OOB link is up and ip is assigned.

tags: added: verification-done-focal-linux-bluefield
removed: verification-needed-focal-linux-bluefield
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (9.0 KiB)

This bug was fixed in the package linux-bluefield - 5.4.0-1086.93

---------------
linux-bluefield (5.4.0-1086.93) focal; urgency=medium

  * focal/linux-bluefield: 5.4.0-1086.93 -proposed tracker (LP: #2063770)

  * mlxbf-gige: Vitesse PHY stuck in a bad state during reboot test
    (LP: #2064163)
    - SAUCE: mlxbf-gige: OOB PHY stuck in a bad state during reboot test

  * mlxbf-gige: autonegotiation fails to complete on BF2 (LP: #2062384)
    - SAUCE: mlxbf-gige: autonegotiation fails to complete on BF2

  [ Ubuntu: 5.4.0-186.206 ]

  * focal/linux: 5.4.0-186.206 -proposed tracker (LP: #2063812)
  * Mount CIFS fails with Permission denied (LP: #2061986)
    - cifs: fix ntlmssp auth when there is no key exchange
  * USB stick can't be detected (LP: #2040948)
    - usb: Disable USB3 LPM at shutdown
  * CVE-2024-26733
    - net: dev: Convert sa_data to flexible array in struct sockaddr
    - arp: Prevent overflow in arp_req_get().
    - stddef: Introduce DECLARE_FLEX_ARRAY() helper
  * CVE-2024-26712
    - powerpc/kasan: Fix addr error caused by page alignment
  * CVE-2023-52530
    - wifi: mac80211: fix potential key use-after-free
  * CVE-2021-47063
    - drm: bridge/panel: Cleanup connector on bridge detach
  * [Ubuntu 22.04.4/linux-image-6.5.0-26-generic] Kernel output "UBSAN: array-
    index-out-of-bounds in /build/linux-hwe-6.5-34pCLi/linux-
    hwe-6.5-6.5.0/drivers/net/hyperv/netvsc.c:1445:41" multiple times,
    especially during boot. (LP: #2058477)
    - hv: hyperv.h: Replace one-element array with flexible-array member
  * CVE-2024-26614
    - tcp: make sure init the accept_queue's spinlocks once
    - ipv6: init the accept_queue's spinlocks in inet6_create
  * Focal update: v5.4.271 upstream stable release (LP: #2060216)
    - netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
    - net: ip_tunnel: prevent perpetual headroom growth
    - tun: Fix xdp_rxq_info's queue_index when detaching
    - ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
    - lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is
      detected
    - net: usb: dm9601: fix wrong return value in dm9601_mdio_read
    - Bluetooth: Avoid potential use-after-free in hci_error_reset
    - Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
    - Bluetooth: Enforce validation on max value of connection interval
    - netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
    - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
    - efi/capsule-loader: fix incorrect allocation size
    - power: supply: bq27xxx-i2c: Do not free non existing IRQ
    - ALSA: Drop leftover snd-rtctimer stuff from Makefile
    - afs: Fix endless loop in directory parsing
    - gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
    - wifi: nl80211: reject iftype change with mesh ID change
    - btrfs: dev-replace: properly validate device names
    - dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
    - dmaengine: fsl-qdma: init irq after reg initialization
    - mmc: core: Fix eMMC initialization with 1-bit bus connection
    - x86/cpu/intel: Detect TME keyid bits before setting MTRR mas...

Read more...

Changed in linux-bluefield (Ubuntu Focal):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.