[linux-azure][hibernation] Mellanox CX4 NIC's TX/RX packets stop increasing after hibernation/resume

Bug #1894896 reported by Dexuan Cui
This bug affects 1 person
Affects                 Status          Importance   Assigned to     Milestone
linux-azure (Ubuntu)    Invalid         Undecided    Unassigned
Focal                   Fix Released    Medium       Marcelo Cerri

Bug Description

[Impact]

Description of problem:
In a VM with a CX4 VF NIC on Azure, the TX/RX packet counters stop increasing after hibernation/resume.
This issue does not occur in a VM with a CX3 VF NIC.

This happens with the latest stable release of the linux-azure kernel (5.4.0-1023.23) and with the latest mainline Linux kernel.

[Test Case]

How reproducible:
100%

Steps to Reproduce:
1. Start a VM in Azure that supports Accelerated Networking, and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 ). Please make sure the VF NIC is a CX-4, since the issue does not occur with a CX-3.

2. Hibernate the VM from the serial console:
# systemctl hibernate

3. After the VM resumes, check the MSI interrupt counters in /proc/interrupts for the CX-4 NIC, and also check the RX/TX counters with "ifconfig" (e.g. "ifconfig enP2642s2"). These counters stop increasing even though traffic should still be flowing through the VF.
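
The check in step 3 can also be scripted. The following is only a minimal sketch, not part of the original report: it assumes the per-interface counters are exposed under /sys/class/net/<iface>/statistics/ and that the CX-4 interrupt rows in /proc/interrupts contain the string "mlx5" (both are assumptions about the test VM; the function names are hypothetical).

```python
from pathlib import Path


def read_packet_counters(iface, sysfs_root="/sys/class/net"):
    """Return (rx_packets, tx_packets) for iface.

    Assumes the standard sysfs layout <sysfs_root>/<iface>/statistics/.
    """
    stats = Path(sysfs_root) / iface / "statistics"
    rx = int((stats / "rx_packets").read_text())
    tx = int((stats / "tx_packets").read_text())
    return rx, tx


def sum_irq_counts(proc_interrupts_text, match="mlx5"):
    """Sum the per-CPU counts of every /proc/interrupts row containing `match`.

    The default "mlx5" is an assumption about how the CX-4 VF interrupts
    are named on the VM under test.
    """
    total = 0
    for line in proc_interrupts_text.splitlines():
        if match not in line:
            continue
        # Row layout: "<irq>: <count cpu0> <count cpu1> ... <chip> <name>"
        _, _, rest = line.partition(":")
        for field in rest.split():
            if not field.isdigit():
                break  # reached the interrupt chip / device name columns
            total += int(field)
    return total


def counters_increased(before, after):
    """True if every counter in `after` is strictly larger than in `before`."""
    return all(a > b for b, a in zip(before, after))
```

To use it, take one sample of read_packet_counters("enP2642s2") and sum_irq_counts(Path("/proc/interrupts").read_text()) after resume, generate some traffic, wait a few seconds, and sample again. On an affected kernel, counters_increased() is expected to return False for both.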

[Regression Potential]

The change touches netvsc and has the potential to affect any instance using Accelerated Networking. However, the fix is straightforward and is a clean cherry-pick from 5.9.

[Other Info]

BUG FIX:
The fix is in the net.git tree now:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=19162fd4063a3211843b997a454b505edb81d5ce

Dexuan Cui (decui) wrote :
Marcelo Cerri (mhcerri) wrote :
description: updated
Changed in linux-azure (Ubuntu Focal):
status: New → In Progress
Marcelo Cerri (mhcerri) wrote :

groovy:linux-azure already has the fix; it was included via the upstream stable update (LP: #1897550).

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Focal):
assignee: nobody → Marcelo Cerri (mhcerri)
Stefan Bader (smb)
Changed in linux-azure (Ubuntu):
status: New → Invalid
Changed in linux-azure (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux-azure (Ubuntu Focal):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.4.0-1032.33

---------------
linux-azure (5.4.0-1032.33) focal; urgency=medium

  * focal/linux-azure: 5.4.0-1032.33 -proposed tracker (LP: #1903162)

  * Focal update: v5.4.66 upstream stable release (LP: #1896824)
    - [Config] azure: updateconfigs for VGACON_SOFT_SCROLLBACK

  * [linux-azure][hibernation] Mellanox CX4 NIC's TX/RX packets stop increasing
    after hibernation/resume (LP: #1894896)
    - hv_netvsc: Fix hibernation for mlx5 VF driver

  * [linux-azure][hibernation] GPU device no longer working after resume from
    hibernation in NV6 VM size (LP: #1894893)
    - PCI: hv: Fix hibernation in case interrupts are not re-created

  * linux-azure: build and include the tcm_loop module to the main kernel
    package (LP: #1791794)
    - [Config] linux-azure: CONFIG_LOOPBACK_TARGET=m (tcm_loop)

  * [linux-azure] Two Fixes For kdump Over Network (LP: #1883261)
    - PCI: hv: Fix the PCI HyperV probe failure path to release resource properly
    - PCI: hv: Retry PCI bus D0 entry on invalid device state

  [ Ubuntu: 5.4.0-55.61 ]

  * focal/linux: 5.4.0-55.61 -proposed tracker (LP: #1903175)
  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX
  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline
  * EFI: Fails when BootCurrent entry does not exist (LP: #1899993)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.
  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function
  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10
  * Bionic: btrfs: kernel BUG at /build/linux-
    eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks
  * Ethernet no link lights after reboot (Intel i225-v 2.5G) (LP: #1902578)
    - igc: Add PHY power management control
  * Undetected Data corruption in MPI workloads that use VSX for reductions on
    POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load
      workaround
  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390: nvme ipl
    - s390: nvme reipl
    - s390/ipl: support NVMe IPL kernel para...

Changed in linux-azure (Ubuntu Focal):
status: Fix Committed → Fix Released
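
Since the fix was released in linux-azure 5.4.0-1032.33, a quick way to tell whether a Focal VM already runs a fixed kernel is to compare the ABI number in its kernel release string. The sketch below is not from the original report and assumes the release string reported by the running kernel has the form "5.4.0-<abi>-azure"; the function names are hypothetical.

```python
import platform
import re

# ABI number of the first fixed package, from "linux-azure - 5.4.0-1032.33".
FIXED_ABI = 1032


def azure_abi(release):
    """Return the ABI number of a '5.4.0-<abi>-azure' release string, else None."""
    m = re.fullmatch(r"5\.4\.0-(\d+)-azure", release)
    return int(m.group(1)) if m else None


def has_fix(release=None):
    """True/False for Focal linux-azure 5.4 kernels, None for anything else."""
    if release is None:
        release = platform.release()  # same string as `uname -r`
    abi = azure_abi(release)
    if abi is None:
        return None  # not a 5.4 linux-azure kernel; this check does not apply
    return abi >= FIXED_ABI
```

A None result only means that the string did not match the assumed pattern; it says nothing about whether other kernel series carry the fix.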