aws: xen-netfront: prevent potential error on hibernate

Bug #1906850 reported by Andrea Righi
This bug affects 2 people
Affects                 Status         Importance   Assigned to   Milestone
linux-aws (Ubuntu)      Fix Released   Undecided    Unassigned
  Bionic                Fix Released   Medium       Unassigned
  Focal                 Fix Released   Medium       Unassigned
  Groovy                Fix Released   Medium       Unassigned

Bug Description

[Impact]

On hibernation, xen-netfront sets the device state on xenbus to "Closing" and then waits (with a timeout) for the backend to acknowledge that state. However, if the device is already in the "Closed" state, this operation always hits the timeout, which prevents the system from hibernating correctly.
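The following C sketch is a minimal user-space model of the pre-fix freeze path that makes the failure mode concrete. It is not the actual xen-netfront code. The struct, the function names, and the rule that a backend acknowledges "Closing" only while it is still Connected are illustrative assumptions. Only the state numbering follows the XenbusState enum from the Xen interface headers, and a bounded polling loop stands in for the kernel's timed wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Numbering follows XenbusState in xen/interface/io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6,
};

/* Hypothetical model of one frontend/backend device pair. */
struct model_dev {
    enum xenbus_state frontend;
    enum xenbus_state backend;
};

/* Assumed backend behaviour: it acknowledges a Closing request only while
 * it is still Connected; an already-closed backend never moves back. */
static void model_backend_react(struct model_dev *d)
{
    if (d->frontend == XenbusStateClosing && d->backend == XenbusStateConnected)
        d->backend = XenbusStateClosing;
}

/* Pre-fix freeze path: request Closing unconditionally, then wait for the
 * backend to acknowledge. Returns 0 on ack, -1 once max_polls is used up. */
static int model_freeze_unfixed(struct model_dev *d, int max_polls)
{
    d->frontend = XenbusStateClosing;
    for (int i = 0; i < max_polls; i++) {
        model_backend_react(d);
        if (d->backend == XenbusStateClosing)
            return 0;
    }
    fprintf(stderr, "Freezing timed out; the device may become inconsistent state\n");
    return -1;
}
```

With a connected device, the wait ends immediately. With a device whose backend is already "Closed", no acknowledgement ever arrives, so every call ends in the timeout branch.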

[Test case]

This is a fairly rare condition that can be reproduced by hibernating and resuming a Xen instance multiple times. When the problem happens, we should see a Xen error message like the following in the log:

  Freezing timed out; the device may become inconsistent state
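When running repeated hibernate/resume cycles, a saved copy of the kernel log can be checked for this message. The C helper below is a hypothetical convenience, not part of any Ubuntu or Xen tooling. It simply counts occurrences of the "Freezing timed out" prefix quoted above in a NUL-terminated log buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefix of the log message quoted in this report. */
static const char freeze_timeout_msg[] = "Freezing timed out";

/* Hypothetical helper: count non-overlapping occurrences of the
 * freeze-timeout message in a NUL-terminated log buffer. */
static size_t count_freeze_timeouts(const char *log)
{
    const size_t msg_len = sizeof(freeze_timeout_msg) - 1;
    size_t count = 0;
    const char *p = log;

    while ((p = strstr(p, freeze_timeout_msg)) != NULL) {
        count++;
        p += msg_len; /* continue scanning after this match */
    }
    return count;
}
```

Applied to a log captured after each cycle, a non-zero count flags the runs in which the freeze timed out.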

[Fix]

If the device is already in the state "Closed", simply tear it down without notifying the Xen backend.
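The fix can be sketched against a similar model. The freeze path checks the state before notifying the backend and takes a local teardown path when the device is already closed. This is a hypothetical sketch: `model_freeze_fixed`, `model_teardown`, and the `state_switches` counter are invented names. Checking the frontend's own state is also an assumption about where the patch looks, so the SAUCE commit "xen-netfront: prevent unnecessary close on hibernate" remains the authority on the real condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbering follows XenbusState in xen/interface/io/xenbus.h (subset). */
enum xenbus_state {
    XenbusStateConnected = 4,
    XenbusStateClosing   = 5,
    XenbusStateClosed    = 6,
};

/* Hypothetical device model. */
struct model_dev {
    enum xenbus_state frontend;
    enum xenbus_state backend;
    int state_switches; /* times the frontend notified the backend */
    bool torn_down;     /* local resources released */
};

static void model_switch_state(struct model_dev *d, enum xenbus_state s)
{
    d->frontend = s;
    d->state_switches++;
}

static void model_teardown(struct model_dev *d)
{
    d->torn_down = true;
}

/* Freeze with the fix applied: an already-closed device is torn down
 * locally without notifying the backend, so there is nothing to wait for. */
static int model_freeze_fixed(struct model_dev *d, int max_polls)
{
    if (d->frontend == XenbusStateClosed) {
        model_teardown(d);
        return 0;
    }
    model_switch_state(d, XenbusStateClosing);
    for (int i = 0; i < max_polls; i++) {
        /* Assumed backend: a connected backend acknowledges Closing. */
        if (d->backend == XenbusStateConnected)
            d->backend = XenbusStateClosing;
        if (d->backend == XenbusStateClosing) {
            model_teardown(d);
            return 0;
        }
    }
    return -1; /* timed out */
}
```

The early return keeps the normal Closing handshake unchanged for live devices and only skips it when it could never complete.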

[Regression potential]

The fix only prevents hibernation from being aborted when the netfront device is already in the "Closed" state. The resume callback forces the device into the "Initializing" state anyway, effectively resetting it, so no other part of the state machine should be affected.
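The regression argument can be expressed as a small model. Whatever state freeze leaves the device in, resume overwrites it with "Initialising". This is a hypothetical sketch: the struct and function names are invented, and the backend handshake is omitted. It only shows that a connected device and an already-closed device end up in the same post-resume state.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbering follows XenbusState in xen/interface/io/xenbus.h (subset). */
enum xenbus_state {
    XenbusStateInitialising = 1,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6,
};

/* Hypothetical device model (frontend side only). */
struct model_dev {
    enum xenbus_state frontend;
    bool torn_down;
};

/* Freeze as patched: a closed device stays Closed and is torn down
 * locally; any other device is moved to Closing first. */
static void model_freeze(struct model_dev *d)
{
    if (d->frontend != XenbusStateClosed)
        d->frontend = XenbusStateClosing;
    d->torn_down = true;
}

/* Resume forces Initialising, discarding whatever freeze left behind. */
static void model_resume(struct model_dev *d)
{
    d->frontend = XenbusStateInitialising;
    d->torn_down = false;
}

/* Run one freeze/resume cycle and report the resulting frontend state. */
static enum xenbus_state model_cycle(enum xenbus_state start)
{
    struct model_dev d = { start, false };
    model_freeze(&d);
    model_resume(&d);
    return d.frontend;
}
```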

Andrea Righi (arighi)
summary: - aws: xen-netfront: prevent potential deadlock on hibernate
+ aws: xen-netfront: prevent potential error on hibernate
summary: - aws: xen-netfront: prevent potential error on hibernate
+ aws: xen-netfront: potential error on hibernate
summary: - aws: xen-netfront: potential error on hibernate
+ aws: xen-netfront: prevent potential error on hibernate
Ian May (ian-may)
Changed in linux-aws (Ubuntu Focal):
status: New → Fix Committed
Changed in linux-aws (Ubuntu Groovy):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status: New → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 4.15.0-1093.99

---------------
linux-aws (4.15.0-1093.99) bionic; urgency=medium

  * bionic/linux-aws: 4.15.0-1093.99 -proposed tracker (LP: #1911275)

  * aws: network performance regression due to initial TCP receive buffer size
    change (LP: #1910200)
    - tcp: select sane initial rcvq_space.space for big MSS

  * arm64: prevent losing page dirty state (LP: #1908503)
    - arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()

  * Disable Atari partition support for cloud kernels (LP: #1908264)
    - [Config] Disable Atari partition support

  * aws: xen-netfront: prevent potential error on hibernate (LP: #1906850)
    - SAUCE: xen-netfront: prevent unnecessary close on hibernate

  [ Ubuntu: 4.15.0-133.137 ]

  * bionic/linux: 4.15.0-133.137 -proposed tracker (LP: #1911295)
  * [drm:qxl_enc_commit [qxl]] *ERROR* head number too large or missing monitors
    config: (LP: #1908219)
    - qxl: remove qxl_io_log()
    - qxl: move qxl_send_monitors_config()
    - qxl: hook monitors_config updates into crtc, not encoder.
  * Touchpad not detected on ByteSpeed C15B laptop (LP: #1906128)
    - Input: i8042 - add ByteSpeed touchpad to noloop table
  * vmx_nm_test in ubuntu_kvm_unit_tests interrupted on X-oracle-4.15 /
    B-oracle-4.15 / X-KVM / B-KVM (LP: #1872401)
    - KVM: nVMX: Always reflect #NM VM-exits to L1
  * stack trace in kernel (LP: #1903596)
    - net: napi: remove useless stack trace
  * CVE-2020-27777
    - [Config]: Set CONFIG_PPC_RTAS_FILTER
  * Bionic update: upstream stable patchset 2020-12-04 (LP: #1906875)
    - regulator: defer probe when trying to get voltage from unresolved supply
    - ring-buffer: Fix recursion protection transitions between interrupt context
    - time: Prevent undefined behaviour in timespec64_to_ns()
    - nbd: don't update block size after device is started
    - btrfs: sysfs: init devices outside of the chunk_mutex
    - btrfs: reschedule when cloning lots of extents
    - genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
    - hv_balloon: disable warning when floor reached
    - net: xfrm: fix a race condition during allocing spi
    - perf tools: Add missing swap for ino_generation
    - ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
    - can: rx-offload: don't call kfree_skb() from IRQ context
    - can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ
      context
    - can: dev: __can_get_echo_skb(): fix real payload length return value for RTR
      frames
    - can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
    - can: peak_usb: add range checking in decode operations
    - can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
    - can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is
      on
    - xfs: flush new eof page on truncate to avoid post-eof corruption
    - Btrfs: fix missing error return if writeback for extent buffer never started
    - ath9k_htc: Use appropriate rs_datalen type
    - usb: gadget: goku_udc: fix potential crashes in probe
    - gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
  ...

Changed in linux-aws (Ubuntu):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.4.0-1037.39

---------------
linux-aws (5.4.0-1037.39) focal; urgency=medium

  * focal/linux-aws: 5.4.0-1037.39 -proposed tracker (LP: #1911314)

  * aws: network performance regression due to initial TCP receive buffer size
    change (LP: #1910200)
    - tcp: select sane initial rcvq_space.space for big MSS

  * Disable Atari partition support for linux-aws (LP: #1908264)
    - [Config] Disable Atari partition support

  * aws: xen-netfront: prevent potential error on hibernate (LP: #1906850)
    - SAUCE: xen-netfront: prevent unnecessary close on hibernate

  [ Ubuntu: 5.4.0-63.71 ]

  * focal/linux: 5.4.0-63.71 -proposed tracker (LP: #1911333)
  * overlay: permission regression in 5.4.0-51.56 due to patches related to
    CVE-2020-16120 (LP: #1900141)
    - ovl: do not fail because of O_NOATIME
  * Focal update: v5.4.79 upstream stable release (LP: #1907151)
    - net/mlx5: Use async EQ setup cleanup helpers for multiple EQs
    - net/mlx5: poll cmd EQ in case of command timeout
    - net/mlx5: Fix a race when moving command interface to events mode
    - net/mlx5: Add retry mechanism to the command entry index allocation
  * Kernel 5.4.0-56 Wi-Fi does not connect (LP: #1906770)
    - mt76: fix fix ampdu locking
  * [Ubuntu 21.04 FEAT] mpt3sas: Request to include the patch set which supports
    topology where zoning is enabled in expander (LP: #1899802)
    - scsi: mpt3sas: Define hba_port structure
    - scsi: mpt3sas: Allocate memory for hba_port objects
    - scsi: mpt3sas: Rearrange _scsih_mark_responding_sas_device()
    - scsi: mpt3sas: Update hba_port's sas_address & phy_mask
    - scsi: mpt3sas: Get device objects using sas_address & portID
    - scsi: mpt3sas: Rename transport_del_phy_from_an_existing_port()
    - scsi: mpt3sas: Get sas_device objects using device's rphy
    - scsi: mpt3sas: Update hba_port objects after host reset
    - scsi: mpt3sas: Set valid PhysicalPort in SMPPassThrough
    - scsi: mpt3sas: Handling HBA vSES device
    - scsi: mpt3sas: Add bypass_dirty_port_flag parameter
    - scsi: mpt3sas: Handle vSES vphy object during HBA reset
    - scsi: mpt3sas: Add module parameter multipath_on_hba
    - scsi: mpt3sas: Bump driver version to 35.101.00.00

  [ Ubuntu: 5.4.0-62.70 ]

  * focal/linux: 5.4.0-62.70 -proposed tracker (LP: #1911144)
  * CVE-2020-28374
    - SAUCE: target: fix XCOPY NAA identifier lookup
  * Packaging resync (LP: #1786013)
    - update dkms package versions

 -- Kelsey Skunberg <email address hidden> Wed, 13 Jan 2021 19:01:10 -0700

Changed in linux-aws (Ubuntu Focal):
status: Fix Committed → Fix Released
Stefan Bader (smb)
Changed in linux-aws (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → Medium
Changed in linux-aws (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux-aws (Ubuntu Groovy):
importance: Undecided → Medium
Stefan Bader (smb)
Changed in linux-aws (Ubuntu Bionic):
status: Triaged → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andrea Righi (arighi)
tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.8.0-1028.30

---------------
linux-aws (5.8.0-1028.30) groovy; urgency=medium

  * groovy/linux-aws: 5.8.0-1028.30 -proposed tracker (LP: #1921043)

  * kernel: Enable CONFIG_BPF_LSM on Ubuntu (LP: #1905975)
    - [Config][aws] Enable CONFIG_BPF_LSM

  * Groovy update: upstream stable patchset 2021-03-05 (LP: #1917964)
    - [Config][aws] updateconfigs for USB_BDC_PCI

  * Enforce CONFIG_DRM_BOCHS=m (LP: #1916290)
    - [Config][aws] Enforce CONFIG_DRM_BOCHS=m

  * Groovy update: upstream stable patchset 2021-02-25 (LP: #1916960)
    - [Config][aws] updateconfigs for KPROBE_EVENTS_ON_NOTRACE

  * aws: update Xen hibernation patch set (LP: #1913410)
    - Revert "UBUNTU: SAUCE: xen: Update sched clock offset to avoid system
      instability in hibernation"
    - Revert "UBUNTU: SAUCE: xen: Introduce wrapper for save/restore sched clock
      offset"
    - Revert "UBUNTU: SAUCE: x86/xen: save and restore steal clock"
    - Revert "UBUNTU: SAUCE: xen/time: introduce xen_{save,restore}_steal_clock"
    - Revert "UBUNTU: SAUCE: xen-netfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: x86/xen: add system core suspend and resume
      callbacks"
    - Revert "UBUNTU: SAUCE: x86/xen: Introduce new function to map
      HYPERVISOR_shared_info on Resume"
    - Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
    - Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
    - SAUCE: xen/manage: keep track of the on-going suspend mode
    - SAUCE: xenbus: add freeze/thaw/restore callbacks support
    - SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on
      Resume
    - SAUCE: x86/xen: add system core suspend and resume callbacks
    - SAUCE: xen-netfront: add callbacks for PM suspend and hibernation support
    - SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
    - SAUCE: xen/time: introduce xen_{save,restore}_steal_clock
    - SAUCE: x86/xen: save and restore steal clock
    - SAUCE: xen: Introduce wrapper for save/restore sched clock offset
    - SAUCE: xen: Update sched clock offset to avoid system instability in
      hibernation
    - SAUCE: x86: tsc: avoid system instability in hibernation

  * aws: xen-netfront: prevent potential error on hibernate (LP: #1906850)
    - SAUCE: xen-netfront: prevent unnecessary close on hibernate

  [ Ubuntu: 5.8.0-49.55 ]

  * groovy/linux: 5.8.0-49.55 -proposed tracker (LP: #1921053)
  * selftests: bpf verifier fails after sanitize_ptr_alu fixes (LP: #1920995)
    - bpf: Simplify alu_limit masking for pointer arithmetic
    - bpf: Add sanity check for upper ptr_limit
    - bpf, selftests: Fix up some test_verifier cases for unprivileged
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * improper memcg accounting causes NULL pointer derefs (LP: #1918668)
    - SAUCE: Revert "mm: memcg/slab: optimize objcg stock draining"
  * kernel: Enable CONFIG_BPF_LSM on Ubuntu (LP: #1905975)
    - [Config] Enable CONFIG_BPF_LSM
  * Groovy u...

Changed in linux-aws (Ubuntu Groovy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 4.15.0-1098.105

---------------
linux-aws (4.15.0-1098.105) bionic; urgency=medium

  * bionic/linux-aws: 4.15.0-1098.105 -proposed tracker (LP: #1919513)

  * Enforce CONFIG_DRM_BOCHS=m (LP: #1916290)
    - [Config] aws: Add CONFIG_DRM_BOCHS=m (enforced)

  * Bionic update: upstream stable patchset 2021-02-26 (LP: #1917093)
    - [Config] aws: Updateconfigs for USB_BDC_PCI

  * Please trust Canonical Livepatch Service kmod signing key (LP: #1898716)
    - [Config] aws: enable CONFIG_MODVERSIONS=y
    - [Packaging] aws: build canonical-certs.pem from branch/arch certs

  * aws: update Xen hibernation patch set (LP: #1913410)
    - Revert "UBUNTU: SAUCE: xen-netfront: prevent unnecessary close on hibernate"
    - Revert "UBUNTU SAUCE [aws]: xen: Only restore the ACPI SCI interrupt in
      xen_restore_pirqs."
    - Revert "UBUNTU SAUCE [aws]: xen: restore pirqs on resume from hibernation."
    - Revert "UBUNTU SAUCE [aws]: block: xen-blkfront: consider new dom0 features
      on restore"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: close event channels for PIRQs in
      system core suspend callback"
    - Revert "UBUNTU: SAUCE [aws] xen/events: add xen_shutdown_pirqs helper
      function"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: save and restore steal clock"
    - Revert "UBUNTU: SAUCE [aws] xen-time-introduce-xen_-save-restore-
      _steal_clock"
    - Revert "UBUNTU: SAUCE [aws] xen-netfront: add callbacks for PM suspend and
      hibernation support"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: add system core suspend and resume
      callbacks"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: Introduce new function to map
      HYPERVISOR_shared_info on Resume"
    - Revert "UBUNTU: SAUCE: xen-blkfront: Fixed blkfront_restore to remove a call
      to negotiate_mq"
    - Revert "UBUNTU: SAUCE: xen-blkfront: resurrect request-based mode"
    - Revert "UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
    - Revert "UBUNTU: SAUCE: xen/manage: introduce helper function to know the on-
      going suspend mode"
    - Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
    - SAUCE: xen/manage: keep track of the on-going suspend mode
    - SAUCE: xen/manage: introduce helper function to know the on-going suspend
      mode
    - SAUCE: xenbus: add freeze/thaw/restore callbacks support
    - SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on
      Resume
    - SAUCE: x86/xen: add system core suspend and resume callbacks
    - SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
    - SAUCE: xen-netfront: add callbacks for PM suspend and hibernation support
    - SAUCE: xen/time: introduce xen_{save,restore}_steal_clock
    - SAUCE: x86/xen: save and restore steal clock
    - SAUCE: xen/events: add xen_shutdown_pirqs helper function
    - SAUCE: x86/xen: close event channels for PIRQs in system core suspend
      callback
    - SAUCE: xen-blkfront: resurrect request-based mode
    - SAUCE: xen-blkfront: add 'persistent_grants' parameter
    ...

Changed in linux-aws (Ubuntu Bionic):
status: Fix Committed → Fix Released