aws: xen-netfront: prevent potential error on hibernate

Bug #1906850 reported by Andrea Righi on 2020-12-04
This bug affects 2 people
Affects              Importance   Assigned to
linux-aws (Ubuntu)   Undecided    Unassigned
  Bionic             Medium       Unassigned
  Focal              Medium       Unassigned
  Groovy             Medium       Unassigned

Bug Description

[Impact]

On hibernation, xen-netfront sets the device state on the xenbus to "Closing" and then waits (with a timeout) for the backend to acknowledge that state. However, if the device is already in the "Closed" state, this operation always hits the timeout, preventing the system from hibernating correctly.

[Test case]

This is a fairly rare condition that can be reproduced by hibernating and resuming a Xen instance multiple times. When the problem occurs, a Xen error message like the following should appear in the log:

  Freezing timed out; the device may become inconsistent state
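When repeating the hibernate/resume cycle many times, it can help to check the collected log lines for the message automatically. The following is only a minimal sketch in user-space C; the helper names (line_has_freeze_timeout, count_freeze_timeouts) are made up for illustration, and how the log lines are collected is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Leading part of the error message quoted in this report. */
static const char FREEZE_TIMEOUT_MSG[] = "Freezing timed out";

/* Hypothetical helper: true if one log line contains the message. */
static bool line_has_freeze_timeout(const char *line)
{
    return line != NULL && strstr(line, FREEZE_TIMEOUT_MSG) != NULL;
}

/* Hypothetical helper: count the matching lines among n log lines. */
static size_t count_freeze_timeouts(const char *const *lines, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (line_has_freeze_timeout(lines[i]))
            hits++;
    }
    return hits;
}
```

A non-zero count after a series of hibernate/resume cycles would indicate that the condition described above was hit at least once.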

[Fix]

If the device is already in the state "Closed", simply tear it down without notifying the Xen backend.
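The idea can be illustrated with a small user-space model. This is not the actual SAUCE patch and does not use the kernel's xenbus API: the enum, the struct and all function names below are hypothetical, and the backend is reduced to a single rule taken from the description above (it never acknowledges "Closing" for a device that was already "Closed").

```c
#include <stdbool.h>

/* Simplified device states, named after the ones used in this report. */
enum model_state {
    MODEL_CONNECTED,
    MODEL_CLOSING,
    MODEL_CLOSED,
};

struct model_dev {
    enum model_state state;
};

/* Assumption from the report: a device that was already "Closed"
 * never gets its "Closing" request acknowledged by the backend. */
static bool model_backend_acks_closing(enum model_state previous)
{
    return previous != MODEL_CLOSED;
}

static void model_teardown(struct model_dev *dev)
{
    dev->state = MODEL_CLOSED;
}

/* Returns 0 if the freeze succeeds, -1 if the wait would time out.
 * skip_if_closed selects the behaviour with the fix applied. */
static int model_freeze(struct model_dev *dev, bool skip_if_closed)
{
    enum model_state previous = dev->state;

    if (skip_if_closed && previous == MODEL_CLOSED) {
        /* Already closed: tear down without notifying the backend. */
        model_teardown(dev);
        return 0;
    }

    dev->state = MODEL_CLOSING;
    if (!model_backend_acks_closing(previous))
        return -1; /* stands in for the timeout on the real system */

    model_teardown(dev);
    return 0;
}
```

In this model, calling model_freeze() with skip_if_closed set to false on a device that is already closed returns -1, while the same call with skip_if_closed set to true returns 0. A connected device takes the normal "Closing" path in both cases.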

[Regression potential]

The fix only prevents hibernation from being aborted when the netfront device is already in the "Closed" state. The resume callback forces the device into the "Initializing" state anyway, effectively resetting the device, so nothing else in the state machine should break.

Andrea Righi (arighi) on 2020-12-04
summary: - aws: xen-netfront: prevent potential deadlock on hibernate
+ aws: xen-netfront: prevent potential error on hibernate
summary: - aws: xen-netfront: prevent potential error on hibernate
+ aws: xen-netfront: potential error on hibernate
summary: - aws: xen-netfront: potential error on hibernate
+ aws: xen-netfront: prevent potential error on hibernate
Ian (ian-may) on 2020-12-17
Changed in linux-aws (Ubuntu Focal):
status: New → Fix Committed
Changed in linux-aws (Ubuntu Groovy):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status: New → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 4.15.0-1093.99

---------------
linux-aws (4.15.0-1093.99) bionic; urgency=medium

  * bionic/linux-aws: 4.15.0-1093.99 -proposed tracker (LP: #1911275)

  * aws: network performance regression due to initial TCP receive buffer size
    change (LP: #1910200)
    - tcp: select sane initial rcvq_space.space for big MSS

  * arm64: prevent losing page dirty state (LP: #1908503)
    - arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()

  * Disable Atari partition support for cloud kernels (LP: #1908264)
    - [Config] Disable Atari partition support

  * aws: xen-netfront: prevent potential error on hibernate (LP: #1906850)
    - SAUCE: xen-netfront: prevent unnecessary close on hibernate

  [ Ubuntu: 4.15.0-133.137 ]

  * bionic/linux: 4.15.0-133.137 -proposed tracker (LP: #1911295)
  * [drm:qxl_enc_commit [qxl]] *ERROR* head number too large or missing monitors
    config: (LP: #1908219)
    - qxl: remove qxl_io_log()
    - qxl: move qxl_send_monitors_config()
    - qxl: hook monitors_config updates into crtc, not encoder.
  * Touchpad not detected on ByteSpeed C15B laptop (LP: #1906128)
    - Input: i8042 - add ByteSpeed touchpad to noloop table
  * vmx_nm_test in ubuntu_kvm_unit_tests interrupted on X-oracle-4.15 /
    B-oracle-4.15 / X-KVM / B-KVM (LP: #1872401)
    - KVM: nVMX: Always reflect #NM VM-exits to L1
  * stack trace in kernel (LP: #1903596)
    - net: napi: remove useless stack trace
  * CVE-2020-27777
    - [Config]: Set CONFIG_PPC_RTAS_FILTER
  * Bionic update: upstream stable patchset 2020-12-04 (LP: #1906875)
    - regulator: defer probe when trying to get voltage from unresolved supply
    - ring-buffer: Fix recursion protection transitions between interrupt context
    - time: Prevent undefined behaviour in timespec64_to_ns()
    - nbd: don't update block size after device is started
    - btrfs: sysfs: init devices outside of the chunk_mutex
    - btrfs: reschedule when cloning lots of extents
    - genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
    - hv_balloon: disable warning when floor reached
    - net: xfrm: fix a race condition during allocing spi
    - perf tools: Add missing swap for ino_generation
    - ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
    - can: rx-offload: don't call kfree_skb() from IRQ context
    - can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ
      context
    - can: dev: __can_get_echo_skb(): fix real payload length return value for RTR
      frames
    - can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
    - can: peak_usb: add range checking in decode operations
    - can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
    - can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is
      on
    - xfs: flush new eof page on truncate to avoid post-eof corruption
    - Btrfs: fix missing error return if writeback for extent buffer never started
    - ath9k_htc: Use appropriate rs_datalen type
    - usb: gadget: goku_udc: fix potential crashes in probe
    - gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
  ...

Changed in linux-aws (Ubuntu):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.4.0-1037.39

---------------
linux-aws (5.4.0-1037.39) focal; urgency=medium

  * focal/linux-aws: 5.4.0-1037.39 -proposed tracker (LP: #1911314)

  * aws: network performance regression due to initial TCP receive buffer size
    change (LP: #1910200)
    - tcp: select sane initial rcvq_space.space for big MSS

  * Disable Atari partition support for linux-aws (LP: #1908264)
    - [Config] Disable Atari partition support

  * aws: xen-netfront: prevent potential error on hibernate (LP: #1906850)
    - SAUCE: xen-netfront: prevent unnecessary close on hibernate

  [ Ubuntu: 5.4.0-63.71 ]

  * focal/linux: 5.4.0-63.71 -proposed tracker (LP: #1911333)
  * overlay: permission regression in 5.4.0-51.56 due to patches related to
    CVE-2020-16120 (LP: #1900141)
    - ovl: do not fail because of O_NOATIME
  * Focal update: v5.4.79 upstream stable release (LP: #1907151)
    - net/mlx5: Use async EQ setup cleanup helpers for multiple EQs
    - net/mlx5: poll cmd EQ in case of command timeout
    - net/mlx5: Fix a race when moving command interface to events mode
    - net/mlx5: Add retry mechanism to the command entry index allocation
  * Kernel 5.4.0-56 Wi-Fi does not connect (LP: #1906770)
    - mt76: fix fix ampdu locking
  * [Ubuntu 21.04 FEAT] mpt3sas: Request to include the patch set which supports
    topology where zoning is enabled in expander (LP: #1899802)
    - scsi: mpt3sas: Define hba_port structure
    - scsi: mpt3sas: Allocate memory for hba_port objects
    - scsi: mpt3sas: Rearrange _scsih_mark_responding_sas_device()
    - scsi: mpt3sas: Update hba_port's sas_address & phy_mask
    - scsi: mpt3sas: Get device objects using sas_address & portID
    - scsi: mpt3sas: Rename transport_del_phy_from_an_existing_port()
    - scsi: mpt3sas: Get sas_device objects using device's rphy
    - scsi: mpt3sas: Update hba_port objects after host reset
    - scsi: mpt3sas: Set valid PhysicalPort in SMPPassThrough
    - scsi: mpt3sas: Handling HBA vSES device
    - scsi: mpt3sas: Add bypass_dirty_port_flag parameter
    - scsi: mpt3sas: Handle vSES vphy object during HBA reset
    - scsi: mpt3sas: Add module parameter multipath_on_hba
    - scsi: mpt3sas: Bump driver version to 35.101.00.00

  [ Ubuntu: 5.4.0-62.70 ]

  * focal/linux: 5.4.0-62.70 -proposed tracker (LP: #1911144)
  * CVE-2020-28374
    - SAUCE: target: fix XCOPY NAA identifier lookup
  * Packaging resync (LP: #1786013)
    - update dkms package versions

 -- Kelsey Skunberg <email address hidden> Wed, 13 Jan 2021 19:01:10 -0700

Changed in linux-aws (Ubuntu Focal):
status: Fix Committed → Fix Released
Stefan Bader (smb) on 2021-02-25
Changed in linux-aws (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → Medium
Changed in linux-aws (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux-aws (Ubuntu Groovy):
importance: Undecided → Medium
Stefan Bader (smb) on 2021-03-10
Changed in linux-aws (Ubuntu Bionic):
status: Triaged → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andrea Righi (arighi) on 2021-04-07
tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.8.0-1028.30

---------------
linux-aws (5.8.0-1028.30) groovy; urgency=medium

  * groovy/linux-aws: 5.8.0-1028.30 -proposed tracker (LP: #1921043)

  * kernel: Enable CONFIG_BPF_LSM on Ubuntu (LP: #1905975)
    - [Config][aws] Enable CONFIG_BPF_LSM

  * Groovy update: upstream stable patchset 2021-03-05 (LP: #1917964)
    - [Config][aws] updateconfigs for USB_BDC_PCI

  * Enforce CONFIG_DRM_BOCHS=m (LP: #1916290)
    - [Config][aws] Enforce CONFIG_DRM_BOCHS=m

  * Groovy update: upstream stable patchset 2021-02-25 (LP: #1916960)
    - [Config][aws] updateconfigs for KPROBE_EVENTS_ON_NOTRACE

  * aws: update Xen hibernation patch set (LP: #1913410)
    - Revert "UBUNTU: SAUCE: xen: Update sched clock offset to avoid system
      instability in hibernation"
    - Revert "UBUNTU: SAUCE: xen: Introduce wrapper for save/restore sched clock
      offset"
    - Revert "UBUNTU: SAUCE: x86/xen: save and restore steal clock"
    - Revert "UBUNTU: SAUCE: xen/time: introduce xen_{save,restore}_steal_clock"
    - Revert "UBUNTU: SAUCE: xen-netfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: x86/xen: add system core suspend and resume
      callbacks"
    - Revert "UBUNTU: SAUCE: x86/xen: Introduce new function to map
      HYPERVISOR_shared_info on Resume"
    - Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
    - Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
    - SAUCE: xen/manage: keep track of the on-going suspend mode
    - SAUCE: xenbus: add freeze/thaw/restore callbacks support
    - SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on
      Resume
    - SAUCE: x86/xen: add system core suspend and resume callbacks
    - SAUCE: xen-netfront: add callbacks for PM suspend and hibernation support
    - SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
    - SAUCE: xen/time: introduce xen_{save,restore}_steal_clock
    - SAUCE: x86/xen: save and restore steal clock
    - SAUCE: xen: Introduce wrapper for save/restore sched clock offset
    - SAUCE: xen: Update sched clock offset to avoid system instability in
      hibernation
    - SAUCE: x86: tsc: avoid system instability in hibernation

  * aws: xen-netfront: prevent potential error on hibernate (LP: #1906850)
    - SAUCE: xen-netfront: prevent unnecessary close on hibernate

  [ Ubuntu: 5.8.0-49.55 ]

  * groovy/linux: 5.8.0-49.55 -proposed tracker (LP: #1921053)
  * selftests: bpf verifier fails after sanitize_ptr_alu fixes (LP: #1920995)
    - bpf: Simplify alu_limit masking for pointer arithmetic
    - bpf: Add sanity check for upper ptr_limit
    - bpf, selftests: Fix up some test_verifier cases for unprivileged
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * improper memcg accounting causes NULL pointer derefs (LP: #1918668)
    - SAUCE: Revert "mm: memcg/slab: optimize objcg stock draining"
  * kernel: Enable CONFIG_BPF_LSM on Ubuntu (LP: #1905975)
    - [Config] Enable CONFIG_BPF_LSM
  * Groovy u...

Changed in linux-aws (Ubuntu Groovy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 4.15.0-1098.105

---------------
linux-aws (4.15.0-1098.105) bionic; urgency=medium

  * bionic/linux-aws: 4.15.0-1098.105 -proposed tracker (LP: #1919513)

  * Enforce CONFIG_DRM_BOCHS=m (LP: #1916290)
    - [Config] aws: Add CONFIG_DRM_BOCHS=m (enforced)

  * Bionic update: upstream stable patchset 2021-02-26 (LP: #1917093)
    - [Config] aws: Updateconfigs for USB_BDC_PCI

  * Please trust Canonical Livepatch Service kmod signing key (LP: #1898716)
    - [Config] aws: enable CONFIG_MODVERSIONS=y
    - [Packaging] aws: build canonical-certs.pem from branch/arch certs

  * aws: update Xen hibernation patch set (LP: #1913410)
    - Revert "UBUNTU: SAUCE: xen-netfront: prevent unnecessary close on hibernate"
    - Revert "UBUNTU SAUCE [aws]: xen: Only restore the ACPI SCI interrupt in
      xen_restore_pirqs."
    - Revert "UBUNTU SAUCE [aws]: xen: restore pirqs on resume from hibernation."
    - Revert "UBUNTU SAUCE [aws]: block: xen-blkfront: consider new dom0 features
      on restore"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: close event channels for PIRQs in
      system core suspend callback"
    - Revert "UBUNTU: SAUCE [aws] xen/events: add xen_shutdown_pirqs helper
      function"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: save and restore steal clock"
    - Revert "UBUNTU: SAUCE [aws] xen-time-introduce-xen_-save-restore-
      _steal_clock"
    - Revert "UBUNTU: SAUCE [aws] xen-netfront: add callbacks for PM suspend and
      hibernation support"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: add system core suspend and resume
      callbacks"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: Introduce new function to map
      HYPERVISOR_shared_info on Resume"
    - Revert "UBUNTU: SAUCE: xen-blkfront: Fixed blkfront_restore to remove a call
      to negotiate_mq"
    - Revert "UBUNTU: SAUCE: xen-blkfront: resurrect request-based mode"
    - Revert "UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
    - Revert "UBUNTU: SAUCE: xen/manage: introduce helper function to know the on-
      going suspend mode"
    - Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
    - SAUCE: xen/manage: keep track of the on-going suspend mode
    - SAUCE: xen/manage: introduce helper function to know the on-going suspend
      mode
    - SAUCE: xenbus: add freeze/thaw/restore callbacks support
    - SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on
      Resume
    - SAUCE: x86/xen: add system core suspend and resume callbacks
    - SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
    - SAUCE: xen-netfront: add callbacks for PM suspend and hibernation support
    - SAUCE: xen/time: introduce xen_{save,restore}_steal_clock
    - SAUCE: x86/xen: save and restore steal clock
    - SAUCE: xen/events: add xen_shutdown_pirqs helper function
    - SAUCE: x86/xen: close event channels for PIRQs in system core suspend
      callback
    - SAUCE: xen-blkfront: resurrect request-based mode
    - SAUCE: xen-blkfront: add 'persistent_grants' parameter
    ...

Changed in linux-aws (Ubuntu Bionic):
status: Fix Committed → Fix Released