[linux-azure] CRI-RDOS | Live migration only takes 10 seconds, but the VM was unavailable for 2 hours

Bug #1837661 reported by Joseph Salisbury
Affects               Status        Importance  Assigned to    Milestone
linux-azure (Ubuntu)  Fix Released  Undecided   Marcelo Cerri
Xenial                Invalid       Undecided   Marcelo Cerri
Disco                 Fix Released  Undecided   Marcelo Cerri

Bug Description

Can you please pick up the following 4 patches? They resolve this live migration issue, which was reported by a mutual customer.

PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
PCI: hv: Add hv_pci_remove_slots() when we unload the driver
PCI: hv: Fix a memory leak in hv_eject_device_work()
PCI: hv: Fix a use-after-free bug in hv_eject_device_work()

This is a known issue in the Linux pci-hyperv driver and is fixed by these patches.

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Disco):
status: New → In Progress
Changed in linux-azure (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Disco):
assignee: nobody → Marcelo Cerri (mhcerri)
Joshua R. Poulson (jrp) wrote :

Joseph, I think we also need these on Xenial, is that correct?

Joshua R. Poulson (jrp) wrote :

I meant Bionic on the 5.0 kernel, not Xenial.

Marcelo Cerri (mhcerri) wrote :

I believe the question is indeed about Xenial. Bionic will get the fixes via Disco, but I would like to know whether they should also be backported to the 4.15 azure kernel in Xenial.

Dexuan Cui (decui) wrote :

I guess you might have already included this patch:
    15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")

Unfortunately, it turns out to be buggy, and I just had to post a further patch for it:
 [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
 (https://lkml.org/lkml/2019/8/1/1173)

Please consider including this further patch as well.

Marcelo Cerri (mhcerri) wrote :

Do they need to be applied to 4.15 as well?

Marcelo Cerri (mhcerri) wrote :
Changed in linux-azure (Ubuntu Disco):
status: In Progress → Fix Committed
Marcelo Cerri (mhcerri) wrote :

Hi, Joe. Can you confirm if they need to be applied to 4.15 as well?

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Xenial):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

These commits are not needed in the 4.15 kernel. The only kernels that need the fixes are the ones that include the following commit:

a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
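
The rule in this comment can be sketched as a small check. This is only an illustration: the function name and the idea of matching on the commit subject rather than the hash are assumptions (a backported commit usually has a different hash in a distro tree), and the list of subjects is assumed to come from something like `git log --format=%s`.

```python
# Subject line of the commit named in the comment above (a15f2c08c708).
TRIGGER_SUBJECT = "PCI: hv: support reporting serial number as slot information"


def needs_pci_hv_fixes(commit_subjects):
    """Return True if the tree carries the commit named in the thread.

    `commit_subjects` is any iterable of one-line commit subjects
    (hypothetical input; how they are gathered is up to the caller).
    """
    wanted = TRIGGER_SUBJECT.lower()
    # Match on the suffix so distro prefixes such as "UBUNTU: SAUCE: "
    # in front of the upstream subject do not hide the commit.
    return any(s.strip().lower().endswith(wanted) for s in commit_subjects)
```

Under that rule, a 4.15 tree whose history does not contain this subject would not need the four patches, which matches the conclusion reached for Xenial.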

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Xenial):
status: Incomplete → Opinion
status: Opinion → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
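
The tag change the bot asks for can be modelled as below. This is a minimal sketch of the naming convention described in the message only; the function is hypothetical and does not talk to Launchpad.

```python
def update_verification_tag(tags, series, fixed):
    """Return a new tag list with the verification tag for `series` updated.

    'verification-needed-<series>' becomes 'verification-done-<series>'
    when the -proposed kernel fixes the problem, and
    'verification-failed-<series>' when it does not.
    """
    needed = f"verification-needed-{series}"
    outcome = f"verification-{'done' if fixed else 'failed'}-{series}"
    if needed not in tags:
        raise ValueError(f"{needed!r} is not set on this bug")
    # Replace only the matching tag; tags for other series are left alone.
    return [outcome if t == needed else t for t in tags]
```

For example, calling it with `series="bionic"` and `fixed=True` on a bug tagged for both bionic and disco changes only the bionic tag.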
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.0.0-1018.19

---------------
linux-azure (5.0.0-1018.19) disco; urgency=medium

  * disco/linux-azure: 5.0.0-1018.19 -proposed tracker (LP: #1840803)

  [ Ubuntu: 5.0.0-27.28 ]

  * disco/linux: 5.0.0-27.28 -proposed tracker (LP: #1840816)
  * [Potential Regression] System crashes when running ftrace test in
    ubuntu_kernel_selftests (LP: #1840750)
    - x86/kprobes: Set instruction page as executable

linux-azure (5.0.0-1017.18) disco; urgency=medium

  * disco/linux-azure: 5.0.0-1017.18 -proposed tracker (LP: #1840326)

  * [linux-azure] Important InfiniBand patches for Ubuntu 18.04 (LP: #1839673)
    - SAUCE: Don't wait in hvnd_query_gid after interface is already bound to ND
    - SAUCE: Expose extended attributes for user IB verbs QUERY_DEVICE, CREATE_CQ
      and CREATE_QP

  * [linux-azure] CRI-RDOS | Live migration only takes 10 seconds, but the VM
    was unavailable for 2 hours (LP: #1837661)
    - PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
    - SAUCE: PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

  [ Ubuntu: 5.0.0-26.27 ]

  * disco/linux: 5.0.0-26.27 -proposed tracker (LP: #1839972)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * alsa/hdmi: add icelake hdmi audio support for a Dell machine (LP: #1836916)
    - ALSA: hda: hdmi - add Icelake support
    - ALSA: hda/hdmi - Remove duplicated define
    - ALSA: hda/hdmi - Fix i915 reverse port/pin mapping
  * input/mouse: alps trackpoint-only device doesn't work (LP: #1836752)
    - Input: alps - don't handle ALPS cs19 trackpoint-only device
    - Input: alps - fix a mismatch between a condition check and its comment
  * [18.04 FEAT] Enhanced hardware support (LP: #1836857)
    - s390: report new CPU capabilities
    - s390: add alignment hints to vector load and store
  * System does not auto detect disconnection of external monitor (LP: #1835001)
    - drm/i915: Add support for retrying hotplug
    - drm/i915: Enable hotplug retry
  * [18.04 FEAT] Enhanced CPU-MF hardware counters - kernel part (LP: #1836860)
    - s390/cpum_cf: Add support for CPU-MF SVN 6
    - s390/cpumf: Add extended counter set definitions for model 8561 and 8562
  * EeePC 1005px laptop backlight is off after system boot up (LP: #1837117)
    - platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from
      asus_nb_wmi
  * br_netfilter: namespace sysctl operations (LP: #1836910)
    - netfilter: bridge: port sysctls to use brnf_net
    - netfilter: bridge: namespace bridge netfilter sysctls
    - netfilter: bridge: prevent UAF in brnf_exit_net()
  * ideapad_laptop disables WiFi/BT radios on Lenovo Y540 (LP: #1837136)
    - platform/x86: ideapad-laptop: Remove no_hw_rfkill_list
  * shiftfs: allow overlayfs (LP: #1838677)
    - SAUCE: shiftfs: enable overlayfs on shiftfs
  * bcache: bch_allocator_thread(): hung task timeout (LP: #1784665)
    - bcache: never writeback a discard operation
    - bcache: improve bcache_reboot()
    - SAUCE: bcache: fix deadlock in bcache_allocator
  * Regressions in CMA allocation rework (LP: #1839395)
    - dma-contiguous: do not overwrite align in ...
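
To see which commits a changelog like the one above attributes to a given bug, the entries can be scanned for the `(LP: #N)` marker. The following is a rough sketch that assumes the layout shown here (`* ` for entries, `- ` for commits, wrapped lines indented further); it is not a full Debian changelog parser, and the function name is made up for illustration.

```python
def commits_for_bug(changelog, bug):
    """Collect the '- ' commit lines listed under the '* ' entry citing LP: #bug."""
    marker = f"(LP: #{bug})"
    results = []
    title = ""          # current '* ' entry text, joined across wrapped lines
    commits = []        # commit subjects found under the current entry
    in_commits = False  # True once the current entry has at least one '- ' line

    def flush():
        # Keep the commits only if the finished entry cites the wanted bug.
        if marker in title:
            results.extend(commits)

    for raw in changelog.splitlines():
        line = raw.strip()
        if line.startswith("* "):
            flush()
            title, commits, in_commits = line[2:], [], False
        elif line.startswith("- "):
            commits.append(line[2:])
            in_commits = True
        elif not line or line.startswith("["):
            # Blank lines and '[ Ubuntu: ... ]' headers end the current entry.
            flush()
            title, commits, in_commits = "", [], False
        elif in_commits:
            commits[-1] += " " + line  # wrapped commit subject
        else:
            title += " " + line        # wrapped entry title
    flush()
    return results
```

Run against the 5.0.0-1017.18 section with `bug=1837661`, this would return the two subjects listed under this bug's entry: the use-after-free fix and the SAUCE panic fix.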

Changed in linux-azure (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.0.0-1018.19~18.04.1

---------------
linux-azure (5.0.0-1018.19~18.04.1) bionic; urgency=medium

  * bionic/linux-azure: 5.0.0-1018.19~18.04.1 -proposed tracker (LP: #1840802)

  [ Ubuntu: 5.0.0-1018.19 ]

  * disco/linux-azure: 5.0.0-1018.19 -proposed tracker (LP: #1840803)
  * disco/linux: 5.0.0-27.28 -proposed tracker (LP: #1840816)
  * [Potential Regression] System crashes when running ftrace test in
    ubuntu_kernel_selftests (LP: #1840750)
    - x86/kprobes: Set instruction page as executable

linux-azure (5.0.0-1017.18~18.04.1) bionic; urgency=medium

  * bionic/linux-azure: 5.0.0-1017.18~18.04.1 -proposed tracker (LP: #1840324)

  [ Ubuntu: 5.0.0-1017.18 ]

  * disco/linux-azure: 5.0.0-1017.18 -proposed tracker (LP: #1840326)
  * [linux-azure] Important InfiniBand patches for Ubuntu 18.04 (LP: #1839673)
    - SAUCE: Don't wait in hvnd_query_gid after interface is already bound to ND
    - SAUCE: Expose extended attributes for user IB verbs QUERY_DEVICE, CREATE_CQ
      and CREATE_QP
  * [linux-azure] CRI-RDOS | Live migration only takes 10 seconds, but the VM
    was unavailable for 2 hours (LP: #1837661)
    - PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
    - SAUCE: PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
  * disco/linux: 5.0.0-26.27 -proposed tracker (LP: #1839972)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * alsa/hdmi: add icelake hdmi audio support for a Dell machine (LP: #1836916)
    - ALSA: hda: hdmi - add Icelake support
    - ALSA: hda/hdmi - Remove duplicated define
    - ALSA: hda/hdmi - Fix i915 reverse port/pin mapping
  * input/mouse: alps trackpoint-only device doesn't work (LP: #1836752)
    - Input: alps - don't handle ALPS cs19 trackpoint-only device
    - Input: alps - fix a mismatch between a condition check and its comment
  * [18.04 FEAT] Enhanced hardware support (LP: #1836857)
    - s390: report new CPU capabilities
    - s390: add alignment hints to vector load and store
  * System does not auto detect disconnection of external monitor (LP: #1835001)
    - drm/i915: Add support for retrying hotplug
    - drm/i915: Enable hotplug retry
  * [18.04 FEAT] Enhanced CPU-MF hardware counters - kernel part (LP: #1836860)
    - s390/cpum_cf: Add support for CPU-MF SVN 6
    - s390/cpumf: Add extended counter set definitions for model 8561 and 8562
  * EeePC 1005px laptop backlight is off after system boot up (LP: #1837117)
    - platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from
      asus_nb_wmi
  * br_netfilter: namespace sysctl operations (LP: #1836910)
    - netfilter: bridge: port sysctls to use brnf_net
    - netfilter: bridge: namespace bridge netfilter sysctls
    - netfilter: bridge: prevent UAF in brnf_exit_net()
  * ideapad_laptop disables WiFi/BT radios on Lenovo Y540 (LP: #1837136)
    - platform/x86: ideapad-laptop: Remove no_hw_rfkill_list
  * shiftfs: allow overlayfs (LP: #1838677)
    - SAUCE: shiftfs: enable overlayfs on shiftfs
  * bcache: bch_allocator_thread(): hung task timeout (LP: #1784665)
    - bcache: never writeback a discard operation
    - bcac...

Changed in linux-azure (Ubuntu):
status: New → Fix Released