[linux-azure] CRI-RDOS | Live migration only takes 10 seconds, but the VM was unavailable for 2 hours

Bug #1837661 reported by Joseph Salisbury on 2019-07-23
This bug affects 1 person
Affects                 Status         Importance   Assigned to     Milestone
linux-azure (Ubuntu)    Fix Released   Undecided    Marcelo Cerri
Xenial                  Invalid        Undecided    Marcelo Cerri
Disco                   Fix Released   Undecided    Marcelo Cerri

Bug Description

Can you please pick up the following 4 patches? They resolve this live migration issue, which was reported by a mutual customer.

PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
PCI: hv: Add hv_pci_remove_slots() when we unload the driver
PCI: hv: Fix a memory leak in hv_eject_device_work()
PCI: hv: Fix a use-after-free bug in hv_eject_device_work()

This is a known issue in the Linux pci-hyperv driver, and these patches fix it.

Marcelo Cerri (mhcerri) on 2019-08-01
Changed in linux-azure (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Disco):
status: New → In Progress
Changed in linux-azure (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Disco):
assignee: nobody → Marcelo Cerri (mhcerri)
Joshua R. Poulson (jrp) wrote :

Joseph, I think we also need these on Xenial, is that correct?

Joshua R. Poulson (jrp) wrote :

I meant Bionic on the 5.0 kernel, not Xenial.

Marcelo Cerri (mhcerri) wrote :

I believe the question is indeed regarding xenial. Bionic will get the fixes via disco. But I would like to know whether they should also be backported to the 4.15 azure kernel in xenial.

Dexuan Cui (decui) wrote :

I guess you might have already included this patch:
    15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")

Unfortunately, it turns out to be buggy, and I have just had to post a further patch for it:
 [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
 (https://lkml.org/lkml/2019/8/1/1173)

Please consider including this further patch as well.

Marcelo Cerri (mhcerri) wrote :

Do they need to be applied to 4.15 as well?

Changed in linux-azure (Ubuntu Disco):
status: In Progress → Fix Committed
Marcelo Cerri (mhcerri) wrote :

Hi, Joe. Can you confirm if they need to be applied to 4.15 as well?

Marcelo Cerri (mhcerri) on 2019-08-14
Changed in linux-azure (Ubuntu Xenial):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

These commits are not needed in the 4.15 kernel. The only kernels that need the fixes are the ones that include the following commit:

a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
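
A rough way to apply this criterion to a local kernel git tree is sketched below. It is only an illustration under stated assumptions: the helper name `tree_has_subject` and its parameters are hypothetical, and it searches by commit subject rather than by hash because a distro tree may carry the commit as a cherry-pick under a different hash. Subjects are compared exactly, so commits that merely mention the subject in a `Fixes:` tag are not counted.

```python
import subprocess

# Subject of the commit named in the comment above.
SLOT_COMMIT_SUBJECT = "PCI: hv: support reporting serial number as slot information"


def tree_has_subject(subject, ref="HEAD", repo=".", run=subprocess.run):
    """Return True if a commit reachable from `ref` has exactly this subject.

    `run` is injectable so the helper can be exercised without a real
    repository; by default it is subprocess.run.
    """
    cmd = [
        "git", "-C", repo, "log",
        "--format=%s",            # print only the subject line of each match
        "--fixed-strings",        # treat the pattern as a literal string
        f"--grep={subject}",      # matches anywhere in the commit message
        ref, "--",
    ]
    result = run(cmd, capture_output=True, text=True, check=True)
    # --grep also matches message bodies, so keep exact subject matches only.
    return any(line == subject for line in result.stdout.splitlines())
```

Calling `tree_has_subject(SLOT_COMMIT_SUBJECT, repo="/path/to/kernel/tree")` (the path is a placeholder) returns True for trees that carry the slot-information commit and therefore need the fixes.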

Marcelo Cerri (mhcerri) on 2019-08-15
Changed in linux-azure (Ubuntu Xenial):
status: Incomplete → Opinion
status: Opinion → Invalid

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.0.0-1018.19

---------------
linux-azure (5.0.0-1018.19) disco; urgency=medium

  * disco/linux-azure: 5.0.0-1018.19 -proposed tracker (LP: #1840803)

  [ Ubuntu: 5.0.0-27.28 ]

  * disco/linux: 5.0.0-27.28 -proposed tracker (LP: #1840816)
  * [Potential Regression] System crashes when running ftrace test in
    ubuntu_kernel_selftests (LP: #1840750)
    - x86/kprobes: Set instruction page as executable

linux-azure (5.0.0-1017.18) disco; urgency=medium

  * disco/linux-azure: 5.0.0-1017.18 -proposed tracker (LP: #1840326)

  * [linux-azure] Important InfiniBand patches for Ubuntu 18.04 (LP: #1839673)
    - SAUCE: Don't wait in hvnd_query_gid after interface is already bound to ND
    - SAUCE: Expose extended attributes for user IB verbs QUERY_DEVICE, CREATE_CQ
      and CREATE_QP

  * [linux-azure] CRI-RDOS | Live migration only takes 10 seconds, but the VM
    was unavailable for 2 hours (LP: #1837661)
    - PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
    - SAUCE: PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

  [ Ubuntu: 5.0.0-26.27 ]

  * disco/linux: 5.0.0-26.27 -proposed tracker (LP: #1839972)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * alsa/hdmi: add icelake hdmi audio support for a Dell machine (LP: #1836916)
    - ALSA: hda: hdmi - add Icelake support
    - ALSA: hda/hdmi - Remove duplicated define
    - ALSA: hda/hdmi - Fix i915 reverse port/pin mapping
  * input/mouse: alps trackpoint-only device doesn't work (LP: #1836752)
    - Input: alps - don't handle ALPS cs19 trackpoint-only device
    - Input: alps - fix a mismatch between a condition check and its comment
  * [18.04 FEAT] Enhanced hardware support (LP: #1836857)
    - s390: report new CPU capabilities
    - s390: add alignment hints to vector load and store
  * System does not auto detect disconnection of external monitor (LP: #1835001)
    - drm/i915: Add support for retrying hotplug
    - drm/i915: Enable hotplug retry
  * [18.04 FEAT] Enhanced CPU-MF hardware counters - kernel part (LP: #1836860)
    - s390/cpum_cf: Add support for CPU-MF SVN 6
    - s390/cpumf: Add extended counter set definitions for model 8561 and 8562
  * EeePC 1005px laptop backlight is off after system boot up (LP: #1837117)
    - platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from
      asus_nb_wmi
  * br_netfilter: namespace sysctl operations (LP: #1836910)
    - netfilter: bridge: port sysctls to use brnf_net
    - netfilter: bridge: namespace bridge netfilter sysctls
    - netfilter: bridge: prevent UAF in brnf_exit_net()
  * ideapad_laptop disables WiFi/BT radios on Lenovo Y540 (LP: #1837136)
    - platform/x86: ideapad-laptop: Remove no_hw_rfkill_list
  * shiftfs: allow overlayfs (LP: #1838677)
    - SAUCE: shiftfs: enable overlayfs on shiftfs
  * bcache: bch_allocator_thread(): hung task timeout (LP: #1784665)
    - bcache: never writeback a discard operation
    - bcache: improve bcache_reboot()
    - SAUCE: bcache: fix deadlock in bcache_allocator
  * Regressions in CMA allocation rework (LP: #1839395)
    - dma-contiguous: do not overwrite align in ...

Changed in linux-azure (Ubuntu Disco):
status: Fix Committed → Fix Released
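
For anyone checking whether a running 5.0 linux-azure kernel already carries these fixes, a minimal sketch follows. It assumes the kernel release string (as printed by `uname -r`) has the form `5.0.0-<ABI>-azure`, and it takes ABI 1017 as the first fixed build because the changelog above lists LP: #1837661 under 5.0.0-1017.18. The function name `has_lp1837661_fix` is hypothetical.

```python
import re

# The changelog lists the LP: #1837661 patches under linux-azure 5.0.0-1017.18.
FIRST_FIXED_ABI = 1017

# Assumed release format for the 5.0 azure kernels, e.g. "5.0.0-1018-azure".
_RELEASE_RE = re.compile(r"^5\.0\.0-(\d+)-azure$")


def has_lp1837661_fix(release):
    """Return True/False for 5.0.0 azure kernels, None for anything else.

    None means the release string is outside the series this bug tracks
    (for example a 4.15 kernel, which was reported as not needing the fixes).
    """
    match = _RELEASE_RE.match(release)
    if match is None:
        return None
    return int(match.group(1)) >= FIRST_FIXED_ABI
```

On a live system the argument would typically come from `platform.release()` in the standard library.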
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.0.0-1018.19~18.04.1

---------------
linux-azure (5.0.0-1018.19~18.04.1) bionic; urgency=medium

  * bionic/linux-azure: 5.0.0-1018.19~18.04.1 -proposed tracker (LP: #1840802)

  [ Ubuntu: 5.0.0-1018.19 ]

  * disco/linux-azure: 5.0.0-1018.19 -proposed tracker (LP: #1840803)
  * disco/linux: 5.0.0-27.28 -proposed tracker (LP: #1840816)
  * [Potential Regression] System crashes when running ftrace test in
    ubuntu_kernel_selftests (LP: #1840750)
    - x86/kprobes: Set instruction page as executable

linux-azure (5.0.0-1017.18~18.04.1) bionic; urgency=medium

  * bionic/linux-azure: 5.0.0-1017.18~18.04.1 -proposed tracker (LP: #1840324)

  [ Ubuntu: 5.0.0-1017.18 ]

  * disco/linux-azure: 5.0.0-1017.18 -proposed tracker (LP: #1840326)
  * [linux-azure] Important InfiniBand patches for Ubuntu 18.04 (LP: #1839673)
    - SAUCE: Don't wait in hvnd_query_gid after interface is already bound to ND
    - SAUCE: Expose extended attributes for user IB verbs QUERY_DEVICE, CREATE_CQ
      and CREATE_QP
  * [linux-azure] CRI-RDOS | Live migration only takes 10 seconds, but the VM
    was unavailable for 2 hours (LP: #1837661)
    - PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
    - SAUCE: PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
  * disco/linux: 5.0.0-26.27 -proposed tracker (LP: #1839972)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * alsa/hdmi: add icelake hdmi audio support for a Dell machine (LP: #1836916)
    - ALSA: hda: hdmi - add Icelake support
    - ALSA: hda/hdmi - Remove duplicated define
    - ALSA: hda/hdmi - Fix i915 reverse port/pin mapping
  * input/mouse: alps trackpoint-only device doesn't work (LP: #1836752)
    - Input: alps - don't handle ALPS cs19 trackpoint-only device
    - Input: alps - fix a mismatch between a condition check and its comment
  * [18.04 FEAT] Enhanced hardware support (LP: #1836857)
    - s390: report new CPU capabilities
    - s390: add alignment hints to vector load and store
  * System does not auto detect disconnection of external monitor (LP: #1835001)
    - drm/i915: Add support for retrying hotplug
    - drm/i915: Enable hotplug retry
  * [18.04 FEAT] Enhanced CPU-MF hardware counters - kernel part (LP: #1836860)
    - s390/cpum_cf: Add support for CPU-MF SVN 6
    - s390/cpumf: Add extended counter set definitions for model 8561 and 8562
  * EeePC 1005px laptop backlight is off after system boot up (LP: #1837117)
    - platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from
      asus_nb_wmi
  * br_netfilter: namespace sysctl operations (LP: #1836910)
    - netfilter: bridge: port sysctls to use brnf_net
    - netfilter: bridge: namespace bridge netfilter sysctls
    - netfilter: bridge: prevent UAF in brnf_exit_net()
  * ideapad_laptop disables WiFi/BT radios on Lenovo Y540 (LP: #1837136)
    - platform/x86: ideapad-laptop: Remove no_hw_rfkill_list
  * shiftfs: allow overlayfs (LP: #1838677)
    - SAUCE: shiftfs: enable overlayfs on shiftfs
  * bcache: bch_allocator_thread(): hung task timeout (LP: #1784665)
    - bcache: never writeback a discard operation
    - bcac...

Changed in linux-azure (Ubuntu):
status: New → Fix Released