Stopping and starting KVM partitions results in guest kernel softlockup

Bug #1427075 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Fix Released   Medium       Chris J Arges

Bug Description

[Impact]
Stop/continue via qemu monitor can easily trigger soft-lockups due to incorrect VM timekeeping.

[Test Case]
On a single CPU KVM guest:
yes > /dev/null &

Followed by
(qemu) stop

Wait a while, then:
(qemu) cont

[Fix]
The following upstream commits:
commit 545a2bf742fb41f17d03486dd8a8c74ad511dec2
commit 4be1b29795d692d512bb67b770665d6f8ea5cb0b

--

== Comment: #0 - Cyril Bur <email address hidden> - 2015-02-23 18:03:41 ==
+++ This bug was initially created as a clone of Bug #108455 +++

I was investigating the cause of some ppc64le KVM guest softlockup warnings. On a single CPU KVM guest, I ran something to keep the guest busy:

yes > /dev/null &

Followed by

(qemu) stop

Wait a while, then:

(qemu) cont

We get a softlockup error:

BUG: soft lockup - CPU#0 stuck for 9220s! [yes:2389]

.__getnstimeofday
.getnstimeofday
.ktime_get_real
.netif_receive_skb
.ibmveth_poll
.net_rx_action
.__do_softirq
.irq_exit
.__do_irq
.call_do_irq
.do_IRQ

I was going to file it away in the "don't do that" bin, but I notice x86 has something to detect a paused VM and avoid spewing the soft lockup error. Do we need something like this on ppc64?

commit 5d1c0f4a80a6df73395fb3fc2c302510f8f09d36
Author: Eric B Munson <email address hidden>
Date: Sat Mar 10 14:37:28 2012 -0500

    watchdog: add check for suspended vm in softlockup detector

    A suspended VM can cause spurious soft lockup warnings. To avoid these, the
    watchdog now checks if the kernel knows it was stopped by the host and skips
    the warning if so. When the watchdog is reset successfully, clear the guest
    paused flag.
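
For readers unfamiliar with the x86 approach, the check described in that commit message can be modelled roughly as follows. This is a toy Python simulation, not kernel code: the `Watchdog` class, the `guest_paused` attribute, and the 20-second threshold are all illustrative assumptions.

```python
class Watchdog:
    """Toy model of a soft lockup detector (illustrative only, not kernel code)."""

    def __init__(self, threshold_s=20):
        # Assumed threshold; the real value is configurable in the kernel.
        self.threshold_s = threshold_s
        self.last_touch = 0.0
        # Hypothetical flag: set when the host tells the guest it was stopped.
        self.guest_paused = False

    def touch(self, now):
        """Reset the watchdog and clear the paused flag."""
        self.last_touch = now
        self.guest_paused = False

    def check(self, now):
        """Return True if a soft lockup warning should be printed."""
        if now - self.last_touch <= self.threshold_s:
            return False
        if self.guest_paused:
            # The gap was caused by the host stopping the VM, not by a stuck
            # task: reset the watchdog and skip the warning.
            self.touch(now)
            return False
        return True
```

In this model, a 9220-second gap with `guest_paused` set produces no warning, while the same gap without the flag does.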

== Comment: #1 - Cyril Bur <email address hidden> - 2015-02-23 18:03:55 ==
Hi,

I have been working on a fix for guest kernels. This requires two patches:

1/2

commit 545a2bf742fb41f17d03486dd8a8c74ad511dec2
Author: Cyril Bur <email address hidden>
Date: Thu Feb 12 15:01:24 2015 -0800

    kernel/sched/clock.c: add another clock for use with the soft lockup watchdog

and 2/2

commit 4be1b29795d692d512bb67b770665d6f8ea5cb0b
Author: Cyril Bur <email address hidden>
Date: Thu Feb 12 15:01:28 2015 -0800

    powerpc: add running_clock for powerpc to prevent spurious softlockup warnings

Both are upstream.
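
The idea suggested by the two patch titles, a clock that only advances while the guest is actually running, can be sketched with a small simulation. This is a hedged illustration rather than the patches' implementation: `GuestClocks`, `stalled`, and the notion of "stolen" nanoseconds accumulated during a pause are assumptions made for the example.

```python
NS_PER_S = 1_000_000_000


class GuestClocks:
    """Toy model of two guest clocks (illustrative only, not kernel code)."""

    def __init__(self):
        self.wall_ns = 0    # time as seen by an ordinary clock
        self.stolen_ns = 0  # time during which the guest was not running

    def run(self, ns):
        """The guest executes for ns nanoseconds."""
        self.wall_ns += ns

    def pause(self, ns):
        """The host stops the guest for ns nanoseconds."""
        self.wall_ns += ns
        self.stolen_ns += ns

    def local_clock(self):
        """Clock that keeps counting across a pause."""
        return self.wall_ns

    def running_clock(self):
        """Clock that excludes time the guest spent stopped."""
        return self.wall_ns - self.stolen_ns


def stalled(clock_fn, last_touch_ns, threshold_ns):
    """Return True if the watchdog would see a stall using clock_fn."""
    return clock_fn() - last_touch_ns > threshold_ns


clocks = GuestClocks()
clocks.run(5 * NS_PER_S)
touch_local = clocks.local_clock()
touch_running = clocks.running_clock()
clocks.pause(9220 * NS_PER_S)  # (qemu) stop ... (qemu) cont
clocks.run(1 * NS_PER_S)
print(stalled(clocks.local_clock, touch_local, 20 * NS_PER_S))
print(stalled(clocks.running_clock, touch_running, 20 * NS_PER_S))
```

With a watchdog timestamp taken from the pause-aware clock, the stop/cont gap does not count toward the lockup threshold, so no guest-side flag is needed in this model.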

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-122013 severity-medium targetmilestone-inin1504
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1427075/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Luciano Chavez (lnx1138)
affects: ubuntu → linux (Ubuntu)
Chris J Arges (arges)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
assignee: nobody → Chris J Arges (arges)
Chris J Arges (arges) wrote :

Does this fix need to be applied to 3.16 as well? Or is backporting to Vivid/3.19 sufficient?

description: updated
Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-12.12

---------------
linux (3.19.0-12.12) vivid; urgency=low

  [ Andy Whitcroft ]

  * [Packaging] do_common_tools should always be on
  * [Packaging] Provides: virtualbox-guest-modules when appropriate
    - LP: #1434579

  [ Chris J Arges ]

  * Revert "SAUCE: ext4: disable ext4_punch_hole for indirect filesystems"
    - LP: #1292234

  [ Leann Ogasawara ]

  * Release Tracking Bug
    - LP: #1439803

  [ Timo Aaltonen ]

  * SAUCE: i915_bpo: Provide a backport driver for Skylake & Cherryview
    graphics
    - LP: #1420774
  * SAUCE: i915_bpo: Update intel_ips.h file location
    - LP: #1420774
  * SAUCE: i915_bpo: Only support Skylake and Cherryview with the backport
    driver
    - LP: #1420774
  * SAUCE: i915_bpo: Rename the backport driver to i915_bpo
    - LP: #1420774
  * i915_bpo: [Config] Enable CONFIG_DRM_I915_BPO=m
    - LP: #1420774
  * SAUCE: i915_bpo: Add i915_bpo_*() calls for ubuntu/i915
    - LP: #1420774
  * SAUCE: i915_bpo: Revert "drm/i915: remove unused
    power_well/get_cdclk_freq api"
    - LP: #1420774
  * SAUCE: i915_bpo: Add i915_bpo specific power well calls
    - LP: #1420774
  * SAUCE: Backport I915_PARAM_MMAP_VERSION and I915_MMAP_WC
    - LP: #1420774
  * SAUCE: Partial backport of drm/i915: Add ioctl to set per-context
    parameters
    - LP: #1420774
  * SAUCE: drm/i915: Specify bsd rings through exec flag
    - LP: #1420774
  * SAUCE: drm/i915: add I915_PARAM_HAS_BSD2 to i915_getparam
    - LP: #1420774
  * SAUCE: drm/i915: add component support
    - LP: #1420774
  * SAUCE: drm/i915: Add tiled framebuffer modifiers
    - LP: #1420774
  * SAUCE: Backport new displayable tiling formats
    - LP: #1420774
  * SAUCE: Backport drm_crtc_vblank_reset() helper
    - LP: #1420774
  * SAUCE: drm/i915: Add I915_PARAM_REVISION
    - LP: #1420774
  * SAUCE: drm/i915: Export total subslice and EU counts
    - LP: #1420774
  * SAUCE: i915_bpo: Revert drm/mm: Support 4 GiB and larger ranges
    - LP: #1420774

  [ Upstream Kernel Changes ]

  * drm/i915/skl: Split the SKL PCI ids by GT
    - LP: #1420774
  * drm: Reorganize probed mode validation
    - LP: #1420774
  * drm: Perform basic sanity checks on probed modes
    - LP: #1420774
  * drm: Do basic sanity checks for user modes
    - LP: #1420774
  * drm/atomic-helper: Export both plane and modeset check helpers
    - LP: #1420774
  * drm/atomic-helper: Again check modeset *before* plane states
    - LP: #1420774
  * drm/atomic: Introduce state->obj backpointers
    - LP: #1420774
  * drm: allow property validation for refcnted props
    - LP: #1420774
  * drm: store property instead of id in obj attachment
    - LP: #1420774
  * drm: get rid of direct property value access
    - LP: #1420774
  * drm: add atomic_set_property wrappers
    - LP: #1420774
  * drm: tweak getconnector locking
    - LP: #1420774
  * drm: add atomic_get_property
    - LP: #1420774
  * drm: Remove unneeded braces for single statement blocks
    - LP: #1420774
  * drm: refactor getproperties/getconnector
    - LP: #1420774
  * drm: add atomic properties
    - LP: #1420774
  * drm/atomic: atomic_check functions
    - LP: #1420774
  * drm: s...


Changed in linux (Ubuntu):
status: Fix Committed → Fix Released