cpuset for libvirt set to 0 after suspend/resume

Bug #993354 reported by Seth Jennings
This bug affects 2 people
Affects           Status        Importance  Assigned to  Milestone
libvirt (Fedora)  Fix Released  Undecided
libvirt (Ubuntu)  Fix Released  Low         Unassigned
linux (Ubuntu)    Fix Released  Medium      Unassigned

Bug Description

Starting with 12.04 (as far as I can tell), guest KVM processes are put into their own cgroup, whose cpuset is exposed at /sys/fs/cgroup/cpuset/libvirt/qemu/<vm_name>/cpuset.cpus

On a clean boot, cpuset.cpus for the libvirt cgroup contains all CPUs. However, after a suspend/resume, cpuset.cpus for libvirt and all of its children is 0. This effectively pins every vCPU of the guest to CPU 0 on the host, causing massive performance problems.
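To check whether a host is affected, each cpuset under the libvirt hierarchy can be compared with the host's online CPUs. The sketch below is a hypothetical Python helper, not part of libvirt; it assumes the cgroup v1 layout quoted above and the standard /sys/devices/system/cpu/online file. A VM that was deliberately pinned to a subset of CPUs will also be reported, so treat the output as a hint rather than a verdict.

```python
import os


def parse_cpulist(text):
    """Parse a kernel CPU list such as "0-3,6" into a set of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty cpuset.cpus file means "no CPUs"
        if "-" in chunk:
            first, last = chunk.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def shrunken_cpusets(root="/sys/fs/cgroup/cpuset/libvirt"):
    """Yield (cgroup_dir, sorted_cpus) for each cgroup under root whose
    cpuset.cpus is a strict subset of the host's online CPUs."""
    with open("/sys/devices/system/cpu/online") as f:
        online = parse_cpulist(f.read())
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cpuset.cpus" not in filenames:
            continue
        with open(os.path.join(dirpath, "cpuset.cpus")) as f:
            cpus = parse_cpulist(f.read())
        if cpus < online:
            yield dirpath, sorted(cpus)
```

For example, `for path, cpus in shrunken_cpusets(): print(path, cpus)` would be expected to list the libvirt cgroups with `[0]` after an affected resume.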

There is already a bug that Red Hat is tracking for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=714271

Expected:
The cpuset for libvirt and its children remains unchanged across a suspend/resume

What happens:
The cpuset for libvirt and its children gets set to 0

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: libvirt-bin 0.9.8-2ubuntu17
ProcVersionSignature: Ubuntu 3.2.0-23.36-generic 3.2.14
Uname: Linux 3.2.0-23-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.0.1-0ubuntu5
Architecture: amd64
Date: Wed May 2 10:22:31 2012
InstallationMedia: Xubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted]

Seth Jennings (spartacus06) wrote :
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug. I've linked the Red Hat bug and marked this one Confirmed based on that. I've also marked it as affecting the kernel, since there has been talk of a fix in the kernel (per the Red Hat bug).

I'm going to mark this low priority because there is a workaround.

Changed in libvirt (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Serge Hallyn (serge-hallyn) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 993354

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Seth Jennings (spartacus06) wrote :

apport-collect 993354 not working:
Package libvirt not installed and no hook available, ignoring
No packages found matching linux.

However, the problem doesn't need diagnosing, as the cause is known. It just requires fixing.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.4 kernel [1] (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag: 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc4-precise/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Seth Jennings (spartacus06) wrote :

The problem exists in upstream as of 3.4-rc4.

tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Seth Jennings (spartacus06) wrote :

Some relevant commits from the kernel git log:

8f2f748b0656257153bcf0941df8d6060acc5ca6 CPU hotplug, cpusets, suspend: Don't touch cpusets during suspend/resume
4293f20c19f44ca66e5ac836b411d25e14b9f185 Revert "CPU hotplug, cpusets, suspend: Don't touch cpusets during suspend/resume"

$ git describe 8f2f748b0656257153bcf0941df8d6060acc5ca6
v3.3-rc4-36-g8f2f748

$ git describe 4293f20c19f44ca66e5ac836b411d25e14b9f185
v3.3-rc6-147-g4293f20

So it comes as no surprise that the bug still exists in upstream. The commit that was supposed to fix this was reverted due to side effects (see comment #3).
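The version reasoning above relies on reading `git describe` output, which has the form <nearest tag>-<commits since that tag>-g<abbreviated hash>. A small hypothetical Python helper makes the parts explicit:

```python
import re

# Long-form `git describe` output: <tag>-<commits since tag>-g<abbrev. hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split a long-form `git describe` string into (tag, distance, sha)."""
    m = DESCRIBE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a long-form git describe string: {text!r}")
    return m.group("tag"), int(m.group("distance")), m.group("sha")
```

Applied to the two outputs above, it yields ("v3.3-rc4", 36, "8f2f748") for the fix and ("v3.3-rc6", 147, "4293f20") for the revert. Both descend from 3.3 release candidates, which is consistent with the revert already being present in the v3.4-rc4 kernel tested earlier. The nearest tag is not necessarily the first release containing a commit, though; `git tag --contains <commit>` answers that question directly.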

The cpuset.sh script in this thread works as a workaround (for me at least):
https://www.redhat.com/archives/libvir-list/2012-April/msg00777.html

However, this is just a hack to get around a deficiency in the kernel. The kernel shouldn't mangle the cpusets like this.
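The linked cpuset.sh is not reproduced here, but the general idea of such a workaround can be sketched: record every cpuset.cpus under the libvirt hierarchy before suspend and write the values back after resume. This is a hypothetical Python stand-in, not the script from the thread. It assumes the cgroup v1 path quoted in this report, pm-utils' hook convention of passing the sleep phase (suspend/hibernate, later resume/thaw) as the first argument, and a made-up state file location.

```python
import json
import os
import sys

# Assumed cgroup path (from this report) and a hypothetical state file.
CPUSET_ROOT = "/sys/fs/cgroup/cpuset/libvirt"
STATE_FILE = "/var/run/libvirt-cpuset.json"


def depth(relpath):
    """Nesting depth of a cgroup relative to CPUSET_ROOT ("." is the root)."""
    return 0 if relpath == "." else relpath.count(os.sep) + 1


def restore_order(saved):
    """Parents first: a child cpuset must remain a subset of its parent."""
    return sorted(saved, key=depth)


def save(root=CPUSET_ROOT):
    """Map each cgroup (relative path) to its current cpuset.cpus string."""
    saved = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cpuset.cpus" in filenames:
            with open(os.path.join(dirpath, "cpuset.cpus")) as f:
                saved[os.path.relpath(dirpath, root)] = f.read().strip()
    return saved


def restore(saved, root=CPUSET_ROOT):
    """Write the saved CPU lists back, widening parents before children."""
    for rel in restore_order(saved):
        path = os.path.join(root, rel, "cpuset.cpus")
        if os.path.exists(path):  # the VM may have been shut down meanwhile
            with open(path, "w") as f:
                f.write(saved[rel])


def main(phase):
    # Assumed pm-utils convention: the sleep phase is the first argument.
    if phase in ("suspend", "hibernate"):
        with open(STATE_FILE, "w") as f:
            json.dump(save(), f)
    elif phase in ("resume", "thaw") and os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            restore(json.load(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Restoring top-down matters: the kernel rejects a cpuset.cpus value that is not a subset of the parent's, so writing a VM's cgroup before libvirt/ and libvirt/qemu/ have been widened again would fail.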

Thomas Gleixner (tglx) is working on refactoring the CPU hotplug code, which is used for suspend/resume support. It is the hot removal of all CPUs except CPU 0 during suspend that is responsible for the cpusets being reduced to only CPU 0.

https://lkml.org/lkml/2012/4/20/160

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=shortlog;h=refs/heads/smp/hotplug

I think he's just in the early stages though; nothing to fix this issue yet.
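The hotplug mechanism described above can be illustrated with a deliberately simplified Python model, a rough approximation of the pre-fix kernel behaviour rather than kernel code: offlining a CPU strips it from every cpuset, while onlining it again widens only the root cpuset. It ignores details such as moving tasks out of cpusets that become empty, and the cgroup names are hypothetical.

```python
def simulate_suspend_resume(cpusets, ncpus):
    """Simplified model of cpusets across suspend/resume before the fix.

    cpusets maps a cgroup name to its set of allowed CPUs; "/" is the root.
    """
    cpusets = {name: set(cpus) for name, cpus in cpusets.items()}
    # Suspend: hot-remove every CPU except CPU 0; each is dropped everywhere.
    for cpu in range(1, ncpus):
        for cpus in cpusets.values():
            cpus.discard(cpu)
    # Resume: hot-add them back; only the root cpuset follows the online mask.
    for cpu in range(1, ncpus):
        cpusets["/"].add(cpu)
    return cpusets
```

Running it on a four-CPU host where /libvirt and /libvirt/qemu/vm1 start with {0, 1, 2, 3} leaves the root cpuset at {0, 1, 2, 3} but both libvirt cpusets at {0}, matching the symptom in this report.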

So, a question:
Should the above-mentioned cpuset.sh be packaged in libvirt-bin and installed at /usr/lib/pm-utils/sleep.d/XXcpuset.sh (whatever XX should be) in the meantime? Even when this eventually gets fixed in the kernel, the script won't mess things up; it'll just be redundant.

Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report at bugzilla.kernel.org [1]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report the upstream bug number back in this bug report. That will allow us to link this bug to the upstream report.

[1] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Serge Hallyn (serge-hallyn) wrote :

@Seth,

thanks for the link to the script. I'm not opposed to adding that as a workaround. Do you already have a debdiff to add that?

Seth Jennings (spartacus06) wrote :

Serge,

Sorry for the delay. I don't have a debdiff, nor do I have any experience in making one. Can you do it? Or do I need to learn? The latter will take longer.

Seth Jennings (spartacus06) wrote :

Joseph,

There is already an upstream bug for this issue.

https://bugzilla.kernel.org/show_bug.cgi?id=42789

Seth Jennings (spartacus06) wrote :

There is a new proposed patchset on lkml regarding this issue:

https://lkml.org/lkml/2012/5/4/265

Serge Hallyn (serge-hallyn) wrote :

I can't reproduce this on Raring any more, and the kernel patch appears to be applied. Marking this Fix Released.

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released
Changed in libvirt (Fedora):
importance: Unknown → Undecided
status: Unknown → Fix Released