Ubuntu

general protection fault running apt-get inside double nested kvm VM

Reported by James Hunt on 2013-08-05
30
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
High
Unassigned
Saucy
High
Unassigned
qemu-kvm (Ubuntu)
High
Unassigned
Saucy
High
Unassigned

Bug Description

Attempting to run apt-get in a nested vm hosted in OpenStack results in a general protection fault in the kernel in the nested VM.

James Hunt (jamesodhunt) wrote :

Attempting to repeat the dep8 test run in the same environment has so far resulted in the qemu-kvm binary consuming 200% CPU (recreated on multiple nodes).

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1208455

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

I'm suspecting an issue with nested=1... the nested VM does eventually boot but it takes many minutes to do so. At this point, the cpu usage of qemu-kvm drops to reasonable values. But if you ssh into the vm, the usage jumps up to ~200% again.

To recreate:

$ sudo apt-get install -y genisoimage linux-image-extra-virtual bzr qemu-kvm qemu-utils
$ sudo usermod -G kvm -a ubuntu

<logout+back in>

$ sudo modprobe kvm_intel nested=1
$ bzr branch lp:auto-package-testing
$ cd auto-package-testing
$ bin/prepare-testbed -r saucy -d i386

... and observe ps output.

James Hunt (jamesodhunt) wrote :
James Hunt (jamesodhunt) wrote :

Kernel logs can be found on bug 1208509.

Note that #4 was taken from a local nested kvm environment (not OpenStack).

summary: - general protection fault running apt-get inside nested kvm VM on
- OpenStack
+ general protection fault running apt-get inside nested kvm VM
James Hunt (jamesodhunt) on 2013-08-06
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
James Hunt (jamesodhunt) wrote :

High CPU may be cloud-init. Using todays server image, running 'prepare-testbed' spins the CPU at 200%. The end of the log shows:

Aug 6 10:13:15 autopkgtest [CLOUDINIT] helpers.py[DEBUG]: Running config-package-update-upgrade-install using lock (<FileLock using file '/var/lib/cloud/instances/nocloud/sem/config_package_update_upgrade_install'>)
Aug 6 10:13:15 autopkgtest [CLOUDINIT] util.py[DEBUG]: Writing to /var/lib/cloud/instances/nocloud/sem/update_sources - wb: [420] 19 bytes
Aug 6 10:13:15 autopkgtest [CLOUDINIT] helpers.py[DEBUG]: Running update-sources using lock (<FileLock using file '/var/lib/cloud/instances/nocloud/sem/update_sources'>)
Aug 6 10:13:15 autopkgtest [CLOUDINIT] util.py[DEBUG]: Running command ['apt-get', '--option=Dpkg::Options::=--force-confold', '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes', '--quiet', 'update'] with allowed return codes [0] (shell=False, capture=False)
Aug 6 10:13:34 autopkgtest [CLOUDINIT] util.py[DEBUG]: Running command ['apt-get', '--option=Dpkg::Options::=--force-confold', '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes', '--quiet', 'dist-upgrade'] with allowed return codes [0] (shell=False, capture=False)
Aug 6 10:13:36 autopkgtest [CLOUDINIT] helpers.py[DEBUG]: update-sources already ran (freq=once-per-instance)
Aug 6 10:13:36 autopkgtest [CLOUDINIT] util.py[DEBUG]: Running command ['apt-get', '--option=Dpkg::Options::=--force-confold', '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes', '--quiet', 'install', 'eatmydata', 'linux-image-generic', 'autopkgtest', 'dpkg-dev', 'pbuilder', 'bzr'] with allowed return codes [0] (shell=False, capture=False)
1680s before timeout
1620s before timeout
1560s before timeout
1500s before timeout
1440s before timeout
1380s before timeout
1320s before timeout

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-rc4-saucy/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: apport-bug i386 saucy third-party-packages uec-images
Changed in linux (Ubuntu):
importance: Medium → High
status: Confirmed → Incomplete
tags: added: kernel-key
Scott Moser (smoser) wrote :

i doubt cloud-init has any thing to do with it, other than being the thing that ran apt (which then some possibly substantial network IO and disk IO and cpu).

James Hunt (jamesodhunt) wrote :

What we know so far:

- the issue is not related to cloud-init.
- the very high CPU load was atleast partly caused by byobu bug 1208853. To disable byobu...

=== modified file 'bin/prepare-testbed'
--- bin/prepare-testbed 2013-08-06 08:05:16 +0000
+++ bin/prepare-testbed 2013-08-06 14:01:42 +0000
@@ -227,7 +227,7 @@

 apt_update: true
 apt_upgrade: true
-byobu_by_default: system
+byobu_by_default: disable

 packages:
  - eatmydata

- smb/apw noticed that nested=1 for the kvm_intel driver was not being set in the 1st and 2nd level nested VMs but was being set on the host. However, forcing nested=1 on the host system, the autopkgtest system *and* the pristine nested VM (2nd-level nesting) does not stop the kernel panic.

Now trying with linux-image-3.11.0-0-generic to see if that helps...

no longer affects: cloud-init (Ubuntu)
summary: - general protection fault running apt-get inside nested kvm VM
+ general protection fault running apt-get inside double nested kvm VM
James Hunt (jamesodhunt) wrote :

Fresh kernel panic from 2nd-level nested VM running with host, 1st level and 2nd level all running with nested=1.

Changed in qemu-kvm (Ubuntu):
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

All the kernel logs I"ve seen have shown oops at rtc_sysfs_del_device, starting with something like

[ 509.317630] general protection fault: 0000 [#1] SMP
[ 509.320084] Modules linked in: dm_crypt(F) kvm_intel kvm ppdev(F) nls_utf8 isofs(F) microcode(F) parport_pc(F) pvpanic parport(F) psmouse(F) serio_raw(F) i2c_piix4 mac_hid cirrus syscopyarea(F) sysfillrect(F) sysimgblt(F) ttm drm_kms_helper drm floppy(F) e1000(F)
[ 509.320084] CPU: 0 PID: 1335 Comm: dpkg Tainted: GF 3.10.0-6-generic #17-Ubuntu
[ 509.320084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 509.320084] task: c6143380 ti: c1a42000 task.ti: c338a000
[ 509.320084] EIP: 0060:[<c12e46de>] EFLAGS: 00014086 CPU: 0
[ 509.320084] EIP is at number.isra.2+0xe/0x350
[ 509.320084] EAX: c7eec000 EBX: c17e2633 ECX: 00000021 EDX: c1a46380
[ 509.320084] ESI: c17e2633 EDI: c1a45fbe EBP: c1a43618 ESP: c1a435ac
[ 509.320084] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 509.320084] CR0: 8005003b CR2: 00000000 CR3: 01a36000 CR4: 000006f0
[ 509.320084] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 509.320084] DR6: ffff0ff0 DR7: 00000400
[ 509.320084] Stack:
[ 509.320084] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 509.320084] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 509.320084] 00000000 c17e262e c17e262f c1a43618 c12e40e3 c1a43648 c17e262e c17e2632
[ 509.320084] Call Trace:
[ 509.320084] [<c12e40e3>] ? format_decode+0x323/0x390
[ 509.320084] [<c12e5f8c>] vsnprintf+0x2bc/0x390
[ 509.320084] [<c12e6076>] vscnprintf+0x16/0x30
[ 509.320084] [<c104990b>] vprintk_emit+0x12b/0x530
[ 509.320084] [<c161a120>] printk+0x50/0x52
[ 509.320084] [<c10347c9>] doublefault_fn+0x39/0x120
[ 509.320084] [<c14b8c20>] ? rtc_sysfs_del_device+0x30/0x30

James Hunt (jamesodhunt) wrote :

Testing with linux-image-3.11.0-0-generic from -proposed with nested=Y in all 3 environments does appear to solve this issue.

Stefan Bader (smb) wrote :

Serge mentioned today on IRC that this may be also a glibc issue, but I am not sure what exactly he was looking at and why he thinks so. Apparently ftracing (I guess the qemu process) and some partially inlining clone function (those with .isra extention) were involved.

Quoting Stefan Bader (<email address hidden>):
> Serge mentioned today on IRC that this may be also a glibc issue, but I
> am not sure what exactly he was looking at and why he thinks so.
> Apparently ftracing (I guess the qemu process) and some partially
> inlining clone function (those with .isra extention) were involved.

Right, ftrace seems the most likely culprit. In any case, since
a newer kernel is confirmed to fix it, disregard anything I said
about libc :)

Joseph Salisbury (jsalisbury) wrote :

Per comment #13, it sounds like this issue is resolved in the 3.11 kernel, which is what Saucy will be rebased to. Does this issue only exist in Saucy, or should the bug be targeted to other releases as well?

Changed in linux (Ubuntu Saucy):
status: Incomplete → Fix Committed
James Hunt (jamesodhunt) wrote :

I got the panic again last night. This was running a 3.11 kernel for the host, but of course the 1st and 2nd level guests were still running 3.10 as 3.11 is still in -proposed. So, although it's possible to get a test run with (3.11, 3.10, 3.10) I really need to try running with (3.11, 3.11, 3.11) to see if this provides a reliable solution.

Changed in linux (Ubuntu Saucy):
status: Fix Committed → In Progress
tags: added: kernel-da-key
removed: kernel-key
James Hunt (jamesodhunt) wrote :

Now that 3.11 is out of proposed, we're seeing a different kernel failure, starting with

run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.11.0-2-generic /boot/vmlinuz-3.11.0-2-generic
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.11.0-2-generic
Found initrd image: /boot/initrd.img-3.11.0-2-generic
Found memtest86+ image: /boot/memtest86+.bin
[ 776.392144] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:18758]
[ 776.392144] CPU: 0 PID: 18758 Comm: modprobe Tainted: GF 3.11.0-2-generic #5-Ubuntu
[ 776.432133] BUG: soft lockup - CPU#1 stuck for 23s! [systemd-udevd:333]
[ 776.432133] CPU: 1 PID: 333 Comm: systemd-udevd Tainted: GF 3.11.0-2-generic #5-Ubuntu
[ 776.432133] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 776.432133] task: f5f619c0 ti: f5f3c000 task.ti: f5f3c000
[ 776.432133] Stack:
[ 776.432133] Call Trace:
[ 776.432133] Code: 08 89 0c 24 89 f1 e8 16 fc ff ff e9 36 ff ff ff 90 55 ba 00 01 00 00 89 e5 f0 66 0f c1 10 0f b6 ce 38 d1 74 0d 8d 74 26 00 f3 90 <0f> b6 10 38 ca 75 f7 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00
[ 776.392144] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 776.392144] task: f56919c0 ti: f5746000 task.ti: f5746000

James Hunt (jamesodhunt) wrote :

The kernel BUG: above happens when running "bin/prepare-testbed -r saucy -d i386" from lp:auto-package-testing in canonistack:
the kvm instance spins at ~200% CPU again.

I modified bin/prepare-testbed slightly to log the output of the vm and disable byoubu. See attached.

James Hunt (jamesodhunt) wrote :

To recreate:

1) Follow instructions on internal wiki (page "InformationInfrastructure/IS/CanonicalOpenstack") to create a keypair.

2) Create an guest instance:

$ keyname="${USER}_lcy01"
$ image=$(euca-describe-images|grep saucy|head -1|awk '{print $2}')
$ euca-run-instances -k $keyname -t m1.medium $image

3) Identify IP address and instance name ("i-0000XXXX"):

$ euca-describe-instances

4) ssh into system

5) Install required packages:

$ sudo apt-get -y install genisoimage linux-image-extra-virtual bzr qemu-kvm qemu-utils

6) Setup groups

$ usermod -G kvm -a ubuntu

7) Logout and back in again to pick up kvm group membership.

8) Checkout bzr branch:

$ bzr branch lp:auto-package-testing
$ cd auto-package-testing

9) Apply patch on this bug to disable byobu and enable kvm logging (to /tmp/kvm.log)

10) Run the following:

bin/prepare-testbed -r saucy -d i386

11) tail -f /tmp/kvm.log and wait for massive cpu usage and kernel spew in /tmp/kvm.log.

12) Cleanup:

$ euca-terminate-instances i-0000XXXX

tags: added: patch
Stefan Bader (smb) wrote :

Before running prepare-testbed:

- create a directory in /mnt that user ubuntu can write to (say /mnt/adt)
- echo BASEDIR=/mnt/adt > ~/.adtrc

autopkgtest should then use /mnt which has lots of space on medium guests.

James Hunt (jamesodhunt) wrote :

Latest kernel panic (from 2nd-level guest) using following kernel versions:

host: 3.11.0-1-generic (-2 causes problems for my graphics hardware atm)
1st-level guest: 3.11.0-2-generic
2nd-level guest: 3.11.0-2-generic

Stefan Bader (smb) wrote :

I saw a similar crash when running the prepare script with a Saucy64 host, a Saucy32 1rst and Saucy32 2nd level guest. The 1rst level guest cpu set to core2duo and I also had to replace the qemu-system-x86_64 by qemu (which maps to the same arch). Otherwise the 2nd level guest was not started at all.

That would on one run give a double-fault as you had, but the second attempt did succeed. Though both runs had a lot of NMI (reason 21 and 31) messages in them.

Stefan Bader (smb) wrote :

Using a 64bit 1rst level guest seemed better on manual runs but running in loops seemed to have locked up on the third run. Though completely without any messages. I have to look into that next week. I need to run again with qmp enabled. At least this allows to check for the instruction pointer of the cpus. But unfortunately I don't know yet how to get any better data from the 2nd level guest.

Stefan Bader (smb) wrote :

Trying to summarize my testing so far:

Host: saucy 64bit, 1rst level saucy 32bit, 2nd level saucy 32bit:
- 2nd level guest did not start when using qemu-system-x86_64:
  - This may be some regression or feature, not sure, yet.
- Using qemu only (defaulting to same arch) the 2nd level guest starts:
  - I saw many NMI (21 / 31) messages which looks not good.
  - One run completed, the other caused a double fault in 2nd level.

Host: Saucy 64bit, 1rst level saucy 64bit, 2nd level saucy 32bit:
- Not ultimately required, but also using qemu (not qemu-system-x86_64).
- Works much better but doing many loops at some point there is a double
   fault too.

Host: Quantal 64bit, 1rst level saucy 32bit, 2nd level saucy 32bit:
- Had the lockups when using qemu-system-x86_64.
- Currently completed 3 runs (needs more time) with qemu only call.

Host: Quantal 64bit, 1rst level saucy 64bit, 2nd level saucy 32bit:
- With qemu only call now completed run 7.

This is not conclusive yet as I need more runs, but Quantal as host seems more stable than Saucy. The question is whether this is also true for Precise as host (unfortunately my test HW is too new, so its wired network is not supported back in Precise). James, maybe you want to try that. One critical change seems to be the use of qemu instead of qemu-system-x86_64. I am not sure whether starting a 64bit system with hardware acceleration from a 32bit system is something that is supposed to work. Well the emulated cpu likely claims 64bit capability but then I am not sure whether that combo is so well tested with respect to other interfaces.

Stefan Bader (smb) wrote :

So host Q and 1rst 32 completed 35 runs and 1rst 64 completed 40.

James Hunt (jamesodhunt) wrote :

The latest advice for running on Canonistack (kvm guest instance where nested=Y is to):

- Only run on a 64-bit Canonistack guest instance.
- Modify bin/prepare-testbed and bin/run-adt-test to start the autopkgtest VM using 'qemu-system-x86_64 -enable-kvm'.
- Have the DEP8 test running inside the autopkgtest VM start its VM using 'qemu -enable-kvm'.

Using the above but appending '-nographic >/tmp/kvm.log 2>&1' to each kvm command-line to allow console log capture results in the first created VM (the one created by prepare-testbed/bin/run-adt-test) locking up. Unfortunately, there is no output in kvm.log.

I'll retry without '-nographic' as smb has noticed this seems to affect the stability of the overall kvm stack. Even if this does work, it is clearly not ideal to discard console output (unless there is another magic qemu option we can try?)

James Hunt (jamesodhunt) wrote :

'qemu -serial file:/tmp/kvm.log' Looks like an alternative to 'qemu -nographic >/tmp/kvm.log 2>&1'.

Stefan Bader (smb) wrote :

They can be even together. The -nographic removes any emulated graphics card. I am not sure in detail how the cloud images work, but I believe serial line as a console is always on. Depending on which release (in that case I think it is the qemu user-space) the used setup would not work even. Because -monitor stdio redicrects the monitor to stdio and the default of serial was stdio, too. So they did clash (might depend on -nographic as well).

So when playing without -nogrphics, and crashes persist, you might play with -vga std which would give a std vga card instead of cirrus.

James Hunt (jamesodhunt) wrote :

Re #27, removing -nographic still causes the same lockup for me.

Stefan Bader (smb) wrote :

As it seemed that starting a 64bit hypervisor (qemu-system-x86_64) from a 32bit user-space running on another 64bit hypervisor has different issues which seem to go back even further, I concentrated on bisecting the case of host running 64bit user-space on a 64bit CPU and the first level guest running 64bit user-space under a 64bit hypervisor. The second level guest is started with a 64bit hypervisor but a 32bit userspace. All host and guests are Saucy and started with the same kernel.

Then for bisecting I only replaced the host kernel with kernels between v3.9 and v3.10. Then started a guest and ran the prepare-testbed repeatedly. The first change that broke things was:

commit 5f3d5799974b89100268ba813cec8db7bd0693fb
KVM: nVMX: Rework event injection and recovery

However at that point starting a 2nd level guest is completely broken until:

commit 5a2892ce72e010e3cb96b438d7cdddce0c88e0e6
KVM: nVMX: Skip PF interception check when queuing during nested run

So potentially the rework patch broke things and the current issue is remaining after fixing the major issues. Though it could be introduced by one of the patches in between but it cannot be bisected.

Stefan Bader (smb) wrote :

Btw, James, this means there is a simple but not very helpful work-around for Saucy: use AMD CPU based hosts. Did not check for 32bit user-space on 1rst level but with 64bit and had no issues. Not very surprising as the regression seems to be in the nested VMX code. But also not very useful when working in the cloud...

Robie Basak (racb) wrote :

Stefan: I assigned you in the server team meeting. Please unassign if you object. Thanks!

Changed in linux (Ubuntu Saucy):
assignee: nobody → Stefan Bader (smb)
Changed in qemu-kvm (Ubuntu Saucy):
assignee: nobody → Stefan Bader (smb)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu-kvm (Ubuntu):
status: New → Confirmed
Seth Forshee (sforshee) wrote :

I tried reproducing this while using trusty on the host and was unable to do so. I also could not reproduce with the trusty kernel (currently 3.12) running with saucy userspace on the host. So as far as I can tell the issue is fixed in 3.12.

I took a stab at trying to bisect, but something between 3.11 and 3.12-rc1 is causing the machine to not boot. With 3.12-rc1 my VMs failed to boot, and with 3.12-rc2 I couldn't reproduce the issue at all. So it looks like bisecting to find the fix will be nearly impossible. Going through the kvm commits in 3.12 revealed a lot of changes related to running nested VMs but nothing that stood out from the pack.

So if anyone _really_ needs this fixed in saucy, let me know, otherwise I'm going to assume that having it fixed in trusty is good enough.

Stefan Bader (smb) on 2013-12-19
Changed in linux (Ubuntu):
assignee: Stefan Bader (smb) → nobody
Changed in linux (Ubuntu Saucy):
assignee: Stefan Bader (smb) → nobody
Changed in qemu-kvm (Ubuntu):
assignee: Stefan Bader (smb) → nobody
Changed in qemu-kvm (Ubuntu Saucy):
assignee: Stefan Bader (smb) → nobody
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers