Server installations on VMs fail to reboot after the installations

Bug #1100386 reported by Para Siva
This bug affects 1 person
Affects                  Status        Importance  Assigned to     Milestone
linux (Ubuntu)           Won't Fix     High        Andy Whitcroft
linux (Ubuntu Raring)    Won't Fix     High        Andy Whitcroft
systemd (Ubuntu)         Fix Released  High        Andy Whitcroft
systemd (Ubuntu Raring)  Invalid       Undecided   Unassigned
udev (Ubuntu)            Invalid       Undecided   Unassigned
udev (Ubuntu Raring)     Won't Fix     High        Andy Whitcroft

Bug Description

Raring and saucy server installations fail to reboot normally after installation on VMs. This occurs with both amd64 and i386 images, whether the VM is installed using libvirt/virt-manager or using VirtualBox.

This appears to be a regression that started with kernel Ubuntu 3.7.0-6-generic; earlier versions do not have this issue.

On i386 installations, booting via recovery mode causes "Kernel panic - not syncing: Attempted to kill init! exit code 0x00000600", as shown in the attached image.

The latest amd64 (20130121) installations with the virtual-host package selection also produce a kernel panic when booting via recovery mode, with the same message as above.

Standard booting causes a similar hang to the i386 case (please see the attached video).

This issue does not occur on hardware installations.

Steps to reproduce:

A) Manual steps:

1. Install raring server on a VM with no packages selected, leaving the default answers for all questions. The host of the VM is irrelevant; this has been observed with VMs installed on raring, quantal and precise 64-bit hosts. (A sample VM creation command is sketched after these steps.)
2. Reboot after the GRUB installation is complete.

B) Automated steps:
1. Do a preseed installation of raring server with the attached preseed file (virtual-host.preseed) and the virtual-host.run file using utah. (I used utah for the automated installation; a how-to is given in http://utah.readthedocs.org/en/latest/introduction.html#how-to-start-running-tests)
2. Reboot the machine.

In detail:
1. Use the attached .preseed file (attachment 15) and .run file (attachment 16) to execute the following command (please provide the absolute path to the files and the iso):
        sudo -i -u utah run_utah_tests.py -i /path/to/iso -p /path/to/preseed /path/to/.run -n -x /etc/utah/bridged-network-vm.xml
2. Reboot the VM after the installation.
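
For reference, a minimal sketch of creating the test VM for the manual steps, assuming a libvirt/KVM host; the guest name, memory and disk size are illustrative and not taken from the original report:

    # create a KVM guest and boot it from the server ISO
    virt-install --name raring-server --ram 1024 --vcpus 1 \
        --disk path=/var/lib/libvirt/images/raring-server.img,size=8 \
        --cdrom /path/to/iso --graphics vnc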

Revision history for this message
Para Siva (psivaa) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1100386

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: raring
Revision history for this message
Para Siva (psivaa) wrote : Re: Raring server installations on KVM fail to reboot after the installations

The hang on amd64 installations is shown here. The same type of hang can be seen during i386 installation reboots when recovery mode is not selected. (When recovery mode is used in i386 installations, the kernel panic given above occurs.)

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Do you also get a panic when booting i386 normally, not in recovery mode?

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Also, did this start happening on a recent daily image? Was there a prior image that did not exhibit this bug?

Revision history for this message
Para Siva (psivaa) wrote :

The contents of /var/log from an amd64 installation are attached herewith. I could not collect those logs for i386, as I was never able to log into an i386 installation.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Para Siva (psivaa) wrote :

I do not see the panic when booting i386 normally. When i386 is booted normally, the behaviour is the same as in the attached video: it just hangs the same way amd64 does when booted normally.

I cannot say whether this is a new issue. The automatic smoke tests passed until yesterday, and since we could not run the automated tests reliably today, I had to do manual installations.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Looking at the video, it seems like the VM is trying to perform a filesystem check. Did you try cancelling the filesystem check or waiting to see if it finishes?

Revision history for this message
Para Siva (psivaa) wrote :

I tried, but there is no response; no key press did anything once it reached the screen shown in the video.

Para Siva (psivaa)
summary: - Raring server installations on KVM fail to reboot after the
+ Raring server installations on VMs fail to reboot after the
installations
Revision history for this message
Para Siva (psivaa) wrote : Re: Raring server installations on VMs fail to reboot after the installations

I tried the 20130117 i386 image on VirtualBox and the kernel panic occurred when booting in recovery mode; please see the attached screenshot.
On VirtualBox, normal booting succeeds most of the time, as opposed to almost never on virt-manager.
The contents of /var/log from the i386 server installation on VirtualBox are also attached below.

Revision history for this message
Para Siva (psivaa) wrote :

Contents of /var/log from an i386 server installation on VirtualBox.

Para Siva (psivaa)
description: updated
Revision history for this message
Para Siva (psivaa) wrote :

This occurred during a virtual-host preseeded installation of the amd64 raring server image. The VM hung when booting normally and threw a kernel panic when booting via recovery mode.

https://jenkins.qa.ubuntu.com/view/Raring/view/Smoke%20Testing/job/raring-server-amd64-smoke-virtual-host/60/
is the affected job.

description: updated
Para Siva (psivaa)
description: updated
Revision history for this message
Para Siva (psivaa) wrote :

The preseed file

description: updated
Revision history for this message
Para Siva (psivaa) wrote :

Utah runlist

description: updated
Revision history for this message
James Hunt (jamesodhunt) wrote :

This is bug 1096531 - looks like a standard job is behaving slightly differently under raring and exposing the issue.

tags: removed: kernel-key
Revision history for this message
Stefan Bader (smb) wrote :

We think this may actually be modeset related. Is it possible to preseed the installation (or change it on the failing installed VMs) to have nomodeset on the grub command line?
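
For illustration, a preseed can append kernel options to the installed system's command line via the standard debian-installer key; a minimal sketch, not taken from the attached preseed file:

    # append nomodeset to the installed system's kernel command line
    d-i debian-installer/add-kernel-opts string nomodeset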

Revision history for this message
Para Siva (psivaa) wrote :

So I installed an i386 server with the nomodeset option selected, and normal reboots work fine (although the recovery mode path, which I tried out of curiosity, still leads to the kernel panic). The host is a 64-bit quantal machine running KVM and virt-manager.

When I edited the grub command line on this installation to remove nomodeset, the standard reboot hangs.

The converse also conforms to the above pattern, i.e. an installation without nomodeset hangs on reboot, but when I edited the grub command line to include nomodeset, the VM boots fine.
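
For reference, a persistent way to toggle nomodeset on an installed system, as a minimal sketch using the standard GRUB tooling; the quiet/splash values are the usual Ubuntu defaults rather than anything confirmed from these VMs:

    # in /etc/default/grub, add (or remove) nomodeset:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
    # then regenerate grub.cfg and reboot:
    sudo update-grub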

Revision history for this message
Para Siva (psivaa) wrote :

The hang started on images with kernel version Ubuntu 3.7.0-6-generic; it can be reproduced on the raring server i386 images of 20121213 and later.

The hang cannot be seen with images that contain kernel version 3.7.0-5-generic and earlier. I tested raring server i386 images of 20121212 and earlier but could not reproduce the issue.

Para Siva (psivaa)
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
Revision history for this message
Andy Whitcroft (apw) wrote :

I have managed to reproduce the apparent hangs; the recovery mode issues I have not. If they still exist, they should be filed as a separate bug.

For the apparent hangs, I have managed to confirm that they are not hangs at all. What has happened is that we have lost the console completely: the kernel attempted to switch framebuffer devices and failed to do so; it successfully removed efifb but failed to initialise cirrusfb, so there is now nothing to display console output. If you know the IP address of the image, however, it is pingable, and with openssh installed it is possible to log in. Errors from dmesg are shown below:

  [ 2.701082] fb: conflicting fb hw usage cirrusdrmfb vs EFI VGA - removing generic driver
  [ 2.704007] Console: switching to colour dummy device 80x25
  [ 2.717086] [drm:cirrus_vram_init] *ERROR* can't reserve VRAM
  [ 2.717093] cirrus 0000:00:02.0: Fatal error during GPU init: -6

Now, this is something we have seen before. We are using efifb (a generic driver) but want to use a device-specific driver to get 3D support. If plymouth opens the framebuffer before we switch over, we get into a hole where we cannot completely remove the old driver, and as the two share the same VRAM we cannot initialise the new one.

The correct solution would be to make the kernel able to force the driver plymouth has open to close and to allow the new one to start. We would then also need to fix plymouth to cope with the framebuffer closing harshly on it and to reconnect.

What we have done in the past (for vesafb) was to delay loading vesafb until after the better driver had a chance to take and use the device, falling back to vesafb only when the better driver did not appear. We cannot do quite the same for efifb, as it has to be built in, but we can prevent efifb from being identified as a primary framebuffer. This means we will normally not start the plymouth splash until after we have had a chance to detect the cirrus driver. If there is no alternative, however, we will use efifb from the normal fallback path as used for vesafb. We have confirmed that vesafb will not load in this case, as efifb has already claimed the device. Patch to follow.
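
For illustration, such a demotion can be expressed as a udev rule that stops tagging the generic EFI framebuffer as the primary display device. The following is a hypothetical sketch in the spirit of Ubuntu's 78-graphics-card.rules; the PRIMARY_DEVICE_FOR_DISPLAY variable and exact match keys are assumptions, and the shipped rule may differ:

    # tag DRM devices as the primary display so the splash waits for them
    ACTION=="add", SUBSYSTEM=="drm", KERNEL=="card0", ENV{PRIMARY_DEVICE_FOR_DISPLAY}="1"
    # do not treat the generic EFI framebuffer ("EFI VGA" in dmesg) as primary
    ACTION=="add", SUBSYSTEM=="graphics", KERNEL=="fb0", ATTR{name}!="EFI VGA", ENV{PRIMARY_DEVICE_FOR_DISPLAY}="1"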

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

FYI, this is still a problem in the 13.04 release. I installed 13.04 amd64 server and could not get a login prompt. Adding 'nomodeset' works fine. I imagine I could also use the 'vmvga' driver instead of 'cirrus'.
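
For reference, switching the emulated video adaptor away from cirrus under libvirt is done in the domain XML; a sketch, with availability of the 'vmvga' model depending on the QEMU build:

    <!-- in the <devices> section of the libvirt domain XML -->
    <video>
      <model type='vmvga'/>
    </video>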

Brad Figg (brad-figg)
tags: added: kernel-stable-key
Para Siva (psivaa)
summary: - Raring server installations on VMs fail to reboot after the
- installations
+ Server installations on VMs fail to reboot after the installations
description: updated
Revision history for this message
Andy Whitcroft (apw) wrote :

Although this is fundamentally a kernel issue, the current kernel infrastructure would really only be able to abort anyone using the 'being replaced' framebuffer when we switch from efifb to a DRM framebuffer. This change would likely be extensive and slow to get through upstream. Plymouth would also have to be modified to handle being aborted and to reconnect to the replacement transparently. In the short term we can avoid this issue the same way we avoided it for vesafb, by demoting the driver to a secondary display. This means we only use efifb at all if no DRM driver appears, neatly avoiding the issue.

As we have now (early saucy) moved udev over to the systemd sources, I have proposed this against both udev and systemd: udev for raring and systemd for saucy. I will attach patches once they are tested.

Changed in linux (Ubuntu Raring):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Andy Whitcroft (apw)
Changed in systemd (Ubuntu Raring):
status: New → Invalid
Changed in udev (Ubuntu):
status: New → Invalid
Changed in systemd (Ubuntu):
status: New → In Progress
Changed in udev (Ubuntu Raring):
status: New → In Progress
Changed in systemd (Ubuntu):
importance: Undecided → High
Changed in udev (Ubuntu Raring):
importance: Undecided → High
assignee: nobody → Andy Whitcroft (apw)
Changed in systemd (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
Martin Pitt (pitti)
summary: - Server installations on VMs fail to reboot after the installations
+ [udev] Server installations on VMs fail to reboot after the
+ installations
summary: - [udev] Server installations on VMs fail to reboot after the
- installations
+ Server installations on VMs fail to reboot after the installations
Revision history for this message
Andy Whitcroft (apw) wrote :

systemd patch for saucy.

Martin Pitt (pitti)
Changed in systemd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Andy Whitcroft (apw) wrote :

Marking the kernel tasks Won't Fix, as this is a very big effort on the kernel side and we are going to avoid the issue in udev rules.

Changed in linux (Ubuntu Raring):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 202-0ubuntu7

---------------
systemd (202-0ubuntu7) saucy; urgency=low

  [ Martin Pitt ]
  * debian/*: Replace remaining "udevadm info --run" invocations with
    /run/udev/. (LP: #1182788)
  * Add 0020-persistent-storage-rule-mmc-partname.patch: Create disk/by-name
    links for mmcblk partitions if they have a PARTNAME property. Patch by
    Ricardo Salveti de Araujo, taken from udev 175-0ubuntu29.

  [ Andy Whitcroft ]
  * debian/extra/rules/78-graphics-card.rules -- demote efifb to a secondary
    display adaptor as in the majority of cases this will be replaced by
    a DRM driver. (LP: #1100386)
 -- Martin Pitt <email address hidden> Wed, 22 May 2013 12:09:59 +0200

Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Rolf Leggewie (r0lf) wrote :

raring has reached end of life and is no longer receiving any updates. Marking the raring task for this ticket as "Won't Fix".

Changed in udev (Ubuntu Raring):
status: In Progress → Won't Fix