xen guest hangs after mounting filesystem

Bug #144631 reported by Todd Deshane
Affects             Status   Importance  Assigned to  Milestone
xen-3.1 (Ubuntu)    Expired  Undecided   Unassigned
xen-3.2 (Ubuntu)    Expired  Undecided   Unassigned
xen-tools (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

xm create -c feisty
.
.
.
blkfront: xvda1: barriers enabled
XENBUS: Device with no driver: device/console/0
Freeing unused kernel memory: 180k freed
AppArmor: AppArmor initialized<5>audit(1190679352.582:2): type=1505 info="AppArmor initialized" pid=638
fuse init (API version 7.8)
Failure registering capabilities with primary security module.
thermal: Unknown symbol acpi_processor_set_thermal_limit
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
<HANGS HERE>

I will attach the full output, the config, strace, etc.

I tried two guests, one from scratch and one using the xen-tools script (xen-create-image)

I will attach the relevant logs for each.

Todd Deshane (deshantm) wrote :

config for feisty image made manually with debootstrap

Todd Deshane (deshantm) wrote :

output of xm log

Todd Deshane (deshantm) wrote :

full output of feisty boot process

Todd Deshane (deshantm) wrote :

strace of full boot of feisty

Todd Deshane (deshantm) wrote :

the config for the guest made with xen-create-image (from xen-tools) is attached

The behavior is the same, I don't notice any big differences in behavior compared to feisty

it freezes a few lines later because of swap and the loading of a couple of things:

EXT3-fs: mounted filesystem with ordered data mode.
Adding 131064k swap on /dev/sda2. Priority:-1 extents:1 across:131064k
EXT3 FS on sda1, internal journal
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: <email address hidden>
NET: Registered protocol family 17
lo: Disabled Privacy Extensions
Mobile IPv6
<HANG>

Also the strace goes the same amount "farther":

select(7, [0 4 6], NULL, NULL, NULL) = 1 (in [6])
read(6, "EXT3 FS on sda1, internal journa"..., 512) = 35
write(1, "EXT3 FS on sda1, internal journa"..., 35EXT3 FS on sda1, internal journal
) = 35
select(7, [0 4 6], NULL, NULL, NULL) = 1 (in [6])
read(6, "device-mapper: ioctl: 4.11.0-ioc"..., 512) = 82
write(1, "device-mapper: ioctl: 4.11.0-ioc"..., 82device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: <email address hidden>
) = 82
select(7, [0 4 6], NULL, NULL, NULL) = 1 (in [6])
read(6, "NET: Registered protocol family "..., 512) = 36
write(1, "NET: Registered protocol family "..., 36NET: Registered protocol family 17
) = 36
select(7, [0 4 6], NULL, NULL, NULL) = 1 (in [6])
read(6, "NET: Registered protocol family "..., 512) = 36
write(1, "NET: Registered protocol family "..., 36NET: Registered protocol family 10
) = 36
select(7, [0 4 6], NULL, NULL, NULL) = 1 (in [6])
read(6, "lo: Disabled Privacy Extensions\n"..., 512) = 46
write(1, "lo: Disabled Privacy Extensions\n"..., 46lo: Disabled Privacy Extensions
Mobile IPv6
) = 46
select(7, [0 4 6], NULL, NULL, NULL

I can try to post other logs or information as needed; let me know.

Todd Deshane (deshantm) wrote :

Also this is on i386 with the latest updates available... i.e.

linux-meta (2.6.22.12.15) gutsy; urgency=low

  * Add cell flavour on powerpc.

 -- Colin Watson <email address hidden> Sun, 23 Sep 2007 21:47:42 +0100

linux-meta (2.6.22.12.14) gutsy; urgency=low

  * ABI bump for -12.
  * Add virtual flavour on i386.
  * Add xen flavour on amd64.

 -- Colin Watson <email address hidden> Fri, 21 Sep 2007 13:12:09 +0100

I am more than willing to test on amd64, but I have had about the same bad luck on it.

Also, there was a regression at some point: the Xen packages from before the linux-xen meta package was available for amd64 worked great.

I should still have that available to test with, so I will see if I can "dust it off" and verify that it works as well as I say.

Todd Deshane (deshantm) wrote :

OK. I am back running xen kernel 2.6.19-4-generic-amd64. It works great, booted the feisty guest and a guest created with xen-create-image with no problems.

I also booted my Windows XP guest.

So this is the last xen in the gutsy series that works for me:
 2.6.19-4-generic-amd64

ii libxen3.1 3.1.0-0ubuntu15 library interface for Xen, a Virtual Machine
ii python-xen-3.1 3.1.0-0ubuntu15 python bindings for Xen, a Virtual Machine M
ii xen-docs-3.1 3.1.0-0ubuntu15 documentation for XEN, a Virtual Machine Mon
ii xen-hypervisor-3.1 3.1.0-0ubuntu15 The Xen Hypervisor for i386, amd64 amd lpia
ii xen-image-2.6.19-4-generic-amd64 2.6.19-2ubuntu7 Linux kernel image for version 2.6.19 on x86
ii xen-ioemu-3.1 3.1.0-0ubuntu15 XEN administrative tools
ii xen-tools 3.5-1ubuntu1 Tools to manage debian XEN virtual servers
ii xen-utils-3.1 3.1.0-0ubuntu15 XEN administrative tools
ii xenman 0.6-1ubuntu1 A graphical Xen management tool

This was all back just before tribe5 released.

I am willing to help figure out what the problem is. I have access to i386 and amd64 hardware and software, with xen working and broken.

Let me know what needs to be tested where and I can try to provide the information and/or help as I can.

mikmak (mikmak) wrote :

in gutsy amd64 host here,
after fixing the network/bridge script
the guest boots, but the console is not working
I can ssh to the guest though and it's up and working fine otherwise

might just be the init console which is broken somewhere

Mik

Takeshi Sone (ts1) wrote :

Try adding extra='xenconsole=tty'.
I think this should be default in xen-tools.

William Grant (wgrant) wrote :

Confirmed here on i386 (Pentium M @ 1.6GHz, if it might matter). I've tried variations of ext2, ext3, reiserfs, and they all just hang. At one stage I did get the warning about init being slow due to bad tls emulation or so, but only once, and it didn't get any further.

Changed in xen-3.1:
status: New → Confirmed
Revision history for this message
Todd Deshane (deshantm) wrote :

Adding the extra= line above doesn't work.

I also looked into the other things reported in https://bugs.launchpad.net/bugs/139046 but nothing seemed to work or was applicable. It hung at the same spot regardless of the change.

I never had any tls warnings on this one.

Also, recall that the same images work perfectly on an older Xen kernel.

William Grant (wgrant) wrote : Re: [Bug 144631] Re: xen guest hangs after mounting filesystem

I've also tried replacing /sbin/init with dash, but that doesn't execute
either.

Paul Wagland (paul-kungfoocoder) wrote :

I was having exactly the same problem, and the above extra line did not help me. However the line:

extra='xencons=tty'

did work for me, and I now have a booted domain that was not previously booting. Please note that this may disable the framebuffer. If you need that, you will need to look at xencons=xvc, or at least that is what I think you need to look at. I do not need the framebuffer; I just came across xvc when trying to figure out what xencons means :-)
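
For anyone applying this workaround to several guests, here is a minimal sketch of adding the option to a domU config file. The helper name `add_xencons` and the example path are made up for illustration; the sketch assumes the config has no existing `extra` line and refuses to touch the file if it finds one, since merging options is safer done by hand.

```sh
#!/bin/sh
# Sketch only: append extra='xencons=tty' to a domU config file.
# add_xencons is a hypothetical helper, not part of Xen or xen-tools.
add_xencons() {
    cfg="$1"
    if [ ! -f "$cfg" ]; then
        echo "no such config: $cfg" >&2
        return 1
    fi
    # Leave the file alone if it already sets extra; merging options
    # into an existing line is better done manually.
    if grep -q '^[[:space:]]*extra[[:space:]]*=' "$cfg"; then
        echo "$cfg already has an extra line; edit it manually" >&2
        return 2
    fi
    printf "extra='xencons=tty'\n" >> "$cfg"
}

# Example (hypothetical path):
#   add_xencons /etc/xen/feisty.cfg
```

The guest has to be destroyed and created again for the new kernel command line to take effect.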

Todd Deshane (deshantm) wrote :

Thanks Paul, that did work.

Does anyone know what the real fix should be?

Why does it not work out of the box like it used to?

William Grant (wgrant) wrote :

I'd say this is more of a bug in xen-tools. I've worked out how to get them booting using the default value for xencons (ie. using /dev/xvc0).

One addition is required in each domU config:
 extra = 'console=xvc0'
Apparently the kernel doesn't detect it automatically.

Once that is done, a couple of changes are needed in the filesystem:
 - xvc0 must be added to /etc/securetty
 - an extra upstart job for xvc0 must be added to /etc/event.d (basically just copying tty1, and modifying the getty parameter)

Unfortunately, the Gutsy hooks seem broken, and don't remove either of the hwclock.sh or hwclockfirst.sh rc scripts, so it may hang on the clock-setting stage of boot. Removing both of those from /etc/rcS.d should get it up and running, finally.
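
The two filesystem changes above can be sketched as a small function that works on a domU root filesystem mounted somewhere in dom0. The helper name `enable_xvc0` and the mount point in the usage line are hypothetical, and the sketch assumes the guest's tty1 job lives at etc/event.d/tty1 as described.

```sh
#!/bin/sh
# Sketch only: prepare a domU root filesystem, mounted at $1, for a
# login on xvc0. enable_xvc0 is a hypothetical helper name.
enable_xvc0() {
    prefix="$1"
    # Nothing to copy from if the guest has no tty1 job.
    [ -f "$prefix/etc/event.d/tty1" ] || return 1
    # Allow root logins on xvc0 (skip if it is already listed).
    grep -qx 'xvc0' "$prefix/etc/securetty" 2>/dev/null ||
        echo 'xvc0' >> "$prefix/etc/securetty"
    # Copy the tty1 upstart job and point its getty at xvc0.
    sed 's/tty1/xvc0/g' "$prefix/etc/event.d/tty1" \
        > "$prefix/etc/event.d/xvc0"
}

# Example (hypothetical mount point):
#   enable_xvc0 /mnt/domu
```

The function is safe to run twice: the securetty entry is only added once and the xvc0 job is simply regenerated.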

Miguel Araujo (infolinuxblog) wrote :

I'm testing Xen 3.1 on Ubuntu Gutsy Gibbon. When I start a Feisty debootstrap domU with a manually created .sxp configuration file, it hangs after mounting the ext3 filesystem, exactly as Todd Deshane described above. The virtual machine does not respond to a ping, so it's not really running. I have read that you fixed the problem by adding the xencons line, which doesn't work for me. What I don't understand is how this line is related to hanging after mounting the filesystem.

I have run XEN 3.1 in a Feisty Server machine before and I think I have some experience with XEN. But this time I haven't been able to fix the problem, because I'm not sure what it is.

Thanks a lot for your time, any feedback will be welcomed.

Miguel

Todd Deshane (deshantm) wrote :

William: I really appreciate your efforts in tracking down fixes for things. I think we need to figure out how to make this work out of the box. Having to add anything extra to the guest is unacceptable. Just before tribe5 I didn't have this problem.

Let's see if we can come up with a set of patches to packages within the xen-3.1 set that fix the problem and let guests work out of the box.

Miguel: Can you make sure that you got the xencons line right?

extra='xencons=tty'

I am pretty sure that the only reason it hangs after mounting the file system is that, after that point, it wants to write output to the console (the console detected by the kernel, no longer the one from the ramdisk). So it does seem right that it is a missing-console problem. I do know that others have reported being able to ssh to (so presumably ping) the guest at that point, but I have never tried.

Stephen Touset (stephen-touset) wrote :

It looks to me like the xencons solution is fixing another problem: Bug #139046.

This problem doesn't have anything to do with the console, since it actually renders the machine unbootable. Nothing happens after the filesystem driver is loaded, with the box spiking at 100% CPU usage and no network coming up.

The workaround I've found in the meantime is to install the 2.6.19-4 kernel mentioned above on the xen dom0 and domU machines, then specify that as the kernel to boot from in the configuration for the domU machine.

Installing the kernel on the domU box is necessary so that all the loadable modules are available, since the kernel isn't compiled with the needed modules statically.

Todd Deshane (deshantm) wrote :

Bug #139046 is a different problem, though it is closely related. It is specific to the guest type: bug #139046 seems specific to edgy guests, for example, and its fixes/workarounds don't apply directly to a feisty guest.

2.6.19-4 is the kernel that I was using too, until the xencons workaround worked for me.

Putting the modules into the domU is better, but not absolutely necessary for it to boot; problems may come up if you need a module, though. Ideally, we should try to move to the pygrub bootloader method of booting and require the domU to provide its own kernel and modules as needed. I will look into this at some point.

My question still is: what broke in the kernels after 2.6.19-4, which is when the xen kernels started matching the Ubuntu kernel numbers, that caused the hanging of guest and/or missing console?

Stephen Touset (stephen-touset) wrote :

Right. Bug #139046 is a different problem, but the symptoms are roughly similar. I believe that is causing some confusion about the true nature of this bug.

I firmly believe that this bug is entirely unrelated to the console. SSHing to the box after boot does _not_ work, nor does pinging it. The boot process literally halts at this stage, and nothing is done afterwards. Proof includes:

  1. Mounting the domU filesystem inside dom0 still works, since boot never continues
  2. SSH and other network activity fails outright
  3. Logs on the domU machine remain completely untouched and empty
  4. CPU usage on the guest soars to 100% and never comes down

A problem only related to displaying console output wouldn't have these side effects. The solution mentioned _only_ works for those suffering from bug #139046, and not this bug (and, as an aside, I don't believe it's the best solution to that problem either -- much better is to simply change the getty to listen on xvc0).
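
The CPU-usage check in the list above can be partly scripted from dom0. The sketch below assumes that the last column of `xm list` output is the cumulative Time(s) value for each domain. The helper names, the guest name in the usage lines, and the 90% threshold are all illustrative choices, not anything Xen defines.

```sh
#!/bin/sh
# Sketch only: decide whether a guest is spinning at roughly 100% CPU.

# domu_cpu_seconds NAME: read `xm list` output on stdin and print the
# last column (assumed to be Time(s)) for the domain called NAME.
domu_cpu_seconds() {
    awk -v n="$1" '$1 == n { print $NF }'
}

# is_spinning BEFORE AFTER INTERVAL: succeed if the guest used at
# least 90% of one CPU between two samples INTERVAL seconds apart.
# The 90% threshold is an arbitrary choice for this sketch.
is_spinning() {
    awk -v a="$1" -v b="$2" -v i="$3" \
        'BEGIN { exit !((b - a) >= 0.9 * i) }'
}

# Example, run as root in dom0 (guest name is hypothetical):
#   t1=$(xm list | domu_cpu_seconds feisty); sleep 10
#   t2=$(xm list | domu_cpu_seconds feisty)
#   is_spinning "$t1" "$t2" 10 && echo "guest is burning CPU"
```

A guest that burns CPU and also fails to answer ping matches the symptoms listed here rather than a console-only problem.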

Todd Deshane (deshantm) wrote :

Stephen: I see what you are getting at now....

It seems to me that you are right, there are more like 3 bugs here.

1) The console one, which is not completely a dup of bug #139046 (https://bugs.launchpad.net/bugs/139046)

2) Not being able to SSH into domU even after (did you report this one Stephen?)

3) For me networking is not working.. I am not sure what the cause of this is... I will either figure it out (if it is my configuration's problem) or report a bug on it.

4) The guest hanging in general is still a bug and does not have good workarounds (this bug is still open for that)

Let me know if I can test anything in particular or try any possible fixes etc. I will try to dig in to figure out the cause of all this.

Please post details and/or start new bugs that may help clear up the problem and make it clear what you are seeing.

William Grant (wgrant) wrote :

If everybody specifies either 'console=xvc0' or 'xencons=tty0' in their guest, it should get further, so we can see what's actually happening in each case. Even after modifying /etc/securetty and adding a getty on xvc0, I had to poke a couple of things inside the domU filesystem to get a Gutsy domU to finish booting (basically because the hooks don't do everything they should). Mine also hangs for a while in udevsettle (`Loading hardware drivers').

Todd Deshane (deshantm) wrote :

quick update for those following a potential network problem as well. I filed a bug for the network problem I am seeing here: https://bugs.launchpad.net/ubuntu/+source/xen-3.1/+bug/150805

Miguel Araujo (infolinuxblog) wrote :

#17 Todd: I have just checked the xencons line and it is correct, but when booting the domU with 2.6.22 kernel it hangs after mounting the filesystem as always. As you (Stephen and you) have been discussing, I think this is related to bug #139046 as you do, and because of the similar symptoms it could be confused with this one.

#18 Stephen: "The workaround I've found in the meantime is to install the 2.6.19-4 kernel mentioned above on the xen dom0 and domU machines, then specify that as the kernel to boot from in the configuration for the domU machine."

This is working for me too, and I think Todd also said that he had managed to get a domU running with this kernel. So we get to the point that something in the 2.6.22 generic xen kernel could be broken or badly configured.

#19 Todd: "My question still is: what broke in the kernels after 2.6.19-4, which is when the xen kernels started matching the Ubuntu kernel numbers, that caused the hanging of guest and/or missing console?"

From my point of view, this is the million dollar question.

So, to sum up: I'm using Gutsy server beta on a Dell PowerEdge 860, which probably has different network cards than yours. I don't rule out a network-related problem, but it would be odd for all of us to suffer the same symptoms, and probably the only thing we have in common is that the 2.6.22 xen kernel doesn't work and the 2.6.19 amd64 one does.

When I boot a domU with 2.6.22, even if I create it without a console, it hangs, and I have never seen it answer a ping (which makes it impossible to ssh in by default).

I hope we will fix this before the final Gutsy release.

Regards

Alvin Cura (alvinc) wrote :

I think we may be chasing the wrong problem by pursuing xen-tools for the fix.

This problem is also 100% reproducible using manual domain creation.

The only working fix I have found is to add extra='xencons=tty1' to the xendomu config file.

This would be incorrect behaviour. It should work out-of-the-box.

My method for creating the xendomu was using a self-written script as follows:

#!/bin/sh
######################################################################
# $Id: //depot/hosts/xen1/root/mkxenvm#3 $
# $DateTime: 2007/10/16 10:45:21 $
######################################################################

usage()
{
        printf "USAGE: %s\t<xendom> <xen volume group>\n" "${0}"
        printf " \t\t[root part size] [swap part size]\n"
        exit 1
}

# Quote the tests so an argument containing spaces does not break "[".
if [ -z "${1}" ]; then usage; else xendomu=${1}; fi
if [ -z "${2}" ]; then usage; else xenvg=${2}; fi
if [ -z "${3}" ]; then rootsize="8G"; else rootsize=${3}; fi
if [ -z "${4}" ]; then swapsize="2G"; else swapsize=${4}; fi

printf "Creating root volume /dev/${xenvg}/${xendomu}_root with size ${rootsize}: "
lvcreate -L${rootsize} -n${xendomu}_root ${xenvg}
printf "done.\n"

printf "Creating swap volume /dev/${xenvg}/${xendomu}_swap with size ${swapsize}: "
lvcreate -L${swapsize} -n${xendomu}_swap ${xenvg}
printf "done.\n"

printf "Creating ext3 filesystem on /dev/${xenvg}/${xendomu}_root: "
mkfs.ext3 -L${xendomu}_root /dev/${xenvg}/${xendomu}_root
printf "done.\n"

printf "Creating swap partition on /dev/${xenvg}/${xendomu}_swap: "
mkswap /dev/${xenvg}/${xendomu}_swap
printf "done.\n"

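printf "Mounting /dev/${xenvg}/${xendomu}_root on /tmp/${xendomu}: "
mkdir /tmp/${xendomu}
# The mount step was missing; without it debootstrap fills /tmp instead of the new volume.
mount /dev/${xenvg}/${xendomu}_root /tmp/${xendomu}
printf "done.\n"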

printf "Debootstrapping ${xendomu}: "
debootstrap gutsy /tmp/${xendomu}
printf "done.\n"

printf "Copying modules to ${xendomu}: "
cp -r /lib/modules/`uname -r` /tmp/${xendomu}/lib/modules/
printf "done.\n"

printf "Setting up fstab: "
printf "/dev/sda1\t/\t\text3\trw,errors=remount-ro\t0\t1\n" >> /tmp/${xendomu}/etc/fstab
printf "/dev/sda2\tnone\t\tswap\tdefaults\t0\t0\n" >> /tmp/${xendomu}/etc/fstab
printf "none\t\t/proc\t\tproc\trw,nosuid,noexec\t0\t0\n" >> /tmp/${xendomu}/etc/fstab
printf "done.\n"

printf "Setting up hostname: "
echo "${xendomu}" > /tmp/${xendomu}/etc/hostname

printf "done.\n"

printf "Setting up hosts: "
printf "127.0.0.1\tlocalhost\n" > /tmp/${xendomu}/etc/hosts
printf "done.\n"

printf "Setting up network interfaces: "
printf "auto lo\n" >> /tmp/${xendomu}/etc/network/interfaces
printf "iface lo inet loopback\n" >> /tmp/${xendomu}/etc/network/interfaces
printf "done.\n"

printf "Setting up apt sources: "
echo "deb http://archive.ubuntu.com/ubuntu gutsy main universe" > /tmp/${xendomu}/etc/apt/sources.list
printf "done.\n"

printf "Updating apt: "
chroot /tmp/${xendomu} apt-get update
printf "done.\n"

printf "Disabling threads: "
mv /tmp/${xendomu}/lib/tls /tmp/${xendomu}/lib/tls.disabled
printf "done.\n"

Stephen Touset (stephen-touset) wrote :

Can anyone confirm progress made on this bug? Since the "workaround" involves using a kernel that can only run one Xen domU at a time, this effectively kills Xen on Gutsy.

Duane (duane-e164) wrote :

"I think this should be default in xen-tools."

You just need to add one line to a tmpl file:

echo "extra = ' TERM=xterm xencons=tty console=tty1'" >> /etc/xen-tools/xm.tmpl

Duane (duane-e164) wrote :

"printf "Disabling threads: "
mv /tmp/${xendomu}/lib/tls /tmp/${xendomu}/lib/tls.disabled
printf "done.\n""

I don't think this needs to be done if you are installing libc6-xen

"Can anyone confirm progress made on this bug?"

The hwclock.sh scripts hold the whole thing up afaik. Before launching a new DomU, I get into the filesystem and run:

update-rc.d -f hwclock.sh remove
update-rc.d -f hwclockfirst.sh remove

Also, for some reason the DomUs seem to hang if the IP in the config differs from the IP it was set up with.

Duane (duane-e164) wrote :

Forgot to mention, in /etc/xen-tools/xen-tools.conf

Down the bottom of the file it has:

# If you're using a newer version of the Xen guest kernel you will
# need to make sure that you use 'xvc0' for the guest serial device,
# and 'xvdX' instead of 'sdX' for serial devices.
#
# serial_device = tty1 #default
# serial_device = xvc0
#
# disk_device = sda #default
# disk_device = xvda

Duane (duane-e164) wrote :

Actually I've been digging more into this, as part of xen-create-image hwclock should be disabled by:

/usr/lib/xen-tools/gutsy.d/15-disable-hwclock

However the gutsy.d directory is really a symlink to edgy.d and this doesn't seem to work for gutsy for some reason.

/usr/lib/xen-tools/gutsy.d/30-disable-gettys is supposed to disable getty on ttys 2 to 6, which it probably does, but it should probably have a line something like...

cat ${prefix}/etc/event.d/tty1 | sed "s/tty1/xvc0/" > ${prefix}/etc/event.d/xvc0

Added beneath:

rm ${prefix}/etc/event.d/tty[!1]

Since Xen is shifting from tty's to xvc's etc...

Duane (duane-e164) wrote :

Seems a udev entry is blocking bootup. Edit /usr/lib/xen-tools/gutsy.d/15-disable-hwclock and add the following line:

rm -f ${prefix}/etc/init.d/hwclock.sh ${prefix}/etc/init.d/hwclockfirst.sh ${prefix}/etc/udev/rules.d/85-hwclock.rules

Below these lines:

chroot ${prefix} /usr/sbin/update-rc.d -f hwclock.sh remove
chroot ${prefix} /usr/sbin/update-rc.d -f hwclockfirst.sh remove
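
Put together, the effect of the hook on a guest filesystem mounted at ${prefix} can be sketched as one function. The name `disable_hwclock` is hypothetical, and removing the rc links directly is only a stand-in for the `update-rc.d -f ... remove` calls so that the sketch can be tried on a scratch directory without a chroot.

```sh
#!/bin/sh
# Sketch only: strip every hwclock trigger from a guest filesystem
# mounted at $1. disable_hwclock is a hypothetical helper name.
disable_hwclock() {
    prefix="$1"
    for name in hwclock.sh hwclockfirst.sh; do
        # Stand-in for `update-rc.d -f $name remove`: drop the rc links.
        rm -f "$prefix"/etc/rc?.d/[SK][0-9][0-9]"$name"
        # Remove the init script itself.
        rm -f "$prefix/etc/init.d/$name"
    done
    # Remove the udev rule that runs hwclock when /dev/rtc appears.
    rm -f "$prefix/etc/udev/rules.d/85-hwclock.rules"
}

# Example (hypothetical mount point):
#   disable_hwclock /mnt/domu
```

Because every removal uses `rm -f`, running it on a guest that has already been cleaned up is harmless.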

Stephen Touset (stephen-touset) wrote :

The hwclock scripts have nothing to do with this bug.

Removing them did not cause bootup to continue, nor should it have. The guest instances hang immediately _before_ even mounting the root filesystem, so none of the init scripts have even had a chance to run.

Duane (duane-e164) wrote :

Ummm, maybe my eyesight is going on me, but the bug report says "xen guest hangs *after* mounting filesystem". No idea why it's hanging for you before the file system is mounted, but the hwclock stuff was definitely hanging my DomU gutsy guests.

Stephen Touset (stephen-touset) wrote :

The console shows "EXT3-fs: mounted filesystem with ordered data mode." which would normally cause you to believe it's already mounted the filesystem. But in reality, it hasn't.

With this bug, you're still able to mount the filesystem in the dom0 even while the domU machine is supposedly "running".

Duane (duane-e164) wrote :

I'm not sure how safe it is to mount the file system like that twice. I've had systems in a mess because of doing similar things and they hard lockup on me as a result.

Dom0 has to be able to mount the file systems of any DomU at any time, otherwise the DomUs wouldn't see their data. I don't think there is any code in Xen, the Linux kernel, or the mount utils to prevent mounting the same file system multiple times, and Linux has a history of letting users hang themselves in such ways because of corner cases where doing what looks to be a silly thing is actually a desirable thing. Flexibility is a wonderful thing in the hands of someone capable of dealing with any problems that arise as a result.

As for your issue, did you shut down the DomU and then remove all the hwclock stuff I pointed out, including the file in the /etc/udev/rules.d directory? I think this is the main culprit; every time I forget to remove it the DomU hangs.

If you don't believe me about hwclock causing hangs then you could look through some of the xen-tool scripts, including one called 15-disable-hwclock.

# dpkg -L xen-tools|grep hwclock
/usr/lib/xen-tools/debian.d/15-disable-hwclock
/usr/lib/xen-tools/edgy.d/15-disable-hwclock
/usr/lib/xen-tools/dapper.d/15-disable-hwclock

However, this script only deals with hwclock scripts in init.d, not in udev; the udev rule seems to be new in Gutsy.

If you still don't believe me, go to Google and type in "xen hwclock hang": there are almost 1000 results, some dating back to at least 2003. So yes, hwclock can hang a DomU; here, the introduction of a udev rule that runs hwclock when /dev/rtc appears causes the whole thing to hang.

dpates (dpates) wrote :

It seems that the problem is the /lib/udev/set_hwclock script; it calls hwclock, even when HWCLOCKACCESS is set to 'no' in /etc/default/rcS (which is obeyed by the /etc/init.d/hwclockfirst.sh script). I tried turning off all the 'x' bits on /lib/udev/set_hwclock, and then a Xen domU running gutsy boots fine. Perhaps someone should just check $HWCLOCKACCESS in the script, as is done in hwclockfirst.sh?
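
A minimal sketch of the suggested check, assuming /etc/default/rcS is a plain shell fragment that may set HWCLOCKACCESS. The helper name `hwclock_allowed` is hypothetical, and this is not the actual contents of set_hwclock or hwclockfirst.sh.

```sh
#!/bin/sh
# Sketch only: a guard that a script like /lib/udev/set_hwclock could
# run before calling hwclock. hwclock_allowed is a hypothetical name.
# The body runs in a subshell so sourcing the file cannot leak
# variables into the caller.
hwclock_allowed() (
    rcs="${1:-/etc/default/rcS}"
    # Default to allowing access when the file does not say otherwise.
    HWCLOCKACCESS=yes
    if [ -r "$rcs" ]; then
        . "$rcs"
    fi
    [ "$HWCLOCKACCESS" != no ]
)

# In the script, the guard might then read:
#   hwclock_allowed || exit 0
```

With HWCLOCKACCESS=no set in the guest, the script would exit before touching the clock, which has the same effect as stripping its execute bits but survives package upgrades of the setting file only, not of the script itself.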

Paul Waldo (pwaldo) wrote :

I can confirm that /lib/udev/set_hwclock is problematic. My gutsy domU took forever to boot, and when it did, top showed hwclock hogging the CPU. I removed execute permissions on /lib/udev/set_hwclock and now it boots quite snappily.

Anton Wurscht (wurscht) wrote :

Exactly the same problems, adding a new line to the DomU configuration file:
extra='xencons=tty'
fixed it. The problem is not related to hwclock; that may be another issue. However, I see lockups apparently in the loopback driver, "dd" segfaulted during setup, etc. I had to hard reset the server several times today.

nettraptor (nettraptor) wrote :

Same problems here with ubuntu gutsy, adding a new line to the DomU configuration file:
extra='xencons=tty'

This does make the VMs virtually work, but now they hang in the state of "setting the system clock.." forever!

In other words, this must be a complicated problem, and yes, it might be the case that we have a problem there as well.

I have seen many problems with gutsy and Xen 3.1. I simply and strongly believe that it is not a production setup by any means.

First I got the /lib/tls problem, then the extra='xencons=tty' one, now the hwclock one, and in the past loads of problems with cupsys incompatibilities in the host (don't ask why I wanted cupsys).

The case we are facing now is the worst. Guest domains will simply not boot properly without great intervention. Furthermore the minute you solve one problem another one shows up

Fred H (fred-hensley-bereanservices) wrote :

I too am experiencing precisely the same issues as reported above on my Dell 430SC server with dual core pentium D920 processors running 64 bit Ubuntu 7.10 gutsy server and Xen 3.1 ( 2.6.22-14-xen #1 SMP Sun Oct 14 23:20:20 GMT 2007 x86_64 GNU/Linux).

However, immediately after initially creating my two gutsy "guests" on this server via xen-tools they worked flawlessly. Only after shutting them down (xm shutdown) and restarting (xm create) did the boot lockup hit them hard. Granted, I could always try to leave the guest instances running (kidding).... :)

I have followed the examples above, adding extra='xencons=tty' to my guest configuration files and removing execute privileges from the hardware clock script referenced above. Now both gutsy guests boot, but neither has any network connectivity. Bummer.. Obviously, I will also repost this to the new bug report 150805 submitted by Todd D.

So, bottom line, is there any new progress on this bug? Do these issues seem related? Are there any other work-arounds which can help? Personally, I'm deciding whether to: (a) downgrade back to xen 3.04, (b) downgrade ubuntu to 7.04 (feisty) or 6.06 (dapper), or (c) simply downgrade the kernel to 2.6.19-4.. Someone posted earlier that the 2.6.19-4 kernel only supported one guest instance, but that did not make sense to me why that would be the case...

Anyone? Anyone? Buehler? :) :)

Thanks,

-Fred-
.

Anton Wurscht (wurscht) wrote :

I have also seen lockups of the Xen Dom0 with no DomU started under heavy I/O by a process owned by root. Strangely bonnie++ runs fine under an unprivileged account. I think this is not at all ready for production and will scratch the Ubuntu 7.10 gutsy server installation now and move to something else.

Trying to install the 64-bit version to see if that works, I even got install errors:

Unpacking xen-utils-3.1 (from .../xen-utils-3.1_3.1.0-0ubuntu18_amd64.deb) ...
dpkg: error processing /var/cache/apt/archives/xen-utils-3.1_3.1.0-0ubuntu18_amd64.deb (--unpack):
 trying to overwrite `/etc/udev/xen-backend.rules', which is also in package xen-utils-common
dpkg-deb: subprocess paste killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/xen-utils-3.1_3.1.0-0ubuntu18_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Not really impressive.

Frank Abel (frankabel) wrote :

After installing the gutsy ubuntu-xen-server packages, the system boot doesn't get past "Setting the system clock". What can I do to solve this? Is there any workaround yet?

I'm using a virtual machine (VMware) to test. Has anybody installed these packages successfully? Or are they completely broken?

sapphirepaw.org (static-sapphirepaw) wrote :

Just to add my experience here: I had some stuff working on Xen-3.0, Kubuntu feisty host, gutsy-server guest. Paravirtualized, using kernel 2.6.19-4-generic, on a socket-A based host. I upgraded dom0 to gutsy (and fixed the damage after the upgrade tool crashed), updated my config to point to the new kernel and initrd (2.6.22-14-xen), and none of the Xen domains would boot.

Zeroth fix: according to Xen's website somewhere, xend requires python-xml for the shiny new XMLRPC interface, but the Ubuntu package doesn't depend on it. I had some warnings about API calls not being found in the xend log until I installed python-xml. I don't know if that was actually a problem or not. (Given the state of xen-3.1 documentation right now, I'm lucky I ever found out that much.)

First problem: no output except from the kernel, ending with the "EXT3-fs: mounted filesystem with ordered data mode." message. First solution: passing "xencons=tty" in extra. This puts out a bunch of "Couldnt get a file descriptor referring to the console" messages, even with console=tty0 included.

Second problem: hwclock hangs, using 100% CPU (as judged by the 'time' column from 'xm list' in dom0). Second solution: get rid of hwclock. I was desperate to make things work at this point, because I foolishly upgraded before finishing a major project for somebody, which was being tested in one of the domUs. So I disabled hwclock* in /etc/rc?.d and stripped execute permissions from /lib/udev/set_hwclock as mentioned in this bug report. After that, domU booted happily.

IMHO, XenSource should ship this stuff in a non-broken mode where the framebuffer console (which also doesn't work, as far as I can tell) is disabled by default. Finding how to debug should not be an additional shooting-in-the-dark debugging process.

guitousson (jean-gui) wrote :

Hi,

I have got the same problem on my gutsy server install.
I have tried the patches the previous post described:

- disabling of hwclock in the /etc/rc0.d and in the rc6.d directories by removing the links.
- stripping the execution permission on the /lib/udev/set_hwclock file by chmod...
- reboot the server

And I still have the same problem when running the command:

sudo xm create -c /etc/xen/xen1.cfg

Is there another solution to fix this problem? Thanks in advance for your help.

Cheers,

Revision history for this message
delerious010 (delerious010) wrote :

The above comments seem to have resolved my issues booting up Gutsy DomUs.

Previous problems :
- At no time can I login to console.
- After xm-create-image the DomU responds to ssh.
- After xm-shutdown/create the DomU does not respond to ssh.

Fresh install of Gutsy with vmlinuz-2.6.22-14-xen :
* Test on the DomU :
-- rm /etc/nologin
-- echo xvc0 >> /etc/securetty
-- sed -i -e 's@tty1@xvc0@g' /etc/event.d/tty1 && mv /etc/event.d/tty1 /etc/event.d/xvc0
-- chmod -x /lib/udev/set_hwclock

Revision history for this message
Henrik Riomar (henrik-riomar) wrote :

Same problem here
# xm top
Shows 100% cpu load on the DomU

DomU Console in Gutsy with 2.6.22-14-xen:
  * Setting preliminary keymap... [ OK ]
  * Setting the system clock

The problem occurs with 2.6.22-13-xen & 2.6.22-14-xen, kernel 2.6.22-12-xen however boots.

DomU Console in Gutsy with 2.6.22-12-xen:
 * Setting preliminary keymap... [ OK ]
 * Setting the system clock
 * Unable to set System Clock to: Mon Nov 26 19:38:44 UTC 2007
 * Starting basic networking... [ OK ]
 * Starting kernel event manager... [ OK ]
 * Loading hardware drivers... [ OK ]
... continues
Ubuntu 7.10 myhost xvc0

myhost login:

Revision history for this message
Herman Bos (hbos) wrote :

I don't know if this will help since it sounds a bit different:

We had some Ubuntu booting problems before (running a manually built Xen on CentOS). We fixed this by building a new initrd in the domU, copying that initrd to dom0, and using it in the config file instead of the supplied one.

You can build a new initrd with the following command:

example: `mkinitramfs -o /boot/initrd-2.6.18-xen-feisty.img 2.6.18-xen`

You don't have to start the domU; you can also build it in a chroot of its filesystem.

Maybe it's worth a try.
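The chroot variant could look roughly like the sketch below. It assumes the domU filesystem is already mounted somewhere on dom0 and that the kernel modules for the given version are installed inside it; build_domu_initrd, run, DRY_RUN and the output file name are hypothetical names chosen for this example. Depending on the guest, mkinitramfs may need more of the environment set up inside the chroot than is shown here.

```shell
#!/bin/sh
# Sketch only: build an initrd inside the domU filesystem via chroot and
# copy it to dom0's /boot. All helper names here are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_domu_initrd() {
    root="$1"    # directory where the domU filesystem is mounted
    kver="$2"    # kernel version with modules under $root/lib/modules
    out="/boot/initrd-$kver.img"

    # Build the initrd with the guest's own tools and modules.
    run chroot "$root" mkinitramfs -o "$out" "$kver" || return 1

    # Copy the result out of the guest filesystem into dom0's /boot.
    run cp "$root$out" "$out"
}
```

Calling it with DRY_RUN=1 first, for example `DRY_RUN=1 build_domu_initrd /mnt/domu 2.6.18-xen`, prints the two commands without touching anything.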

Revision history for this message
psychothirteen (psychothirteen) wrote :

And again, same problem here!

After installing Ubuntu Desktop 7.10 amd64 I directly installed Xen 3.1 from the repositories. Creating a guest via xen-create-image works fine but the domU hangs when trying to mount / mounting the file system. After adding the "extra = 'xencons=tty1' " line though it boots but the guest never manages to find any network connection. It always fails (Debian and Ubuntu) when booting. "ifconfig" doesn't even list loop-back...
I don't know if everybody has this bug since there are some tutorials concerning Ubuntu 7.10 and Xen (e.g. howtoforge.com) which apparently seem to work. Is this bug hardware dependent? Is there a definite fix for this? Do the developers know about this bug?

Would it be possible to set up a Debian installation (4.0) and use a Xen kernel (2.6.22-14, for example) to see whether it also happens with a different OS? Debian only provides an older kernel (2.6.18), which I can't use: it doesn't recognize my network card, so I can't test it myself...

Revision history for this message
Todd Deshane (deshantm) wrote :

according to: http://www.howtoforge.com/ubuntu-7.10-server-install-xen-from-ubuntu-repositories-p2
The --ide option is required for the xen-create-image in order for it to boot... Can somebody test/confirm that?

I will follow through with the howtoforge suggestions as soon as I get a chance. Thanks for pointing those out. Usually they figure out the little issues and then post step by step instructions that work.

Revision history for this message
Todd Deshane (deshantm) wrote :

I tried the howtoforge instructions.

I noticed the --ide and also that they are using a hard-coded IP address.

Even with those changes, I don't have networking...

I am also running into this bug again:
https://bugs.launchpad.net/ubuntu/+source/xen-3.1/+bug/150805

The workaround seemed to work once but doesn't seem to work consistently (i.e. not after a reboot).

So the loss of networking after a reboot that many people have mentioned is related.

The bottom line is that I still need to figure out quite a few problems, before getting to a booting system. Then more tricks need to be pulled to try to get networking.

I will try to look into it more later.

Revision history for this message
Todd Deshane (deshantm) wrote :

I have a working system currently... More testing is needed, but you can see bug #150805 for how I got to a working state. Hopefully it helps somebody.

https://bugs.launchpad.net/ubuntu/+source/xen-3.1/+bug/150805

Revision history for this message
delerious010 (delerious010) wrote :

I don't think the --ide is required.
On my working system, I've got it on the default sda interface.
As far as networking is concerned, I've had no issues with a static MAC assignment and DHCP.

The difference between our systems, though, is that I don't have NetworkManager. I think it only comes with the GNOME desktop Ubuntu, not the server build?

Revision history for this message
Paul Waldo (pwaldo) wrote :

I have had great luck following these items from a Xen mailing list poster. In the (gutsy) domU,

-- cat xvc0 >> /etc/securetty
-- sed -i -e 's@tty1@xvc0@g' /etc/event.d/tty1 && mv /etc/event.d/tty1 /etc/event.d/xvc0
-- chmod -x /lib/udev/set_hwclock

FYI, I have never used --ide.

HTH!

Revision history for this message
Paul Waldo (pwaldo) wrote :

Sorry, I had a senior moment. You won't get much use out of
cat xvc0 >> /etc/securetty
Try this, instead:
echo xvc0 >> /etc/securetty
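Since the echo line appends every time it is run, here is a small sketch that adds the console name only if it is missing. The helper name allow_console_login is made up for this example; the file would normally be /etc/securetty inside the domU.

```shell
#!/bin/sh
# Sketch only: add the Xen console device to securetty exactly once.
# allow_console_login is a hypothetical helper; $1 is the securetty file,
# $2 the console name (defaults to xvc0).
allow_console_login() {
    securetty="$1"
    console="${2:-xvc0}"

    # -x matches whole lines only, -F treats the name as a fixed string.
    if ! grep -qxF "$console" "$securetty" 2>/dev/null; then
        echo "$console" >> "$securetty"
    fi
}
```

For example, `allow_console_login /etc/securetty` can be run repeatedly without piling up duplicate xvc0 entries.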

Revision history for this message
karlbowden (karlbowden) wrote :

Is anybody still having trouble with this?
The only two files I needed to change were:

append to > /etc/xen-tools/xm.tmpl
extra = ' xencons=tty console=tty1'

append to > /usr/lib/xen-tools/gutsy.d/15-disable-hwclock
rm -f ${prefix}/etc/init.d/hwclock.sh ${prefix}/etc/init.d/hwclockfirst.sh ${prefix}/etc/udev/rules.d/85-hwclock.rules
chmod -x ${prefix}/lib/udev/set_hwclock

For xm.tmpl you can either add that line so that xen uses tty's again or you can edit all the tty1 references to be xvc0 in the domU filesystem.

(Btw, damn the hours of heartache for finding the three lines to add)

-Karl

Revision history for this message
Henrik Riomar (henrik-riomar) wrote :

Thanks Karl!

That fixed it for me.

Revision history for this message
Will Saxon (saxonww) wrote :

I just wanted to comment since I have been researching this for a couple of hours myself. Once I added

extra='xencons=tty'

to my domain.cfg files, I was able to use the console to monitor the bootup. I saw an immediate error about not being able to set the hardware clock, but the boot continued. I used "update-rc.d hwclock remove; update-rc.d hwclockfirst remove" to remove the clock set from the bootup and now I do not have an issue with that. I use ntp to set the clock anyway.

The reason I was trying to work with the console in the first place was due to https://bugs.launchpad.net/ubuntu/+source/xen-3.1/+bug/150805, which also has a workaround.

Note: while the xenconsole workaround was necessary, no hwclock or udev/network changes were necessary for my gutsy dom0/dapper domU. This entire process was only an issue with a gutsy dom0/domU setup.
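For several domain config files, the console setting above could be added with something like the following sketch. The helper name ensure_xencons is hypothetical, and it deliberately refuses to touch a file that already has an extra line, because merging kernel arguments automatically is easy to get wrong.

```shell
#!/bin/sh
# Sketch only: make sure a domU config file carries extra='xencons=tty'.
# ensure_xencons is a hypothetical helper; $1 is the domain config file.
ensure_xencons() {
    cfg="$1"

    # Nothing to do if some xencons setting is already present.
    if grep -q 'xencons=' "$cfg"; then
        return 0
    fi

    # An existing extra line has to be merged by hand.
    if grep -Eq '^[[:space:]]*extra[[:space:]]*=' "$cfg"; then
        echo "$cfg already has an extra line; add xencons=tty to it by hand" >&2
        return 1
    fi

    echo "extra = 'xencons=tty'" >> "$cfg"
}
```

A loop such as `for f in /etc/xen/*.cfg; do ensure_xencons "$f"; done` would cover all domains, assuming the configs live in /etc/xen as in the commands quoted earlier in this report.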

Revision history for this message
Dustin Essington (dustin-essington) wrote :

I used both: removing the execute permission from the hwclock script as well as extra='xencons=tty'. My domUs are all booting fine now.

Revision history for this message
Todd Deshane (deshantm) wrote :

This problem is showing up in Xen 3.2, but the workarounds don't seem to work.

I even tried adding the modules as suggested in bug #199533
https://bugs.launchpad.net/ubuntu/+source/xen-3.2/+bug/199533

Revision history for this message
Christophe Painchaud (dash-ionblast) wrote :

Also, please note that the VM has not crashed or frozen; it's just that the VM console is not working (network services and programs are still running fine; try logging in over SSH and you will see).

To fix the problem, add or replace a line in the domU config file with:

extra = "4 xencons=tty"

It's been working for everyone around me.

Revision history for this message
Daniel T Chen (crimsun) wrote :

Is this symptom also reproducible using Xen 3.3 (i.e., is the workaround still required)?

Revision history for this message
Lionel Porcheron (lionel.porcheron) wrote :

Yes, it is still required, but considering that xen-create-image (from xen-tools) now creates the correct configuration, which does not have to be tweaked by hand (i.e. the workaround does not have to be applied manually), and that we do not ship dom0 anymore in intrepid, this bug is probably a Won't Fix.

Revision history for this message
Christian Kujau (christiank) wrote :

As if this bug report were not long enough already, I want to share my findings here: my DomU (Debian/sid) would stop booting and not be reachable via ping/ssh:

[ 0.515574] EXT3-fs: mounted filesystem with ordered data mode.
[ 0.515597] VFS: Mounted root (ext3 filesystem) readonly on device 202:1.
[ 0.515621] Freeing unused kernel memory: 308k freed
[ 0.564694] Warning: unable to open an initial console.

I think Stephen was right in https://bugs.launchpad.net/ubuntu/+source/xen-3.1/+bug/144631/comments/20 when he proposed that this bug has nothing to do with the console at all - the "Warning:" just happens to be the last visible (error)message. Booting really seems to stop and I was able to "solve" it by modifying a completely different parameter: first I had:

  root = '/dev/xvda1 ro'

which is similar to Todd's config. I changed it to:

  root = '/dev/xvda1'

And even though the "Warning:" is still there, the system boots just fine. I have no local Xen console (but that's another bug or PEBKAC) and I have *no* extra/xencons/console settings in my .cfg now. And even when I had, the machine would not boot until I changed the root parameter.
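The root parameter change above can be sketched as a small edit script. This only handles the single-quoted form shown here, the helper name strip_root_ro is made up, and whether the change helps on other setups is not established by this report.

```shell
#!/bin/sh
# Sketch only: drop a trailing " ro" from the root setting of a domU config.
# strip_root_ro is a hypothetical helper; $1 is the domain config file.
# Relies on GNU sed's -i option; the original is kept as <file>.bak.
strip_root_ro() {
    cfg="$1"
    # Turns   root = '/dev/xvda1 ro'   into   root = '/dev/xvda1'
    sed -i.bak -e "s|^\([[:space:]]*root[[:space:]]*=[[:space:]]*'[^' ]*\)[[:space:]][[:space:]]*ro'|\1'|" "$cfg"
}
```

Lines that do not match, including a root line without the ro suffix, are left unchanged.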

HTH,
Christian.

Revision history for this message
Alvin Cura (alvinc) wrote :

I am not seeing this problem on Hardy 8.04.3. Can anyone confirm/deny?

Revision history for this message
Axel Beckert (xtaran) wrote :

Since according to several comments this seems to happen independent of xen-create-image being used or not, I'm marking this invalid for xen-tools. Seems to be a problem with Xen in general. (Please tell me if you think I'm wrong, but also tell me why. :-)

Changed in xen-tools (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
rusivi2 (rusivi2-deactivatedaccount) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. The issue that you reported should be reproducible with the live environment of the Desktop CD development release - Maverick Meerkat. It would help us greatly if you could test with it so we can work on getting it fixed in the next release of Ubuntu. You can find out more about the development release at http://www.ubuntu.com/testing/. Thanks again and we appreciate your help.

Revision history for this message
Thomas Hotz (thotz-deactivatedaccount) wrote :

Your Ubuntu version is EOL so please try to reproduce the error with a supported Ubuntu version! Thank you!

Changed in xen-3.2 (Ubuntu):
status: New → Incomplete
Changed in xen-3.1 (Ubuntu):
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for xen-3.1 (Ubuntu) because there has been no activity for 60 days.]

Changed in xen-3.1 (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for xen-3.2 (Ubuntu) because there has been no activity for 60 days.]

Changed in xen-3.2 (Ubuntu):
status: Incomplete → Expired