
Xen kernel hangs randomly

Bug #147862 reported by John Leach
Affects            Status     Importance  Assigned to  Milestone
Xen                Confirmed  Critical
xen-3.1 (Ubuntu)   Invalid    Undecided   Unassigned

Bug Description

Binary package hint: linux-source-2.6.22

I have 3 Xen machines that are hanging randomly. They freeze completely, with nothing on the console (no panic messages or anything), but I sometimes see a clocksource message in the logs (caught only via remote syslog). It appears within 5 seconds of the hang (the clustering software reports lost machines within 5 seconds):

xen01 kernel: clocksource/0: Time went backwards: delta=-6164329055846119597 shadow=9515037682421 offset=888776
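A minimal Python sketch for pulling the fields out of the clocksource warning quoted above, for example when scanning the remote syslog around a hang. The pattern is written against that single quoted line only; treating `shadow` and `offset` as decimal, and treating `delta` as a signed 64-bit value whose bit pattern can be re-read as unsigned, are assumptions, since the kernel's format string is not quoted in this report. The helper name `parse_clocksource` is hypothetical.

```python
import re

# Written against the single warning quoted in this report; the kernel's
# actual format string is not shown there, so this pattern is an assumption.
CLOCKSOURCE_RE = re.compile(
    r"clocksource/(?P<index>\d+): Time went backwards: "
    r"delta=(?P<delta>-?\d+) shadow=(?P<shadow>\d+) offset=(?P<offset>\d+)"
)


def parse_clocksource(line):
    """Return the numeric fields of a clocksource warning, or None if absent.

    Hypothetical helper. Assumes every field is printed in decimal.
    """
    match = CLOCKSOURCE_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Assumption: delta is a signed 64-bit value. Also record the same bit
    # pattern read as an unsigned 64-bit integer.
    fields["delta_unsigned"] = fields["delta"] % (1 << 64)
    return fields


if __name__ == "__main__":
    sample = ("xen01 kernel: clocksource/0: Time went backwards: "
              "delta=-6164329055846119597 shadow=9515037682421 offset=888776")
    print(parse_clocksource(sample))
```

For the quoted line, `delta_unsigned` comes out as 12282415017863432019, that is, 2^64 minus the magnitude of `delta`.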

The hang doesn't appear to be linked to CPU, RAM or disk activity: I graph all three and haven't noticed anything abnormal around the time of the hangs. In fact, one machine had no Xen guests running and no services in use in dom0, and it still hung.

I *can* reproduce the hang consistently by generating network traffic with iperf, usually within 20 to 30 minutes of sustained traffic. Again, the clocksource message often appears just before the hang. I have not yet reproduced it with burnP6 or a userspace memtest.
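To put a number on "often" across iperf runs, the warning timestamps from the remote syslog can be matched against the times the clustering software reported each machine lost. A minimal sketch, assuming both sets of timestamps have already been extracted as `datetime` values; the 5-second window comes from the report, while the helper names `warnings_before_hangs` and `fraction_with_warning` are hypothetical.

```python
from datetime import datetime, timedelta

# The report says the warning appears within 5 seconds of the hang, and the
# clustering software reports lost machines within 5 seconds.
WINDOW = timedelta(seconds=5)


def warnings_before_hangs(warning_times, hang_times, window=WINDOW):
    """For each hang time, list the warnings logged in [hang - window, hang]."""
    return {
        hang: sorted(w for w in warning_times if hang - window <= w <= hang)
        for hang in hang_times
    }


def fraction_with_warning(warning_times, hang_times, window=WINDOW):
    """Fraction of hangs preceded by at least one warning within the window."""
    if not hang_times:
        return 0.0
    matched = warnings_before_hangs(warning_times, hang_times, window)
    return sum(1 for ws in matched.values() if ws) / len(hang_times)


if __name__ == "__main__":
    # Illustrative timestamps only; these are not data from the report.
    warnings = [datetime(2007, 10, 1, 12, 0, 1)]
    hangs = [datetime(2007, 10, 1, 12, 0, 4), datetime(2007, 10, 1, 13, 0, 0)]
    print(fraction_with_warning(warnings, hangs))  # prints 0.5
```

The window is inclusive at both ends; if the cluster's detection delay turns out to be longer than 5 seconds, widening `window` is the only change needed.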

This is with linux-image-2.6.22-12-xen 2.6.22-12.37. I cannot reproduce it with linux-image-2.6.22-12-server 2.6.22-12.39 on the same hardware; it happens only when running under the hypervisor.

The problem persists after replacing the hypervisor with the official 32-bit PAE and 64-bit Xensource builds.

This kernel is running on a Feisty system, with backported Xen 3.1 packages from Gutsy.

All of the hardware is Dell PowerEdge 1950 (Intel). dom0 is given 2048M (of 16G) and 1 CPU (of 8 possible cores), so it has switched to UP mode.

Network cards vary between machines. Some machines have only "Intel Corporation 82571EB Gigabit Ethernet Controller" cards, others only "Broadcom Corporation NetXtreme II BCM5708".

Bug #146924 is possibly related.

John Leach (johnleach) wrote:

I have not been able to reproduce this on the same hardware with the official Xensource 64-bit hypervisor and the Gutsy AMD64 Xen kernel.

Though it is likely unimportant: the 32-bit Xen userspace tools don't work for me with the 64-bit kernel, so it was running in SMP mode with 8 CPUs.

John Leach (johnleach) wrote:

I have not been able to reproduce this on the same hardware with the Gutsy hypervisor and a vanilla i386 2.6.23-rc8 kernel from kernel.org.

Again, the Xen userspace tools aren't working, so it was running in SMP mode with 8 CPUs.

John Leach (johnleach) wrote:

I rebuilt the Xen kernel using the i386 server package config (plus the Xen options from the Xen custom binary configs) and can still reproduce the hang.

The server config has several notable differences, such as no preemption and the deadline I/O scheduler, so this at least rules those out.

Any thoughts on what else I could try?

Bart Verwilst (verwilst) wrote:

Isn't this bug the same as #146924?
This problem seems to hit at least a few users, but there is no solution in sight...

Bart Verwilst (verwilst) wrote:

So what you are saying is that if you keep the kernel (2.6.22-14-xen, for example) but replace xen-utils with the official ones, everything works fine?
(I want that kernel because of AppArmor and such :) )

Bryan York (bryan-york) wrote:

This is the same as bug #146924, but that bug doesn't fully describe the issue. I'm seeing this as well, and the machine hangs randomly even when no DomU is running. This is a showstopper bug. Does anyone here have suggestions? I can't downgrade xen-utils with the stock repositories...

I filed a bug report at XenSource's Bugzilla:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1103

Luis Henriques (henrix) wrote:

I am not sure whether this is the same bug, but my Dell D620 hangs when I am running 2.6.22-14-xen. It boots fine but hangs after a while (I can use the system normally for a few minutes).

The last time, I was using the Xen tools to create a host and noticed that a kernel Oops occurred (see attachment). After a while, the system hung! Note that I was running only the host system.

Changed in ubuntu-xen:
status: Unknown → Confirmed
Daniel T Chen (crimsun) wrote:

Is this symptom still reproducible in 8.10 RC or later?

Changed in xen-3.1:
status: New → Incomplete
Andreas Moog (ampelbein) wrote: Closing the report.

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in xen-3.1 (Ubuntu):
status: Incomplete → Invalid
Changed in ubuntu-xen:
importance: Unknown → Critical