xen

"Time went backwards" + freeze for domU's with kernel 2.6.22-xen

Bug #146924 reported by John Leach
Affects                       Status     Importance  Assigned to  Milestone
xen                           Confirmed  High
linux (Ubuntu)                Won't Fix  Undecided   Unassigned
    Declined for Gutsy by Henrik Nilsen Omma
linux-source-2.6.22 (Ubuntu)  Won't Fix  Low         Unassigned
    Declined for Gutsy by Henrik Nilsen Omma

Bug Description

Binary package hint: linux-source-2.6.22

Suspending or migrating a guest (migrating uses suspend) causes the clock to stop in the guest. It doesn't happen on *every* suspend or migrate (I've migrated machines successfully without it), but it does happen most of the time.

This is with linux-image-2.6.22-12-xen installed on Feisty and Xen 3.1 rebuilt for Feisty.

Confirmed by a 3rd party here: http://lists.xensource.com/archives/html/xen-users/2007-09/msg00615.html

Kernel ring buffer on the affected guest has the following or similar, and continues until a reboot of the guest:

netfront: device eth0 has copying receive path.
netfront: device eth1 has copying receive path.
printk: 359 messages suppressed.
clocksource/0: Time went backwards: delta=-5812811482752 shadow=435019569165 offset=93003582
printk: 210 messages suppressed.
clocksource/0: Time went backwards: delta=-5807892372062 shadow=440018178962 offset=13527903
printk: 232 messages suppressed.
clocksource/0: Time went backwards: delta=-5802895482933 shadow=445021304103 offset=7267091
printk: 207 messages suppressed.
clocksource/0: Time went backwards: delta=-5797823483340 shadow=450018502714 offset=82066273
printk: 347 messages suppressed.
clocksource/0: Time went backwards: delta=-5792831504160 shadow=455019778734 offset=72802936
etc.etc.etc.
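
For anyone collecting these messages from a remote syslog, a small parser makes the numbers easier to compare. The following is a minimal Python sketch, not part of any Xen or Ubuntu tooling; the name `parse_backwards` is made up for illustration, and it only handles the decimal `delta=... shadow=... offset=...` form quoted above. Treating the values as nanoseconds is an assumption.

```python
import re

# Matches the decimal form of the message quoted in this report.
PATTERN = re.compile(
    r"Time went backwards: delta=(-?\d+) shadow=(\d+) offset=(\d+)"
)

def parse_backwards(line):
    """Return (delta, shadow, offset) as ints, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

line = ("clocksource/0: Time went backwards: "
        "delta=-5812811482752 shadow=435019569165 offset=93003582")
delta, shadow, offset = parse_backwards(line)
# Assumption: the values are nanoseconds, so divide to get seconds.
print(f"delta = {delta / 1e9:.1f} s")
```

Lines such as the `printk: ... messages suppressed.` entries simply return `None`, so the function can be run over a whole log.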

John Leach (johnleach)
Changed in linux-source-2.6.22:
status: New → Confirmed
Martin Emrich (emme) wrote :

I had this here, too, but on the Dom0. I also see these messages. Over the weekend, the server went back to 1955, giving me a negative uptime :)

...
Apr 24 15:54:16 beelzebot kernel: printk: 11078013 messages suppressed.
Apr 24 15:54:16 beelzebot kernel: clocksource/0: Time went backwards: delta=-6934818239616680928 shadow=169557016039123 offset=870673
Apr 24 15:54:16 beelzebot kernel: printk: 11096888 messages suppressed.
Apr 24 15:54:16 beelzebot kernel: clocksource/0: Time went backwards: delta=-6934818234615914696 shadow=169562016045408 offset=1630992
Apr 24 15:54:16 beelzebot kernel: printk: 11082088 messages suppressed.
Apr 24 15:54:16 beelzebot kernel: clocksource/0: Time went backwards: delta=-6934818229615135381 shadow=169567016051452 offset=2403808
Apr 24 15:54:16 beelzebot kernel: printk: 11089239 messages suppressed.
Apr 24 15:54:16 beelzebot kernel: clocksource/0: Time went backwards: delta=-6934818224614334593 shadow=169572016008428 offset=3247489
Apr 24 15:54:16 beelzebot kernel: printk: 11076688 messages suppressed.
Apr 24 15:54:16 beelzebot kernel: clocksource/0: Time went backwards: delta=-6934818219617529015 shadow=169577016014706 offset=46909
Apr 24 15:54:16 beelzebot kernel: printk: 11082349 messages suppressed.
...

(note that this is not Apr 24 2007 :).

I have already disabled EIST in the BIOS.

Ciao

Martin

John Leach (johnleach) wrote :

Actually, I managed it in dom0 today too. I ran a CPU, disk, memory and network stress test on dom0 and received a big delta clocksource change after 20-30 mins, then the machine hung. The system time was correct when I started the test, so my ntp service didn't cause this.

I've seen this complete hang a couple of times in the last week on a machine or two, but only managed to catch the clocksource message by enabling remote syslogging, as it never seems to get written to disk. I've also seen it on a machine with no guests running and no load whatsoever.

No panic messages either btw. Checked on the console too.

xen01 kernel: clocksource/0: Time went backwards: delta=-6164329055846119597 shadow=9515037682421 offset=888776

All these on Dell PowerEdge 1950 machines (not tested on other hardware).

I repeated it just now with only a network stress test (iperf). I got the following from the remote syslog then the machine hung again.

Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217525387920 shadow=711006713466 offset=4381314
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217525346621 shadow=711006713466 offset=4422132
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217525314134 shadow=711006713466 offset=4454408
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217525288215 shadow=711006713466 offset=4480460
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217524058472 shadow=711006713466 offset=5710035
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217524007717 shadow=711006713466 offset=5760868
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217523985997 shadow=711006713466 offset=5782540
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217523967085 shadow=711006713466 offset=5801481
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217522807718 shadow=711006713466 offset=6960795
Oct 1 11:44:56 xen01 kernel: clocksource/0: Time went backwards: delta=-6164321217522784542 shadow=711006713466 offset=6983995

Martin Emrich (emme) wrote :

I am now testing whether it still happens if I set independent_wallclock=1 for Dom0 and all DomUs...

John Leach (johnleach) wrote :

By the sound of it Martin, your dom0 doesn't hang like mine. I'm going to file a separate bug about the hang as I wonder if it might be a separate issue.

I can trigger the hang in dom0 myself by generating network traffic - using the iperf tool.

Martin Emrich (emme) wrote :

Hmm, this is your bug, I would be the one to file a new bug.

I saw many reports of timekeeping problems on the xen mailing lists, many people were just told they have "cheap hardware", but many of them had good stuff by Dell, HP etc.

Most of them fixed their problems by setting independent_wallclock=1 in all Doms and using ntp everywhere, and this is what I did. Until now, my server works fine, but I'll keep an eye on it.

Ciao

Martin

Bart Verwilst (verwilst) wrote :

This morning, i noticed my domU wasn't responding anymore.. I checked with console, and noticed this as output:

[80182.004456] clocksource/0: Time went backwards: delta=-6917292717540055641 shadow=80182004054805 offset=400471
[80182.004507] clocksource/0: Time went backwards: delta=-6917292717540005396 shadow=80182004054805 offset=450712
[80182.004762] clocksource/0: Time went backwards: delta=-6917292717539750360 shadow=80182004054805 offset=705812
[80182.004842] clocksource/0: Time went backwards: delta=-6917292717539670405 shadow=80182004054805 offset=785744
[80182.004918] clocksource/0: Time went backwards: delta=-6917292717539594298 shadow=80182004054805 offset=861851

I could not do anything anymore to reach the server, and xm reboot'ing the domU caused all further xm commands to hang. ( ctrl-c'ing them still worked though )

dom0 is ubuntu gutsy, domU is a centos5 ( but both using 2.6.22-13-xen though )
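
The console output above carries enough information to sanity-check what `shadow` and `offset` mean. The sketch below is an inference from the numbers, not something confirmed by the kernel source in this thread: it assumes both fields are nanoseconds and that their sum is the clocksource reading, then compares that sum with the printk timestamp on the same line. The helper name `printk_ns` is hypothetical.

```python
def printk_ns(stamp):
    """Convert a printk timestamp such as '80182.004456' to integer nanoseconds."""
    secs, frac = stamp.split(".")
    return int(secs) * 1_000_000_000 + int(frac.ljust(9, "0"))

# (printk timestamp, shadow, offset) copied from the console output above.
samples = [
    ("80182.004456", 80182004054805, 400471),
    ("80182.004507", 80182004054805, 450712),
    ("80182.004762", 80182004054805, 705812),
]

# Gap between the printk timestamp and shadow + offset, in nanoseconds.
gaps = [printk_ns(ts) - (shadow + offset) for ts, shadow, offset in samples]
print(gaps)
```

For these three lines the gaps stay within a couple of microseconds, which suggests that `shadow` plus `offset` still tracks uptime correctly and that the huge negative `delta` comes from whatever value it is being compared against.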

Martin Emrich (emme) wrote :

Today, it happened again, despite independent_wallclock being set to 1 in the Dom0. I'm upgrading to 2.6.22-13 now...

Bart Verwilst (verwilst) wrote :

Well, i'm already using 2.6.22-13-xen... :(

Bart Verwilst (verwilst) wrote :

BTW this doesn't only happen while suspending, my domU had been running for several days without intervention.

Martin Emrich (emme) wrote :

Yep, I don't do suspending either, there are 2 DomUs running CentOS 4.5 (in HVM), they lost only about 6 hours.
Another weird thing is that the management console of the Areca RAID controller hangs, too (but this might be caused by the unusual negative uptime).

I found this:
http://www.koders.com/noncode/fidA1F753E28F04949A49D4A66F06D7E2A550E35F0A.aspx
, but it is fairly old already. I wonder whether it is integrated in Ubuntu and whether it could affect our problem.

Ciao

Martin

Bart Verwilst (verwilst) wrote :

Would it be possible to clear out and fix this bug in the official kernel before the release? Or at least get it as a fix soon thereafter? I'll be using gutsy with xen in production environments soon, and this is definitely a blocker for me :) Thanks! I'm willing to do whatever it takes to help fix this bug!

Martin Emrich (emme) wrote :

Bart, John: What hardware do you have? Maybe we have something in common...

We have a Core 2 Quad Q6700 on a SuperMicro PDSME+ Board (Intel 3010 Chipset) and an Areca 1120 RAID Controller.

Bart Verwilst (verwilst) wrote :

Just had one of my domU's hang again, with an ever increasing offset..

Dual Xeon 3070 here, Intel ServerBoard S3000AHLX, no raid controller..

I really hope to find a solution for this fast...

Martin Emrich (emme) wrote :

I had a nice chat in ##xen yesterday, and they recommended turning off both ACPI and HPET in the BIOS, and I did so. Let's see if it works now (the bug appears only after a few days of uptime).

If this is not possible, maybe one can turn it off via hypervisor or kernel command line.

Ciao

Martin

Bart Verwilst (verwilst) wrote :

Essentially this is just a copy of a mail i sent to xen-users:

Well, i've found this link earlier: http://lists.xensource.com/archives/html/xen-devel/2006-03/msg00442.html , but it seems the code has changed quite a bit since then..
Big parts of the patch seem integrated already though..
For example:

Current patch, in use in the running kernel:

+       if ((blocked > 0) && (delta_cpu > 0)) {
+               delta_cpu -= blocked;
+               if (unlikely(delta_cpu < 0))
+                       blocked += delta_cpu; /* clamp local-time progress */
+               do_div(blocked, NS_PER_TICK);
+               per_cpu(processed_blocked_time, cpu) += blocked * NS_PER_TICK;
+               per_cpu(processed_system_time, cpu) += blocked * NS_PER_TICK;

Patch on the link provided above:

        if (stolen > 0) {
                delta_cpu -= stolen;
+               if (unlikely(delta_cpu < 0)) {
+                       stolen += delta_cpu;
+                       delta_cpu = blocked = 0;
+               }
                do_div(stolen, NS_PER_TICK);
                per_cpu(processed_stolen_time, cpu) += stolen * NS_PER_TICK;
                per_cpu(processed_system_time, cpu) += stolen * NS_PER_TICK;

As you can see, quite different.. "delta_cpu = blocked = 0;" is missing, but might be handled with the rest of the changes in the new code..

Another possibility is that this only applies for i386 (i'm running on x86_64.. )..
I've looked into the patch, and it seems to only patch file arch/i386/kernel/time-xen.c .
There isn't a x86_64 equivalent.. ( arch/x86_64/kernel/time-xen.c doesn't exist ) Is this normal? Could this be the cause?
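
To make the clamp in the older xen-devel patch easier to reason about, here is a toy Python model of that one step. It is only a sketch under stated assumptions: `NS_PER_TICK` is set as if HZ were 100, the function name `account_stolen` is invented, the per-CPU counters are collapsed into one `processed` value, and the `blocked = 0` side effect of the real patch is left out.

```python
NS_PER_TICK = 10_000_000  # assumption: HZ=100, i.e. 10 ms per tick

def account_stolen(delta_cpu, stolen, processed):
    """Toy model of the stolen-time step in the older patch quoted above.

    All arguments are nanosecond counts. Returns the updated
    (delta_cpu, processed) pair.
    """
    if stolen > 0:
        delta_cpu -= stolen
        if delta_cpu < 0:
            # Clamp: never account more stolen time than actually elapsed.
            stolen += delta_cpu
            delta_cpu = 0
        # do_div() leaves the quotient behind, so only whole ticks are counted.
        ticks = stolen // NS_PER_TICK
        processed += ticks * NS_PER_TICK
    return delta_cpu, processed

# 30 ms elapsed but 50 ms reported stolen: the clamp limits it to 30 ms.
print(account_stolen(30_000_000, 50_000_000, 0))
```

In this model, without the clamp `delta_cpu` would stay negative and the full 50 ms would be added to `processed`, which is one way the accounted time could run ahead of the time that actually elapsed.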

Bart Verwilst (verwilst) wrote :

A person on the xen-users list hinted about ntpd. I noticed it wasn't running in the domU that hangs, so i enabled it there, hoping to see some improvement.
Anyone else running/not running ntpd here on dom0 and the domU's?

Thanks!

Martin Emrich (emme) wrote :

Sadly, turning off ACPI & HPET did not work. And using noapic on the command line makes the Areca RAID controller driver fail.

I don't think ntpd can help me here, as it is designed to correct small time drifts, update daylight savings etc. Here, the time jumps back to 1955, and all further access to the RTC is blocked by the kernel.

Ciao

Martin

Bart Verwilst (verwilst) wrote :

And there it went again... GRMBL!!

[52968.000183] clocksource/0: Time went backwards: delta=-6917295094804713137 shadow=52968000043427 offset=136671
[52968.000305] clocksource/0: Time went backwards: delta=-6917295094804588934 shadow=52968000043427 offset=260563
[52968.000367] clocksource/0: Time went backwards: delta=-6917295094804526915 shadow=52968000043427 offset=322560
[52968.000424] clocksource/0: Time went backwards: delta=-6917295094804470423 shadow=52968000043427 offset=379055
[52968.000480] clocksource/0: Time went backwards: delta=-6917295094804413766 shadow=52968000043427 offset=435720
[52968.000536] clocksource/0: Time went backwards: delta=-6917295094804357638 shadow=52968000043427 offset=491848
[52968.000592] clocksource/0: Time went backwards: delta=-6917295094804301540 shadow=52968000043427 offset=547946
[52968.000830] clocksource/0: Time went backwards: delta=-6917295094804064372 shadow=52968000043427 offset=785148
[52968.000891] clocksource/0: Time went backwards: delta=-6917295094804003241 shadow=52968000043427 offset=846275
[52968.000948] clocksource/0: Time went backwards: delta=-6917295094803946487 shadow=52968000043427 offset=902991

only xm destroy works to get the domU back on track..

Xen 3.1.1 is released, maybe that will help? I have the feeling we shouldn't get our hopes up for having this fixed through this bug report, echo'ing in /dev/null will have more effect i'm afraid :)

Bart Verwilst (verwilst) wrote :

Seems like this is the same bug: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=195

For the record:

xm dmesg | grep -i "platform timer"
(XEN) Platform timer is 1.193MHz PIT

Bart Verwilst (verwilst) wrote :

Frankly i'm out of ideas. I've compiled several kernels with several different patches, none of them work. I've spent 3 full days on this problem, and i'm fed up.

Bart Verwilst (verwilst) wrote :

Changed title because it doesn't just happen while suspending..

Martin Emrich (emme) wrote :

Yep, I played around with different settings, too. Enabling/disabling ACPI or HPET did not make a difference. And using "noapic" breaks the arcmsr driver for our RAID controller. The box went fine for a few days now, but there's already one of these "time went backwards" messages in the syslog, so i'm expecting the lock up soon.

Martin Emrich (emme) wrote :

This morning, it happened again:

Oct 11 12:44:01 beelzebot kernel: [80251.276057] printk: 4 messages suppressed.
Oct 11 12:44:01 beelzebot kernel: [80251.276062] clocksource/0: Time went backwards: delta=-999999319 shadow=80251276002078 offset=59391
Oct 12 15:48:10 beelzebot kernel: [177694.609813] Setting mem allocation to 4063232 kiB
Oct 12 15:48:10 beelzebot kernel: [177694.626366] Setting mem allocation to 4063232 kiB
Oct 13 06:16:29 beelzebot kernel: [229782.674001] clocksource/0: Time went backwards: delta=-6934879990527637305 shadow=229782673990236 offset=8063
May 8 01:39:03 beelzebot kernel: [229782.674007] clockso4605907
May 8 01:39:03 beelzebot kernel: [247527.670711] printk: 4 messages suppressed.
May 8 01:39:03 beelzebot kernel: [247527.670713] clocksource/0: Time went backwards: delta=-6934862245530924326 shadow=247526746101540 offset=924610392
May 8 01:39:03 beelzebot kernel: [247527.670715] printk: 4 messages suppressed.
May 8 01:39:03 beelzebot kernel: [247527.670717] clocksource/0: Time went backwards: delta=-6934862245530919819 shadow=247526746101540 offset=924614874
May 8 01:39:03 beelzebot kernel: [247527.670720] printk: 4 messages suppressed.
May 8 01:39:03 beelzebot kernel: [247527.670722] clocksource/0: Time went backwards: delta=-6934862245530915364 shadow=247526746101540 offset=924619332
May 8 01:39:03 beelzebot kernel: [247527.670724] printk: 4 messages suppressed.
.
.

Shutting down one of the DomUs reliably crashes the whole box :(

Bart Verwilst (verwilst) wrote :

Euh, i just might have found something interesting..
This morning, i wanted to check my mails ( which run in a domU ). I noticed i couldn't get to it, "Unable to connect", as usual.. But i had cacti open, which could read snmp from the server just fine.
I logged in through SSH, and noticed that the date was set to "Fri Dec 22 22:49:54 CET 2226", and not going forward anymore. The clock had stopped.

In the meantime, /var/log/messages was spewing messages like this:

[134918.952935] clocksource/0: Time went backwards: delta=-6917269600674797899 shadow=134918109266738 offset=843667158
[134923.954592] printk: 1157408 messages suppressed.
[134923.954606] clocksource/0: Time went backwards: delta=-6917269595673126078 shadow=134923109294656 offset=845310074
[134928.952308] printk: 1154382 messages suppressed.
[134928.952323] clocksource/0: Time went backwards: delta=-6917269590675409336 shadow=134928109321066 offset=842999953

I decided to change something to see if it works, so i did:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource, which showed "xen" as usual.
I then did
echo "jiffies" > /sys/devices/system/clocksource/clocksource0/current_clocksource, which instantly made my clock run again, and stalled programs revived. I saw my webmail ( which was still hanging all the time ) spring to life, along with my previously entered reboot command..

Does this narrow down the search to the solution?

Kind regards,

Bart
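
The manual check-and-switch described above can be wrapped in a few lines. This is a hedged Python sketch rather than a recommended fix: it assumes the guest kernel exposes `available_clocksource` alongside `current_clocksource` in the same sysfs directory, the function names are invented for illustration, and it only suggests a fallback instead of writing to sysfs.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def choose_fallback(current, available, avoid="xen", prefer="jiffies"):
    """Pick a clocksource to switch to, or None if no change is suggested.

    `available` is the whitespace-separated list the kernel exposes.
    """
    names = available.split()
    if current.strip() != avoid:
        return None
    if prefer in names:
        return prefer
    return None

def read_state(base=SYSFS_DIR):
    """Read (current, available) from sysfs; needs a Linux system."""
    current = (base / "current_clocksource").read_text()
    available = (base / "available_clocksource").read_text()
    return current, available

if __name__ == "__main__":
    try:
        current, available = read_state()
    except OSError:
        print("clocksource sysfs files not found")
    else:
        print("current:", current.strip(),
              "suggested:", choose_fallback(current, available))
```

Actually applying the suggestion still means writing the name into `current_clocksource` as root, exactly as in the `echo` command above.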

Martin Emrich (emme) wrote :

Great, another bit of hope :)

I changed to "jiffies", too. Let's pray this helps me, too ...

Ciao

Martin

Takeshi Sone (ts1) wrote :

Thanks Bart, setting clocksource to jiffies fixed "Time went backwards" error in my domU!
However, domU clock still goes back if I reboot dom0 (domUs are saved and restored by /etc/init.d/xendomains script).
But this time, nothing freezes, no kernel messages, just 'date' command and timestamps in the log files show the wrong time.
How long the clock goes back is related to uptime of dom0 (or maybe hypervisor).
When I reboot dom0 when uptime is 15 minutes , and domU restores at 30 seconds of uptime of rebooted dom0, the clock inside domU is 14m30s behind.

So I tried setting independent_wallclock to 1 too.
And now the clock does not go back anymore!
Instead, clock is suspended while domU is suspended.
This is far better than clock going back.
Running ntpd in domU fixes the clock.
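
The relationship described above (guest clock falling behind by the difference between the two dom0 uptimes) is simple enough to write down. The snippet below is just that arithmetic in Python, under the hypothesis stated in the comment; `expected_skew` is a made-up name and nothing here has been checked against the hypervisor code.

```python
from datetime import timedelta

def expected_skew(uptime_before_reboot, uptime_at_restore):
    """How far behind the restored guest clock would be under this hypothesis."""
    return uptime_before_reboot - uptime_at_restore

# dom0 rebooted at 15 minutes of uptime, domU restored 30 seconds after boot.
skew = expected_skew(timedelta(minutes=15), timedelta(seconds=30))
print(skew)
```

With the figures from the comment this gives the 14m30s that was observed.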

austin (bang-a-rang) wrote :

I was also experiencing this problem. Is anyone here running gutsy as a guest as well as the host? Gutsy has been hanging on boot.. it doesn't spit out the clocksource errors but hangs on the syncing clock part. My dapper guest shows the clocksource errors.. though the jiffies setting seems to have remedied this..

Bart Verwilst (verwilst) wrote :

I must say i switched to my own home-grown kernels last week.. Problem solved :)

Martin Emrich (emme) wrote :

I "fixed" :-/ this by not using the 2.6.22 but the 2.6.19-4 kernel. This is of course no solution, especially not for people having very new hardware. But I'll stay subscribed and will help testing if fixed packages are made available.

Ciao

Martin

tirili (tom-tiri) wrote :

I got the same Problem with Xen version 3.1.0 (buildd@buildd) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) Fri Oct 12 16:26:34 GMT 2007

Ubuntu Gutsy. xm dmesg shows

(XEN) System RAM: 2015MB (2063676kB)
(XEN) Xen heap: 13MB (14228kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) Processor #0 15:3 APIC version 16
(XEN) Processor #1 15:3 APIC version 16
(XEN) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 3117.260 MHz processor.
(XEN) HVM: SVM enabled
(XEN) CPU0: AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ stepping 03
(XEN) Mapping cpu 0 to node 255
(XEN) Booting processor 1/1 eip 90000
(XEN) Mapping cpu 1 to node 255
(XEN) AMD: Disabling C1 Clock Ramping Node #0
(XEN) CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ stepping 03
(XEN) Total of 2 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) Platform timer is 25.000MHz HPET
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***

Christian Reis (kiko)
Changed in xen:
importance: Undecided → Unknown
status: New → Unknown
Changed in xen:
status: Unknown → Confirmed
DeeKey (privateinf) wrote :

Same problem for me! :(

DomU's are not affected, but Dom0 starts to lose performance: it takes a lot of time to open an application (even a console), and later the TCP/IP connection gets lost.

The problem starts after I start HVM guest.

maxou (m-pierron) wrote :

I have a similar problem on opensuse 10.3/xen3.1 (up to date) running on dell PE860 (mono xeon 3230 quad core, 8GB ram, lsilogic raid0 sas 15K), my 2.6.22.12 xen X64 dom0 kernel freezes under heavy parallel iperf loopback load (with or without the eth0/1 tg3 module loaded, and with or without xend running). It takes about 15min to freeze. This bug does not appear with a non-xen kernel. I really don't know what to do...

Martin Emrich (emme) wrote :

If your hardware permits it, and opensuse still has the 2.6.18/2.6.19 xen kernel, try it. Since I went from 2.6.22.14 to 2.6.19-4 on ubuntu, all works well.

Henrik Nilsen Omma (henrik) wrote :

The Hardy Heron kernel was recently uploaded for testing. We'd really appreciate it if you could try testing with this newer kernel and verify if this issue still exists. Unfortunately, the Hardy Heron Alpha1 LiveCD was released with the older 2.6.22 kernel. You'll have to manually install the newer Hardy Heron kernel in order to test. This should not be the case for Alpha2. However, here are the instructions to install (if you choose to do so):

1) edit the file /etc/apt/sources.list and add the following line:

deb http://archive.ubuntu.com/ubuntu hardy main restricted

2) sudo apt-get update
3) sudo apt-get install linux-image-2.6.24-1-generic
4) reboot and select the new kernel from the grub menu

After you've tested, please feel free to revert back - ie boot into the old kernel, sudo apt-get remove linux-image-2.6.24-1-generic, and remove the line from /etc/apt/sources.list . Please update this report with your results. Thanks in advance!

Changed in linux-source-2.6.22:
importance: Undecided → Low
Martin Emrich (emme) wrote :

Henrik, will this also work for a Dom0, or is the -generic kernel only for DomUs? If it can be used as a Dom0, I would set up a box and test it...

Ciao

Martin

Chuck Short (zulcss) wrote :

Actually this isn't fixed for hardy because there is no dom0 kernel for hardy yet.

Brian Murray (brian-murray) wrote :

I am assigning this bug to the 'ubuntu-kernel-team' per their bug policy. For future reference you can learn more about their bug policy at https://wiki.ubuntu.com/KernelTeamBugPolicies .

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
Tom De Clercq (g-launchpad-tomsworld-be) wrote :

> Actually this isnt fixed for hardy because there is no dom0 kernel for hardy yet.

Is there already some planning for when a kernel will be available?
I would like to configure some servers, but the hardware is too new to use a 2.6.19 kernel.

Tom

Gareth Bult (gareth-encryptec) wrote :

Hi Guys,

I've been doing a LOT of work on Ubuntu XEN over the last couple of weeks and now have two live stacks which are staying up and doing a lot of work.
(hopefully)

I've found there are three fundamental things you need to make clocks reliable and not have the system pause under load, these are;

a. in the Dom0's grub menu.lst, add "dom0_mem=xxx" (where xxx is for example 512 for 512M)
b. add "xen.independent_wallclock=1" to /etc/sysctl.conf in each DomU
c. add "clock=jiffies" in the DomU's .cnf file in the "extra" section

This makes a huge difference to overall usability.

hth
Gareth.
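
Item (b) above is easy to get wrong across many guests, so a quick check can help. The following Python sketch parses sysctl.conf-style text and reports the value of a key; it assumes the usual `key = value` syntax with `#` or `;` comments, the function name `sysctl_value` is hypothetical, and the sample contents are illustrative rather than taken from a real system.

```python
def sysctl_value(conf_text, key):
    """Return the last value assigned to `key` in sysctl.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()
    return value

example = """
# illustrative contents, not taken from a real system
kernel.printk = 4 4 1 7
xen.independent_wallclock=1
"""
print(sysctl_value(example, "xen.independent_wallclock"))
```

A result of `None` means the key is not set in the file at all, which is different from it being set to `0`.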

Bart Verwilst (verwilst) wrote :

Myeah, but that clock=jiffies thing still makes me feel uneasy when running it in production.. I will stay with official xensource kernels until they update the kernel they build at ( 2.6.18 atm ), which will hopefully be soon because it would be welcome by now :)
For domU kernels, i am waiting for paravirt_ops to be included in the mainline x86_64 kernel.

Bjoern Koch (h.humpel) wrote :

Same problem here :/.
I am just trying to install my first DomU (following http://www.howtoforge.com/ubuntu-7.10-server-install-xen-from-ubuntu-repositories) and when calling "xen-create-image ... blah blah" the screen is flooded with the "Time went backwards" messages during debootstrap.
No usable DomU image gets installed and so I can't start and edit anything within the DomU.
Furthermore these messages do show up (plenty of them) in the log of the Dom0 (at least when calling "dmesg -c" it always re-appears).

No other workaround known yet ?

Bjoern Koch (h.humpel) wrote :

Btw.: last entry in the xen-tools logs for the DomU while debootstrap:
I: Configuring mktemp...

Gareth Bult (gareth-encryptec) wrote :

I have a solution I'm employing on 26 Xen instances.

It's called "use the Redhat 2.6.21 kernel which works!"

The Ubuntu Xen kernel has SO MANY problems it really isn't even worth looking at.

Bjoern Koch (h.humpel) wrote :

Just to be sure: you are using the Redhat kernel with ubuntu and ubuntu doesn't bother about it (this is what I think).
Or are you trying to tell me to install a Redhat system ?

Gareth Bult (gareth-encryptec) wrote :

The former, Ubuntu with a RH kernel .. the only issue I see is unwanted warning messages when using rsync .. apparently there is a discrepancy between kernel features available on the Ubuntu and RH kernels that rsync doesn't handle very well .. and one unavailable sysctl entry which gives an entry on startup, which is easily corrected.

Jan Evert van Grootheest (j-e-van-grootheest) wrote :

Gareth, me too.
I just tried (on my domU to fool around with) the latest rawhide kernel-xen.
Runs just fine (for the last 30 or so minutes, and there's a lockdep warning early in dmesg, so there's not much trust).

janevert@odo:~$ uname -r
2.6.25-0.18.rc8.fc9.x86_64.xen

Bjoern Koch (h.humpel) wrote :

Just to let you know: still same problem after upgrading to 8.04beta and 8.04.

James Blackwell (jblack) wrote :

I'm seeing this sort of problem with 2.6.24 dom0 and 2.6.22 domU, both of which are running hardy. Sometimes, multiple domUs fail, other times, just one domU. In the most recent failure, ntp was running on both dom0 and domU, which should disprove that ntp causes the problem.

Esa Häkkinen (syke2) wrote : Ubuntu Hardy changed clock source to 'jiffies' by default

It seems that on Ubuntu Hardy the clock source is 'jiffies' by default.

Ubuntu 8.04 64bit Server, vanilla installation.
Motherboard is Intel DG33TL with G33 chipset, CPU is E6550.
on BIOS, HPET is enabled.
installed ubuntu-xen-server and rebooted.

root@ubuntu:~# xm dmesg |grep timer
(XEN) Platform timer overflows in 14998 jiffies.
(XEN) Platform timer is 14.318MHz HPET

root@ubuntu:~# uname -a
Linux enigma 2.6.24-16-xen #1 SMP Thu Apr 10 14:35:03 UTC 2008 x86_64 GNU/Linux

Jan Evert van Grootheest (j-e-van-grootheest) wrote :

I'm seeing this both on dom0 and domU with ...-19.34.
As far as I can tell, there's no relation between seeing this on dom0 and domU.
However, there seem to be no side-effects.
(this is an amd64 install on both dom0 and domU and xen 3.2.1 from debian)

dom0:
Jun 26 14:02:06 quark kernel: [187781.880995] clocksource/0: Time went backwards: ret=ac186aa11cf0 delta=-36099370 shadow=ac183af22f78 offset=31d8212e

domu:
Jun 25 17:02:03 suzy2 kernel: [111734.752716] clocksource/0: Time went backwards: ret=6754298ec3ef delta=-64141945 shadow=675421edd381 offset=b7493a5

Jason Kendall (coolacid) wrote :

Confirmed on 2.6.24-17-xen on domUs, no issue on dom0 as far as I can see. What's weird is that I was running fine for a long time; it just started today after about a month of no issues. Could it be a clock sync issue where if the clocks are out by X it causes the time to stop?

Jason Kendall (coolacid) wrote :

I just thought as to why it "just started"

I had corrected a bug in the xendomains rc script that had two cuts - I'll have to do an upgrade and then file another bug report for that later.

I'm assuming that my domU's are now suspending on my reboots, and over time the clock went backwards on the suspended domU.

Christian Holtje (docwhat) wrote :

This is still a problem as of 2008/8/8:
 * Ubuntu 8.04.1 Hardy Heron
 * Xen 3.2.0
 * Dom0 & DomU kernel: 2.6.24-19-xen #1 SMP Sat Jul 12 00:15:59 UTC 2008 x86_64

The workaround for this is to change your setup not to save/restore, but to reboot instead.
This is done by altering dom0:/etc/default/xendomains and changing the value of XENDOMAINS_SAVE to the empty string.

Ciao!
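
To confirm that the workaround above is actually in effect on a given dom0, the setting can be read back from the xendomains defaults file. This is a minimal Python sketch assuming the file uses plain shell-style `NAME=value` assignments; the function name `xendomains_save` is invented, and the `/var/lib/xen/save` path in the sample text is only illustrative.

```python
import shlex

def xendomains_save(text):
    """Return the value of XENDOMAINS_SAVE in shell-style config text, or None if unset."""
    value = None
    for raw in text.splitlines():
        # shlex handles quoting and drops '#' comments for us.
        tokens = shlex.split(raw, comments=True)
        if len(tokens) == 1 and tokens[0].startswith("XENDOMAINS_SAVE="):
            value = tokens[0].split("=", 1)[1]
    return value

before = "XENDOMAINS_SAVE=/var/lib/xen/save\n"  # illustrative path
after = '# save/restore disabled\nXENDOMAINS_SAVE=""\n'

print(repr(xendomains_save(before)))
print(repr(xendomains_save(after)))
```

An empty string corresponds to the workaround being applied, while `None` means the variable is not assigned in the text at all.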

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

James Blackwell (jblack) wrote :

This bug seems to be fixed by moving the kernel and the ramdisk for domUs up to 2.6.24-17. Can anyone else verify this?

Christian Holtje (docwhat) wrote :

@James Blackwell: I can recreate it with a newer kernel. See my previous comment: 2.6.24-19-xen doesn't work. My domU and my dom0 are all running the same kernel. I don't use a ramdisk.

To recreate the problem, just do a save-pause-restore of the domU.

Ciao!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are one of two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Jason Kendall (coolacid) wrote :

Leann;

This is a Xen issue - I was unable to find a Xen version of the linux-image-2.6.27- tree in hardy or Intrepid.

Regards,
Jay

Revision history for this message
Gareth Bult (gareth-encryptec) wrote :

So .. yes it's a Xen issue, but it's Ubuntu specific, i.e. Ubuntu's Xen kernel "doesn't work".

I'm running 30 VM's, all had the problem on various Ubuntu kernels.
I've not had the problem in 8 months since switching the kernel out for the Redhat/Xen kernel.

.. so if you think Xen are ever going to worry about this, not least as Ubuntu chose another hypervisor as standard, I would think again.

I'm finding Xen 3.2 to be pretty damn reliable, I'd be interested to hear how the Ubuntu supported hypervisor performs ..
(which one did they go for, kvm ?)

Revision history for this message
Jason Kendall (coolacid) wrote :

@Gareth: I'm not following you here. Is this a statement to my note that there is no Xen pre-release version of 2.6.27 for Intrepid? or a general comment?

Leann requested (bulk I might add, as I got this same request for other Xen tickets I have open), that we test the new kernel for this issue - however, since there is no Xen kernel available, we can't test it yet.

Revision history for this message
Gareth Bult (gareth-encryptec) wrote :

I think it could be taken as a general comment prompted by another posting on this thread.
(which is heading for its first anniversary)

IMHO Xen should be taken out of the Repo's (for now) , Ubuntu's version simply does not work.

People read the press and see "Xen is great, Xen is stable", then install Ubuntu Xen and see critical problems which look for all the world like they may be problems with Xen itself.
[incidentally, this is *not* or certainly has not been the only serious problem with Ubuntu Xen]

Xen does not need this sort of press .. and I'm sure Ubuntu can live without it too.

You're going to say "how will it get fixed if it's not available for people to try?" .. normally I'd agree, however given the speed at which it's being worked on, I really do think it should be taken aside until it can be made to work.

Note; I don't have any connection with Xen or Ubuntu other than being a heavy Xen / Ubuntu user.

I wasted soooo much time trying to make Ubuntu/Xen work after thinking I must be at fault being convinced nothing so buggy would ever get released.

At the *very* least it needs a big warning label saying "do not use in production environment due to known issues" !!

Revision history for this message
James Blackwell (jblack) wrote :

It sounds to me like this bug report is actually two different, but possibly related, bugs. The first bug is time arbitrarily going backwards on 2.6.22, which I think is fixed when using a 2.6.24 and later domU. The second bug is a problem with saving/restoring domUs.

Does anyone agree?

Revision history for this message
Gareth Bult (gareth-encryptec) wrote :

Not really.

a. There's a whole list of problems I've come across and have posted details of on different threads
b. As of 3 weeks ago on Hardy, it wasn't fixed (which was last time I tried it)

Revision history for this message
Bjoern Koch (h.humpel) wrote :

Any news on this one ?
Just found a statement on the internet that this is some kind of XEN bug when using multicore or multiple CPUs.
I am trying to run it on a dual P3 board... :/. And same problem on debian and CentOS!

Revision history for this message
Gareth Bult (gareth-encryptec) wrote :

It's a "mis"-implementation of the Xen kernel patches which typically are not available for the most recent kernels.

Use a RedHat Xen kernel, this is the only one I've found to work.
2.6.21 is the one I'm using, this is 100% reliable.

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
Pieter (diepes) wrote :

i found this searching for a similar problem on Debian with xen.

in my DOMU a lot of messages like this, not usable at all.
clocksource/0: Time went backwards:

Then I found a solution: change the clock source from xen to jiffies. The problem started after moving DOM0 from kernel 2.6.18 to 2.6.26.

in the xen config file for DOMu add "clocksource=jiffies" , and the problem was gone.
extra= "xencons=tty clocksource=jiffies"
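To confirm from inside the guest that the workaround took effect after a reboot, the active clocksource can be read back from sysfs. This is a minimal sketch, assuming the guest kernel exposes the usual Linux clocksource files under `/sys/devices/system/clocksource/clocksource0`; the helper name `read_clocksource` is made up for illustration.

```python
from pathlib import Path

# Usual Linux sysfs location for clocksource state (assumed to be present in the guest).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

With the `clocksource=jiffies` option applied, the first element of the returned tuple should be `jiffies`. If it is still `xen`, the `extra=` line was probably not picked up by the guest's kernel command line.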

Revision history for this message
Bart Verwilst (verwilst) wrote :

I guess this can be closed since it's so old it's probably no longer valid?

Revision history for this message
Justin Alan Ryan (justizin) wrote :

Well, it's certainly valid and affects current production environment for us. We know the clearest solution is to use newer xen and kernels, and we are using the jiffies workaround, but it's not clear there isn't a better solution.

From what I've read, Jiffies are in fact obsolete and can be problematic, so this problem is certainly very old.

$0.02

Changed in xen:
importance: Unknown → High
Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix