dmesg time wildly incorrect on paravirtual EC2 instances.

Bug #1349883 reported by Joe Terranova on 2014-07-29
This bug affects 6 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Stefan Bader

Bug Description

On Ubuntu 14.04 EC2 instances, the dmesg time jumps from 0 seconds to thousands of seconds during startup. This causes kernel logs to be recorded with timestamps months in the future.

This appears to be related to the kernel trusting the TSC clock on Xen servers, as described here:
https://forums.aws.amazon.com/thread.jspa?threadID=59753

But I am not sure that is the cause.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-32-generic 3.13.0-32.57
ProcVersionSignature: Ubuntu 3.13.0-32.57-generic 3.13.11.4
Uname: Linux 3.13.0-32-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 29 14:41 seq
 crw-rw---- 1 root audio 116, 33 Jul 29 14:41 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDmesg: [7562292.301737] init: plymouth-upstart-bridge main process ended, respawning
Date: Tue Jul 29 14:47:54 2014
Ec2AMI: ami-00dc1368
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.small
Ec2Kernel: aki-919dcaf8
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci:

Lsusb: Error: [Errno 2] No such file or directory: 'lsusb'
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-32-generic N/A
 linux-backports-modules-3.13.0-32-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 29 14:41 seq
 crw-rw---- 1 root audio 116, 33 Jul 29 14:41 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg: [7562292.301737] init: plymouth-upstart-bridge main process ended, respawning
DistroRelease: Ubuntu 14.04
Ec2AMI: ami-00dc1368
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.small
Ec2Kernel: aki-919dcaf8
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory
Lspci:

Lsusb: Error: [Errno 2] No such file or directory
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
ProcVersionSignature: Ubuntu 3.13.0-32.57-generic 3.13.11.4
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-32-generic N/A
 linux-backports-modules-3.13.0-32-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty ec2-images
Uname: Linux 3.13.0-32-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm admin ssh sudo
WifiSyslog:

_MarkForUpload: True

Joe Terranova (joeterranova) wrote :

Appears to be related to bug #727459, but I've added a new bug per request.

summary: - dmesg time wildly incorrect
+ dmesg time wildly incorrect on paravirtual EC2 instances.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1349883

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Stefan Bader (smb) on 2014-07-29
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (smb)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Stefan Bader (smb) wrote :

Writing up my findings so far:
- this looks to be a m1.small instance (1 vcpu, 1.7GB memory)
- this instance type is a PV guest

So sched_clock() (which is used by printk) will be using paravirt_sched_clock, which is set to xen_clocksource_read. The time info is kept in a per-cpu variable (vcpu_info). This is initially part of the hypervisor shared info but gets changed later.

start_kernel()
  ...
  setup_per_cpu_areas();
  /* printks from above are ok */
  smp_prepare_boot_cpu();
    xen_smp_prepare_boot_cpu();
      xen_setup_vcpu_info_placement();
      /* vcpu_info changed */
      xen_init_spinlocks();
      /* printks from above call jumped in time */

So it looks like, for some yet unknown reason, registering the vcpu_info with the hypervisor is messing up the clock time data.

@Joe, was this always happening (with the same instance type, maybe), and about how often? I am trying to get a feeling for whether it is some race or maybe related to the Xen hypervisor version. I will also try to play around with this on a local Xen host (but rather tomorrow).

Joe Terranova (joeterranova) wrote :

You're correct on your findings. m1.small, paravirtual guest.

I'm in the process of migrating my company's servers from 12.04 to 14.04. It was during this migration that I realized I was seeing syslog-ng logs for October 2014 or later (i.e. months in the future) on the new 14.04 instances, and I've been seeing it intermittently when launching new instances. Not always, but on some launches, or on some reboots. I think it's either some sort of race condition, or related to what Amazon hardware the instance is actually spun up on.

Stefan Bader (smb) wrote :

The Xen side (moving the vcpu info, which contains the vcpu time info, from the shared struct to the per-cpu area through the register hypercall) has not changed much between 12.04 and 14.04. The vcpu clock code has, though; it involves some slightly different memory barriers. What has not changed is using an rdtsc instruction together with the timestamp from the time info to get a time delta, which could again point to some TSC issue. But then, this always seems to get triggered exactly at the time of moving the vcpu info. So maybe it is some race related to the slightly different code for accessing the data.
But nothing that easily comes to my mind.

Stefan Bader (smb) wrote :

When inspecting the vcpu info on a local PV guest and an EC2 guest provided by Joe, I found that:

 time = {
    version = 33456,
    pad0 = 0,
    tsc_timestamp = 86813831140545,
    system_time = 37805107510066,
    tsc_to_system_mul = 3743284562,
    tsc_shift = -1 '\377',
    flags = 0 '\000',
    pad = "\000"
}

So system_time in both cases seems to be the uptime of the Xen host in nanoseconds. My machine was up for about 10 hrs. In the PV guest, however, the dmesg timestamps are somehow corrected to agree with the guest's uptime.

time = {
    version = 19934,
    pad0 = 0,
    tsc_timestamp = 8365090813767560,
    system_time = 4647273524058147,
    tsc_to_system_mul = 2386093295,
    tsc_shift = 0 '\000',
    flags = 0 '\000',
    pad = "\000"
}

The structure on the EC2 instance also seems to contain the uptime of the Xen host (which is around 53 days). Here, though, the timestamps in dmesg and the value of system_time are the same. So the guest suddenly got the uptime of the Xen host.

The only way this makes sense is if the delta (which is calculated from (rdtsc - tsc_timestamp) together with the shift and mul values) is normally offset from host to guest. Not sure how yet. Oh, and one piece of information from the IRC conversation: apparently this may not happen on first boot but on a reboot, and once it has happened it will always happen. So far only observed with Intel CPUs.

Joe Terranova (joeterranova) wrote :

To clarify, I'm pretty sure whether this happens depends on the Amazon hardware the instance is being run on.

My warning was that, while the instance was currently running on hardware that causes the problem, if it was stopped and then started later, it might not end up on the same hardware, and therefore the bug might not happen that launch.

If I have the problem on a running instance, and just reboot it, though, I've always had it again -- I assume Amazon doesn't move the instance if you're just rebooting it.

So if an instance is encountering the problem, it'll continue happening no matter how many times you reboot it. But if you shut it down, it might not happen when it's started next.

Stefan Bader (smb) wrote :

I had a dream... well honestly this idea came to me around then. So I was fishing in the same availability zone to get another case of this (was not sure I could just go back to yours). Then installed a 12.04 (3.2.x) kernel and bingo, same problem.
So at least I can stop trying to think of a way the changes between 3.2 and 3.13 may have caused this. And the reason I added the comment about the observation (that it may not happen initially, but then on a reboot, and after that on every reboot) was this: I have the feeling that the uptime of the host is also part of the equation. So this seems to involve:

- Intel CPU, the nonstop_tsc of the E5-2650 might be something to look at
- A certain uptime of the host
- Maybe the tsc/systime related code in the hypervisor

Well, it could also be independent of the CPU and rely only on the uptime, and it is just pure bad luck that the only hosts we currently find are of a certain type. The only way to prove or bust this would be to find an unaffected guest showing an uptime of those 58 days. Figuring out the involvement of the hypervisor code is nearly impossible, as Amazon uses a special version of 3.4.3.

Stefan Bader (smb) wrote :

Of course, after stating that I will look for an instance that is not affected, all of the ones I start are on the same CPU type and affected. :/ From further research, it sounds like Xen 3.4.3 defaults to running guests in native TSC mode, while Xen 4.0 and later default to native if the TSC is stable and emulated otherwise. I played around with the local host but could see no difference even when forcing native TSC mode. One thing I noticed was that while I have nonstop_tsc in /proc/cpuinfo on dom0, it does not show up in my PV guest. Somehow we do see it on AWS. Locally I am using Xen 4.4, though.

Stefan Bader (smb) wrote :

Right now, I am at a point where I think I need someone from the other side (Amazon). The only thing that is sure is that, for some reason, in the cases we see the vcpu time info is missing the correction offset that adapts the system uptime of the host to the uptime of the guest. But without implementation details of the host code it is hard to say anything more.

Steve Conklin (sconklin) wrote :

For what it's worth, we've seen this in an AWS instance. It is rare, and probably not reproducible in a test environment.

Steve Conklin (sconklin) wrote :

I was wrong about this being rare; it is apparently happening quite often in most of our m1.small instances.

Joe Terranova (joeterranova) wrote :

I've started a thread on the AWS forums about this.

https://forums.aws.amazon.com/thread.jspa?threadID=157777

If I don't get a response I'll poke our account rep.

Stefan Bader (smb) wrote :

Got feedback from the Amazon dev I was asking. Sounds like they got it on their plate. Have not heard any details though. But they got a link to this report and might jump in.

Joe Terranova (joeterranova) wrote :

For reference, this was Amazon's response on the forum post:
https://forums.aws.amazon.com/message.jspa?messageID=561728#561728

Basically "we are not changing, move to current generation instances."

Is there some sort of assistance I should request there, or through Amazon support? Has anyone from AWS gotten back to you?

Thor Nolen (tnolen) wrote :

Have we been able to confirm whether this is a reintroduction of, or related to, the issue that was resolved in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/727459?

Stefan Bader (smb) wrote :

On the contrary, I would say we can be quite sure this is *not* a re-introduction of the TSC problem. As I wrote in the other bug report, this time we do not have unusually high process times. In other words, time increments normally after boot. It is just this one jump at boot, and the jump is caused by the guest using the host's uptime in the kernel. The visible effect is just the dmesg times being in the far(ther) future.
And the jump happens exactly at the time when the guest starts to use a time structure provided by the host. Normally that contains the host's uptime, but also a correcting offset which is supposed to adjust the time value to zero at the guest's boot. Since I cannot reproduce this with my Xen guests (Xen 4.1 or later), this indicates to me that maybe the EC2 hosts' toolstack has a bug there. I think I only saw the jump with EC2 Xen 3.x, but since I don't know how to force landing on a certain Xen version in EC2, it would be really tedious for me to compare.

Matt Wilson (msw-amazon) wrote :

Hi Stefan,

I looked at this a long time back (circa 2011), and things may have changed since then. See: https://forums.aws.amazon.com/thread.jspa?threadID=59753

When I looked at this last, we weren't emulating TSC, and the CPUID flags that advertise invariant TSC came through. This was making the scheduler clock "stable", so printk() would use native_sched_clock() for x86, i.e. rdtscll().

I don't know how this is involved with setting up the pvclock time source...
