TSC is not reliable under Xen on some Intel CPUs

Bug #727459 reported by Doug Mitchell
This bug affects 11 people
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  Medium      Unassigned
Lucid           Won't Fix     Medium      Unassigned
Trusty          Triaged       Undecided   Unassigned

Bug Description

SRU Justification

Impact: When the Xen patchset was applied, one endif seems to have ended up in the wrong place. As a result, sched_clock_stable is always set, when it should not be set.

Fix: Move endif so the whole code segment is covered.

Testcase: The effects vary and depend on the CPU (see below). The known symptom is incorrect CPU time, but the bug is likely also the cause of the unexplained stuck processes and clone/fork issues.

---

Ubuntu 10.04.2 LTS
Ubuntu 2.6.32-312.24-ec2 2.6.32.27+drm33.12
Linux 2.6.32-312-ec2 #24-Ubuntu SMP Fri Jan 7 18:30:50 UTC 2011 x86_64 GNU/Linux

We are experiencing impossibly high (thousands of days) accumulated CPU times for processes in ps and top when using the above kernel on Amazon EC2.

Instances running on Intel E5430 are fine, while instances on E5507 all seem to have the problem.
Please see also https://bugzilla.kernel.org/show_bug.cgi?id=16314

Thanks,
Doug
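
One rough way to spot the symptom described in this report is a sanity check: the CPU time accumulated by a process cannot exceed the system uptime multiplied by the number of CPUs. The following is a minimal Python sketch of that check and is not part of the original report. It assumes the usual Linux /proc layout (utime and stime as fields 14 and 15 of /proc/<pid>/stat, counted in clock ticks); all function names are hypothetical.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return utime + stime in seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so field 14 (utime) is index 11
    # and field 15 (stime) is index 12.
    utime_ticks = int(fields[11])
    stime_ticks = int(fields[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


def is_implausible(cpu_seconds, uptime_seconds, cpu_count):
    """True if the CPU time exceeds what the machine could have delivered."""
    return cpu_seconds > uptime_seconds * cpu_count


def find_implausible_processes():
    """Scan /proc and return (pid, cpu_seconds) pairs that fail the check."""
    ticks = os.sysconf("SC_CLK_TCK")
    cpus = os.cpu_count() or 1
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    suspects = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                cpu = cpu_seconds_from_stat(f.read(), ticks)
        except (OSError, IndexError, ValueError):
            continue  # process exited or the line could not be parsed
        if is_implausible(cpu, uptime, cpus):
            suspects.append((int(entry), cpu))
    return suspects
```

On an affected instance, calling find_implausible_processes() would be expected to list the processes whose reported CPU time runs into thousands of days; on a healthy system it should normally return an empty list.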

summary: - TSC is not reliable undex Xen on some Intel CPUs
+ TSC is not reliable under Xen on some Intel CPUs
Doug Mitchell (doug-heroku) wrote :

Here is a patch we are testing now.

Thanks,
Doug

Andy Whitcroft (apw) wrote :

@Doug -- do we know if this is still an issue with Natty instances in -ec2? I would like to confirm this is at least fixed in the next release. If you are able to test and confirm, that would be very helpful. Nominating for Lucid as your test kernels are versioned there.

Changed in linux (Ubuntu Lucid):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Ubuntu Lucid):
status: New → Triaged
Stefan Bader (smb) wrote :

I would rather suspect that this is not a problem after 10.04, as the Xen code is much more integrated into the common kernel code. As for the patch: I sort of hit this while going over a big set of changes that we are missing, and I think the way it currently is was a bug introduced while porting the initial patchset. To me it looks like the endif should always have been after setting the variable. I will reference this bug report in my changes, so it gets updated when I am finally done with the whole update.

Changed in linux (Ubuntu Lucid):
milestone: none → lucid-updates
Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Triaged → Fix Released
Stefan Bader (smb)
description: updated
Brad Figg (brad-figg)
Changed in linux (Ubuntu Trusty):
status: New → Triaged
Joe Terranova (joeterranova) wrote :

Having the same issue on Trusty

3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

This is occurring for me on some EC2 instances. Not continuously, but sporadically when launching new instances.

I've attached a sample of the dmesg log showing the time jump.
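
The attached log is not reproduced on this page. As a rough illustration of how such a jump could be located in a saved dmesg log, here is a minimal Python sketch. It assumes the common "[seconds.microseconds] message" timestamp prefix; the function name and the one-hour threshold are arbitrary choices made for this example, not something taken from the attachment.

```python
import re

# Matches the "[  123.456789]" prefix that dmesg prints before each message.
_TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_time_jumps(lines, max_gap_seconds=3600.0):
    """Return (previous, current, line) tuples for suspicious timestamps.

    A timestamp is suspicious if it goes backwards or moves forward by
    more than max_gap_seconds compared with the previous stamped line.
    """
    jumps = []
    previous = None
    for line in lines:
        match = _TIMESTAMP.match(line)
        if match is None:
            continue  # line has no timestamp prefix
        current = float(match.group(1))
        if previous is not None:
            delta = current - previous
            if delta < 0 or delta > max_gap_seconds:
                jumps.append((previous, current, line.rstrip("\n")))
        previous = current
    return jumps
```

The function accepts any iterable of lines, so an open file object for a saved log can be passed to it directly.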

Joe Terranova (joeterranova) wrote :

Full dmesg log as requested

Joe Terranova (joeterranova) wrote :

Per request, I've updated to the latest kernel, 3.13.0-32-generic, and still have the issue. Updated dmesg attached.

Stefan Bader (smb) wrote :

@Joe, please open a new bug report for this. I am not sure that in your case there will be the same unreasonably high process times in top. And even if there are, the kernel version for EC2 in Precise had very special Xen code, so it is unlikely to be the same bug resurfacing. I believe "ubuntu-bug linux" should work from the ec2 instance. If not, you can save the report to a file with "apport-cli --save <filename> linux" and then submit it from an Ubuntu desktop with "apport-cli <filename>".

Joe Terranova (joeterranova) wrote :

Per your request, created bug #1349883

Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".

Changed in linux (Ubuntu Lucid):
status: Triaged → Won't Fix