Hardy CPU physical hot plugging is broken

Bug #248509 reported by Tim Gardner on 2008-07-14
4
Affects Status Importance Assigned to Milestone
linux (Fedora)
Fix Released
Medium
linux (Ubuntu)
Undecided
Unassigned
Hardy
Undecided
Unassigned
Intrepid
Undecided
Unassigned

Bug Description

In the mainline kernel there are some deadlock issues when hot removing a
processor. This issues were discussed in detail at:

https://bugzilla.redhat.com/show_bug.cgi?id=448588

These 7 commits fix CPU hotplug:

ba62b077871a5255e271f4fdae57167651839277 - acpi: fix "buggy BIOS check" when CPUs are hot removed
63d38198a0f57dca87e6cb79931c7bedbb7ab069 - x86: fix paranoia about using BIOS quickboot mechanism.
2f67a0695dc389247c05041b05d2a2b06fc102a3 - flush kacpi_notify_wq before removing notify handler
087803d18fb8259cb844c075a35fb27c2d80792e - fix a deadlock issue when poking "eject" file
3d5ed99657e93cd0453a187c478e663e6b6a3a8b - force offline the processor during hot-removal
89d675d0f987534139d330eb2689ec53fab9404e - create sysfs link from acpi device to sysdev for cpu
ad7f0d9feee6980a3ab3ea806854f56817d1da8e - ACPI: fix checkpatch.pl complaints in scan.c

Description of problem:

There have been a series of patches committed to the mainline kernel
recently that address a performance issue for gettimeofday when running
on hypervisors that enable hardware assisted virtualization. The
non-ideal performance occurs because a CPUID instruction is used to
serialize the pipeline before RDTSC, and when using hardware
virtualization, CPUID always exits to the hypervisor.

The code in question also exists in the RHEL 5.2 64-bit kernel (see
get_cycles_sync in include/asm-x86_64/timex.h).

The fix is to use MFENCE/LFENCE instead of CPUID. Here are links to
relevant patches by Andi Kleen which are now in git:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=de4218634e3df6d73a3e6cdfdf3a17fa3bc7e013
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=707fa8ed923b1b6a3d7af0d386b0b3abad28ed19
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fde1b3fa947c2512e3715962ebb1d3a6a9b9bb7d
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6d63de8dbcda98511206897562ecfcdacf18f523
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06e4ec1c15691b0cfd2397ae32214fa36c90d71

Would you be able to make a similar change to the RHEL 5.2 kernel, to
address this issue ?

Thanks,
Alok

Tim Gardner (timg-tpi) wrote :

The previous set of commits was bogus. This is the real set:

commit ba62b077871a5255e271f4fdae57167651839277
Author: Alok Kataria <email address hidden>
Date: Tue Apr 8 17:41:56 2008 -0700

    acpi: fix "buggy BIOS check" when CPUs are hot removed

commit 63d38198a0f57dca87e6cb79931c7bedbb7ab069
Author: Alok Kataria <email address hidden>
Date: Mon Apr 7 11:38:33 2008 -0700

    x86: fix paranoia about using BIOS quickboot mechanism.

commit 2f67a0695dc389247c05041b05d2a2b06fc102a3
Author: Zhang Rui <email address hidden>
Date: Tue Apr 29 02:34:42 2008 -0400

    flush kacpi_notify_wq before removing notify handler

commit 26d46867b7d27f68a446b073dac7817721ae4c8f
Author: Zhang Rui <email address hidden>
Date: Tue Apr 29 02:35:48 2008 -0400

    fix a deadlock issue when poking "eject" file

commit b62b8ef906cdf7115af579ce7378886ce3e0ce00
Author: Zhang Rui <email address hidden>
Date: Tue Apr 29 02:35:56 2008 -0400

    force offline the processor during hot-removal

commit 9f1eb99c757939b0b1783f926130993e9c298bae
Author: Zhang Rui <email address hidden>
Date: Tue Apr 29 02:36:07 2008 -0400

    create sysfs link from acpi device to sysdev for cpu

Martin Pitt (pitti) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in linux:
status: New → Fix Committed

(In reply to comment #0)

We're also seeing this problem - a little test script (doing 90,000,000
gettimeofday() calls) takes two minutes on our RH VMs... and thirty seconds on
my desktop.

Applying that kernel patch to the RH kernel isn't trivial - there have been a
*lot* of changes since 2.6.18 was cut. I'm giving it a go here, but I'm not
really a kernel coder.

Changed in linux:
status: Unknown → Confirmed
Tim Gardner (timg-tpi) on 2008-08-19
Changed in linux:
status: New → Invalid

Currently this appears to be an issue with 64 bit architectures and the implementation behind the syscalls within the xen kernel in the VM and the way they all exit the vm to the hypervisor. Is this related to not taking advantage of segmentation protection?

#include <stdio.h>

int main(int argc,char **argv) {

    int i;
    for ( i = 0 ; i < 90000000 ; i++ ) {
           gettimeofday();

     }

}

32 bit VM:
time ./time

real 1m42.460s
user 0m8.565s
sys 1m33.834s

64-bit VM:
time ./time_64

real 6m3.259s
user 0m26.750s
sys 5m36.501s

Not 100% sure on a gettimeofday why it requires to exit out of the VM completely? Is it not possible to take advantage of the generic 'vsyscalls' implementation which is in the 2.6.x branches?

*** Bug 460983 has been marked as a duplicate of this bug. ***

Well, I think there are actually two things at play in this BZ. The original request is to shift from using CPUID to serialize gettimeofday to using MFENCE/LFENCE for serializing. This should be faster under *full* virtualization, so a backport to RHEL-5 might be desirable.

Comment #3, however, talks about something different. In particular, it's talking about the Xen kernel, which does indeed have the vsyscall stuff off in 64-bit. I'm not entirely sure why; I don't think segmentation protection should have anything to do with it. In a quick test, I turned it on, and your little benchmark there went from 2m17s on this box to about 30s. I'm going to make a new BZ about that issue, since it doesn't really belong here.

Chris Lalancette

Changed in linux:
status: Confirmed → In Progress
Martin Pitt (pitti) wrote :

linux 2.6.24-21 copied to hardy-updates.

Changed in linux:
status: Fix Committed → Fix Released

(In reply to comment #4)
> Well, I think there are actually two things at play in this BZ. The original
> request is to shift from using CPUID to serialize gettimeofday to using
> MFENCE/LFENCE for serializing. This should be faster under *full*
> virtualization, so a backport to RHEL-5 might be desirable.
>

<ping>
Has anybody been working on these patches, do we have a ETA as to which release can have this fix ?

> Comment #3, however, talks about something different. In particular, it's
> talking about the Xen kernel, which does indeed have the vsyscall stuff off in
> 64-bit. I'm not entirely sure why; I don't think segmentation protection
> should have anything to do with it. In a quick test, I turned it on, and your
> little benchmark there went from 2m17s on this box to about 30s. I'm going to
> make a new BZ about that issue, since it doesn't really belong here.

Can you please cc me on this BZ.

Thanks,
Alok

There is no ETA for these, but the earliest we can evaluate it for would be RHEL 5.4 at this point.

[changed the component category to "kernel", this is a generic kernel problem, only that the performance impact would be more for kernel running under hypervisors.]

I have cooked up some patches which use the mfence/lfence instead of cpuid.
Please have a look and let me know if you have any comments. Will upload them shortly.

Created attachment 324275
x86: Implement support to synchronize RDTSC with LFENCE on Intel CPUs

Created attachment 324276
x86: implement support to synchronize RDTSC through MFENCE on AMD CPUs

Created attachment 324277
x86: introduce rdtsc_barrier()

*** Bug 468459 has been marked as a duplicate of this bug. ***

I took a quick look at these patches, and they look entirely reasonable. The only question I have concerns the set_bit(X86_FEATURE_{L,M}FENCE_RDTSC, &c->x86_capability); don't we have to protect that by first checking if sse2 is enabled? Upstream that's done with the "cpu_has_xmm2" check, but since RHEL-5 doesn't have that, we'd have to do something a little more primitive. Or am I missing something?

Chris Lalancette

(In reply to comment #20)
> I took a quick look at these patches, and they look entirely reasonable. The
> only question I have concerns the set_bit(X86_FEATURE_{L,M}FENCE_RDTSC,
> &c->x86_capability); don't we have to protect that by first checking if sse2 is
> enabled?

These barrier changes are done only for 64bit code. All 64bit machines have SSE2 enabled, atleast thats what include/asm-x86_64/cpufeature.h says

#define cpu_has_xmm2 1

So i don't think we need the xmm2 check for RHEL5 since the 32 and 64bit code is still separate.

Thanks,
Alok

  Upstream that's done with the "cpu_has_xmm2" check, but since RHEL-5
> doesn't have that, we'd have to do something a little more primitive. Or am I
> missing something?
>
> Chris Lalancette

Ah, of course, silly me. Upstream has the combined 32/64 bit code, which is why it needs the protection. OK, great, thanks a lot!

Chris Lalancette

I've uploaded a test kernel that contains this fix (along with several others)
to this location:

http://people.redhat.com/clalance/virttest

Could the original reporter try out the test kernels there, and report back if
it fixes the problem?

Thanks,
Chris Lalancette

(In reply to comment #23)
> I've uploaded a test kernel that contains this fix (along with several others)
> to this location:
>
> http://people.redhat.com/clalance/virttest
>
> Could the original reporter try out the test kernels there, and report back if
> it fixes the problem?
>

Yep this kernel does fix the performance problems for me, thanks for picking up the patches.

Alok

Great, thanks for the testing!

Chris Lalancette

Updating PM score.

Changed in linux:
status: In Progress → Unknown
Changed in linux:
status: Unknown → Fix Committed

The fix for this is included in the latest RHEL5.4 beta kernels, available at:

http://people.redhat.com/dzickus/el5

Alok (or anyone else hitting this bug), can you please test this kernel when possible? Thanks!

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

RHEL5.4 looks okay WRT these patches too. Thanks.

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Changed in linux (Fedora):
status: Fix Committed → Fix Released
Changed in linux (Fedora):
importance: Unknown → Medium
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.