kernel 2.6.24 "Soft lockup" under load on VMWare ESX 3.5

Bug #316187 reported by Etienne Goyer
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: High
Assigned to: Colin Ian King
Milestone: (none)

Bug Description

Under load, the hardy kernel 2.6.24-23.46-server experiences seemingly random soft lockups when run as a VM guest on VMware ESX 3.5 Update 2. The problem appears when multiple guests are put under load; I was stress-testing six guests, and two of them got the soft lockup problem described. The problem manifests itself seemingly at random.

The hypervisor runs on a Dell PowerEdge 2950 with 4 GB of RAM and two Xeon L5320 CPUs.

The hardy guests are amd64. They are being put under load using the "stress" utility from the universe repository. The command line used to start the stress utility is:

    stress --cpu 8 --io 8 --vm 3 --vm-bytes 128M --hdd 2

The machines remain responsive after the soft lockup, and it is possible to kill the currently running stress processes with Ctrl-C and get the shell back. Attached is the dmesg output from the two VMs (out of six) that had the problem, suffixed with -1 and -2 respectively.
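
For reference, the "stress" utility used above comes from the universe component; assuming universe is enabled in the APT sources, it can be installed with:

    sudo apt-get install stress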

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Notice that I have added the kernel parameter "clocksource=acpi_pm", as discussed in bug #261937 and at:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007020

However, dmesg on both VMs that crashed still shows an error such as:

    Clocksource tsc unstable (delta = 89051661614 ns)

Could it be related?
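
For reference, the clocksource actually in use can be checked (and changed) at runtime through sysfs; this is a sketch assuming the standard clocksource sysfs interface is available on this kernel:

    # show the clocksource the guest kernel is currently using
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    # list the clocksources the kernel considers usable
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    # switch to acpi_pm without rebooting (as root)
    echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource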

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

I left the stress test running again on six VMs overnight, and only one got the soft lockup. Attached is the dmesg output. I could probably provide more of these by re-running the stress test on a bunch of machines, but I think it would be redundant.

Changed in linux-meta:
assignee: nobody → colin-king
Revision history for this message
Colin Ian King (colin-king) wrote :

Incidentally, how long does it take before the problem occurs? On the order of minutes, hours, days?

Colin

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Generally within a matter of minutes, but sometimes it takes longer.

Revision history for this message
Colin Ian King (colin-king) wrote :

Once we can reproduce this, I will see if upstream commit 126e01bf92dfc5f0ba91e88be02c473e1506d7d9 will help fix this.

Quoting the patch:

 softlockup: fix NOHZ wakeup

    David Miller reported:

    |--------------->
    the following commit:

    | commit 27ec4407790d075c325e1f4da0a19c56953cce23
    | Author: Ingo Molnar <email address hidden>
    | Date: Thu Feb 28 21:00:21 2008 +0100
    |
    | sched: make cpu_clock() globally synchronous
    |
    | Alexey Zaytsev reported (and bisected) that the introduction of
    | cpu_clock() in printk made the timestamps jump back and forth.
    |
    | Make cpu_clock() more reliable while still keeping it fast when it's
    | called frequently.
    |
    | Signed-off-by: Ingo Molnar <email address hidden>

    causes watchdog triggers when a cpu exits NOHZ state when it has been
    there for >= the soft lockup threshold, for example here are some
    messages from a 128 cpu Niagara2 box:

    [ 168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
    etc..

Revision history for this message
Colin Ian King (colin-king) wrote :

It appears that this kind of soft lockup kernel message is easily reproduced with the following code on a 2.6.24-xxx hardy server kernel inside VMware Server:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

/* Put the calling process into the SCHED_FIFO real-time class at the
 * highest priority, so it can starve ordinary tasks on its CPU. */
void setsched(void)
{
        struct sched_param mysched;
        mysched.sched_priority = 99;

        if (sched_setscheduler(0, SCHED_FIFO, &mysched) == -1) {
                fprintf(stderr, "Cannot set scheduler\n");
                exit(EXIT_FAILURE);
        }
}

int main(void)
{
        setsched();

        /* Spin forever without sleeping, hogging the CPU. */
        for (;;)
                ;

        return 0;
}

Basically, we get the program to hog a CPU, and the kernel reports that we have a CPU-hogging task with a soft lockup message.
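
For reference, the reproducer can be built and run as follows (the file name is illustrative, and setting SCHED_FIFO requires root privileges):

    gcc -Wall -o cpuhog cpuhog.c
    sudo ./cpuhog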

I'm now investigating this for any fixes upstream.

Revision history for this message
Colin Ian King (colin-king) wrote :

Let's look at these error messages. The following:

[ 554.116802] BUG: soft lockup - CPU#1 stuck for 11s! [stress:4618]
[ 557.606750] CPU 1:
[ 557.607150] Modules linked in: iptable_filter ip_tables x_tables ipv6 lp loop af_packet serio_raw intel_agp container evdev button shpchp parport_pc parport ac i2c_piix4 pci_hotplug i2c_core pcspkr psmouse ext3 jbd mbcache sd_mod sg sr_mod cdrom floppy pata_acpi e1000 mptspi mptscsih mptbase scsi_transport_spi ata_piix ata_generic libata scsi_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 592.791355] Pid: 4618, comm: stress Not tainted 2.6.24-22-server #1
[ 592.797376] RIP: 0033:[<00000000004032db>] [<00000000004032db>]
[ 594.408565] RSP: 002b:00007fff2a2e0200 EFLAGS: 00000202
[ 594.408959] RAX: 000000002fcd9fdc RBX: 00007fff2a2e0210 RCX: 00007f3021e3c0d4
[ 594.409443] RDX: 00007f3021e3c0e4 RSI: 00007fff2a2e01e4 RDI: 00007f3021e3c4c0
[ 594.409880] RBP: ffffffff8020c907 R08: 00007f3021e3c084 R09: 00007f3021e3c100
[ 594.410314] R10: 0000000000000000 R11: 00007f3021b15e80 R12: 00007fff2a2e0210
[ 594.410738] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fff2a2e06a0
[ 594.411187] FS: 00007f30222d86e0(0000) GS:ffff81001f001700(0000) knlGS:0000000000000000
[ 594.411838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.412227] CR2: 00007fff2a216000 CR3: 000000001b9f7000 CR4: 00000000000006e0
[ 594.412663] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 594.431287] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 594.464652]
[ 594.464656] Call Trace:

..occurs with no call trace because the user-space stress process is hogging all of the CPU; when the watchdog eventually gets some processing time, it thinks the CPU is locked up.

1. The stress process is indeed consuming all the available cycles on this CPU - that is what it was designed to test.
2. The watchdog, which should wake up about every second, eventually gets run and falsely concludes there is a soft lockup (the watchdog threads can be listed with the command below).
3. There is NO call trace - this is because the watchdog detects the problem in user space, i.e. the stress process, and not because of any kernel-related bug causing the CPU to lock up.
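
For reference, the per-CPU watchdog threads mentioned in point 2 can be listed from user space; this is a sketch assuming the 2.6.24 softlockup detector's per-CPU kernel threads named watchdog/N:

    # show the softlockup watchdog kernel threads with their scheduling class and RT priority
    ps -eo pid,class,rtprio,comm | grep watchdog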

And now look at the next message:

[ 594.465954] BUG: soft lockup - CPU#3 stuck for 11s! [stress:4623]
[ 594.467024] CPU 3:
[ 594.467416] Modules linked in: iptable_filter ip_tables x_tables ipv6 lp loop af_packet serio_raw intel_agp container evdev button shpchp parport_pc parport ac i2c_piix4 pci_hotplug i2c_core pcspkr psmouse ext3 jbd mbcache sd_mod sg sr_mod cdrom floppy pata_acpi e1000 mptspi mptscsih mptbase scsi_transport_spi ata_piix ata_generic libata scsi_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 594.467482] Pid: 4623, comm: stress Not tainted 2.6.24-22-server #1
[ 594.467539] RIP: 0010:[<ffffffff8027fcc9>] [<ffffffff8027fcc9>] handle_IRQ_event+0x19/0x70
[ 594.732355] RSP: 0018:ffff81001d15bf38 EFLAGS: 00000246
[ 594.732745] RAX: 0000000000004100 RBX: 0000000000000000 RCX: ffff81001d150000
[ 594.733180] RDX: 0000000000000012 RSI: ffff81001f489340 RDI: 0000000000000012
[ 594.734110] RBP: ffff81001d15beb0 R08: 000000000000...


Revision history for this message
Colin Ian King (colin-king) wrote :

One other point to make is that the message "BUG: soft lockup - CPU#1 stuck for 11s! [stress:4618]" is rather alarmist - it makes one think the CPU is stuck and doing nothing. In fact, if one uses the top command one can see that the CPU is 100% busy running the stress process and not stuck at all.
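
For reference, per-CPU utilisation can be confirmed from a shell while the stress run is active; mpstat comes from the sysstat package (an assumption that it is installed), and plain top works as well:

    # three one-second samples of per-CPU utilisation
    mpstat -P ALL 1 3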

Revision history for this message
Colin Ian King (colin-king) wrote :

Etienne,

In trying to reproduce an OOM and loading issue using the stress test application, one needs to consider how the OOM occurs and whether it can be fixed or mitigated by tweaking the virtual memory settings.

Some suggested tweaks to try are as follows:

as root:

echo 2 > /proc/sys/vm/overcommit_memory

- this tells the kernel to use a "never overcommit" policy that attempts
to prevent any overcommit of memory.

echo 1 > /proc/sys/vm/oom_kill_allocating_task

- this makes the kernel kill the task that triggered the overcommit rather than the kernel's
best guess at the biggest overcommitting task
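
To make these settings survive a reboot, the equivalent sysctl entries can be appended to /etc/sysctl.conf; this is a sketch of the standard Ubuntu mechanism:

    # persist the overcommit tweaks and reload them (as root)
    echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
    echo "vm.oom_kill_allocating_task = 1" >> /etc/sysctl.conf
    sysctl -p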

Colin

Matt Zimmerman (mdz)
Changed in linux-meta:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

The bug has been marked Incomplete, but I am not sure what needs to be provided. A better test case?

AFAIC, I am entirely satisfied with Colin's explanation. I do think the spurious "soft lockup" message should be fixed, but that is not my call to make.

Revision history for this message
Colin Ian King (colin-king) wrote :

@Etienne, any change on this bug? Should we mark it as Won't Fix now?

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Colin, it is your call whether the spurious soft lockup warnings should be fixed or not. Personally, I am fine knowing they are harmless. Thanks a lot!

Revision history for this message
Colin Ian King (colin-king) wrote :

Marking as "Won't Fix": the soft lockup messages are coming from a misbehaving user-space app, and the watchdog is detecting this as a *potential* soft lockup issue.

Changed in linux:
status: Incomplete → Won't Fix