kernel 2.6.24 "Soft lockup" under load on VMWare ESX 3.5

Bug #316187 reported by Etienne Goyer
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: High
Assigned to: Colin Ian King
Milestone: (none)

Bug Description

Under load, the hardy kernel 2.6.24-23.46-server experiences seemingly random soft lockups when run as a VM guest on VMware ESX 3.5 Update 2. The problem appears when multiple guests are put under load; I was stress-testing six guests, and two of them got the soft lockup problem described. The problem manifests itself seemingly at random.

The hypervisor runs on a Dell PowerEdge 2950 with 4 GB of RAM and two Xeon L5320 CPUs.

The hardy guests are amd64. They are being put under load using the "stress" utility from the universe repository. The command line used to start the stress utility is:

    stress --cpu 8 --io 8 --vm 3 --vm-bytes 128M --hdd 2

The machines remain responsive after the soft lockup, and it is possible to kill the currently running stress processes with Ctrl-C and get the shell back. Attached is the dmesg output from the two VMs (out of six) that had the problem, suffixed with -1 and -2 respectively.
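
For reference, the "stress" utility used above comes from the universe component; assuming universe is enabled in the APT sources, it can be installed with:

    sudo apt-get install stress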

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Notice that I have added the kernel parameter "clocksource=acpi_pm", as discussed in bug #261937 and at:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007020

However, dmesg on both VMs that crashed still shows an error such as:

    Clocksource tsc unstable (delta = 89051661614 ns)

Could it be related?
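
For reference, the clocksource actually in use can be checked (and changed) at runtime through sysfs; this is a sketch assuming the standard clocksource sysfs interface is available on this kernel:

    # show the clocksource the guest kernel is currently using
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    # list the clocksources the kernel considers usable
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    # switch to acpi_pm without rebooting (as root)
    echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource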

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

I left the stress test running again on six VMs overnight, and only one got the soft lockup. Attached is the dmesg output. I could probably provide more of these by re-running the stress test on a bunch of machines, but I think it would be redundant.

Changed in linux-meta:
assignee: nobody → colin-king
Revision history for this message
Colin Ian King (colin-king) wrote :

Incidentally, how long does it take before the problem occurs? On the order of minutes, hours, days?

Colin

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Generally within a matter of minutes, but sometimes it takes longer.

Revision history for this message
Colin Ian King (colin-king) wrote :

Once we can reproduce this, I will see if upstream commit 126e01bf92dfc5f0ba91e88be02c473e1506d7d9 will help fix this.

Quoting the patch:

 softlockup: fix NOHZ wakeup

    David Miller reported:

    |--------------->
    the following commit:

    | commit 27ec4407790d075c325e1f4da0a19c56953cce23
    | Author: Ingo Molnar <email address hidden>
    | Date: Thu Feb 28 21:00:21 2008 +0100
    |
    | sched: make cpu_clock() globally synchronous
    |
    | Alexey Zaytsev reported (and bisected) that the introduction of
    | cpu_clock() in printk made the timestamps jump back and forth.
    |
    | Make cpu_clock() more reliable while still keeping it fast when it's
    | called frequently.
    |
    | Signed-off-by: Ingo Molnar <email address hidden>

    causes watchdog triggers when a cpu exits NOHZ state when it has been
    there for >= the soft lockup threshold, for example here are some
    messages from a 128 cpu Niagara2 box:

    [ 168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
    etc..

Revision history for this message
Colin Ian King (colin-king) wrote :

It appears that this kind of soft lockup kernel message is easily reproduced with the following code on a 2.6.24-xxx hardy server kernel inside VMware Server:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

/* Put the calling process into the SCHED_FIFO real-time class at the
 * highest priority, so it can starve ordinary tasks on its CPU. */
void setsched(void)
{
        struct sched_param mysched;
        mysched.sched_priority = 99;

        if (sched_setscheduler(0, SCHED_FIFO, &mysched) == -1) {
                fprintf(stderr, "Cannot set scheduler\n");
                exit(EXIT_FAILURE);
        }
}

int main(void)
{
        setsched();

        /* Spin forever without sleeping, hogging the CPU. */
        for (;;)
                ;

        return 0;
}

Basically, we get the program to hog a CPU, and the kernel reports that we have a CPU-hogging task with a soft lockup message.
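
For reference, the reproducer can be built and run as follows (the file name is illustrative, and setting SCHED_FIFO requires root privileges):

    gcc -Wall -o cpuhog cpuhog.c
    sudo ./cpuhog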

I'm now investigating this for any fixes upstream.

Revision history for this message
Colin Ian King (colin-king) wrote :

Let's look at these error messages. The following:

[ 554.116802] BUG: soft lockup - CPU#1 stuck for 11s! [stress:4618]
[ 557.606750] CPU 1:
[ 557.607150] Modules linked in: iptable_filter ip_tables x_tables ipv6 lp loop af_packet serio_raw intel_agp container evdev button shpchp parport_pc parport ac i2c_piix4 pci_hotplug i2c_core pcspkr psmouse ext3 jbd mbcache sd_mod sg sr_mod cdrom floppy pata_acpi e1000 mptspi mptscsih mptbase scsi_transport_spi ata_piix ata_generic libata scsi_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 592.791355] Pid: 4618, comm: stress Not tainted 2.6.24-22-server #1
[ 592.797376] RIP: 0033:[<00000000004032db>] [<00000000004032db>]
[ 594.408565] RSP: 002b:00007fff2a2e0200 EFLAGS: 00000202
[ 594.408959] RAX: 000000002fcd9fdc RBX: 00007fff2a2e0210 RCX: 00007f3021e3c0d4
[ 594.409443] RDX: 00007f3021e3c0e4 RSI: 00007fff2a2e01e4 RDI: 00007f3021e3c4c0
[ 594.409880] RBP: ffffffff8020c907 R08: 00007f3021e3c084 R09: 00007f3021e3c100
[ 594.410314] R10: 0000000000000000 R11: 00007f3021b15e80 R12: 00007fff2a2e0210
[ 594.410738] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fff2a2e06a0
[ 594.411187] FS: 00007f30222d86e0(0000) GS:ffff81001f001700(0000) knlGS:0000000000000000
[ 594.411838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.412227] CR2: 00007fff2a216000 CR3: 000000001b9f7000 CR4: 00000000000006e0
[ 594.412663] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 594.431287] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 594.464652]
[ 594.464656] Call Trace:

..occurs with no call trace because the user-space stress process is hogging all of the CPU; when the watchdog eventually gets some processing time, it thinks the CPU is locked up.

1. The stress process is indeed consuming all the available cycles on this CPU - that is what it was designed to test.
2. The watchdog, which should wake up about every second, eventually gets run and falsely concludes there is a soft lockup (the watchdog threads can be listed with the command below).
3. There is NO call trace - this is because the watchdog detects the problem in user space, i.e. the stress process, and not because of any kernel-related bug causing the CPU to lock up.
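
For reference, the per-CPU watchdog threads mentioned in point 2 can be listed from user space; this is a sketch assuming the 2.6.24 softlockup detector's per-CPU kernel threads named watchdog/N:

    # show the softlockup watchdog kernel threads with their scheduling class and RT priority
    ps -eo pid,class,rtprio,comm | grep watchdog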

And now look at the next message:

[ 594.465954] BUG: soft lockup - CPU#3 stuck for 11s! [stress:4623]
[ 594.467024] CPU 3:
[ 594.467416] Modules linked in: iptable_filter ip_tables x_tables ipv6 lp loop af_packet serio_raw intel_agp container evdev button shpchp parport_pc parport ac i2c_piix4 pci_hotplug i2c_core pcspkr psmouse ext3 jbd mbcache sd_mod sg sr_mod cdrom floppy pata_acpi e1000 mptspi mptscsih mptbase scsi_transport_spi ata_piix ata_generic libata scsi_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 594.467482] Pid: 4623, comm: stress Not tainted 2.6.24-22-server #1
[ 594.467539] RIP: 0010:[<ffffffff8027fcc9>] [<ffffffff8027fcc9>] handle_IRQ_event+0x19/0x70
[ 594.732355] RSP: 0018:ffff81001d15bf38 EFLAGS: 00000246
[ 594.732745] RAX: 0000000000004100 RBX: 0000000000000000 RCX: ffff81001d150000
[ 594.733180] RDX: 0000000000000012 RSI: ffff81001f489340 RDI: 0000000000000012
[ 594.734110] RBP: ffff81001d15beb0 R08: 000000000000...


Revision history for this message
Colin Ian King (colin-king) wrote :

One other point to make is that the message "BUG: soft lockup - CPU#1 stuck for 11s! [stress:4618]" is rather alarmist - it makes one think the CPU is stuck and doing nothing. In fact, if one uses the top command one can see that the CPU is 100% busy running the stress process and not stuck at all.
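
For reference, per-CPU utilisation can be confirmed from a shell while the stress run is active; mpstat comes from the sysstat package (an assumption that it is installed), and plain top works as well:

    # three one-second samples of per-CPU utilisation
    mpstat -P ALL 1 3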

Revision history for this message
Colin Ian King (colin-king) wrote :

Etienne,

In trying to reproduce an OOM and loading issue using the stress test application, one needs to consider how the OOM occurs and whether it can be fixed or mitigated by tweaking the virtual memory settings.

Some suggested tweaks to try are as follows:

as root:

echo 2 > /proc/sys/vm/overcommit_memory

- this tells the kernel to use a "never overcommit" policy that attempts
to prevent any overcommit of memory.

echo 1 > /proc/sys/vm/oom_kill_allocating_task

- this makes the kernel kill the task that triggered the overcommit rather than the kernel's
best guess at the biggest overcommitting task
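
To make these settings survive a reboot, the equivalent sysctl entries can be appended to /etc/sysctl.conf; this is a sketch of the standard Ubuntu mechanism:

    # persist the overcommit tweaks and reload them (as root)
    echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
    echo "vm.oom_kill_allocating_task = 1" >> /etc/sysctl.conf
    sysctl -p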

Colin

Matt Zimmerman (mdz)
Changed in linux-meta:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

The bug has been marked Incomplete, but I am not sure what needs to be provided. A better test case?

AFAIC, I am entirely satisfied with Colin's explanation. I do think the spurious "soft lockup" message should be fixed, but that is not my call to make.

Revision history for this message
Colin Ian King (colin-king) wrote :

@Etienne, any change on this bug? Should we mark it as Won't Fix now?

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Colin, it is your call whether the spurious soft lockup warnings should be fixed or not. Personally, I am fine knowing they are harmless. Thanks a lot!

Revision history for this message
Colin Ian King (colin-king) wrote :

Marking as "Won't Fix": the soft lockup messages are coming from a misbehaving user-space app, and the watchdog is detecting this as a *potential* soft lockup issue.

Changed in linux:
status: Incomplete → Won't Fix