BUG: soft lockup - CPU#1 stuck for 63s! [swapper:0]

Bug #538057 reported by Matej.Pastor
This bug affects 1 person
Affects: linux-ec2 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Hi.

When I boot some of the Ubuntu AMIs, I get a CPU soft lockup in the log. Sometimes it occurs right after the instance boots, sometimes later. I tried about 3-4 images of the Ubuntu lucid daily build, and it occurred in all of them.

Ami: ami-cfd738a6
Description: Ubuntu lucid (development branch)
Release: 10.04

dmesg:

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.32-303-ec2 (buildd@crested) (gcc version 4.4.3 (Ubuntu 4.4.3-3ubuntu1) ) #7-Ubuntu SMP Wed Mar 10 11:23:24 UTC 2010 (Ubuntu 2.6.32-303.7-ec2)
[ 0.000000] Command line: root=/dev/sda1 ro 4
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] Xen-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 00000001e0800000 (usable)
[ 0.000000] last_pfn = 0x1e0800 max_arch_pfn = 0x80000000
[ 0.000000] last_pfn = 0x100000 max_arch_pfn = 0x80000000
[ 0.000000] initial memory mapped : 0 - 00000000
[ 0.000000] init_memory_mapping: 0000000000000000-0000000100000000
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] 0000000000 - 0100000000 page 4k
[ 0.000000] kernel direct mapping tables up to 100000000 @ 1846000-204b000
[ 0.000000] init_memory_mapping: 0000000100000000-00000001e0800000
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] 0100000000 - 01e0800000 page 4k
[ 0.000000] kernel direct mapping tables up to 1e0800000 @ 204b000-2f58000
[ 0.000000] (4 early reservations) ==> bootmem [0000000000 - 01e0000000]
[ 0.000000] #0 [0000932000 - 0001846000] Xen provided ==> [0000932000 - 0001846000]
[ 0.000000] #1 [0000100000 - 0000911bb8] TEXT DATA BSS ==> [0000100000 - 0000911bb8]
[ 0.000000] #2 [0001846000 - 000204b000] PGTABLE ==> [0001846000 - 000204b000]
[ 0.000000] #3 [000204b000 - 0002753000] PGTABLE ==> [000204b000 - 0002753000]
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000000 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x001e0800
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000000 -> 0x001e0000
[ 0.000000] 0: 0x001e0800 -> 0x001e0800
[ 0.000000] On node 0 totalpages: 1966080
[ 0.000000] free_area_init_node: node 0, pgdat ffffffff8079fd80, node_mem_map ffff880002753000
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 4040 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 14280 pages used for memmap
[ 0.000000] DMA32 zone: 1030200 pages, LIFO batch:31
[ 0.000000] Normal zone: 12572 pages used for memmap
[ 0.000000] Normal zone: 904932 pages, LIFO batch:31
[ 0.000000] NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 19 pages/cpu @ffff88000100c000 s47320 r8192 d22312 u77824
[ 0.000000] pcpu-alloc: s47320 r8192 d22312 u77824 alloc=19*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1939172
[ 0.000000] Kernel command line: root=/dev/sda1 ro 4
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] allocated 78725120 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] Software IO TLB disabled
[ 0.000000] Memory: 7627448k/7872512k available (4798k kernel code, 8192k absent, 236376k reserved, 2062k data, 232k init)
[ 0.000000] SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] NR_IRQS:96
[ 0.000000] Xen reported: 2004.540 MHz processor.
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] console [tty-1] enabled
[ 0.840001] Calibrating delay using timer specific routine.. 4010.56 BogoMIPS (lpj=20052829)
[ 0.840034] Security Framework initialized
[ 0.840064] AppArmor: AppArmor initialized
[ 0.840076] Mount-cache hash table entries: 256
[ 0.840216] Initializing cgroup subsys ns
[ 0.840226] Initializing cgroup subsys cpuacct
[ 0.840234] Initializing cgroup subsys memory
[ 0.840244] Initializing cgroup subsys devices
[ 0.840250] Initializing cgroup subsys freezer
[ 0.840279] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.840287] CPU: L2 Cache: 1024K (64 bytes/line)
[ 0.840307] SMP alternatives: switching to UP code
[ 0.901908] Brought up 1 CPUs
[ 0.901922] CPU0 attaching NULL sched-domain.
[ 0.902039] devtmpfs: initialized
[ 0.902620] NET: Registered protocol family 16
[ 0.903552] SMP alternatives: switching to SMP code
[ 0.966136] Initializing CPU#1
[ 0.966156] CPU0 attaching NULL sched-domain.
[ 0.966136] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.966136] CPU: L2 Cache: 1024K (64 bytes/line)
[ 1.240041] CPU0 attaching sched-domain:
[ 1.240048] domain 0: span 0-1 level CPU
[ 1.240051] groups: 0 1
[ 1.240058] CPU1 attaching sched-domain:
[ 1.240061] domain 0: span 0-1 level CPU
[ 1.240063] groups: 1 0
[ 1.240120] Brought up 2 CPUs
[ 1.240144] PCI: Fatal: No config space access function found
[ 1.240150] PCI: setting up Xen PCI frontend stub
[ 1.240621] bio: create slab <bio-0> at 0
[ 1.240760] vgaarb: loaded
[ 1.240912] suspend: event channel 15
[ 1.240912] xen_mem: Initialising balloon driver.
[ 1.242580] PCI: System does not support PCI
[ 1.242589] PCI: System does not support PCI
[ 1.242692] NET: Registered protocol family 8
[ 1.242699] NET: Registered protocol family 20
[ 1.242715] NetLabel: Initializing
[ 1.242720] NetLabel: domain hash size = 128
[ 1.242725] NetLabel: protocols = UNLABELED CIPSOv4
[ 1.242745] NetLabel: unlabeled traffic allowed by default
[ 1.242754] Switching to clocksource xen
[ 1.244492] AppArmor: AppArmor Filesystem Enabled
[ 1.244691] NET: Registered protocol family 2
[ 1.244809] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 1.246798] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[ 1.246891] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 1.247224] TCP: Hash tables configured (established 262144 bind 65536)
[ 1.247235] TCP reno registered
[ 1.247364] NET: Registered protocol family 1
[ 1.247482] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 1.247656] audit: initializing netlink socket (disabled)
[ 1.247680] type=2000 audit(1268393958.998:1): initialized
[ 1.256781] VFS: Disk quotas dquot_6.5.2
[ 1.256847] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.257019] DLM (built Mar 10 2010 11:25:43) installed
[ 1.257460] JFS: nTxBlock = 8192, nTxLock = 65536
[ 1.261358] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[ 1.261877] SGI XFS Quota Management subsystem
[ 1.262246] Slow work thread pool: Starting up
[ 1.262287] Slow work thread pool: Ready
[ 1.262313] GFS2 (built Mar 10 2010 11:26:46) installed
[ 1.262326] msgmni has been set to 15360
[ 1.262577] alg: No test for stdrng (krng)
[ 1.262596] io scheduler noop registered
[ 1.262601] io scheduler anticipatory registered
[ 1.262606] io scheduler deadline registered (default)
[ 1.262625] io scheduler cfq registered
[ 1.277686] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 1.278811] brd: module loaded
[ 1.279260] loop: module loaded
[ 1.282582] Xen virtual console successfully installed as tty1
[ 1.282653] Event-channel device installed.
[ 1.294314] netfront: Initialising virtual ethernet driver.
[ 1.297161] xen-vbd: registered block device major 8
[ 1.298279] sdb: unknown partition table
[ 1.315361] sdc: unknown partition table
[ 1.317207] PPP generic driver version 2.4.2
[ 1.318004] Equalizer2002: Simon Janes (<email address hidden>) and David S. Miller (<email address hidden>)
[ 1.318257] tun: Universal TUN/TAP device driver, 1.6
[ 1.318265] tun: (C) 1999-2004 Max Krasnyansky <email address hidden>
[ 1.319153] i8042.c: No controller found.
[ 1.319229] mice: PS/2 mouse device common for all mice
[ 1.319309] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[ 1.319382] Driver for 1-wire Dallas network protocol.
[ 1.319879] NET: Registered protocol family 17
[ 1.320039] registered taskstats version 1
[ 1.416117] /build/buildd/linux-ec2-2.6.32/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 1.440926] kjournald starting. Commit interval 5 seconds
[ 1.440960] EXT3-fs: mounted filesystem with writeback data mode.
[ 1.440993] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
[ 1.442323] devtmpfs: mounted
[ 1.442481] Freeing unused kernel memory: 232k freed
[ 1.442660] Write protecting the kernel read-only data: 6440k
[ 2.325451] udev: starting version 151
[ 3.026948] EXT3 FS on sda1, internal journal
[ 3.457909] type=1505 audit(1268393961.212:2): operation="profile_load" pid=224 name="/sbin/dhclient3"
[ 3.458107] type=1505 audit(1268393961.212:3): operation="profile_load" pid=224 name="/usr/lib/NetworkManager/nm-dhcp-client.action"
[ 3.458189] type=1505 audit(1268393961.212:4): operation="profile_load" pid=224 name="/usr/lib/connman/scripts/dhclient-script"
[ 7.690625] NET: Registered protocol family 10
[ 7.691314] lo: Disabled Privacy Extensions
[ 11.342222] kjournald starting. Commit interval 5 seconds
[ 11.342241] EXT3-fs warning: checktime reached, running e2fsck is recommended
[ 11.393039] EXT3 FS on sdb, internal journal
[ 11.393054] EXT3-fs: mounted filesystem with writeback data mode.
[ 13.337050] type=1505 audit(1268393971.087:5): operation="profile_replace" pid=442 name="/sbin/dhclient3"
[ 13.337363] type=1505 audit(1268393971.087:6): operation="profile_replace" pid=442 name="/usr/lib/NetworkManager/nm-dhcp-client.action"
[ 13.337540] type=1505 audit(1268393971.087:7): operation="profile_replace" pid=442 name="/usr/lib/connman/scripts/dhclient-script"
[ 13.695206] type=1505 audit(1268393971.445:8): operation="profile_load" pid=443 name="/usr/sbin/tcpdump"
[ 17.811783] eth0: no IPv6 routers present
[ 642.603760] sdi: unknown partition table
[ 674.056262] kjournald starting. Commit interval 5 seconds
[ 674.059464] EXT3 FS on sdi, internal journal
[ 674.059470] EXT3-fs: mounted filesystem with writeback data mode.
[ 1671.652235] BUG: soft lockup - CPU#1 stuck for 63s! [swapper:0]
[ 1671.652235] Modules linked in: ipv6
[ 1671.652235] CPU 1:
[ 1671.652235] Modules linked in: ipv6
[ 1671.652235] Pid: 0, comm: swapper Not tainted 2.6.32-303-ec2 #7-Ubuntu
[ 1671.652235] RIP: e030:[<ffffffff801063aa>] [<ffffffff801063aa>] 0xffffffff801063aa
[ 1671.652235] RSP: e02b:ffff8801df855ed8 EFLAGS: 00000246
[ 1671.652235] RAX: 0000000000000000 RBX: ffff8801df855fd8 RCX: ffffffff801063aa
[ 1671.652235] RDX: ffff8801df855ec0 RSI: 0000000000000000 RDI: 0000000000000001
[ 1671.652235] RBP: ffff8801df855ef0 R08: 0000000000000000 R09: 0000000000000000
[ 1671.652235] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff807af8f8
[ 1671.652235] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1671.652235] FS: 00007fb96fca37c0(0000) GS:ffff88000101f000(0000) knlGS:0000000000000000
[ 1671.652235] CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
[ 1671.652235] CR2: 00000000013cf808 CR3: 00000001dd404000 CR4: 0000000000000660
[ 1671.652235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1671.652235] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[ 1671.652235] Call Trace:
[ 1671.652235] [<ffffffff8010c5a5>] ? xen_safe_halt+0x15/0x40
[ 1671.652235] [<ffffffff8010fde6>] xen_idle+0x36/0xa0
[ 1671.652235] [<ffffffff80107e65>] cpu_idle+0xb5/0x100
[ 1671.652235] [<ffffffff805a4e3f>] cpu_bringup_and_idle+0xe/0x10
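
For anyone checking several instances for the same symptom, a small script can pull the soft lockup lines out of saved dmesg output. This is only a minimal sketch: the function name `find_soft_lockups` and the regular expression are illustrative assumptions, written against the message format shown in the log above, and other kernel versions may word the message differently.

```python
import re

# Assumed message format, taken from the log in this report:
# [ 1671.652235] BUG: soft lockup - CPU#1 stuck for 63s! [swapper:0]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup line found in dmesg-style text."""
    events = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "uptime_s": float(match.group("ts")),   # seconds since boot
                "cpu": int(match.group("cpu")),         # CPU that was stuck
                "stuck_s": int(match.group("secs")),    # reported stall length
                "comm": match.group("comm"),            # task name
                "pid": int(match.group("pid")),         # task PID
            })
    return events


if __name__ == "__main__":
    sample = "[ 1671.652235] BUG: soft lockup - CPU#1 stuck for 63s! [swapper:0]"
    for event in find_soft_lockups(sample):
        print(event)
```

Feeding it the full text of a saved dmesg dump should list each lockup with its uptime, which makes it easier to compare how soon after boot the problem appears on different images.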

ProblemType: Bug
Architecture: amd64
Date: Fri Mar 12 12:32:07 2010
DistroRelease: Ubuntu 10.04
Ec2AMI: ami-cfd738a6
Ec2AMIManifest: ubuntu-images-testing-us/ubuntu-lucid-daily-amd64-server-20100312.manifest.xml
Ec2AvailabilityZone: us-east-1a
Ec2InstanceType: m1.large
Ec2Kernel: aki-11c72878
Ec2Ramdisk: unavailable
Package: linux-image-2.6.32-303-ec2 2.6.32-303.7
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-303.7-ec2
SourcePackage: linux-ec2
Uname: Linux 2.6.32-303-ec2 x86_64

Matej.Pastor (georgewh) wrote :
Philip Muškovac (yofel)
affects: ubuntu → linux-ec2 (Ubuntu)
Matej.Pastor (georgewh) wrote :

I created (via debootstrap) and booted my own EBS AMI. I think the problem may be in the ipv6 module. I did not include kernel modules in my AMI, yet I still got the soft lockup. Then I copied the modules to the instance and, after a while (probably after the ipv6 module was loaded), I got the soft lockup again.

Matej.Pastor (georgewh) wrote :

aki id: aki-11c72878
uname -a: Linux ip-10-212-131-0 2.6.32-303-ec2 #7-Ubuntu SMP Wed Mar 10 11:23:24 UTC 2010 x86_64 GNU/Linux
The modules were from Ubuntu AMI ami-11628d78

Scott Moser (smoser) wrote :

It seems most likely that this is a duplicate of bug 540378. If you think otherwise, please say so.
