CPU soft lockup in Xen PTE allocation on m2.2xlarge instances
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-meta-ec2 (Ubuntu) | Won't Fix | Undecided | Unassigned |
Bug Description
The following soft lockup is seen randomly on m2.2xlarge instances in EC2:
[1284451.875485] BUG: soft lockup - CPU#3 stuck for 61s! [identify:24060]
[1284451.875485] Modules linked in: ipv6 ipt_REJECT ipt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp xt_owner iptable_filter ip_tables x_tables raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear md_mod
[1284451.875485] CPU 3:
[1284451.875485] Modules linked in: ipv6 ipt_REJECT ipt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp xt_owner iptable_filter ip_tables x_tables raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear md_mod
[1284451.875485] Pid: 24060, comm: identify Tainted: G D 2.6.32-316-ec2 #31-Ubuntu
[1284451.875485] RIP: e030:[<
[1284451.875485] RSP: e02b:ffff8800eb
[1284451.875485] RAX: 0000000000000000 RBX: ffff88025b7634c8 RCX: ffffffff810063aa
[1284451.875485] RDX: 0000000000000019 RSI: ffff8800eba4d948 RDI: 0000000000000003
[1284451.875485] RBP: ffff8800eba4d968 R08: ffff88025b763708 R09: 0000000000000040
[1284451.875485] R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000015
[1284451.875485] R13: 0000000000000045 R14: ffff88025b7634a8 R15: 0000000000000001
[1284451.875485] FS: 00007f1a7f48770
[1284451.875485] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[1284451.875485] CR2: 00007f16bfb0a398 CR3: 0000000001001000 CR4: 0000000000002660
[1284451.875485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1284451.875485] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1284451.875485] Call Trace:
[1284451.875485] [<ffffffff813a1
[1284451.875485] [<ffffffff813a7
[1284451.875485] [<ffffffff814b1
[1284451.875485] [<ffffffff814b1
[1284451.875485] [<ffffffff8101b
[1284451.875485] [<ffffffff8101d
[1284451.875485] [<ffffffff8101d
[1284451.875485] [<ffffffff810d2
[1284451.875485] [<ffffffff8103a
[1284451.875485] [<ffffffff8103f
[1284451.875485] [<ffffffff81040
[1284451.875485] [<ffffffff814b2
[1284451.875485] [<ffffffff8100d
[1284451.875485] [<ffffffff814b1
[1284451.875485] [<ffffffff8100b
[1284451.875485] [<ffffffff8101f
[1284451.875485] [<ffffffff810b2
[1284451.875485] [<ffffffff810b3
[1284451.875485] [<ffffffff8100a
[1284451.875485] [<ffffffff8101f
[1284451.875485] [<ffffffff8101f
[1284451.875485] [<ffffffff810b4
[1284451.875485] [<ffffffff8101f
[1284451.875485] [<ffffffff810c9
[1284451.875485] [<ffffffff810cd
[1284451.875485] [<ffffffff814b3
[1284451.875485] [<ffffffff814b1
The pinning hypercall in do_lN_entry_
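To gauge how often the lockup occurs on an affected instance, the kernel log can be scanned for the "BUG: soft lockup" header line shown in the trace above. The following is a minimal Python sketch, assuming the message format matches this report exactly; the function name find_soft_lockups is hypothetical and other kernel versions may word the message differently.

```python
import re

# Matches header lines of the form seen in this report, e.g.:
# [1284451.875485] BUG: soft lockup - CPU#3 stuck for 61s! [identify:24060]
# Assumption: other kernels may format this message differently.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup header line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "timestamp": float(match.group("ts")),
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events
```

Feeding it the saved output of dmesg (for example, the contents of a file holding the log) gives a list that can be grouped by CPU or by process name to see whether the lockups cluster around one workload.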
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-ec2 2.6.32.341.22
ProcVersionSign
Uname: Linux 2.6.32-341-ec2 x86_64
Architecture: amd64
Date: Fri Jan 20 22:43:24 2012
Ec2AMI: ami-55dc0b3c
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: m2.2xlarge
Ec2Kernel: aki-427d952b
Ec2Ramdisk: unavailable
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-meta-ec2
Changed in linux-meta-ec2 (Ubuntu):
status: Confirmed → Won't Fix
The hypercall fails due to invalid write permissions on the page that is being pinned. Perhaps the page being pinned for PTEs was reused?
One fix that was applied to the upstream kernel for such problems was this: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=64141da587241301ce8638cc945f8b67853156ec
I don't think that's the cause in this case since XFS isn't in use. Perhaps some other kernel subsystem is leaving pages behind in the vmalloc area with write permissions set?
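To double-check that XFS really is not in use on an instance, the mounted filesystem types can be read from /proc/mounts, where the third whitespace-separated field of each line is the filesystem type. A minimal Python sketch; the helper names filesystem_types and xfs_in_use are hypothetical, and this only covers mounted filesystems, not every possible vmalloc user in the kernel.

```python
def filesystem_types(mounts_text):
    """Return the set of filesystem types in /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass
    """
    types = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            types.add(fields[2])
    return types


def xfs_in_use(mounts_text):
    """True if any mounted filesystem is of type xfs."""
    return "xfs" in filesystem_types(mounts_text)
```

On a live system this could be called as xfs_in_use(open("/proc/mounts").read()). The "Modules linked in" line of the trace above does not list an xfs module either, which is consistent with XFS not being in use, assuming it was not built into the kernel.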