panic in task_rq_lock (race with concurrent semtimedop() timeouts and IPC_RMID)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Natty | Fix Released | Medium | Herton R. Krzesinski |
Bug Description
SRU justification
=================
Impact
------
Kernel crash, due to race explained in upstream bug report: https:/
In practice this is likely to happen on a heavily loaded web server.
Fix
---
Upstream commit d694ad62bf539db
Testcase
--------
https:/
It's attached to this bug as well.
- Build with gcc -o timedrm timedrm.cpp -lpthread
- Run with ./timedrm 250. You may have to run it more than once to get the oops, but the crash is very easy to trigger.
-------
When logged in I saw:
unity kernel: [669168.472431] last sysfs file: /sys/devices/
unity kernel: [669168.475971] Stack:
unity kernel: [669168.476634] Call Trace:
unity kernel: [669168.477094] Code: 00 48 c7 c3 c0 3c 01 00 49 89 fc 49 89 f5 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89 de <8b> 40 18 4c 03 34 c5 00 4b ac 81 4c 89 f7 e8 03 36 58 00 49 8b
unity kernel: [669168.479444] CR2: 00000000801f0f1d
In the log:
Mar 1 06:25:04 unity apache2[14216]: [Thu Mar 01 06:25:04 2012] [notice] SIGUSR1 received. Doing graceful restart
Mar 1 06:25:04 unity kernel: [669168.471999] BUG: unable to handle kernel paging request at 00000000801f0f1d
Mar 1 06:25:04 unity kernel: [669168.472131] IP: [<ffffffff81051
Mar 1 06:25:04 unity kernel: [669168.472229] PGD 0
Mar 1 06:25:04 unity kernel: [669168.472312] Oops: 0000 [#1] SMP
Mar 1 06:25:04 unity kernel: [669168.472431] last sysfs file: /sys/devices/
Mar 1 06:25:04 unity kernel: [669168.472508] CPU 7
Mar 1 06:25:04 unity kernel: [669168.472545] Modules linked in: ipt_MASQUERADE iptable_nat kvm_intel kvm ip6t_LOG xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT ipt_LOG xt_limit xt_tcpudp ipt_addrtype xt_state ip6table_filter ip6_tables radeon nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat nf_conntrack_ipv4 ipmi_devintf nf_defrag_ipv4 ipmi_watchdog nf_conntrack_ftp psmouse nf_conntrack ttm drm_kms_helper ipmi_si drm iptable_filter serio_raw joydev i5400_edac edac_core ipmi_poweroff ip_tables ioatdma ipmi_msghandler i5k_amb lp i2c_algo_bit x_tables bridge stp parport shpchp usbhid hid usb_storage uas igb arcmsr dca
Mar 1 06:25:04 unity kernel: [669168.474703]
Mar 1 06:25:04 unity kernel: [669168.474756] Pid: 1832, comm: apache2 Not tainted 2.6.38-10-server #46~lucid1-Ubuntu Supermicro X7DWU/X7DWU
Mar 1 06:25:04 unity kernel: [669168.475004] RIP: 0010:[<
Mar 1 06:25:04 unity kernel: [669168.475114] RSP: 0018:ffff88040c
Mar 1 06:25:04 unity kernel: [669168.475171] RAX: 00000000801f0f05 RBX: 0000000000013cc0 RCX: 0000000000000002
Mar 1 06:25:04 unity kernel: [669168.475245] RDX: 0000000000000282 RSI: ffff88040c10fe20 RDI: 00007f558925f8f0
Mar 1 06:25:04 unity kernel: [669168.475320] RBP: ffff88040c10fde8 R08: 0000000000989680 R09: 000000000000028b
Mar 1 06:25:04 unity kernel: [669168.475393] R10: 0000000000007bea R11: 0000000000000001 R12: 00007f558925f8f0
Mar 1 06:25:04 unity kernel: [669168.475467] R13: ffff88040c10fe20 R14: 0000000000013cc0 R15: 0000000000000007
Mar 1 06:25:04 unity kernel: [669168.475542] FS: 00007f5589d0374
Mar 1 06:25:04 unity kernel: [669168.475617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 1 06:25:04 unity kernel: [669168.475674] CR2: 00000000801f0f1d CR3: 000000040eb35000 CR4: 00000000000026e0
Mar 1 06:25:04 unity kernel: [669168.475748] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 1 06:25:04 unity kernel: [669168.475821] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 1 06:25:04 unity kernel: [669168.475895] Process apache2 (pid: 1832, threadinfo ffff88040c10e000, task ffff88040c1f2dc0)
Mar 1 06:25:04 unity kernel: [669168.475971] Stack:
Mar 1 06:25:04 unity kernel: [669168.476022] 00007f558925f8f0 ffff88040f155ec8 000000000000000f 0000000000000000
Mar 1 06:25:04 unity kernel: [669168.476225] ffff88040c10fe58 ffffffff8105f6dc ffff88040c10fe28 0000000700000286
Mar 1 06:25:04 unity kernel: [669168.476429] 0000000000000003 0000000181a4d7f0 ffff8804015d9850 0000000000000282
Mar 1 06:25:04 unity kernel: [669168.476634] Call Trace:
Mar 1 06:25:04 unity kernel: [669168.476689] [<ffffffff8105f
Mar 1 06:25:04 unity kernel: [669168.476747] [<ffffffff8105f
Mar 1 06:25:04 unity kernel: [669168.476806] [<ffffffff8126a
Mar 1 06:25:04 unity kernel: [669168.476863] [<ffffffff8126b
Mar 1 06:25:04 unity kernel: [669168.476921] [<ffffffff81164
Mar 1 06:25:04 unity kernel: [669168.476979] [<ffffffff8126b
Mar 1 06:25:04 unity kernel: [669168.477036] [<ffffffff8100c
Mar 1 06:25:04 unity kernel: [669168.477094] Code: 00 48 c7 c3 c0 3c 01 00 49 89 fc 49 89 f5 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89 de <8b> 40 18 4c 03 34 c5 00 4b ac 81 4c 89 f7 e8 03 36 58 00 49 8b
Mar 1 06:25:04 unity kernel: [669168.479300] RIP [<ffffffff81051
Mar 1 06:25:04 unity kernel: [669168.479391] RSP <ffff88040c10fdc8>
Mar 1 06:25:04 unity kernel: [669168.479444] CR2: 00000000801f0f1d
Mar 1 06:25:04 unity kernel: [669168.479497] ---[ end trace b2b87cfb63915f6c ]---
This happens QUITE OFTEN. The only solution is to sync the filesystem and power cycle (read: I can't reboot, I have to pull the plug; well, push the reset button or do the same via MagicKey).
Furthermore, Apache no longer answers in this case and cannot be stopped; it goes zombie.
The system is still accessible, except for Apache, and Apache can't be brought back to life.
I can't say whether this is a memory issue, but note that this is a server with ECC FB-DIMM memory. I will have to run a memory check at some point, but nothing in this regard has shown up in the logs of the daughter board.
Some System info:
Distributor ID: Ubuntu
Description: Ubuntu 10.04.4 LTS
Release: 10.04
Codename: lucid
*-memory
physical id: 16
slot: System board or motherboard
size: 16GiB
*-bank:0
slot: DIMM1A
size: 4GiB
width: 64 bits
clock: 800MHz (1.2ns)
*-bank:1
slot: DIMM1B
clock: 800MHz (1.2ns)
*-bank:2
slot: DIMM2A
size: 4GiB
width: 64 bits
clock: 800MHz (1.2ns)
*-bank:3
slot: DIMM2B
clock: 800MHz (1.2ns)
*-bank:4
slot: DIMM3A
size: 4GiB
width: 64 bits
clock: 800MHz (1.2ns)
*-bank:5
slot: DIMM3B
clock: 800MHz (1.2ns)
*-bank:6
slot: DIMM4A
size: 4GiB
width: 64 bits
clock: 800MHz (1.2ns)
*-bank:7
slot: DIMM4B
clock: 800MHz (1.2ns)
*-cpu:0
product: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
vendor: Intel Corp.
physical id: 4
bus info: cpu@0
version: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
slot: LGA771/CPU1
size: 3GHz
width: 64 bits
clock: 1600MHz
rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm tpr_shadow vnmi flexpriority
*-cache:0
slot: L1 Cache
size: 16KiB
*-cache:1
slot: L2 Cache
size: 12MiB
*-cpu:1
product: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
vendor: Intel Corp.
physical id: 5
bus info: cpu@1
version: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
slot: LGA771/CPU2
size: 3GHz
width: 64 bits
clock: 1600MHz
rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm tpr_shadow vnmi flexpriority
*-cache:0
slot: L1 Cache
size: 16KiB
*-cache:1
slot: L2 Cache
size: 12MiB
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Committed
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 943815
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.