kernel soft lockup race condition on filesystem read operations in generic_file_splice_read function

Bug #790557 reported by Daniel
This bug affects 1 person
Affects          Status         Importance   Assigned to       Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Hardy            Fix Released   Medium       Leann Ogasawara

Bug Description

SRU Justification:

Impact: Without the fix, users can experience "sporadic kernel lockups on an Ubuntu Hardy LTS fileserver, which cause serious downtime."

Fix: upstream commit 8191ecd1d14c6914c660dfa007154860a7908857

Test case: Without a patched kernel, you'll see soft lockup error messages in your dmesg output and experience sporadic kernel lockups. With a patched kernel, you won't experience the lockups or see the error messages.
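To make the test case repeatable, the log can be scanned for the soft lockup message instead of being read by eye. The following is a minimal sketch, assuming the message format matches the kern.log excerpt in this report; the function name `find_soft_lockups` is made up for illustration and is not part of any Ubuntu tooling.

```python
import re

# Matches the soft lockup line format seen in this report's kern.log excerpt.
# Other kernel versions may word the message differently (assumption).
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup message found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            hits.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return hits
```

Feeding the saved output of dmesg or the contents of kern.log to `find_soft_lockups` should return an empty list on a patched kernel. An empty result only shows that no lockup was logged in the period covered, so the check is only meaningful after the server has run under its usual NFS load for a while.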

Hello,

we are experiencing sporadic kernel lockups on an Ubuntu Hardy LTS fileserver, which cause serious downtime. The following messages can be found in our kern.log and dmesg output:

May 30 13:55:20 sanhead01 kernel: [699831.819099] BUG: soft lockup - CPU#1 stuck for 11s! [nfsd:17397]
May 30 13:55:20 sanhead01 kernel: [699831.891913] CPU 1:
May 30 13:55:20 sanhead01 kernel: [699831.891914] Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bonding usbkbd qla2xxx raid1 raid10 raid456 async_xor async_memcpy async_tx xor raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod fbcon tileblit font bitblit softcursor fan thermal processor forcedeth tg3 ehci_hcd e1000 ohci_hcd scsi_transport_fc scsi_tgt pata_amd sata_nv pata_acpi ata_generic libata usbhid hid sd_mod sg scsi_mod ext3 jbd mbcache shpchp pci_hotplug evdev pcspkr serio_raw button psmouse i2c_nforce2 i2c_core joydev uhci_hcd usbcore ac video output sbs sbshc container battery dock bridge 8021q af_packet drbd cn
May 30 13:55:20 sanhead01 kernel: [699831.891952] Pid: 17397, comm: nfsd Not tainted 2.6.24-26-server #1
May 30 13:55:20 sanhead01 kernel: [699831.891954] RIP: 0010:[find_get_pages_contig+0x95/0xb0] [find_get_pages_contig+0x95/0xb0] find_get_pages_contig+0x95/0xb0
May 30 13:55:20 sanhead01 kernel: [699831.891959] RSP: 0018:ffff8100cba31a88 EFLAGS: 00000286
May 30 13:55:20 sanhead01 kernel: [699831.891961] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8100cba31c10
May 30 13:55:20 sanhead01 kernel: [699831.891963] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff81011da43200
May 30 13:55:20 sanhead01 kernel: [699831.891965] RBP: ffff81001c6914d8 R08: 0000000000000001 R09: 0000000000000000
May 30 13:55:20 sanhead01 kernel: [699831.891967] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000014
May 30 13:55:20 sanhead01 kernel: [699831.891970] R13: 0000000000000001 R14: 0000000000000000 R15: ffff81011da43200
May 30 13:55:20 sanhead01 kernel: [699831.891972] FS: 00007f0e957c66e0(0000) GS:ffff81011bc01800(0000) knlGS:0000000000000000
May 30 13:55:20 sanhead01 kernel: [699831.891974] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 30 13:55:20 sanhead01 kernel: [699831.891977] CR2: 00007fbdb04a2000 CR3: 0000000118845000 CR4: 00000000000006e0
May 30 13:55:20 sanhead01 kernel: [699831.891979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 30 13:55:20 sanhead01 kernel: [699831.891981] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 30 13:55:20 sanhead01 kernel: [699831.891983]
May 30 13:55:20 sanhead01 kernel: [699831.891983] Call Trace:
May 30 13:55:20 sanhead01 kernel: [699831.891989] [ext3:generic_file_splice_read+0x10b/0x1e10] generic_file_splice_read+0x10b/0x4c0
May 30 13:55:20 sanhead01 kernel: [699831.891998] [ifind_fast+0x45/0xa0] ifind_fast+0x45/0xa0
May 30 13:55:20 sanhead01 kernel: [699831.892002] [ext3:iget_locked+0x44/0x800] iget_locked+0x44/0x180
May 30 13:55:20 sanhead01 kernel: [699831.892007] [<ffffffff883c85aa>] :exportfs:find_acceptable_alias+0x1a/0xe0
May 30 13:55:20 sanhead01 kernel: [699831.892012] [<ffffffff883c8703>] :exportfs:exportfs_decode_fh+0x93/0x270
May 30 13:55:20 sanhead01 kernel: [699831.892020] [<ffffffff88432490>] :nfsd:nfsd_acceptable+0x0/0xf0
May 30 13:55:20 sanhead01 kernel: [699831.892032] [<ffffffff883dfa69>] :sunrpc:cache_check+0x49/0x490
May 30 13:55:20 sanhead01 kernel: [699831.892040] [set_current_groups+0x23b/0x240] set_current_groups+0x23b/0x240
May 30 13:55:20 sanhead01 kernel: [699831.892050] [splice_direct_to_actor+0xbc/0x190] splice_direct_to_actor+0xbc/0x190
May 30 13:55:20 sanhead01 kernel: [699831.892058] [<ffffffff88433e50>] :nfsd:nfsd_direct_splice_actor+0x0/0x20
May 30 13:55:20 sanhead01 kernel: [699831.892070] [<ffffffff88433e27>] :nfsd:nfsd_vfs_read+0x3c7/0x3f0
May 30 13:55:20 sanhead01 kernel: [699831.892083] [<ffffffff88434402>] :nfsd:nfsd_read+0xe2/0x100
May 30 13:55:20 sanhead01 kernel: [699831.892095] [<ffffffff883d8a90>] :sunrpc:svc_sock_enqueue+0x80/0x360
May 30 13:55:20 sanhead01 kernel: [699831.892106] [<ffffffff8843c6fd>] :nfsd:nfsd3_proc_read+0xfd/0x1a0
May 30 13:55:20 sanhead01 kernel: [699831.892116] [<ffffffff8842f271>] :nfsd:nfsd_dispatch+0xb1/0x240
May 30 13:55:20 sanhead01 kernel: [699831.892130] [<ffffffff883d7dad>] :sunrpc:svc_process+0x47d/0x7e0
May 30 13:55:20 sanhead01 kernel: [699831.892133] [<ffffffff80236540>] default_wake_function+0x0/0x10
May 30 13:55:20 sanhead01 kernel: [699831.892138] [__down_read+0x12/0xb1] __down_read+0x12/0xb1
May 30 13:55:20 sanhead01 kernel: [699831.892147] [<ffffffff8842f810>] :nfsd:nfsd+0x0/0x2e0
May 30 13:55:20 sanhead01 kernel: [699831.892154] [<ffffffff8842f99f>] :nfsd:nfsd+0x18f/0x2e0
May 30 13:55:20 sanhead01 kernel: [699831.892160] [child_rip+0xa/0x12] child_rip+0xa/0x12
May 30 13:55:20 sanhead01 kernel: [699831.892167] [<ffffffff8842f810>] :nfsd:nfsd+0x0/0x2e0
May 30 13:55:20 sanhead01 kernel: [699831.892179] [<ffffffff8842f810>] :nfsd:nfsd+0x0/0x2e0
May 30 13:55:20 sanhead01 kernel: [699831.892182] [child_rip+0x0/0x12] child_rip+0x0/0x12

It seems to me that the following patch is related to and most probably fixes the problem:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc8/2.6.25-rc8-mm2/broken-out/generic_file_splice_read-fix-lockups.patch

Can you please provide this patch as a security/stability update for the Ubuntu Hardy LTS kernel?

System Information:

sanhead01:~# lsb_release -rd
Description: Ubuntu 8.04.4 LTS
Release: 8.04

Linux sanhead01 2.6.24-26-server #1 SMP Tue Dec 1 18:26:43 UTC 2009 x86_64 GNU/Linux
Ubuntu 2.6.24-26.64-server

Daniel (pada) wrote :
Daniel (pada)
affects: linux-ubuntu-modules-2.6.24 (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 790557

and then change the status of the bug back to 'New'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Daniel (pada) wrote :

Hi Brad,

since this is a critical production system, we will not run a tool like apport-collect on these machines.

Additionally, we have patched the most up-to-date Ubuntu Hardy kernel source package ourselves, applying the above-mentioned patch from the vanilla upstream kernel to fix the problem and ensure the stability of our fileserver.

Regards,
Daniel

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Leann Ogasawara (leannogasawara) wrote :

~/linux$ git show 8191ecd1d14c6914c660dfa007154860a7908857
commit 8191ecd1d14c6914c660dfa007154860a7908857
Author: Jens Axboe <email address hidden>
Date: Thu Apr 10 08:24:25 2008 +0200

    splice: fix infinite loop in generic_file_splice_read()

linux$ git describe --contains 8191ecd1d14c6914c660dfa007154860a7908857
v2.6.25-rc9~29^2~1

It appears the patch in question was included upstream as of v2.6.25, so only the Hardy 8.04 release is affected. I'm therefore marking the actively developed linux task as Fix Released. I've opened a Hardy nomination for this patch to be considered for a Hardy SRU (stable release update), and I'll submit the patch to the mailing list shortly. Thanks.
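The reasoning from the `git describe --contains` output to "only Hardy is affected" can be written down as a small check. This is a rough sketch, not a tool used by the kernel team; the helper names are hypothetical, and it only compares upstream base releases, so it knows nothing about fixes that a distribution kernel has backported.

```python
import re


def base_tag(describe_output):
    """Strip the ~N / ^N ancestry suffix from `git describe --contains` output."""
    return re.split(r"[~^]", describe_output.strip(), maxsplit=1)[0]


def release_of(tag):
    """Map a tag such as 'v2.6.25-rc9' to the release tuple (2, 6, 25)."""
    match = re.match(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError("unrecognised tag: %r" % tag)
    return tuple(int(part) for part in match.groups())


def lacks_fix(kernel_release, describe_output):
    """True if kernel_release predates the release that first contained the commit."""
    return kernel_release < release_of(base_tag(describe_output))


# Hardy ships a 2.6.24-based kernel; the commit is described relative to v2.6.25-rc9.
print(lacks_fix((2, 6, 24), "v2.6.25-rc9~29^2~1"))
```

With the values from this report, the 2.6.24 base used by Hardy sorts before 2.6.25, which is why the fix has to be backported there, while any release based on 2.6.25 or later already carries it.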

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Hardy):
assignee: nobody → Leann Ogasawara (leannogasawara)
importance: Undecided → Medium
status: New → In Progress
description: updated
Leann Ogasawara (leannogasawara) wrote :
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Hardy):
status: In Progress → Fix Committed
Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel for Hardy in -proposed (2.6.24-29.94) solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hardy' to 'verification-done-hardy'.

If verification is not done within one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
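Before testing, it helps to confirm that the installed kernel package is at least the -proposed version named above. The sketch below is a simplified numeric comparison that happens to work for the version strings in this report; it is not the Debian version ordering algorithm, and `dpkg --compare-versions` remains the authoritative check. The function names are made up for illustration.

```python
import re


def version_key(version):
    """Split a version such as '2.6.24-29.94' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def has_fix(installed, fixed="2.6.24-29.94"):
    """True if the installed kernel version is at or above the fixed one."""
    return version_key(installed) >= version_key(fixed)


# The reporter's original kernel versus the kernel in -proposed.
print(has_fix("2.6.24-26.64"), has_fix("2.6.24-29.94"))
```

A kernel that passes this check still needs to run under real NFS load before the tag is changed to 'verification-done-hardy', since the lockups were sporadic.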

tags: added: verification-needed-hardy
Daniel (pada)
tags: added: verification-done-hardy
removed: verification-needed-hardy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-29.94

---------------
linux (2.6.24-29.94) hardy-proposed; urgency=low

  [Herton R. Krzesinski]

  * Release Tracking Bug
    - LP: #853945

  [Upstream Kernel Changes]

  * ipv6: make fragment identifications less predictable, CVE-2011-2699
    - LP: #827685
    - CVE-2011-2699
  * splice: fix infinite loop in generic_file_splice_read()
    - LP: #790557
  * cifs: fix possible memory corruption in CIFSFindNext, CVE-2011-3191
    - LP: #834135
    - CVE-2011-3191
  * befs: ensure fast symlinks are NUL-terminated, CVE-2011-2928
    - LP: #834124
    - CVE-2011-2928
  * befs: Validate length of long symbolic links, CVE-2011-2928
    - LP: #834124
    - CVE-2011-2928
  * Validate size of EFI GUID partition entries, CVE-2011-1776
    - LP: #844365
    - CVE-2011-1776
  * inet_diag: fix inet_diag_bc_audit(), CVE-2011-2213
    - LP: #838421
    - CVE-2011-2213
  * Bluetooth: Prevent buffer overflow in l2cap config request,
    CVE-2011-2497
    - LP: #838423
    - CVE-2011-2497
 -- Herton Ronaldo Krzesinski <email address hidden> Mon, 19 Sep 2011 12:24:41 -0300

Changed in linux (Ubuntu Hardy):
status: Fix Committed → Fix Released