[SandyBridge] kernel BUG at /build/buildd/linux-2.6.38/fs/nfsd/nfs4state.c:3132!

Bug #716811 reported by Bryce Harrington
This bug affects 3 people
Affects              Status         Importance   Assigned to      Milestone
linux (Ubuntu)       Fix Released   Undecided    Andy Whitcroft
  Declined for Natty by Herton R. Krzesinski
nfs-utils (Ubuntu)   Invalid        Undecided    Unassigned
  Declined for Natty by Herton R. Krzesinski

Bug Description

Updated to 2.6.38-2-generic-pae from .38-1 this morning on this SandyBridge system with an ATI graphics card running the opensource -ati driver. I've seen this BUG twice now. Not sure what triggers it.

At the time this happened, I was typing up a reply to a bug report in firefox, when suddenly the display stuttered. I had just finished typing "Xorg.0.log from" except what showed in the text area was "Xorg.0.loggggggggggggggggggggg"; then suddenly it switched to a console where the kernel BUG listed below was displayed. The music player also stopped (was playing an internet radio station.) I tapped a few keys and tried vt switching but no go, there seemed to be no response. But then a few moments later I tried again and could switch back to vt7 and file this report. I clicked play on the music player and it went back to playing music.

This box was freshly installed a couple of days ago. I installed and set up an NFS server on it last night, and it seemed to work ok. This morning I tried to wake the monitors from power saving, but they never came back. I ssh'd in but didn't see anything unusual at the time other than being on .38-1; however, reviewing /var/log/kern.log now, I see it had failed with this same BUG at 8am, so I'm guessing it was the same problem. Since I was on .38-1 I decided to upgrade to .38-2 before reporting it. Not long after rebooting onto that kernel I saw this BUG again, exactly as I described above, except that the music player did not stop (I was playing music files from an nfs share). I rebooted again, and now after about 6 hrs of uptime have experienced it again.

I notice it is mentioning nfs4 in the error, however afaik I'm just using nfs3 (I don't need any of the nfs4 features).

One thing that's odd, since returning to vt7 I can't find the console where the BUG error had been printed out. I took a photo but it's basically just this:

[22144.624356] ------------[ cut here ]------------
[22144.624884] kernel BUG at /build/buildd/linux-2.6.38/fs/nfsd/nfs4state.c:3132!
[22144.625703] invalid opcode: 0000 [#1] SMP
[22144.626185] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[22144.627073] Modules linked in: binfmt_misc parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer radeon ttm snd_seq_device drm_kms_helper lp drm snd soundcore joydev serio_raw xhci_hcd snd_page_alloc nfsd exportfs nfs lockd fscache nfs_acl i2c_algo_bit auth_rpcgss parport sunrpc usbhid r8169 hid ahci libahci floppy
[22144.631845]
[22144.632008] Pid: 1113, comm: nfsd Not tainted 2.6.38-2-generic-pae #29-Ubuntu P67A-UD4/P67A-UD4
[22144.632995] EIP: 0060:[<f8e1bc37>] EFLAGS: 00010246 CPU: 0
[22144.633625] EIP is at nfs4_preprocess_stateid_op+0x327/0x3a0 [nfsd]
[22144.634333] EAX: 00000000 EBX: ec32d398 ECX: eed4295c EDX: f754ad80
[22144.635041] ESI: 00000000 EDI: f8e31cc1 EBP: ed991eb0 ESP: ed991e88
[22144.635750] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[22144.636360] Process nfsd (pid: 1113, ti=ed990000 task=f6cd9940 task.ti=ed990000)
[22144.637196] Stack:
[22144.637418] f8dffe36 ed991ecc f8e16441 ffffffff c5b7275c eed4293c 00000010 eed4293c
[22144.638382] f25ef820 f7410000 ed991ecc f8e0c675 eed4295c eed42800 f25ef800 eed42800
[22144.639347] eed42934 ed991f0c f8e0cdc9 f8e11520 ed991ee8 c107f1c5 f7410000 00000002
[22144.640311] Call Trace:
[22144.640589] [<f8dffe36>] ? fh_verify+0x136/0x260 [nfsd]
[22144.641196] [<f8e16441>] ? nfsd4_encode_operation+0x61/0x180 [nfsd]
[22144.641922] [<f8e0c675>] nfsd4_read+0x45/0xc0 [nfsd]
[22144.642497] [<f8e0cdc9>] nfsd4_proc_compound+0x369/0x420 [nfsd]
[22144.643180] [<f8e11520>] ? nfsd4_decode_compound+0x240/0x330 [nfsd]
[22144.643902] [<c107f1c5>] ? groups_free+0x45/0x50
[22144.644437] [<f8e0c630>] ? nfsd4_read+0x0/0xc0 [nfsd]
[22144.645021] [<f8dfc913>] nfsd_dispatch+0xd3/0x210 [nfsd]
[22144.645643] [<f8d2b559>] svc_process_common+0x2c9/0x5c0 [sunrpc]
[22144.646342] [<f8d3758d>] ? svc_xprt_received+0x2d/0x40 [sunrpc]
[22144.647030] [<f8d38479>] ? svc_recv+0x479/0x730 [sunrpc]
[22144.647648] [<f8d2b92c>] svc_process+0xdc/0x140 [sunrpc]
[22144.648260] [<c1525dc0>] ? down_read+0x10/0x20
[22144.648774] [<f8dfc0f0>] nfsd+0xb0/0x140 [nfsd]
[22144.649296] [<c1041e6e>] ? complete+0x4e/0x60
[22144.649800] [<f8dfc040>] ? nfsd+0x0/0x140 [nfsd]
[22144.650332] [<c1076484>] kthread+0x74/0x80
[22144.704641] [<c1076410>] ? kthread+0x0/0x80
[22144.759083] [<c100b13e>] kernel_thread_helper+0x6/0x10
[22144.811794] Code: fd ff ff 0f 83 2f ff ff ff eb 8d 31 f6 8d b4 26 00 00 00 00 e9 bc fd ff ff 8b 42 24 31 f6 8b 4d 08 85 c0 89 01 0f 85 aa fd ff ff <0f> 0b 8b 57 24 8b 42 20 85 c0 74 37 8b 55 08 89 02 e9 94 fd ff
[22144.867810] EIP: [<f8e1bc37>] nfs4_preprocess_stateid_op+0x327/0x3a0 [nfsd] SS:ESP 0068:ed991e88
[22145.219739] ---[ end trace 42f5b713d5ecc48b ]---
[22231.170388] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
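
For anyone triaging similar reports, the two most useful lines in the trace above are the "kernel BUG at" line and the "EIP is at" line. The following is a minimal sketch, assuming a 32-bit x86 oops in the same layout as this one; the function name `summarize_oops` and the regular expressions are illustrative and are not part of any kernel tooling.

```python
import re

# Matches e.g. "kernel BUG at /path/to/file.c:3132!"
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")
# Matches e.g. "EIP is at some_function+0x327/0x3a0 [module]"
EIP_RE = re.compile(
    r"EIP is at (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)

def summarize_oops(text):
    """Pull the BUG source location and faulting function out of an oops."""
    bug = BUG_RE.search(text)
    eip = EIP_RE.search(text)
    return {
        "path": bug.group("path") if bug else None,
        "line": int(bug.group("line")) if bug else None,
        "function": eip.group("func") if eip else None,
        "module": eip.group("module") if eip else None,
    }

# Two lines copied from the trace in this report.
sample = (
    "[22144.624884] kernel BUG at /build/buildd/linux-2.6.38/fs/nfsd/nfs4state.c:3132!\n"
    "[22144.633625] EIP is at nfs4_preprocess_stateid_op+0x327/0x3a0 [nfsd]\n"
)

print(summarize_oops(sample))
```

Here the faulting function sits in the nfsd module's NFSv4 state code, reached via nfsd4_read in the call trace, which suggests some client was speaking NFSv4 to this server.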

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-2-generic-pae 2.6.38-2.29
Regression: Yes
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.38-2.29-generic-pae 2.6.38-rc3
Uname: Linux 2.6.38-2-generic-pae i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: PCH [HDA Intel PCH], device 0: ALC892 Analog [ALC892 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: bryce 2141 F.... pulseaudio
 /dev/snd/pcmC0D0p: bryce 2141 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xfbff8000 irq 60'
   Mixer name : 'Realtek ALC892'
   Components : 'HDA:10ec0892,1458a022,00100302'
   Controls : 32
   Simple ctrls : 18
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfbbfc000 irq 61'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100100'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Thu Feb 10 16:48:41 2011
Frequency: Once a day.
HibernationDevice: RESUME=UUID=7c6eccc7-e8d3-4a5b-9ede-1888deff2e36
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha i386 (20110202)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. P67A-UD4
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=C
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic-pae root=UUID=2b238928-f3f1-4074-b551-2cceff54c6a4 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-2-generic-pae N/A
 linux-backports-modules-2.6.38-2-generic-pae N/A
 linux-firmware 1.46
RfKill:

SourcePackage: linux
dmi.bios.date: 11/25/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F4
dmi.board.name: P67A-UD4
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF4:bd11/25/2010:svnGigabyteTechnologyCo.,Ltd.:pnP67A-UD4:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnP67A-UD4:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: P67A-UD4
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

[Workaround]
Specify in /etc/default/nfs-kernel-server:

# Number of servers to start up
RPCNFSDCOUNT='8 --no-nfs-version 4'

(From http://andy.delcambre.com/2007/06/25/disabling-nfsv4-on-ubuntu.html)
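
To check that the workaround took effect after restarting the NFS server, the kernel lists the protocol versions nfsd is serving in /proc/fs/nfsd/versions, with a leading "+" for enabled and "-" for disabled. The sketch below parses that format; the helper names are illustrative, and the sample string is an assumed example of what the file might contain with NFSv4 turned off, not output captured from this machine.

```python
def enabled_nfs_versions(versions_line):
    """Parse a line such as '+2 +3 -4' into the set of enabled versions."""
    return {tok[1:] for tok in versions_line.split() if tok.startswith("+")}

def read_server_versions(path="/proc/fs/nfsd/versions"):
    """Read the live setting; needs nfsd loaded and permission to read the file."""
    with open(path) as f:
        return enabled_nfs_versions(f.read())

# Assumed example contents with the workaround applied.
sample = "+2 +3 -4"
versions = enabled_nfs_versions(sample)
print(sorted(versions))
print("NFSv4 disabled" if "4" not in versions else "NFSv4 still enabled")
```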

Revision history for this message
Bryce Harrington (bryce) wrote :
description: updated
Revision history for this message
Bryce Harrington (bryce) wrote : Re: [Bug 716811] Re: [SandyBridge] kernel BUG at /build/buildd/linux-2.6.38/fs/nfsd/nfs4state.c:3132!

Btw, since the BUG, I notice my cpu has a rather high load, with a bunch
of defunct nfsd processes:

top - 18:17:27 up 7:42, 31 users, load average: 8.30, 8.13, 8.07
Tasks: 258 total, 1 running, 249 sleeping, 0 stopped, 8 zombie
Cpu(s): 13.4%us, 0.7%sy, 0.0%ni, 85.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4107372k total, 3845740k used, 261632k free, 179540k buffers
Swap: 2980860k total, 180k used, 2980680k free, 483024k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  234 root 20 0 0 0 0 D 0 0.0 0:00.00 kworker/u:7
 1108 root 20 0 0 0 0 D 0 0.0 0:00.04 nfsd
 1109 root 20 0 0 0 0 D 0 0.0 0:00.05 nfsd
 1110 root 20 0 0 0 0 D 0 0.0 0:00.04 nfsd
 1111 root 20 0 0 0 0 D 0 0.0 0:00.06 nfsd
 1112 root 20 0 0 0 0 D 0 0.0 0:00.06 nfsd
 1114 root 20 0 0 0 0 D 0 0.0 0:00.08 nfsd
 1115 root 20 0 0 0 0 D 0 0.0 0:00.06 nfsd
12092 bryce 20 0 2632 1224 852 R 0 0.0 0:15.34 top
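
The listing shows the nfsd threads in state D (uninterruptible sleep). Linux counts tasks in that state toward the load average, which would be consistent with a load of about 8 while the CPU is mostly idle. A minimal sketch for spotting such tasks from /proc/<pid>/stat lines follows; the function names are illustrative and the sample lines are abbreviated, made-up stat entries modelled on the PIDs above.

```python
def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state

def uninterruptible(stat_lines):
    """Return (pid, comm) for every task whose state is D."""
    parsed = (parse_stat_line(line) for line in stat_lines)
    return [(pid, comm) for pid, comm, state in parsed if state == "D"]

# Abbreviated, illustrative stat lines (real ones have many more fields).
samples = [
    "1108 (nfsd) D 2 0 0",
    "234 (kworker/u:7) D 2 0 0",
    "12092 (top) R 12000 12092 12000",
]
print(uninterruptible(samples))
```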

Revision history for this message
Bryce Harrington (bryce) wrote :

Here is the remainder of the log from that BUG until system restarted.

Feb 10 16:46:28 humber kernel: [22231.170388] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
Feb 10 16:50:07 humber kernel: [22449.421156] ACPI Warning: Incorrect checksum in table [TAMG] - 0xD6, should be 0xD5 (20110112/tbutils-314)
Feb 10 16:50:07 humber kernel: [22449.424733] ACPI Warning: Incorrect checksum in table [TAMG] - 0xD6, should be 0xD5 (20110112/tbutils-314)
Feb 10 17:09:39 humber kernel: [23618.546442] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
Feb 10 17:09:46 humber kernel: [23625.741585] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
Feb 10 18:33:34 humber kernel: Kernel logging (proc) stopped.
Feb 10 18:36:44 humber kernel: imklog 4.6.4, log source = /proc/kmsg started.
Feb 10 18:36:44 humber kernel: [ 0.000000] Initializing cgroup subsys cpuset
Feb 10 18:36:44 humber kernel: [ 0.000000] Initializing cgroup subsys cpu
Feb 10 18:36:44 humber kernel: [ 0.000000] Linux version 2.6.38-2-generic-pae (buildd@rothera) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-2ubuntu1) ) #29-Ubuntu SMP Fri Feb 4 14:43:27 UTC 2011 (Ubuntu 2.6.38-2.29-generic-pae 2.6.38-rc3)

Humorously, I don't know whether the system crashed and shut itself down, or if it was my wily 1 1/2 yr old son, who happened to be visiting my office at the time and has a penchant for pushing power buttons. I was watching him pretty closely, but he's quick and sneaky.

Revision history for this message
Bryce Harrington (bryce) wrote :
tags: added: kernel-key
Revision history for this message
Herton R. Krzesinski (herton) wrote :

This was reported recently also at https://lkml.org/lkml/2011/2/8/370 (ongoing discussion)

Changed in linux (Ubuntu):
status: New → Triaged
Revision history for this message
Bryce Harrington (bryce) wrote :

On Fri, Feb 11, 2011 at 06:04:46PM -0000, Herton R. Krzesinski wrote:
> This was reported recently also at https://lkml.org/lkml/2011/2/8/370
> (ongoing discussion)

Ah interesting.

"""
This looks like it could be a known delegation bug.... Would it be
possible for you to check whether this is reproduceable on
         git://linux-nfs.org/~bfields/topics.git for-2.6.38-incoming
"""

Sounds like a fix is forthcoming. I didn't think I had nfsv4 on but
guess I'll need to doublecheck.

Bryce Harrington (bryce)
description: updated
Revision history for this message
Bryce Harrington (bryce) wrote :

From upstream discussion via http://www.gossamer-threads.com/lists/linux/kernel/1337297

"""
> > > I think I see the problem.... Could you try fetching that branch again?
> > >
> > > git://linux-nfs.org/~bfields/linux-topics.git for-2.6.38-incoming
> > >
> > > (But I've only tested that it compiles so far.)
> >
> > Sorry, I already left my office and have no access to the machine.
> > I'll test it on Monday if Tino can't test it beforehand.
>
> I've been testing it for hours, and so far this looks good.
> FireFox starts and doesn't trigger BUG().

Good, thanks for the testing.
"""

Revision history for this message
Bryce Harrington (bryce) wrote :

On investigating this a bit further, it appears the server will respond with NFSv4 by default even if the client has not specifically requested vers=4. I guess that causes problems when you've set the server up to serve NFSv3. Anyway, I've modified the nfs-utils defaults file to include a comment on how to disable NFSv4 in case you only want to serve version 3. So far, the system seems stable after a couple of reboots and a few minutes' NFS usage (although the crash was intermittent, roughly once a day, so we'll have to see).
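
One way to confirm from the client side which version a mount actually ended up with is to look at its entry in /proc/mounts. The following is a rough sketch that assumes the version appears either as a vers= or nfsvers= mount option or is implied by an nfs4 filesystem type; the export paths and mount points in the sample are hypothetical.

```python
def nfs_mount_versions(mounts_text):
    """Map each NFS mount point to the protocol version it appears to use."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # not an NFS mount
        mountpoint, fstype = fields[1], fields[2]
        version = None
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                version = value
        if version is None and fstype == "nfs4":
            version = "4"  # fall back to the filesystem type
        result[mountpoint] = version
    return result

# Hypothetical /proc/mounts contents on a client of this server.
sample = (
    "humber:/srv/music /mnt/music nfs4 rw,relatime,vers=4,proto=tcp 0 0\n"
    "humber:/srv/docs /mnt/docs nfs rw,relatime,vers=3,proto=tcp 0 0\n"
    "/dev/sda1 / ext4 rw,errors=remount-ro 0 0\n"
)
print(nfs_mount_versions(sample))
```

On a real client, pass in the contents of /proc/mounts instead of the sample string.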

Revision history for this message
Andy Whitcroft (apw) wrote :

@bryce -- I cannot work out from the upstream thread what the patch was: the conversation all occurs on the 11th of Feb, but the tip of the referenced git branch doesn't have anything after the 6th of Feb. Anyone got a pointer to the upstream fix?

Revision history for this message
Herton R. Krzesinski (herton) wrote :

@andy -- I think I saw the fix on that branch today (http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=shortlog;h=refs/heads/for-2.6.38-incoming), but now I can't find it either... maybe it was pushed to another tree

Revision history for this message
Herton R. Krzesinski (herton) wrote :

The commit titled "nfsd4: acquire only one lease per file" should be the fix, in this pull request: https://lkml.org/lkml/2011/2/15/265

Revision history for this message
Herton R. Krzesinski (herton) wrote :

The mentioned fix is now in 2.6.38-rc5, commit acfdf5c

Revision history for this message
Andy Whitcroft (apw) wrote :

@Herton -- thanks for the research on this.

I have just pushed a rebase to v2.6.38-rc5 to our git repository, so I am moving this to Fix Committed. Could those of you who can reproduce this please test the kernels below (different bug number but they represent the tip of the v2.6.38-rc5 rebase which includes this fix):

    http://people.canonical.com/~apw/lp713769-natty/

Please test and report back here. Thanks.

Changed in linux (Ubuntu):
status: Triaged → Fix Committed
assignee: nobody → Andy Whitcroft (apw)
Revision history for this message
Bryce Harrington (bryce) wrote :

On Wed, Feb 16, 2011 at 04:10:40PM -0000, Andy Whitcroft wrote:
> @Herton -- thanks for the research on this.
>
> I have just pushed a rebase to v2.6.38-rc5 to our git repository so
> moving this Fix Committed. Could those of you who can reproduce this
> please test the kernels below (different bug number but they represent
> the tip of the v2.6.38-rc5 rebase which includes this fix):
>
> http://people.canonical.com/~apw/lp713769-natty/
>
> Please test and report back here. Thanks.

Thanks for getting a kernel built with the fix so quickly. :-)

I've removed my workaround and installed/booted this kernel.

The issue typically only surfaced after using the system for a while
(maybe triggered by client nfs activity?) somewhere between a few hours
and a day. So I'll run with this for a day or so and see if it comes
back.

Bryce

Revision history for this message
Bryce Harrington (bryce) wrote :

On Wed, Feb 16, 2011 at 04:10:40PM -0000, Andy Whitcroft wrote:
> @Herton -- thanks for the research on this.
>
> I have just pushed a rebase to v2.6.38-rc5 to our git repository so
> moving this Fix Committed. Could those of you who can reproduce this
> please test the kernels below (different bug number but they represent
> the tip of the v2.6.38-rc5 rebase which includes this fix):
>
> http://people.canonical.com/~apw/lp713769-natty/
>
> Please test and report back here. Thanks.

So far, so good.

Idle side note, this also restored functioning plymouth for me (although
ironically it displays the logo for only a fraction of a second).

Bryce

Revision history for this message
Andy Whitcroft (apw) wrote :

As this rebase is now out in the archive, and based on the testing results here, I am going to close this as Fix Released.

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton)
Changed in nfs-utils (Ubuntu):
status: New → Invalid