rpc.nfsd generates oops

Bug #48675 reported by Matt on 2006-06-06
Affects                         Importance   Assigned to
Ubuntu                          Medium       Unassigned
linux-source-2.6.15 (Ubuntu)    Medium       Unassigned
nfs-user-server (Ubuntu)        Medium       Unassigned

Bug Description

Binary package hint: nfs-kernel-server

I upgraded my NFS server at home to Dapper (from Hoary) and have started getting oopses from rpc.nfsd.

I've tried both the user-space and kernel versions of the NFS server, and it oopses in both cases, though with the user-space version the machine doesn't always need to be rebooted.

This only happens under heavy load, but it never happened before I upgraded to Dapper.

Jun 4 08:16:24 localhost kernel: [4427848.536000] Unable to handle kernel paging request at virtual address 000029d6
Jun 4 08:16:24 localhost kernel: [4427848.536000] printing eip:
Jun 4 08:16:24 localhost kernel: [4427848.536000] c0148e82
Jun 4 08:16:24 localhost kernel: [4427848.536000] *pde = 00000000
Jun 4 08:16:24 localhost kernel: [4427848.536000] Oops: 0002 [#1]
Jun 4 08:16:24 localhost kernel: [4427848.536000] PREEMPT
Jun 4 08:16:24 localhost kernel: [4427848.536000] Modules linked in: savage drm ipv6 video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery i2c_acpi_ec ac af_packet dm_mod md_mod sr_mod sbp2 scsi_mod lp snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq tsdev snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device i2c_prosavage i2c_algo_bit snd i2c_viapro i2c_core via_ircc irda shpchp pci_hotplug pcspkr soundcore serio_raw parport_pc parport 8139cp 8139too crc_ccitt mii psmouse floppy rtc via_agp agpgart evdev ext3 jbd ide_generic ohci1394 ieee1394 ehci_hcd uhci_hcd usbcore ide_cd cdrom ide_disk via82cxxx generic thermal processor fan capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
Jun 4 08:16:24 localhost kernel: [4427848.536000] CPU: 0
Jun 4 08:16:24 localhost kernel: [4427848.536000] EIP: 0060:[free_block+114/240] Not tainted VLI
Jun 4 08:16:24 localhost kernel: [4427848.536000] EFLAGS: 00010016 (2.6.15-23-386)
Jun 4 08:16:24 localhost kernel: [4427848.536000] EIP is at free_block+0x72/0xf0
Jun 4 08:16:24 localhost kernel: [4427848.536000] eax: 000029d6 ebx: d0715000 ecx: d07151a4 edx: e49cf000
Jun 4 08:16:24 localhost kernel: [4427848.536000] esi: dfffdbe0 edi: 00000008 ebp: dffffc00 esp: dfe57dd4
Jun 4 08:16:24 localhost kernel: [4427848.536000] ds: 007b es: 007b ss: 0068
Jun 4 08:16:24 localhost kernel: [4427848.536000] Process kswapd0 (pid: 115, threadinfo=dfe56000 task=dfe3e550)
Jun 4 08:16:24 localhost kernel: [4427848.536000] Stack: 0000003c 00000000 c3bdea2c dfff9610 c0148f45 dffffc00 dfff9610 0000003c
Jun 4 08:16:24 localhost kernel: [4427848.536000] 00000000 0000003c dfff9600 00000246 c3bdea2c dfe57f78 c014911b dffffc00
Jun 4 08:16:24 localhost kernel: [4427848.536000] dfff9600 dffffc00 c166dd60 00000001 c0164b6a dffffc00 c3bdea2c c3bdea2c
Jun 4 08:16:24 localhost kernel: [4427848.536000] Call Trace:
Jun 4 08:16:24 localhost kernel: [4427848.536000] [cache_flusharray+69/208] cache_flusharray+0x45/0xd0
Jun 4 08:16:24 localhost kernel: [4427848.536000] [kmem_cache_free+43/64] kmem_cache_free+0x2b/0x40
Jun 4 08:16:24 localhost kernel: [4427848.536000] [free_buffer_head+26/80] free_buffer_head+0x1a/0x50
Jun 4 08:16:24 localhost kernel: [4427848.536000] [try_to_free_buffers+89/160] try_to_free_buffers+0x59/0xa0
Jun 4 08:16:24 localhost kernel: [4427848.536000] [shrink_list+1015/1200] shrink_list+0x3f7/0x4b0
Jun 4 08:16:24 localhost kernel: [4427848.536000] [refill_inactive_zone+920/1152] refill_inactive_zone+0x398/0x480
Jun 4 08:16:24 localhost kernel: [4427848.536000] [shrink_cache+256/784] shrink_cache+0x100/0x310
Jun 4 08:16:24 localhost kernel: [4427848.536000] [shrink_zone+142/240] shrink_zone+0x8e/0xf0
Jun 4 08:16:24 localhost kernel: [4427848.536000] [balance_pgdat+652/1040] balance_pgdat+0x28c/0x410
Jun 4 08:16:24 localhost kernel: [4427848.536000] [kswapd+198/272] kswapd+0xc6/0x110
Jun 4 08:16:24 localhost kernel: [4427848.536000] [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Jun 4 08:16:24 localhost kernel: [4427848.536000] [kswapd+0/272] kswapd+0x0/0x110
Jun 4 08:16:24 localhost kernel: [4427848.536000] [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Jun 4 08:16:24 localhost kernel: [4427848.536000] Code: 24 18 8b 0c b8 8d 81 00 00 00 40 c1 e8 0c c1 e0 05 8b 15 30 ab 40 c0 8b 5c 10 1c 8b 44 24 20 8b 74 85 14 8b 13 8b 43 04 89 42 04 <89> 10 c7 03 00 01 10 00 c7 43 04 00 02 20 00 2b 4b 0c 89 c8 31
Jun 4 08:16:24 localhost kernel: [4427848.536000] <6>note: kswapd0[115] exited with preempt_count 1
Jun 4 08:16:24 localhost kernel: [4427848.551000] Unable to handle kernel paging request at virtual address 000029d6
Jun 4 08:16:24 localhost kernel: [4427848.551000] printing eip:
Jun 4 08:16:24 localhost kernel: [4427848.551000] c0148e82
Jun 4 08:16:24 localhost kernel: [4427848.551000] *pde = 00000000
Jun 4 08:16:24 localhost kernel: [4427848.551000] Oops: 0002 [#2]
Jun 4 08:16:24 localhost kernel: [4427848.551000] PREEMPT
Jun 4 08:16:24 localhost kernel: [4427848.551000] Modules linked in: savage drm ipv6 video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery i2c_acpi_ec ac af_packet dm_mod md_mod sr_mod sbp2 scsi_mod lp snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq tsdev snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device i2c_prosavage i2c_algo_bit snd i2c_viapro i2c_core via_ircc irda shpchp pci_hotplug pcspkr soundcore serio_raw parport_pc parport 8139cp 8139too crc_ccitt mii psmouse floppy rtc via_agp agpgart evdev ext3 jbd ide_generic ohci1394 ieee1394 ehci_hcd uhci_hcd usbcore ide_cd cdrom ide_disk via82cxxx generic thermal processor fan capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
Jun 4 08:16:24 localhost kernel: [4427848.551000] CPU: 0
Jun 4 08:16:24 localhost kernel: [4427848.551000] EIP: 0060:[free_block+114/240] Not tainted VLI
Jun 4 08:16:24 localhost kernel: [4427848.551000] EFLAGS: 00010016 (2.6.15-23-386)
Jun 4 08:16:24 localhost kernel: [4427848.551000] EIP is at free_block+0x72/0xf0
Jun 4 08:16:24 localhost kernel: [4427848.551000] eax: 000029d6 ebx: d0715000 ecx: d07151a4 edx: e49cf000
Jun 4 08:16:24 localhost kernel: [4427848.551000] esi: dfffdbe0 edi: 00000008 ebp: dffffc00 esp: f77d7b6c
Jun 4 08:16:24 localhost kernel: [4427848.551000] ds: 007b es: 007b ss: 0068
Jun 4 08:16:24 localhost kernel: [4427848.551000] Process rpc.nfsd (pid: 4250, threadinfo=f77d6000 task=c1adfa70)
Jun 4 08:16:24 localhost kernel: [4427848.551000] Stack: 0000003c 00000000 c3bde4e4 dfff9610 c0148f45 dffffc00 dfff9610 0000003c
Jun 4 08:16:24 localhost kernel: [4427848.551000] 00000000 0000003c dfff9600 00000246 c3bde4e4 f77d7d0c c014911b dffffc00
Jun 4 08:16:24 localhost kernel: [4427848.551000] dfff9600 dffffc00 c11873e0 00000001 c0164b6a dffffc00 c3bde4e4 c3bde4e4
Jun 4 08:16:24 localhost kernel: [4427848.551000] Call Trace:
Jun 4 08:16:24 localhost kernel: [4427848.551000] [cache_flusharray+69/208] cache_flusharray+0x45/0xd0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [kmem_cache_free+43/64] kmem_cache_free+0x2b/0x40
Jun 4 08:16:24 localhost kernel: [4427848.551000] [free_buffer_head+26/80] free_buffer_head+0x1a/0x50
Jun 4 08:16:24 localhost kernel: [4427848.551000] [try_to_free_buffers+89/160] try_to_free_buffers+0x59/0xa0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [shrink_list+1015/1200] shrink_list+0x3f7/0x4b0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [shrink_cache+673/784] shrink_cache+0x2a1/0x310
Jun 4 08:16:24 localhost kernel: [4427848.551000] [get_dirty_limits+23/304] get_dirty_limits+0x17/0x130
Jun 4 08:16:24 localhost kernel: [4427848.551000] [throttle_vm_writeout+53/112] throttle_vm_writeout+0x35/0x70
Jun 4 08:16:24 localhost kernel: [4427848.551000] [shrink_zone+142/240] shrink_zone+0x8e/0xf0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [shrink_caches+111/144] shrink_caches+0x6f/0x90
Jun 4 08:16:24 localhost kernel: [4427848.551000] [try_to_free_pages+175/480] try_to_free_pages+0xaf/0x1e0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [__alloc_pages+356/736] __alloc_pages+0x164/0x2e0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [generic_file_buffered_write+382/1712] generic_file_buffered_write+0x17e/0x6b0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [tcp_send_delayed_ack+143/320] tcp_send_delayed_ack+0x8f/0x140
Jun 4 08:16:24 localhost kernel: [4427848.551000] [tcp_rcv_established+1621/2080] tcp_rcv_established+0x655/0x820
Jun 4 08:16:24 localhost kernel: [4427848.551000] [current_fs_time+68/112] current_fs_time+0x44/0x70
Jun 4 08:16:24 localhost kernel: [4427848.551000] [__generic_file_aio_write_nolock+650/1216] __generic_file_aio_write_nolock+0x28a/0x4c0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [ip_local_deliver+277/624] ip_local_deliver+0x115/0x270
Jun 4 08:16:24 localhost kernel: [4427848.551000] [generic_file_aio_write+95/208] generic_file_aio_write+0x5f/0xd0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [pg0+945206410/1069368320] ext3_file_write+0x2a/0xb3 [ext3]
Jun 4 08:16:24 localhost kernel: [4427848.551000] [do_sync_write+186/288] do_sync_write+0xba/0x120
Jun 4 08:16:24 localhost kernel: [4427848.551000] [pg0+945680239/1069368320] rtl8139_poll+0x4f/0x100 [8139too]
Jun 4 08:16:24 localhost kernel: [4427848.551000] [__do_softirq+79/176] __do_softirq+0x4f/0xb0
Jun 4 08:16:24 localhost kernel: [4427848.551000] [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Jun 4 08:16:24 localhost kernel: [4427848.551000] [irq_exit+53/64] irq_exit+0x35/0x40
Jun 4 08:16:24 localhost kernel: [4427848.551000] [do_IRQ+31/48] do_IRQ+0x1f/0x30
Jun 4 08:16:24 localhost kernel: [4427848.551000] [common_interrupt+26/32] common_interrupt+0x1a/0x20
Jun 4 08:16:24 localhost kernel: [4427848.551000] [vfs_write+150/336] vfs_write+0x96/0x150
Jun 4 08:16:24 localhost kernel: [4427848.551000] [sys_write+56/128] sys_write+0x38/0x80
Jun 4 08:16:24 localhost kernel: [4427848.551000] [sysenter_past_esp+84/121] sysenter_past_esp+0x54/0x79
Jun 4 08:16:24 localhost kernel: [4427848.551000] Code: 24 18 8b 0c b8 8d 81 00 00 00 40 c1 e8 0c c1 e0 05 8b 15 30 ab 40 c0 8b 5c 10 1c 8b 44 24 20 8b 74 85 14 8b 13 8b 43 04 89 42 04 <89> 10 c7 03 00 01 10 00 c7 43 04 00 02 20 00 2b 4b 0c 89 c8 31
Jun 4 08:16:24 localhost kernel: [4427848.551000] <6>note: rpc.nfsd[4250] exited with preempt_count 1
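
For readers triaging the log above, here is a minimal sketch of how the two oops reports could be summarized. It assumes the number after "Oops:" is the 32-bit x86 page-fault error code (bit 0: protection fault vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The function names `decode_error_code` and `summarize` and the regular expressions are hypothetical helpers written for this illustration, not part of any kernel tool.

```python
import re

# A few lines copied verbatim from the log above (one set per oops).
SAMPLE = """\
Jun 4 08:16:24 localhost kernel: [4427848.536000] Unable to handle kernel paging request at virtual address 000029d6
Jun 4 08:16:24 localhost kernel: [4427848.536000] Oops: 0002 [#1]
Jun 4 08:16:24 localhost kernel: [4427848.536000] EIP is at free_block+0x72/0xf0
Jun 4 08:16:24 localhost kernel: [4427848.536000] Process kswapd0 (pid: 115, threadinfo=dfe56000 task=dfe3e550)
Jun 4 08:16:24 localhost kernel: [4427848.551000] Unable to handle kernel paging request at virtual address 000029d6
Jun 4 08:16:24 localhost kernel: [4427848.551000] Oops: 0002 [#2]
Jun 4 08:16:24 localhost kernel: [4427848.551000] EIP is at free_block+0x72/0xf0
Jun 4 08:16:24 localhost kernel: [4427848.551000] Process rpc.nfsd (pid: 4250, threadinfo=f77d6000 task=c1adfa70)
"""

# Hypothetical patterns matching the line shapes seen in this particular log.
ADDR_RE = re.compile(r"virtual address ([0-9a-f]{8})")
OOPS_RE = re.compile(r"Oops: ([0-9a-f]{4}) \[#(\d+)\]")
EIP_RE = re.compile(r"EIP is at (\S+)")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+),")


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def summarize(log):
    """Collect one dict per oops: address, error code, EIP symbol, process."""
    reports = []
    current = None
    for line in log.splitlines():
        m = ADDR_RE.search(line)
        if m:
            # The "Unable to handle" line opens a new report.
            current = {"address": m.group(1)}
            reports.append(current)
            continue
        if current is None:
            continue
        m = OOPS_RE.search(line)
        if m:
            current["error_code"] = int(m.group(1), 16)
            current["number"] = int(m.group(2))
            continue
        m = EIP_RE.search(line)
        if m:
            current["eip"] = m.group(1)
            continue
        m = PROC_RE.search(line)
        if m:
            current["process"] = m.group(1)
            current["pid"] = int(m.group(2))
    return reports


if __name__ == "__main__":
    for report in summarize(SAMPLE):
        print(report, decode_error_code(report["error_code"]))
```

Read this way, both reports describe a kernel-mode write to an unmapped address (000029d6) at the same instruction in free_block, first in kswapd0 and then in rpc.nfsd, which suggests rpc.nfsd may simply be the next process to reach the same bad slab state rather than the origin of it.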

Tollef Fog Heen (tfheen) wrote :

If it's an oops, it's probably a kernel problem. Reassigning.

Steffen Neumann (sneumann) wrote :

I can confirm this bug, and it is not necessarily related to high load.
We have a Dapper NFS server and 4 Dapper clients.

It seems to have been introduced between 2.6.15-26.47 and 2.6.15-28.51.
With the -28.51 kernel, we have frequent (every 15-30 mins) lockups
on the clients. The kern.log on the server is attached.

I'd recommend bumping the severity.

Yours,
Steffen

Steffen Neumann (sneumann) wrote :

I can confirm the problem on the -proposed 2.6.15-50 kernel.

Yours,
Steffen

Steffen Neumann (sneumann) wrote :

Hi [anybody listening here ?!]

Since we went back to 2.6.15-26.47 we have had enormous stability problems, and the NFS server machine locked up hard every hour or so under medium load. No traces or logfiles, sorry.

We went back to 2.6.15-26.46 and have had no problems since.

Yours,
Steffen

I reported this bug ages ago. I no longer have the hardware to help
out. Sorry.

--m

This bug has had no activity for a considerable period. This is a check to see if there is still interest in investigating this bug report.
I suspect this is now fixed.

Changed in linux-source-2.6.15:
status: New → Incomplete
Changed in nfs-user-server:
status: New → Incomplete

Steffen Neumann (sneumann) wrote :

On Fr, 2008-03-14 at 06:30 +0000, Gareth Fitzworthington wrote:
> This bug has had no activity for a considerable period. This is a check to see if there is still interest in investigating this bug report.
> I suspect this is now fixed.

It is not fixed; we "backported" a 2.6.22 server kernel
to avoid the problem.

Yours,
Steffen

--
IPB Halle AG Massenspektrometrie & Bioinformatik
Dr. Steffen Neumann http://www.IPB-Halle.DE
Weinberg 3 http://msbi.bic-gh.de
06120 Halle Tel. +49 (0) 345 5582 - 1470
                                  +49 (0) 345 5582 - 0
sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409

Can you examine Bug #58170? It seems likely to be the same bug.
There appears to be a kernel race condition that occurs with NFS mounts on kernels 2.6.15 and 2.6.16. It is apparently fixed by 2.6.18 (maybe earlier).
The commentary on that bug includes a patch that is reported to have solved the problem on these kernels.
Perhaps you can comment and test if you are able.

I'm not yet sure why so few people are affected by this race condition.

Connor Imes (ckimes) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity on it recently. We were wondering whether this is still an issue for you. Can you try with the latest Ubuntu release? Thanks in advance.

Steffen Neumann (sneumann) wrote :

We have moved to a different setup, so we can't reproduce the problem with the newer release.
Yours,
Steffen

Connor Imes (ckimes) wrote :

Matt and Steffen can no longer reproduce this bug because they don't have the hardware or have moved on to other setups. I am closing this bug report. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in linux-source-2.6.15:
status: Incomplete → Invalid
Changed in nfs-user-server:
status: Incomplete → Invalid