NFS load locks processes and mounts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
I have multiple Onstor NFS NASs with multiple Ubuntu clients. The clients are all LTS releases (Dapper, Hardy and now Lucid). The clients are running autofs to mount the exports from the NAS. On my new Lucid machine, high NFS traffic generated by the client makes the mounts lock up.
- Processes doing NFS file operations lock up (state D, uninterruptible sleep). SIGTERM, SIGHUP and SIGKILL have no effect on the affected processes.
- The affected mount can't be unmounted unless I use umount -fl /mountpoint
- Once the mount is killed I can SIGKILL the processes.
- The process performing IO on the affected mount never dies or exits to shell. (zombies)
- Other processes performing IO on other mounts exit normally once the affected mount is unmounted
- Other clients are unaffected by this condition
- Once a mount is affected by this condition it is no longer mountable by either autofs or manual mount
- Other mounts on that same NAS are no longer mountable either.
- I can still ping the affected NAS and run showmount -e against it
- Have to echo b > /proc/sysrq-trigger in order to reboot.
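To capture which processes are stuck when this happens, something like the following could be run on the affected client. It is only a minimal sketch, not part of the original report: it assumes a Linux /proc layout, and the function names parse_stat and find_uninterruptible are made up for illustration. It reads the state field from /proc/<pid>/stat and keeps the entries in state D.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "mounts"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the stat line was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling find_uninterruptible() with no arguments scans the live /proc. Reading these stat files does not touch the NFS mount, so the scan itself should not hang on the affected share.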
This happens on every Lucid server kernel I've tried (even the current mainline kernel).
To reproduce:
1. Set up the automounter on 4 different shares on 4 different servers.
2. cd into those 4 shares and run iozone -a
3. Wait for the processes to stop.
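The steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used for this report: start_load and wait_for_all are hypothetical helper names, the list of mount points has to be supplied by the caller, and iozone must be installed and on PATH. Each child is started with its working directory set to one share, which also triggers the automount.

```python
import subprocess


def start_load(mount_dirs, command=("iozone", "-a")):
    """Start one copy of `command` in each directory, all in parallel."""
    # cwd=... makes the child chdir into the share before exec, which is
    # what "cd into the share and run iozone -a" does by hand.
    return [subprocess.Popen(list(command), cwd=d) for d in mount_dirs]


def wait_for_all(procs, timeout):
    """Wait up to `timeout` seconds per process.

    Returns a list of exit codes in the same order as `procs`; an entry is
    None if that process had not exited in time (for example, a hung one).
    """
    results = []
    for proc in procs:
        try:
            results.append(proc.wait(timeout=timeout))
        except subprocess.TimeoutExpired:
            results.append(None)
    return results
```

For example, procs = start_load(dirs) followed by wait_for_all(procs, timeout=3600), where dirs holds the four automounted paths. A None in the result marks a run that never finished, which is the condition this bug describes.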
/proc/mounts:
vsvr-4.nfs:/slow08 /mnt/slow/vol08 nfs rw,nosuid,
/etc/auto.slow:
vol08 -rw,nosuid,
/etc/auto.master:
/mnt/slow /etc/auto.slow -nosuid
showmount -e vsvr-5.nfs
Export list for vsvr-5.nfs:
/slow11 *
/slow15 *
rpcinfo -p vsvr-5.nfs
program vers proto port
100000 2 udp 111 portmapper
100000 2 tcp 111 portmapper
100003 2 udp 2049 nfs
100003 2 tcp 2049 nfs
100003 3 udp 2049 nfs
100003 3 tcp 2049 nfs
100005 1 udp 2087 mountd
100005 1 tcp 2087 mountd
100005 2 udp 2087 mountd
100005 2 tcp 2087 mountd
100005 3 udp 2087 mountd
100005 3 tcp 2087 mountd
100021 1 udp 2090 nlockmgr
100021 1 tcp 2090 nlockmgr
100021 3 udp 2090 nlockmgr
100021 3 tcp 2090 nlockmgr
100021 4 udp 2090 nlockmgr
100021 4 tcp 2090 nlockmgr
100024 1 udp 2092 status
100024 1 tcp 2092 status
100333 1 udp 2049
100333 1 tcp 2049
100333 2 udp 2049
100333 2 tcp 2049
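For comparing what each NAS registers with its portmapper, the rpcinfo -p output can be turned into rows and queried. The sketch below is illustrative only; parse_rpcinfo and offers are hypothetical names, and the parser assumes the whitespace-separated column layout shown above, with the service name optional (as in the 100333 rows).

```python
def parse_rpcinfo(text):
    """Parse `rpcinfo -p` output into (program, vers, proto, port, service)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header line, blank line, or anything unexpected
        service = fields[4] if len(fields) > 4 else ""
        rows.append((int(fields[0]), int(fields[1]), fields[2],
                     int(fields[3]), service))
    return rows


def offers(rows, service, vers, proto):
    """True if the server registers `service` at this version and protocol."""
    return any(r[4] == service and r[1] == vers and r[2] == proto
               for r in rows)


# A few rows copied from the output above.
sample = """program vers proto port
100003 2 udp 2049 nfs
100003 3 tcp 2049 nfs
100005 3 tcp 2087 mountd
100333 1 udp 2049
"""
rows = parse_rpcinfo(sample)
print(offers(rows, "nfs", 3, "tcp"))
```

Running the same check against each of the servers makes it easy to spot a NAS that registers a different set of versions or transports than the others.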
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-
Regression: No
Reproducible: Yes
ProcVersionSign
Uname: Linux 2.6.32-26-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Fri Dec 10 00:14:16 2010
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:
ProcCmdLine: BOOT_IMAGE=
ProcEnviron:
LANGUAGE=en
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 03/19/2009
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.
dmi.modalias: dmi:bvnPhoenixT
dmi.product.name: VMware Virtual Platform
dmi.product.
dmi.sys.vendor: VMware, Inc.
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: lucid
Anyone? This bug was submitted a month ago.