svc_getreqset doesn't work on 64bit hosts

Bug #89801 reported by Matt T. Proud on 2007-03-05
Affects                Status        Importance  Assigned to   Milestone
glibc (Debian)         Fix Released  Unknown
glibc (Ubuntu)         Fix Released  Undecided   Unassigned
glibc (Ubuntu Dapper)  Fix Released  Medium      Martin Pitt

Bug Description

rpc.mountd will consume 100 percent CPU because of this bug on x86_64 systems.
Could a fix please be backported to Dapper Drake's 2.3.6 glibc?

Here's Debian's bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=413450

I'd reference the discussion that I had with Neil Brown on the NFS mailing lists if I could, but the server where the lists are archived appears to be unavailable.

Transcript of Conversation:
Hello,

I am curious why rpc.mountd would be consuming 100 percent CPU on a
mail server. I have noticed this periodically, and it does not appear
to matter how overloaded the server is with respect to mail clients.

I have started rpc.mountd with debugging enabled, and so far nothing
interesting appears in the logs. NFS functions perfectly on the
client-side.

Here's a brief description of the setup:

Mail resides on the mail server locally, and it is exported to
clients. The chief way for clients to access the mail is via IMAP, but
we do export the mail for the convenience of our power-users who care
to manually grep through mail data. Mail is stored in the maildir
format. Given this context, the amount of manual access that occurs
over NFS for this mail is minuscule, so I am surprised to see this
happening.

The mail server is running Ubuntu 6.06 with kernel 2.6.15-28-amd64-k8.
The nfs-utils package is version 1.0.7-3ubuntu2. The mail data is stored on a RAID1
array on a tuned ext3 filesystem. We are considering switching to
RAID10 in the near future, even though we have not experienced many
I/O bottlenecks.

When rpc.mountd acts funny, an attached strace reveals a select
call occurring very rapidly:

select(1024, [3 4 5 6 7 32 33 34 35 36 37 38 39 40], NULL, NULL, NULL) = 9 (in [32 33 34 35 36 37 38 39 40])

When rpc.mountd behaves normally, select is called only periodically,
and with fewer file descriptors.

We currently have the following options set:

options lockd nlm_udpport=32768 nlm_tcpport=32768

/var/maildir localhost(rw,sync,no_root_squash) \
                       @mail_clients(rw,sync) @nfs_server(rw,sync) \
                       xxxx(rw,sync) xxxxxxxx(rw,sync) \
                       xxxxxx(rw,sync,no_root_squash) xxxx.xxx.xxx.xxx(ro,sync)

Our clients have the following set:

maildir -fstype=nfs,_netdev,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,intr,nosuid,nodev,noacl,rw
   mail:/var/maildir

If the kernel server is restarted, rpc.mountd behaves normally for a time.

The server is multi-core, so the consumption is not a major problem,
but the fact that it is occurring is troubling.

Best,

Matt

On Sunday March 4, <email address hidden> wrote:
>
> When rpc.mountd acts funny, an attached strace reveals a select
> call occurring very rapidly:
>
> select(1024, [3 4 5 6 7 32 33 34 35 36 37 38 39 40], NULL, NULL, NULL) = 9 (in [32 33 34 35 36 37 38 39 40])

It would help if we could see matching "lsof -p xxx" output to see
exactly what those 9 fds are, but they are presumably tcp connections.

I cannot imagine why mountd would spin on something like that - if
select says that it can read from those fds, it should try to read...

It might be worth trying a newer nfs-utils. Just grab 1.0.12 from
  http://www.kernel.org/pub/linux/utils/nfs/

./configure --without-nfsv4 --without-gss
make

then kill mountd and run ./utils/mountd/mountd

See if that makes a difference.

NeilBrown

I'll give those new nfs-utils a try tomorrow.

Here's what lsof says:

root@mail:~# lsof -p 21766
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
rpc.mount 21766 root cwd DIR 253,4 91 25168840 /var/lib/nfs
rpc.mount 21766 root rtd DIR 253,0 4096 2 /
rpc.mount 21766 root txt REG 253,2 75272 16777397 /usr/sbin/rpc.mountd
rpc.mount 21766 root mem REG 0,0 0 [heap] (stat: No such file or directory)
rpc.mount 21766 root mem REG 253,0 96096 262157 /lib/ld-2.3.6.so
rpc.mount 21766 root mem REG 253,0 1272336 262202 /lib/libc-2.3.6.so
rpc.mount 21766 root mem REG 253,0 85984 262281 /lib/libnsl-2.3.6.so
rpc.mount 21766 root mem REG 253,0 35968 262613 /lib/libwrap.so.0.7.6
rpc.mount 21766 root mem REG 253,0 44056 262301 /lib/libnss_files-2.3.6.so
rpc.mount 21766 root mem REG 253,0 39928 262357 /lib/libnss_nis-2.3.6.so
rpc.mount 21766 root mem REG 253,4 217016 25169923 /var/db/nscd/hosts
rpc.mount 21766 root 0u CHR 1,3 6187 /dev/null
rpc.mount 21766 root 1u CHR 1,3 6187 /dev/null
rpc.mount 21766 root 2u CHR 1,3 6187 /dev/null
rpc.mount 21766 root 3u REG 0,3 0 4026532520 /proc/net/rpc/auth.unix.ip/channel
rpc.mount 21766 root 4u REG 0,3 0 4026532570 /proc/net/rpc/nfsd.export/channel
rpc.mount 21766 root 5u REG 0,3 0 4026532574 /proc/net/rpc/nfsd.fh/channel
rpc.mount 21766 root 6u IPv4 12238621 UDP *:passwd_server
rpc.mount 21766 root 7u IPv4 12238636 TCP *:764 (LISTEN)
rpc.mount 21766 root 9u unix 0xffff81005777ae40 12242159 socket
rpc.mount 21766 root 32u IPv4 12956690 TCP mail.xxx.xxx.xxx:764->calvin.xxx.xxx.xxx:684 (CLOSE_WAIT)
rpc.mount 21766 root 33u IPv4 12956694 TCP mail.xxx.xxx.xxx:764->cyan.xxx.xxx.xxx:709 (CLOSE_WAIT)
rpc.mount 21766 root 34u IPv4 12956695 TCP mail.xxx.xxx.xxx:764->denim.xxx.xxx.xxx:763 (CLOSE_WAIT)
rpc.mount 21766 root 35u sock 0,5 12956696 can't identify protocol
rpc.mount 21766 root 36u IPv4 12956697 TCP mail.xxx.xxx.xxx:764->peach.xxx.xxx.xxx:930 (CLOSE_WAIT)
rpc.mount 21766 root 37u IPv4 12956698 TCP mail.xxx.xxx.xxx:764->orange.xxx.xxx.xxx:693 (CLOSE_WAIT)
rpc.mount 21766 root 38u IPv4 12956699 TCP mail.xxx.xxx.xxx:764->app64.xxx.xxx.xxx:1007 (CLOSE_WAIT)
rpc.mount 21766 root 39u IPv4 12956701 TCP mail.xxx.xxx.xxx:764->black.xxx.xxx.xxx:767 (CLOSE_WAIT)
rpc.mount 21766 root 40u IPv4 12956710 TCP mail.xxx.xxx.xxx:764->green.xxx.xxx.xxx:887 (CLOSE_WAIT)

On Sunday March 4, <email address hidden> wrote:
> I'll give those new nfs-utils a try tomorrow.

No need, thanks; I can duplicate the problem with current nfs-utils.

> rpc.mount 21766 root 32u IPv4 12956690 TCP mail.xxx.xxx.xxx:764->calvin.xxx.xxx.xxx:684 (CLOSE_WAIT)
> rpc.mount 21766 root 33u IPv4 12956694 TCP mail.xxx.xxx.xxx:764->cyan.xxx.xxx.xxx:709 (CLOSE_WAIT)
> rpc.mount 21766 root 34u IPv4 12956695 TCP mail.xxx.xxx.xxx:764->denim.xxx.xxx.xxx:763 (CLOSE_WAIT)
> rpc.mount 21766 root 35u sock 0,5 12956696 can't identify protocol
> rpc.mount 21766 root 36u IPv4 12956697 TCP mail.xxx.xxx.xxx:764->peach.xxx.xxx.xxx:930 (CLOSE_WAIT)
> rpc.mount 21766 root 37u IPv4 12956698 TCP mail.xxx.xxx.xxx:764->orange.xxx.xxx.xxx:693 (CLOSE_WAIT)
> rpc.mount 21766 root 38u IPv4 12956699 TCP mail.xxx.xxx.xxx:764->app64.xxx.xxx.xxx:1007 (CLOSE_WAIT)
> rpc.mount 21766 root 39u IPv4 12956701 TCP mail.xxx.xxx.xxx:764->black.xxx.xxx.xxx:767 (CLOSE_WAIT)
> rpc.mount 21766 root 40u IPv4 12956710 TCP mail.xxx.xxx.xxx:764->green.xxx.xxx.xxx:887 (CLOSE_WAIT)
>

They are all in 'CLOSE_WAIT', meaning that the client has closed its
end of the connection and mountd should close its end.

The fact that they are all > 32 is suspicious: fds below that are
getting closed; these aren't.

In fact, if I try to make lots of connections to mountd, it goes into
a spin as soon as anything happens on an fd >= 32, whether a read or a
close.

It looks like a glibc bug.
Looking at glibc-2.3.6 (which I happen to have lying around), in
sunrpc/svc.c we have

void
svc_getreqset (fd_set *readfds)
{
 register fd_mask mask;
 register fd_mask *maskp;
 register int setsize;
 register int sock;
 register int bit;

 setsize = _rpc_dtablesize ();
 if (setsize > FD_SETSIZE)
   setsize = FD_SETSIZE;
 maskp = readfds->fds_bits;
 for (sock = 0; sock < setsize; sock += NFDBITS)
   for (mask = *maskp++; (bit = ffs (mask)); mask ^= (1 << (bit - 1)))
     INTUSE(svc_getreq_common) (sock + bit - 1);
}

'fd_mask' is defined in sys/select.h to be 'long int', so we shouldn't
be calling 'ffs'; we should be calling 'ffsl'.
Also, the
         mask ^= (1 << (bit - 1))
should be
         mask ^= (1L << (bit - 1))
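
To see why this bites on a 64-bit host, here is a minimal standalone
check (not from the original mail; it just demonstrates the truncation):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>   /* ffsl() (GNU extension) */
#include <strings.h>  /* ffs() */

int
main (void)
{
  /* An fd_mask word with only the bit for fd 32 set. */
  long mask = 1L << 32;

  /* ffs() takes an int, so the high 32 bits are silently dropped:
     it returns 0, svc_getreqset never services the ready fd, and
     select() keeps reporting it, hence the spin. */
  printf ("ffs:  %d\n", ffs (mask));   /* prints 0  */
  printf ("ffsl: %d\n", ffsl (mask));  /* prints 33 */
  return 0;
}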

If I make those two changes, stick that code in svc_run.c, rename it
to my_svc_getreqset and call that function instead of svc_getreqset,
it works fine.
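
In full, the corrected function looks like this (a sketch of the
my_svc_getreqset workaround, assuming glibc's sunrpc headers; outside
glibc the INTUSE wrapper is unavailable, so this calls the public
svc_getreq_common directly and declares _rpc_dtablesize by hand):

#define _GNU_SOURCE
#include <string.h>      /* ffsl() */
#include <sys/select.h>  /* fd_set, fd_mask, NFDBITS, FD_SETSIZE */
#include <rpc/rpc.h>     /* svc_getreq_common() */

extern int _rpc_dtablesize (void);  /* glibc-internal, but exported */

/* svc_getreqset with the two 64-bit fixes applied: ffsl() scans the
   whole long, and 1L keeps the clearing XOR from truncating to int. */
void
my_svc_getreqset (fd_set *readfds)
{
  fd_mask mask, *maskp;
  int setsize, sock, bit;

  setsize = _rpc_dtablesize ();
  if (setsize > FD_SETSIZE)
    setsize = FD_SETSIZE;
  maskp = readfds->fds_bits;
  for (sock = 0; sock < setsize; sock += NFDBITS)
    for (mask = *maskp++; (bit = ffsl (mask)); mask ^= (1L << (bit - 1)))
      svc_getreq_common (sock + bit - 1);
}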

Ho hum. Time to set a bug report off to glibc...

For now: well, you could try finding a 32-bit mountd and running
that, or make the same change I did and compile mountd yourself.

NeilBrown

On Monday March 5, <email address hidden> wrote:
>
> Ho hum. Time to set a bug report off to glibc...

Looks like this has already been fixed.

2005-11-03 Roland McGrath <email address hidden>

       [BZ #1548]
       * sunrpc/svc.c (svc_getreqset): Use ffsl instead of ffs on fd_mask,
       make sure constant is long.
       From Jay Lan <email address hidden>.

So it is fixed in glibc 2.4 and later which SuSE and RH ship, but not
in 2.3.6 which Debian and Ubuntu ship.

I wonder how hard it would be to autodetect the libc version and
include a fix in nfs-utils if an old glibc were in use.....
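
One conceivable shape for that autodetection (purely a sketch; nfs-utils
never adopted it, and gnu_get_libc_version() reports only the base
version, so a distro that backported the fix into 2.3.6 would still
look "old"):

#include <gnu/libc-version.h>  /* gnu_get_libc_version() */
#include <stdio.h>

/* Nonzero if the running glibc predates the 2.4 fix for BZ #1548. */
static int
needs_getreqset_workaround (void)
{
  int major = 0, minor = 0;
  sscanf (gnu_get_libc_version (), "%d.%d", &major, &minor);
  return major < 2 || (major == 2 && minor < 4);
}

int
main (void)
{
  printf ("glibc %s: getreqset workaround %s\n",
          gnu_get_libc_version (),
          needs_getreqset_workaround () ? "needed" : "not needed");
  return 0;
}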

NeilBrown

On Monday March 5, <email address hidden> wrote:
> On Monday March 5, <email address hidden> wrote:
> >
> > Ho hum. Time to set a bug report off to glibc...
>
> Looks like this has already been fixed.
>
> 2005-11-03 Roland McGrath <email address hidden>
>
> [BZ #1548]
> * sunrpc/svc.c (svc_getreqset): Use ffsl instead of ffs on fd_mask,
> make sure constant is long.
> From Jay Lan <email address hidden>.
>
> So it is fixed in glibc 2.4 and later which SuSE and RH ship, but not
> in 2.3.6 which Debian and Ubuntu ship.

Filed as bug 413450 with Debian.

>
> I wonder how hard it would be to autodetect the libc version and
> include a fix in nfs-utils if an old glibc were in use.....

I don't think I'll bother. As the bug doesn't stop mountd from
working, but just causes it to soak up CPU time, I'm going to decide
it isn't critical enough to warrant a gross hack, so my "official"
position is that glibc needs to be fixed.

Thanks for the bug report.

NeilBrown

Matthias Klose (doko) wrote :

opened dapper task, fixed in edgy and feisty.

Changed in glibc:
importance: Undecided → Medium
status: Unconfirmed → Confirmed
status: Unconfirmed → Fix Released
Matthias Klose (doko) wrote :

test packages are available at http://people.ubuntu.com/~doko/tmp/

Changed in glibc:
status: Unknown → Fix Released
Matthias Klose (doko) wrote :

Proposing an SRU to fix #89801. Although the system continues to work, performance is degraded: the bug ties up an entire CPU. The bug is fixed in upstream glibc 2.4 (and in edgy and feisty).

Martin Pitt (pitti) wrote :

This patch is small, obvious just by looking at it, just as I like it. :-P

Looks fine, please go ahead and upload.

Changed in glibc:
status: Confirmed → In Progress
Martin Pitt (pitti) wrote :

Accepted into dapper-proposed, please go ahead with QA testing.

Changed in glibc:
status: In Progress → Fix Committed
Michael Vogt (mvo) wrote :

Thanks for your update.

Please include instructions on how to reproduce the bug. The policy says:
"Detailed instructions on how to reproduce the bug. These should allow someone who is not familiar with the affected package to reproduce the bug and verify that the updated package fixes the problem."

I will be happy to do the verification once I have these instructions.

Thanks,
 Michael

Matt T. Proud (matttproud) wrote :

Not another canned bureaucratic response. This is a case for pragmatism.

Reproduction would be somewhat difficult, as it involves a complete NFS client and server set-up. The problem is also somewhat intermittent: sometimes it manifests itself, other times it does not.

I left the employer where this was a constant issue a couple of months ago for a job with another one. The new employer (I legally can't say whom) has QA standards about as rigorous as Debian's, and we have backported the fix there. Please take that as advice.

Thanks,

Matt

Martin Pitt (pitti) wrote :

Fix has been tested long enough in Feisty now; considering it good enough for Dapper.

Changed in glibc:
assignee: nobody → pitti
Martin Pitt (pitti) wrote :

Copied to dapper-updates.

Changed in glibc:
status: Fix Committed → Fix Released