rpc.gssd exits immediately

Bug #925364 reported by David Ambrose-Griffith
This bug affects 5 people
Affects: nfs-utils (Ubuntu)
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Release: Precise Pangolin 12.04 Alpha
Version: 1.2.5-3ubuntu1 (nfs-common)

Starting rpc.gssd initially appears to go ok, but the service immediately lands back in stop/waiting:

root@cisadl-tst:~# status gssd
gssd stop/waiting
root@cisadl-tst:~# start gssd
gssd start/pre-start, process 3624
root@cisadl-tst:~# status gssd
gssd stop/waiting
root@cisadl-tst:~#

Running rpc.gssd manually in the foreground also exits immediately, with exit status 1:

root@cisadl-tst:~# rpc.gssd -fvvv ; echo $?
beginning poll
1

Expected behaviour: rpc.gssd starts and allows Kerberized NFS to work
Observed behaviour: rpc.gssd exits immediately and Kerberized NFS fails.

I have attached the output of strace -f against rpc.gssd, in case it is of any help.

Best regards,

David Ambrose-Griffith
Technical Specialist (Unix/Linux/Storage)
Computing and Information Services
Durham University

Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu):
importance: Undecided → High
Revision history for this message
Steve Langasek (vorlon) wrote :

Hi David,

That's a pretty serious problem, but I can't reproduce it here. I have gssd running normally on precise. Is there any possibility that your system libraries, or rpc.gssd itself, are corrupted? (Running 'debsums -s' should check this.) I don't see any code paths at all in gssd that would result in an exit without printing a further error message about what's going on.

Changed in nfs-utils (Ubuntu):
status: New → Incomplete
Revision history for this message
David Ambrose-Griffith (d-e-ambrose-griffith) wrote :

I've run debsums, and nothing comes to light there.

I've also downloaded the bzr branch and compiled a new copy, which exhibits the same symptoms.

Adding in some debugging lines, there seems to be something odd going on around line 216 of gssd_main_loop.c, where the main loop tests the return value of update_client_list() and exits with status code 1 if it is non-zero.

Changing that to...

        printerr(1, "beginning poll\n");
        while (1) {
                while (dir_changed) {
                        dir_changed = 0;
                        if (update_client_list()) {
                                /* Error msg is already printed */
                                printerr(1, "DAG Test exit\n");
                                exit(1);
                        }

causes DAG Test exit to be printed.

Looking back at the update_client_list() routine around line 588 in gssd_proc.c and adding in a debug line like so

/* Used to read (and re-read) list of clients, set up poll array. */
int
update_client_list(void)
{
        int retval = -1;
        struct topdirs_info *tdi;

        TAILQ_FOREACH(tdi, &topdirs_list, list) {
                retval = process_pipedir(tdi->dirname);
                if (retval)
                        printerr(1, "WARNING: error processing %s\n",
                                 tdi->dirname);

        }
        printerr(1, "DAG WARNING: retval=%d\n", retval);
        return retval;
}

causes the following when run:

root@cisadl-tst:~/nfs-utils/utils/gssd# ./gssd -fvvv
beginning poll
DAG WARNING: retval=-1
DAG Test exit

Any further ideas?

Revision history for this message
David Ambrose-Griffith (d-e-ambrose-griffith) wrote :

Further digging, and it appears rpc_pipefs wasn't mounted (oops), so topdirs_list was empty, the TAILQ_FOREACH iterated zero times, and retval kept its initial value of -1, which triggered the exit(1).

Perhaps we need an explicit error message in here, rather than just falling through to the exit(1).
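
Something along these lines might do it (just a sketch against the code quoted above, not a tested patch): initialise retval to 0 and catch the empty-list case up front, so an unmounted rpc_pipefs produces a diagnostic instead of a silent exit.

/* Sketch only: report an empty topdirs_list explicitly instead of
 * silently returning the initial -1 when rpc_pipefs isn't mounted. */
int
update_client_list(void)
{
        int retval = 0;
        struct topdirs_info *tdi;

        if (TAILQ_EMPTY(&topdirs_list)) {
                printerr(0, "ERROR: topdirs_list is empty; "
                            "is rpc_pipefs mounted?\n");
                return -1;
        }
        TAILQ_FOREACH(tdi, &topdirs_list, list) {
                retval = process_pipedir(tdi->dirname);
                if (retval)
                        printerr(1, "WARNING: error processing %s\n",
                                 tdi->dirname);
        }
        return retval;
}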

Best regards,

David

Revision history for this message
Steve Langasek (vorlon) wrote :

Ah, ok; this makes the lack of a sensible error message a lower-priority issue, then.

This doesn't explain why the upstart job fails, however, since the job's pre-start script explicitly takes care of mounting rpc_pipefs. Once rpc_pipefs is mounted, does rpc.gssd -fvvv succeed? Does the upstart job still fail?
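
For illustration only (this is not the exact job shipped in the package), a pre-start stanza of roughly this shape is what would be expected to handle that mounting:

pre-start script
    # illustrative sketch: mount rpc_pipefs before rpc.gssd starts,
    # if it is not already mounted
    mountpoint -q /run/rpc_pipefs || mount -t rpc_pipefs rpc_pipefs /run/rpc_pipefs
end script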

Changed in nfs-utils (Ubuntu):
status: Incomplete → Triaged
importance: High → Low
Revision history for this message
David Ambrose-Griffith (d-e-ambrose-griffith) wrote :

Once rpc_pipefs is manually mounted, then yes, rpc.gssd -fvvv works as expected, and I've managed to get an NFSv4 mount with Kerberos against our NetApp filer.
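
(For reference, the manual mount is something along the lines of:

root@cisadl-tst:~# mount -t rpc_pipefs rpc_pipefs /run/rpc_pipefs

using the /run/rpc_pipefs path this release expects.)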

On boot however I'm getting...

Starting NFSv4 id <-> name mapper [ OK ]

repeatedly, with other things, such as the automounter, starting in and around it.

Finally...
Starting NFSv4 id <-> name mapper [ Fail ]

*Then* it appears to start the network, and waits an additional 60 seconds for it to come up (the machine in question is just connected on eth0 and gets its address from an Infoblox DHCP appliance).

Once booted, /run/rpc_pipefs isn't mounted (though it can be mounted manually), and neither idmapd nor gssd is running.

Networking is up once booted; I don't know exactly what order things are coming up in, however.

rpc.idmapd -fvvv also requires the "-p /run/rpc_pipefs" argument, which I can't spot in the /etc/init/idmapd.conf file, but perhaps that isn't relevant.
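
That is, to point it at the right place it has to be invoked along the lines of:

root@cisadl-tst:~# rpc.idmapd -fvvv -p /run/rpc_pipefs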

I'll try and reproduce these issues on a clean re-install of this machine next.

Best regards,

David

Revision history for this message
Ken Pratt (kenpratt) wrote :

I don't use Kerberos. However, if I don't want a long wait when mounting NFSv4 shares, I must enable this on the client. It results in:

Nov 2 13:31:50 poogoo rpc.gssd[2342]: ERROR: gssd_refresh_krb5_machine_credential: no usable keytab entry found in keytab /etc/krb5.keytab for connection with host fit.thepratts.info
Nov 2 13:31:50 poogoo rpc.gssd[2342]: ERROR: No credentials found for connection to server fit.thepratts.info

But the mount occurs very quickly and seems to be ok.

There should be a way on the client side to state that Kerberos is not being used.

Revision history for this message
Ken Pratt (kenpratt) wrote :

Oops - the previous comment is incorrect. The top level directories are mounted but contain no files or subdirectories.

Revision history for this message
Steve Langasek (vorlon) wrote :

Ken, you don't say what it is you are enabling on the client. But I don't think the problem you're experiencing is related to this bug. Please file a separate bug report for your issue.
