NFSv4 client losing authentication despite kinit

Bug #687418 reported by Tim Watts
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
nfs4-acl-tools (Ubuntu)
New
Undecided
Unassigned

Bug Description

Hi,

I have an NFSv4 client set up on Ubuntu 10.04.1 LTS x86. The NFSv4 server is running Centos 5.5 and we use MIT kerberos and LDAP for users/groups.

All works fine, except after a while my client loses its authentication - symptom: home directory mount drops to "nobody".

This can happen anytime between 48 hours and 2 hours after a full client reboot.

When it happens, if I unmount my home dir, then remount the fault is cleared and I can work again.

What doesn't work is just doing a kinit -f

In fact, the fact that the problem can happen after just 2 hours suggests it is not a ticket timeout problem, though it looks like one.

I have run rpc.gssd in foreground debug mode and that doesn't say much during the problem times.

I have tried upping kernel debugging:

rpcdebug -m nfs -s vfs dircache lookupcache pagecache proc xdr file root callback client mount all

but I'm not sure what I'm looking for.

The symptoms feel like the kernel is losing the ticket or timing it out - is there any way to examing the state of the kernel ticket cache?

I am tempted to say this is a bug, but I would like to investigate further.

Any pointers much appreciated.

Cheers

Tim

PS

Just did it again!

OK - as me, in a terminal

(as root):
rpcdebug -m nfs -s vfs dircache lookupcache pagecache proc xdr file root callback client mount all
rpcdebug -m rpc xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache all

(as me in another terminal)
echo hello > wibble

dmesg lines that seem relevant:

[ 2957.276656] NFS: create(0:21/63365126), wibble
[ 2957.276658] RPC: looking up Generic cred
[ 2957.276665] RPC: new task initialized, procpid 2884
[ 2957.276668] RPC: allocated task ecfb7240
[ 2957.276670] RPC: 15071 __rpc_execute flags=0x81
[ 2957.276673] RPC: 15071 call_start nfs4 proc OPEN (async)
[ 2957.276676] RPC: 15071 call_reserve (status 0)
[ 2957.276679] RPC: 15071 reserved req eee2c660 xid 6c28409f
[ 2957.276681] RPC: 15071 call_reserveresult (status 0)
[ 2957.276683] RPC: 15071 call_allocate (status 0)
[ 2957.276686] RPC: 15071 allocated buffer of size 5140 at f6918000
[ 2957.276688] RPC: 15071 call_bind (status 0)
[ 2957.276690] RPC: 15071 call_connect xprt edc06800 is connected
[ 2957.276692] RPC: 15071 call_transmit (status 0)
[ 2957.276694] RPC: 15071 xprt_prepare_transmit
[ 2957.276697] RPC: 15071 rpc_xdr_encode (status 0)
[ 2957.276699] RPC: 15071 marshaling RPCSEC_GSS cred ed612000
[ 2957.276701] RPC: 15071 gss_marshal
[ 2957.276703] RPC: gss_krb5_seal
[ 2957.276717] RPC: krb5_encrypt returns 0
[ 2957.276720] RPC: krb5_encrypt returns 0
[ 2957.276723] RPC: 15071 using AUTH_GSS cred ed612000 to wrap rpc data
[ 2957.276725] RPC: 15071 gss_wrap_req
[ 2957.276727] encode_compound: tag=
[ 2957.276730] RPC: gss_krb5_seal
[ 2957.276735] RPC: krb5_encrypt returns 0
[ 2957.276737] RPC: krb5_encrypt returns 0
[ 2957.276739] RPC: 15071 gss_wrap_req returning 0
[ 2957.276741] RPC: 15071 xprt_transmit(328)
[ 2957.276772] RPC: xs_tcp_send_request(328) = 328
[ 2957.276774] RPC: 15071 xmit complete
[ 2957.276777] RPC: 15071 sleep_on(queue "xprt_pending" time 664319)
[ 2957.276779] RPC: 15071 added to queue edc069dc "xprt_pending"
[ 2957.276781] RPC: 15071 setting alarm for 60000 ms
[ 2957.276784] RPC: wake_up_next(edc06974 "xprt_resend")
[ 2957.276786] RPC: wake_up_next(edc0690c "xprt_sending")
[ 2957.277103] RPC: xs_tcp_data_ready...
[ 2957.277106] RPC: xs_tcp_data_recv started
[ 2957.277108] RPC: reading TCP record fragment of length 152
[ 2957.277110] RPC: reading XID (4 bytes)
[ 2957.277112] RPC: reading reply for XID 6c28409f
[ 2957.277114] RPC: reading CALL/REPLY flag (4 bytes)
[ 2957.277115] RPC: read reply XID 6c28409f
[ 2957.277118] RPC: XID 6c28409f read 144 bytes
[ 2957.277120] RPC: xprt = edc06800, tcp_copied = 152, tcp_offset = 152, tcp_reclen = 152
[ 2957.277122] RPC: 15071 xid 6c28409f complete (152 bytes received)
[ 2957.277125] RPC: 15071 __rpc_wake_up_task (now 664319)
[ 2957.277126] RPC: 15071 disabling timer
[ 2957.277129] RPC: 15071 removed from queue edc069dc "xprt_pending"
[ 2957.277133] RPC: __rpc_wake_up_task done
[ 2957.277134] RPC: xs_tcp_data_recv done
[ 2957.277141] RPC: 15071 __rpc_execute flags=0x81
[ 2957.277144] RPC: 15071 call_status (status 152)
[ 2957.277146] RPC: 15071 call_decode (status 152)
[ 2957.277148] RPC: 15071 validating RPCSEC_GSS cred ed612000
[ 2957.277150] RPC: 15071 gss_validate
[ 2957.277151] RPC: krb5_read_token
[ 2957.277157] RPC: krb5_encrypt returns 0
[ 2957.277158] RPC: krb5_get_seq_num:
[ 2957.277161] RPC: gss_k5decrypt returns 0
[ 2957.277163] RPC: 15071 gss_validate: gss_verify_mic succeeded.
[ 2957.277165] RPC: 15071 using AUTH_GSS cred ed612000 to unwrap rpc data
[ 2957.277167] RPC: krb5_read_token
[ 2957.277171] RPC: krb5_encrypt returns 0
[ 2957.277173] RPC: krb5_get_seq_num:
[ 2957.277175] RPC: gss_k5decrypt returns 0
[ 2957.277181] RPC: 15071 gss_unwrap_resp returning -13
[ 2957.277183] RPC: 15071 call_decode result -13
[ 2957.277186] RPC: 15071 return 0, status -13
[ 2957.277187] RPC: 15071 release task
[ 2957.277197] RPC: freeing buffer of size 5140 at f6918000
[ 2957.277200] RPC: 15071 release request eee2c660
[ 2957.277202] RPC: wake_up_next(edc06a44 "xprt_backlog")
[ 2957.277205] RPC: 15071 releasing RPCSEC_GSS cred ed612000
[ 2957.277207] RPC: rpc_release_client(f6dcef00)
[ 2957.277214] RPC: 15071 freeing task
[ 2957.277217] NFS: dentry_delete(tjw/wibble, 18)
[ 2957.277678] RPC: looking up Generic cred
[ 2957.277684] NFS: permission(0:21/63365126), mask=0x1, res=0
[ 2957.277688] NFS: nfs_lookup_revalidate(tjw/bin) is valid
[ 2957.277691] RPC: looking up Generic cred
[ 2957.277693] NFS: permission(0:21/63373722), mask=0x1, res=0
[ 2957.277697] NFS: nfs_lookup_revalidate(bin/printprompt) is valid
[ 2957.277702] RPC: looking up Generic cred
[ 2957.277704] NFS: permission(0:21/63373722), mask=0x1, res=0
[ 2957.277706] RPC: looking up Generic cred
[ 2957.277709] NFS: permission(0:21/63365126), mask=0x1, res=0
[ 2957.277712] NFS: nfs_lookup_revalidate(tjw/.coresettings) is valid
[ 2957.277714] RPC: looking up Generic cred
[ 2957.277716] NFS: permission(0:21/63414371), mask=0x1, res=0
[ 2957.277719] NFS: nfs_lookup_revalidate(.coresettings/bin) is valid
[ 2957.277721] RPC: looking up Generic cred
[ 2957.277724] NFS: permission(0:21/63414379), mask=0x1, res=0
[ 2957.277726] RPC: looking up Generic cred
[ 2957.277734] RPC: new task initialized, procpid 4423
[ 2957.277736] RPC: allocated task efad49c0
[ 2957.277739] RPC: 15072 __rpc_execute flags=0x81
[ 2957.277742] RPC: 15072 return 0, status 0
[ 2957.277743] RPC: 15072 release task
[ 2957.277746] RPC: 15072 releasing RPCSEC_GSS cred ed612000
[ 2957.277748] RPC: rpc_release_client(f6dcef00)
[ 2957.277757] RPC: 15072 freeing task

Revision history for this message
Tim Watts (tw-dionic) wrote :

I should add the mount options are:

-fstype=nfs4,proto=tcp,sec=krb5i,port=2049,hard,nosuid,nodev,intr,rsize=32768,wsize=32768

affects: ubuntu → nfs4-acl-tools (Ubuntu)
Revision history for this message
Tim Watts (tw-dionic) wrote :

With respect Fabio - are you sure this is an nfs4-acl-tools related issue?

We're not using ACLs on the filesystem in question. It seems to simply lose the mount authentication/authorisation as far as I can see - which suggests rpc.gssd or a kernel problem surely???

Cheers

Tim

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.