Ubuntu

Kerberos + LDAP + NFSv4 - Unable to recover unattended client

Reported by Brian the Lion on 2011-06-07
This bug affects 17 people
Affects              Status   Importance   Assigned to   Milestone
Kerberos             New      Undecided    Unassigned
NFS-Utils            New      Undecided    Unassigned
nfs-utils (Debian)   New      Unknown
nfs-utils (Ubuntu)            High         Adam Stokes
Precise                       High         Unassigned

Bug Description

[Impact]
Affects those who rely heavily on Kerberized, NFS-mounted home directories.

[Test Case]
Hi there!

I've configured a Natty client/server pair to authenticate over Kerberos and LDAP and to mount user home directories via NFSv4 with sec=krb5. I am using a slight variation on the configuration described here: http://www.danbishop.org/2011/05/01/ubuntu-11-04-sbs-small-business-server-setup-part-3-openldap/

Under this setup, user sessions that are left unattended for a long period of time -- e.g., when someone goes home for the night but stays logged in -- always result in a wedged machine. What do I mean by "wedged"? When the user returns to their session (the next morning), the screen is sorta grayed out. Keystrokes and mouse movement fail to elicit a reaction from the OS. I can switch to a virtual console (Ctrl+Alt+F1), but cannot log in as the offending user there; the prompt will accept a username and password but never return. I CAN log in using my localadmin, presumably because it uses UNIX authentication rather than LDAP/Kerberos. I have heretofore been unable to recover the machine as the localadmin, though. If localadmin attempts to sudo reboot the machine, the reboot process starts but never finishes.

[Regression Potential]
Seems minimal as we are adding an additional condition check for expired tickets.

[More info]

Some odd things in the server syslog:

Jun 6 07:40:15 server krb5kdc[822]: AS_REQ (7 etypes {18 17 16 23 1 3 2}) 192.168.0.59: NEEDED_PREAUTH: <email address hidden> for <email address hidden>, Additional pre-authentication required
Jun 6 07:40:15 server krb5kdc[822]: AS_REQ (7 etypes {18 17 16 23 1 3 2}) 192.168.0.59: ISSUE: authtime 1307360415, etypes {rep=18 tkt=18 ses=18}, <email address hidden> for <email address hidden>
Jun 6 07:40:15 server krb5kdc[822]: TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 192.168.0.59: ISSUE: authtime 1307360415, etypes {rep=18 tkt=18 ses=18}, <email address hidden> for <email address hidden>
Jun 6 07:40:15 server krb5kdc[822]: TGS_REQ (3 etypes {1 3 2}) 192.168.0.59: ISSUE: authtime 1307360415, etypes {rep=18 tkt=18 ses=1}, <email address hidden> for <email address hidden>
Jun 6 07:40:15 server nslcd[950]: [92ef4c] nslcd_passwd_byname(nfs/carina.co57.lan): invalid user name
Jun 6 07:46:49 server slapd[836]: <= bdb_equality_candidates: (uid) not indexed
Jun 6 07:46:49 server slapd[836]: <= bdb_equality_candidates: (cn) not indexed
Jun 6 07:48:51 server slapd[836]: <= bdb_equality_candidates: (uidNumber) not indexed
Jun 6 07:49:20 server slapd[836]: <= bdb_equality_candidates: (uid) not indexed
Jun 6 07:57:07 server slapd[836]: <= bdb_equality_candidates: (uid) not indexed
Jun 6 07:57:07 server slapd[836]: <= bdb_equality_candidates: (cn) not indexed
Jun 6 07:59:35 server slapd[836]: <= bdb_equality_candidates: (uid) not indexed
Jun 6 08:00:00 server slapd[836]: <= bdb_equality_candidates: (cn) not indexed
Jun 6 08:00:01 server slapd[836]: last message repeated 3 times

And from all over the client syslog:

Jun 6 10:53:28 carina kernel: [47636.670075] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:53:33 carina kernel: [47641.666533] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:53:38 carina kernel: [47646.662437] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:53:43 carina kernel: [47651.658844] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:53:48 carina kernel: [47656.655152] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:53:53 carina kernel: [47661.651498] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:53:58 carina kernel: [47666.647829] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:03 carina kernel: [47671.644084] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:08 carina kernel: [47676.640219] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:13 carina kernel: [47681.636699] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:18 carina kernel: [47686.632981] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:23 carina kernel: [47691.629134] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:28 carina kernel: [47696.625429] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:33 carina kernel: [47701.621717] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:38 carina kernel: [47706.617861] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:43 carina kernel: [47711.614235] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:48 carina kernel: [47716.610530] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 6 10:54:53 carina kernel: [47721.606813] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.

My intuition is the following: The user's client-side Kerberos ticket is expiring (RPCSEC_GSS errors) and the sec=krb5 on NFS is sitting in a poll loop, waiting for a new one. This is somehow causing the rest of the system to grind to a halt, whether through resource usage or blocking in the kernel. I will continue to investigate and post evidence as I come by it. In the meantime, does anybody have any ideas?
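
One way to check the "poll loop" intuition is to measure how regularly the client logs the expiry error. The following is a minimal sketch, assuming the syslog format shown above; the function name `expiry_retry_gaps` and the sample list are illustrative only, not part of any existing tool.

```python
import re

# Matches the kernel uptime stamp, e.g. "[47636.670075]", on lines that
# report an expired RPCSEC_GSS session (format taken from the syslog above).
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Error: state manager encountered "
    r"RPCSEC_GSS session expired"
)

def expiry_retry_gaps(lines):
    """Return the gaps (in seconds) between consecutive expiry errors."""
    stamps = [float(m.group(1)) for m in map(PATTERN.search, lines) if m]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]

# Three lines copied from the client syslog in this report.
sample = [
    "Jun 6 10:53:28 carina kernel: [47636.670075] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.",
    "Jun 6 10:53:33 carina kernel: [47641.666533] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.",
    "Jun 6 10:53:38 carina kernel: [47646.662437] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.",
]

print(expiry_retry_gaps(sample))
```

On the lines above the gaps come out at roughly five seconds each, which is consistent with a fixed retry timer rather than random failures.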

Cheers!
~Brian

tags: added: kerberos krb5 ldap nfs
Shimi Chen (shimi-chen) on 2011-06-08
affects: ubuntu → libauthen-simple-kerberos-perl (Ubuntu)
Ansgar Burchardt (aburch) wrote :

I don't see why this should be related to libauthen-simple-kerberos-perl.

affects: libauthen-simple-kerberos-perl (Ubuntu) → ubuntu
Brian the Lion (rossabri) wrote :

Bump? This problem is making my life miserable.

Brian the Lion (rossabri) wrote :

Folks on #kerberos are saying that this bug is due to a version mismatch between the kernel and nfs-utils.

description: updated
Brian the Lion (rossabri) wrote :

I'm super keen to try debugging this myself -- you can even assign me the bug -- if somebody will give me a little direction. Cheers!

Steve Langasek (vorlon) wrote :

If #kerberos thinks it's a kernel/nfs-utils version mismatch, have you tried testing with the version combination they recommend?

Your bug report includes no information about what versions of anything you're running. Please run 'apport-collect 794112'.

Brian the Lion (rossabri) wrote :

@Steve: I have not. What would the procedure for that look like? Purge the existing nfs-utils deb, and then build and install nfs-utils from source? Is there anything I can do to further pinpoint the problem before I try that?

Brian the Lion (rossabri) wrote :

Another theory: nslcd is trying to refresh the client's kerberos ticket via LDAP. It is failing because, unlike the user principals, the nfs principals do not have LDAP entries. Should they? Or is there a way to tell the nfs clients to not use LDAP?

On Sat, Jun 25, 2011 at 09:15:42PM -0000, Brian the Lion wrote:
> @Steve: I have not. What would the procedure for that look like? Purge
> the existing nfs-utils deb, and then build and install nfs-utils from
> source?

Yes, that would work.

> Is there anything I can do to further pinpoint the problem before I try
> that?

Not that I know of.

On Sat, Jun 25, 2011 at 11:54:37PM -0000, Brian the Lion wrote:
> Another theory: nslcd is trying to refresh the client's kerberos ticket
> via LDAP. It is failing because, unlike the user principals, the nfs
> principals do not have LDAP entries. Should they? Or is there a way to
> tell the nfs clients to not use LDAP?

I have no idea how this would work... I would say that if nslcd can get
*any* kerberos tickets via LDAP, that's a misconfiguration of the directory,
since that would bypass the Kerberos security model. NFS clients are
certainly not "using LDAP" to get kerberos tickets, anyway.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>

Observation: rpciod, nfsiod, and nfsv4.0-svc do not respond to kill -9 under these conditions.

Brian the Lion (rossabri) wrote :

The client OS appears to be wedging at precisely the time of a DHCP refresh. I came in this morning at 10:00am and found my desktop wedged with the clock stuck at 6:01am. From the syslog:

Jun 29 06:01:04 carina kernel: [70343.412331] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 29 06:01:09 carina kernel: [70348.408657] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 29 06:01:14 carina dhclient: DHCPREQUEST of 192.168.0.59 on eth0 to 192.168.0.2 port 67
Jun 29 06:01:14 carina dhclient: DHCPACK of 192.168.0.59 from 192.168.0.2
Jun 29 06:01:14 carina dhclient: bound to 192.168.0.59 -- renewal in 15298 seconds.
Jun 29 06:01:14 carina kernel: [70353.404947] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 29 06:01:19 carina kernel: [70358.401192] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.
Jun 29 06:01:24 carina kernel: [70363.397718] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server 192.168.0.2.

Brian the Lion (rossabri) wrote :

Again today, the client wedged at the same time as the DHCP refresh. The client's IP did not change. Any thoughts on what could be going on here?

This bug seems to strike me as well, but without LDAP being involved. After migrating from 10.04 to 11.04 the same setup (kerberos, NFS4) leads to frozen machines in the morning.

In my case the bug arose almost exactly 10 minutes before a DHCP request. I don't know if they are linked:

Jul 1 04:17:01 pcandreas2 CRON[8863]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 1 04:18:42 pcandreas2 kernel: [76984.004995] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server someserver.
Jul 1 04:18:47 pcandreas2 kernel: [76989.015060] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server someserver.
[...]
Jul 1 04:28:33 pcandreas2 kernel: [77575.185114] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server someserver.
Jul 1 04:28:37 pcandreas2 dhclient: DHCPREQUEST of 10.0.1.42 on eth0 to 10.0.0.2 port 67
Jul 1 04:28:37 pcandreas2 dhclient: DHCPACK of 10.0.1.42 from 10.0.0.2
Jul 1 04:28:37 pcandreas2 dhclient: bound to 10.0.1.42 -- renewal in 37405 seconds.
Jul 1 04:28:38 pcandreas2 kernel: [77580.194966] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server someserver.
[...]

Brian the Lion (rossabri) wrote :

Some of my blocked processes are starting to generate stack traces from the kernel:

Jul 1 08:11:43 carina kernel: [36142.699465] INFO: task chrome:2165 blocked for more than 120 seconds.
Jul 1 08:11:43 carina kernel: [36142.699469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 1 08:11:43 carina kernel: [36142.699472] chrome D 0000000000000004 0 2165 1 0x00000000
Jul 1 08:11:43 carina kernel: [36142.699477] ffff8804005f5e48 0000000000000086 ffff8804005f5fd8 ffff8804005f4000
Jul 1 08:11:43 carina kernel: [36142.699482] 0000000000013d00 ffff88040dfa4858 ffff8804005f5fd8 0000000000013d00
Jul 1 08:11:43 carina kernel: [36142.699486] ffff88041f982dc0 ffff88040dfa44a0 0000000c00000001 ffff88040dfa44a0
Jul 1 08:11:43 carina kernel: [36142.699491] Call Trace:
Jul 1 08:11:43 carina kernel: [36142.699501] [<ffffffff815c2a1d>] rwsem_down_failed_common+0xcd/0x170
Jul 1 08:11:43 carina kernel: [36142.699505] [<ffffffff815c2ad3>] rwsem_down_write_failed+0x13/0x20
Jul 1 08:11:43 carina kernel: [36142.699511] [<ffffffff812e6ac3>] call_rwsem_down_write_failed+0x13/0x20
Jul 1 08:11:43 carina kernel: [36142.699515] [<ffffffff815c1dd2>] ? down_write+0x32/0x40
Jul 1 08:11:43 carina kernel: [36142.699521] [<ffffffff8126ebf0>] sys_shmdt+0x60/0x180
Jul 1 08:11:43 carina kernel: [36142.699526] [<ffffffff8100c002>] system_call_fastpath+0x16/0x1b
Jul 1 08:11:43 carina kernel: [36142.699530] INFO: task chrome:2182 blocked for more than 120 seconds.
Jul 1 08:11:43 carina kernel: [36142.699532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 1 08:11:43 carina kernel: [36142.699534] chrome D 0000000000000000 0 2182 1 0x00000000
Jul 1 08:11:43 carina kernel: [36142.699538] ffff8804029f5a98 0000000000000086 ffff8804029f5fd8 ffff8804029f4000
Jul 1 08:11:43 carina kernel: [36142.699542] 0000000000013d00 ffff88041030df38 ffff8804029f5fd8 0000000000013d00
Jul 1 08:11:43 carina kernel: [36142.699547] ffffffff81a0b020 ffff88041030db80 ffff8800bf7542a8 ffff8800bf413d00
Jul 1 08:11:43 carina kernel: [36142.699551] Call Trace:
Jul 1 08:11:43 carina kernel: [36142.699567] [<ffffffffa0b666b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 1 08:11:43 carina kernel: [36142.699571] [<ffffffff815c0980>] io_schedule+0x70/0xc0
Jul 1 08:11:43 carina kernel: [36142.699582] [<ffffffffa0b666be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 1 08:11:43 carina kernel: [36142.699586] [<ffffffff815c12ff>] __wait_on_bit+0x5f/0x90
Jul 1 08:11:43 carina kernel: [36142.699597] [<ffffffffa0b666b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 1 08:11:43 carina kernel: [36142.699601] [<ffffffff815c13ac>] out_of_line_wait_on_bit+0x7c/0x90
Jul 1 08:11:43 carina kernel: [36142.699606] [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
Jul 1 08:11:43 carina kernel: [36142.699616] [<ffffffffa0b66a66>] nfs_wait_on_request+0x36/0x40 [nfs]
Jul 1 08:11:43 carina kernel: [36142.699628] [<ffffffffa0b6c933>] nfs_try_to_update_request+0x83/0x160 [nfs]
Jul 1 08:11:43 carina kernel: [36142.699640] [<ffffffffa0b6ca4d>] nfs_writepage_setup+0x3d/0x1e0 [nfs]
Jul 1...


Changed in ubuntu:
status: New → Confirmed
Timo Aaltonen (tjaalton) wrote :
affects: ubuntu → nfs-utils (Ubuntu)
Changed in nfs-utils (Ubuntu):
importance: Undecided → High
Changed in nfs-utils (Debian):
status: Unknown → New
tags: added: rls-mgr-p-tracking
Chris J Arges (arges) wrote :

This could be related to this thread (thanks to Sachin):
http://thread.gmane.org/gmane.linux.nfs/47940/focus=47947

I have built a kernel with a cherry picked patch from e49a29bd0eacce9d4956c4daf777a330115b369d, which is the upstream commit of this patch.

Please see if my Precise kernel build fixes the issue, you can download the files at:
http://people.canonical.com/~arges/lp794112/

Thanks,

Changed in linux:
assignee: nobody → Chris J Arges (christopherarges)
Steve Langasek (vorlon) wrote :

Could someone test the kernel image Chris posted?

affects: nfs-utils (Ubuntu Precise) → linux (Ubuntu Precise)
Changed in linux (Ubuntu Precise):
status: Confirmed → Incomplete
Shawn Haggett (podge-9) wrote :

I installed the kernel build posted by Chris in #15, logged in as an LDAP user with KRB5 auth and a kerberised nfs4 home directory, then left the machine unattended for >24 hours. Previously, under these conditions, the X session would be locked up when I returned to the machine and the nfs mount would be inaccessible (when switching to a virtual console and logging in as a local user).

This kernel seems to have mostly fixed things. This time I left the machine running with two terminals open, one showing the output of klist, so I could see when the ticket expired, the other running 'watch date'. I found the X session frozen a few seconds after the ticket expired. However, switching to a virtual console, I could log in as a local user and still access the nfs mount fine. There appeared to be no errors in the syslog (either now or around the time the ticket expired). Then, still at the virtual console, I ssh'ed into this same box as the ldap user and logged in fine. When I then switched back to the X session it had unfrozen, although both terminal windows were gone and replaced with an error message telling me it had crashed. I'm not sure what caused the hung X session (or if it's still related to this bug), but the NFS mount seems to be handling ticket expiration better now. I should also mention that this machine does have machine credentials in Kerberos as well.

Steve Atwell (satwell) wrote :

This same problem applies to kerberized NFSv3 as well as NFSv4. In both cases, the kernel will keep retrying if rpc.gssd only finds expired credentials. I've been investigating this problem because after a Lucid to Precise upgrade, users with kerberized NFS homedirs are unable to unlock their screens.

Back in Jan 2010, rpc.gssd got support for returning EKEYEXPIRED:
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commit;h=289ad31e

And around the same time, the kernel was changed to retry on EKEYEXPIRED:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=2c643488 (NFSv4)
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=b68d69b8 (NFSv3)

So it looks like this is intended behavior, but it leaves users with kerberized NFS home directories in a really bad situation. There have been some proposed patches both here and in the linked Debian bug against nfs-utils, but so far it doesn't look like any have been accepted upstream.
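
To make the interaction concrete, here is a toy model of the behavior described above. It is only a sketch under stated assumptions: `nfs_call` and `gssd_upcall` are hypothetical names, not kernel or nfs-utils code, and the `fail_fast` switch stands in for the patched rpc.gssd behavior discussed later in this report (returning an access error instead of EKEYEXPIRED).

```python
import errno

# errno.EKEYEXPIRED exists on Linux; fall back to its Linux value elsewhere.
EKEYEXPIRED = getattr(errno, "EKEYEXPIRED", 127)

def gssd_upcall(ticket_expired, fail_fast):
    """Build a stand-in for the credential upcall answered by rpc.gssd."""
    def upcall():
        if not ticket_expired:
            return 0  # usable credentials found
        # Only expired credentials: either report "key expired" (the kernel
        # retries) or a plain access error (the kernel gives up).
        return errno.EACCES if fail_fast else EKEYEXPIRED
    return upcall

def nfs_call(upcall, max_retries=None):
    """Toy kernel side: keep retrying for as long as the upcall says EKEYEXPIRED.

    Returns (error, attempts); error is None if the retry budget ran out,
    which models a process that would still be blocked.
    """
    attempts = 0
    while True:
        attempts += 1
        err = upcall()
        if err != EKEYEXPIRED:
            return err, attempts
        if max_retries is not None and attempts >= max_retries:
            return None, attempts

# Expired ticket, retry-on-EKEYEXPIRED: never completes (cut off after 5 tries).
print(nfs_call(gssd_upcall(ticket_expired=True, fail_fast=False), max_retries=5))
# Expired ticket, fail-fast: the caller gets an error on the first attempt.
print(nfs_call(gssd_upcall(ticket_expired=True, fail_fast=True)))
```

The real kernel has no retry budget, which is why processes touching the home directory stay blocked until fresh credentials show up.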

Ingar Smedstad (ingsme) wrote :

We use sssd and had the same problem until I set krb5_renew_interval in the sssd.conf. After that we have had no problems.

The kernel posted by Chris allows the user to unlock the screensaver (with a console login), but applications such as the web browser remain stuck, and the session has to be restarted in order to work properly.

Dominic Gross (domgross) wrote :

> The Kernel posted by Chris allows, (with console login), the user to unlock the
> screensaver

Well, this seems to fix the original bug reported here. Which is that nobody can log in using LDAP / Kerberos once a ticket of one signed in user expired.

> but applications, such like web browser, remains stuck and the session has to
> be restarted in order to work properly.

This looks like the intended behavior to me. The user's Kerberos Ticket expires some time after log in. At that point the applications can no longer access the user's NFS home directory and the applications get stuck or crash. Once a user enters his / her password again a new ticket is granted and the user can log into the session /access the home directory again. However, in my experience few applications fully recover from not being able to access the home directory for a longer time.

So, it seems to me, that in order to fix this remaining issue one needs to set up something to automatically renew Kerberos Tickets. This can be implemented either via a cronjob or packages like kstart or sssd.
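
As a rough illustration of the cronjob approach, the sketch below wraps the MIT Kerberos commands `klist -s` (silent check for a valid ticket) and `kinit -R` (renew the existing ticket-granting ticket). The function name `renew_ticket` and its return strings are my own invention, and the `run` parameter exists only so the logic can be exercised without a KDC. It assumes the ticket was issued as renewable.

```python
import subprocess

def renew_ticket(run=subprocess.call):
    """Renew the current user's ticket if the cache still holds a valid one.

    Returns "renewed", "no-valid-ticket", or "renew-failed".
    """
    # `klist -s` prints nothing and exits non-zero when the credentials
    # cache cannot be read or the ticket has already expired.
    if run(["klist", "-s"]) != 0:
        return "no-valid-ticket"
    # `kinit -R` asks the KDC to renew the existing TGT; no password is
    # needed, but an already-expired ticket cannot be renewed this way.
    if run(["kinit", "-R"]) != 0:
        return "renew-failed"
    return "renewed"
```

Something like this would have to run in the user's own session (so that it sees the same credentials cache) at an interval shorter than the ticket lifetime. Packages such as kstart or sssd do the same job more robustly.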

On 1 July 2012 at 17:14, Dominic Gross wrote:

>> The Kernel posted by Chris allows, (with console login), the user to unlock the
>> screensaver
>
> Well, this seems to fix the original bug reported here. Which is that
> nobody can log in using LDAP / Kerberos once a ticket of one signed in
> user expired.

yes it is.

>
>> but applications, such like web browser, remains stuck and the session has to
>> be restarted in order to work properly.
>
> This looks like the intended behavior to me. The user's Kerberos Ticket
> expires some time after log in. At that point the applications can no
> longer access the user's NFS home directory and the applications get
> stuck or crash. Once a user enters his / her password again a new ticket
> is granted and the user can log into the session /access the home
> directory again. However, in my experience few applications fully
> recover from not being able to access the home directory for a longer
> time.

That wasn't the behaviour before rpc.gssd started returning EKEYEXPIRED. The filesystem was fully accessible to the user's apps even if they had been stuck for days … It seems correct to me that the filesystem remains inaccessible until the user unlocks the screensaver, for obvious security reasons (implementing an automatic refresh, as you suggest, seems to me like a security breach). However, it would be nice to have a way to get the former behaviour, which lets the user get back into the session without logging in again and, at the same time, doesn't give the system access to the user's filesystem while the user is away.

>
> So, it seems to me, that in order to fix this remaining issue one needs
> to set up something to automatically renew Kerberos Tickets. This can be
> implemented either via a cronjob or packages like kstart or sssd.
>

Christophe Ségui
IT Manager
Institut de Mathématiques de Toulouse
Université de Toulouse - CNRS
118 Route de Narbonne
31062 Toulouse Cedex 09

Tel : (+33) 5 61 55 63 78
<email address hidden>
http://www.math.univ-toulouse.fr


Here seems to be the kernel patch we're expecting: http://www.spinics.net/lists/linux-nfs/msg31197.html

Regards

Automatically renewing the ticket is not a security breach. Since it can be done without storing passwords I don't see why it should be unsafe. IMHO it currently is the only reasonably safe way to keep NFS home directories accessible for long running jobs (e.g. if you have to run a simulation overnight) and unattended GUI applications. If the user is not around the screen should be locked anyway. It is certainly much safer than just extending the expiration date of the ticket.

On a standard MIT Kerberos installation the user can renew the ticket without entering the password for up to 7 days if the ticket and your account are still valid. Obviously the longer the ticket is out there, the higher the risk that somebody might steal it, so this has to be configured accordingly. But I really don't see a big security issue there.
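
To give a feel for what "renewable for up to 7 days" means in practice, here is a small sketch that lists when renewals would have to happen. The 10-hour ticket lifetime and 1-hour safety margin are assumptions for illustration only (the real values depend on the KDC and client configuration), and `renewal_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def renewal_times(issued, lifetime, renew_until, margin):
    """List the times at which to renew so the ticket never lapses.

    Each renewal yields a fresh lifetime, but the new expiry can never
    pass renew_until; after that a password is required again.
    """
    if margin >= lifetime:
        raise ValueError("margin must be shorter than the ticket lifetime")
    times = []
    expires = issued + lifetime
    while expires < renew_until:
        renew_at = expires - margin
        times.append(renew_at)
        expires = min(renew_at + lifetime, renew_until)
    return times

issued = datetime(2012, 7, 2, 9, 0)
schedule = renewal_times(
    issued,
    lifetime=timedelta(hours=10),          # assumed ticket lifetime
    renew_until=issued + timedelta(days=7),
    margin=timedelta(hours=1),             # renew one hour before expiry
)
print(len(schedule), schedule[0], schedule[-1])
```

Once the renewable lifetime is used up, no further renewal is possible and the user has to authenticate again, which is what bounds the exposure of a stolen ticket.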

I can't agree. Long-running jobs and desktop sessions are two different cases.
When a user leaves his desk at the end of the day and leaves his session open, it seems normal that the filesystem becomes unavailable without revalidation, as it always used to. Once unavailable, it can't be used by an attacker who gains root access and then, through sudo, gains access to the user's filesystem. When the user gets back to his desk, he revalidates his ticket and things go on. Automatic ticket renewal discards any advantage of using nfsv4+kerberos (why not simply use nfsv3, with no ticket to renew and no FS availability issue …).

Long-running jobs are another case, in which the user must access the FS over a long period, and they shouldn't be handled in the same way. That can be done as you describe, or through nfsv3 on a dedicated node where security is much stricter.

As I already said, a mainline patch has been proposed to handle this: http://www.spinics.net/lists/linux-nfs/msg31257.html .

Best,

On 2 July 2012 at 19:13, Dominic Gross wrote:

> Automatically renewing the ticket is not a security breach. Since it can
> be done without storing passwords I don't see why it should be unsafe.
> IMHO it currently is the only reasonably safe way to keep NFS home
> directories accessible for long running jobs (e.g. if you have to run a
> simulation overnight) and unattended GUI applications. If the user is
> not around the screen should be locked anyway. It is certainly much
> safer than just extending the expiration date of the ticket.

--
Christophe Ségui
IT Manager
Institut de Mathématiques de Toulouse
Université de Toulouse - CNRS
118 Route de Narbonne
31062 Toulouse Cedex 09

Tel : (+33) 5 61 55 63 78
<email address hidden>
http://www.math.univ-toulouse.fr

Chris J Arges (arges) on 2012-08-14
Changed in linux (Ubuntu Precise):
assignee: nobody → Chris J Arges (christopherarges)
no longer affects: linux

Given the discussion on the linux-nfs list, I actually doubt this change will be reverted. I can see that this could potentially be desired behavior, but in some circumstances, it's catastrophic. For example, in our environment we have kerberized nfs home directories. If a user runs something in screen and logs out, they can't ever log back in to renew credentials if they expire. Also, if they're logged into a graphical workstation and credentials expire while the screensaver is running, it can't ever pop up the dialog prompting for password - ouch!

I'm testing the patch provided by John Hughes on the Debian bug and it seems to work really well. The only catch is that you have to edit the gssd.conf upstart script directly, since it doesn't read RPCGSSD_OPTS from the nfs-utils defaults file any more. (bug #564043)

I'm rolling this out to a few of our more public machines this weekend and if all goes well, I'll put together a debdiff.

The patch from the debian bug has been working well on all of our systems and completely fixes the issues we had been seeing related to the new EKEYEXPIRED behavior.

I applied the upstream patch to nfs-utils 1.2.5, and also made a small tweak to the gssd man page to document it.

I'm not sure whether a debdiff or the raw patch is more useful, so I'll attach both.

The attachment "nfs-utils_1.2.5-3ubuntu4.debdiff" of this bug report has been identified as being a patch in the form of a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch

Matthew,

Just to verify you are running a precise kernel and not the one patched from #15?

Thanks
Adam

Hi Adam,

Yes - we are running the unpatched precise kernel. I don't remember the version when I first started testing with my nfs-utils patch, but we're currently running linux-image-3.2.0-30-generic version 3.2.0-30.48. A few systems that haven't rebooted recently are still on linux-headers-3.2.0-29-generic version 3.2.0-29.46.

I've been running my nfs-utils patch on about 70 machines with kerberized nfs home directories since August 22nd and all blocking issues we were seeing on credential expiration are gone.

Thanks and let me know if you need any other info.

-Matt

Adam Stokes (adam-stokes) wrote :

Excellent, thanks Matt. I'll get the SRU process rolling on this and see if we can get this into the distro.

Thanks again,
Adam

Changed in linux (Ubuntu Precise):
assignee: Chris J Arges (christopherarges) → Adam Stokes (adam-stokes)
status: Incomplete → In Progress
Adam Stokes (adam-stokes) wrote :
Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Adam Stokes (adam-stokes)
description: updated
summary: - Kerberos + LDAP + NFSv4 on Natty - Unable to recover unattended client
+ Kerberos + LDAP + NFSv4 - Unable to recover unattended client
Changed in linux (Ubuntu Precise):
milestone: none → ubuntu-12.04.2
Stéphane Graber (stgraber) wrote :

Uploaded to quantal, looking at precise now.

affects: linux (Ubuntu) → nfs-utils (Ubuntu)
Changed in nfs-utils (Ubuntu):
status: In Progress → Fix Released
Stéphane Graber (stgraber) wrote :

Uploaded to precise, unsubscribing sponsors.

Kjell Braden (afflux) wrote :

I've built 1.2.5-3ubuntu3.1 locally on my precise machine and it fixes the issue. Please approve for -proposed.

tags: added: verification-done-precise
tags: added: verification-done
Adam Stokes (adam-stokes) wrote :

Hi,

This was uploaded to -proposed; however, I don't see a comment on the bug showing the upload and asking for verification.

Thanks,
Adam

Kjell Braden (afflux) wrote :

Hi Adam,

It has been uploaded to -proposed, but is pending build approval (i.e. step 5 from https://wiki.ubuntu.com/StableReleaseUpdates#Procedure): https://bugs.launchpad.net/ubuntu/precise/+queue?queue_state=1&queue_text=

Adam Stokes (adam-stokes) wrote :

Hi,

This has been sitting in the unapproved queue since 9/28. Any ETA as to when this will be approved?

Thanks,
Adam

Hello Brian, or anyone else affected,

Accepted nfs-utils into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nfs-utils/1:1.2.5-3ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
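
For anyone following the testing request above, here is a minimal sketch of enabling -proposed for a single package. The helper name `proposed_source_line`, the default mirror URL, and the file name under /etc/apt/sources.list.d/ are assumptions for illustration; the wiki page linked above remains the authoritative procedure.

```shell
# Hypothetical helper: print an APT source line for the -proposed pocket
# of a given release. The default mirror URL is an assumption; substitute
# the mirror your site actually uses.
proposed_source_line() {
    release=$1
    mirror=${2:-http://archive.ubuntu.com/ubuntu/}
    echo "deb ${mirror} ${release}-proposed main restricted universe multiverse"
}

# Example (not run here; needs root on the test machine):
#   proposed_source_line precise | sudo tee /etc/apt/sources.list.d/precise-proposed.list
#   sudo apt-get update
#   sudo apt-get install nfs-common/precise-proposed
```

Installing with the `package/release` form pulls only nfs-common from -proposed rather than upgrading everything to the proposed versions.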

Changed in nfs-utils (Ubuntu Precise):
status: In Progress → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Steve Atwell (satwell) wrote :

Update looks good. I've installed nfs-common 1:1.2.5-3ubuntu3.1 from precise-proposed and added a -e to the rpc.gssd exec in /etc/init/gssd.conf. After stopping and starting the gssd service, I'm now getting immediate EACCES errors when trying to open files with expired Kerberos tickets.
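
The manual edit described above could be scripted roughly as follows. This is only a sketch: `add_gssd_expiry_flag` is a hypothetical helper, not something shipped in nfs-utils, and it assumes GNU sed and a job file containing a line that begins with `exec rpc.gssd`.

```shell
# Hypothetical helper (not part of nfs-utils): append -e to the rpc.gssd
# exec line of an upstart job file, unless the flag is already present.
# Assumes GNU sed and a line that starts with "exec rpc.gssd".
add_gssd_expiry_flag() {
    job_file=$1
    # Leave the file alone if the exec line already carries a standalone -e.
    if grep -Eq '^[[:space:]]*exec[[:space:]]+rpc\.gssd.*[[:space:]]-e([[:space:]]|$)' "$job_file"; then
        return 0
    fi
    # Insert -e directly after the command name, keeping any other options.
    sed -i -E 's/^([[:space:]]*exec[[:space:]]+rpc\.gssd)/\1 -e/' "$job_file"
}

# Example (not run here; needs root on the affected client):
#   add_gssd_expiry_flag /etc/init/gssd.conf
#   sudo stop gssd; sudo start gssd
```

The function is idempotent, so running it again (for example from a configuration management tool) leaves an already-patched file unchanged.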

Steve Atwell (satwell) on 2012-10-10
tags: added: verification-done
removed: verification-needed

Thanks everyone for getting this approved and into -proposed.

I just wanted to add a "me too" to the verification of nfs-common 1:1.2.5-3ubuntu3.1 in precise-proposed. I've installed this, and with the -e option to rpc.gssd, EACCES is returned when Kerberos tickets are expired.

I'll look forward to seeing this in -updates...

Kjell Braden (afflux) wrote :

The fix works perfectly.

One small issue though: there is no way to add the -e switch except manually in /etc/init/gssd.conf. This is fine for me, because I run cfengine at my site to keep this fix. For other users an option in /etc/default/nfs-common could make sense.

Kjell:

This would make sense. Unfortunately, there's an open bug about this (bug #564043) that's currently in "Won't Fix" because fixing it runs into yet another bug (bug #545673). Until the underlying upstart and file location issues get sorted out, passing options to gssd from the nfs-common defaults file probably isn't possible.

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.5-3ubuntu3.1

---------------
nfs-utils (1:1.2.5-3ubuntu3.1) precise-proposed; urgency=low

  [ Matthew L. Dailey ]
  * Add "-e" (ticket expiry is error) option to rpc.gssd to prevent hangs due
    to EKEYEXPIRED error from kernel on ticket expiry. LP: #794112
 -- Adam Stokes <email address hidden> Thu, 06 Sep 2012 13:06:19 -0400

Changed in nfs-utils (Ubuntu Precise):
status: Fix Committed → Fix Released
Flávio Martins (xhaker) wrote :

Thanks for this fix. I have been hunting down this problem and only just found this solution. I can add that if you don't want to edit /etc/init/gssd.conf directly you can always do this:

echo 'exec rpc.gssd -e' | sudo tee /etc/init/gssd.override

Then restart the service and check if it worked.
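
One way to do the "check if it worked" step is sketched below. `gssd_cmdline_has_e` is a hypothetical helper that only looks for `-e` as a standalone argument in a command line string, so it would miss the flag if it were combined with other short options. Note that upstart's `restart` keeps the job's existing configuration, so a stop followed by a start is the safer way to pick up the override.

```shell
# Hypothetical helper: succeed if the given command line contains -e as
# a standalone argument (it does not detect combined forms such as -fe).
gssd_cmdline_has_e() {
    case " $1 " in
        *" -e "*) return 0 ;;
    esac
    return 1
}

# Example (not run here; assumes an upstart-managed gssd job and procps ps):
#   sudo stop gssd; sudo start gssd
#   if gssd_cmdline_has_e "$(ps -C rpc.gssd -o args=)"; then
#       echo "rpc.gssd is running with -e"
#   fi
```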

mahmoud (mahmoud085) on 2014-01-07
Changed in nfs-utils (Ubuntu Precise):
assignee: Adam Stokes (adam-stokes) → nobody