Partially incorrect uid mapping with nfs4/idmapd/ldap-auth

Bug #1124250 reported by Norbert Muda on 2013-02-13
This bug affects 26 people
Affects             Status      Importance  Assigned to
Fedora              Won't Fix   Critical
linux (Ubuntu)                  Low         Unassigned
  Trusty                        Low         Dariusz Gadomski
  Utopic                        Low         Dariusz Gadomski
nfs-utils (Debian)  Confirmed   Unknown

Bug Description

[Impact]

 * This bug is likely to cause incorrect UID/GID mapping for NFS shares when there is a large number of different UIDs/GIDs, or when UID/GID mappings (stored as keys in the kernel) have expired.

[Test Case]

 1. Set up an NFSv4 server exporting /home with a large number of different users and LDAP-based authentication.
 2. Mount the share on an LDAP-connected client machine.
 3. List the mounted /home directory.
 4. Wait more than 10 minutes (the default key expiration time) and list it again with ls -l.

Expected result - all directories are listed with correct UIDs/GIDs.
Actual result - some of the directories may be listed with an incorrect UID/GID of 4294967294.
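The value 4294967294 is not arbitrary. It is 2^32 - 2 (0xFFFFFFFE), i.e. -2 stored in an unsigned 32-bit ID. That is consistent with the client falling back to the traditional "nobody" value of -2 when it cannot map an owner name back to a number. The C sketch below only checks that arithmetic and flags such entries. `UNMAPPED_ID` and `looks_unmapped` are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* -2 reinterpreted as an unsigned 32-bit ID: 2^32 - 2 = 4294967294 = 0xFFFFFFFE. */
#define UNMAPPED_ID ((uint32_t)-2)

/* Hypothetical helper: true if an owner or group ID looks like a failed
 * NFSv4 name-to-ID mapping rather than a real account. */
static bool looks_unmapped(uint32_t id)
{
    return id == UNMAPPED_ID;
}
```

For example, `looks_unmapped(4294967294u)` is true, while the real UID of user110 shown by `id` further down (31124) is not.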

[Regression Potential]

 * The fix has been merged upstream in the 3.18 kernel and is also present in Debian's 3.16 kernel.

[Other Info]

* Original bug description:

I'm running an NFSv4 server exporting a directory /home (ext4, usrquota). This server runs Ubuntu 12.04 amd64 (up to date). The directory holds 662 home directories for LDAP-authenticated users.
/etc/exports is :
/exports 192.168.0.0/24(rw,fsid=0,no_subtree_check)

Important lines in /etc/idmapd.conf :
domain=my-domain.org

[Translation]
Method=nsswitch

In /etc/default/nfs-common :
NEED_IDMAPD=yes

In /etc/default/nfs-kernel-server :
RPCNFSDCOUNT=75
RPCMOUNTDOPTS=--manage-gids

Two clients (RHEL 6 x86 and Ubuntu 12.04.2 i686) mount this NFSv4-exported directory without problems.
When running ls -l /home on these clients, I get:
...
drwx------ 4 user100 oldusers 4096 sept. 21 2011 user100
drwx------ 4 user101 oldusers 4096 sept. 21 2011 user101
drwx------ 37 user102 oldusers 4096 oct. 1 19:06 user102
drwx------ 36 user103 users 4096 févr. 5 21:08 user103
drwx------ 36 user104 users 4096 févr. 8 14:03 user104
drwx------ 30 user105 users 4096 févr. 4 18:01 user105
drwx------ 28 user106 oldusers 4096 oct. 5 2011 user106
drwx------ 37 user107 oldusers 4096 janv. 8 14:52 user107
drwx------ 31 user108 users 4096 déc. 4 11:52 user108
drwx------ 4 user109 oldusers 4096 sept. 21 2011 user109
drwx--x--x 45 user110 oldusers 4096 janv. 22 15:53 user110
drwx------ 31 user111 users 4096 janv. 29 12:03 user111
...
UID/GID mapping works fine, LDAP authentication works fine, ...

All clients running Ubuntu 12.10 (i686 or amd64) experience the same problem.
The config files are the same as those used on Ubuntu 12.04.
LDAP authentication is correctly configured; users can log in.

This is the /etc/fstab entry for /home :
192.168.0.1:/ /home nfs rw,nfsvers=4 0 0

Important lines in /etc/idmapd.conf :
domain=my-domain.org
[Translation]
Method=nsswitch

In /etc/default/nfs-common :
NEED_IDMAPD=yes

/etc/nsswitch.conf is :
passwd: files ldap
group: files ldap
shadow: files ldap

Running ls -l /home shows a strange problem:

drwx------ 4 4294967294 oldusers 4096 sept. 21 2011 user100
drwx------ 4 user101 oldusers 4096 sept. 21 2011 user101
drwx------ 37 user102 oldusers 4096 oct. 1 19:06 user102
drwx------ 36 4294967294 users 4096 févr. 5 21:08 user103
drwx------ 36 4294967294 users 4096 févr. 8 14:03 user104
drwx------ 30 4294967294 users 4096 févr. 4 18:01 user105
drwx------ 28 4294967294 oldusers 4096 oct. 5 2011 user106
drwx------ 37 4294967294 oldusers 4096 janv. 8 14:52 user107
drwx------ 31 4294967294 users 4096 déc. 4 11:52 user108
drwx------ 4 user109 oldusers 4096 sept. 21 2011 user109
drwx--x--x 45 4294967294 oldusers 4096 janv. 22 15:53 user110
drwx------ 31 4294967294 users 4096 janv. 29 12:03 user111

For 571 of the 662 home directories (this number varies at each reboot), the owner is shown as 4294967294. For the remaining 91 home directories,
the owner is correct. The GID is mapped correctly for all of them (only 5 different gidNumber values are in use).

In /var/log/syslog, I can see:

For example, user110 is mapped to 4294967294, but the command "id user110" returns:
uid=31124(user110) gid=666(oldusers) groupes=666(oldusers)

user110 logs in (LDAP auth) on tty1 and runs "ls -l /home/user110/":

drwxr-xr-x 8 4294967294 oldusers 4096 janv. 19 2012 Bureau
drwxr-xr-x 3 4294967294 oldusers 4096 déc. 2 2011 Documents
drwxr-xr-x 2 4294967294 oldusers 4096 déc. 2 2011 Images

Then he runs "touch /home/user110/test":

drwxr-xr-x 8 4294967294 oldusers 4096 janv. 19 2012 Bureau
drwxr-xr-x 3 4294967294 oldusers 4096 déc. 2 2011 Documents
drwxr-xr-x 2 4294967294 oldusers 4096 déc. 2 2011 Images
drwxr-xr-x 2 4294967294 oldusers 0 févr. 13 16:01 test

On the NFS server, running ls -l in the same directory gives:

drwxr-xr-x 8 user110 oldusers 4096 janv. 19 2012 Bureau
drwxr-xr-x 3 user110 oldusers 4096 déc. 2 2011 Documents
drwxr-xr-x 2 user110 oldusers 4096 déc. 2 2011 Images
drwxr-xr-x 2 user110 oldusers 0 févr. 13 16:01 test

I can see that the "test" file is owned by the correct user.

I've tried both with and without nscd, with the same results.
I've also tried sssd, libnss-sss and pam_sss for LDAP auth, with exactly the same results.

In /var/log/syslog, I see:
...
rpc.idmapd[561]: nss_getpwnam: name '<email address hidden>' domain 'my-domain.org': resulting localname 'user109'
rpc.idmapd[561]: nfs4_name_to_uid: nsswitch->name_to_uid returned 0
rpc.idmapd[561]: nfs4_name_to_uid: final return value is 0
rpc.idmapd[561]: Client 0: (user) name "<email address hidden>" -> id "55101"
rpc.idmapd[561]: nfs4_name_to_uid: calling nsswitch->name_to_uid
rpc.idmapd[561]: nss_getpwnam: name '<email address hidden>' domain 'my-domain.org': resulting localname 'user102'
rpc.idmapd[561]: nfs4_name_to_uid: nsswitch->name_to_uid returned 0
rpc.idmapd[561]: nfs4_name_to_uid: final return value is 0
rpc.idmapd[561]: Client 0: (user) name "<email address hidden>" -> id "55199"
...
These messages appear only for the correctly mapped entries. There are no warnings or errors (rate limiting is disabled in rsyslog.conf and verbosity is set to 5 in idmapd.conf). It seems that rpc.idmapd never performs the mapping for the other entries.
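The log lines above show the name-to-local-name step. rpc.idmapd receives an owner of the form name@domain, checks the domain against the Domain set in idmapd.conf (my-domain.org here), and then looks up the remaining local name through NSS. The C sketch below covers only the string-splitting part and omits the getpwnam() lookup that follows. It assumes a case-insensitive domain comparison. `owner_to_localname` is a hypothetical helper, not libnfsidmap's actual code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive string equality (assumption: domains compare case-insensitively). */
static bool equal_ignore_case(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    return *a == *b;              /* equal only if both strings ended together */
}

/* Split "name@domain"; succeed only if domain matches the configured one.
 * Copies the local name into out (NUL-terminated) and returns true on success. */
static bool owner_to_localname(const char *owner, const char *domain,
                               char *out, size_t outlen)
{
    const char *at = strrchr(owner, '@');
    if (at == NULL || at == owner)
        return false;             /* no domain part, or empty name */
    if (!equal_ignore_case(at + 1, domain))
        return false;             /* foreign domain: refuse to map */
    size_t len = (size_t)(at - owner);
    if (len + 1 > outlen)
        return false;             /* output buffer too small */
    memcpy(out, owner, len);
    out[len] = '\0';
    return true;
}
```

With owner "user110@my-domain.org" and domain "my-domain.org" the helper yields "user110". An owner without a domain part, or from another domain, is rejected.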


David,

Would it make sense to patch the kernel so the maxkeys/root_maxkeys are set to a more reasonable value?

I have looked at the relevant sources for the Fedora kernel (upstream is just the same). It appears that the nfsid keys should be created within the keyring

        keyring = key_alloc(&key_type_keyring, ".id_resolver", 0, 0, cred,
                             (KEY_POS_ALL & ~KEY_POS_SETATTR) |
                             KEY_USR_VIEW | KEY_USR_READ,
                             KEY_ALLOC_NOT_IN_QUOTA);

in idmap.c

However, they still count toward root's quota (hence the problem).
This is quite surprising and, unless I am misreading the situation, it could be a bug somewhere else.

The issue is still present on a fresh Fedora 18 installation. This is
quite unfortunate: as things stand, NFSv4 is unreliable and barely usable, especially on systems like mail servers that typically handle files with many different owners in a common directory.
Is this going to be fixed?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu):
status: New → Confirmed

The problem is still present after a fresh update of the client:

nfs client (Fedora 18):
nfs-utils-1.2.7-3.fc18.i686
kernel-PAE-3.8.2-206.fc18.i686

nfs server (Fedora 16):
nfs-utils-1.2.5-5.fc16.i686
kernel-PAE-3.3.5-2.fc16.i686

The description of the problem above still applies. Moreover, nothing
is written to /var/log/messages.

I don't see the issue between 2 Fedora 18 machines. Unfortunately, our Fedora and Ubuntu clients do run into this problem all the time with the home and mail directories, which are on RHEL 6 servers.
Could it be that the bug was fixed in recent Fedora kernels, but that RHEL 6 is still waiting for a fix?

This is what I use on our Fedora machines (1000 is enough for us ATM):

/etc/sysctl.d/nfsv4_idmap_maxkeys:

  # NFSv4 idmap entries are counted against a very low quota
  # https://bugzilla.redhat.com/show_bug.cgi?id=876705
  kernel.keys.root_maxkeys = 1000
  kernel.keys.maxkeys = 1000

(In reply to comment #6)
> I don't see the issue between 2 Fedora 18 machines.

After a quick check I realized that between two Fedora 18 machines the UID mapping mechanism wasn't working at all (strangely). If this is the case, it is no wonder that you didn't see the issue. Could you check? Just create the same username
with different UIDs on the two machines.

Actually, I believe the change in behaviour is documented here:
http://comments.gmane.org/gmane.linux.nfs/46028
When Kerberos is not in use, the client now simply sends numeric UIDs/GIDs by default.
I still wonder whether this might be masking an actual bug, namely NFS keys being counted against the quota.

(In reply to comment #6)
> I don't see the issue between 2 Fedora 18 machines.

I wanted to investigate the issue between two Fedora 18 machines.
Unfortunately I couldn't find a way to activate the UID mapping mechanism.
Everything I tried simply gave results consistent with NFS version 3.

My test case was:
1. create a user "testnfs" on both the server and client with different
UIDs
2. create a file on the server with (local) ownership by testnfs
3. nfs-mount the directory on the client machine and investigate the ownership
of the file by the (local on the client) testnfs user.

I expect the ownership to be remapped (i.e. the UID to be translated), which is *not* the case.

---

The same experiment carried out on a Fedora 16 server (kernel 3.3.5-2.fc16.i686.PAE,
nfs-utils-1.2.5-5.fc16.i686), on the contrary, gave the expected UID remapping.

---

Not that I actually *need* remapping (quite the contrary!); this is just to
understand whether the problem is transient and solved for the future or not.

Indeed, between Fedora 18 machines, idmap seems to have no effect. The only reason I didn't see any files owned by "nobody" is that we use LDAP as a central user database, so all numeric user IDs resolve identically on the server and client side. But if I have a local user with a different user ID, it shows up as a numeric ID on the other side.

In
/sys/module/nfs/parameters/nfs4_disable_idmapping
and
/sys/module/nfsd/parameters/nfs4_disable_idmapping

the default value is Y (which explains why idmapping seems
disabled between two Fedora 18 machines).

However, tweaking those files did not give me
fully functional NFS ID mapping as with a Fedora 16 NFS server:

apparently the UIDs are correctly remapped according to the local
username, but actual access to the files obeys the "non-remapped" UID;
quite a weird situation.

I guess I am missing something, or perhaps UID remapping support is
currently broken in NFS 4.1.


Have a look at /proc/key-users to see whether root reaches its quotas:

    0: 1029 1028/1028 10/100 2345/20000

means that root uses 10 of 100 possible keys and 2345 of a maximum of 20000 bytes.

You may have to increase both. Each user and each group needs one key, so if you have 1000 users and 1000 groups you should allow root more than 2000 keys (and enough bytes for them).
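Each line of /proc/key-users has the form `<uid>: <usage> <nkeys>/<nikeys> <qnkeys>/<maxkeys> <qnbytes>/<maxbytes>`, so the two right-hand pairs are the ones to watch. The C sketch below parses one line and reports whether one more key would still fit. The struct and function names are hypothetical, and `next_key_bytes` is an estimate supplied by the caller. To check a live system, you would feed it each line read with fgets() from /proc/key-users.

```c
#include <stdbool.h>
#include <stdio.h>

/* Fields of one /proc/key-users line (names follow the kernel's per-user
 * key counters; treat the exact meaning of the first pair as an assumption). */
struct key_user_line {
    unsigned uid;
    int usage;             /* reference count of the per-user record */
    int nkeys, nikeys;     /* total keys / instantiated keys */
    int qnkeys, maxkeys;   /* keys charged to quota / key-count limit */
    int qnbytes, maxbytes; /* bytes charged to quota / byte limit */
};

/* Parse e.g. "    0: 204 203/203 199/200 6275/20000"; true if all 8 fields matched. */
static bool parse_key_users_line(const char *line, struct key_user_line *k)
{
    return sscanf(line, " %u: %d %d/%d %d/%d %d/%d",
                  &k->uid, &k->usage, &k->nkeys, &k->nikeys,
                  &k->qnkeys, &k->maxkeys, &k->qnbytes, &k->maxbytes) == 8;
}

/* True if one more key costing next_key_bytes would exceed either quota. */
static bool key_quota_exhausted(const struct key_user_line *k, int next_key_bytes)
{
    return k->qnkeys + 1 > k->maxkeys ||
           k->qnbytes + next_key_bytes > k->maxbytes;
}
```

For the line quoted above, the parser yields qnkeys = 10, maxkeys = 100, qnbytes = 2345 and maxbytes = 20000. For Joerg Delker's later reading of 199/200 keys, one more key still fits, but the one after that would not.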

Try

echo 400000 > /proc/sys/kernel/keys/root_maxbytes
echo 10000 > /proc/sys/kernel/keys/root_maxkeys

and see if this fixes the problem for you.

Norbert Muda (norbert-muda) wrote :

No, it doesn't fix the problem for me :-(


With respect to the mapping cache capacity, there are two problems that need addressing:

 (1) The capacity of a keyring isn't sufficient (~1024 on 32-bits, ~512 on 64-bits). I have patches to expand this, but they're not quite upstream yet. This limits the size of the cache.

 (2) The default maximum number of keys (quota) is only 200. This can be altered from the running kernel as mentioned in comment 7 by tweaking sysctls for the moment. Ideally, though, kernel-wrought keys like this shouldn't be counted towards quota - something that will require a patch.
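The ~1024 / ~512 figures in point (1) are consistent with a keyring whose links are kept in a single 4096-byte page of key pointers (4 bytes each on 32-bit, 8 on 64-bit). That layout is an assumption, not something stated in this thread. The C sketch below only does the arithmetic. `keyring_capacity` is a hypothetical helper, and `header_bytes` is a hypothetical allowance for bookkeeping data in the same page.

```c
#include <stddef.h>

/* Hypothetical estimate of how many key pointers fit in one page after a
 * small header (assumes the whole link list lives in a single page). */
static size_t keyring_capacity(size_t page_size, size_t header_bytes, size_t ptr_size)
{
    if (ptr_size == 0 || header_bytes >= page_size)
        return 0;
    return (page_size - header_bytes) / ptr_size;
}
```

With no header, 4096 / 4 = 1024 and 4096 / 8 = 512. On a 64-bit client, 512 slots are already fewer than one key per user for the 662 home directories in the original report.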

jtlb (jt-lb) wrote :

Have you tried an older kernel? I hit the same bug with 3.10.x, while there is no problem with 3.2.x.

jtlb (jt-lb) wrote :

Using `git bisect`, it appears that this bug was introduced between v3.3 and v3.4 by this commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=57e62324e469e092ecc6c94a7a86fe4bd6ac5172


Patches to expand keyring capacity have been committed to the upstream security tree and will hopefully go to Linus in the next merge window:

http://git.kernel.org/cgit/linux/kernel/git/jmorris/linux-security.git/commit/?h=next&id=b2a4df200d570b2c33a57e1ebfa5896e4bc81b69

We're actually already carrying those patches in F20 and rawhide.

Denys Duchier (denys.duchier) wrote :

I am experiencing exactly the same problem. Has there been any progress on this issue? Is there an update available that fixes it?

jtlb (jt-lb) wrote :

@denys.duchier As I understand it from reading the code, this is not a bug but a feature. In Linux 3.4 they dropped their own B-tree uid/username cache in favor of the "key" infrastructure which, by design, includes quotas. While this is a pain in this case, it mitigates the risk of a DoS attack filling up the kernel's memory.

I saw two options:
 1/ increase root's quota as explained by @wolfgang-walter
 2/ fall back to the (poorly documented) NFSv3-like behavior

I personally did the latter. In this scheme, UIDs are sent over the wire as equivalent strings, i.e. username="123" for uid=123, instead of being mapped to "<email address hidden>". The other end *should* detect that it is a stringified UID and convert it back. That is all the magic. I say "should" because it actually depends on the exact implementation, since this is fallback behavior rather than the standard. It works with Linux and, with reasonable effort, with Solaris.
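The receiving side of the "stringified UID" fallback described above can be sketched in C. An owner string made only of decimal digits is taken as the numeric ID itself. Anything else (such as name@domain) has to go through idmapping. `parse_numeric_owner` is a hypothetical helper, and the exact acceptance rules of the Linux and Solaris implementations may differ, as the comment itself warns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Return true and store the ID if owner is a bare decimal number that fits
 * in 32 bits; return false if it must go through name-based idmapping. */
static bool parse_numeric_owner(const char *owner, uint32_t *id)
{
    if (owner[0] == '\0')
        return false;                 /* empty string: nothing to parse */
    for (const char *p = owner; *p; p++)
        if (*p < '0' || *p > '9')
            return false;             /* not a bare number, e.g. "user@domain" */
    errno = 0;
    unsigned long long v = strtoull(owner, NULL, 10);
    if (errno == ERANGE || v > UINT32_MAX)
        return false;                 /* too large for a 32-bit uid/gid */
    *id = (uint32_t)v;
    return true;
}
```

With this rule, "123" is accepted as uid 123, while "<email address hidden>"-style owners fall through to the normal name mapping.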

Denys Duchier (denys.duchier) wrote :

Unfortunately, as comment #3 reported, the quota is not the problem.


*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Joerg Delker (ubuntu-delker) wrote :

I was encountering the very same problem in an identical environment (NFSv4 and an LDAP-based name service) but on DEBIAN (!).
Since I was not able to find any relevant bug reports in the Debian bug tracking system, I was very glad to find someone having the same problem and reporting it here.

Following this and another bug report I found at RedHat (https://bugzilla.redhat.com/show_bug.cgi?id=876705), I placed the following kernel parameters configuration:

in /etc/sysctl.d/nfsv4_idmap_maxkeys:

  # NFSv4 idmap entries are counted against a very low quota
  # https://bugzilla.redhat.com/show_bug.cgi?id=876705
  kernel.keys.root_maxkeys = 1000
  kernel.keys.maxkeys = 1000

After activating that with sysctl the problem was gone with my installation.

Joerg Delker (ubuntu-delker) wrote :

For your reference:

Before applying the "fix" from comment #9, the key quota seemed to be saturated (199/200).
(That was after "ls -al" in a nfs4 mounted dir with ~500 different owners)

# cat /proc/key-users
    0: 204 203/203 199/200 6275/20000

After enabling the new key quotas it showed:

# cat /proc/key-users
    0: 516 515/515 511/1000 16185/20000


*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

F19 won't get this until it is rebased to 3.13. If the original reporter has since moved to F20, we can close this as NEXTRELEASE.

(In reply to Josh Boyer from comment #19)
> F19 won't get this until it is rebased to 3.13. If the original reporter
> has since moved to F20, we can close this as NEXTRELEASE.

I (the original reporter) just moved to F20. The problem is still present:

-----------------------------------------------------------------------
# cd [to a directory with more than 200 different owners]
# echo "200" > /proc/sys/kernel/keys/root_maxkeys
# ls -l
[...]
-rw------- 1 4294967294 mail 2958131 Jan 7 11:56 xxxx
[...]
# echo "10000" > /proc/sys/kernel/keys/root_maxkey
# ls -l
[...]
-rw------- 1 xxxx mail 2958131 Jan 7 11:56 xxxx
[...]
-----------------------------------------------------------------------

Should I change the Version from 19 to 20?

David, do you know why Maurizio is still seeing this given that we're carrying the updated keyring patches?

I just rechecked after a "yum update" and a reboot. I can confirm the
problem.

info:
$ uname -r
3.12.6-300.fc20.i686+PAE
$ rpm -q nfs-utils
nfs-utils-1.2.8-6.0.fc20.i686

(I also removed my only local entry in /etc/sysctl.d that changed
the value of /proc/sys/kernel/keys/root_maxkeys to 10000, just to
make sure that the default value is still 200).

However keep in mind that the NFS server is a fedora 16:
$ uname -r
3.3.5-2.fc16.i686.PAE
$ rpm -q nfs-utils
nfs-utils-1.2.5-5.fc16.i686

I have a similar (or the same?) problem. Since Fedora 16, our NFSv4 clients show the owner or group of a file as "4294967294" when there are too many different owners or groups when listing a directory (using "ls -l").

The problem still exists on Fedora 19 and Fedora 20.

I have tried the test from comment #20 (with an added "e" in the last line containing "root_maxkey"). But I see no difference if /proc/sys/kernel/keys/root_maxkeys is 200 or 10000.

I have tested it with kernel-3.12.8-300.fc20.x86_64 and nfs-utils-1.2.9-2.1.fc20.x86_64 .

Sorry, I missed an 's' (not an 'e') in the command (cut/paste problem), so the
command is actually:

# echo "10000" > /proc/sys/kernel/keys/root_maxkeys

Perhaps after the command it is better to clear the keys with

# nfsidmap -c

This does the trick for us; it is strange that it does not change anything
for you! If the change works, you can make it happen at boot by creating
a file in /etc/sysctl.d/ with a name like "99-local.conf" containing
a file in /etc/sysctl.d/ with a name like "99-local.conf" containing

# Keys for nfs
kernel.keys.root_maxkeys = 10000

(In reply to Edgar Hoch from comment #23)
> I have a similar (or the same?) problem. Since Fedora 16 our nfsv4 clients
> will show the owner or group of a file as "4294967294" when there are to
> many different owners or groups when listing a directory (using "ls -l").
>
> The problem still exists on Fedora 19 and Fedora 20.
>
> I have tried the test from comment #20 (with an added "e" in the last line
> containing "root_maxkey"). But I see no difference if
> /proc/sys/kernel/keys/root_maxkeys is 200 or 10000.
>
> I have tested it with kernel-3.12.8-300.fc20.x86_64 and
> nfs-utils-1.2.9-2.1.fc20.x86_64 .

Sorry, of course I added an "s", not an "e"; my error in the text of comment #23.

Now I have created a file /etc/sysctl.d/99-maxkeys.conf with content as you suggested in comment #24. After a reboot, "cat /proc/sys/kernel/keys/root_maxkeys" prints "10000".

Then I called "ls -l" on a list of all home directories (several hundred), and the problem still exists: The first directories are displayed with the correct username and groupname, and somewhere in the middle the remaining directories (and files, of course) with new usernames and groupnames are displayed as "4294967294".

/proc/keys contains 529 lines.
In a previous try, with /proc/sys/kernel/keys/root_maxkeys having the default value of 200, /proc/keys had about 150 lines. I have retried this configuration again - currently /proc/keys has 205 lines.
Immediate after reboot, /proc/keys contains 14 lines.

I see that increasing the value of /proc/sys/kernel/keys/root_maxkeys will map more file uids and gids to the correct name. But even if the value is much bigger than the number of registered users and groups not all uids and gids are mapped.

The NFS server I used for this test runs Fedora 19, currently with kernel-3.11.7-200.fc19.x86_64, because it's a production server in use which I should not reboot too often. We use NIS for distributing the users to the hosts.

Additional info:

"nfsidmap -c" did not solve the problem. /proc/keys was cleared (except for some "basic" values), but the NFS idmap problem still occurred after listing the NFS directories / files.

This is why I tried the file /etc/sysctl.d/99-maxkeys.conf, so that /proc/sys/kernel/keys/root_maxkeys has the right value (10000) immediately after boot. But this didn't solve the problem either.

We also use NIS; our NFS server is older than yours (Fedora 16, kernel
3.3.5-2.fc16.i686.PAE). However, we only have slightly more than 200 users,
so we barely see the problem with the default value of 200 for root_maxkeys.
But we do see it if we purposely decrease the value from 200 to, say, 10.

What comes to mind is that perhaps there is a maximum value (512?) for the
number of keys, or perhaps you are hitting the default maximum root_maxbytes
(defaults to 20000). You could try to increase it as well...

(In reply to Edgar Hoch from comment #26)
> Additional info:
>
> "nfsidmap -c" have not solved the problem. /proc/keys was cleared (except of
> some "basic" values), but the nfs idmap problem still occured after listing
> the nfs directories / files.
>
> This is the reason why I have tried the file /etc/sysctl.d/99-maxkeys.conf
> so /proc/sys/kernel/keys/root_maxkeys have the right value (10000) immediate
> after boot. But this haven't solved the problem, too.

Ah, using NIS/YP, try the *_tw_* workaround from https://bugzilla.redhat.com/show_bug.cgi?id=740024#c6

I presume you refer to:

    echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
    echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

Personally I do not think there is a relation, but there is no harm in trying
it. However, you should be more specific: does the above setting have to be applied
on the NFS client, the NFS server, or the NIS server?

(In reply to Anders Blomdell from comment #28)
> Ah, using NIS/YP, try the *_tw_* workaround from
> https://bugzilla.redhat.com/show_bug.cgi?id=740024#c6

(In reply to Maurizio Paolini from comment #29)
> I presume you refer to:
>
> echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
> echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

Setting net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse didn't help in my case. I first tried this on the NFS client, and then on both the NFS client and the NFS server.
I used the following equivalent commands:

  sysctl net.ipv4.tcp_tw_recycle=1
  sysctl net.ipv4.tcp_tw_reuse=1

But I tried another thing:
I increased the value of kernel.keys.root_maxbytes. The default was 20000.
First I increased only kernel.keys.root_maxbytes, leaving kernel.keys.root_maxkeys at its default value (200), but this didn't help.
Then I increased kernel.keys.root_maxkeys too (again). Now all UIDs and GIDs on the NFS filesystems are mapped to the correct username; no "4294967294" is displayed anymore.

It seems this solves the "4294967294" NFS problem for me.
I did the following on the NFSv4 client (the NFS server was unchanged, i.e. in the same state as before the tests):

  sysctl kernel.keys.root_maxkeys=10000
  sysctl kernel.keys.root_maxbytes=200000

I don't know how big the values should be, but they seem to be big enough for our configuration now.
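As a rough sizing aid, Joerg Delker's /proc/key-users readings above (199 keys using 6275 bytes, 511 keys using 16185 bytes) work out to about 32 bytes charged per idmap key. At that rate, the default root_maxbytes of 20000 holds only about 625 keys, however high root_maxkeys is set. That would explain why raising root_maxkeys alone was not enough here. The C sketch below turns this into suggested limits. The ~32 bytes per key, the one-key-per-user-and-group rule from an earlier comment, the headroom percentage, and all names are assumptions for illustration.

```c
/* Hypothetical sizing helpers; bytes_per_key is an estimate (~32 in the
 * readings quoted in this thread) and depends on name lengths. */
struct key_limits {
    unsigned long maxkeys;   /* candidate kernel.keys.root_maxkeys */
    unsigned long maxbytes;  /* candidate kernel.keys.root_maxbytes */
};

/* One key per user and per group, plus keys root already holds, plus headroom. */
static struct key_limits suggest_root_key_limits(unsigned long users,
                                                 unsigned long groups,
                                                 unsigned long baseline,
                                                 unsigned long bytes_per_key,
                                                 unsigned long headroom_pct)
{
    unsigned long keys = users + groups + baseline;
    keys += keys * headroom_pct / 100;          /* safety margin */
    struct key_limits l = { keys, keys * bytes_per_key };
    return l;
}

/* How many keys a byte quota can hold at a given average key size. */
static unsigned long keys_fitting_in_bytes(unsigned long maxbytes,
                                           unsigned long bytes_per_key)
{
    return bytes_per_key ? maxbytes / bytes_per_key : 0;
}
```

For 1000 users and 1000 groups with 50% headroom, this suggests 3000 keys and 96000 bytes. Edgar Hoch's values of 10000 and 200000 leave even more margin.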

Fine... then we should change the name of this bug to include the "maxbytes" too :-)
Of course, the point is that there should be no such barrier in a
production and historical filesystem like NFS, especially in the complete
absence of any error message or indication of any kind on how to solve it
(it can bite you very hard if, e.g., "sendmail" is running on the
NFS client with the mailboxes exported from an NFS server, as in our case).

Has anyone checked comment #3 above by Luca Giuzzi? It seems very much
to the point, perhaps pointing to a bug in the keys implementation!

(In reply to Edgar Hoch from comment #30)
>[...]
> But I tried another thing:
> I have increased the value kernel.keys.root_maxbytes. The default was 20000.
> First I have increased only kernel.keys.root_maxbytes, leaving
> kernel.keys.root_maxkeys at default value (200), but this didn't help.
> Then I have increased kernel.keys.root_maxkeys too (again). Now all uids and
> gids on the nfs filesystems are mapped to the correct username, no
> "4294967294" is displayed anymore.
>
>
> It seems this solves the "4294967294" nfs problem for me.
> I did the following on the nfsv4 client (nfs server was unchanged resp. set
> to the same state as before the tests):
>
> sysctl kernel.keys.root_maxkeys=10000
> sysctl kernel.keys.root_maxbytes=200000
>
> I don't know how big the values should be, but it seems they are big enought
> for our configuration now.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.13.4-200.fc20. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

The problem still exists in kernel kernel-3.13.4-200.fc20.x86_64.
The parameters values are still too low.

# sysctl -a|grep kernel.keys.root
kernel.keys.root_maxbytes = 20000
kernel.keys.root_maxkeys = 200

I think there should be no fixed limit at all for these values (or at least a very high one, to prevent an error loop from consuming unlimited memory). The kernel should allocate as much memory as it needs to store all usernames, UIDs, GIDs, etc. that exist on the system (including NIS, LDAP, etc.). The list of usernames, group names and UIDs is bounded, because the files that contain it have a limited length, and usernames etc. are not generated dynamically while the system is running (except for a fixed number, for example by new packages, or manually by the system administrator).

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Vertago1 (vertago1) wrote :

running:
sudo sysctl kernel.keys.root_maxkeys=1000
sudo sysctl kernel.keys.maxkeys=1000

fixes the issue for me to some extent. I still have some files with 4294967294 instead of nobody or nogroup.

I only see the problem on one LDAP and NFSv4 client running Ubuntu 14.04 LTS (as a testbed). My other servers run 12.04 and don't have the issue; they don't seem to use these keys at all. Is idmapping enabled by default on 14.04 when it wasn't on 12.04?

Vertago1 (vertago1) wrote :

I have created and attached a file 30-nfsv4-quota.conf for /etc/sysctl.d/ that ups the limit at boot time for ubuntu. It fixes the limit problem but I still have 4294967294 showing up instead of nobody or nogroup.

Confirmed on ubuntu 14.04 kernel 3.13.0-27-generic with NFSv4 mounts to an Isilon filesystem - even with very high values for the various kernel.keys parameters, we still get sporadic failures for UID/username mappings resulting in user or group IDs on new files showing up as 4294967294 rather than the correct value.

Our LDAP has < 250 users, and I've set these sysctl values on the NFS clients:

kernel.keys.gc_delay = 300
kernel.keys.maxbytes = 20000
kernel.keys.maxkeys = 10000
kernel.keys.persistent_keyring_expiry = 259200
kernel.keys.root_maxbytes = 400000
kernel.keys.root_maxkeys = 10000


With kernel 3.15.3-200.fc20.i686+PAE it seems that the default in
/proc/sys/kernel/keys/root_maxkeys is 10000 instead of 200; however this
is just a workaround, since NFS key *should not* count against the
root_maxkeys quota.

By manually changing 10000 to a low value the problem appears again, thus
showing that still the NFS keys count against the root quota.

The NFS client runs on a Fedora 20, whereas the NFS server resides on a
fedora-release-16-1, kernel 3.3.5-2.fc16.i686.PAE.

Lars Behrens (lars-behrens-u) wrote :

Confirmed for 12.04 with HWE and the Trusty kernel (linux-3.13.0-32-generic #57~precise1-Ubuntu).

We have less than 100 ldap users and under 100 groups.
On some clients here we have to use hwe with a newer kernel to support the hardware.
And on exactly those and only those clients from time to time gid and/or uid shows up with 4294967294 for some or all home directories.

gid/uid is ok on the other 12.04 clients with default kernel (3.2.0-67-generic #101-Ubuntu)

So this too seems to point towards a bug or misconfiguration with Kernel 3.13

/proc/key-users shows e.g. 0: 110 109/109 102/200 3778/20000, so the quota is far from reached.

Vladimir (vladimir-kozlov) wrote :

Confirmed for 14.04 with the 3.13.0-35-generic kernel on the server and clients (all x86_64).

I had to use nfsd.nfs4_disable_idmapping=0 to allow idmapping to work on old (10.04 and RHEL58) clients.

Vladimir (vladimir-kozlov) wrote :

Maybe this will help: running 'sudo nfsidmap -c' solves the problem immediately, until the next time.

Changed in nfs-utils (Ubuntu):
assignee: nobody → Dariusz Gadomski (dgadomski)

The workaround from the RH ticket did not fix my issues on Ubuntu Trusty:

# uname -a
Linux gentoo 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/sys/kernel/keys/root_maxbytes
200000

# cat /proc/sys/kernel/keys/root_maxkeys
10000

# cat /etc/sysctl.conf | grep root_
kernel.keys.root_maxkeys=10000
kernel.keys.root_maxbytes=200000

# ls -l ~/.ssh/
total 291
-rw------- 1 blkperl them 6387 Mar 28 11:37 authorized_keys
-rw------- 1 4294967294 them 672 Aug 15 11:44 config
-rw------- 1 4294967294 them 25 Jul 25 11:17 id_rsa
-rw------- 1 4294967294 them 159617 Aug 13 18:22 known_hosts
-rw------- 1 4294967294 them 158443 Jul 30 10:11 known_hosts.old
drwxr-xr-x 3 4294967294 them 18 Jan 30 2014 old
-rw------- 1 4294967294 them 1679 Aug 12 17:03 test-deploy_id_rsa
-rw------- 1 4294967294 them 398 Aug 12 17:03 test-deploy_id_rsa.pub

Dariusz Gadomski (dgadomski) wrote :

@blkperl

William, could you please confirm that you are using at least kernel 3.13.0-35.62? It contains the fix for bug #1344405, which may also cause similar effects.

Could you also please provide the output of:
$ sudo cat /proc/key-users

You could also try increasing these values (in addition to their root_* counterparts):
kernel.keys.maxkeys=10000
kernel.keys.maxbytes=200000

Regards,
Dariusz
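A sketch of making these limits persistent, assuming a distribution that reads /etc/sysctl.d/*.conf at boot. The helper write_key_sysctl and the file name 60-nfs-keys.conf are hypothetical. Loading the file with sysctl -p applies the limits at runtime, without a reboot:

```shell
#!/bin/bash
# Writes a sysctl fragment that raises both the per-user (maxkeys/maxbytes)
# and root (root_maxkeys/root_maxbytes) key quotas to the values quoted above.
write_key_sysctl() {
  local file=$1
  cat > "$file" <<'EOF'
kernel.keys.root_maxkeys = 10000
kernel.keys.root_maxbytes = 200000
kernel.keys.maxkeys = 10000
kernel.keys.maxbytes = 200000
EOF
}

# As root (the file name is only an example):
#   write_key_sysctl /etc/sysctl.d/60-nfs-keys.conf
#   sysctl -p /etc/sysctl.d/60-nfs-keys.conf
```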

Hi Dariusz,

chicken:~# cat /proc/key-users
    0: 32 31/31 25/10000 537/200000
 3404: 4 4/4 4/200 38/20000
11254: 3 3/3 3/200 39/20000

I'm seeing the issue on both of the latest kernels, 3.13.0.36.43 and 3.13.0.35.42.

Increasing the values maxkeys and maxbytes and rebooting seems to have worked. I'll let you know if the systems change their mind.


@Maurizio Paolini: is the fact that the NFS keys *should not* count against the root_maxkeys quota documented anywhere?

I was also expecting this to be outside the quota. I did some research in the kernel code, and here is what I was able to find:

* the keyring is created with the KEY_ALLOC_NOT_IN_QUOTA flag (and the absence of the Q flag in /proc/keys confirms that); however,

* individual keys are created in nfs_idmap_request_key:
  - request_key function is called, which
  - calls the internal (not listed in any exported headers) request_key_and_link function
  - request_key_and_link is passed the *KEY_ALLOC_IN_QUOTA* flag making it an explicit call to keep the key in the quota

* nfsidmap executable from the nfs-utils package
  - uses the keyctl_instantiate function. Its 4th parameter is a keyring ID, which this tool always passes as 0.
  - I believe the control later enters kernel space in the keyctl_instantiate_key function via the syscall interface.
  - this keyring ID is then resolved to an actual in-kernel keyring, and the key being mapped (through the nfsidmap command line) is linked to that keyring. Obviously, if 0 is always passed, this never happens. However, patching this (locally) did not cause any change in the quota behaviour.

Summary:
According to the current implementation (and my understanding of it), the current behaviour is expected. I would be very interested in any opinions on this.
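The Q flag discussed above can be checked directly. As a hedged sketch (hypothetical function name), assuming the /proc/keys layout shown later in this thread: the 4th character of the flags column is Q for keys charged to a quota, and the type column is cut to 9 characters, so "id_resolver" appears as "id_resolv". It counts NFS idmap keys that are being charged to a quota:

```shell
#!/bin/bash
# Counts id_resolver/id_legacy keys in /proc/keys-style input whose flags
# column has Q in position 4 (key counted against its owner's quota).
count_idmap_keys_in_quota() {
  awk '($8 == "id_resolv" || $8 == "id_legacy") && substr($2, 4, 1) == "Q" { n++ }
       END { print n + 0 }'
}

# Usage: count_idmap_keys_in_quota < /proc/keys
```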


Still seeing the issue after increasing maxkeys and maxbytes as specified in the last comment.

Dariusz Gadomski (dgadomski) wrote :

Hello William,

Regarding your comment #20: when you observe the issue, do the contents of /proc/key-users differ from the ones you posted in comment #19?

I am trying to determine whether the quota is somehow being hit on your machine when you observe the issue, or whether it is caused by some other error.

Thank you.

Here's an example of a broken machine with the increased maxkeys and maxbytes. It doesn't look different to me from the one in #19.

# cat /proc/key-users
    0: 23 22/22 16/10000 322/200000
11254: 5 5/5 5/10000 49/200000

We also tried installing the 3.16.3-031603-generic #201409171435 kernel and rebooting, but the issue persists.

Changed in nfs-utils (Debian):
status: Unknown → Incomplete
Dariusz Gadomski (dgadomski) wrote :

William,

I'm sorry, but I cannot reproduce this issue with the kernels you mentioned.

Could you provide a sosreport from the affected system?

You could also try adding the -vvv option in your /etc/request-key.d/id_resolver.conf so that it looks like this:
create id_resolver * * /usr/sbin/nfsidmap -vvv -t 600 %k %d

and then list the problematic directory. More verbose output should appear either in dmesg or in the syslog.
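That edit can be scripted. A minimal sketch, assuming the stock line shown above and the /usr/sbin/nfsidmap path; the function name add_nfsidmap_verbose is hypothetical. Note that -t 600 in that line is nfsidmap's key expiration timeout in seconds, which matches the 10-minute expiry discussed in this bug.

```shell
#!/bin/bash
# Inserts -vvv after the nfsidmap program path in request-key.d lines,
# leaving lines that already contain "nfsidmap -vvv " untouched.
add_nfsidmap_verbose() {
  sed '/nfsidmap -vvv /!s|/usr/sbin/nfsidmap |/usr/sbin/nfsidmap -vvv |'
}

# Usage (as root, after taking a backup):
#   add_nfsidmap_verbose < /etc/request-key.d/id_resolver.conf > /tmp/id_resolver.conf
#   cp /tmp/id_resolver.conf /etc/request-key.d/id_resolver.conf
```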

Thank you,
Dariusz

Dave Chiluk (chiluk) on 2014-09-24
Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Carl Hetherington (cth-carlh) wrote :

I have poked at this a bit. On my system, running this:

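#!/bin/bash
# Once a second, create a file and check that ls still shows the expected owner.
while true; do
  touch foo
  # grep -v outputs the ls line only if it does NOT contain the username
  bad=$(ls -lh foo | grep -vF c.hetherington)
  if [ -n "$bad" ]; then
    echo "OOPS"
    echo "$bad"
  fi
  sleep 1s
  rm foo
done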

prints OOPS exactly 10 minutes after the first resolution of my username (c.hetherington) to my uid (10000). When this happens, -2 is returned as the uid/gid of the test file.
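The odd-looking 4294967294 is simply -2 viewed as an unsigned 32-bit ID: 2^32 - 2 = 4294967294. A one-line bash check (to_u32 is a hypothetical helper name):

```shell
#!/bin/bash
# Reinterpret a signed value as an unsigned 32-bit ID, as ls displays it.
to_u32() {
  echo $(( $1 & 0xFFFFFFFF ))
}

to_u32 -2   # prints 4294967294
```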

As far as I can see:

nfs_map_name_to_uid() returns -2 in *uid; it calls
nfs_idmap_lookup_id() which fails because it calls
nfs_idmap_get_key() which fails because it calls
nfs_idmap_request_key() which fails because it calls
request_key_with_auxdata() which fails because it calls
wait_for_key_construction() which fails because
key_validate() returns EKEYEXPIRED.

At some point subsequently, a new call to nfs_map_name_to_uid ends up calling /sbin/request-key after which everything is ok again.

I'm printk()ing the kernel and testing here so let me know if there's anything useful I can try.

Carl Hetherington (cth-carlh) wrote :

The attached patch is a hack (to Ubuntu's 3.13.0 as shipped with 14.04) which seems to help here. I am no kernel developer, but maybe it will help to describe the problem and suggest a proper solution.

The attachment "0001-Invalidate-expired-keys-when-they-are-requested-in-o.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Dave Chiluk (chiluk) on 2014-09-29
Changed in linux (Ubuntu Utopic):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu Trusty):
status: New → Won't Fix
importance: Undecided → Low
Dave Chiluk (chiluk) wrote :

The Ubuntu kernel uses the same defaults as the upstream kernel for these values. They are tunable precisely for this kind of case.

I brought this case up with the Ubuntu kernel team. Unfortunately, because higher limits could potentially be exploited in a memory-exhaustion denial-of-service attack, we will not be changing the default values. That said, if the mainline kernel changes its defaults, we would definitely consider following mainline. For most machines raising these default values isn't an issue; however, since Ubuntu is so prevalent in virtualized environments where memory is more restricted, we will not be changing these values.

If you feel strongly that these values need to be changed, please pursue this with the mainline Linux maintainers.

Dave Chiluk (chiluk) wrote :

@Carl Hetherington

Your patch is interesting. Please submit it to the mainline kernel, and to stable if you feel it deserves to go into stable. Once it hits stable it will then likely get picked up by the Ubuntu 3.13 kernel.

Carl Hetherington (cth-carlh) wrote :

Actually, I think this patch is a bit less invasive. I'll submit to the mainline kernel list and pick up my fire extinguisher ;)

Bryan Quigley (bryanquigley) wrote :

nfs_patch2.patch works for me with a setup of ~27000 home directories. Thanks! Please do link to the LKML post if you can (it might take a few days to appear).

Carl Hetherington (cth-carlh) wrote :

Hi Bryan, I'm glad it's working, thanks for the report. No response on LKML yet; here's the message:

https://lkml.org/lkml/2014/9/30/435

Bryan Quigley (bryanquigley) wrote :

For anyone following at home:
http://www.spinics.net/lists/linux-nfs/msg47185.html

@Carl, for the future, it's probably better to use https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/, at least when you're pushing upstream. For a possible SRU it also helps a lot to know that it works on the trusty branch too :). Thanks for all your work!

Michael (m123) wrote :

I am not entirely sure this is 100% related to this bug, but let me tell my story here (it contains another workaround):

I was also experiencing the problem of frequently having my files owned by 4294967294.
Setup is Ubuntu 14.04 with automounted nfs4/kerberos homes, the NFS server is running Debian Wheezy.

The problem did not exist with the previously used Ubuntu 13.10, so I began investigating and tried almost everything I found (most of which is documented here), ranging from setting sysctl values to installing the kernel patch posted here.

However, nothing helped, so I decided to debug via /proc/keys:

While I still had the problem, /proc/keys (as seen by root) showed keys like this:

0094f999 I--Q--- 1 15s 3b010000 0 0 id_legacy uid:user@fqdn: 5

Noteworthy is the remaining time of 15 seconds; shortly thereafter the problem occurred and /proc/keys looked like this:

0094f999 I--Q--- 1 expd 3b010000 0 0 id_legacy uid:user@fqdn: 5

The key was "expired" and there was no new one in the list.
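To spot this state without reading /proc/keys by eye, a small filter can list idmap keys whose timeout column reads "expd". This is a sketch; the function name expired_idmap_keys is hypothetical, and it assumes the column layout in the lines above (serial, flags, usage, timeout, permissions, uid, gid, type, description followed by ": " and the payload size).

```shell
#!/bin/bash
# Prints serial and description of id_resolver/id_legacy keys whose
# timeout column (field 4) is "expd", i.e. expired but still listed.
expired_idmap_keys() {
  awk '($8 == "id_resolv" || $8 == "id_legacy") && $4 == "expd" {
    desc = $9
    sub(/:$/, "", desc)   # drop the ":" separating description and payload size
    print $1, desc
  }'
}

# Usage: expired_idmap_keys < /proc/keys
```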
So I ran "nfsidmap -v -c" (which repaired the situation every time I tried it) and voilà:

5482b3a I--Q--- 1 9m 3b010000 0 0 id_legacy uid:user@fqdn: 5

I had a fresh key with a lifetime of ~10 minutes. But listen up, here comes the final workaround, which has "fixed" the problem for about 3 or 4 days now:

 # apt-get install keyutils
 # restart idmapd
 # nfsidmap -v -c

And now the keys no longer expire:

2014218e I--Q--- 1 perm 3b010000 0 0 id_resolv uid:user@fqdn: 5

As already mentioned, this has been working for several days now without any issues. By the way, my stress test to check this is:

 somedir$ for i in $(seq 100000); do touch $i;sleep 0.2;done
 somedir$ while (true); do ls -lR | grep 4294967294;done

I still do not know exactly why installing keyutils has solved the issue or why this package was not previously installed as a dependency, but hey, it is a workaround at least for me and maybe others.

Carl Hetherington (cth-carlh) wrote :

Hi Michael,

Thanks... installing keyutils seems to work for me too (without the kernel patch). I haven't investigated too closely, but it looks like the two fixes are sort-of equivalent. The userspace fix is far more appealing, though!

Bryan Quigley (bryanquigley) wrote :

Interesting... keyutils doesn't seem to help in my case. I'm running ls on the home directory with ~27000 user accounts.

I don't understand why this would help... all nfsidmap would do is clear it once, and then it can fill up again/expire again.

Carl Hetherington (cth-carlh) wrote :

Bryan: AFAICS the thing is that keyutils changes things so that the id_resolv uid:user@fqdn keys never expire. Without it, they expire after 10 minutes, and that triggers the bug which my kernel patch "fixes".

Bryan Quigley (bryanquigley) wrote :

@carlh
Ah, your kernel patch also fixes the case where the key cache gets filled (which is my issue).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu Trusty):
status: New → Confirmed
Carl Hetherington (cth-carlh) wrote :

I think this patch:
http://article.gmane.org/gmane.linux.nfs/67156
is another fix for this bug. I'm sure it is more elegant than mine. @Bryan: perhaps you could test it?

Bryan Quigley (bryanquigley) wrote :

Works for the original case, except that nogroup now returns 4294967294. I will ping the list with the results.


*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

David, can you look at comment #35 and comment #36 and weigh in? Are NFS keys to be counted towards the quota?

Changed in nfs-utils (Debian):
status: Incomplete → Confirmed

Just for information: in Fedora 21 the default value for kernel.keys.maxkeys
seems to have been increased to 1000000, with no entry in sysctl.conf; the
"maxbytes" default is also very large.

Manually lowering that quota still exposes the problem. I have no idea whether
the keys count against the root_maxkeys quota on purpose or not.

[I changed the fedora release for this bug report to 21]

Changed in nfs-utils (Debian):
status: Confirmed → Fix Released
Chris J Arges (arges) on 2015-03-26
Changed in linux (Ubuntu Utopic):
status: Won't Fix → In Progress
Changed in linux (Ubuntu Trusty):
status: Won't Fix → In Progress
description: updated
tags: added: cts
Chris J Arges (arges) on 2015-03-26
no longer affects: nfs-utils (Ubuntu)
no longer affects: nfs-utils (Ubuntu Trusty)
no longer affects: nfs-utils (Ubuntu Utopic)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Dariusz Gadomski (dgadomski)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Dariusz Gadomski (dgadomski)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Andy Whitcroft (apw) on 2015-04-01
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Brad Figg (brad-figg) on 2015-04-17
tags: added: verification-needed-trusty
tags: added: verification-needed-utopic
Brad Figg (brad-figg) on 2015-04-29
tags: added: verification-done-trusty verification-done-utopic
removed: verification-needed-trusty verification-needed-utopic
Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in nfs-utils (Debian):
status: Fix Released → Confirmed

This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 21 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Changed in fedora:
importance: Unknown → Critical
status: Unknown → Won't Fix