open(2) returns EOVERFLOW within tmpfs+userns

Bug #1659087 reported by Jonathan Calmels
This bug affects 2 people
Affects                 Status     Importance  Assigned to  Milestone
linux (Ubuntu)          Confirmed  High        Unassigned
  Xenial                Confirmed  High        Unassigned

Bug Description

On Ubuntu 4.4.0-59.80-generic 4.4.35, open(2) returns EOVERFLOW when creating a file on a tmpfs mounted inside a user namespace.

This issue wasn't present in 4.4.0-47 and has probably been introduced by https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1634964

Steps to reproduce:

$ unshare -r -U -m /bin/bash
# mount -t tmpfs tmpfs /mnt
# echo $$
2354

In another terminal:

$ sudo nsenter -t 2354 -m
# touch /mnt/foo
touch: cannot touch '/mnt/foo': Value too large for defined data type

Note that we do not join the user namespace when creating the file, but we would still expect `touch' to succeed and create a file whose inode owner is INVALID_UID/INVALID_GID (i.e. shown as nobody:nogroup) within the mount namespace.

Tags: xenial
Jonathan Calmels (3xx0)
summary: - open(2) returns EOVERFLOW with tmpfs+userns
+ open(2) returns EOVERFLOW within tmpfs+userns
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Seth Forshee (sforshee) wrote :

IIRC, as of 4.8 what you're seeing is upstream behavior, and yes, it was backported to xenial in the series you referenced.

Even if the inode is created with INVALID_UID/INVALID_GID, you aren't going to be able to do anything with it. So I guess the question is why you need to be able to do that and whether you can accomplish it some other way. If not, then the behavior would need to change upstream; even if we fix it in xenial 4.4 kernels, you'll probably just hit it again later.

Note that you don't actually need to enter the user namespace to create the file; you just need your fsuid/fsgid to be IDs that have a mapping in the user ns.
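As a minimal sketch of the condition described above (assuming a POSIX shell and that the target namespace's map is readable at /proc/PID/uid_map or /proc/PID/gid_map), the hypothetical helper `ns_maps_host_id` below checks whether a host ID falls inside one of the extents of a user namespace's ID map. Each map line gives the first ID inside the namespace, the first ID outside it, and the length of the range.

```sh
# Hypothetical helper (not part of any tool): exit 0 if HOST_ID lies inside
# one of the extents of a uid_map/gid_map file, whose lines have the form
#   <first ID inside the ns> <first ID outside the ns> <length>
ns_maps_host_id() {
    local map_file="$1" host_id="$2" inside outside count
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while read -r inside outside count || [ -n "$inside" ]; do
        [ -n "$count" ] || continue   # skip blank or malformed lines
        # This extent covers host IDs outside .. outside+count-1.
        if [ "$host_id" -ge "$outside" ] && [ "$host_id" -lt $((outside + count)) ]; then
            return 0
        fi
    done < "$map_file"
    return 1
}
```

With the reproducer above, `unshare -r` run as uid 1000 writes a single extent `0 1000 1`, so `ns_maps_host_id /proc/2354/uid_map 0` fails: the fsuid 0 that `sudo nsenter` runs with has no mapping in that namespace, which is consistent with the EOVERFLOW from `touch`.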

Jonathan Calmels (3xx0) wrote :

Interesting, I tried to reproduce with a Debian stretch 4.8.0-2-amd64 (Debian 4.8.15-2) and 4.9.0-1-amd64 (Debian 4.9.2-2) and didn't hit the error.

This seems to be specific to tmpfs, though; touching the file anywhere else yields the nobody:nogroup mapping.

Regarding my use case, I'm creating the file in order to bind-mount on top of it. Container runtimes usually expose host files (e.g. character devices) inside containers in a similar way.

Seth Forshee (sforshee) wrote : Re: [Bug 1659087] Re: open(2) returns EOVERFLOW within tmpfs+userns

On Tue, Jan 24, 2017 at 09:37:18PM -0000, Jonathan Calmels wrote:
> Interesting, I tried to reproduce with a Debian stretch 4.8.0-2-amd64
> (Debian 4.8.15-2) and 4.9.0-1-amd64 (Debian 4.9.2-2) and didn't hit the
> error.

Odd. The commit responsible for this should be 036d523641c6 "vfs: Don't
create inodes with a uid or gid unknown to the vfs" which did go into
4.8. I'll have to take another look.

> This seems to be specific to tmpfs though, touching the file anywhere
> else yields the nobody:nogroup mapping.

It would happen for any filesystem which was mounted from within the
user namespace.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key xenial
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
status: New → Confirmed
Jonathan Calmels (3xx0) wrote :

I can't reproduce the issue on Fedora 4.9.5-200.fc25.x86_64 either.
As you pointed out, setting the fsuid/fsgid to IDs that have a mapping in the user namespace does work.

tags: removed: kernel-da-key
Seth Forshee (sforshee) wrote :

I've been in communication with the upstream namespace maintainer, and the intention was certainly that what you're doing should fail. However, there was an oversight upstream that missed the O_CREAT case. Due to some differences in 4.4, the backport did cover that case.

As I mentioned above, you can still do what you're trying to do; you just need to make sure your process's fsuid/fsgid are mapped into the container's user namespace before creating the file.
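One way to pick suitable IDs, sketched here under the assumption that the caller wants the file owned by the container's root, is to translate namespace ID 0 back to a host ID through the same /proc/PID/uid_map and /proc/PID/gid_map files. The helper name `ns_id_to_host_id` is hypothetical.

```sh
# Hypothetical helper (not part of any tool): print the host ID that NS_ID
# maps to according to a uid_map/gid_map file; return 1 if it is unmapped.
# Map lines: <first ID inside the ns> <first ID outside the ns> <length>
ns_id_to_host_id() {
    local map_file="$1" ns_id="$2" inside outside count
    while read -r inside outside count || [ -n "$inside" ]; do
        [ -n "$count" ] || continue   # skip blank or malformed lines
        # This extent covers namespace IDs inside .. inside+count-1.
        if [ "$ns_id" -ge "$inside" ] && [ "$ns_id" -lt $((inside + count)) ]; then
            echo $((outside + ns_id - inside))
            return 0
        fi
    done < "$map_file"
    return 1
}
```

For the reproducer above, `ns_id_to_host_id /proc/2354/uid_map 0` would print 1000, the value passed to `nsenter -S 1000 -G 1000` in the next comment. As that comment shows, having mapped IDs does not by itself grant the container root's capabilities.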

Jonathan Calmels (3xx0) wrote :

Glad you figured it out. I suspected this had something to do with the may_create/may_o_create code paths but couldn't wrap my head around it. Hopefully this will be addressed upstream in the near future.

This change in behavior is surprising, though; maybe it should be documented in user_namespaces(7).

Jonathan Calmels (3xx0) wrote :

Trying the fsuid/fsgid workaround, I came across another oddity:

$ id -u
1000
$ id -g
1000
$ unshare -r -U -m /bin/bash
# mount -t tmpfs tmpfs /mnt
# chmod 555 /mnt
# ls -ldn /mnt
dr-xr-xr-x 2 0 0 40 Jan 26 14:15 /mnt
# echo $$
2354

In another terminal:

$ sudo nsenter -G 1000 -S 1000 -t 2354 -m
$ ls -ldn /mnt
dr-xr-xr-x 2 1000 1000 40 Jan 26 14:10 /mnt
$ touch /mnt/foo
touch: cannot touch '/mnt/foo': Permission denied

Even though I'm supposed to be root in the context of the user namespace, I can't create the file because I lack write permission on the mount point directory.
In this case, setting the fsuid/fsgid is not sufficient; I have to join the user namespace if I want the permissions to be resolved correctly.

Jonathan Calmels (3xx0) wrote :

Thinking more about it, this might be due to the lack of CAP_DAC_OVERRIDE, so I guess I need that too, right?

Seth Forshee (sforshee) wrote :

On Thu, Jan 26, 2017 at 10:42:44PM -0000, Jonathan Calmels wrote:
> Thinking more about it, this might be due to the lack of
> CAP_DAC_OVERRIDE so I guess I need this too right?

Yes, since you removed write permissions from the directory you're not
able to write to it without capabilities.
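To check which case applies, a process's effective capability set can be read from the `CapEff:` line of /proc/PID/status, a 64-bit hex mask in which CAP_DAC_OVERRIDE is bit 1. A minimal sketch follows; the helper name `has_cap_dac_override` is hypothetical. These capabilities are relative to the process's own user namespace: a shell entered with `nsenter -m -S 1000 -G 1000` (without `-U`) stays in the initial user namespace and, having switched away from uid 0, would normally show an empty mask.

```sh
# Hypothetical helper (not part of any tool): exit 0 if the CapEff mask in
# the given /proc/PID/status file has CAP_DAC_OVERRIDE set, 1 if it does
# not, and 2 if no CapEff line was found.
has_cap_dac_override() {
    local status_file="$1" cap_eff
    # CapEff is printed as a hex mask such as 000001ffffffffff.
    cap_eff=$(awk '/^CapEff:/ { print $2 }' "$status_file")
    [ -n "$cap_eff" ] || return 2
    # CAP_DAC_OVERRIDE is capability number 1 in <linux/capability.h>.
    [ $(( (0x$cap_eff >> 1) & 1 )) -eq 1 ]
}
```

For example, `has_cap_dac_override /proc/2354/status` inspects the shell started by `unshare -r`, whose capabilities apply within the new user namespace.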
