consistent failure with overlayfs and unix sockets

Bug #1214500 reported by Sidnei da Silva on 2013-08-20
This bug affects 9 people
Affects: linux (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

Overlayfs and Unix sockets do not seem to work well together. The failure may be racy: it fails consistently on my laptop with an SSD, whereas for smoser it failed only once.

Steps to reproduce:

$ REL="precise"
$ sudo lxc-create -n source-$REL-amd64 -t ubuntu-cloud -- \
   --release=$REL --arch=amd64

## clone via overlayfs ##
$ sudo lxc-clone --snapshot -B overlayfs -o source-$REL-amd64 -n $REL-overlayfs-01

$ sudo lxc-start -n $REL-overlayfs-01

### inside ###
$ sudo apt-get update && sudo apt-get install supervisor -y

$ sudo service supervisor stop
$ sudo sed -i.dist 's,var/run/*supervisor.sock,srv/supervisor.sock,' /etc/supervisor/supervisord.conf
$ sudo service supervisor start
$ sudo supervisorctl maintail
unix:///srv/supervisor.sock refused connection

Sidnei da Silva (sidnei) wrote :

Host:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Saucy Salamander (development branch)
Release: 13.10
Codename: saucy

Linux sidnei-laptop 3.11.0-3-generic #6-Ubuntu SMP Mon Aug 19 14:42:35 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ apt-cache policy lxc
lxc:
  Installed: 0.9.0.0~staging~20130819-1713-0ubuntu1~ppa1~saucy1
  Candidate: 0.9.0.0~staging~20130819-1713-0ubuntu1~ppa1~saucy1
  Version table:
 *** 0.9.0.0~staging~20130819-1713-0ubuntu1~ppa1~saucy1 0
        500 http://ppa.launchpad.net/ubuntu-lxc/daily/ubuntu/ saucy/main amd64 Packages
        100 /var/lib/dpkg/status
     0.9.0-0ubuntu22 0
        500 http://br.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages

Guest:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise

Scott Moser (smoser) wrote :

Attaching 'serv.py' which mimics behavior of supervisor/http.py in supervisor for setting up the socket, and then implements an 'echo' server. client.py is just a simple client to run and show the problem.

One would expect to be able to do this:
$ ./serv.py /tmp/my-socket &
$ ./client.py /tmp/my-socket
Received 'Hello, world'

The behavior seems to differ between precise and saucy kernels.
One thing to note is that on precise kernel and overlayfs, I see:
$ sudo rm -f /tmp/my-socket

$ ./serv.py /tmp/my-socket
Traceback (most recent call last):
  File "./serv.py", line 22, in <module>
    os.link(tempname, socketname)
OSError: [Errno 1] Operation not permitted

$ sudo rm -f /tmp/my-socket
$ sudo ./serv.py /tmp/my-socket &
$ sudo ./client.py /tmp/my-socket
Received 'Hello, world'

I.e., this works as root, but not as a non-root user.
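For readers without the attachments, the pattern described above can be sketched roughly as follows. This is not the attached serv.py or client.py; it is a minimal Python 3 reconstruction that assumes the server binds the socket under a temporary name and then hard-links it into place, as the `os.link(tempname, socketname)` line in the traceback suggests. The function names and the temporary-name scheme are illustrative.

```python
import os
import socket
import tempfile
import threading


def bind_via_hardlink(socketname):
    """Bind a Unix socket under a temporary name, then hard-link it into place."""
    tempname = "%s.%d" % (socketname, os.getpid())  # assumed naming scheme
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(tempname)
    try:
        # This is the step that raised EPERM for a non-root user above.
        os.link(tempname, socketname)
    except OSError:
        sock.close()
        raise
    finally:
        os.unlink(tempname)
    sock.listen(1)
    return sock


def echo_once(server):
    """Accept one connection and send back whatever it sent."""
    conn, _ = server.accept()
    try:
        conn.sendall(conn.recv(1024))
    finally:
        conn.close()


def client(socketname, message=b"Hello, world"):
    """Connect through the hard-linked name and return the echoed reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(socketname)  # fails with ECONNREFUSED on affected kernels
        s.sendall(message)
        return s.recv(1024)
    finally:
        s.close()


def demo(directory=None):
    """Run server and client against a socket in `directory` (temp dir by default)."""
    directory = directory or tempfile.mkdtemp()
    path = os.path.join(directory, "my-socket")
    server = bind_via_hardlink(path)
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()
    try:
        return client(path)
    finally:
        worker.join()
        server.close()
        os.unlink(path)
```

Calling `demo("/some/overlayfs/dir")` on a directory inside an overlayfs mount exercises the same bind, link, connect sequence; on an unaffected filesystem it should return the echoed message.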

Scott Moser (smoser) wrote :

I added 'linux' as this is, at the very least, a change in behavior between the 12.04 kernel and the saucy kernel.

Serge Hallyn (serge-hallyn) wrote :

Regarding creating the socket as a non-root user - did you chown the directory to your user's uid? I have no problem once I've done that in running the server.

I have the same issue with the client:

sudo mkdir /mnt2 /upper
sudo mount -t overlayfs -o lowerdir=/mnt2,upperdir=/upper none /mnt
sudo chown -R ubuntu: /mnt
./serv.py /mnt/sock &
./client.py /mnt/sock

Traceback (most recent call last):
  File "./client.py", line 7, in <module>
    s.connect(socketname)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1214500

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Serge Hallyn (serge-hallyn) wrote :

Oh, never mind, I see: that's on precise. And yes, I get the same result all around.

Scott Moser (smoser) wrote :

One simple test for which I see differing results depending on the kernel is 'mkfifo foo && ln foo bar'.
Here's a test script that demonstrates it, with output on precise, raring, and saucy (no containers used).

== raring: 3.8.0-27-generic ==
$ ls -ld .
drwxrwxrwt 5 root root 4096 Aug 21 19:16 .
$ sudo rm -Rf lower upper mp
$ mkdir lower upper mp
$ sh -c 'echo hi > lower/f1.txt'
$ sudo mount -t overlayfs -o lowerdir=/tmp/lower,upperdir=/tmp/upper overlay /tmp/mp
$ sh -c 'cd mp && touch foo && ln foo bar'
file link: user: PASS
$ sudo sh -c 'cd mp && touch foo && ln foo bar'
file link: root: PASS
$ sh -c 'cd mp && mkfifo foo && ln foo bar'
fifo link: user: PASS
$ sudo sh -c 'cd mp && mkfifo foo && ln foo bar'
fifo link: user: PASS

== saucy: 3.11.0-2-generic ==
$ ls -ld .
drwxrwxrwt 17 root root 32768 Aug 21 15:17 .
$ sudo rm -Rf lower upper mp
[sudo] password for smoser:
$ mkdir lower upper mp
$ sh -c 'echo hi > lower/f1.txt'
$ sudo mount -t overlayfs -o lowerdir=/tmp/lower,upperdir=/tmp/upper overlay /tmp/mp
$ sh -c 'cd mp && touch foo && ln foo bar'
file link: user: PASS
$ sudo sh -c 'cd mp && touch foo && ln foo bar'
file link: root: PASS
$ sh -c 'cd mp && mkfifo foo && ln foo bar'
fifo link: user: PASS
$ sudo sh -c 'cd mp && mkfifo foo && ln foo bar'
fifo link: user: PASS

== precise: 3.2.0-51-virtual ==
$ ls -ld .
drwxrwxrwt 8 root root 4096 Aug 21 19:18 .
$ sudo rm -Rf lower upper mp
$ mkdir lower upper mp
$ sh -c 'echo hi > lower/f1.txt'
$ sudo mount -t overlayfs -o lowerdir=/tmp/lower,upperdir=/tmp/upper overlay /tmp/mp
$ sh -c 'cd mp && touch foo && ln foo bar'
file link: user: PASS
$ sudo sh -c 'cd mp && touch foo && ln foo bar'
file link: root: PASS
$ sh -c 'cd mp && mkfifo foo && ln foo bar'
ln: failed to create hard link `bar' => `foo': Operation not permitted
fifo link: user: FAIL[1]
$ sudo sh -c 'cd mp && mkfifo foo && ln foo bar'
fifo link: user: PASS
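The test script itself is not reproduced in this thread. A rough Python 3 equivalent of the two checks (hard-linking a regular file and hard-linking a FIFO inside a given directory) might look like the sketch below. The function name, the return strings, and the use of errno in the failure label are assumptions for illustration, not the original script's behavior.

```python
import os
import tempfile


def link_test(directory, kind):
    """Create `foo` in `directory` as a regular file or FIFO, then hard-link it to `bar`.

    Returns "PASS" if the link succeeds, or "FAIL[errno N]" if os.link raises.
    """
    foo = os.path.join(directory, "foo")
    bar = os.path.join(directory, "bar")
    # Start clean so the test can be repeated in the same directory.
    for path in (foo, bar):
        if os.path.lexists(path):
            os.unlink(path)
    if kind == "fifo":
        os.mkfifo(foo)
    else:
        open(foo, "w").close()
    try:
        os.link(foo, bar)
    except OSError as exc:
        return "FAIL[errno %d]" % exc.errno
    return "PASS"


def run_all(directory=None):
    """Run both checks and return a dict such as {"file": "PASS", "fifo": "PASS"}."""
    directory = directory or tempfile.mkdtemp()
    return {kind: link_test(directory, kind) for kind in ("file", "fifo")}
```

Pointing `run_all()` at the overlayfs mount point (for example `/tmp/mp` from the transcripts above) as both a regular user and root would give a comparison similar to the per-kernel output shown.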

no longer affects: lxc (Ubuntu)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High

Derek Simkowiak (derek-x) wrote :

Still there in 14.04.

Please be advised, this bug breaks the SSH ControlMaster feature, which in turn breaks Android builds.

I am trying to use LXC containers (snapshot clones using overlayfs) to build Android (12.04 container on a 14.04 host). Android's build utility "repo" is a Python script that uses SSH with an SSH ControlMaster (a shared SSH session multiplexed over a Unix domain socket). It creates a master connection like this:

  ssh -M -N -p 29422 -o ControlPath=/tmp/ssh-ujjH6W/master-%r@%h:%p my-upstream-android.mirror.com

...and then it reuses that connection to do a git pull (from gerrit) for 100+ git repos (called "projects" in Android lingo).

Since Unix sockets don't work on overlayfs, running "./repo sync" results in a bunch of errors like this:

Control socket connect(/<email address hidden>): Connection refused
Control socket connect(/<email address hidden>): Connection refused
Control socket connect(/<email address hidden>): Connection refused
...etc...
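When chasing this kind of "Connection refused" message, it can help to separate three cases: the socket path does not exist, the path exists but no listener is reachable through it, or a listener accepts the connection. The following is a minimal Python 3 sketch of such a probe; the function names and return strings are illustrative and not part of ssh or repo. On an affected overlayfs mount, a control socket whose master is still running would be expected to report "refused".

```python
import errno
import os
import socket
import tempfile


def probe_unix_socket(path):
    """Try to connect to a Unix stream socket at `path` and classify the outcome."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "missing"      # no file at that path
        if exc.errno == errno.ECONNREFUSED:
            return "refused"      # file exists, but no listener found behind it
        raise
    finally:
        s.close()
    return "listening"


def demo_states():
    """Show the three outcomes on an ordinary temporary directory."""
    path = os.path.join(tempfile.mkdtemp(), "sock")
    before = probe_unix_socket(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    during = probe_unix_socket(path)
    server.close()  # the socket file stays behind, but nothing listens on it
    after = probe_unix_socket(path)
    os.unlink(path)
    return before, during, after
```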

This error message was hard to diagnose and find. I had to know to look for "LXC Unix Sockets", specifically leaving out the terms Android, repo, ssh, etc. Even then it was the fifth Google result down.

Derek Simkowiak (derek-x) wrote :

There is a similar issue with bindfs (SSH fails with the same error when using ControlMaster). Maybe it's related. See:

http://code.google.com/p/bindfs/issues/detail?id=27

Tomasz Cholewa (slashroot) wrote :

Hi guys,

I've been struggling with a similar issue on CentOS 6.6. In the previous version (6.5) everything worked fine, but after the update I wasn't able to set up a socket for the SSH control master on overlayfs. I did some tedious research and comparisons, and it turned out that the cause is not the kernel (I tried a few downgrades) but ssh.
According to https://bugzilla.redhat.com/show_bug.cgi?id=953088, the ssh version included in RHEL 6.6 (and CentOS) carries a patch which modifies how the control socket is created. When I downgraded the ssh packages (openssh{,-clients,-server}) to the version without that patch (5.3p1-94.el6), everything worked like a charm!

I hope it will also help you to solve your problem in Ubuntu.

Ryan Lane (rlane32) wrote :

Looks like this is fixed at least in 16.04 via: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1607404

The change in the kernel was: https://github.com/torvalds/linux/commit/30402c8949934fbaca07d9c20074d0d7a5a8385f

In the release notes in #1607404 the following are listed, which seem to match the fix:

    - vfs: add d_real_inode() helper
    - af_unix: fix hard linked sockets on overlay
