hostnamectl fails under lxd unpriv container

Bug #1575779 reported by Ryan Harper
This bug affects 26 people
Affects            Status     Importance  Assigned to  Milestone
apparmor (Ubuntu)  Confirmed  Critical    Unassigned

Bug Description

1. % lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

2. % apt-cache policy apparmor
apparmor:
  Installed: 2.10.95-0ubuntu2
  Candidate: 2.10.95-0ubuntu2
  Version table:
 *** 2.10.95-0ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status
% apt-cache policy lxd
lxd:
  Installed: 2.0.0-0ubuntu4
  Candidate: 2.0.0-0ubuntu4
  Version table:
 *** 2.0.0-0ubuntu4 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status

3. What I expected to happen: in a new container, hostnamectl status returns promptly.

    lxc launch ubuntu-daily:xenial x1
    lxc exec x1 /bin/bash

root@x1:~# hostnamectl status
   Static hostname: x1
         Icon name: computer-container
           Chassis: container
        Machine ID: 833b8548c7ce4118b4c9c5c3ae4f133d
           Boot ID: 9d5fbb053cf7494589c0863a0a4cf0ca
    Virtualization: lxc
  Operating System: Ubuntu 16.04 LTS
            Kernel: Linux 4.4.0-18-generic
      Architecture: x86-64

4. What happened instead: hostnamectl status hangs indefinitely.

On the host, each invocation of hostnamectl produces an audit message like this:

[411617.032274] audit: type=1400 audit(1461695563.731:100): apparmor="DENIED" operation="file_lock" profile="lxd-x1_</var/lib/lxd>" pid=17100 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 addr=none

It's related to socket activation. One can work around this by starting systemd-hostnamed in the background first:

root@x1:~# /lib/systemd/systemd-hostnamed &
[1] 2462
root@x1:~# hostnamectl status
   Static hostname: x1
         Icon name: computer-container
           Chassis: container
        Machine ID: 833b8548c7ce4118b4c9c5c3ae4f133d
           Boot ID: 9d5fbb053cf7494589c0863a0a4cf0ca
    Virtualization: lxc
  Operating System: Ubuntu 16.04 LTS
            Kernel: Linux 4.4.0-18-generic
      Architecture: x86-64

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: apparmor 2.10.95-0ubuntu2
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
CurrentDesktop: GNOME-Flashback:GNOME
Date: Wed Apr 27 11:19:27 2016
InstallationDate: Installed on 2016-01-01 (117 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20151209)
ProcKernelCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=e0b8b294-f364-4ef5-aa70-1916cdd37192 ro quiet splash vt.handoff=7
SourcePackage: apparmor
Syslog:

UpgradeStatus: No upgrade log present (probably fresh install)

Ryan Harper (raharper) wrote :
Tyler Hicks (tyhicks) wrote :

Thanks for the bug report. The problem is now understood. systemd is calling lockf() on an anonymous socket file and the AppArmor profile language does not support a way to grant file locking permissions on a socket that does not have a path associated with it.

The AppArmor socket file rule type needs to gain a new permission for file locking. This will require changes to the kernel and apparmor_parser and, eventually, the AppArmor Python utilities.
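To illustrate why no path-based rule can match here: an anonymous socket created with socketpair() has no filesystem name at all. The following minimal C sketch assumes Linux with /proc mounted; the helper names fd_link_target() and socketpair_is_pathless() are hypothetical and not part of systemd or AppArmor. It reads the /proc/self/fd link of one end of a socket pair, which the kernel reports as the pseudo-name socket:[<inode>] rather than as a path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: copy the /proc/self/fd link target of fd into buf
 * and NUL-terminate it. Returns 0 on success, -1 on error. */
int fd_link_target(int fd, char *buf, size_t len)
{
    char link[64];
    ssize_t n;

    assert(len > 1);
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    n = readlink(link, buf, len - 1); /* readlink() does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Hypothetical helper: returns 1 if one end of a fresh AF_UNIX datagram
 * socket pair shows up as "socket:[<inode>]" (i.e. it has no path),
 * 0 if it shows up as something else, -1 on error. */
int socketpair_is_pathless(void)
{
    int sp[2];
    char target[256];
    int result;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sp) != 0)
        return -1;
    if (fd_link_target(sp[0], target, sizeof(target)) != 0)
        result = -1;
    else
        result = strncmp(target, "socket:[", 8) == 0;
    close(sp[0]);
    close(sp[1]);
    return result;
}
```

Called from main() on Linux, socketpair_is_pathless() should return 1: the lock systemd takes is on an object like this one, so there is no path for a file rule to name.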

Changed in apparmor (Ubuntu):
status: New → Triaged
Serge Hallyn (serge-hallyn) wrote :

This is also showing up in other places, including a Java application, Maven:

https://github.com/lxc/lxc/issues/1023

Changed in apparmor (Ubuntu):
importance: Undecided → High
Dongwon Cho (dongwoncho) wrote :

chef-client calls hostnamectl, so it hangs as well when run in an LXD container.

Peter Hallen (pete-hallen) wrote :

Seeing this as well for Ansible against LXC containers.

ansible 2.2.0.0

fatal: [somehost.tld]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "name": "somehost.tld"
        },
        "module_name": "hostname"
    },
    "msg": "Command failed rc=1, out=, err=Could not set property: Activation of org.freedesktop.hostname1 timed out\n"
}

Christian Reis (kiko) wrote :

I also see this trigger with juju-deployed jenkins and jenkins-slave services against the lxd provider:

   apparmor="DENIED" operation="file_lock" profile="lxd-juju-449b90-9_</var/lib/lxd>" pid=18662 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 addr=none

Dongwon Cho (dongwoncho) wrote :

A poor workaround.

don@node02:~$ time /usr/bin/hostnamectl

real 0m25.031s
user 0m0.000s
sys 0m0.004s

don@node02:~$ sudo mv /usr/bin/hostnamectl /usr/bin/hostnamectl_bak

don@node02:~$ sudo tee /usr/bin/hostnamectl > /dev/null << 'EOF1'
> #!/bin/sh
> # Stand-in for hostnamectl: prints fixed fields plus values looked up
> # each time it runs, without contacting systemd-hostnamed.
> cat << EOF2
>    Static hostname: $(hostname)
>          Icon name: computer-server
>            Chassis: server
>         Machine ID: IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
>            Boot ID: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>   Operating System: $(lsb_release --short --description)
>             Kernel: Linux $(uname -r)
>       Architecture: $(uname -m)
> EOF2
> EOF1

don@node02:~$ sudo chmod +x /usr/bin/hostnamectl
don@node02:~$ time /usr/bin/hostnamectl
   Static hostname: node02
         Icon name: computer-server
           Chassis: server
        Machine ID: IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
           Boot ID: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
  Operating System: Ubuntu 16.04.2 LTS
            Kernel: Linux 4.4.0-83-generic
      Architecture: x86_64

real 0m0.007s
user 0m0.004s
sys 0m0.000s

Dimitri John Ledkov (xnox) wrote :

systemd-hostnamed.service in artful specifies PrivateNetwork=yes; however, this fails to set up in an unprivileged container, so systemd-hostnamed now fails to start at all:

root@test20170919:~# systemctl status systemd-hostnamed
● systemd-hostnamed.service - Hostname Service
   Loaded: loaded (/lib/systemd/system/systemd-hostnamed.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2017-10-14 23:41:54 UTC; 1min 34s ago
     Docs: man:systemd-hostnamed.service(8)
           man:hostname(5)
           man:machine-info(5)
           https://www.freedesktop.org/wiki/Software/systemd/hostnamed
  Process: 1245 ExecStart=/lib/systemd/systemd-hostnamed (code=exited, status=225/NETWORK)
 Main PID: 1245 (code=exited, status=225/NETWORK)
      CPU: 909us

Oct 14 23:41:54 test20170919 systemd[1]: systemd-hostnamed.service: Failed to set invocation ID on control group /system.slice/systemd-hostnamed.service, ignoring: Operation not permitted
Oct 14 23:41:54 test20170919 systemd[1]: Starting Hostname Service...
Oct 14 23:41:54 test20170919 systemd[1]: systemd-hostnamed.service: Main process exited, code=exited, status=225/NETWORK
Oct 14 23:41:54 test20170919 systemd[1]: Failed to start Hostname Service.
Oct 14 23:41:54 test20170919 systemd[1]: systemd-hostnamed.service: Unit entered failed state.
Oct 14 23:41:54 test20170919 systemd[1]: systemd-hostnamed.service: Failed with result 'exit-code'.

Not sure how to get this fixed.

tags: added: rls-bb-incoming
Ryan Harper (raharper) wrote :

Likely related, but in Artful systemd-networkd is setting the hostname and has a 10 second timeout:

# systemctl status --no-pager -l systemd-networkd
● systemd-networkd.service - Network Service
   Loaded: loaded (/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-11-09 15:20:37 UTC; 1h 7min ago
     Docs: man:systemd-networkd.service(8)
 Main PID: 146 (systemd-network)
   Status: "Processing requests..."
    Tasks: 1 (limit: 4915)
   Memory: 1.4M
      CPU: 32ms
   CGroup: /system.slice/systemd-networkd.service
           └─146 /lib/systemd/systemd-networkd

Nov 09 15:20:37 a2 systemd[1]: systemd-networkd.service: Failed to set invocation ID on control group /system.slice/systemd-networkd.service, ignoring: Operation not permitted
Nov 09 15:20:37 a2 systemd[1]: Starting Network Service...
Nov 09 15:20:37 a2 systemd-networkd[146]: eth0: Gained IPv6LL
Nov 09 15:20:37 a2 systemd-networkd[146]: Enumeration completed
Nov 09 15:20:37 a2 systemd[1]: Started Network Service.
Nov 09 15:20:40 a2 systemd-networkd[146]: eth0: DHCPv4 address 10.245.119.172/24 via 10.245.119.1
Nov 09 15:20:40 a2 systemd-networkd[146]: Not connected to system bus, ignoring transient hostname.
Nov 09 15:20:49 a2 systemd-networkd[146]: eth0: Configured
Nov 09 15:21:18 a2 systemd-networkd[146]: Could not set hostname: Method call timed out

Stéphane Graber (stgraber) wrote :

Someone with systemd knowledge should check what PrivateNetwork actually does. The name implies it's unsharing a new network namespace, which is perfectly fine to do inside a container.

So the fact that it's failing hints that it's in fact trying to do something more than that.
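One way to check the bare namespace operation independently of systemd is to try it directly inside the container. The minimal C sketch below (the function name probe_netns_unshare() is hypothetical; it assumes Linux and glibc's unshare() wrapper) calls unshare(CLONE_NEWNET) in a forked child, so the caller's own namespace is untouched, and reports the result. If this succeeds in the unprivileged container while systemd-hostnamed still exits with status=225/NETWORK, that would point at some other step systemd performs around the namespace setup.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: try unshare(CLONE_NEWNET) in a throwaway child.
 * Returns 0 if the child created a new network namespace, the child's
 * errno (> 0) if unshare() failed, or -1 if fork()/waitpid() failed. */
int probe_netns_unshare(void)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: report errno through the exit status
         * (Linux errno values fit in 1..255). */
        _exit(unshare(CLONE_NEWNET) == 0 ? 0 : errno);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Run as root inside the container, a result of 0 would support the reading that unsharing a network namespace is not itself the problem; EPERM would suggest it is.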

Ryan Harper (raharper) wrote :

I can confirm that if I set PrivateNetwork=no, hostnamed runs and boot is magically 10 seconds faster.

tonyk (s-launchpad-anroet-com) wrote :

Thanks for that PrivateNetwork=no hint - works like a charm!

For those that need this, follow the steps below:

1. systemctl edit systemd-hostnamed

   Add the 2 lines below then exit the editor (don't forget to save when prompted):

   [Service]
   PrivateNetwork=no

2. This will create an override.conf file with the above 2 lines in the directory:

   /etc/systemd/system/systemd-hostnamed.service.d/

3. Then reload systemd:

   systemctl daemon-reload

4. Then restart the service:

   systemctl restart systemd-hostnamed

You should now be able to run hostnamectl without it hanging.

tonyk (s-launchpad-anroet-com) wrote :

Comment on post #12 above (as one cannot edit):

Step 4 can be omitted as I don't think the service needs to be restarted.

I think the hostnamectl command starts this service on demand when changing the hostname.

Christian Brauner (cbrauner) wrote :

Hey, so we're seeing an instance of this issue and the problem is that a lock is taken on an fd instead of a path. This should be legal and we urgently need a fix for this since this is starting to break all systemd services running in a container that use PrivateUsers= and anything else that hits the following codepath:

        if (lockf(netns_storage_socket[0], F_LOCK, 0) < 0)
                return -errno;

in systemd.
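A minimal C sketch of the lock-on-socket pattern quoted above, for reproducing it outside systemd. The function name lock_storage_socket() is hypothetical; only socketpair() and lockf(..., F_LOCK, 0) come from the discussion, and the namespace work systemd does while holding the lock is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the quoted code path: create a pathless socket
 * pair, take a POSIX lock on one end with lockf(), then release it.
 * Returns 0 on success or -errno on the first failure, like the snippet.
 * Under an AppArmor profile on a kernel that cannot grant locks on
 * pathless sockets, lockf() presumably fails with EACCES (Permission denied). */
int lock_storage_socket(void)
{
    int storage[2];
    int r = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, storage) < 0)
        return -errno;

    if (lockf(storage[0], F_LOCK, 0) < 0) {
        r = -errno;
        goto out;
    }

    /* systemd would create or join the network namespace and pass its fd
     * through the socket pair at this point; omitted here. */

    if (lockf(storage[0], F_ULOCK, 0) < 0)
        r = -errno;
out:
    close(storage[0]);
    close(storage[1]);
    return r;
}
```

Outside any AppArmor confinement this should return 0; run under a profile via aa-exec on an affected kernel, a result of -EACCES would match the file_lock denials reported here.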

Changed in apparmor (Ubuntu):
status: Triaged → Confirmed
importance: High → Critical
Wolfgang Bumiller (wbumiller) wrote :

For completeness here's a minimal test case not requiring systemd:

/*
# apparmor_parser -r /etc/apparmor.d/bug-profile
# (tested without the flags here as well btw.)
profile bug-profile flags=(attach_disconnected,mediate_deleted) {
   network,
   file,
   unix,
}

# gcc this.c
# ./a.out
lock = 0 (Success)
# aa-exec -p bug-profile ./a.out
lock = -1 (Permission denied)

kernel: audit: type=1400 audit(1530774919.510:93): apparmor="DENIED" operation="file_lock" profile="bug-profile" pid=21788 comm="a.out" family="unix" sock_type="dgram" protocol=0 addr=none
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/file.h>

int
main(void)
{
    int sp[2];

    /* A connected pair of anonymous AF_UNIX datagram sockets with no path,
     * matching family="unix" sock_type="dgram" addr=none in the audit line. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sp) != 0) {
        perror("socketpair");
        exit(1);
    }

    /* Lock the socket's file; AppArmor reports this as operation="file_lock".
     * %m is a glibc extension that prints strerror(errno). */
    int rc = flock(sp[0], LOCK_EX);
    printf("lock = %i (%m)\n", rc);

    close(sp[0]);
    close(sp[1]);
    return 0;
}

Christian Brauner (cbrauner) wrote :

So, the good news is that this is all fixed upstream starting with 4.17 with the socket mediation patchset that got merged a short while ago. The bad news is that we need to get this patchset backported and it is quite large:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80a17a5f501ea048d86f81d629c94062b76610d4

Changed in apparmor (Ubuntu):
status: Confirmed → Fix Committed
Changed in apparmor (Ubuntu):
status: Fix Committed → Confirmed
Stéphane Graber (stgraber) wrote :

Marked as duplicate of bug #1780227 even though this report predates it, because the newer report has more discussion about how to actually get this resolved.
