Xenial: neutron agents container unable to resolve names

Bug #1612412 reported by Michael Gugino
This bug affects 6 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Low
Assigned to: Markos Chandras
Milestone: (none listed)

Bug Description

neutron agents container is unable to resolve host names using ping or apt-key. Strangely, apt-get seems to work.

Seeing the following in the lxc host's audit.log:

type=AVC msg=audit(1470946346.326:416208): apparmor="ALLOWED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 profile="/{usr/,}bin/ping" name="var/lib/lxc/aio1_neutron_agents_container-3e9a5936/delta0/etc/ld.so.cache" pid=25132 comm="ping" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

The "Failed name lookup - disconnected path" message seems to be related to lazily unmounting a drive:

http://wiki.apparmor.net/index.php/FAQ#Failed_name_lookup_-_disconnected_path

Unsure why this is happening; possibly an overlayfs problem?
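
For anyone who wants to sift through audit.log for more of these records, here is a minimal Python sketch that splits an AVC line into key/value fields and flags the disconnected-path case. The function names (parse_audit_record, is_disconnected_path) are made up for illustration and are not part of any AppArmor or audit tooling. The sample line is the one quoted above.

```python
import re

# key=value pairs; a value is either double-quoted (may contain spaces) or a bare token.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit_record(line):
    """Return the key=value fields of one audit.log line as a dict of strings."""
    fields = {}
    for key, quoted, bare in _PAIR.findall(line):
        fields[key] = bare if bare else quoted
    return fields


def is_disconnected_path(fields):
    """True when AppArmor reported a disconnected path for this record."""
    return "disconnected path" in fields.get("info", "")


SAMPLE = (
    'type=AVC msg=audit(1470946346.326:416208): apparmor="ALLOWED" '
    'operation="getattr" info="Failed name lookup - disconnected path" '
    'error=-13 profile="/{usr/,}bin/ping" '
    'name="var/lib/lxc/aio1_neutron_agents_container-3e9a5936/delta0/etc/ld.so.cache" '
    'pid=25132 comm="ping" requested_mask="r" denied_mask="r" fsuid=0 ouid=0'
)

record = parse_audit_record(SAMPLE)
print(record["profile"], record["operation"], record["error"])
print("disconnected:", is_disconnected_path(record))
# The reported name has no leading "/", which is how a disconnected path shows up.
print("relative name:", not record["name"].startswith("/"))
```

The regular expression only handles the simple key=value layout seen in this report; other audit record types may need more care.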

Revision history for this message
Michael Gugino (gugino-michael) wrote :

Restarting the container does not help. Destroying and rebuilding the container does not help. Other containers seem to be operating fine.

Revision history for this message
Michael Gugino (gugino-michael) wrote :

Replacing the container's rootfs from overlayfs to dir backed (normal lxc mode) fixes this issue.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

No one confirmed it yet.
Here is a link to our recent bug triage meeting where we discussed that bug:
http://eavesdrop.openstack.org/irclogs/%23openstack-ansible/%23openstack-ansible.2016-08-23.log.html#t2016-08-23T16:26:21

We'll be waiting for further input.

Revision history for this message
eil397 (anton-haldin) wrote :

I tried to reproduce it, and it looks like I can reproduce it in my local environment (laptop):

root@ubuntu-xenial:/opt/openstack-ansible/playbooks# git branch -v
* master 8ecbf73 Merge "Removed variable changes table from the doc."

root@ubuntu-xenial:/opt/openstack-ansible/playbooks# ansible neutron_dhcp_agent -m shell -a "ping www.google.com"
aio1_neutron_agents_container-64a73a4a | FAILED | rc=127 >>
ping: error while loading shared libraries: libcap.so.2: cannot stat shared object: Permission denied

My steps to reproduce:
 - vagrant init ubuntu/xenial64
 - add config for persistent storage plugin [1]
 - vagrant ssh
 - git clone https://github.com/openstack/openstack-ansible /opt/openstack-ansible
 - cd /opt/openstack-ansible
 - scripts/bootstrap-ansible.sh
 - export BOOTSTRAP_OPTS="bootstrap_host_data_disk_device=sdc"
 - scripts/bootstrap-aio.sh
 - scripts/gate-check-commit.sh
 - ansible neutron_dhcp_agent -m shell -a "ping www.google.com"

[1] my config for second drive
config.persistent_storage.enabled = true
config.persistent_storage.location = "~/VirtualBox\ VMs/seconddriveosatrusty.vdi"
config.persistent_storage.size = 60000

Revision history for this message
eil397 (anton-haldin) wrote :

Small correction: "scripts/gate-check-commit.sh" should be changed to "scripts/run-playbooks.sh"

Alternatively, the gate-check-commit.sh script can be used instead of the sequence (bootstrap-aio.sh + run-playbooks.sh).

Revision history for this message
eil397 (anton-haldin) wrote :

traceroute and host also fail with "Failed name lookup - disconnected path" [1].

The "attach_disconnected" flag is the way to resolve disconnected paths [2].
A patch example for the bin.ping profile is in [3].

Right now the "unconfined" profile is used for neutron_agent [4],
and the default "lxc-openstack" profile for containers already contains the "attach_disconnected" flag [5].

I think this profile can be used for neutron_agent.
As a next step I will check how it works without 'lxc.cgroup.devices.allow=a *:* rmw' and the bind mount of /lib/modules.

[1]
root@aio1-neutron-agents-container-26994ece:~# traceroute www.google.com
traceroute: error while loading shared libraries: libc.so.6: cannot stat shared object: Permission denied
root@aio1-neutron-agents-container-26994ece:~# ping 8.8.8.8
ping: error while loading shared libraries: libcap.so.2: cannot stat shared object: Permission denied

[2]
http://wiki.apparmor.net/index.php/ReleaseNotes_2_5
path name lookup and mediation of
Two new profile flags have been introduced to aid in mediation of disconnected paths. AppArmor's default behavior is to reject new accesses to disconnected paths, reporting back the pathname without a leading /. Unfortunately this can break some applications; if a profile must allow for mediation of disconnected paths then the profile flag attach_disconnected can be used. This prepends a leading / to the reported name, however this may not result in the original name of the file as AppArmor can only attach the file to root, not to its original location.
 /some/profile (attach_disconnected) {
   ...
 }

[3]
$ diff /etc/apparmor.d/bin.ping /etc/apparmor.d/bin.ping.old
13c13
< /{usr/,}bin/ping flags=(complain,attach_disconnected) {
---
> /{usr/,}bin/ping flags=(complain) {

[4] https://github.com/openstack/openstack-ansible/blob/master/playbooks/os-neutron-install.yml#L27

[5] https://github.com/openstack/openstack-ansible-lxc_hosts/blob/master/templates/lxc-openstack.apparmor.j2#L4
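
The change shown in [3] can be sketched as a small text transformation on the profile header line. This is only an illustration in Python, assuming the header uses the "name flags=(...) {" form seen in the diff; the function name add_attach_disconnected is hypothetical, and the sketch does not touch any file or reload AppArmor.

```python
import re

# Matches headers such as "/{usr/,}bin/ping flags=(complain) {" or "/bin/ping {".
# Other header styles (for example a leading "profile" keyword) are not handled.
_HEADER = re.compile(r'^(?P<name>\S+)(?:\s+flags=\((?P<flags>[^)]*)\))?\s*\{\s*$')


def add_attach_disconnected(header):
    """Return the profile header with attach_disconnected present in its flags."""
    match = _HEADER.match(header)
    if match is None:
        raise ValueError("unsupported profile header: {!r}".format(header))
    raw_flags = match.group("flags") or ""
    flags = [flag.strip() for flag in raw_flags.split(",") if flag.strip()]
    if "attach_disconnected" not in flags:
        flags.append("attach_disconnected")
    return "{} flags=({}) {{".format(match.group("name"), ",".join(flags))


old_header = "/{usr/,}bin/ping flags=(complain) {"
new_header = add_attach_disconnected(old_header)
print(new_header)
```

Applying the function to an already patched header leaves it unchanged, so it is safe to run more than once over the same line.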

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Kevin Carter (kevin-carter)
Revision history for this message
Logan V (loganv) wrote :

I've only seen this happen on neutron-agents containers, but it is not isolated to DNS. This is from an 8/5/2016 AIO build:

root@aio1-neutron-agents-container-5f914532:/# stat /lib/x86_64-linux-gnu/libdbus-1.so.3.7.6
  File: ‘/lib/x86_64-linux-gnu/libdbus-1.so.3.7.6’
  Size: 281552 Blocks: 552 IO Block: 4096 regular file
Device: 801h/2049d Inode: 3152953 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-08-04 16:26:39.773635000 +0000
Modify: 2014-11-25 20:38:11.000000000 +0000
Change: 2016-08-04 16:26:39.785635000 +0000
 Birth: -
root@aio1-neutron-agents-container-5f914532:/# stat /lib/x86_64-linux-gnu/libdbus-1.so.3
  File: ‘/lib/x86_64-linux-gnu/libdbus-1.so.3’ -> ‘libdbus-1.so.3.7.6’
  Size: 18 Blocks: 0 IO Block: 4096 symbolic link
Device: 801h/2049d Inode: 3152990 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-08-04 16:26:39.889635000 +0000
Modify: 2014-11-25 20:38:06.000000000 +0000
Change: 2016-08-04 16:26:39.889635000 +0000
 Birth: -
root@aio1-neutron-agents-container-5f914532:/# dnsmasq
dnsmasq: error while loading shared libraries: libdbus-1.so.3: cannot stat shared object: Permission denied

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :
Revision history for this message
eil397 (anton-haldin) wrote :

@Logan
Thank you for the details about the dnsmasq error messages.
I just checked it on my AIO installation and saw no errors.

root@aio1-neutron-agents-container-3b0c3d0c:~# dnsmasq
root@aio1-neutron-agents-container-3b0c3d0c:~# ps ax |grep dnsmasq
 7846 ? S 0:00 dnsmasq
 7848 pts/4 S+ 0:00 grep --color=auto dnsmasq

Revision history for this message
eil397 (anton-haldin) wrote :

root@aio1-neutron-agents-container-3b0c3d0c:~# which dnsmasq
/usr/sbin/dnsmasq
root@aio1-neutron-agents-container-3b0c3d0c:~# ldd /usr/sbin/dnsmasq
        linux-vdso.so.1 => (0x00007fffff589000)
        libdbus-1.so.3 => /lib/x86_64-linux-gnu/libdbus-1.so.3 (0x00007fe616ee8000)
        libnetfilter_conntrack.so.3 => /usr/lib/x86_64-linux-gnu/libnetfilter_conntrack.so.3 (0x00007fe616ccc000)
        libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007fe616a95000)
        libhogweed.so.4 => /usr/lib/x86_64-linux-gnu/libhogweed.so.4 (0x00007fe616862000)
        libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fe6165e2000)
        libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007fe6163ae000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe615fe5000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe615dc8000)
        libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007fe615d42000)
        libnfnetlink.so.0 => /usr/lib/x86_64-linux-gnu/libnfnetlink.so.0 (0x00007fe615b3b000)
        libmnl.so.0 => /lib/x86_64-linux-gnu/libmnl.so.0 (0x00007fe615935000)
        /lib64/ld-linux-x86-64.so.2 (0x000055d8a8d74000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fe615712000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe61550a000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fe6152e8000)
        libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007fe615006000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fe614d96000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe614b92000)
        libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007fe61497d000)

Revision history for this message
eil397 (anton-haldin) wrote :

Another test:
 - I changed aa_profile for the neutron_server container from lxc-openstack to unconfined in the container config.
 - restart neutron_server
 - ping 8.8.8.8

This returns the same error.

It looks like: xenial + overlayfs without the attach_disconnected flag = "Failed name lookup - disconnected path" errors.

Another test:
 - on the installed AIO host: lxc-clone --snapshot -B overlayfs -o ubuntu-xenial-amd64 -n test1
 - echo "lxc.aa_profile=unconfined" >> /var/lib/lxc/test1/config
 - lxc-start -n test1
 - ssh test1 "ping 8.8.8.8"

This returns the same error.

Revision history for this message
eil397 (anton-haldin) wrote :

Test with a new base container and clone:
 - lxc-create -n source-xenial-amd64 -t ubuntu-cloud -- --release=xenial --arch=amd64
 - lxc-clone --snapshot -B overlayfs -o source-xenial-amd64 -n test2
 - echo "lxc.aa_profile=unconfined" >> /var/lib/lxc/test2/config
 - pub key added through lxc-execute
 - ping 8.8.8.8

This returns the same error: ping: error while loading shared libraries: libcap.so.2: cannot stat shared object: Permission denied

Revision history for this message
eil397 (anton-haldin) wrote :

The same scenario (create a new xenial container as the base and clone (copy) it) on an empty xenial VM finished without errors. So it looks like this is probably not an upstream issue with xenial + overlayfs + lxc + apparmor (unconfined), but something specific to the set of configurations used for OSA (AIO)?

Logs from /var/log/audit/audit.log:
------
type=AVC msg=audit(1472586116.191:962154): apparmor="ALLOWED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 profile="/{usr/,}bin/ping" name="source-xenial-amd64/rootfs/etc/ld.so.cache" pid=26478 comm="ping" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
type=SYSCALL msg=audit(1472586116.191:962154): arch=c000003e syscall=5 success=no exit=-13 a0=3 a1=7ffd5814fc00 a2=7ffd5814fc00 a3=7f0bf6109480 items=0 ppid=26460 pid=26478 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts4 ses=9279 comm="ping" exe="/bin/ping" key=(null)
type=AVC msg=audit(1472586116.191:962155): apparmor="ALLOWED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 profile="/{usr/,}bin/ping" name="source-xenial-amd64/rootfs/lib/x86_64-linux-gnu/libcap.so.2.24" pid=26478 comm="ping" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
type=SYSCALL msg=audit(1472586116.191:962155): arch=c000003e syscall=5 success=no exit=-13 a0=3 a1=7ffd5814fc50 a2=7ffd5814fc50 a3=2e6f732e70616362 items=0 ppid=26460 pid=26478 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts4 ses=9279 comm="ping" exe="/bin/ping" key=(null)
------

I can see that it was classified as "ALLOWED" by apparmor. Maybe some specific security settings transform this "warning" (info="Failed name lookup - disconnected path") into "error while loading shared libraries: libcap.so.2: cannot stat shared object: Permission denied"?
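
As a reading aid for the SYSCALL records above: on x86_64 (arch=c000003e) syscall 5 is fstat, and exit=-13 is the negated errno EACCES, which is the "Permission denied" the dynamic loader prints. The short Python sketch below decodes those two fields. The function name describe_failure and the tiny syscall table are illustrative assumptions; only the three stat-family numbers are listed.

```python
import errno
import os

# Small excerpt of the x86_64 syscall table (stat family only).
X86_64_SYSCALLS = {4: "stat", 5: "fstat", 6: "lstat"}


def describe_failure(syscall_nr, exit_code):
    """Describe a SYSCALL audit record from its syscall= and exit= fields."""
    name = X86_64_SYSCALLS.get(syscall_nr, "syscall {}".format(syscall_nr))
    if exit_code >= 0:
        return "{} succeeded".format(name)
    err = -exit_code  # the kernel reports failures as negated errno values
    symbol = errno.errorcode.get(err, "unknown errno")
    return "{} failed with {} ({})".format(name, symbol, os.strerror(err))


# Values taken from the records quoted above: syscall=5 exit=-13
summary = describe_failure(5, -13)
print(summary)
```

So even though the AVC record says "ALLOWED", the paired SYSCALL record shows the fstat call itself returned an access error to ping.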

Revision history for this message
eil397 (anton-haldin) wrote :

The current workaround is to remove the bin.ping profile and reload the profiles in apparmor.
After that, ping works in the network container.

Another way is to add the attach_disconnected flag to the bin.ping profile.

A potential long-term solution is to use the lxc-openstack profile for the neutron_agent container.

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

I've self-assigned this due to inactivity. If someone else can pick this up and push a patch up in the next 12 hours, that's fine - otherwise I'll pick it up in the morning.

Changed in openstack-ansible:
assignee: Kevin Carter (kevin-carter) → Jesse Pretorius (jesse-pretorius)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/351776
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=0a5a11704fb6f85fb1bc151a54c6b4e9ea40a2be
Submitter: Jenkins
Branch: master

commit 0a5a11704fb6f85fb1bc151a54c6b4e9ea40a2be
Author: Jesse Pretorius <email address hidden>
Date: Fri Aug 5 15:42:58 2016 +0100

    Prevent overlayfs use in test when kernel < 3.18 or release == trusty

    The overlayfs version in kernel version < 3.18 was not production-ready
    and should be avoided on Trusty due general instability. This patch
    prevents overlayfs from being used when implementing an AIO with
    kernel version and release versions that are not not suitable.

    Related-Bug: #1612412
    Related-Bug: #1631690
    Change-Id: I224c27ed645c3f3817721baccd5d9e5ce19f3a03
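
The condition in the commit title can be sketched in Python as follows. This is not the Ansible code from the review, only an illustration of the stated rule (no overlayfs when the kernel is older than 3.18 or the release is trusty); the names overlayfs_allowed and kernel_major_minor are hypothetical.

```python
import re

MIN_KERNEL = (3, 18)  # threshold named in the commit title


def kernel_major_minor(release):
    """Extract (major, minor) from a `uname -r` style string such as '4.4.0-51-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: {!r}".format(release))
    return int(match.group(1)), int(match.group(2))


def overlayfs_allowed(kernel_release, distro_release):
    """Apply the rule from the commit title: kernel >= 3.18 and release is not trusty."""
    if distro_release.lower() == "trusty":
        return False
    return kernel_major_minor(kernel_release) >= MIN_KERNEL


for kernel, distro in [("3.13.0-100-generic", "trusty"), ("4.4.0-51-generic", "xenial")]:
    print(kernel, distro, overlayfs_allowed(kernel, distro))
```

Note that a Xenial host with a 4.4 kernel passes this check, so the rule by itself does not rule out overlayfs on the systems where this bug was reported.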

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/386128

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible (stable/newton)

Reviewed: https://review.openstack.org/386128
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=6bbb3d5e1ab18f0cd375847a7087f99896651377
Submitter: Jenkins
Branch: stable/newton

commit 6bbb3d5e1ab18f0cd375847a7087f99896651377
Author: Jesse Pretorius <email address hidden>
Date: Fri Aug 5 15:42:58 2016 +0100

    Prevent overlayfs use in test when kernel < 3.18 or release == trusty

    The overlayfs version in kernel version < 3.18 was not production-ready
    and should be avoided on Trusty due general instability. This patch
    prevents overlayfs from being used when implementing an AIO with
    kernel version and release versions that are not not suitable.

    Related-Bug: #1612412
    Related-Bug: #1631690
    Change-Id: I224c27ed645c3f3817721baccd5d9e5ce19f3a03
    (cherry picked from commit 0a5a11704fb6f85fb1bc151a54c6b4e9ea40a2be)

tags: added: in-stable-newton
tags: added: newton-rc-potential
removed: in-stable-newton
Revision history for this message
Jirayut Nimsaeng (winggundamth) wrote :

Any update on this? I'm using stable/newton but still can't use ping inside the neutron agent container.

root@aio:~# uname -a
Linux dear-test 4.4.0-51-generic #72-Ubuntu SMP Thu Nov 24 18:29:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@aio:~# lxc-attach -n aio1_neutron_agents_container-de4c77e7
root@aio1-neutron-agents-container-de4c77e7:~# ping www.google.com
ping: error while loading shared libraries: libcap.so.2: cannot stat shared object: Permission denied
root@aio1-neutron-agents-container-de4c77e7:~#

In user_variables.yml

lxc_container_backing_store: "overlayfs"

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

Removing myself as assignee as I have not managed to look into this.

Changed in openstack-ansible:
assignee: Jesse Pretorius (jesse-pretorius) → nobody
Revision history for this message
Jirayut Nimsaeng (winggundamth) wrote :

Can anyone guide me on how to fix this problem, so I can submit a patch for it?

Revision history for this message
Jirayut Nimsaeng (winggundamth) wrote :

I can confirm that using lxc_container_backing_store: "dir" fixes this problem.

Revision history for this message
Major Hayden (rackerhacker) wrote :

Is this still a problem? This bug has been open for quite some time and I can't reproduce it in Ocata/Pike.

Changed in openstack-ansible:
status: Confirmed → Incomplete
Revision history for this message
Jordan Callicoat (jcallicoat) wrote :

I just hit this on a newton AIO with lxc_container_backing_store: overlayfs.

RPCO commit 1a042f7d82 (pulling in OSA commit aa27606499)

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for openstack-ansible because there has been no activity for 60 days.]

Changed in openstack-ansible:
status: Incomplete → Expired
Changed in openstack-ansible:
status: Expired → Confirmed
assignee: nobody → Markos Chandras (hwoarang)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-lxc_container_create (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/535333

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-lxc_container_create (master)

Reviewed: https://review.openstack.org/535333
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/commit/?id=887ebaa3ce4538f495729defee65bfde3f1bba05
Submitter: Zuul
Branch: master

commit 887ebaa3ce4538f495729defee65bfde3f1bba05
Author: Markos Chandras <email address hidden>
Date: Thu Jan 18 12:28:29 2018 +0000

    tests: Set lxc-openstack apparmor profile when overlayfs is used

    The overlayfs backing store doesn't play well with the unconfined
    profile and many tools (eg ping, traceroute) are failing to work
    with the following error:

    ping: error while loading shared libraries: libcap.so.2: cannot stat
    shared object: Permission denied

    As such, lets switch to the lxc-openstack profile is overlayfs is used
    as the backing store.

    Change-Id: Ibe1149ee4fedd2b3d487887e504c500c96165467
    Related-Bug: #1612412
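
The idea in this commit message can be sketched as a simple selection rule. The Python below is an illustration only, not the role's actual implementation: the function names are hypothetical, and it assumes that "unconfined" remains the profile for every backing store other than overlayfs. The lxc.aa_profile key is the one used in the container configs earlier in this report.

```python
def select_apparmor_profile(backing_store):
    """Pick an AppArmor profile name based on the container backing store."""
    if backing_store == "overlayfs":
        # lxc-openstack carries the attach_disconnected flag (see comment above, ref [5]).
        return "lxc-openstack"
    # Assumption: other backing stores keep the unconfined profile.
    return "unconfined"


def aa_profile_config_line(backing_store):
    """Render the matching line for an LXC container config file."""
    return "lxc.aa_profile={}".format(select_apparmor_profile(backing_store))


for store in ("overlayfs", "dir"):
    print(store, "->", aa_profile_config_line(store))
```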

Revision history for this message
Markos Chandras (hwoarang) wrote :
Changed in openstack-ansible:
status: Confirmed → Fix Committed
Mohammed Naser (mnaser)
Changed in openstack-ansible:
status: Fix Committed → Fix Released