Bug #1595451 “Qemu with GlusterFS Libgfapi access to VM storage ...” : Bugs : AppArmor

André Bauer (monotek) on 2016-06-23

tags:	added: glusterfs kvm libgfapi
tags:	added: libvirt

André Bauer (monotek) on 2016-06-23

description:

updated

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2016-06-23:

#1

I believe the info="label not found" portion of the log means that the profile for that specific VM isn't loaded into the kernel. Check /sys/kernel/security/apparmor/profiles on both the source and destination machines to make sure that the VM-specific profile is loaded on both.

THanks

Revision history for this message

André Bauer (monotek) wrote on 2016-06-24:

#2

I did a "watch -n1 ls -al /sys/kernel/security/apparmor/policy/profiles" while migrating and for a short moment the directory was there on the Xenial server until the migration error popped up:

drwxr-xr-x 3 root root 0 Jun 24 10:57 libvirt-e86cf3f9-970f-bff1-d689-75a0a2d45a5d.12

On the trusty server the file is also there but has another name.

Revision history for this message

Christian Boltz (cboltz) wrote on 2016-06-25:

#3

/sys/fs/cgroup/devices/machine/** rwa, is an invalid rule (and therefore the profile will fail to load) because w and a conflict. w includes a permissions, so please change the rule to use rw.

Revision history for this message

André Bauer (monotek) wrote on 2016-06-27:

#4

Using only "/sys/fs/cgroup/devices/machine/** rw" i get:

type=VIRT_RESOURCE msg=audit(1467035730.817:52434): pid=4767 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm resrc=cgroup reason=deny vm="checkbox" uuid=e86cf3f9-970f-bff1-d689-75a0a2d45a5d cgroup="/sys/fs/cgroup/devices/machine/checkbox.libvirt-qemu/" class=all exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'

Revision history for this message

Stephen (belrik) wrote on 2016-08-17:

#5

Have you discovered any solution to the problem of using GlusterFS with libgfapi block devices in Xenial? I use your PPA which worked perfectly on Trusty and it did work for a brief time on Xenial but today I attempted to restart a VM after a routine apt-get upgrade and found that I hit this bug. I'm confused as to what exactly changed as I had managed to start a VM using libgfapi under Xenial prior to today but today's upgrade killed this functionality. Maybe the clue is in recent qemu updates?

Revision history for this message

André Bauer (monotek) wrote on 2016-08-25:

#6

No, sorry. I'm back on trusty...

Revision history for this message

Stephen (belrik) wrote on 2016-09-08:

#7

Shame. Wish the Debian upstream developers would accept the Gluster team's versioning scheme and then this would be enabled by default and have many more eyeballs on it. Seems that GlusterFS's implementation of version numbers and the point at which they choose to break/retain compatibility is the crux of the whole problem and why it doesn't appear in Ubuntu's QEMU by default.

Revision history for this message

André Bauer (monotek) wrote on 2016-09-08:

#8

Nö, its actually because glusterfs is in universe repo.

Already added a mainline inclusion request to the bugtracker but no progress:

https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1274247

Revision history for this message

André Bauer (monotek) wrote on 2016-11-26:

#9

Same problem with GlusterFS 3.8.6 and Xenial Qemu client...

Revision history for this message

Helensvale Technology Group (htgsolutions) wrote on 2017-02-15:

#10

Download full text (3.3 KiB)

Hi, are there any updates on this issue? I just ran in to this problem on Debian 8 and glusterfs 3.8.8 and qemu 2.8 packagesI backported from unstable. I found this bug while searching for possible solutions.

infrastructure@us-vm-1:~# sudo qemu-img create -f qcow2 gluster://127.0.0.1:24007/datastore/testvm.qcow2 10G
Formatting 'gluster://127.0.0.1:24007/datastore/testvm.qcow2', fmt=qcow2 size=10
737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2017-02-15 00:17:35.360852] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:35.361066] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:35.361111] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:35.361233] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387340] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387432] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387613] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387651] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344465] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344558] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344721] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344767] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236208] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236270] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236410] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline u...

Hi, are there any updates on this issue? I just ran in to this problem on Debian 8 and glusterfs 3.8.8 and qemu 2.8 packagesI backported from unstable. I found this bug while searching for possible solutions.

infrastructure@us-vm-1:~# sudo qemu-img create -f qcow2 gluster://127.0.0.1:24007/datastore/testvm.qcow2 10G
Formatting 'gluster://127.0.0.1:24007/datastore/testvm.qcow2', fmt=qcow2 size=10
737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2017-02-15 00:17:35.360852] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:35.361066] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:35.361111] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:35.361233] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387340] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387432] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387613] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:36.387651] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344465] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344558] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344721] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:37.344767] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236208] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-0: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236270] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-1: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236410] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-2: All subvolumes are down. Going offline until atleast one
of them comes back up.
[2017-02-15 00:17:38.236461] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-
datastore-replicate-3: All subvolumes are down. Going offline until atleast one
of them comes back up.

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2017-02-15:

#11

htg, check dmesg | grep DENIED to see if AppArmor profiles are blocking your progress. If not, you may need to open a new bug elsewhere. Thanks.

Revision history for this message

Stephen (belrik) wrote on 2017-05-31:

#12

Download full text (3.5 KiB)

Hi, I was doing some maintenance on my Gluster shares and checked in on this bug again, it's still an issue. Here are logs when (attempting to) start a VM via libgfapi:

# qemu-system-x86_64 -drive file=gluster://127.0.0.1:24007/VM/centos-test.qcow2,format=raw,if=none,id=drive-virtio-disk0,cache=none
[2017-05-31 15:32:10.931097] W [MSGID: 108040] [afr.c:315:afr_pending_xattrs_init] 0-VM-replicate-0: Unable to fetch afr-pending-xattr option from volfile. Falling back to using client translator names.
[2017-05-31 15:32:10.934558] E [socket.c:2310:socket_connect_finish] 0-VM-client-2: connection to 192.168.0.197:49153 failed (Connection refused)
[2017-05-31 15:32:10.935614] E [socket.c:2310:socket_connect_finish] 0-VM-client-1: connection to 192.168.0.20:49155 failed (Connection refused)
[2017-05-31 15:32:10.945254] E [MSGID: 108006] [afr-common.c:4799:afr_notify] 0-VM-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-05-31 15:32:10.945527] W [MSGID: 108001] [afr-common.c:4888:afr_notify] 0-VM-replicate-0: Client-quorum is not met
qemu-system-x86_64: -drive file=gluster://127.0.0.1:24007/VM/centos-test.qcow2,format=raw,if=none,id=drive-virtio-disk0,cache=none: Could not open 'gluster://127.0.0.1:24007/VM/centos-test.qcow2': No such file or directory

I've added these lines to /etc/apparmor.d/abstractions/libvirt-qemu to avoid any DENIED messages:

# For gluster use
  /usr/lib/x86_64-linux-gnu/glusterfs/** rmix,
  /proc/sys/net/ipv4/ip_local_reserved_ports r,
  /tmp/** rwcx,

DMESG output:

[270519.573087] audit: type=1400 audit(1496243895.879:28): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/libvirtd" pid=3075 comm="apparmor_parser"
[270519.661002] audit: type=1400 audit(1496243895.967:29): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/dhcpd" pid=3079 comm="apparmor_parser"
[270519.673425] audit: type=1400 audit(1496243895.979:30): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/ntpd" pid=3078 comm="apparmor_parser"
[270519.685399] audit: type=1400 audit(1496243895.991:31): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/tcpdump" pid=3081 comm="apparmor_parser"
[270519.736547] audit: type=1400 audit(1496243896.043:32): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/libvirt/virt-aa-helper" pid=3076 comm="apparmor_parser"
[270519.799579] audit: type=1400 audit(1496243896.107:33): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=3082 comm="apparmor_parser"
[270519.799883] audit: type=1400 audit(1496243896.107:34): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=3082 comm="apparmor_parser"
[270519.800152] audit: type=1400 audit(1496243896.107:35): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=3082 comm="apparmor_parser"
[270519.800416] audit: type=1400 audit(1496243896.107:36): apparmor="STATUS" operation="profile_replace" profile="unc...

Hi, I was doing some maintenance on my Gluster shares and checked in on this bug again, it's still an issue. Here are logs when (attempting to) start a VM via libgfapi:

# qemu-system-x86_64 -drive file=gluster://127.0.0.1:24007/VM/centos-test.qcow2,format=raw,if=none,id=drive-virtio-disk0,cache=none
[2017-05-31 15:32:10.931097] W [MSGID: 108040] [afr.c:315:afr_pending_xattrs_init] 0-VM-replicate-0: Unable to fetch afr-pending-xattr option from volfile. Falling back to using client translator names. 
[2017-05-31 15:32:10.934558] E [socket.c:2310:socket_connect_finish] 0-VM-client-2: connection to 192.168.0.197:49153 failed (Connection refused)
[2017-05-31 15:32:10.935614] E [socket.c:2310:socket_connect_finish] 0-VM-client-1: connection to 192.168.0.20:49155 failed (Connection refused)
[2017-05-31 15:32:10.945254] E [MSGID: 108006] [afr-common.c:4799:afr_notify] 0-VM-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-05-31 15:32:10.945527] W [MSGID: 108001] [afr-common.c:4888:afr_notify] 0-VM-replicate-0: Client-quorum is not met
qemu-system-x86_64: -drive file=gluster://127.0.0.1:24007/VM/centos-test.qcow2,format=raw,if=none,id=drive-virtio-disk0,cache=none: Could not open 'gluster://127.0.0.1:24007/VM/centos-test.qcow2': No such file or directory

I've added these lines to /etc/apparmor.d/abstractions/libvirt-qemu to avoid any DENIED messages:

# For gluster use
  /usr/lib/x86_64-linux-gnu/glusterfs/** rmix,
  /proc/sys/net/ipv4/ip_local_reserved_ports r,
  /tmp/** rwcx,

DMESG output:

[270519.573087] audit: type=1400 audit(1496243895.879:28): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/libvirtd" pid=3075 comm="apparmor_parser"
[270519.661002] audit: type=1400 audit(1496243895.967:29): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/dhcpd" pid=3079 comm="apparmor_parser"
[270519.673425] audit: type=1400 audit(1496243895.979:30): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/ntpd" pid=3078 comm="apparmor_parser"
[270519.685399] audit: type=1400 audit(1496243895.991:31): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/tcpdump" pid=3081 comm="apparmor_parser"
[270519.736547] audit: type=1400 audit(1496243896.043:32): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/libvirt/virt-aa-helper" pid=3076 comm="apparmor_parser"
[270519.799579] audit: type=1400 audit(1496243896.107:33): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=3082 comm="apparmor_parser"
[270519.799883] audit: type=1400 audit(1496243896.107:34): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=3082 comm="apparmor_parser"
[270519.800152] audit: type=1400 audit(1496243896.107:35): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=3082 comm="apparmor_parser"
[270519.800416] audit: type=1400 audit(1496243896.107:36): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=3082 comm="apparmor_parser"
[270523.115465] audit: type=1400 audit(1496243899.423:37): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/libvirtd" pid=3152 comm="apparmor_parser"

Is anything else missing from apparmor? It appears that qemu isn't able to connect to the ports required for each of the GlusterFS bricks on the remote systems.

Revision history for this message

Stephen (belrik) wrote on 2017-05-31:

#13

I should add that I'm using qemu-kvm 1.2.5 with glusterFS 3.10.2 on Xenial 16.04.2 with a 4.4 series kernel.

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2017-05-31:

#14

Stephen, two thoughts:

- glusterfs isn't in main in Ubuntu so I don't think qemu should support it at all; see https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1274247 for more information

- Those error messages are "connection to 192.168.0.197:49153 failed (Connection refused)" -- please check netstat -lntp on 192.168.0.197 to make sure that a daemon is listening there. I f a daemon is listening there, then please also check firewall rules on both hosts as well as all routers between the hosts to ensure the packets would be allowed.

Thanks

Revision history for this message

Stephen (belrik) wrote on 2017-06-01:

#15

First the connection refused- it appears that although 'gluster volume status' reports a different port after the service restarts the service is still only listening on 49152 (according to 'netstat -an | grep 4915') and never listens on 49153 or higher. I restarted all the nodes one by one to get the process onto 49152 on all nodes and avoid this. The error persisted.

I have done some more research. In particular I have re-homed my libvirtd service install to a container running *inside* the Xenial server but using the 14.04 binaries. This container can run KVM processes using libgfapi connections from it's 14.04 environment to 16.04 on the host. The qemu service on the host 16.04 environment cannot.

I will see if abstracting libvirtd, qemu and kvm into a container is a workable solution.

After seeing this libgfapi feature arrive upstream in 2014 I don't know what to do at this point, any hint as to when Debian and then Ubuntu will move this to main? I know there are a lot of packages but is this there any response from the GlusterFS maintainer?

Revision history for this message

Joaquin Menchaca (darkn3rd) wrote on 2020-10-12:

#16

sudo systemctl disable apparmor

AppArmor

Qemu with GlusterFS Libgfapi access to VM storage does not work in Ubuntu Xenial

Bug Description

Other bug subscribers

Remote bug watches