Qemu with GlusterFS Libgfapi access to VM storage does not work in Ubuntu Xenial

Bug #1595451 reported by André Bauer
This bug affects 2 people
Affects: AppArmor
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm using my own Qemu packages ( https://launchpad.net/~monotek/+archive/ubuntu/qemu-glusterfs-3.7 ), which have GlusterFS support enabled so that VM image storage can be accessed via the GlusterFS libgfapi API.

I used this in Ubuntu Trusty (14.04) for quite a long time without problems. I only had to add the following lines to "/etc/apparmor.d/abstractions/libvirt-qemu" to get it to work:

# for glusterfs
/proc/sys/net/ipv4/ip_local_reserved_ports r,
/usr/lib/@{multiarch}/glusterfs/**.so mr,
/tmp/** rw,

After updating one of my KVM/Qemu hosts to Ubuntu Xenial (16.04) it stopped working. I'm not able to migrate or start VMs on this host. If I try, I get the following error in the libvirt log:

Fehler: Interner Fehler [Error: internal error]: early end of file from monitor, possible problem: [2016-06-23 08:50:20.431986] E [MSGID: 104007] [glfs-mgmt.c:637:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume file (key:vmimages) [Invalid argument]
[2016-06-23 08:50:20.432110] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: storage.intdmz.h1.mdd (Permission denied) [Permission denied]
2016-06-23T08:50:21.427357Z qemu-system-x86_64: -drive file=gluster://storage.intdmz.h1.mdd/vmimages/checkbox.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback: Gluster connection failed for server=storage.intdmz.h1.mdd port=0 volume=vmimages image=checkbox.qcow2 transport=tcp: Permission denied

To find the problem I installed auditd and watched the logs with "tail -f /var/log/audit/audit.log | grep -i den" while trying to migrate the VM:

type=VIRT_RESOURCE msg=audit(1466672491.942:7789): pid=6617 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm resrc=cgroup reason=deny vm="checkbox" uuid=e86cf3f9-970f-bff1-d689-75a0a2d45a5d cgroup="/sys/fs/cgroup/devices/machine/checkbox.libvirt-qemu/" class=all exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'

My first idea was to add the following line to "/etc/apparmor.d/abstractions/libvirt-qemu":

/sys/fs/cgroup/devices/machine/** rwa,

After this I found the following line in audit.log when trying to migrate:

type=AVC msg=audit(1466672697.947:8084): apparmor="DENIED" operation="change_profile" info="label not found" error=-2 profile="/usr/sbin/libvirtd" name="libvirt-e86cf3f9-970f-bff1-d689-75a0a2d45a5d" pid=11805 comm="libvirtd"

I'm not sure what to do now, because "/etc/apparmor.d/usr.sbin.libvirtd" already has this line:

change_profile -> @{LIBVIRT}-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*,

I also tried to set "/usr/lib/libvirt/virt-aa-helper" and "/usr/sbin/libvirtd" to complain mode with aa-complain, but that does not work either. I see the VM migrating to the Xenial host in Virt Manager, but it stays in paused mode.
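For reference, switching those profiles to complain mode was done with aa-complain from the apparmor-utils package, roughly:

  sudo apt-get install apparmor-utils
  sudo aa-complain /usr/sbin/libvirtd
  sudo aa-complain /usr/lib/libvirt/virt-aa-helper

As far as I can tell, the per-VM libvirt-<uuid> profiles are generated separately by virt-aa-helper, so complain mode on these two profiles does not necessarily cover them.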

What information can I provide to help get this problem solved?

André Bauer (monotek)
tags: added: glusterfs kvm libgfapi
tags: added: libvirt
André Bauer (monotek)
description: updated
Revision history for this message
Seth Arnold (seth-arnold) wrote :

I believe the info="label not found" portion of the log means that the profile for that specific VM isn't loaded into the kernel. Check /sys/kernel/security/apparmor/profiles on both the source and destination machines to make sure that the VM-specific profile is loaded on both.
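For example, something like this (using the VM UUID from the denial above) should show the per-VM profile on each host once it is loaded:

  grep e86cf3f9-970f-bff1-d689-75a0a2d45a5d /sys/kernel/security/apparmor/profiles

If it only shows up on the source host, libvirt/virt-aa-helper failed to generate or load the profile on the destination.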

Thanks

Revision history for this message
André Bauer (monotek) wrote :

I did a "watch -n1 ls -al /sys/kernel/security/apparmor/policy/profiles" while migrating and for a short moment the directory was there on the Xenial server until the migration error popped up:

drwxr-xr-x 3 root root 0 Jun 24 10:57 libvirt-e86cf3f9-970f-bff1-d689-75a0a2d45a5d.12

On the Trusty server the entry is also there, but with a different name.

Revision history for this message
Christian Boltz (cboltz) wrote :

/sys/fs/cgroup/devices/machine/** rwa, is an invalid rule (and therefore the profile will fail to load) because w and a conflict: w already includes the a permission, so please change the rule to use rw.
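After fixing the rule, reloading the profiles that include the abstraction will show whether the parser accepts it, e.g. (assuming the stock Ubuntu profile file names):

  sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.libvirtd
  sudo apparmor_parser -r /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper

apparmor_parser prints a syntax error and refuses to load the profile if a rule is still invalid.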

Revision history for this message
André Bauer (monotek) wrote :

Using only "/sys/fs/cgroup/devices/machine/** rw" I get:

type=VIRT_RESOURCE msg=audit(1467035730.817:52434): pid=4767 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm resrc=cgroup reason=deny vm="checkbox" uuid=e86cf3f9-970f-bff1-d689-75a0a2d45a5d cgroup="/sys/fs/cgroup/devices/machine/checkbox.libvirt-qemu/" class=all exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'

Revision history for this message
Stephen (belrik) wrote :

Have you discovered any solution to the problem of using GlusterFS with libgfapi block devices in Xenial? I use your PPA, which worked perfectly on Trusty, and it did work for a brief time on Xenial, but today I attempted to restart a VM after a routine apt-get upgrade and hit this bug. I'm confused about what exactly changed, because I had managed to start a VM using libgfapi under Xenial before today, but today's upgrade killed this functionality. Maybe the clue is in the recent qemu updates?

Revision history for this message
André Bauer (monotek) wrote :

No, sorry. I'm back on Trusty...

Revision history for this message
Stephen (belrik) wrote :

Shame. I wish the Debian upstream developers would accept the Gluster team's versioning scheme; then this would be enabled by default and have many more eyeballs on it. It seems that GlusterFS's versioning scheme, and the points at which they choose to break or retain compatibility, is the crux of the whole problem and the reason this doesn't appear in Ubuntu's QEMU by default.

Revision history for this message
André Bauer (monotek) wrote :

No, it's actually because GlusterFS is in the universe repository.

I already filed a main inclusion request (MIR) in the bug tracker, but there has been no progress:

https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1274247

Revision history for this message
André Bauer (monotek) wrote :

Same problem with GlusterFS 3.8.6 and Xenial Qemu client...

Revision history for this message
Helensvale Technology Group (htgsolutions) wrote :

Hi, are there any updates on this issue? I just ran into this problem on Debian 8 with glusterfs 3.8.8 and qemu 2.8 packages I backported from unstable. I found this bug while searching for possible solutions.

infrastructure@us-vm-1:~# sudo qemu-img create -f qcow2 gluster://127.0.0.1:24007/datastore/testvm.qcow2 10G
Formatting 'gluster://127.0.0.1:24007/datastore/testvm.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2017-02-15 00:17:35.360852] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:35.361066] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:35.361111] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:35.361233] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:36.387340] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:36.387432] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:36.387613] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:36.387651] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:37.344465] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:37.344558] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:37.344721] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:37.344767] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:38.236208] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:38.236270] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-02-15 00:17:38.236410] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-datastore-replicate-2: All subvolumes are down. Going offline u...


Revision history for this message
Seth Arnold (seth-arnold) wrote :

htg, check dmesg | grep DENIED to see if AppArmor profiles are blocking your progress. If not, you may need to open a new bug elsewhere. Thanks.

Revision history for this message
Stephen (belrik) wrote :

Hi, I was doing some maintenance on my Gluster shares and checked in on this bug again; it's still an issue. Here are the logs when attempting to start a VM via libgfapi:

# qemu-system-x86_64 -drive file=gluster://127.0.0.1:24007/VM/centos-test.qcow2,format=raw,if=none,id=drive-virtio-disk0,cache=none
[2017-05-31 15:32:10.931097] W [MSGID: 108040] [afr.c:315:afr_pending_xattrs_init] 0-VM-replicate-0: Unable to fetch afr-pending-xattr option from volfile. Falling back to using client translator names.
[2017-05-31 15:32:10.934558] E [socket.c:2310:socket_connect_finish] 0-VM-client-2: connection to 192.168.0.197:49153 failed (Connection refused)
[2017-05-31 15:32:10.935614] E [socket.c:2310:socket_connect_finish] 0-VM-client-1: connection to 192.168.0.20:49155 failed (Connection refused)
[2017-05-31 15:32:10.945254] E [MSGID: 108006] [afr-common.c:4799:afr_notify] 0-VM-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-05-31 15:32:10.945527] W [MSGID: 108001] [afr-common.c:4888:afr_notify] 0-VM-replicate-0: Client-quorum is not met
qemu-system-x86_64: -drive file=gluster://127.0.0.1:24007/VM/centos-test.qcow2,format=raw,if=none,id=drive-virtio-disk0,cache=none: Could not open 'gluster://127.0.0.1:24007/VM/centos-test.qcow2': No such file or directory

I've added these lines to /etc/apparmor.d/abstractions/libvirt-qemu to avoid any DENIED messages:

# For gluster use
  /usr/lib/x86_64-linux-gnu/glusterfs/** rmix,
  /proc/sys/net/ipv4/ip_local_reserved_ports r,
  /tmp/** rwcx,

dmesg output:

[270519.573087] audit: type=1400 audit(1496243895.879:28): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/libvirtd" pid=3075 comm="apparmor_parser"
[270519.661002] audit: type=1400 audit(1496243895.967:29): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/dhcpd" pid=3079 comm="apparmor_parser"
[270519.673425] audit: type=1400 audit(1496243895.979:30): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/ntpd" pid=3078 comm="apparmor_parser"
[270519.685399] audit: type=1400 audit(1496243895.991:31): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/tcpdump" pid=3081 comm="apparmor_parser"
[270519.736547] audit: type=1400 audit(1496243896.043:32): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/libvirt/virt-aa-helper" pid=3076 comm="apparmor_parser"
[270519.799579] audit: type=1400 audit(1496243896.107:33): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=3082 comm="apparmor_parser"
[270519.799883] audit: type=1400 audit(1496243896.107:34): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=3082 comm="apparmor_parser"
[270519.800152] audit: type=1400 audit(1496243896.107:35): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=3082 comm="apparmor_parser"
[270519.800416] audit: type=1400 audit(1496243896.107:36): apparmor="STATUS" operation="profile_replace" profile="unc...


Revision history for this message
Stephen (belrik) wrote :

I should add that I'm using qemu-kvm 1.2.5 with GlusterFS 3.10.2 on Xenial 16.04.2 with a 4.4 series kernel.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Stephen, two thoughts:

- glusterfs isn't in main in Ubuntu so I don't think qemu should support it at all; see https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1274247 for more information

- Those error messages are "connection to 192.168.0.197:49153 failed (Connection refused)" -- please check netstat -lntp on 192.168.0.197 to make sure that a daemon is listening there. If a daemon is listening there, then please also check the firewall rules on both hosts, as well as all routers between the hosts, to ensure the packets would be allowed.

Thanks

Revision history for this message
Stephen (belrik) wrote :

First, the connection refused errors: it appears that although 'gluster volume status' reports a different port after the service restarts, the service is still only listening on 49152 (according to 'netstat -an | grep 4915') and never listens on 49153 or higher. I restarted all the nodes one by one to get the process onto 49152 on all nodes and avoid this. The error persisted.

I have done some more research. In particular, I have re-homed my libvirtd installation to a container running *inside* the Xenial server but using the 14.04 binaries. This container can run KVM processes using libgfapi connections from its 14.04 environment to 16.04 on the host. The qemu service in the host's 16.04 environment cannot.

I will see if abstracting libvirtd, qemu and kvm into a container is a workable solution.
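In case it helps anyone, the rough setup I'm experimenting with is a Trusty LXD container with /dev/kvm passed through (only a sketch; the container still needs the gluster-enabled qemu from the PPA above, plus network access to the Gluster volume, and host privilege/AppArmor settings may need adjusting):

  lxc launch ubuntu:trusty libvirt-trusty
  lxc config device add libvirt-trusty kvm unix-char path=/dev/kvm
  lxc exec libvirt-trusty -- apt-get install libvirt-bin qemu-kvm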

Having seen this libgfapi feature arrive upstream in 2014, I don't know what to do at this point. Any hint as to when Debian and then Ubuntu will move this to main? I know there are a lot of packages, but has there been any response from the GlusterFS maintainer?

Revision history for this message
Joaquin Menchaca (darkn3rd) wrote :

sudo systemctl disable apparmor
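A less drastic alternative than disabling AppArmor system-wide (not tested in this thread, just a sketch) would be to turn off only libvirt's security driver in /etc/libvirt/qemu.conf and then restart the libvirt daemon (the libvirt-bin service on Xenial):

  # /etc/libvirt/qemu.conf
  # Stop libvirt from applying per-VM AppArmor profiles to qemu;
  # the rest of the system stays confined.
  security_driver = "none"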
