/etc/qemu-ifup not allowed by apparmor

Bug #1665698 reported by Logan V on 2017-02-17

This bug affects 1 person
Affects                Importance   Assigned to
Ubuntu Cloud Archive   Undecided    Unassigned
libvirt (Ubuntu)       Critical     Unassigned

Bug Description

I have VMs failing to start with:

2017-02-17 15:38:44.458 264015 ERROR nova.compute.manager [instance: 0c97ab16-2d30-43fa-b0e4-a064a842b5ed] libvirtError: internal error: process exited while connecting to monitor: 2017-02-17T15:38:43.907222Z qemu-system-x86_64: -netdev tap,ifname=tapf34ef99e-18,id=hostnet0,vhost=on,vhostfd=28: network script /etc/qemu-ifup failed with status 256

Log excerpt:
http://cdn.pasteraw.com/b3tw4cjefomfi3e9k09hvodrfun85z

It seems that /etc/qemu-ifup is being blocked by apparmor:
type=AVC msg=audit(1487347189.015:28536): apparmor="DENIED" operation="exec" profile="libvirt-4a03fea7-e966-48e4-80ac-aa138db67243" name="/etc/qemu-ifup" pid=285438 comm="qemu-system-x86" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
type=PATH msg=audit(1487347189.015:28536): item=0 name="/etc/qemu-ifup" inode=66403 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
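
For anyone scanning a large audit log for the same symptom, here is a minimal sketch of picking out AppArmor exec denials like the one above. The function names are made up for illustration, and the parsing only assumes the key=value layout visible in the two records quoted here; it is not a complete audit-log parser.

```python
import re

# Matches key=value and key="quoted value" pairs in one audit record.
_PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_audit_fields(line):
    """Return the key/value pairs of one audit record as a dict."""
    fields = {}
    for key, raw, quoted in _PAIR.findall(line):
        # Use the unquoted content when the value was written in quotes.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def denied_exec(line):
    """Return (profile, path) for an AppArmor exec denial, else None."""
    fields = parse_audit_fields(line)
    if fields.get("apparmor") == "DENIED" and fields.get("operation") == "exec":
        return fields.get("profile"), fields.get("name")
    return None
```

Feeding the type=AVC record above to denied_exec() should yield the libvirt-<UUID> profile name together with /etc/qemu-ifup, while the type=PATH record yields None because it carries no apparmor field.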

root@ubuntu-trusty-5773:/etc/apparmor.d/abstractions# cat /etc/apparmor.d/libvirt/libvirt-4a03fea7-e966-48e4-80ac-aa138db67243
#
# This profile is for the domain whose UUID matches this file.
#

#include <tunables/global>

profile libvirt-4a03fea7-e966-48e4-80ac-aa138db67243 {
  #include <abstractions/libvirt-qemu>
  #include <libvirt/libvirt-4a03fea7-e966-48e4-80ac-aa138db67243.files>

}
root@ubuntu-trusty-5773:/etc/apparmor.d/abstractions# cat /etc/apparmor.d/libvirt/libvirt-4a03fea7-e966-48e4-80ac-aa138db67243.files
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
  "/var/log/libvirt/**/instance-00000008.log" w,
  "/var/lib/libvirt/qemu/domain-instance-00000008/monitor.sock" rw,
  "/var/run/libvirt/**/instance-00000008.pid" rwk,
  "/run/libvirt/**/instance-00000008.pid" rwk,
  "/var/run/libvirt/**/*.tunnelmigrate.dest.instance-00000008" rw,
  "/run/libvirt/**/*.tunnelmigrate.dest.instance-00000008" rw,
  "/var/lib/nova/instances/4a03fea7-e966-48e4-80ac-aa138db67243/console.log" rw,
  "/var/lib/nova/instances/4a03fea7-e966-48e4-80ac-aa138db67243/console.log" rw,
  # for qemu guest agent channel
  owner "/var/lib/libvirt/qemu/channel/target/domain-instance-00000008/**" rw,
  /dev/vhost-net rw,

root@ubuntu-trusty-5773:/etc/apparmor.d/abstractions# dpkg -S libvirt-qemu
libvirt-bin: /etc/apparmor.d/abstractions/libvirt-qemu

root@ubuntu-trusty-5773:/etc/apparmor.d/abstractions# dpkg -l libvirt-bin
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-=========================-=========================-=======================================================================================
ii libvirt-bin 1.3.1-1ubuntu10.6~cloud0 amd64 programs for the libvirt library

Seeing identical behavior on Xenial
ubuntu@ubuntu-xenial-5165:~$ dpkg -l libvirt-bin
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-=========================-=========================-=======================================================================================
ii libvirt-bin 1.3.1-1ubuntu10.8 amd64 programs for the libvirt library

Jamie Strandboge (jdstrand) wrote :

/etc/qemu-ifup is a non-standard command. Can you give details of how you set up your system to use this?

WORKAROUND: add the following to /etc/apparmor.d/abstractions/libvirt-qemu:
/etc/qemu-ifup ixr,

and restart your VMs.
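
If the workaround has to be rolled out to several hosts, a small idempotent helper may be handy. This is only a sketch: add_rule is a hypothetical name, it works on the file content as a string, and it assumes the abstraction is a flat list of rules so that appending at the end is valid. Reading and writing /etc/apparmor.d/abstractions/libvirt-qemu is left to the caller.

```python
# Rule text taken from the workaround above.
RULE = "/etc/qemu-ifup ixr,"


def add_rule(abstraction_text, rule=RULE):
    """Return abstraction_text with rule appended unless already present."""
    for line in abstraction_text.splitlines():
        if line.strip() == rule:
            return abstraction_text  # already there, leave the text untouched
    if abstraction_text and not abstraction_text.endswith("\n"):
        abstraction_text += "\n"
    return abstraction_text + "  " + rule + "\n"
```

Running it twice on the same text gives the same result as running it once, so it is safe to call from a configuration-management run. The VMs still need to be restarted afterwards, as noted above.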

Changed in libvirt (Ubuntu):
status: New → Incomplete
Logan V (loganv) wrote :

That has been my challenge this morning in figuring this out. As far as I know, I don't use it or need what it provides. I can't find anything in /etc/libvirt, or anywhere in /etc or my Openstack venv, that calls qemu-ifup. I grepped through /usr looking for 'qemu-ifup' and the only hits were the mentions in /usr/share/doc/qemu-system-*/common/qemu-doc.html.

I began to suspect a possible packaging issue when I saw that it is sourced from upstream Ubuntu packaging, so that's what prompted this bug:

root@ubuntu-trusty-5773:~# dpkg -S /etc/qemu-ifup
qemu-system-common: /etc/qemu-ifup
root@ubuntu-trusty-5773:~# dpkg -l qemu-system-common
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-=========================-=========================-=======================================================================================
ii qemu-system-common 1:2.5+dfsg-5ubuntu10.5~cl amd64 QEMU full system emulation binaries (common files)

Logan V (loganv) wrote :

Also thanks for the workaround suggestion. I was looking at adding a workaround like that to the abstraction earlier but wasn't sure on the syntax :)

Logan V (loganv) wrote :

I wonder if https://review.openstack.org/#/c/425637/ has broken this on my version of libvirt by removing the script='' entry, which is maybe causing qemu to revert to its default of calling /etc/qemu-ifup, which as we know is not allowed by apparmor.

Jamie Strandboge (jdstrand) wrote :

Thank you for responding. Marking back to New so a member of the cloud team can take a further look.

Changed in libvirt (Ubuntu):
status: Incomplete → New
ChristianEhrhardt (paelzer) wrote :

Hi Logan, thank you for your report.
And thanks also to Jamie for already providing a workaround for the apparmor case.

We just SRUed a fix to Xenial to "allow execution of those scripts".
See https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1620407

From your report I see that you likely have Cloud-Archive Mitaka, which gave you access to the same SRU of libvirt, 1.3.1-1ubuntu10.8.

It very much confuses me that in your case it now seems to be prohibiting such an execution, as that is just the inverse of what the fix does. Nevertheless, it is too closely related to consider it just an accident.

Adding Cloud-Archive task to get their expertise as well.

Changed in libvirt (Ubuntu):
importance: Undecided → Critical
status: New → Incomplete
Changed in cloud-archive:
status: New → Incomplete
ChristianEhrhardt (paelzer) wrote :

Marking Critical as it is a potential SRU-regression

ChristianEhrhardt (paelzer) wrote :

Let me elaborate on /etc/qemu-ifup a bit, or at least on my understanding of it.
If you happen to run "type=ethernet" networking, the script becomes important.
See https://libvirt.org/formatdomain.html#elementsNICSEthernet
In general this is not recommended for various security reasons (you have to give libvirt/qemu more permissions/capabilities to get this working).
Now if you have type=ethernet and no script set, then the default is /etc/qemu-ifup, maybe that is how it gets this "config" in your case.

The Openstack commit you referred to could absolutely be related as well.
But it should not be in your software stack in your case right?

If applied, this would make the XML not set any script at all instead of `script=""`, and yes, in that case it would fall back to /etc/qemu-ifup, which might trickle down to showing your issue.
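
To make the fallback concrete, here is a rough model of the behaviour described above for a libvirt that hands the script to qemu. It is a sketch of my understanding, not libvirt or qemu code; effective_script and DEFAULT_SCRIPT are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Default named in this thread for type=ethernet without a script element.
DEFAULT_SCRIPT = "/etc/qemu-ifup"


def effective_script(interface_xml):
    """Return the script expected to run for an <interface> element, or None."""
    iface = ET.fromstring(interface_xml)
    if iface.get("type") != "ethernet":
        return None  # only type=ethernet is modelled here
    script = iface.find("script")
    if script is None:
        return DEFAULT_SCRIPT  # nothing set, so the default applies
    # An empty path (script path='') means no script is run.
    return script.get("path") or None
```

With this model, an interface carrying <script path=''/> returns None, while the same interface without any script element returns /etc/qemu-ifup, which is the path apparmor then denies.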

ChristianEhrhardt (paelzer) wrote :

The function the upstream fix relates to in Openstack is:
def set_vif_host_backend_ethernet_config(conf, tapname):
    """Populate a LibvirtConfigGuestInterface instance
    with host backend details for an externally configured
    host device.

Before the change it was part of "get_config(self, instance, network, mapping)" in
"class LibvirtOpenVswitchDriver(LibvirtBaseVIFDriver)"

I'm not sure but does that imply it might trigger in any openvswitch setup?
Some of our tests should have had the same then.
I'll rerun some of those later to be sure.

The libvirt change, on the other hand, "only" stops libvirt from immediately refusing guests that have a script= attribute set. See the referenced bug for details.
Now that it no longer refuses to start those, that should (tm) not cause the issue. Maybe it really is the openstack commit you referred to (causing it to fall back to the default).
But as I said, I wonder whether that commit is active in your software stack at all.

ChristianEhrhardt (paelzer) wrote :

I'm going off to do some more testing, keeping in mind what I have already documented on the bug.

I really want to understand why it hits you now and whether this is an SRU-regression, so here are some questions.

Questions to Logan:
- Could you share your guest XML to help me understand what combination drives your guest into being stopped by this apparmor DENY now?
- Did the workaround suggested by Jamie fix it for you?
- Could you also try downgrading to 1.3.1-1ubuntu10.7 to see if that gets you working again?
- I have made assumptions, but want to confirm: which releases of Ubuntu, Cloud-Archive and Openstack are you using?

Questions to the Cloud Archive Team:
- It didn't show up in my tests, but did it show up in any of your usual regular tests?
- Is the referenced Openstack commit active in any of our cloud-archive releases (if so, in which)?
- Can we reproduce that somehow to iterate more quickly on it?

ChristianEhrhardt (paelzer) wrote :

I was testing various type=ethernet XML configurations.

Cases:
defaultpath => <script path='/etc/qemu-ifup'/>
emptyattrib => <script path=''/>
noattrib => no script tag at all

The target statement that the error of the known bug refers to is optional, so I added another set of cases, called "notgt-*",
repeating the same three without a <target ...> element.

Case            Pre-Fix            Post-Fix
default         bug 1620407        working
empty           bug 1620407        working
no              bug 1620407        still bug 1620407*
notgt-default   working            working
notgt-empty     can't be defined   can't be defined
notgt-no        working            working

*We intentionally fixed bug 1620407 with a minimal fix, so it is "ok" for the "no" case to still fail.

Now the Openstack case should (IMHO) be one of the "empty" cases before the fix to openstack that was referred.
That is the path='', since notgt-empty can't be defined (xml validation) it has to be the normal "empty" case.
After the fix it should be one of the 'no' cases.

But all cases either stayed as-is or were fixed, so I don't know.
Also I had no apparmor DENIES in any of that, even when using the script explicitly in the *default cases.

I really need the XML that is generated to understand what might be going on.
Also please help to answer the questions I listed in comment #10.

ChristianEhrhardt (paelzer) wrote :

I think I found the guest definition in the attached raw log, analyzing ...

The important snippet is:
<interface type='ethernet'>
      <mac address='fa:16:3e:3b:8b:2b'/>
      <target dev='tapf34ef99e-18'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

That is my "no" case which should still fail the same way.
Yet I think I need to run that with an externally created tap device to fully match the Openstack+Openvswitch case. Doing that now ...
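
As a cross-check of that classification, here is a small sketch that maps an <interface> element onto the case names used in my test matrix (default, empty, no, and their notgt-* variants). classify is a name made up for this example and "other" is an extra bucket for any script path not covered by the matrix.

```python
import xml.etree.ElementTree as ET


def classify(interface_xml):
    """Map an <interface> element to a test case name such as 'no' or 'notgt-empty'."""
    iface = ET.fromstring(interface_xml)
    script = iface.find("script")
    if script is None:
        case = "no"        # no script tag at all
    elif script.get("path", "") == "":
        case = "empty"     # <script path=''/>
    elif script.get("path") == "/etc/qemu-ifup":
        case = "default"   # <script path='/etc/qemu-ifup'/>
    else:
        case = "other"     # not part of the matrix
    prefix = "" if iface.find("target") is not None else "notgt-"
    return prefix + case


snippet = """
<interface type='ethernet'>
  <mac address='fa:16:3e:3b:8b:2b'/>
  <target dev='tapf34ef99e-18'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
"""
print(classify(snippet))  # prints: no
```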

Still please help to answer all the other questions in comment #10.

tags: added: regression-update
ChristianEhrhardt (paelzer) wrote :

Even with that things are still working correctly for me:
Here the "no" case re-ran with the device being set up externally.

$ tunctl -t manualdevname
$ ip link set manualdevname up
$ ip link set manualdevname master virbr0

For the config:
    <interface type='ethernet'>
      <mac address='52:54:00:18:0d:a3'/>
      <target dev='manualdevname'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>

For me, this case worked both before and after the fix in 1.3.1-1ubuntu10.8.

ChristianEhrhardt (paelzer) wrote :

FYI - while not confirmed I marked it regression-update for now until we know for sure it is not (to get the right attention).
I also pinged some Cloud Archive people on IRC to take a look.

ChristianEhrhardt (paelzer) wrote :

Just wanted to make one more thing clear, the libvirt in Xenial (and here in the bug) is 1.3.1.
That means it is pre http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=9c17d66
So the prerequisite that makes https://review.openstack.org/#/c/425637/ necessary is not present.
Maybe those two bite each other, but I still need some way to reproduce this to fully understand it.

My remaining check would be to run on Trusty+UCA, but for that I need more confirmation of what exactly is used, and also another system to do so.

ChristianEhrhardt (paelzer) wrote :

Summarizing open questions to stop referring back:

Questions to Logan:
- Did the workaround suggested by Jamie fix it for you?
- Could you also try downgrading to 1.3.1-1ubuntu10.7 (or earlier) to see if that gets you working again?
- I have made assumptions so far, but want to confirm: which releases of Ubuntu, Cloud-Archive and Openstack are you using?

Questions to the Cloud Archive Team:
- It did not show up in my explicit or automated tests; did it show up in any of your usual tests?
- Is the referenced Openstack commit active in any of our cloud-archive releases (if so, in which)?
- Can you reproduce that somehow to iterate more quickly on it?

ChristianEhrhardt (paelzer) wrote :

I have checked with our support people, and they somewhat calmed me down: this is not lighting up everywhere (yet?). So I am keeping the state as is for now.

Logan V (loganv) wrote :

Hi Christian,

I have not tested the workaround yet. What I have done since I posted this is try to narrow it down to the specific nova commit that I suspected triggered this. To be specific, I am deploying Newton using OpenStack-Ansible.

OSA pulls nova directly from upstream openstack git sources and does not consume Ubuntu Cloud Archive for that. However, the libvirt installed is sourced from UCA. For OSA newton on trusty, libvirt is sourced from UCA's trusty/mitaka repo. For OSA newton on xenial, libvirt comes from UCA xenial/newton.

Without changing the libvirt being installed, I have confirmed the specific nova change that triggers this is https://review.openstack.org/#/c/425637/. The guest xml that nova generates when this commit is applied is in the OP (http://cdn.pasteraw.com/b3tw4cjefomfi3e9k09hvodrfun85z). When I revert my nova SHA to the parent commit of this change, b51231c638228f67ab130a7855b9143b202733f6, without changing any libvirt bits, the VMs launch as expected since they now contain the "<script path=''/>" bits in the network interface xml.

Without digging into libvirt too much, it seems like it should be fairly easy to reproduce by installing the latest libvirt from the xenial/newton or trusty/mitaka UCA and launching a VM with no "<script path=''/>" element in the XML of the ethernet vif.

I am limited on how much time I can commit to testing other changes at this time, but I will try to test the workaround and/or the previous libvirt version you suggested asap. Hopefully this will help fill in the blanks but if you have any other questions I missed let me know!

ChristianEhrhardt (paelzer) wrote :

I just found that the report was for 1.3.1-1ubuntu10.6~cloud0 and only stated that it "also applies" to 1.3.1-1ubuntu10.8. I am therefore dropping the regression-update tag.

tags: removed: regression-update
ChristianEhrhardt (paelzer) wrote :

I tried to puzzle together the timeline on this.
Thanks rbasask for discussing with me to refocus on this issue.

Timeline:
#1 libvirt passed script to qemu, qemu executed
   1.3.1 as in Xenial or UCA-Mitaka still do that
   But Openstack passed script='' and qemu silently ignored it

#2 libvirt changed, now libvirt executes
   http://libvirt.org/git/?p=libvirt.git;a=commit;h=9c17d665fdc5f
   That is in Yakkety and later.
   This had an unintentional API change, that empty scripts behave differently.

#3 Openstack adapted to that API change
   https://review.openstack.org/#/c/425637/
   Not sure - is that in Ocata only - commit in 2017?
   Now new Openstack (#3) + New Libvirt (#2) work
   But if you happen to have an old libvirt like in #1 you now have different behavior.

#4 Upstream libvirt realizes the API break and fixes it
   http://libvirt.org/git/?p=libvirt.git;a=commit;h=1d9ab0f04af310e52f80b4281751655bb3bb7601
   But backporting that would not help, as it is meant for libvirt at or after #2

#5 IMHO openstack should either
   - detect libvirt version and do differently depending on that (keep script='' for old ones)
   - or instead of not passing script at all pass /bin/true which will work on libvirt as old as #1

I expect you have an openstack of #3 and a libvirt of #1, and that this combination causes the issue.
I still don't see the apparmor issue on my end, but that might be an additional issue.
Even in the /bin/true case we might hit an apparmor denial on /bin/true.
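
The two options in #5 could look roughly like the sketch below. This is not nova code: choose_script is a hypothetical helper, versions are plain tuples, and the first libvirt release that contains the change from #2 is passed in by the caller because I have not pinned it down here.

```python
def choose_script(libvirt_version, first_changed_version, use_bin_true=False):
    """Return the script path for the interface XML, or None to omit <script>.

    libvirt_version and first_changed_version are tuples such as (1, 3, 1).
    first_changed_version is an assumption supplied by the caller: the first
    release containing the libvirt change described in #2.
    """
    if use_bin_true:
        # Second option: always pass a harmless script.
        return "/bin/true"
    if libvirt_version < first_changed_version:
        # First option, old libvirt: keep script='' so qemu does not fall back.
        return ""
    # First option, new libvirt: pass no script at all.
    return None


# The threshold (1, 3, 3) below is only a placeholder for illustration.
print(repr(choose_script((1, 3, 1), (1, 3, 3))))
print(repr(choose_script((2, 1, 0), (1, 3, 3))))
```

The version check keeps the existing behaviour on old libvirt, whereas the /bin/true variant avoids any version logic at the cost of a possible apparmor denial on /bin/true.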

Everybody, please still try to help sort out the questions in comment #16.
