AppArmor denies libvirt to use the virtual functions allocated by nova

Bug #1820302 reported by Nicolas Pochet
This bug affects 3 people
Affects: OpenStack Nova Compute Charm
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

When deploying nova-compute and neutron-openvswitch with the following configuration:

  nova-compute-kvm:
    charm: cs:nova-compute
    num_units: 1
    bindings:
      "": *oam-space
      internal: *internal-space
    options:
      openstack-origin: distro
      enable-live-migration: True
      enable-resize: True
      migration-auth-type: ssh
      use-internal-endpoints: True
      libvirt-image-backend: qcow2
      restrict-ceph-pools: False
      aa-profile-mode: enforce
      virt-type: kvm
      pci-passthrough-whitelist: '[{"devname": "eth0", "physical_network": "physnet_sriov"}]'
    to:
    - 1
  neutron-openvswitch:
    charm: cs:neutron-openvswitch
    num_units: 0
    bindings:
      data: *overlay-space
    options:
      bridge-mappings: *bridge-mappings
      prevent-arp-spoofing: True
      firewall-driver: openvswitch
      enable-local-dhcp-and-metadata: true
      data-port: *data-port
      enable-sriov: True
      sriov-device-mappings: "physnet_sriov:eth0"
      sriov-numvfs: "eth0:8"

And trying to create an instance with the following commands:
openstack network create --provider-network-type vlan --provider-physical-network physnet_sriov --provider-segment 10 sriov
openstack subnet create --network sriov --subnet-range 192.168.1.0/24 sriov-subnet
openstack port create --vnic-type direct --network sriov sriov-port
openstack server create --port sriov-port --image bionic --flavor m1.small test-sriov

nova-compute, on the host where the instance is supposed to be deployed, fails because AppArmor denies libvirt access to the virtual functions.

The work-around is to disable AppArmor:

juju config nova-compute-kvm aa-profile-mode=disable
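
When reproducing, it may help to first confirm that the virtual functions requested through sriov-numvfs actually exist on the compute host, so that an AppArmor denial is not confused with missing VFs. The following is a minimal sketch, assuming the standard Linux sysfs layout; the function name vf_count and its optional second argument are hypothetical helpers, not part of the charm.

```shell
# Print the number of SR-IOV virtual functions currently enabled on a NIC.
# $1: interface name (e.g. eth0)
# $2: sysfs root, defaults to /sys (only there so the function can be
#     pointed at a fake tree; hypothetical convenience, not a kernel feature)
vf_count() {
    iface="$1"
    sysfs_root="${2:-/sys}"
    numvfs_file="$sysfs_root/class/net/$iface/device/sriov_numvfs"
    if [ -r "$numvfs_file" ]; then
        cat "$numvfs_file"
    else
        echo "no sriov_numvfs entry for $iface (not SR-IOV capable, or wrong name?)" >&2
        return 1
    fi
}
```

With the configuration above (sriov-numvfs: "eth0:8"), running vf_count eth0 on the hypervisor would be expected to report the configured number of VFs if they were created successfully.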

David Ames (thedac) wrote :

Nicolas,

I have marked this as critical as I think this is an important bug. However, we could use more information and logs to help the investigation.

Can you please provide a juju crashdump or at least pertinent logs that show exactly where the failure occurs? A bundle would also be helpful.

What version of Ubuntu and OpenStack did you see this on?

PRE-TRIAGE:

We have some investigation to do for SR-IOV and AppArmor profiles.

Changed in charm-nova-compute:
importance: Undecided → Critical
status: New → Incomplete
milestone: none → 19.07
Nicolas Pochet (npochet) wrote :

Hi David,

I do not have access to an environment with SR-IOV NICs.
I'll paste here a minimal bundle that could be used to reproduce the issue.
I also found this review https://review.fuel-infra.org/#/c/17318/2/debian/patches/qemu-apparmor-rules-for-sriov.patch but have not had time to test it.

Nicolas Pochet (npochet) wrote :

Hi David,

Here is a stripped bundle that could be used to reproduce the issue:

series: bionic
variables:
  openstack-origin: &openstack-origin distro

  openstack-region: &openstack-region RegionOne

  vlan-ranges: &vlan-ranges "physnet1:1000:2000 physnet_sriov_1:1000:2000"
  bridge-mappings: &bridge-mappings "physnet1:br-data"
  data-port: &data-port "br-data:eth1"
  enable-sriov: &enable-sriov true
  sriov-numvfs: &sriov-numvfs 'eth0:32'
  sriov-device-mappings: &sriov-device-mappings 'physnet_sriov_1:eth0'
  pci-passthrough-whitelist: &pci-passthrough-whitelist '[{ "devname": "eth0", "physical_network": "physnet_sriov_1"}]'
  nova-default-filters: &nova-default-filters 'AggregateInstanceExtraSpecsFilter,RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter,NUMATopologyFilter'

machines:
  # Controller 1
  "0":
    constraints: tags=controllers
  # Controller 2
  "1":
    constraints: tags=controllers
  # Controller 3
  "2":
    constraints: tags=controllers
  # Compute
  "3":
    constraints: tags=compute

applications:
  # Ceph
  ceph-mon:
    charm: cs:ceph-mon
    num_units: 3
    options:
      source: *openstack-origin
    to:
    - lxd:0
    - lxd:1
    - lxd:2
  ceph-osd:
    charm: cs:ceph-osd
    num_units: 1
    options:
      osd-devices: /dev/sdb
      source: *openstack-origin
      autotune: true
    to:
    - '3'
  ceph-radosgw:
    charm: cs:ceph-radosgw
    num_units: 1
    options:
      source: *openstack-origin
      region: *openstack-region
      restrict-ceph-pools: False
    to:
    - lxd:0
  # OpenStack
  cinder:
    charm: cs:cinder
    num_units: 1
    options:
      openstack-origin: *openstack-origin
      block-device: None
      glance-api-version: 2
      use-internal-endpoints: True
      region: *openstack-region
    to:
    - lxd:1
  cinder-ceph:
    charm: cs:cinder-ceph
    num_units: 0
    options:
      restrict-ceph-pools: False
  glance:
    charm: cs:glance
    options:
      openstack-origin: *openstack-origin
      use-internal-endpoints: True
      restrict-ceph-pools: False
      region: *openstack-region
    num_units: 1
    to:
    - lxd:2
  keystone:
    charm: cs:keystone
    num_units: 1
    options:
      openstack-origin: *openstack-origin
      region: *openstack-region
      preferred-api-version: 3
    to:
    - lxd:0
  mysql:
    charm: cs:percona-cluster
    num_units: 1
    options:
      source: *openstack-origin
      wait-timeout: 180
      min-cluster-size: 1
      enable-binlogs: True
      performance-schema: True
    to:
    - lxd:1
  neutron-api:
    charm: cs:neutron-api
    num_units: 1
    options:
      openstack-origin: *openstack-origin
      region: *openstack-region
      neutron-security-groups: True
      overlay-network-type: vxlan gre
      use-internal-endpoints: True
      vlan-ranges: *vlan-ranges
      enable-l3ha: True
      dhcp-agents-per-network: 2
      enable-ml2-port-security: True
      default-tenant-n...


David Ames (thedac)
Changed in charm-nova-compute:
milestone: 19.07 → 19.10
Ryan Beisner (1chb1n) wrote :

We've received more info about the reproducer, moving bug task back to NEW.

Changed in charm-nova-compute:
status: Incomplete → New
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 19.10 → 20.01
Nicolas Pochet (npochet) wrote :

De-escalating to Field-High as we have a work-around in place which consists of disabling AppArmor.

Changed in charm-nova-compute:
status: New → Triaged
Ryan Beisner (1chb1n)
Changed in charm-nova-compute:
importance: Critical → High
Ryan Beisner (1chb1n) wrote :

And, we seem to still be at the point of David's comment #1: Need to reproduce and capture logs.

Specifically, we need AppArmor logs. We should put the profiles in complain mode, reproduce with a similar deployment, then assess what needs to be changed or updated in the AppArmor profiles based on that. Until we do that, we cannot consider this to be triaged.

When we determine this, the right place to fix it is in the packaging aa profiles, not really in the charm.
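
One way to collect the requested logs could look like the sketch below. It assumes the charm's aa-profile-mode option accepts complain (alongside the enforce and disable values used elsewhere in this report) and that AppArmor audit records reach the kernel log in the usual apparmor="DENIED" / apparmor="ALLOWED" form. The function name aa_libvirt_events is a hypothetical helper.

```shell
# Filter kernel log text on stdin down to AppArmor audit records that
# mention libvirt or qemu. Enforce mode logs apparmor="DENIED";
# complain mode logs apparmor="ALLOWED" for accesses it would have denied.
aa_libvirt_events() {
    grep -E 'apparmor="(DENIED|ALLOWED)"' | grep -iE 'libvirt|qemu'
}

# Intended use (commented out so that sourcing this file has no side effects):
#   juju config nova-compute-kvm aa-profile-mode=complain
#   ... reproduce the instance creation from the bug description ...
#   journalctl -k --no-pager | aa_libvirt_events    # run on the hypervisor
```

The resulting lines name the profile, the operation and the path involved, which is the information needed to decide which rules the packaged profiles are missing.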

Changed in charm-nova-compute:
status: Triaged → New
James Page (james-page)
Changed in charm-nova-compute:
milestone: 20.01 → 20.05
Liam Young (gnuoy)
Changed in charm-nova-compute:
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy) wrote :

I have not been able to reproduce this. I did an SR-IOV-enabled bionic/queens deploy and created the SR-IOV port as outlined in the bug description, and it works for me. I will add the bundle I tested with as an attachment. After deployment I tested with:

https://paste.ubuntu.com/p/b7yXp8SMNX/

$ juju deploy ./bundle-space.yaml
$ source openrc
$ curl http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img | \
        openstack image create --public --container-format=bare --disk-format=qcow2 \
        bionic
$ openstack flavor create --ram 4096 --disk 10 m1.small
$ openstack network create --provider-network-type vlan --provider-physical-network physnet_sriov_1 --provider-segment 1001 sriov
$ openstack subnet create --network sriov --subnet-range 192.168.1.0/24 sriov-subnet
$ openstack port create --vnic-type direct --network sriov sriov-port
$ openstack server create --port sriov-port --image bionic --flavor m1.small test-sriov
$ openstack server list
+--------------------------------------+------------+--------+--------------------+--------+----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+--------------------+--------+----------+
| 276d14fd-8ca1-4fda-b716-05d82e7ef439 | test-sriov | ACTIVE | sriov=192.168.1.13 | bionic | m1.small |
+--------------------------------------+------------+--------+--------------------+--------+----------+
$ openstack port show sriov-port
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | UP |
| allowed_address_pairs | |
| binding_host_id | node-hagecius |
| binding_profile | pci_slot='0000:03:10.2', pci_vendor_info='8086:10ed', physical_network='physnet_sriov_1' |
| binding_vif_details | port_filter='False', vlan='1001' |
| binding_vif_type | hw_veb ...


Liam Young (gnuoy) wrote :

Bundle used for test deployment

Liam Young (gnuoy) wrote :

On the hypervisor:

apparmor module is loaded.
132 profiles are loaded.
132 profiles are in enforce mode.
...
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
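
To see which individual profiles sit in a given mode, rather than only the totals, the aa-status text can be split by section. This is a rough sketch that assumes the layout shown above, where each "N profiles are in <mode> mode." line is followed by indented profile names; aa_profiles_in_mode is a hypothetical helper name.

```shell
# Print the profile names listed under "N profiles are in <mode> mode."
# in aa-status output read from stdin. $1: mode (enforce or complain).
aa_profiles_in_mode() {
    awk -v mode="$1" '
        # A section header: remember whether it is the mode we want.
        /profiles are in/ { in_section = ($0 ~ ("profiles are in " mode " mode")); next }
        # Any other summary line (starts with a count) ends the section.
        /^[0-9]+ /        { in_section = 0 }
        # Inside the wanted section: strip indentation and print the name.
        in_section        { sub(/^[ \t]+/, ""); print }
    '
}

# Intended use on the hypervisor (commented out; requires root):
#   sudo aa-status | aa_profiles_in_mode enforce | grep -i libvirt
```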

Changed in charm-nova-compute:
assignee: Liam Young (gnuoy) → nobody
Liam Young (gnuoy) wrote :

I am going to mark this as incomplete. To reopen, please follow Ryan's requests in comment #6 and supply AppArmor logs.

Changed in charm-nova-compute:
status: New → Incomplete
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 20.05 → 20.08
James Page (james-page) wrote :

Removing milestone as we're not able to reproduce this issue.

Changed in charm-nova-compute:
status: Incomplete → New
status: New → Incomplete
milestone: 20.08 → none
James Page (james-page) wrote :

It's been over 3 months since this bug was last touched, and no further information has been provided to enable reproduction, so I am marking this bug as Invalid.

Changed in charm-nova-compute:
status: Incomplete → Invalid
Ian Johnson (ijoh) wrote :

Can I ask whether you applied CIS hardening as part of the reproducer?
