Instances miss neutron QoS on their ports after unrescue and soft reboot

Bug #1784006 reported by s10 on 2018-07-27
This bug affects 6 people
Affects: neutron | Importance: Medium | Assigned to: Miguel Lavalle

Bug Description

Instances lose neutron QoS on their ports after unrescue and soft reboot

   Description
   ===========
   After some instance operations, such as unrescue and soft reboot,
   libvirt domains are created, but neutron doesn't set QoS on the VM's ports.

   As a result, a user can bypass the per-port QoS limit and use all of the
   host's bandwidth.

   This doesn't happen after live migration, migration, hard reboot, rescue, or shutdown followed by start.

   The problem doesn't occur for operations that end up calling _create_domain_and_network():
   https://github.com/openstack/nova/blob/stable/pike/nova/virt/libvirt/driver.py#L5392

   In unrescue and soft reboot, the libvirt driver calls _create_domain() directly and doesn't execute plug_vifs():
   https://github.com/openstack/nova/blob/stable/pike/nova/virt/libvirt/driver.py#L2547
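
The difference between the two code paths described above can be sketched with a toy model. This is not nova code: the class below is hypothetical and only borrows the method names mentioned in this report (_create_domain_and_network, _create_domain, plug_vifs), assuming, as the report states, that unrescue and soft reboot skip plug_vifs().

```python
# Toy model only. The real driver lives in nova/virt/libvirt/driver.py;
# the method names are taken from this report, the bodies are placeholders.
class ToyLibvirtDriver:
    def __init__(self):
        self.calls = []  # records which steps ran, in order

    def plug_vifs(self, instance):
        self.calls.append("plug_vifs")

    def _create_domain(self, instance):
        self.calls.append("_create_domain")

    def _create_domain_and_network(self, instance):
        # Path described for hard reboot, rescue, migration, etc.
        self.plug_vifs(instance)
        self._create_domain(instance)

    def hard_reboot(self, instance):
        self._create_domain_and_network(instance)

    def unrescue(self, instance):
        # Per the report: the domain is created directly, without plug_vifs().
        self._create_domain(instance)

    def soft_reboot(self, instance):
        # Per the report: same direct path as unrescue.
        self._create_domain(instance)


hard = ToyLibvirtDriver()
hard.hard_reboot("vm-1")
soft = ToyLibvirtDriver()
soft.soft_reboot("vm-1")
print(hard.calls)
print(soft.calls)
```

In this sketch only the hard reboot path records a plug_vifs step, which mirrors the observation that QoS survives hard reboot but not soft reboot or unrescue.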

   Steps to reproduce
   ==================
  1. Create an instance with a port in a neutron network

  2. Create a QoS policy with a bandwidth limit rule in neutron:
  $ neutron qos-policy-create limited_1000mbps
  $ neutron qos-bandwidth-limit-rule-create limited_1000mbps --max-kbps 1000000 --max-burst-kbps 160000

  3. Update the instance's port to assign the policy:
  $ neutron port-update --qos-policy limited_1000mbps PORT_UUID

  4. Verify that the QoS rule is applied to the port:

$ /sbin/tc -s qdisc show dev tap47eaf544-39
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 383621004 bytes 262469 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 173850 bytes 1515 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

  5. Either:
     1) Execute nova rescue and then nova unrescue for the instance
     or
     2) Execute nova reboot (without the --hard parameter)

  6. Observe that after the tap interface is recreated during libvirt domain start,
  the QoS rules are gone:

$ /sbin/tc -s qdisc show dev tap47eaf544-39
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1537 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
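
The manual check in steps 4 and 6 could be automated with a small script. The following is a minimal sketch, assuming (as in the outputs above for this Open vSwitch setup) that an applied bandwidth limit shows up as a "qdisc ingress" line for the tap device and that its absence means the limit is gone. The function names are hypothetical, and other backends may expose QoS differently.

```python
import subprocess


def has_ingress_qdisc(tc_output: str) -> bool:
    """Return True if `tc qdisc show` output contains an ingress qdisc line."""
    for line in tc_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "qdisc" and fields[1] == "ingress":
            return True
    return False


def port_is_limited(tap_device: str) -> bool:
    """Run tc for the given tap device and look for the ingress qdisc.

    Assumes /sbin/tc exists and the caller may query the device.
    """
    result = subprocess.run(
        ["/sbin/tc", "qdisc", "show", "dev", tap_device],
        check=True,
        capture_output=True,
        text=True,
    )
    return has_ingress_qdisc(result.stdout)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        device = sys.argv[1]  # e.g. tap47eaf544-39
        state = "limited" if port_is_limited(device) else "NOT limited"
        print(f"{device}: {state}")
```

Run it with the tap device name before and after the reboot or unrescue; the second run should report the device as not limited if the bug is present.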

   Expected result
   ===============
  QoS rules are applied to the port, as in step 4.

   Actual result
   =============
  QoS rules are gone and the tap interface is not limited:

$ /sbin/tc -s qdisc show dev tap47eaf544-39
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1537 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

   Environment
   ===========
   1. Exact version of OpenStack:
   OpenStack Pike
   nova 16.1.4
   neutron 11.0.5

   2. Which networking type did you use?
   Neutron with Open vSwitch

s10 (vlad-esten) on 2018-07-27
description: updated
summary: - Instances misses neutron QoS on their ports after unrescue and soft
- reboot
+ Instances miss neutron QoS on their ports after unrescue and soft reboot
Matt Riedemann (mriedem) on 2018-07-27
tags: added: libvirt
s10 (vlad-esten) wrote :

It looks like the bug is in Neutron. A similar bug was fixed in commit https://github.com/openstack/neutron/commit/60cb0911712ad11688b4d09e5c01ac39c49f5aea

The same thing happens with QoS, but the QoS plugin never restores lost QoS rules: https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/l2/extensions/qos.py#L257

tags: added: qos
LIU Yulong (dragon889) wrote :

I can reproduce this in stable/queens neutron with a single reboot command.

Changed in neutron:
status: New → Confirmed
s10 (vlad-esten) on 2018-07-30
description: updated
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: nobody → Miguel Lavalle (minsel)
s10 (vlad-esten) on 2018-09-14
no longer affects: nova
s10 (vlad-esten) wrote :

I've marked this bug as not affecting Nova because I believe the issue is in Neutron, not in Nova. I tried changing the unrescue and soft reboot functions of the libvirt driver in Nova to handle ports the way hard reboot does. Sometimes it helped, but not always, and in rare cases the issue occurs even after a hard reboot.

As a workaround I had to comment out two lines in https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/l2/extensions/qos.py#L257 , so that Neutron reapplies QoS after every port change event, even when it is not necessary. With change I88f9f5af95439f1536799169390764c89109f467, this frequent reapplication of QoS rules became less costly in terms of RPC.
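
The effect of this workaround can be illustrated with a toy model. It is a sketch only and not the Neutron agent code: it assumes the skipped lines act as a "policy unchanged, nothing to do" short-circuit, and every name in it (ToyQosExtension, handle_port's arguments, always_reapply) is hypothetical.

```python
# Toy model of a per-port QoS cache; not the Neutron QoS agent extension.
class ToyQosExtension:
    def __init__(self, apply_rules, always_reapply=False):
        self._apply_rules = apply_rules        # callback that programs the device
        self._always_reapply = always_reapply  # True models the workaround
        self._known = {}                       # port_id -> last applied policy_id

    def handle_port(self, port_id, policy_id):
        """Return True if rules were (re)applied for this port event."""
        unchanged = self._known.get(port_id) == policy_id
        if unchanged and not self._always_reapply:
            # The cache says nothing changed, so rules lost when the tap
            # device was recreated are never restored.
            return False
        self._apply_rules(port_id, policy_id)
        self._known[port_id] = policy_id
        return True


applied = []
record = lambda port_id, policy_id: applied.append((port_id, policy_id))

cached = ToyQosExtension(record)
cached.handle_port("port-1", "limited_1000mbps")  # first event: applied
cached.handle_port("port-1", "limited_1000mbps")  # after reboot: skipped

forced = ToyQosExtension(record, always_reapply=True)
forced.handle_port("port-1", "limited_1000mbps")  # applied
forced.handle_port("port-1", "limited_1000mbps")  # applied again
print(len(applied))
```

The trade-off is the one noted above: reapplying on every port event restores lost rules at the cost of redundant work when nothing has actually changed.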
