Neutron QoS Policy lost on interfaces

Bug #1845161 reported by Nguyen Duy Binh
This bug affects 3 people
Affects   Status         Importance   Assigned to      Milestone
neutron   Fix Released   High         Rodolfo Alonso

Bug Description

Instances lose the QoS policy on their interfaces after operations such as hard reboot, live migration, or migration.

   Description
   ===========
   When performing operations on a VM such as hard reboot, live migration, or migration, the QoS policy on the VM interfaces is sometimes lost, and Neutron does not restore it.

   As a result, a user can bypass the per-port QoS limit and use all of the host's bandwidth.

   Steps to reproduce
   ==================
  1. Create an instance with a port on a Neutron network (alternatively, create the network with a QoS policy attached).

  2. Create a QoS policy in Neutron:
  $ openstack network qos policy create --share qos-100Mb
  $ openstack network qos rule create --type bandwidth-limit --max-kbps 100000 --max-burst-kbits 0 --egress qos-100Mb
  $ openstack network qos rule create --type bandwidth-limit --max-kbps 100000 --max-burst-kbits 0 --ingress qos-100Mb

  3. Update the instance's port to assign the policy:
  $ openstack port set --qos-policy qos-100Mb PORT_UUID

  4. Verify that the QoS rule is applied to the port:

$ ovs-vsctl list interface qvoxxxxxxx-xx

......
ingress_policing_burst: 80000
ingress_policing_rate: 100000
.......

$ /sbin/tc -s qdisc show dev qvoxxxxxxx-xx
qdisc htb 1: root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 1000
 Sent 9701 bytes 93 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
 Sent 9576 bytes 130 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

  5. Perform an operation such as hard reboot, live migration, or migration.

  6. Sometimes, after such an operation, the VM interface loses its QoS policy:

$ ovs-vsctl list interface qvoxxxxxxx-xx
......
ingress_policing_burst: 0
ingress_policing_rate: 0
.......
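
The manual check in steps 4 and 6 can be scripted. The following is a minimal sketch, not part of the original report: the function names `parse_policing` and `qos_is_applied` are hypothetical, and it only assumes the `ingress_policing_rate` / `ingress_policing_burst` lines shown above, treating a rate of 0 as "no policing configured".

```python
import re


def parse_policing(ovs_output: str) -> dict:
    """Extract ingress_policing_* values from `ovs-vsctl list interface` text."""
    values = {}
    pattern = re.compile(r"\s*(ingress_policing_(?:rate|burst))\s*:\s*(\d+)\s*$")
    for line in ovs_output.splitlines():
        match = pattern.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def qos_is_applied(ovs_output: str) -> bool:
    """Return True when a non-zero policing rate is configured."""
    return parse_policing(ovs_output).get("ingress_policing_rate", 0) > 0


# Sample inputs copied from steps 4 and 6 of this report.
before = """......
ingress_policing_burst: 80000
ingress_policing_rate: 100000
......."""
after = """......
ingress_policing_burst: 0
ingress_policing_rate: 0
......."""

print(qos_is_applied(before), qos_is_applied(after))
```

Feeding the function the output captured after each operation in step 5 makes it easier to notice the intermittent loss than inspecting the output by eye.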

   Expected result
   ===============
  QoS rules are restored on the port.

   Actual result
   =============
  QoS rules are lost; the port has no bandwidth limit.

   Environment
   ===========
   1. Exact version of OpenStack:
   OpenStack Queens and OpenStack Rocky

   2. Which networking type did you use?
   Neutron with Open vSwitch

Nguyen Duy Binh (binhnd)
information type: Public → Public Security
information type: Public Security → Public
Nguyen Duy Binh (binhnd)
description: updated
Bence Romsics (bence-romsics) wrote :

Thank you for your bug report and especially for the clear reproduction instructions.

I did not manage to reproduce the problem in my environment. That was a master devstack and I performed a hard reboot of the vm, but the QoS backend settings were not lost after it. On the other hand I had different interface names, likely because we are using different firewall drivers. And that may be the reason we're seeing different behavior. Which firewall_driver is in your neutron config? Mine was 'noop'.

Changed in neutron:
status: New → Incomplete
tags: added: qos
Nguyen Duy Binh (binhnd) wrote :

Thanks for your response,
In my case, you need to hard-reboot the VM repeatedly and, after each reboot, check the QoS settings on the VM's network interface on the compute node with the commands shown above. Occasionally you will see that the QoS settings on the port have been lost.

The firewall_driver in my neutron config is: iptables_hybrid

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/684457

Changed in neutron:
assignee: nobody → Nguyen Duy Binh (binhnd)
status: Incomplete → In Progress
Bence Romsics (bence-romsics) wrote :

Thanks, that was an important missing piece of information.

This way I managed to reproduce the problem (independently of the firewall_driver):

in shell #1 monitoring the backend qos settings:

watch "openstack server show vm1 -f value -c OS-EXT-SRV-ATTR:instance_name | xargs -r virsh dumpxml | xmlstarlet sel -t -v '//interface/target/@dev' | xargs -r sudo ovs-vsctl list interface | egrep ingress_policing"

in shell #2 trying stuff like this (a few times):

for i in 1 2 3 ; do openstack server reboot --hard --wait vm1 ; done

At the end of one of these triple hard reboots the backend was left unconfigured:

ingress_policing_burst: 0
ingress_policing_rate: 0

I'm setting the importance to high because it seems to me this behavior could be abused by a malicious user (though I don't see a security hole).
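
The two-shell procedure above could be automated roughly as follows. This is a minimal sketch under stated assumptions, not something used in this thread: the function name `reboots_until_qos_lost`, the `settle_seconds` delay (to give the agent time to reconfigure the port), and the injectable `runner` are hypothetical; the interface name must be supplied by the caller, and `ovs-vsctl` may need to be run with `sudo` as in the command above.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def reboots_until_qos_lost(server: str, interface: str, attempts: int = 10,
                           settle_seconds: float = 30.0,
                           runner: Callable[[Sequence[str]], str] = run
                           ) -> Optional[int]:
    """Hard-reboot `server` up to `attempts` times.

    Returns the attempt number after which ingress_policing_rate was 0,
    or None if the rate was still set after every reboot.
    """
    for attempt in range(1, attempts + 1):
        runner(["openstack", "server", "reboot", "--hard", "--wait", server])
        time.sleep(settle_seconds)  # assumed grace period for the OVS agent
        rate = runner(["ovs-vsctl", "get", "interface", interface,
                       "ingress_policing_rate"]).strip()
        if int(rate) == 0:
            return attempt
    return None
```

A call such as `reboots_until_qos_lost("vm1", "qvoxxxxxxx-xx")` would then report after which reboot, if any, the backend was left unconfigured. Because the problem is a race, a `None` result does not prove the environment is unaffected.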

Changed in neutron:
status: In Progress → Confirmed
importance: Undecided → High
Nguyen Duy Binh (binhnd)
Changed in neutron:
assignee: Nguyen Duy Binh (binhnd) → nobody
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/690098

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: Confirmed → In Progress
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/684457
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/690098
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=50ffa5173db03b0fd0fe7264e4b2a905753f86ec
Submitter: Zuul
Branch: master

commit 50ffa5173db03b0fd0fe7264e4b2a905753f86ec
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 22 14:21:08 2019 +0000

    [OVS] Handle added/removed ports in the same polling iteration

    The OVS agent processes the port events in a polling loop. It can
    happen (more frequently in a loaded OVS agent) that the "removed"
    and "added" events occur in the same polling iteration. Because
    of this, the same port is detected as both "removed" and "added".

    When the virtual machine is restarted, the port event sequence is
    "removed" and then "added". When both events are captured in the same
    iteration, the port is already present in the bridge and the port is
    discarded from the "removed" list.

    Because the port was removed first and then added, the QoS policies do
    not apply anymore (QoS and Queue registers, OF rules). If the QoS
    policy does not change, the QoS agent driver will detect it and won't
    call the QoS driver methods (based on the OVS agent QoS cache, storing
    port and QoS rules). This will lead to an unconfigured port.

    This patch solves this issue by detecting this double event and
    registering it as "removed_and_added". When the "added" port is
    handled, the QoS deletion method is called first (if needed) to remove
    the unneeded artifacts (OVS registers, OF rules) and remove the QoS
    cache (port/QoS policy). Then the QoS policy is applied again on the
    port.

    NOTE: this is going to be quite difficult to test in a fullstack test.

    Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
    Closes-Bug: #1845161
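
The behavior described in this commit message can be illustrated with a toy model. This is a sketch based only on the description above, not the code of the actual patch: `classify_port_events`, `QosExtensionSketch`, and `process_added` are hypothetical names, and the `backend` dictionary merely stands in for the OVS-side configuration.

```python
from typing import Dict, Iterable, Set


def classify_port_events(added: Iterable[str], removed: Iterable[str],
                         ports_on_bridge: Iterable[str]) -> Dict[str, Set[str]]:
    """Classify the port events seen in one polling iteration."""
    added, removed, present = set(added), set(removed), set(ports_on_bridge)
    # Seen as removed AND added, and still on the bridge: the port was
    # re-created within the iteration, so it is not really gone.
    both = added & removed & present
    return {"added": added, "removed": removed - both,
            "removed_and_added": both}


class QosExtensionSketch:
    """Toy stand-in for the agent-side QoS handling."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}    # port -> policy the agent believes is applied
        self.backend: Dict[str, str] = {}  # port -> policy actually configured

    def handle_port(self, port: str, policy: str) -> None:
        if self.cache.get(port) == policy:
            return  # policy unchanged according to the cache: backend untouched
        self.backend[port] = policy
        self.cache[port] = policy

    def delete_port(self, port: str) -> None:
        self.backend.pop(port, None)
        self.cache.pop(port, None)


def process_added(events: Dict[str, Set[str]], qos: QosExtensionSketch,
                  policies: Dict[str, str]) -> None:
    for port in sorted(events["added"]):
        if port in events["removed_and_added"]:
            qos.delete_port(port)  # drop stale cache and artifacts first
        qos.handle_port(port, policies[port])


qos = QosExtensionSketch()
qos.handle_port("port-1", "qos-100Mb")
qos.backend.pop("port-1")  # VM restart: interface re-created, settings gone

# Without the double-event handling, the cache hides the loss.
qos.handle_port("port-1", "qos-100Mb")
lost = "port-1" not in qos.backend

# With it, the port is cleaned up and the policy is applied again.
events = classify_port_events(["port-1"], ["port-1"], ["port-1"])
process_added(events, qos, {"port-1": "qos-100Mb"})
restored = qos.backend.get("port-1") == "qos-100Mb"
print(lost, restored)
```

The point of the model is the cache check in `handle_port`: as long as the cached policy matches, nothing is written to the backend, so the stale cache entry has to be removed before the policy is reapplied.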

Changed in neutron:
status: In Progress → Fix Released
Nguyen Duy Binh (binhnd)
information type: Public → Public Security
Changed in ubuntu:
status: New → Invalid
no longer affects: ubuntu
Nguyen Duy Binh (binhnd)
information type: Public Security → Public
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/706905

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/706909

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/706910

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/706943

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/706905
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=503bbdab871763860e615b923d8db12c1511a4f5
Submitter: Zuul
Branch: stable/train

commit 503bbdab871763860e615b923d8db12c1511a4f5
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 22 14:21:08 2019 +0000

    [OVS] Handle added/removed ports in the same polling iteration

    Conflicts:
          neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
    Closes-Bug: #1845161
    (cherry picked from commit 50ffa5173db03b0fd0fe7264e4b2a905753f86ec)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.2

This issue was fixed in the openstack/neutron 15.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/706943
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=233bc1ed11bf9d72a25f36722921781c364049e5
Submitter: Zuul
Branch: stable/queens

commit 233bc1ed11bf9d72a25f36722921781c364049e5
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 22 14:21:08 2019 +0000

    [OVS] Handle added/removed ports in the same polling iteration

    Conflicts:
          neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
    Closes-Bug: #1845161
    (cherry picked from commit 50ffa5173db03b0fd0fe7264e4b2a905753f86ec)
    (cherry picked from commit 3eceb6d2ae5523a2658906cc086ea98e1be3209a)
    (cherry picked from commit 6376391b45d6855705bd9a291996e3a4110ad26f)
    (cherry picked from commit ab8ad6f06d85fa03fa6d1e21a88f13e49dc627da)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/706910
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ab8ad6f06d85fa03fa6d1e21a88f13e49dc627da
Submitter: Zuul
Branch: stable/rocky

commit ab8ad6f06d85fa03fa6d1e21a88f13e49dc627da
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 22 14:21:08 2019 +0000

    [OVS] Handle added/removed ports in the same polling iteration

    Conflicts:
          neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
    Closes-Bug: #1845161
    (cherry picked from commit 50ffa5173db03b0fd0fe7264e4b2a905753f86ec)
    (cherry picked from commit 3eceb6d2ae5523a2658906cc086ea98e1be3209a)
    (cherry picked from commit 6376391b45d6855705bd9a291996e3a4110ad26f)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/706909
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6376391b45d6855705bd9a291996e3a4110ad26f
Submitter: Zuul
Branch: stable/stein

commit 6376391b45d6855705bd9a291996e3a4110ad26f
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Oct 22 14:21:08 2019 +0000

    [OVS] Handle added/removed ports in the same polling iteration

    Conflicts:
          neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py

    Change-Id: I51eef168fa8c18a3e4cee57c9ff86046ea9203fd
    Closes-Bug: #1845161
    (cherry picked from commit 50ffa5173db03b0fd0fe7264e4b2a905753f86ec)
    (cherry picked from commit 3eceb6d2ae5523a2658906cc086ea98e1be3209a)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.0.0.0b1

This issue was fixed in the openstack/neutron 16.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.7

This issue was fixed in the openstack/neutron 13.0.7 release.

tags: added: neutron-proactive-backport-potential
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

The change I51eef168fa8c18a3e4cee57c9ff86046ea9203fd was backported up to Queens.

Regards.

tags: removed: neutron-proactive-backport-potential