Fullstack test test_min_bw_qos_policy_rule_lifecycle failing often

Bug #1819125 reported by Slawek Kaplonski
This bug affects 1 person
Affects | Status       | Importance | Assigned to    | Milestone
neutron | Fix Released | Critical   | Rodolfo Alonso |

Bug Description

Fullstack test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_min_bw_qos_policy_rule_lifecycle is often failing.

Stack trace:

ft1.2: neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_min_bw_qos_policy_rule_lifecycle(egress,openflow-cli)_StringException: Traceback (most recent call last):
  File "/opt/stack/new/neutron/neutron/common/utils.py", line 685, in wait_until_true
    eventlet.sleep(sleep)
  File "/opt/stack/new/neutron/.tox/dsvm-fullstack/lib/python3.5/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/opt/stack/new/neutron/.tox/dsvm-fullstack/lib/python3.5/site-packages/eventlet/hubs/hub.py", line 297, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 60 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/new/neutron/neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "/opt/stack/new/neutron/neutron/tests/fullstack/test_qos.py", line 655, in test_min_bw_qos_policy_rule_lifecycle
    self._wait_for_min_bw_rule_applied(vm, MIN_BANDWIDTH, self.direction)
  File "/opt/stack/new/neutron/neutron/tests/fullstack/test_qos.py", line 675, in _wait_for_min_bw_rule_applied
    lambda: vm.bridge.get_egress_min_bw_for_port(
  File "/opt/stack/new/neutron/neutron/common/utils.py", line 690, in wait_until_true
    raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 60 seconds

Example of failure: http://logs.openstack.org/83/574783/39/check/neutron-fullstack/76eb05d/logs/testr_results.html.gz

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22line%20655%2C%20in%20test_min_bw_qos_policy_rule_lifecycle%5C%22

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/642000

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/642001

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Rodolfo Alonso Hernandez (<email address hidden>) on branch: master
Review: https://review.openstack.org/642001
Reason: This patch was squashed into https://review.openstack.org/#/c/641117/ in order to push a single patch that stabilizes the CI gates.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/641117
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=92f1281b696c79133609d3c04b467ac7ea9f4337
Submitter: Zuul
Branch: master

commit 92f1281b696c79133609d3c04b467ac7ea9f4337
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Mar 5 18:37:44 2019 +0000

    Add a more robust method to check OVSDB values in BaseOVSTestCase

    Sometimes, when the OVSDB is heavily loaded (which can happen during the
    functional tests), there is a delay between the end of an OVSDB
    transaction and the moment the new or updated record can be read.
    Although this should not happen (the OVSDB is transactional), tests
    should cope with it and provide a robust method that retrieves a value
    and checks it at the same time. This new method reads the value again
    when it does not match the expected one.

    In order to solve the gate problem ASAP, this patch also fixes another
    bug: the QoS removal is skipped when the OVS agent is initialized during
    functional tests.

    When executing functional tests, several OVS QoS policies specific to
    minimum bandwidth rules [1] are created. Because several threads can
    create more than one minimum bandwidth QoS policy during the functional
    test run (something that cannot happen in a production environment), the
    OVS QoS driver must skip the execution of [2] to avoid removing QoS
    policies created in parallel by other tests.

    This patch marks "test_min_bw_qos_policy_rule_lifecycle" and
    "test_bw_limit_qos_port_removed" as unstable. Those tests will be
    investigated once the CI gates are stable.

    [1] Those QoS policies are created only to hold minimum bandwidth rules.
        Those policies are marked with:
           external_ids: {'_type'='minimum_bandwidth'}
    [2] https://github.com/openstack/neutron/blob/d6fba30781c5f4e63beeda04d065226660fc92b6/neutron/plugins/ml2/drivers/openvswitch/agent/extension_drivers/qos_driver.py#L43

    Closes-Bug: #1818613
    Closes-Bug: #1818859
    Related-Bug: #1819125

    Change-Id: Ia725cc1b36bc3630d2891f86f76b13c16f6cc37c
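
The "read again in case of mismatch" idea described in the first paragraph of this commit can be sketched as a small retry helper. This is a minimal illustration under stated assumptions: get_and_check(), ValueMismatch, and the retry/delay defaults are hypothetical names chosen here, not the method actually added to BaseOVSTestCase, and the getter stands in for whatever OVSDB read the test performs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ValueMismatch(Exception):
    """Raised when the value read back never matches the expected one."""


def get_and_check(getter: Callable[[], T], expected: T,
                  retries: int = 5, delay: float = 0.1) -> T:
    """Read a value and re-read it until it matches ``expected``.

    Hypothetical helper: the name, signature and retry policy are
    assumptions, not the method actually added to BaseOVSTestCase.
    """
    value = getter()
    for _ in range(retries):
        if value == expected:
            return value
        # The record may not be readable yet on a loaded OVSDB server:
        # wait briefly and read it again instead of failing immediately.
        time.sleep(delay)
        value = getter()
    if value != expected:
        raise ValueMismatch(f"expected {expected!r}, last read {value!r}")
    return value


# Simulated reads: the updated record only becomes visible on the third read.
reads = iter([None, None, {"min-rate": "200000"}])
result = get_and_check(lambda: next(reads), {"min-rate": "200000"}, delay=0)
print(result)  # {'min-rate': '200000'}
```

Comparing inside the helper, rather than reading once and asserting in the test, is what lets a transient stale read be retried while a persistent mismatch still fails with the last value seen.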

tags: added: neutron-proactive-backport-potential
Miguel Lavalle (minsel)
Changed in neutron:
status: Confirmed → Fix Committed
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

"clear_minimum_bandwidth_qos" should not be executed when the OVS agent is started. As the following logs show, the queue backing the QoS minimum bandwidth rule created by "test_min_bw_qos_policy_rule_lifecycle_egress" is deleted about 100 ms later by the agent of "test_bw_limit_qos_rules_changed_l2_agent_restart_egress" when that agent is restarted.

TestMinBwQoSOvs.test_min_bw_qos_policy_rule_lifecycle_egress,openflow-cli_/neutron-openvswitch-agent--2019-03-22--17-31-46-949318_log.txt.gz
595:2019-03-22 17:32:04.791 25537 DEBUG neutron.agent.common.ovs_lib [req-cd1834d9-fd79-4397-bbfb-e391cab67a48 - - - - -] ralonsoh - _list_queues() - queues: [{'_uuid': UUID('f5919c7f-2cff-489a-b296-c7197b0f52f9'), 'external_ids': {'id': 'port33d3c0', 'queue_type': '0'}, 'other_config': {'burst': '100000', 'max-rate': '500000'}}, {'_uuid': UUID('9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb'), 'external_ids': {'port': '2f5b97bd-8b57-4bb2-9d39-aa282d91d93c', 'queue-num': '3', 'type': 'minimum_bandwidth'}, 'other_config': {'min-rate': '200000'}}] _list_queues /home/zuul/src/git.openstack.org/openstack/neutron/neutron/agent/common/ovs_lib.py:1066
596:2019-03-22 17:32:04.792 25537 DEBUG neutron.agent.common.ovs_lib [req-cd1834d9-fd79-4397-bbfb-e391cab67a48 - - - - -] ralonsoh - _update_queue() - queue: {'_uuid': UUID('9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb'), 'external_ids': {'port': '2f5b97bd-8b57-4bb2-9d39-aa282d91d93c', 'queue-num': '3', 'type': 'minimum_bandwidth'}, 'other_config': {'min-rate': '200000'}} _update_queue /home/zuul/src/git.openstack.org/openstack/neutron/neutron/agent/common/ovs_lib.py:1048
601:2019-03-22 17:32:04.795 25537 DEBUG ovsdbapp.backend.ovs_idl.transaction [req-cd1834d9-fd79-4397-bbfb-e391cab67a48 - - - - -] Running txn n=1 command(idx=0): DbSetCommand(table=QoS, record=f63bb53a-b5be-48c3-99e3-53a7588a16da, col_values=(('queues', {3: UUID('9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb')}),)) do_commit /home/zuul/src/git.openstack.org/openstack/neutron/.tox/dsvm-fullstack/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

TestBwLimitQoSOvs.test_bw_limit_qos_rules_changed_l2_agent_restart_egress,openflow-cli_/neutron-openvswitch-agent--2019-03-22--17-31-59-548846_log.txt.gz
384:2019-03-22 17:32:04.900 26913 DEBUG neutron.agent.common.ovs_lib [req-28c7ec1c-66dd-495e-9d8c-1afc2dfde3e9 - - - - -] ralonsoh - clear_minimum_bandwidth_qos() - queue_uuid: 9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb clear_minimum_bandwidth_qos /home/zuul/src/git.openstack.org/openstack/neutron/neutron/agent/common/ovs_lib.py:1020
385:2019-03-22 17:32:04.904 26913 DEBUG ovsdbapp.backend.ovs_idl.transaction [req-28c7ec1c-66dd-495e-9d8c-1afc2dfde3e9 - - - - -] Running txn n=1 command(idx=0): DbDestroyCommand(table=Queue, record=9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb) do_commit /home/zuul/src/git.openstack.org/openstack/neutron/.tox/dsvm-fullstack/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
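
To make the race in these logs concrete, here is a toy model in which two agents share one database. Everything in it is hypothetical (FakeOvsdb, FakeAgent, add_min_bw_queue); only the clear_minimum_bandwidth_qos name and the queue UUID, port ID, and external_ids keys are taken from the report and logs above, and the cleanup logic is a simplification rather than neutron's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeOvsdb:
    """Hypothetical in-memory stand-in for the OVSDB shared by all tests."""

    # Queue table: uuid -> external_ids, mirroring the records in the logs.
    queues: Dict[str, Dict[str, str]] = field(default_factory=dict)


class FakeAgent:
    """Hypothetical model of one test's OVS agent (not neutron code)."""

    def __init__(self, db: FakeOvsdb):
        self.db = db

    def start(self) -> None:
        # Agent (re)start runs the minimum bandwidth cleanup.
        self.clear_minimum_bandwidth_qos()

    def clear_minimum_bandwidth_qos(self) -> None:
        # Removes every minimum bandwidth queue in the shared database,
        # without checking which agent or test created it.
        stale = [uuid for uuid, ext_ids in self.db.queues.items()
                 if ext_ids.get("type") == "minimum_bandwidth"]
        for uuid in stale:
            del self.db.queues[uuid]

    def add_min_bw_queue(self, uuid: str, port: str) -> None:
        self.db.queues[uuid] = {"port": port, "type": "minimum_bandwidth"}


QUEUE_UUID = "9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb"
db = FakeOvsdb()

# Agent of test_min_bw_qos_policy_rule_lifecycle_egress creates its queue.
agent_a = FakeAgent(db)
agent_a.start()
agent_a.add_min_bw_queue(QUEUE_UUID, "2f5b97bd-8b57-4bb2-9d39-aa282d91d93c")
queue_before = QUEUE_UUID in db.queues

# Shortly afterwards, the agent of the l2_agent_restart test is restarted.
agent_b = FakeAgent(db)
agent_b.start()
print(QUEUE_UUID in db.queues)  # False: the other test's queue is gone
```

In production there is one agent per OVSDB, so the cleanup only removes that agent's own leftovers; the damage comes purely from several fullstack agents sharing one database.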

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/646082

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Rodolfo Alonso Hernandez (<email address hidden>) on branch: master
Review: https://review.openstack.org/642000
Reason: Fix proposed in https://review.openstack.org/#/c/646082/

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/646082
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=531fdc336b1b6b74de1b148b0dfe9eebd9e5cdc3
Submitter: Zuul
Branch: master

commit 531fdc336b1b6b74de1b148b0dfe9eebd9e5cdc3
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun Mar 24 20:44:27 2019 +0000

    Mock OVSBridge.clear_minimum_bandwidth_qos in fullstack tests

    This function will not be executed when the OVS agent is started,
    in order to keep the QoS and Queue records in the OVS DB created by
    other tests.

    Change-Id: I054510403a4f46544ff78ee2f6babb1247726553
    Closes-Bug: #1819125
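
The fix keeps the startup cleanup from running in fullstack agents by mocking it. The sketch below shows the idea with unittest.mock.patch.object. OVSBridgeStandIn, start_agent(), and start_fullstack_agent() are hypothetical stand-ins, not neutron's OVSBridge or its fullstack agent wrapper, and the sketch does not show where the real patch is applied. Unlike a patch that stays active for the whole agent process, the context manager here restores the original method on exit so the sketch stays self-contained.

```python
from unittest import mock


class OVSBridgeStandIn:
    """Hypothetical stand-in for neutron's OVSBridge (not the real class)."""

    def __init__(self, queues):
        self.queues = queues  # shared dict modelling the OVSDB Queue table

    def clear_minimum_bandwidth_qos(self):
        # Simplified: the real method removes minimum bandwidth QoS and
        # Queue records; here we only drop the matching queue entries.
        stale = [uuid for uuid, ext_ids in self.queues.items()
                 if ext_ids.get("type") == "minimum_bandwidth"]
        for uuid in stale:
            del self.queues[uuid]

    def start_agent(self):
        # Hypothetical agent start-up hook that triggers the cleanup.
        self.clear_minimum_bandwidth_qos()


def start_fullstack_agent(queues):
    """Start the stand-in agent with the cleanup replaced by a mock."""
    with mock.patch.object(OVSBridgeStandIn,
                           "clear_minimum_bandwidth_qos") as fake_clear:
        bridge = OVSBridgeStandIn(queues)
        bridge.start_agent()
    return bridge, fake_clear


shared = {"9d386da4-7ce0-48e0-9c9d-a6c49d1f26eb":
          {"type": "minimum_bandwidth"}}
_, fake_clear = start_fullstack_agent(shared)
print(len(shared))  # 1: the queue created by another test survives
```

Patching only in the test harness leaves the production behaviour untouched, which matches the commit's reasoning that several minimum bandwidth QoS owners sharing one OVSDB is a test-only situation.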

Changed in neutron:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.openstack.org/650256

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.openstack.org/650256
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f77089aca8a1b6bbabfe11db39d49d7872715d2d
Submitter: Zuul
Branch: stable/stein

commit f77089aca8a1b6bbabfe11db39d49d7872715d2d
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun Mar 24 20:44:27 2019 +0000

    Mock OVSBridge.clear_minimum_bandwidth_qos in fullstack tests

    This function will not be executed when the OVS agent is started,
    in order to keep the QoS and Queue records in the OVS DB created by
    other tests.

    Change-Id: I054510403a4f46544ff78ee2f6babb1247726553
    Closes-Bug: #1819125
    (cherry picked from commit 531fdc336b1b6b74de1b148b0dfe9eebd9e5cdc3)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.1

This issue was fixed in the openstack/neutron 14.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.0.0b1

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

tags: removed: neutron-proactive-backport-potential