Functional QoS related tests fail often

Bug #1818613 reported by Slawek Kaplonski
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Rodolfo Alonso

Bug Description

Various QoS related tests have been failing often recently. In all cases the reason is the same: "ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Port with name=cc566ab0-4201-44b5-ae89-d342284ffdd6" raised during "_minimum_bandwidth_initialize".

Stacktrace:

ft1.1: neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_policy_rule_delete(ingress)_StringException: Traceback (most recent call last):
  File "neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "neutron/tests/functional/agent/l2/extensions/test_ovs_agent_qos_extension.py", line 354, in test_policy_rule_delete
    port_dict = self._create_port_with_qos()
  File "neutron/tests/functional/agent/l2/extensions/test_ovs_agent_qos_extension.py", line 172, in _create_port_with_qos
    self.setup_agent_and_ports([port_dict])
  File "neutron/tests/functional/agent/l2/base.py", line 375, in setup_agent_and_ports
    ancillary_bridge=ancillary_bridge)
  File "neutron/tests/functional/agent/l2/base.py", line 116, in create_agent
    ext_mgr, self.config)
  File "neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 256, in __init__
    self.connection, constants.EXTENSION_DRIVER_TYPE, agent_api)
  File "neutron/agent/agent_extensions_manager.py", line 54, in initialize
    extension.obj.initialize(connection, driver_type)
  File "neutron/agent/l2/extensions/qos.py", line 207, in initialize
    self.qos_driver.initialize()
  File "neutron/plugins/ml2/drivers/openvswitch/agent/extension_drivers/qos_driver.py", line 57, in initialize
    self._minimum_bandwidth_initialize()
  File "neutron/plugins/ml2/drivers/openvswitch/agent/extension_drivers/qos_driver.py", line 52, in _minimum_bandwidth_initialize
    self.br_int.clear_minimum_bandwidth_qos()
  File "neutron/agent/common/ovs_lib.py", line 1006, in clear_minimum_bandwidth_qos
    self.ovsdb.db_destroy('QoS', qos_id).execute(check_error=True)
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 40, in execute
    txn.add(self)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/api.py", line 112, in transaction
    del self._nested_txns_map[cur_thread_id]
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/api.py", line 69, in __exit__
    self.result = self.commit()
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
    raise result.ex
ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Port with name=cc566ab0-4201-44b5-ae89-d342284ffdd6
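
For illustration, a minimal defensive variant of the failing call (a sketch only; "clear_minimum_bandwidth_qos_safely" is a hypothetical name, while "db_destroy" and "RowNotFound" are the ovsdbapp calls shown in the traceback) would tolerate a QoS row that has already disappeared:

    from ovsdbapp.backend.ovs_idl import idlutils

    def clear_minimum_bandwidth_qos_safely(ovsdb, qos_id):
        # Destroy a QoS row, tolerating a record removed concurrently
        # by another test worker or agent thread.
        try:
            ovsdb.db_destroy('QoS', qos_id).execute(check_error=True)
        except idlutils.RowNotFound:
            # The row vanished between lookup and destroy; nothing to do.
            pass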

Example failure: http://logs.openstack.org/74/640874/1/check/neutron-functional-python27/d51cd50/logs/testr_results.html.gz

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22line%2052%2C%20in%20_minimum_bandwidth_initialize%5C%22

Revision history for this message
LIU Yulong (dragon889) wrote : Re: Functional/fullstack qos related tests fails often
summary: - Functional qos tests fails often
+ Functional/fullstack qos related tests fails often
Revision history for this message
LIU Yulong (dragon889) wrote :

http://logs.openstack.org/47/638647/7/check/neutron-functional/ce2f9fd/logs/testr_results.html.gz, this is the main issue I have hit recently. I can reproduce these failures locally with 10-20% probability.

Revision history for this message
LIU Yulong (dragon889) wrote :

This one is similar to the bug description:
http://logs.openstack.org/41/638641/3/check/neutron-functional/07d58ad/logs/testr_results.html.gz
But the failing test cases are:
test_update_minimum_bandwidth_queue
test_port_creation_with_different_dscp_markings

Revision history for this message
LIU Yulong (dragon889) wrote :

In this case, we hit the 'ovsdbapp' commit timeout:

http://logs.openstack.org/45/638645/5/check/neutron-functional-python27/19ecc65/logs/testr_results.html.gz

ft1.34: neutron.tests.functional.agent.test_ovs_lib.OVSBridgeTestCase.test_get_vif_port_by_id_StringException: Traceback (most recent call last):
  File "neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "neutron/tests/functional/agent/test_ovs_lib.py", line 342, in test_get_vif_port_by_id
    vif_ports = [self.create_ovs_vif_port() for i in range(3)]
  File "neutron/tests/functional/agent/test_ovs_lib.py", line 56, in create_ovs_vif_port
    port_name, ofport = self.create_ovs_port(attrs)
  File "neutron/tests/functional/agent/test_ovs_lib.py", line 47, in create_ovs_port
    return (port_name, self.br.add_port(port_name, *attrs.items()))
  File "neutron/agent/common/ovs_lib.py", line 297, in add_port
    *interface_attr_tuples))
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/api.py", line 112, in transaction
    del self._nested_txns_map[cur_thread_id]
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/api.py", line 69, in __exit__
    self.result = self.commit()
  File "/opt/stack/new/neutron/.tox/dsvm-functional-python27/local/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
    timeout=self.timeout)
ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.schema.open_vswitch.commands.AddPortCommand object at 0x7f416f2281d0>, <ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7f416e6737d0>] exceeded timeout 10 seconds
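
A minimal retry wrapper for such a timing-out transaction could look like the following sketch ("add_port_with_retry" is a hypothetical helper; "add_port" and "TimeoutException" appear in the traceback above):

    from ovsdbapp import exceptions as ovsdbapp_exc

    def add_port_with_retry(bridge, port_name, attempts=3):
        # Retry the transaction a few times when the commit exceeds
        # its timeout on a heavily loaded test node.
        for attempt in range(attempts):
            try:
                return bridge.add_port(port_name)
            except ovsdbapp_exc.TimeoutException:
                if attempt == attempts - 1:
                    raise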

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Let's keep this one related to the functional tests. For the fullstack issue there is https://bugs.launchpad.net/neutron/+bug/1818697

summary: - Functional/fullstack qos related tests fails often
+ Functional qos related tests fails often
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

I found in [1] that sometimes, when the OVSDB is too loaded (which can happen during functional tests), there is a delay between the end of an OVSDB transaction and when the register (or parameter) can be read. Although this is something that should not happen (considering the OVSDB is transactional), we should deal with this inconvenience and provide a robust method that retrieves a value and checks it at the same time, retrying the read in case of error.

I'll push a patch to test this theory.

[1] https://review.openstack.org/#/c/482226/57/vif_plug_ovs/tests/functional/ovsdb/test_ovsdb_lib.py@63
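
A sketch of such a retrieve-and-check helper, built on neutron's existing "wait_until_true" utility ("check_ovsdb_value" and its parameters are hypothetical; the actual patch is linked below):

    from neutron.common import utils as common_utils

    def check_ovsdb_value(expected, retrieve_fn, *args, **kwargs):
        # Re-read the OVSDB value until it matches what the last
        # transaction wrote, instead of asserting on a single read.
        def _matches():
            return retrieve_fn(*args, **kwargs) == expected

        common_utils.wait_until_true(_matches, timeout=5, sleep=0.5)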

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/641117

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

To summarize the status of this bug, there are several errors reported here:

1) The OVSDB check failures [1]. As commented in [2], those problems can be addressed with the implementation done in [3].

2) The OVS QoS minimum bandwidth initialization [4]. This problem is reported in [5] and addressed in [6]. I'll update the patch's bug reference from this one to bug 1818859, to keep each problem in a separate report.

3) The ovsdbapp timeout [7]. This problem should be addressed in a separate bug. There is already an RFE [8] to make two new parameters (inactivity_probe and max_backoff) configurable, with the aim of solving the problem we are seeing here, so the timeout should be handled in that bug/RFE (see the sketch after the reference list below).

Regards.

[1] http://logs.openstack.org/74/640874/1/check/neutron-functional-python27/d51cd50/logs/testr_results.html.gz
[2] https://bugs.launchpad.net/neutron/+bug/1818613/comments/6
[3] https://review.openstack.org/#/c/641117/
[4] http://logs.openstack.org/74/640874/1/check/neutron-functional-python27/d51cd50/logs/testr_results.html.gz
[5] https://bugs.launchpad.net/neutron/+bug/1818859
[6] https://review.openstack.org/#/c/641411/
[7] http://logs.openstack.org/45/638645/5/check/neutron-functional-python27/19ecc65/logs/testr_results.html.gz
[8] https://bugs.launchpad.net/neutron/+bug/1817022
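
As an illustration of the knob mentioned in point 3 (hypothetical identifiers and value; "inactivity_probe" is a column on the OVSDB Manager table, expressed in milliseconds), the probe interval could be raised with the same ovsdbapp "db_set" call used elsewhere:

    # Assumption: manager_uuid identifies the Manager record of the
    # OVSDB connection; 30000 ms is an illustrative value only.
    ovsdb.db_set('Manager', manager_uuid,
                 ('inactivity_probe', 30000)).execute(check_error=True)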

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/641117
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=92f1281b696c79133609d3c04b467ac7ea9f4337
Submitter: Zuul
Branch: master

commit 92f1281b696c79133609d3c04b467ac7ea9f4337
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Mar 5 18:37:44 2019 +0000

    Add a more robust method to check OVSDB values in BaseOVSTestCase

    Sometimes, when the OVSDB is too loaded (which can happen during the
    functional tests), there is a delay between the end of an OVSDB
    transaction and when the register (new or updated) can be read.
    Although this is something that should not happen (considering the
    OVSDB is transactional), tests should deal with this inconvenience
    and provide a robust method to retrieve a value and at the same time
    check it. This new method should provide a retry mechanism to read
    the value again in case of discordance.

    In order to solve the gate problem ASAP, another bug is fixed in this
    patch: the QoS removal is skipped when the OVS agent is initialized
    during functional tests.

    When executing functional tests, several OVS QoS policies specific to
    minimum bandwidth rules are created [1]. Because during the functional
    tests execution several threads can create more than one minimum
    bandwidth QoS policy (something that cannot happen in a production
    environment), the OVS QoS driver must skip the execution of [2] to
    avoid removing other QoS policies created in parallel by other tests.

    This patch marks "test_min_bw_qos_policy_rule_lifecycle" and
    "test_bw_limit_qos_port_removed" as unstable. Those tests will be
    investigated once the CI gates are stable.

    [1] Those QoS policies are created only to hold minimum bandwidth rules.
        Those policies are marked with:
           external_ids: {'_type'='minimum_bandwidth'}
    [2] https://github.com/openstack/neutron/blob/d6fba30781c5f4e63beeda04d065226660fc92b6/neutron/plugins/ml2/drivers/openvswitch/agent/extension_drivers/qos_driver.py#L43

    Closes-Bug: #1818613
    Closes-Bug: #1818859
    Related-Bug: #1819125

    Change-Id: Ia725cc1b36bc3630d2891f86f76b13c16f6cc37c
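
The external_ids tagging described in footnote [1] of the commit message can be sketched with the same "db_set" call (hypothetical variable names):

    # Tag a QoS row so the cleanup code can identify the policies that
    # exist only to hold minimum bandwidth rules.
    ovsdb.db_set('QoS', qos_uuid,
                 ('external_ids', {'_type': 'minimum_bandwidth'})).execute(
        check_error=True)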

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b3

This issue was fixed in the openstack/neutron 14.0.0.0b3 development milestone.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential