networking_ovn.tests.functional.test_metadata_agent.TestMetadataAgent.test_agent_resync_on_non_existing_bridge fails because agent doesn't refresh the settings

Bug #1830221 reported by Jakub Libosvar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
networking-ovn
Fix Released
Undecided
Jakub Libosvar

Bug Description

The test fails time to time with the following exception:

2019-05-23 04:14:02.029222 | primary | 2019-05-23 04:14:02.028 | Captured traceback:
2019-05-23 04:14:02.032488 | primary | 2019-05-23 04:14:02.031 | ~~~~~~~~~~~~~~~~~~~
2019-05-23 04:14:02.035876 | primary | 2019-05-23 04:14:02.035 | b'Traceback (most recent call last):'
2019-05-23 04:14:02.040320 | primary | 2019-05-23 04:14:02.039 | b' File "/opt/stack/new/networking-ovn/.tox/dsvm-functional/lib/python3.6/site-packages/neutron/common/utils.py", line 685, in wait_until_true'
2019-05-23 04:14:02.046230 | primary | 2019-05-23 04:14:02.045 | b' eventlet.sleep(sleep)'
2019-05-23 04:14:02.049845 | primary | 2019-05-23 04:14:02.047 | b' File "/opt/stack/new/networking-ovn/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/greenthread.py", line 36, in sleep'
2019-05-23 04:14:02.052983 | primary | 2019-05-23 04:14:02.052 | b' hub.switch()'
2019-05-23 04:14:02.056268 | primary | 2019-05-23 04:14:02.055 | b' File "/opt/stack/new/networking-ovn/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 297, in switch'
2019-05-23 04:14:02.059667 | primary | 2019-05-23 04:14:02.058 | b' return self.greenlet.switch()'
2019-05-23 04:14:02.062601 | primary | 2019-05-23 04:14:02.061 | b'eventlet.timeout.Timeout: 10 seconds'
2019-05-23 04:14:02.066974 | primary | 2019-05-23 04:14:02.065 | b''
2019-05-23 04:14:02.069651 | primary | 2019-05-23 04:14:02.068 | b'During handling of the above exception, another exception occurred:'
2019-05-23 04:14:02.071829 | primary | 2019-05-23 04:14:02.071 | b''
2019-05-23 04:14:02.075084 | primary | 2019-05-23 04:14:02.074 | b'Traceback (most recent call last):'
2019-05-23 04:14:02.078197 | primary | 2019-05-23 04:14:02.077 | b' File "/opt/stack/new/networking-ovn/.tox/dsvm-functional/lib/python3.6/site-packages/neutron/tests/base.py", line 176, in func'
2019-05-23 04:14:02.080841 | primary | 2019-05-23 04:14:02.080 | b' return f(self, *args, **kwargs)'
2019-05-23 04:14:02.084127 | primary | 2019-05-23 04:14:02.083 | b' File "/opt/stack/new/networking-ovn/networking_ovn/tests/functional/test_metadata_agent.py", line 180, in test_agent_resync_on_non_existing_bridge'
2019-05-23 04:14:02.777333 | primary | 2019-05-23 04:14:02.776 | b' exception=exc)'
2019-05-23 04:14:02.781301 | primary | 2019-05-23 04:14:02.779 | b' File "/opt/stack/new/networking-ovn/.tox/dsvm-functional/lib/python3.6/site-packages/neutron/common/utils.py", line 689, in wait_until_true'
2019-05-23 04:14:02.783213 | primary | 2019-05-23 04:14:02.782 | b' raise exception'
2019-05-23 04:14:02.785957 | primary | 2019-05-23 04:14:02.785 | b"Exception: Agent bridge hasn't changed from br-int to br-new in 10 seconds after Port_Binding event"
2019-05-23 04:14:02.788135 | primary | 2019-05-23 04:14:02.787 | b''

Changed in networking-ovn:
assignee: nobody → Jakub Libosvar (libosvar)
Revision history for this message
Jakub Libosvar (libosvar) wrote :

UPD: The reason agent doesn't load new configuration is that there is no Port_Binding event emitted. That's because the SB DB is not even updated after lsp_bind() call. The transaction was aborted:

2019-05-24 13:37:54.560 18162 INFO ovsdbapp.backend.ovs_idl.transaction [req-07c1404f-5dad-4f47-96e2-06b48835218e - - - - -] Running txn n=1 command(idx=0): LspBindCommand(port=ovn-port-ce658053-294d-4e7d-90fe-4ac5ddd36b6b, chassis=34e7a9f8-fc4d-48db-8993-5aaac5106348, may_exist=False)
2019-05-24 13:37:54.560 18162 INFO ovsdbapp.backend.ovs_idl.transaction [req-07c1404f-5dad-4f47-96e2-06b48835218e - - - - -] Transaction aborted

At this point it's unknown why the transaction gets aborted.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661628

Changed in networking-ovn:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.opendev.org/661628
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=e9c4a51e8179245eb308d90e1b35baa914227e64
Submitter: Zuul
Branch: master

commit e9c4a51e8179245eb308d90e1b35baa914227e64
Author: Jakub Libosvar <email address hidden>
Date: Mon May 27 14:42:28 2019 +0000

    functional: Wait for PB event before binding the port

    On busy boxes there is a chance ovn-northd doesn't create a port binding
    entry quick enough in the Southbound DB when a new logical switch port
    is created. It can happen that test attempts to bind a port that doesn't
    exist in the southbound DB yet, because it's being processed by
    ovn-northd.

    This patch adds a one time event and the test waits 5 seconds for the
    port binding to be created. Only after that it calls lsp_bind().

    The patch also makes the lsp_bind fails in case of failure inside of
    ovsdbapp, previously, the exception was swallowed and it made debugging
    a lot harder.

    It was required to bump ovsdbapp version from 0.10.0 to 0.14.0 because
    that's the version that implements WaitEvent class.

    Closes bug: #1830221

    Change-Id: Ieb48056ef728b12ced01f2829dc2d57b9bb6b4dc

Changed in networking-ovn:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/661685

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/662181

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.opendev.org/662181
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=495b451e61c24f60c48a0a64214c7cdbfbab414b
Submitter: Zuul
Branch: stable/rocky

commit 495b451e61c24f60c48a0a64214c7cdbfbab414b
Author: Jakub Libosvar <email address hidden>
Date: Mon May 27 14:42:28 2019 +0000

    functional: Wait for PB event before binding the port

    On busy boxes there is a chance ovn-northd doesn't create a port binding
    entry quick enough in the Southbound DB when a new logical switch port
    is created. It can happen that test attempts to bind a port that doesn't
    exist in the southbound DB yet, because it's being processed by
    ovn-northd.

    This patch adds a one time event and the test waits 5 seconds for the
    port binding to be created. Only after that it calls lsp_bind().

    The patch also makes the lsp_bind fails in case of failure inside of
    ovsdbapp, previously, the exception was swallowed and it made debugging
    a lot harder.

    It was required to bump ovsdbapp version from 0.10.0 to 0.14.0 because
    that's the version that implements WaitEvent class.

    UPDATE: Rocky has ovsdbapp version of 0.12.3 - this patch bumps the
            version to 0.12.4 that is about to be released and contains the
            change this patch requires.

    Closes bug: #1830221

    Depends-On: https://review.opendev.org/#/c/661797

    Change-Id: Ieb48056ef728b12ced01f2829dc2d57b9bb6b4dc
    (cherry picked from commit e9c4a51e8179245eb308d90e1b35baa914227e64)
    (cherry picked from commit 57160e81b50b93dbdf64b92c0b4aa5dead7effe2)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 7.0.0.0b1

This issue was fixed in the openstack/networking-ovn 7.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 6.0.1

This issue was fixed in the openstack/networking-ovn 6.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.1.0

This issue was fixed in the openstack/networking-ovn 5.1.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.