[OVN] neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log randomly fails

Bug #1868110 reported by Maciej Jozefczyk
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Maciej Jozefczyk

Bug Description

The functional test
neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
randomly fails on our CI.

Maciej Jozefczyk (maciejjozefczyk) wrote :
Changed in neutron:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
Changed in neutron:
status: New → Confirmed
importance: Undecided → High
tags: added: functional-tests gate-failure
Maciej Jozefczyk (maciejjozefczyk) wrote :
Maciej Jozefczyk (maciejjozefczyk) wrote :

Both of those failures are caused by a timeout (exceeded timeout 5 seconds):

2020-03-19 20:26:57.782 22962 ERROR neutron.plugins.ml2.managers ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.AddLSwitchPortCommand object at 0x7fba9ca44a58>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7fba9ca44588>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7fba9ca44630>] exceeded timeout 5 seconds

ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.AddLRouterPortCommand object at 0x7fdd739eb4e0>, <neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.SetLRouterPortInLSwitchPortCommand object at 0x7fdd72dc04e0>] exceeded timeout 5 seconds
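Errors of this shape appear when the caller waiting on a transaction gives up before a result arrives. A minimal sketch of that pattern, using only the Python standard library; `run_transaction`, `TransactionTimeout`, and `slow_server` are hypothetical names for illustration and are not ovsdbapp's actual API:

```python
import queue
import threading
import time


class TransactionTimeout(Exception):
    """Hypothetical stand-in for a transaction timeout error."""


def run_transaction(commands, work, timeout):
    """Run work(commands) in a worker thread and wait up to `timeout` seconds."""
    results = queue.Queue(maxsize=1)
    worker = threading.Thread(
        target=lambda: results.put(work(commands)), daemon=True)
    worker.start()
    try:
        # Block until the worker reports a result or the wait budget runs out.
        return results.get(timeout=timeout)
    except queue.Empty:
        raise TransactionTimeout(
            "Commands %s exceeded timeout %s seconds" % (commands, timeout)
        ) from None


def slow_server(commands):
    # Simulates a loaded host: the commit takes 0.5 seconds to complete.
    time.sleep(0.5)
    return "committed"
```

In this sketch the work itself is identical in both cases; calling `run_transaction(["cmd"], slow_server, timeout=0.1)` raises, while the same call with `timeout=2` returns normally. Only the wait budget differs, which is why a loaded CI host can fail a test that passes elsewhere.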

Maciej Jozefczyk (maciejjozefczyk) wrote :

This looks very similar to https://bugs.launchpad.net/neutron/+bug/1815142,
but now it happens on the OVN databases.

It looks like we could reuse these changes:
https://review.opendev.org/#/c/641681/7
https://review.opendev.org/#/c/642721/9

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/717704

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/717704
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f93aebe7903becb7289e08e0a9237170818b3a8b
Submitter: Zuul
Branch: master

commit f93aebe7903becb7289e08e0a9237170818b3a8b
Author: Maciej Józefczyk <email address hidden>
Date: Mon Apr 6 10:02:56 2020 +0000

    [OVN] Bump up transaction timeout for functional tests

    On heavily loaded environments, like the Neutron gates, we can
    observe sporadic functional test failures caused by
    timeouts.

    Let's increase the timeout value to 15 seconds for functional
    tests, because it looks like 5 seconds is not enough.

    Change-Id: I327de751e3ba26c5be03b2571b105492661999cb
    Closes-Bug: 1868110

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
Maciej Jozefczyk (maciejjozefczyk) wrote :
Changed in neutron:
status: Fix Released → In Progress
status: In Progress → Confirmed
status: Confirmed → In Progress
Maciej Jozefczyk (maciejjozefczyk) wrote :

According to the ^

First link - OVSDB Timeout after 15 seconds:

2020-04-14 11:29:00.680 22734 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_client [req-8db4b189-1c47-42e2-9a33-b8b0e81c1176 - tenid - - -] FIP {'logical_ip': '40.0.0.199', 'external_ip': '100.0.0.32'} doesn't have external_ids. _delete_floatingip /home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py:1061
2020-04-14 11:29:00.682 22734 ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance [req-b119e034-17b9-407b-b0c6-0f2f7cb1521d - - - - -] Maintenance task: Failed to fix resource f172d01d-7862-4347-b52d-fa6da7110ce6 (type: networks): ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.schema.ovn_northbound.commands.LsAddCommand object at 0x7fbd5bd6c6a0>] exceeded timeout 15 seconds

Second link - the issue is different:
The server could not comply with the request since it is either malformed or otherwise incorrect. - POST failed (client error): There was a conflict when trying to complete your request.

Maciej Jozefczyk (maciejjozefczyk) wrote :

In the last 7 days only one test from this test class failed:

http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:*test_ovn_db_sync* AND build_name: "neutron-functional" AND message:failed

https://8b674b5a0f361eaf6358-e25d1f8bae8601aed7ec89510f7333bc.ssl.cf5.rackcdn.com/722793/2/check/neutron-functional/ab8d0a7/testr_results.html

https://8b674b5a0f361eaf6358-e25d1f8bae8601aed7ec89510f7333bc.ssl.cf5.rackcdn.com/722793/2/check/neutron-functional/ab8d0a7/controller/logs/dsvm-functional-logs/neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_repair_delete_ovn_nb_db/testrun.txt

This particular execution failed on:

2020-04-27 12:15:16.543 22717 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_client [req-1a7c5782-7b42-47d7-8306-112d593ede53 - tenid - - -] FIP {'logical_ip': '40.0.0.82', 'external_ip': '100.0.0.32'} doesn't have external_ids. _delete_floatingip /home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py:973
2020-04-27 12:15:16.704 22717 ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance [req-fb0b10f1-eea5-4f17-8970-5f2b62f45ac7 - - - - -] Maintenance task: Failed to fix resource 81dbfeae-5c4b-4f7f-99ba-7ed0a5dfa4f2 (type: networks): ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.schema.ovn_northbound.commands.LsAddCommand object at 0x7f6662b15390>] exceeded timeout 15 seconds

Tests from this class have not been failing that often since the timeout was increased by this change [1]. We can increase it again, as the default value in the config is 180 seconds (and we set 15 for now for functional tests), or leave it as it is.

[1] https://review.opendev.org/#/c/717704/
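The choice discussed above (a 180 second default versus a shorter value for functional tests) amounts to overriding one option in the test setup. A minimal stdlib-only sketch of that idea; the `Settings` class, the `transaction_timeout` option name, and the `override` helper are hypothetical and do not reflect Neutron's actual configuration code:

```python
import contextlib
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical option; 180 mirrors the default mentioned in this report.
    transaction_timeout: int = 180


SETTINGS = Settings()


@contextlib.contextmanager
def override(settings, **values):
    """Temporarily replace option values, restoring the originals on exit."""
    saved = {name: getattr(settings, name) for name in values}
    for name, value in values.items():
        setattr(settings, name, value)
    try:
        yield settings
    finally:
        for name, value in saved.items():
            setattr(settings, name, value)


def functional_test_timeout():
    # Tests run with a value lower than the default (15 at the time of
    # this comment); the default is untouched once the block exits.
    with override(SETTINGS, transaction_timeout=15) as cfg:
        return cfg.transaction_timeout
```

Scoping the override to the test run keeps the production default intact, so raising the test value again only requires changing one number in the test setup.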

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/725906

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/725906
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c9eeb5debd012c81a7739c135b1d9f29d46a741e
Submitter: Zuul
Branch: master

commit c9eeb5debd012c81a7739c135b1d9f29d46a741e
Author: Maciej Józefczyk <email address hidden>
Date: Wed May 6 17:44:11 2020 +0200

    [OVN] Bump up transaction timeout for functional tests

    We can still see occasional failures because of timeouts
    in the test_ovn_db_sync tests, about 1 failure per week.

    As agreed during the last IRC meeting, we decided to
    bump up the functional tests timeout from 15 seconds to 30 seconds,
    because we're still well below the default value of 180 seconds.

    Change-Id: Ib20cdd0bb7d24795c8bd5c84c6143000b9922b4d
    Related-Bug: 1868110

yatin (yatinkarel) wrote :

Closing this, as we have not seen this issue recently; the timeout was increased a long time ago.

Changed in neutron:
status: In Progress → Fix Released