Intermittently tempest run fails because SSH connection to instance fails - "ERROR ovsdbapp.backend.ovs_idl.transaction - RevisionConflict: OVN revision number for * (type: ports) is equal or higher than the given resource"

Bug #1871818 reported by Sandeep Yadav
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Triaged
High
Unassigned

Bug Description

Description
===========

Job periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train[1] sometimes fails during tempest run because ssh to instance is failing.

Tempest error - "SSHException: Error reading SSH protocol banner"
Ovn Error - "ERROR ovsdbapp.backend.ovs_idl.transaction - RevisionConflict: OVN revision number for * (type: ports) is equal or higher than the given resource"

Some example of failed jobs:-
===========================

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/8f33a89

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/5aacdf7/

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/d8e0d6a/

Issue is not consistent but seens ~4-5 in within last month, job build history details:-

[1] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train

Error Snippet
=============

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/8f33a89/job-output.txt

~~~
2020-04-09 02:17:22.883221 | primary | TASK [os_tempest : Generate test-list file] ************************************
2020-04-09 02:17:22.883312 | primary | Thursday 09 April 2020 02:17:22 +0000 (0:00:01.150) 0:55:32.125 ********
2020-04-09 02:17:27.269093 | primary | ok: [undercloud]
2020-04-09 02:17:27.278638 | primary |
2020-04-09 02:17:27.278709 | primary | PLAY RECAP *********************************************************************
2020-04-09 02:17:27.278922 | primary | overcloud-controller-0 : ok=1 changed=0 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
2020-04-09 02:17:27.279058 | primary | overcloud-novacompute-0 : ok=1 changed=0 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
2020-04-09 02:17:27.279167 | primary | overcloud-novacompute-1 : ok=1 changed=0 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
2020-04-09 02:17:27.279265 | primary | undercloud : ok=42 changed=19 unreachable=0 failed=1 skipped=191 rescued=0 ignored=0
2020-04-09 02:17:27.279283 | primary |
2020-04-09 02:17:27.279348 | primary | Thursday 09 April 2020 02:17:27 +0000 (0:00:04.396) 0:55:36.522 ********
2020-04-09 02:17:27.279406 | primary | ===============================================================================
2020-04-09 02:17:27.280681 | primary | os_tempest : Execute tempest tests ----------------------------------- 3196.71s
2020-04-09 02:17:27.280873 | primary | os_tempest : Install distro packages ----------------------------------- 12.86s
2020-04-09 02:17:27.280920 | primary | os_tempest : Executing python-tempestconf ------------------------------ 12.73s
2020-04-09 02:17:27.281004 | primary | os_tempest : Create router --------------------------------------------- 11.62s
2020-04-09 02:17:27.281057 | primary | os_tempest : List tempest tests ----------------------------------------- 7.34s
2020-04-09 02:17:27.281153 | primary | os_tempest : Add tempest users ------------------------------------------ 7.16s
2020-04-09 02:17:27.281213 | primary | os_tempest : Ensure private network exists ------------------------------ 6.16s
2020-04-09 02:17:27.281259 | primary | os_tempest : Add tempest projects --------------------------------------- 6.11s
2020-04-09 02:17:27.281303 | primary | os_tempest : Generate test-list file ------------------------------------ 4.40s
2020-04-09 02:17:27.281347 | primary | os_tempest : Ensure public subnet exists -------------------------------- 4.33s
2020-04-09 02:17:27.281392 | primary | Gathering Facts --------------------------------------------------------- 4.17s
2020-04-09 02:17:27.281435 | primary | os_tempest : Ensure public network exists ------------------------------- 4.13s
2020-04-09 02:17:27.281479 | primary | os_tempest : Ensure private subnet exists ------------------------------- 3.99s
2020-04-09 02:17:27.281524 | primary | os_tempest : Generate tempest test whitelist ---------------------------- 3.10s
2020-04-09 02:17:27.281568 | primary | os_tempest : Override tempestconf profile ------------------------------- 3.08s
2020-04-09 02:17:27.281612 | primary | os_tempest : Generate tempest test blacklist ---------------------------- 2.87s
2020-04-09 02:17:27.281656 | primary | os_tempest : Initialise tempest workspace ------------------------------- 2.72s
2020-04-09 02:17:27.281700 | primary | os_tempest : Create tempest directories --------------------------------- 2.61s
2020-04-09 02:17:27.281744 | primary | os_tempest : Get the admin user project id ------------------------------ 2.51s
2020-04-09 02:17:27.281830 | primary | Copy clouds.yaml file from user's home to /root directory --------------- 2.02s
2020-04-09 02:17:27.410939 | primary | +(./toci_quickstart.sh:161): exit_value=2
2020-04-09 02:17:27.411603 | primary | +(./toci_quickstart.sh:164): [[ 2 == 0 ]]
2020-04-09 02:17:27.411710 | primary | +(./toci_quickstart.sh:165): [[ 2 != 0 ]]
2020-04-09 02:17:27.411853 | primary | +(./toci_quickstart.sh:165): echo 'Playbook run of baremetal-full-overcloud-validate.yml failed'
2020-04-09 02:17:27.411955 | primary | Playbook run of baremetal-full-overcloud-validate.yml failed
~~~

Tempest logs:-
=============

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/8f33a89/logs/undercloud/home/zuul/tempest/tempest.log.txt.gz

~~~
2020-04-09 02:08:48.535 359177 ERROR paramiko.transport [-] Traceback (most recent call last):: SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.535 359177 ERROR paramiko.transport [-] File "/usr/lib/python2.7/site-packages/paramiko/transport.py", line 1966, in run: SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.535 359177 ERROR paramiko.transport [-] self._check_banner(): SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.535 359177 ERROR paramiko.transport [-] File "/usr/lib/python2.7/site-packages/paramiko/transport.py", line 2143, in _check_banner: SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.536 359177 ERROR paramiko.transport [-] "Error reading SSH protocol banner" + str(e): SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.536 359177 ERROR paramiko.transport [-] SSHException: Error reading SSH protocol banner: SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.536 359177 ERROR paramiko.transport [-] : SSHException: Error reading SSH protocol banner
2020-04-09 02:08:48.570 359177 WARNING tempest.lib.common.ssh [-] Failed to establish authenticated ssh connection to cirros@10.100.0.8 (Error reading SSH protocol banner). Number attempts: 4. Retry after 5 seconds.: SSHException: Error reading SSH protocol banner
2020-04-09 02:08:53.130 359178 WARNING tempest.lib.common.ssh [-] Failed to establish authenticated ssh connection to cirros@10.0.0.112 (timed out). Number attempts: 1. Retry after 2 seconds.: timeout: timed out
2020-04-09 02:08:54.075 359177 INFO tempest.lib.common.ssh [-] Creating ssh connection to '10.0.0.104:22' as 'cirros' with public key authentication

~~~

OVN logs:-
========

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/8f33a89/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz

~~~
2020-04-09 02:03:05.229 32 ERROR ovsdbapp.backend.ovs_idl.transaction [req-538884bb-e133-4347-b772-d028e70df8e5 8f2ae947bd63455eb933da08738d5310 27d9389e95d547b881e22c1483fd6c87 - default default] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/commands.py", line 1017, in run_idl
    resource_id=self.name, resource_type=self.resource_type)
RevisionConflict: OVN revision number for d1c27633-fb38-400f-937a-abad7db412b1 (type: ports) is equal or higher than the given resource. Skipping update
~~~

Tags: alert
tags: added: alert
Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

Hi,

Found one interesting log:-

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/8f33a89/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

~~~
{0} neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle [199.260836s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/scenario/test_trunk.py", line 221, in test_trunk_subport_lifecycle
        self._wait_for_port(port)
      File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/scenario/test_trunk.py", line 141, in _wait_for_port
        "status {!r}.".format(port['id'], status)))
      File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/common/utils.py", line 82, in wait_until_true
        raise exception
    RuntimeError: Timed out waiting for port u'2ff1e990-aaba-4067-bbc3-22eea33012c8' to transition to get status 'ACTIVE'.

Revision history for this message
Jakub Libosvar (libosvar) wrote :
Download full text (3.8 KiB)

I looked at the test results and there was only one failing tests, here is my observing so far:

==============================
Failed 1 tests - output below:
==============================

neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle[id-bb13fe28-f152-4000-8131-37890a40c79e]
--------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/scenario/test_trunk.py", line 221, in test_trunk_subport_lifecycle
        self._wait_for_port(port)
      File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/scenario/test_trunk.py", line 141, in _wait_for_port
        "status {!r}.".format(port['id'], status)))
      File "/usr/lib/python2.7/site-packages/neutron_tempest_plugin/common/utils.py", line 82, in wait_until_true
        raise exception
    RuntimeError: Timed out waiting for port u'2ff1e990-aaba-4067-bbc3-22eea33012c8' to transition to get status 'ACTIVE'.

I checked the server logs and the subport association finished fine:

2020-04-09 01:46:22.617 32 DEBUG networking_ovn.ml2.trunk_driver [req-61a93d21-64b4-4eea-99f8-7ed33ea83754 f41d513e64b6497e82da9c448c1bdc08 670053b5f38746959786934f92c55d1c - default default] Done setting parent c528340a-8159-4cd8-8ba9-2ebd839fb4e9 for subport 2ff1e990-aaba-4067-bbc3-22eea33012c8 _set_binding_profile /usr/lib/python2.7/site-packages/networking_ovn/ml2/trunk_driver.py:93

2020-04-09 01:46:22.619 32 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): SetLSwitchPortCommand(lport=2ff1e990-aaba-4067-bbc3-22eea33012c8, if_exists=True, columns={'parent_name': 'c528340a-8159-4cd8-8ba9-2ebd839fb4e9', 'tag': 4}) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

2020-04-09 01:46:22.666 33 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: LogicalSwitchPortUpdateUpEvent(events=('update',), table='Logical_Switch_Port', conditions=(('up', '=', True),), old_conditions=(('up', '=', False),)) to row=Logical_Switch_Port(parent_name=[u'c528340a-8159-4cd8-8ba9-2ebd839fb4e9'], name=2ff1e990-aaba-4067-bbc3-22eea33012c8, enabled=[True], ha_chassis_group=[], up=[True], port_security=[u'fa:16:3e:6d:f7:9a'], dhcpv6_options=[], tag_request=[], dynamic_addresses=[], external_ids={u'neutron:cidrs': u'', u'neutron:revision_number': u'1', u'neutron:port_name': u'', u'neutron:network_name': u'neutron-62d8faf1-be2c-4829-8973-d133c7c2d33e', u'neutron:project_id': u'670053b5f38746959786934f92c55d1c', u'neutron:security_group_ids': u'215a8850-34f1-40d5-a468-814f4fc08949', u'neutron:device_id': u'', u'neutron:device_owner': u''}, tag=[4], type=, options={u'requested-chassis': u''}, dhcpv4_options=[], addresses=[u'fa:16:3e:6d:f7:9a']) old=Logical_Switch_Port(up=[False]) matches /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/event.py:44

2020-04-09 01:46:22.667 33 INFO networking_ovn.ml2.mech_driver [req-828a35a7-bca2-4425-a1be-fb94e87c4945 - - - - -] OVN reports status up for port: 2ff1e990-aaba-4067-bbc3-22e...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/719768

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.