tempest network test_list_agent test failed with mismatch error in train fs020

Bug #1855985 reported by chandan kumar
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The FS020 periodic Train multinode job failed on the following tempest test:
http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/7e12b11/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

tempest.api.network.admin.test_agent_management.AgentManagementTestJSON.test_list_agent[id-9c80f04d-11f3-44a4-8738-ed2f879b0ff4]
--------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/tempest/api/network/admin/test_agent_management.py", line 49, in test_list_agent
        self.assertIn(self.agent, agents)
      File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 417, in assertIn
        self.assertThat(haystack, Contains(needle), message)
      File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
        raise mismatch_error
    testtools.matchers._impl.MismatchError: {u'binary': u'ovn-controller', u'description': u'', u'admin_state_up': True, u'availability_zone': u'n/a', u'alive': False, u'topic': u'n/a', u'host': u'overcloud-novacompute-1.localdomain', u'agent_type': u'OVN Controller agent', u'id': u'f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e'} not in [{u'binary': u'ovn-controller', u'description': u'', u'admin_state_up': True, u'availability_zone': u'n/a', u'alive': True, u'topic': u'n/a', u'host': u'overcloud-novacompute-1.localdomain', u'agent_type': u'OVN Controller agent', u'id': u'f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e'}, {u'binary': u'networking-ovn-metadata-agent', u'description': u'', u'admin_state_up': True, u'availability_zone': u'n/a', u'alive': True, u'topic': u'n/a', u'host': u'overcloud-novacompute-1.localdomain', u'agent_type': u'OVN Metadata agent', u'id': u'906f7242-ac80-447d-8da4-95d47523e382'}, {u'binary': u'ovn-controller', u'description': u'', u'admin_state_up': True, u'availability_zone': u'n/a', u'alive': True, u'topic': u'n/a', u'host': u'overcloud-novacompute-0.localdomain', u'agent_type': u'OVN Controller agent', u'id': u'3ab6df89-1af2-479f-8ca3-6ed74dc897a7'}, {u'binary': u'networking-ovn-metadata-agent', u'description': u'', u'admin_state_up': True, u'availability_zone': u'n/a', u'alive': True, u'topic': u'n/a', u'host': u'overcloud-novacompute-0.localdomain', u'agent_type': u'OVN Metadata agent', u'id': u'208a64cf-7634-4428-9c14-696b69c064b0'}, {u'binary': u'ovn-controller', u'description': u'', u'admin_state_up': True, u'availability_zone': u'n/a', u'alive': True, u'topic': u'n/a', u'host': u'overcloud-controller-0.localdomain', u'agent_type': u'OVN Controller agent', u'id': u'40ef8c9b-d936-494c-9fce-1d16a6bb0d1b'}]

Captured pythonlogging:
~~~~~~~~~~~~~~~~~~~~~~~
    2019-12-11 01:19:35,607 307034 INFO [tempest.lib.common.rest_client] Request (AgentManagementTestJSON:test_list_agent): 200 GET http://10.0.0.5:9696/v2.0/agents 0.134s
    2019-12-11 01:19:35,607 307034 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
            Body: None
        Response - Headers: {'status': '200', u'content-length': '2192', 'content-location': 'http://10.0.0.5:9696/v2.0/agents', u'date': 'Wed, 11 Dec 2019 01:19:35 GMT', u'content-type': 'application/json', u'connection': 'close', u'x-openstack-request-id': 'req-e1e0c8a7-16e9-4cbb-8a9d-c6f8aa8b77ac'}
            Body: {"agents": [{"binary": "ovn-controller", "description": "", "availability_zone": "n/a", "heartbeat_timestamp": "2019-12-11 01:19:35.531509", "admin_state_up": true, "alive": true, "topic": "n/a", "host": "overcloud-novacompute-1.localdomain", "agent_type": "OVN Controller agent", "id": "f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e", "configurations": {"chassis_name": "f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e", "bridge-mappings": "datacentre:br-ex"}}, {"binary": "networking-ovn-metadata-agent", "description": "", "availability_zone": "n/a", "heartbeat_timestamp": "2019-12-11 01:19:35.567526", "admin_state_up": true, "alive": true, "topic": "n/a", "host": "overcloud-novacompute-1.localdomain", "agent_type": "OVN Metadata agent", "id": "906f7242-ac80-447d-8da4-95d47523e382", "configurations": {"chassis_name": "f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e", "bridge-mappings": "datacentre:br-ex"}}, {"binary": "ovn-controller", "description": "", "availability_zone": "n/a", "heartbeat_timestamp": "2019-12-11 01:19:35.578088", "admin_state_up": true, "alive": true, "topic": "n/a", "host": "overcloud-novacompute-0.localdomain", "agent_type": "OVN Controller agent", "id": "3ab6df89-1af2-479f-8ca3-6ed74dc897a7", "configurations": {"chassis_name": "3ab6df89-1af2-479f-8ca3-6ed74dc897a7", "bridge-mappings": "datacentre:br-ex"}}, {"binary": "networking-ovn-metadata-agent", "description": "", "availability_zone": "n/a", "heartbeat_timestamp": "2019-12-11 01:19:35.586821", "admin_state_up": true, "alive": true, "topic": "n/a", "host": "overcloud-novacompute-0.localdomain", "agent_type": "OVN Metadata agent", "id": "208a64cf-7634-4428-9c14-696b69c064b0", "configurations": {"chassis_name": "3ab6df89-1af2-479f-8ca3-6ed74dc897a7", "bridge-mappings": "datacentre:br-ex"}}, {"binary": "ovn-controller", "description": "", "availability_zone": "n/a", "heartbeat_timestamp": "2019-12-11 01:19:35.600684", "admin_state_up": true, "alive": true, "topic": "n/a", "host": 
"overcloud-controller-0.localdomain", "agent_type": "OVN Controller agent", "id": "40ef8c9b-d936-494c-9fce-1d16a6bb0d1b", "configurations": {"chassis_name": "40ef8c9b-d936-494c-9fce-1d16a6bb0d1b", "bridge-mappings": "datacentre:br-ex"}}]}

It has been failing continuously for the last 2 days.
http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/2553b48/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

We checked the neutron and OVN logs on all controller and compute nodes and found nothing interesting.
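
For reference, the agent that the assertion could not find and the entry with the same id in the returned list differ in a single field. Below is a minimal Python sketch of that comparison. The `expected` dict is copied from the MismatchError above; `returned` is the same entry with `alive` set to True, as in the returned list. The helper name `diff_fields` is made up for illustration.

```python
# Agent stored by the test class (the "needle" in the MismatchError).
expected = {
    'binary': 'ovn-controller',
    'description': '',
    'admin_state_up': True,
    'availability_zone': 'n/a',
    'alive': False,
    'topic': 'n/a',
    'host': 'overcloud-novacompute-1.localdomain',
    'agent_type': 'OVN Controller agent',
    'id': 'f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e',
}

# Entry with the same id in the list returned by GET /v2.0/agents:
# identical except that 'alive' is True.
returned = dict(expected, alive=True)


def diff_fields(a, b):
    """Return {field: (a_value, b_value)} for every field that differs."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(diff_fields(expected, returned))  # {'alive': (False, True)}
```

So the failure comes down to `alive` being False in the stored agent and True in the later response.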

Tags: tempest
summary: - tempest network test_list_agent test failed with mismatch error
+ tempest network test_list_agent test failed with mismatch error in train
+ fs020
Revision history for this message
Daniel Alvarez (dalvarezs) wrote :

Looks like it might be related to this patch: https://review.opendev.org/#/c/696936/

chandan kumar (chkumar246) wrote :

There is one more failure:
http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/7e12b11/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic[compute,id-e79f879e-debb-440c-a7e4-efeda05b6848,network]
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python2.7/site-packages/tempest/scenario/test_security_groups_basic_ops.py", line 508, in test_cross_tenant_traffic
        self._test_cross_tenant_allow(source_tenant, dest_tenant, ruleset)
      File "/usr/lib/python2.7/site-packages/tempest/scenario/test_security_groups_basic_ops.py", line 424, in _test_cross_tenant_allow
        self.check_remote_connectivity(access_point_ssh, ip, protocol=protocol)
      File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 1078, in check_remote_connectivity
        self.fail(msg)
      File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 690, in fail
        raise self.failureException(msg)
    AssertionError: Timed out waiting for 10.0.0.102 to become reachable from 10.0.0.117

I am not sure whether it is also linked to that.

chandan kumar (chkumar246) wrote :

tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic is not related, moved it to a different bug.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/698435

chandan kumar (chkumar246) wrote :

I am not sure whether it is an infra hiccup or something else; sometimes it passes:
http://logs.rdoproject.org/26/24026/3/check/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/478617c/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

{2} tempest.api.network.admin.test_agent_management.AgentManagementTestJSON.test_list_agent [0.197176s] ... ok

Terry Wilson (otherwiseguy) wrote :

I've seen the list_agent failure before. It looks like a race condition in the tempest test itself: the test stores the agent state on the class via a list() call as soon as the class is set up, then calls list again later and checks that the first agent from the original list (with all of its fields) is in the list returned later. If the agent's alive status happens to be False when the class is initialized and becomes True at some point later, the test fails. I have also seen it fail because another test updates the description on the agent and cleans up after itself, but the list_agent test runs in between.
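
A minimal sketch of the first race, using a made-up `FakeAgentsAPI` class in place of the real tempest client. All names here are hypothetical. Comparing by id is just one possible way to make the check less brittle, and not necessarily what the eventual fix does.

```python
import copy


class FakeAgentsAPI(object):
    """Hypothetical stand-in for the neutron agents API; not tempest code."""

    def __init__(self):
        self._agents = [{
            'id': 'f15d1f98-b0ce-4bf2-87b0-9ccd543b7d3e',
            'binary': 'ovn-controller',
            'description': '',
            'alive': False,  # agent has not reported in yet
        }]

    def list_agents(self):
        # Return copies so callers hold a snapshot, like a REST response.
        return copy.deepcopy(self._agents)

    def heartbeat(self):
        # The agent reports in some time after the first list call.
        self._agents[0]['alive'] = True


api = FakeAgentsAPI()

# Class setup: snapshot the first agent, including its volatile fields.
snapshot = api.list_agents()[0]

# Something changes between class setup and the test body.
api.heartbeat()

# Test body: list again and look for the snapshot by full dict equality.
agents = api.list_agents()
strict_match = snapshot in agents

# Comparing by id only is not affected by 'alive' or 'description' changes.
id_match = snapshot['id'] in [a['id'] for a in agents]

print(strict_match, id_match)  # False True
```

The description case fails the same way: any field that another test changes between the two list calls breaks the full-dict comparison.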

Terry Wilson (otherwiseguy) wrote :

I think the real reason this is racy is that the neutron_tempest_plugin versions of the tests are running at the same time as the in-tree tempest versions of the test_agent_management tests. Can we just remove the in-tree versions of the tests at this point?

Marios Andreou (marios-b) wrote :

Per comments 6/7/8 above: there are two green runs for the job from yesterday, one check [1] and one periodic [2].

Does it even make sense to try to skip them, then, if the cause is what @otherwiseguy suggests? I.e., might it affect any other tests that are duplicated with the in-tree ones?

Also, do we want to combine this with https://bugs.launchpad.net/tripleo/+bug/1856016 if they have the same root cause?

[1] http://logs.rdoproject.org/51/24051/4/check/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/33c581a/
[2] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/56825e8/

chandan kumar (chkumar246) wrote :

Combining it from https://bugs.launchpad.net/tripleo/+bug/1856016

http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-train/7e12b11/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic[compute,id-e79f879e-debb-440c-a7e4-efeda05b6848,network] -> Failed

AssertionError: Timed out waiting for 10.0.0.102 to become reachable from 10.0.0.117

It is also related to a race condition.
@marios, maybe decreasing the tempest concurrency would help: it would slow down the API calls, though it would also increase the tempest run time. If that does not work, we can move it to the skip list?

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/698664

chandan kumar (chkumar246) wrote :

@Terry, looking at tempest and the neutron tempest plugin, https://opendev.org/openstack/tempest/src/branch/master/tempest/api/network/admin/test_agent_management.py#L23 and https://opendev.org/openstack/neutron-tempest-plugin/src/branch/master/neutron_tempest_plugin/api/admin/test_agent_management.py#L21 look quite similar. Maybe we need to remove that part from tempest and reuse it from the neutron tempest plugin. I need to check with @slaweq on how to proceed here.

Slawek Kaplonski (slaweq) wrote :

@Chandan: yes, we have many duplicated tests between the tempest and neutron_tempest_plugin repos.
If such tests aren't used by refstack, we should remove them from tempest and keep the neutron_tempest_plugin versions. AFAIR tests from the "admin" module aren't used by refstack, so they should be fine to remove from the tempest repo.
But you can confirm that with gmann as well.
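
As a rough illustration of spotting such duplicates, the sketch below compares two lists of test ids by their trailing `module.Class.method` part. The helper `short_name` is made up. The plugin test id is an assumption based on the file path linked earlier in this thread, not something taken from a real test listing. In practice the two lists would come from whatever test listing is available for each repo.

```python
def short_name(test_id):
    """Strip any '[...]' suffix and keep the last three dotted parts
    (module.Class.method) of a test id."""
    return '.'.join(test_id.split('[')[0].split('.')[-3:])


# In-tree tempest test id, as it appears in the failing run.
tempest_tests = [
    'tempest.api.network.admin.test_agent_management.'
    'AgentManagementTestJSON.test_list_agent',
]

# Assumed neutron_tempest_plugin counterpart (class and method names are
# guessed to match the in-tree ones).
plugin_tests = [
    'neutron_tempest_plugin.api.admin.test_agent_management.'
    'AgentManagementTestJSON.test_list_agent',
]

duplicates = sorted(
    set(map(short_name, tempest_tests)) & set(map(short_name, plugin_tests))
)
print(duplicates)
```

Matching on the short name only flags candidates; whether two tests are really equivalent still needs a look at the code.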

chandan kumar (chkumar246) wrote :

https://review.opendev.org/#/c/698589/ is the fix. Thank you, @terry.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by Chandan Kumar (raukadah) (<email address hidden>) on branch: master
Review: https://review.opendev.org/698664

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Chandan Kumar (raukadah) (<email address hidden>) on branch: master
Review: https://review.opendev.org/698435

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/698435
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=c7eb7180a93fbb11a6aac9404b5edf82032c2310
Submitter: Zuul
Branch: master

commit c7eb7180a93fbb11a6aac9404b5edf82032c2310
Author: Chandan Kumar (raukadah) <email address hidden>
Date: Wed Dec 11 15:19:11 2019 +0530

    [Train] Move list_agent and traffic tests to skip list.

    It is blocking our train promotion till then putting it in
    skip list.

    Related-Bug: #1856016
    Related-Bug: #1855985

    Change-Id: I0102c39c7ac8123d365a8fb34684615754e79b05
    Signed-off-by: Chandan Kumar (raukadah) <email address hidden>

Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin) wrote :
tags: removed: promotion-blocker
wes hayutin (weshayutin)
tags: removed: alert
Changed in tripleo:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/702690

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest 23.0.0

This issue was fixed in the openstack/tempest 23.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/702690
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=f8a87800090417afbed421c5121ae33fbd72283e
Submitter: Zuul
Branch: master

commit f8a87800090417afbed421c5121ae33fbd72283e
Author: Terry Wilson <email address hidden>
Date: Wed Jan 15 10:10:02 2020 -0600

    Re-add agent list test

    Change-Id: I66c0572dfa993425aad4c042d7d184039acb7005
    Related-bug: #1855985
