master promotion: tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario is failing on fs020 and tempest-scenario

Bug #1923194 reported by Ronelle Landy
36
This bug affects 4 people
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Unassigned

Bug Description

 tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario is failing in periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master (sporadic) and periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-master (more consistent) with the following error:

{0} tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario [84.419488s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_minimum_basic.py", line 143, in test_minimum_basic_scenario
        (floating_ip['floating_ip_address'], server['addresses']))
      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 435, in assertIsNotNone
        self.assertThat(observed, matcher, message)
      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
        raise mismatch_error
    testtools.matchers._impl.MismatchError: None matches Is(None): Failed to find floating IP '192.168.24.142' in server addresses: {'tempest-TestMinimumBasicScenario-872779920-network': [{'version': 4, 'addr': '10.100.0.5', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:45:ae:9b'}]}

This is a basic test and having this fail is critical.

The full test log is below:

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-master/0ea5e75/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

The failure is consistent in tempest-scenario since 04/08:

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-master

Ronelle Landy (rlandy)
Changed in tripleo:
milestone: none → wallaby-rc1
importance: Undecided → Critical
status: New → Triaged
tags: added: ci promotion-blocker
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Revision history for this message
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
Jiri Podivin (jpodivin) wrote :

I'm running into this issue, or something very similar, on the master right now.

Captured traceback:

~~~~~~~~~~~~~~~~~~~

    Traceback (most recent call last):

      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper

        return f(*func_args, **func_kwargs)

      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_minimum_basic.py", line 143, in test_minimum_basic_scenario

        (floating_ip['floating_ip_address'], server['addresses']))

      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 435, in assertIsNotNone

        self.assertThat(observed, matcher, message)

      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat

        raise mismatch_error

    testtools.matchers._impl.MismatchError: None matches Is(None): Failed to find floating IP '192.168.24.155' in server addresses: {'tempest-TestMinimumBasicScenario-1314091740-network': [{'version': 4, 'addr': '10.100.0.13', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:f4:d6:9b'}]}

https://zuul.opendev.org/t/openstack/build/2afcec6558f540e7a3ace399b239f07c/log/logs/undercloud/var/log/tempest/tempest_run.log

Is there something we need to do to make the released fix work?
Or is the problem back?

Revision history for this message
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Fix Released → Triaged
Revision history for this message
Ananya Banerjee (frenzyfriday) wrote :
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

I have reverted the skips previously added with [1] and testing with testproject[2] to confirm if we are still hitting this issue in master.

[1] https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/809142
[2] https://review.rdoproject.org/r/c/testproject/+/28537

Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

Based on testing via testproject: https://review.rdoproject.org/r/c/testproject/+/28537

Results:-

Passed in the first run:-

https://logserver.rdoproject.org/37/28537/19/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/a07b61e/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

~~~
{0} tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario [106.350467s] ... ok
~~~

Failed in the second run:-

https://logserver.rdoproject.org/37/28537/21/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/531e85f/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

~~~
{0} tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario [43.387663s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_minimum_basic.py", line 143, in test_minimum_basic_scenario
        (floating_ip['floating_ip_address'], server['addresses']))
      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 435, in assertIsNotNone
        self.assertThat(observed, matcher, message)
      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
        raise mismatch_error
    testtools.matchers._impl.MismatchError: None matches Is(None): Failed to find floating IP '10.0.0.188' in server addresses: {'tempest-TestMinimumBasicScenario-1324464342-network': [{'version': 4, 'addr': '10.100.0.14', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:b0:ef:ad'}]}
~~~

Looks like the issue is intermittent.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I took a look at logs from https://logserver.rdoproject.org/37/28537/21/check/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/531e85f/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz where this test failed.

Request to associate FIP to the VM's port was done here:

2021-09-16 07:19:51.146 26 DEBUG neutron.api.v2.base [req-2ec4116e-ab79-4850-9cef-0c478ec7898e 928326e8a93e4ee791575f19d40094ea 89bea5be9867482cb7a86ace4b735471 - default default] Request body: {'floatingip': {'floating_network_id': '76bb7d60-81ed-48b2-ae48-15230dc7ce57', 'port_id': 'ec3299a1-7817-460e-ad70-34e067a11667', 'tenant_id': '89bea5be9867482cb7a86ace4b735471', 'fixed_ip_address': '10.100.0.14'}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:730

it was associated to the port here:

2021-09-16 07:19:51.991 26 INFO neutron.db.l3_db [req-2ec4116e-ab79-4850-9cef-0c478ec7898e 928326e8a93e4ee791575f19d40094ea 89bea5be9867482cb7a86ace4b735471 - default default] Floating IP 5f6b33a1-4cf4-43bf-a361-02c4315bde30 associated. External IP: 10.0.0.188, port: ec3299a1-7817-460e-ad70-34e067a11667.

and it was finished here:

2021-09-16 07:19:52.212 26 INFO neutron.wsgi [req-2ec4116e-ab79-4850-9cef-0c478ec7898e 928326e8a93e4ee791575f19d40094ea 89bea5be9867482cb7a86ace4b735471 - default default] 172.17.0.36 "POST /v2.0/floatingips HTTP/1.1" status: 201 len: 1076 time: 1.0691254

I'm not sure if additional call to associate FIP to the port is needed in that test as Neutron associates FIP to the fixed IP automatically. But even with that extra PUT request which happened in:

2021-09-16 07:19:53.202 26 INFO neutron.wsgi [req-c710b03d-a521-4e49-9be8-f2593018a951 928326e8a93e4ee791575f19d40094ea 89bea5be9867482cb7a86ace4b735471 - default default] 172.17.0.36 "PUT /v2.0/floatingips/5f6b33a1-4cf4-43bf-a361-02c4315bde30 HTTP/1.1" status: 200 len: 1073 time: 0.9223170

it shouldn't be any problem.

Tempest is checking if FIP is assigned to the VM by using Nova API. Maybe there is some issue on the nova's side so it don't shows that FIP as associated with that instance?

Revision history for this message
Artom Lifshitz (notartom) wrote :

Looks like tempest is racing with the external event that Neutron sends to Nova when the FIP is associated, and which causes Nova to update our network info cache to include the new FIP. When tempest does the `server show`, depending on whether the info cache has been updated yet, the new FIP may or may not be in there.

The PUT request to associated the FIP to the port happens at 07:19:53:

2021-09-16 07:19:53,206 294553 INFO [tempest.lib.common.rest_client] Request (TestMinimumBasicScenario:test_minimum_basic_scenario): 200 PUT http://10.0.0.5:9696/v2.0/floatingips/5f6b33a1-4cf4-43bf-a361-02c4315bde30 0.933s
2021-09-16 07:19:53,206 294553 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'} Body: {"floatingip": {"port_id": "ec3299a1-7817-460e-ad70-34e067a11667"}}

Nova receives the network-vif-changed event at 07:19:52 and processes it at 07:19:54:

2021-09-16 07:19:52.915 11 INFO nova.api.openstack.compute.server_external_events [req-4183006e-54ed-4373-99a7-bb9a74efb168 fb812065988c4296bcb30f582bcce899 041e86cc3eb7417ab19b7d169a998bd0 - default default] Creating event network-changed:ec3299a1-7817-460e-ad70-34e067a11667 for instance 8e26be1f-0e83-429d-8985-567b7edf28da on overcloud-novacompute-0.localdomain
2021-09-16 07:19:54.948 11 DEBUG nova.api.openstack.wsgi [req-71ff0a3f-5e8c-422a-95e3-9fd8385b728b fb812065988c4296bcb30f582bcce899 041e86cc3eb7417ab19b7d169a998bd0 - default default] Action: 'create', calling method: <bound method ServerExternalEventsController.create of <nova.api.openstack.compute.server_external_events.ServerExternalEventsController object at 0x7fe369ad4160>>, body: {"events": [{"name": "network-changed", "server_uuid": "8e26be1f-0e83-429d-8985-567b7edf28da", "tag": "ec3299a1-7817-460e-ad70-34e067a11667"}, {"name": "network-changed", "server_uuid": "8e26be1f-0e83-429d-8985-567b7edf28da", "tag": "ec3299a1-7817-460e-ad70-34e067a11667"}]} _process_stack /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:511

However, the `show server` request that tempest uses to get the `addresses` information from the server happens *between* those two log entries, at 07:19:53:

2021-09-16 07:19:53.559 11 DEBUG nova.api.openstack.wsgi [req-fbf5193f-60e2-41f9-80f4-306eaad42919 928326e8a93e4ee791575f19d40094ea 89bea5be9867482cb7a86ace4b735471 - default default] Calling method '<bound method ServersController.show of <nova.api.openstack.compute.servers.ServersController object at 0x7fe369996b70>>' _process_stack /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:514

This would also explain why this is not consistently reproducible.

I'm not sure what the correct solution here would be.

One option would be to always refresh the network info cache when doing a server show, but that seems overly heavy-handed, knowing that we still correctly update the info cache once we process Neutron's external event.

Conversely, we could make tempest a bit more forgiving/patient, and add something like a waiter for the FIP to appear.

Revision history for this message
Pooja Jadhav (poojajadhav22) wrote :

I have posted revert of the tempest skiplist test : "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario".

https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/815777

Tested here : https://review.rdoproject.org/r/c/testproject/+/36478

"test_minimum_basic_scenario" is passing in the testrun. IMO, We can close this out.

Revision history for this message
Douglas Viroel (dviroel) wrote :
Revision history for this message
Martin Kopec (mkopec) wrote :
Revision history for this message
Martin Kopec (mkopec) wrote :

https://review.opendev.org/c/openstack/tempest/+/811191 was supposed to close this LP, as that didn't happen automatically, I'm marking this as Fix Released myself.

Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest 30.0.0

This issue was fixed in the openstack/tempest 30.0.0 release.

Revision history for this message
Douglas Viroel (dviroel) wrote :
Revision history for this message
Jiri Podivin (jpodivin) wrote :

Also affects tripleo-ci-centos-9-standalone-tv-validation on wallaby.
Failures differ slightly in the underlying exception, but always occur during the same test "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_instance_hard_reboot_after_vol_snap_deletion"

Configuration[0] of the job has not been changed during the past month or more. However, the parent job appears stable[1].

Trace:
------
    Traceback (most recent call last):

      File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)

      File "/usr/lib/python3.9/site-packages/tempest/scenario/test_minimum_basic.py", line 240, in test_minimum_basic_instance_hard_reboot_after_vol_snap_deletion
    self.assertIsNotNone(

      File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 417, in assertIsNotNone
    self.assertThat(observed, matcher, message)

      File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 480, in assertThat
    raise mismatch_error

    testtools.matchers._impl.MismatchError: None matches Is(None): Failed to find floating IP '192.168.24.192' in server addresses: {'tempest-TestMinimumBasicScenario-1317881733-network': [{'version': 4, 'addr': '10.100.0.10', 'OS-EXT-IPS:type': 'fixed', 'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:02:66:46'}]}

Logs:
-----
https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-standalone-tv-validation&branch=stable%2Fwallaby&skip=0
https://067972a2baee69bd9640-b9c14e29079b515b3d9c294293d431fb.ssl.cf5.rackcdn.com/850496/3/check/tripleo-ci-centos-9-standalone-tv-validation/0a26a0a/logs/undercloud/var/log/tempest/tempest_run.log
https://a372132f2cf9d6919d8d-0da77774ba1e870a646f19d85e727a87.ssl.cf1.rackcdn.com/850496/3/check/tripleo-ci-centos-9-standalone-tv-validation/75250a8/logs/undercloud/var/log/tempest/tempest_run.log

[0]https://zuul.opendev.org/t/openstack/job/tripleo-ci-centos-9-standalone-tv-validation
[1]https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-standalone&branch=stable%2Fwallaby&skip=0

Revision history for this message
Arx Cruz (arxcruz) wrote :
Revision history for this message
Marios Andreou (marios-b) wrote :

adding a note here to help clear some confusion

Bug 1980255 is marked duplicate of this.

The last comment here #16 above says jobs is passing and perhaps it was true at the time but latest run today has it skipped in [1]

You can see comment #12 at Bug #1980255 for more info but basically those tests are in the skiplist [2]

[1] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_121/848305/8/gate/tripleo-ci-centos-9-standalone-on-multinode-ipa/121d0df/logs/undercloud/var/log/tempest/stestr_results.html
[2] https://opendev.org/openstack/openstack-tempest-skiplist/src/commit/aebe91c375588fc26fe8c3aac1fbd256c8e210b6/roles/validate-tempest/vars/tempest_skip.yml#L1233-L1234

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.