fs20 on zed/master is failing consistently on tempest test "test_server_connectivity_live_migration"

Bug #2011824 reported by Pooja Jadhav
This bug affects 1 person
Affects: tripleo
Status: Triaged
Importance: Critical
Assigned to: Unassigned

Bug Description

fs20 on zed/master is failing consistently on the tempest test "tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration" with the following traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 264, in test_server_connectivity_live_migration
    self.admin_servers_client.live_migrate_server(
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/compute/servers_client.py", line 538, in live_migrate_server
    return self.action(server_id, 'os-migrateLive', **kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/compute/servers_client.py", line 225, in action
    resp, body = self.post('servers/%s/action' % server_id,
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 300, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/usr/lib/python3.9/site-packages/tempest/lib/services/compute/base_compute_client.py", line 47, in request
    resp, resp_body = super(BaseComputeClient, self).request(
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 742, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 857, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {'code': 400, 'message': 'overcloud-novacompute-1.localdomain is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks.'}

Detailed logs:

https://logserver.rdoproject.org/19/36719/34/check/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-zed/ccc60cf/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/4a1e013/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-integration-zed-centos9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-zed/edb3dc5/logs/undercloud/var/log/tempest/stestr_results.html.gz

summary: fs20 on zed/master is failing consistently on tempest test
- "tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration"
+ "test_server_connectivity_live_migration"
Revision history for this message
Sandeep Yadav (sandeepyadav93) wrote :

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/4a1e013/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.txt.gz

~~~
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/exception_wrapper.py", line 71, in wrapped
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server _emit_versioned_exception_notification(
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server self.force_reraise()
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server raise self.value
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/exception_wrapper.py", line 63, in wrapped
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/compute/utils.py", line 1439, in decorated_function
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 214, in decorated_function
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server compute_utils.add_instance_fault_from_exc(context,
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server self.force_reraise()
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server raise self.value
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova...

~~~

Sandeep Yadav (sandeepyadav93) wrote :

In the last job which passed, this test was skipped:

[1] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/d59ec78/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

~~~
{3} tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration ... SKIPPED: Live migration is not available.
~~~

In the latest run, for some reason tempest itself did not skip this test, and it failed because we don't have shared storage.

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/4a1e013/logs/undercloud/var/log/tempest/failing_tests.log.txt.gz

~~~
tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration[compute,id-03fd1562-faad-11e7-9ea0-fa163e65f5ce,multinode,network,slow]
~~~
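
For illustration, the difference between the two runs can be reproduced with a minimal sketch of a config-gated skip. This is not tempest's actual code: `LIVE_MIGRATION_ENABLED` is a hypothetical stand-in for whatever flag the test checks, and plain `unittest` is used in place of the tempest base classes. Only the skip message is taken from the log above.

```python
import unittest

# Hypothetical stand-in for the configuration flag that gates the test.
# False reproduces the passing job (test skipped); True would let it run.
LIVE_MIGRATION_ENABLED = False


class LiveMigrationScenario(unittest.TestCase):
    @unittest.skipUnless(LIVE_MIGRATION_ENABLED,
                         "Live migration is not available.")
    def test_server_connectivity_live_migration(self):
        # In the real job this is where the live-migrate API call would
        # be made; here we fail to show the body is never reached.
        self.fail("live migration attempted without shared storage")


def run_and_count():
    """Run the scenario and return (skipped, failed) counts."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(LiveMigrationScenario)
    result = unittest.TestResult()
    suite.run(result)
    return len(result.skipped), len(result.failures)
```

With the flag off the test is reported as skipped and the body never executes, which matches what the earlier passing job showed.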

Marios Andreou (marios-b) wrote :

as discussed a bit just now, +1 to adding the test into the skiplist.

we weren't running that but something changed in the deployment defaults (or possibly the nodepool nodes?) and the tests started running and failing.

Skip it for now while we investigate what changed and decide how to proceed, whether by fixing the issue or keeping the skip for _reasons_ until _thing_ ... ;)

Sandeep Yadav (sandeepyadav93) wrote :

There was a recent change in python-tempestconf[1] "Do not disable live migration by default" after which this test got enabled on fs020.

Our compute nodes don't have shared storage, and it looks like the test instances boot from local disk, so the test fails.[2]

[1] https://opendev.org/openinfra/python-tempestconf/commit/fc128db3b29b1108b1e8209f43408ab798e4405a

[2] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/4a1e013/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.txt.gz

~~~
2023-03-16 05:51:07.516 2 ERROR oslo_messaging.rpc.server nova.exception.InvalidSharedStorage: overcloud-novacompute-0.localdomain is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks.
~~~

Martin Kopec (mkopec) wrote :

we changed the default value in tempestconf because we assumed that live migration works in most deployments (that it works on more deployments than it fails on).

https://review.opendev.org/c/openinfra/python-tempestconf/+/849681

If it doesn’t work in your case, and we are sure that it’s not supposed to, just set the opt explicitly to false (as it was previously done by tempestconf by default)
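
A minimal sketch of such an explicit override, assuming the opt in question is `live_migration` under `[compute-feature-enabled]` in the generated tempest.conf. The helper name `disable_live_migration` is made up for this example, and how the job actually passes overrides to tempestconf is not shown here.

```python
import configparser
import io


def disable_live_migration(conf_text: str) -> str:
    """Return conf_text with live_migration forced to False.

    Assumption: the option lives in [compute-feature-enabled].
    """
    # interpolation=None so values containing '%' pass through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    section = "compute-feature-enabled"
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "live_migration", "False")

    # Serialize back to INI text.
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[compute-feature-enabled]\nlive_migration = True\n"
    print(disable_live_migration(sample))
```

Rewriting the file after generation is only one way to do it; the same value could be supplied wherever the job already defines its tempest configuration.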

Marios Andreou (marios-b) wrote :

adding some missing context here... we added the test in question to the tempest skiplist at [1] until the storage team decides if we will be enabling it again

[1] https://opendev.org/openstack/openstack-tempest-skiplist/src/commit/04f30cb6716543146d51f190c2744fcee3f69361/roles/validate-tempest/vars/tempest_skip.yml#L1060-L1072

Luigi Toscano (ltoscano) wrote :

I don't know if the test is supposed to work (this is a question for nova people), but if live migration is not supported in that configuration, you should disable it in the tempest configuration of your job, not add the test to a skiplist.

Takashi Kajinami (kajinamit) wrote :

So ideally we should configure [compute-feature-enabled] block_migration_for_live_migration according to ephemeral storage backend (it defaults to true but should be false unless shared storage such as ceph or nfs is used)
