Migration and resize tests from tempest.scenario.test_minbw_allocation_placement.MinBwAllocationPlacementTest failing in neutron-tempest-dvr-ha-multinode-full
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Invalid | Undecided | Unassigned | |
| neutron | Fix Released | Critical | Unassigned | |
| tempest | Fix Released | Undecided | Unassigned | |
Bug Description
We saw it mostly on the stable/train branch. Cold migration and resize tests from tempest.scenario.test_minbw_allocation_placement.MinBwAllocationPlacementTest fail with:
Traceback (most recent call last):
File "/opt/stack/
return f(*func_args, **func_kwargs)
File "/opt/stack/
self.
File "/opt/stack/
return self.action(
File "/opt/stack/
post_body)
File "/opt/stack/
return self.request(
File "/opt/stack/
method, url, extra_headers, headers, body, chunked)
File "/opt/stack/
self.
File "/opt/stack/
raise exceptions.
tempest.
Details: {'code': 400, 'message': 'No valid host was found. No valid host found for cold migrate'}
Logstash query that can be useful to find the same issue: http://
Changed in neutron:
status: New → Fix Released
Looking through the example failure above, I see the following:
1) There are 3 compute nodes in the job: one on the main devstack node (controller) and one on each of the two devstack subnodes (compute1, compute2).
2) Already during the creation of the instance that failed in the report, placement returned only one compute node with enough resources, the node on the controller. So the instance was booted there.
3) Then later, during the migration, placement returned the same single compute node, but it was ignored by the scheduler as it is the source node of the migration.
4) Looking into the 3 q-agt logs, it is clear why placement only returned the compute node on the controller host: only the q-agt on the controller host has bandwidth inventory configured[1]; the agents on the other compute hosts[2][3] have no bandwidth inventory, so they cannot be used for the instance.
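For context, the OVS agent reports bandwidth inventory to placement based on the `resource_provider_bandwidths` option in ml2_conf.ini; when the option is absent (as on the subnodes), the agent reports no bandwidth inventory at all. A minimal sketch of what the controller's config has and the subnodes lack (the bridge name and numbers below are illustrative, not taken from the job's actual config):

```ini
[ovs]
# Map each physical bridge to the egress:ingress bandwidth (in kbps) it can
# guarantee. br-physnet and the values are example placeholders.
resource_provider_bandwidths = br-physnet:1000000:1000000
```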
So I see two possible ways forward:
A) modify the job config to have bandwidth inventory on the subnode computes
B) modify the tempest tests to not only check whether multiple computes are available[4] before executing this test, but also check whether at least two computes have bandwidth inventory.
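Option B could be sketched roughly as follows. The `NET_BW_*` resource class names are the real placement classes used for minimum-bandwidth QoS, but `inventories_by_host` and the helper are hypothetical stand-ins for what a tempest skip check would build from placement API responses:

```python
# Placement resource classes used for minimum-bandwidth QoS.
BW_CLASSES = {"NET_BW_EGR_KILOBIT_PER_SEC", "NET_BW_IGR_KILOBIT_PER_SEC"}


def hosts_with_bw_inventory(inventories_by_host):
    """Return the hosts whose resource providers report bandwidth inventory.

    inventories_by_host: dict mapping host name -> set of resource class
    names (a hypothetical shape; a real test would assemble it from
    placement API responses).
    """
    return {
        host
        for host, classes in inventories_by_host.items()
        if classes & BW_CLASSES
    }


# The failing job's situation: only the controller's q-agt has inventory.
inventories = {
    "controller": {"NET_BW_EGR_KILOBIT_PER_SEC", "NET_BW_IGR_KILOBIT_PER_SEC"},
    "compute1": set(),
    "compute2": set(),
}

# The proposed skip condition: require at least two such hosts.
if len(hosts_with_bw_inventory(inventories)) < 2:
    print("skip: fewer than 2 computes with bandwidth inventory")
```

With the job config as it stands, the check would skip the migration tests instead of letting them fail with NoValidHost.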
[1] https://a574f9c0fd4ca92b7603-2045be852d43868eb95da6cc3429b40d.ssl.cf2.rackcdn.com/777334/2/check/neutron-tempest-dvr-ha-multinode-full/44d0207/controller/logs/etc/neutron/plugins/ml2/ml2_conf.ini
[2] https://a574f9c0fd4ca92b7603-2045be852d43868eb95da6cc3429b40d.ssl.cf2.rackcdn.com/777334/2/check/neutron-tempest-dvr-ha-multinode-full/44d0207/compute1/logs/etc/neutron/plugins/ml2/ml2_conf.ini
[3] https://a574f9c0fd4ca92b7603-2045be852d43868eb95da6cc3429b40d.ssl.cf2.rackcdn.com/777334/2/check/neutron-tempest-dvr-ha-multinode-full/44d0207/compute2/logs/etc/neutron/plugins/ml2/ml2_conf.ini
[4] https://github.com/openstack/tempest/blob/ccf56b5ca278fd083946137a5c36cdd8ba2f230d/tempest/scenario/test_minbw_allocation_placement.py#L242