Migration and resize tests from tempest.scenario.test_minbw_allocation_placement.MinBwAllocationPlacementTest failing in neutron-tempest-dvr-ha-multinode-full

Bug #1917610 reported by Slawek Kaplonski
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
neutron                   Fix Released  Critical    Unassigned
tempest                   Fix Released  Undecided   Unassigned

Bug Description

We saw it mostly on the stable/train branch. The cold migration and resize tests from tempest.scenario.test_minbw_allocation_placement.MinBwAllocationPlacementTest are failing with errors like:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 90, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/scenario/test_minbw_allocation_placement.py", line 262, in test_migrate_with_qos_min_bw_allocation
    self.servers_client.migrate_server(server_id=server['id'])
  File "/opt/stack/tempest/tempest/lib/services/compute/servers_client.py", line 533, in migrate_server
    return self.action(server_id, 'migrate', **kwargs)
  File "/opt/stack/tempest/tempest/lib/services/compute/servers_client.py", line 214, in action
    post_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 300, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/opt/stack/tempest/tempest/lib/services/compute/base_compute_client.py", line 48, in request
    method, url, extra_headers, headers, body, chunked)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 704, in request
    self._error_checker(resp, resp_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 815, in _error_checker
    raise exceptions.BadRequest(resp_body, resp=resp)
tempest.lib.exceptions.BadRequest: Bad request
Details: {'code': 400, 'message': 'No valid host was found. No valid host found for cold migrate'}

See e.g. https://0c345762207dc13e339e-d1e090fdf1a39e65d2b0ba37cbdce0a4.ssl.cf2.rackcdn.com/777781/1/check/neutron-tempest-dvr-ha-multinode-full/463e963/testr_results.html

Logstash query that can help find the same issue: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22line%20262%2C%20in%20test_migrate_with_qos_min_bw_allocation%5C%22

Tags: gate-failure
Balazs Gibizer (balazs-gibizer) wrote :

Looking through the example failure above, I see the following:

1) There are 3 compute nodes in the job: one on the main devstack node (controller) and one on each of the two devstack subnodes (compute1, compute2).

2) Already during the creation of the instance that failed in the report, placement returned only one compute node with enough resources: the node on the controller. So the instance was booted there.

3) Later, during the migration, placement returned the same single compute node, but the scheduler ignored it because it is the source node of the migration.

4) Looking into the 3 q-agt logs, it is clear why placement only returned the compute node on the controller host. Only the q-agt on the controller host has bandwidth inventory configured[1]; the agents on the other compute hosts[2][3] have no bandwidth inventory, so they cannot be used for the instance.
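
To make the four points above concrete, here is a small model of the failure. It is only an illustration of the reasoning, not nova's scheduler code; the function name and the host dictionary are made up for this sketch.

```python
# Illustrative model only; this is not nova's scheduler code.
def pick_migration_targets(placement_candidates, source_host):
    """Return the hosts usable as a cold-migration destination.

    placement_candidates: hosts placement returned for the request.
    source_host: the host the instance currently runs on.
    """
    return [host for host in placement_candidates if host != source_host]


# Hypothetical view of the job: only the controller reports bandwidth.
hosts_with_bandwidth = {
    "controller": True,
    "compute1": False,
    "compute2": False,
}

# Placement only returns hosts that can satisfy the bandwidth request.
candidates = [h for h, has_bw in hosts_with_bandwidth.items() if has_bw]

# The source host is dropped, so nothing is left to migrate to.
targets = pick_migration_targets(candidates, source_host="controller")
if not targets:
    print("No valid host was found")
```

With a second host reporting bandwidth, the same function would return that host, which is what both options below aim for.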

So I see two possible ways forward:

A) modify the job config to have bandwidth inventory on the subnode computes

B) modify the tempest tests to check not only that multiple computes are available[4] before executing this test, but also that at least two computes have bandwidth inventory.
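
For option A, a quick way to see whether a given ml2_conf.ini carries bandwidth inventory is to look for the OVS agent's resource_provider_bandwidths option. The sketch below assumes that option lives in the [ovs] section, as it does for the Open vSwitch agent to my knowledge; the two sample configs and their values are hypothetical and are not copied from the job.

```python
import configparser


def bandwidth_inventory(ini_text, section="ovs"):
    """Return the configured bandwidth entries, or [] if none are set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get(section, "resource_provider_bandwidths", fallback="")
    return [item.strip() for item in raw.split(",") if item.strip()]


# Hypothetical sample configs, for illustration only.
CONTROLLER_INI = """
[ovs]
bridge_mappings = public:br-ex
resource_provider_bandwidths = br-ex:1000000:1000000
"""

SUBNODE_INI = """
[ovs]
bridge_mappings = public:br-ex
"""

for name, text in (("controller", CONTROLLER_INI), ("compute1", SUBNODE_INI)):
    print(name, bandwidth_inventory(text) or "no bandwidth inventory")
```

A subnode whose config returns an empty list here would not be offered by placement for a port with a minimum bandwidth rule.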

[1] https://a574f9c0fd4ca92b7603-2045be852d43868eb95da6cc3429b40d.ssl.cf2.rackcdn.com/777334/2/check/neutron-tempest-dvr-ha-multinode-full/44d0207/controller/logs/etc/neutron/plugins/ml2/ml2_conf.ini
[2] https://a574f9c0fd4ca92b7603-2045be852d43868eb95da6cc3429b40d.ssl.cf2.rackcdn.com/777334/2/check/neutron-tempest-dvr-ha-multinode-full/44d0207/compute1/logs/etc/neutron/plugins/ml2/ml2_conf.ini
[3] https://a574f9c0fd4ca92b7603-2045be852d43868eb95da6cc3429b40d.ssl.cf2.rackcdn.com/777334/2/check/neutron-tempest-dvr-ha-multinode-full/44d0207/compute2/logs/etc/neutron/plugins/ml2/ml2_conf.ini
[4] https://github.com/openstack/tempest/blob/ccf56b5ca278fd083946137a5c36cdd8ba2f230d/tempest/scenario/test_minbw_allocation_placement.py#L242
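
A rough sketch of the extra check described in option B follows. This is not the change that was merged in tempest. It assumes the test could obtain, per compute host, the set of resource classes reported in placement for that host's provider tree; the helper names and the input shape are hypothetical. The two resource class names are the standard placement classes for network bandwidth.

```python
# Standard placement resource classes for minimum bandwidth.
BW_CLASSES = {"NET_BW_EGR_KILOBIT_PER_SEC", "NET_BW_IGR_KILOBIT_PER_SEC"}


def computes_with_bandwidth(inventories_by_host):
    """Return the hosts that report at least one bandwidth resource class.

    inventories_by_host: hypothetical mapping of host name to the
    resource class names found in that host's provider tree.
    """
    return sorted(
        host
        for host, classes in inventories_by_host.items()
        if BW_CLASSES & set(classes)
    )


def should_skip_migration_test(inventories_by_host, minimum=2):
    """Skip when fewer than `minimum` hosts can hold a min-bw port."""
    return len(computes_with_bandwidth(inventories_by_host)) < minimum


# Hypothetical data matching the failing job described above.
failing_job = {
    "controller": {"VCPU", "MEMORY_MB", "NET_BW_EGR_KILOBIT_PER_SEC"},
    "compute1": {"VCPU", "MEMORY_MB"},
    "compute2": {"VCPU", "MEMORY_MB"},
}
print(should_skip_migration_test(failing_job))
```

With the data above the migrate and resize tests would be skipped instead of failing with "No valid host was found".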

Changed in nova:
status: New → Invalid
Slawek Kaplonski (slaweq) wrote :
Changed in tempest:
status: New → In Progress
Changed in neutron:
importance: Undecided → Critical
tags: added: gate-failure
Martin Kopec (mkopec) wrote :
Changed in tempest:
status: In Progress → Fix Released
Changed in neutron:
status: New → Fix Released