Logging is hard to read if there is a problem with resources during live migration

Bug #1820612 reported by Wouter van Bommel
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Issuing the command to migrate an instance from openstack-6 to another host (openstack-17), which does have enough resources

# logging in nova-compute.log
2019-03-18 12:56:49.111 301805 DEBUG nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] Doubling-up allocation request for move operation. _move_operation_alloc_request /usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:162
2019-03-18 12:56:49.112 301805 DEBUG nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] New allocation request containing both source and destination hosts in move operation: {'allocations': [{'resource_provider': {'uuid': u'4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'}, 'resources': {u'VCPU': 4, u'MEMORY_MB': 2048, u'DISK_GB': 20}}, {'resource_provider': {'uuid': u'57990d7c-7b10-40ee-916f-324bf7784eed'}, 'resources': {u'VCPU': 4, u'MEMORY_MB': 2048, u'DISK_GB': 20}}]} _move_operation_alloc_request /usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:202
2019-03-18 12:56:49.146 301805 WARNING nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] Unable to submit allocation for instance 71111e00-7913-4de9-8f45-ce13fcb8a104 (409 <html>
 <head>
  <title>409 Conflict</title>
 </head>
 <body>
  <h1>409 Conflict</h1>
  There was a conflict when trying to complete your request.<br /><br />
Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider '4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'. The requested amount would exceed the capacity.

 </body>
</html>)
2019-03-18 12:56:49.147 301805 WARNING nova.scheduler.utils [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] Failed to compute_task_migrate_server: No valid host was found. Unable to move instance 71111e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough capacity on the host for the instance.: NoValidHost: No valid host was found. Unable to move instance 71111e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough capacity on the host for the instance.
2019-03-18 12:56:49.148 301805 WARNING nova.scheduler.utils [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] [instance: 71111e00-7913-4de9-8f45-ce13fcb8a104] Setting instance to ACTIVE state.: NoValidHost: No valid host was found. Unable to move instance 71111e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough capacity on the host for the instance.

To find out which host resource provider '4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec' is, I queried the nova_api database:

select * from resource_providers where uuid='4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec';
+---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
| created_at | updated_at | id | uuid | name | generation | can_host |
+---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
| 2018-05-09 11:00:01 | 2019-03-14 10:47:55 | 39 | 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec | openstack-6.maas | 171 | NULL |
+---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
1 row in set (0.00 sec)

So that is openstack-6, not openstack-17 as mentioned in the logging above. This is not clear from the logging provided; also, there does not seem to be a command to retrieve the resource provider based on the UUID, which is the only thing that is logged.

tags: added: canonical-is-bootstack
Revision history for this message
Frode Nordahl (fnordahl) wrote :

While the bug you are reporting may have merit, I'm failing to see how this applies to the charm itself.

Should this bug have a task targeted at the upstream ``Nova`` project?

Changed in charm-nova-cloud-controller:
status: New → Incomplete
Revision history for this message
Wouter van Bommel (woutervb) wrote :

I missed that I was filing against the charm. It should indeed be the upstream nova project. Will look into that. Thanks for notifying.

Frode Nordahl (fnordahl)
no longer affects: charm-nova-cloud-controller
Revision history for this message
Matt Riedemann (mriedem) wrote :

> also, there does not seem to be a command to retrieve the resource provider

Use the OSC plugin:

https://docs.openstack.org/osc-placement/latest/index.html
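
For example, assuming the osc-placement plugin is installed, the provider UUID from the log can be looked up directly (a sketch using the UUID from the report above):

openstack resource provider show 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec

That should return the provider's name (e.g. openstack-6.maas), uuid and generation, so there is no need to query the nova_api database by hand.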

Revision history for this message
Matt Riedemann (mriedem) wrote :

On what release are you hitting this? That's important information when you are reporting bugs upstream.

Revision history for this message
Matt Riedemann (mriedem) wrote :

> Issuing the command to migrate an instance from openstack-6 to another host (openstack-17), which does have enough resources

How are you migrating the server exactly? What command or REST API call are you making, with what parameters? Are you specifically targeting openstack-17 or just assuming the scheduler is going to select it?

Because from this log message, it looks like the scheduler is trying to resize to the same host on which the instance is already running:

2019-03-18 12:56:49.112 301805 DEBUG nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] New allocation request containing both source and destination hosts in move operation: {'allocations': [{'resource_provider': {'uuid': u'4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'}, 'resources': {u'VCPU': 4, u'MEMORY_MB': 2048, u'DISK_GB': 20}}, {'resource_provider': {'uuid': u'57990d7c-7b10-40ee-916f-324bf7784eed'}, 'resources': {u'VCPU': 4, u'MEMORY_MB': 2048, u'DISK_GB': 20}}]} _move_operation_alloc_request /usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:202

Do you have allow_resize_to_same_host=True in nova.conf? Maybe that doesn't matter because it looks like you're not doing a resize, but a cold migration, because VCPU/MEMORY_MB/DISK_GB are not changing. Or maybe you're doing a live migration - because it seems you might be getting here:

https://github.com/openstack/nova/blob/18.0.0/nova/scheduler/utils.py#L536

My guess is this is a release from before Queens? Are you running Ocata or Pike or something older? Because I think you might be hitting a symptom of something that was fixed in Queens with this blueprint:

https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html

Changed in nova:
status: New → Incomplete
Revision history for this message
Matt Riedemann (mriedem) wrote :

Heh I guess you're doing live migration because that's right in the bug title. :) Sorry for that noise. Please provide which release you're using.

Revision history for this message
Wouter van Bommel (woutervb) wrote :

The release is Pike 16.1.6-0ubuntu1~cloud1

And the command was:

openstack server migrate --block-migration --live openstack-17 <instance id>

Revision history for this message
Matt Riedemann (mriedem) wrote :

As for why the PUT /allocations/71111e00-7913-4de9-8f45-ce13fcb8a104 request is failing with this:

{
   'allocations':[
      {
         'resource_provider':{
            'uuid':u'4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'
         },
         'resources':{
            u'VCPU':4,
            u'MEMORY_MB':2048,
            u'DISK_GB':20
         }
      },
      {
         'resource_provider':{
            'uuid':u'57990d7c-7b10-40ee-916f-324bf7784eed'
         },
         'resources':{
            u'VCPU':4,
            u'MEMORY_MB':2048,
            u'DISK_GB':20
         }
      }
   ]
}

I'm not really sure. As you noted, the first resource provider 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec is the source host, and the second resource provider 57990d7c-7b10-40ee-916f-324bf7784eed is the forced destination host for the live migration. Placement should be replacing the allocation against provider 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec rather than adding to it, so maybe this is a bug in Placement. Can you trace the request to the placement API logs and see if there is anything obvious there?
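
A minimal sketch of such a trace, assuming the placement API logs under /var/log/nova/ (the exact file name, and whether the request ID is propagated, depend on the deployment); grepping for the consumer UUID in the allocation URL is usually the safest bet:

grep "allocations/71111e00-7913-4de9-8f45-ce13fcb8a104" /var/log/nova/nova-placement-api.log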

Revision history for this message
Matt Riedemann (mriedem) wrote :

OK it's pre-Queens as I figured. As mentioned in comment 8 I'm not sure why that PUT /allocations request fails on the source resource provider. It would be helpful if prior to the live migration call, you dumped the resource providers and their existing usage using these commands:

https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-list

https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-show

https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-show (with the --allocations option)

That will give a better idea of whether or not the source compute provider is at or near capacity prior to starting the live migration.
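
For reference, a sketch of those commands applied to the providers and instance from the failed request (assuming osc-placement is installed; 4ce95dcf-... is the source provider, 57990d7c-... is the destination provider, and 71111e00-... is the instance):

openstack resource provider inventory list 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec
openstack resource provider show --allocations 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec
openstack resource provider show --allocations 57990d7c-7b10-40ee-916f-324bf7784eed
openstack resource provider allocation show 71111e00-7913-4de9-8f45-ce13fcb8a104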

Changed in nova:
status: Incomplete → New
Matt Riedemann (mriedem)
tags: added: live-migration placement scheduler
Revision history for this message
Wouter van Bommel (woutervb) wrote :

Issued the following commands that failed:

openstack server migrate --block-migration --live openstack-16 d76df273-0072-419a-ae29-1925d995b43b

No valid host was found. Unable to move instance d76df273-0072-419a-ae29-1925d995b43b to host openstack-16. There is not enough capacity on the host for the instance. (HTTP 400) (Request-ID: req-a60bc890-b053-454c-a202-354ac245b4ad)

&

openstack server migrate --block-migration --live openstack-18 d76df273-0072-419a-ae29-1925d995b43b

No valid host was found. Unable to move instance d76df273-0072-419a-ae29-1925d995b43b to host openstack-18. There is not enough capacity on the host for the instance. (HTTP 400) (Request-ID: req-1e69cda2-c20c-4e2c-9d0a-d6a1090365cf)

Resource information (taken after the failed commands)
openstack hypervisor list --long
+----+---------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State | vCPUs Used | vCPUs | Memory MB Used | Memory MB |
+----+---------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+
| 3 | openstack-16.maas | QEMU | 10.35.101.165 | up | 73 | 80 | 239952 | 515890 |
| 6 | openstack-15.maas | QEMU | 10.35.101.74 | up | 57 | 80 | 232736 | 257842 |
| 9 | openstack-10.maas | QEMU | 10.35.101.178 | up | 46 | 80 | 178480 | 515888 |
| 12 | openstack-13.maas | QEMU | 10.35.101.99 | up | 40 | 80 | 233408 | 257842 |
| 15 | openstack-17.maas | QEMU | 10.35.101.139 | up | 64 | 80 | 209168 | 515890 |
| 18 | openstack-7.maas | QEMU | 10.35.101.183 | up | 60 | 80 | 234880 | 257842 |
| 21 | openstack-20.maas | QEMU | 10.35.101.97 | up | 6 | 80 | 16896 | 257842 |
| 24 | openstack-8.maas | QEMU | 10.35.101.56 | up | 58 | 80 | 256480 | 257842 |
| 27 | openstack-19.maas | QEMU | 10.35.101.124 | up | 6 | 80 | 18944 | 257842 |
| 30 | openstack-12.maas | QEMU | 10.35.101.145 | up | 45 | 80 | 109840 | 515890 |
| 33 | openstack-14.maas | QEMU | 10.35.101.62 | up | 56 | 80 | 201664 | 257842 |
| 36 | openstack-11.maas | QEMU | 10.35.101.162 | up | 56 | 80 | 246880 | 257842 |
| 39 | openstack-9.maas | QEMU | 10.35.101.94 | up | 50 | 80 | 109056 | 451378 |
| 42 | openstack-6.maas | QEMU | 10.35.101.60 | up | 60 | 80 | 265728 | 257842 |
| 45 | openstack-18.maas | QEMU | 10.35.101.123 | up | 68 | 80 | 106432 | 515890 |
+----+---------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+

The commands from the links above (osc-placement resource-provider-inventory-list) don't work ...

Revision history for this message
Wouter van Bommel (woutervb) wrote :

Some extra info:

select * from resource_providers where uuid='0893c8fa-f357-4bf5-8d0e-1ee009f12886';
+---------------------+---------------------+----+--------------------------------------+-------------------+------------+----------+
| created_at | updated_at | id | uuid | name | generation | can_host |
+---------------------+---------------------+----+--------------------------------------+-------------------+------------+----------+
| 2018-05-09 10:58:35 | 2019-03-19 16:37:25 | 12 | 0893c8fa-f357-4bf5-8d0e-1ee009f12886 | openstack-18.maas | 166 | NULL |
+---------------------+---------------------+----+--------------------------------------+-------------------+------------+----------+
1 row in set (0.00 sec)

openstack hypervisor show 45
(output truncated)

Revision history for this message
Wouter van Bommel (woutervb) wrote :

And possibly relevant:

openstack hypervisor list --long
+----+---------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State | vCPUs Used | vCPUs | Memory MB Used | Memory MB |
+----+---------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+
| 3 | openstack-16.maas | QEMU | 10.35.101.165 | up | 79 | 80 | 256336 | 515890 |
| 6 | openstack-15.maas | QEMU | 10.35.101.74 | up | 57 | 80 | 232736 | 257842 |
| 9 | openstack-10.maas | QEMU | 10.35.101.178 | up | 46 | 80 | 178480 | 515888 |
| 12 | openstack-13.maas | QEMU | 10.35.101.99 | up | 40 | 80 | 233408 | 257842 |
| 15 | openstack-17.maas | QEMU | 10.35.101.139 | up | 64 | 80 | 209168 | 515890 |
| 18 | openstack-7.maas | QEMU | 10.35.101.183 | up | 60 | 80 | 234880 | 257842 |
| 21 | openstack-20.maas | QEMU | 10.35.101.97 | up | 0 | 80 | 512 | 257842 |
| 24 | openstack-8.maas | QEMU | 10.35.101.56 | up | 58 | 80 | 256480 | 257842 |
| 27 | openstack-19.maas | QEMU | 10.35.101.124 | up | 6 | 80 | 18944 | 257842 |
| 30 | openstack-12.maas | QEMU | 10.35.101.145 | up | 45 | 80 | 109840 | 515890 |
| 33 | openstack-14.maas | QEMU | 10.35.101.62 | up | 56 | 80 | 201664 | 257842 |
| 36 | openstack-11.maas | QEMU | 10.35.101.162 | up | 56 | 80 | 246880 | 257842 |
| 39 | openstack-9.maas | QEMU | 10.35.101.94 | up | 50 | 80 | 109056 | 451378 |
| 42 | openstack-6.maas | QEMU | 10.35.101.60 | up | 60 | 80 | 265728 | 257842 |
| 45 | openstack-18.maas | QEMU | 10.35.101.123 | up | 68 | 80 | 106432 | 515890 |
+----+---------------------+-----------------+---------------+-------+------------+-------+----------------+-----------+

openstack server list --all --host openstack-19
+--------------------------------------+----------------------+--------+----------------------------+----------------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+----------------------+--------+----------------------------+----------------------+--------+
| d76df273-0072-419a-ae29-1925d995b43b | cts-v4.dev.<redacted> | ACTIVE | external_net=10.35.107.103 | ubuntu_xenial_latest | v4 |
+--------------------------------------+----------------------+--------+----------------------------+----------------------+--------+

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Going back to Matt's request in #9: we would need the resource inventory and resource usage of both the source host and the destination host prior to the failed live migration.

You replied that: The commands from the links above (osc-placement resource-provider-inventory-list) don't work ...

You need to install the python-openstackclient package as well as the osc-placement plugin.
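
A minimal sketch of the installation, assuming an Ubuntu client machine (package names vary per distribution; pip works as an alternative for the plugin):

sudo apt install python-openstackclient
pip install osc-placement

After that, the "openstack resource provider ..." commands from the documentation linked above should be available.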

Unfortunately the data you provided is not enough to troubleshoot this bug further. I'm setting this bug to Incomplete. Please set it back to New once you have provided the additional data.

Changed in nova:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired