Error 500 trying to migrate an instance after wrong request_spec

Bug #1830747 reported by Thomas Goirand on 2019-05-28
This bug affects 2 people
Affects                    Importance  Assigned to
OpenStack Compute (nova)   High        Matt Riedemann
  Ocata                    High        Matt Riedemann
  Pike                     High        Matt Riedemann
  Queens                   High        Matt Riedemann
  Rocky                    High        Matt Riedemann
  Stein                    High        Matt Riedemann

Bug Description

We started an instance last Wednesday, and the compute node where it ran failed (maybe a hardware issue?). Since the networking looked wrong (i.e. missing network interfaces), I tried to migrate the instance.

According to Matt, it looked like the request_spec entry for the instance was wrong:

<mriedem> my guess is something like this happened: 1. create server in a group, 2. cold migrate the server which fails on host A and does a reschedule to host B which maybe also fails (would be good to know if previous cold migration attempts failed with reschedules), 3. try to cold migrate again which fails with the instance_group.uuid thing
<mriedem> the reschedule might be the key b/c like i said conductor has to rebuild a request spec and i think that's probably where we're doing a partial build of the request spec but missing the group uuid

Here's what I had in my nova_api DB:

{
  "nova_object.name": "RequestSpec",
  "nova_object.version": "1.11",
  "nova_object.data": {
    "ignore_hosts": null,
    "requested_destination": null,
    "instance_uuid": "2098b550-c749-460a-a44e-5932535993a9",
    "num_instances": 1,
    "image": {
      "nova_object.name": "ImageMeta",
      "nova_object.version": "1.8",
      "nova_object.data": {
        "min_disk": 40,
        "disk_format": "raw",
        "min_ram": 0,
        "container_format": "bare",
        "properties": {
          "nova_object.name": "ImageMetaProps",
          "nova_object.version": "1.20",
          "nova_object.data": {},
          "nova_object.namespace": "nova"
        }
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "properties",
        "min_ram",
        "container_format",
        "disk_format",
        "min_disk"
      ]
    },
    "availability_zone": "AZ3",
    "flavor": {
      "nova_object.name": "Flavor",
      "nova_object.version": "1.2",
      "nova_object.data": {
        "id": 28,
        "name": "cpu2-ram6-disk40",
        "is_public": true,
        "rxtx_factor": 1,
        "deleted_at": null,
        "root_gb": 40,
        "vcpus": 2,
        "memory_mb": 6144,
        "disabled": false,
        "extra_specs": {},
        "updated_at": null,
        "flavorid": "e29f3ee9-3f07-46d2-b2e2-efa4950edc95",
        "deleted": false,
        "swap": 0,
        "description": null,
        "created_at": "2019-02-07T07:48:21Z",
        "vcpu_weight": 0,
        "ephemeral_gb": 0
      },
      "nova_object.namespace": "nova"
    },
    "force_hosts": null,
    "retry": null,
    "instance_group": {
      "nova_object.name": "InstanceGroup",
      "nova_object.version": "1.11",
      "nova_object.data": {
        "members": null,
        "hosts": null,
        "policy": "anti-affinity"
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "policy",
        "members",
        "hosts"
      ]
    },
    "scheduler_hints": {
      "group": [
        "295c99ea-2db6-469a-877f-454a3903a8d8"
      ]
    },
    "limits": {
      "nova_object.name": "SchedulerLimits",
      "nova_object.version": "1.0",
      "nova_object.data": {
        "disk_gb": null,
        "numa_topology": null,
        "memory_mb": null,
        "vcpu": null
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "disk_gb",
        "vcpu",
        "memory_mb",
        "numa_topology"
      ]
    },
    "force_nodes": null,
    "project_id": "1bf4dbb3d2c746658f462bf8e59ec6be",
    "user_id": "255cca4584c24b16a684e3e8322b436b",
    "numa_topology": null,
    "is_bfv": false,
    "pci_requests": {
      "nova_object.name": "InstancePCIRequests",
      "nova_object.version": "1.1",
      "nova_object.data": {
        "instance_uuid": "2098b550-c749-460a-a44e-5932535993a9",
        "requests": []
      },
      "nova_object.namespace": "nova"
    }
  },
  "nova_object.namespace": "nova",
  "nova_object.changes": [
    "ignore_hosts",
    "requested_destination",
    "num_instances",
    "image",
    "availability_zone",
    "instance_uuid",
    "flavor",
    "scheduler_hints",
    "pci_requests",
    "instance_group",
    "limits",
    "project_id",
    "user_id",
    "numa_topology",
    "is_bfv",
    "retry"
  ]
}

Matt Riedemann (mriedem) wrote :

This is the error by the way:

http://paste.openstack.org/show/752159/

2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi [req-1ca4c1d0-9f6f-4a04-860d-1d3d03a0d063 9fb4630d74ad49e8ac9f4e8a72b8cafb 504ea0a356ca4066aaa617daff869463 - default default] Unexpected exception in API method: nova.exception.ObjectActionError: Object action obj_load_attr failed because: unable to load uuid
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py", line 801, in wrapped
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/validation/__init__.py", line 110, in wrapper
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/migrate_server.py", line 56, in _migrate
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi host_name=host_name)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 205, in inner
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 213, in _wrapped
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return fn(self, context, instance, *args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 153, in inner
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 3516, in resize
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi context, instance.uuid)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi result = fn(cls, context, *args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/objects/request_spec.py", line 531, in get_by_instance_uuid
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return cls._from_db_object(context, cls(), db_spec)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/objects/request_spec.py", line 510, in _from_db_object
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi context, spec.instance_group.uuid)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 67, in getter
20...


Matt Riedemann (mriedem) wrote :

We can see here that the instance_group entry in the request spec is clearly missing the uuid even though there is a group scheduler hint:

    "instance_group": {
      "nova_object.name": "InstanceGroup",
      "nova_object.version": "1.11",
      "nova_object.data": {
        "members": null,
        "hosts": null,
        "policy": "anti-affinity"
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "policy",
        "members",
        "hosts"
      ]
    },
    "scheduler_hints": {
      "group": [
        "295c99ea-2db6-469a-877f-454a3903a8d8"
      ]
    },

So I'm thinking we're somehow hitting this:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L228

saving that, and then hitting this which triggers the error:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L523
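The failure mode can be modeled with a minimal sketch. These are simplified stand-ins for nova's versioned objects, not the real classes: _populate_group_info builds a group without a uuid, and the later load touches that uuid.

```python
# Toy model of the bug (hypothetical classes, not nova code).

class InstanceGroup:
    def __init__(self, members=None, hosts=None, policy=None, uuid=None):
        self.members = members
        self.hosts = hosts
        self.policy = policy
        self._uuid = uuid

    @property
    def uuid(self):
        # Mimics obj_load_attr on a field that was never set.
        if self._uuid is None:
            raise AttributeError(
                'obj_load_attr failed because: unable to load uuid')
        return self._uuid


def populate_group_info(filter_properties):
    # Legacy path: hosts/members/policies survive, the uuid does not.
    policies = filter_properties.get('group_policies', set())
    return InstanceGroup(
        members=filter_properties.get('group_members'),
        hosts=filter_properties.get('group_hosts'),
        policy=next(iter(policies), None))


group = populate_group_info({'group_policies': {'anti-affinity'}})
try:
    group.uuid  # what _from_db_object effectively does on the next load
    load_failed = False
except AttributeError:
    load_failed = True
```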

Fix proposed to branch: master
Review: https://review.opendev.org/661786

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem) wrote :

This might explain what's happening during a cold migration.

Conductor creates a legacy filter_properties dict here:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L172

If the spec has an instance_group it will call here:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L397

and _to_legacy_group_info sets these values in the filter_properties dict:

        return {'group_updated': True,
                'group_hosts': set(self.instance_group.hosts),
                'group_policies': set([self.instance_group.policy]),
                'group_members': set(self.instance_group.members)}

Note there is no group_uuid.
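The lossy round-trip can be sketched with plain dicts (illustrative only, not nova code); it mirrors the snippet above, where every group field except the uuid is carried into the legacy filter_properties dict:

```python
def to_legacy_group_info(group):
    # Note: no 'group_uuid' key is ever emitted.
    return {'group_updated': True,
            'group_hosts': set(group['hosts']),
            'group_policies': set([group['policy']]),
            'group_members': set(group['members'])}


group = {'uuid': '295c99ea-2db6-469a-877f-454a3903a8d8',
         'hosts': ['host-a'],
         'policy': 'anti-affinity',
         'members': ['2098b550-c749-460a-a44e-5932535993a9']}
legacy = to_legacy_group_info(group)
```

Anything rebuilt from `legacy` alone can never know the group uuid, which is exactly what the reschedule path ends up doing.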

Those filter_properties are passed to the prep_resize method on the dest compute:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L304

zigo said he hit this:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4272

(10:03:07 AM) zigo: 2019-05-28 15:02:35.534 30706 ERROR nova.compute.manager [instance: ae6f8afe-9c64-4aaf-90e8-be8175fee8e4] nova.exception.UnableToMigrateToSelf: Unable to migrate instance (ae6f8afe-9c64-4aaf-90e8-be8175fee8e4) to current host (clint1-compute-5.infomaniak.ch).

which will trigger a reschedule here:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4348

The _reschedule_resize_or_reraise method will set up the parameters for the resize_instance compute task RPC API (conductor) method:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4378-L4379

Note that in Rocky the RequestSpec is not passed back to conductor on the reschedule, only the filter_properties:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L1452

We only started passing the RequestSpec from compute to conductor on reschedule starting in Stein: https://review.opendev.org/#/c/582417/

Without the request spec we get here in conductor:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L307

Note that we pass in the filter_properties but no instance_group to RequestSpec.from_components.

And because there is no instance_group but there are filter_properties, we call _populate_group_info here:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L442

Which means we get into this block that sets the RequestSpec.instance_group with no uuid:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L228

Then we eventually RPC cast off to prep_resize on the next host to try for the cold migration and save the request_spec changes here:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L356

Which is how later attempts to use that request spec to migrate the instance blow up when loading it from the DB because spec.instance_group.uuid is not set.

Changed in nova:
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

This goes back to ocata because of this change:

https://review.opendev.org/#/q/Ie70c77db753711e1449e99534d3b83669871943f

Related fix proposed to branch: master
Review: https://review.opendev.org/662550

Reviewed: https://review.opendev.org/661822
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c96c7c5e13bde39944a9dde7da7fe418b311ca2d
Submitter: Zuul
Branch: master

commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747

Reviewed: https://review.opendev.org/661786
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=da453c2bfe86ab7a825f0aa7ebced15886f7a5fd
Submitter: Zuul
Branch: master

commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
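The idea of the workaround can be sketched as follows (plain dicts for illustration; the real change lives in nova/objects/request_spec.py and operates on versioned objects):

```python
# Hypothetical sketch: if the stored instance_group has no uuid, fall
# back to the 'group' scheduler hint recorded in the same request spec.

def recover_group_uuid(spec):
    group = spec.get('instance_group')
    if group is not None and 'uuid' not in group:
        hints = spec.get('scheduler_hints') or {}
        group_hint = hints.get('group')
        if group_hint:
            group['uuid'] = group_hint[0]
    return spec


spec = {'instance_group': {'policy': 'anti-affinity'},
        'scheduler_hints': {'group': ['295c99ea-2db6-469a-877f-454a3903a8d8']}}
spec = recover_group_uuid(spec)
```

Note this only helps when the scheduler hint survived in the stored spec; the later comments on this bug describe specs where it did not.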

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/662550
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b4f89ab59a9ed68211e1250d9fa6ab5da3c7366
Submitter: Zuul
Branch: master

commit 6b4f89ab59a9ed68211e1250d9fa6ab5da3c7366
Author: Matt Riedemann <email address hidden>
Date: Fri May 31 15:26:24 2019 -0400

    Set/get group uuid when transforming RequestSpec to/from filter_properties

    As a follow up to change I20981c987549eec40ad9762e74b0db16e54f4e63
    we can avoid having an incomplete InstanceGroup by updating
    the _to_legacy_group_info and _populate_group_info methods to set/get
    the group uuid to/from the filter_properties.

    Change-Id: I164a6dee1e92a65fcf6e89525ee194bb482e9920
    Related-Bug: #1830747
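The direction of that follow-up can be sketched as a lossless round-trip (plain dicts, illustrative only):

```python
# Hypothetical sketch: carry the uuid through the legacy dict so
# rebuilding the group from filter_properties no longer drops it.

def to_legacy_group_info(group):
    return {'group_updated': True,
            'group_uuid': group['uuid'],  # the field the bug was missing
            'group_hosts': set(group['hosts']),
            'group_policies': set([group['policy']]),
            'group_members': set(group['members'])}


def populate_group_info(filter_properties):
    return {'uuid': filter_properties.get('group_uuid'),
            'hosts': filter_properties['group_hosts'],
            'policy': next(iter(filter_properties['group_policies'])),
            'members': filter_properties['group_members']}


group = {'uuid': '295c99ea-2db6-469a-877f-454a3903a8d8',
         'hosts': ['host-a'], 'policy': 'anti-affinity', 'members': []}
roundtrip = populate_group_info(to_legacy_group_info(group))
```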

Reviewed: https://review.opendev.org/662574
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8478a754802e29dffbb65ef363ee189162f0adea
Submitter: Zuul
Branch: stable/stein

commit 8478a754802e29dffbb65ef363ee189162f0adea
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/662894
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8569eb9b4fb905cb92041b84c293dc4e7af27fa8
Submitter: Zuul
Branch: stable/stein

commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)

This issue was fixed in the openstack/nova 19.0.1 release.

Reviewed: https://review.opendev.org/662578
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a0a187c9bb9bef149e193027a6eedc09ba10ce1f
Submitter: Zuul
Branch: stable/rocky

commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    NOTE(mriedem): The ComputeTaskAPI.resize_instance stub is removed
    in this backport because it is not needed before Stein. Also, the
    PlacementFixture is in-tree before Stein so that is updated here.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)

tags: added: in-stable-rocky

Reviewed: https://review.opendev.org/662895
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9fed1803b4d6b2778c47add9c327f0610edc5952
Submitter: Zuul
Branch: stable/rocky

commit 9fed1803b4d6b2778c47add9c327f0610edc5952
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)

FYI, the patched version of Nova (Rocky) just reached Debian Buster.

This issue was fixed in the openstack/nova 18.2.1 release.

Reviewed: https://review.opendev.org/662774
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=581df2c98676b6734e8195ab56c9e0dba74789a5
Submitter: Zuul
Branch: stable/queens

commit 581df2c98676b6734e8195ab56c9e0dba74789a5
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    NOTE(mriedem): In this version we have to request a specific port
    to avoid a NetworkAmbiguous failure when creating the server.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)
    (cherry picked from commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f)

tags: added: in-stable-queens

Reviewed: https://review.opendev.org/663110
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=20b90f2e26e6a46a12c2fd943b4472c3147528fa
Submitter: Zuul
Branch: stable/queens

commit 20b90f2e26e6a46a12c2fd943b4472c3147528fa
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Conflicts:
          nova/objects/request_spec.py

    NOTE(mriedem): The conflict is due to not having change
    Ib33719a4b9599d86848c618a6e142c71ece79ca5 in Queens.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)
    (cherry picked from commit 9fed1803b4d6b2778c47add9c327f0610edc5952)

This issue was fixed in the openstack/nova 17.0.11 release.

Yang Youseok (ileixe) wrote :

FYI, we recently encountered this bug and found there is no scheduler_hints in the request_spec (we are using Ocata and the server group was made in the past)...

Yang Youseok (ileixe) wrote :

I think the scheduler_hints no longer exist after the RequestSpec was overwritten during a reschedule. If so, the VM cannot be recovered even after this workaround patch is applied.

Arvydas O. (zebediejus) wrote :

As Yang noticed, the uuid is missing in scheduler_hints as well as in instance_groups. After our upgrade from Mitaka (previously upgraded from Liberty) to Rocky we had the same problem. Not only were cold migration/resize failing; "nova-manage db online_data_migrations --max-count 50" was also failing when instances with instance_groups were found.
As a workaround we solved it by inserting scheduler_hints directly into the DB (use at your own risk):

UPDATE nova_api.request_specs rs
INNER JOIN nova.instances ins ON ins.uuid = rs.instance_uuid
INNER JOIN nova_api.instance_group_member igm ON rs.instance_uuid = igm.instance_uuid
INNER JOIN nova_api.instance_groups ig ON ig.id = igm.group_id
SET rs.spec = REPLACE(rs.spec, ', "scheduler_hints": {}},', CONCAT(', "scheduler_hints": {"group": ["', ig.uuid, '"]}},'))
WHERE ins.deleted = 0 AND rs.spec LIKE '%"policies"%' AND rs.spec NOT LIKE '%"uuid":%' AND ins.uuid = '3cdd744e-243e-4328-8323-970a9e038c0e'
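Before running an UPDATE like that, it may be safer to identify the affected rows first. A hypothetical check over the serialized spec (JSON shape as in the bug description above; fetching rows from the DB is omitted):

```python
import json

def spec_is_corrupted(spec_json):
    """True when the spec carries an instance_group without a uuid and
    no 'group' scheduler hint, i.e. the state this bug leaves behind."""
    data = json.loads(spec_json).get('nova_object.data', {})
    group = data.get('instance_group')
    if not group:
        return False
    group_data = group.get('nova_object.data', {})
    hints = data.get('scheduler_hints') or {}
    return 'uuid' not in group_data and not hints.get('group')


corrupted = json.dumps({'nova_object.data': {
    'instance_group': {'nova_object.data': {'policy': 'anti-affinity'}},
    'scheduler_hints': {}}})
recoverable = json.dumps({'nova_object.data': {
    'instance_group': {'nova_object.data': {'policy': 'anti-affinity'}},
    'scheduler_hints': {'group': ['295c99ea-2db6-469a-877f-454a3903a8d8']}}})
flag_corrupted = spec_is_corrupted(corrupted)
flag_recoverable = spec_is_corrupted(recoverable)
```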
