Error 500 trying to migrate an instance after wrong request_spec

Bug #1830747 reported by Thomas Goirand
This bug affects 2 people
Affects                     Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)    Fix Released   High        Matt Riedemann
  Ocata                     Fix Committed  High        Matt Riedemann
  Pike                      Fix Released   High        Matt Riedemann
  Queens                    Fix Committed  High        Matt Riedemann
  Rocky                     Fix Committed  High        Matt Riedemann
  Stein                     Fix Committed  High        Matt Riedemann

Bug Description

We started an instance last Wednesday, and the compute node where it ran failed (maybe a hardware issue?). Since the networking looked wrong (i.e. missing network interfaces), I tried to migrate the instance.

According to Matt, the request_spec entry for the instance looks wrong:

<mriedem> my guess is something like this happened: 1. create server in a group, 2. cold migrate the server which fails on host A and does a reschedule to host B which maybe also fails (would be good to know if previous cold migration attempts failed with reschedules), 3. try to cold migrate again which fails with the instance_group.uuid thing
<mriedem> the reschedule might be the key b/c like i said conductor has to rebuild a request spec and i think that's probably where we're doing a partial build of the request spec but missing the group uuid

Here's what I had in my nova_api database:

{
  "nova_object.name": "RequestSpec",
  "nova_object.version": "1.11",
  "nova_object.data": {
    "ignore_hosts": null,
    "requested_destination": null,
    "instance_uuid": "2098b550-c749-460a-a44e-5932535993a9",
    "num_instances": 1,
    "image": {
      "nova_object.name": "ImageMeta",
      "nova_object.version": "1.8",
      "nova_object.data": {
        "min_disk": 40,
        "disk_format": "raw",
        "min_ram": 0,
        "container_format": "bare",
        "properties": {
          "nova_object.name": "ImageMetaProps",
          "nova_object.version": "1.20",
          "nova_object.data": {},
          "nova_object.namespace": "nova"
        }
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "properties",
        "min_ram",
        "container_format",
        "disk_format",
        "min_disk"
      ]
    },
    "availability_zone": "AZ3",
    "flavor": {
      "nova_object.name": "Flavor",
      "nova_object.version": "1.2",
      "nova_object.data": {
        "id": 28,
        "name": "cpu2-ram6-disk40",
        "is_public": true,
        "rxtx_factor": 1,
        "deleted_at": null,
        "root_gb": 40,
        "vcpus": 2,
        "memory_mb": 6144,
        "disabled": false,
        "extra_specs": {},
        "updated_at": null,
        "flavorid": "e29f3ee9-3f07-46d2-b2e2-efa4950edc95",
        "deleted": false,
        "swap": 0,
        "description": null,
        "created_at": "2019-02-07T07:48:21Z",
        "vcpu_weight": 0,
        "ephemeral_gb": 0
      },
      "nova_object.namespace": "nova"
    },
    "force_hosts": null,
    "retry": null,
    "instance_group": {
      "nova_object.name": "InstanceGroup",
      "nova_object.version": "1.11",
      "nova_object.data": {
        "members": null,
        "hosts": null,
        "policy": "anti-affinity"
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "policy",
        "members",
        "hosts"
      ]
    },
    "scheduler_hints": {
      "group": [
        "295c99ea-2db6-469a-877f-454a3903a8d8"
      ]
    },
    "limits": {
      "nova_object.name": "SchedulerLimits",
      "nova_object.version": "1.0",
      "nova_object.data": {
        "disk_gb": null,
        "numa_topology": null,
        "memory_mb": null,
        "vcpu": null
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "disk_gb",
        "vcpu",
        "memory_mb",
        "numa_topology"
      ]
    },
    "force_nodes": null,
    "project_id": "1bf4dbb3d2c746658f462bf8e59ec6be",
    "user_id": "255cca4584c24b16a684e3e8322b436b",
    "numa_topology": null,
    "is_bfv": false,
    "pci_requests": {
      "nova_object.name": "InstancePCIRequests",
      "nova_object.version": "1.1",
      "nova_object.data": {
        "instance_uuid": "2098b550-c749-460a-a44e-5932535993a9",
        "requests": []
      },
      "nova_object.namespace": "nova"
    }
  },
  "nova_object.namespace": "nova",
  "nova_object.changes": [
    "ignore_hosts",
    "requested_destination",
    "num_instances",
    "image",
    "availability_zone",
    "instance_uuid",
    "flavor",
    "scheduler_hints",
    "pci_requests",
    "instance_group",
    "limits",
    "project_id",
    "user_id",
    "numa_topology",
    "is_bfv",
    "retry"
  ]
}
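The corruption in the spec above can be spotted mechanically: instance_group is present, but its nova_object.data has no uuid key, while scheduler_hints still carries a group. Here is a minimal Python sketch for checking a stored spec. It assumes you feed it the raw JSON text from the spec column of nova_api.request_specs. The function name find_group_uuid_problem is hypothetical and not part of nova:

```python
import json


def find_group_uuid_problem(spec_json: str) -> dict:
    """Hypothetical check of a serialized RequestSpec (not a nova API).

    Reports whether instance_group is set without a uuid, and which group
    uuid (if any) the scheduler_hints still carry.
    """
    spec = json.loads(spec_json)
    data = spec.get("nova_object.data", {})
    group = data.get("instance_group")
    hints = data.get("scheduler_hints") or {}
    hint_groups = hints.get("group") or []
    # Fields that were actually set on the InstanceGroup object.
    group_data = (group or {}).get("nova_object.data", {})
    return {
        "has_group": group is not None,
        "group_missing_uuid": group is not None and "uuid" not in group_data,
        "hint_group_uuid": hint_groups[0] if hint_groups else None,
    }
```

For the spec above, this reports group_missing_uuid as true and 295c99ea-2db6-469a-877f-454a3903a8d8 as the hint group uuid.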

Matt Riedemann (mriedem) wrote :

This is the error by the way:

http://paste.openstack.org/show/752159/

2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi [req-1ca4c1d0-9f6f-4a04-860d-1d3d03a0d063 9fb4630d74ad49e8ac9f4e8a72b8cafb 504ea0a356ca4066aaa617daff869463 - default default] Unexpected exception in API method: nova.exception.ObjectActionError: Object action obj_load_attr failed because: unable to load uuid
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi Traceback (most recent call last):
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/openstack/wsgi.py", line 801, in wrapped
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return f(*args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/validation/__init__.py", line 110, in wrapper
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return func(*args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/api/openstack/compute/migrate_server.py", line 56, in _migrate
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi host_name=host_name)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 205, in inner
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return function(self, context, instance, *args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 213, in _wrapped
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return fn(self, context, instance, *args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 153, in inner
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return f(self, context, instance, *args, **kw)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/compute/api.py", line 3516, in resize
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi context, instance.uuid)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi result = fn(cls, context, *args, **kwargs)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/objects/request_spec.py", line 531, in get_by_instance_uuid
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi return cls._from_db_object(context, cls(), db_spec)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/nova/objects/request_spec.py", line 510, in _from_db_object
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi context, spec.instance_group.uuid)
2019-05-28 13:40:16.610 159865 ERROR nova.api.openstack.wsgi File "/usr/lib/python3/dist-packages/oslo_versionedobjects/base.py", line 67, in getter
20...

Matt Riedemann (mriedem) wrote :

We can see here that the instance_group entry in the request spec is clearly missing the uuid even though there is a group scheduler hint:

    "instance_group": {
      "nova_object.name": "InstanceGroup",
      "nova_object.version": "1.11",
      "nova_object.data": {
        "members": null,
        "hosts": null,
        "policy": "anti-affinity"
      },
      "nova_object.namespace": "nova",
      "nova_object.changes": [
        "policy",
        "members",
        "hosts"
      ]
    },
    "scheduler_hints": {
      "group": [
        "295c99ea-2db6-469a-877f-454a3903a8d8"
      ]
    },

So I'm thinking we're somehow hitting this:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L228

saving that, and then hitting this which triggers the error:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L523
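A spec like this only fails when it is loaded again. Unset fields on a versioned object are lazy-loaded on first access, and per the traceback, InstanceGroup cannot lazy-load uuid, so reading spec.instance_group.uuid raises. The sketch below reproduces the shape of that failure. FakeInstanceGroup, ObjectActionError and load_spec_group are illustrative stand-ins, not the oslo.versionedobjects or nova API:

```python
class ObjectActionError(Exception):
    """Hypothetical stand-in for nova.exception.ObjectActionError."""


class FakeInstanceGroup:
    """Hypothetical object whose unset fields are lazy-loaded on access."""

    fields = ("uuid", "policy", "members", "hosts")

    def __init__(self, **values):
        # Only the fields passed in are "set"; the rest stay unset.
        self._values = dict(values)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails.
        if name in self.fields:
            if name in self._values:
                return self._values[name]
            return self.obj_load_attr(name)
        raise AttributeError(name)

    def obj_load_attr(self, attrname):
        # This stub cannot load any unset field, matching the uuid case
        # in the traceback.
        raise ObjectActionError(
            "Object action obj_load_attr failed because: "
            "unable to load %s" % attrname)


def load_spec_group(group_values):
    """Mimics the reload step, which needs the group uuid to query the DB."""
    group = FakeInstanceGroup(**group_values)
    return group.uuid
```

Calling load_spec_group with the policy, members and hosts from the stored spec, but no uuid, raises the same "unable to load uuid" message seen in the API log.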

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661786

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem) wrote :

This might explain what's happening during a cold migration.

Conductor creates a legacy filter_properties dict here:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L172

If the spec has an instance_group it will call here:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L397

and _to_legacy_group_info sets these values in the filter_properties dict:

        return {'group_updated': True,
                'group_hosts': set(self.instance_group.hosts),
                'group_policies': set([self.instance_group.policy]),
                'group_members': set(self.instance_group.members)}

Note there is no group_uuid.
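The loss shows up as a round trip: converting a group to the legacy filter_properties keys and back drops the uuid, because the legacy dict never had one. Below is a simplified sketch, loosely modeled on _to_legacy_group_info and _populate_group_info. The dataclass is a hypothetical stand-in for nova's InstanceGroup, and uuid=None represents a field that was never set:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstanceGroup:
    """Hypothetical, simplified stand-in for nova's InstanceGroup object."""
    policy: str
    hosts: List[str] = field(default_factory=list)
    members: List[str] = field(default_factory=list)
    uuid: Optional[str] = None  # None models "field never set"


def to_legacy_group_info(group):
    # Same keys as the Rocky snippet quoted above: no 'group_uuid'.
    return {'group_updated': True,
            'group_hosts': set(group.hosts),
            'group_policies': set([group.policy]),
            'group_members': set(group.members)}


def populate_group_info(filter_properties):
    # Rebuild a group from the legacy keys; there is no uuid to copy.
    if filter_properties.get('group_updated') is not True:
        return None
    return InstanceGroup(
        policy=list(filter_properties['group_policies'])[0],
        hosts=list(filter_properties.get('group_hosts', ())),
        members=list(filter_properties.get('group_members', ())))
```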

Those filter_properties are passed to the prep_resize method on the dest compute:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L304

zigo said he hit this:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4272

(10:03:07 AM) zigo: 2019-05-28 15:02:35.534 30706 ERROR nova.compute.manager [instance: ae6f8afe-9c64-4aaf-90e8-be8175fee8e4] nova.exception.UnableToMigrateToSelf: Unable to migrate instance (ae6f8afe-9c64-4aaf-90e8-be8175fee8e4) to current host (clint1-compute-5.infomaniak.ch).

which will trigger a reschedule here:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4348

The _reschedule_resize_or_reraise method will set up the parameters for the resize_instance compute task RPC API (conductor) method:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4378-L4379

Note that in Rocky the RequestSpec is not passed back to conductor on the reschedule, only the filter_properties:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L1452

We only started passing the RequestSpec from compute to conductor on reschedule starting in Stein: https://review.opendev.org/#/c/582417/

Without the request spec we get here in conductor:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L307

Note that we pass in the filter_properties but no instance_group to RequestSpec.from_components.

And because there is no instance_group but there are filter_properties, we call _populate_group_info here:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L442

Which means we get into this block that sets the RequestSpec.instance_group with no uuid:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L228

Then we eventually RPC cast off to prep_resize on the next host to try for the cold migration and save the request_spec changes here:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L356

Which is how later attempts to use that request spec to migrate the instance blow up when loading it from the DB because spec.instance_group.uuid is not set.
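The difference between the pre-Stein and Stein reschedule paths can be modeled with plain dicts. This is a hypothetical sketch, not nova code:

- reschedule_pre_stein rebuilds the group from filter_properties only.
- reschedule_stein reuses the RequestSpec that compute sends back.
- persist stands in for saving the spec.
- load_group_uuid stands in for the later load that needs the uuid.

```python
import json


def reschedule_pre_stein(spec, filter_properties):
    """Hypothetical Rocky-and-earlier path: only filter_properties come back,
    so conductor rebuilds the group from legacy keys that carry no uuid."""
    rebuilt = dict(spec)
    if filter_properties.get('group_updated'):
        rebuilt['instance_group'] = {
            'policy': sorted(filter_properties['group_policies'])[0],
            'hosts': sorted(filter_properties['group_hosts']),
            'members': sorted(filter_properties['group_members']),
        }
    return rebuilt


def reschedule_stein(spec, filter_properties):
    """Hypothetical Stein+ path: the RequestSpec itself is sent back."""
    return dict(spec)


def persist(spec):
    # Stand-in for saving the RequestSpec: serialize whatever we have.
    return json.dumps(spec, sort_keys=True)


def load_group_uuid(db_value):
    # Stand-in for the later load, which needs instance_group.uuid.
    group = json.loads(db_value).get('instance_group')
    if group is None:
        return None
    if 'uuid' not in group:
        raise LookupError('unable to load uuid')
    return group['uuid']
```

In this model, the Stein path round-trips the group uuid. The pre-Stein path persists a group without one, and the next load fails.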

Changed in nova:
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

This goes back to Ocata because of this change:

https://review.opendev.org/#/q/Ie70c77db753711e1449e99534d3b83669871943f

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/661822

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/662550

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/662574

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/662578

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/661822
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c96c7c5e13bde39944a9dde7da7fe418b311ca2d
Submitter: Zuul
Branch: master

commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/662774

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/661786
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=da453c2bfe86ab7a825f0aa7ebced15886f7a5fd
Submitter: Zuul
Branch: master

commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
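The workaround in this commit message amounts to a lookup order: use instance_group.uuid when it is set, and otherwise fall back to the first entry of the group scheduler hint. The helper below is a hypothetical sketch of that idea. It operates on a plain-dict view of the spec's nova_object.data rather than on nova objects:

```python
def group_uuid_for_spec(spec_data):
    """Hypothetical sketch: prefer instance_group.uuid, else the 'group' hint.

    spec_data is a plain dict shaped like a RequestSpec's nova_object.data;
    this is not nova's real object API.
    """
    group = spec_data.get('instance_group')
    if group is None:
        return None
    group_fields = group.get('nova_object.data', {})
    if 'uuid' in group_fields:
        return group_fields['uuid']
    hints = spec_data.get('scheduler_hints') or {}
    hint = hints.get('group')
    if hint:
        # Stored hints are lists, e.g. {"group": ["<uuid>"]}.
        return hint[0] if isinstance(hint, list) else hint
    # Neither source has a uuid, so the group cannot be reloaded.
    return None
```

If neither source has a uuid, the sketch returns None. Later comments in this bug report exactly that case for some older specs, and the hint-based workaround cannot recover it.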

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/662550
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b4f89ab59a9ed68211e1250d9fa6ab5da3c7366
Submitter: Zuul
Branch: master

commit 6b4f89ab59a9ed68211e1250d9fa6ab5da3c7366
Author: Matt Riedemann <email address hidden>
Date: Fri May 31 15:26:24 2019 -0400

    Set/get group uuid when transforming RequestSpec to/from filter_properties

    As a follow up to change I20981c987549eec40ad9762e74b0db16e54f4e63
    we can avoid having an incomplete InstanceGroup by updating
    the _to_legacy_group_info and _populate_group_info methods to set/get
    the group uuid to/from the filter_properties.

    Change-Id: I164a6dee1e92a65fcf6e89525ee194bb482e9920
    Related-Bug: #1830747
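The follow-up change makes the legacy conversion lossless by carrying the uuid through filter_properties. Below is a dict-based sketch of that idea, assuming the new key is named group_uuid. The function bodies are illustrative and are not copied from the merged change:

```python
def to_legacy_group_info(group):
    """Hypothetical dict-based conversion that also emits the group uuid."""
    return {'group_updated': True,
            'group_uuid': group.get('uuid'),
            'group_hosts': set(group['hosts']),
            'group_policies': {group['policy']},
            'group_members': set(group['members'])}


def populate_group_info(filter_properties):
    """Hypothetical reverse conversion that restores the uuid when present."""
    if filter_properties.get('group_updated') is not True:
        return None
    group = {'policy': list(filter_properties['group_policies'])[0],
             'hosts': list(filter_properties.get('group_hosts', ())),
             'members': list(filter_properties.get('group_members', ()))}
    # Only set uuid when it is known, mirroring "field set" vs "field unset".
    if filter_properties.get('group_uuid'):
        group['uuid'] = filter_properties['group_uuid']
    return group
```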

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/662894

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/662895

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/662574
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8478a754802e29dffbb65ef363ee189162f0adea
Submitter: Zuul
Branch: stable/stein

commit 8478a754802e29dffbb65ef363ee189162f0adea
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/662894
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8569eb9b4fb905cb92041b84c293dc4e7af27fa8
Submitter: Zuul
Branch: stable/stein

commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/663110

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.opendev.org/663124

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/663125

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.opendev.org/663143

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.opendev.org/663144

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/662578
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a0a187c9bb9bef149e193027a6eedc09ba10ce1f
Submitter: Zuul
Branch: stable/rocky

commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    NOTE(mriedem): The ComputeTaskAPI.resize_instance stub is removed
    in this backport because it is not needed before Stein. Also, the
    PlacementFixture is in-tree before Stein so that is updated here.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/662895
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9fed1803b4d6b2778c47add9c327f0610edc5952
Submitter: Zuul
Branch: stable/rocky

commit 9fed1803b4d6b2778c47add9c327f0610edc5952
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)

Thomas Goirand (thomas-goirand) wrote :

FYI, the patched version of Nova (Rocky) just reached Debian Buster.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.1

This issue was fixed in the openstack/nova 18.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/662774
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=581df2c98676b6734e8195ab56c9e0dba74789a5
Submitter: Zuul
Branch: stable/queens

commit 581df2c98676b6734e8195ab56c9e0dba74789a5
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    NOTE(mriedem): In this version we have to request a specific port
    to avoid a NetworkAmbiguous failure when creating the server.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)
    (cherry picked from commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/663110
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=20b90f2e26e6a46a12c2fd943b4472c3147528fa
Submitter: Zuul
Branch: stable/queens

commit 20b90f2e26e6a46a12c2fd943b4472c3147528fa
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Conflicts:
          nova/objects/request_spec.py

    NOTE(mriedem): The conflict is due to not having change
    Ib33719a4b9599d86848c618a6e142c71ece79ca5 in Queens.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)
    (cherry picked from commit 9fed1803b4d6b2778c47add9c327f0610edc5952)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.

Yang Youseok (ileixe) wrote :

FYI, we recently encountered this bug and found there are no scheduler_hints in the request_spec (we are using Ocata, and the server group was created in the past)...

Yang Youseok (ileixe) wrote :

I think the scheduler_hints no longer existed after the RequestSpec was overwritten during the reschedule. If so, the VM could not be recovered even after this workaround patch is applied.

Arvydas O. (zebediejus) wrote :

As Yang noticed, the uuid is missing from scheduler_hints as well as from instance_group. We had the same problem after upgrading from Mitaka (itself previously upgraded from Liberty) to Rocky. Cold migration and resize were failing, and "nova-manage db online_data_migrations --max-count 50" also failed whenever it found instances with instance groups.
As a workaround, we inserted the scheduler_hints directly into the DB (use at your own risk):

--UPDATE nova_api.request_specs rs
INNER JOIN nova.instances ins ON ins.uuid = rs.instance_uuid
INNER JOIN nova_api.instance_group_member igm ON rs.instance_uuid = igm.instance_uuid
INNER JOIN nova_api.instance_groups ig ON ig.id = igm.group_id
SET rs.spec = REPLACE(rs.spec, ', "scheduler_hints": {}},', CONCAT(', "scheduler_hints": {"group": ["', ig.uuid, '"]}},'))
WHERE ins.deleted = 0 AND rs.spec LIKE '%"policies"%' AND rs.spec NOT LIKE '%"uuid":%' AND ins.uuid = '3cdd744e-243e-4328-8323-970a9e038c0e'
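The SQL above edits the serialized spec with a string REPLACE, which only matches if the JSON contains exactly ', "scheduler_hints": {}},'. The sketch below performs the same repair on parsed JSON, so it does not depend on key order or spacing. add_group_hint is a hypothetical helper, and the group uuid still has to come from nova_api.instance_groups as in the query. Writing the result back to the database is deliberately left out, and the same "use at your own risk" caveat applies:

```python
import json


def add_group_hint(spec_json, group_uuid):
    """Hypothetical JSON-level equivalent of the SQL REPLACE above.

    Returns new JSON text with scheduler_hints.group set, or the input
    unchanged if a group hint is already present.
    """
    spec = json.loads(spec_json)
    data = spec.setdefault('nova_object.data', {})
    hints = data.get('scheduler_hints') or {}
    if hints.get('group'):
        # Nothing to repair; keep the stored text byte-for-byte.
        return spec_json
    hints['group'] = [group_uuid]
    data['scheduler_hints'] = hints
    return json.dumps(spec)
```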

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/663124
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=09ec97b95b19a42b949da76fe1f1b3cc06da8f35
Submitter: Zuul
Branch: stable/pike

commit 09ec97b95b19a42b949da76fe1f1b3cc06da8f35
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    NOTE(mriedem): In this version we have to use the MediumFakeDriver
    because change I12de2e195022593ea2a3e2894f2c3b5226930d4f is not
    in Pike so resizing to the same host does not work with the
    SmallFakeDriver.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)
    (cherry picked from commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f)
    (cherry picked from commit 581df2c98676b6734e8195ab56c9e0dba74789a5)
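The failure mode described in the commit message above can be modeled with plain dictionaries. This is a simplified sketch, not nova code. The function names and the `group_members`/`group_hosts`/`group_policies` keys are illustrative assumptions. It shows that flattening a group into filter-properties form and rebuilding it from there silently drops the uuid, so a later load that requires the uuid fails.

```python
def to_filter_properties(group):
    """Flatten a group into legacy-style filter properties (hypothetical).

    Only members, hosts and policies are carried over; the uuid is not.
    """
    return {
        "group_members": list(group["members"]),
        "group_hosts": list(group["hosts"]),
        "group_policies": list(group["policies"]),
    }


def rebuild_group(filter_properties):
    """Rebuild a group from filter properties, as a reschedule without a
    RequestSpec would. There is no uuid to copy, so none is set."""
    return {
        "members": filter_properties["group_members"],
        "hosts": filter_properties["group_hosts"],
        "policies": filter_properties["group_policies"],
    }


def load_group(group):
    """Stand-in for the later load that needs the group uuid."""
    if "uuid" not in group:
        raise ValueError("instance_group has no uuid")
    return group["uuid"]


original = {
    "uuid": "g1",
    "members": ["i1"],
    "hosts": ["hostA"],
    "policies": ["anti-affinity"],
}
rebuilt = rebuild_group(to_filter_properties(original))
```

Calling `load_group(original)` returns `"g1"`, while `load_group(rebuilt)` raises. That is the shape of the error seen when the persisted, rebuilt RequestSpec is loaded again.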

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/663125
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=79cc08642172a3df1cd8d7a7c413adc21b468dcf
Submitter: Zuul
Branch: stable/pike

commit 79cc08642172a3df1cd8d7a7c413adc21b468dcf
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To work around the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)
    (cherry picked from commit 9fed1803b4d6b2778c47add9c327f0610edc5952)
    (cherry picked from commit 20b90f2e26e6a46a12c2fd943b4472c3147528fa)
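The workaround described above falls back to the `group` scheduler hint when the stored group has no uuid, and might look roughly like this. It is a sketch over plain dicts, not the actual nova change. `resolve_group_uuid` is a hypothetical name, and raising `LookupError` when neither a uuid nor a hint is present is this sketch's own choice. The list-valued hint matches the `"group": ["..."]` form written by the SQL earlier in this report.

```python
def resolve_group_uuid(request_spec):
    """Return the instance group uuid, falling back to the 'group' hint.

    Hypothetical sketch: request_spec is a plain dict with optional
    "instance_group" and "scheduler_hints" keys.
    """
    group = request_spec.get("instance_group")
    if not group:
        return None  # instance is not in a server group
    if group.get("uuid"):
        return group["uuid"]
    hint = (request_spec.get("scheduler_hints") or {}).get("group")
    if hint:
        # Hints are usually stored as lists; accept a bare string too.
        group_uuid = hint[0] if isinstance(hint, list) else hint
        group["uuid"] = group_uuid  # repair the in-memory group
        return group_uuid
    # This sketch's choice: fail loudly rather than guess.
    raise LookupError("cannot determine instance group uuid")
```

Setting the uuid back on the group before any DB lookup means the rest of the code path sees a complete group. This matches the commit's description of looking up the hint "before loading it from the DB".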

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.opendev.org/663143
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c41fe944dbf554e2d5980595e6155e25e6c0c25b
Submitter: Zuul
Branch: stable/ocata

commit c41fe944dbf554e2d5980595e6155e25e6c0c25b
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 13:59:20 2019 -0400

    Add regression recreate test for bug 1830747

    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein, when
    a cold migration would reschedule to conductor it would not send the
    RequestSpec, only the filter_properties. The filter_properties contain
    a primitive version of the instance group information from the RequestSpec
    for things like the group members, hosts and policies, but not the uuid.
    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like the
    filter_properties. This results in a RequestSpec with an instance_group
    field set but with no uuid field in the RequestSpec.instance_group.
    That RequestSpec gets persisted and then because of change
    Ie70c77db753711e1449e99534d3b83669871943f, later attempts to load the
    RequestSpec from the database will fail because of the missing
    RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could still
    be a problem (on master) for any corrupted RequestSpecs for older
    instances.

    NOTE(mriedem): In this version we have to use the FakeDriver because the
    MediumFakeDriver did not exist in Ocata. Also, we have to disable the
    DiskFilter since we are using placement during scheduling.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)
    (cherry picked from commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f)
    (cherry picked from commit 581df2c98676b6734e8195ab56c9e0dba74789a5)
    (cherry picked from commit 09ec97b95b19a42b949da76fe1f1b3cc06da8f35)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.opendev.org/663144
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3390c7af7ac774163fc8aa43e65662dfcefdc4cc
Submitter: Zuul
Branch: stable/ocata

commit 3390c7af7ac774163fc8aa43e65662dfcefdc4cc
Author: Matt Riedemann <email address hidden>
Date: Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid

    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.

    To work around the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.

    The related functional regression recreate test is updated
    to show this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)
    (cherry picked from commit 9fed1803b4d6b2778c47add9c327f0610edc5952)
    (cherry picked from commit 20b90f2e26e6a46a12c2fd943b4472c3147528fa)
    (cherry picked from commit 79cc08642172a3df1cd8d7a7c413adc21b468dcf)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova ocata-eol

This issue was fixed in the openstack/nova ocata-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova pike-eol

This issue was fixed in the openstack/nova pike-eol release.
