"nova list" fails with exception.ServiceNotFound if service is deleted and has no UUID

Bug #1764556 reported by Chris Friesen
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      melanie witt
  Pike                    In Progress    Medium      Elod Illes
  Queens                  Fix Committed  Medium      Matt Riedemann
  Rocky                   Fix Committed  Medium      Matt Riedemann
  Stein                   Fix Committed  Medium      Matt Riedemann

Bug Description

We had a test case where we booted an instance on Newton, migrated it off its compute node, deleted the compute node (and service), upgraded to Pike, created a new compute node with the same name, and migrated the instance back to that compute node.

At this point the "nova list" command failed with exception.ServiceNotFound.

It appears that because the Service has no UUID, the _from_db_object() routine tries to generate and add one, but the resulting service.save() call fails because the service in question has been deleted.
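
The pattern involved is roughly the following. This is a minimal, self-contained sketch with made-up helpers rather than nova's actual code; in nova the relevant pieces are Service._from_db_object(), Service.save() and db.service_update(), as shown in the traceback further down.

import uuid


class ServiceNotFound(Exception):
    pass


# Fake "services" table: one soft-deleted row without a uuid (cf. id 3 above).
DB_ROWS = {3: {'id': 3, 'host': 'devstack', 'uuid': None, 'deleted': 3}}


def service_get(service_id, read_deleted='no'):
    row = DB_ROWS.get(service_id)
    if row is None or (row['deleted'] and read_deleted == 'no'):
        raise ServiceNotFound(service_id)
    return row


def service_update(service_id, values):
    # Mirrors the traceback below: the update path first re-reads the row,
    # and with the default read_deleted="no" a soft-deleted row is invisible.
    row = service_get(service_id)
    row.update(values)
    return row


def from_db_object(db_row):
    service = dict(db_row)
    if service['uuid'] is None:
        # Backfill a uuid for rows created before the Pike services.uuid
        # migration and persist it right away -- this is the step that fails.
        service['uuid'] = str(uuid.uuid4())
        service_update(service['id'], {'uuid': service['uuid']})
    return service


try:
    from_db_object(DB_ROWS[3])
except ServiceNotFound as exc:
    print('ServiceNotFound: service %s could not be found' % exc)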

I reproduced the issue with stable/pike devstack. I booted an instance, then created a fake entry in the "services" table without a UUID so the table looked like this:

mysql> select * from services;
+---------------------+---------------------+---------------------+----+----------+----------------+-----------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| created_at | updated_at | deleted_at | id | host | binary | topic | report_count | disabled | deleted | disabled_reason | last_seen_up | forced_down | version | uuid |
+---------------------+---------------------+---------------------+----+----------+----------------+-----------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| 2018-02-20 16:10:07 | 2018-04-16 22:10:46 | NULL | 1 | devstack | nova-conductor | conductor | 477364 | 0 | 0 | NULL | 2018-04-16 22:10:46 | 0 | 22 | c041d7cf-5047-4014-b50c-3ba6b5d95097 |
| 2018-02-20 16:10:10 | 2018-04-16 22:10:54 | NULL | 2 | devstack | nova-compute | compute | 477149 | 0 | 0 | NULL | 2018-04-16 22:10:54 | 0 | 22 | d0cfb63c-8b59-4b65-bb7e-6b89acd3fe35 |
| 2018-02-20 16:10:10 | 2018-04-16 20:29:33 | 2018-04-16 20:30:33 | 3 | devstack | nova-compute | compute | 476432 | 0 | 3 | NULL | 2018-04-16 20:30:33 | 0 | 22 | NULL |
+---------------------+---------------------+---------------------+----+----------+----------------+-----------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
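
For anyone reproducing this, a row like the deleted id-3 entry above can be faked with something along the following lines. This is only a sketch: the connection URL is a placeholder, the column names come from the output above, and it assumes nova's conventions that soft-deleted rows carry `deleted` equal to the row's own id and that (host, binary, deleted) must be unique.

from sqlalchemy import create_engine, text

# Placeholder connection URL -- point this at the cell database.
engine = create_engine('mysql+pymysql://root:secretdbpass@127.0.0.1/nova')

with engine.begin() as conn:
    # Insert a soft-deleted nova-compute row with no uuid. A throwaway
    # nonzero `deleted` value avoids clashing with the live nova-compute
    # row on the assumed (host, binary, deleted) unique constraint.
    result = conn.execute(text(
        "INSERT INTO services "
        "(created_at, updated_at, deleted_at, host, `binary`, topic, "
        " report_count, disabled, deleted, forced_down, version, uuid) "
        "VALUES (NOW(), NOW(), NOW(), 'devstack', 'nova-compute', 'compute', "
        "        0, 0, 999999, 0, 22, NULL)"))
    # nova marks soft-deleted rows by setting `deleted` to the row's own id.
    conn.execute(text("UPDATE services SET deleted = id WHERE id = :sid"),
                 {'sid': result.lastrowid})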

At this point, running "nova show <uuid>" worked fine, but running "nova list" failed:

stack@devstack:~/devstack$ nova list
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'nova.exception.ServiceNotFound'> (HTTP 500) (Request-ID: req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6)

The nova-api log looked like this:

Apr 16 22:11:00 devstack <email address hidden>[4258]: DEBUG nova.compute.api [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] Listing 1000 instances in cell 09eb515f-9906-40bf-9be6-63b5e6ee279a(cell1) {{(pid=4261) _get_instances_by_filters_all_cells /opt/stack/nova/nova/compute/api.py:2559}}
Apr 16 22:11:00 devstack <email address hidden>[4258]: DEBUG oslo_concurrency.lockutils [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] Lock "09eb515f-9906-40bf-9be6-63b5e6ee279a" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s {{(pid=4261) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:270}}
Apr 16 22:11:00 devstack <email address hidden>[4258]: DEBUG oslo_concurrency.lockutils [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] Lock "09eb515f-9906-40bf-9be6-63b5e6ee279a" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.000s {{(pid=4261) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282}}
Apr 16 22:11:00 devstack <email address hidden>[4258]: DEBUG nova.objects.service [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] Generated UUID 4368a7ff-f589-4197-b0b9-d2afdb71ca33 for service 3 {{(pid=4261) _from_db_object /opt/stack/nova/nova/objects/service.py:245}}
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] Unexpected exception in API method: ServiceNotFound: Service 3 could not be found.
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions Traceback (most recent call last):
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/extensions.py", line 336, in wrapped
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return f(*args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/validation/__init__.py", line 181, in wrapper
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return func(*args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/validation/__init__.py", line 181, in wrapper
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return func(*args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 168, in detail
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions servers = self._get_servers(req, is_detail=True)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 311, in _get_servers
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions sort_keys=sort_keys, sort_dirs=sort_dirs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 2468, in get_all
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions sort_dirs=sort_dirs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 2565, in _get_instances_by_filters_all_cells
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 2596, in _get_instances_by_filters
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions expected_attrs=fields, sort_keys=sort_keys, sort_dirs=sort_dirs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions result = fn(cls, context, *args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/objects/instance.py", line 1252, in get_by_filters
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions expected_attrs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/objects/instance.py", line 1199, in _make_instance_list
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions expected_attrs=expected_attrs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/objects/instance.py", line 448, in _from_db_object
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions objects.Service, db_inst['services'])
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 1121, in obj_make_list
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions **extra_args)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/objects/service.py", line 246, in _from_db_object
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions service.save()
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 226, in wrapper
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return fn(self, *args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/objects/service.py", line 363, in save
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions db_service = db.service_update(self._context, self.id, updates)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/db/api.py", line 189, in service_update
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return IMPL.service_update(context, service_id, values)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 150, in wrapper
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions ectxt.value = e.inner_exc
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions self.force_reraise()
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions six.reraise(self.type_, self.value, self.tb)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 138, in wrapper
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return f(*args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 250, in wrapped
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return f(context, *args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 610, in service_update
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions service_ref = service_get(context, service_id)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 265, in wrapped
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions return f(context, *args, **kwargs)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 472, in service_get
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions raise exception.ServiceNotFound(service_id=service_id)
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions ServiceNotFound: Service 3 could not be found.
Apr 16 22:11:00 devstack <email address hidden>[4258]: ERROR nova.api.openstack.extensions
Apr 16 22:11:00 devstack <email address hidden>[4258]: INFO nova.api.openstack.wsgi [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
Apr 16 22:11:00 devstack <email address hidden>[4258]: <class 'nova.exception.ServiceNotFound'>
Apr 16 22:11:00 devstack <email address hidden>[4258]: DEBUG nova.api.openstack.wsgi [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] Returning 500 to user: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
Apr 16 22:11:00 devstack <email address hidden>[4258]: <class 'nova.exception.ServiceNotFound'> {{(pid=4261) __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:1029}}
Apr 16 22:11:00 devstack <email address hidden>[4258]: INFO nova.api.openstack.requestlog [None req-b7e1b5f9-e7b4-4ccf-ba28-e8b3e1acd2f6 demo demo] 128.224.186.226 "GET /compute/v2.1/servers/detail" status: 500 len: 204 microversion: 2.53 time: 0.131473
Apr 16 22:11:00 devstack <email address hidden>[4258]: [pid: 4261|app: 0|req: 6/12] 128.224.186.226 () {64 vars in 1290 bytes} [Mon Apr 16 22:11:00 2018] GET /compute/v2.1/servers/detail => generated 204 bytes in 132 msecs (HTTP/1.1 500) 9 headers in 393 bytes (1 switches on core 0)

Revision history for this message
Chris Friesen (cbf123) wrote :

An additional piece of information: running "nova show <instance>" as an admin user causes the service to be updated with a valid UUID, but running it as a regular user does not (presumably because the regular-user view doesn't include the host information).

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm working on a functional recreate test for this.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/562041

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Chris Friesen (cbf123) wrote :

Yes, I think you're right. With that commit applied, it appears that the migration back to the original compute node pulls in the service and updates the UUID on it, so after the migration, when we run "nova list", the service already has a UUID.

I'm not sure if there's a race window in there where we could still hit problems, but if there is it should be transitory.

Changed in nova:
status: New → Fix Released
Revision history for this message
Chris Friesen (cbf123) wrote :

I think we could get into the bad state described in the bug if we do a slightly different series of actions:

1) boot instance on Ocata
2) migrate instance
3) delete compute node (thus deleting the service record)
4) create compute node with same name
5) migrate instance to newly-created compute node
6) upgrade to Pike

This should result in the deleted service not having a UUID, which will cause problems in Pike if we do a "nova list".

I suppose an argument could be made that this is an unlikely scenario, which is probably true. :)
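
For reference, the API-driveable parts of that sequence look roughly like this with python-novaclient. This is a hedged sketch only: auth details, image/flavor IDs and host names are placeholders, waiting/polling between steps is omitted, and the compute-node recreation and the Pike upgrade are operator actions that appear only as comments.

from keystoneauth1 import loading, session
from novaclient import client as nova_client

# Placeholder credentials/endpoints -- adjust for the environment under test.
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://controller/identity/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_id='default',
                                project_domain_id='default')
nova = nova_client.Client('2.1', session=session.Session(auth=auth))

# 1) boot an instance (on Ocata, i.e. before services have uuids)
server = nova.servers.create(name='test', image='<image-id>',
                             flavor='<flavor-id>')

# 2) migrate the instance off its original host
#    (wait for VERIFY_RESIZE, then confirm)
nova.servers.migrate(server)
nova.servers.confirm_resize(server)

# 3) delete the compute node by deleting its now-empty nova-compute service
for svc in nova.services.list(binary='nova-compute'):
    if svc.host == '<original-host>':
        nova.services.delete(svc.id)

# 4) recreate a compute node with the same hostname (operator action)
# 5) migrate the instance back (again wait for VERIFY_RESIZE, then confirm)
nova.servers.migrate(server)
nova.servers.confirm_resize(server)

# 6) upgrade to Pike (operator action). Afterwards, an admin listing joins the
#    services table and trips over the old, uuid-less, deleted service row:
nova.servers.list()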

Changed in nova:
status: Fix Released → New
Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Chris Friesen (cbf123) wrote :

So what's the best option here? Internally it was suggested that we could just modify the online data migration to apply uuids to deleted service records.

I'm not sure we can just filter out deleted services when looking up the service based on instance host.
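
For illustration, the first suggestion might look something like the sketch below, operating directly on the cell database rather than through nova's objects. The real services.uuid online data migration lives in nova's DB layer; here the engine URL is a placeholder and the snippet deliberately does not filter out deleted rows.

from oslo_utils import uuidutils
from sqlalchemy import create_engine, text

# Placeholder connection URL -- point this at the cell database.
engine = create_engine('mysql+pymysql://root:secretdbpass@127.0.0.1/nova')


def backfill_service_uuids(max_count=50):
    """Assign uuids to any service rows lacking one, deleted or not."""
    migrated = 0
    with engine.begin() as conn:
        rows = conn.execute(
            text("SELECT id FROM services WHERE uuid IS NULL LIMIT :n"),
            {'n': max_count}).fetchall()
        for row in rows:
            conn.execute(
                text("UPDATE services SET uuid = :u WHERE id = :i"),
                {'u': uuidutils.generate_uuid(), 'i': row.id})
            migrated += 1
    return migrated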

Revision history for this message
Matt Riedemann (mriedem) wrote :

I haven't thought about solutions yet; I was spending a large chunk of my time just getting the functional regression recreate test working.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/572089

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Damon Li (damonl1) wrote :

I also faced this problem.

If we have a deleted service in the database on Ocata, then after upgrading to Pike, running "nova list" raises an exception: "ERROR (BadRequest): This service is older (v16) than the minimum (v30) version of the rest of the deployment. Unable to continue. (HTTP 400) (Request-ID: req-926decd1-8b25-461c-be53-4710d7c7b21c)".

This is because nova tries to add a UUID to the deleted service. However, when saving it to the database, nova checks the version of the service and raises this exception. I think we don't need to check the version when saving a service.

My patch is as follows:
https://review.openstack.org/#/c/572089/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Damon Li (<email address hidden>) on branch: master
Review: https://review.openstack.org/572089
Reason: this patch can't fix this bug. Even if we ignore the version check, oslo_db raises a "Service not found" exception when writing the service data to the database; it seems the deleted service can't be retrieved.

Revision history for this message
melanie witt (melwitt) wrote :

Just wanted to add a note that this bug smells like a similar bug:

  https://bugs.launchpad.net/nova/+bug/1778305

where a deleted service with an old version will make 'nova list' fail with a ServiceTooOld exception after an upgrade from Ocata => Pike.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/582435

Changed in nova:
assignee: nobody → melanie witt (melwitt)
Changed in nova:
assignee: melanie witt (melwitt) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → melanie witt (melwitt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by melanie witt (<email address hidden>) on branch: master
Review: https://review.opendev.org/582435
Reason: https://review.opendev.org/562041 was preferred

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/562041
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=81f05f53d357a546c7f9a53cae6ef45b92e28bc1
Submitter: Zuul
Branch: master

commit 81f05f53d357a546c7f9a53cae6ef45b92e28bc1
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 17 16:20:53 2018 -0400

    Add functional recreate test for bug 1764556

    This attempts to recreate the following scenario:

    1) boot instance on ocata host where the compute service
       does not have a uuid
    2) migrate instance
    3) delete the ocata service (thus deleting the compute node)
    4) start compute service with the same name
    5) migrate instance to newly-created compute node
    6) upgrade to pike where services.uuid data migration happens
    7) list instances as admin to join on the services table

    The failure occurs when listing instances because the deleted
    service with the same name as the compute host that the instance
    is running on gets pulled from the DB and the Service object
    attempts to set a uuid on it, which fails since it's not using
    a read_deleted="yes" context.

    While working on this, the service_get_all_by_binary DB API
    method had to be fixed to not hard-code read_deleted="no" since
    the test needs to be able to read deleted services, which it can
    control via its own context object (note that
    RequestContext.read_deleted already defaults to "no" so the
    hard-coding in the DB API is unnecessarily restrictive).

    Change-Id: I4d60da26fcf0a77628d1fdf4e818884614fa4f02
    Related-Bug: #1764556
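
The read_deleted point in the note above can be illustrated with a tiny stand-alone example (made-up data and helper, not nova's DB API): whether soft-deleted rows are visible should follow the caller's context rather than being hard-coded in the query helper.

# Toy illustration: read_deleted mirrors RequestContext.read_deleted, which
# already defaults to "no", so dropping the hard-coded value changes nothing
# for normal callers but lets tests opt in to seeing deleted services.
ROWS = [{'id': 2, 'binary': 'nova-compute', 'deleted': 0},
        {'id': 3, 'binary': 'nova-compute', 'deleted': 3}]


def service_get_all_by_binary(binary, read_deleted='no'):
    return [r for r in ROWS
            if r['binary'] == binary
            and (read_deleted == 'yes' or not r['deleted'])]


print(len(service_get_all_by_binary('nova-compute')))                      # 1
print(len(service_get_all_by_binary('nova-compute', read_deleted='yes')))  # 2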

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/582408
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16e163053ca39886f11fdb8a3af10a28619fc105
Submitter: Zuul
Branch: master

commit 16e163053ca39886f11fdb8a3af10a28619fc105
Author: melanie witt <email address hidden>
Date: Thu Jul 12 21:48:23 2018 +0000

    Don't generate service UUID for deleted services

    In Pike, we added a UUID field to services and during an upgrade from
    Ocata => Pike, when instances are accessed joined with their associated
    services, we generate a UUID for the services on-the-fly.

    This causes a problem in the scenario where an operator upgrades their
    cluster and has old, deleted services with hostnames matching existing
    services associated with instances. When we go to generate the service
    UUID for the old, deleted service, we hit a ServiceTooOld exception.

    This addresses the problem by not bothering to generate a UUID for a
    deleted service. One alternative would be to exclude deleted services
    when we join the 'instances' and 'services' tables, but I'm not sure
    whether that approach might cause unintended effects where service
    information that used to be viewable for instances becomes hidden.

    Closes-Bug: #1778305
    Closes-Bug: #1764556

    Change-Id: I347096a527c257075cefe7b81210622f6cd87daf
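
In toy form, the guard this change describes amounts to the following (a simplified, hypothetical sketch continuing the earlier one from the bug description, not the actual nova diff):

import uuid


def from_db_object(db_row):
    service = dict(db_row)
    # Only backfill a uuid for live rows; leave soft-deleted rows untouched so
    # hydrating them no longer triggers a save() that cannot find the row.
    if service['uuid'] is None and not service['deleted']:
        service['uuid'] = str(uuid.uuid4())
        # ...persist via the normal save()/service_update() path...
    return service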

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/673812

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/673814

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/673816

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/673821

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/673824

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/673827

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.opendev.org/673830

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/673833

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/673812
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44709cfb5cfa6a8a384eb391ad520de55b2fa247
Submitter: Zuul
Branch: stable/stein

commit 44709cfb5cfa6a8a384eb391ad520de55b2fa247
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 17 16:20:53 2018 -0400

    Add functional recreate test for bug 1764556

    This attempts to recreate the following scenario:

    1) boot instance on ocata host where the compute service
       does not have a uuid
    2) migrate instance
    3) delete the ocata service (thus deleting the compute node)
    4) start compute service with the same name
    5) migrate instance to newly-created compute node
    6) upgrade to pike where services.uuid data migration happens
    7) list instances as admin to join on the services table

    The failure occurs when listing instances because the deleted
    service with the same name as the compute host that the instance
    is running on gets pulled from the DB and the Service object
    attempts to set a uuid on it, which fails since it's not using
    a read_deleted="yes" context.

    While working on this, the service_get_all_by_binary DB API
    method had to be fixed to not hard-code read_deleted="no" since
    the test needs to be able to read deleted services, which it can
    control via its own context object (note that
    RequestContext.read_deleted already defaults to "no" so the
    hard-coding in the DB API is unnecessarily restrictive).

    NOTE(mriedem): This backport needed to use the set_nodes/restore_nodes
    methods on the fake virt module since change
    I2cf2fcbaebc706f897ce5dfbff47d32117064f9c is not in Stein.

    Change-Id: I4d60da26fcf0a77628d1fdf4e818884614fa4f02
    Related-Bug: #1764556
    (cherry picked from commit 81f05f53d357a546c7f9a53cae6ef45b92e28bc1)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/673814
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=11fde850e6163b68e006e746406e6c5e6ac980e0
Submitter: Zuul
Branch: stable/stein

commit 11fde850e6163b68e006e746406e6c5e6ac980e0
Author: melanie witt <email address hidden>
Date: Thu Jul 12 21:48:23 2018 +0000

    Don't generate service UUID for deleted services

    In Pike, we added a UUID field to services and during an upgrade from
    Ocata => Pike, when instances are accessed joined with their associated
    services, we generate a UUID for the services on-the-fly.

    This causes a problem in the scenario where an operator upgrades their
    cluster and has old, deleted services with hostnames matching existing
    services associated with instances. When we go to generate the service
    UUID for the old, deleted service, we hit a ServiceTooOld exception.

    This addresses the problem by not bothering to generate a UUID for a
    deleted service. One alternative would be to exclude deleted services
    when we join the 'instances' and 'services' tables, but I'm not sure
    whether that approach might cause unintended effects where service
    information that used to be viewable for instances becomes hidden.

    Closes-Bug: #1778305
    Closes-Bug: #1764556

    Change-Id: I347096a527c257075cefe7b81210622f6cd87daf
    (cherry picked from commit 16e163053ca39886f11fdb8a3af10a28619fc105)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/673816
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eadd78efe39e1958d14319cfcdbda15862485845
Submitter: Zuul
Branch: stable/rocky

commit eadd78efe39e1958d14319cfcdbda15862485845
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 17 16:20:53 2018 -0400

    Add functional recreate test for bug 1764556

    This attempts to recreate the following scenario:

    1) boot instance on ocata host where the compute service
       does not have a uuid
    2) migrate instance
    3) delete the ocata service (thus deleting the compute node)
    4) start compute service with the same name
    5) migrate instance to newly-created compute node
    6) upgrade to pike where services.uuid data migration happens
    7) list instances as admin to join on the services table

    The failure occurs when listing instances because the deleted
    service with the same name as the compute host that the instance
    is running on gets pulled from the DB and the Service object
    attempts to set a uuid on it, which fails since it's not using
    a read_deleted="yes" context.

    While working on this, the service_get_all_by_binary DB API
    method had to be fixed to not hard-code read_deleted="no" since
    the test needs to be able to read deleted services, which it can
    control via its own context object (note that
    RequestContext.read_deleted already defaults to "no" so the
    hard-coding in the DB API is unnecessarily restrictive).

    NOTE(mriedem): This backport needed to account for not having
    change Idaed39629095f86d24a54334c699a26c218c6593 in Rocky so
    the PlacementFixture comes from the same module as the other
    nova fixtures.

    Change-Id: I4d60da26fcf0a77628d1fdf4e818884614fa4f02
    Related-Bug: #1764556
    (cherry picked from commit 81f05f53d357a546c7f9a53cae6ef45b92e28bc1)
    (cherry picked from commit 44709cfb5cfa6a8a384eb391ad520de55b2fa247)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/673821
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=72f9aa720f9137a35d55b9f96e2220d2d0e5588d
Submitter: Zuul
Branch: stable/rocky

commit 72f9aa720f9137a35d55b9f96e2220d2d0e5588d
Author: melanie witt <email address hidden>
Date: Thu Jul 12 21:48:23 2018 +0000

    Don't generate service UUID for deleted services

    In Pike, we added a UUID field to services and during an upgrade from
    Ocata => Pike, when instances are accessed joined with their associated
    services, we generate a UUID for the services on-the-fly.

    This causes a problem in the scenario where an operator upgrades their
    cluster and has old, deleted services with hostnames matching existing
    services associated with instances. When we go to generate the service
    UUID for the old, deleted service, we hit a ServiceTooOld exception.

    This addresses the problem by not bothering to generate a UUID for a
    deleted service. One alternative would be to exclude deleted services
    when we join the 'instances' and 'services' tables, but I'm not sure
    whether that approach might cause unintended effects where service
    information that used to be viewable for instances becomes hidden.

    Closes-Bug: #1778305
    Closes-Bug: #1764556

    Conflicts:
          nova/tests/functional/regressions/test_bug_1764556.py

    NOTE(mriedem): The conflict is due to eadd78efe3 removing the
    func_fixtures import.

    Change-Id: I347096a527c257075cefe7b81210622f6cd87daf
    (cherry picked from commit 16e163053ca39886f11fdb8a3af10a28619fc105)
    (cherry picked from commit 8601ca75b1515e7434f1d3c563a0e65458c8c86a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/673824
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=de035bbfccbfd5a3e7516434171e89e9df7d08fa
Submitter: Zuul
Branch: stable/queens

commit de035bbfccbfd5a3e7516434171e89e9df7d08fa
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 17 16:20:53 2018 -0400

    Add functional recreate test for bug 1764556

    This attempts to recreate the following scenario:

    1) boot instance on ocata host where the compute service
       does not have a uuid
    2) migrate instance
    3) delete the ocata service (thus deleting the compute node)
    4) start compute service with the same name
    5) migrate instance to newly-created compute node
    6) upgrade to pike where services.uuid data migration happens
    7) list instances as admin to join on the services table

    The failure occurs when listing instances because the deleted
    service with the same name as the compute host that the instance
    is running on gets pulled from the DB and the Service object
    attempts to set a uuid on it, which fails since it's not using
    a read_deleted="yes" context.

    While working on this, the service_get_all_by_binary DB API
    method had to be fixed to not hard-code read_deleted="no" since
    the test needs to be able to read deleted services, which it can
    control via its own context object (note that
    RequestContext.read_deleted already defaults to "no" so the
    hard-coding in the DB API is unnecessarily restrictive).

    Change-Id: I4d60da26fcf0a77628d1fdf4e818884614fa4f02
    Related-Bug: #1764556
    (cherry picked from commit 81f05f53d357a546c7f9a53cae6ef45b92e28bc1)
    (cherry picked from commit 44709cfb5cfa6a8a384eb391ad520de55b2fa247)
    (cherry picked from commit eadd78efe39e1958d14319cfcdbda15862485845)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/673827
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=97e78f8cc194c60d06a37b11c9c09405bf87b295
Submitter: Zuul
Branch: stable/queens

commit 97e78f8cc194c60d06a37b11c9c09405bf87b295
Author: melanie witt <email address hidden>
Date: Thu Jul 12 21:48:23 2018 +0000

    Don't generate service UUID for deleted services

    In Pike, we added a UUID field to services and during an upgrade from
    Ocata => Pike, when instances are accessed joined with their associated
    services, we generate a UUID for the services on-the-fly.

    This causes a problem in the scenario where an operator upgrades their
    cluster and has old, deleted services with hostnames matching existing
    services associated with instances. When we go to generate the service
    UUID for the old, deleted service, we hit a ServiceTooOld exception.

    This addresses the problem by not bothering to generate a UUID for a
    deleted service. One alternative would be to exclude deleted services
    when we join the 'instances' and 'services' tables, but I'm not sure
    whether that approach might cause unintended effects where service
    information that used to be viewable for instances becomes hidden.

    Closes-Bug: #1778305
    Closes-Bug: #1764556

    Change-Id: I347096a527c257075cefe7b81210622f6cd87daf
    (cherry picked from commit 16e163053ca39886f11fdb8a3af10a28619fc105)
    (cherry picked from commit 8601ca75b1515e7434f1d3c563a0e65458c8c86a)
    (cherry picked from commit d9094e54fbb6a25c8cf189c4fd9fbb8d66cec145)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.2

This issue was fixed in the openstack/nova 19.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.2

This issue was fixed in the openstack/nova 18.2.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.12

This issue was fixed in the openstack/nova 17.0.12 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/pike
Review: https://review.opendev.org/c/openstack/nova/+/673833
Reason: stable/pike has transitioned to End of Life for nova, open patches need to be abandoned in order to be able to delete the branch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/pike
Review: https://review.opendev.org/c/openstack/nova/+/673830
Reason: stable/pike has transitioned to End of Life for nova, open patches need to be abandoned in order to be able to delete the branch.
