Don't delete compute node when deleting service other than nova-compute

Bug #1852993 reported by Pavel Gluschak on 2019-11-18
Affects | Importance | Assigned to
OpenStack Compute (nova) | Medium | Pavel Gluschak
Rocky | Medium | Sylvain Bauza
Stein | Medium | Sylvain Bauza
Train | Medium | Sylvain Bauza

Bug Description

When upgrading to Stein, the nova-consoleauth service is deprecated and should be removed. However, if nova-consoleauth runs on the same host as nova-compute, the matching row in the compute_nodes table is soft-deleted as well. The nova-compute service then logs an error because a stale resource provider exists in placement:

2019-11-18 16:03:20.069 7 ERROR nova.compute.manager [req-f0255008-c398-406c-bca0-12cdc34fc0b4 - - - - -] Error updating resources for node vzstor1.vstoragedomain.: ResourceProviderCreationFailed: Failed to create resource provider vzstor1.vstoragedomain
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager Traceback (most recent call last):
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7399, in update_available_resource_for_node
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 689, in update_available_resource
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager self._update_available_resource(context, resources)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager return f(*args, **kwargs)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 713, in _update_available_resource
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager self._init_compute_node(context, resources)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 562, in _init_compute_node
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager self._update(context, cn)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 887, in _update
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager inv_data,
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 68, in set_inventory_for_provider
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager parent_provider_uuid=parent_provider_uuid,
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager return getattr(self.instance, __name)(*args, **kwargs)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1106, in set_inventory_for_provider
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager parent_provider_uuid=parent_provider_uuid)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 667, in _ensure_resource_provider
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager parent_provider_uuid=parent_provider_uuid)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 66, in wrapper
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager return f(self, *a, **k)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 614, in _create_resource_provider
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager raise exception.ResourceProviderCreationFailed(name=name)
2019-11-18 16:03:20.069 7 ERROR nova.compute.manager ResourceProviderCreationFailed: Failed to create resource provider vzstor1.vstoragedomain

Steps to reproduce
==================

# nova service-list
+--------------------------------------+------------------+------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason | Forced down |
+--------------------------------------+------------------+------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| 57885a23-6bc2-4400-a199-a7e0defe2e00 | nova-conductor | vzstor1.vstoragedomain | internal | enabled | up | 2019-11-18T12:33:36.904174 | - | False |
| 0b08d3ea-1b4a-4076-a51e-fbd892241b9b | nova-scheduler | vzstor1.vstoragedomain | internal | enabled | up | 2019-11-18T12:33:40.111038 | - | False |
| 7d7f3dc6-da81-41b4-be30-c2cc451b560a | nova-consoleauth | vzstor1.vstoragedomain | internal | enabled | up | - | - | False |
| 367a4591-cce5-4b7b-ad7c-69135aa803aa | nova-compute | vzstor1.vstoragedomain | nova | enabled | up | 2019-11-18T12:33:43.500922 | - | False |
+--------------------------------------+------------------+------------------------+----------+---------+-------+----------------------------+-----------------+-------------+

nova=# select uuid,deleted_at from compute_nodes;
                 uuid | deleted_at
--------------------------------------+----------------------------
 13c1fbd5-fbc1-4301-8a6e-9d50bde6826f |

# nova service-delete 7d7f3dc6-da81-41b4-be30-c2cc451b560a <-- this is nova-consoleauth service

nova=# select uuid,deleted_at from compute_nodes;
                 uuid | deleted_at
--------------------------------------+----------------------------
 13c1fbd5-fbc1-4301-8a6e-9d50bde6826f | 2019-11-18 12:19:30.080625 <-- compute_node with the same host is also deleted

Expected result
===============
The compute_node record is not deleted from the DB when the nova-consoleauth service is removed.

Actual result
=============
The compute_node record is soft-deleted from the DB when the nova-consoleauth service is removed.

Environment
===========
Queens, Stein

Pavel Gluschak (scsnow) on 2019-11-18
Changed in nova:
assignee: nobody → Pavel Gluschak (scsnow)

Fix proposed to branch: master
Review: https://review.opendev.org/694756

Changed in nova:
status: New → In Progress
Matt Riedemann (mriedem) wrote :

Deleting a nova-consoleauth service shouldn't have anything to do with deleting compute_nodes records, only deleting nova-compute services, but I guess this doesn't filter on the service binary being nova-compute:

https://github.com/openstack/nova/blob/a054d03adef692db22e2466084e50cbf50112bb0/nova/db/sqlalchemy/api.py#L415

However, a nova-consoleauth service id shouldn't be mapped to a compute node record, but that's probably where the OR is breaking things - the service_id is likely NULL and the host is the same.

Changed in nova:
importance: Undecided → Medium
tags: added: db
Sylvain Bauza (sylvain-bauza) wrote :

Yeah, this is weird. We only delete the compute_nodes record if the related service ID is the same.

So, that would mean that two services (nova-compute *and* nova-consoleauth) would have the same service ID...

That said, I'm OK with the change, just in case of the above.

Sylvain Bauza (sylvain-bauza) wrote :

Oh, Matt is right, we use an OR condition, my bad.
In that case, we also check whether the service host is the same. For nova-consoleauth the host is the same as for nova-compute, hence the bug. Grrr.
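
The OR condition discussed above can be reproduced with a small, self-contained sketch. This is not nova's code: it uses an in-memory SQLite database, and the reduced table layout plus the names make_db and destroy_service_unguarded are assumptions made only for illustration. It shows that when service_id is NULL on the compute node row, the host half of the OR alone is enough to soft-delete it.

```python
import sqlite3


def make_db():
    """Build a tiny stand-in for the services and compute_nodes tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE services (
            id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT,
            deleted INTEGER DEFAULT 0);
        CREATE TABLE compute_nodes (
            id INTEGER PRIMARY KEY, service_id INTEGER, host TEXT,
            deleted INTEGER DEFAULT 0);
        INSERT INTO services (id, host, "binary") VALUES
            (1, 'vzstor1.vstoragedomain', 'nova-consoleauth'),
            (2, 'vzstor1.vstoragedomain', 'nova-compute');
        -- service_id is NULL, as suspected in the comments above
        INSERT INTO compute_nodes (id, service_id, host)
            VALUES (1, NULL, 'vzstor1.vstoragedomain');
    """)
    return conn


def destroy_service_unguarded(conn, service_id):
    """Soft-delete a service, then any compute node matching id OR host."""
    host = conn.execute(
        "SELECT host FROM services WHERE id = ?", (service_id,)
    ).fetchone()[0]
    conn.execute("UPDATE services SET deleted = 1 WHERE id = ?", (service_id,))
    # No check on the service binary: the host comparison matches the
    # compute node even though the deleted service is nova-consoleauth.
    conn.execute(
        "UPDATE compute_nodes SET deleted = 1 "
        "WHERE service_id = ? OR host = ?",
        (service_id, host),
    )


conn = make_db()
destroy_service_unguarded(conn, 1)  # delete nova-consoleauth
print(conn.execute("SELECT deleted FROM compute_nodes").fetchone()[0])  # prints 1
```

The nova-compute service row itself stays untouched in this sketch; only its compute node record is lost, which matches the symptom in the report.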

Reviewed: https://review.opendev.org/694756
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cff9ecb20870daa56b1cfd6fbb9f5817d1306fda
Submitter: Zuul
Branch: master

commit cff9ecb20870daa56b1cfd6fbb9f5817d1306fda
Author: Pavel Glushchak <email address hidden>
Date: Mon Nov 18 14:53:42 2019 +0300

    Don't delete compute node, when deleting service other than nova-compute

    We should not try to delete compute node from compute_nodes table,
    when destroying service other than nova-compute.

    Change-Id: If5b5945e699ec2e2da51d5fa90616431274849b0
    Closes-Bug: #1852993
    Signed-off-by: Pavel Glushchak <email address hidden>
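
The idea in the commit message can be sketched as a guard on the service binary. The snippet below is a simplified illustration, not the committed patch: services and compute nodes are plain dictionaries, and the helper name compute_nodes_to_soft_delete is hypothetical.

```python
def compute_nodes_to_soft_delete(service, compute_nodes):
    """Return the compute node records to soft-delete with this service.

    Only a nova-compute service owns compute node records, so any other
    binary yields an empty list regardless of matching hosts.
    """
    if service["binary"] != "nova-compute":
        return []
    return [
        cn for cn in compute_nodes
        if cn["service_id"] == service["id"] or cn["host"] == service["host"]
    ]


host = "vzstor1.vstoragedomain"
nodes = [{"id": 1, "service_id": None, "host": host}]
consoleauth = {"id": 1, "host": host, "binary": "nova-consoleauth"}
compute = {"id": 2, "host": host, "binary": "nova-compute"}

print(compute_nodes_to_soft_delete(consoleauth, nodes))  # prints []
print(len(compute_nodes_to_soft_delete(compute, nodes)))  # prints 1
```

With the guard in place, the host-based match still applies to nova-compute services, so removing a real compute service continues to clean up its node record.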

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/695145
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=716cde5454a5cd527040c60dbfd2b6ed37a1975c
Submitter: Zuul
Branch: stable/train

commit 716cde5454a5cd527040c60dbfd2b6ed37a1975c
Author: Pavel Glushchak <email address hidden>
Date: Mon Nov 18 14:53:42 2019 +0300

    Don't delete compute node, when deleting service other than nova-compute

    We should not try to delete compute node from compute_nodes table,
    when destroying service other than nova-compute.

    Change-Id: If5b5945e699ec2e2da51d5fa90616431274849b0
    Closes-Bug: #1852993
    Signed-off-by: Pavel Glushchak <email address hidden>
    (cherry picked from commit cff9ecb20870daa56b1cfd6fbb9f5817d1306fda)

Reviewed: https://review.opendev.org/695381
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e00fa24a18909d543e7761a76d7fa25145212aa7
Submitter: Zuul
Branch: stable/stein

commit e00fa24a18909d543e7761a76d7fa25145212aa7
Author: Pavel Glushchak <email address hidden>
Date: Mon Nov 18 14:53:42 2019 +0300

    Don't delete compute node, when deleting service other than nova-compute

    We should not try to delete compute node from compute_nodes table,
    when destroying service other than nova-compute.

    Change-Id: If5b5945e699ec2e2da51d5fa90616431274849b0
    Closes-Bug: #1852993
    Signed-off-by: Pavel Glushchak <email address hidden>
    (cherry picked from commit cff9ecb20870daa56b1cfd6fbb9f5817d1306fda)
    (cherry picked from commit 716cde5454a5cd527040c60dbfd2b6ed37a1975c)

Reviewed: https://review.opendev.org/695382
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=af668d395dd93f6ddadf94b6eb761f31f8e83a7c
Submitter: Zuul
Branch: stable/rocky

commit af668d395dd93f6ddadf94b6eb761f31f8e83a7c
Author: Pavel Glushchak <email address hidden>
Date: Mon Nov 18 14:53:42 2019 +0300

    Don't delete compute node, when deleting service other than nova-compute

    We should not try to delete compute node from compute_nodes table,
    when destroying service other than nova-compute.

    Change-Id: If5b5945e699ec2e2da51d5fa90616431274849b0
    Closes-Bug: #1852993
    Signed-off-by: Pavel Glushchak <email address hidden>
    (cherry picked from commit cff9ecb20870daa56b1cfd6fbb9f5817d1306fda)
    (cherry picked from commit 716cde5454a5cd527040c60dbfd2b6ed37a1975c)
    (cherry picked from commit e00fa24a18909d543e7761a76d7fa25145212aa7)

Reviewed: https://review.opendev.org/695383
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4e9f85205697dfcc01e8cbccafa79eea45f34212
Submitter: Zuul
Branch: stable/queens

commit 4e9f85205697dfcc01e8cbccafa79eea45f34212
Author: Pavel Glushchak <email address hidden>
Date: Mon Nov 18 14:53:42 2019 +0300

    Don't delete compute node, when deleting service other than nova-compute

    We should not try to delete compute node from compute_nodes table,
    when destroying service other than nova-compute.

    Change-Id: If5b5945e699ec2e2da51d5fa90616431274849b0
    Closes-Bug: #1852993
    Signed-off-by: Pavel Glushchak <email address hidden>
    (cherry picked from commit cff9ecb20870daa56b1cfd6fbb9f5817d1306fda)
    (cherry picked from commit 716cde5454a5cd527040c60dbfd2b6ed37a1975c)
    (cherry picked from commit e00fa24a18909d543e7761a76d7fa25145212aa7)
    (cherry picked from commit af668d395dd93f6ddadf94b6eb761f31f8e83a7c)

tags: added: in-stable-queens

This issue was fixed in the openstack/nova 20.1.0 release.

This issue was fixed in the openstack/nova 19.1.0 release.

This issue was fixed in the openstack/nova 18.3.0 release.
