Resizing an instance wrongly reports a 409 error when the instance is located on a compute host that has deleted records in the nova DB services table

Bug #2073365 reported by Shu Juan Zhang
This bug affects 1 person
Affects                   Status       Importance  Assigned to  Milestone
OpenStack Compute (nova)  In Progress  Undecided   Unassigned

Bug Description

Steps to reproduce
==================
A chronological list of steps that reproduce the issue:
* Add a KVM host and remove it. Repeat these steps 2-3 times so that the nova services table contains records marked as deleted.
* Add the same KVM host back.
* Deploy a VM on this KVM host.
* Select the VM and try to resize it. The resize always fails with a 409 error.

Expected result
===============
The resize should succeed.

Actual result
=============
The resize request is rejected with a 409 error (Service Unavailable).

Environment
===========
1. OpenStack version: 2023.1 (Antelope)

Cause Analysis
==============
The resize reports a 409 error (Service Unavailable) because the check_instance_host decorator on resize() raises exception.ServiceUnavailable().

    @check_instance_host(check_is_up=True)
    def resize(self, context, instance, flavor_id=None, clean_shutdown=True,
               host_name=None, auto_disk_config=None):
        """Resize (ie, migrate) a running instance.
        ...
        """
        ...


# Module-level decorator in the same file (nova/compute/api.py):
import functools

from nova import exception


def check_instance_host(check_is_up=False):
    """Validate the instance.host before performing the operation.

    At a minimum this method will check that the instance.host is set.

    :param check_is_up: If True, check that the instance.host status is UP
        or MAINTENANCE (disabled but not down).
    :raises: InstanceNotReady if the instance.host is not set
    :raises: ServiceUnavailable if check_is_up=True and the instance.host
        compute service status is not UP or MAINTENANCE
    """
    def outer(function):
        @functools.wraps(function)
        def wrapped(self, context, instance, *args, **kwargs):
            if not instance.host:
                raise exception.InstanceNotReady(instance_id=instance.uuid)
            if check_is_up:
                # Make sure the source compute service is not down otherwise we
                # cannot proceed.
                # Note: [0] takes the first nova-compute entry in
                # instance.services. If soft-deleted service rows are
                # included, this can be a stale record for the same host.
                service = [
                    service for service in instance.services
                    if service.binary == 'nova-compute'][0]
                if not self.servicegroup_api.service_is_up(service):
                    # ComputeServiceUnavailable would make more sense here but
                    # we do not want to leak hostnames to end users.
                    raise exception.ServiceUnavailable()
            return function(self, context, instance, *args, **kwargs)
        return wrapped
    return outer

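Debugging shows that instance.services includes service records that have already been deleted, which it should not. The first nova-compute entry in the list, a2b948f2-6d82-4d5c-bf67-9d7b45cf9868 (id 10), is a deleted record whose last heartbeat was on 2024-07-02, so service_is_up() reports it as down even though the live record for the host (id 20) is up.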

(Pdb) p instance.services
ServiceList(objects=[Service(a2b948f2-6d82-4d5c-bf67-9d7b45cf9868),Service(50b280ac-9ed6-4bf3-a87f-1f8697d87966),Service(ade10449-24ad-4535-bbaa-38fbb79956fc),Service(dc0fa50d-83ab-4ca3-bdde-68e527a35a8d),Service(2191a868-5e66-4758-b7f1-67b0c4c43294)])

MariaDB [nova]> select * from services where host='kvmcore14';
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| created_at          | updated_at          | deleted_at          | id | host      | binary       | topic   | report_count | disabled | deleted | disabled_reason | last_seen_up        | forced_down | version | uuid                                 |
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| 2024-07-02 01:42:07 | 2024-07-02 08:56:51 | 2024-07-02 08:56:58 | 10 | kvmcore14 | nova-compute | compute |          861 |        0 |      10 | NULL            | 2024-07-02 08:56:51 |           0 |      51 | a2b948f2-6d82-4d5c-bf67-9d7b45cf9868 |
| 2024-07-02 09:20:37 | 2024-07-04 01:19:48 | 2024-07-04 01:19:54 | 16 | kvmcore14 | nova-compute | compute |         4742 |        0 |      16 | NULL            | 2024-07-04 01:19:48 |           0 |      51 | 50b280ac-9ed6-4bf3-a87f-1f8697d87966 |
| 2024-07-09 10:09:40 | 2024-07-10 00:47:44 | 2024-07-10 00:48:12 | 17 | kvmcore14 | nova-compute | compute |         1733 |        0 |      17 | NULL            | 2024-07-10 00:47:44 |           0 |      51 | ade10449-24ad-4535-bbaa-38fbb79956fc |
| 2024-07-10 01:04:46 | 2024-07-10 01:04:57 | 2024-07-10 01:05:21 | 19 | kvmcore14 | nova-compute | compute |            1 |        0 |      19 | NULL            | 2024-07-10 01:04:57 |           0 |      51 | dc0fa50d-83ab-4ca3-bdde-68e527a35a8d |
| 2024-07-10 01:22:57 | 2024-07-10 02:13:43 | NULL                | 20 | kvmcore14 | nova-compute | compute |          100 |        0 |       0 | NULL            | 2024-07-10 02:13:43 |           0 |      51 | 2191a868-5e66-4758-b7f1-67b0c4c43294 |
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
5 rows in set (0.002 sec)
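To make the failure mode concrete, the following is a minimal, self-contained sketch that replays the kvmcore14 rows above through the same "[0]" selection that check_instance_host uses. It is not nova code: FakeService, service_is_up(), first_compute() and SERVICE_DOWN_TIME are hypothetical stand-ins (nova's real check goes through servicegroup_api and the service_down_time option, whose value here is assumed to be 60 seconds), and the time of the resize request is assumed to be shortly after the live record's last heartbeat.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical stand-in for nova's service_down_time option (assumed 60 s).
SERVICE_DOWN_TIME = timedelta(seconds=60)


@dataclass
class FakeService:
    """Minimal stand-in for a row of the nova services table."""
    id: int
    binary: str
    deleted: int
    last_seen_up: datetime


def ts(value):
    """Parse a timestamp in the format used by the table above."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def service_is_up(service, now):
    """Hypothetical heartbeat check: up if seen within SERVICE_DOWN_TIME."""
    return now - service.last_seen_up <= SERVICE_DOWN_TIME


def first_compute(service_list):
    """Mirror the [0] selection made by check_instance_host."""
    return [s for s in service_list if s.binary == "nova-compute"][0]


# The kvmcore14 rows, in the order instance.services returned them above.
services = [
    FakeService(10, "nova-compute", 10, ts("2024-07-02 08:56:51")),
    FakeService(16, "nova-compute", 16, ts("2024-07-04 01:19:48")),
    FakeService(17, "nova-compute", 17, ts("2024-07-10 00:47:44")),
    FakeService(19, "nova-compute", 19, ts("2024-07-10 01:04:57")),
    FakeService(20, "nova-compute", 0, ts("2024-07-10 02:13:43")),
]

# Assumed time of the resize request: shortly after the live heartbeat.
now = ts("2024-07-10 02:14:00")

# Current behaviour: deleted rows are included, so [0] is the stale id 10.
current = first_compute(services)
print(current.id, service_is_up(current, now))  # 10 False

# With deleted rows filtered out, only the live id 20 is left.
fixed = first_compute([s for s in services if s.deleted == 0])
print(fixed.id, service_is_up(fixed, now))  # 20 True
```

The point of the sketch is that the host itself is healthy; the check fails only because the list handed to it starts with a soft-deleted row.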

The proposed fix is to add an extra condition, Service.deleted == 0, to the join condition that populates instance.services, in
https://github.com/openstack/nova/blob/stable/2023.1/nova/db/main/models.py#L168
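The effect of the extra condition on the join can be illustrated with plain SQL. This is only a sketch: it uses Python's built-in sqlite3 module instead of nova's SQLAlchemy models, the two tables are hypothetical, heavily trimmed versions of instances and services, and the query approximates the join behind instance.services (matching on host and binary) rather than reproducing the SQL nova actually generates. Soft-deleted rows are modelled as in the table above, where deleted holds the row id and the live row has deleted = 0.

```python
import sqlite3

# Hypothetical, heavily trimmed versions of the nova instances/services tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (
        id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT, deleted INTEGER
    );
    CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT, deleted INTEGER);
""")

# Soft-deleted rows carry deleted = id; the live row has deleted = 0.
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?, ?)",
    [(i, "kvmcore14", "nova-compute", i) for i in (10, 16, 17, 19)]
    + [(20, "kvmcore14", "nova-compute", 0)],
)
conn.execute("INSERT INTO instances VALUES (1, 'kvmcore14', 0)")


def service_ids(instance_id, exclude_deleted):
    """Return the ids of the services rows joined to one instance."""
    sql = """
        SELECT s.id
        FROM instances AS i
        JOIN services AS s
          ON s.host = i.host
         AND s."binary" = 'nova-compute'
         AND i.deleted = 0
    """
    if exclude_deleted:
        # The proposed extra condition (Service.deleted == 0).
        sql += " AND s.deleted = 0"
    sql += " WHERE i.id = ? ORDER BY s.id"
    return [row[0] for row in conn.execute(sql, (instance_id,))]


print(service_ids(1, exclude_deleted=False))  # [10, 16, 17, 19, 20]
print(service_ids(1, exclude_deleted=True))   # [20]
```

Filtering in the join itself, rather than inside check_instance_host, would presumably mean every consumer of instance.services sees only live service records, not just the resize path.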

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/924318

Changed in nova:
status: New → In Progress