Bugfix Icaf1bae8cb040b939f916a19ce026031ddb84af7 showed that restarting
a compute service in the functional env is unrealistic, causing faults
to slip through. During that bug fix only the minimal change was done
in the functional env regarding compute service restart to reproduce
the reported fault. However the restart of the compute service could
be made even more realistic.
This patch simulates a compute service restart in the functional env
by stopping the original compute service and starting a totally new
compute service for the same host and node. This way we can make sure
that we get a brand new ComputeManager in the new service and no
state can leak between the old and the new service.
This change revealed another shortcoming of the functional env.
In the real world the nova-compute service could be restarted without
losing any running servers on the compute host. But with the naive
implementation of this change the compute service is re-created. This
means that a new ComputeManager is instantiated that loads a new
FakeDriver instance as well. That new FakeDriver instance then reports
an empty hypervisor. This behavior is not totally unrealistic as it
simulates such a compute host restart that cleans the hypervisor state
as well (e.g. compute host redeployment). However this type of restart
shows another bug in the code path that destroys and deallocates
evacuated instances from the source host. Therefore this patch
implements the compute service restart in a way that simulates only a
service restart and not a full compute restart. A subsequent patch will
add a test that uses the clean hypervisor case to reproduce the
revealed bug.
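The restart mechanics described above can be sketched in plain Python. This is a simplified stand-in, not Nova's actual classes: the `Service`, `ComputeManager`, and `FakeDriver` stubs and the `restart_compute_service`/`keep_hypervisor_state` names here are illustrative only, simplified from what the patch adds to the functional test base class.

```python
# Minimal sketch (assumed names, not Nova's real API): restart a compute
# service by building a totally new Service, so no ComputeManager state
# can leak, while optionally carrying over the fake hypervisor's state.

class FakeDriver:
    """Stand-in for Nova's fake virt driver."""
    def __init__(self):
        # The driver's view of the hypervisor: running instances by uuid.
        self.instances = {}

class ComputeManager:
    """Stand-in for nova.compute.manager.ComputeManager."""
    def __init__(self):
        self.driver = FakeDriver()

class Service:
    """Stand-in for a running nova-compute service."""
    def __init__(self, host):
        self.host = host
        self.manager = ComputeManager()

    def stop(self):
        pass  # a real service would tear down RPC servers, timers, etc.

def restart_compute_service(service, keep_hypervisor_state=True):
    """Simulate a compute service restart in a functional test.

    A brand-new Service (and therefore ComputeManager and FakeDriver) is
    created for the same host, so no state can leak from the old service.
    With keep_hypervisor_state=True the fake hypervisor's instance list
    is carried over, simulating a restart that does not lose running
    servers; with False it simulates a redeployed (clean) compute host.
    """
    service.stop()
    new_service = Service(service.host)
    if keep_hypervisor_state:
        new_service.manager.driver.instances = dict(
            service.manager.driver.instances)
    return new_service

# Usage: restart with and without preserved hypervisor state.
old = Service('host1')
old.manager.driver.instances['uuid-1'] = 'running'

restarted = restart_compute_service(old, keep_hypervisor_state=True)
assert restarted.manager is not old.manager             # no state leak
assert 'uuid-1' in restarted.manager.driver.instances   # servers survive

clean = restart_compute_service(old, keep_hypervisor_state=False)
assert clean.manager.driver.instances == {}             # wiped hypervisor
```

The key design point is that nothing is reused from the old service object: preserving selected driver state explicitly is what distinguishes a service restart from a full compute host restart.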
NOTE(elod.illes): file conflict details:
* libvirt-connect-error.json:
File added only in Stein with libvirt.error notification
transformation patch I7d2287ce06d77c0afdef0ea8bdfb70f6c52d3c50
* test.py:
Patches Iecf4dcf8e648c9191bf8846428683ec81812c026 (Remove patching
the mock lib) and Ibb8c12fb2799bb5ceb9e3d72a2b86dbb4f14451e (Use a
static resource tracker in compute manager) were not backported to
Rocky
* test_reshape.py:
File added only in Stein in the frame of 'Handling Reshaped Provider
Trees' feature, with patch Ide797ebf7790d69042ae275ebec6ced3fa4787b6
* test_servers.py:
Patch I7cbd5d9fb875ebf72995362e0b6693492ce32051 (Reject forced move
with nested source allocation) is not present in Rocky as it is part
of 'Nested Resource Providers - Allocation Candidates' implemented in
Stein
Change-Id: I9d6cd6259659a35383c0c9c21db72a9434ba86b1
(cherry picked from commit 2794748d9c58623045023f34c7793c58ce41447c)
(cherry picked from commit b874c409c11b5d83508d2f0276a9a648f72192a4)
Reviewed: https://review.opendev.org/713033
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53a893f7c97e35de3e9ac26101827cdb43ed35cc
Submitter: Zuul
Branch: stable/rocky
commit 53a893f7c97e35de3e9ac26101827cdb43ed35cc
Author: Balazs Gibizer <email address hidden>
Date: Wed May 1 23:38:40 2019 +0200
Enhance service restart in functional env
Related-Bug: #1724172
On stable/stein:
Closes-Bug: #1859766
Conflicts:
doc/notification_samples/libvirt-connect-error.json
nova/test.py
nova/tests/functional/libvirt/test_reshape.py
nova/tests/functional/test_servers.py