Bugfix Icaf1bae8cb040b939f916a19ce026031ddb84af7 showed that restarting
a compute service in the functional env is unrealistic, causing faults
to slip through. During that bug fix only the minimal change was done
in the functional env regarding compute service restart to reproduce
the reported fault. However the restart of the compute service could
be made even more realistic.
This patch simulates a compute service restart in the functional env
by stopping the original compute service and starting a totally new
compute service for the same host and node. This way we can make sure
that we get a brand new ComputeManager in the new service and no
state can leak between the old and the new service.
This change revealed another shortcoming of the functional env.
In the real world the nova-compute service could be restarted without
losing any running servers on the compute host. But with the naive
implementation of this change the compute service is re-created. This
means that a new ComputeManager is instantiated that loads a new
FakeDriver instance as well. That new FakeDriver instance then reports
an empty hypervisor. This behavior is not totally unrealistic as it
simulates such a compute host restart that cleans the hypervisor state
as well (e.g. compute host redeployment). However this type of restart
shows another bug in the code path that destroys and deallocates
evacuated instances from the source host. Therefore this patch
implements the compute service restart in a way that simulates only a
service restart and not a full compute restart. A subsequent patch will
add a test that uses the clean hypervisor case to reproduce the
revealed bug.
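The restart mechanics described above can be sketched in plain Python. This is a simplified stand-in, not Nova's actual classes: the `Service`, `ComputeManager`, and `FakeDriver` stubs and the `restart_compute_service`/`keep_hypervisor_state` names here are illustrative only, simplified from what the patch adds to the functional test base class.

```python
# Minimal sketch (assumed names, not Nova's real API): restart a compute
# service by building a totally new Service, so no ComputeManager state
# can leak, while optionally carrying over the fake hypervisor's state.

class FakeDriver:
    """Stand-in for Nova's fake virt driver."""
    def __init__(self):
        # The driver's view of the hypervisor: running instances by uuid.
        self.instances = {}

class ComputeManager:
    """Stand-in for nova.compute.manager.ComputeManager."""
    def __init__(self):
        self.driver = FakeDriver()

class Service:
    """Stand-in for a running nova-compute service."""
    def __init__(self, host):
        self.host = host
        self.manager = ComputeManager()

    def stop(self):
        pass  # a real service would tear down RPC servers, timers, etc.

def restart_compute_service(service, keep_hypervisor_state=True):
    """Simulate a compute service restart in a functional test.

    A brand-new Service (and therefore ComputeManager and FakeDriver) is
    created for the same host, so no state can leak from the old service.
    With keep_hypervisor_state=True the fake hypervisor's instance list
    is carried over, simulating a restart that does not lose running
    servers; with False it simulates a redeployed (clean) compute host.
    """
    service.stop()
    new_service = Service(service.host)
    if keep_hypervisor_state:
        new_service.manager.driver.instances = dict(
            service.manager.driver.instances)
    return new_service

# Usage: restart with and without preserved hypervisor state.
old = Service('host1')
old.manager.driver.instances['uuid-1'] = 'running'

restarted = restart_compute_service(old, keep_hypervisor_state=True)
assert restarted.manager is not old.manager             # no state leak
assert 'uuid-1' in restarted.manager.driver.instances   # servers survive

clean = restart_compute_service(old, keep_hypervisor_state=False)
assert clean.manager.driver.instances == {}             # wiped hypervisor
```

The key design point is that nothing is reused from the old service object: preserving selected driver state explicitly is what distinguishes a service restart from a full compute host restart.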
NOTE(elod.illes): file conflict details:
* libvirt-connect-error.json:
File added only in Stein with libvirt.error notification
transformation patch I7d2287ce06d77c0afdef0ea8bdfb70f6c52d3c50
* test.py:
Patches Iecf4dcf8e648c9191bf8846428683ec81812c026 (Remove patching
the mock lib) and Ibb8c12fb2799bb5ceb9e3d72a2b86dbb4f14451e (Use a
static resource tracker in compute manager) were not backported to
Rocky
* test_reshape.py:
File added only in Stein in the frame of 'Handling Reshaped Provider
Trees' feature, with patch Ide797ebf7790d69042ae275ebec6ced3fa4787b6
* test_servers.py:
Patch I7cbd5d9fb875ebf72995362e0b6693492ce32051 (Reject forced move
with nested source allocation) is not present in Rocky as it is part
of 'Nested Resource Providers - Allocation Candidates' implemented in
Stein
Change-Id: I9d6cd6259659a35383c0c9c21db72a9434ba86b1
(cherry picked from commit 2794748d9c58623045023f34c7793c58ce41447c)
(cherry picked from commit b874c409c11b5d83508d2f0276a9a648f72192a4)
Reviewed: https://review.opendev.org/713033
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53a893f7c97e35de3e9ac26101827cdb43ed35cc
Submitter: Zuul
Branch: stable/rocky
commit 53a893f7c97e35de3e9ac26101827cdb43ed35cc
Author: Balazs Gibizer <email address hidden>
Date: Wed May 1 23:38:40 2019 +0200
Enhance service restart in functional env
Related-Bug: #1724172
On stable/stein:
Closes-Bug: #1859766
Conflicts:
doc/notification_samples/libvirt-connect-error.json
nova/test.py
nova/tests/functional/libvirt/test_reshape.py
nova/tests/functional/test_servers.py