If a rebuild is refused by the scheduler, the instance's imageref is not rolled back
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | | High | int32bit |
Newton | | Undecided | Unassigned |
Ocata | | High | melanie witt |
Pike | | High | melanie witt |
Bug Description
Description
===========
Since CVE-2017-16239, we now go through the scheduler for rebuilds. If the scheduler refuses a rebuild with a new image because of filter constraints (for example IsolatedHostsFi
Steps to reproduce
==================
1. Configure IsolatedHostsFi
[filter_
enabled_filters = [...],IsolatedH
isolated_images = 41d3e5ca-
isolated_hosts = ubuntu
restrict_
2. Have two images, one isolated and one not:
$ openstack image list
8d0581a5-
41d3e5ca-
cirros is the isolated one
3. Have only one hypervisor (the isolated one):
$ openstack hypervisor list
ubuntu | QEMU | 192.168.100.194 | up
4. To confirm, boot a centos (non-isolated) image, expecting it to be refused by the scheduler:
$ openstack server create \
--image 8d0581a5-
--flavor m1.nano \
centos-
$ openstack server list
centos-
5. Boot a cirros (isolated) image:
$ openstack server create \
--image 41d3e5ca-
--flavor m1.nano \
cirros-
$ openstack server list
cirros-
6. Rebuild the cirros instance with centos:
$ nova --debug rebuild cirros-
DEBUG (session:722) POST call to compute for
http://
used request id req-c234346a-
7. Observe the rebuild being refused in the conductor:
WARNING nova.conductor.
[None req-c234346a-
[instance: d9d98bf7-
No valid host found for rebuild: NoValidHost_Remote:
No valid host was found. There are not enough hosts available.
8. Observe the API is showing the new centos image for the instance:
$ nova show cirros-
[...]
image | centos (8d0581a5-
[...]
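For readers unfamiliar with host isolation, the refusal in steps 4 and 7 can be modelled roughly as below. This is a simplified sketch, not nova's filter code: the function name `host_passes`, its parameters, and the use of image names in place of the truncated image IDs are all assumptions made for illustration. The `restrict` flag stands in for the truncated `restrict_` option in step 1, whose value is not visible in this report.

```python
# Simplified model of an isolated-hosts style scheduling decision.
# Illustrative only: this is not nova's implementation.

def host_passes(host, image, isolated_hosts, isolated_images, restrict=True):
    """Return True if an instance using `image` may be placed on `host`."""
    host_isolated = host in isolated_hosts
    image_isolated = image in isolated_images
    if restrict:
        # Isolated hosts accept only isolated images, and
        # non-isolated hosts accept only non-isolated images.
        return host_isolated == image_isolated
    # Without the restriction, isolated images still need an isolated
    # host, but any image may land on an isolated host.
    return host_isolated or not image_isolated


isolated_hosts = {"ubuntu"}
isolated_images = {"cirros"}  # placeholder for the truncated image ID

cirros_ok = host_passes("ubuntu", "cirros", isolated_hosts, isolated_images)
centos_ok = host_passes("ubuntu", "centos", isolated_hosts, isolated_images)
print(cirros_ok, centos_ok)  # True False
```

Under this model, with `ubuntu` as the only hypervisor, the centos image has no valid host, which matches the NoValidHost message in step 7.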
Expected result
===============
Some indication that the rebuild was refused, or at least rolling back the instance's imageref.
Actual result
=============
No indication that the rebuild was refused, and worse, we now have a wrong imageref for the instance.
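The actual result can be illustrated with a toy model of the ordering involved. It assumes, as the steps above suggest, that the API records the new image on the instance before the scheduler is consulted. All class and function names here are hypothetical and none of this is nova code.

```python
# Toy model of the reported behaviour: the image reference is updated
# first, the scheduler refuses afterwards, and nothing is undone.
from dataclasses import dataclass


class NoValidHost(Exception):
    """Raised by the toy scheduler when no host accepts the image."""


@dataclass
class Instance:
    name: str
    image_ref: str
    vm_state: str = "active"


def schedule(image_ref, schedulable_images):
    if image_ref not in schedulable_images:
        raise NoValidHost("No valid host was found.")


def rebuild(instance, new_image_ref, schedulable_images):
    # The new image is recorded before scheduling happens.
    instance.image_ref = new_image_ref
    try:
        schedule(new_image_ref, schedulable_images)
    except NoValidHost:
        # Only a warning is logged; the instance is left untouched.
        return False
    return True


server = Instance(name="cirros-server", image_ref="cirros")
accepted = rebuild(server, "centos", schedulable_images={"cirros"})
print(accepted, server.image_ref, server.vm_state)  # False centos active
```

The instance still looks healthy but now reports an image it was never rebuilt with.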
Environment
===========
1. Exact version of OpenStack you are running. See the following
This was picked up by QE for stable/pike, and is still present in master,
and presumably in all versions affected by the CVE fix, including newton,
which is now EOL.
2. Which hypervisor did you use?
libvirt+kvm
Sylvain Bauza (sylvain-bauza) wrote : | #1 |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Critical |
tags: | added: scheduler |
Changed in nova: | |
assignee: | nobody → int32bit (int32bit) |
int32bit (int32bit) wrote : | #2 |
I think we also need to restore the original image metadata and set the server status to ERROR.
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
status: | Confirmed → In Progress |
tags: | added: queens-rc-potential |
Matt Riedemann (mriedem) wrote : | #4 |
I wouldn't say this is critical. It's a change, yes, but it's similar to the bug we always had when trying to rebuild a volume-backed instance with a different image, up until Queens, when we finally made that an error in the API because it's not supported in the compute service. I think we can just mark the instance ERROR in this case and let the user rebuild the instance with a valid image to get it out of the ERROR state, leaving the instance properties as the API set them rather than trying to get the conductor manager code to roll everything back.
Changed in nova: | |
importance: | Critical → High |
Changed in nova: | |
assignee: | int32bit (int32bit) → melanie witt (melwitt) |
Changed in nova: | |
assignee: | melanie witt (melwitt) → int32bit (int32bit) |
Changed in nova: | |
assignee: | int32bit (int32bit) → melanie witt (melwitt) |
Changed in nova: | |
assignee: | melanie witt (melwitt) → int32bit (int32bit) |
Changed in nova: | |
assignee: | int32bit (int32bit) → Matt Riedemann (mriedem) |
Changed in nova: | |
assignee: | Matt Riedemann (mriedem) → int32bit (int32bit) |
Fix proposed to branch: stable/pike
Review: https:/
Fix proposed to branch: stable/ocata
Review: https:/
Matt Riedemann (mriedem) wrote : | #7 |
This bug exists in newton but newton is EOL upstream so it won't be fixed there.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit d03a890a34f632a
Author: int32bit <email address hidden>
Date: Mon Jan 22 17:05:53 2018 +0800
Set server status to ERROR if rebuild failed
Currently there is no indication that the rebuild was refused,
and worse, we may have a wrong imageref for the instance.
This patch sets the instance to ERROR status if the rebuild fails in the
scheduling stage. The user can rebuild the instance with a valid image
to get it out of the ERROR state and reset it with the right instance
metadata and properties.
Closes-Bug: 1744325
Change-Id: Ibb7bee15a3d4ee
Changed in nova: | |
status: | In Progress → Fix Released |
This issue was fixed in the openstack/nova 17.0.0.0b3 development milestone.
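The behaviour described in the commit message can be sketched as follows. This is a minimal model under stated assumptions, not the merged patch: the dictionary-based instance, `select_destination`, and `rebuild` are hypothetical names, and the real change lives in nova's conductor code, which is not reproduced here.

```python
# Minimal model of the fix: a scheduling failure during rebuild moves
# the instance to ERROR instead of leaving it silently inconsistent.


class NoValidHost(Exception):
    """Raised by the toy scheduler when no host accepts the image."""


def select_destination(image_ref, schedulable_images):
    if image_ref not in schedulable_images:
        raise NoValidHost("No valid host was found.")


def rebuild(instance, new_image_ref, schedulable_images):
    # The requested image is still recorded up front; it is not rolled back.
    instance["image_ref"] = new_image_ref
    try:
        select_destination(new_image_ref, schedulable_images)
    except NoValidHost:
        # Surface the refusal to the user through the instance state.
        instance["vm_state"] = "error"
        return
    instance["vm_state"] = "active"


server = {"name": "cirros-server", "image_ref": "cirros", "vm_state": "active"}

rebuild(server, "centos", schedulable_images={"cirros"})
state_after_refusal = dict(server)
print(state_after_refusal["image_ref"], state_after_refusal["vm_state"])

# A second rebuild with a schedulable image brings the instance back.
rebuild(server, "cirros", schedulable_images={"cirros"})
print(server["image_ref"], server["vm_state"])
```

The design trade-off is the one discussed in the comments: nothing is rolled back, but the ERROR state makes the refusal visible and a rebuild with a valid image recovers the instance.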
Matt Riedemann (mriedem) wrote : | #10 |
https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit 22a39b8f9b76d5f
Author: int32bit <email address hidden>
Date: Mon Jan 22 17:05:53 2018 +0800
Set server status to ERROR if rebuild failed
Currently there is no indication that the rebuild was refused,
and worse, we may have a wrong imageref for the instance.
This patch sets the instance to ERROR status if the rebuild fails in the
scheduling stage. The user can rebuild the instance with a valid image
to get it out of the ERROR state and reset it with the right instance
metadata and properties.
Closes-Bug: 1744325
Conflicts:
nova/
NOTE(melwitt): The conflict was because the select_destinations method
has additional parameters in Queens that don't exist in Pike.
Change-Id: Ibb7bee15a3d4ee
(cherry picked from commit d03a890a34f632a
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit 83fd8ac0bfd3d9e
Author: int32bit <email address hidden>
Date: Mon Jan 22 17:05:53 2018 +0800
Set server status to ERROR if rebuild failed
Currently there is no indication that the rebuild was refused,
and worse, we may have a wrong imageref for the instance.
This patch sets the instance to ERROR status if the rebuild fails in the
scheduling stage. The user can rebuild the instance with a valid image
to get it out of the ERROR state and reset it with the right instance
metadata and properties.
Closes-Bug: 1744325
Conflicts:
NOTE(melwitt): The conflicts were because of log translation, the fact
that the regression test for bug 1713783 doesn't exist in Ocata, there
are additional rebuild functional tests in Pike that don't exist in
Ocata, and the select_destinations method has additional parameters in
Pike that don't exist in Ocata.
Change-Id: Ibb7bee15a3d4ee
(cherry picked from commit d03a890a34f632a
(cherry picked from commit 22a39b8f9b76d5f
This issue was fixed in the openstack/nova 16.1.0 release.
This issue was fixed in the openstack/nova 15.1.1 release.
That's IMHO a very critical bug that we need to tackle ASAP