By rebuilding twice with the same "forbidden" image one can circumvent scheduler rebuild restrictions

Bug #1746032 reported by Artom Lifshitz on 2018-01-29
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Matt Riedemann
nova (Newton)              Undecided    Unassigned
nova (Ocata)               High         Matt Riedemann
nova (Pike)                High         Matt Riedemann

Bug Description

Description
===========

Since CVE-2017-16239, we call the scheduler when doing a rebuild with a new image. If the scheduler refuses a rebuild because a filter forbids the new image on the instance's host (for example, IsolatedHostsFilter), at first there was no indication of this in the API (bug 1744325). Currently, with the fix for bug 1744325 merged [1], the instance goes to ERROR to indicate the refused rebuild. However, by rebuilding again with the same "forbidden" image, it is possible to circumvent the scheduler restrictions.
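A minimal sketch of the decision IsolatedHostsFilter makes for the configuration used in the reproduction steps below. This is a standalone simplification, not nova's actual filter code (which lives under nova.scheduler.filters and reads these values from nova.conf); names and values are taken from the steps:

```python
# Simplified stand-in for IsolatedHostsFilter, using the sample
# configuration from the reproduction steps (assumed values).
ISOLATED_IMAGES = {"41d3e5ca-14cf-436c-9413-4826b5c8bdb1"}  # cirros
ISOLATED_HOSTS = {"ubuntu"}
RESTRICT_ISOLATED_HOSTS_TO_ISOLATED_IMAGES = True

def host_passes(host, image_ref):
    """Return True if `host` may run an instance backed by `image_ref`."""
    image_isolated = image_ref in ISOLATED_IMAGES
    host_isolated = host in ISOLATED_HOSTS
    if RESTRICT_ISOLATED_HOSTS_TO_ISOLATED_IMAGES:
        # Isolated hosts accept only isolated images, and isolated
        # images may only land on isolated hosts.
        return image_isolated == host_isolated
    # Otherwise only isolated images are pinned to isolated hosts.
    return (not image_isolated) or host_isolated
```

With only the isolated host "ubuntu" available, rebuilding onto the non-isolated centos image should therefore always be refused.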

Steps to reproduce
==================

1. Configure IsolatedHostsFilter:

   [filter_scheduler]
   enabled_filters = [...],IsolatedHostsFilter
   isolated_images = 41d3e5ca-14cf-436c-9413-4826b5c8bdb1
   isolated_hosts = ubuntu
   restrict_isolated_hosts_to_isolated_images = true

2. Have two images, one isolated and one not:

   $ openstack image list

     8d0581a5-ed9d-4b98-a766-a41efbc99929 | centos | active
     41d3e5ca-14cf-436c-9413-4826b5c8bdb1 | cirros-0.3.5-x86_64-disk | active

     cirros is the isolated one

3. Have only one hypervisor (the isolated one):

   $ openstack hypervisor list

     ubuntu | QEMU | 192.168.100.194 | up

4. Boot a cirros (isolated) image:

   $ openstack server create \
     --image 41d3e5ca-14cf-436c-9413-4826b5c8bdb1 \
     --flavor m1.nano \
     cirros-test-expect-success

   $ openstack server list

     cirros-test-expect-success | ACTIVE | [...] | cirros-0.3.5-x86_64-disk | m1.nano

5. Rebuild the cirros instance with centos (this should be refused by the scheduler):

   $ nova --debug rebuild cirros-test-expect-success centos

     DEBUG (session:722) POST call to compute for
     http://192.168.100.194/compute/v2.1/servers/d9d98bf7-623e-4587-b82c-06f36abf59cb/action
     used request id req-c234346a-6e05-47cf-a0cd-45f89d11e15d

6. Observe the instance going to ERROR,
   but still showing the new centos image:

   $ nova show cirros-test-expect-success

     [...]
     status | ERROR
     image | centos (8d0581a5-ed9d-4b98-a766-a41efbc99929)
     [...]

7. Rebuild again with the same centos image:

   $ nova rebuild cirros-test-expect-success centos

8. The rebuild goes through.

Expected result
===============

At step 8, the rebuild should still be refused.

Actual result
=============

The rebuild is allowed.
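The bypass can be modeled in a few lines. Before the fix, the API persisted the new image ref on the instance up front and skipped the scheduler whenever the requested image matched instance.image_ref. The following is a simplified, self-contained sketch; the names and control flow are illustrative, not nova's actual code:

```python
class Instance:
    def __init__(self, image_ref):
        self.image_ref = image_ref

def rebuild(instance, new_image_ref, run_scheduler):
    # Pre-fix behavior: an "unchanged" image skips the scheduler entirely.
    if new_image_ref == instance.image_ref:
        return "REBUILT"                      # scheduler bypassed
    instance.image_ref = new_image_ref        # saved up front, never rolled back
    if not run_scheduler(new_image_ref):
        return "ERROR"                        # refused, but image_ref is now stale
    return "REBUILT"

inst = Instance("cirros-id")
refuse = lambda image: False                  # scheduler always refuses the new image
first = rebuild(inst, "centos-id", refuse)    # "ERROR": scheduler refuses
second = rebuild(inst, "centos-id", refuse)   # "REBUILT": scheduler bypassed
```

Because the first, refused rebuild leaves instance.image_ref pointing at the forbidden image, the second request looks like a no-op image change and never reaches the scheduler.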

Environment
===========

1. Exact version of OpenStack you are running:

   Reported against Red Hat OpenStack 12; affects Newton through master.

2. Which hypervisor did you use?

   libvirt+kvm

[1] https://review.openstack.org/#/c/536268/

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem) on 2018-01-29
Changed in nova:
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

This is also an issue in Newton, but Newton is near end of life upstream, so we won't fix it there.

Reviewed: https://review.openstack.org/538961
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4a2c9a4887a219a6d4dfe83c430b040713fc4109
Submitter: Zuul
Branch: master

commit 4a2c9a4887a219a6d4dfe83c430b040713fc4109
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 29 10:50:36 2018 -0500

    Rollback instance.image_ref on failed rebuild

    When rebuilding and changing the image, we run the new image
    through the scheduler to see if it's valid for the instance
    on its current compute host. The API saves off the new image
    ref on the instance before casting to conductor to run through
    the scheduler. If the scheduler fails, the instance.image_ref was
    not being rolled back, which meant a user could attempt the rebuild
    with the same invalid image a second time and the API, seeing the
    instance.image_ref hasn't changed (even though it's not the actual
    backing image for the server), will bypass the scheduler and rebuild
    the instance with that invalid image.

    This fixes the issue by using the original image ref, passed from
    API to conductor during rebuild, to reset the instance.image_ref
    in the case of a failure.

    Note that there are other things changed on the instance in the API
    which this patch does not attempt to recover as that's a bigger
    work item which likely involves substantial refactoring of the code.

    Closes-Bug: #1746032

    Change-Id: I3399a66fe9b1297cd6b0dca440145393ceaef41f
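In the same simplified terms, the fix amounts to remembering the original image ref (passed from the API to the conductor) and restoring it when the scheduler refuses the rebuild. This is an illustrative sketch, not the actual nova patch:

```python
class Instance:
    def __init__(self, image_ref):
        self.image_ref = image_ref

def rebuild(instance, new_image_ref, run_scheduler):
    orig_image_ref = instance.image_ref       # passed from API to conductor
    if new_image_ref == instance.image_ref:
        return "REBUILT"                      # genuinely unchanged image
    instance.image_ref = new_image_ref
    if not run_scheduler(new_image_ref):
        # Roll back so a repeated request cannot bypass the scheduler.
        instance.image_ref = orig_image_ref
        return "ERROR"
    return "REBUILT"

inst = Instance("cirros-id")
refuse = lambda image: False                  # scheduler always refuses the new image
first = rebuild(inst, "centos-id", refuse)    # "ERROR": image_ref restored
second = rebuild(inst, "centos-id", refuse)   # "ERROR" again: still refused
```

With the rollback in place, every retry with the forbidden image goes back through the scheduler and is refused again.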

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 17.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/539003
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=834adeae9a3ff1bb87f22066131d48230ef96b69
Submitter: Zuul
Branch: stable/pike

commit 834adeae9a3ff1bb87f22066131d48230ef96b69
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 29 10:50:36 2018 -0500

    Rollback instance.image_ref on failed rebuild

    When rebuilding and changing the image, we run the new image
    through the scheduler to see if it's valid for the instance
    on its current compute host. The API saves off the new image
    ref on the instance before casting to conductor to run through
    the scheduler. If the scheduler fails, the instance.image_ref was
    not being rolled back, which meant a user could attempt the rebuild
    with the same invalid image a second time and the API, seeing the
    instance.image_ref hasn't changed (even though it's not the actual
    backing image for the server), will bypass the scheduler and rebuild
    the instance with that invalid image.

    This fixes the issue by using the original image ref, passed from
    API to conductor during rebuild, to reset the instance.image_ref
    in the case of a failure.

    Note that there are other things changed on the instance in the API
    which this patch does not attempt to recover as that's a bigger
    work item which likely involves substantial refactoring of the code.

    Closes-Bug: #1746032

    Conflicts:
          nova/conductor/manager.py

    NOTE(mriedem): The conflict is due to not having change
    Ibc44e3b2261b314bb92062a88ca9ee6b81298dc3 in Pike. Also, six
    had to be imported in the functional test.

    Change-Id: I3399a66fe9b1297cd6b0dca440145393ceaef41f
    (cherry picked from commit 4a2c9a4887a219a6d4dfe83c430b040713fc4109)

Reviewed: https://review.openstack.org/539008
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2efe3f6b8844bd328bbe12eeac3fae10be159542
Submitter: Zuul
Branch: stable/ocata

commit 2efe3f6b8844bd328bbe12eeac3fae10be159542
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 29 10:50:36 2018 -0500

    Rollback instance.image_ref on failed rebuild

    When rebuilding and changing the image, we run the new image
    through the scheduler to see if it's valid for the instance
    on its current compute host. The API saves off the new image
    ref on the instance before casting to conductor to run through
    the scheduler. If the scheduler fails, the instance.image_ref was
    not being rolled back, which meant a user could attempt the rebuild
    with the same invalid image a second time and the API, seeing the
    instance.image_ref hasn't changed (even though it's not the actual
    backing image for the server), will bypass the scheduler and rebuild
    the instance with that invalid image.

    This fixes the issue by using the original image ref, passed from
    API to conductor during rebuild, to reset the instance.image_ref
    in the case of a failure.

    Note that there are other things changed on the instance in the API
    which this patch does not attempt to recover as that's a bigger
    work item which likely involves substantial refactoring of the code.

    Closes-Bug: #1746032

    Conflicts:
          nova/conductor/manager.py
          nova/tests/functional/test_servers.py

    NOTE(mriedem): The conflicts in manager.py are due to not having
    I06d78c744fa75ae5f34c5cfa76bc3c9460767b84 in Ocata. The functional
    test conflict is due to tests that existed in Pike which don't exist
    in Ocata.

    Change-Id: I3399a66fe9b1297cd6b0dca440145393ceaef41f
    (cherry picked from commit 4a2c9a4887a219a6d4dfe83c430b040713fc4109)
    (cherry picked from commit 834adeae9a3ff1bb87f22066131d48230ef96b69)

This issue was fixed in the openstack/nova 16.1.1 release.

This issue was fixed in the openstack/nova 15.1.1 release.
