By rebuilding twice with the same "forbidden" image one can circumvent scheduler rebuild restrictions

Bug #1746032 reported by Artom Lifshitz on 2018-01-29
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Matt Riedemann
nova (Newton)              Undecided    Unassigned
nova (Ocata)               High         Matt Riedemann
nova (Pike)                High         Matt Riedemann

Bug Description

Description
===========

Since CVE-2017-16239, we call the scheduler when doing a rebuild with a new image. If the scheduler refuses a rebuild because a filter forbids the new image on the instance's host (for example, IsolatedHostsFilter), at first there was no indication of this in the API (bug 1744325). Currently, with the fix for bug 1744325 merged [1], the instance goes to ERROR to indicate the refused rebuild. However, by rebuilding again with the same "forbidden" image, it is possible to circumvent the scheduler restrictions.
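A minimal sketch of the decision IsolatedHostsFilter makes for the configuration used in the reproduction steps below. This is a standalone simplification, not nova's actual filter code (which lives under nova.scheduler.filters and reads these values from nova.conf); names and values are taken from the steps:

```python
# Simplified stand-in for IsolatedHostsFilter, using the sample
# configuration from the reproduction steps (assumed values).
ISOLATED_IMAGES = {"41d3e5ca-14cf-436c-9413-4826b5c8bdb1"}  # cirros
ISOLATED_HOSTS = {"ubuntu"}
RESTRICT_ISOLATED_HOSTS_TO_ISOLATED_IMAGES = True

def host_passes(host, image_ref):
    """Return True if `host` may run an instance backed by `image_ref`."""
    image_isolated = image_ref in ISOLATED_IMAGES
    host_isolated = host in ISOLATED_HOSTS
    if RESTRICT_ISOLATED_HOSTS_TO_ISOLATED_IMAGES:
        # Isolated hosts accept only isolated images, and isolated
        # images may only land on isolated hosts.
        return image_isolated == host_isolated
    # Otherwise only isolated images are pinned to isolated hosts.
    return (not image_isolated) or host_isolated
```

With only the isolated host "ubuntu" available, rebuilding onto the non-isolated centos image should therefore always be refused.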

Steps to reproduce
==================

1. Configure IsolatedHostsFilter:

   [filter_scheduler]
   enabled_filters = [...],IsolatedHostsFilter
   isolated_images = 41d3e5ca-14cf-436c-9413-4826b5c8bdb1
   isolated_hosts = ubuntu
   restrict_isolated_hosts_to_isolated_images = true

2. Have two images, one isolated and one not:

   $ openstack image list

     8d0581a5-ed9d-4b98-a766-a41efbc99929 | centos | active
     41d3e5ca-14cf-436c-9413-4826b5c8bdb1 | cirros-0.3.5-x86_64-disk | active

     cirros is the isolated one

3. Have only one hypervisor (the isolated one):

   $ openstack hypervisor list

     ubuntu | QEMU | 192.168.100.194 | up

4. Boot a cirros (isolated) image:

   $ openstack server create \
     --image 41d3e5ca-14cf-436c-9413-4826b5c8bdb1 \
     --flavor m1.nano \
     cirros-test-expect-success

   $ openstack server list

     cirros-test-expect-success | ACTIVE | [...] | cirros-0.3.5-x86_64-disk | m1.nano

5. Rebuild the cirros instance with centos (this should be refused by the scheduler):

   $ nova --debug rebuild cirros-test-expect-success centos

     DEBUG (session:722) POST call to compute for
     http://192.168.100.194/compute/v2.1/servers/d9d98bf7-623e-4587-b82c-06f36abf59cb/action
     used request id req-c234346a-6e05-47cf-a0cd-45f89d11e15d

6. Observe the instance going to ERROR,
   but still showing the new centos image:

   $ nova show cirros-test-expect-success

     [...]
     status | ERROR
     image | centos (8d0581a5-ed9d-4b98-a766-a41efbc99929)
     [...]

7. Rebuild again with the same centos image:

   $ nova rebuild cirros-test-expect-success centos

8. The rebuild goes through.

Expected result
===============

At step 8, the rebuild should still be refused.

Actual result
=============

The rebuild is allowed.
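The bypass can be modeled in a few lines. Before the fix, the API persisted the new image ref on the instance up front and skipped the scheduler whenever the requested image matched instance.image_ref. The following is a simplified, self-contained sketch; the names and control flow are illustrative, not nova's actual code:

```python
class Instance:
    def __init__(self, image_ref):
        self.image_ref = image_ref

def rebuild(instance, new_image_ref, run_scheduler):
    # Pre-fix behavior: an "unchanged" image skips the scheduler entirely.
    if new_image_ref == instance.image_ref:
        return "REBUILT"                      # scheduler bypassed
    instance.image_ref = new_image_ref        # saved up front, never rolled back
    if not run_scheduler(new_image_ref):
        return "ERROR"                        # refused, but image_ref is now stale
    return "REBUILT"

inst = Instance("cirros-id")
refuse = lambda image: False                  # scheduler always refuses the new image
first = rebuild(inst, "centos-id", refuse)    # "ERROR": scheduler refuses
second = rebuild(inst, "centos-id", refuse)   # "REBUILT": scheduler bypassed
```

Because the first, refused rebuild leaves instance.image_ref pointing at the forbidden image, the second request looks like a no-op image change and never reaches the scheduler.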

Environment
===========

1. Exact version of OpenStack you are running:

   Reported against Red Hat OpenStack 12; affects Newton through master.

2. Which hypervisor did you use?

   libvirt+kvm

[1] https://review.openstack.org/#/c/536268/

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem) on 2018-01-29
Changed in nova:
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

This is also an issue in Newton, but Newton is near end of life upstream, so we won't fix it there.

Reviewed: https://review.openstack.org/538961
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4a2c9a4887a219a6d4dfe83c430b040713fc4109
Submitter: Zuul
Branch: master

commit 4a2c9a4887a219a6d4dfe83c430b040713fc4109
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 29 10:50:36 2018 -0500

    Rollback instance.image_ref on failed rebuild

    When rebuilding and changing the image, we run the new image
    through the scheduler to see if it's valid for the instance
    on its current compute host. The API saves off the new image
    ref on the instance before casting to conductor to run through
    the scheduler. If the scheduler fails, the instance.image_ref was
    not being rolled back, which meant a user could attempt the rebuild
    with the same invalid image a second time and the API, seeing the
    instance.image_ref hasn't changed (even though it's not the actual
    backing image for the server), will bypass the scheduler and rebuild
    the instance with that invalid image.

    This fixes the issue by using the original image ref, passed from
    API to conductor during rebuild, to reset the instance.image_ref
    in the case of a failure.

    Note that there are other things changed on the instance in the API
    which this patch does not attempt to recover as that's a bigger
    work item which likely involves substantial refactoring of the code.

    Closes-Bug: #1746032

    Change-Id: I3399a66fe9b1297cd6b0dca440145393ceaef41f
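In the same simplified terms, the fix amounts to remembering the original image ref (passed from the API to the conductor) and restoring it when the scheduler refuses the rebuild. This is an illustrative sketch, not the actual nova patch:

```python
class Instance:
    def __init__(self, image_ref):
        self.image_ref = image_ref

def rebuild(instance, new_image_ref, run_scheduler):
    orig_image_ref = instance.image_ref       # passed from API to conductor
    if new_image_ref == instance.image_ref:
        return "REBUILT"                      # genuinely unchanged image
    instance.image_ref = new_image_ref
    if not run_scheduler(new_image_ref):
        # Roll back so a repeated request cannot bypass the scheduler.
        instance.image_ref = orig_image_ref
        return "ERROR"
    return "REBUILT"

inst = Instance("cirros-id")
refuse = lambda image: False                  # scheduler always refuses the new image
first = rebuild(inst, "centos-id", refuse)    # "ERROR": image_ref restored
second = rebuild(inst, "centos-id", refuse)   # "ERROR" again: still refused
```

With the rollback in place, every retry with the forbidden image goes back through the scheduler and is refused again.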

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 17.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/539003
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=834adeae9a3ff1bb87f22066131d48230ef96b69
Submitter: Zuul
Branch: stable/pike

commit 834adeae9a3ff1bb87f22066131d48230ef96b69
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 29 10:50:36 2018 -0500

    Rollback instance.image_ref on failed rebuild

    When rebuilding and changing the image, we run the new image
    through the scheduler to see if it's valid for the instance
    on its current compute host. The API saves off the new image
    ref on the instance before casting to conductor to run through
    the scheduler. If the scheduler fails, the instance.image_ref was
    not being rolled back, which meant a user could attempt the rebuild
    with the same invalid image a second time and the API, seeing the
    instance.image_ref hasn't changed (even though it's not the actual
    backing image for the server), will bypass the scheduler and rebuild
    the instance with that invalid image.

    This fixes the issue by using the original image ref, passed from
    API to conductor during rebuild, to reset the instance.image_ref
    in the case of a failure.

    Note that there are other things changed on the instance in the API
    which this patch does not attempt to recover as that's a bigger
    work item which likely involves substantial refactoring of the code.

    Closes-Bug: #1746032

    Conflicts:
          nova/conductor/manager.py

    NOTE(mriedem): The conflict is due to not having change
    Ibc44e3b2261b314bb92062a88ca9ee6b81298dc3 in Pike. Also, six
    had to be imported in the functional test.

    Change-Id: I3399a66fe9b1297cd6b0dca440145393ceaef41f
    (cherry picked from commit 4a2c9a4887a219a6d4dfe83c430b040713fc4109)

Reviewed: https://review.openstack.org/539008
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2efe3f6b8844bd328bbe12eeac3fae10be159542
Submitter: Zuul
Branch: stable/ocata

commit 2efe3f6b8844bd328bbe12eeac3fae10be159542
Author: Matt Riedemann <email address hidden>
Date: Mon Jan 29 10:50:36 2018 -0500

    Rollback instance.image_ref on failed rebuild

    When rebuilding and changing the image, we run the new image
    through the scheduler to see if it's valid for the instance
    on its current compute host. The API saves off the new image
    ref on the instance before casting to conductor to run through
    the scheduler. If the scheduler fails, the instance.image_ref was
    not being rolled back, which meant a user could attempt the rebuild
    with the same invalid image a second time and the API, seeing the
    instance.image_ref hasn't changed (even though it's not the actual
    backing image for the server), will bypass the scheduler and rebuild
    the instance with that invalid image.

    This fixes the issue by using the original image ref, passed from
    API to conductor during rebuild, to reset the instance.image_ref
    in the case of a failure.

    Note that there are other things changed on the instance in the API
    which this patch does not attempt to recover as that's a bigger
    work item which likely involves substantial refactoring of the code.

    Closes-Bug: #1746032

    Conflicts:
          nova/conductor/manager.py
          nova/tests/functional/test_servers.py

    NOTE(mriedem): The conflicts in manager.py are due to not having
    I06d78c744fa75ae5f34c5cfa76bc3c9460767b84 in Ocata. The functional
    test conflict is due to tests that existed in Pike which don't exist
    in Ocata.

    Change-Id: I3399a66fe9b1297cd6b0dca440145393ceaef41f
    (cherry picked from commit 4a2c9a4887a219a6d4dfe83c430b040713fc4109)
    (cherry picked from commit 834adeae9a3ff1bb87f22066131d48230ef96b69)

This issue was fixed in the openstack/nova 16.1.1 release.

This issue was fixed in the openstack/nova 15.1.1 release.
