Bug #1724573 “encrypted volumes are directly attached to instanc...” : Bugs : OpenStack Compute (nova)

Lee Yarwood (lyarwood) wrote on 2017-10-20:

#1

Huh, so this is actually reproducible by just using `nova stop $instance ; nova start $instance` after the host reboot. I missed this when initially creating the bug, as I assumed it was due to the state the host was left in after resume_guests_state_on_host_boot attempted and failed to run the instance.

IMHO resume_guests_state_on_host_boot is just broken and not tested anywhere. I'd like to re-purpose this bug to tackle the simpler stop;start use case, which should be fixed by the following change:

https://review.openstack.org/#/c/400384/

OpenStack Infra (hudson-openstack) on 2017-10-20

Changed in nova:
assignee:	nobody → Lee Yarwood (lyarwood)
status:	New → In Progress

OpenStack Infra (hudson-openstack) on 2017-11-22

Changed in nova:
assignee:	Lee Yarwood (lyarwood) → Matthew Booth (mbooth-9)

melanie witt (melwitt) on 2017-11-29

summary:
	- When using resume_guests_state_on_host_boot encrypted volumes are directly attached to instances after a host reboot
	+ encrypted volumes are directly attached to instances after a compute host reboot
description:	updated

melanie witt (melwitt) on 2017-11-29

tags:	added: libvirt
tags:	added: volumes
Changed in nova:
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2017-12-07: Fix merged to nova (master)

#2

Reviewed: https://review.openstack.org/400384
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3f8daf080411b84ec0669f0642524ce8a7d19057
Submitter: Zuul
Branch: master

commit 3f8daf080411b84ec0669f0642524ce8a7d19057
Author: Lee Yarwood <lyarwood@redhat.com>
Date: Mon Nov 21 15:29:30 2016 +0000

libvirt: Re-initialise volumes, encryptors, and vifs on hard reboot

    We call _hard_reboot during reboot, power_on, and
    resume_state_on_host_boot. It functions essentially by tearing as much
    of an instance as possible before recreating it, which additionally
    makes it useful to operators for attempting automated recovery of
    instances in an inconsistent state.

    The Libvirt driver would previously only call _destroy and
    _undefine_domain when hard rebooting an instance. This would leave vifs
    plugged, volumes connected, and encryptors attached on the host. It
    also means that when we try to restart the instance, we assume all
    these things are correctly configured. If they are not, the instance
    may fail to start at all, or may be incorrectly configured when
    starting.

    For example, consider an instance with an encrypted volume after a
    compute host reboot. When we attempt to start the instance, power_on
    will call _hard_reboot. The volume will be coincidentally re-attached
    as a side-effect of calling _get_guest_xml(!), but when we call
    _create_domain_and_network we pass reboot=True, which tells it not to
    reattach the encryptor, as it is assumed to be already attached. We
    are therefore left presenting the encrypted volume data directly to
    the instance without decryption.

    The approach in this patch is to ensure we recreate the instance as
    fully as possible during hard reboot. This means not passing
    vifs_already_plugged and reboot to _create_domain_and_network, which
    in turn requires that we fully destroy the instance first. This
    addresses the specific problem given in the example, but also a whole
    class of potential volume and vif related issues of inconsistent
    state.

    Because we now always tear down volumes, encryptors, and vifs, we are
    relying on the tear down of these things to be idempotent. This
    highlighted that detach of the luks and cryptsetup encryptors were not
    idempotent. We depend on the fixes for those os-brick drivers.

Depends-On: I31d72357c89db53a147c2d986a28c9c6870efad0
Depends-On: I9f52f89b8466d03699cfd5c0e32c672c934cd6fb

Closes-bug: #1724573
Change-Id: Id188d48609f3d22d14e16c7f6114291d547a8986

Changed in nova:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2017-12-07: Fix included in openstack/nova 17.0.0.0b2

#3

This issue was fixed in the openstack/nova 17.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote on 2018-01-05: Fix proposed to nova (stable/pike)

#4

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/531407

OpenStack Infra (hudson-openstack) wrote on 2018-01-05: Fix proposed to nova (stable/ocata)

#5

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/531422

OpenStack Infra (hudson-openstack) wrote on 2018-02-01: Fix merged to nova (stable/pike)

#6

Reviewed: https://review.openstack.org/531407
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7da74a094f3db44db7abbdb01a88a4b46a59ac0a
Submitter: Zuul
Branch: stable/pike

commit 7da74a094f3db44db7abbdb01a88a4b46a59ac0a
Author: Lee Yarwood <lyarwood@redhat.com>
Date: Mon Nov 21 15:29:30 2016 +0000

libvirt: Re-initialise volumes, encryptors, and vifs on hard reboot

    We call _hard_reboot during reboot, power_on, and
    resume_state_on_host_boot. It functions essentially by tearing as much
    of an instance as possible before recreating it, which additionally
    makes it useful to operators for attempting automated recovery of
    instances in an inconsistent state.

    The Libvirt driver would previously only call _destroy and
    _undefine_domain when hard rebooting an instance. This would leave vifs
    plugged, volumes connected, and encryptors attached on the host. It
    also means that when we try to restart the instance, we assume all
    these things are correctly configured. If they are not, the instance
    may fail to start at all, or may be incorrectly configured when
    starting.

    For example, consider an instance with an encrypted volume after a
    compute host reboot. When we attempt to start the instance, power_on
    will call _hard_reboot. The volume will be coincidentally re-attached
    as a side-effect of calling _get_guest_xml(!), but when we call
    _create_domain_and_network we pass reboot=True, which tells it not to
    reattach the encryptor, as it is assumed to be already attached. We
    are therefore left presenting the encrypted volume data directly to
    the instance without decryption.

    The approach in this patch is to ensure we recreate the instance as
    fully as possible during hard reboot. This means not passing
    vifs_already_plugged and reboot to _create_domain_and_network, which
    in turn requires that we fully destroy the instance first. This
    addresses the specific problem given in the example, but also a whole
    class of potential volume and vif related issues of inconsistent
    state.

    Because we now always tear down volumes, encryptors, and vifs, we are
    relying on the tear down of these things to be idempotent. This
    highlighted that detach of the luks and cryptsetup encryptors were not
    idempotent. We depend on the fixes for those os-brick drivers.

    NOTE(melwitt): Instead of depending on the os-brick changes to handle
    the "already detached" scenario during cleanup for the stable
    backports, we handle it in the driver since we can't bump g-r for
    stable branches.

    Closes-bug: #1724573
    Change-Id: Id188d48609f3d22d14e16c7f6114291d547a8986
    (cherry picked from commit 3f8daf080411b84ec0669f0642524ce8a7d19057)
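The NOTE in the backport says the "already detached" case is handled in the driver rather than by depending on newer os-brick. A minimal sketch of that idea follows: wrap the detach call and treat "not attached" as success, so the full teardown can run on a host where the mapping is already gone. EncryptorNotAttached, Encryptor and detach_encryptor_idempotent are hypothetical; the real driver code would match whatever specific error the os-brick encryptor raises, which is not shown on this page.

```python
# Sketch of driver-side idempotent encryptor detach (hypothetical names).


class EncryptorNotAttached(Exception):
    """Hypothetical stand-in for the error a non-idempotent detach raises."""


class Encryptor:
    """Toy encryptor whose detach fails if the mapping is already gone."""

    def __init__(self, attached: bool = True) -> None:
        self.attached = attached

    def detach_volume(self) -> None:
        if not self.attached:
            raise EncryptorNotAttached("mapping does not exist")
        self.attached = False


def detach_encryptor_idempotent(encryptor: Encryptor) -> bool:
    """Detach, treating 'already detached' as success.

    Returns True if a mapping was closed, False if there was nothing to do.
    """
    try:
        encryptor.detach_volume()
    except EncryptorNotAttached:
        return False
    return True


enc = Encryptor()
print(detach_encryptor_idempotent(enc))  # closes the mapping
print(detach_encryptor_idempotent(enc))  # second call is a no-op, not an error
```

Calling the wrapper twice prints `True` and then `False`; the second hard-reboot teardown no longer fails just because the encryptor was never re-attached after the host came back.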

tags:	added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote on 2018-02-05: Fix merged to nova (stable/ocata)

#7

Reviewed: https://review.openstack.org/531422
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eae6aa80945f727e182ecef8ae2565dae1b770af
Submitter: Zuul
Branch: stable/ocata

commit eae6aa80945f727e182ecef8ae2565dae1b770af
Author: Lee Yarwood <lyarwood@redhat.com>
Date: Mon Nov 21 15:29:30 2016 +0000

libvirt: Re-initialise volumes, encryptors, and vifs on hard reboot

    We call _hard_reboot during reboot, power_on, and
    resume_state_on_host_boot. It functions essentially by tearing as much
    of an instance as possible before recreating it, which additionally
    makes it useful to operators for attempting automated recovery of
    instances in an inconsistent state.

    The Libvirt driver would previously only call _destroy and
    _undefine_domain when hard rebooting an instance. This would leave vifs
    plugged, volumes connected, and encryptors attached on the host. It
    also means that when we try to restart the instance, we assume all
    these things are correctly configured. If they are not, the instance
    may fail to start at all, or may be incorrectly configured when
    starting.

    For example, consider an instance with an encrypted volume after a
    compute host reboot. When we attempt to start the instance, power_on
    will call _hard_reboot. The volume will be coincidentally re-attached
    as a side-effect of calling _get_guest_xml(!), but when we call
    _create_domain_and_network we pass reboot=True, which tells it not to
    reattach the encryptor, as it is assumed to be already attached. We
    are therefore left presenting the encrypted volume data directly to
    the instance without decryption.

    The approach in this patch is to ensure we recreate the instance as
    fully as possible during hard reboot. This means not passing
    vifs_already_plugged and reboot to _create_domain_and_network, which
    in turn requires that we fully destroy the instance first. This
    addresses the specific problem given in the example, but also a whole
    class of potential volume and vif related issues of inconsistent
    state.

    Because we now always tear down volumes, encryptors, and vifs, we are
    relying on the tear down of these things to be idempotent. This
    highlighted that detach of the luks and cryptsetup encryptors were not
    idempotent. We depend on the fixes for those os-brick drivers.

    NOTE(melwitt): In Ocata, we don't go through os-brick to handle detach
    of encrypted volumes and instead have our code under
    nova/volume/encryptors/. We're already ignoring exit code 4 for "not
    found" in cryptsetup.py and this makes luks.py consistent with that.
    We need to be able to ignore "already detached" encrypted volumes with
    this patch because of the re-initialization during hard reboot.

     Conflicts:
     nova/tests/unit/virt/libvirt/test_driver.py
     nova/virt/libvirt/driver.py

    NOTE(melwitt): The conflicts are due to _create_domain_and_network
    taking an additional disk_info argument in Ocata and the method for
    getting instance disk info was named _get_instance_disk_info instead
    of _get_instance...
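The Ocata NOTE says cryptsetup.py already ignores exit code 4 ("not found") and that the patch makes luks.py consistent with it. Below is a minimal sketch of that check, assuming the LUKS mapping is closed with `cryptsetup luksClose <name>` (the legacy alias of `cryptsetup close`). luks_close, run_command and the injectable runner are hypothetical; nova's own code runs the command through its privileged execute helpers, which this sketch does not model.

```python
# Sketch: close a LUKS mapping, accepting exit code 4 ("not found") as success.
import subprocess
from typing import Callable, Sequence

# Exit code the note above describes as "not found" for cryptsetup.
CRYPTSETUP_NOT_FOUND = 4

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status.

    Raises FileNotFoundError if cryptsetup is not installed.
    """
    return subprocess.run(list(cmd), check=False).returncode


def luks_close(dev_name: str, runner: Runner = run_command) -> None:
    """Close dev_name; an already-closed mapping is not an error."""
    rc = runner(["cryptsetup", "luksClose", dev_name])
    if rc not in (0, CRYPTSETUP_NOT_FOUND):
        raise RuntimeError(
            f"cryptsetup luksClose {dev_name} failed with exit code {rc}")
```

The runner parameter exists only so the logic can be exercised without root or a real dm-crypt device, for example `luks_close("crypt-vol-1", runner=lambda cmd: 4)` returns quietly, while any other non-zero status still raises.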

OpenStack Compute (nova)

encrypted volumes are directly attached to instances after a compute host reboot
