Do not recreate libvirt secret when one already exists on the host during a host reboot

Bug #1905701 reported by Lee Yarwood
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Medium       Lee Yarwood
  Queens                   Undecided    Lee Yarwood
  Rocky                    Undecided    Lee Yarwood
  Stein                    Undecided    Lee Yarwood
  Train                    Undecided    Lee Yarwood
  Ussuri                   Undecided    Lee Yarwood
  Victoria                 Undecided    Lee Yarwood
  Wallaby                  Undecided    Unassigned
  Xena                     Medium       Lee Yarwood

Bug Description

Description
===========

When [compute]/resume_guests_state_on_host_boot is enabled the compute manager will attempt to restart instances on startup.

When using the libvirt driver with instances that have LUKSv1 encrypted volumes attached, a call is made to _attach_encryptor that currently assumes no libvirt secrets for those volumes already exist on the host. As a result, the call attempts to look up the encryption metadata, which fails because the compute service is driving the restart of the instances with a bare-bones, local-only admin context.

The libvirt secrets associated with LUKSv1 encrypted volumes actually persist across a host reboot, so the calls to fetch the encryption metadata, fetch the symmetric key, etc. are not required here. Removing these calls in this context should allow the compute service to start instances with such volumes attached.
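
The reuse described above can be sketched as follows. This is a simplified, hypothetical illustration (not Nova's actual code; `find_volume_secret` and `create_volume_secret` are assumed helper names): when the secret survived the reboot, the key manager is never contacted, so the bare-bones admin context is never exercised.

```python
# Hypothetical sketch of reusing a persisted libvirt secret instead of
# calling out to the key manager (Barbican) on host boot.

def attach_encryptor(host, key_manager, context, volume_id, key_id):
    """Attach a LUKSv1 encryptor, skipping key retrieval when the
    libvirt secret for the volume already exists on the host."""
    secret = host.find_volume_secret(volume_id)  # assumed helper
    if secret is not None:
        # Secret persisted across the reboot; no external call needed.
        return secret
    # Cold path: fetch the symmetric key and (re)create the secret.
    key = key_manager.get(context, key_id)
    return host.create_volume_secret(volume_id, key)
```
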

Steps to reproduce
==================
* Enable [compute]/resume_guests_state_on_host_boot
* Launch instances with encrypted LUKSv1 volumes attached
* Reboot the underlying host

Expected result
===============
* The instances are restarted successfully by Nova as no external calls are made and the existing libvirt secret for any encrypted LUKSv1 volume is reused.

Actual result
=============
* The instances fail to restart as the initial calls made by the Nova service use an empty admin context without a service catalog, etc.

Environment
===========
1. Exact version of OpenStack you are running. See the following

   master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   libvirt + QEMU/KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   N/A

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============

2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1641, in _connect_volume
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] self._attach_encryptor(context, connection_info, encryption)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1760, in _attach_encryptor
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] key = keymgr.get(context, encryption['encryption_key_id'])
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 575, in get
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] secret = self._get_secret(context, managed_object_id)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 545, in _get_secret
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] barbican_client = self._get_barbican_client(context)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 142, in _get_barbican_client
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] self._barbican_endpoint)
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 214, in _create_base_url
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] service_type='key-manager')
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] File "/usr/lib/python3.6/site-packages/keystoneauth1/access/service_catalog.py", line 425, in endpoint_data_for
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] raise exceptions.EmptyCatalog('The service catalog is empty.')
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] keystoneauth1.exceptions.catalog.EmptyCatalog: The service catalog is empty.
2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]
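
The EmptyCatalog failure at the bottom of the traceback can be illustrated with a minimal, stdlib-only sketch (an assumed simplification of keystoneauth1's behaviour, not its real code): endpoint lookup has nothing to work with when the context carries no service catalog, which is exactly the state of the local-only admin context nova-compute builds during host boot.

```python
# Assumed simplification: why an empty service catalog makes the
# Barbican endpoint lookup fail during host boot.

class EmptyCatalog(Exception):
    pass

def endpoint_for(service_catalog, service_type):
    """Return the URL for service_type from a (possibly empty) catalog."""
    if not service_catalog:
        raise EmptyCatalog('The service catalog is empty.')
    for entry in service_catalog:
        if entry['type'] == service_type:
            return entry['url']
    raise EmptyCatalog('No entry for %s.' % service_type)

# A normal request context carries the user's catalog (example URL):
catalog = [{'type': 'key-manager', 'url': 'http://barbican.example:9311'}]
assert endpoint_for(catalog, 'key-manager') == 'http://barbican.example:9311'
```
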

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :
Changed in nova:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Lee Yarwood (lyarwood)
Revision history for this message
melanie witt (melwitt) wrote :
no longer affects: nova/trunk
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 22.2.1

This issue was fixed in the openstack/nova 22.2.1 release.

Revision history for this message
Lee Yarwood (lyarwood) wrote :

So this isn't enough by itself to avoid the failure case listed in c#0: the call to resume_state_on_host_boot in turn calls _hard_reboot, which always deletes the volume secret, rendering the optimisation landed above useless.
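
The call chain described above can be sketched as follows (an assumed simplification: the method names follow the real nova libvirt driver, the bodies are stand-ins) to show how the secret is lost before the optimisation ever gets a chance to reuse it:

```python
# Sketch of the pre-fix behaviour: resume_state_on_host_boot funnels
# into _hard_reboot, which unconditionally destroys volume secrets.

class LibvirtDriverSketch:
    def __init__(self):
        # Pretend one LUKSv1 secret persisted across the host reboot.
        self.secrets = {'vol1': 'luks-secret'}

    def _destroy(self):
        # Pre-fix: destroying the domain always wiped the secrets too.
        self.secrets.clear()

    def _hard_reboot(self):
        self._destroy()
        # ...domain is recreated; re-attaching the encryptor now needs
        # Barbican again because the secret is gone.

    def resume_state_on_host_boot(self):
        self._hard_reboot()
```
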

It's pretty easy to reproduce this using the demo user account in devstack:

$ . openrc admin admin
$ openstack volume type create --encryption-provider luks --encryption-cipher aes-xts-plain64 --encryption-key-size 256 --encryption-control-location front-end LUKS

$ . openrc demo demo
$ openstack volume create --size 1 --type LUKS test
$ openstack server create --image cirros-0.5.1-x86_64-disk --flavor 1 --network private test
$ openstack server add volume test test

$ . openrc admin admin
$ openstack server reboot --hard test
$ openstack server event list f65c96c6-f63f-42b3-8e00-fff5b24daa35
+------------------------------------------+--------------------------------------+---------------+----------------------------+
| Request ID                               | Server ID                            | Action        | Start Time                 |
+------------------------------------------+--------------------------------------+---------------+----------------------------+
| req-d22d8d5a-a090-4f03-a246-a4c4487319aa | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | reboot | 2021-05-27T09:42:56.000000 |
| req-e8ab2b76-00a4-4c3c-9616-c1437acd17db | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | attach_volume | 2021-05-27T09:41:52.000000 |
| req-2314c5c8-1584-4d7e-9044-78bcececb459 | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | create | 2021-05-27T09:41:43.000000 |
+------------------------------------------+--------------------------------------+---------------+----------------------------+

$ openstack server event show f65c96c6-f63f-42b3-8e00-fff5b24daa35 req-d22d8d5a-a090-4f03-a246-a4c4487319aa -f json -c events | awk '{gsub("\\\\n","\n")};1'
{
  "events": [
    {
      "event": "compute_reboot_instance",
      "start_time": "2021-05-27T09:42:56.000000",
      "finish_time": "2021-05-27T09:42:59.000000",
      "result": "Error",
      "traceback": " File \"/opt/stack/nova/nova/compute/utils.py\", line 1434, in decorated_function
    return function(self, context, *args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 211, in decorated_function
    compute_utils.add_instance_fault_from_exc(context,
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/compute/manager.py\", line 200, in decorated_function
    return function(self, context, *args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3709, in reboot_instance
    do_reboot_instance(context, instance, block_device_info, reboot_type)
  File \"/usr/local/lib/python3.8/site-packages/oslo_concurrency/lockutils.py\", line 360, in inner
    return f(*args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3707, in do_reboot_instance
    self._reboot_instance(context, instance, block_device_info,
  File \"/...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/793463
Committed: https://opendev.org/openstack/nova/commit/26d65fc882e42b824409dff87ff026dee1debe20
Submitter: "Zuul (22348)"
Branch: master

commit 26d65fc882e42b824409dff87ff026dee1debe20
Author: Lee Yarwood <email address hidden>
Date: Thu May 27 16:47:26 2021 +0100

    libvirt: Do not destroy volume secrets during _hard_reboot

    Ia2007bc63ef09931ea0197cef29d6a5614ed821a unfortunately missed that
    resume_state_on_host_boot calls down into _hard_reboot always removing
    volume secrets rendering that change useless.

    This change seeks to address this by using the destroy_secrets kwarg
    introduced by I856268b371f7ba712b02189db3c927cd762a4dc3 within the
    _hard_reboot method of the libvirt driver to ensure secrets are not
    removed during a hard reboot.

    This resolves the original issue in bug #1905701 *and* allows admins to
    hard reboot a user's instance when that instance has encrypted volumes
    attached with secrets stored in Barbican. This latter use case being
    something we can easily test within tempest unlike the compute reboot in
    bug #1905701.

    This change is kept small as it should ideally be backported alongside
    Ia2007bc63ef09931ea0197cef29d6a5614ed821a to stable/queens. Follow up
    changes on master will improve formatting, doc text and introduce
    functional tests to further validate this new behaviour of hard reboot
    within the libvirt driver.

    Closes-Bug: #1905701
    Change-Id: I3d1b21ba6eb3f5eb728693197c24b4b315eef821
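
The fix in this commit can be sketched as follows (an assumed simplification: `destroy_secrets` comes from the commit message, the bodies are stand-ins): _hard_reboot now passes `destroy_secrets=False` into the destroy path, so the persisted secret survives both an admin-initiated hard reboot and the host-boot resume path.

```python
# Sketch of the fixed behaviour: the destroy_secrets kwarg lets
# _hard_reboot keep volume secrets across the destroy/recreate cycle.

class LibvirtDriverSketch:
    def __init__(self):
        self.secrets = {'vol1': 'luks-secret'}

    def _destroy(self, destroy_secrets=True):
        if destroy_secrets:
            self.secrets.clear()

    def _hard_reboot(self):
        # The fix: do not remove volume secrets during a hard reboot.
        self._destroy(destroy_secrets=False)

    def resume_state_on_host_boot(self):
        self._hard_reboot()
```
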

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/796258

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/796260

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/796264

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/796939

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/796258
Committed: https://opendev.org/openstack/nova/commit/9cac2a8822ab81b7a0aa1f5b4472b306e4b68f93
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 9cac2a8822ab81b7a0aa1f5b4472b306e4b68f93
Author: Lee Yarwood <email address hidden>
Date: Thu May 27 16:47:26 2021 +0100

    libvirt: Do not destroy volume secrets during _hard_reboot

    Ia2007bc63ef09931ea0197cef29d6a5614ed821a unfortunately missed that
    resume_state_on_host_boot calls down into _hard_reboot always removing
    volume secrets rendering that change useless.

    This change seeks to address this by using the destroy_secrets kwarg
    introduced by I856268b371f7ba712b02189db3c927cd762a4dc3 within the
    _hard_reboot method of the libvirt driver to ensure secrets are not
    removed during a hard reboot.

    This resolves the original issue in bug #1905701 *and* allows admins to
    hard reboot a user's instance when that instance has encrypted volumes
    attached with secrets stored in Barbican. This latter use case being
    something we can easily test within tempest unlike the compute reboot in
    bug #1905701.

    This change is kept small as it should ideally be backported alongside
    Ia2007bc63ef09931ea0197cef29d6a5614ed821a to stable/queens. Follow up
    changes on master will improve formatting, doc text and introduce
    functional tests to further validate this new behaviour of hard reboot
    within the libvirt driver.

    Closes-Bug: #1905701
    Change-Id: I3d1b21ba6eb3f5eb728693197c24b4b315eef821
    (cherry picked from commit 26d65fc882e42b824409dff87ff026dee1debe20)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.1.0

This issue was fixed in the openstack/nova 23.1.0 release.
