test_boot_server_from_encrypted_volume_luks cannot detach an encrypted StorPool-backed volume

Bug #1746609 reported by Peter Penchev
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Matt Riedemann

Bug Description

Hi,

First of all, thanks a lot for working on Nova!

The StorPool third-party Cinder CI has been failing on every test run today with the same problem: the test_boot_server_from_encrypted_volume_luks Tempest test fails when it tries to detach a volume, and the nova-compute service log shows this exception: "Failed to detach volume 645fd643-89fc-4b3d-9ea5-59c764fc39a2 from /dev/vdb: AttributeError: 'NoneType' object has no attribute 'format_dom'"

An example stack trace may be seen at:
- nova-compute log: http://logs.ci-openstack.storpool.com/18/539318/1/check/dsvm-tempest-storpool/c3daf58/logs/screen-n-cpu.txt.gz#_Jan_31_18_07_27_971552
- console log (with the list of tests run): http://logs.ci-openstack.storpool.com/18/539318/1/check/dsvm-tempest-storpool/c3daf58/console.html

Alternatively, start from http://logs.ci-openstack.storpool.com/ and pick any of the five or six most recent failures; each of them can be traced back to this problem.

Of course, it is completely possible that the (recently merged) StorPool Nova volume attachment driver or the (also recently merged) StorPool os-brick connector is at fault; if there are any configuration fields or method parameters that we should be preserving, passing through, or handling in some other way, please let us know and we will modify our drivers. Also, our CI system is available for testing any suggested patches or workarounds.

Thanks in advance for looking at this, and thanks for your work on Nova and OpenStack in general!

Best regards,
Peter

melanie witt (melwitt)
tags: added: libvirt volumes
Matt Riedemann (mriedem)
tags: added: queens-rc-potential
Matt Riedemann (mriedem) wrote :
Peter Penchev (openstack-dev-s) wrote :

I'm sorry, melwitt also asked (very reasonably) that I post the excerpt of the stack trace in the bug log itself:

Failed to detach volume 645fd643-89fc-4b3d-9ea5-59c764fc39a2 from /dev/vdb: AttributeError: 'NoneType' object has no attribute 'format_dom'
Traceback (most recent call last):
  File "/opt/stack/new/nova/nova/virt/block_device.py", line 300, in driver_detach
    encryption=encryption)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1556, in detach_volume
    live=live)
  File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 428, in detach_device_with_retry
    _try_detach_device(conf, persistent, live)
  File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 397, in _try_detach_device
    self.detach_device(conf, persistent=persistent, live=live)
  File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 473, in detach_device
    device_xml = conf.to_xml()
  File "/opt/stack/new/nova/nova/virt/libvirt/config.py", line 77, in to_xml
    root = self.format_dom()
  File "/opt/stack/new/nova/nova/virt/libvirt/config.py", line 831, in format_dom
    dev.append(self.encryption.format_dom())
  File "/opt/stack/new/nova/nova/virt/libvirt/config.py", line 1154, in format_dom
    obj.append(self.secret.format_dom())
AttributeError: 'NoneType' object has no attribute 'format_dom'

From a quick look at the code it seems to me that a LibvirtConfigGuestDiskEncryption object is being initialized without a "secret" field, and format_dom() expects the "secret" field to be there.
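
The failure mode described above can be reproduced in isolation. What follows is only a minimal sketch with simplified stand-in classes: FakeSecret and FakeDiskEncryption are hypothetical names, not the real classes in nova/virt/libvirt/config.py, and the XML attributes are illustrative. The only thing taken from the traceback is that format_dom() calls self.secret.format_dom() without checking for None.

```python
import xml.etree.ElementTree as ET


class FakeSecret:
    """Hypothetical stand-in for the libvirt secret config object."""

    def __init__(self, uuid):
        self.uuid = uuid

    def format_dom(self):
        elem = ET.Element("secret")
        elem.set("type", "passphrase")
        elem.set("uuid", self.uuid)
        return elem


class FakeDiskEncryption:
    """Hypothetical stand-in for LibvirtConfigGuestDiskEncryption."""

    def __init__(self, secret=None):
        self.format = "luks"
        self.secret = secret  # stays None if no secret was ever set

    def format_dom(self):
        elem = ET.Element("encryption")
        elem.set("format", self.format)
        # Mirrors the failing call in the traceback: no None check.
        elem.append(self.secret.format_dom())
        return elem


def try_format(encryption):
    """Return the serialized XML, or the error text if formatting fails."""
    try:
        return ET.tostring(encryption.format_dom(), encoding="unicode")
    except AttributeError as exc:
        return "AttributeError: %s" % exc


if __name__ == "__main__":
    print(try_format(FakeDiskEncryption()))
    print(try_format(FakeDiskEncryption(FakeSecret("example-uuid"))))
```

Under these assumptions, the object built without a secret raises the same kind of AttributeError as in the log, while the one built with a secret serializes normally.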

Best regards,
Peter

Matt Riedemann (mriedem) wrote :

Could be that we don't get here:

https://review.openstack.org/#/c/464008/10/nova/virt/libvirt/config.py@1149

I've also noticed in several places in the libvirt driver and volume code that it expects an entry for connection_info['data']['volume_id'] but that doesn't exist for the storpool volume type, as seen here:

http://logs.ci-openstack.storpool.com/18/539318/1/check/dsvm-tempest-storpool/c3daf58/logs/screen-n-cpu.txt.gz#_Jan_31_18_07_27_971552

The libvirt driver code should probably be falling back to look for the connection_info['serial'] if it can't find the volume_id, since 'serial' is something that nova puts into the connection_info on attach if it's not already there:

https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L425
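
The fallback suggested above could look roughly like the sketch below. This is an assumption-laden illustration, not the code from the eventual patch: get_volume_id is a hypothetical helper name, and the example connection_info dicts are made up to show the two shapes discussed (one with data['volume_id'], one with only 'serial').

```python
def get_volume_id(connection_info):
    """Return the volume ID for a connection_info dict, or None.

    Hypothetical helper: prefers connection_info['data']['volume_id']
    and falls back to connection_info['serial'] when it is missing.
    """
    data = connection_info.get("data") or {}
    volume_id = data.get("volume_id")
    if volume_id is None:
        volume_id = connection_info.get("serial")
    return volume_id


if __name__ == "__main__":
    # Illustrative dicts only; real connection_info contents vary by driver.
    with_volume_id = {
        "data": {"volume_id": "11111111-2222-3333-4444-555555555555"},
        "serial": "11111111-2222-3333-4444-555555555555",
    }
    serial_only = {
        "data": {},
        "serial": "645fd643-89fc-4b3d-9ea5-59c764fc39a2",
    }
    print(get_volume_id(with_volume_id))
    print(get_volume_id(serial_only))
```

Keeping the lookup in one helper means every caller gets the same fallback, instead of each call site indexing connection_info['data']['volume_id'] directly.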

Matt Riedemann (mriedem) wrote :

http://logs.ci-openstack.storpool.com/18/539318/1/check/dsvm-tempest-storpool/c3daf58/logs/screen-n-cpu.txt.gz#_Jan_31_18_07_21_994385

Jan 31 18:07:21.994385 ubuntu nova-compute[32004]: DEBUG nova.virt.libvirt.host [None req-5386395b-4fe8-4de8-9794-99a008fb66be tempest-TestEncryptedCinderVolumes-7199969 tempest-TestEncryptedCinderVolumes-7199969] Secret XML: <secret ephemeral="no" private="no">
Jan 31 18:07:21.994474 ubuntu nova-compute[32004]: <usage type="volume">
Jan 31 18:07:21.994552 ubuntu nova-compute[32004]: <volume>None</volume>
Jan 31 18:07:21.994627 ubuntu nova-compute[32004]: </usage>
Jan 31 18:07:21.994701 ubuntu nova-compute[32004]: </secret>
Jan 31 18:07:21.994931 ubuntu nova-compute[32004]: {{(pid=32004) create_secret /opt/stack/new/nova/nova/virt/libvirt/host.py:731}}
Jan 31 18:07:22.285981 ubuntu nova-compute[32004]: DEBUG nova.virt.libvirt.guest [None req-5386395b-4fe8-4de8-9794-99a008fb66be tempest-TestEncryptedCinderVolumes-7199969 tempest-TestEncryptedCinderVolumes-7199969] attach device xml: <disk type="block" device="disk">
Jan 31 18:07:22.286121 ubuntu nova-compute[32004]: <driver name="qemu" type="raw" cache="none"/>
Jan 31 18:07:22.286225 ubuntu nova-compute[32004]: <source dev="/dev/storpool/os--volume-645fd643-89fc-4b3d-9ea5-59c764fc39a2"/>
Jan 31 18:07:22.286302 ubuntu nova-compute[32004]: <target bus="virtio" dev="vdb"/>
Jan 31 18:07:22.286382 ubuntu nova-compute[32004]: <serial>645fd643-89fc-4b3d-9ea5-59c764fc39a2</serial>
Jan 31 18:07:22.286463 ubuntu nova-compute[32004]: </disk>
Jan 31 18:07:22.286540 ubuntu nova-compute[32004]: {{(pid=32004) attach_device /opt/stack/new/nova/nova/virt/libvirt/guest.py:302}}
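
One reading of the log above is that the volume ID lookup returned None and that value was then written into the secret XML as the literal text "None". The sketch below only illustrates that effect. It assumes the ID is stringified when the XML is built, it uses the standard library's xml.etree.ElementTree rather than nova's own code, and build_volume_secret_xml is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_volume_secret_xml(volume_id):
    """Build a secret definition similar in shape to the one in the log.

    Hypothetical helper; volume_id is stringified as-is, so a missing
    ID (None) ends up as the text "None" in the <volume> element.
    """
    secret = ET.Element("secret", ephemeral="no", private="no")
    usage = ET.SubElement(secret, "usage", type="volume")
    volume = ET.SubElement(usage, "volume")
    volume.text = str(volume_id)
    return ET.tostring(secret, encoding="unicode")


if __name__ == "__main__":
    print(build_volume_secret_xml(None))
    print(build_volume_secret_xml("645fd643-89fc-4b3d-9ea5-59c764fc39a2"))
```

A secret registered under "None" cannot be found again by the real volume ID, which would be consistent with an encryption config that ends up without a secret at detach time.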

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/539739

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Peter Penchev (openstack-dev-s) wrote :

Matt Riedemann's fix seems to have helped the StorPool third-party Cinder CI; here's a successful run with the change cherry-picked before running devstack: http://logs.ci-openstack.storpool.com/73/539773/1/check/dsvm-tempest-storpool/0ca4a8e/console.html (look for the "fix native luks encryption failure to find volume_id" string in the console log).

Thanks a lot for the quick analysis and the quick fix!

Best regards,
Peter

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/539739
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cafe3d066ef7021c18961d4b239a10f61db23f2d
Submitter: Zuul
Branch: master

commit cafe3d066ef7021c18961d4b239a10f61db23f2d
Author: Matt Riedemann <email address hidden>
Date: Wed Jan 31 19:06:46 2018 -0500

    libvirt: fix native luks encryption failure to find volume_id

    Not all volume types put a 'volume_id' entry in the
    connection_info['data'] dict. This change uses a new
    utility method to look up the volume_id in the connection_info
    data dict and if not found there, uses the 'serial' value
    from the connection_info, which we know at least gets set
    when the DriverVolumeBlockDevice code attaches the volume.

    This also has to update pre_live_migration since the connection_info
    dict doesn't have a 'volume_id' key in it. It's unclear what
    this code was expecting, or if it ever really worked, but since
    an attached volume represented by a BlockDeviceMapping here has
    a volume_id attribute, we can just use that. As that code path
    was never tested, this updates related unit tests and refactors
    the tests to actually use the type of DriverVolumeBlockDevice
    objects that the ComputeManager would be sending down to the
    driver pre_live_migration method. The hard-to-read squashed
    dicts in the tests are also re-formatted so a human can actually
    read them.

    Change-Id: Ie02d298cd92d5b5ebcbbcd2b0e8be01f197bfafb
    Closes-Bug: #1746609

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0rc1

This issue was fixed in the openstack/nova 17.0.0.0rc1 release candidate.
