encrypted volume is in "in-use" state while boot from volume failed

Bug #1745149 reported by Liron Kuchlani
This bug affects 1 person
Affects                    Status      Importance  Assigned to  Milestone
Cinder                     Incomplete  Undecided   Unassigned
OpenStack Compute (nova)   Invalid     Medium      Unassigned

Bug Description

Description of problem:
An encrypted bootable volume was created while encryption was not configured in Nova.
Creating a new server from that volume failed, but the volume status stayed in the "in-use" state.
(openstack) server list
+--------------------------------------+-------+--------+-----------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------+--------+-----------------------------------+--------+--------+
| ea85b1a0-4bc7-40ec-ae4b-19e3a1095a4f | inst2 | ERROR | | | tiny |
| 450255af-5341-4626-b5ef-17de749137ad | inst1 | ACTIVE | internal=192.168.0.11, 10.0.0.216 | cirros | tiny |
+--------------------------------------+-------+--------+-----------------------------------+--------+--------+
(openstack)
(openstack)
(openstack) volume list
+--------------------------------------+-------------+-----------+------+--------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+-------------+-----------+------+--------------------------------+
| bb12b367-e4e1-471a-a0b9-86b7f32be740 | encrypt_vol | in-use | 1 | Attached to inst2 on /dev/vda |
| 5c8ff52c-d7e4-4c1f-9104-b6a780539340 | vol1 | available | 1 | |
+--------------------------------------+-------------+-----------+------+--------------------------------+

Version-Release number of selected component (if applicable):
puppet-cinder-11.3.0-2.el7ost.noarch
python-cinder-11.0.1-2.el7ost.noarch
python-cinderclient-3.1.0-1.el7ost.noarch
openstack-cinder-11.0.1-2.el7ost.noarch
openstack-nova-console-16.0.2-8.el7ost.noarch
python-novaclient-9.1.1-1.el7ost.noarch
openstack-nova-compute-16.0.2-8.el7ost.noarch
openstack-nova-scheduler-16.0.2-8.el7ost.noarch
openstack-nova-api-16.0.2-8.el7ost.noarch
puppet-nova-11.4.0-2.el7ost.noarch
openstack-nova-common-16.0.2-8.el7ost.noarch
openstack-nova-placement-api-16.0.2-8.el7ost.noarch
openstack-nova-conductor-16.0.2-8.el7ost.noarch
python-nova-16.0.2-8.el7ost.noarch
openstack-nova-migration-16.0.2-8.el7ost.noarch
openstack-nova-novncproxy-16.0.2-8.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Configure encryption in Cinder but not in Nova
2. Create bootable volume from image
3. Create server from volume (a client-side sketch of these steps follows below)
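
Roughly the same steps as client calls, for reference. This is only an illustrative sketch: it assumes an authenticated keystoneauth1 session `sess`, an existing encrypted volume type named "LUKS", and an `image_id`/`flavor_id` matching the cirros image and tiny flavor from the listings above.

from cinderclient import client as cinder_client
from novaclient import client as nova_client

cinder = cinder_client.Client('3', session=sess)
nova = nova_client.Client('2.1', session=sess)

# Step 2: create a bootable volume from an image, using the encrypted volume type.
vol = cinder.volumes.create(size=1, name='encrypt_vol',
                            volume_type='LUKS', imageRef=image_id)

# Step 3: boot a server from that volume. With key_manager.fixed_key unset on
# the compute node, this build fails but the volume stays "in-use".
nova.servers.create('inst2', image='', flavor=flavor_id,
                    block_device_mapping_v2=[{
                        'uuid': vol.id,
                        'source_type': 'volume',
                        'destination_type': 'volume',
                        'boot_index': 0,
                        'delete_on_termination': False}])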

Actual results:
Server create failed but volume status is "in-use"

Expected results:
Server create failed but volume status is "available"

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Looks like a Nova problem. We need the logs from this, but my assumption is the Cinder volume is indeed attached and therefore in-use, so Cinder is working as expected.

I am not sure if the expectation on the Nova side is that they fully roll back and clean up the Cinder volume in case of a failure like this, or if it is kept around as a means to troubleshoot what happened.

Changed in cinder:
status: New → Incomplete
Revision history for this message
Liron Kuchlani (lkuchlan) wrote :

From nova-compute:
==================

ey not defined
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] Traceback (most recent call last):
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1856, in _do_build_and_run_instance
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] filter_properties)
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2035, in _build_and_run_instance
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] phase=fields.NotificationPhase.ERROR, exception=e)
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] self.force_reraise()
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] six.reraise(self.type_, self.value, self.tb)
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2010, in _build_and_run_instance
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] instance=instance)
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] self.gen.throw(type, value, traceback)
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2219, in _build_resources
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] reason=six.text_type(exc))
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702] BuildAbortException: Build of instance 1065e02d-b199-4071-8a46-93c3f0f83702 aborted: keymgr.fixed_key not defined
2018-01-25 14:36:58.781 1 ERROR nova.compute.manager [instance: 1065e02d-b199-4071-8a46-93c3f0f83702]

From volume.log:
================

Stdout: u''
Stderr: u'Device /dev/sda is not a valid LUKS device.\nCommand failed with code 22: Invalid argument\n' loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:449
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 445, in loop
    reply = s...


melanie witt (melwitt)
tags: added: volumes
Revision history for this message
melanie witt (melwitt) wrote :

This looks valid to me. During the boot process, it's failing to attach the volume because CONF.key_manager.fixed_key = None. We should handle that exception and roll back the volume status via Cinder calls as things are getting cleaned up.

2018-01-25 14:36:56.716 1 WARNING nova.volume.cinder [req-9e16e37f-6bdb-445d-8024-1184a3dd7c45 db1a02ae2f5a4f4db07c1f62958ef9e7 ef184f071981405d9071c7ca33a40303 - default default] The support for the Cinder API v2 is deprecated, please upgrade to Cinder API v3.
2018-01-25 14:36:56.717 1 DEBUG cinderclient.v2.client [req-9e16e37f-6bdb-445d-8024-1184a3dd7c45 db1a02ae2f5a4f4db07c1f62958ef9e7 ef184f071981405d9071c7ca33a40303 - default default] REQ: curl -g -i -X GET http://172.17.1.14:8776/v2/ef184f071981405d9071c7ca33a40303/volumes/58804519-e8fc-442b-ad7c-9066abf103d7/encryption -H "User-Agent: python-cinderclient" -H "X-OpenStack-Request-ID: req-9e16e37f-6bdb-445d-8024-1184a3dd7c45" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}671e4eff9977526aedde7b4cf4a18083a16178a5" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:375
2018-01-25 14:36:57.506 1 DEBUG cinderclient.v2.client [req-9e16e37f-6bdb-445d-8024-1184a3dd7c45 db1a02ae2f5a4f4db07c1f62958ef9e7 ef184f071981405d9071c7ca33a40303 - default default] RESP: [200] Date: Thu, 25 Jan 2018 14:36:56 GMT Server: Apache x-compute-request-id: req-d826157e-23b3-4d0b-947c-5688ebbdb4b4 x-openstack-request-id: req-d826157e-23b3-4d0b-947c-5688ebbdb4b4 Vary: Accept-Encoding Content-Encoding: gzip Content-Length: 151 Content-Type: application/json
RESP BODY: {"cipher": "aes-xts-plain64", "encryption_key_id": "00000000-0000-0000-0000-000000000000", "provider": "nova.volume.encryptors.luks.LuksEncryptor", "key_size": 512, "control_location": "front-end"}
 _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:423
2018-01-25 14:36:57.507 1 DEBUG cinderclient.v2.client [req-9e16e37f-6bdb-445d-8024-1184a3dd7c45 db1a02ae2f5a4f4db07c1f62958ef9e7 ef184f071981405d9071c7ca33a40303 - default default] GET call to cinderv2 for http://172.17.1.14:8776/v2/ef184f071981405d9071c7ca33a40303/volumes/58804519-e8fc-442b-ad7c-9066abf103d7/encryption used request id req-d826157e-23b3-4d0b-947c-5688ebbdb4b4 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:727
2018-01-25 14:36:57.508 1 DEBUG os_brick.encryptors [req-9e16e37f-6bdb-445d-8024-1184a3dd7c45 db1a02ae2f5a4f4db07c1f62958ef9e7 ef184f071981405d9071c7ca33a40303 - default default] Using volume encryption metadata '{u'cipher': u'aes-xts-plain64', u'encryption_key_id': u'00000000-0000-0000-0000-000000000000', u'provider': u'nova.volume.encryptors.luks.LuksEncryptor', u'key_size': 512, u'control_location': u'front-end'}' for connection: {u'driver_volume_type': u'iscsi', 'connector': {'platform': 'x86_64', 'host': 'compute-0.localdomain', 'do_local_attach': False, 'ip': '172.17.1.13', 'os_type': 'linux2', 'multipath': False, 'initiator': u'iqn.1994-05.com.redhat:8f64791b5192'}, 'serial': u'58804519-e8fc-442b-ad7c-9066abf103d7', u'data': {u'access_mode': u'rw', u'target_discovered': False, u'encrypted': True, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.opens...


Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
tags: added: compute
Revision history for this message
melanie witt (melwitt) wrote :

Adding more notes on what I think needs to be done. I did some digging and found that what's happening is the volume is connected to the instance in the libvirt driver, *then* it tries to attach the encryptor and the encryptor attach fails because the fixed_key config option hasn't been set. So the instance spawn fails but the volume remains connected. We need to try-except around the encryptor attach attempt and disconnect the volume before re-raising.

1. Create a new, more specific NovaException for the missing fixed key, for example "FixedKeyNotDefined" and correct the message (it should be "key_manager.fixed_key") [1]

2. Catch FixedKeyNotDefined in the libvirt driver when trying to attach the encryptor and disconnect the volume before re-raising [2] (a rough sketch follows below)

[1] https://github.com/openstack/nova/blob/233285d/nova/keymgr/conf_key_mgr.py#L67
[2] https://github.com/openstack/nova/blob/233285d/nova/virt/libvirt/driver.py#L1227
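
A minimal sketch of what those two items could look like. This is not the actual Nova code; the exception name and the method/helper names (_get_volume_driver, _attach_encryptor, _connect_volume) are assumptions for illustration only, and assume nova and oslo_utils are importable.

from oslo_utils import excutils

from nova import exception


class FixedKeyNotDefined(exception.NovaException):
    """Hypothetical, more specific exception for item 1 above, raised when
    key_manager.fixed_key has not been configured."""
    msg_fmt = "key_manager.fixed_key is not defined"


class LibvirtDriverSketch(object):
    """Stand-in for the libvirt driver; only the volume-connect path that is
    relevant to this bug is sketched."""

    def _get_volume_driver(self, connection_info):
        raise NotImplementedError  # provided by the real driver

    def _attach_encryptor(self, context, connection_info, encryption):
        raise NotImplementedError  # provided by the real driver

    def _connect_volume(self, context, connection_info, instance,
                        encryption=None):
        # Item 2: connect the volume first, then try to attach the encryptor;
        # if the encryptor attach fails, disconnect the volume before
        # re-raising so the volume is not left attached/"in-use".
        vol_driver = self._get_volume_driver(connection_info)
        vol_driver.connect_volume(connection_info, instance)
        try:
            self._attach_encryptor(context, connection_info, encryption)
        except FixedKeyNotDefined:
            with excutils.save_and_reraise_exception():
                vol_driver.disconnect_volume(connection_info, instance)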

Changed in nova:
status: Confirmed → Triaged
tags: added: libvirt
removed: compute
Revision history for this message
melanie witt (melwitt) wrote :

On second thought, it might be best to catch all Exception during the encryptor attach attempt and just re-raise, because if the encryptor attach fails for any reason, we need to disconnect the volume.
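
A small, self-contained illustration of that broader "catch everything, clean up, re-raise" pattern (not Nova code; the function and callable names are illustrative), using oslo_utils.excutils, which Nova already depends on:

from oslo_utils import excutils


def attach_encryptor_with_rollback(connect, attach_encryptor, disconnect):
    """Connect a volume, try to attach its encryptor, and undo the
    connection on any failure while preserving the original exception."""
    connect()
    try:
        attach_encryptor()
    except Exception:
        # save_and_reraise_exception re-raises the original exception after
        # the cleanup in the with-block has run.
        with excutils.save_and_reraise_exception():
            disconnect()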

Revision history for this message
int32bit (int32bit) wrote :

I think the second approach is better, since it covers all possible failures and rolls the volume back to a normal status.

Changed in nova:
assignee: nobody → int32bit (int32bit)
Revision history for this message
int32bit (int32bit) wrote :

If this is OK, I will try to fix it.

Revision history for this message
int32bit (int32bit) wrote :

After disconnecting the volume, the volume stays in the "reserved" status. This is not limited to this case; in fact, whenever an instance build is aborted (e.g. a port binding failure), the volume status is set to "reserved". I'm really not sure whether that is the expected status or not.

Revision history for this message
int32bit (int32bit) wrote :

IMO, we should unreserve the volume if it is the root disk when booting an instance from a volume fails.
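
For reference, a hedged sketch of what unreserving such a volume looks like from the client side with python-cinderclient (listed in the versions above); inside Nova the equivalent rollback would go through its internal Cinder API wrapper during build cleanup. The session and the volume ID (taken from the listing above) are illustrative:

from cinderclient import client as cinder_client

cinder = cinder_client.Client('3', session=sess)
vol = cinder.volumes.get('bb12b367-e4e1-471a-a0b9-86b7f32be740')
if vol.status == 'reserved':
    # os-unreserve rolls the volume back to "available" so it can be reused
    # after the failed boot-from-volume attempt.
    cinder.volumes.unreserve(vol)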

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/549469

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: master
Review: https://review.opendev.org/549469
Reason: This patch is stale. Feel free to restore it (or gibi on IRC to do so) if you are still working on this.

Changed in nova:
status: In Progress → Triaged
assignee: int32bit (int32bit) → nobody
Revision history for this message
Lee Yarwood (lyarwood) wrote :

This is actually invalid; a failure during spawn should result in the volume remaining attached and in-use, as happens now. We could call _disconnect_volume when we hit an error during attach_encryptor, but I believe we should also clean this up when the instance is destroyed anyway.

Changed in nova:
status: Triaged → Invalid