VM with encrypted volume goes to error state on hard reboot

Bug #1597234 reported by Lisa Li on 2016-06-29
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: High
Assigned to: Lee Yarwood

Bug Description

In current master branch with LVM as backend:

Steps to reproduce
==================

1. cinder type-create LUKS

2. cinder encryption-type-create --cipher aes-xts-plain64 --key_size 512 --control_location front-end LUKS nova.volume.encryptors.luks.LuksEncryptor

3. cinder create --volume-type LUKS 1

4. nova boot --flavor 1 --image 3feb30f7-d171-4b58-a126-2127016a6051 lisa

5. nova volume-attach c2ee07df-f1d2-4c1c-b08f-9d001209d4cf 72ce7ebf-7400-47da-91f5-3173e01a199e

6. nova reboot --hard c2ee07df-f1d2-4c1c-b08f-9d001209d4cf

Actual result
=============

The VM goes into error state.

2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server device_info = self.connector.connect_volume(connection_info['data'])
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner

....
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server cmd=sanitized_cmd)
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command.
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server Command: sudo nova-rootwrap /etc/nova/rootwrap.conf scsi_id --page 0x83 --whitelisted /dev/disk/by-path/ip-10.239.48.111:3260-iscsi-iqn.2010-10.org.openstack:volume-72ce7ebf-7400-47da-91f5-3173e01a199e-lun-1
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server Exit code: 1
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server Stdout: u''
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server Stderr: u''
2016-06-29 16:05:11.925 TRACE oslo_messaging.rpc.server

Analysis:

When the encrypted volume is attached to the VM, the symlink path ends up pointing to the dm device.
On reboot, the dm device is not detached. The problem may be here; more investigation is needed.

-HP-Compaq-Elite-8300-CMT:/dev$ ls -lrta /dev/mapper/ip-10.239.48.111:3260-iscsi-iqn.2010-10.org.openstack:volume-72ce7ebf-7400-47da-91f5-3173e01a199e-lun-1
lrwxrwxrwx 1 root root 7 Jun 29 16:05 /dev/mapper/ip-10.239.48.111:3260-iscsi-iqn.2010-10.org.openstack:volume-72ce7ebf-7400-47da-91f5-3173e01a199e-lun-1 -> ../dm-2
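
The same check can be scripted. The following is a minimal sketch, not part of Nova: the helper name points_to_dm_device is hypothetical, and it assumes device-mapper nodes are named dm-N as in the listing above.

```python
import os


def points_to_dm_device(path):
    """Return True if `path` resolves to a device-mapper node (dm-N)."""
    # realpath follows the whole symlink chain down to the final device node.
    target = os.path.realpath(path)
    name = os.path.basename(target)
    # Assumption: device-mapper nodes are named "dm-" followed by digits.
    return name.startswith("dm-") and name[3:].isdigit()
```

Calling it on the symlink shown above would be expected to return True while the encryptor is attached.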

Eli Qiao (taget-9) on 2016-07-01
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Lisa Li (lisali) wrote :

Nova needs to detach dmcrypt devices when powering off or stopping, and re-attach them when starting.
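
A rough sketch of that ordering follows. It is an illustration only, not Nova code: power_off, power_on, and the event names are hypothetical, and shutting the guest down before closing the dmcrypt device is an assumption (a mapping that is still in use cannot normally be closed).

```python
def power_off(volumes, events):
    """Record a teardown order for stop or power-off (hypothetical)."""
    # Assumption: the guest is stopped first so the mapping is no longer in use.
    events.append(("shutdown_guest", None))
    for vol in volumes:
        if vol["encrypted"]:
            # Close the dmcrypt mapping so device_path is a raw disk again.
            events.append(("detach_encryptor", vol["id"]))


def power_on(volumes, events):
    """Record a bring-up order for start (hypothetical)."""
    for vol in volumes:
        # Reconnect the raw volume, then reopen the dmcrypt mapping if needed.
        events.append(("connect_volume", vol["id"]))
        if vol["encrypted"]:
            events.append(("attach_encryptor", vol["id"]))
    events.append(("start_guest", None))
```

With this ordering, connect_volume always runs against the raw device, never against the decrypted one.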

Changed in nova:
assignee: Lisa Li (lisali) → Manish (deolalkar-manish)
Lisa Li (lisali) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/337075

Changed in nova:
status: Confirmed → In Progress
Andrea Rosa (andrea-rosa-m) wrote :

Could you please add more information about the bug?
At the end of the description you say that more investigation is needed. Did you get a chance to investigate further?
If so, please add more details here. It is not clear what the problem is or what the path to fixing it is.
I looked at the proposed patch but it didn't help, and it doesn't look to me like a patch for this issue. Thanks for your help.

Fix proposed to branch: master
Review: https://review.openstack.org/338716

Changed in nova:
assignee: Manish (deolalkar-manish) → Lisa Li (lisali)
Lisa Li (lisali) wrote :

Hi Andrea, let me describe the process in detail:
For a non-encrypted volume, Nova calls connect_volume to attach it. This returns connection_info, which includes 'device_path', the path of the device on the compute host.
For an encrypted volume, the device must also be decrypted after the steps above. Nova creates a decrypted device with 'cryptsetup open' and then makes device_path in connection_info refer to the new decrypted device.

When a VM is rebooted, the Nova instance is shut down without any detach or disconnect operations. That is OK.
But when it starts again, Nova calls connect_volume for every block device. For non-encrypted volumes, this call can be repeated safely. For encrypted volumes, however, device_path refers to the decrypted device after opening, and scsi_id cannot be run against it. As a result, calling connect_volume on the decrypted device raises an exception.
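
The sequence above can be reproduced with a toy model. This is a simplified sketch, not Nova's implementation: ToyHost, hard_reboot, and the device names /dev/sdb and /dev/dm-2 are made up for illustration, and the scsi_id call is replaced by a plain check.

```python
class ToyHost:
    """Toy model of the device links on a compute host (not Nova code)."""

    RAW_DISK = "/dev/sdb"    # hypothetical iSCSI block device
    DECRYPTED = "/dev/dm-2"  # hypothetical dm-crypt device

    def __init__(self):
        self.links = {}  # device_path -> device node it currently points to

    def connect_volume(self, device_path):
        # The first call creates the link to the raw disk; later calls reuse it.
        target = self.links.setdefault(device_path, self.RAW_DISK)
        # Stand-in for the scsi_id query, which fails on the decrypted device.
        if target != self.RAW_DISK:
            raise RuntimeError("scsi_id failed for %s" % device_path)
        return {"device_path": device_path}

    def attach_encryptor(self, connection_info):
        # Stand-in for opening the LUKS volume: device_path is repointed
        # at the decrypted device.
        self.links[connection_info["device_path"]] = self.DECRYPTED


def hard_reboot(host, device_paths):
    # No detach or disconnect happens before the volumes are reconnected.
    return [host.connect_volume(p) for p in device_paths]
```

In this model, hard_reboot succeeds for a path that was only connected, and raises RuntimeError for a path on which attach_encryptor was called, which mirrors the ProcessExecutionError in the trace.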

Lisa Li (lisali) wrote :

I added a fix that restores the device_path link when detaching encryptors: https://review.openstack.org/#/c/338716/

Change abandoned by Manish (<email address hidden>) on branch: master
Review: https://review.openstack.org/337075
Reason: Added new patch https://review.openstack.org/#/c/357131

Change abandoned by LisaLi (<email address hidden>) on branch: master
Review: https://review.openstack.org/338716
Reason: Abandon this patch as it is combined in https://review.openstack.org/#/c/357131/

Changed in nova:
assignee: Lisa Li (lisali) → Paul Carlton (paul-carlton2)

Fix proposed to branch: master
Review: https://review.openstack.org/400384

Changed in nova:
assignee: Paul Carlton (paul-carlton2) → Lee Yarwood (lyarwood)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Paul Carlton (paul-carlton2)
Changed in nova:
assignee: Paul Carlton (paul-carlton2) → Lee Yarwood (lyarwood)

Change abandoned by Manish (<email address hidden>) on branch: master
Review: https://review.openstack.org/357131
Reason: Alternative Patch Available: https://review.openstack.org/400384

Nazeema Begum (nazeema123) wrote :

Is anyone still working on this bug? If not, I would like to work on it.

Lee Yarwood (lyarwood) wrote :

I plan on updating https://review.openstack.org/#/c/400384/ within the next few days.

Changed in nova:
assignee: Lee Yarwood (lyarwood) → Guang Yee (guang-yee)
Changed in nova:
assignee: Guang Yee (guang-yee) → Lee Yarwood (lyarwood)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/512896
Reason: we're going to do newton-eol
