VM on encrypted boot volume fails to start after compute host reboot

Bug #1843643 reported by Jing Zhang
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========

Create a volume from an image and boot an instance from it, all good.

https://docs.openstack.org/newton/user-guide/cli-nova-launch-instance-from-volume.html

Restart the compute host and the VM fails to start. Manually running a nova hard reboot recovers it.

The root cause is that nova uses an admin context to resume VMs after a compute host reboot, and this admin context does not carry the information libvirt needs to start the VM.

A manual nova hard reboot can be used as a workaround, but auto-resume is the ideal behavior.
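To illustrate the root cause, here is a minimal stand-alone sketch (an illustrative Context class, not nova's actual RequestContext; field names follow the context dumps later in this report) contrasting the context of a real user request with the anonymous admin context nova builds on host reboot:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative stand-in for nova's RequestContext (not nova code);
# only the fields relevant to this bug are modeled.
@dataclass
class Context:
    is_admin: bool = False
    auth_token: Optional[str] = None
    project_id: Optional[str] = None
    user_id: Optional[str] = None
    service_catalog: List[dict] = field(default_factory=list)

# Context built from a real user request (e.g. a manual hard reboot):
user_ctx = Context(
    is_admin=True,
    auth_token="gAAAAABd...",  # real keystone token (truncated placeholder)
    project_id="dd838c0dfb2540e39ca34ab44ecbc58f",
    user_id="97b696e8d26f4cd4bc5f3352981e9987",
    service_catalog=[{"type": "volumev3", "name": "cinderv3"}],
)

# Anonymous admin context nova builds when resuming guests on host boot:
admin_ctx = Context(is_admin=True)

# is_admin alone is not enough: without an auth token and a service
# catalog, nova cannot authenticate to cinder to fetch volume metadata.
assert admin_ctx.is_admin and admin_ctx.auth_token is None
```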

Steps to reproduce
==================

1. Create volume from image and boot instance
2. Restart the compute host

Expected result
===============
VM is restarted after compute host reboot.

Actual result
=============
The VM fails to restart.

Environment
===========
1. Nova context used by manual nova hard reboot

2019-09-11 17:20:52.314 9 ERROR nova.volume.cinder [req-8467f506-fbce-447a-a5a2-63e048dedf43 97b696e8d26f4cd4bc5f3352981e9987 dd838c0dfb2540e39ca34ab44ecbc58f - default default] <Context {'domain': None, 'project_name': u'admin', 'global_request_id': None, 'project_domain': u'default', 'timestamp': '2019-09-11T14:20:50.934650', 'auth_token': u'gAAAAABdeQLBRlA0ILJP9K6qNXhULB5IbnLNtES-NwHfNvjBbF0oo8P5NL3o2VToxUpPV30RsUem8z4GGiSo9KvYNwipZk4aKEIQhKmHV2p6LPyc_Df09WX56qUJGFR0O1zww9__StDaWVTlGTlX10FuEikl2dJd_E3ptViIxsE4jrIdjaZbcyw', 'remote_address': u'172.17.1.19', 'quota_class': None, 'resource_uuid': None, 'is_admin': True, 'user': u'97b696e8d26f4cd4bc5f3352981e9987', 'service_catalog': [{u'endpoints': [

{u'adminURL': u'http://172.17.1.27:9696', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:9696', u'publicURL': u'https://10.75.239.200:13696'}
], u'type': u'network', u'name': u'neutron'}, {u'endpoints': [

{u'adminURL': u'http://172.17.1.27:9292', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:9292', u'publicURL': u'https://10.75.239.200:13292'}
], u'type': u'image', u'name': u'glance'}, {u'endpoints': [

{u'adminURL': u'https://172.17.1.27:9311', u'region': u'regionOne', u'internalURL': u'https://172.17.1.27:9311', u'publicURL': u'https://172.17.1.27:9311'}
], u'type': u'key-manager', u'name': u'barbican'}, {u'endpoints': [

{u'adminURL': u'http://172.17.1.27:8778/placement', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:8778/placement', u'publicURL': u'https://10.75.239.200:13778/placement'}
], u'type': u'placement', u'name': u'placement'}, {u'endpoints': [

{u'adminURL': u'http://172.17.1.27:8776/v3/dd838c0dfb2540e39ca34ab44ecbc58f', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:8776/v3/dd838c0dfb2540e39ca34ab44ecbc58f', u'publicURL': u'https://10.75.239.200:13776/v3/dd838c0dfb2540e39ca34ab44ecbc58f'}
], u'type': u'volumev3', u'name': u'cinderv3'}], 'tenant': u'dd838c0dfb2540e39ca34ab44ecbc58f', 'read_only': False, 'project_id': u'dd838c0dfb2540e39ca34ab44ecbc58f', 'user_id': u'97b696e8d26f4cd4bc5f3352981e9987', 'show_deleted': False, 'system_scope': None, 'user_identity': u'97b696e8d26f4cd4bc5f3352981e9987 dd838c0dfb2540e39ca34ab44ecbc58f - default default', 'is_admin_project': True, 'project': u'dd838c0dfb2540e39ca34ab44ecbc58f', 'read_deleted': u'no', 'request_id': u'req-8467f506-fbce-447a-a5a2-63e048dedf43', 'roles': [u'reader', u'_member_', u'admin', u'member'], 'user_domain': u'default', 'user_name': u'admin'}>

2. Nova admin context used by nova upon compute host reboot

2019-09-11 17:20:52.315 9 ERROR nova.volume.cinder [req-8467f506-fbce-447a-a5a2-63e048dedf43 97b696e8d26f4cd4bc5f3352981e9987 dd838c0dfb2540e39ca34ab44ecbc58f - default default] <Context {'domain': None, 'project_name': None, 'global_request_id': None, 'project_domain': None, 'timestamp': '2019-09-11T14:20:52.315458', 'auth_token': None, 'remote_address': None, 'quota_class': None, 'resource_uuid': None, 'is_admin': True, 'user': None, 'service_catalog': [], 'tenant': None, 'read_only': False, 'project_id': None, 'user_id': None, 'show_deleted': False, 'system_scope': None, 'user_identity': u'- - - - -', 'is_admin_project': True, 'project': None, 'read_deleted': 'no', 'request_id': 'req-de45bebe-f7db-48fe-bad4-0f95af8dbe6f', 'roles': [], 'user_domain': None, 'user_name': None}>

Matt Riedemann (mriedem)
tags: added: encryption libvirt volumes
Revision history for this message
Lee Yarwood (lyarwood) wrote :

I assume you're using resume_guests_state_on_host_boot to automatically restart these instances?

If you are it looks like _resume_guests_state just needs to elevate the current context before passing it down to the driver for use by _hard_reboot as is done already when rebooting an instance:

https://github.com/openstack/nova/blob/7a18209a81539217a95ab7daad6bc67002768950/nova/compute/manager.py#L3455

Revision history for this message
Jing Zhang (jing.zhang.nokia) wrote :

(1) Yes, resume_guests_state_on_host_boot is set to True; this is for production.
(2) Already tried; it does not work. Context elevation is not meant for this purpose:

    def elevated(self, read_deleted=None):
        """Return a version of this context with admin flag set."""
        context = copy.copy(self)
        # context.roles must be deepcopied to leave original roles
        # without changes
        context.roles = copy.deepcopy(self.roles)
        context.is_admin = True

        if 'admin' not in context.roles:
            context.roles.append('admin')

        if read_deleted is not None:
            context.read_deleted = read_deleted

        return context
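To make Jing's point concrete, here is a minimal stand-alone reproduction (a hypothetical Context class; the elevated() logic is copied from the snippet above) showing that elevation only flips flags and roles and cannot supply the missing credentials:

```python
import copy

# Hypothetical minimal context: only the fields elevated() touches plus
# the credential fields that matter for the cinder/barbican calls.
class Context:
    def __init__(self):
        self.is_admin = False
        self.roles = []
        self.auth_token = None        # no keystone token
        self.service_catalog = []     # no endpoints
        self.read_deleted = 'no'

    # Same logic as the elevated() quoted above.
    def elevated(self, read_deleted=None):
        context = copy.copy(self)
        context.roles = copy.deepcopy(self.roles)
        context.is_admin = True
        if 'admin' not in context.roles:
            context.roles.append('admin')
        if read_deleted is not None:
            context.read_deleted = read_deleted
        return context

ctx = Context().elevated()
# elevated() sets is_admin and the admin role, but the context still has
# no auth token or service catalog, so authenticated cinder calls fail.
assert ctx.is_admin and ctx.roles == ['admin']
assert ctx.auth_token is None and ctx.service_catalog == []
```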

Revision history for this message
Matt Riedemann (mriedem) wrote :

What is the actual failure? Is there a traceback in the compute logs? Also, what version of nova are you using?

Revision history for this message
Jing Zhang (jing.zhang.nokia) wrote :

(1) The error is reported from nova/volume/cinder.py, due to the lack of real "context" data in the admin context.

2019-09-05 16:08:29.880 8 INFO nova.compute.manager [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] [instance: dcb3bdd8-17dc-49d5-82ab-66d7fc2944d0] Rebooting instance after nova-compute restart.
2019-09-05 16:08:29.908 8 INFO nova.virt.libvirt.driver [-] [instance: dcb3bdd8-17dc-49d5-82ab-66d7fc2944d0] Instance destroyed successfully.
2019-09-05 16:08:30.055 8 INFO os_vif [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] Successfully unplugged vif VIFVHostUser(active=True,address=fa:16:3e:c9:a8:2e,has_traffic_filtering=False,id=8b73ce3b-7819-4141-b2cc-d64d2648384c,mode='server',network=Network(a65a1c22-701d-44d5-b7a0-0cdd0149b725),path='/var/lib/vhost_sockets/vhu8b73ce3b-78',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='vhu8b73ce3b-78')
2019-09-05 16:08:30.057 8 ERROR nova.volume.cinder [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] The [cinder] section of your nova configuration file must be configured for authentication with the block-storage service endpoint.
2019-09-05 16:08:30.058 8 ERROR os_brick.encryptors [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] Failed to retrieve encryption metadata for volume 619213b6-9949-4684-bb1c-c3fa508bb42a: Unknown auth type: None (HTTP 401): Unauthorized: Unknown auth type: None (HTTP 401)
2019-09-05 16:08:30.058 8 WARNING nova.compute.manager [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] [instance: dcb3bdd8-17dc-49d5-82ab-66d7fc2944d0] Failed to resume instance: Unauthorized: Unknown auth type: None (HTTP 401)
2019-09-05 16:08:32.416 8 WARNING nova.compute.manager [req-3a72cbdc-5a75-4e4e-9487-1a2434606041 6ce0c0708fd74f00975f2fa104d3cfb7 ab87141455024488b152deeb77f68bb3 - default default] [instance: dcb3bdd8-17dc-49d5-82ab-66d7fc2944d0] Received unexpected event network-vif-unplugged-8b73ce3b-7819-4141-b2cc-d64d2648384c for instance with vm_state error and task_state None.

(2) Rocky. But it is the same issue on master: the admin context does not contain actual "context" data. The difference between the admin context nova uses to resume the VMs and the actual context used later is shown in the bug description above.

Jing

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

The ERROR log above makes me think you forgot to configure the cinder credentials, hence the failure.

Restating the error out of the stacktrace:

2019-09-05 16:08:30.057 8 ERROR nova.volume.cinder [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] The [cinder] section of your nova configuration file must be configured for authentication with the block-storage service endpoint.

which leads to:
2019-09-05 16:08:30.058 8 ERROR os_brick.encryptors [req-522cbf47-a65a-4d46-8852-01adcfcbda2f - - - - -] Failed to retrieve encryption metadata for volume 619213b6-9949-4684-bb1c-c3fa508bb42a: Unknown auth type: None (HTTP 401): Unauthorized: Unknown auth type: None (HTTP 401)
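For reference, the configuration Sylvain is pointing at looks roughly like the following (a sketch only: all values are placeholders for this deployment, and the option names are nova's standard keystoneauth-style [cinder] options used when no user token is available):

```ini
[cinder]
# Region used to look up the block-storage endpoint in the catalog
os_region_name = regionOne

# Service credentials nova falls back to when there is no user token,
# e.g. when resuming guests on host boot (placeholder values)
auth_type = password
auth_url = http://<keystone-host>:5000/v3
project_name = service
project_domain_name = Default
username = nova
user_domain_name = Default
password = <nova service password>
```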

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

OK, I'm going to close this bug as Invalid since it appears to be a configuration issue. That said, feel free to provide such credentials in nova.conf and test again to see whether that resolves your issue. If it doesn't, please reopen the bug [1] with another stacktrace that will help us categorize it better.

Thanks,
-Sylvain

[1] In order to reopen a bug, set its status to New.

Changed in nova:
status: New → Invalid