VM on encrypted boot volume fails to start after compute host reboot

Bug #1843643 reported by Jing Zhang on 2019-09-11
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========

Create a volume from an image and boot an instance from it; everything works, following the documented procedure:

https://docs.openstack.org/newton/user-guide/cli-nova-launch-instance-from-volume.html

After the compute host is restarted, the VM fails to start. A manual nova hard reboot recovers it.

The root cause is that nova uses an admin context when resuming VMs after a compute host reboot. That context carries no auth token and an empty service catalog, so nova cannot gather the information libvirt needs to start the VM (here, the encrypted volume's secret via Cinder/Barbican).
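
A minimal sketch of the difference (hypothetical illustration, assuming a nova development environment): nova.context.get_admin_context() builds a fresh RequestContext that is flagged as admin but carries no credentials, matching the second log excerpt below.

    # Hypothetical illustration: a freshly built admin context has
    # is_admin=True but no auth token and an empty service catalog,
    # matching the second context dump in the logs below.
    from nova import context

    admin_ctxt = context.get_admin_context()
    print(admin_ctxt.is_admin)         # True
    print(admin_ctxt.auth_token)       # None
    print(admin_ctxt.service_catalog)  # []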

A manual nova hard reboot can be used as a workaround, but automatic resume is the desired behavior.
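
For reference, the workaround is a hard reboot of the affected instance (the server ID is a placeholder):

    nova reboot --hard <server-id>
    # or, with the unified client:
    openstack server reboot --hard <server-id>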

Steps to reproduce
==================

1. Create volume from image and boot instance
2. Restart the compute host
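
For reference, the steps map roughly to the following CLI calls (image, flavor, network, size, and the encryption-enabled volume type are deployment-specific placeholders):

    # 1. Create a bootable volume from an image, using an
    #    encryption-enabled volume type, and boot from it.
    openstack volume create --image <image-id> --type <luks-type> --size 10 boot-vol
    openstack server create --flavor <flavor> --network <net> --volume boot-vol test-vm

    # 2. Reboot the compute host the instance landed on.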

Expected result
===============
VM is restarted after compute host reboot.

Actual result
=============
The VM fails to restart.

Environment
===========
1. Nova context used by a manual nova hard reboot (note the populated auth_token and service_catalog):

2019-09-11 17:20:52.314 9 ERROR nova.volume.cinder [req-8467f506-fbce-447a-a5a2-63e048dedf43 97b696e8d26f4cd4bc5f3352981e9987 dd838c0dfb2540e39ca34ab44ecbc58f - default default] <Context {'domain': None, 'project_name': u'admin', 'global_request_id': None, 'project_domain': u'default', 'timestamp': '2019-09-11T14:20:50.934650', 'auth_token': u'gAAAAABdeQLBRlA0ILJP9K6qNXhULB5IbnLNtES-NwHfNvjBbF0oo8P5NL3o2VToxUpPV30RsUem8z4GGiSo9KvYNwipZk4aKEIQhKmHV2p6LPyc_Df09WX56qUJGFR0O1zww9__StDaWVTlGTlX10FuEikl2dJd_E3ptViIxsE4jrIdjaZbcyw', 'remote_address': u'172.17.1.19', 'quota_class': None, 'resource_uuid': None, 'is_admin': True, 'user': u'97b696e8d26f4cd4bc5f3352981e9987', 'service_catalog': [{u'endpoints': [{u'adminURL': u'http://172.17.1.27:9696', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:9696', u'publicURL': u'https://10.75.239.200:13696'}], u'type': u'network', u'name': u'neutron'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.27:9292', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:9292', u'publicURL': u'https://10.75.239.200:13292'}], u'type': u'image', u'name': u'glance'}, {u'endpoints': [{u'adminURL': u'https://172.17.1.27:9311', u'region': u'regionOne', u'internalURL': u'https://172.17.1.27:9311', u'publicURL': u'https://172.17.1.27:9311'}], u'type': u'key-manager', u'name': u'barbican'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.27:8778/placement', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:8778/placement', u'publicURL': u'https://10.75.239.200:13778/placement'}], u'type': u'placement', u'name': u'placement'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.27:8776/v3/dd838c0dfb2540e39ca34ab44ecbc58f', u'region': u'regionOne', u'internalURL': u'http://172.17.1.27:8776/v3/dd838c0dfb2540e39ca34ab44ecbc58f', u'publicURL': u'https://10.75.239.200:13776/v3/dd838c0dfb2540e39ca34ab44ecbc58f'}], u'type': u'volumev3', u'name': u'cinderv3'}], 'tenant': u'dd838c0dfb2540e39ca34ab44ecbc58f', 'read_only': False, 'project_id': u'dd838c0dfb2540e39ca34ab44ecbc58f', 'user_id': u'97b696e8d26f4cd4bc5f3352981e9987', 'show_deleted': False, 'system_scope': None, 'user_identity': u'97b696e8d26f4cd4bc5f3352981e9987 dd838c0dfb2540e39ca34ab44ecbc58f - default default', 'is_admin_project': True, 'project': u'dd838c0dfb2540e39ca34ab44ecbc58f', 'read_deleted': u'no', 'request_id': u'req-8467f506-fbce-447a-a5a2-63e048dedf43', 'roles': [u'reader', u'_member_', u'admin', u'member'], 'user_domain': u'default', 'user_name': u'admin'}>

2. Nova admin context used by nova upon compute host reboot (auth_token is None, service_catalog is empty):

2019-09-11 17:20:52.315 9 ERROR nova.volume.cinder [req-8467f506-fbce-447a-a5a2-63e048dedf43 97b696e8d26f4cd4bc5f3352981e9987 dd838c0dfb2540e39ca34ab44ecbc58f - default default] <Context {'domain': None, 'project_name': None, 'global_request_id': None, 'project_domain': None, 'timestamp': '2019-09-11T14:20:52.315458', 'auth_token': None, 'remote_address': None, 'quota_class': None, 'resource_uuid': None, 'is_admin': True, 'user': None, 'service_catalog': [], 'tenant': None, 'read_only': False, 'project_id': None, 'user_id': None, 'show_deleted': False, 'system_scope': None, 'user_identity': u'- - - - -', 'is_admin_project': True, 'project': None, 'read_deleted': 'no', 'request_id': 'req-de45bebe-f7db-48fe-bad4-0f95af8dbe6f', 'roles': [], 'user_domain': None, 'user_name': None}>

Matt Riedemann (mriedem) on 2019-09-12
tags: added: encryption libvirt volumes
Lee Yarwood (lyarwood) wrote:

I assume you're using resume_guests_state_on_host_boot to automatically restart these instances?
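
For reference, that option is set in the [DEFAULT] section of nova.conf:

    [DEFAULT]
    resume_guests_state_on_host_boot = true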

If you are, it looks like _resume_guests_state just needs to elevate the current context before passing it down to the driver for use by _hard_reboot, as is already done when rebooting an instance:

https://github.com/openstack/nova/blob/7a18209a81539217a95ab7daad6bc67002768950/nova/compute/manager.py#L3455
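
A minimal sketch of the kind of change being suggested (hypothetical; the method signature is assumed from the linked manager.py, and this is not a tested patch):

    # Hypothetical sketch in nova/compute/manager.py: elevate the
    # request context before handing it to the driver, mirroring what
    # reboot_instance already does at the link above.
    def _resume_guests_state(self, context, instance, net_info, block_dev_info):
        context = context.elevated()  # suggested addition
        self.driver.resume_state_on_host_boot(
            context, instance, net_info, block_dev_info)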

Jing Zhang (jing.zhang.nokia) wrote:

(1) Yes, resume_guests_state_on_host_boot is set to true; this is a production deployment.
(2) Already tried; it does not work. Context elevation is not meant for this purpose:

    # From nova/context.py (RequestContext.elevated). Note that only
    # is_admin and roles are touched; auth_token and service_catalog
    # are left unchanged, so elevation adds no credentials.
    def elevated(self, read_deleted=None):
        """Return a version of this context with admin flag set."""
        context = copy.copy(self)
        # context.roles must be deepcopied to leave original roles
        # without changes
        context.roles = copy.deepcopy(self.roles)
        context.is_admin = True

        if 'admin' not in context.roles:
            context.roles.append('admin')

        if read_deleted is not None:
            context.read_deleted = read_deleted

        return context
