Nova fails to open console.log file at repeated host evacuation

Bug #1634282 reported by Marian Krcmarik
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Unassigned

Bug Description

I am trying to perform "ping-pong" host evacuation - take down ungracefully host A, evacuate host A to host B, take down ungracefully host B and evacuate to host A. The second evacuation fails at following error:
[Errno 13] Permission denied: '/var/lib/nova/instances/e17a325c-f0da-4f2f-aad3-4b1c098f295f/console.log'

Traceback:
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] Traceback (most recent call last):
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6566, in _error_out_instance_on_exception
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] yield
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2687, in rebuild_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] bdms, recreate, on_shared_storage, preserve_ephemeral)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2731, in _do_rebuild_instance_with_claim
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] self._do_rebuild_instance(*args, **kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2846, in _do_rebuild_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] self._rebuild_default_impl(**kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2612, in _rebuild_default_impl
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] block_device_info=new_block_device_info)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2574, in spawn
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] self._ensure_console_log_for_instance(instance)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2835, in _ensure_console_log_for_instance
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] libvirt_utils.file_open(console_file, 'a').close()
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 313, in file_open
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] return open(*args, **kwargs)
2016-10-17 11:36:56.023 3433 ERROR nova.compute.manager [instance: a7ba9743-b425-4e47-aeb0-d545e66fffe9] IOError: [Errno 13] Permission denied: '/var/lib/nova/instances/a7ba9743-b425-4e47-aeb0-d545e66fffe9/console.log'

The problem is that the console file is left on host A with qemu:qemu ownership set by libvirt after first evacuation (the host A was taken down ungracefully) and once nova tries to open that file with append permission under nova user during second evacution from host B to host A It fails due to permissions. This try of nova to open the console.log file was introduce in:
https://github.com/openstack/nova/commit/ec6ed24cb844dcdf834d283d496c9b920ff1db83

Since I believe that in default installations usually dynamic_ownership=0 is not set and qemu is not started by the same user as Nova I would consider this as a regression.

Steps to reproduce:
1. Boot a VM on host A
2. Disrupt host A and trigger evacuation to host B
3. Wait for host A to be online again
4. Disrupt host B and trigger evacuation to host A

Expected result:
Successful 2nd host evacuation

Actual result:
Nova does not have permission to open console.log file and fails the evacuation.

Newton release used

description: updated
Changed in nova:
status: New → In Progress
assignee: nobody → Stephen Finucane (stephenfinucane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/392643

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/392643
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1848860fd4fb6dfb2ead1f8271050d36df592205
Submitter: Jenkins
Branch: master

commit 1848860fd4fb6dfb2ead1f8271050d36df592205
Author: Stephen Finucane <email address hidden>
Date: Wed Nov 2 12:43:35 2016 +0000

    Ignore IOError when creating 'console.log'

    In 'd4e6bd8', and later refined in 'ec6ed24c', an issue with the the
    Quobyte CI was resolved by ensuring that the 'console.log' file is
    always created, even when libvirt's 'dynamic_ownership' config option
    is set to '0'. However, attempting to take ownership of this file can
    cause issues when 'dynamic_ownership' is set to '1' and a user attempts
    to reschedule an instance to a host where it was previously located. In
    this case, the permissions on the file have already been co-opted by
    libvirt, and attempting to change them results in an IOError.

    The easiest fix for this is to simply ignore the IOError, as the file
    already exists and libvirt has set correct permissions already. This is
    what do here. A longer term fix would be to avoid changing the
    permissions only for the offending Quobytes block devices, but this is
    likely not backportable and will be done in a follow-up fix.

    Change-Id: I353afd57be207330757580447a1b606c7b9b7c09
    Partial-bug: #1634282
    Related-bug: #1597644
    Related-bug: #1609298

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/454593

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/454593
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a96092d0e48fef62d257acf0428ddb6daf07d3a4
Submitter: Jenkins
Branch: stable/newton

commit a96092d0e48fef62d257acf0428ddb6daf07d3a4
Author: Stephen Finucane <email address hidden>
Date: Wed Nov 2 12:43:35 2016 +0000

    Ignore IOError when creating 'console.log'

    In 'd4e6bd8', and later refined in 'ec6ed24c', an issue with the the
    Quobyte CI was resolved by ensuring that the 'console.log' file is
    always created, even when libvirt's 'dynamic_ownership' config option
    is set to '0'. However, attempting to take ownership of this file can
    cause issues when 'dynamic_ownership' is set to '1' and a user attempts
    to reschedule an instance to a host where it was previously located. In
    this case, the permissions on the file have already been co-opted by
    libvirt, and attempting to change them results in an IOError.

    The easiest fix for this is to simply ignore the IOError, as the file
    already exists and libvirt has set correct permissions already. This is
    what do here. A longer term fix would be to avoid changing the
    permissions only for the offending Quobytes block devices, but this is
    likely not backportable and will be done in a follow-up fix.

    Change-Id: I353afd57be207330757580447a1b606c7b9b7c09
    Partial-bug: #1634282
    Related-bug: #1597644
    Related-bug: #1609298
    (cherry picked from commit 1848860fd4fb6dfb2ead1f8271050d36df592205)

tags: added: in-stable-newton
Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: Stephen Finucane (stephenfinucane) → nobody
Changed in nova:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.