_get_instance_disk_info fails to read files from NFS due to permissions

Bug #1416132 reported by Eric Harney
36
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Ankit Agrawal

Bug Description

LibvirtDriver's _get_instance_disk_info calls libvirt_utils.get_disk_backing_file() if processing a qcow2 backing file. If this is a file belonging to an attached and NFS-hosted Cinder volume, it may be owned by qemu:qemu and therefore not readable as the nova user.

[update: am now aiming at a different solution]
My proposed solution is to run the images.qemu_img_info() call as root in this case.

Note that this requires a change to grenade to upgrade the rootwrap configuration for gating to pass.
[/update]

Changed in nova:
assignee: nobody → Eric Harney (eharney)
status: New → In Progress
Revision history for this message
Eric Harney (eharney) wrote :
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

It is a bit unclear to me when would this happen under normal circumstances?

Reading the code of _get_instance_disk_info - it seems to me that it was meant to skip all volumes and does a poor job of that. Maybe that's where the fix should be?

Changed in nova:
importance: Undecided → High
tags: added: libvirt
Eric Harney (eharney)
description: updated
summary: - _get_instance_disk_info fails to read files from NFS due permissions
+ _get_instance_disk_info fails to read files from NFS due to permissions
Revision history for this message
Abhijeet Malawade (abhijeet-malawade) wrote :

This issue is getting reproduced only after applying cinder patch : https://review.openstack.org/#/c/147186/ (NFS snapshots) with 'nfs_qcow2_volumes' config option enabled.
So should it be considered as bug or enhancement?

Revision history for this message
Abhishek Kekane (abhishek-kekane) wrote :

Hi Eric, Nikola,

As per discussion on IRC I have applied Nikola's patches [1] in my environment for testing but still I am getting the permission issue.

ERROR nova.compute.manager [req-22814e21-b10e-4a8e-a599-d26242fd0d74 None None] Error updating resources for node openstack-177: Unexpected error while running command.
Command: env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/mnt/e4fc188de38439dc5443c09da9c14588/volume-50f25f26-8fe1-40c2-bedd-e243c34a48fc
Exit code: 1
Stdout: u''
Stderr: u"qemu-img: Could not open '/opt/stack/data/nova/mnt/e4fc188de38439dc5443c09da9c14588/volume-50f25f26-8fe1-40c2-bedd-e243c34a48fc': Could not open '/opt/stack/data/nova/mnt/e4fc188de38439dc5443c09da9c14588/volume-50f25f26-8fe1-40c2-bedd-e243c34a48fc': Permission denied\n"

[1] https://review.openstack.org/#/c/180638/

Matt Riedemann (mriedem)
tags: added: volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Eric Harney (<email address hidden>) on branch: master
Review: https://review.openstack.org/149037

Revision history for this message
Abhishek Kekane (abhishek-kekane) wrote :

Hi,

BDM entry and instance xml for reference, http://paste.openstack.org/show/321326/

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Ok so after a lengthy investigation - I think the root cause of this is that we call libvirt.driver._get_instance_disk_info (from _get_disk_over_committed_size_total see https://github.com/openstack/nova/blob/b0854ba0c697243aa3d91170d1a22896aed60e02/nova/virt/libvirt/driver.py#L6511). This method gets called without the block device information, but it is obvious that it cannot possibly do it's job correctly without the block device mapping information as it relies on it to filter out the volumes (see https://github.com/openstack/nova/blob/b0854ba0c697243aa3d91170d1a22896aed60e02/nova/virt/libvirt/driver.py#L6421).

There are 2 ways to fix this bug IMHO neither of which are easy. First one is that we pass the block device info to the libvirt driver on every get_available_resources pass. This will require some invasive refactoring of the resource tracker. We will also want to avoid looping over every instance and firing of N queries for block devices, so some refactoring of the instance related DB layer methods will be needed so that we can join-load block device in a single query.

Another way to approach the bug is to decide if disk_available_least is something we actually care about. It does seem to be used in the disk filter, but it seems to be wildely incorrect for at least some deployment scenarios (RBD for example) but also very useful when using qcow backing files.

Revision history for this message
Abhijeet Malawade (abhijeet-malawade) wrote :

Hi Nikola,

I have tried to pass bdm info to '_get_instance_disk_info' method as per your comment, and it is solving this issue (poc gerrit patch ; https://review.openstack.org/#/c/221162/). Currently this patch will make N number of queries for getting bdm info and instance from DB.

Changed in nova:
assignee: Eric Harney (eharney) → Diana Clarke (diana-clarke)
Revision history for this message
Abhijeet Malawade (abhijeet-malawade) wrote :

Hi Diana,

Are you going to submit another patch to fix this issue completely?
I have found your patch (https://review.openstack.org/#/c/229964/) which partially fixes this issue.
If not I would like to work on this.

Thanks

Revision history for this message
Diana Clarke (diana-clarke) wrote :

Feel free to grab it, Abhijeet. Thanks for asking :)

If it's still open in a week or so I may give it a go, but my plate is otherwise full for next week.

Have a great weekend!

Cheers,

--diana

Changed in nova:
assignee: Diana Clarke (diana-clarke) → Abhijeet Malawade (abhijeet-malawade)
Changed in nova:
assignee: Abhijeet Malawade (abhijeet-malawade) → Diana Clarke (diana-clarke)
Revision history for this message
Diana Clarke (diana-clarke) wrote :

@Abhijeet: It looks like my latest patch set auto-assigned this bug to me. Feel free to grab it back. Cheers!

Changed in nova:
assignee: Diana Clarke (diana-clarke) → nobody
Changed in nova:
assignee: nobody → Diana Clarke (diana-clarke)
Changed in nova:
assignee: Diana Clarke (diana-clarke) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/229964
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=110bb30d0f2a183aa54ec1289a58a15ea26b0495
Submitter: Jenkins
Branch: master

commit 110bb30d0f2a183aa54ec1289a58a15ea26b0495
Author: Diana Clarke <email address hidden>
Date: Thu Oct 1 11:33:14 2015 -0400

    Replace N block_device_mapping queries with 1

    The ExtendedVolumes post-processing extension used to do N
    block_device_mapping queries (one query per instance in a 'nova list').
    Instead, do one block_device_mapping query with an IN clause of
    instance UUIDs.

    Change-Id: I32a1bd0e05a7a938e531d00bedfab23a0bb68538
    Partial-Bug: #1416132
    Closes-Bug: #1359808

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/245656

Changed in nova:
assignee: nobody → Ankit Agrawal (ankitagrawal)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/liberty)

Change abandoned by Sergey Nikitin (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/245656

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/221162
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1345d0fe1cad5093d49a58b6f0b7f4cb650f61d8
Submitter: Jenkins
Branch: master

commit 1345d0fe1cad5093d49a58b6f0b7f4cb650f61d8
Author: Abhijeet Malawade <email address hidden>
Date: Fri Sep 4 00:21:47 2015 -0700

    Pass bdm info to _get_instance_disk_info method

    If bdm info is not passed to the '_get_instance_disk_info' method,
    then in case of processing qcow2 backing file, it raises permission
    denied error provided the backing is belonging to an attached and
    NFS-hosted Cinder volume.

    Passed bdm info to the '_get_instance_disk_info' method if any
    volumes are attached to the instance.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Closes-Bug: #1416132
    Change-Id: I5d5c64ea4d304022282ec9ab121e295f3c9e0d03

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.0.0.0b3

This issue was fixed in the openstack/nova 13.0.0.0b3 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Related blueprints

Remote bug watches

Bug watches keep track of this bug in other bug trackers.