_get_instance_disk_info fails to read files from NFS due to permissions

Bug #1416132 reported by Eric Harney on 2015-01-29
36
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Ankit Agrawal

Bug Description

LibvirtDriver's _get_instance_disk_info calls libvirt_utils.get_disk_backing_file() if processing a qcow2 backing file. If this is a file belonging to an attached and NFS-hosted Cinder volume, it may be owned by qemu:qemu and therefore not readable as the nova user.

[update: am now aiming at a different solution]
My proposed solution is to run the images.qemu_img_info() call as root in this case.

Note that this requires a change to grenade to upgrade the rootwrap configuration for gating to pass.
[/update]

Changed in nova:
assignee: nobody → Eric Harney (eharney)
status: New → In Progress
Nikola Đipanov (ndipanov) wrote :

It is a bit unclear to me when would this happen under normal circumstances?

Reading the code of _get_instance_disk_info - it seems to me that it was meant to skip all volumes and does a poor job of that. Maybe that's where the fix should be?

Changed in nova:
importance: Undecided → High
tags: added: libvirt
Eric Harney (eharney) on 2015-04-29
description: updated
summary: - _get_instance_disk_info fails to read files from NFS due permissions
+ _get_instance_disk_info fails to read files from NFS due to permissions

This issue is getting reproduced only after applying cinder patch : https://review.openstack.org/#/c/147186/ (NFS snapshots) with 'nfs_qcow2_volumes' config option enabled.
So should it be considered as bug or enhancement?

Hi Eric, Nikola,

As per discussion on IRC I have applied Nikola's patches [1] in my environment for testing but still I am getting the permission issue.

ERROR nova.compute.manager [req-22814e21-b10e-4a8e-a599-d26242fd0d74 None None] Error updating resources for node openstack-177: Unexpected error while running command.
Command: env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/mnt/e4fc188de38439dc5443c09da9c14588/volume-50f25f26-8fe1-40c2-bedd-e243c34a48fc
Exit code: 1
Stdout: u''
Stderr: u"qemu-img: Could not open '/opt/stack/data/nova/mnt/e4fc188de38439dc5443c09da9c14588/volume-50f25f26-8fe1-40c2-bedd-e243c34a48fc': Could not open '/opt/stack/data/nova/mnt/e4fc188de38439dc5443c09da9c14588/volume-50f25f26-8fe1-40c2-bedd-e243c34a48fc': Permission denied\n"

[1] https://review.openstack.org/#/c/180638/

Matt Riedemann (mriedem) on 2015-07-01
tags: added: volumes

Change abandoned by Eric Harney (<email address hidden>) on branch: master
Review: https://review.openstack.org/149037

Hi,

BDM entry and instance xml for reference, http://paste.openstack.org/show/321326/

Nikola Đipanov (ndipanov) wrote :

Ok so after a lengthy investigation - I think the root cause of this is that we call libvirt.driver._get_instance_disk_info (from _get_disk_over_committed_size_total see https://github.com/openstack/nova/blob/b0854ba0c697243aa3d91170d1a22896aed60e02/nova/virt/libvirt/driver.py#L6511). This method gets called without the block device information, but it is obvious that it cannot possibly do it's job correctly without the block device mapping information as it relies on it to filter out the volumes (see https://github.com/openstack/nova/blob/b0854ba0c697243aa3d91170d1a22896aed60e02/nova/virt/libvirt/driver.py#L6421).

There are 2 ways to fix this bug IMHO neither of which are easy. First one is that we pass the block device info to the libvirt driver on every get_available_resources pass. This will require some invasive refactoring of the resource tracker. We will also want to avoid looping over every instance and firing of N queries for block devices, so some refactoring of the instance related DB layer methods will be needed so that we can join-load block device in a single query.

Another way to approach the bug is to decide if disk_available_least is something we actually care about. It does seem to be used in the disk filter, but it seems to be wildely incorrect for at least some deployment scenarios (RBD for example) but also very useful when using qcow backing files.

Hi Nikola,

I have tried to pass bdm info to '_get_instance_disk_info' method as per your comment, and it is solving this issue (poc gerrit patch ; https://review.openstack.org/#/c/221162/). Currently this patch will make N number of queries for getting bdm info and instance from DB.

Changed in nova:
assignee: Eric Harney (eharney) → Diana Clarke (diana-clarke)

Hi Diana,

Are you going to submit another patch to fix this issue completely?
I have found your patch (https://review.openstack.org/#/c/229964/) which partially fixes this issue.
If not I would like to work on this.

Thanks

Diana Clarke (diana-clarke) wrote :

Feel free to grab it, Abhijeet. Thanks for asking :)

If it's still open in a week or so I may give it a go, but my plate is otherwise full for next week.

Have a great weekend!

Cheers,

--diana

Changed in nova:
assignee: Diana Clarke (diana-clarke) → Abhijeet Malawade (abhijeet-malawade)
Changed in nova:
assignee: Abhijeet Malawade (abhijeet-malawade) → Diana Clarke (diana-clarke)
Diana Clarke (diana-clarke) wrote :

@Abhijeet: It looks like my latest patch set auto-assigned this bug to me. Feel free to grab it back. Cheers!

Changed in nova:
assignee: Diana Clarke (diana-clarke) → nobody
Changed in nova:
assignee: nobody → Diana Clarke (diana-clarke)
Changed in nova:
assignee: Diana Clarke (diana-clarke) → nobody

Reviewed: https://review.openstack.org/229964
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=110bb30d0f2a183aa54ec1289a58a15ea26b0495
Submitter: Jenkins
Branch: master

commit 110bb30d0f2a183aa54ec1289a58a15ea26b0495
Author: Diana Clarke <email address hidden>
Date: Thu Oct 1 11:33:14 2015 -0400

    Replace N block_device_mapping queries with 1

    The ExtendedVolumes post-processing extension used to do N
    block_device_mapping queries (one query per instance in a 'nova list').
    Instead, do one block_device_mapping query with an IN clause of
    instance UUIDs.

    Change-Id: I32a1bd0e05a7a938e531d00bedfab23a0bb68538
    Partial-Bug: #1416132
    Closes-Bug: #1359808

Changed in nova:
assignee: nobody → Ankit Agrawal (ankitagrawal)

Change abandoned by Sergey Nikitin (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/245656

Reviewed: https://review.openstack.org/221162
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1345d0fe1cad5093d49a58b6f0b7f4cb650f61d8
Submitter: Jenkins
Branch: master

commit 1345d0fe1cad5093d49a58b6f0b7f4cb650f61d8
Author: Abhijeet Malawade <email address hidden>
Date: Fri Sep 4 00:21:47 2015 -0700

    Pass bdm info to _get_instance_disk_info method

    If bdm info is not passed to the '_get_instance_disk_info' method,
    then in case of processing qcow2 backing file, it raises permission
    denied error provided the backing is belonging to an attached and
    NFS-hosted Cinder volume.

    Passed bdm info to the '_get_instance_disk_info' method if any
    volumes are attached to the instance.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Closes-Bug: #1416132
    Change-Id: I5d5c64ea4d304022282ec9ab121e295f3c9e0d03

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 13.0.0.0b3 development milestone.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Related blueprints