nova-lvm tempest job failing with InvalidDiskInfo

Bug #1771700 reported by melanie witt
This bug affects 1 person
Affects                     Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)    Fix Released  High        Matt Riedemann
  Ocata                     Fix Released  High        Lee Yarwood
  Pike                      Fix Released  High        Lee Yarwood
  Queens                    Fix Released  High        Lee Yarwood
Ubuntu Cloud Archive        Fix Released  High        Unassigned
  Ocata                     Fix Released  High        Unassigned
  Pike                      Fix Released  High        Unassigned
  Queens                    Fix Released  High        Unassigned
  Rocky                     Fix Released  High        Unassigned
nova (Ubuntu)               Fix Released  High        Unassigned
  Bionic                    Fix Released  High        Unassigned
  Cosmic                    Fix Released  High        Unassigned

Bug Description

There has been a recent regression in the nova-lvm tempest job. The most recent passing run was on 2018-05-11 [1][2], so the regression was introduced between then and yesterday, 2018-05-15.

The build fails and the following trace is seen in the n-cpu log:

May 15 23:01:40.174233 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager Traceback (most recent call last):
May 15 23:01:40.174457 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/manager.py", line 7343, in update_available_resource_for_node
May 15 23:01:40.174699 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager rt.update_available_resource(context, nodename)
May 15 23:01:40.174922 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 664, in update_available_resource
May 15 23:01:40.175170 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
May 15 23:01:40.175414 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 6391, in get_available_resource
May 15 23:01:40.175641 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total()
May 15 23:01:40.175868 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7935, in _get_disk_over_committed_size_total
May 15 23:01:40.176091 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager config, block_device_info)
May 15 23:01:40.176333 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7852, in _get_instance_disk_info_from_config
May 15 23:01:40.176555 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager virt_size = disk_api.get_disk_size(path)
May 15 23:01:40.176773 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/disk/api.py", line 99, in get_disk_size
May 15 23:01:40.176994 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager return images.qemu_img_info(path).virtual_size
May 15 23:01:40.177215 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager File "/opt/stack/new/nova/nova/virt/images.py", line 87, in qemu_img_info
May 15 23:01:40.177452 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager raise exception.InvalidDiskInfo(reason=msg)
May 15 23:01:40.177674 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /dev/stack-volumes-default/8a1d5912-13e1-4583-876e-a04396b6b712_disk : Unexpected error while running command.
May 15 23:01:40.177902 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager Command: /usr/bin/python -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /dev/stack-volumes-default/8a1d5912-13e1-4583-876e-a04396b6b712_disk --force-share
May 15 23:01:40.178118 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager Exit code: 1
May 15 23:01:40.178344 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager Stdout: u''
May 15 23:01:40.178989 ubuntu-xenial-rax-dfw-0004040560 nova-compute[28718]: ERROR nova.compute.manager Stderr: u"qemu-img: Could not open '/dev/stack-volumes-default/8a1d5912-13e1-4583-876e-a04396b6b712_disk': Could not open '/dev/stack-volumes-default/8a1d5912-13e1-4583-876e-a04396b6b712_disk': Permission denied\n"

I think the failure is related to this change that merged on 2018-05-15:

https://review.openstack.org/567899

which moved the disk_api.get_disk_size(path) call so that it now runs for all disk types instead of only for qcow2 and ploop. Based on the surrounding code, only lvm.get_volume_size(path) should be called in the lvm case.
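
The intended dispatch described above can be sketched roughly as follows. This is a simplified illustration based only on this description, not the actual nova code: the helper name virtual_size_for_disk, its parameters, and the fallback for other file-backed disks are assumptions, and the two size functions are passed in as callables so the sketch does not depend on nova being importable.

```python
def virtual_size_for_disk(disk_type, driver_type, path, dk_size,
                          get_volume_size, get_disk_size):
    """Pick the size lookup that matches the kind of disk.

    Hypothetical helper for illustration only.
    get_volume_size stands in for lvm.get_volume_size and
    get_disk_size stands in for disk_api.get_disk_size.
    """
    if disk_type == "block":
        # LVM-backed disk: ask LVM, never run qemu-img against the device.
        return get_volume_size(path)
    if driver_type in ("qcow2", "ploop"):
        # Only these formats need the qemu-img based virtual size.
        return get_disk_size(path)
    # Assumption: for any other file-backed disk, reuse the on-disk size.
    return dk_size
```

With this shape, an lvm disk never reaches the qemu-img code path, which is the behaviour that the change under suspicion appears to have altered.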

[1] http://zuul.openstack.org/builds.html?job_name=nova-lvm
[2] https://review.openstack.org/567916

Tags: libvirt
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Thanks Mel. qemu-img obviously works against block devices; the issue here is that we are running it against a block device as a non-privileged user, something I missed as it's buried deep down in our wonderfully convoluted stack to fetch basic disk details </rant>.

Would it be possible to run the nova-lvm job as a non-voting (NV) job against nova/virt/libvirt/* changes?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/569062

Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: Confirmed → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

We'll have to backport the fix for this to ocata as well:

https://review.openstack.org/#/q/I464bc2b88123a012cd12213beac4b572c3c20a56

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/569149

Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matt Riedemann (mriedem)
Revision history for this message
melanie witt (melwitt) wrote :

A related change to run the nova-lvm job's api.compute tests as non-voting on nova/virt/libvirt/* changes is proposed here: https://review.openstack.org/569149

Changed in nova:
assignee: Matt Riedemann (mriedem) → Lee Yarwood (lyarwood)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/569062
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fda48219a378d09a9a363078ba161d7f54e32c0a
Submitter: Zuul
Branch: master

commit fda48219a378d09a9a363078ba161d7f54e32c0a
Author: Lee Yarwood <email address hidden>
Date: Thu May 17 09:47:58 2018 +0100

    libvirt: Skip fetching the virtual size of block devices

    In this latest episode of `Which CI job has lyarwood broken today?!` we
    find that I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a
    regression in the nova-lvm experimental job as n-cpu attempted to run
    qemu-img info against block devices as an unprivileged user.

    For the time being we should skip any attempt to use this command
    against block devices until the disk_api layer can make privileged
    calls using privsep.

    Closes-bug: #1771700
    Change-Id: I9653f81ec716f80eb638810f65e2d3cdfeedaa22
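
The idea of skipping the virtual size lookup for block devices can be illustrated with a small standalone guard. This is only a sketch under stated assumptions and not the code that merged: the helpers is_block_device and virtual_size_or_none are hypothetical, and detecting the device with os.stat is one possible approach chosen here for illustration.

```python
import os
import stat


def is_block_device(path):
    """Return True if path resolves to a block device node.

    os.stat follows symlinks, so a /dev/<vg>/<lv> symlink is resolved
    to the underlying device node before the mode is checked.
    """
    return stat.S_ISBLK(os.stat(path).st_mode)


def virtual_size_or_none(path, get_disk_size, is_block=is_block_device):
    """Hypothetical guard: skip the qemu-img based lookup for block devices.

    get_disk_size stands in for disk_api.get_disk_size; it is only
    called for paths that are not block devices.
    """
    if is_block(path):
        # An unprivileged qemu-img info would fail here with
        # "Permission denied", so the lookup is skipped entirely.
        return None
    return get_disk_size(path)
```

A caller would then fall back to another source for the size, such as the LVM volume size, whenever None is returned.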

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/571425

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/571427

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/571433

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/571425
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8ea98c56b647526aae7a786531e934eeee7a90a2
Submitter: Zuul
Branch: stable/queens

commit 8ea98c56b647526aae7a786531e934eeee7a90a2
Author: Lee Yarwood <email address hidden>
Date: Thu May 17 09:47:58 2018 +0100

    libvirt: Skip fetching the virtual size of block devices

    In this latest episode of `Which CI job has lyarwood broken today?!` we
    find that I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a
    regression in the nova-lvm experimental job as n-cpu attempted to run
    qemu-img info against block devices as an unprivileged user.

    For the time being we should skip any attempt to use this command
    against block devices until the disk_api layer can make privileged
    calls using privsep.

    Closes-bug: #1771700
    Change-Id: I9653f81ec716f80eb638810f65e2d3cdfeedaa22
    (cherry picked from commit fda48219a378d09a9a363078ba161d7f54e32c0a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/571427
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=43cac615f6a0a4399c7bf3dda6c2595749f27ace
Submitter: Zuul
Branch: stable/pike

commit 43cac615f6a0a4399c7bf3dda6c2595749f27ace
Author: Lee Yarwood <email address hidden>
Date: Thu May 17 09:47:58 2018 +0100

    libvirt: Skip fetching the virtual size of block devices

    In this latest episode of `Which CI job has lyarwood broken today?!` we
    find that I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a
    regression in the nova-lvm experimental job as n-cpu attempted to run
    qemu-img info against block devices as an unprivileged user.

    For the time being we should skip any attempt to use this command
    against block devices until the disk_api layer can make privileged
    calls using privsep.

    Closes-bug: #1771700
    Change-Id: I9653f81ec716f80eb638810f65e2d3cdfeedaa22
    (cherry picked from commit fda48219a378d09a9a363078ba161d7f54e32c0a)
    (cherry picked from commit 8ea98c56b647526aae7a786531e934eeee7a90a2)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.5

This issue was fixed in the openstack/nova 17.0.5 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.4

This issue was fixed in the openstack/nova 16.1.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/571433
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cd862fe8ef52719422940e05955ffffb6b073b96
Submitter: Zuul
Branch: stable/ocata

commit cd862fe8ef52719422940e05955ffffb6b073b96
Author: Lee Yarwood <email address hidden>
Date: Thu May 17 09:47:58 2018 +0100

    libvirt: Skip fetching the virtual size of block devices

    In this latest episode of `Which CI job has lyarwood broken today?!` we
    find that I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a
    regression in the nova-lvm experimental job as n-cpu attempted to run
    qemu-img info against block devices as an unprivileged user.

    For the time being we should skip any attempt to use this command
    against block devices until the disk_api layer can make privileged
    calls using privsep.

    Conflicts:
            nova/virt/libvirt/driver.py
            nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(lyarwood): Conflicts due to the substantial refactoring of
    _get_instance_disk_info via I9616a602ee0605f7f1dc1f47b6416f01895e025b,
    for this change the test has been extended to provide valid XML via the
    config classes.

    Closes-bug: #1771700
    Change-Id: I9653f81ec716f80eb638810f65e2d3cdfeedaa22
    (cherry picked from commit fda48219a378d09a9a363078ba161d7f54e32c0a)
    (cherry picked from commit 8ea98c56b647526aae7a786531e934eeee7a90a2)
    (cherry picked from commit 43cac615f6a0a4399c7bf3dda6c2595749f27ace)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b2

This issue was fixed in the openstack/nova 18.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.3

This issue was fixed in the openstack/nova 15.1.3 release.

Changed in nova (Ubuntu Cosmic):
importance: Undecided → High
status: New → Fix Released
Changed in nova (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Revision history for this message
Corey Bryant (corey.bryant) wrote :

In Ubuntu queens, this fix is included in the 17.0.5 stable point release via https://bugs.launchpad.net/ubuntu/bionic/+source/aodh/+bug/1778747.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

In Ubuntu pike, this fix is included in the 16.1.4 stable point release via https://bugs.launchpad.net/cloud-archive/+bug/1778739.

Changed in nova (Ubuntu Bionic):
status: Triaged → Fix Committed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

In Ubuntu ocata, this fix is included in the 15.1.3 stable point release via https://bugs.launchpad.net/cloud-archive/+bug/1778729.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/569149
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa67211acb701f02937f6dfe58899be873c388ed
Submitter: Zuul
Branch: master

commit aa67211acb701f02937f6dfe58899be873c388ed
Author: Matt Riedemann <email address hidden>
Date: Thu May 17 11:23:13 2018 -0400

    Make nova-lvm run in check on libvirt changes and compute API tests

    This changes the nova-lvm job to run in the check queue on libvirt
    driver changes only, and only runs the tempest compute API tests
    to save time since we don't need to run things like the cinder,
    glance, neutron etc API tests.

    Once we're comfortable with the stability of this job we can
    make it voting and gating.

    This is in response to bug 1771700 which could have been prevented
    if we were gating on the nova-lvm job on libvirt changes.

    Change-Id: Ieaf00bcb6cb885e544d05b6f7276b6470b123258
    Related-Bug: #1771700

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.5

This issue was fixed in the openstack/nova 16.1.5 release.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This is fixed in Ocata and above for Ubuntu. It looks like the upstream tasks can also be moved to fix released.

Changed in nova (Ubuntu Bionic):
status: Fix Committed → Fix Released