ephemeral/swap disk creation fails for local storage with image type raw/lvm

Bug #1608934 reported by Jan Klare
This bug affects 7 people
Affects                    Status         Importance  Assigned to        Milestone
OpenStack Compute (nova)   Fix Released   High        Dr. Jens Harbott
  Mitaka                   Fix Released   High        Matt Riedemann
Ubuntu Cloud Archive       Fix Released   High        Unassigned
  Liberty                  Fix Released   High        Jorge Niedbalski
  Mitaka                   Fix Released   High        Unassigned
  Newton                   Fix Released   High        Unassigned
nova (Ubuntu)              Fix Released   High        Unassigned
  Xenial                   Fix Released   High        Unassigned

Bug Description

Description
===========
I am currently trying to launch an instance in my Mitaka cluster with a flavor that has both ephemeral and root storage. Whenever I try to start the instance, I run into a "DiskNotFound" error (see trace below). Starting instances without ephemeral storage works perfectly fine, and the root disk is created as expected in /var/lib/nova/instances/$INSTANCEID/disk.

Steps to reproduce
==================
1. Create a flavor with ephemeral and root storage.
2. Start an instance with that flavor.

Expected result
===============
Instance starts and the ephemeral disk is created in /var/lib/nova/instances/$INSTANCEID/disk.eph0 or disk.local? (I am not sure where the switch case for the naming is.)

Actual result
=============
Instance does not start. The ephemeral disk seems to be created at /var/lib/nova/instances/$INSTANCEID/disk.eph0, but nova checks /var/lib/nova/instances/_base/ephemeral_* for the disk size.

TRACE: http://pastebin.com/raw/TwtiNLY2
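
The mismatch described above can be illustrated with a small standalone Python sketch. It does not use nova; `DiskNotFound`, `get_disk_size`, `create_ephemeral` and `cache` are simplified stand-ins written for this illustration, and the base file name `ephemeral_1_default` is hypothetical. The only thing taken from the report is the layout: the disk is created in the instance directory, while the size check looks in `_base`.

```python
import os
import tempfile


class DiskNotFound(Exception):
    """Stand-in for the nova exception of the same name."""


def get_disk_size(path):
    # Simplified stand-in for the real size lookup: fail if the file is missing.
    if not os.path.exists(path):
        raise DiskNotFound("No disk at %s" % path)
    return os.path.getsize(path)


def create_ephemeral(target, size):
    # The disk is written straight into the instance directory.
    # Nothing is placed in the _base image cache.
    with open(target, "wb") as f:
        f.truncate(size)


def cache(instances_path, instance_id, size):
    # Hypothetical base file name, following the ephemeral_* pattern above.
    base = os.path.join(instances_path, "_base", "ephemeral_1_default")
    target = os.path.join(instances_path, instance_id, "disk.eph0")
    os.makedirs(os.path.dirname(target))
    create_ephemeral(target, size)
    # Unconditional size check against a base file that was never created.
    if size > get_disk_size(base):
        pass  # a resize would happen here


def demo():
    with tempfile.TemporaryDirectory() as root:
        try:
            cache(root, "instance-0001", 1024)
        except DiskNotFound as exc:
            return "DiskNotFound: %s" % exc
        return "ok"


print(demo().split(":")[0])
```

In this toy version the disk itself is created successfully; the error comes only from the later size lookup on the missing base file.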

Environment
===========
I am running the latest version of OpenStack Mitaka on Ubuntu 16.04, with Libvirt + KVM as hypervisor (also latest stable in Xenial).

Config
======

nova.conf:

...
[libvirt]
images_type = raw
rbd_secret_uuid = XXX
virt_type = kvm
inject_key = true
snapshot_image_format = raw
disk_cachemodes = "network=writeback"
rng_dev_path = /dev/random
rbd_user = cinder
...

Jan Klare (j-klare)
description: updated
Changed in nova:
status: New → Confirmed
Revision history for this message
Jan Klare (j-klare) wrote :

I think this patch might be related, since it also targets the naming of ephemeral disks (disk.local vs. disk.ephX): https://review.openstack.org/#/c/320759/

Revision history for this message
Edmund Rhudy (erhudy) wrote :

I am seeing the same problem in Liberty (Nova 12.0.4). Nova appears to be trying to get the size of an image that doesn't and shouldn't exist, because it's an ephemeral volume created from nothing.

Revision history for this message
Edmund Rhudy (erhudy) wrote :

Out of curiosity, are you installing OpenStack from the Ubuntu cloud archive? I reviewed some code in nova/virt/libvirt/imagebackend.py that I patched to fix it and found that it differs from upstream Liberty. We currently install OpenStack from the UCA packages.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

I just tried this myself with Trusty Mitaka (Nova version 2:13.0.0-0ubuntu5~cloud0) but was unable to reproduce the problem - https://pastebin.ubuntu.com/23051310/

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Edward+Edmund: Did you test with "images_type = raw"? It seems that the issue is only triggered in that case.

summary: - ephemeral disk not available in checked path
+ ephemeral disk creation fails for local storage with image type raw
Revision history for this message
Edmund Rhudy (erhudy) wrote : Re: ephemeral disk creation fails for local storage with image type raw

The issue was triggering for me with images_type=lvm, but not images_type=rbd. I did not try any others (these are the only two types we have in our deployments).

Revision history for this message
Edmund Rhudy (erhudy) wrote :

The patch in question that broke things for me was added to Canonical's packages on 2016-08-11, so that would likely exclude it as the cause of this bug.

Revision history for this message
Edmund Rhudy (erhudy) wrote :

To clarify, what I specifically see is a failure to launch on the LVM backend when the flavor has an ephemeral disk, because nova seems to be looking for an actual image on disk to expand into the LV.

2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [req-39441537-6c9e-41aa-a11b-96ab7656903d dcf63bda28a74757a61ce65edcf4f5ee d042e1a3fc3d49a6b1b549eed914a7eb - - -] [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] Instance failed to spawn
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] Traceback (most recent call last):
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2156, in _build_resources
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] yield resources
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2009, in _build_and_run_instance
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] block_device_info=block_device_info)
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2527, in spawn
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] admin_pass=admin_password)
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2997, in _create_image
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] specified_fs=specified_fs)
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 252, in cache
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] if size > self.get_disk_size(base):
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 306, in get_disk_size
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] return disk.get_disk_size(name)
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/virt/disk/api.py", line 173, in get_disk_size
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] return images.qemu_img_info(path).virtual_size
2016-08-11 18:27:01.443 96542 ERROR nova.compute.manager [instance: 741c239b-97ca-43ff-b890-4c77b1f11394] File "/usr/lib/python2.7/dist-packages/nova/virt/images.py", line 58, in qemu_img_info
2016-08-11 18:27:01.443 96542 ERROR...

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

I proposed a fix at https://review.openstack.org/355415 and your error message comes from the same location, so I think it should fix your problem, too. See also the similar bug report https://bugs.launchpad.net/nova/+bug/1610015

summary: - ephemeral disk creation fails for local storage with image type raw
+ ephemeral disk creation fails for local storage with image type raw/lvm
Changed in nova:
assignee: nobody → Dr. Jens Rosenboom (j-rosenboom-j)
status: Confirmed → In Progress
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Re: ephemeral disk creation fails for local storage with image type raw/lvm

@Jens: Could this be related to this change [1], which got backported to stable/mitaka and stable/liberty? There are some "todo for eph" statements in there.

[1] https://review.openstack.org/#/q/I6bf3cd4f9e0e152bf69732d9a17f93c86dedbd40

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Hmm, that change [1] above looks to me like it only affects resize/migrate. There are other open bugs related to that, but in our case here it is already the initial creation of an instance that is failing.

The code path referenced in the backtrace was introduced in [2] but that should have been in Mitaka from the start. Though it seems that there are no gate tests for this, so it is not impossible that this issue would have gone unnoticed for some time.

[2] https://review.openstack.org/187857

Revision history for this message
Edmund Rhudy (erhudy) wrote :

Jens,

Your fix looks pretty close to the same as mine. I did

if 'ephemeral_size' not in kwargs and size > self.get_disk_size(base):

...instead of checking for image_id, but the idea is similar. I will replace my fix with your patch and verify that the desired behavior remains.
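
The difference between the two guards can be sketched in isolation. This is only an illustration under assumptions: the thread does not quote the image_id check, so `guard_image_id` is a guess at its shape, and the keyword names `image_id` and `swap_mb` in the sample calls are hypothetical inputs, not taken from this report.

```python
def guard_ephemeral_size(kwargs):
    # Variant quoted above: run the size check unless this is an ephemeral disk.
    return "ephemeral_size" not in kwargs


def guard_image_id(kwargs):
    # Assumed shape of "checking for image_id": run the size check only
    # for disks that were fetched from an image.
    return "image_id" in kwargs


# Hypothetical keyword arguments for three kinds of disk.
CALLS = {
    "root": {"image_id": "hypothetical-image-uuid"},
    "ephemeral": {"ephemeral_size": 1},
    "swap": {"swap_mb": 512},
}


def compare():
    # Maps disk kind -> (check runs with first guard, check runs with second).
    return {
        name: (guard_ephemeral_size(kw), guard_image_id(kw))
        for name, kw in CALLS.items()
    }


for name, result in compare().items():
    print(name, result)
```

Under these assumptions both guards skip the check for ephemeral disks, and they differ only for a call that carries neither keyword.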

tags: added: sts
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matthew Booth (<email address hidden>) on branch: master
Review: https://review.openstack.org/357375
Reason: I thought that code looked too familiar. Reverting something over a year old is dumb.

Changed in nova:
importance: Undecided → High
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Under some conditions, the issue also affects the creation of swap disks, see http://lists.openstack.org/pipermail/openstack-operators/2016-August/011396.html

summary: - ephemeral disk creation fails for local storage with image type raw/lvm
+ ephemeral/swap disk creation fails for local storage with image type
+ raw/lvm
tags: added: newton-rc-potential
Changed in nova:
assignee: Dr. Jens Rosenboom (j-rosenboom-j) → nobody
status: In Progress → Confirmed
Revision history for this message
James Page (james-page) wrote :

Raising appropriate UCA tasks for impacted releases; I see this landed into master; we'll need to look at cherry-picks for stable/liberty and stable/mitaka as well.

tags: added: regression
Changed in cloud-archive:
status: New → Triaged
Changed in nova (Ubuntu):
status: Confirmed → Triaged
Changed in cloud-archive:
importance: Undecided → High
James Page (james-page)
Changed in nova (Ubuntu):
importance: Undecided → High
Matt Riedemann (mriedem)
Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Dr. Jens Rosenboom (j-rosenboom-j)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/355415
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d0775c50d0c2bd50a62ccd49ea7063948af6c3b3
Submitter: Jenkins
Branch: master

commit d0775c50d0c2bd50a62ccd49ea7063948af6c3b3
Author: Jens Rosenboom <email address hidden>
Date: Mon Aug 15 13:16:58 2016 +0200

    Fix resizing in imagebackend.cache()

    The Flat and Lvm backends do not create a 'base image' (the file in the
    image cache) when creating an ephemeral or swap disk. However, cache()
    expects it to exist when checking if a resize is required.

    This change ignores the resize check if the backing file doesn't exist.
    This happens to be ok, because ephemeral and swap disks are always
    created with the correct target size anyway, and therefore never need
    to be resized.

    Closes-Bug: 1608934
    Co-Authored-By: Matthew Booth <email address hidden>
    Change-Id: I46b5658efafe558dd6b28c9910fb8fde830adec0
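
The idea in the commit message, skipping the resize check when the backing file does not exist, can be sketched as follows. This is a simplified standalone illustration rather than the merged nova code; `get_disk_size` and `needs_resize` are stand-ins, and the file names are hypothetical.

```python
import os
import tempfile


def get_disk_size(path):
    # Stand-in for the real size lookup; raises if the file is missing.
    return os.stat(path).st_size


def needs_resize(base, size):
    # Only compare sizes when the cached base file actually exists.
    # Ephemeral and swap disks have no base file and are created at their
    # target size, so there is nothing to resize for them.
    return os.path.exists(base) and size > get_disk_size(base)


def demo():
    with tempfile.TemporaryDirectory() as root:
        missing = os.path.join(root, "ephemeral_missing")  # never created
        present = os.path.join(root, "image_base")
        with open(present, "wb") as f:
            f.truncate(512)
        return (
            needs_resize(missing, 1024),  # no base file: check is skipped
            needs_resize(present, 1024),  # base smaller than target
            needs_resize(present, 256),   # base already large enough
        )


print(demo())
```

Because `and` short-circuits, the size lookup is never attempted for a missing base file, which is what avoids the error in this sketch.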

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
Matt Riedemann (mriedem) wrote :

FYI, Matthew Booth is reporting this morning that the patch that landed in nova in master introduces a bug for RBD, so he's going to propose a revert of it and a new patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/370180

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matthew Booth (<email address hidden>) on branch: master
Review: https://review.openstack.org/370180
Reason: Not going to go this way.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.0.0rc1

This issue was fixed in the openstack/nova 14.0.0.0rc1 release candidate.

James Page (james-page)
Changed in nova (Ubuntu):
status: Triaged → Fix Released
Changed in nova (Ubuntu Xenial):
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/liberty)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/367412
Reason: cleaning up the review queue for liberty

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/368216
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9146f9d602ae1ee64cd8ccbf6fc371bc3fb36395
Submitter: Jenkins
Branch: stable/mitaka

commit 9146f9d602ae1ee64cd8ccbf6fc371bc3fb36395
Author: Jens Rosenboom <email address hidden>
Date: Mon Aug 15 13:16:58 2016 +0200

    Fix resizing in imagebackend.cache()

    The Raw and Lvm backends do not create a 'base image' (the file in the
    image cache) when creating an ephemeral or swap disk. However, cache()
    expects it to exist when checking if a resize is required.

    This change ignores the resize check if the backing file doesn't exist.
    This happens to be ok, because ephemeral and swap disks are always
    created with the correct target size anyway, and therefore never need
    to be resized.

    NOTE(mriedem): There is a slight change in the commit message and
    test since the Raw image backend was renamed to Flat in Newton. Since
    Flat didn't exist in Mitaka it's better to use Raw here.

    Closes-Bug: 1608934
    Co-Authored-By: Matthew Booth <email address hidden>
    Change-Id: I46b5658efafe558dd6b28c9910fb8fde830adec0
    (cherry picked from commit d0775c50d0c2bd50a62ccd49ea7063948af6c3b3)

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :
Revision history for this message
James Page (james-page) wrote : Please test proposed package

Hello Jan, or anyone else affected,

Accepted nova into liberty-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:liberty-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-liberty-needed to verification-liberty-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-liberty-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-liberty-needed
Revision history for this message
Robie Basak (racb) wrote :

17:18 <rbasak> jamespage: what's the regression risk for the SRU in bug 1608934? Please could you fill that in?

Revision history for this message
Martin Pitt (pitti) wrote :

Hello Jan, or anyone else affected,

Accepted nova into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:13.1.1-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Hello James,

I've verified that the package python-nova-2:12.0.4-0ubuntu1~cloud2 fixes the reported issue on a Liberty cloud, with images_type set to raw and also to lvm. No other problem has been found with ephemeral storage.

tags: added: verification-liberty-done
removed: verification-liberty-needed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 13.1.2

This issue was fixed in the openstack/nova 13.1.2 release.

Revision history for this message
Robie Basak (racb) wrote :

This is pending verification-done for nova in Ubuntu, which is blocking Ubuntu's update to nova 13.1.2 in bug 1633191.

Revision history for this message
James Page (james-page) wrote : Please test proposed package

Hello Jan, or anyone else affected,

Accepted nova into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Tests have passed successfully for xenial and trusty-mitaka.

tags: added: verification-mitaka-done
removed: verification-mitaka-needed
tags: added: verification-done
removed: verification-needed
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2:13.1.1-0ubuntu1.1

---------------
nova (2:13.1.1-0ubuntu1.1) xenial; urgency=medium

  * d/p/bug1608934.patch: Cherry pick fix to ignore the resize check if
    the backing file doesn't exist, resolving issues with RAW and LVM
    backends (LP: #1608934).

 -- James Page <email address hidden> Wed, 05 Oct 2016 16:56:35 +0100

Changed in nova (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Marking as fix released for xenial/mitaka since 13.1.2 is now in xenial-updates and mitaka-updates.

Revision history for this message
James Page (james-page) wrote :

Marking liberty task as done as nova is now in -updates.
